Weboob 1.0 is out

Added by Florent Fourcot over 2 years ago

After more than four years of work, we are very proud to announce the first stable release of the Weboob project.

We think that the new browser is ready to use, and stable enough that the API will not change during the lifetime of the 1.X branch. Web scraping is a repetitive task, and we believe that Weboob now makes it easier. With the Browser tools (ListElement, Filters, etc.), a module can be written in only a few lines, because the boring scraping part is factored out.

Since the 0.j release, the CapHousing part has been improved. A new option is available for conditions. Encoding is better managed in applications, and the Weboob class now allows direct access to the modules, like a Python dictionary. There are several new modules, one of which is for Citibank.

If you are using out-of-tree tools based on Weboob, please upgrade very carefully. We changed many paths in the API, and the first browser has been deprecated.

General

API Big-Bang

  • Rename BaseBackend to Module
  • Rename BACKEND to MODULE
  • Rename backend.py to module.py
  • Rename BaseApplication to Application
  • Rename CapBase to Capability
  • Rename BasePage to Page
  • Rename BaseBrowser to Browser
  • Move CleanHTML to html filters
  • Remove * imports in filters
  • Move weboob.tools.browser2 to weboob.browser
  • Move weboob.tools.exceptions to weboob.exceptions
  • Move weboob.tools.browser to weboob.deprecated.browser
  • Move weboob.tools.parsers to weboob.deprecated.browser.parsers
  • Move weboob.tools.mech to weboob.deprecated.mech
  • Remove the "backend" result in do() calls
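
For out-of-tree code, the renames above can be applied mechanically as a first pass. The following is a minimal sketch, not an official migration tool: migrate_line is a hypothetical helper, and it only covers the module paths and class names listed above (the backend.py to module.py file rename and the do() change still need manual work).

```python
import re

# Old -> new module paths, taken from the list above.
MOVED_MODULES = [
    ("weboob.tools.browser2", "weboob.browser"),
    ("weboob.tools.browser", "weboob.deprecated.browser"),
    ("weboob.tools.parsers", "weboob.deprecated.browser.parsers"),
    ("weboob.tools.exceptions", "weboob.exceptions"),
    ("weboob.tools.mech", "weboob.deprecated.mech"),
]

# Old -> new identifiers, also taken from the list above.
RENAMED_NAMES = {
    "BaseBackend": "Module",
    "BaseApplication": "Application",
    "CapBase": "Capability",
    "BasePage": "Page",
    "BaseBrowser": "Browser",
    "BACKEND": "MODULE",
}


def migrate_line(line):
    """Rewrite one line of pre-1.0 code to the 1.0 names (hypothetical helper)."""
    for old, new in MOVED_MODULES:
        # The lookahead keeps weboob.tools.browser from matching inside
        # weboob.tools.browser2; the lookbehind rejects longer dotted paths.
        pattern = r"(?<![\w.])" + re.escape(old) + r"(?!\w)"
        line = re.sub(pattern, new, line)
    for old, new in RENAMED_NAMES.items():
        # Whole-word match only, so FooBackend is left alone.
        line = re.sub(r"\b" + re.escape(old) + r"\b", new, line)
    return line
```

The output of such a pass should be reviewed by hand, since a plain text substitution cannot tell a class name from an unrelated variable with the same spelling.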

Core

  • Catch the proper exception for missing icon
  • Replace usage of os.mknod() by os.open(O_CREAT)
  • Use the print() function everywhere
  • WebNip.iter_backends takes a new optional parameter 'module'
  • Add getitem on WebNip to get a loaded backend by name
  • Create PrintProgress class instead of using IProgress as default one
  • Allow to load a module with config=None
  • A lot of pep8 fixes
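
The os.mknod() replacement mentioned above can be illustrated as follows. This is a sketch of the general technique rather than the exact code used in the core; create_empty_file and its default mode are assumptions. os.mknod() is only available on Unix, while os.open() with O_CREAT works everywhere Python does.

```python
import os


def create_empty_file(path, mode=0o600):
    """Create an empty file at path if it does not exist yet (hypothetical helper)."""
    # O_CREAT creates the file when it is missing. Without O_TRUNC,
    # an existing file is opened and left untouched.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, mode)
    os.close(fd)
```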

Capabilities

  • Let get_currency guess US$ means USD
  • Prevent mess when copying BaseObject instances

Capabilities: bank

  • Add Investment.description field
  • Add Emirati Dirham AED currency

Capabilities: calendar

  • Add Conference event category

Capabilities: parcel

  • Add parcelnotfound exception

Capabilities: housing

  • Add and handle in flatboob house_types field
  • Add and handle in leboncoin a new house type: UNKNOWN
  • Add a url field to the housing capability and handle it in flatboob

Applications

  • Add a new debug level (-dd option)
  • Add a " LIMIT " keyword in conditions
  • Centralize encoding guesses, default to UTF-8 (#1352)
  • Use class attributes as much as possible for application output
  • Define std* in the proper class
  • Handle datetime in condition argument
  • os.isatty is now forbidden (as stream.fileno() is not implemented by StringIO)
  • logging: Output to stderr, not stdout
  • logging: better colors
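
The os.isatty restriction above comes from in-memory streams having no file descriptor. A minimal sketch of the alternative, written for Python 3's io module (the original code targeted Python 2's StringIO) and using a hypothetical helper name, stream_is_tty:

```python
import io


def stream_is_tty(stream):
    """Return True if stream is attached to a terminal (hypothetical helper)."""
    # File-like objects expose isatty() themselves. io.StringIO returns
    # False here, whereas os.isatty(stream.fileno()) would fail because
    # fileno() raises io.UnsupportedOperation on in-memory streams.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())
```

Asking the stream itself keeps application output code working when stdout is replaced by an in-memory buffer, for example in tests.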

Applications: repl

  • When getting an object, if at least one is found, display errors but correctly return the found object

Applications: boobmsg

  • Fix "show" for threads

Applications: flatboob

  • Ask for query.type in flatboob
  • Add load command
  • Fix bug: type_of_good does not exist anymore

Applications: Qflatboob

  • Manage count to avoid problems during pagination

Applications: pastoob

  • Add an option to set a custom file encoding

Applications: parceloob

  • Catch parcelnotfound by untracking

Applications: traveloob

  • Fix: crash if departure time is not available

Applications: videoob

  • Set non verbose mode for wget when downloading m3u8 (fix #1643)

Applications: weboobcfg

  • Return correct exit status code for enable and disable commands

Applications: webcontentedit

  • Better checks for vim usage

Browser

  • Add a way to asynchronously handle requests and pages
  • Backporting mergin_hook to support hook's requests in wheezy
  • HTMLPage checks the inner charset and parses the document again if it differs from the one in the Content-Type HTTP header
  • Add a trivial android profile
  • Add has-class xpath function

Browser: filters

  • Add debug information
  • Raise ParseError only with None/NotAvailable/NotLoaded values, not with empty strings
  • Add a way to customize sign handling for CleanDecimal
  • Regexp: let template be a callable
  • Add some javascript dedicated filters
  • Add an nth parameter to Regexp filter
  • Add str to _Filters
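
To give an idea of what the nth parameter and a callable template bring to a Regexp-style filter, here is a stand-alone sketch built only on the standard re module. The function regexp_filter is hypothetical and is not the Weboob Regexp class; it only illustrates the two ideas from the list above.

```python
import re


def regexp_filter(text, pattern, template=None, nth=0):
    """Hypothetical stand-in for a Regexp-style filter, built on re only."""
    matches = list(re.finditer(pattern, text))
    if not matches:
        raise ValueError("no match for %r" % pattern)
    # nth selects which occurrence of the pattern is used.
    match = matches[nth]
    if template is None:
        # Default: first group if the pattern has groups, else whole match.
        return match.group(1) if match.groups() else match.group(0)
    if callable(template):
        # A callable template receives the match object.
        return template(match)
    # A string template uses \1, \2, ... placeholders.
    return match.expand(template)
```

With the input "a=1, b=2" and the pattern (\w)=(\d), passing nth=1 selects the second pair, and a callable such as lambda m: int(m.group(2)) can convert the captured text on the fly.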

Browser: elements

  • handle_loaders into AbstractElement
  • Ability to select an ItemElement

DeprecatedBrowser

  • Fix: certificate check on servers which don't allow SSLv3

Documentation

  • Update to the new API
  • Show base classes in documentation

Tools

  • American amount to decimal conversion (ref #1641)
  • PDF decompression function (ref #1641)
  • Regexp-based tokenizer (ref #1641)
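
As an illustration of the American amount conversion, the sketch below turns strings such as "$1,234.56" into Decimal values. The helper name and the handling of negatives (a leading minus sign or surrounding parentheses) are assumptions made for this example, not a description of the actual Weboob tool.

```python
from decimal import Decimal


def american_amount_to_decimal(text):
    """Convert a string such as "$1,234.56" to Decimal (hypothetical helper)."""
    cleaned = text.strip()
    # Assumption: negatives are written "-$12.00" or "($12.00)".
    negative = cleaned.startswith("-") or (
        cleaned.startswith("(") and cleaned.endswith(")")
    )
    # Keep digits and the decimal point; drop "$", thousands commas,
    # signs, parentheses and spaces.
    digits = "".join(ch for ch in cleaned if ch.isdigit() or ch == ".")
    value = Decimal(digits)
    return -value if negative else value
```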

Tools: html2text

  • Use the class if possible

Tools: make_man

  • Copyright on top of file

Tools: newsfeed

  • No need for workaround with feedparser>=5.1

Tools: tests

  • Allow changing modules path and adding to PYTHONPATH

Tools: pyflakes

  • Add test to prevent usage of prints in modules
  • Detect deprecated has_key function
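
The has_key check can be approximated with a simple text scan. The sketch below is an assumption about how such a check might look, not the test shipped with Weboob; find_has_key is a hypothetical name. The fix for each reported line is to replace d.has_key(key) with key in d, which works in both Python 2 and Python 3.

```python
import re

# Matches a method call such as "d.has_key(" with optional spaces.
HAS_KEY_CALL = re.compile(r"\.has_key\s*\(")


def find_has_key(source):
    """Return the 1-based numbers of the lines that call .has_key()."""
    return [
        number
        for number, line in enumerate(source.splitlines(), 1)
        if HAS_KEY_CALL.search(line)
    ]
```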

Tools: values

  • Ability to set value to an empty string if it is available in choices

Packaging: setup

  • Add futures, avoid Py2-only libs under Py3
  • Use Python3-compatible syntax in debpydep
  • Add ignore dirs for flake8

Contrib: boobot

  • Add a check_twitter method

Contrib: videoobmc

  • Force relative imports

Contrib: weboob-generic (munin script)

  • Add category option

Modules: alloresto

  • Fix: website changes (enable https and fix the form xpath)

Modules: arretsurimages

  • Fix: site changed

Modules: aum

  • Remove useless features of module that don't work anymore
  • Enable https
  • Import exceptions from core

Modules: banqueaccord

  • Support canceled transactions
  • Increase timeout because of slow website

Modules: biplan

  • Use the Python SkipTest if possible

Modules: boursorama

  • Remove prints

Modules: bred

  • Limit length of password
  • Remove a lot of old code and keep card transactions in separate card accounts
  • Translating accnum description

Modules: carrefourbanque

  • Do not try to parse useless accounts (closes #1432)
  • Fix: login form is now the second form on the page

Modules: cic

  • Fix: new certificate hash
  • Set a unique id

Modules: cmso

  • Fix: parsing of transaction amounts (strip nbsp)
  • Fix: parsing of huge account balances

Modules: colissimo

  • Fix: return the real error message, not "label"
  • Raise ParcelNotFound in colissimo
  • Return the fullid of not found parcel
  • Upgrade to browser2

Modules: cragr

  • Remove prints
  • Add a regexp for checking password

Modules: creditcooperatif

  • Add unique id to creditcooperatif (perso)
  • Update regexps
  • Use find object
  • Upgrade to browser2 (perso)

Modules: creditmutuel

  • Fix: do not lock browser2 anymore (#1635)

Modules: dresdenwetter

  • Add the debug decorator to dresdenwetter filter

Modules: europarl

  • Remove prints

Modules: feedly

  • Use the Python SkipTest if possible
  • Fix: unicode warning

Modules: fortuneo

  • Do exactly the same thing as the js to always get the accounts list

Modules: gazelle

  • Fix: infinite loop on failed login, and fix error message lookup

Modules: gdcvault

  • Remove prints

Modules: grooveshark

  • Fix: bug when Year field is empty in grooveshark json
  • Use the Python SkipTest if possible

Modules: hds

  • Convert to browser2 and fix it

Modules: hellobank

  • Remove prints

Modules: hybride

  • Use the Python SkipTest if possible

Modules: imgur

  • Restrict URL to imgur domains

Modules: ing

  • Fix: add an Index for some accounts...
  • Add a test to detect loops in the history
  • Fix: testing of saving accounts
  • Fix: crash on coming operations
  • Add loggedPage on bourse.ingdirect.fr
  • Add a @check_bourse decorator for a clean redirect

Modules: kickass

  • Fix: parsing of torrent titles

Modules: lacentrale

  • Fix: deprecated has_key

Modules: lcl

  • Always raise instances of NotImplementedError

Modules: minutes20

  • Fix: parsing insolite pages

Modules: nettokom

  • Add tests

Modules: okc

  • Remove prints

Modules: oney

  • Add a favicon
  • Add missing symbols for the virtual keyboard
  • Fix: do not crash on months with no transactions

Modules: ouifm

  • Fix: new radio names

Modules: ovs

  • Force relative import

Modules: pap

  • Adapt to browser2
  • Exclude adverts from other websites
  • Fix: image retrieving

Modules: pastebin

  • Fix: crash on spam page

Modules: paypal

  • Use AmericanTransaction.decimal_amount in PayPal module. Part of #1641

Modules: quvi

  • Force relative import

Modules: seloger

  • Adapt to browser2
  • Fix: pagination
  • Fix: obj filling

Modules: societegenerale

  • Remove prints
  • PIL is a global requirement, remove the check

Modules: tinder

  • Fix: auth on tinder by correctly setting the User-Agent header

Modules: transilien

  • Fix: crash on late departures

Modules: twitter

  • Fix storage system
  • Fix purge system
  • Do not import Browser1 exception

Modules: unsee

  • Restrict URL to unsee domains

Modules: vlille

  • Better description

Modules: wellsfargo

  • Fix: compatibility with old versions of mechanize
  • Add a favicon
  • Rewrite Wells Fargo with browser2 (closes #1624)
  • Improve Wells Fargo module stability
  • Use AmericanTransaction.decimal_amount, closest_date, decompress_pdf and ReTokenizer in WellsFargo module. Part of #1641

Modules: youjizz

  • Fix: fillobj on video thumbnail

Modules: youtube

  • Update part of the js interpreter
