Feature #1642

Citibank module

Added by Oleg Plakhotniuk over 2 years ago. Updated over 2 years ago.

Status: Resolved
Start: 2014-10-11
Priority: Normal
Due date:
Assigned to: -
% Done: 100%
Category: Modules / New
Spent time: -
Target version: 1.0
Module: citibank
Branch:

Description

Citibank module (https://online.citibank.com).

Account balance and transaction history (both recent transactions and those in PDF statements).

Only a single credit card account is currently supported.

Laurent, this patch depends on #1641, and is also computed against your branch.

Here I also use a bunch of external dependencies, and I'm not sure what the accepted policy for handling them in Weboob is. For now I just listed them in doctext.

Let me know if you have any suggestions. Thanks!


Related issues

blocked by weboob - Feature #1641: Few new utility functions Resolved 2014-10-11

History

Updated by Laurent Bachelier over 2 years ago

selenium is kind of a big thing, as we usually try to make weboob light in dependencies and work on headless machines.

though your reasoning in the comments is sound.

we already have unlisted dependencies for some modules (python-gdata for youtube).

I'm not sure the modules.list can be built without the required dependencies, that's something to check for / work around.

Updated by Laurent Bachelier over 2 years ago

  • Assigned to deleted (Laurent Bachelier)
  • Target version set to 1.0

Updated by Oleg Plakhotniuk over 2 years ago

  • File deleted (0004-Citibank-module-https-online.citibank.com.patch)

Updated by Oleg Plakhotniuk over 2 years ago

  • File 0004-Citibank-module-https-online.citibank.com.patch added

> selenium is kind of a big thing, as we usually try to make weboob light in dependencies and work on headless machines. though your reasoning in the comments is sound.

Yeah, I also hated to introduce selenium, but the code would've been a bigger mess and a pain to maintain if I had continued to reverse-engineer their JavaScript...

> we already have unlisted dependencies for some modules (python-gdata for youtube).

OK, got it. Perhaps unlisted dependencies are not that big of a problem if they're easy to find in the code (they are now) and if they are handled by whoever packages Weboob for the various Linux distros.

> I'm not sure the modules.list can be built without the required dependencies, that's something to check for / work around.

Good point about modules list generation. It indeed didn't work without the dependencies. I worked around it by moving the imports inside functions.
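A minimal sketch of that kind of workaround, not the code from the patch: the optional dependency is imported only inside the function that needs it, so the module's metadata can still be read on a machine where the dependency is missing. The names `load_optional` and `CitibankSketch` are hypothetical and used for illustration only.

```python
import importlib


def load_optional(name):
    """Import a module by name only when it is actually needed."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise RuntimeError(
            "this module needs the optional dependency %r" % name
        ) from exc


class CitibankSketch:
    """Hypothetical stand-in for a module class; not the code from the patch."""

    NAME = "citibank"  # metadata readable without any optional dependency

    def open_browser(self):
        # Deferred import: executed only when a browser is really requested.
        webdriver = load_optional("selenium.webdriver")
        return webdriver.Firefox()
```

With this layout, importing the file and reading `CitibankSketch.NAME` never touches selenium; only a call to `open_browser()` does.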

I also rebased the patch on stable branch and added an icon.

Updated by Oleg Plakhotniuk over 2 years ago

BTW, this module doesn't require installing the whole Selenium thing. It only needs the Python bindings for WebDriver, which are relatively light. It can also run on a headless machine, thanks to Xvfb (even a display adapter isn't required). The only heavy things are Firefox itself and Xvfb. And of course it's much slower and more memory-hungry than raw HTTP requests...
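As a rough illustration of the headless setup described above, the sketch below builds the Xvfb command line and an environment whose `DISPLAY` points at the virtual display. The display number `:99`, the screen geometry, and the function names are assumptions made for this example, not values taken from the patch.

```python
import os
import subprocess


def xvfb_command(display=":99", geometry="1280x1024x24"):
    """Command line for a virtual X server with one screen (screen 0)."""
    return ["Xvfb", display, "-screen", "0", geometry]


def browser_env(display=":99"):
    """Copy of the current environment with DISPLAY set to the virtual display."""
    env = dict(os.environ)
    env["DISPLAY"] = display
    return env


def start_xvfb(display=":99"):
    """Start Xvfb in the background; the caller should terminate() it when done."""
    return subprocess.Popen(xvfb_command(display))
```

Once Xvfb is running, a browser started with `DISPLAY` set to the same value should render into the virtual display instead of a physical screen.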

Updated by Jean-Philippe Dutreve over 2 years ago

With selenium or with another framework, the big choice is: do we have to re-engineer JavaScript code (fragile and time-consuming), or can we use a JavaScript interpreter with Python bindings (easier to develop, but with a small runtime penalty depending on resources)?

This is a question I asked some time ago when I proposed another, lighter framework similar to selenium, but Requests was preferred for browser2.

browser3 as an option for some module developers?

Updated by Oleg Plakhotniuk over 2 years ago

  • File deleted (0004-Citibank-module-https-online.citibank.com.patch)

Updated by Oleg Plakhotniuk over 2 years ago

> With selenium or with another framework, the big choice is: do we have to re-engineer JavaScript code (fragile and time-consuming), or can we use a JavaScript interpreter with Python bindings (easier to develop, but with a small runtime penalty depending on resources)?

As I see it, it's a trade-off between code size and performance. Code size means maintenance cost, which is where most of a project's budget usually goes. Performance means the ability to do the job within given resource constraints, and it also affects the cost of operation. Usually the maintenance cost is bigger, so it seems to me that optimizing code size while fitting within performance requirements is a good rule of thumb.

So, I'd prefer re-engineering if both solutions are of comparable code size, and interpreting if it makes the solution significantly shorter.

> browser3 as an option for some module developers?

Good idea, in my opinion. Start with browser2 and, if things go awry, fall back to a heavy native-browser-based solution as a last resort.

Updated by Romain Bignon over 2 years ago

Jean-Philippe Dutreve wrote:

> This is a question I asked some time ago by proposing another (more light) framework like selenium, but Requests was preferred for browser2.

What kind of light frameworks did you propose? I'm open to adding another kind of browser that supports JavaScript execution. The only reason we chose Requests is that, until now, there was no reason to spend more resources on interpreting JS, and as you can see, most browser2 modules are readable and have less code.

Updated by Jean-Philippe Dutreve over 2 years ago

https://realpython.com/blog/python/headless-selenium-testing-with-python-and-phantomjs/

Currently, a native GhostDriver/Ghost.py is provided.

One main advantage of using PhantomJS over a browser is that tests are usually much faster and there is no need to use UI/X11, truly headless.

The main reason you gave me for not using this was security: "malicious code could be executed". I still think it's not a practical issue compared to the benefits.

Updated by Oleg Plakhotniuk over 2 years ago

> Currently, a native GhostDriver/Ghost.py is provided.

Hehe, I also considered using PhantomJS, with or without GhostDriver, for this module. The reason I picked WebDriver + Firefox + Xvfb is PDF statement downloads. PhantomJS is notorious for its lack of download support (https://github.com/ariya/phantomjs/issues/10052). There are some workarounds, but I decided not to be a little bit pregnant. :-)

> One main advantage of using PhantomJS over a browser is that tests are usually much faster and there is no need to use UI/X11, truly headless.

There's a report comparing the performance of various solutions (http://stackoverflow.com/questions/14099770/casperjs-phantomjs-vs-selenium/18394282#18394282). The author says there's no difference in speed between full-blown Chromium and PhantomJS, although PhantomJS indeed has fewer dependencies (no need for X) and is somewhat smaller itself.

The problem is that PhantomJS is essentially yet another (non-mainstream) browser, besides Chromium, with its own bugs and compatibility issues (it was forked from WebKit a while ago and has accumulated some differences over time). In reality, websites are usually designed for mainstream browsers. So a mainstream browser has a better chance than PhantomJS of interacting correctly with an arbitrary website.

I'd pick PhantomJS for interacting with a website I have control over. For an arbitrary website that I don't have access to, a general-purpose browser seems like a better idea. Besides, compared to pure HTTP requests, both PhantomJS and a mainstream browser are heavyweight dinosaurs, but I think a mainstream browser gives more bang for the buck.

> The main reason you gave me for not using this was for security purpose : "malicious code could be executed". I still think it's not a practical issue compared to benefits.

Malicious code is an excellent point. There's also a broad avenue for man-in-the-middle attacks if we don't check HTTPS certificates in WebDriver. Though I cannot immediately think of any other way to handle JavaScript-heavy websites...

Malicious code execution can be mitigated by running Weboob inside an isolated environment (virtual machine, Linux containers). Personally, I use Docker on Arch Linux, even for regular internet browsing. Theoretically, in this case, there's no way for malware to "infect" the machine. However, this doesn't address the man-in-the-middle attack scenario and doesn't prevent the malware from stealing your bank account credentials from the Weboob config.

Updated by Jean-Philippe Dutreve over 2 years ago

Perhaps you could use another light tool for downloading the PDFs (Requests, wget, curl)?
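One way this suggestion might look, sketched with only the Python standard library rather than any of the tools named above: the session cookies obtained by the browser are copied into a plain HTTP request for the PDF. The function names are hypothetical, the cookie format (dicts with `name` and `value` keys) is an assumption modelled on what WebDriver bindings return, and whether the site would accept such a request is untested.

```python
import urllib.request


def cookie_header(cookies):
    """Join WebDriver-style cookie dicts into a single Cookie header value."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


def build_pdf_request(url, cookies):
    """Prepare (but do not send) a GET request that reuses the session cookies."""
    request = urllib.request.Request(url)
    request.add_header("Cookie", cookie_header(cookies))
    return request


def download(request, path):
    """Send the request and write the response body to a file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

The browser would still handle login and JavaScript; only the final file transfer would move to the lighter tool.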

PhantomJS is twice as fast as Firefox! Not that bad.

"Weboob is a collection of applications able to interact with websites, without requiring the user to open them in a browser"

I do think that the need for X and a visual browser defeats the purpose of Weboob.

I see it as a command-line tool only (a web shell?). I think that's more important than performance or security. Not to mention forcing users to install Firefox...

Unfortunately, I will NOT be able to use this Citibank module because I have no GUI/X11 on my servers (a true headless environment, like many servers).

It would be interesting to see any real issues with PhantomJS on some websites (PDF downloading excluded).

"doesn't prevent the malware from stealing your banking account credentials from Weboob config"
Well, it's still a good day for a hacker ;-) And your method is not mainstream.

In summary, I care about true headless operation (light environment, CLI approach) and a JavaScript interpreter (lower maintenance costs).

Updated by Florent Fourcot over 2 years ago

  • Status changed from To merge to Resolved
