Bug #1855

[francetelevisions] failed to download URL - Regex broken?

Added by Mathieu Roche almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Start: 2015-04-14
Priority: Normal
Due date:
Assigned to: -
% Done: 100%
Category: -
Spent time: -
Target version: -
Module:
Branch:

Description

Hello,

I get the following error when I try to download a video from Pluzz.

videoob --backends francetelevisions

videoob> logging debug
videoob> download http://pluzz.francetv.fr/videos/faites_entrer_l_accuse_,121019373.html

2015-04-14 22:04:35,625:DEBUG:bcall:1.0:bcall.py:81:backend_process <Backend 'francetelevisions'>: Calling function <bound method Videoob._do_complete of <weboob.applications.videoob.videoob.Videoob object at 0x7fa41504fc50>>
http://pluzz.francetv.fr/videos/faites_entrer_l_accuse_,121019373.html
2015-04-14 22:04:35,638:INFO:urllib3.connectionpool:1.0:connectionpool.py:188:_new_conn Starting new HTTP connection (1): pluzz.francetv.fr
2015-04-14 22:04:35,778:DEBUG:urllib3.connectionpool:1.0:connectionpool.py:362:_make_request "GET /videos/faites_entrer_l_accuse_,121019373.html HTTP/1.1" 200 9373
2015-04-14 22:04:35,797:DEBUG:backend.francetelevisions.browser:1.0:browsers.py:596:internal_callback Handle http://pluzz.francetv.fr/videos/faites_entrer_l_accuse_,121019373.html with VideoListPage
2015-04-14 22:04:35,800:WARNING:get_last_video:1.0:elements.py:272:handle_attr Attribute date raises RegexpError("Unable to match 1st .+(\\d{2}-\\d{2}-\\d{2}.+\\d{1,2}h\\d{1,2}).+ in u'Magazine'",)
2015-04-14 22:04:35,800:DEBUG:bcall:1.0:bcall.py:87:backend_process <Backend 'francetelevisions'>: Called function <bound method Videoob._do_complete of <weboob.applications.videoob.videoob.Videoob object at 0x7fa41504fc50>> raised an error: RegexpError("Unable to match 1st .+(\\d{2}-\\d{2}-\\d{2}.+\\d{1,2}h\\d{1,2}).+ in u'Magazine'",)
Bug(francetelevisions): Unable to match 1st .+(\d{2}-\d{2}-\d{2}.+\d{1,2}h\d{1,2}).+ in u'Magazine'
=== [  0%] Getting http://updates.weboob.org/1.0/main/
2015-04-14 22:04:35,827:INFO:urllib3.connectionpool:1.0:connectionpool.py:188:_new_conn Starting new HTTP connection (1): updates.weboob.org
2015-04-14 22:04:36,026:DEBUG:urllib3.connectionpool:1.0:connectionpool.py:362:_make_request "GET /1.0/main/modules.list HTTP/1.1" 200 35970
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/weboob/core/bcall.py", line 83, in backend_process
    result = function(backend, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/weboob/tools/application/base.py", line 305, in _do_complete
    res = getattr(backend, function)(*args, **kwargs)
  File "/home/mathieu/.local/share/weboob/modules/1.0/francetelevisions/module.py", line 46, in get_video
    return self.browser.get_video_from_url(m.group(1))
  File "/home/mathieu/.local/share/weboob/modules/1.0/francetelevisions/browser.py", line 40, in get_video_from_url
    video = self.videos_list_page.go(program=url).get_last_video()
  File "/usr/lib/python2.7/dist-packages/weboob/browser/elements.py", line 47, in inner
    return klass(self)(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/weboob/browser/elements.py", line 245, in __call__
    for obj in self:
  File "/usr/lib/python2.7/dist-packages/weboob/browser/elements.py", line 258, in __iter__
    self.handle_attr(attr, getattr(self, 'obj_%s' % attr))
  File "/usr/lib/python2.7/dist-packages/weboob/browser/elements.py", line 269, in handle_attr
    value = self.use_selector(func, key=key)
  File "/usr/lib/python2.7/dist-packages/weboob/browser/elements.py", line 79, in use_selector
    value = func(self)
  File "/usr/lib/python2.7/dist-packages/weboob/browser/filters/standard.py", line 173, in __call__
    return self.filter(self.select(self.selector, item, key=self._key, obj=self._obj))
  File "/usr/lib/python2.7/dist-packages/weboob/browser/filters/standard.py", line 166, in select
    return selector(item)
  File "/usr/lib/python2.7/dist-packages/weboob/browser/filters/standard.py", line 173, in __call__
    return self.filter(self.select(self.selector, item, key=self._key, obj=self._obj))
  File "/usr/lib/python2.7/dist-packages/weboob/browser/filters/standard.py", line 135, in print_debug
    res = function(self, value)
  File "/usr/lib/python2.7/dist-packages/weboob/browser/filters/standard.py", line 509, in filter
    return self.default_or_raise(RegexpError(msg))
  File "/usr/lib/python2.7/dist-packages/weboob/browser/filters/standard.py", line 91, in default_or_raise
    raise exception
RegexpError: Unable to match 1st .+(\d{2}-\d{2}-\d{2}.+\d{1,2}h\d{1,2}).+ in u'Magazine'

I changed this line in pages.py:

obj_date = DateTime(Regexp(CleanText('//div[@id="diffusion-info"]/div/div/span/span[1]',

to:
obj_date = DateTime(Regexp(CleanText('//div[@id="diffusion-info-detail"]/div/h2',

It seems to work after that change.
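
The failure can be reproduced outside weboob with the standard re module. This is only a minimal sketch: it assumes the Regexp filter applies the pattern as a search, the helper name extract_date is made up for illustration, and the second sample string is hypothetical (it is not taken from the real page). It shows that the pattern itself is probably fine and that the old XPath simply selects the wrong text ('Magazine').

```python
import re

# Pattern copied from the error message in the log above.
DATE_PATTERN = re.compile(r'.+(\d{2}-\d{2}-\d{2}.+\d{1,2}h\d{1,2}).+')


def extract_date(text):
    """Return the first captured group, or None when the pattern does not match.

    Hypothetical helper: it only mimics what the filter appears to do.
    """
    match = DATE_PATTERN.search(text)
    return match.group(1) if match else None


# Text actually returned by the old XPath, according to the log.
print(extract_date('Magazine'))

# Hypothetical text containing a date and a time in the expected shape.
print(extract_date('Aired 14-04-15 at 22h04 on France 2'))
```

The first call returns None, which is the case where the filter raises RegexpError; the second returns the date and time portion of the string.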

There is also an error when I run a search. I'll try to find the right regex.

Thanks.

History

Updated by Mathieu Roche almost 2 years ago

Ouch, maybe I'm missing something, but it seems the same regex is used for both a search and a direct download.

Updated by Benjamin CARTON almost 2 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100
