Mirror of https://github.com/ytdl-org/youtube-dl.git, synced 2024-11-21 17:51:51 +00:00
Merge branch 'master' into oreilly-login
commit 899af2ef61
21 changed files with 812 additions and 346 deletions
24
README.md
@@ -918,7 +918,7 @@ Either prepend `https://www.youtube.com/watch?v=` or separate the ID from the op

 Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.

-In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [Get cookies.txt](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox).
+In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [Get cookies.txt LOCALLY](https://chrome.google.com/webstore/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox).

 Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.

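Note on the hunk above: the cookie-file requirements it quotes (a Netscape-format first line and the platform's newline convention) can be checked before the file is passed to `--cookies`. The following is a minimal sketch; `check_cookie_file` and `VALID_HEADERS` are hypothetical names for illustration and are not part of youtube-dl.

```python
import os

# The two first lines the README accepts
VALID_HEADERS = (b'# HTTP Cookie File', b'# Netscape HTTP Cookie File')


def check_cookie_file(data, linesep=os.linesep.encode()):
    """Return a list of problems found in raw cookie-file bytes (hypothetical helper)."""
    problems = []
    first_line = data.split(b'\n', 1)[0].rstrip(b'\r')
    if first_line not in VALID_HEADERS:
        problems.append('first line is not a recognised cookie-file header')
    has_crlf = b'\r\n' in data
    # any LF left after removing CRLF pairs is a bare Unix newline
    has_bare_lf = b'\n' in data.replace(b'\r\n', b'')
    if linesep == b'\r\n' and has_bare_lf:
        problems.append('LF newlines found, CRLF expected')
    if linesep == b'\n' and has_crlf:
        problems.append('CRLF newlines found, LF expected')
    return problems
```

Calling it with the file's bytes (read in binary mode) returns an empty list when both requirements hold for the current OS.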
@@ -1408,7 +1408,11 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:

 # BUGS

-Bugs and suggestions should be reported at: <https://github.com/ytdl-org/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](https://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
+Bugs and suggestions should be reported in the issue tracker: <https://github.com/ytdl-org/youtube-dl/issues> (<https://yt-dl.org/bug> is an alias for this). Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](https://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).

+## Opening a bug report or suggestion
+
+Be sure to follow instructions provided **below** and **in the issue tracker**. Complete the appropriate issue template fully. Consider whether your problem is covered by an existing issue: if so, follow the discussion there. Avoid commenting on existing duplicate issues as such comments do not add to the discussion of the issue and are liable to be treated as spam.
+
 **Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
 ```
@@ -1428,17 +1432,17 @@ $ youtube-dl -v <your command line>

 The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.

-Please re-read your issue once again to avoid a couple of common mistakes (you can and should use this as a checklist):
+Finally please review your issue to avoid various common mistakes (you can and should use this as a checklist) listed below.

 ### Is the description of the issue itself sufficient?

-We often get issue reports that we cannot really decipher. While in most cases we eventually get the required information after asking back multiple times, this poses an unnecessary drain on our resources. Many contributors, including myself, are also not native speakers, so we may misread some parts.
+We often get issue reports that are hard to understand. To avoid subsequent clarifications, and to assist participants who are not native English speakers, please elaborate on what feature you are requesting, or what bug you want to be fixed.

-So please elaborate on what feature you are requesting, or what bug you want to be fixed. Make sure that it's obvious
+Make sure that it's obvious

 - What the problem is
 - How it could be fixed
-- How your proposed solution would look like
+- How your proposed solution would look

 If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.

@@ -1448,14 +1452,14 @@ If your server has multiple IPs or you suspect censorship, adding `--call-home`

 **Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `https://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `https://www.youtube.com/`) is *not* an example URL.

+### Is the issue already documented?
+
+Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/ytdl-org/youtube-dl/search?type=Issues) of this repository. Initially, at least, use the search term `-label:duplicate` to focus on active issues. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
+
 ### Are you using the latest version?

 Before reporting any issue, type `youtube-dl -U`. This should report that you're up-to-date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well.

-### Is the issue already documented?
-
-Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/ytdl-org/youtube-dl/search?type=Issues) of this repository. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
-
 ### Why are existing options not enough?

 Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/ytdl-org/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
64
devscripts/cli_to_api.py
Executable file
@@ -0,0 +1,64 @@
+#!/usr/bin/env python
+# coding: utf-8
+
+from __future__ import unicode_literals
+
+"""
+This script displays the API parameters corresponding to a yt-dl command line
+
+Example:
+$ ./cli_to_api.py -f best
+{u'format': 'best'}
+$
+"""
+
+# Allow direct execution
+import os
+import sys
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+import youtube_dl
+from types import MethodType
+
+
+def cli_to_api(*opts):
+    YDL = youtube_dl.YoutubeDL
+
+    # to extract the parsed options, break out of YoutubeDL instantiation
+
+    # return options via this Exception
+    class ParseYTDLResult(Exception):
+        def __init__(self, result):
+            super(ParseYTDLResult, self).__init__('result')
+            self.opts = result
+
+    # replacement constructor that raises ParseYTDLResult
+    def ytdl_init(ydl, ydl_opts):
+        super(YDL, ydl).__init__(ydl_opts)
+        raise ParseYTDLResult(ydl_opts)
+
+    # patch in the constructor
+    YDL.__init__ = MethodType(ytdl_init, YDL)
+
+    # core parser
+    def parsed_options(argv):
+        try:
+            youtube_dl._real_main(list(argv))
+        except ParseYTDLResult as result:
+            return result.opts
+
+    # from https://github.com/yt-dlp/yt-dlp/issues/5859#issuecomment-1363938900
+    default = parsed_options([])
+    diff = dict((k, v) for k, v in parsed_options(opts).items() if default[k] != v)
+    if 'postprocessors' in diff:
+        diff['postprocessors'] = [pp for pp in diff['postprocessors'] if pp not in default['postprocessors']]
+    return diff
+
+
+def main():
+    from pprint import pprint
+    pprint(cli_to_api(*sys.argv))
+
+
+if __name__ == '__main__':
+    main()
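Note on the new script above: it recovers the parsed options by replacing a constructor with one that raises an exception carrying them, then keeps only the options that differ from the defaults. The self-contained sketch below illustrates that technique with stand-ins; `Worker`, `run`, `ParseResult`, `captured_options` and `changed_options` are hypothetical names, and the option parsing is deliberately simplified rather than youtube-dl's own.

```python
class Worker(object):
    """Stand-in for a class whose constructor receives the parsed options."""

    def __init__(self, params):
        self.params = params


def run(argv):
    """Stand-in for an entry point that parses argv and instantiates Worker."""
    params = {'format': None, 'quiet': False}
    if '-f' in argv:
        params['format'] = argv[argv.index('-f') + 1]
    if '-q' in argv:
        params['quiet'] = True
    Worker(params)
    raise AssertionError('not reached while the constructor is patched')


class ParseResult(Exception):
    """Carries the options out of the patched constructor."""

    def __init__(self, opts):
        super(ParseResult, self).__init__('result')
        self.opts = opts


def captured_options(argv):
    def patched_init(self, params):
        raise ParseResult(params)

    original_init = Worker.__init__
    Worker.__init__ = patched_init
    try:
        run(list(argv))
    except ParseResult as result:
        return result.opts
    finally:
        # this sketch restores the constructor after each capture
        Worker.__init__ = original_init


def changed_options(argv):
    """Options whose values differ from those produced by an empty command line."""
    default = captured_options([])
    return dict((k, v) for k, v in captured_options(argv).items() if default[k] != v)
```

With these stand-ins, `changed_options(['-f', 'best'])` yields only the `format` entry, mirroring the example in the script's docstring.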
@@ -35,13 +35,13 @@ class InfoExtractorTestRequestHandler(compat_http_server.BaseHTTPRequestHandler)
             assert False


-class TestIE(InfoExtractor):
+class DummyIE(InfoExtractor):
     pass


 class TestInfoExtractor(unittest.TestCase):
     def setUp(self):
-        self.ie = TestIE(FakeYDL())
+        self.ie = DummyIE(FakeYDL())

     def test_ie_key(self):
         self.assertEqual(get_info_extractor(YoutubeIE.ie_key()), YoutubeIE)
@@ -62,6 +62,7 @@ class TestInfoExtractor(unittest.TestCase):
             <meta name="og:test1" content='foo > < bar'/>
             <meta name="og:test2" content="foo >//< bar"/>
             <meta property=og-test3 content='Ill-formatted opengraph'/>
+            <meta property=og:test4 content=unquoted-value/>
             '''
         self.assertEqual(ie._og_search_title(html), 'Foo')
         self.assertEqual(ie._og_search_description(html), 'Some video\'s description ')
@@ -74,6 +75,7 @@ class TestInfoExtractor(unittest.TestCase):
         self.assertEqual(ie._og_search_property(('test0', 'test1'), html), 'foo > < bar')
         self.assertRaises(RegexNotFoundError, ie._og_search_property, 'test0', html, None, fatal=True)
         self.assertRaises(RegexNotFoundError, ie._og_search_property, ('test0', 'test00'), html, None, fatal=True)
+        self.assertEqual(ie._og_search_property('test4', html), 'unquoted-value')

     def test_html_search_meta(self):
         ie = self.ie
@@ -148,6 +148,7 @@ def generator(test_case, tname):
                 try_rm(tc_filename)
                 try_rm(tc_filename + '.part')
                 try_rm(os.path.splitext(tc_filename)[0] + '.info.json')
+
         try_rm_tcs_files()
         try:
             try_num = 1
@@ -213,7 +214,15 @@ def generator(test_case, tname):
                     # First, check test cases' data against extracted data alone
                     expect_info_dict(self, tc_res_dict, tc.get('info_dict', {}))
                     # Now, check downloaded file consistency
+                    # support test-case with volatile ID, signalled by regexp value
+                    if tc.get('info_dict', {}).get('id', '').startswith('re:'):
+                        test_id = tc['info_dict']['id']
+                        tc['info_dict']['id'] = tc_res_dict['id']
+                    else:
+                        test_id = None
                     tc_filename = get_tc_filename(tc)
+                    if test_id:
+                        tc['info_dict']['id'] = test_id
                     if not test_case.get('params', {}).get('skip_download', False):
                         self.assertTrue(os.path.exists(tc_filename), msg='Missing file ' + tc_filename)
                         self.assertTrue(tc_filename in finished_hook_called)
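Note on the hunk above: an expected `id` that starts with `re:` is a pattern rather than a literal, so the test temporarily swaps in the extracted ID to compute the output filename. A minimal sketch of that convention follows; `matches_expected` and `concrete_id` are hypothetical helpers, and matching from the start of the value is an assumption of this sketch.

```python
import re


def matches_expected(expected, got):
    """Compare a test expectation against an extracted value.

    An expectation starting with 're:' is treated as a regular expression
    (matched from the start of the value); anything else must be equal.
    """
    if isinstance(expected, str) and expected.startswith('re:'):
        return re.match(expected[len('re:'):], got) is not None
    return expected == got


def concrete_id(expected_id, extracted_id):
    """Pick the ID to use when building the expected output filename."""
    if expected_id.startswith('re:'):
        # a pattern cannot name a file, so fall back to the real ID
        return extracted_id
    return expected_id
```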
@@ -139,21 +139,16 @@ class TestJSInterpreter(unittest.TestCase):
         self.assertTrue(math.isnan(jsi.call_function('x')))

     def test_Date(self):
-        jsi = JSInterpreter('''
-        function x() { return new Date('Wednesday 31 December 1969 18:01:26 MDT') - 0; }
-        ''')
-        self.assertEqual(jsi.call_function('x'), 86000)
-
         jsi = JSInterpreter('''
         function x(dt) { return new Date(dt) - 0; }
         ''')
         self.assertEqual(jsi.call_function('x', 'Wednesday 31 December 1969 18:01:26 MDT'), 86000)

         # date format m/d/y
-        jsi = JSInterpreter('''
-        function x() { return new Date('12/31/1969 18:01:26 MDT') - 0; }
-        ''')
-        self.assertEqual(jsi.call_function('x'), 86000)
+        self.assertEqual(jsi.call_function('x', '12/31/1969 18:01:26 MDT'), 86000)
+
+        # epoch 0
+        self.assertEqual(jsi.call_function('x', '1 January 1970 00:00:00 UTC'), 0)

     def test_call(self):
         jsi = JSInterpreter('''
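Note on the hunk above: the expected value 86000 follows from MDT being UTC-6, so 18:01:26 MDT on 31 December 1969 is 00:01:26 UTC on 1 January 1970, which is 86 seconds after the epoch. The arithmetic can be sketched with the standard library; `MDT`, `EPOCH`, `epoch_ms` and `stamp` are names chosen for this illustration.

```python
from datetime import datetime, timedelta, timezone

MDT = timezone(timedelta(hours=-6), 'MDT')  # Mountain Daylight Time, UTC-6
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms(dt):
    """Milliseconds since the Unix epoch, as `new Date(...) - 0` yields in JS."""
    return int((dt - EPOCH).total_seconds() * 1000)


stamp = datetime(1969, 12, 31, 18, 1, 26, tzinfo=MDT)
print(epoch_ms(stamp))  # 86000
```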
@@ -445,7 +440,7 @@ class TestJSInterpreter(unittest.TestCase):
         self.assertIs(jsi.call_function('x'), None)

         jsi = JSInterpreter('''
-        function x() { let a=/,,[/,913,/](,)}/; return a; }
+        function x() { let a=/,,[/,913,/](,)}/; "".replace(a, ""); return a; }
         ''')
         attrs = set(('findall', 'finditer', 'flags', 'groupindex',
                      'groups', 'match', 'pattern', 'scanner',
@@ -457,6 +452,31 @@ class TestJSInterpreter(unittest.TestCase):
         ''')
         self.assertEqual(jsi.call_function('x').flags & ~re.U, re.I)

+        jsi = JSInterpreter(r'''
+        function x() { let a="data-name".replace("data-", ""); return a }
+        ''')
+        self.assertEqual(jsi.call_function('x'), 'name')
+
+        jsi = JSInterpreter(r'''
+        function x() { let a="data-name".replace(new RegExp("^.+-"), ""); return a; }
+        ''')
+        self.assertEqual(jsi.call_function('x'), 'name')
+
+        jsi = JSInterpreter(r'''
+        function x() { let a="data-name".replace(/^.+-/, ""); return a; }
+        ''')
+        self.assertEqual(jsi.call_function('x'), 'name')
+
+        jsi = JSInterpreter(r'''
+        function x() { let a="data-name".replace(/a/g, "o"); return a; }
+        ''')
+        self.assertEqual(jsi.call_function('x'), 'doto-nome')
+
+        jsi = JSInterpreter(r'''
+        function x() { let a="data-name".replaceAll("a", "o"); return a; }
+        ''')
+        self.assertEqual(jsi.call_function('x'), 'doto-nome')
+
         jsi = JSInterpreter(r'''
         function x() { let a=[/[)\\]/]; return a[0]; }
         ''')
@@ -485,6 +505,12 @@ class TestJSInterpreter(unittest.TestCase):
         jsi = JSInterpreter('function x(){return 1236566549 << 5}')
         self.assertEqual(jsi.call_function('x'), 915423904)

+    """ # fails so far
+    def test_packed(self):
+        jsi = JSInterpreter('''function x(p,a,c,k,e,d){while(c--)if(k[c])p=p.replace(new RegExp('\\b'+c.toString(a)+'\\b','g'),k[c]);return p}''')
self.assertEqual(jsi.call_function('x', '''h 7=g("1j");7.7h({7g:[{33:"w://7f-7e-7d-7c.v.7b/7a/79/78/77/76.74?t=73&s=2s&e=72&f=2t&71=70.0.0.1&6z=6y&6x=6w"}],6v:"w://32.v.u/6u.31",16:"r%",15:"r%",6t:"6s",6r:"",6q:"l",6p:"l",6o:"6n",6m:\'6l\',6k:"6j",9:[{33:"/2u?b=6i&n=50&6h=w://32.v.u/6g.31",6f:"6e"}],1y:{6d:1,6c:\'#6b\',6a:\'#69\',68:"67",66:30,65:r,},"64":{63:"%62 2m%m%61%5z%5y%5x.u%5w%5v%5u.2y%22 2k%m%1o%22 5t%m%1o%22 5s%m%1o%22 2j%m%5r%22 16%m%5q%22 15%m%5p%22 5o%2z%5n%5m%2z",5l:"w://v.u/d/1k/5k.2y",5j:[]},\'5i\':{"5h":"5g"},5f:"5e",5d:"w://v.u",5c:{},5b:l,1x:[0.25,0.50,0.75,1,1.25,1.5,2]});h 1m,1n,5a;h 59=0,58=0;h 7=g("1j");h 2x=0,57=0,56=0;$.55({54:{\'53-52\':\'2i-51\'}});7.j(\'4z\',6(x){c(5>0&&x.1l>=5&&1n!=1){1n=1;$(\'q.4y\').4x(\'4w\')}});7.j(\'13\',6(x){2x=x.1l});7.j(\'2g\',6(x){2w(x)});7.j(\'4v\',6(){$(\'q.2v\').4u()});6 2w(x){$(\'q.2v\').4t();c(1m)19;1m=1;17=0;c(4s.4r===l){17=1}$.4q(\'/2u?b=4p&2l=1k&4o=2t-4n-4m-2s-4l&4k=&4j=&4i=&17=\'+17,6(2r){$(\'#4h\').4g(2r)});$(\'.3-8-4f-4e:4d("4c")\').2h(6(e){2q();g().4b(0);g().4a(l)});6 2q(){h $14=$("<q />").2p({1l:"49",16:"r%",15:"r%",48:0,2n:0,2o:47,46:"45(10%, 10%, 10%, 0.4)","44-43":"42"});$("<41 />").2p({16:"60%",15:"60%",2o:40,"3z-2n":"3y"}).3x({\'2m\':\'/?b=3w&2l=1k\',\'2k\':\'0\',\'2j\':\'2i\'}).2f($14);$14.2h(6(){$(3v).3u();g().2g()});$14.2f($(\'#1j\'))}g().13(0);}6 3t(){h 9=7.1b(2e);2d.2c(9);c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==2e){2d.2c(\'!!=\'+i);7.1p(i)}}}}7.j(\'3s\',6(){g().1h("/2a/3r.29","3q 10 28",6(){g().13(g().27()+10)},"2b");$("q[26=2b]").23().21(\'.3-20-1z\');g().1h("/2a/3p.29","3o 10 28",6(){h 12=g().27()-10;c(12<0)12=0;g().13(12)},"24");$("q[26=24]").23().21(\'.3-20-1z\');});6 1i(){}7.j(\'3n\',6(){1i()});7.j(\'3m\',6(){1i()});7.j("k",6(y){h 9=7.1b();c(9.n<2)19;$(\'.3-8-3l-3k\').3j(6(){$(\'#3-8-a-k\').1e(\'3-8-a-z\');$(\'.3-a-k\').p(\'o-1f\',\'11\')});7.1h("/3i/3h.3g","3f 3e",6(){$(\'.3-1w\').3d(\'3-8-1v\');$(\'.3-8-1y, 
.3-8-1x\').p(\'o-1g\',\'11\');c($(\'.3-1w\').3c(\'3-8-1v\')){$(\'.3-a-k\').p(\'o-1g\',\'l\');$(\'.3-a-k\').p(\'o-1f\',\'l\');$(\'.3-8-a\').1e(\'3-8-a-z\');$(\'.3-8-a:1u\').3b(\'3-8-a-z\')}3a{$(\'.3-a-k\').p(\'o-1g\',\'11\');$(\'.3-a-k\').p(\'o-1f\',\'11\');$(\'.3-8-a:1u\').1e(\'3-8-a-z\')}},"39");7.j("38",6(y){1d.37(\'1c\',y.9[y.36].1a)});c(1d.1t(\'1c\')){35("1s(1d.1t(\'1c\'));",34)}});h 18;6 1s(1q){h 9=7.1b();c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==1q){c(i==18){19}18=i;7.1p(i)}}}}',36,270,'|||jw|||function|player|settings|tracks|submenu||if||||jwplayer|var||on|audioTracks|true|3D|length|aria|attr|div|100|||sx|filemoon|https||event|active||false|tt|seek|dd|height|width|adb|current_audio|return|name|getAudioTracks|default_audio|localStorage|removeClass|expanded|checked|addButton|callMeMaybe|vplayer|0fxcyc2ajhp1|position|vvplay|vvad|220|setCurrentAudioTrack|audio_name|for|audio_set|getItem|last|open|controls|playbackRates|captions|rewind|icon|insertAfter||detach|ff00||button|getPosition|sec|png|player8|ff11|log|console|track_name|appendTo|play|click|no|scrolling|frameborder|file_code|src|top|zIndex|css|showCCform|data|1662367683|383371|dl|video_ad|doPlay|prevt|mp4|3E||jpg|thumbs|file|300|setTimeout|currentTrack|setItem|audioTrackChanged|dualSound|else|addClass|hasClass|toggleClass|Track|Audio|svg|dualy|images|mousedown|buttons|topbar|playAttemptFailed|beforePlay|Rewind|fr|Forward|ff|ready|set_audio_track|remove|this|upload_srt|prop|50px|margin|1000001|iframe|center|align|text|rgba|background|1000000|left|absolute|pause|setCurrentCaptions|Upload|contains|item|content|html|fviews|referer|prem|embed|3e57249ef633e0d03bf76ceb8d8a4b65|216|83|hash|view|get|TokenZir|window|hide|show|complete|slow|fadeIn|video_ad_fadein|time||cache|Cache|Content|headers|ajaxSetup|v2done|tott|vastdone2|vastdone1|vvbefore|playbackRateControls|cast|aboutlink|FileMoon|abouttext|UHD|1870|qualityLabels|sites|GNOME_POWER|link|2Fiframe|3C|allowfullscreen|22360|22640|22no|marginheight|marginwidth|2FGNOME
_POWER|2F0fxcyc2ajhp1|2Fe|2Ffilemoon|2F|3A||22https|3Ciframe|code|sharing|fontOpacity|backgroundOpacity|Tahoma|fontFamily|303030|backgroundColor|FFFFFF|color|userFontScale|thumbnails|kind|0fxcyc2ajhp10000|url|get_slides|start|startparam|none|preload|html5|primary|hlshtml|androidhls|duration|uniform|stretching|0fxcyc2ajhp1_xt|image|2048|sp|6871|asn|127|srv|43200|_g3XlBcu2lmD9oDexD2NLWSmah2Nu3XcDrl93m9PwXY|m3u8||master|0fxcyc2ajhp1_x|00076|01|hls2|to|s01|delivery|storage|moon|sources|setup'''.split('|')))
+    """
+

 if __name__ == '__main__':
     unittest.main()
@@ -250,6 +250,7 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(sanitize_url('httpss://foo.bar'), 'https://foo.bar')
         self.assertEqual(sanitize_url('rmtps://foo.bar'), 'rtmps://foo.bar')
         self.assertEqual(sanitize_url('https://foo.bar'), 'https://foo.bar')
+        self.assertEqual(sanitize_url('foo bar'), 'foo bar')

     def test_expand_path(self):
         def env(var):
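Note on the hunk above: the added assertion checks that a string with no recognisable scheme typo passes through unchanged. The behaviour the four assertions describe can be sketched as a small table of scheme fixes; `fix_scheme_typos` and `COMMON_TYPOS` are hypothetical names, and this is an illustration of the tested behaviour rather than the library's `sanitize_url`.

```python
import re

# (pattern for the mistyped scheme, replacement) pairs; assumed for illustration
COMMON_TYPOS = (
    (r'^httpss://', 'https://'),
    (r'^rmtp([es]?)://', r'rtmp\1://'),
)


def fix_scheme_typos(url):
    """Return `url` with a known scheme typo corrected, or unchanged otherwise."""
    for mistake, fixup in COMMON_TYPOS:
        if re.match(mistake, url):
            return re.sub(mistake, fixup, url)
    return url
```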
@@ -67,6 +67,10 @@ _SIG_TESTS = [
 ]

 _NSIG_TESTS = [
+    (
+        'https://www.youtube.com/s/player/7862ca1f/player_ias.vflset/en_US/base.js',
+        'X_LCxVDjAavgE5t', 'yxJ1dM6iz5ogUg',
+    ),
     (
         'https://www.youtube.com/s/player/9216d1f7/player_ias.vflset/en_US/base.js',
         'SLp9F5bwjAdhE9F-', 'gWnb9IK2DJ8Q1w',
@@ -39,6 +39,7 @@ from .compat import (
     compat_str,
     compat_tokenize_tokenize,
     compat_urllib_error,
+    compat_urllib_parse,
     compat_urllib_request,
     compat_urllib_request_DataHandler,
 )
@@ -60,6 +61,7 @@ from .utils import (
     format_bytes,
     formatSeconds,
     GeoRestrictedError,
+    HEADRequest,
     int_or_none,
     ISO3166Utils,
     locked_file,
@@ -74,6 +76,7 @@ from .utils import (
     preferredencoding,
     prepend_extension,
     process_communicate_or_kill,
+    PUTRequest,
     register_socks_protocols,
     render_table,
     replace_extension,
@@ -2297,6 +2300,27 @@ class YoutubeDL(object):
         """ Start an HTTP download """
         if isinstance(req, compat_basestring):
             req = sanitized_Request(req)
+        # an embedded /../ sequence is not automatically handled by urllib2
+        # see https://github.com/yt-dlp/yt-dlp/issues/3355
+        url = req.get_full_url()
+        parts = url.partition('/../')
+        if parts[1]:
+            url = compat_urllib_parse.urljoin(parts[0] + parts[1][:1], parts[1][1:] + parts[2])
+        if url:
+            # worse, URL path may have initial /../ against RFCs: work-around
+            # by stripping such prefixes, like eg Firefox
+            parts = compat_urllib_parse.urlsplit(url)
+            path = parts.path
+            while path.startswith('/../'):
+                path = path[3:]
+            url = parts._replace(path=path).geturl()
+        # get a new Request with the munged URL
+        if url != req.get_full_url():
+            req_type = {'HEAD': HEADRequest, 'PUT': PUTRequest}.get(
+                req.get_method(), compat_urllib_request.Request)
+            req = req_type(
+                url, data=req.data, headers=dict(req.header_items()),
+                origin_req_host=req.origin_req_host, unverifiable=req.unverifiable)
         return self._opener.open(req, timeout=self._socket_timeout)

     def print_debug_header(self):
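Note on the hunk above: the added code resolves an embedded `/../` by splitting the URL at its first occurrence and re-joining the two halves, then strips any `/../` left at the start of the path. The same two steps are sketched below with Python 3's `urllib.parse` directly; `normalise_parent_refs` is a hypothetical name, and the real code goes through the `compat_urllib_parse` wrapper instead.

```python
from urllib.parse import urljoin, urlsplit


def normalise_parent_refs(url):
    """Resolve an embedded '/../' and strip leading '/../' path prefixes."""
    head, sep, tail = url.partition('/../')
    if sep:
        # keep the '/' on the base and resolve '../rest' relative to it
        url = urljoin(head + sep[:1], sep[1:] + tail)
    parts = urlsplit(url)
    path = parts.path
    while path.startswith('/../'):
        path = path[3:]  # drop '/..', keep the following '/'
    return parts._replace(path=path).geturl()
```

For example, `https://example.com/a/b/../c.mp4` becomes `https://example.com/a/c.mp4`, while a URL with no `/../` is returned unchanged.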
@@ -88,17 +88,21 @@ class FileDownloader(object):
             return '---.-%'
         return '%6s' % ('%3.1f%%' % percent)

-    @staticmethod
-    def calc_eta(start, now, total, current):
+    @classmethod
+    def calc_eta(cls, start_or_rate, now_or_remaining, *args):
+        if len(args) < 2:
+            rate, remaining = (start_or_rate, now_or_remaining)
+            if None in (rate, remaining):
+                return None
+            return int(float(remaining) / rate)
+        start, now = (start_or_rate, now_or_remaining)
+        total, current = args
         if total is None:
             return None
         if now is None:
             now = time.time()
-        dif = now - start
-        if current == 0 or dif < 0.001:  # One millisecond
-            return None
-        rate = float(current) / dif
-        return int((float(total) - float(current)) / rate)
+        rate = cls.calc_speed(start, now, current)
+        return rate and int((float(total) - float(current)) / rate)

     @staticmethod
     def format_eta(eta):
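Note on the hunk above: `calc_eta` now accepts either `(rate, remaining)` or the older `(start, now, total, current)` and tells them apart by the number of extra positional arguments. A standalone sketch of that dispatch follows, written as module-level functions; the `calc_speed` shown here is an assumption modelled on the lines the hunk removes.

```python
import time


def calc_speed(start, now, downloaded):
    """Average rate in bytes/s, or None if there is not enough data yet."""
    elapsed = now - start
    if downloaded == 0 or elapsed < 0.001:  # one millisecond
        return None
    return float(downloaded) / elapsed


def calc_eta(start_or_rate, now_or_remaining, *args):
    """ETA in whole seconds; accepts (rate, remaining) or (start, now, total, current)."""
    if len(args) < 2:
        rate, remaining = start_or_rate, now_or_remaining
        if None in (rate, remaining):
            return None
        return int(float(remaining) / rate)
    start, now = start_or_rate, now_or_remaining
    total, current = args
    if total is None:
        return None
    if now is None:
        now = time.time()
    rate = calc_speed(start, now, current)
    # `rate and ...` passes a None rate through instead of dividing by it
    return rate and int((float(total) - float(current)) / rate)
```

In this sketch, `calc_eta(100.0, 500)` gives 5 and `calc_eta(0, 10, 1000, 100)` gives 90. A rate of exactly 0 in the two-argument form would raise `ZeroDivisionError` as written here.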
@@ -123,6 +127,12 @@ class FileDownloader(object):
     def format_retries(retries):
         return 'inf' if retries == float('inf') else '%.0f' % retries

+    @staticmethod
+    def filesize_or_none(unencoded_filename):
+        fn = encodeFilename(unencoded_filename)
+        if os.path.isfile(fn):
+            return os.path.getsize(fn)
+
     @staticmethod
     def best_block_size(elapsed_time, bytes):
         new_min = max(bytes / 2.0, 1.0)
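Note on the hunk above: `filesize_or_none` folds the "is it a file?" check and the size lookup into one call that returns `None` for a missing file, which is why callers elsewhere in this commit write `... or 0`. A standalone sketch without the `encodeFilename` step (an omission made for this illustration):

```python
import os


def filesize_or_none(filename):
    """Size of `filename` in bytes, or None when it is not a regular file."""
    if os.path.isfile(filename):
        return os.path.getsize(filename)
    return None


def resume_length(filename):
    """Bytes already on disk, treating a missing file as zero."""
    return filesize_or_none(filename) or 0
```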
@@ -38,8 +38,7 @@ class DashSegmentsFD(FragmentFD):
             # In DASH, the first segment contains necessary headers to
             # generate a valid MP4 file, so always abort for the first segment
             fatal = i == 0 or not skip_unavailable_fragments
-            count = 0
-            while count <= fragment_retries:
+            for count in range(fragment_retries + 1):
                 try:
                     fragment_url = fragment.get('url')
                     if not fragment_url:
@@ -57,9 +56,8 @@ class DashSegmentsFD(FragmentFD):
                     # is usually enough) thus allowing to download the whole file successfully.
                     # To be future-proof we will retry all fragments that fail with any
                     # HTTP error.
-                    count += 1
-                    if count <= fragment_retries:
-                        self.report_retry_fragment(err, frag_index, count, fragment_retries)
+                    if count < fragment_retries:
+                        self.report_retry_fragment(err, frag_index, count + 1, fragment_retries)
                 except DownloadError:
                     # Don't retry fragment if error occurred during HTTP downloading
                     # itself since it has own retry settings
@@ -68,7 +66,7 @@ class DashSegmentsFD(FragmentFD):
                         break
                     raise

-            if count > fragment_retries:
+            if count >= fragment_retries:
                 if not fatal:
                     self.report_skip_fragment(frag_index)
                     continue
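Note on the hunks above: the retry loop changes from a manual counter to `for count in range(fragment_retries + 1)`, and the exhaustion check after the loop changes with it. One way to keep "retries exhausted" distinct from "succeeded on the last attempt" in a bounded loop is Python's `for`/`else`. The sketch below shows that pattern; `download_with_retries`, `fetch` and `on_retry` are hypothetical names, and this is not the code in the commit.

```python
def download_with_retries(fetch, retries, on_retry=None):
    """Call fetch() up to retries + 1 times; return True on success, False when exhausted.

    `fetch` and `on_retry` are hypothetical callables used for illustration.
    """
    for attempt in range(retries + 1):
        try:
            fetch()
            break  # success: skip the else clause below
        except IOError as err:
            if attempt < retries and on_retry:
                # report the upcoming retry as 1-based, like `count + 1` above
                on_retry(err, attempt + 1, retries)
    else:
        return False  # loop ran to completion without a successful fetch
    return True
```

The `else` branch runs only when the loop finishes without `break`, so a success on the final attempt is still reported as a success.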
@@ -273,7 +273,7 @@ class HttpieFD(ExternalFD):
 class FFmpegFD(ExternalFD):
     @classmethod
     def supports(cls, info_dict):
-        return info_dict['protocol'] in ('http', 'https', 'ftp', 'ftps', 'm3u8', 'rtsp', 'rtmp', 'mms')
+        return info_dict['protocol'] in ('http', 'https', 'ftp', 'ftps', 'm3u8', 'rtsp', 'rtmp', 'mms', 'http_dash_segments')

     @classmethod
     def available(cls):
@@ -71,7 +71,7 @@ class FragmentFD(FileDownloader):

     @staticmethod
     def __do_ytdl_file(ctx):
-        return not ctx['live'] and not ctx['tmpfilename'] == '-'
+        return ctx['live'] is not True and ctx['tmpfilename'] != '-'

     def _read_ytdl_file(self, ctx):
         assert 'ytdl_corrupt' not in ctx
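Note on the hunk above: `not ctx['live']` and `ctx['live'] is not True` agree for `True`, `False` and `None`, but differ for other truthy values, because `is not True` only excludes the `True` singleton. The sketch below makes the difference concrete; `old_check` and `new_check` are names chosen here for the two expressions.

```python
def old_check(ctx):
    """The expression before the change: any truthy 'live' disables the ytdl file."""
    return not ctx['live'] and not ctx['tmpfilename'] == '-'


def new_check(ctx):
    """The expression after the change: only the value True disables it."""
    return ctx['live'] is not True and ctx['tmpfilename'] != '-'
```

For a context such as `{'live': 1, 'tmpfilename': 'x.part'}` the old check returns `False` and the new one returns `True`.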
@@ -101,6 +101,13 @@ class FragmentFD(FileDownloader):
             'url': frag_url,
             'http_headers': headers or info_dict.get('http_headers'),
         }
+        frag_resume_len = 0
+        if ctx['dl'].params.get('continuedl', True):
+            frag_resume_len = self.filesize_or_none(
+                self.temp_name(fragment_filename))
+        fragment_info_dict['frag_resume_len'] = frag_resume_len
+        ctx['frag_resume_len'] = frag_resume_len or 0
+
         success = ctx['dl'].download(fragment_filename, fragment_info_dict)
         if not success:
             return False, None
@@ -124,9 +131,7 @@ class FragmentFD(FileDownloader):
         del ctx['fragment_filename_sanitized']

     def _prepare_frag_download(self, ctx):
-        if 'live' not in ctx:
-            ctx['live'] = False
-        if not ctx['live']:
+        if not ctx.setdefault('live', False):
             total_frags_str = '%d' % ctx['total_frags']
             ad_frags = ctx.get('ad_frags', 0)
             if ad_frags:
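Note on the hunk above: `dict.setdefault` both stores the default when the key is absent and returns the resulting value, so it replaces the removed three-line "insert if missing, then test" sequence. A minimal sketch; `is_live` is a hypothetical helper name.

```python
def is_live(ctx):
    """Return ctx['live'], defaulting it to False when absent (mutates ctx)."""
    return ctx.setdefault('live', False)
```

After `is_live({})` the dictionary contains `'live': False`; an existing value is left as it was.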
@@ -136,10 +141,11 @@ class FragmentFD(FileDownloader):
         self.to_screen(
             '[%s] Total fragments: %s' % (self.FD_NAME, total_frags_str))
         self.report_destination(ctx['filename'])
+        continuedl = self.params.get('continuedl', True)
         dl = HttpQuietDownloader(
             self.ydl,
             {
-                'continuedl': True,
+                'continuedl': continuedl,
                 'quiet': True,
                 'noprogress': True,
                 'ratelimit': self.params.get('ratelimit'),
@@ -150,12 +156,11 @@ class FragmentFD(FileDownloader):
         )
         tmpfilename = self.temp_name(ctx['filename'])
         open_mode = 'wb'
-        resume_len = 0

         # Establish possible resume length
-        if os.path.isfile(encodeFilename(tmpfilename)):
+        resume_len = self.filesize_or_none(tmpfilename) or 0
+        if resume_len > 0:
             open_mode = 'ab'
-            resume_len = os.path.getsize(encodeFilename(tmpfilename))

         # Should be initialized before ytdl file check
         ctx.update({
@@ -164,7 +169,8 @@ class FragmentFD(FileDownloader):
         })

         if self.__do_ytdl_file(ctx):
-            if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))):
+            ytdl_file_exists = os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename'])))
+            if continuedl and ytdl_file_exists:
                 self._read_ytdl_file(ctx)
                 is_corrupt = ctx.get('ytdl_corrupt') is True
                 is_inconsistent = ctx['fragment_index'] > 0 and resume_len == 0
@@ -178,7 +184,12 @@ class FragmentFD(FileDownloader):
                     if 'ytdl_corrupt' in ctx:
                         del ctx['ytdl_corrupt']
                     self._write_ytdl_file(ctx)
+
             else:
+                if not continuedl:
+                    if ytdl_file_exists:
+                        self._read_ytdl_file(ctx)
+                    ctx['fragment_index'] = resume_len = 0
                 self._write_ytdl_file(ctx)
                 assert ctx['fragment_index'] == 0

@@ -209,6 +220,7 @@ class FragmentFD(FileDownloader):
         start = time.time()
         ctx.update({
             'started': start,
+            'fragment_started': start,
             # Amount of fragment's bytes downloaded by the time of the previous
             # frag progress hook invocation
             'prev_frag_downloaded_bytes': 0,
@@ -218,6 +230,9 @@ class FragmentFD(FileDownloader):
             if s['status'] not in ('downloading', 'finished'):
                 return

+            if not total_frags and ctx.get('fragment_count'):
+                state['fragment_count'] = ctx['fragment_count']
+
             time_now = time.time()
             state['elapsed'] = time_now - start
             frag_total_bytes = s.get('total_bytes') or 0
@@ -232,16 +247,17 @@ class FragmentFD(FileDownloader):
                 ctx['fragment_index'] = state['fragment_index']
                 state['downloaded_bytes'] += frag_total_bytes - ctx['prev_frag_downloaded_bytes']
                 ctx['complete_frags_downloaded_bytes'] = state['downloaded_bytes']
+                ctx['speed'] = state['speed'] = self.calc_speed(
+                    ctx['fragment_started'], time_now, frag_total_bytes)
+                ctx['fragment_started'] = time.time()
                 ctx['prev_frag_downloaded_bytes'] = 0
             else:
                 frag_downloaded_bytes = s['downloaded_bytes']
                 state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
+                ctx['speed'] = state['speed'] = self.calc_speed(
+                    ctx['fragment_started'], time_now, frag_downloaded_bytes - ctx['frag_resume_len'])
                 if not ctx['live']:
-                    state['eta'] = self.calc_eta(
-                        start, time_now, estimated_size - resume_len,
-                        state['downloaded_bytes'] - resume_len)
-                state['speed'] = s.get('speed') or ctx.get('speed')
-                ctx['speed'] = state['speed']
+                    state['eta'] = self.calc_eta(state['speed'], estimated_size - state['downloaded_bytes'])
                 ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
             self._hook_progress(state)

@@ -268,7 +284,7 @@ class FragmentFD(FileDownloader):
                         os.utime(ctx['filename'], (time.time(), filetime))
                     except Exception:
                         pass
-            downloaded_bytes = os.path.getsize(encodeFilename(ctx['filename']))
+            downloaded_bytes = self.filesize_or_none(ctx['filename']) or 0

         self._hook_progress({
             'downloaded_bytes': downloaded_bytes,
@@ -58,9 +58,9 @@ class HttpFD(FileDownloader):

         if self.params.get('continuedl', True):
             # Establish possible resume length
-            if os.path.isfile(encodeFilename(ctx.tmpfilename)):
-                ctx.resume_len = os.path.getsize(
-                    encodeFilename(ctx.tmpfilename))
+            ctx.resume_len = info_dict.get('frag_resume_len')
+            if ctx.resume_len is None:
+                ctx.resume_len = self.filesize_or_none(ctx.tmpfilename) or 0

         ctx.is_resume = ctx.resume_len > 0

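Note on the hunk above: the resume length now comes from `frag_resume_len` when the caller supplied one, and only a missing value falls back to the size of the temporary file. Testing `is None` rather than truthiness means an explicit 0 is respected. A sketch of that precedence follows; `pick_resume_len` is a hypothetical helper, and returning 0 when continuation is disabled is an assumption of this sketch.

```python
def pick_resume_len(info_dict, size_on_disk, continuedl=True):
    """Choose the byte offset to resume from (hypothetical helper).

    A 'frag_resume_len' supplied by the caller wins, even when it is 0;
    only a missing value (None) falls back to the size found on disk.
    """
    if not continuedl:
        return 0  # assumption: no resume when continuation is disabled
    resume_len = info_dict.get('frag_resume_len')
    if resume_len is None:
        resume_len = size_on_disk or 0
    return resume_len
```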
@@ -115,9 +115,9 @@ class HttpFD(FileDownloader):
                         raise RetryDownload(err)
                     raise err
                 # When trying to resume, Content-Range HTTP header of response has to be checked
-                # to match the value of requested Range HTTP header. This is due to a webservers
+                # to match the value of requested Range HTTP header. This is due to webservers
                 # that don't support resuming and serve a whole file with no Content-Range
-                # set in response despite of requested Range (see
+                # set in response despite requested Range (see
                 # https://github.com/ytdl-org/youtube-dl/issues/6057#issuecomment-126129799)
                 if has_range:
                     content_range = ctx.data.headers.get('Content-Range')
|
@ -293,10 +293,7 @@ class HttpFD(FileDownloader):

# Progress message
speed = self.calc_speed(start, now, byte_counter - ctx.resume_len)
if ctx.data_len is None:
eta = None
else:
eta = self.calc_eta(start, time.time(), ctx.data_len - ctx.resume_len, byte_counter - ctx.resume_len)
eta = self.calc_eta(speed, ctx.data_len and (ctx.data_len - ctx.resume_len))

self._hook_progress({
'status': 'downloading',
|
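The hunk above replaces the four-argument `calc_eta` call (start time, current time, total bytes, bytes so far) with a two-argument one (speed, remaining bytes). The body of `calc_eta` is not part of this excerpt, so the following is only a sketch of what a rate-based helper might look like. The name `eta_from_rate` and its handling of unknown values are assumptions made for illustration, not the project's implementation.

```python
def eta_from_rate(rate, remaining):
    """Hypothetical helper: seconds left, given bytes/s and bytes remaining."""
    # The new call site passes `ctx.data_len and (...)`, so an unknown total
    # size arrives here as None; an unknown or zero rate gives no estimate.
    if not rate or remaining is None:
        return None
    # Clamp at zero so an over-counted download never reports a negative ETA.
    return max(0, int(float(remaining) / rate))


# Usage mirroring the call site in the hunk above (values are made up):
data_len, resume_len = None, 200
remaining = data_len and (data_len - resume_len)  # None when size is unknown
print(eta_from_rate(1024.0, remaining))
```

Passing the already-computed speed keeps the ETA consistent with the speed shown in the same progress message, which may be the motivation for the signature change.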
@ -8,6 +8,8 @@ from ..utils import (
ExtractorError,
GeoRestrictedError,
int_or_none,
remove_start,
traverse_obj,
update_url_query,
urlencode_postdata,
)
@ -33,14 +35,17 @@ class AENetworksBaseIE(ThePlatformIE):
}

def _extract_aen_smil(self, smil_url, video_id, auth=None):
query = {'mbr': 'true'}
query = {
'mbr': 'true',
'formats': 'M3U+none,MPEG-DASH+none,MPEG4,MP3',
}
if auth:
query['auth'] = auth
TP_SMIL_QUERY = [{
'assetTypes': 'high_video_ak',
'switch': 'hls_high_ak'
'switch': 'hls_high_ak',
}, {
'assetTypes': 'high_video_s3'
'assetTypes': 'high_video_s3',
}, {
'assetTypes': 'high_video_s3',
'switch': 'hls_high_fastly',
@ -75,7 +80,14 @@ class AENetworksBaseIE(ThePlatformIE):
requestor_id, brand = self._DOMAIN_MAP[domain]
result = self._download_json(
'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand,
filter_value, query={'filter[%s]' % filter_key: filter_value})['results'][0]
filter_value, query={'filter[%s]' % filter_key: filter_value})
result = traverse_obj(
result, ('results',
lambda k, v: k == 0 and v[filter_key] == filter_value),
get_all=False)
if not result:
raise ExtractorError('Show not found in A&E feed (too new?)', expected=True,
video_id=remove_start(filter_value, '/'))
title = result['title']
video_id = result['id']
media_url = result['publicUrl']
@ -126,7 +138,7 @@ class AENetworksIE(AENetworksBaseIE):
'skip_download': True,
},
'add_ie': ['ThePlatform'],
'skip': 'This video is only available for users of participating TV providers.',
'skip': 'Geo-restricted - This content is not available in your location.'
}, {
'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'info_dict': {
@ -143,6 +155,7 @@ class AENetworksIE(AENetworksBaseIE):
'skip_download': True,
},
'add_ie': ['ThePlatform'],
'skip': 'This video is only available for users of participating TV providers.',
}, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
'only_matching': True

@ -1087,7 +1087,7 @@ class InfoExtractor(object):
# Helper functions for extracting OpenGraph info
@staticmethod
def _og_regexes(prop):
content_re = r'content=(?:"([^"]+?)"|\'([^\']+?)\'|\s*([^\s"\'=<>`]+?))'
content_re = r'content=(?:"([^"]+?)"|\'([^\']+?)\'|\s*([^\s"\'=<>`]+?)(?=\s|/?>))'
property_re = (r'(?:name|property)=(?:\'og[:-]%(prop)s\'|"og[:-]%(prop)s"|\s*og[:-]%(prop)s\b)'
% {'prop': re.escape(prop)})
template = r'<meta[^>]+?%s[^>]+?%s'
|
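The only change in the `_og_regexes` hunk above is the lookahead `(?=\s|/?>)` appended to the unquoted-value branch of `content_re`. A standalone sketch of its effect follows. It copies the two `content_re` strings, `property_re` and `template` from the hunk, while the helper `og_title` and the sample HTML are assumptions for illustration and only exercise the property-then-content attribute order.

```python
import re

# Both patterns are copied verbatim from the hunk above.
OLD_CONTENT_RE = r'content=(?:"([^"]+?)"|\'([^\']+?)\'|\s*([^\s"\'=<>`]+?))'
NEW_CONTENT_RE = r'content=(?:"([^"]+?)"|\'([^\']+?)\'|\s*([^\s"\'=<>`]+?)(?=\s|/?>))'
PROPERTY_RE = (r'(?:name|property)=(?:\'og[:-]%(prop)s\'|"og[:-]%(prop)s"|\s*og[:-]%(prop)s\b)'
               % {'prop': re.escape('title')})
TEMPLATE = r'<meta[^>]+?%s[^>]+?%s'


def og_title(content_re, html):
    """Hypothetical helper: first non-empty capture group, or None."""
    m = re.search(TEMPLATE % (PROPERTY_RE, content_re), html)
    if not m:
        return None
    return next((g for g in m.groups() if g is not None), None)


unquoted = '<meta property=og:title content=Hello>'
quoted = '<meta property="og:title" content="Hello world">'

# When content_re ends the pattern, the lazy `+?` stops after one character.
print(og_title(OLD_CONTENT_RE, unquoted))
# The lookahead forces the match to run up to whitespace or the tag end.
print(og_title(NEW_CONTENT_RE, unquoted))
```

Quoted attribute values are delimited by their closing quote, so the lookahead should leave them unaffected.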
@ -2320,6 +2320,25 @@ class GenericIE(InfoExtractor):
'height': 720,
'age_limit': 18,
},
}, {
# would like to use the yt-dl test video but searching for
# '"\'/\\ä↭𝕐' fails, so using an old vid from YouTube Korea
'note': 'Test default search',
'url': 'Shorts로 허락 필요없이 놀자! (BTS편)',
'info_dict': {
'id': 'usDGO4Zb-dc',
'ext': 'mp4',
'title': 'YouTube Shorts로 허락 필요없이 놀자! (BTS편)',
'description': 'md5:96e31607eba81ab441567b5e289f4716',
'upload_date': '20211107',
'uploader': 'YouTube Korea',
'location': '대한민국',
},
'params': {
'default_search': 'ytsearch',
'skip_download': True,
},
'expected_warnings': ['uploader id'],
},
]

@ -1,19 +1,29 @@
# coding: utf-8

from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..compat import (
compat_filter as filter,
compat_HTTPError,
compat_parse_qs,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
HEADRequest,
determine_ext,
error_to_compat_str,
extract_attributes,
ExtractorError,
int_or_none,
merge_dicts,
orderedSet,
parse_iso8601,
strip_or_none,
try_get,
traverse_obj,
url_or_none,
urljoin,
)


@ -22,14 +32,102 @@ class IGNBaseIE(InfoExtractor):
return self._download_json(
'http://apis.ign.com/{0}/v3/{0}s/slug/{1}'.format(self._PAGE_TYPE, slug), slug)

def _checked_call_api(self, slug):
try:
return self._call_api(slug)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
e.cause.args = e.cause.args or [
e.cause.geturl(), e.cause.getcode(), e.cause.reason]
raise ExtractorError(
'Content not found: expired?', cause=e.cause,
expected=True)
raise

def _extract_video_info(self, video, fatal=True):
video_id = video['videoId']

formats = []
refs = traverse_obj(video, 'refs', expected_type=dict) or {}

m3u8_url = url_or_none(refs.get('m3uUrl'))
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))

f4m_url = url_or_none(refs.get('f4mUrl'))
if f4m_url:
formats.extend(self._extract_f4m_formats(
f4m_url, video_id, f4m_id='hds', fatal=False))

for asset in (video.get('assets') or []):
asset_url = url_or_none(asset.get('url'))
if not asset_url:
continue
formats.append({
'url': asset_url,
'tbr': int_or_none(asset.get('bitrate'), 1000),
'fps': int_or_none(asset.get('frame_rate')),
'height': int_or_none(asset.get('height')),
'width': int_or_none(asset.get('width')),
})

mezzanine_url = traverse_obj(
video, ('system', 'mezzanineUrl'), expected_type=url_or_none)
if mezzanine_url:
formats.append({
'ext': determine_ext(mezzanine_url, 'mp4'),
'format_id': 'mezzanine',
'preference': 1,
'url': mezzanine_url,
})

if formats or fatal:
self._sort_formats(formats)
else:
return

thumbnails = traverse_obj(
video, ('thumbnails', Ellipsis, {'url': 'url'}), expected_type=url_or_none)
tags = traverse_obj(
video, ('tags', Ellipsis, 'displayName'),
expected_type=lambda x: x.strip() or None)

metadata = traverse_obj(video, 'metadata', expected_type=dict) or {}
title = traverse_obj(
metadata, 'longTitle', 'title', 'name',
expected_type=lambda x: x.strip() or None)

return {
'id': video_id,
'title': title,
'description': strip_or_none(metadata.get('description')),
'timestamp': parse_iso8601(metadata.get('publishDate')),
'duration': int_or_none(metadata.get('duration')),
'thumbnails': thumbnails,
'formats': formats,
'tags': tags,
}

# yt-dlp shim
@classmethod
def _extract_from_webpage(cls, url, webpage):
for embed_url in orderedSet(
cls._extract_embed_urls(url, webpage) or [], lazy=True):
yield cls.url_result(embed_url, None if cls._VALID_URL is False else cls)


class IGNIE(IGNBaseIE):
"""
Extractor for some of the IGN sites, like www.ign.com, es.ign.com de.ign.com.
Some videos of it.ign.com are also supported
"""

_VALID_URL = r'https?://(?:.+?\.ign|www\.pcmag)\.com/videos/(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[^/?&#]+)'
_VIDEO_PATH_RE = r'/(?:\d{4}/\d{2}/\d{2}/)?(?P<id>.+?)'
_PLAYLIST_PATH_RE = r'(?:/?\?(?P<filt>[^&#]+))?'
_VALID_URL = (
r'https?://(?:.+?\.ign|www\.pcmag)\.com/videos(?:%s)'
% '|'.join((_VIDEO_PATH_RE + r'(?:[/?&#]|$)', _PLAYLIST_PATH_RE)))
IE_NAME = 'ign.com'
_PAGE_TYPE = 'video'

@ -44,7 +142,10 @@ class IGNIE(IGNBaseIE):
'timestamp': 1370440800,
'upload_date': '20130605',
'tags': 'count:9',
}
},
'params': {
'nocheckcertificate': True,
},
}, {
'url': 'http://www.pcmag.com/videos/2015/01/06/010615-whats-new-now-is-gogo-snooping-on-your-data',
'md5': 'f1581a6fe8c5121be5b807684aeac3f6',
@ -56,86 +157,51 @@ class IGNIE(IGNBaseIE):
'timestamp': 1420571160,
'upload_date': '20150106',
'tags': 'count:4',
}
},
'skip': '404 Not Found',
}, {
'url': 'https://www.ign.com/videos/is-a-resident-evil-4-remake-on-the-way-ign-daily-fix',
'only_matching': True,
}]

@classmethod
def _extract_embed_urls(cls, url, webpage):
grids = re.findall(
r'''(?s)<section\b[^>]+\bclass\s*=\s*['"](?:[\w-]+\s+)*?content-feed-grid(?!\B|-)[^>]+>(.+?)</section[^>]*>''',
webpage)
return filter(None,
(urljoin(url, m.group('path')) for m in re.finditer(
r'''<a\b[^>]+\bhref\s*=\s*('|")(?P<path>/videos%s)\1'''
% cls._VIDEO_PATH_RE, grids[0] if grids else '')))

def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
display_id = m.group('id')
if display_id:
return self._extract_video(url, display_id)
display_id = m.group('filt') or 'all'
return self._extract_playlist(url, display_id)

def _extract_playlist(self, url, display_id):
webpage = self._download_webpage(url, display_id)

return self.playlist_result(
(self.url_result(u, ie=self.ie_key())
for u in self._extract_embed_urls(url, webpage)),
playlist_id=display_id)

def _extract_video(self, url, display_id):
display_id = self._match_id(url)
video = self._call_api(display_id)
video_id = video['videoId']
metadata = video['metadata']
title = metadata.get('longTitle') or metadata.get('title') or metadata['name']
video = self._checked_call_api(display_id)

formats = []
refs = video.get('refs') or {}
info = self._extract_video_info(video)

m3u8_url = refs.get('m3uUrl')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))

f4m_url = refs.get('f4mUrl')
if f4m_url:
formats.extend(self._extract_f4m_formats(
f4m_url, video_id, f4m_id='hds', fatal=False))

for asset in (video.get('assets') or []):
asset_url = asset.get('url')
if not asset_url:
continue
formats.append({
'url': asset_url,
'tbr': int_or_none(asset.get('bitrate'), 1000),
'fps': int_or_none(asset.get('frame_rate')),
'height': int_or_none(asset.get('height')),
'width': int_or_none(asset.get('width')),
})

mezzanine_url = try_get(video, lambda x: x['system']['mezzanineUrl'])
if mezzanine_url:
formats.append({
'ext': determine_ext(mezzanine_url, 'mp4'),
'format_id': 'mezzanine',
'preference': 1,
'url': mezzanine_url,
})

self._sort_formats(formats)

thumbnails = []
for thumbnail in (video.get('thumbnails') or []):
thumbnail_url = thumbnail.get('url')
if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
})

tags = []
for tag in (video.get('tags') or []):
display_name = tag.get('displayName')
if not display_name:
continue
tags.append(display_name)

return {
'id': video_id,
'title': title,
'description': strip_or_none(metadata.get('description')),
'timestamp': parse_iso8601(metadata.get('publishDate')),
'duration': int_or_none(metadata.get('duration')),
return merge_dicts({
'display_id': display_id,
'thumbnails': thumbnails,
'formats': formats,
'tags': tags,
}
}, info)


class IGNVideoIE(InfoExtractor):
class IGNVideoIE(IGNBaseIE):
_VALID_URL = r'https?://.+?\.ign\.com/(?:[a-z]{2}/)?[^/]+/(?P<id>\d+)/(?:video|trailer)/'
_TESTS = [{
'url': 'http://me.ign.com/en/videos/112203/video/how-hitman-aims-to-be-different-than-every-other-s',
@ -147,7 +213,8 @@ class IGNVideoIE(InfoExtractor):
'description': 'Taking out assassination targets in Hitman has never been more stylish.',
'timestamp': 1444665600,
'upload_date': '20151012',
}
},
'expected_warnings': ['HTTP Error 400: Bad Request'],
}, {
'url': 'http://me.ign.com/ar/angry-birds-2/106533/video/lrd-ldyy-lwl-lfylm-angry-birds',
'only_matching': True,
@ -167,22 +234,38 @@ class IGNVideoIE(InfoExtractor):

def _real_extract(self, url):
video_id = self._match_id(url)
req = HEADRequest(url.rsplit('/', 1)[0] + '/embed')
url = self._request_webpage(req, video_id).geturl()
parsed_url = compat_urlparse.urlparse(url)
embed_url = compat_urlparse.urlunparse(
parsed_url._replace(path=parsed_url.path.rsplit('/', 1)[0] + '/embed'))

webpage, urlh = self._download_webpage_handle(embed_url, video_id)
new_url = urlh.geturl()
ign_url = compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('url', [None])[0]
compat_urlparse.urlparse(new_url).query).get('url', [None])[-1]
if ign_url:
return self.url_result(ign_url, IGNIE.ie_key())
return self.url_result(url)
video = self._search_regex(r'(<div\b[^>]+\bdata-video-id\s*=\s*[^>]+>)', webpage, 'video element', fatal=False)
if not video:
if new_url == url:
raise ExtractorError('Redirect loop: ' + url)
return self.url_result(new_url)
video = extract_attributes(video)
video_data = video.get('data-settings') or '{}'
video_data = self._parse_json(video_data, video_id)['video']
info = self._extract_video_info(video_data)

return merge_dicts({
'display_id': video_id,
}, info)


class IGNArticleIE(IGNBaseIE):
_VALID_URL = r'https?://.+?\.ign\.com/(?:articles(?:/\d{4}/\d{2}/\d{2})?|(?:[a-z]{2}/)?feature/\d+)/(?P<id>[^/?&#]+)'
_VALID_URL = r'https?://.+?\.ign\.com/(?:articles(?:/\d{4}/\d{2}/\d{2})?|(?:[a-z]{2}/)?(?:[\w-]+/)*?feature/\d+)/(?P<id>[^/?&#]+)'
_PAGE_TYPE = 'article'
_TESTS = [{
'url': 'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
'info_dict': {
'id': '524497489e4e8ff5848ece34',
'id': '72113',
'title': '100 Little Things in GTA 5 That Will Blow Your Mind',
},
'playlist': [
@ -190,7 +273,7 @@ class IGNArticleIE(IGNBaseIE):
'info_dict': {
'id': '5ebbd138523268b93c9141af17bec937',
'ext': 'mp4',
'title': 'GTA 5 Video Review',
'title': 'Grand Theft Auto V Video Review',
'description': 'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
'timestamp': 1379339880,
'upload_date': '20130916',
@ -200,7 +283,7 @@ class IGNArticleIE(IGNBaseIE):
'info_dict': {
'id': '638672ee848ae4ff108df2a296418ee2',
'ext': 'mp4',
'title': '26 Twisted Moments from GTA 5 in Slow Motion',
'title': 'GTA 5 In Slow Motion',
'description': 'The twisted beauty of GTA 5 in stunning slow motion.',
'timestamp': 1386878820,
'upload_date': '20131212',
@ -208,16 +291,17 @@ class IGNArticleIE(IGNBaseIE):
},
],
'params': {
'playlist_items': '2-3',
'skip_download': True,
},
'expected_warnings': ['Backend fetch failed'],
}, {
'url': 'http://www.ign.com/articles/2014/08/15/rewind-theater-wild-trailer-gamescom-2014?watch',
'info_dict': {
'id': '53ee806780a81ec46e0790f8',
'title': 'Rewind Theater - Wild Trailer Gamescom 2014',
},
'playlist_count': 2,
'playlist_count': 1,
'expected_warnings': ['Backend fetch failed'],
}, {
# videoId pattern
'url': 'http://www.ign.com/articles/2017/06/08/new-ducktales-short-donalds-birthday-doesnt-go-as-planned',
@ -240,18 +324,91 @@ class IGNArticleIE(IGNBaseIE):
'only_matching': True,
}]

def _checked_call_api(self, slug):
try:
return self._call_api(slug)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
e.cause.args = e.cause.args or [
e.cause.geturl(), e.cause.getcode(), e.cause.reason]
if e.cause.code == 404:
raise ExtractorError(
'Content not found: expired?', cause=e.cause,
expected=True)
elif e.cause.code == 503:
self.report_warning(error_to_compat_str(e.cause))
return
raise

def _search_nextjs_data(self, webpage, video_id, **kw):
return self._parse_json(
self._search_regex(
r'(?s)<script[^>]+id=[\'"]__NEXT_DATA__[\'"][^>]*>([^<]+)</script>',
webpage, 'next.js data', **kw),
video_id, **kw)

def _real_extract(self, url):
display_id = self._match_id(url)
article = self._call_api(display_id)
article = self._checked_call_api(display_id)

def entries():
media_url = try_get(article, lambda x: x['mediaRelations'][0]['media']['metadata']['url'])
if media_url:
yield self.url_result(media_url, IGNIE.ie_key())
for content in (article.get('content') or []):
for video_url in re.findall(r'(?:\[(?:ignvideo\s+url|youtube\s+clip_id)|<iframe[^>]+src)="([^"]+)"', content):
yield self.url_result(video_url)
if article:
# obsolete ?
def entries():
media_url = traverse_obj(
article, ('mediaRelations', 0, 'media', 'metadata', 'url'),
expected_type=url_or_none)
if media_url:
yield self.url_result(media_url, IGNIE.ie_key())
for content in (article.get('content') or []):
for video_url in re.findall(r'(?:\[(?:ignvideo\s+url|youtube\s+clip_id)|<iframe[^>]+src)="([^"]+)"', content):
if url_or_none(video_url):
yield self.url_result(video_url)

return self.playlist_result(
entries(), article.get('articleId'),
traverse_obj(
article, ('metadata', 'headline'),
expected_type=lambda x: x.strip() or None))

webpage = self._download_webpage(url, display_id)

playlist_id = self._html_search_meta('dable:item_id', webpage, default=None)
if playlist_id:

def entries():
for m in re.finditer(
r'''(?s)<object\b[^>]+\bclass\s*=\s*("|')ign-videoplayer\1[^>]*>(?P<params>.+?)</object''',
webpage):
flashvars = self._search_regex(
r'''(<param\b[^>]+\bname\s*=\s*("|')flashvars\2[^>]*>)''',
m.group('params'), 'flashvars', default='')
flashvars = compat_parse_qs(extract_attributes(flashvars).get('value') or '')
v_url = url_or_none((flashvars.get('url') or [None])[-1])
if v_url:
yield self.url_result(v_url)
else:
playlist_id = self._search_regex(
r'''\bdata-post-id\s*=\s*("|')(?P<id>[\da-f]+)\1''',
webpage, 'id', group='id', default=None)

nextjs_data = self._search_nextjs_data(webpage, display_id)

def entries():
for player in traverse_obj(
nextjs_data,
('props', 'apolloState', 'ROOT_QUERY', lambda k, _: k.startswith('videoPlayerProps('), '__ref')):
# skip promo links (which may not always be served, eg GH CI servers)
if traverse_obj(nextjs_data,
('props', 'apolloState', player.replace('PlayerProps', 'ModernContent')),
expected_type=dict):
continue
video = traverse_obj(nextjs_data, ('props', 'apolloState', player), expected_type=dict) or {}
info = self._extract_video_info(video, fatal=False)
if info:
yield merge_dicts({
'display_id': display_id,
}, info)

return self.playlist_result(
entries(), article.get('articleId'),
strip_or_none(try_get(article, lambda x: x['metadata']['headline'])))
entries(), playlist_id or display_id,
re.sub(r'\s+-\s+IGN\s*$', '', self._og_search_title(webpage, default='')) or None)

@ -261,27 +261,33 @@ class VimeoIE(VimeoBaseInfoExtractor):

# _VALID_URL matches Vimeo URLs
_VALID_URL = r'''(?x)
https?://
(?:
(?:
www|
player
)
\.
)?
vimeo(?:pro)?\.com/
(?!(?:channels|album|showcase)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
(?:.*?/)??
(?:
(?:
play_redirect_hls|
moogaloop\.swf)\?clip_id=
)?
(?:videos?/)?
(?P<id>[0-9]+)
(?:/(?P<unlisted_hash>[\da-f]{10}))?
/?(?:[?&].*)?(?:[#].*)?$
'''
https?://
(?:
(?:
www|
player
)
\.
)?
vimeo(?:pro)?\.com/
(?:
(?P<u>user)|
(?!(?:channels|album|showcase)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
(?:.*?/)??
(?P<q>
(?:
play_redirect_hls|
moogaloop\.swf)\?clip_id=
)?
(?:videos?/)?
)
(?P<id>[0-9]+)
(?(u)
/(?!videos|likes)[^/?#]+/?|
(?(q)|/(?P<unlisted_hash>[\da-f]{10}))?
)
(?:(?(q)[&]|(?(u)|/?)[?]).+?)?(?:[#].*)?$
'''
IE_NAME = 'vimeo'
_TESTS = [
{
@ -539,7 +545,12 @@ class VimeoIE(VimeoBaseInfoExtractor):
'params': {
'skip_download': True,
},
}
},
{
# user playlist alias -> https://vimeo.com/258705797
'url': 'https://vimeo.com/user26785108/newspiritualguide',
'only_matching': True,
},
# https://gettingthingsdone.com/workflowmap/
# vimeo embed with check-password page protected by Referer header
]
@ -663,7 +674,7 @@ class VimeoIE(VimeoBaseInfoExtractor):

if '//player.vimeo.com/video/' in url:
config = self._parse_json(self._search_regex(
r'\b(?:playerC|c)onfig\s*=\s*({.+?})\s*;', webpage, 'info section'), video_id)
r'(?s)\b(?:playerC|c)onfig\s*=\s*({.+?})\s*[;\n]', webpage, 'info section'), video_id)
if config.get('view') == 4:
config = self._verify_player_video_password(
redirect_url, video_id, headers)

@ -31,6 +31,7 @@ from ..utils import (
get_element_by_attribute,
int_or_none,
js_to_json,
merge_dicts,
mimetype2ext,
parse_codecs,
parse_duration,
@ -400,6 +401,62 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
break
data['continuation'] = token

@staticmethod
def _owner_endpoints_path():
return [
Ellipsis,
lambda k, _: k.endswith('SecondaryInfoRenderer'),
('owner', 'videoOwner'), 'videoOwnerRenderer', 'title',
'runs', Ellipsis]

def _extract_channel_id(self, webpage, videodetails={}, metadata={}, renderers=[]):
channel_id = None
if any((videodetails, metadata, renderers)):
channel_id = (
traverse_obj(videodetails, 'channelId')
or traverse_obj(metadata, 'externalChannelId', 'externalId')
or traverse_obj(renderers,
self._owner_endpoints_path() + [
'navigationEndpoint', 'browseEndpoint', 'browseId'],
get_all=False)
)
return channel_id or self._html_search_meta(
'channelId', webpage, 'channel id', default=None)

def _extract_author_var(self, webpage, var_name,
videodetails={}, metadata={}, renderers=[]):
result = None
paths = {
# (HTML, videodetails, metadata, renderers)
'name': ('content', 'author', (('ownerChannelName', None), 'title'), ['text']),
'url': ('href', 'ownerProfileUrl', 'vanityChannelUrl',
['navigationEndpoint', 'browseEndpoint', 'canonicalBaseUrl'])
}
if any((videodetails, metadata, renderers)):
result = (
traverse_obj(videodetails, paths[var_name][1], get_all=False)
or traverse_obj(metadata, paths[var_name][2], get_all=False)
or traverse_obj(renderers,
self._owner_endpoints_path() + paths[var_name][3],
get_all=False)
)
return result or traverse_obj(
extract_attributes(self._search_regex(
r'''(?s)(<link\b[^>]+\bitemprop\s*=\s*("|')%s\2[^>]*>)'''
% re.escape(var_name),
get_element_by_attribute('itemprop', 'author', webpage) or '',
'author link', default='')),
paths[var_name][0])

@staticmethod
def _yt_urljoin(url_or_path):
return urljoin('https://www.youtube.com', url_or_path)

def _extract_uploader_id(self, uploader_url):
return self._search_regex(
r'/(?:(?:channel|user)/|(?=@))([^/?&#]+)', uploader_url or '',
'uploader id', default=None)


class YoutubeIE(YoutubeBaseInfoExtractor):
IE_DESC = 'YouTube.com'
|
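The `_extract_uploader_id` helper added in the hunk above relies on a single regex that accepts `/channel/<id>`, `/user/<name>` and `/@handle` URLs, keeping the `@` as part of the captured ID. A standalone sketch of that behaviour follows. The pattern is copied from the hunk, while the plain `re.search` wrapper is an assumption standing in for `self._search_regex(..., default=None)`.

```python
import re

# Pattern copied from _extract_uploader_id in the hunk above.
UPLOADER_ID_RE = r'/(?:(?:channel|user)/|(?=@))([^/?&#]+)'


def extract_uploader_id(uploader_url):
    """Hypothetical stand-in for the method, using re.search directly."""
    m = re.search(UPLOADER_ID_RE, uploader_url or '')
    return m.group(1) if m else None


# The lookahead (?=@) consumes nothing, so the handle keeps its leading '@'.
print(extract_uploader_id('https://www.youtube.com/@PhilippHagemeister'))
# Legacy /user/ and /channel/ URLs yield the bare name or channel ID.
print(extract_uploader_id('http://www.youtube.com/user/phihag'))
print(extract_uploader_id('https://www.youtube.com/channel/UCLqxVugv74EIW3VWh2NOa3Q'))
```

This matches the updated test expectations later in the diff, where `uploader_id` values change from names such as `phihag` to handles such as `@PhilippHagemeister`.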
@ -516,8 +573,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'ext': 'mp4',
'title': 'youtube-dl test video "\'/\\ä↭𝕐',
'uploader': 'Philipp Hagemeister',
'uploader_id': 'phihag',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag',
'uploader_id': '@PhilippHagemeister',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@PhilippHagemeister',
'channel': 'Philipp Hagemeister',
'channel_id': 'UCLqxVugv74EIW3VWh2NOa3Q',
'channel_url': r're:https?://(?:www\.)?youtube\.com/channel/UCLqxVugv74EIW3VWh2NOa3Q',
@ -557,8 +614,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'ext': 'mp4',
'title': 'youtube-dl test video "\'/\\ä↭𝕐',
'uploader': 'Philipp Hagemeister',
'uploader_id': 'phihag',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag',
'uploader_id': '@PhilippHagemeister',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@PhilippHagemeister',
'upload_date': '20121002',
'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
'categories': ['Science & Technology'],
@ -588,7 +645,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'youtube_include_dash_manifest': True,
'format': '141',
},
'skip': 'format 141 not served anymore',
'skip': 'format 141 not served any more',
},
# DASH manifest with encrypted signature
{
@ -600,7 +657,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': 'md5:8f5e2b82460520b619ccac1f509d43bf',
'duration': 244,
'uploader': 'AfrojackVEVO',
'uploader_id': 'AfrojackVEVO',
'uploader_id': '@AfrojackVEVO',
'upload_date': '20131011',
'abr': 129.495,
},
@ -618,8 +675,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'duration': 219,
'upload_date': '20100909',
'uploader': 'Amazing Atheist',
'uploader_id': 'TheAmazingAtheist',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist',
'uploader_id': '@theamazingatheist',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@theamazingatheist',
'title': 'Burning Everyone\'s Koran',
'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms \r\n\r\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html',
}
@ -635,8 +692,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': r're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}',
'duration': 142,
'uploader': 'The Witcher',
'uploader_id': 'WitcherGame',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/WitcherGame',
'uploader_id': '@thewitcher',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@thewitcher',
'upload_date': '20140605',
'thumbnail': 'https://i.ytimg.com/vi/HtVdAasjOgU/maxresdefault.jpg',
'age_limit': 18,
@ -659,7 +716,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': 'md5:bf77e03fcae5529475e500129b05668a',
'duration': 177,
'uploader': 'FlyingKitty',
'uploader_id': 'FlyingKitty900',
'uploader_id': '@FlyingKitty900',
'upload_date': '20200408',
'thumbnail': 'https://i.ytimg.com/vi/HsUATh_Nc2U/maxresdefault.jpg',
'age_limit': 18,
@ -682,7 +739,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': 'md5:17eccca93a786d51bc67646756894066',
'duration': 106,
'uploader': 'Projekt Melody',
'uploader_id': 'UC1yoRdFoFJaCY-AGfD9W0wQ',
'uploader_id': '@ProjektMelody',
'upload_date': '20191227',
'age_limit': 18,
'thumbnail': 'https://i.ytimg.com/vi/Tq92D6wQ1mg/sddefault.jpg',
@ -704,10 +761,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'title': 'OOMPH! - Such Mich Find Mich (Lyrics)',
'description': 'Fan Video. Music & Lyrics by OOMPH!.',
'duration': 210,
'uploader': 'Herr Lurik',
'uploader_id': 'st3in234',
'upload_date': '20130730',
'uploader_url': 'http://www.youtube.com/user/st3in234',
'uploader': 'Herr Lurik',
'uploader_id': '@HerrLurik',
'uploader_url': 'http://www.youtube.com/@HerrLurik',
'age_limit': 0,
'thumbnail': 'https://i.ytimg.com/vi/MeJVWBSsPAY/hqdefault.jpg',
'tags': ['oomph', 'such mich find mich', 'lyrics', 'german industrial', 'musica industrial'],
@ -740,8 +797,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'ext': 'mp4',
'duration': 266,
'upload_date': '20100430',
'uploader_id': 'deadmau5',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/deadmau5',
'uploader_id': '@deadmau5',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@deadmau5',
'creator': 'deadmau5',
'description': 'md5:6cbcd3a92ce1bc676fc4d6ab4ace2336',
'uploader': 'deadmau5',
@ -762,8 +819,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'description': r're:(?s)(?:.+\s)?HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games\s*',
|
||||
'duration': 6085,
|
||||
'upload_date': '20150827',
|
||||
'uploader_id': 'olympic',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/olympic',
|
||||
'uploader_id': '@Olympics',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@Olympics',
|
||||
'uploader': r're:Olympics?',
|
||||
'age_limit': 0,
|
||||
'thumbnail': 'https://i.ytimg.com/vi/lqQg6PlCWgI/maxresdefault.jpg',
|
||||
|
@ -785,8 +842,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'stretched_ratio': 16 / 9.,
|
||||
'duration': 85,
|
||||
'upload_date': '20110310',
|
||||
'uploader_id': 'AllenMeow',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/AllenMeow',
|
||||
'uploader_id': '@AllenMeow',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@AllenMeow',
|
||||
'description': 'made by Wacom from Korea | 字幕&加油添醋 by TY\'s Allen | 感謝heylisa00cavey1001同學熱情提供梗及翻譯',
|
||||
'uploader': '孫ᄋᄅ',
|
||||
'title': '[A-made] 變態妍字幕版 太妍 我就是這樣的人',
|
||||
|
@ -824,7 +881,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'uploader': 'dorappi2000',
|
||||
'formats': 'mincount:31',
|
||||
},
|
||||
'skip': 'not actual anymore',
|
||||
'skip': 'not actual any more',
|
||||
},
|
||||
# DASH manifest with segment_list
|
||||
{
|
||||
|
@ -905,6 +962,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
'skip': 'Not multifeed any more',
|
||||
},
|
||||
{
|
||||
# Multifeed video with comma in title (see https://github.com/ytdl-org/youtube-dl/issues/8536)
|
||||
|
@ -914,7 +972,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'title': 'DevConf.cz 2016 Day 2 Workshops 1 14:00 - 15:30',
|
||||
},
|
||||
'playlist_count': 2,
|
||||
'skip': 'Not multifeed anymore',
|
||||
'skip': 'Not multifeed any more',
|
||||
},
|
||||
{
|
||||
'url': 'https://vid.plus/FlRa-iH7PGw',
|
||||
|
@ -938,8 +996,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a',
|
||||
'duration': 133,
|
||||
'upload_date': '20151119',
|
||||
'uploader_id': 'IronSoulElf',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IronSoulElf',
|
||||
'uploader_id': '@IronSoulElf',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@IronSoulElf',
|
||||
'uploader': 'IronSoulElf',
|
||||
'creator': r're:Todd Haberman[;,]\s+Daniel Law Heath and Aaron Kaplan',
|
||||
'track': 'Dark Walk',
|
||||
|
@ -987,8 +1045,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'description': 'md5:a677553cf0840649b731a3024aeff4cc',
|
||||
'duration': 721,
|
||||
'upload_date': '20150127',
|
||||
'uploader_id': 'BerkmanCenter',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/BerkmanCenter',
|
||||
'uploader_id': '@BKCHarvard',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@BKCHarvard',
|
||||
'uploader': 'The Berkman Klein Center for Internet & Society',
|
||||
'license': 'Creative Commons Attribution license (reuse allowed)',
|
||||
},
|
||||
|
@ -1007,8 +1065,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'duration': 4060,
|
||||
'upload_date': '20151119',
|
||||
'uploader': 'Bernie Sanders',
|
||||
'uploader_id': 'UCH1dpzjCEiGAt8CXkryhkZg',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCH1dpzjCEiGAt8CXkryhkZg',
|
||||
'uploader_id': '@BernieSanders',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@BernieSanders',
|
||||
'license': 'Creative Commons Attribution license (reuse allowed)',
|
||||
},
|
||||
'params': {
|
||||
|
@ -1054,8 +1112,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'duration': 2085,
|
||||
'upload_date': '20170118',
|
||||
'uploader': 'Vsauce',
|
||||
'uploader_id': 'Vsauce',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Vsauce',
|
||||
'uploader_id': '@Vsauce',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@Vsauce',
|
||||
'series': 'Mind Field',
|
||||
'season_number': 1,
|
||||
'episode_number': 1,
|
||||
|
@ -1134,7 +1192,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'skip_download': True,
|
||||
'youtube_include_dash_manifest': False,
|
||||
},
|
||||
'skip': 'not actual anymore',
|
||||
'skip': 'not actual any more',
|
||||
},
|
||||
{
|
||||
# Youtube Music Auto-generated description
|
||||
|
@ -1191,8 +1249,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'title': 'IMG 3456',
|
||||
'description': '',
|
||||
'upload_date': '20170613',
|
||||
'uploader_id': 'ElevageOrVert',
|
||||
'uploader': 'ElevageOrVert',
|
||||
'uploader_id': '@ElevageOrVert',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
|
@ -1210,8 +1268,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'title': 'Part 77 Sort a list of simple types in c#',
|
||||
'description': 'md5:b8746fa52e10cdbf47997903f13b20dc',
|
||||
'upload_date': '20130831',
|
||||
'uploader_id': 'kudvenkat',
|
||||
'uploader': 'kudvenkat',
|
||||
'uploader_id': '@Csharp-video-tutorialsBlogspot',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
|
@ -1263,8 +1321,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'description': 'md5:ea770e474b7cd6722b4c95b833c03630',
|
||||
'upload_date': '20201120',
|
||||
'uploader': 'Walk around Japan',
|
||||
'uploader_id': 'UC3o_t8PzBmXf5S9b7GLx1Mw',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UC3o_t8PzBmXf5S9b7GLx1Mw',
|
||||
'uploader_id': '@walkaroundjapan7124',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@walkaroundjapan7124',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
|
@ -1276,11 +1334,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'info_dict': {
|
||||
'id': '4L2J27mJ3Dc',
|
||||
'ext': 'mp4',
|
||||
'title': 'Midwest Squid Game #Shorts',
|
||||
'description': 'md5:976512b8a29269b93bbd8a61edc45a6d',
|
||||
'upload_date': '20211025',
|
||||
'uploader': 'Charlie Berens',
|
||||
'description': 'md5:976512b8a29269b93bbd8a61edc45a6d',
|
||||
'uploader_id': 'fivedlrmilkshake',
|
||||
'title': 'Midwest Squid Game #Shorts',
|
||||
'uploader_id': '@CharlieBerens',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
|
@ -1636,8 +1694,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
if n_response is None:
|
||||
# give up if descrambling failed
|
||||
break
|
||||
fmt['url'] = update_url(
|
||||
parsed_fmt_url, query_update={'n': [n_response]})
|
||||
for fmt_dct in traverse_obj(fmt, (None, (None, ('fragments', Ellipsis))), expected_type=dict):
|
||||
fmt_dct['url'] = update_url(
|
||||
fmt_dct['url'], query_update={'n': [n_response]})
|
||||
|
||||
# from yt-dlp, with tweaks
|
||||
def _extract_signature_timestamp(self, video_id, player_url, ytcfg=None, fatal=False):
|
||||
|
@ -1989,10 +2048,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
if no_video:
|
||||
dct['abr'] = tbr
|
||||
if no_audio or no_video:
|
||||
dct['downloader_options'] = {
|
||||
# Youtube throttles chunks >~10M
|
||||
'http_chunk_size': 10485760,
|
||||
}
|
||||
CHUNK_SIZE = 10 << 20
|
||||
# avoid Youtube throttling
|
||||
dct.update({
|
||||
'protocol': 'http_dash_segments',
|
||||
'fragments': [{
|
||||
'url': update_url_query(dct['url'], {
|
||||
'range': '{0}-{1}'.format(range_start, min(range_start + CHUNK_SIZE - 1, dct['filesize']))
|
||||
})
|
||||
} for range_start in range(0, dct['filesize'], CHUNK_SIZE)]
|
||||
} if dct['filesize'] else {
|
||||
'downloader_options': {'http_chunk_size': CHUNK_SIZE} # No longer useful?
|
||||
})
|
||||
|
||||
if dct.get('ext'):
|
||||
dct['container'] = dct['ext'] + '_dash'
|
||||
formats.append(dct)
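The hunk above splits a format of known size into fixed-size byte ranges instead of relying on `http_chunk_size`. The range arithmetic can be sketched in isolation as below. `chunk_ranges` is a hypothetical name; the upper bound copies the expression from the diff, so the final range ends at `filesize` rather than `filesize - 1`.

```python
def chunk_ranges(filesize, chunk_size=10 << 20):
    """Yield 'start-end' strings covering `filesize` bytes in chunk-sized steps."""
    for range_start in range(0, filesize, chunk_size):
        # the upper bound mirrors the expression used in the hunk above
        range_end = min(range_start + chunk_size - 1, filesize)
        yield '{0}-{1}'.format(range_start, range_end)


# small numbers make the pattern easy to see
print(list(chunk_ranges(25, 10)))
```

Note that `10 << 20` is the same value as the literal `10485760` it replaces.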
|
||||
|
@ -2088,25 +2156,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
thumbnails = [{'url': thumbnail}]
|
||||
|
||||
category = microformat.get('category') or search_meta('genre')
|
||||
channel_id = video_details.get('channelId') \
|
||||
or microformat.get('externalChannelId') \
|
||||
or search_meta('channelId')
|
||||
channel_id = self._extract_channel_id(
|
||||
webpage, videodetails=video_details, metadata=microformat)
|
||||
duration = int_or_none(
|
||||
video_details.get('lengthSeconds')
|
||||
or microformat.get('lengthSeconds')) \
|
||||
or parse_duration(search_meta('duration'))
|
||||
is_live = video_details.get('isLive')
|
||||
|
||||
def gen_owner_profile_url():
|
||||
yield microformat.get('ownerProfileUrl')
|
||||
yield extract_attributes(self._search_regex(
|
||||
r'''(?s)(<link\b[^>]+\bitemprop\s*=\s*("|')url\2[^>]*>)''',
|
||||
get_element_by_attribute('itemprop', 'author', webpage),
|
||||
'owner_profile_url', default='')).get('href')
|
||||
owner_profile_url = self._yt_urljoin(self._extract_author_var(
|
||||
webpage, 'url', videodetails=video_details, metadata=microformat))
|
||||
|
||||
owner_profile_url = next(
|
||||
(x for x in map(url_or_none, gen_owner_profile_url()) if x),
|
||||
None)
|
||||
uploader = self._extract_author_var(
|
||||
webpage, 'name', videodetails=video_details, metadata=microformat)
|
||||
|
||||
if not player_url:
|
||||
player_url = self._extract_player_url(webpage)
|
||||
|
@ -2121,11 +2183,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
'upload_date': unified_strdate(
|
||||
microformat.get('uploadDate')
|
||||
or search_meta('uploadDate')),
|
||||
'uploader': video_details['author'],
|
||||
'uploader_id': self._search_regex(r'/(?:channel|user)/([^/?&#]+)', owner_profile_url, 'uploader id') if owner_profile_url else None,
|
||||
'uploader_url': owner_profile_url,
|
||||
'uploader': uploader,
|
||||
'channel_id': channel_id,
|
||||
'channel_url': 'https://www.youtube.com/channel/' + channel_id if channel_id else None,
|
||||
'duration': duration,
|
||||
'view_count': int_or_none(
|
||||
video_details.get('viewCount')
|
||||
|
@ -2255,6 +2314,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
initial_data,
|
||||
lambda x: x['contents']['twoColumnWatchNextResults']['results']['results']['contents'],
|
||||
list) or []
|
||||
if not info['channel_id']:
|
||||
channel_id = self._extract_channel_id('', renderers=contents)
|
||||
if not info['uploader']:
|
||||
info['uploader'] = self._extract_author_var('', 'name', renderers=contents)
|
||||
if not owner_profile_url:
|
||||
owner_profile_url = self._yt_urljoin(self._extract_author_var('', 'url', renderers=contents))
|
||||
|
||||
for content in contents:
|
||||
vpir = content.get('videoPrimaryInfoRenderer')
|
||||
if vpir:
|
||||
|
@ -2302,10 +2368,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
})
|
||||
vsir = content.get('videoSecondaryInfoRenderer')
|
||||
if vsir:
|
||||
info['channel'] = get_text(try_get(
|
||||
vsir,
|
||||
lambda x: x['owner']['videoOwnerRenderer']['title'],
|
||||
dict))
|
||||
rows = try_get(
|
||||
vsir,
|
||||
lambda x: x['metadataRowContainer']['metadataRowContainerRenderer']['rows'],
|
||||
|
@ -2363,7 +2425,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||
|
||||
self.mark_watched(video_id, player_response)
|
||||
|
||||
return info
|
||||
return merge_dicts(
|
||||
info, {
|
||||
'uploader_id': self._extract_uploader_id(owner_profile_url),
|
||||
'uploader_url': owner_profile_url,
|
||||
'channel_id': channel_id,
|
||||
'channel_url': channel_id and self._yt_urljoin('/channel/' + channel_id),
|
||||
'channel': info['uploader'],
|
||||
})
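The `return merge_dicts(info, {...})` form relies on the merge giving precedence to values already present in `info`. A simplified stand-in is sketched below, assuming that earlier dictionaries win, that `None` values are skipped, and that a non-empty string may replace an earlier empty string. `merge_dicts_sketch` is a hypothetical name and is not the function from `utils`.

```python
def merge_dicts_sketch(*dicts):
    """Merge dicts so that earlier dicts win and None values never overwrite."""
    merged = {}
    for a_dict in dicts:
        for key, value in a_dict.items():
            if value is None:
                # a missing value never overwrites anything
                continue
            if key not in merged:
                merged[key] = value
            elif (isinstance(value, str) and value
                    and isinstance(merged[key], str) and not merged[key]):
                # a non-empty string may replace an earlier empty string
                merged[key] = value
    return merged


info = {'uploader': 'Herr Lurik', 'channel_id': None}
extra = {'uploader': 'someone else', 'channel_id': 'UC123', 'uploader_id': None}
print(merge_dicts_sketch(info, extra))
```

Under these assumptions, the late-extracted `channel_id` only fills the field when `info` did not already carry one.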
|
||||
|
||||
|
||||
class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
||||
|
@ -2392,6 +2461,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'description': 'Short clips from Super Cooper Sundays!',
|
||||
'id': 'UCKMA8kHZ8bPYpnMNaUSxfEQ',
|
||||
'title': 'Super Cooper Shorts - Shorts',
|
||||
'uploader': 'Super Cooper Shorts',
|
||||
'uploader_id': '@SuperCooperShorts',
|
||||
}
|
||||
}, {
|
||||
# Channel that does not have a Shorts tab. Test should just download videos on Home tab instead
|
||||
|
@ -2402,14 +2473,17 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'title': 'Emergency Awesome - Home',
|
||||
},
|
||||
'playlist_mincount': 5,
|
||||
'skip': 'new test page needed to replace `Emergency Awesome - Shorts`',
|
||||
}, {
|
||||
# playlists, multipage
|
||||
'url': 'https://www.youtube.com/c/ИгорьКлейнер/playlists?view=1&flow=grid',
|
||||
'playlist_mincount': 94,
|
||||
'info_dict': {
|
||||
'id': 'UCqj7Cz7revf5maW9g5pgNcg',
|
||||
'title': 'Игорь Клейнер - Playlists',
|
||||
'title': 'Igor Kleiner - Playlists',
|
||||
'description': 'md5:be97ee0f14ee314f1f002cf187166ee2',
|
||||
'uploader': 'Igor Kleiner',
|
||||
'uploader_id': '@IgorDataScience',
|
||||
},
|
||||
}, {
|
||||
# playlists, multipage, different order
|
||||
|
@ -2417,8 +2491,10 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'playlist_mincount': 94,
|
||||
'info_dict': {
|
||||
'id': 'UCqj7Cz7revf5maW9g5pgNcg',
|
||||
'title': 'Игорь Клейнер - Playlists',
|
||||
'title': 'Igor Kleiner - Playlists',
|
||||
'description': 'md5:be97ee0f14ee314f1f002cf187166ee2',
|
||||
'uploader': 'Igor Kleiner',
|
||||
'uploader_id': '@IgorDataScience',
|
||||
},
|
||||
}, {
|
||||
# playlists, series
|
||||
|
@ -2428,6 +2504,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'id': 'UCYO_jab_esuFRV4b17AJtAw',
|
||||
'title': '3Blue1Brown - Playlists',
|
||||
'description': 'md5:e1384e8a133307dd10edee76e875d62f',
|
||||
'uploader': '3Blue1Brown',
|
||||
'uploader_id': '@3blue1brown',
|
||||
},
|
||||
}, {
|
||||
# playlists, singlepage
|
||||
|
@ -2437,6 +2515,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'id': 'UCAEtajcuhQ6an9WEzY9LEMQ',
|
||||
'title': 'ThirstForScience - Playlists',
|
||||
'description': 'md5:609399d937ea957b0f53cbffb747a14c',
|
||||
'uploader': 'ThirstForScience',
|
||||
'uploader_id': '@ThirstForScience',
|
||||
}
|
||||
}, {
|
||||
'url': 'https://www.youtube.com/c/ChristophLaimer/playlists',
|
||||
|
@ -2445,20 +2525,22 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
# basic, single video playlist
|
||||
'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc',
|
||||
'info_dict': {
|
||||
'uploader_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
|
||||
'uploader': 'Sergey M.',
|
||||
'id': 'PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc',
|
||||
'title': 'youtube-dl public playlist',
|
||||
'uploader': 'Sergey M.',
|
||||
'uploader_id': '@sergeym.6173',
|
||||
'channel_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
|
||||
},
|
||||
'playlist_count': 1,
|
||||
}, {
|
||||
# empty playlist
|
||||
'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf',
|
||||
'info_dict': {
|
||||
'uploader_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
|
||||
'uploader': 'Sergey M.',
|
||||
'id': 'PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf',
|
||||
'title': 'youtube-dl empty playlist',
|
||||
'uploader': 'Sergey M.',
|
||||
'uploader_id': '@sergeym.6173',
|
||||
'channel_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
|
||||
},
|
||||
'playlist_count': 0,
|
||||
}, {
|
||||
|
@ -2468,6 +2550,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
|
||||
'title': 'lex will - Home',
|
||||
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
|
||||
'uploader': 'lex will',
|
||||
'uploader_id': '@lexwill718',
|
||||
},
|
||||
'playlist_mincount': 2,
|
||||
}, {
|
||||
|
@ -2477,6 +2561,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
|
||||
'title': 'lex will - Videos',
|
||||
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
|
||||
'uploader': 'lex will',
|
||||
'uploader_id': '@lexwill718',
|
||||
},
|
||||
'playlist_mincount': 975,
|
||||
}, {
|
||||
|
@ -2486,6 +2572,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
|
||||
'title': 'lex will - Videos',
|
||||
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
|
||||
'uploader': 'lex will',
|
||||
'uploader_id': '@lexwill718',
|
||||
},
|
||||
'playlist_mincount': 199,
|
||||
}, {
|
||||
|
@ -2495,6 +2583,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
|
||||
'title': 'lex will - Playlists',
|
||||
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
|
||||
'uploader': 'lex will',
|
||||
'uploader_id': '@lexwill718',
|
||||
},
|
||||
'playlist_mincount': 17,
|
||||
}, {
|
||||
|
@ -2504,6 +2594,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
|
||||
'title': 'lex will - Community',
|
||||
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
|
||||
'uploader': 'lex will',
|
||||
'uploader_id': '@lexwill718',
|
||||
},
|
||||
'playlist_mincount': 18,
|
||||
}, {
|
||||
|
@ -2513,8 +2605,10 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
|
||||
'title': 'lex will - Channels',
|
||||
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
|
||||
'uploader': 'lex will',
|
||||
'uploader_id': '@lexwill718',
|
||||
},
|
||||
'playlist_mincount': 138,
|
||||
'playlist_mincount': 75,
|
||||
}, {
|
||||
'url': 'https://invidio.us/channel/UCmlqkdCBesrv2Lak1mF_MxA',
|
||||
'only_matching': True,
|
||||
|
@ -2531,7 +2625,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'title': '29C3: Not my department',
|
||||
'id': 'PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
|
||||
'uploader': 'Christiaan008',
|
||||
'uploader_id': 'UCEPzS1rYsrkqzSLNp76nrcg',
|
||||
'uploader_id': '@ChRiStIaAn008',
|
||||
'channel_id': 'UCEPzS1rYsrkqzSLNp76nrcg',
|
||||
},
|
||||
'playlist_count': 96,
|
||||
}, {
|
||||
|
@ -2541,7 +2636,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'title': 'Uploads from Cauchemar',
|
||||
'id': 'UUBABnxM4Ar9ten8Mdjj1j0Q',
|
||||
'uploader': 'Cauchemar',
|
||||
'uploader_id': 'UCBABnxM4Ar9ten8Mdjj1j0Q',
|
||||
'uploader_id': '@Cauchemar89',
|
||||
'channel_id': 'UCBABnxM4Ar9ten8Mdjj1j0Q',
|
||||
},
|
||||
'playlist_mincount': 1123,
|
||||
}, {
|
||||
|
@ -2555,7 +2651,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'title': 'Uploads from Interstellar Movie',
|
||||
'id': 'UUXw-G3eDE9trcvY2sBMM_aA',
|
||||
'uploader': 'Interstellar Movie',
|
||||
'uploader_id': 'UCXw-G3eDE9trcvY2sBMM_aA',
|
||||
'uploader_id': '@InterstellarMovie',
|
||||
'channel_id': 'UCXw-G3eDE9trcvY2sBMM_aA',
|
||||
},
|
||||
'playlist_mincount': 21,
|
||||
}, {
|
||||
|
@ -2564,8 +2661,9 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
'info_dict': {
|
||||
'title': 'Data Analysis with Dr Mike Pound',
|
||||
'id': 'PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba',
|
||||
'uploader_id': 'UC9-y-6csu5WGm29I7JiwpnA',
|
||||
'uploader': 'Computerphile',
|
||||
'uploader_id': '@Computerphile',
|
||||
'channel_id': 'UC9-y-6csu5WGm29I7JiwpnA',
|
||||
},
|
||||
'playlist_mincount': 11,
|
||||
}, {
|
||||
|
@ -2603,14 +2701,14 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
}, {
|
||||
'url': 'https://www.youtube.com/channel/UCoMdktPbSTixAyNGwb-UYkQ/live',
|
||||
'info_dict': {
|
||||
'id': '9Auq9mYxFEE',
|
||||
'id': r're:[\da-zA-Z_-]{8,}',
|
||||
'ext': 'mp4',
|
||||
'title': 'Watch Sky News live',
|
||||
'title': r're:(?s)[A-Z].{20,}',
|
||||
'uploader': 'Sky News',
|
||||
'uploader_id': 'skynews',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/skynews',
|
||||
'upload_date': '20191102',
|
||||
'description': 'md5:78de4e1c2359d0ea3ed829678e38b662',
|
||||
'uploader_id': '@SkyNews',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@SkyNews',
|
||||
'upload_date': r're:\d{8}',
|
||||
'description': r're:(?s)(?:.*\n)+SUBSCRIBE to our YouTube channel for more videos: http://www\.youtube\.com/skynews *\n.*',
|
||||
'categories': ['News & Politics'],
|
||||
'tags': list,
|
||||
'like_count': int,
|
||||
|
@ -2699,34 +2797,22 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
}, {
|
||||
'note': 'Search tab',
|
||||
'url': 'https://www.youtube.com/c/3blue1brown/search?query=linear%20algebra',
|
||||
'playlist_mincount': 40,
|
||||
'playlist_mincount': 20,
|
||||
'info_dict': {
|
||||
'id': 'UCYO_jab_esuFRV4b17AJtAw',
|
||||
'title': '3Blue1Brown - Search - linear algebra',
|
||||
'description': 'md5:e1384e8a133307dd10edee76e875d62f',
|
||||
'uploader': '3Blue1Brown',
|
||||
'uploader_id': 'UCYO_jab_esuFRV4b17AJtAw',
|
||||
'uploader_id': '@3blue1brown',
|
||||
'channel_id': 'UCYO_jab_esuFRV4b17AJtAw',
|
||||
}
|
||||
}]
|
||||
|
||||
@classmethod
|
||||
def suitable(cls, url):
|
||||
return False if YoutubeIE.suitable(url) else super(
|
||||
return not YoutubeIE.suitable(url) and super(
|
||||
YoutubeTabIE, cls).suitable(url)
|
||||
|
||||
def _extract_channel_id(self, webpage):
|
||||
channel_id = self._html_search_meta(
|
||||
'channelId', webpage, 'channel id', default=None)
|
||||
if channel_id:
|
||||
return channel_id
|
||||
channel_url = self._html_search_meta(
|
||||
('og:url', 'al:ios:url', 'al:android:url', 'al:web:url',
|
||||
'twitter:url', 'twitter:app:url:iphone', 'twitter:app:url:ipad',
|
||||
'twitter:app:url:googleplay'), webpage, 'channel url')
|
||||
return self._search_regex(
|
||||
r'https?://(?:www\.)?youtube\.com/channel/([^/?#&])+',
|
||||
channel_url, 'channel id')
|
||||
|
||||
@staticmethod
|
||||
def _extract_grid_item_renderer(item):
|
||||
assert isinstance(item, dict)
|
||||
|
@ -3114,27 +3200,18 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
else:
|
||||
raise ExtractorError('Unable to find selected tab')
|
||||
|
||||
@staticmethod
|
||||
def _extract_uploader(data):
|
||||
def _extract_uploader(self, metadata, data):
|
||||
uploader = {}
|
||||
sidebar_renderer = try_get(
|
||||
data, lambda x: x['sidebar']['playlistSidebarRenderer']['items'], list)
|
||||
if sidebar_renderer:
|
||||
for item in sidebar_renderer:
|
||||
if not isinstance(item, dict):
|
||||
continue
|
||||
renderer = item.get('playlistSidebarSecondaryInfoRenderer')
|
||||
if not isinstance(renderer, dict):
|
||||
continue
|
||||
owner = try_get(
|
||||
renderer, lambda x: x['videoOwner']['videoOwnerRenderer']['title']['runs'][0], dict)
|
||||
if owner:
|
||||
uploader['uploader'] = owner.get('text')
|
||||
uploader['uploader_id'] = try_get(
|
||||
owner, lambda x: x['navigationEndpoint']['browseEndpoint']['browseId'], compat_str)
|
||||
uploader['uploader_url'] = urljoin(
|
||||
'https://www.youtube.com/',
|
||||
try_get(owner, lambda x: x['navigationEndpoint']['browseEndpoint']['canonicalBaseUrl'], compat_str))
|
||||
renderers = traverse_obj(data,
|
||||
('sidebar', 'playlistSidebarRenderer', 'items'))
|
||||
uploader['channel_id'] = self._extract_channel_id('', metadata=metadata, renderers=renderers)
|
||||
uploader['uploader'] = (
|
||||
self._extract_author_var('', 'name', renderers=renderers)
|
||||
or self._extract_author_var('', 'name', metadata=metadata))
|
||||
uploader['uploader_url'] = self._yt_urljoin(
|
||||
self._extract_author_var('', 'url', metadata=metadata, renderers=renderers))
|
||||
uploader['uploader_id'] = self._extract_uploader_id(uploader['uploader_url'])
|
||||
uploader['channel'] = uploader['uploader']
|
||||
return uploader
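The body of `_extract_uploader_id` is not shown in this diff. Judging only from the updated test expectations (for example `'@HerrLurik'` for `http://www.youtube.com/@HerrLurik`), it appears to derive the `@handle` from the profile URL. The sketch below is a guess at that behaviour; `extract_handle` and its regular expression are assumptions made for illustration.

```python
import re


def extract_handle(profile_url):
    """Return the '@handle' path component of a profile URL, or None."""
    if not profile_url:
        return None
    m = re.match(r'https?://(?:www\.)?youtube\.com/(@[^/?#&]+)', profile_url)
    return m.group(1) if m else None


print(extract_handle('http://www.youtube.com/@HerrLurik'))
```

A `/channel/UC...` URL has no handle component, so this sketch returns `None` for it.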
|
||||
|
||||
@staticmethod
|
||||
|
@ -3185,8 +3262,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||
self._entries(selected_tab, item_id, webpage),
|
||||
playlist_id=playlist_id, playlist_title=title,
|
||||
playlist_description=description)
|
||||
playlist.update(self._extract_uploader(data))
|
||||
return playlist
|
||||
return merge_dicts(playlist, self._extract_uploader(renderer, data))
|
||||
|
||||
def _extract_from_playlist(self, item_id, url, data, playlist):
|
||||
title = playlist.get('title') or try_get(
|
||||
|
@ -3273,8 +3349,9 @@ class YoutubePlaylistIE(InfoExtractor):
|
|||
'info_dict': {
|
||||
'title': '[OLD]Team Fortress 2 (Class-based LP)',
|
||||
'id': 'PLBB231211A4F62143',
|
||||
'uploader': 'Wickydoo',
|
||||
'uploader_id': 'UCKSpbfbl5kRQpTdL7kMc-1Q',
|
||||
'uploader': 'Wickman',
|
||||
'uploader_id': '@WickmanVT',
|
||||
'channel_id': 'UCKSpbfbl5kRQpTdL7kMc-1Q',
|
||||
},
|
||||
'playlist_mincount': 29,
|
||||
}, {
|
||||
|
@ -3288,21 +3365,25 @@ class YoutubePlaylistIE(InfoExtractor):
|
|||
}, {
|
||||
'note': 'embedded',
|
||||
'url': 'https://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
|
||||
'playlist_count': 4,
|
||||
# TODO: full playlist requires _reload_with_unavailable_videos()
|
||||
# 'playlist_count': 4,
|
||||
'playlist_mincount': 1,
|
||||
'info_dict': {
|
||||
'title': 'JODA15',
|
||||
'id': 'PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
|
||||
'uploader': 'milan',
|
||||
'uploader_id': 'UCEI1-PVPcYXjB73Hfelbmaw',
|
||||
'uploader_id': '@milan5503',
|
||||
'channel_id': 'UCEI1-PVPcYXjB73Hfelbmaw',
|
||||
}
|
||||
}, {
|
||||
'url': 'http://www.youtube.com/embed/_xDOZElKyNU?list=PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
|
||||
'playlist_mincount': 982,
|
||||
'playlist_mincount': 455,
|
||||
'info_dict': {
|
||||
'title': '2018 Chinese New Singles (11/6 updated)',
|
||||
'id': 'PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
|
||||
'uploader': 'LBK',
|
||||
'uploader_id': 'UC21nz3_MesPLqtDqwdvnoxA',
|
||||
'uploader_id': '@music_king',
|
||||
'channel_id': 'UC21nz3_MesPLqtDqwdvnoxA',
|
||||
}
|
||||
}, {
|
||||
'url': 'TLGGrESM50VT6acwMjAyMjAxNw',
|
||||
|
@ -3340,8 +3421,8 @@ class YoutubeYtBeIE(InfoExtractor):
|
|||
'ext': 'mp4',
|
||||
'title': 'Small Scale Baler and Braiding Rugs',
|
||||
'uploader': 'Backus-Page House Museum',
|
||||
'uploader_id': 'backuspagemuseum',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum',
|
||||
'uploader_id': '@backuspagemuseum',
|
||||
'uploader_url': r're:https?://(?:www\.)?youtube\.com/@backuspagemuseum',
|
||||
'upload_date': '20161008',
|
||||
'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
|
||||
'categories': ['Nonprofits & Activism'],
|
||||
|
|
|
@ -12,9 +12,11 @@ from .utils import (
|
|||
js_to_json,
|
||||
remove_quotes,
|
||||
unified_timestamp,
|
||||
variadic,
|
||||
)
|
||||
from .compat import (
|
||||
compat_basestring,
|
||||
compat_chr,
|
||||
compat_collections_chain_map as ChainMap,
|
||||
compat_itertools_zip_longest as zip_longest,
|
||||
compat_str,
|
||||
|
@ -201,14 +203,14 @@ class JSInterpreter(object):
|
|||
def __init__(self, msg, *args, **kwargs):
|
||||
expr = kwargs.pop('expr', None)
|
||||
if expr is not None:
|
||||
msg = '{0} in: {1!r}'.format(msg.rstrip(), expr[:100])
|
||||
msg = '{0} in: {1!r:.100}'.format(msg.rstrip(), expr)
|
||||
super(JSInterpreter.Exception, self).__init__(msg, *args, **kwargs)
|
||||
|
||||
class JS_RegExp(object):
|
||||
_RE_FLAGS = {
|
||||
RE_FLAGS = {
|
||||
# special knowledge: Python's re flags are bitmask values, current max 128
|
||||
# invent new bitmask values well above that for literal parsing
|
||||
# TODO: new pattern class to execute matches with these flags
|
||||
# TODO: execute matches with these flags (remaining: d, y)
|
||||
'd': 1024, # Generate indices for substring matches
|
||||
'g': 2048, # Global search
|
||||
'i': re.I, # Case-insensitive search
|
||||
|
@ -218,12 +220,19 @@ class JSInterpreter(object):
|
|||
'y': 4096, # Perform a "sticky" search that matches starting at the current position in the target string
|
||||
}
|
||||
|
||||
def __init__(self, pattern_txt, flags=''):
|
||||
def __init__(self, pattern_txt, flags=0):
|
||||
if isinstance(flags, compat_str):
|
||||
flags, _ = self.regex_flags(flags)
|
||||
# Thx: https://stackoverflow.com/questions/44773522/setattr-on-python2-sre-sre-pattern
|
||||
# First, avoid https://github.com/python/cpython/issues/74534
|
||||
self.__self = re.compile(pattern_txt.replace('[[', r'[\['), flags)
|
||||
self.__self = None
|
||||
self.__pattern_txt = pattern_txt.replace('[[', r'[\[')
|
||||
self.__flags = flags
|
||||
|
||||
def __instantiate(self):
|
||||
if self.__self:
|
||||
return
|
||||
self.__self = re.compile(self.__pattern_txt, self.__flags)
|
||||
# Thx: https://stackoverflow.com/questions/44773522/setattr-on-python2-sre-sre-pattern
|
||||
for name in dir(self.__self):
|
||||
# Only these? Obviously __class__, __init__.
|
||||
# PyPy creates a __weakref__ attribute with value None
|
||||
|
@ -232,15 +241,21 @@ class JSInterpreter(object):
|
|||
continue
|
||||
setattr(self, name, getattr(self.__self, name))
|
||||
|
||||
def __getattr__(self, name):
|
||||
self.__instantiate()
|
||||
if hasattr(self, name):
|
||||
return getattr(self, name)
|
||||
return super(JSInterpreter.JS_RegExp, self).__getattr__(name)
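The change above defers `re.compile()` until a pattern attribute is first needed. The idea can be sketched with a small wrapper that delegates to the compiled pattern on demand. `LazyPattern` is a hypothetical class, simpler than `JS_RegExp`: it forwards attribute access instead of copying attributes onto itself.

```python
import re


class LazyPattern(object):
    """Compile the pattern only when one of its attributes is first used."""

    def __init__(self, pattern_txt, flags=0):
        self._pattern_txt = pattern_txt
        self._flags = flags
        self._compiled = None

    def _instantiate(self):
        if self._compiled is None:
            self._compiled = re.compile(self._pattern_txt, self._flags)
        return self._compiled

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name.startswith('_'):
            # never delegate private names, which avoids accidental recursion
            raise AttributeError(name)
        return getattr(self._instantiate(), name)


p = LazyPattern(r'a+', re.I)
print(p._compiled is None)      # nothing compiled yet
print(p.match('AAA').group(0))  # first use triggers compilation
```

Patterns that are parsed but never executed then cost nothing to construct.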
|
||||
|
||||
@classmethod
|
||||
def regex_flags(cls, expr):
|
||||
flags = 0
|
||||
if not expr:
|
||||
return flags, expr
|
||||
for idx, ch in enumerate(expr):
|
||||
if ch not in cls._RE_FLAGS:
|
||||
if ch not in cls.RE_FLAGS:
|
||||
break
|
||||
flags |= cls._RE_FLAGS[ch]
|
||||
flags |= cls.RE_FLAGS[ch]
|
||||
return flags, expr[idx + 1:]
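Parsing JavaScript flag letters into a bitmask can be sketched as below. Only the flag entries visible in this diff are listed. `parse_flags` and `FLAG_BITS` are hypothetical names, and returning the remainder from the first non-flag character is a choice made for this sketch, not a claim about `regex_flags`.

```python
import re

# Only the flags visible in the diff are listed; the real table has more entries.
FLAG_BITS = {
    'd': 1024,
    'g': 2048,
    'i': re.I,
    'y': 4096,
}


def parse_flags(expr):
    """Consume leading flag letters; return (bitmask, rest of expr)."""
    flags = 0
    consumed = 0
    for ch in expr:
        if ch not in FLAG_BITS:
            break
        flags |= FLAG_BITS[ch]
        consumed += 1
    return flags, expr[consumed:]


flags, rest = parse_flags('gi.test(x)')
print(int(flags), rest)
```

The invented values 1024, 2048 and 4096 sit above Python's own `re` flag bits, which is what lets them share one integer with `re.I`.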
|
||||
|
||||
@classmethod
|
||||
|
@ -262,20 +277,20 @@ class JSInterpreter(object):
|
|||
if not expr:
|
||||
return
|
||||
# collections.Counter() is ~10% slower in both 2.7 and 3.9
|
||||
counters = {k: 0 for k in _MATCHING_PARENS.values()}
|
||||
counters = dict((k, 0) for k in _MATCHING_PARENS.values())
|
||||
start, splits, pos, delim_len = 0, 0, 0, len(delim) - 1
|
||||
in_quote, escaping, skipping = None, False, 0
|
||||
after_op, in_regex_char_group, skip_re = True, False, 0
|
||||
after_op, in_regex_char_group = True, False
|
||||
|
||||
for idx, char in enumerate(expr):
|
||||
if skip_re > 0:
|
||||
skip_re -= 1
|
||||
continue
|
||||
paren_delta = 0
|
||||
if not in_quote:
|
||||
if char in _MATCHING_PARENS:
|
||||
counters[_MATCHING_PARENS[char]] += 1
|
||||
paren_delta = 1
|
||||
elif char in counters:
|
||||
counters[char] -= 1
|
||||
paren_delta = -1
|
||||
if not escaping:
|
||||
if char in _QUOTES and in_quote in (char, None):
|
||||
if in_quote or after_op or char != '/':
|
||||
|
@ -283,7 +298,7 @@ class JSInterpreter(object):
|
|||
elif in_quote == '/' and char in '[]':
|
||||
in_regex_char_group = char == '['
|
||||
escaping = not escaping and in_quote and char == '\\'
|
||||
after_op = not in_quote and (char in cls.OP_CHARS or (char.isspace() and after_op))
|
||||
after_op = not in_quote and (char in cls.OP_CHARS or paren_delta > 0 or (after_op and char.isspace()))
|
||||
|
||||
if char != delim[pos] or any(counters.values()) or in_quote:
|
||||
pos = skipping = 0
|
||||
|
@ -293,7 +308,7 @@ class JSInterpreter(object):
|
|||
continue
|
||||
elif pos == 0 and skip_delims:
|
||||
here = expr[idx:]
|
||||
for s in skip_delims if isinstance(skip_delims, (list, tuple)) else [skip_delims]:
|
||||
for s in variadic(skip_delims):
|
||||
if here.startswith(s) and s:
|
||||
skipping = len(s) - 1
|
||||
break
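The explicit `isinstance(skip_delims, (list, tuple))` test is replaced here by `variadic()`, imported from `.utils`. Its body is not part of this diff, so the stand-in below is an assumption about its behaviour: strings and other scalars get wrapped in a tuple, while real collections pass through unchanged. `variadic_sketch` is a hypothetical name.

```python
def variadic_sketch(x, scalar_types=(str, bytes, dict)):
    """Return `x` if it is a collection, else wrap it in a one-element tuple."""
    if isinstance(x, scalar_types):
        # strings are iterable but should be treated as single values
        return (x,)
    try:
        iter(x)
    except TypeError:
        return (x,)
    return x


for s in variadic_sketch('=>'):
    print(s)
for s in variadic_sketch(['=>', '?.']):
    print(s)
```

Either call shape can then be looped over in the same way.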
|
||||
|
@ -316,7 +331,7 @@ class JSInterpreter(object):
|
|||
separated = list(cls._separate(expr, delim, 1))
|
||||
|
||||
if len(separated) < 2:
|
||||
raise cls.Exception('No terminating paren {delim} in {expr}'.format(**locals()))
|
||||
raise cls.Exception('No terminating paren {delim} in {expr!r:.5500}'.format(**locals()))
|
||||
return separated[0][1:].strip(), separated[1].strip()
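The new messages use `!r` followed by a precision such as `:.100` or `:.5500`. The conversion is applied first and the precision then truncates the resulting `repr` text. The short sketch below compares this with the older slice-then-`repr` form; the variable names are illustrative only.

```python
expr = 'x' * 300

# older form: slice first, then take the repr (100 characters plus two quotes)
old_msg = '{0} in: {1!r}'.format('Error', expr[:100])
# newer form: take the repr, then let the precision cut it to 100 characters
new_msg = '{0} in: {1!r:.100}'.format('Error', expr)

print(len(old_msg), len(new_msg))
print('{0!r:.5}'.format('abcdefgh'))
```

One visible difference is that a truncated `repr` loses its closing quote, which hints to the reader that the expression was cut.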
|
||||
|
||||
@staticmethod
|
||||
|
@ -361,6 +376,20 @@ class JSInterpreter(object):
|
|||
except TypeError:
|
||||
return self._named_object(namespace, obj)
|
||||
|
||||
# used below
|
||||
_VAR_RET_THROW_RE = re.compile(r'''(?x)
|
||||
(?P<var>(?:var|const|let)\s)|return(?:\s+|(?=["'])|$)|(?P<throw>throw\s+)
|
||||
''')
|
||||
_COMPOUND_RE = re.compile(r'''(?x)
|
||||
(?P<try>try)\s*\{|
|
||||
(?P<if>if)\s*\(|
|
||||
(?P<switch>switch)\s*\(|
|
||||
(?P<for>for)\s*\(|
|
||||
(?P<while>while)\s*\(
|
||||
''')
|
||||
_FINALLY_RE = re.compile(r'finally\s*\{')
|
||||
_SWITCH_RE = re.compile(r'switch\s*\(')
|
||||
|
||||
def interpret_statement(self, stmt, local_vars, allow_recursion=100):
|
||||
if allow_recursion < 0:
|
||||
raise self.Exception('Recursion limit reached')
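Hoisting the patterns into precompiled attributes avoids repeating the `re` cache lookup on every `interpret_statement` call. The sketch below reuses the `_VAR_RET_THROW_RE` pattern from the diff at module level to show what it distinguishes. `classify` is a hypothetical helper added for illustration.

```python
import re

_VAR_RET_THROW_RE = re.compile(r'''(?x)
    (?P<var>(?:var|const|let)\s)|return(?:\s+|(?=["'])|$)|(?P<throw>throw\s+)
''')


def classify(stmt):
    """Return 'var', 'throw', 'return' or None for the start of `stmt`."""
    m = _VAR_RET_THROW_RE.match(stmt)
    if not m:
        return None
    if m.group('var'):
        return 'var'
    if m.group('throw'):
        return 'throw'
    # the only remaining alternative is the unnamed `return` branch
    return 'return'


for stmt in ('var a = 1', 'throw new Error()', 'return"x"', 'returned'):
    print(stmt, '->', classify(stmt))
```

The `return` branch requires whitespace, a following quote or the end of input, so an identifier such as `returned` is not mistaken for a statement keyword.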
|
||||
|
@ -375,7 +404,7 @@ class JSInterpreter(object):
|
|||
if should_return:
|
||||
return ret, should_return
|
||||
|
||||
m = re.match(r'(?P<var>(?:var|const|let)\s)|return(?:\s+|(?=["\'])|$)|(?P<throw>throw\s+)', stmt)
|
||||
m = self._VAR_RET_THROW_RE.match(stmt)
|
||||
if m:
|
||||
expr = stmt[len(m.group(0)):].strip()
|
||||
if m.group('throw'):
|
||||
|
@ -405,7 +434,7 @@ class JSInterpreter(object):
|
|||
left, right = self._separate_at_paren(obj[len(klass):])
|
||||
argvals = self.interpret_iter(left, local_vars, allow_recursion)
|
||||
expr = konstr(*argvals)
|
||||
if not expr:
|
||||
if expr is None:
|
||||
raise self.Exception('Failed to parse {klass} {left!r:.100}'.format(**locals()), expr=expr)
|
||||
expr = self._dump(expr, local_vars) + right
|
||||
break
|
||||
|
@ -447,13 +476,7 @@ class JSInterpreter(object):
|
|||
for item in self._separate(inner)])
|
||||
expr = name + outer
|
||||
|
||||
m = re.match(r'''(?x)
|
||||
(?P<try>try)\s*\{|
|
||||
(?P<if>if)\s*\(|
|
||||
(?P<switch>switch)\s*\(|
|
||||
(?P<for>for)\s*\(|
|
||||
(?P<while>while)\s*\(
|
||||
''', expr)
|
||||
m = self._COMPOUND_RE.match(expr)
|
||||
md = m.groupdict() if m else {}
|
||||
if md.get('if'):
|
||||
cndn, expr = self._separate_at_paren(expr[m.end() - 1:])
|
||||
|
@ -512,7 +535,7 @@ class JSInterpreter(object):
|
|||
err = None
|
||||
pending = self.interpret_statement(sub_expr, catch_vars, allow_recursion)
|
||||
|
||||
m = re.match(r'finally\s*\{', expr)
|
||||
m = self._FINALLY_RE.match(expr)
|
||||
if m:
|
||||
sub_expr, expr = self._separate_at_paren(expr[m.end() - 1:])
|
||||
ret, should_abort = self.interpret_statement(sub_expr, local_vars, allow_recursion)
|
||||
|
@ -531,7 +554,7 @@ class JSInterpreter(object):
|
|||
if remaining.startswith('{'):
|
||||
body, expr = self._separate_at_paren(remaining)
|
||||
else:
|
||||
switch_m = re.match(r'switch\s*\(', remaining) # FIXME
|
||||
switch_m = self._SWITCH_RE.match(remaining) # FIXME
|
||||
if switch_m:
|
||||
switch_val, remaining = self._separate_at_paren(remaining[switch_m.end() - 1:])
|
||||
body, expr = self._separate_at_paren(remaining, '}')
|
||||
|
@ -699,7 +722,7 @@ class JSInterpreter(object):
|
|||
""" assert, but without risk of getting optimized out """
|
||||
if not cndn:
|
||||
memb = member
|
||||
raise self.Exception('{member} {msg}'.format(**locals()), expr=expr)
|
||||
raise self.Exception('{memb} {msg}'.format(**locals()), expr=expr)
|
||||
|
||||
def eval_method():
|
||||
if (variable, member) == ('console', 'debug'):
|
||||
|
@ -735,7 +758,7 @@ class JSInterpreter(object):
|
|||
if obj == compat_str:
|
||||
if member == 'fromCharCode':
|
||||
assertion(argvals, 'takes one or more arguments')
|
||||
return ''.join(map(chr, argvals))
|
||||
return ''.join(map(compat_chr, argvals))
|
||||
raise self.Exception('Unsupported string method ' + member, expr=expr)
|
||||
elif obj == float:
|
||||
if member == 'pow':
|
||||
|
@ -808,10 +831,17 @@ class JSInterpreter(object):
|
|||
if idx >= len(obj):
|
||||
return None
|
||||
return ord(obj[idx])
|
||||
elif member == 'replace':
|
||||
elif member in ('replace', 'replaceAll'):
|
||||
assertion(isinstance(obj, compat_str), 'must be applied on a string')
|
||||
assertion(len(argvals) == 2, 'takes exactly two arguments')
|
||||
return re.sub(argvals[0], argvals[1], obj)
|
||||
# TODO: argvals[1] callable, other Py vs JS edge cases
|
||||
if isinstance(argvals[0], self.JS_RegExp):
|
||||
count = 0 if argvals[0].flags & self.JS_RegExp.RE_FLAGS['g'] else 1
|
||||
assertion(member != 'replaceAll' or count == 0,
|
||||
'replaceAll must be called with a global RegExp')
|
||||
return argvals[0].sub(argvals[1], obj, count=count)
|
||||
count = ('replaceAll', 'replace').index(member)
|
||||
return re.sub(re.escape(argvals[0]), argvals[1], obj, count=count)
|
||||
|
||||
idx = int(member) if isinstance(obj, list) else member
|
||||
return obj[idx](argvals, allow_recursion=allow_recursion)
|
||||
|
|
|
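Note on the `replace`/`replaceAll` hunk above: for a plain-string search value, the new code derives the `count` argument of `re.sub` from the method name, where `0` means "replace every occurrence" and `1` means "replace only the first". The sketch below isolates that branch under stated assumptions: the function name `js_string_replace` is hypothetical, and the `JS_RegExp` branch is deliberately left out.

```python
import re


def js_string_replace(obj, member, search, repl):
    """Hypothetical sketch of the plain-string branch only."""
    # index 0 -> 'replaceAll' (re.sub count=0 replaces all matches),
    # index 1 -> 'replace'    (re.sub count=1 replaces the first match).
    count = ('replaceAll', 'replace').index(member)
    # re.escape makes the search value match literally, as a JS string would.
    return re.sub(re.escape(search), repl, obj, count=count)
```

As the TODO in the diff indicates, this does not cover every difference between Python and JavaScript; for example, `re.sub` still interprets backslash escapes in `repl`.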
@ -2176,11 +2176,11 @@ def sanitize_url(url):
|
|||
for mistake, fixup in COMMON_TYPOS:
|
||||
if re.match(mistake, url):
|
||||
return re.sub(mistake, fixup, url)
|
||||
return escape_url(url)
|
||||
return url
|
||||
|
||||
|
||||
def sanitized_Request(url, *args, **kwargs):
|
||||
return compat_urllib_request.Request(sanitize_url(url), *args, **kwargs)
|
||||
return compat_urllib_request.Request(escape_url(sanitize_url(url)), *args, **kwargs)
|
||||
|
||||
|
||||
def expand_path(s):
|
||||
|
|