1
0
Fork 0
mirror of https://github.com/ytdl-org/youtube-dl.git synced 2024-11-25 03:32:05 +00:00

Merge branch 'master' into oreilly-login

This commit is contained in:
dirkf 2023-03-31 15:50:47 +01:00 committed by GitHub
commit 899af2ef61
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
21 changed files with 812 additions and 346 deletions

View file

@ -918,7 +918,7 @@ Either prepend `https://www.youtube.com/watch?v=` or separate the ID from the op
Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [Get cookies.txt](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox). In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [Get cookies.txt LOCALLY](https://chrome.google.com/webstore/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox).
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format. Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
@ -1408,7 +1408,11 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
# BUGS # BUGS
Bugs and suggestions should be reported at: <https://github.com/ytdl-org/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](https://webchat.freenode.net/?randomnick=1&channels=youtube-dl)). Bugs and suggestions should be reported in the issue tracker: <https://github.com/ytdl-org/youtube-dl/issues> (<https://yt-dl.org/bug> is an alias for this). Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](https://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
## Opening a bug report or suggestion
Be sure to follow instructions provided **below** and **in the issue tracker**. Complete the appropriate issue template fully. Consider whether your problem is covered by an existing issue: if so, follow the discussion there. Avoid commenting on existing duplicate issues as such comments do not add to the discussion of the issue and are liable to be treated as spam.
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this: **Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
``` ```
@ -1428,17 +1432,17 @@ $ youtube-dl -v <your command line>
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever. The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
Please re-read your issue once again to avoid a couple of common mistakes (you can and should use this as a checklist): Finally please review your issue to avoid various common mistakes (you can and should use this as a checklist) listed below.
### Is the description of the issue itself sufficient? ### Is the description of the issue itself sufficient?
We often get issue reports that we cannot really decipher. While in most cases we eventually get the required information after asking back multiple times, this poses an unnecessary drain on our resources. Many contributors, including myself, are also not native speakers, so we may misread some parts. We often get issue reports that are hard to understand. To avoid subsequent clarifications, and to assist participants who are not native English speakers, please elaborate on what feature you are requesting, or what bug you want to be fixed.
So please elaborate on what feature you are requesting, or what bug you want to be fixed. Make sure that it's obvious Make sure that it's obvious
- What the problem is - What the problem is
- How it could be fixed - How it could be fixed
- How your proposed solution would look like - How your proposed solution would look
If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over. If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
@ -1448,14 +1452,14 @@ If your server has multiple IPs or you suspect censorship, adding `--call-home`
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `https://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `https://www.youtube.com/`) is *not* an example URL. **Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `https://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `https://www.youtube.com/`) is *not* an example URL.
### Is the issue already documented?
Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/ytdl-org/youtube-dl/search?type=Issues) of this repository. Initially, at least, use the search term `-label:duplicate` to focus on active issues. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
### Are you using the latest version? ### Are you using the latest version?
Before reporting any issue, type `youtube-dl -U`. This should report that you're up-to-date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well. Before reporting any issue, type `youtube-dl -U`. This should report that you're up-to-date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well.
### Is the issue already documented?
Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/ytdl-org/youtube-dl/search?type=Issues) of this repository. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
### Why are existing options not enough? ### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/ytdl-org/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem. Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/ytdl-org/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.

64
devscripts/cli_to_api.py Executable file
View file

@ -0,0 +1,64 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
"""
This script displays the API parameters corresponding to a yt-dl command line
Example:
$ ./cli_to_api.py -f best
{u'format': 'best'}
$
"""
# Allow direct execution
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import youtube_dl
from types import MethodType
def cli_to_api(*opts):
YDL = youtube_dl.YoutubeDL
# to extract the parsed options, break out of YoutubeDL instantiation
# return options via this Exception
class ParseYTDLResult(Exception):
def __init__(self, result):
super(ParseYTDLResult, self).__init__('result')
self.opts = result
# replacement constructor that raises ParseYTDLResult
def ytdl_init(ydl, ydl_opts):
super(YDL, ydl).__init__(ydl_opts)
raise ParseYTDLResult(ydl_opts)
# patch in the constructor
YDL.__init__ = MethodType(ytdl_init, YDL)
# core parser
def parsed_options(argv):
try:
youtube_dl._real_main(list(argv))
except ParseYTDLResult as result:
return result.opts
# from https://github.com/yt-dlp/yt-dlp/issues/5859#issuecomment-1363938900
default = parsed_options([])
diff = dict((k, v) for k, v in parsed_options(opts).items() if default[k] != v)
if 'postprocessors' in diff:
diff['postprocessors'] = [pp for pp in diff['postprocessors'] if pp not in default['postprocessors']]
return diff
def main():
from pprint import pprint
pprint(cli_to_api(*sys.argv))
if __name__ == '__main__':
main()

View file

@ -35,13 +35,13 @@ class InfoExtractorTestRequestHandler(compat_http_server.BaseHTTPRequestHandler)
assert False assert False
class TestIE(InfoExtractor): class DummyIE(InfoExtractor):
pass pass
class TestInfoExtractor(unittest.TestCase): class TestInfoExtractor(unittest.TestCase):
def setUp(self): def setUp(self):
self.ie = TestIE(FakeYDL()) self.ie = DummyIE(FakeYDL())
def test_ie_key(self): def test_ie_key(self):
self.assertEqual(get_info_extractor(YoutubeIE.ie_key()), YoutubeIE) self.assertEqual(get_info_extractor(YoutubeIE.ie_key()), YoutubeIE)
@ -62,6 +62,7 @@ class TestInfoExtractor(unittest.TestCase):
<meta name="og:test1" content='foo > < bar'/> <meta name="og:test1" content='foo > < bar'/>
<meta name="og:test2" content="foo >//< bar"/> <meta name="og:test2" content="foo >//< bar"/>
<meta property=og-test3 content='Ill-formatted opengraph'/> <meta property=og-test3 content='Ill-formatted opengraph'/>
<meta property=og:test4 content=unquoted-value/>
''' '''
self.assertEqual(ie._og_search_title(html), 'Foo') self.assertEqual(ie._og_search_title(html), 'Foo')
self.assertEqual(ie._og_search_description(html), 'Some video\'s description ') self.assertEqual(ie._og_search_description(html), 'Some video\'s description ')
@ -74,6 +75,7 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._og_search_property(('test0', 'test1'), html), 'foo > < bar') self.assertEqual(ie._og_search_property(('test0', 'test1'), html), 'foo > < bar')
self.assertRaises(RegexNotFoundError, ie._og_search_property, 'test0', html, None, fatal=True) self.assertRaises(RegexNotFoundError, ie._og_search_property, 'test0', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._og_search_property, ('test0', 'test00'), html, None, fatal=True) self.assertRaises(RegexNotFoundError, ie._og_search_property, ('test0', 'test00'), html, None, fatal=True)
self.assertEqual(ie._og_search_property('test4', html), 'unquoted-value')
def test_html_search_meta(self): def test_html_search_meta(self):
ie = self.ie ie = self.ie

View file

@ -148,6 +148,7 @@ def generator(test_case, tname):
try_rm(tc_filename) try_rm(tc_filename)
try_rm(tc_filename + '.part') try_rm(tc_filename + '.part')
try_rm(os.path.splitext(tc_filename)[0] + '.info.json') try_rm(os.path.splitext(tc_filename)[0] + '.info.json')
try_rm_tcs_files() try_rm_tcs_files()
try: try:
try_num = 1 try_num = 1
@ -213,7 +214,15 @@ def generator(test_case, tname):
# First, check test cases' data against extracted data alone # First, check test cases' data against extracted data alone
expect_info_dict(self, tc_res_dict, tc.get('info_dict', {})) expect_info_dict(self, tc_res_dict, tc.get('info_dict', {}))
# Now, check downloaded file consistency # Now, check downloaded file consistency
# support test-case with volatile ID, signalled by regexp value
if tc.get('info_dict', {}).get('id', '').startswith('re:'):
test_id = tc['info_dict']['id']
tc['info_dict']['id'] = tc_res_dict['id']
else:
test_id = None
tc_filename = get_tc_filename(tc) tc_filename = get_tc_filename(tc)
if test_id:
tc['info_dict']['id'] = test_id
if not test_case.get('params', {}).get('skip_download', False): if not test_case.get('params', {}).get('skip_download', False):
self.assertTrue(os.path.exists(tc_filename), msg='Missing file ' + tc_filename) self.assertTrue(os.path.exists(tc_filename), msg='Missing file ' + tc_filename)
self.assertTrue(tc_filename in finished_hook_called) self.assertTrue(tc_filename in finished_hook_called)

View file

@ -139,21 +139,16 @@ class TestJSInterpreter(unittest.TestCase):
self.assertTrue(math.isnan(jsi.call_function('x'))) self.assertTrue(math.isnan(jsi.call_function('x')))
def test_Date(self): def test_Date(self):
jsi = JSInterpreter('''
function x() { return new Date('Wednesday 31 December 1969 18:01:26 MDT') - 0; }
''')
self.assertEqual(jsi.call_function('x'), 86000)
jsi = JSInterpreter(''' jsi = JSInterpreter('''
function x(dt) { return new Date(dt) - 0; } function x(dt) { return new Date(dt) - 0; }
''') ''')
self.assertEqual(jsi.call_function('x', 'Wednesday 31 December 1969 18:01:26 MDT'), 86000) self.assertEqual(jsi.call_function('x', 'Wednesday 31 December 1969 18:01:26 MDT'), 86000)
# date format m/d/y # date format m/d/y
jsi = JSInterpreter(''' self.assertEqual(jsi.call_function('x', '12/31/1969 18:01:26 MDT'), 86000)
function x() { return new Date('12/31/1969 18:01:26 MDT') - 0; }
''') # epoch 0
self.assertEqual(jsi.call_function('x'), 86000) self.assertEqual(jsi.call_function('x', '1 January 1970 00:00:00 UTC'), 0)
def test_call(self): def test_call(self):
jsi = JSInterpreter(''' jsi = JSInterpreter('''
@ -445,7 +440,7 @@ class TestJSInterpreter(unittest.TestCase):
self.assertIs(jsi.call_function('x'), None) self.assertIs(jsi.call_function('x'), None)
jsi = JSInterpreter(''' jsi = JSInterpreter('''
function x() { let a=/,,[/,913,/](,)}/; return a; } function x() { let a=/,,[/,913,/](,)}/; "".replace(a, ""); return a; }
''') ''')
attrs = set(('findall', 'finditer', 'flags', 'groupindex', attrs = set(('findall', 'finditer', 'flags', 'groupindex',
'groups', 'match', 'pattern', 'scanner', 'groups', 'match', 'pattern', 'scanner',
@ -457,6 +452,31 @@ class TestJSInterpreter(unittest.TestCase):
''') ''')
self.assertEqual(jsi.call_function('x').flags & ~re.U, re.I) self.assertEqual(jsi.call_function('x').flags & ~re.U, re.I)
jsi = JSInterpreter(r'''
function x() { let a="data-name".replace("data-", ""); return a }
''')
self.assertEqual(jsi.call_function('x'), 'name')
jsi = JSInterpreter(r'''
function x() { let a="data-name".replace(new RegExp("^.+-"), ""); return a; }
''')
self.assertEqual(jsi.call_function('x'), 'name')
jsi = JSInterpreter(r'''
function x() { let a="data-name".replace(/^.+-/, ""); return a; }
''')
self.assertEqual(jsi.call_function('x'), 'name')
jsi = JSInterpreter(r'''
function x() { let a="data-name".replace(/a/g, "o"); return a; }
''')
self.assertEqual(jsi.call_function('x'), 'doto-nome')
jsi = JSInterpreter(r'''
function x() { let a="data-name".replaceAll("a", "o"); return a; }
''')
self.assertEqual(jsi.call_function('x'), 'doto-nome')
jsi = JSInterpreter(r''' jsi = JSInterpreter(r'''
function x() { let a=[/[)\\]/]; return a[0]; } function x() { let a=[/[)\\]/]; return a[0]; }
''') ''')
@ -485,6 +505,12 @@ class TestJSInterpreter(unittest.TestCase):
jsi = JSInterpreter('function x(){return 1236566549 << 5}') jsi = JSInterpreter('function x(){return 1236566549 << 5}')
self.assertEqual(jsi.call_function('x'), 915423904) self.assertEqual(jsi.call_function('x'), 915423904)
""" # fails so far
def test_packed(self):
jsi = JSInterpreter('''function x(p,a,c,k,e,d){while(c--)if(k[c])p=p.replace(new RegExp('\\b'+c.toString(a)+'\\b','g'),k[c]);return p}''')
self.assertEqual(jsi.call_function('x', '''h 7=g("1j");7.7h({7g:[{33:"w://7f-7e-7d-7c.v.7b/7a/79/78/77/76.74?t=73&s=2s&e=72&f=2t&71=70.0.0.1&6z=6y&6x=6w"}],6v:"w://32.v.u/6u.31",16:"r%",15:"r%",6t:"6s",6r:"",6q:"l",6p:"l",6o:"6n",6m:\'6l\',6k:"6j",9:[{33:"/2u?b=6i&n=50&6h=w://32.v.u/6g.31",6f:"6e"}],1y:{6d:1,6c:\'#6b\',6a:\'#69\',68:"67",66:30,65:r,},"64":{63:"%62 2m%m%61%5z%5y%5x.u%5w%5v%5u.2y%22 2k%m%1o%22 5t%m%1o%22 5s%m%1o%22 2j%m%5r%22 16%m%5q%22 15%m%5p%22 5o%2z%5n%5m%2z",5l:"w://v.u/d/1k/5k.2y",5j:[]},\'5i\':{"5h":"5g"},5f:"5e",5d:"w://v.u",5c:{},5b:l,1x:[0.25,0.50,0.75,1,1.25,1.5,2]});h 1m,1n,5a;h 59=0,58=0;h 7=g("1j");h 2x=0,57=0,56=0;$.55({54:{\'53-52\':\'2i-51\'}});7.j(\'4z\',6(x){c(5>0&&x.1l>=5&&1n!=1){1n=1;$(\'q.4y\').4x(\'4w\')}});7.j(\'13\',6(x){2x=x.1l});7.j(\'2g\',6(x){2w(x)});7.j(\'4v\',6(){$(\'q.2v\').4u()});6 2w(x){$(\'q.2v\').4t();c(1m)19;1m=1;17=0;c(4s.4r===l){17=1}$.4q(\'/2u?b=4p&2l=1k&4o=2t-4n-4m-2s-4l&4k=&4j=&4i=&17=\'+17,6(2r){$(\'#4h\').4g(2r)});$(\'.3-8-4f-4e:4d("4c")\').2h(6(e){2q();g().4b(0);g().4a(l)});6 2q(){h $14=$("<q />").2p({1l:"49",16:"r%",15:"r%",48:0,2n:0,2o:47,46:"45(10%, 10%, 10%, 0.4)","44-43":"42"});$("<41 />").2p({16:"60%",15:"60%",2o:40,"3z-2n":"3y"}).3x({\'2m\':\'/?b=3w&2l=1k\',\'2k\':\'0\',\'2j\':\'2i\'}).2f($14);$14.2h(6(){$(3v).3u();g().2g()});$14.2f($(\'#1j\'))}g().13(0);}6 3t(){h 9=7.1b(2e);2d.2c(9);c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==2e){2d.2c(\'!!=\'+i);7.1p(i)}}}}7.j(\'3s\',6(){g().1h("/2a/3r.29","3q 10 28",6(){g().13(g().27()+10)},"2b");$("q[26=2b]").23().21(\'.3-20-1z\');g().1h("/2a/3p.29","3o 10 28",6(){h 12=g().27()-10;c(12<0)12=0;g().13(12)},"24");$("q[26=24]").23().21(\'.3-20-1z\');});6 1i(){}7.j(\'3n\',6(){1i()});7.j(\'3m\',6(){1i()});7.j("k",6(y){h 9=7.1b();c(9.n<2)19;$(\'.3-8-3l-3k\').3j(6(){$(\'#3-8-a-k\').1e(\'3-8-a-z\');$(\'.3-a-k\').p(\'o-1f\',\'11\')});7.1h("/3i/3h.3g","3f 3e",6(){$(\'.3-1w\').3d(\'3-8-1v\');$(\'.3-8-1y, .3-8-1x\').p(\'o-1g\',\'11\');c($(\'.3-1w\').3c(\'3-8-1v\')){$(\'.3-a-k\').p(\'o-1g\',\'l\');$(\'.3-a-k\').p(\'o-1f\',\'l\');$(\'.3-8-a\').1e(\'3-8-a-z\');$(\'.3-8-a:1u\').3b(\'3-8-a-z\')}3a{$(\'.3-a-k\').p(\'o-1g\',\'11\');$(\'.3-a-k\').p(\'o-1f\',\'11\');$(\'.3-8-a:1u\').1e(\'3-8-a-z\')}},"39");7.j("38",6(y){1d.37(\'1c\',y.9[y.36].1a)});c(1d.1t(\'1c\')){35("1s(1d.1t(\'1c\'));",34)}});h 18;6 1s(1q){h 9=7.1b();c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==1q){c(i==18){19}18=i;7.1p(i)}}}}',36,270,'|||jw|||function|player|settings|tracks|submenu||if||||jwplayer|var||on|audioTracks|true|3D|length|aria|attr|div|100|||sx|filemoon|https||event|active||false|tt|seek|dd|height|width|adb|current_audio|return|name|getAudioTracks|default_audio|localStorage|removeClass|expanded|checked|addButton|callMeMaybe|vplayer|0fxcyc2ajhp1|position|vvplay|vvad|220|setCurrentAudioTrack|audio_name|for|audio_set|getItem|last|open|controls|playbackRates|captions|rewind|icon|insertAfter||detach|ff00||button|getPosition|sec|png|player8|ff11|log|console|track_name|appendTo|play|click|no|scrolling|frameborder|file_code|src|top|zIndex|css|showCCform|data|1662367683|383371|dl|video_ad|doPlay|prevt|mp4|3E||jpg|thumbs|file|300|setTimeout|currentTrack|setItem|audioTrackChanged|dualSound|else|addClass|hasClass|toggleClass|Track|Audio|svg|dualy|images|mousedown|buttons|topbar|playAttemptFailed|beforePlay|Rewind|fr|Forward|ff|ready|set_audio_track|remove|this|upload_srt|prop|50px|margin|1000001|iframe|center|align|text|rgba|background|1000000|left|absolute|pause|setCurrentCaptions|Upload|contains|item|content|html|fviews|referer|prem|embed|3e57249ef633e0d03bf76ceb8d8a4b65|216|83|hash|view|get|TokenZir|window|hide|show|complete|slow|fadeIn|video_ad_fadein|time||cache|Cache|Content|headers|ajaxSetup|v2done|tott|vastdone2|vastdone1|vvbefore|playbackRateControls|cast|aboutlink|FileMoon|abouttext|UHD|1870|qualityLabels|sites|GNOME_POWER|link|2Fiframe|3C|allowfullscreen|22360|22640|22no|marginheight|marginwidth|2FGNOME_POWER|2F0fxcyc2ajhp1|2Fe|2Ffilemoon|2F|3A||22https|3Ciframe|code|sharing|fontOpacity|backgroundOpacity|Tahoma|fontFamily|303030|backgroundColor|FFFFFF|color|userFontScale|thumbnails|kind|0fxcyc2ajhp10000|url|get_slides|start|startparam|none|preload|html5|primary|hlshtml|androidhls|duration|uniform|stretching|0fxcyc2ajhp1_xt|image|2048|sp|6871|asn|127|srv|43200|_g3XlBcu2lmD9oDexD2NLWSmah2Nu3XcDrl93m9PwXY|m3u8||master|0fxcyc2ajhp1_x|00076|01|hls2|to|s01|delivery|storage|moon|sources|setup'''.split('|')))
"""
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View file

@ -250,6 +250,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(sanitize_url('httpss://foo.bar'), 'https://foo.bar') self.assertEqual(sanitize_url('httpss://foo.bar'), 'https://foo.bar')
self.assertEqual(sanitize_url('rmtps://foo.bar'), 'rtmps://foo.bar') self.assertEqual(sanitize_url('rmtps://foo.bar'), 'rtmps://foo.bar')
self.assertEqual(sanitize_url('https://foo.bar'), 'https://foo.bar') self.assertEqual(sanitize_url('https://foo.bar'), 'https://foo.bar')
self.assertEqual(sanitize_url('foo bar'), 'foo bar')
def test_expand_path(self): def test_expand_path(self):
def env(var): def env(var):

View file

@ -67,6 +67,10 @@ _SIG_TESTS = [
] ]
_NSIG_TESTS = [ _NSIG_TESTS = [
(
'https://www.youtube.com/s/player/7862ca1f/player_ias.vflset/en_US/base.js',
'X_LCxVDjAavgE5t', 'yxJ1dM6iz5ogUg',
),
( (
'https://www.youtube.com/s/player/9216d1f7/player_ias.vflset/en_US/base.js', 'https://www.youtube.com/s/player/9216d1f7/player_ias.vflset/en_US/base.js',
'SLp9F5bwjAdhE9F-', 'gWnb9IK2DJ8Q1w', 'SLp9F5bwjAdhE9F-', 'gWnb9IK2DJ8Q1w',

View file

@ -39,6 +39,7 @@ from .compat import (
compat_str, compat_str,
compat_tokenize_tokenize, compat_tokenize_tokenize,
compat_urllib_error, compat_urllib_error,
compat_urllib_parse,
compat_urllib_request, compat_urllib_request,
compat_urllib_request_DataHandler, compat_urllib_request_DataHandler,
) )
@ -60,6 +61,7 @@ from .utils import (
format_bytes, format_bytes,
formatSeconds, formatSeconds,
GeoRestrictedError, GeoRestrictedError,
HEADRequest,
int_or_none, int_or_none,
ISO3166Utils, ISO3166Utils,
locked_file, locked_file,
@ -74,6 +76,7 @@ from .utils import (
preferredencoding, preferredencoding,
prepend_extension, prepend_extension,
process_communicate_or_kill, process_communicate_or_kill,
PUTRequest,
register_socks_protocols, register_socks_protocols,
render_table, render_table,
replace_extension, replace_extension,
@ -2297,6 +2300,27 @@ class YoutubeDL(object):
""" Start an HTTP download """ """ Start an HTTP download """
if isinstance(req, compat_basestring): if isinstance(req, compat_basestring):
req = sanitized_Request(req) req = sanitized_Request(req)
# an embedded /../ sequence is not automatically handled by urllib2
# see https://github.com/yt-dlp/yt-dlp/issues/3355
url = req.get_full_url()
parts = url.partition('/../')
if parts[1]:
url = compat_urllib_parse.urljoin(parts[0] + parts[1][:1], parts[1][1:] + parts[2])
if url:
# worse, URL path may have initial /../ against RFCs: work-around
# by stripping such prefixes, like eg Firefox
parts = compat_urllib_parse.urlsplit(url)
path = parts.path
while path.startswith('/../'):
path = path[3:]
url = parts._replace(path=path).geturl()
# get a new Request with the munged URL
if url != req.get_full_url():
req_type = {'HEAD': HEADRequest, 'PUT': PUTRequest}.get(
req.get_method(), compat_urllib_request.Request)
req = req_type(
url, data=req.data, headers=dict(req.header_items()),
origin_req_host=req.origin_req_host, unverifiable=req.unverifiable)
return self._opener.open(req, timeout=self._socket_timeout) return self._opener.open(req, timeout=self._socket_timeout)
def print_debug_header(self): def print_debug_header(self):

View file

@ -88,17 +88,21 @@ class FileDownloader(object):
return '---.-%' return '---.-%'
return '%6s' % ('%3.1f%%' % percent) return '%6s' % ('%3.1f%%' % percent)
@staticmethod @classmethod
def calc_eta(start, now, total, current): def calc_eta(cls, start_or_rate, now_or_remaining, *args):
if len(args) < 2:
rate, remaining = (start_or_rate, now_or_remaining)
if None in (rate, remaining):
return None
return int(float(remaining) / rate)
start, now = (start_or_rate, now_or_remaining)
total, current = args
if total is None: if total is None:
return None return None
if now is None: if now is None:
now = time.time() now = time.time()
dif = now - start rate = cls.calc_speed(start, now, current)
if current == 0 or dif < 0.001: # One millisecond return rate and int((float(total) - float(current)) / rate)
return None
rate = float(current) / dif
return int((float(total) - float(current)) / rate)
@staticmethod @staticmethod
def format_eta(eta): def format_eta(eta):
@ -123,6 +127,12 @@ class FileDownloader(object):
def format_retries(retries): def format_retries(retries):
return 'inf' if retries == float('inf') else '%.0f' % retries return 'inf' if retries == float('inf') else '%.0f' % retries
@staticmethod
def filesize_or_none(unencoded_filename):
fn = encodeFilename(unencoded_filename)
if os.path.isfile(fn):
return os.path.getsize(fn)
@staticmethod @staticmethod
def best_block_size(elapsed_time, bytes): def best_block_size(elapsed_time, bytes):
new_min = max(bytes / 2.0, 1.0) new_min = max(bytes / 2.0, 1.0)

View file

@ -38,8 +38,7 @@ class DashSegmentsFD(FragmentFD):
# In DASH, the first segment contains necessary headers to # In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment # generate a valid MP4 file, so always abort for the first segment
fatal = i == 0 or not skip_unavailable_fragments fatal = i == 0 or not skip_unavailable_fragments
count = 0 for count in range(fragment_retries + 1):
while count <= fragment_retries:
try: try:
fragment_url = fragment.get('url') fragment_url = fragment.get('url')
if not fragment_url: if not fragment_url:
@ -57,9 +56,8 @@ class DashSegmentsFD(FragmentFD):
# is usually enough) thus allowing to download the whole file successfully. # is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any # To be future-proof we will retry all fragments that fail with any
# HTTP error. # HTTP error.
count += 1 if count < fragment_retries:
if count <= fragment_retries: self.report_retry_fragment(err, frag_index, count + 1, fragment_retries)
self.report_retry_fragment(err, frag_index, count, fragment_retries)
except DownloadError: except DownloadError:
# Don't retry fragment if error occurred during HTTP downloading # Don't retry fragment if error occurred during HTTP downloading
# itself since it has own retry settings # itself since it has own retry settings
@ -68,7 +66,7 @@ class DashSegmentsFD(FragmentFD):
break break
raise raise
if count > fragment_retries: if count >= fragment_retries:
if not fatal: if not fatal:
self.report_skip_fragment(frag_index) self.report_skip_fragment(frag_index)
continue continue

View file

@ -273,7 +273,7 @@ class HttpieFD(ExternalFD):
class FFmpegFD(ExternalFD): class FFmpegFD(ExternalFD):
@classmethod @classmethod
def supports(cls, info_dict): def supports(cls, info_dict):
return info_dict['protocol'] in ('http', 'https', 'ftp', 'ftps', 'm3u8', 'rtsp', 'rtmp', 'mms') return info_dict['protocol'] in ('http', 'https', 'ftp', 'ftps', 'm3u8', 'rtsp', 'rtmp', 'mms', 'http_dash_segments')
@classmethod @classmethod
def available(cls): def available(cls):

View file

@ -71,7 +71,7 @@ class FragmentFD(FileDownloader):
@staticmethod @staticmethod
def __do_ytdl_file(ctx): def __do_ytdl_file(ctx):
return not ctx['live'] and not ctx['tmpfilename'] == '-' return ctx['live'] is not True and ctx['tmpfilename'] != '-'
def _read_ytdl_file(self, ctx): def _read_ytdl_file(self, ctx):
assert 'ytdl_corrupt' not in ctx assert 'ytdl_corrupt' not in ctx
@ -101,6 +101,13 @@ class FragmentFD(FileDownloader):
'url': frag_url, 'url': frag_url,
'http_headers': headers or info_dict.get('http_headers'), 'http_headers': headers or info_dict.get('http_headers'),
} }
frag_resume_len = 0
if ctx['dl'].params.get('continuedl', True):
frag_resume_len = self.filesize_or_none(
self.temp_name(fragment_filename))
fragment_info_dict['frag_resume_len'] = frag_resume_len
ctx['frag_resume_len'] = frag_resume_len or 0
success = ctx['dl'].download(fragment_filename, fragment_info_dict) success = ctx['dl'].download(fragment_filename, fragment_info_dict)
if not success: if not success:
return False, None return False, None
@ -124,9 +131,7 @@ class FragmentFD(FileDownloader):
del ctx['fragment_filename_sanitized'] del ctx['fragment_filename_sanitized']
def _prepare_frag_download(self, ctx): def _prepare_frag_download(self, ctx):
if 'live' not in ctx: if not ctx.setdefault('live', False):
ctx['live'] = False
if not ctx['live']:
total_frags_str = '%d' % ctx['total_frags'] total_frags_str = '%d' % ctx['total_frags']
ad_frags = ctx.get('ad_frags', 0) ad_frags = ctx.get('ad_frags', 0)
if ad_frags: if ad_frags:
@ -136,10 +141,11 @@ class FragmentFD(FileDownloader):
self.to_screen( self.to_screen(
'[%s] Total fragments: %s' % (self.FD_NAME, total_frags_str)) '[%s] Total fragments: %s' % (self.FD_NAME, total_frags_str))
self.report_destination(ctx['filename']) self.report_destination(ctx['filename'])
continuedl = self.params.get('continuedl', True)
dl = HttpQuietDownloader( dl = HttpQuietDownloader(
self.ydl, self.ydl,
{ {
'continuedl': True, 'continuedl': continuedl,
'quiet': True, 'quiet': True,
'noprogress': True, 'noprogress': True,
'ratelimit': self.params.get('ratelimit'), 'ratelimit': self.params.get('ratelimit'),
@ -150,12 +156,11 @@ class FragmentFD(FileDownloader):
) )
tmpfilename = self.temp_name(ctx['filename']) tmpfilename = self.temp_name(ctx['filename'])
open_mode = 'wb' open_mode = 'wb'
resume_len = 0
# Establish possible resume length # Establish possible resume length
if os.path.isfile(encodeFilename(tmpfilename)): resume_len = self.filesize_or_none(tmpfilename) or 0
if resume_len > 0:
open_mode = 'ab' open_mode = 'ab'
resume_len = os.path.getsize(encodeFilename(tmpfilename))
# Should be initialized before ytdl file check # Should be initialized before ytdl file check
ctx.update({ ctx.update({
@ -164,7 +169,8 @@ class FragmentFD(FileDownloader):
}) })
if self.__do_ytdl_file(ctx): if self.__do_ytdl_file(ctx):
if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))): ytdl_file_exists = os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename'])))
if continuedl and ytdl_file_exists:
self._read_ytdl_file(ctx) self._read_ytdl_file(ctx)
is_corrupt = ctx.get('ytdl_corrupt') is True is_corrupt = ctx.get('ytdl_corrupt') is True
is_inconsistent = ctx['fragment_index'] > 0 and resume_len == 0 is_inconsistent = ctx['fragment_index'] > 0 and resume_len == 0
@ -178,7 +184,12 @@ class FragmentFD(FileDownloader):
if 'ytdl_corrupt' in ctx: if 'ytdl_corrupt' in ctx:
del ctx['ytdl_corrupt'] del ctx['ytdl_corrupt']
self._write_ytdl_file(ctx) self._write_ytdl_file(ctx)
else: else:
if not continuedl:
if ytdl_file_exists:
self._read_ytdl_file(ctx)
ctx['fragment_index'] = resume_len = 0
self._write_ytdl_file(ctx) self._write_ytdl_file(ctx)
assert ctx['fragment_index'] == 0 assert ctx['fragment_index'] == 0
@ -209,6 +220,7 @@ class FragmentFD(FileDownloader):
start = time.time() start = time.time()
ctx.update({ ctx.update({
'started': start, 'started': start,
'fragment_started': start,
# Amount of fragment's bytes downloaded by the time of the previous # Amount of fragment's bytes downloaded by the time of the previous
# frag progress hook invocation # frag progress hook invocation
'prev_frag_downloaded_bytes': 0, 'prev_frag_downloaded_bytes': 0,
@ -218,6 +230,9 @@ class FragmentFD(FileDownloader):
if s['status'] not in ('downloading', 'finished'): if s['status'] not in ('downloading', 'finished'):
return return
if not total_frags and ctx.get('fragment_count'):
state['fragment_count'] = ctx['fragment_count']
time_now = time.time() time_now = time.time()
state['elapsed'] = time_now - start state['elapsed'] = time_now - start
frag_total_bytes = s.get('total_bytes') or 0 frag_total_bytes = s.get('total_bytes') or 0
@ -232,16 +247,17 @@ class FragmentFD(FileDownloader):
ctx['fragment_index'] = state['fragment_index'] ctx['fragment_index'] = state['fragment_index']
state['downloaded_bytes'] += frag_total_bytes - ctx['prev_frag_downloaded_bytes'] state['downloaded_bytes'] += frag_total_bytes - ctx['prev_frag_downloaded_bytes']
ctx['complete_frags_downloaded_bytes'] = state['downloaded_bytes'] ctx['complete_frags_downloaded_bytes'] = state['downloaded_bytes']
ctx['speed'] = state['speed'] = self.calc_speed(
ctx['fragment_started'], time_now, frag_total_bytes)
ctx['fragment_started'] = time.time()
ctx['prev_frag_downloaded_bytes'] = 0 ctx['prev_frag_downloaded_bytes'] = 0
else: else:
frag_downloaded_bytes = s['downloaded_bytes'] frag_downloaded_bytes = s['downloaded_bytes']
state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes'] state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
ctx['speed'] = state['speed'] = self.calc_speed(
ctx['fragment_started'], time_now, frag_downloaded_bytes - ctx['frag_resume_len'])
if not ctx['live']: if not ctx['live']:
state['eta'] = self.calc_eta( state['eta'] = self.calc_eta(state['speed'], estimated_size - state['downloaded_bytes'])
start, time_now, estimated_size - resume_len,
state['downloaded_bytes'] - resume_len)
state['speed'] = s.get('speed') or ctx.get('speed')
ctx['speed'] = state['speed']
ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
self._hook_progress(state) self._hook_progress(state)
@ -268,7 +284,7 @@ class FragmentFD(FileDownloader):
os.utime(ctx['filename'], (time.time(), filetime)) os.utime(ctx['filename'], (time.time(), filetime))
except Exception: except Exception:
pass pass
downloaded_bytes = os.path.getsize(encodeFilename(ctx['filename'])) downloaded_bytes = self.filesize_or_none(ctx['filename']) or 0
self._hook_progress({ self._hook_progress({
'downloaded_bytes': downloaded_bytes, 'downloaded_bytes': downloaded_bytes,

View file

@ -58,9 +58,9 @@ class HttpFD(FileDownloader):
if self.params.get('continuedl', True): if self.params.get('continuedl', True):
# Establish possible resume length # Establish possible resume length
if os.path.isfile(encodeFilename(ctx.tmpfilename)): ctx.resume_len = info_dict.get('frag_resume_len')
ctx.resume_len = os.path.getsize( if ctx.resume_len is None:
encodeFilename(ctx.tmpfilename)) ctx.resume_len = self.filesize_or_none(ctx.tmpfilename) or 0
ctx.is_resume = ctx.resume_len > 0 ctx.is_resume = ctx.resume_len > 0
@ -115,9 +115,9 @@ class HttpFD(FileDownloader):
raise RetryDownload(err) raise RetryDownload(err)
raise err raise err
# When trying to resume, Content-Range HTTP header of response has to be checked # When trying to resume, Content-Range HTTP header of response has to be checked
# to match the value of requested Range HTTP header. This is due to a webservers # to match the value of requested Range HTTP header. This is due to webservers
# that don't support resuming and serve a whole file with no Content-Range # that don't support resuming and serve a whole file with no Content-Range
# set in response despite of requested Range (see # set in response despite requested Range (see
# https://github.com/ytdl-org/youtube-dl/issues/6057#issuecomment-126129799) # https://github.com/ytdl-org/youtube-dl/issues/6057#issuecomment-126129799)
if has_range: if has_range:
content_range = ctx.data.headers.get('Content-Range') content_range = ctx.data.headers.get('Content-Range')
@ -293,10 +293,7 @@ class HttpFD(FileDownloader):
# Progress message # Progress message
speed = self.calc_speed(start, now, byte_counter - ctx.resume_len) speed = self.calc_speed(start, now, byte_counter - ctx.resume_len)
if ctx.data_len is None: eta = self.calc_eta(speed, ctx.data_len and (ctx.data_len - ctx.resume_len))
eta = None
else:
eta = self.calc_eta(start, time.time(), ctx.data_len - ctx.resume_len, byte_counter - ctx.resume_len)
self._hook_progress({ self._hook_progress({
'status': 'downloading', 'status': 'downloading',

View file

@ -8,6 +8,8 @@ from ..utils import (
ExtractorError, ExtractorError,
GeoRestrictedError, GeoRestrictedError,
int_or_none, int_or_none,
remove_start,
traverse_obj,
update_url_query, update_url_query,
urlencode_postdata, urlencode_postdata,
) )
@ -33,14 +35,17 @@ class AENetworksBaseIE(ThePlatformIE):
} }
def _extract_aen_smil(self, smil_url, video_id, auth=None): def _extract_aen_smil(self, smil_url, video_id, auth=None):
query = {'mbr': 'true'} query = {
'mbr': 'true',
'formats': 'M3U+none,MPEG-DASH+none,MPEG4,MP3',
}
if auth: if auth:
query['auth'] = auth query['auth'] = auth
TP_SMIL_QUERY = [{ TP_SMIL_QUERY = [{
'assetTypes': 'high_video_ak', 'assetTypes': 'high_video_ak',
'switch': 'hls_high_ak' 'switch': 'hls_high_ak',
}, { }, {
'assetTypes': 'high_video_s3' 'assetTypes': 'high_video_s3',
}, { }, {
'assetTypes': 'high_video_s3', 'assetTypes': 'high_video_s3',
'switch': 'hls_high_fastly', 'switch': 'hls_high_fastly',
@ -75,7 +80,14 @@ class AENetworksBaseIE(ThePlatformIE):
requestor_id, brand = self._DOMAIN_MAP[domain] requestor_id, brand = self._DOMAIN_MAP[domain]
result = self._download_json( result = self._download_json(
'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand, 'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand,
filter_value, query={'filter[%s]' % filter_key: filter_value})['results'][0] filter_value, query={'filter[%s]' % filter_key: filter_value})
result = traverse_obj(
result, ('results',
lambda k, v: k == 0 and v[filter_key] == filter_value),
get_all=False)
if not result:
raise ExtractorError('Show not found in A&E feed (too new?)', expected=True,
video_id=remove_start(filter_value, '/'))
title = result['title'] title = result['title']
video_id = result['id'] video_id = result['id']
media_url = result['publicUrl'] media_url = result['publicUrl']
@ -126,7 +138,7 @@ class AENetworksIE(AENetworksBaseIE):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
'skip': 'This video is only available for users of participating TV providers.', 'skip': 'Geo-restricted - This content is not available in your location.'
}, { }, {
'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1', 'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'info_dict': { 'info_dict': {
@ -143,6 +155,7 @@ class AENetworksIE(AENetworksBaseIE):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
'skip': 'This video is only available for users of participating TV providers.',
}, { }, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8', 'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
'only_matching': True 'only_matching': True

View file

@ -1087,7 +1087,7 @@ class InfoExtractor(object):
# Helper functions for extracting OpenGraph info # Helper functions for extracting OpenGraph info
@staticmethod @staticmethod
def _og_regexes(prop): def _og_regexes(prop):
content_re = r'content=(?:"([^"]+?)"|\'([^\']+?)\'|\s*([^\s"\'=<>`]+?))' content_re = r'content=(?:"([^"]+?)"|\'([^\']+?)\'|\s*([^\s"\'=<>`]+?)(?=\s|/?>))'
property_re = (r'(?:name|property)=(?:\'og[:-]%(prop)s\'|"og[:-]%(prop)s"|\s*og[:-]%(prop)s\b)' property_re = (r'(?:name|property)=(?:\'og[:-]%(prop)s\'|"og[:-]%(prop)s"|\s*og[:-]%(prop)s\b)'
% {'prop': re.escape(prop)}) % {'prop': re.escape(prop)})
template = r'<meta[^>]+?%s[^>]+?%s' template = r'<meta[^>]+?%s[^>]+?%s'

View file

@ -2320,6 +2320,25 @@ class GenericIE(InfoExtractor):
'height': 720, 'height': 720,
'age_limit': 18, 'age_limit': 18,
}, },
}, {
# would like to use the yt-dl test video but searching for
# '"\'/\\ä↭𝕐' fails, so using an old vid from YouTube Korea
'note': 'Test default search',
'url': 'Shorts로 허락 필요없이 놀자! (BTS편)',
'info_dict': {
'id': 'usDGO4Zb-dc',
'ext': 'mp4',
'title': 'YouTube Shorts로 허락 필요없이 놀자! (BTS편)',
'description': 'md5:96e31607eba81ab441567b5e289f4716',
'upload_date': '20211107',
'uploader': 'YouTube Korea',
'location': '대한민국',
},
'params': {
'default_search': 'ytsearch',
'skip_download': True,
},
'expected_warnings': ['uploader id'],
}, },
] ]

View file

@ -1,19 +1,29 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_filter as filter,
compat_HTTPError,
compat_parse_qs, compat_parse_qs,
compat_urllib_parse_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
HEADRequest,
determine_ext, determine_ext,
error_to_compat_str,
extract_attributes,
ExtractorError,
int_or_none, int_or_none,
merge_dicts,
orderedSet,
parse_iso8601, parse_iso8601,
strip_or_none, strip_or_none,
try_get, traverse_obj,
url_or_none,
urljoin,
) )
@ -22,14 +32,102 @@ class IGNBaseIE(InfoExtractor):
return self._download_json( return self._download_json(
'http://apis.ign.com/{0}/v3/{0}s/slug/{1}'.format(self._PAGE_TYPE, slug), slug) 'http://apis.ign.com/{0}/v3/{0}s/slug/{1}'.format(self._PAGE_TYPE, slug), slug)
def _checked_call_api(self, slug):
try:
return self._call_api(slug)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
e.cause.args = e.cause.args or [
e.cause.geturl(), e.cause.getcode(), e.cause.reason]
raise ExtractorError(
'Content not found: expired?', cause=e.cause,
expected=True)
raise
def _extract_video_info(self, video, fatal=True):
video_id = video['videoId']
formats = []
refs = traverse_obj(video, 'refs', expected_type=dict) or {}
m3u8_url = url_or_none(refs.get('m3uUrl'))
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
f4m_url = url_or_none(refs.get('f4mUrl'))
if f4m_url:
formats.extend(self._extract_f4m_formats(
f4m_url, video_id, f4m_id='hds', fatal=False))
for asset in (video.get('assets') or []):
asset_url = url_or_none(asset.get('url'))
if not asset_url:
continue
formats.append({
'url': asset_url,
'tbr': int_or_none(asset.get('bitrate'), 1000),
'fps': int_or_none(asset.get('frame_rate')),
'height': int_or_none(asset.get('height')),
'width': int_or_none(asset.get('width')),
})
mezzanine_url = traverse_obj(
video, ('system', 'mezzanineUrl'), expected_type=url_or_none)
if mezzanine_url:
formats.append({
'ext': determine_ext(mezzanine_url, 'mp4'),
'format_id': 'mezzanine',
'preference': 1,
'url': mezzanine_url,
})
if formats or fatal:
self._sort_formats(formats)
else:
return
thumbnails = traverse_obj(
video, ('thumbnails', Ellipsis, {'url': 'url'}), expected_type=url_or_none)
tags = traverse_obj(
video, ('tags', Ellipsis, 'displayName'),
expected_type=lambda x: x.strip() or None)
metadata = traverse_obj(video, 'metadata', expected_type=dict) or {}
title = traverse_obj(
metadata, 'longTitle', 'title', 'name',
expected_type=lambda x: x.strip() or None)
return {
'id': video_id,
'title': title,
'description': strip_or_none(metadata.get('description')),
'timestamp': parse_iso8601(metadata.get('publishDate')),
'duration': int_or_none(metadata.get('duration')),
'thumbnails': thumbnails,
'formats': formats,
'tags': tags,
}
# yt-dlp shim
@classmethod
def _extract_from_webpage(cls, url, webpage):
for embed_url in orderedSet(
cls._extract_embed_urls(url, webpage) or [], lazy=True):
yield cls.url_result(embed_url, None if cls._VALID_URL is False else cls)
class IGNIE(IGNBaseIE): class IGNIE(IGNBaseIE):
""" """
Extractor for some of the IGN sites, like www.ign.com, es.ign.com de.ign.com. Extractor for some of the IGN sites, like www.ign.com, es.ign.com de.ign.com.
Some videos of it.ign.com are also supported Some videos of it.ign.com are also supported
""" """
_VIDEO_PATH_RE = r'/(?:\d{4}/\d{2}/\d{2}/)?(?P<id>.+?)'
_VALID_URL = r'https?://(?:.+?\.ign|www\.pcmag)\.com/videos/(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[^/?&#]+)' _PLAYLIST_PATH_RE = r'(?:/?\?(?P<filt>[^&#]+))?'
_VALID_URL = (
r'https?://(?:.+?\.ign|www\.pcmag)\.com/videos(?:%s)'
% '|'.join((_VIDEO_PATH_RE + r'(?:[/?&#]|$)', _PLAYLIST_PATH_RE)))
IE_NAME = 'ign.com' IE_NAME = 'ign.com'
_PAGE_TYPE = 'video' _PAGE_TYPE = 'video'
@ -44,7 +142,10 @@ class IGNIE(IGNBaseIE):
'timestamp': 1370440800, 'timestamp': 1370440800,
'upload_date': '20130605', 'upload_date': '20130605',
'tags': 'count:9', 'tags': 'count:9',
} },
'params': {
'nocheckcertificate': True,
},
}, { }, {
'url': 'http://www.pcmag.com/videos/2015/01/06/010615-whats-new-now-is-gogo-snooping-on-your-data', 'url': 'http://www.pcmag.com/videos/2015/01/06/010615-whats-new-now-is-gogo-snooping-on-your-data',
'md5': 'f1581a6fe8c5121be5b807684aeac3f6', 'md5': 'f1581a6fe8c5121be5b807684aeac3f6',
@ -56,86 +157,51 @@ class IGNIE(IGNBaseIE):
'timestamp': 1420571160, 'timestamp': 1420571160,
'upload_date': '20150106', 'upload_date': '20150106',
'tags': 'count:4', 'tags': 'count:4',
} },
'skip': '404 Not Found',
}, { }, {
'url': 'https://www.ign.com/videos/is-a-resident-evil-4-remake-on-the-way-ign-daily-fix', 'url': 'https://www.ign.com/videos/is-a-resident-evil-4-remake-on-the-way-ign-daily-fix',
'only_matching': True, 'only_matching': True,
}] }]
@classmethod
def _extract_embed_urls(cls, url, webpage):
grids = re.findall(
r'''(?s)<section\b[^>]+\bclass\s*=\s*['"](?:[\w-]+\s+)*?content-feed-grid(?!\B|-)[^>]+>(.+?)</section[^>]*>''',
webpage)
return filter(None,
(urljoin(url, m.group('path')) for m in re.finditer(
r'''<a\b[^>]+\bhref\s*=\s*('|")(?P<path>/videos%s)\1'''
% cls._VIDEO_PATH_RE, grids[0] if grids else '')))
def _real_extract(self, url): def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
display_id = m.group('id')
if display_id:
return self._extract_video(url, display_id)
display_id = m.group('filt') or 'all'
return self._extract_playlist(url, display_id)
def _extract_playlist(self, url, display_id):
webpage = self._download_webpage(url, display_id)
return self.playlist_result(
(self.url_result(u, ie=self.ie_key())
for u in self._extract_embed_urls(url, webpage)),
playlist_id=display_id)
def _extract_video(self, url, display_id):
display_id = self._match_id(url) display_id = self._match_id(url)
video = self._call_api(display_id) video = self._checked_call_api(display_id)
video_id = video['videoId']
metadata = video['metadata']
title = metadata.get('longTitle') or metadata.get('title') or metadata['name']
formats = [] info = self._extract_video_info(video)
refs = video.get('refs') or {}
m3u8_url = refs.get('m3uUrl') return merge_dicts({
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
f4m_url = refs.get('f4mUrl')
if f4m_url:
formats.extend(self._extract_f4m_formats(
f4m_url, video_id, f4m_id='hds', fatal=False))
for asset in (video.get('assets') or []):
asset_url = asset.get('url')
if not asset_url:
continue
formats.append({
'url': asset_url,
'tbr': int_or_none(asset.get('bitrate'), 1000),
'fps': int_or_none(asset.get('frame_rate')),
'height': int_or_none(asset.get('height')),
'width': int_or_none(asset.get('width')),
})
mezzanine_url = try_get(video, lambda x: x['system']['mezzanineUrl'])
if mezzanine_url:
formats.append({
'ext': determine_ext(mezzanine_url, 'mp4'),
'format_id': 'mezzanine',
'preference': 1,
'url': mezzanine_url,
})
self._sort_formats(formats)
thumbnails = []
for thumbnail in (video.get('thumbnails') or []):
thumbnail_url = thumbnail.get('url')
if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
})
tags = []
for tag in (video.get('tags') or []):
display_name = tag.get('displayName')
if not display_name:
continue
tags.append(display_name)
return {
'id': video_id,
'title': title,
'description': strip_or_none(metadata.get('description')),
'timestamp': parse_iso8601(metadata.get('publishDate')),
'duration': int_or_none(metadata.get('duration')),
'display_id': display_id, 'display_id': display_id,
'thumbnails': thumbnails, }, info)
'formats': formats,
'tags': tags,
}
class IGNVideoIE(InfoExtractor): class IGNVideoIE(IGNBaseIE):
_VALID_URL = r'https?://.+?\.ign\.com/(?:[a-z]{2}/)?[^/]+/(?P<id>\d+)/(?:video|trailer)/' _VALID_URL = r'https?://.+?\.ign\.com/(?:[a-z]{2}/)?[^/]+/(?P<id>\d+)/(?:video|trailer)/'
_TESTS = [{ _TESTS = [{
'url': 'http://me.ign.com/en/videos/112203/video/how-hitman-aims-to-be-different-than-every-other-s', 'url': 'http://me.ign.com/en/videos/112203/video/how-hitman-aims-to-be-different-than-every-other-s',
@ -147,7 +213,8 @@ class IGNVideoIE(InfoExtractor):
'description': 'Taking out assassination targets in Hitman has never been more stylish.', 'description': 'Taking out assassination targets in Hitman has never been more stylish.',
'timestamp': 1444665600, 'timestamp': 1444665600,
'upload_date': '20151012', 'upload_date': '20151012',
} },
'expected_warnings': ['HTTP Error 400: Bad Request'],
}, { }, {
'url': 'http://me.ign.com/ar/angry-birds-2/106533/video/lrd-ldyy-lwl-lfylm-angry-birds', 'url': 'http://me.ign.com/ar/angry-birds-2/106533/video/lrd-ldyy-lwl-lfylm-angry-birds',
'only_matching': True, 'only_matching': True,
@ -167,22 +234,38 @@ class IGNVideoIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
req = HEADRequest(url.rsplit('/', 1)[0] + '/embed') parsed_url = compat_urlparse.urlparse(url)
url = self._request_webpage(req, video_id).geturl() embed_url = compat_urlparse.urlunparse(
parsed_url._replace(path=parsed_url.path.rsplit('/', 1)[0] + '/embed'))
webpage, urlh = self._download_webpage_handle(embed_url, video_id)
new_url = urlh.geturl()
ign_url = compat_parse_qs( ign_url = compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('url', [None])[0] compat_urlparse.urlparse(new_url).query).get('url', [None])[-1]
if ign_url: if ign_url:
return self.url_result(ign_url, IGNIE.ie_key()) return self.url_result(ign_url, IGNIE.ie_key())
return self.url_result(url) video = self._search_regex(r'(<div\b[^>]+\bdata-video-id\s*=\s*[^>]+>)', webpage, 'video element', fatal=False)
if not video:
if new_url == url:
raise ExtractorError('Redirect loop: ' + url)
return self.url_result(new_url)
video = extract_attributes(video)
video_data = video.get('data-settings') or '{}'
video_data = self._parse_json(video_data, video_id)['video']
info = self._extract_video_info(video_data)
return merge_dicts({
'display_id': video_id,
}, info)
class IGNArticleIE(IGNBaseIE): class IGNArticleIE(IGNBaseIE):
_VALID_URL = r'https?://.+?\.ign\.com/(?:articles(?:/\d{4}/\d{2}/\d{2})?|(?:[a-z]{2}/)?feature/\d+)/(?P<id>[^/?&#]+)' _VALID_URL = r'https?://.+?\.ign\.com/(?:articles(?:/\d{4}/\d{2}/\d{2})?|(?:[a-z]{2}/)?(?:[\w-]+/)*?feature/\d+)/(?P<id>[^/?&#]+)'
_PAGE_TYPE = 'article' _PAGE_TYPE = 'article'
_TESTS = [{ _TESTS = [{
'url': 'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind', 'url': 'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
'info_dict': { 'info_dict': {
'id': '524497489e4e8ff5848ece34', 'id': '72113',
'title': '100 Little Things in GTA 5 That Will Blow Your Mind', 'title': '100 Little Things in GTA 5 That Will Blow Your Mind',
}, },
'playlist': [ 'playlist': [
@ -190,7 +273,7 @@ class IGNArticleIE(IGNBaseIE):
'info_dict': { 'info_dict': {
'id': '5ebbd138523268b93c9141af17bec937', 'id': '5ebbd138523268b93c9141af17bec937',
'ext': 'mp4', 'ext': 'mp4',
'title': 'GTA 5 Video Review', 'title': 'Grand Theft Auto V Video Review',
'description': 'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.', 'description': 'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
'timestamp': 1379339880, 'timestamp': 1379339880,
'upload_date': '20130916', 'upload_date': '20130916',
@ -200,7 +283,7 @@ class IGNArticleIE(IGNBaseIE):
'info_dict': { 'info_dict': {
'id': '638672ee848ae4ff108df2a296418ee2', 'id': '638672ee848ae4ff108df2a296418ee2',
'ext': 'mp4', 'ext': 'mp4',
'title': '26 Twisted Moments from GTA 5 in Slow Motion', 'title': 'GTA 5 In Slow Motion',
'description': 'The twisted beauty of GTA 5 in stunning slow motion.', 'description': 'The twisted beauty of GTA 5 in stunning slow motion.',
'timestamp': 1386878820, 'timestamp': 1386878820,
'upload_date': '20131212', 'upload_date': '20131212',
@ -208,16 +291,17 @@ class IGNArticleIE(IGNBaseIE):
}, },
], ],
'params': { 'params': {
'playlist_items': '2-3',
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Backend fetch failed'],
}, { }, {
'url': 'http://www.ign.com/articles/2014/08/15/rewind-theater-wild-trailer-gamescom-2014?watch', 'url': 'http://www.ign.com/articles/2014/08/15/rewind-theater-wild-trailer-gamescom-2014?watch',
'info_dict': { 'info_dict': {
'id': '53ee806780a81ec46e0790f8', 'id': '53ee806780a81ec46e0790f8',
'title': 'Rewind Theater - Wild Trailer Gamescom 2014', 'title': 'Rewind Theater - Wild Trailer Gamescom 2014',
}, },
'playlist_count': 2, 'playlist_count': 1,
'expected_warnings': ['Backend fetch failed'],
}, { }, {
# videoId pattern # videoId pattern
'url': 'http://www.ign.com/articles/2017/06/08/new-ducktales-short-donalds-birthday-doesnt-go-as-planned', 'url': 'http://www.ign.com/articles/2017/06/08/new-ducktales-short-donalds-birthday-doesnt-go-as-planned',
@ -240,18 +324,91 @@ class IGNArticleIE(IGNBaseIE):
'only_matching': True, 'only_matching': True,
}] }]
def _checked_call_api(self, slug):
try:
return self._call_api(slug)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
e.cause.args = e.cause.args or [
e.cause.geturl(), e.cause.getcode(), e.cause.reason]
if e.cause.code == 404:
raise ExtractorError(
'Content not found: expired?', cause=e.cause,
expected=True)
elif e.cause.code == 503:
self.report_warning(error_to_compat_str(e.cause))
return
raise
def _search_nextjs_data(self, webpage, video_id, **kw):
return self._parse_json(
self._search_regex(
r'(?s)<script[^>]+id=[\'"]__NEXT_DATA__[\'"][^>]*>([^<]+)</script>',
webpage, 'next.js data', **kw),
video_id, **kw)
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
article = self._call_api(display_id) article = self._checked_call_api(display_id)
def entries(): if article:
media_url = try_get(article, lambda x: x['mediaRelations'][0]['media']['metadata']['url']) # obsolete ?
if media_url: def entries():
yield self.url_result(media_url, IGNIE.ie_key()) media_url = traverse_obj(
for content in (article.get('content') or []): article, ('mediaRelations', 0, 'media', 'metadata', 'url'),
for video_url in re.findall(r'(?:\[(?:ignvideo\s+url|youtube\s+clip_id)|<iframe[^>]+src)="([^"]+)"', content): expected_type=url_or_none)
yield self.url_result(video_url) if media_url:
yield self.url_result(media_url, IGNIE.ie_key())
for content in (article.get('content') or []):
for video_url in re.findall(r'(?:\[(?:ignvideo\s+url|youtube\s+clip_id)|<iframe[^>]+src)="([^"]+)"', content):
if url_or_none(video_url):
yield self.url_result(video_url)
return self.playlist_result(
entries(), article.get('articleId'),
traverse_obj(
article, ('metadata', 'headline'),
expected_type=lambda x: x.strip() or None))
webpage = self._download_webpage(url, display_id)
playlist_id = self._html_search_meta('dable:item_id', webpage, default=None)
if playlist_id:
def entries():
for m in re.finditer(
r'''(?s)<object\b[^>]+\bclass\s*=\s*("|')ign-videoplayer\1[^>]*>(?P<params>.+?)</object''',
webpage):
flashvars = self._search_regex(
r'''(<param\b[^>]+\bname\s*=\s*("|')flashvars\2[^>]*>)''',
m.group('params'), 'flashvars', default='')
flashvars = compat_parse_qs(extract_attributes(flashvars).get('value') or '')
v_url = url_or_none((flashvars.get('url') or [None])[-1])
if v_url:
yield self.url_result(v_url)
else:
playlist_id = self._search_regex(
r'''\bdata-post-id\s*=\s*("|')(?P<id>[\da-f]+)\1''',
webpage, 'id', group='id', default=None)
nextjs_data = self._search_nextjs_data(webpage, display_id)
def entries():
for player in traverse_obj(
nextjs_data,
('props', 'apolloState', 'ROOT_QUERY', lambda k, _: k.startswith('videoPlayerProps('), '__ref')):
# skip promo links (which may not always be served, eg GH CI servers)
if traverse_obj(nextjs_data,
('props', 'apolloState', player.replace('PlayerProps', 'ModernContent')),
expected_type=dict):
continue
video = traverse_obj(nextjs_data, ('props', 'apolloState', player), expected_type=dict) or {}
info = self._extract_video_info(video, fatal=False)
if info:
yield merge_dicts({
'display_id': display_id,
}, info)
return self.playlist_result( return self.playlist_result(
entries(), article.get('articleId'), entries(), playlist_id or display_id,
strip_or_none(try_get(article, lambda x: x['metadata']['headline']))) re.sub(r'\s+-\s+IGN\s*$', '', self._og_search_title(webpage, default='')) or None)

View file

@ -261,27 +261,33 @@ class VimeoIE(VimeoBaseInfoExtractor):
# _VALID_URL matches Vimeo URLs # _VALID_URL matches Vimeo URLs
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
(?: (?:
www| www|
player player
) )
\. \.
)? )?
vimeo(?:pro)?\.com/ vimeo(?:pro)?\.com/
(?!(?:channels|album|showcase)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/) (?:
(?:.*?/)?? (?P<u>user)|
(?: (?!(?:channels|album|showcase)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
(?: (?:.*?/)??
play_redirect_hls| (?P<q>
moogaloop\.swf)\?clip_id= (?:
)? play_redirect_hls|
(?:videos?/)? moogaloop\.swf)\?clip_id=
(?P<id>[0-9]+) )?
(?:/(?P<unlisted_hash>[\da-f]{10}))? (?:videos?/)?
/?(?:[?&].*)?(?:[#].*)?$ )
''' (?P<id>[0-9]+)
(?(u)
/(?!videos|likes)[^/?#]+/?|
(?(q)|/(?P<unlisted_hash>[\da-f]{10}))?
)
(?:(?(q)[&]|(?(u)|/?)[?]).+?)?(?:[#].*)?$
'''
IE_NAME = 'vimeo' IE_NAME = 'vimeo'
_TESTS = [ _TESTS = [
{ {
@ -539,7 +545,12 @@ class VimeoIE(VimeoBaseInfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
} },
{
# user playlist alias -> https://vimeo.com/258705797
'url': 'https://vimeo.com/user26785108/newspiritualguide',
'only_matching': True,
},
# https://gettingthingsdone.com/workflowmap/ # https://gettingthingsdone.com/workflowmap/
# vimeo embed with check-password page protected by Referer header # vimeo embed with check-password page protected by Referer header
] ]
@ -663,7 +674,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
if '//player.vimeo.com/video/' in url: if '//player.vimeo.com/video/' in url:
config = self._parse_json(self._search_regex( config = self._parse_json(self._search_regex(
r'\b(?:playerC|c)onfig\s*=\s*({.+?})\s*;', webpage, 'info section'), video_id) r'(?s)\b(?:playerC|c)onfig\s*=\s*({.+?})\s*[;\n]', webpage, 'info section'), video_id)
if config.get('view') == 4: if config.get('view') == 4:
config = self._verify_player_video_password( config = self._verify_player_video_password(
redirect_url, video_id, headers) redirect_url, video_id, headers)

View file

@ -31,6 +31,7 @@ from ..utils import (
get_element_by_attribute, get_element_by_attribute,
int_or_none, int_or_none,
js_to_json, js_to_json,
merge_dicts,
mimetype2ext, mimetype2ext,
parse_codecs, parse_codecs,
parse_duration, parse_duration,
@ -400,6 +401,62 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
break break
data['continuation'] = token data['continuation'] = token
@staticmethod
def _owner_endpoints_path():
return [
Ellipsis,
lambda k, _: k.endswith('SecondaryInfoRenderer'),
('owner', 'videoOwner'), 'videoOwnerRenderer', 'title',
'runs', Ellipsis]
def _extract_channel_id(self, webpage, videodetails={}, metadata={}, renderers=[]):
channel_id = None
if any((videodetails, metadata, renderers)):
channel_id = (
traverse_obj(videodetails, 'channelId')
or traverse_obj(metadata, 'externalChannelId', 'externalId')
or traverse_obj(renderers,
self._owner_endpoints_path() + [
'navigationEndpoint', 'browseEndpoint', 'browseId'],
get_all=False)
)
return channel_id or self._html_search_meta(
'channelId', webpage, 'channel id', default=None)
def _extract_author_var(self, webpage, var_name,
videodetails={}, metadata={}, renderers=[]):
result = None
paths = {
# (HTML, videodetails, metadata, renderers)
'name': ('content', 'author', (('ownerChannelName', None), 'title'), ['text']),
'url': ('href', 'ownerProfileUrl', 'vanityChannelUrl',
['navigationEndpoint', 'browseEndpoint', 'canonicalBaseUrl'])
}
if any((videodetails, metadata, renderers)):
result = (
traverse_obj(videodetails, paths[var_name][1], get_all=False)
or traverse_obj(metadata, paths[var_name][2], get_all=False)
or traverse_obj(renderers,
self._owner_endpoints_path() + paths[var_name][3],
get_all=False)
)
return result or traverse_obj(
extract_attributes(self._search_regex(
r'''(?s)(<link\b[^>]+\bitemprop\s*=\s*("|')%s\2[^>]*>)'''
% re.escape(var_name),
get_element_by_attribute('itemprop', 'author', webpage) or '',
'author link', default='')),
paths[var_name][0])
@staticmethod
def _yt_urljoin(url_or_path):
return urljoin('https://www.youtube.com', url_or_path)
def _extract_uploader_id(self, uploader_url):
return self._search_regex(
r'/(?:(?:channel|user)/|(?=@))([^/?&#]+)', uploader_url or '',
'uploader id', default=None)
class YoutubeIE(YoutubeBaseInfoExtractor): class YoutubeIE(YoutubeBaseInfoExtractor):
IE_DESC = 'YouTube.com' IE_DESC = 'YouTube.com'
@ -516,8 +573,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'youtube-dl test video "\'/\\ä↭𝕐', 'title': 'youtube-dl test video "\'/\\ä↭𝕐',
'uploader': 'Philipp Hagemeister', 'uploader': 'Philipp Hagemeister',
'uploader_id': 'phihag', 'uploader_id': '@PhilippHagemeister',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@PhilippHagemeister',
'channel': 'Philipp Hagemeister', 'channel': 'Philipp Hagemeister',
'channel_id': 'UCLqxVugv74EIW3VWh2NOa3Q', 'channel_id': 'UCLqxVugv74EIW3VWh2NOa3Q',
'channel_url': r're:https?://(?:www\.)?youtube\.com/channel/UCLqxVugv74EIW3VWh2NOa3Q', 'channel_url': r're:https?://(?:www\.)?youtube\.com/channel/UCLqxVugv74EIW3VWh2NOa3Q',
@ -557,8 +614,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'youtube-dl test video "\'/\\ä↭𝕐', 'title': 'youtube-dl test video "\'/\\ä↭𝕐',
'uploader': 'Philipp Hagemeister', 'uploader': 'Philipp Hagemeister',
'uploader_id': 'phihag', 'uploader_id': '@PhilippHagemeister',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@PhilippHagemeister',
'upload_date': '20121002', 'upload_date': '20121002',
'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .', 'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
'categories': ['Science & Technology'], 'categories': ['Science & Technology'],
@ -588,7 +645,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'youtube_include_dash_manifest': True, 'youtube_include_dash_manifest': True,
'format': '141', 'format': '141',
}, },
'skip': 'format 141 not served anymore', 'skip': 'format 141 not served any more',
}, },
# DASH manifest with encrypted signature # DASH manifest with encrypted signature
{ {
@ -600,7 +657,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': 'md5:8f5e2b82460520b619ccac1f509d43bf', 'description': 'md5:8f5e2b82460520b619ccac1f509d43bf',
'duration': 244, 'duration': 244,
'uploader': 'AfrojackVEVO', 'uploader': 'AfrojackVEVO',
'uploader_id': 'AfrojackVEVO', 'uploader_id': '@AfrojackVEVO',
'upload_date': '20131011', 'upload_date': '20131011',
'abr': 129.495, 'abr': 129.495,
}, },
@ -618,8 +675,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'duration': 219, 'duration': 219,
'upload_date': '20100909', 'upload_date': '20100909',
'uploader': 'Amazing Atheist', 'uploader': 'Amazing Atheist',
'uploader_id': 'TheAmazingAtheist', 'uploader_id': '@theamazingatheist',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@theamazingatheist',
'title': 'Burning Everyone\'s Koran', 'title': 'Burning Everyone\'s Koran',
'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms \r\n\r\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html', 'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms \r\n\r\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html',
} }
@ -635,8 +692,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': r're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}', 'description': r're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}',
'duration': 142, 'duration': 142,
'uploader': 'The Witcher', 'uploader': 'The Witcher',
'uploader_id': 'WitcherGame', 'uploader_id': '@thewitcher',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/WitcherGame', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@thewitcher',
'upload_date': '20140605', 'upload_date': '20140605',
'thumbnail': 'https://i.ytimg.com/vi/HtVdAasjOgU/maxresdefault.jpg', 'thumbnail': 'https://i.ytimg.com/vi/HtVdAasjOgU/maxresdefault.jpg',
'age_limit': 18, 'age_limit': 18,
@ -659,7 +716,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': 'md5:bf77e03fcae5529475e500129b05668a', 'description': 'md5:bf77e03fcae5529475e500129b05668a',
'duration': 177, 'duration': 177,
'uploader': 'FlyingKitty', 'uploader': 'FlyingKitty',
'uploader_id': 'FlyingKitty900', 'uploader_id': '@FlyingKitty900',
'upload_date': '20200408', 'upload_date': '20200408',
'thumbnail': 'https://i.ytimg.com/vi/HsUATh_Nc2U/maxresdefault.jpg', 'thumbnail': 'https://i.ytimg.com/vi/HsUATh_Nc2U/maxresdefault.jpg',
'age_limit': 18, 'age_limit': 18,
@ -682,7 +739,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': 'md5:17eccca93a786d51bc67646756894066', 'description': 'md5:17eccca93a786d51bc67646756894066',
'duration': 106, 'duration': 106,
'uploader': 'Projekt Melody', 'uploader': 'Projekt Melody',
'uploader_id': 'UC1yoRdFoFJaCY-AGfD9W0wQ', 'uploader_id': '@ProjektMelody',
'upload_date': '20191227', 'upload_date': '20191227',
'age_limit': 18, 'age_limit': 18,
'thumbnail': 'https://i.ytimg.com/vi/Tq92D6wQ1mg/sddefault.jpg', 'thumbnail': 'https://i.ytimg.com/vi/Tq92D6wQ1mg/sddefault.jpg',
@ -704,10 +761,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'title': 'OOMPH! - Such Mich Find Mich (Lyrics)', 'title': 'OOMPH! - Such Mich Find Mich (Lyrics)',
'description': 'Fan Video. Music & Lyrics by OOMPH!.', 'description': 'Fan Video. Music & Lyrics by OOMPH!.',
'duration': 210, 'duration': 210,
'uploader': 'Herr Lurik',
'uploader_id': 'st3in234',
'upload_date': '20130730', 'upload_date': '20130730',
'uploader_url': 'http://www.youtube.com/user/st3in234', 'uploader': 'Herr Lurik',
'uploader_id': '@HerrLurik',
'uploader_url': 'http://www.youtube.com/@HerrLurik',
'age_limit': 0, 'age_limit': 0,
'thumbnail': 'https://i.ytimg.com/vi/MeJVWBSsPAY/hqdefault.jpg', 'thumbnail': 'https://i.ytimg.com/vi/MeJVWBSsPAY/hqdefault.jpg',
'tags': ['oomph', 'such mich find mich', 'lyrics', 'german industrial', 'musica industrial'], 'tags': ['oomph', 'such mich find mich', 'lyrics', 'german industrial', 'musica industrial'],
@ -740,8 +797,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'duration': 266, 'duration': 266,
'upload_date': '20100430', 'upload_date': '20100430',
'uploader_id': 'deadmau5', 'uploader_id': '@deadmau5',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/deadmau5', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@deadmau5',
'creator': 'deadmau5', 'creator': 'deadmau5',
'description': 'md5:6cbcd3a92ce1bc676fc4d6ab4ace2336', 'description': 'md5:6cbcd3a92ce1bc676fc4d6ab4ace2336',
'uploader': 'deadmau5', 'uploader': 'deadmau5',
@ -762,8 +819,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': r're:(?s)(?:.+\s)?HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games\s*', 'description': r're:(?s)(?:.+\s)?HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games\s*',
'duration': 6085, 'duration': 6085,
'upload_date': '20150827', 'upload_date': '20150827',
'uploader_id': 'olympic', 'uploader_id': '@Olympics',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/olympic', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@Olympics',
'uploader': r're:Olympics?', 'uploader': r're:Olympics?',
'age_limit': 0, 'age_limit': 0,
'thumbnail': 'https://i.ytimg.com/vi/lqQg6PlCWgI/maxresdefault.jpg', 'thumbnail': 'https://i.ytimg.com/vi/lqQg6PlCWgI/maxresdefault.jpg',
@ -785,8 +842,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'stretched_ratio': 16 / 9., 'stretched_ratio': 16 / 9.,
'duration': 85, 'duration': 85,
'upload_date': '20110310', 'upload_date': '20110310',
'uploader_id': 'AllenMeow', 'uploader_id': '@AllenMeow',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/AllenMeow', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@AllenMeow',
'description': 'made by Wacom from Korea | 字幕&加油添醋 by TY\'s Allen | 感謝heylisa00cavey1001同學熱情提供梗及翻譯', 'description': 'made by Wacom from Korea | 字幕&加油添醋 by TY\'s Allen | 感謝heylisa00cavey1001同學熱情提供梗及翻譯',
'uploader': '孫ᄋᄅ', 'uploader': '孫ᄋᄅ',
'title': '[A-made] 變態妍字幕版 太妍 我就是這樣的人', 'title': '[A-made] 變態妍字幕版 太妍 我就是這樣的人',
@ -824,7 +881,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'dorappi2000', 'uploader': 'dorappi2000',
'formats': 'mincount:31', 'formats': 'mincount:31',
}, },
'skip': 'not actual anymore', 'skip': 'not actual any more',
}, },
# DASH manifest with segment_list # DASH manifest with segment_list
{ {
@ -905,6 +962,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Not multifeed any more',
}, },
{ {
# Multifeed video with comma in title (see https://github.com/ytdl-org/youtube-dl/issues/8536) # Multifeed video with comma in title (see https://github.com/ytdl-org/youtube-dl/issues/8536)
@ -914,7 +972,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'title': 'DevConf.cz 2016 Day 2 Workshops 1 14:00 - 15:30', 'title': 'DevConf.cz 2016 Day 2 Workshops 1 14:00 - 15:30',
}, },
'playlist_count': 2, 'playlist_count': 2,
'skip': 'Not multifeed anymore', 'skip': 'Not multifeed any more',
}, },
{ {
'url': 'https://vid.plus/FlRa-iH7PGw', 'url': 'https://vid.plus/FlRa-iH7PGw',
@ -938,8 +996,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a', 'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a',
'duration': 133, 'duration': 133,
'upload_date': '20151119', 'upload_date': '20151119',
'uploader_id': 'IronSoulElf', 'uploader_id': '@IronSoulElf',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IronSoulElf', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@IronSoulElf',
'uploader': 'IronSoulElf', 'uploader': 'IronSoulElf',
'creator': r're:Todd Haberman[;,]\s+Daniel Law Heath and Aaron Kaplan', 'creator': r're:Todd Haberman[;,]\s+Daniel Law Heath and Aaron Kaplan',
'track': 'Dark Walk', 'track': 'Dark Walk',
@ -987,8 +1045,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': 'md5:a677553cf0840649b731a3024aeff4cc', 'description': 'md5:a677553cf0840649b731a3024aeff4cc',
'duration': 721, 'duration': 721,
'upload_date': '20150127', 'upload_date': '20150127',
'uploader_id': 'BerkmanCenter', 'uploader_id': '@BKCHarvard',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/BerkmanCenter', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@BKCHarvard',
'uploader': 'The Berkman Klein Center for Internet & Society', 'uploader': 'The Berkman Klein Center for Internet & Society',
'license': 'Creative Commons Attribution license (reuse allowed)', 'license': 'Creative Commons Attribution license (reuse allowed)',
}, },
@ -1007,8 +1065,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'duration': 4060, 'duration': 4060,
'upload_date': '20151119', 'upload_date': '20151119',
'uploader': 'Bernie Sanders', 'uploader': 'Bernie Sanders',
'uploader_id': 'UCH1dpzjCEiGAt8CXkryhkZg', 'uploader_id': '@BernieSanders',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCH1dpzjCEiGAt8CXkryhkZg', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@BernieSanders',
'license': 'Creative Commons Attribution license (reuse allowed)', 'license': 'Creative Commons Attribution license (reuse allowed)',
}, },
'params': { 'params': {
@ -1054,8 +1112,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'duration': 2085, 'duration': 2085,
'upload_date': '20170118', 'upload_date': '20170118',
'uploader': 'Vsauce', 'uploader': 'Vsauce',
'uploader_id': 'Vsauce', 'uploader_id': '@Vsauce',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Vsauce', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@Vsauce',
'series': 'Mind Field', 'series': 'Mind Field',
'season_number': 1, 'season_number': 1,
'episode_number': 1, 'episode_number': 1,
@ -1134,7 +1192,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'skip_download': True, 'skip_download': True,
'youtube_include_dash_manifest': False, 'youtube_include_dash_manifest': False,
}, },
'skip': 'not actual anymore', 'skip': 'not actual any more',
}, },
{ {
# Youtube Music Auto-generated description # Youtube Music Auto-generated description
@ -1191,8 +1249,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'title': 'IMG 3456', 'title': 'IMG 3456',
'description': '', 'description': '',
'upload_date': '20170613', 'upload_date': '20170613',
'uploader_id': 'ElevageOrVert',
'uploader': 'ElevageOrVert', 'uploader': 'ElevageOrVert',
'uploader_id': '@ElevageOrVert',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1210,8 +1268,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'title': 'Part 77 Sort a list of simple types in c#', 'title': 'Part 77 Sort a list of simple types in c#',
'description': 'md5:b8746fa52e10cdbf47997903f13b20dc', 'description': 'md5:b8746fa52e10cdbf47997903f13b20dc',
'upload_date': '20130831', 'upload_date': '20130831',
'uploader_id': 'kudvenkat',
'uploader': 'kudvenkat', 'uploader': 'kudvenkat',
'uploader_id': '@Csharp-video-tutorialsBlogspot',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1263,8 +1321,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': 'md5:ea770e474b7cd6722b4c95b833c03630', 'description': 'md5:ea770e474b7cd6722b4c95b833c03630',
'upload_date': '20201120', 'upload_date': '20201120',
'uploader': 'Walk around Japan', 'uploader': 'Walk around Japan',
'uploader_id': 'UC3o_t8PzBmXf5S9b7GLx1Mw', 'uploader_id': '@walkaroundjapan7124',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UC3o_t8PzBmXf5S9b7GLx1Mw', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@walkaroundjapan7124',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1276,11 +1334,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'info_dict': { 'info_dict': {
'id': '4L2J27mJ3Dc', 'id': '4L2J27mJ3Dc',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Midwest Squid Game #Shorts',
'description': 'md5:976512b8a29269b93bbd8a61edc45a6d',
'upload_date': '20211025', 'upload_date': '20211025',
'uploader': 'Charlie Berens', 'uploader': 'Charlie Berens',
'description': 'md5:976512b8a29269b93bbd8a61edc45a6d', 'uploader_id': '@CharlieBerens',
'uploader_id': 'fivedlrmilkshake',
'title': 'Midwest Squid Game #Shorts',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1636,8 +1694,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if n_response is None: if n_response is None:
# give up if descrambling failed # give up if descrambling failed
break break
fmt['url'] = update_url( for fmt_dct in traverse_obj(fmt, (None, (None, ('fragments', Ellipsis))), expected_type=dict):
parsed_fmt_url, query_update={'n': [n_response]}) fmt_dct['url'] = update_url(
fmt_dct['url'], query_update={'n': [n_response]})
# from yt-dlp, with tweaks # from yt-dlp, with tweaks
def _extract_signature_timestamp(self, video_id, player_url, ytcfg=None, fatal=False): def _extract_signature_timestamp(self, video_id, player_url, ytcfg=None, fatal=False):
@ -1989,10 +2048,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if no_video: if no_video:
dct['abr'] = tbr dct['abr'] = tbr
if no_audio or no_video: if no_audio or no_video:
dct['downloader_options'] = { CHUNK_SIZE = 10 << 20
# Youtube throttles chunks >~10M # avoid Youtube throttling
'http_chunk_size': 10485760, dct.update({
} 'protocol': 'http_dash_segments',
'fragments': [{
'url': update_url_query(dct['url'], {
'range': '{0}-{1}'.format(range_start, min(range_start + CHUNK_SIZE - 1, dct['filesize']))
})
} for range_start in range(0, dct['filesize'], CHUNK_SIZE)]
} if dct['filesize'] else {
'downloader_options': {'http_chunk_size': CHUNK_SIZE} # No longer useful?
})
if dct.get('ext'): if dct.get('ext'):
dct['container'] = dct['ext'] + '_dash' dct['container'] = dct['ext'] + '_dash'
formats.append(dct) formats.append(dct)
@ -2088,25 +2156,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
thumbnails = [{'url': thumbnail}] thumbnails = [{'url': thumbnail}]
category = microformat.get('category') or search_meta('genre') category = microformat.get('category') or search_meta('genre')
channel_id = video_details.get('channelId') \ channel_id = self._extract_channel_id(
or microformat.get('externalChannelId') \ webpage, videodetails=video_details, metadata=microformat)
or search_meta('channelId')
duration = int_or_none( duration = int_or_none(
video_details.get('lengthSeconds') video_details.get('lengthSeconds')
or microformat.get('lengthSeconds')) \ or microformat.get('lengthSeconds')) \
or parse_duration(search_meta('duration')) or parse_duration(search_meta('duration'))
is_live = video_details.get('isLive') is_live = video_details.get('isLive')
def gen_owner_profile_url(): owner_profile_url = self._yt_urljoin(self._extract_author_var(
yield microformat.get('ownerProfileUrl') webpage, 'url', videodetails=video_details, metadata=microformat))
yield extract_attributes(self._search_regex(
r'''(?s)(<link\b[^>]+\bitemprop\s*=\s*("|')url\2[^>]*>)''',
get_element_by_attribute('itemprop', 'author', webpage),
'owner_profile_url', default='')).get('href')
owner_profile_url = next( uploader = self._extract_author_var(
(x for x in map(url_or_none, gen_owner_profile_url()) if x), webpage, 'name', videodetails=video_details, metadata=microformat)
None)
if not player_url: if not player_url:
player_url = self._extract_player_url(webpage) player_url = self._extract_player_url(webpage)
@ -2121,11 +2183,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'upload_date': unified_strdate( 'upload_date': unified_strdate(
microformat.get('uploadDate') microformat.get('uploadDate')
or search_meta('uploadDate')), or search_meta('uploadDate')),
'uploader': video_details['author'], 'uploader': uploader,
'uploader_id': self._search_regex(r'/(?:channel|user)/([^/?&#]+)', owner_profile_url, 'uploader id') if owner_profile_url else None,
'uploader_url': owner_profile_url,
'channel_id': channel_id, 'channel_id': channel_id,
'channel_url': 'https://www.youtube.com/channel/' + channel_id if channel_id else None,
'duration': duration, 'duration': duration,
'view_count': int_or_none( 'view_count': int_or_none(
video_details.get('viewCount') video_details.get('viewCount')
@ -2255,6 +2314,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
initial_data, initial_data,
lambda x: x['contents']['twoColumnWatchNextResults']['results']['results']['contents'], lambda x: x['contents']['twoColumnWatchNextResults']['results']['results']['contents'],
list) or [] list) or []
if not info['channel_id']:
channel_id = self._extract_channel_id('', renderers=contents)
if not info['uploader']:
info['uploader'] = self._extract_author_var('', 'name', renderers=contents)
if not owner_profile_url:
owner_profile_url = self._yt_urljoin(self._extract_author_var('', 'url', renderers=contents))
for content in contents: for content in contents:
vpir = content.get('videoPrimaryInfoRenderer') vpir = content.get('videoPrimaryInfoRenderer')
if vpir: if vpir:
@ -2302,10 +2368,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}) })
vsir = content.get('videoSecondaryInfoRenderer') vsir = content.get('videoSecondaryInfoRenderer')
if vsir: if vsir:
info['channel'] = get_text(try_get(
vsir,
lambda x: x['owner']['videoOwnerRenderer']['title'],
dict))
rows = try_get( rows = try_get(
vsir, vsir,
lambda x: x['metadataRowContainer']['metadataRowContainerRenderer']['rows'], lambda x: x['metadataRowContainer']['metadataRowContainerRenderer']['rows'],
@ -2363,7 +2425,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self.mark_watched(video_id, player_response) self.mark_watched(video_id, player_response)
return info return merge_dicts(
info, {
'uploader_id': self._extract_uploader_id(owner_profile_url),
'uploader_url': owner_profile_url,
'channel_id': channel_id,
'channel_url': channel_id and self._yt_urljoin('/channel/' + channel_id),
'channel': info['uploader'],
})
class YoutubeTabIE(YoutubeBaseInfoExtractor): class YoutubeTabIE(YoutubeBaseInfoExtractor):
@ -2392,6 +2461,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'description': 'Short clips from Super Cooper Sundays!', 'description': 'Short clips from Super Cooper Sundays!',
'id': 'UCKMA8kHZ8bPYpnMNaUSxfEQ', 'id': 'UCKMA8kHZ8bPYpnMNaUSxfEQ',
'title': 'Super Cooper Shorts - Shorts', 'title': 'Super Cooper Shorts - Shorts',
'uploader': 'Super Cooper Shorts',
'uploader_id': '@SuperCooperShorts',
} }
}, { }, {
# Channel that does not have a Shorts tab. Test should just download videos on Home tab instead # Channel that does not have a Shorts tab. Test should just download videos on Home tab instead
@ -2402,14 +2473,17 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'title': 'Emergency Awesome - Home', 'title': 'Emergency Awesome - Home',
}, },
'playlist_mincount': 5, 'playlist_mincount': 5,
'skip': 'new test page needed to replace `Emergency Awesome - Shorts`',
}, { }, {
# playlists, multipage # playlists, multipage
'url': 'https://www.youtube.com/c/ИгорьКлейнер/playlists?view=1&flow=grid', 'url': 'https://www.youtube.com/c/ИгорьКлейнер/playlists?view=1&flow=grid',
'playlist_mincount': 94, 'playlist_mincount': 94,
'info_dict': { 'info_dict': {
'id': 'UCqj7Cz7revf5maW9g5pgNcg', 'id': 'UCqj7Cz7revf5maW9g5pgNcg',
'title': 'Игорь Клейнер - Playlists', 'title': 'Igor Kleiner - Playlists',
'description': 'md5:be97ee0f14ee314f1f002cf187166ee2', 'description': 'md5:be97ee0f14ee314f1f002cf187166ee2',
'uploader': 'Igor Kleiner',
'uploader_id': '@IgorDataScience',
}, },
}, { }, {
# playlists, multipage, different order # playlists, multipage, different order
@ -2417,8 +2491,10 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'playlist_mincount': 94, 'playlist_mincount': 94,
'info_dict': { 'info_dict': {
'id': 'UCqj7Cz7revf5maW9g5pgNcg', 'id': 'UCqj7Cz7revf5maW9g5pgNcg',
'title': 'Игорь Клейнер - Playlists', 'title': 'Igor Kleiner - Playlists',
'description': 'md5:be97ee0f14ee314f1f002cf187166ee2', 'description': 'md5:be97ee0f14ee314f1f002cf187166ee2',
'uploader': 'Igor Kleiner',
'uploader_id': '@IgorDataScience',
}, },
}, { }, {
# playlists, series # playlists, series
@ -2428,6 +2504,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'id': 'UCYO_jab_esuFRV4b17AJtAw', 'id': 'UCYO_jab_esuFRV4b17AJtAw',
'title': '3Blue1Brown - Playlists', 'title': '3Blue1Brown - Playlists',
'description': 'md5:e1384e8a133307dd10edee76e875d62f', 'description': 'md5:e1384e8a133307dd10edee76e875d62f',
'uploader': '3Blue1Brown',
'uploader_id': '@3blue1brown',
}, },
}, { }, {
# playlists, singlepage # playlists, singlepage
@ -2437,6 +2515,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'id': 'UCAEtajcuhQ6an9WEzY9LEMQ', 'id': 'UCAEtajcuhQ6an9WEzY9LEMQ',
'title': 'ThirstForScience - Playlists', 'title': 'ThirstForScience - Playlists',
'description': 'md5:609399d937ea957b0f53cbffb747a14c', 'description': 'md5:609399d937ea957b0f53cbffb747a14c',
'uploader': 'ThirstForScience',
'uploader_id': '@ThirstForScience',
} }
}, { }, {
'url': 'https://www.youtube.com/c/ChristophLaimer/playlists', 'url': 'https://www.youtube.com/c/ChristophLaimer/playlists',
@ -2445,20 +2525,22 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
# basic, single video playlist # basic, single video playlist
'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc', 'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc',
'info_dict': { 'info_dict': {
'uploader_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
'uploader': 'Sergey M.',
'id': 'PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc', 'id': 'PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc',
'title': 'youtube-dl public playlist', 'title': 'youtube-dl public playlist',
'uploader': 'Sergey M.',
'uploader_id': '@sergeym.6173',
'channel_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
}, },
'playlist_count': 1, 'playlist_count': 1,
}, { }, {
# empty playlist # empty playlist
'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf', 'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf',
'info_dict': { 'info_dict': {
'uploader_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
'uploader': 'Sergey M.',
'id': 'PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf', 'id': 'PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf',
'title': 'youtube-dl empty playlist', 'title': 'youtube-dl empty playlist',
'uploader': 'Sergey M.',
'uploader_id': '@sergeym.6173',
'channel_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
}, },
'playlist_count': 0, 'playlist_count': 0,
}, { }, {
@ -2468,6 +2550,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w', 'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
'title': 'lex will - Home', 'title': 'lex will - Home',
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488', 'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
'uploader': 'lex will',
'uploader_id': '@lexwill718',
}, },
'playlist_mincount': 2, 'playlist_mincount': 2,
}, { }, {
@ -2477,6 +2561,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w', 'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
'title': 'lex will - Videos', 'title': 'lex will - Videos',
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488', 'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
'uploader': 'lex will',
'uploader_id': '@lexwill718',
}, },
'playlist_mincount': 975, 'playlist_mincount': 975,
}, { }, {
@ -2486,6 +2572,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w', 'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
'title': 'lex will - Videos', 'title': 'lex will - Videos',
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488', 'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
'uploader': 'lex will',
'uploader_id': '@lexwill718',
}, },
'playlist_mincount': 199, 'playlist_mincount': 199,
}, { }, {
@ -2495,6 +2583,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w', 'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
'title': 'lex will - Playlists', 'title': 'lex will - Playlists',
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488', 'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
'uploader': 'lex will',
'uploader_id': '@lexwill718',
}, },
'playlist_mincount': 17, 'playlist_mincount': 17,
}, { }, {
@ -2504,6 +2594,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w', 'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
'title': 'lex will - Community', 'title': 'lex will - Community',
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488', 'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
'uploader': 'lex will',
'uploader_id': '@lexwill718',
}, },
'playlist_mincount': 18, 'playlist_mincount': 18,
}, { }, {
@ -2513,8 +2605,10 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w', 'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
'title': 'lex will - Channels', 'title': 'lex will - Channels',
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488', 'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
'uploader': 'lex will',
'uploader_id': '@lexwill718',
}, },
'playlist_mincount': 138, 'playlist_mincount': 75,
}, { }, {
'url': 'https://invidio.us/channel/UCmlqkdCBesrv2Lak1mF_MxA', 'url': 'https://invidio.us/channel/UCmlqkdCBesrv2Lak1mF_MxA',
'only_matching': True, 'only_matching': True,
@ -2531,7 +2625,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'title': '29C3: Not my department', 'title': '29C3: Not my department',
'id': 'PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC', 'id': 'PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
'uploader': 'Christiaan008', 'uploader': 'Christiaan008',
'uploader_id': 'UCEPzS1rYsrkqzSLNp76nrcg', 'uploader_id': '@ChRiStIaAn008',
'channel_id': 'UCEPzS1rYsrkqzSLNp76nrcg',
}, },
'playlist_count': 96, 'playlist_count': 96,
}, { }, {
@ -2541,7 +2636,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'title': 'Uploads from Cauchemar', 'title': 'Uploads from Cauchemar',
'id': 'UUBABnxM4Ar9ten8Mdjj1j0Q', 'id': 'UUBABnxM4Ar9ten8Mdjj1j0Q',
'uploader': 'Cauchemar', 'uploader': 'Cauchemar',
'uploader_id': 'UCBABnxM4Ar9ten8Mdjj1j0Q', 'uploader_id': '@Cauchemar89',
'channel_id': 'UCBABnxM4Ar9ten8Mdjj1j0Q',
}, },
'playlist_mincount': 1123, 'playlist_mincount': 1123,
}, { }, {
@ -2555,7 +2651,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'title': 'Uploads from Interstellar Movie', 'title': 'Uploads from Interstellar Movie',
'id': 'UUXw-G3eDE9trcvY2sBMM_aA', 'id': 'UUXw-G3eDE9trcvY2sBMM_aA',
'uploader': 'Interstellar Movie', 'uploader': 'Interstellar Movie',
'uploader_id': 'UCXw-G3eDE9trcvY2sBMM_aA', 'uploader_id': '@InterstellarMovie',
'channel_id': 'UCXw-G3eDE9trcvY2sBMM_aA',
}, },
'playlist_mincount': 21, 'playlist_mincount': 21,
}, { }, {
@ -2564,8 +2661,9 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'info_dict': { 'info_dict': {
'title': 'Data Analysis with Dr Mike Pound', 'title': 'Data Analysis with Dr Mike Pound',
'id': 'PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba', 'id': 'PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba',
'uploader_id': 'UC9-y-6csu5WGm29I7JiwpnA',
'uploader': 'Computerphile', 'uploader': 'Computerphile',
'uploader_id': '@Computerphile',
'channel_id': 'UC9-y-6csu5WGm29I7JiwpnA',
}, },
'playlist_mincount': 11, 'playlist_mincount': 11,
}, { }, {
@ -2603,14 +2701,14 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
}, { }, {
'url': 'https://www.youtube.com/channel/UCoMdktPbSTixAyNGwb-UYkQ/live', 'url': 'https://www.youtube.com/channel/UCoMdktPbSTixAyNGwb-UYkQ/live',
'info_dict': { 'info_dict': {
'id': '9Auq9mYxFEE', 'id': r're:[\da-zA-Z_-]{8,}',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Watch Sky News live', 'title': r're:(?s)[A-Z].{20,}',
'uploader': 'Sky News', 'uploader': 'Sky News',
'uploader_id': 'skynews', 'uploader_id': '@SkyNews',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/skynews', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@SkyNews',
'upload_date': '20191102', 'upload_date': r're:\d{8}',
'description': 'md5:78de4e1c2359d0ea3ed829678e38b662', 'description': r're:(?s)(?:.*\n)+SUBSCRIBE to our YouTube channel for more videos: http://www\.youtube\.com/skynews *\n.*',
'categories': ['News & Politics'], 'categories': ['News & Politics'],
'tags': list, 'tags': list,
'like_count': int, 'like_count': int,
@ -2699,34 +2797,22 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
}, { }, {
'note': 'Search tab', 'note': 'Search tab',
'url': 'https://www.youtube.com/c/3blue1brown/search?query=linear%20algebra', 'url': 'https://www.youtube.com/c/3blue1brown/search?query=linear%20algebra',
'playlist_mincount': 40, 'playlist_mincount': 20,
'info_dict': { 'info_dict': {
'id': 'UCYO_jab_esuFRV4b17AJtAw', 'id': 'UCYO_jab_esuFRV4b17AJtAw',
'title': '3Blue1Brown - Search - linear algebra', 'title': '3Blue1Brown - Search - linear algebra',
'description': 'md5:e1384e8a133307dd10edee76e875d62f', 'description': 'md5:e1384e8a133307dd10edee76e875d62f',
'uploader': '3Blue1Brown', 'uploader': '3Blue1Brown',
'uploader_id': 'UCYO_jab_esuFRV4b17AJtAw', 'uploader_id': '@3blue1brown',
'channel_id': 'UCYO_jab_esuFRV4b17AJtAw',
} }
}] }]
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return False if YoutubeIE.suitable(url) else super( return not YoutubeIE.suitable(url) and super(
YoutubeTabIE, cls).suitable(url) YoutubeTabIE, cls).suitable(url)
def _extract_channel_id(self, webpage):
channel_id = self._html_search_meta(
'channelId', webpage, 'channel id', default=None)
if channel_id:
return channel_id
channel_url = self._html_search_meta(
('og:url', 'al:ios:url', 'al:android:url', 'al:web:url',
'twitter:url', 'twitter:app:url:iphone', 'twitter:app:url:ipad',
'twitter:app:url:googleplay'), webpage, 'channel url')
return self._search_regex(
r'https?://(?:www\.)?youtube\.com/channel/([^/?#&])+',
channel_url, 'channel id')
@staticmethod @staticmethod
def _extract_grid_item_renderer(item): def _extract_grid_item_renderer(item):
assert isinstance(item, dict) assert isinstance(item, dict)
@ -3114,27 +3200,18 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
else: else:
raise ExtractorError('Unable to find selected tab') raise ExtractorError('Unable to find selected tab')
@staticmethod def _extract_uploader(self, metadata, data):
def _extract_uploader(data):
uploader = {} uploader = {}
sidebar_renderer = try_get( renderers = traverse_obj(data,
data, lambda x: x['sidebar']['playlistSidebarRenderer']['items'], list) ('sidebar', 'playlistSidebarRenderer', 'items'))
if sidebar_renderer: uploader['channel_id'] = self._extract_channel_id('', metadata=metadata, renderers=renderers)
for item in sidebar_renderer: uploader['uploader'] = (
if not isinstance(item, dict): self._extract_author_var('', 'name', renderers=renderers)
continue or self._extract_author_var('', 'name', metadata=metadata))
renderer = item.get('playlistSidebarSecondaryInfoRenderer') uploader['uploader_url'] = self._yt_urljoin(
if not isinstance(renderer, dict): self._extract_author_var('', 'url', metadata=metadata, renderers=renderers))
continue uploader['uploader_id'] = self._extract_uploader_id(uploader['uploader_url'])
owner = try_get( uploader['channel'] = uploader['uploader']
renderer, lambda x: x['videoOwner']['videoOwnerRenderer']['title']['runs'][0], dict)
if owner:
uploader['uploader'] = owner.get('text')
uploader['uploader_id'] = try_get(
owner, lambda x: x['navigationEndpoint']['browseEndpoint']['browseId'], compat_str)
uploader['uploader_url'] = urljoin(
'https://www.youtube.com/',
try_get(owner, lambda x: x['navigationEndpoint']['browseEndpoint']['canonicalBaseUrl'], compat_str))
return uploader return uploader
@staticmethod @staticmethod
@ -3185,8 +3262,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
self._entries(selected_tab, item_id, webpage), self._entries(selected_tab, item_id, webpage),
playlist_id=playlist_id, playlist_title=title, playlist_id=playlist_id, playlist_title=title,
playlist_description=description) playlist_description=description)
playlist.update(self._extract_uploader(data)) return merge_dicts(playlist, self._extract_uploader(renderer, data))
return playlist
def _extract_from_playlist(self, item_id, url, data, playlist): def _extract_from_playlist(self, item_id, url, data, playlist):
title = playlist.get('title') or try_get( title = playlist.get('title') or try_get(
@ -3273,8 +3349,9 @@ class YoutubePlaylistIE(InfoExtractor):
'info_dict': { 'info_dict': {
'title': '[OLD]Team Fortress 2 (Class-based LP)', 'title': '[OLD]Team Fortress 2 (Class-based LP)',
'id': 'PLBB231211A4F62143', 'id': 'PLBB231211A4F62143',
'uploader': 'Wickydoo', 'uploader': 'Wickman',
'uploader_id': 'UCKSpbfbl5kRQpTdL7kMc-1Q', 'uploader_id': '@WickmanVT',
'channel_id': 'UCKSpbfbl5kRQpTdL7kMc-1Q',
}, },
'playlist_mincount': 29, 'playlist_mincount': 29,
}, { }, {
@ -3288,21 +3365,25 @@ class YoutubePlaylistIE(InfoExtractor):
}, { }, {
'note': 'embedded', 'note': 'embedded',
'url': 'https://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu', 'url': 'https://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
'playlist_count': 4, # TODO: full playlist requires _reload_with_unavailable_videos()
# 'playlist_count': 4,
'playlist_mincount': 1,
'info_dict': { 'info_dict': {
'title': 'JODA15', 'title': 'JODA15',
'id': 'PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu', 'id': 'PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
'uploader': 'milan', 'uploader': 'milan',
'uploader_id': 'UCEI1-PVPcYXjB73Hfelbmaw', 'uploader_id': '@milan5503',
'channel_id': 'UCEI1-PVPcYXjB73Hfelbmaw',
} }
}, { }, {
'url': 'http://www.youtube.com/embed/_xDOZElKyNU?list=PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl', 'url': 'http://www.youtube.com/embed/_xDOZElKyNU?list=PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
'playlist_mincount': 982, 'playlist_mincount': 455,
'info_dict': { 'info_dict': {
'title': '2018 Chinese New Singles (11/6 updated)', 'title': '2018 Chinese New Singles (11/6 updated)',
'id': 'PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl', 'id': 'PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
'uploader': 'LBK', 'uploader': 'LBK',
'uploader_id': 'UC21nz3_MesPLqtDqwdvnoxA', 'uploader_id': '@music_king',
'channel_id': 'UC21nz3_MesPLqtDqwdvnoxA',
} }
}, { }, {
'url': 'TLGGrESM50VT6acwMjAyMjAxNw', 'url': 'TLGGrESM50VT6acwMjAyMjAxNw',
@ -3340,8 +3421,8 @@ class YoutubeYtBeIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Small Scale Baler and Braiding Rugs', 'title': 'Small Scale Baler and Braiding Rugs',
'uploader': 'Backus-Page House Museum', 'uploader': 'Backus-Page House Museum',
'uploader_id': 'backuspagemuseum', 'uploader_id': '@backuspagemuseum',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum', 'uploader_url': r're:https?://(?:www\.)?youtube\.com/@backuspagemuseum',
'upload_date': '20161008', 'upload_date': '20161008',
'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a', 'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
'categories': ['Nonprofits & Activism'], 'categories': ['Nonprofits & Activism'],

View file

@ -12,9 +12,11 @@ from .utils import (
js_to_json, js_to_json,
remove_quotes, remove_quotes,
unified_timestamp, unified_timestamp,
variadic,
) )
from .compat import ( from .compat import (
compat_basestring, compat_basestring,
compat_chr,
compat_collections_chain_map as ChainMap, compat_collections_chain_map as ChainMap,
compat_itertools_zip_longest as zip_longest, compat_itertools_zip_longest as zip_longest,
compat_str, compat_str,
@ -201,14 +203,14 @@ class JSInterpreter(object):
def __init__(self, msg, *args, **kwargs): def __init__(self, msg, *args, **kwargs):
expr = kwargs.pop('expr', None) expr = kwargs.pop('expr', None)
if expr is not None: if expr is not None:
msg = '{0} in: {1!r}'.format(msg.rstrip(), expr[:100]) msg = '{0} in: {1!r:.100}'.format(msg.rstrip(), expr)
super(JSInterpreter.Exception, self).__init__(msg, *args, **kwargs) super(JSInterpreter.Exception, self).__init__(msg, *args, **kwargs)
class JS_RegExp(object): class JS_RegExp(object):
_RE_FLAGS = { RE_FLAGS = {
# special knowledge: Python's re flags are bitmask values, current max 128 # special knowledge: Python's re flags are bitmask values, current max 128
# invent new bitmask values well above that for literal parsing # invent new bitmask values well above that for literal parsing
# TODO: new pattern class to execute matches with these flags # TODO: execute matches with these flags (remaining: d, y)
'd': 1024, # Generate indices for substring matches 'd': 1024, # Generate indices for substring matches
'g': 2048, # Global search 'g': 2048, # Global search
'i': re.I, # Case-insensitive search 'i': re.I, # Case-insensitive search
@ -218,12 +220,19 @@ class JSInterpreter(object):
'y': 4096, # Perform a "sticky" search that matches starting at the current position in the target string 'y': 4096, # Perform a "sticky" search that matches starting at the current position in the target string
} }
def __init__(self, pattern_txt, flags=''): def __init__(self, pattern_txt, flags=0):
if isinstance(flags, compat_str): if isinstance(flags, compat_str):
flags, _ = self.regex_flags(flags) flags, _ = self.regex_flags(flags)
# Thx: https://stackoverflow.com/questions/44773522/setattr-on-python2-sre-sre-pattern
# First, avoid https://github.com/python/cpython/issues/74534 # First, avoid https://github.com/python/cpython/issues/74534
self.__self = re.compile(pattern_txt.replace('[[', r'[\['), flags) self.__self = None
self.__pattern_txt = pattern_txt.replace('[[', r'[\[')
self.__flags = flags
def __instantiate(self):
if self.__self:
return
self.__self = re.compile(self.__pattern_txt, self.__flags)
# Thx: https://stackoverflow.com/questions/44773522/setattr-on-python2-sre-sre-pattern
for name in dir(self.__self): for name in dir(self.__self):
# Only these? Obviously __class__, __init__. # Only these? Obviously __class__, __init__.
# PyPy creates a __weakref__ attribute with value None # PyPy creates a __weakref__ attribute with value None
@ -232,15 +241,21 @@ class JSInterpreter(object):
continue continue
setattr(self, name, getattr(self.__self, name)) setattr(self, name, getattr(self.__self, name))
def __getattr__(self, name):
self.__instantiate()
if hasattr(self, name):
return getattr(self, name)
return super(JSInterpreter.JS_RegExp, self).__getattr__(name)
@classmethod @classmethod
def regex_flags(cls, expr): def regex_flags(cls, expr):
flags = 0 flags = 0
if not expr: if not expr:
return flags, expr return flags, expr
for idx, ch in enumerate(expr): for idx, ch in enumerate(expr):
if ch not in cls._RE_FLAGS: if ch not in cls.RE_FLAGS:
break break
flags |= cls._RE_FLAGS[ch] flags |= cls.RE_FLAGS[ch]
return flags, expr[idx + 1:] return flags, expr[idx + 1:]
@classmethod @classmethod
@ -262,20 +277,20 @@ class JSInterpreter(object):
if not expr: if not expr:
return return
# collections.Counter() is ~10% slower in both 2.7 and 3.9 # collections.Counter() is ~10% slower in both 2.7 and 3.9
counters = {k: 0 for k in _MATCHING_PARENS.values()} counters = dict((k, 0) for k in _MATCHING_PARENS.values())
start, splits, pos, delim_len = 0, 0, 0, len(delim) - 1 start, splits, pos, delim_len = 0, 0, 0, len(delim) - 1
in_quote, escaping, skipping = None, False, 0 in_quote, escaping, skipping = None, False, 0
after_op, in_regex_char_group, skip_re = True, False, 0 after_op, in_regex_char_group = True, False
for idx, char in enumerate(expr): for idx, char in enumerate(expr):
if skip_re > 0: paren_delta = 0
skip_re -= 1
continue
if not in_quote: if not in_quote:
if char in _MATCHING_PARENS: if char in _MATCHING_PARENS:
counters[_MATCHING_PARENS[char]] += 1 counters[_MATCHING_PARENS[char]] += 1
paren_delta = 1
elif char in counters: elif char in counters:
counters[char] -= 1 counters[char] -= 1
paren_delta = -1
if not escaping: if not escaping:
if char in _QUOTES and in_quote in (char, None): if char in _QUOTES and in_quote in (char, None):
if in_quote or after_op or char != '/': if in_quote or after_op or char != '/':
@ -283,7 +298,7 @@ class JSInterpreter(object):
elif in_quote == '/' and char in '[]': elif in_quote == '/' and char in '[]':
in_regex_char_group = char == '[' in_regex_char_group = char == '['
escaping = not escaping and in_quote and char == '\\' escaping = not escaping and in_quote and char == '\\'
after_op = not in_quote and (char in cls.OP_CHARS or (char.isspace() and after_op)) after_op = not in_quote and (char in cls.OP_CHARS or paren_delta > 0 or (after_op and char.isspace()))
if char != delim[pos] or any(counters.values()) or in_quote: if char != delim[pos] or any(counters.values()) or in_quote:
pos = skipping = 0 pos = skipping = 0
@ -293,7 +308,7 @@ class JSInterpreter(object):
continue continue
elif pos == 0 and skip_delims: elif pos == 0 and skip_delims:
here = expr[idx:] here = expr[idx:]
for s in skip_delims if isinstance(skip_delims, (list, tuple)) else [skip_delims]: for s in variadic(skip_delims):
if here.startswith(s) and s: if here.startswith(s) and s:
skipping = len(s) - 1 skipping = len(s) - 1
break break
@ -316,7 +331,7 @@ class JSInterpreter(object):
separated = list(cls._separate(expr, delim, 1)) separated = list(cls._separate(expr, delim, 1))
if len(separated) < 2: if len(separated) < 2:
raise cls.Exception('No terminating paren {delim} in {expr}'.format(**locals())) raise cls.Exception('No terminating paren {delim} in {expr!r:.5500}'.format(**locals()))
return separated[0][1:].strip(), separated[1].strip() return separated[0][1:].strip(), separated[1].strip()
@staticmethod @staticmethod
@ -361,6 +376,20 @@ class JSInterpreter(object):
except TypeError: except TypeError:
return self._named_object(namespace, obj) return self._named_object(namespace, obj)
# used below
_VAR_RET_THROW_RE = re.compile(r'''(?x)
(?P<var>(?:var|const|let)\s)|return(?:\s+|(?=["'])|$)|(?P<throw>throw\s+)
''')
_COMPOUND_RE = re.compile(r'''(?x)
(?P<try>try)\s*\{|
(?P<if>if)\s*\(|
(?P<switch>switch)\s*\(|
(?P<for>for)\s*\(|
(?P<while>while)\s*\(
''')
_FINALLY_RE = re.compile(r'finally\s*\{')
_SWITCH_RE = re.compile(r'switch\s*\(')
def interpret_statement(self, stmt, local_vars, allow_recursion=100): def interpret_statement(self, stmt, local_vars, allow_recursion=100):
if allow_recursion < 0: if allow_recursion < 0:
raise self.Exception('Recursion limit reached') raise self.Exception('Recursion limit reached')
@ -375,7 +404,7 @@ class JSInterpreter(object):
if should_return: if should_return:
return ret, should_return return ret, should_return
m = re.match(r'(?P<var>(?:var|const|let)\s)|return(?:\s+|(?=["\'])|$)|(?P<throw>throw\s+)', stmt) m = self._VAR_RET_THROW_RE.match(stmt)
if m: if m:
expr = stmt[len(m.group(0)):].strip() expr = stmt[len(m.group(0)):].strip()
if m.group('throw'): if m.group('throw'):
@ -405,7 +434,7 @@ class JSInterpreter(object):
left, right = self._separate_at_paren(obj[len(klass):]) left, right = self._separate_at_paren(obj[len(klass):])
argvals = self.interpret_iter(left, local_vars, allow_recursion) argvals = self.interpret_iter(left, local_vars, allow_recursion)
expr = konstr(*argvals) expr = konstr(*argvals)
if not expr: if expr is None:
raise self.Exception('Failed to parse {klass} {left!r:.100}'.format(**locals()), expr=expr) raise self.Exception('Failed to parse {klass} {left!r:.100}'.format(**locals()), expr=expr)
expr = self._dump(expr, local_vars) + right expr = self._dump(expr, local_vars) + right
break break
@ -447,13 +476,7 @@ class JSInterpreter(object):
for item in self._separate(inner)]) for item in self._separate(inner)])
expr = name + outer expr = name + outer
m = re.match(r'''(?x) m = self._COMPOUND_RE.match(expr)
(?P<try>try)\s*\{|
(?P<if>if)\s*\(|
(?P<switch>switch)\s*\(|
(?P<for>for)\s*\(|
(?P<while>while)\s*\(
''', expr)
md = m.groupdict() if m else {} md = m.groupdict() if m else {}
if md.get('if'): if md.get('if'):
cndn, expr = self._separate_at_paren(expr[m.end() - 1:]) cndn, expr = self._separate_at_paren(expr[m.end() - 1:])
@ -512,7 +535,7 @@ class JSInterpreter(object):
err = None err = None
pending = self.interpret_statement(sub_expr, catch_vars, allow_recursion) pending = self.interpret_statement(sub_expr, catch_vars, allow_recursion)
m = re.match(r'finally\s*\{', expr) m = self._FINALLY_RE.match(expr)
if m: if m:
sub_expr, expr = self._separate_at_paren(expr[m.end() - 1:]) sub_expr, expr = self._separate_at_paren(expr[m.end() - 1:])
ret, should_abort = self.interpret_statement(sub_expr, local_vars, allow_recursion) ret, should_abort = self.interpret_statement(sub_expr, local_vars, allow_recursion)
@ -531,7 +554,7 @@ class JSInterpreter(object):
if remaining.startswith('{'): if remaining.startswith('{'):
body, expr = self._separate_at_paren(remaining) body, expr = self._separate_at_paren(remaining)
else: else:
switch_m = re.match(r'switch\s*\(', remaining) # FIXME switch_m = self._SWITCH_RE.match(remaining) # FIXME
if switch_m: if switch_m:
switch_val, remaining = self._separate_at_paren(remaining[switch_m.end() - 1:]) switch_val, remaining = self._separate_at_paren(remaining[switch_m.end() - 1:])
body, expr = self._separate_at_paren(remaining, '}') body, expr = self._separate_at_paren(remaining, '}')
@ -699,7 +722,7 @@ class JSInterpreter(object):
""" assert, but without risk of getting optimized out """ """ assert, but without risk of getting optimized out """
if not cndn: if not cndn:
memb = member memb = member
raise self.Exception('{member} {msg}'.format(**locals()), expr=expr) raise self.Exception('{memb} {msg}'.format(**locals()), expr=expr)
def eval_method(): def eval_method():
if (variable, member) == ('console', 'debug'): if (variable, member) == ('console', 'debug'):
@ -735,7 +758,7 @@ class JSInterpreter(object):
if obj == compat_str: if obj == compat_str:
if member == 'fromCharCode': if member == 'fromCharCode':
assertion(argvals, 'takes one or more arguments') assertion(argvals, 'takes one or more arguments')
return ''.join(map(chr, argvals)) return ''.join(map(compat_chr, argvals))
raise self.Exception('Unsupported string method ' + member, expr=expr) raise self.Exception('Unsupported string method ' + member, expr=expr)
elif obj == float: elif obj == float:
if member == 'pow': if member == 'pow':
@ -808,10 +831,17 @@ class JSInterpreter(object):
if idx >= len(obj): if idx >= len(obj):
return None return None
return ord(obj[idx]) return ord(obj[idx])
elif member == 'replace': elif member in ('replace', 'replaceAll'):
assertion(isinstance(obj, compat_str), 'must be applied on a string') assertion(isinstance(obj, compat_str), 'must be applied on a string')
assertion(len(argvals) == 2, 'takes exactly two arguments') assertion(len(argvals) == 2, 'takes exactly two arguments')
return re.sub(argvals[0], argvals[1], obj) # TODO: argvals[1] callable, other Py vs JS edge cases
if isinstance(argvals[0], self.JS_RegExp):
count = 0 if argvals[0].flags & self.JS_RegExp.RE_FLAGS['g'] else 1
assertion(member != 'replaceAll' or count == 0,
'replaceAll must be called with a global RegExp')
return argvals[0].sub(argvals[1], obj, count=count)
count = ('replaceAll', 'replace').index(member)
return re.sub(re.escape(argvals[0]), argvals[1], obj, count=count)
idx = int(member) if isinstance(obj, list) else member idx = int(member) if isinstance(obj, list) else member
return obj[idx](argvals, allow_recursion=allow_recursion) return obj[idx](argvals, allow_recursion=allow_recursion)

View file

@ -2176,11 +2176,11 @@ def sanitize_url(url):
for mistake, fixup in COMMON_TYPOS: for mistake, fixup in COMMON_TYPOS:
if re.match(mistake, url): if re.match(mistake, url):
return re.sub(mistake, fixup, url) return re.sub(mistake, fixup, url)
return escape_url(url) return url
def sanitized_Request(url, *args, **kwargs): def sanitized_Request(url, *args, **kwargs):
return compat_urllib_request.Request(sanitize_url(url), *args, **kwargs) return compat_urllib_request.Request(escape_url(sanitize_url(url)), *args, **kwargs)
def expand_path(s): def expand_path(s):