Skip to content

Commit

Permalink
Merge branch 'ytdl-org:master' into df-sbs-extractor-ovrhaul
Browse files Browse the repository at this point in the history
  • Loading branch information
dirkf authored Oct 8, 2024
2 parents de91fe7 + c509896 commit 3abae00
Show file tree
Hide file tree
Showing 84 changed files with 9,611 additions and 2,989 deletions.
468 changes: 433 additions & 35 deletions .github/workflows/ci.yml

Large diffs are not rendered by default.

113 changes: 101 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.ex
You can also use pip:

sudo -H pip install --upgrade youtube-dl

This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.

macOS users can install youtube-dl with [Homebrew](https://brew.sh/):
Expand Down Expand Up @@ -563,7 +563,7 @@ The basic usage is not to set any template arguments when downloading a single f
- `is_live` (boolean): Whether this video is a live stream or a fixed-length video
- `start_time` (numeric): Time in seconds where the reproduction should start, as specified in the URL
- `end_time` (numeric): Time in seconds where the reproduction should end, as specified in the URL
- `format` (string): A human-readable description of the format
- `format` (string): A human-readable description of the format
- `format_id` (string): Format code specified by `--format`
- `format_note` (string): Additional info about the format
- `width` (numeric): Width of the video
Expand Down Expand Up @@ -675,7 +675,7 @@ The general syntax for format selection is `--format FORMAT` or shorter `-f FORM

**tl;dr:** [navigate me to examples](#format-selection-examples).

The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.

You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download the best quality format of a particular file extension served as a single file, e.g. `-f webm` will download the best quality format with the `webm` extension served as a single file.

Expand Down Expand Up @@ -760,7 +760,7 @@ Videos can be filtered by their upload date using the options `--date`, `--dateb

- Absolute dates: Dates in the format `YYYYMMDD`.
- Relative dates: Dates in the format `(now|today)[+-][0-9](day|week|month|year)(s)?`

Examples:

```bash
Expand Down Expand Up @@ -1000,6 +1000,8 @@ To run the test, simply invoke your favorite test runner, or execute a test file
python test/test_download.py
nosetests

For Python versions 3.6 and later, you can use [pynose](https://pypi.org/project/pynose/) to implement `nosetests`. The original [nose](https://pypi.org/project/nose/) has not been upgraded for 3.10 and later.

See item 6 of [new extractor tutorial](#adding-support-for-a-new-site) for how to run extractor specific test cases.

If you want to create a build of youtube-dl yourself, you'll need
Expand Down Expand Up @@ -1091,7 +1093,7 @@ In any case, thank you very much for your contributions!

## youtube-dl coding conventions

This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
This section introduces guidelines for writing idiomatic, robust and future-proof extractor code.

Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters out of your control and this layout tends to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize dependency on the source's layout and even to make the code foresee potential future changes and be ready for that. This is important because it will allow the extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with a fix incorporated, all the previous versions become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non rolling release distros may never receive an update at all.

Expand All @@ -1114,7 +1116,7 @@ Say you have some source dictionary `meta` that you've fetched as JSON with HTTP
```python
meta = self._download_json(url, video_id)
```

Assume at this point `meta`'s layout is:

```python
Expand Down Expand Up @@ -1158,7 +1160,7 @@ description = self._search_regex(
```

On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.

### Provide fallbacks

When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
Expand Down Expand Up @@ -1206,7 +1208,7 @@ r'(id|ID)=(?P<id>\d+)'
#### Make regular expressions relaxed and flexible

When using regular expressions try to write them fuzzy, relaxed and flexible, skipping insignificant parts that are more likely to change, allowing both single and double quotes for quoted values and so on.

##### Example

Say you need to extract `title` from the following HTML code:
Expand All @@ -1230,7 +1232,7 @@ title = self._search_regex(
webpage, 'title', group='title')
```

Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:
Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:

The code definitely should not look like:

Expand Down Expand Up @@ -1331,27 +1333,114 @@ Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`]

Use `url_or_none` for safe URL processing.

Use `try_get` for safe metadata extraction from parsed JSON.
Use `traverse_obj` for safe metadata extraction from parsed JSON.

Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.

Explore [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.

#### More examples

##### Safely extract optional description from parsed JSON

When processing complex JSON, as often returned by site API requests or stashed in web pages for "hydration", you can use the `traverse_obj()` utility function to handle multiple fallback values and to ensure the expected type of metadata items. The function's docstring defines how the function works: also review usage in the codebase for more examples.

In this example, a text `description`, or `None`, is pulled from the `.result.video[0].summary` member of the parsed JSON `response`, if available.

```python
description = traverse_obj(response, ('result', 'video', 0, 'summary', T(compat_str)))
```
`T(...)` is a shorthand for a set literal; if you hate people who still run Python 2.6, `T(type_or_transformation)` could be written as a set literal `{type_or_transformation}`.

Some extractors use the older and less capable `try_get()` function in the same way.

```python
description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
```

##### Safely extract more optional metadata

In this example, various optional metadata values are extracted from the `.result.video[0]` member of the parsed JSON `response`, which is expected to be a JS object, parsed into a `dict`, with no crash if that isn't so, or if any of the target values are missing or invalid.

```python
video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
video = traverse_obj(response, ('result', 'video', 0, T(dict))) or {}
# formerly:
# video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
description = video.get('summary')
duration = float_or_none(video.get('durationMs'), scale=1000)
view_count = int_or_none(video.get('views'))
```

#### Safely extract nested lists

Suppose you've extracted JSON like this into a Python data structure named `media_json` using, say, the `_download_json()` or `_parse_json()` methods of `InfoExtractor`:
```json
{
"title": "Example video",
"comment": "try extracting this",
"media": [{
"type": "bad",
"size": 320,
"url": "https://some.cdn.site/bad.mp4"
}, {
"type": "streaming",
"url": "https://some.cdn.site/hls.m3u8"
}, {
"type": "super",
"size": 1280,
"url": "https://some.cdn.site/good.webm"
}],
"moreStuff": "more values",
...
}
```

Then extractor code like this can collect the various fields of the JSON:
```python
...
from ..utils import (
determine_ext,
int_or_none,
T,
traverse_obj,
txt_or_none,
url_or_none,
)
...
...
info_dict = {}
# extract title and description if valid and not empty
info_dict.update(traverse_obj(media_json, {
'title': ('title', T(txt_or_none)),
'description': ('comment', T(txt_or_none)),
}))

# extract any recognisable media formats
fmts = []
# traverse into "media" list, extract `dict`s with desired keys
for fmt in traverse_obj(media_json, ('media', Ellipsis, {
'format_id': ('type', T(txt_or_none)),
'url': ('url', T(url_or_none)),
'width': ('size', T(int_or_none)), })):
# bad `fmt` values were `None` and removed
if 'url' not in fmt:
continue
fmt_url = fmt['url'] # known to be valid URL
ext = determine_ext(fmt_url)
if ext == 'm3u8':
fmts.extend(self._extract_m3u8_formats(fmt_url, video_id, 'mp4', fatal=False))
else:
fmt['ext'] = ext
fmts.append(fmt)

# sort, raise if no formats
self._sort_formats(fmts)

info_dict['formats'] = fmts
...
```
The extractor raises an exception rather than random crashes if the JSON structure changes so that no formats are found.

# EMBEDDING YOUTUBE-DL

youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/ytdl-org/youtube-dl/issues/new).
Expand Down
1 change: 1 addition & 0 deletions devscripts/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
# Empty file needed to make devscripts.utils properly importable from outside
11 changes: 7 additions & 4 deletions devscripts/bash-completion.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,12 @@
from os.path import dirname as dirn
import sys

sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))

import youtube_dl
from youtube_dl.compat import compat_open as open

from utils import read_file

BASH_COMPLETION_FILE = "youtube-dl.bash-completion"
BASH_COMPLETION_TEMPLATE = "devscripts/bash-completion.in"
Expand All @@ -18,9 +22,8 @@ def build_completion(opt_parser):
for option in group.option_list:
# for every long flag
opts_flag.append(option.get_opt_string())
with open(BASH_COMPLETION_TEMPLATE) as f:
template = f.read()
with open(BASH_COMPLETION_FILE, "w") as f:
template = read_file(BASH_COMPLETION_TEMPLATE)
with open(BASH_COMPLETION_FILE, "w", encoding='utf-8') as f:
# just using the special char
filled_template = template.replace("{{flags}}", " ".join(opts_flag))
f.write(filled_template)
Expand Down
25 changes: 22 additions & 3 deletions devscripts/cli_to_api.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,15 +49,34 @@ def parsed_options(argv):

# from https://github.com/yt-dlp/yt-dlp/issues/5859#issuecomment-1363938900
default = parsed_options([])
diff = dict((k, v) for k, v in parsed_options(opts).items() if default[k] != v)

def neq_opt(a, b):
if a == b:
return False
if a is None and repr(type(object)).endswith(".utils.DateRange'>"):
return '0001-01-01 - 9999-12-31' != '{0}'.format(b)
return a != b

diff = dict((k, v) for k, v in parsed_options(opts).items() if neq_opt(default[k], v))
if 'postprocessors' in diff:
diff['postprocessors'] = [pp for pp in diff['postprocessors'] if pp not in default['postprocessors']]
return diff


def main():
from pprint import pprint
pprint(cli_to_api(*sys.argv))
from pprint import PrettyPrinter

pprint = PrettyPrinter()
super_format = pprint.format

def format(object, context, maxlevels, level):
if repr(type(object)).endswith(".utils.DateRange'>"):
return '{0}: {1}>'.format(repr(object)[:-2], object), True, False
return super_format(object, context, maxlevels, level)

pprint.format = format

pprint.pprint(cli_to_api(*sys.argv))


if __name__ == '__main__':
Expand Down
9 changes: 5 additions & 4 deletions devscripts/create-github-release.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
#!/usr/bin/env python
from __future__ import unicode_literals

import io
import json
import mimetypes
import netrc
Expand All @@ -10,7 +9,9 @@
import re
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
dirn = os.path.dirname

sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))

from youtube_dl.compat import (
compat_basestring,
Expand All @@ -22,6 +23,7 @@
make_HTTPS_handler,
sanitized_Request,
)
from utils import read_file


class GitHubReleaser(object):
Expand Down Expand Up @@ -89,8 +91,7 @@ def main():

changelog_file, version, build_path = args

with io.open(changelog_file, encoding='utf-8') as inf:
changelog = inf.read()
changelog = read_file(changelog_file)

mobj = re.search(r'(?s)version %s\n{2}(.+?)\n{3}' % version, changelog)
body = mobj.group(1) if mobj else ''
Expand Down
11 changes: 6 additions & 5 deletions devscripts/fish-completion.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,13 @@
from os.path import dirname as dirn
import sys

sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))

import youtube_dl
from youtube_dl.utils import shell_quote

from utils import read_file, write_file

FISH_COMPLETION_FILE = 'youtube-dl.fish'
FISH_COMPLETION_TEMPLATE = 'devscripts/fish-completion.in'

Expand Down Expand Up @@ -38,11 +41,9 @@ def build_completion(opt_parser):
complete_cmd.extend(EXTRA_ARGS.get(long_option, []))
commands.append(shell_quote(complete_cmd))

with open(FISH_COMPLETION_TEMPLATE) as f:
template = f.read()
template = read_file(FISH_COMPLETION_TEMPLATE)
filled_template = template.replace('{{commands}}', '\n'.join(commands))
with open(FISH_COMPLETION_FILE, 'w') as f:
f.write(filled_template)
write_file(FISH_COMPLETION_FILE, filled_template)


parser = youtube_dl.parseOpts()[0]
Expand Down
15 changes: 10 additions & 5 deletions devscripts/gh-pages/add-version.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,16 +6,21 @@
import hashlib
import os.path

dirn = os.path.dirname

sys.path.insert(0, dirn(dirn(dirn(os.path.abspath(__file__)))))

from devscripts.utils import read_file, write_file
from youtube_dl.compat import compat_open as open

if len(sys.argv) <= 1:
print('Specify the version number as parameter')
sys.exit()
version = sys.argv[1]

with open('update/LATEST_VERSION', 'w') as f:
f.write(version)
write_file('update/LATEST_VERSION', version)

versions_info = json.load(open('update/versions.json'))
versions_info = json.loads(read_file('update/versions.json'))
if 'signature' in versions_info:
del versions_info['signature']

Expand All @@ -39,5 +44,5 @@
versions_info['versions'][version] = new_version
versions_info['latest'] = version

with open('update/versions.json', 'w') as jsonf:
json.dump(versions_info, jsonf, indent=4, sort_keys=True)
with open('update/versions.json', 'w', encoding='utf-8') as jsonf:
json.dumps(versions_info, jsonf, indent=4, sort_keys=True)
Loading

0 comments on commit 3abae00

Please sign in to comment.