-
Notifications
You must be signed in to change notification settings - Fork 818
[CLI] Revamp hf cache
#3439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v1.0-release
Are you sure you want to change the base?
[CLI] Revamp hf cache
#3439
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
f06410e
to
292628d
Compare
Encountered a weird but while testing the CLI: (.venv) ➜ huggingface_hub git:(revamp-hf-cache) ✗ hf cache ls --filter "accessed>18mo" -f "size>1g" -q
model/google/gemma-7b-it
model/pyp1/VoiceCraft_giga330M
(.venv) ➜ huggingface_hub git:(revamp-hf-cache) ✗ hf cache ls --filter "accessed>18mo" -f "size>1g" -q | xargs hf cache rm
About to delete 2 repo(s) totalling 6.3G.
- model/google/gemma-7b-it (entire repo)
- model/pyp1/VoiceCraft_giga330M (entire repo)
Proceed with deletion? [y/N]: Aborted! I suspect |
80d96f3
to
ff98a2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! To be fair it was quite a long one to review ^^ Next time I think it'd be great to split it in smaller ones and leave the optional stuff for later (like the --format parameter or implementing all the filters at once). It would make things easier to review/iterate on. Anyway, now that it's here, let's keep everything 😄
… into revamp-hf-cache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice job! 🔥 I've left some comments but only minor ones so feel free to merge once addressed!
I gave it a try, which was not unpleasant 😄
hf cache rm $(hf cache ls --filter "accessed>1y" -q) -y
Cache deletion done. Saved 137.4G.
Deleted 576 repo(s) and 1343 revision(s); freed 137.4G.
def format_cache_repo_id(repo: CachedRepoInfo) -> str: | ||
"""Return the canonical `type/id` string used across cache CLI outputs.""" | ||
return f"{repo.repo_type}/{repo.repo_id}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) could be a property of CachedRepoInfo
expected = value_raw.lower() | ||
|
||
if op != "=": | ||
raise ValueError("Only '=' is supported for 'type' filters.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ValueError("Only '=' is supported for 'type' filters.") | |
raise ValueError(f"Only '=' is supported for 'type' filters. Got '{op}'.") |
if include_revisions: | ||
table_rows: List[List[str]] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if include_revisions: | |
table_rows: List[List[str]] = [] | |
table_rows: List[List[str]] | |
if include_revisions: |
is the table_rows: List[List[str]] = []
line defined only for type hints issues? If yes, would this suggestion work? Otherwise ok to keep like this^^
print() | ||
summary = f"Found {repo_count} repo(s) for a total of {revision_count} revision(s) and {_format_size(total_size)} on disk." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print() | |
summary = f"Found {repo_count} repo(s) for a total of {revision_count} revision(s) and {_format_size(total_size)} on disk." | |
summary = f"\nFound {repo_count} repo(s) for a total of {revision_count} revision(s) and {_format_size(total_size)} on disk." |
(nit)
# | ||
# Once you've manually reviewed this file, please confirm deletion in the terminal. This file will be automatically removed once done. | ||
# ------------ | ||
@cache_cli.command(help="Remove detached revisions from the cache.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cache_cli.command(help="Remove detached revisions from the cache.") | |
@cache_cli.command() |
(same)
), | ||
] = False, | ||
) -> None: | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try: | |
"""Remove detached revisions from the cache.""" | |
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for fixing all the tests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a ./tests/test_utils_parsing.py
module with a few tests for parse_size
and parse_duration
? (you can take tests from https://gist.github.com/Wauplin/a7a385db01ddbf0067325ec02bc35ce0)
msg=f"Wrong formatting for {size} == '{expected}'", | ||
) | ||
|
||
def test_format_timesince(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and move this test to ./tests/test_utils_parsing.py
?
Resolves #3432.
The goal of this PR is to completely revamp the hf cache CLI by taking inspiration from the docker CLI mainly:
hf cache ls
(inspired bydocker image ls
anddocker container ls
) : List cached repositories or revisions.hf cache rm
(inspired bydocker image rm
anddocker container rm
) : Remove cached repositories or revisions.hf cache prune
(inspired bydocker image prune
anddocker container prune
) : Remove detached revisions from the cache.The following DX is taken from the related issue mentioned above:
hf cache ls
List all cached repos
Kinda equivalent of current hf cache scan
List all revisions
Equivalent of current hf cache scan --verbose
Filter by total size
The filters are case insensitive.
Filter by last modified or accessed
Docker is able to handle many time representation ("7d", "2024-05-01", a timestamp, iso date format, timezone, etc.). In practice it's already good if we can handle semantic terms like
10s
,m
,h
,d
,w
,mo
andy
+ timestamps.Filter by repo type
hf cache ls --filter "type=dataset"
Combine filters
e.g. "give me all models of at least 1MB and not accessed for a year)"
hf cache ls --filter "type=model" --filter "size>1000000" --filter "accessed>1y"
Filters are processed as logical AND. Let's not support "OR".
Quiet mode: print only ids
Custom format
Default output format is to print as a table. But one could want to have a CSV or JSON. Docker handles custom templates but we don't need that much flexibility.
hf cache rm
Delete specific revision(s)
Delete specific repo(s)
Delete repos based on a query
Same as for
docker
, we use the quiet modee.g. "delete all repos not accessed in the last year"
hf cache rm $(hf cache ls --filter "accessed>1y" -q)
or on unix:
hf cache ls --filter "accessed>1y" -q | xargs hf cache rm
Confirmation step / dry-run
It would be good to have a confirmation step by default.
hf cache rm ... -y
Alternatively (or in addition), we could have a dry-run mode:
hf cache rm ... --dry-run
hf cache prune
Delete all detached revision
When downloading the same repo over time, the user might get several revisions in cache. Revisions can be linked to git refs (e.g.
main
,refs/pr/2
, etc.) or "detached". Pruning the cache will delete all revisions not explicitly bound to a reference.In practice, if a user has always downloaded from
main
, all revisions will be deleted except the last one.hf cache prune
Confirmation step / dry-run
Same as for
hf cache rm
.