improve: Improve progress bar #431
base: main
if not os.path.exists(os.path.join(result, model_file)):
    raise FileNotFoundError("Couldn't download model from huggingface")
What if any of the other required files are missing?
As far as I know, the blobs are downloaded first, and then they're extracted into the .onnx file and the other required files. So if the .onnx file exists, the blobs were extracted correctly. You can check this by deleting the snapshot folder and running the model again: the snapshot folder will be extracted again, and the .onnx file (and the other files) will exist.
I think this check is redundant; let's rely on hf here.
Also, we have models without onnx files, like Qdrant/bm25, and in that case it just checks the existence of a directory.
And then we can also remove the model_file variable.
if is_cached:
    disable_progress_bars()
if snapshot_dir.exists() and metadata_file.exists():
If the version on hf is different from the one we have locally, we will hide the progress bar and then silently download the updated files.
I think we could make a corresponding call to HfApi and check the revision and commit hash.
Can you explain it further? As far as Andrey said, he doesn't want to make a call to HfApi because it requires network access. Do you mean that we call it only when downloading for the first time and add the revision to the metadata, and after that call it only when there's network?
I've added a _get_file_hash function to compute a hash for each file; _verify_files_from_metadata then uses it to check whether the version has changed.
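For readers following the thread, here is a minimal sketch of what per-file hashing and verification against stored metadata could look like. It is not the PR's actual code: the function bodies, the choice of SHA-256, and the metadata layout (`"size"` and `"hash"` keys) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def get_file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large model files are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files_from_metadata(model_dir: Path, metadata: dict) -> bool:
    """Return True only if every recorded file exists and matches its stored size and hash.

    `metadata` is assumed to map a path relative to `model_dir` to
    {"size": int, "hash": str}; this layout is hypothetical.
    """
    for rel_path, expected in metadata.items():
        file_path = model_dir / rel_path
        if not file_path.is_file():
            return False
        # Cheap size check first, then the more expensive hash comparison.
        if file_path.stat().st_size != expected["size"]:
            return False
        if get_file_hash(file_path) != expected["hash"]:
            return False
    return True
```

A check like this only detects local files that are missing or modified. On its own it cannot tell that the repository on the Hub has moved to a newer commit.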
I think it's safe to make this call if local_files_only != True
IIRC, in snapshot_download they just pull the whole repo info and compare the revision's commit hash.
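The idea in this thread could be sketched roughly as follows: hide the progress bar only when the cached commit matches the one on the Hub, and skip the network call when `local_files_only` is set. The helper names, the injected `fetch` parameter, and the fallback policy on network errors are assumptions for illustration. The sketch relies on the `refs/<revision>` file in the huggingface_hub cache layout and on `HfApi().model_info(...).sha`.

```python
from pathlib import Path
from typing import Callable, Optional


def read_cached_commit(repo_cache_dir: Path, revision: str = "main") -> Optional[str]:
    """Read the commit hash the local HF cache recorded for `revision`, if any."""
    ref_file = repo_cache_dir / "refs" / revision
    if not ref_file.is_file():
        return None
    return ref_file.read_text().strip()


def fetch_remote_commit(repo_id: str, revision: str = "main") -> str:
    """Ask the Hub for the commit hash of `revision` (requires network access)."""
    # Imported lazily so that offline code paths never need the library.
    from huggingface_hub import HfApi

    return HfApi().model_info(repo_id, revision=revision).sha


def should_hide_progress_bar(
    repo_id: str,
    repo_cache_dir: Path,
    local_files_only: bool,
    fetch: Callable[[str], str] = fetch_remote_commit,
) -> bool:
    """Decide whether a download progress bar can be suppressed."""
    cached = read_cached_commit(repo_cache_dir)
    if cached is None:
        return False  # nothing cached yet: a download will happen, keep the bar
    if local_files_only:
        return True  # no network call allowed, so nothing will be downloaded
    try:
        # Hide the bar only if the Hub still points at the commit we have.
        return fetch(repo_id) == cached
    except Exception:
        # Hypothetical policy: if the Hub is unreachable, use the cache quietly.
        return True
```

Passing `fetch` as a parameter keeps the decision logic testable without network access.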
try:  # there's network at least first time
    url = hf_hub_url(hf_source_repo, file_path.name)
    hf_metadata = get_hf_file_metadata(url)
    metadata[str(file_path.relative_to(model_dir))] = {
        "size": hf_metadata.size,
        "commit_hash": hf_metadata.commit_hash,
    }
can we retrieve information about the whole repository instead of retrieving files one by one?
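One possible way to do this, sketched under assumptions: `HfApi.model_info` with `files_metadata=True` returns the repository's commit hash (`sha`) and a list of files (`siblings`, each with `rfilename` and `size`) in a single request. The helper names below are hypothetical, and the resulting keys are paths relative to the repository root, which may differ from the `relative_to(model_dir)` keys used in the diff.

```python
from typing import Any, Optional


def build_metadata(repo_info: Any) -> dict:
    """Map each file in a repo-info object to its size and the repo commit hash."""
    return {
        sibling.rfilename: {"size": sibling.size, "commit_hash": repo_info.sha}
        for sibling in repo_info.siblings
    }


def fetch_repo_metadata(repo_id: str, revision: Optional[str] = None) -> dict:
    """Fetch metadata for every file in the repo with one Hub call (needs network)."""
    # Imported lazily so build_metadata stays usable without the library.
    from huggingface_hub import HfApi

    info = HfApi().model_info(repo_id, revision=revision, files_metadata=True)
    return build_metadata(info)
```

Splitting the pure mapping step from the network call keeps the former easy to test with a stand-in object.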