13 changes: 9 additions & 4 deletions dinov2/hub/dinotxt.py
@@ -62,16 +62,21 @@ def dinov2_vitl14_reg4_dinotxt_tet1280d20h24l():
return model


def get_tokenizer():
def get_tokenizer(local_path=None):
Contributor

Minor: def get_tokenizer(bpe_path_or_url: Optional[str] = None)

    from .text.tokenizer import Tokenizer
    import requests
    from io import BytesIO

    url = _DINOV2_BASE_URL + "/thirdparty/bpe_simple_vocab_16e6.txt.gz"
    try:
        response = requests.get(url)
        response.raise_for_status()
        file_buf = BytesIO(response.content)
        if not local_path:
Contributor

Use urllib.parse.urlparse(bpe_path).scheme to distinguish between a URL with a scheme and something that looks like an actual local path?
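A minimal sketch of that scheme check (the helper name `is_url` is hypothetical, not part of the diff):

```python
from urllib.parse import urlparse


def is_url(path: str) -> bool:
    # A bare local path parses with an empty scheme, so anything
    # carrying an http/https scheme is treated as a URL.
    return urlparse(path).scheme in ("http", "https")


print(is_url("https://example.com/bpe_simple_vocab_16e6.txt.gz"))  # True
print(is_url("/tmp/bpe_simple_vocab_16e6.txt.gz"))                 # False
```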

            response = requests.get(url)
            response.raise_for_status()
            content = response.content
Comment on lines +73 to +75
Contributor

@cijose Do you remember why requests was pulled in as a dependency? Couldn't we simply use built-in Python modules instead?

    with urllib.request.urlopen(url) as f:
        content = f.read()

        else:
            with open(local_path, "rb") as f:
                content = f.read()
        file_buf = BytesIO(content)
        return Tokenizer(vocab_path=file_buf)
    except Exception as e:
        raise FileNotFoundError(f"Failed to download file from url {url} with error: {e}")