SentenceTransformer local_files_only env var #95

Closed


@Merk0ff Merk0ff commented Oct 31, 2024

With the current setup, a custom Docker image built with a sentence-transformers model reports the following when it runs in an environment without access to the Hugging Face model hub:

No sentence-transformers model found with name sentence-transformers/distiluse-base-multilingual-cased-v2. Creating a new one with mean pooling.

This behavior is undesirable, so this PR adds local_files_only=True to the SentenceTransformer initialization. To make this configurable, it also introduces two environment variables:

LOCAL_FILES_ONLY: Enables local-file-only loading for offline environments.
MODEL_DIRECTORY: Prepares for future support of storing models in a Persistent Volume Claim (PVC).

These additions make model loading more robust in offline or isolated deployments.
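For readers following along, here is a minimal sketch of how the two variables could feed into model loading. It is not the code from this PR: the helper names `build_load_kwargs` and `load_model` are hypothetical, the accepted truthy values ("true", "1") are an assumption, and mapping MODEL_DIRECTORY to the `cache_folder` argument is only one possible reading of "future support". It assumes a sentence-transformers release whose `SentenceTransformer` constructor accepts `local_files_only` and `cache_folder`.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_load_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Translate environment variables into SentenceTransformer keyword arguments."""
    env = os.environ if env is None else env

    # Assumption: "true" or "1" (case-insensitive) switches the flag on.
    local_only = env.get("LOCAL_FILES_ONLY", "False").strip().lower() in ("true", "1")
    kwargs: Dict[str, Any] = {"local_files_only": local_only}

    # Assumption: MODEL_DIRECTORY, when set, is used as the model cache folder.
    model_dir = env.get("MODEL_DIRECTORY")
    if model_dir:
        kwargs["cache_folder"] = model_dir
    return kwargs


def load_model(model_path: str, env: Optional[Mapping[str, str]] = None):
    """Load the model; with local_files_only=True no hub download is attempted."""
    # Imported lazily so the helper above can be used without the library installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_path, **build_load_kwargs(env))
```

Keeping the environment parsing separate from the constructor call makes the flag handling easy to check without a model on disk, for example `build_load_kwargs({"LOCAL_FILES_ONLY": "true"})`.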

@weaviate-git-bot

To avoid any confusion in the future about your contribution to Weaviate, we work with a Contributor License Agreement. If you agree, you can simply add a comment to this PR that you agree with the CLA so that we can merge.

beep boop - the Weaviate bot 👋🤖

PS:
Are you already a member of the Weaviate Slack channel?

@Merk0ff
Author

Merk0ff commented Oct 31, 2024

To avoid any confusion in the future about your contribution to Weaviate, we work with a Contributor License Agreement. If you agree, you can simply add a comment to this PR that you agree with the CLA so that we can merge. beep boop - the Weaviate bot 👋🤖 PS: Are you already a member of the Weaviate Slack channel?

I agree with the CLA.

vectorizer.py Outdated
@@ -83,9 +84,17 @@ class SentenceTransformerVectorizer:
    cuda_core: str

    def __init__(self, model_path: str, model_name: str, cuda_core: str):
        local_files_only = os.getenv("LOCAL_FILES_ONLY", "False").lower() in (
Contributor


Can you move the logic that reads this setting to config.py? You can follow how we introduced the TRUST_REMOTE_CODE setting.
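One way to read this request, sketched under assumptions: the environment lookup lives in a config module and the vectorizer receives an already-parsed boolean. The names `Settings`, `load_settings`, `_as_bool`, and `VectorizerSketch` are hypothetical, the truthy values are assumed, and this does not claim to mirror how TRUST_REMOTE_CODE is actually handled in the repository's config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(raw: Optional[str]) -> bool:
    """Assumption: only "true" or "1" (case-insensitive) count as enabled."""
    return raw is not None and raw.strip().lower() in ("true", "1")


@dataclass(frozen=True)
class Settings:
    """Hypothetical config.py-style container for boolean settings."""
    trust_remote_code: bool
    local_files_only: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read all settings in one place so other modules never touch os.environ."""
    env = os.environ if env is None else env
    return Settings(
        trust_remote_code=_as_bool(env.get("TRUST_REMOTE_CODE")),
        local_files_only=_as_bool(env.get("LOCAL_FILES_ONLY")),
    )


class VectorizerSketch:
    """Stand-in for the vectorizer: it takes the parsed flag as a parameter."""

    def __init__(self, model_path: str, local_files_only: bool):
        self.model_path = model_path
        self.local_files_only = local_files_only
```

With this split, the caller would do something like `VectorizerSketch("./models/model", load_settings().local_files_only)`, and the vectorizer constructor no longer calls os.getenv itself.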

@antas-marcin
Contributor

Thank you for your contribution, @Merk0ff! I have added one comment to your PR. You also need to resolve the conflicts in vectorizer.py.


3 participants