fix: remove embedding dims config from .env.example (pingcap#472)
Mini256 authored Dec 6, 2024
1 parent 400021a commit 9dac9ca
Showing 1 changed file with 8 additions and 11 deletions.
.env.example (19 changes: 8 additions, 11 deletions)
@@ -17,15 +17,12 @@ TIDB_PASSWORD=
 TIDB_DATABASE=
 TIDB_SSL=true
 
-# CAUTION: Do not change EMBEDDING_DIMS after initializing the database.
-# Changing the embedding dimensions requires recreating the database and tables.
-# The default EMBEDDING_DIMS and EMBEDDING_MAX_TOKENS are set for the OpenAI text-embedding-3-small model.
-# If using a different embedding model, adjust these values according to the model's specifications.
-# For example:
-# openai/text-embedding-3-small: EMBEDDING_DIMS=1536, EMBEDDING_MAX_TOKENS=8191
-# maidalun1020/bce-embedding-base_v1: EMBEDDING_DIMS=768, EMBEDDING_MAX_TOKENS=512
-# BAAI/bge-m3: EMBEDDING_DIMS=1024, EMBEDDING_MAX_TOKENS=8192
-EMBEDDING_DIMS=1536
-# EMBEDDING_MAX_TOKENS should be equal or smaller than the embedding model's max tokens,
-# it indicates the max size of document chunks.
+# EMBEDDING_MAX_TOKENS indicates the max size of document chunks.
+#
+# EMBEDDING_MAX_TOKENS should be smaller than the embedding model's max tokens due
+# to the tokenizer difference. (see: https://github.com/pingcap/autoflow/issues/397)
+#
+# Go to https://tidb.ai/docs/embedding-model to check the max tokens of the embedding model.
+#
+# Notice: this variable will be deprecated in the future.
 EMBEDDING_MAX_TOKENS=2048
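The updated comments describe EMBEDDING_MAX_TOKENS as the maximum chunk size and require it to stay below the embedding model's token limit. A minimal Python sketch of that check follows. It is not code from this commit or from autoflow. The model limits are copied from the comments the commit removes, and MODEL_MAX_TOKENS, load_chunk_size and chunk_tokens are hypothetical names. Plain list slicing over pre-split tokens stands in for a real tokenizer.

```python
import os

# Hypothetical lookup table; limits copied from the comments removed in this commit.
MODEL_MAX_TOKENS = {
    "openai/text-embedding-3-small": 8191,
    "maidalun1020/bce-embedding-base_v1": 512,
    "BAAI/bge-m3": 8192,
}


def load_chunk_size(model: str, env=os.environ) -> int:
    """Read EMBEDDING_MAX_TOKENS (default 2048) and check it against the model limit."""
    chunk_size = int(env.get("EMBEDDING_MAX_TOKENS", "2048"))
    limit = MODEL_MAX_TOKENS[model]
    if chunk_size <= 0:
        raise ValueError("EMBEDDING_MAX_TOKENS must be positive")
    # Strictly smaller, leaving headroom for tokenizer differences (issue #397).
    if chunk_size >= limit:
        raise ValueError(
            f"EMBEDDING_MAX_TOKENS={chunk_size} must be smaller than "
            f"{model}'s limit of {limit}"
        )
    return chunk_size


def chunk_tokens(tokens: list[str], chunk_size: int) -> list[list[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i : i + chunk_size] for i in range(0, len(tokens), chunk_size)]
```

With the default of 2048, load_chunk_size("BAAI/bge-m3", {}) returns 2048. The same default raises ValueError for maidalun1020/bce-embedding-base_v1, whose limit is 512. Requiring a strictly smaller value leaves headroom for the tokenizer mismatch described in issue #397.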
