kn0sys/valentinus

valentinus

next-generation vector db built with lmdb bindings

dependencies

  • bincode/serde - serialize/deserialize
  • lmdb-rs - database bindings
  • ndarray - numpy equivalent
  • ort/onnx - embeddings
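A vector db answers a query by comparing the query's embedding against the stored embeddings and returning the closest matches. The following is a minimal, std-only Rust sketch of one common way to do that: brute-force ranking by cosine similarity. It is an illustration only, not valentinus's implementation. The names `cosine_similarity` and `top_k` are hypothetical, and the crate itself may use ndarray, a different distance metric, or a different search strategy.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical helper: indices of the `k` stored vectors most similar to
/// `query`, ordered from most to least similar (full linear scan).
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Sort by descending score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    // Toy 3-dimensional "embeddings"; all-MiniLM-L6-v2 produces 384 dimensions.
    let stored: Vec<Vec<f32>> = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0],
        vec![0.7, 0.7, 0.0],
    ];
    let query: [f32; 3] = [1.0, 0.05, 0.0];
    let nearest = top_k(&query, &stored, 2);
    println!("{:?}", nearest); // prints [0, 2]
}
```

A linear scan is the simplest correct baseline; it costs one dot product per stored vector, so larger collections usually call for an index on top of it.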

getting started

git clone https://github.com/kn0sys/valentinus && cd valentinus

optional environment variables

var | usage | default
--- | --- | ---
LMDB_USER | working directory of the user for the database | $USER
LMDB_MAP_SIZE | sets the max environment size, i.e. the size in memory/disk of all data | 20% of available memory
ONNX_PARALLEL_THREADS | number of threads for parallel execution in this session | 1
VALENTINUS_CUSTOM_DIM | embedding dimensions for custom models | 384 (all-MiniLM-L6-v2)
VALENTINUS_LMDB_ENV | environment for the database (e.g. test, prod) | test
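To show how these defaults compose, here is a std-only Rust sketch that resolves each variable with its documented fallback. The `Config` struct, `parse_or`, and `resolve_config` are hypothetical names, not valentinus's API. Two further assumptions: `LMDB_MAP_SIZE` is treated as a byte count, and available memory is passed in as a parameter instead of being detected. Taking a lookup closure instead of calling `std::env::var` directly keeps the logic testable without touching the real environment.

```rust
use std::env;
use std::str::FromStr;

/// Hypothetical container for the optional settings listed above.
#[derive(Debug)]
struct Config {
    lmdb_user: String,
    lmdb_map_size: u64, // assumption: interpreted as bytes
    onnx_parallel_threads: usize,
    custom_dim: usize,
    lmdb_env: String,
}

/// Look up `name` and parse it as `T`; fall back to `default` when the
/// variable is unset or fails to parse.
fn parse_or<T: FromStr>(lookup: &impl Fn(&str) -> Option<String>, name: &str, default: T) -> T {
    lookup(name).and_then(|v| v.parse().ok()).unwrap_or(default)
}

/// Resolve every setting, applying the defaults from the table.
fn resolve_config(lookup: impl Fn(&str) -> Option<String>, available_memory: u64) -> Config {
    Config {
        lmdb_user: parse_or(&lookup, "LMDB_USER", lookup("USER").unwrap_or_default()),
        lmdb_map_size: parse_or(&lookup, "LMDB_MAP_SIZE", available_memory / 5), // 20%
        onnx_parallel_threads: parse_or(&lookup, "ONNX_PARALLEL_THREADS", 1),
        custom_dim: parse_or(&lookup, "VALENTINUS_CUSTOM_DIM", 384), // all-MiniLM-L6-v2
        lmdb_env: parse_or(&lookup, "VALENTINUS_LMDB_ENV", String::from("test")),
    }
}

fn main() {
    // Read the real process environment; 16 GiB stands in for detected memory,
    // so an unset LMDB_MAP_SIZE resolves to 16 GiB / 5 (about 3.2 GiB).
    let cfg = resolve_config(|name: &str| env::var(name).ok(), 16 * (1u64 << 30));
    println!("{cfg:?}");
}
```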

tests

  • Note: all tests currently require the all-MiniLM-L6-v2_onnx directory
  • Get model.onnx and tokenizer.json from Hugging Face or build them yourself; the commands below download them along with the remaining config and tokenizer files
mkdir all-MiniLM-L6-v2_onnx
cd all-MiniLM-L6-v2_onnx && wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/config.json
wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx
wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/special_tokens_map.json
wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer_config.json
wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json
wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/vocab.txt

RUST_TEST_THREADS=1 cargo test
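If the suite fails early, a quick pre-flight check can confirm the model directory is complete. This is a hypothetical helper, not part of valentinus: it only verifies that the two files named in the note above, model.onnx and tokenizer.json, exist inside all-MiniLM-L6-v2_onnx relative to the current working directory.

```rust
use std::path::Path;

/// Files the test note says are required (hypothetical list for this check).
const REQUIRED_FILES: [&str; 2] = ["model.onnx", "tokenizer.json"];

/// Return the required file names that are not present as regular files in `dir`.
fn missing_model_files(dir: &Path) -> Vec<&'static str> {
    REQUIRED_FILES
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}

fn main() {
    let dir = Path::new("all-MiniLM-L6-v2_onnx");
    let missing = missing_model_files(dir);
    if missing.is_empty() {
        println!("model directory looks complete");
    } else {
        eprintln!("missing from {}: {:?}", dir.display(), missing);
        std::process::exit(1);
    }
}
```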

examples

see examples

reference

inspired by a ChromaDB Python tutorial