For a wide release that users can plug into their embedding pipelines, this module should provide the following features:
- An async HTTP/1.1 server using the `axum` routing and `tokio` async crates
- A multi-threaded backend for massively parallel inference using the `rayon` crate
- An interlink between `axum`/`tokio` and `rayon` using the `tokio-rayon` crate
- Support for transformer models sourced from HuggingFace running on CPU using ORT through the `OnnxBert` struct
- Support for transformer models sourced from HuggingFace running on GPU using CUDA through the `CandleBert` struct
- A pipeline that builds separate images for CPU and GPU support, due to the compiled nature of Rust: Setup CICD to build and push images to Dockerhub #2
- Built and published images for the following embedding models:
  - `BAAI/bge-large-en-v1.5` and `BAAI/bge-small-en-v1.5`
  - `sentence-transformers/all-MiniLM-L6-v2`
  - `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` (no `onnx/model.onnx` dir on HF Hub)
  - `Snowflake/snowflake-arctic-embed-l` and `Snowflake/snowflake-arctic-embed-s`
  - `mixedbread-ai/mxbai-embed-large-v1`
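The core of the backend design above is fanning a batch of texts out across worker threads and collecting the embeddings back in order, which is what `rayon`'s parallel iterators (bridged to `axum`/`tokio` via `tokio-rayon`) would do in the real module. As a minimal std-only sketch of that pattern, here is the same fan-out/collect shape using `std::thread` and channels; `embed` is a hypothetical stand-in for an `OnnxBert`/`CandleBert` forward pass, not the module's actual API:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a model forward pass; the real module would
// call into ORT or Candle here. Returns a dummy 4-dim vector derived
// from the text length so the example is self-contained.
fn embed(text: &str) -> Vec<f32> {
    vec![text.len() as f32; 4]
}

// Fan a batch of texts out across threads and collect the embeddings
// back in the original order. rayon's `par_iter().map().collect()`
// does the same job with a bounded, reusable thread pool; this sketch
// spawns one thread per text purely for illustration.
fn embed_batch(texts: Vec<String>) -> Vec<Vec<f32>> {
    let (tx, rx) = mpsc::channel();
    for (idx, text) in texts.into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // Tag each result with its index so order can be restored.
            tx.send((idx, embed(&text))).unwrap();
        });
    }
    drop(tx); // close the channel so `rx.iter()` terminates
    let mut results: Vec<(usize, Vec<f32>)> = rx.iter().collect();
    results.sort_by_key(|(idx, _)| *idx);
    results.into_iter().map(|(_, v)| v).collect()
}

fn main() {
    let out = embed_batch(vec!["hi".to_string(), "hello".to_string()]);
    assert_eq!(out[0], vec![2.0; 4]); // len("hi") = 2
    assert_eq!(out[1], vec![5.0; 4]); // len("hello") = 5
    println!("embedded {} texts", out.len());
}
```

In the server itself, each `axum` handler would hand a batch to the `rayon` pool through `tokio-rayon` and await the result, keeping the `tokio` reactor threads free while the CPU-bound inference runs on the dedicated pool.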