Bring the module into a state of readiness for release #1

@tsmith023

Description

To be ready for a wide release that users can plug into their embedding pipelines, this module should provide the following features:

  • An async HTTP/1.1 server using the axum routing and tokio async crates
  • A multi-threaded backend for massively parallel inference using the rayon crate
  • An interlink between the axum/tokio front end and rayon using the tokio-rayon crate (see the server sketch after this list)
  • Support for transformer models sourced from HuggingFace running on CPU using ORT through the OnnxBert struct (sketched after this list)
  • Support for transformer models sourced from HuggingFace running on GPU using CUDA through the CandleBert struct (sketched after this list)
  • A pipeline that builds separate images for CPU and GPU support, required by the compiled nature of Rust (see the feature-flag sketch after this list):
    Setup CICD to build and push images to Dockerhub #2
  • Built and published images for the following embedding models:
    • BAAI/bge-large-en-v1.5 and BAAI/bge-small-en-v1.5
    • sentence-transformers/all-MiniLM-L6-v2
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (note: no onnx/model.onnx on the HuggingFace Hub)
    • Snowflake/snowflake-arctic-embed-l and Snowflake/snowflake-arctic-embed-s
    • mixedbread-ai/mxbai-embed-large-v1
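
A minimal sketch of how the first three items could fit together, assuming an axum 0.7-style `axum::serve` entrypoint and the `tokio_rayon::spawn` bridge; the `EmbedRequest`/`EmbedResponse` types and the `run_model` stub are illustrative placeholders, not the module's actual API:

```rust
use axum::{routing::post, Json, Router};
use rayon::prelude::*;
use serde::{Deserialize, Serialize};

// Hypothetical request/response shapes for an embedding endpoint.
#[derive(Deserialize)]
struct EmbedRequest {
    texts: Vec<String>,
}

#[derive(Serialize)]
struct EmbedResponse {
    embeddings: Vec<Vec<f32>>,
}

// Stub standing in for OnnxBert/CandleBert inference.
fn run_model(_text: &str) -> Vec<f32> {
    vec![0.0; 384]
}

async fn embed(Json(req): Json<EmbedRequest>) -> Json<EmbedResponse> {
    // Hand the CPU-bound work to rayon's thread pool via tokio-rayon so the
    // tokio reactor is never blocked, then fan out across inputs with rayon.
    let embeddings = tokio_rayon::spawn(move || {
        req.texts.par_iter().map(|t| run_model(t)).collect()
    })
    .await;
    Json(EmbedResponse { embeddings })
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/embed", post(embed));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```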
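
For the ORT item, a sketch of what the OnnxBert struct might look like, assuming the `ort` 2.x API (`Session::builder()` / `commit_from_file`) and the `tokenizers` crate; the field names, paths, and the elided pooling step are assumptions, not the actual implementation:

```rust
use ort::session::Session;
use tokenizers::Tokenizer;

// Illustrative CPU-backed BERT embedder built on ONNX Runtime.
pub struct OnnxBert {
    tokenizer: Tokenizer,
    session: Session,
}

impl OnnxBert {
    pub fn new(model_path: &str, tokenizer_path: &str) -> anyhow::Result<Self> {
        Ok(Self {
            tokenizer: Tokenizer::from_file(tokenizer_path)
                .map_err(|e| anyhow::anyhow!(e))?,
            // Loads e.g. the onnx/model.onnx file exported to the HuggingFace Hub.
            session: Session::builder()?.commit_from_file(model_path)?,
        })
    }

    pub fn embed(&self, text: &str) -> anyhow::Result<Vec<f32>> {
        let encoding = self
            .tokenizer
            .encode(text, true)
            .map_err(|e| anyhow::anyhow!(e))?;
        let _input_ids: Vec<i64> =
            encoding.get_ids().iter().map(|&id| id as i64).collect();
        // Building the input tensors, calling self.session.run(...), and
        // mean-pooling the last hidden state are elided here.
        todo!()
    }
}
```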
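
Similarly, a hypothetical shape for CandleBert, assuming the `candle-core`, `candle-nn`, and `candle-transformers` crates; the CPU fallback and the exact loading details vary by candle version and are assumptions here:

```rust
use candle_core::{DType, Device};
use candle_nn::VarBuilder;
use candle_transformers::models::bert::{BertModel, Config};

// Illustrative GPU-backed BERT embedder built on candle.
pub struct CandleBert {
    model: BertModel,
    device: Device,
}

impl CandleBert {
    pub fn new(weights_path: &str, config: Config) -> anyhow::Result<Self> {
        // Prefer CUDA device 0; fall back to CPU if no GPU is present.
        let device = Device::new_cuda(0).unwrap_or(Device::Cpu);
        // Memory-map the safetensors checkpoint downloaded from HuggingFace.
        let vb = unsafe {
            VarBuilder::from_mmaped_safetensors(&[weights_path], DType::F32, &device)?
        };
        let model = BertModel::load(vb, &config)?;
        Ok(Self { model, device })
    }
}
```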
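
Finally, on why the compiled nature of Rust forces separate CPU and GPU images: the backend is selected at compile time, so each Docker build produces a different binary. A minimal illustration, with hypothetical `onnx` and `cuda` Cargo feature names:

```rust
// Chosen at build time, e.g. `cargo build --release --features onnx`
// for the CPU image or `--features cuda` for the GPU image; one binary
// cannot switch backends at runtime.
#[cfg(feature = "onnx")]
pub type Embedder = OnnxBert;

#[cfg(feature = "cuda")]
pub type Embedder = CandleBert;
```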
