Releases · huggingface/text-embeddings-inference
v0.6.0
What's Changed
- Doc build only if doc files were changed by @mishig25 in #85
- fix: fix inappropriate title of API docs page by @ucyang in #88
- fix: hf hub redirects by @OlivierDehaene in #89
- feat: add grpc router by @OlivierDehaene in #90
- fix: fix padding support in batch tokens by @OlivierDehaene in #93
- fix: fix tokenizers with both whitespace and metaspace by @OlivierDehaene in #96
- fix: enable http feature in http-builder by @zhangfand in #98
- feat: add integration tests by @OlivierDehaene in #101
New Contributors
- @mishig25 made their first contribution in #85
- @ucyang made their first contribution in #88
- @zhangfand made their first contribution in #98
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's Changed
- feat: accept batches in predict by @OlivierDehaene in #78
- feat: rerank route by @OlivierDehaene in #84
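
A rough client-side sketch of the two new routes, assuming a TEI server is already running at localhost:8080 with a suitable classification or reranker model loaded; the request shapes below follow the current API docs and may have differed slightly at v0.5.0:

```python
import requests

BASE_URL = "http://localhost:8080"  # assumed address of a running TEI instance

# /predict now accepts a batch of inputs (#78); a classification model is assumed
# to be loaded. The nested-list batch shape mirrors the current docs and is an
# assumption for this release.
predictions = requests.post(
    f"{BASE_URL}/predict",
    json={"inputs": [["I like you."], ["I hate pineapples."]]},
).json()
print(predictions)

# /rerank scores a list of texts against a single query (#84); assumes a reranker model.
ranked = requests.post(
    f"{BASE_URL}/rerank",
    json={
        "query": "What is Deep Learning?",
        "texts": [
            "Deep Learning is a subfield of machine learning.",
            "Cheese is made from milk.",
        ],
    },
).json()
print(ranked)
```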
Full Changelog: v0.4.0...v0.5.0
v0.4.0
What's Changed
- feat: USE_FLASH_ATTENTION env var by @OlivierDehaene in #57
- docs: The initial version of the TEI docs for the hf.co/docs/ by @MKhalusova in #60
- feat: support roberta by @kozistr in #62
- fix: GH workflows update: added --not_python_module flag by @MKhalusova in #66
- docs: Images links updated by @MKhalusova in #72
- feat: add `normalize` option by @OlivierDehaene in #70 (see the sketch after this list)
- ci: Migrate CI to new Runners by @glegendre01 in #74
- feat: add support for classification models by @OlivierDehaene in #76
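
A hedged sketch of how the new options are used from a client, assuming a TEI server is reachable at localhost:8080; `USE_FLASH_ATTENTION=False` would be set in the server's environment at launch (#57), and the parameter names below follow the current docs:

```python
import requests

BASE_URL = "http://localhost:8080"  # assumed address of a running TEI instance

# The new `normalize` option (#70) toggles L2-normalization of the returned embeddings.
raw_embedding = requests.post(
    f"{BASE_URL}/embed",
    json={"inputs": "What is Deep Learning?", "normalize": False},
).json()

# Classification models (#76) are queried through the /predict route.
scores = requests.post(
    f"{BASE_URL}/predict",
    json={"inputs": "I like you. I love you."},
).json()

print(raw_embedding[0][:4], scores)
```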
New Contributors
- @MKhalusova made their first contribution in #60
- @kozistr made their first contribution in #62
- @glegendre01 made their first contribution in #74
Full Changelog: v0.3.0...v0.4.0
v0.3.0
v0.2.2
What's Changed
- fix: max_input_length should take into account position_offset (aec5efd)
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- fix: only use position offset for xlm-roberta (8c507c3)
Full Changelog: v0.2.0...v0.2.1
v0.2.0
v0.1.0
- No compilation step
- Dynamic shapes
- Small docker images and fast boot times. Get ready for true serverless!
- Token based dynamic batching
- Optimized transformers code for inference using Flash Attention, Candle and cuBLASLt
- Safetensors weight loading
- Production ready (distributed tracing with Open Telemetry, Prometheus metrics)
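
A minimal usage sketch, assuming a TEI container has already been started locally with an embedding model and its HTTP port published on localhost:8080 (route and field names as in the project README):

```python
import requests

# Embed a small batch of sentences; one vector is returned per input.
response = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": [
        "Deep Learning is a subfield of machine learning.",
        "Candle is a minimalist ML framework for Rust.",
    ]},
)
response.raise_for_status()
embeddings = response.json()
print(len(embeddings), "vectors of dimension", len(embeddings[0]))

# Prometheus metrics are exposed on the same port at /metrics.
metrics = requests.get("http://localhost:8080/metrics").text
print(metrics.splitlines()[0])
```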