Releases · huggingface/text-embeddings-inference
v0.6.0
What's Changed
- Doc build only if doc files were changed by @mishig25 in #85
- fix: fix inappropriate title of API docs page by @ucyang in #88
- fix: hf hub redirects by @OlivierDehaene in #89
- feat: add grpc router by @OlivierDehaene in #90
- fix: fix padding support in batch tokens by @OlivierDehaene in #93
- fix: fix tokenizers with both whitespace and metaspace by @OlivierDehaene in #96
- fix: enable http feature in http-builder by @zhangfand in #98
- feat: add integration tests by @OlivierDehaene in #101
New Contributors
- @mishig25 made their first contribution in #85
- @ucyang made their first contribution in #88
- @zhangfand made their first contribution in #98
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's Changed
- feat: accept batches in predict by @OlivierDehaene in #78
- feat: rerank route by @OlivierDehaene in #84
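
A rough client-side sketch of the two new routes, assuming a TEI server is already running at localhost:8080 with a suitable classification or reranker model loaded; the request shapes below follow the current API docs and may have differed slightly at v0.5.0:

```python
import requests

BASE_URL = "http://localhost:8080"  # assumed address of a running TEI instance

# /predict now accepts a batch of inputs (#78); a classification model is assumed
# to be loaded. The nested-list batch shape mirrors the current docs and is an
# assumption for this release.
predictions = requests.post(
    f"{BASE_URL}/predict",
    json={"inputs": [["I like you."], ["I hate pineapples."]]},
).json()
print(predictions)

# /rerank scores a list of texts against a single query (#84); assumes a reranker model.
ranked = requests.post(
    f"{BASE_URL}/rerank",
    json={
        "query": "What is Deep Learning?",
        "texts": [
            "Deep Learning is a subfield of machine learning.",
            "Cheese is made from milk.",
        ],
    },
).json()
print(ranked)
```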
Full Changelog: v0.4.0...v0.5.0
v0.4.0
What's Changed
- feat: USE_FLASH_ATTENTION env var by @OlivierDehaene in #57
- docs: The initial version of the TEI docs for the hf.co/docs/ by @MKhalusova in #60
- feat: support roberta by @kozistr in #62
- fix: GH workflows update: added --not_python_module flag by @MKhalusova in #66
- docs: Images links updated by @MKhalusova in #72
- feat: add `normalize` option by @OlivierDehaene in #70 (see the sketch after this list)
- ci: Migrate CI to new Runners by @glegendre01 in #74
- feat: add support for classification models by @OlivierDehaene in #76
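
A hedged sketch of how the new options are used from a client, assuming a TEI server is reachable at localhost:8080; `USE_FLASH_ATTENTION=False` would be set in the server's environment at launch (#57), and the parameter names below follow the current docs:

```python
import requests

BASE_URL = "http://localhost:8080"  # assumed address of a running TEI instance

# The new `normalize` option (#70) toggles L2-normalization of the returned embeddings.
raw_embedding = requests.post(
    f"{BASE_URL}/embed",
    json={"inputs": "What is Deep Learning?", "normalize": False},
).json()

# Classification models (#76) are queried through the /predict route.
scores = requests.post(
    f"{BASE_URL}/predict",
    json={"inputs": "I like you. I love you."},
).json()

print(raw_embedding[0][:4], scores)
```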
New Contributors
- @MKhalusova made their first contribution in #60
- @kozistr made their first contribution in #62
- @glegendre01 made their first contribution in #74
Full Changelog: v0.3.0...v0.4.0
v0.3.0
v0.2.2
What's Changed
- fix: max_input_length should take into account position_offset (aec5efd)
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- fix: only use position offset for xlm-roberta (8c507c3)
Full Changelog: v0.2.0...v0.2.1
v0.2.0
v0.1.0
- No compilation step
- Dynamic shapes
- Small docker images and fast boot times. Get ready for true serverless!
- Token based dynamic batching
- Optimized transformers code for inference using Flash Attention, Candle and cuBLASLt
- Safetensors weight loading
- Production ready (distributed tracing with Open Telemetry, Prometheus metrics)
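
A minimal usage sketch, assuming a TEI container has already been started locally with an embedding model and its HTTP port published on localhost:8080 (route and field names as in the project README):

```python
import requests

# Embed a small batch of sentences; one vector is returned per input.
response = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": [
        "Deep Learning is a subfield of machine learning.",
        "Candle is a minimalist ML framework for Rust.",
    ]},
)
response.raise_for_status()
embeddings = response.json()
print(len(embeddings), "vectors of dimension", len(embeddings[0]))

# Prometheus metrics are exposed on the same port at /metrics.
metrics = requests.get("http://localhost:8080/metrics").text
print(metrics.splitlines()[0])
```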