@@ -100,7 +100,7 @@ model=BAAI/bge-large-en-v1.5
revision=refs/pr/5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.3.0 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.4.0 --model-id $model --revision $revision
```

And then you can make requests like
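For reference, such a request might look like this (a minimal sketch, assuming the default `/embed` route and the `8080:80` port mapping above; the query text is illustrative):

```shell
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```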
@@ -243,13 +243,13 @@ Text Embeddings Inference ships with multiple Docker images that you can use to

| Architecture                        | Image                                                                       |
|-------------------------------------|-----------------------------------------------------------------------------|
-| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-0.3.0                     |
+| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-0.4.0                     |
| Volta                               | NOT SUPPORTED                                                               |
-| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-0.3.0 (experimental)   |
-| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:0.3.0                         |
-| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-0.3.0                      |
-| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-0.3.0                      |
-| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-0.3.0 (experimental)   |
+| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-0.4.0 (experimental)   |
+| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:0.4.0                         |
+| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-0.4.0                      |
+| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-0.4.0                      |
+| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-0.4.0 (experimental)   |

**Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.
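For example, enabling it on the Turing image could look like this (a sketch only; the model name is illustrative and any supported model works the same way):

```shell
model=BAAI/bge-large-en-v1.5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all -e USE_FLASH_ATTENTION=True -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:turing-0.4.0 --model-id $model
```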
@@ -278,7 +278,7 @@ model=<your private model>
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
token=<your cli READ token>

-docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.3.0 --model-id $model
+docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.4.0 --model-id $model
```

### Using Sequence Classification models
@@ -293,7 +293,7 @@ model=BAAI/bge-reranker-large
revision=refs/pr/4
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.3.0 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.4.0 --model-id $model --revision $revision
```

And then you can rank the similarity between a pair of inputs with:
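A request shaped like the following should work (a sketch under the assumption that re-ranking models are served through the `/predict` route in this version and accept a pair of texts as `inputs`; the texts themselves are illustrative):

```shell
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs": ["What is Deep Learning?", "Deep learning is a subset of machine learning."]}' \
    -H 'Content-Type: application/json'
```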
@@ -309,9 +309,9 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-base-go_emotions`

```shell
model=SamLowe/roberta-base-go_emotions
-volume=$PWD/data
+volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.3.0 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.4.0 --model-id $model
```

Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:
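For completeness, such a request might look like this (a minimal sketch assuming the `/predict` route named above; the input sentence is illustrative):

```shell
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
```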