This example demonstrates how to export a Huggingface sentence-transformer model to ONNX format.
Try the multi-vector indexing sample application for an introduction to semantic search.
Follow Vespa getting started through the vespa deploy step, using this example instead of album-recommendation.
minimum-required-vespa-version="8.311.28"
Feed documents (this includes embed inference in Vespa):
$ vespa feed ext/*.json
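The feed files under ext/ follow the Vespa JSON document format. A minimal sketch of one such document, assuming the schema is named doc with id and text fields (the embedding field is computed by the embedder at feed time, so it is not part of the feed):

```json
{
    "put": "id:doc:doc::1",
    "fields": {
        "id": "1",
        "text": "Ships carry cargo across the oceans."
    }
}
```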
Example queries using the E5-Small-V2 embedding model, which maps text to a 384-dimensional vector representation.
$ vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
    'input.query(e)=embed(e5, @query)' \
    'query=space contains many suns'

$ vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
    'input.query(e)=embed(e5, @query)' \
    'query=shipping stuff over the sea'

$ vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
    'input.query(e)=embed(e5, @query)' \
    'query=exchanging information by sound'
Remove the container after use:
$ docker rm -f vespa
Transformer-based embedding models have named inputs and outputs that must
be compatible with the input and output names expected by the Vespa bert-embedder or huggingface-embedder.
See export_hf_model_from_hf.py for exporting a Huggingface sentence-transformer model to ONNX format compatible with default input and output names used by the Vespa huggingface-embedder.
The following exports intfloat/e5-small-v2:
$ ./export_hf_model_from_hf.py --hf_model intfloat/e5-small-v2 --output_dir model
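The exported model.onnx and tokenizer.json can then be configured as a huggingface-embedder component in services.xml. A minimal sketch, assuming the exported files are placed under model/ in the application package and the embedder id e5 used by the queries above:

```xml
<component id="e5" type="hugging-face-embedder">
    <transformer-model path="model/model.onnx"/>
    <tokenizer-model path="model/tokenizer.json"/>
</component>
```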
The following exports intfloat/multilingual-e5-small using quantization:
$ ./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-small --output_dir model --quantize
The following exports intfloat/multilingual-e5-small using quantization and tokenizer patching, to work around compatibility problems with loading saved tokenizers:
$ ./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-small --output_dir model --quantize --patch_tokenizer
Prefer using the Vespa huggingface-embedder instead.
See export_model_from_hf.py for exporting a Huggingface sentence-transformer model to ONNX format compatible with default input and output names used by the bert-embedder.
The following exports intfloat/e5-small-v2, saving the model parameters in an ONNX file and the vocab.txt file in the format expected by the Vespa bert-embedder.
$ ./export_model_from_hf.py --hf_model intfloat/e5-small-v2 --output_dir model
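A minimal services.xml sketch for the bert-embedder, assuming the exported model.onnx and vocab.txt are placed under model/ in the application package:

```xml
<component id="bert" type="bert-embedder">
    <transformer-model path="model/model.onnx"/>
    <tokenizer-vocab path="model/vocab.txt"/>
</component>
```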
The model directory is duplicated because other sample applications depend on the model file locations; this should be rewritten to use the model hub.