# Vespa

Vespa sample applications - Model Exporting

This example demonstrates how to export a Huggingface sentence-transformer model to ONNX format.

Please try multi vector indexing for an intro to semantic search.

## To try this application

Follow Vespa getting started through the `vespa deploy` step, using this example instead of `album-recommendation`.

Minimum required Vespa version: `8.311.28`.

Feed documents (this includes embed inference in Vespa):

```
$ vespa feed ext/*.json
```

Example queries using the E5-Small-V2 embedding model, which maps text to a 384-dimensional vector representation:

```
$ vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
    'input.query(e)=embed(e5, @query)' \
    'query=space contains many suns'

$ vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
    'input.query(e)=embed(e5, @query)' \
    'query=shipping stuff over the sea'

$ vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
    'input.query(e)=embed(e5, @query)' \
    'query=exchanging information by sound'
```
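A detail worth knowing when testing an exported E5 model outside Vespa: E5-family models were trained with task prefixes on their input text ("query: " for queries, "passage: " for documents, per the intfloat model cards). The `embed(e5, ...)` calls above run inside the configured Vespa embedder, but if you run the exported model directly you apply the prefix yourself. A minimal sketch (the `e5_input` helper is illustrative, not part of this sample app):

```python
# E5 prefix convention (from the intfloat/e5 model cards):
# queries get "query: ", documents/passages get "passage: ".
# `e5_input` is a hypothetical helper, not part of this sample app.
def e5_input(text: str, is_query: bool) -> str:
    prefix = "query: " if is_query else "passage: "
    return prefix + text

print(e5_input("space contains many suns", is_query=True))
# query: space contains many suns
```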

Remove the container after use:

```
$ docker rm -f vespa
```

## Model exporting

Transformer-based embedding models have named inputs and outputs that must
be compatible with the input and output names used by the Vespa Bert embedder or the Huggingface embedder.

### Huggingface-embedder

See `export_hf_model_from_hf.py` for exporting a Huggingface sentence-transformer model to ONNX format compatible with the default input and output names used by the Vespa huggingface-embedder.

The following exports `intfloat/e5-small-v2`:

```
$ ./export_hf_model_from_hf.py --hf_model intfloat/e5-small-v2 --output_dir model
```

The following exports `intfloat/multilingual-e5-small` using quantization:

```
$ ./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-small --output_dir model --quantize
```

The following exports `intfloat/multilingual-e5-small` using quantization and tokenizer patching, to work around a known compatibility issue when loading saved tokenizers:

```
$ ./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-small --output_dir model --quantize --patch_tokenizer
```

### Bert-embedder

Prefer using the Vespa huggingface-embedder instead.

See `export_model_from_hf.py` for exporting a Huggingface sentence-transformer model to ONNX format compatible with the default input and output names used by the bert-embedder.

The following exports `intfloat/e5-small-v2`, saving the model parameters in an ONNX file and the vocabulary in a `vocab.txt` file, the format expected by the Vespa bert-embedder:

```
$ ./export_model_from_hf.py --hf_model intfloat/e5-small-v2 --output_dir model
```

## ToDo

The model directory is duplicated because other sample applications depend on the model file locations; this should be rewritten to use a model hub instead.