diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index 1ef3a818f6..8e67b7cbca 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -333,7 +333,6 @@ Follow these steps to build your search engine:
 PUT /_cluster/settings
 {
   "transient": {
-    "plugins.ml_commons.allow_registering_model_via_url": true,
     "plugins.ml_commons.only_run_on_ml_node": false,
     "plugins.ml_commons.native_memory_threshold": 99
   }
 }
@@ -346,13 +345,9 @@ Follow these steps to build your search engine:
 ```json
 POST /_plugins/_ml/models/_register?deploy=true
 {
-  "name": "opensearch-neural-sparse-encoding",
-  "version": "1.0.0",
-  "description": "opensearch-neural-sparse-encoding",
-  "model_format": "TORCH_SCRIPT",
-  "function_name": "SPARSE_ENCODING",
-  "model_content_hash_value": "d1ebaa26615090bdb0195a62b180afd2a8524c68c5d406a11ad787267f515ea8",
-  "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip"
+  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
+  "version": "1.0.1",
+  "model_format": "TORCH_SCRIPT"
 }
 ```
@@ -377,7 +372,7 @@ Follow these steps to build your search engine:
 {
   "model_id": "",
   "task_type": "REGISTER_MODEL",
-  "function_name": "SPARSE_TOKENIZE",
+  "function_name": "SPARSE_ENCODING",
   "state": "COMPLETED",
   "worker_node": [
     "wubXZX7xTIC7RW2z8nzhzw"
   ]
@@ -448,8 +443,7 @@ Congratulations! You've now created your own semantic search engine based on spa
     "neural_sparse": {
       "passage_embedding": {
         "query_text": "Hello world a b",
-        "model_id": "",
-        "max_token_score": 2.0
+        "model_id": ""
       }
     }
   }
@@ -458,10 +452,11 @@ Congratulations! You've now created your own semantic search engine based on spa
 
 ### Neural sparse query parameters
 
-The `neural_sparse` query supports two parameters:
+The `neural_sparse` query supports three parameters:
 
+- `query_text` (String): The query text from which to generate sparse vector embeddings.
 - `model_id` (String): The ID of the model that is used to generate tokens and weights from the query text. A sparse encoding model will expand the tokens from query text, while the tokenizer model will only tokenize the query text itself.
-- `max_token_score` (Float): An extra parameter required for performance optimization. Just like a `match` query, a `neural_sparse` query is transformed to a Lucene BooleanQuery, combining term-level subqueries using disjunction. The difference is that a `neural_sparse` query uses FeatureQuery instead of TermQuery to match the terms. Lucene employs the Weak AND (WAND) algorithm for dynamic pruning, which skips non-competitive tokens based on their score upper bounds. However, FeatureQuery uses `FLOAT.MAX_VALUE` as the score upper bound, which makes the WAND optimization ineffective. The `max_token_score` parameter resets the score upper bound for each token in a query, which is consistent with the original FeatureQuery. Thus, setting the value to 3.5 for the bi-encoder model and to 2 for the document-only model can accelerate search without precision loss. After OpenSearch is upgraded to Lucene version 9.8, this parameter will be deprecated.
+- `query_tokens` (Map): The query tokens, sometimes referred to as sparse vector embeddings. Similarly to dense semantic retrieval, you can use raw sparse vectors generated by neural models or tokenizers to perform a semantic search query. Use either `query_text` to generate sparse vector embeddings from raw query text or `query_tokens` to supply a sparse vector directly; one of the two must be provided for the `neural_sparse` query to operate.
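+
+For example, the following request queries the `passage_embedding` field from the earlier examples with a raw sparse vector instead of query text. This is an illustrative sketch: the index name and token weights are placeholders, and in practice the token-weight map would come from a sparse encoding model or tokenizer:
+
+```json
+GET my-neural-sparse-index/_search
+{
+  "query": {
+    "neural_sparse": {
+      "passage_embedding": {
+        "query_tokens": {
+          "hello": 2.3,
+          "world": 1.8
+        }
+      }
+    }
+  }
+}
+```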
 
 ## Selecting a model
 
@@ -469,9 +464,9 @@ OpenSearch provides several pretrained encoder models that you can use out of th
 Use the following recommendations to select a sparse encoding model:
 
-- For **bi-encoder** mode, we recommend using the `opensearch-neural-sparse-encoding-v1` pretrained model. For this model, both online search and offline ingestion share the same model file.
+- For **bi-encoder** mode, we recommend using the `opensearch-neural-sparse-encoding-v2-distill` pretrained model. For this model, both online search and offline ingestion share the same model file.
 
-- For **document-only** mode, we recommended using the `opensearch-neural-sparse-encoding-doc-v1` pretrained model for ingestion and the `opensearch-neural-sparse-tokenizer-v1` model at search time to implement online query tokenization. This model does not employ model inference and only translates the query into tokens.
+- For **document-only** mode, we recommend using the `opensearch-neural-sparse-encoding-doc-v3-distill` pretrained model for ingestion and the `opensearch-neural-sparse-tokenizer-v1` model at search time to implement online query tokenization. The tokenizer does not perform model inference; it only translates the query into tokens.
 
 ## Next steps
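+
+As a concrete next step, you can wire together the document-only mode recommended above. The following sketch is illustrative: the pipeline and index names are placeholders, `passage_text` stands in for your source text field, and the model IDs are the ones returned when you register `opensearch-neural-sparse-encoding-doc-v3-distill` and `opensearch-neural-sparse-tokenizer-v1`. First, attach the document encoder to ingestion with a `sparse_encoding` ingest processor:
+
+```json
+PUT /_ingest/pipeline/sparse-encoding-pipeline
+{
+  "description": "Generates sparse vector embeddings for documents at ingestion time",
+  "processors": [
+    {
+      "sparse_encoding": {
+        "model_id": "<document-only encoder model ID>",
+        "field_map": {
+          "passage_text": "passage_embedding"
+        }
+      }
+    }
+  ]
+}
+```
+
+Then, at search time, pass the tokenizer's model ID in the `neural_sparse` query so that the query text is tokenized rather than run through an encoder:
+
+```json
+GET my-neural-sparse-index/_search
+{
+  "query": {
+    "neural_sparse": {
+      "passage_embedding": {
+        "query_text": "Hello world a b",
+        "model_id": "<tokenizer model ID>"
+      }
+    }
+  }
+}
+```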