Follow these steps to build your search engine:

First, update the cluster settings to allow models to run on data nodes and to raise the native memory circuit breaker threshold:

```json
PUT /_cluster/settings
{
  "transient": {
    "plugins.ml_commons.only_run_on_ml_node": false,
    "plugins.ml_commons.native_memory_threshold": 99
  }
}
```
Next, register and deploy the pretrained sparse encoding model:

```json
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}
```

Registration returns a task ID. You can check progress by calling `GET /_plugins/_ml/tasks/<task_id>`; once the model is registered and deployed, the response resembles the following:

```json
{
  "model_id": "<model_id>",
  "task_type": "REGISTER_MODEL",
  "function_name": "SPARSE_ENCODING",
  "state": "COMPLETED",
  "worker_node": [
    "wubXZX7xTIC7RW2z8nzhzw"
  ]
}
```

Congratulations! You've now created your own semantic search engine based on sparse retrieval. To try it out, run a `neural_sparse` query:
"neural_sparse": {
"passage_embedding": {
"query_text": "Hello world a b",
"model_id": "<model_id>",
"max_token_score": 2.0
"model_id": "<model_id>"
}
}
}

### Neural sparse query parameters

The `neural_sparse` query supports three parameters:

- `query_text` (String): The query text from which to generate sparse vector embeddings.
- `model_id` (String): The ID of the model used to generate tokens and weights from the query text. A sparse encoding model expands the query text into additional weighted tokens, while a tokenizer model only tokenizes the query text itself.
- `query_tokens` (Map<String, Float>): The query tokens, sometimes referred to as sparse vector embeddings. As with dense semantic retrieval, you can use raw sparse vectors generated by neural models or tokenizers to perform a semantic search. Provide either `query_text`, to generate embeddings from text at search time, or `query_tokens`, to supply raw sparse vectors directly; one of the two is required for the `neural_sparse` query to run. For a raw-vector example, see the sketch after this list.
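
A minimal raw-vector query sketch, assuming the same `<index_name>` placeholder and `passage_embedding` field as above; the tokens and weights are illustrative:

```json
GET <index_name>/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_tokens": {
          "hello": 2.3,
          "world": 1.1
        }
      }
    }
  }
}
```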

## Selecting a model

OpenSearch provides several pretrained encoder models that you can use out of the box without fine-tuning. For a list of sparse encoding models provided by OpenSearch, see [Sparse encoding models](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sparse-encoding-models). The models are also available on the [Hugging Face model hub](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v1).

Use the following recommendations to select a sparse encoding model:

- For **bi-encoder** mode, we recommend using the `opensearch-neural-sparse-encoding-v2-distill` pretrained model. For this model, both online search and offline ingestion share the same model file (see the example registration call after this list).

- For **document-only** mode, we recommend using the `opensearch-neural-sparse-encoding-doc-v3-distill` pretrained model for ingestion and the `opensearch-neural-sparse-tokenizer-v1` model at search time to implement online query tokenization. The tokenizer does not perform model inference; it only translates the query text into tokens.
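
For example, a registration call for the bi-encoder recommendation might look like the following; the `version` value is an assumption, so check the pretrained model list for the exact string:

```json
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill",
  "version": "1.0.0",
  "model_format": "TORCH_SCRIPT"
}
```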


## Next steps