Follow these steps to build your search engine:

First, update the cluster settings to allow models to run on data nodes and to raise the native memory circuit breaker threshold:

```json
PUT /_cluster/settings
{
  "transient": {
    "plugins.ml_commons.only_run_on_ml_node": false,
    "plugins.ml_commons.native_memory_threshold": 99
  }
}
```
Next, register and deploy the pretrained sparse encoding model:

```json
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}
```

Registration returns a task ID. You can check progress by calling `GET /_plugins/_ml/tasks/<task_id>`; once the model is registered and deployed, the response resembles the following:

```json
{
  "model_id": "<model_id>",
  "task_type": "REGISTER_MODEL",
  "function_name": "SPARSE_ENCODING",
  "state": "COMPLETED",
  "worker_node": [
    "wubXZX7xTIC7RW2z8nzhzw"
  ]
}
```

Congratulations! You've now created your own semantic search engine based on sparse retrieval. To try it out, run a `neural_sparse` query:
"neural_sparse": {
"passage_embedding": {
"query_text": "Hello world a b",
"model_id": "<model_id>",
"max_token_score": 2.0
"model_id": "<model_id>"
}
}
}

### Neural sparse query parameters

The `neural_sparse` query supports three parameters:

- `query_text` (String): The query text from which to generate sparse vector embeddings.
- `model_id` (String): The ID of the model used to generate tokens and weights from the query text. A sparse encoding model expands the query text into additional weighted tokens, while a tokenizer model only tokenizes the query text itself.
- `query_tokens` (Map<String, Float>): The query tokens, sometimes referred to as sparse vector embeddings. As with dense semantic retrieval, you can use raw sparse vectors generated by neural models or tokenizers to perform a semantic search. Provide either `query_text`, to generate embeddings from text at search time, or `query_tokens`, to supply raw sparse vectors directly; one of the two is required for the `neural_sparse` query to run. For a raw-vector example, see the sketch after this list.
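
A minimal raw-vector query sketch, assuming the same `<index_name>` placeholder and `passage_embedding` field as above; the tokens and weights are illustrative:

```json
GET <index_name>/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_tokens": {
          "hello": 2.3,
          "world": 1.1
        }
      }
    }
  }
}
```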

## Selecting a model

OpenSearch provides several pretrained encoder models that you can use out of the box without fine-tuning. For a list of sparse encoding models provided by OpenSearch, see [Sparse encoding models](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sparse-encoding-models). The models are also available on the [Hugging Face model hub](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v1).

Use the following recommendations to select a sparse encoding model:

- For **bi-encoder** mode, we recommend using the `opensearch-neural-sparse-encoding-v2-distill` pretrained model. For this model, both online search and offline ingestion share the same model file (see the example registration call after this list).

- For **document-only** mode, we recommend using the `opensearch-neural-sparse-encoding-doc-v3-distill` pretrained model for ingestion and the `opensearch-neural-sparse-tokenizer-v1` model at search time to implement online query tokenization. The tokenizer does not perform model inference; it only translates the query text into tokens.
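
For example, a registration call for the bi-encoder recommendation might look like the following; the `version` value is an assumption, so check the pretrained model list for the exact string:

```json
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill",
  "version": "1.0.0",
  "model_format": "TORCH_SCRIPT"
}
```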


## Next steps