Commit cfeaa66

VEC-223: Documentation for sparse and hybrid indexes
Added them under features, also updated the REST API specification.
1 parent f56b544 commit cfeaa66

14 files changed, +915 -16 lines changed

mint.json

+3-1
Original file line numberDiff line numberDiff line change
@@ -766,7 +766,9 @@
766766
"vector/features/filtering",
767767
"vector/features/embeddingmodels",
768768
"vector/features/namespaces",
769-
"vector/features/resumablequery"
769+
"vector/features/resumablequery",
770+
"vector/features/sparseindexes",
771+
"vector/features/hybridindexes"
770772
]
771773
},
772774
{

vector/api/endpoints/fetch-random.mdx

+5-2
@@ -23,8 +23,11 @@ The response will be `null` if the namespace is empty.
2323
<ResponseField name="id" type="string" required>
2424
The id of the vector.
2525
</ResponseField>
26-
<ResponseField name="vector" type="number[]" required>
27-
The vector value.
26+
<ResponseField name="vector" type="number[]">
27+
The dense vector value for dense and hybrid indexes.
28+
</ResponseField>
29+
<ResponseField name="sparseVector" type="Object[]">
30+
The sparse vector value for sparse and hybrid indexes.
2831
</ResponseField>
2932

3033
<RequestExample>

vector/api/endpoints/fetch.mdx

+4-1
@@ -49,7 +49,10 @@ their vector ids.
4949
The id of the vector.
5050
</ResponseField>
5151
<ResponseField name="vector" type="number[]">
52-
The vector value.
52+
The dense vector value for dense and hybrid indexes.
53+
</ResponseField>
54+
<ResponseField name="sparseVector" type="Object[]">
55+
The sparse vector value for sparse and hybrid indexes.
5356
</ResponseField>
5457
<ResponseField name="metadata" type="Object">
5558
The metadata of the vector, if any.

vector/api/endpoints/query-data.mdx

+25-2
@@ -44,6 +44,23 @@ of fields below.
4444
<ParamField body="filter" type="string" default="">
4545
[Metadata filter](/vector/features/filtering) to apply.
4646
</ParamField>
47+
<ParamField body="weightingStrategy" type="string">
48+
For sparse vectors of sparse and hybrid indexes, specifies what kind of
49+
weighting strategy should be used while querying the matching non-zero
50+
dimension values of the query vector with the documents.
51+
52+
If not provided, no weighting will be used.
53+
54+
Only possible value is `IDF` (inverse document frequency).
55+
</ParamField>
56+
<ParamField body="fusionAlgorithm" type="string">
57+
Fusion algorithm to use while fusing scores
58+
from dense and sparse components of a hybrid index.
59+
60+
If not provided, defaults to `RRF` (Reciprocal Rank Fusion).
61+
62+
Other possible value is `DBSF` (Distribution-Based Score Fusion).
63+
</ParamField>
4764

4865
## Path
4966

@@ -61,9 +78,12 @@ If the request was an array of more than one items, an array of
6178
objects below is returned, one for each query item.
6279

6380
<Note>
64-
The score is normalized to always be between 0 and 1.
81+
For dense indexes, the score is normalized to always be between 0 and 1.
6582
The closer the score is to 1, the more similar the vector is to the query vector.
6683
This does not depend on the distance metric you use.
84+
85+
For sparse and hybrid indexes, scores can be arbitrary values, but the score
86+
will be higher for more similar vectors.
6787
</Note>
6888

6989
<ResponseField name="Scores" type="Object[]">
@@ -75,7 +95,10 @@ objects below is returned, one for each query item.
7595
The similarity score of the vector, calculated based on the distance metric of your index.
7696
</ResponseField>
7797
<ResponseField name="vector" type="number[]">
78-
The vector value.
98+
The dense vector value for dense and hybrid indexes.
99+
</ResponseField>
100+
<ResponseField name="sparseVector" type="Object[]">
101+
The sparse vector value for sparse and hybrid indexes.
79102
</ResponseField>
80103
<ResponseField name="metadata" type="Object">
81104
The metadata of the vector, if any.
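
The `fusionAlgorithm` parameter documented in this file defaults to `RRF` (Reciprocal Rank Fusion). As a rough illustration of the general technique only, and not of the service's actual implementation, the sketch below fuses two ranked id lists by summing `1 / (k + rank)` per id. The function name `reciprocal_rank_fusion` and the constant `k = 60` (a value commonly used in the literature) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    rankings: list of lists, each ordered from best to worst match.
    k: smoothing constant; 60 is a common choice, assumed here.
    Returns (id, fused_score) pairs sorted from best to worst.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best match contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Higher fused score first; ties are broken by id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical result ids from the dense and sparse components.
dense_ranking = ["a", "b", "c"]
sparse_ranking = ["b", "c", "a"]
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

Because the fused score depends only on ranks, it does not stay within a fixed range such as 0 to 1, which is consistent with the note above that scores for hybrid indexes can be arbitrary values.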

vector/api/endpoints/query.mdx

+25-2
@@ -40,6 +40,23 @@ of fields below.
4040
<ParamField body="filter" type="string" default="">
4141
[Metadata filter](/vector/features/filtering) to apply.
4242
</ParamField>
43+
<ParamField body="weightingStrategy" type="string">
44+
For sparse vectors of sparse and hybrid indexes, specifies what kind of
45+
weighting strategy should be used while querying the matching non-zero
46+
dimension values of the query vector with the documents.
47+
48+
If not provided, no weighting will be used.
49+
50+
Only possible value is `IDF` (inverse document frequency).
51+
</ParamField>
52+
<ParamField body="fusionAlgorithm" type="string">
53+
Fusion algorithm to use while fusing scores
54+
from dense and sparse components of a hybrid index.
55+
56+
If not provided, defaults to `RRF` (Reciprocal Rank Fusion).
57+
58+
Other possible value is `DBSF` (Distribution-Based Score Fusion).
59+
</ParamField>
4360

4461
## Path
4562

@@ -57,9 +74,12 @@ If the request was an array of more than one items, an array of
5774
objects below is returned, one for each query item.
5875

5976
<Note>
60-
The score is normalized to always be between 0 and 1.
77+
For dense indexes, the score is normalized to always be between 0 and 1.
6178
The closer the score is to 1, the more similar the vector is to the query vector.
6279
This does not depend on the distance metric you use.
80+
81+
For sparse and hybrid indexes, scores can be arbitrary values, but the score
82+
will be higher for more similar vectors.
6383
</Note>
6484

6585
<ResponseField name="Scores" type="Object[]">
@@ -71,7 +91,10 @@ objects below is returned, one for each query item.
7191
The similarity score of the vector, calculated based on the distance metric of your index.
7292
</ResponseField>
7393
<ResponseField name="vector" type="number[]">
74-
The vector value.
94+
The dense vector value for dense and hybrid indexes.
95+
</ResponseField>
96+
<ResponseField name="sparseVector" type="Object[]">
97+
The sparse vector value for sparse and hybrid indexes.
7598
</ResponseField>
7699
<ResponseField name="metadata" type="Object">
77100
The metadata of the vector, if any.

vector/api/endpoints/range.mdx

+5-2
@@ -52,8 +52,11 @@ authMethod: "GET"
5252
<ResponseField name="id" type="string" required>
5353
The id of the vector.
5454
</ResponseField>
55-
<ResponseField name="vector" type="number[]" required>
56-
The vector value.
55+
<ResponseField name="vector" type="number[]">
56+
The dense vector value for dense and hybrid indexes.
57+
</ResponseField>
58+
<ResponseField name="sparseVector" type="Object[]">
59+
The sparse vector value for sparse and hybrid indexes.
5760
</ResponseField>
5861
<ResponseField name="metadata" type="Object">
5962
The metadata of the vector, if any.

vector/api/endpoints/resumable-query/resume.mdx

+4-1
@@ -27,7 +27,10 @@ authMethod: "bearer"
2727
metric of your index.
2828
</ResponseField>
2929
<ResponseField name="vector" type="number[]">
30-
The vector value.
30+
The dense vector value for dense and hybrid indexes.
31+
</ResponseField>
32+
<ResponseField name="sparseVector" type="Object[]">
33+
The sparse vector value for sparse and hybrid indexes.
3134
</ResponseField>
3235
<ResponseField name="metadata" type="Object">
3336
The metadata of the vector, if any.

vector/api/endpoints/resumable-query/start-with-data.mdx

+19
@@ -40,6 +40,25 @@ authMethod: "bearer"
4040
Maximum idle time for the resumable query in seconds.
4141
</ParamField>
4242

43+
<ParamField body="weightingStrategy" type="string">
44+
For sparse vectors of sparse and hybrid indexes, specifies what kind of
45+
weighting strategy should be used while querying the matching non-zero
46+
dimension values of the query vector with the documents.
47+
48+
If not provided, no weighting will be used.
49+
50+
Only possible value is `IDF` (inverse document frequency).
51+
</ParamField>
52+
53+
<ParamField body="fusionAlgorithm" type="string">
54+
Fusion algorithm to use while fusing scores
55+
from dense and sparse components of a hybrid index.
56+
57+
If not provided, defaults to `RRF` (Reciprocal Rank Fusion).
58+
59+
Other possible value is `DBSF` (Distribution-Based Score Fusion).
60+
</ParamField>
61+
4362
## Path
4463

4564
<ParamField path="namespace" type="string" default="">

vector/api/endpoints/resumable-query/start-with-vector.mdx

+23-1
@@ -46,6 +46,25 @@ authMethod: "bearer"
4646
Maximum idle time for the resumable query in seconds.
4747
</ParamField>
4848

49+
<ParamField body="weightingStrategy" type="string">
50+
For sparse vectors of sparse and hybrid indexes, specifies what kind of
51+
weighting strategy should be used while querying the matching non-zero
52+
dimension values of the query vector with the documents.
53+
54+
If not provided, no weighting will be used.
55+
56+
Only possible value is `IDF` (inverse document frequency).
57+
</ParamField>
58+
59+
<ParamField body="fusionAlgorithm" type="string">
60+
Fusion algorithm to use while fusing scores
61+
from dense and sparse components of a hybrid index.
62+
63+
If not provided, defaults to `RRF` (Reciprocal Rank Fusion).
64+
65+
Other possible value is `DBSF` (Distribution-Based Score Fusion).
66+
</ParamField>
67+
4968
## Path
5069

5170
<ParamField path="namespace" type="string" default="">
@@ -69,7 +88,10 @@ authMethod: "bearer"
6988
metric of your index.
7089
</ResponseField>
7190
<ResponseField name="vector" type="number[]">
72-
The vector value.
91+
The dense vector value for dense and hybrid indexes.
92+
</ResponseField>
93+
<ResponseField name="sparseVector" type="Object[]">
94+
The sparse vector value for sparse and hybrid indexes.
7395
</ResponseField>
7496
<ResponseField name="metadata" type="Object">
7597
The metadata of the vector, if any.

vector/api/endpoints/update.mdx

+9-1
@@ -19,9 +19,12 @@ of those.
1919
The id of the vector.
2020
</ParamField>
2121
<ParamField body="vector" type="number[]">
22-
The vector value to update to.
22+
The dense vector value to update to for dense and hybrid indexes.
2323
<Note>The vector should have the same dimensions as your index.</Note>
2424
</ParamField>
25+
<ParamField body="sparseVector" type="Object[]">
26+
The sparse vector value to update to for sparse and hybrid indexes.
27+
</ParamField>
2528
<ParamField body="data" type="string">
2629
The raw text data to update to.
2730
<Note>If the index is created with an [embedding model](/vector/features/embeddingmodels)
@@ -38,6 +41,11 @@ of those.
3841
`OVERWRITE` for overwrite, `PATCH` for patch.
3942
</ParamField>
4043

44+
<Note>
45+
For hybrid indexes, either none or both of the `vector` and `sparseVector` fields
46+
must be present. Updating only `vector` or only `sparseVector` is not allowed.
47+
</Note>
48+
4149
## Path
4250

4351
<ParamField path="namespace" type="string" default="">
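
The update rule for hybrid indexes stated in the note of this file (either both of `vector` and `sparseVector`, or neither) can be checked on the client side before sending a request. The following is a minimal sketch under that reading; the helper name `is_valid_hybrid_update` is hypothetical and the field values are treated as opaque.

```python
def is_valid_hybrid_update(payload):
    """Return True if an update payload respects the hybrid index rule.

    The rule: `vector` and `sparseVector` must be given together or
    omitted together. Other fields (e.g. `metadata`) are not checked.
    """
    has_dense = payload.get("vector") is not None
    has_sparse = payload.get("sparseVector") is not None
    # Valid only when both flags agree (both present or both absent).
    return has_dense == has_sparse
```

A payload that carries only `metadata` passes this check, since neither vector field is present.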

vector/api/endpoints/upsert.mdx

+13-2
@@ -17,10 +17,13 @@ You can either upsert a single vector, or multiple vectors in an array.
1717
<ParamField body="id" type="string" required>
1818
The id of the vector.
1919
</ParamField>
20-
<ParamField body="vector" type="number[]" required>
21-
The vector value.
20+
<ParamField body="vector" type="number[]">
21+
The dense vector value for dense and hybrid indexes.
2222
<Note>The vector should have the same dimensions as your index.</Note>
2323
</ParamField>
24+
<ParamField body="sparseVector" type="Object[]">
25+
The sparse vector value for sparse and hybrid indexes.
26+
</ParamField>
2427
<ParamField body="metadata" type="Object">
2528
The metadata of the vector. This makes identifying vectors
2629
on retrieval easier and can be used with filters on queries.
@@ -30,6 +33,14 @@ You can either upsert a single vector, or multiple vectors in an array.
3033
data, which can be anything associated with this vector.
3134
</ParamField>
3235

36+
<Note>
37+
For dense indexes, only `vector` should be provided, and `sparseVector` should not be set.
38+
39+
For sparse indexes, only `sparseVector` should be provided, and `vector` should not be set.
40+
41+
For hybrid indexes, both `vector` and `sparseVector` must be present.
42+
</Note>
43+
3344
## Path
3445

3546
<ParamField path="namespace" type="string" default="">
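
The upsert note in this file ties the allowed fields to the index type. A small client-side check along those lines might look like the sketch below. The helper name `is_valid_upsert` and the index type labels `"dense"`, `"sparse"`, and `"hybrid"` are assumptions for illustration; the sketch covers only upserts that pass vectors directly and treats the sparse vector value as opaque.

```python
def is_valid_upsert(index_type, record):
    """Check which vector fields an upsert record may carry.

    index_type: "dense", "sparse", or "hybrid" (labels assumed here).
    record: dict that may contain `vector` and/or `sparseVector`.
    """
    has_dense = record.get("vector") is not None
    has_sparse = record.get("sparseVector") is not None
    if index_type == "dense":
        # Only `vector` should be provided.
        return has_dense and not has_sparse
    if index_type == "sparse":
        # Only `sparseVector` should be provided.
        return has_sparse and not has_dense
    if index_type == "hybrid":
        # Both fields must be present.
        return has_dense and has_sparse
    raise ValueError(f"unknown index type: {index_type!r}")
```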

vector/features/embeddingmodels.mdx

+10-1
@@ -32,7 +32,7 @@ Upstash Vector comes with a variety of embedding models that score well in the
3232
for measuring the performance of embedding models. They support use cases such
3333
as classification, clustering, or retrieval.
3434

35-
You can choose the following general purpose models:
35+
You can choose the following general purpose models for dense and hybrid indexes:
3636

3737
| Name | Dimension | Sequence Length | MTEB |
3838
| ------------------------------------------------------------------------------------------------------- | --------- | --------------- | ----- |
@@ -56,6 +56,15 @@ You can choose the following general purpose models:
5656
MTEB score for the `BAAI/bge-m3` is not fully measured.
5757
</Note>
5858

59+
For sparse and hybrid indexes, only the following models can be selected:
60+
61+
| Name |
62+
| ------------------------------------------------- |
63+
| [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) |
64+
| [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) |
65+
66+
See [Creating Sparse Vectors](/vector/features/sparseindexes#creating-sparse-vectors) for the details of the above models.
67+
5968
## Using a Model
6069

6170
To start using embedding models, create the index with a model of your choice.
