Merged
80 commits
f5e9a32
[Feature] Add table function relation syntax to SQL grammar (#5318)
mengweieric Apr 7, 2026
6fe6fc0
Add knn_vector as recognized MappingType in OpenSearchDataType
mengweieric Apr 7, 2026
9ec7bc6
Widen OpenSearchIndexScanBuilder constructor to public
mengweieric Apr 7, 2026
0381f6b
Add vector search table function, query builder, and index
mengweieric Apr 7, 2026
5c594e9
Register VectorSearchTableFunctionResolver in OpenSearchStorageEngine
mengweieric Apr 7, 2026
f8f8274
Add VectorSearchQueryBuilder unit tests
mengweieric Apr 7, 2026
bcfbbc7
Address review feedback: add validation guards and pushDownFilter test
mengweieric Apr 8, 2026
35afe25
Canonicalize option values as numeric types before DSL generation
mengweieric Apr 8, 2026
322c622
Harden input validation and add size=k default for top-k mode
mengweieric Apr 8, 2026
8d75bf6
Add null-arg-name guard and make storage engine test less brittle
mengweieric Apr 8, 2026
167f49a
Clean up dead code, fix misleading comment and test name
mengweieric Apr 8, 2026
47473bb
Add test for missing required option validation path
mengweieric Apr 8, 2026
02151e9
Add mutual exclusivity and k range validation
mengweieric Apr 8, 2026
80aeed1
Add LIMIT > k rejection in top-k vector search mode
mengweieric Apr 8, 2026
02b39fe
Add comprehensive test coverage for vector search hardening
mengweieric Apr 8, 2026
f18575f
Add resolver argument count edge case tests
mengweieric Apr 8, 2026
11c0789
Add radial size policy, sort restriction, and integration tests
mengweieric Apr 8, 2026
73c4464
Fill test coverage gaps for vector search hardening
mengweieric Apr 8, 2026
06fa060
Preserve sort.getCount() limit pushdown contract in pushDownSort
mengweieric Apr 9, 2026
c00d35e
Add compound predicate and radial+WHERE test coverage
mengweieric Apr 9, 2026
39ff505
Route pushDownSort count through LIMIT > k validation
mengweieric Apr 14, 2026
98064f0
Split explain tests into dedicated VectorSearchExplainIT
mengweieric Apr 14, 2026
b316d4c
Add FilterType enum for post|efficient filter placement
mengweieric Apr 9, 2026
fa52705
Add filter_type to allowed option keys with post|efficient validation
mengweieric Apr 9, 2026
ad19b6a
Strip filter_type from options and pass as typed FilterType to Vector…
mengweieric Apr 9, 2026
c00b844
Collapse buildKnnQueryJson to accept optional filter clause
mengweieric Apr 9, 2026
a511acd
Implement efficient filter pushdown branching and build-time validati…
mengweieric Apr 9, 2026
cee09d0
Wire FilterType and rebuild callback through createScanBuilder
mengweieric Apr 9, 2026
8059f1b
Add build-time validation and regression tests for LIMIT/sort under e…
mengweieric Apr 9, 2026
dc26225
Add integration tests for filter_type=post|efficient and spotless for…
mengweieric Apr 9, 2026
857773c
Reject radial vector search without LIMIT
mengweieric Apr 10, 2026
0b87746
Fix limitPushed not set when limit comes through pushDownSort count path
mengweieric Apr 14, 2026
f7318c8
Handle non-pushdownable WHERE with explicit filter_type
mengweieric Apr 14, 2026
52df3a5
Add defensive null guard for EFFICIENT mode rebuild callback
mengweieric Apr 15, 2026
f122be5
Address review feedback: reword user-facing error, strengthen explain…
mengweieric Apr 16, 2026
d8b0e85
Rename remaining backward-compatible constructor comment for consistency
mengweieric Apr 16, 2026
96d7a36
Reject GROUP BY / aggregations on vectorSearch() relations
mengweieric Apr 17, 2026
2845f4f
Address review: reframe as SQL-preview constraint; add GROUP BY and b…
mengweieric Apr 17, 2026
c771b72
Strengthen VectorSearchExplainIT with structural DSL assertions
mengweieric Apr 17, 2026
fe4b653
Address review nits: outer-bool guard, helper comment, efficient+ORDE…
mengweieric Apr 17, 2026
281beb2
Fail fast when k-NN plugin is missing
mengweieric Apr 17, 2026
fa444fe
Defer k-NN plugin probe to scan open() so _explain keeps working
mengweieric Apr 17, 2026
b7bac49
[BugFix] Reject positional args and tighten table-name validation in …
mengweieric Apr 20, 2026
ffbd1cf
Drop PR-reference from Argument-shape section header
mengweieric Apr 21, 2026
24bf6e0
[BugFix] Lock in BETWEEN / NOT IN pushdown shapes for vectorSearch
mengweieric Apr 21, 2026
cbb928b
Rewrite IS NOT NULL / IS NULL to native exists DSL in v2 filter pushdown
mengweieric Apr 21, 2026
af73737
Force script fallback for nested IS NULL / IS NOT NULL
mengweieric Apr 21, 2026
cb2420e
Clearer error when vectorSearch() is used without a table alias
mengweieric Apr 20, 2026
6dfbdca
Reject outer WHERE on vectorSearch() subqueries
mengweieric Apr 21, 2026
3091eb9
Preserve subquery Project marker across inner Filter in walker
mengweieric Apr 21, 2026
a5102b7
Apply spotless formatting to walker Javadoc
mengweieric Apr 21, 2026
2bf1faf
[BugFix] Reject vectorSearch on index with user _score field
mengweieric Apr 21, 2026
effca01
Address review nits on synthetic _score collision guard
mengweieric Apr 21, 2026
a2c82ea
Reject OFFSET, WHERE on _score, and script subtrees under filter_type…
mengweieric Apr 20, 2026
01a5d87
Replace script-query blacklist with fail-closed allow-list for effici…
mengweieric Apr 22, 2026
91d6d8e
[BugFix] Tighten vectorSearch() option and vector parsing error messages
mengweieric Apr 23, 2026
12c5553
Address review feedback: reject empty vector / option segments, stabi…
mengweieric Apr 23, 2026
c0faf7c
Apply second-pass polish: return parsed numeric options, tighten key-…
mengweieric Apr 23, 2026
ca6da43
[BugFix] Surface dedicated error for wildcard or multi-target table i…
mengweieric Apr 23, 2026
2c28a86
Trim duplicated inline comment on wildcard-table check
mengweieric Apr 23, 2026
f6ec54f
Extend validateTableName Javadoc to cover wildcard/multi-target rejec…
mengweieric Apr 24, 2026
1b711d7
Add happy-path execution tests for vectorSearch()
mengweieric Apr 17, 2026
7ae4189
Tighten VectorSearchExecutionIT per reviewer feedback
mengweieric Apr 17, 2026
3d55191
Fix efficient-filter test to satisfy LIMIT <= k contract
mengweieric Apr 24, 2026
cf3d660
Add user doc page for vectorSearch() table function
mengweieric Apr 17, 2026
aaea249
Address review: soften aggregation language; tighten filter_type sect…
mengweieric Apr 17, 2026
5571842
Correct vector-search doc to match current behavior
mengweieric Apr 24, 2026
bb45f0f
Drop preview framing from Limitations section
mengweieric Apr 24, 2026
1ae98b3
Tighten Limitations wording after dropping preview framing
mengweieric Apr 24, 2026
e5413fc
Align doc with current branch behavior after hardening PRs
mengweieric Apr 24, 2026
db26cb0
Align Limitations wording with house style
mengweieric Apr 24, 2026
c443be5
Polish doc wording to match existing user-doc house style
mengweieric Apr 24, 2026
394c190
Strengthen no-kNN integration tests with exact message and scan-opera…
mengweieric Apr 24, 2026
d6ab5a5
Add integration test for vectorSearch() alias with multiple backing i…
mengweieric Apr 24, 2026
67d4819
Remove em dashes from added comments
mengweieric Apr 24, 2026
5086c06
Reject outer operators over vectorSearch() subqueries
mengweieric Apr 24, 2026
c06c472
Address PR review: operator-specific error messages, inner ORDER BY _…
mengweieric Apr 24, 2026
cb2b754
[Feature] Mark vectorSearch() experimental and default to efficient f…
mengweieric Apr 29, 2026
e58363f
[Doc] Apply reviewer feedback on vectorSearch() user doc
mengweieric Apr 29, 2026
674e877
[Doc] Apply second-pass review polish to vectorSearch() user doc
mengweieric Apr 29, 2026
31 changes: 30 additions & 1 deletion core/src/main/java/org/opensearch/sql/planner/Planner.java
@@ -14,6 +14,7 @@
 import org.opensearch.sql.planner.optimizer.LogicalPlanOptimizer;
 import org.opensearch.sql.planner.physical.PhysicalPlan;
 import org.opensearch.sql.storage.Table;
+import org.opensearch.sql.storage.read.TableScanBuilder;

 /** Planner that plans and chooses the optimal physical plan. */
 @RequiredArgsConstructor
@@ -34,7 +35,35 @@ public PhysicalPlan plan(LogicalPlan plan) {
     if (table == null) {
       return plan.accept(new DefaultImplementor<>(), null);
     }
-    return table.implement(table.optimize(optimize(plan)));
+    LogicalPlan optimized = table.optimize(optimize(plan));
+    // Give scan builders a chance to reject shapes that push-down alone cannot express safely
+    // (e.g. operators that land above the scan but outside its push-down contract).
+    validateScanBuilders(optimized);
+    return table.implement(optimized);
+  }
+
+  /**
+   * Walk the optimized plan and invoke {@link TableScanBuilder#validatePlan(LogicalPlan)} on every
+   * scan builder, passing the fully optimized root so scan builders can inspect their ancestors.
+   */
+  private void validateScanBuilders(LogicalPlan optimized) {
+    optimized.accept(
+        new LogicalPlanNodeVisitor<Void, Object>() {
+          @Override
+          public Void visitNode(LogicalPlan node, Object context) {
+            for (LogicalPlan child : node.getChild()) {
+              child.accept(this, context);
+            }
+            return null;
+          }
+
+          @Override
+          public Void visitTableScanBuilder(TableScanBuilder node, Object context) {
+            node.validatePlan(optimized);
+            return null;
+          }
+        },
+        null);
   }

   private Table findTable(LogicalPlan plan) {
@@ -119,6 +119,19 @@ public boolean pushDownPageSize(LogicalPaginate paginate) {
     return false;
   }

+  /**
+   * Post-optimization validation hook. Called once by the planner after all push-down rules have
+   * run, with the fully optimized plan root. Subclasses may inspect the ancestors of this scan
+   * builder to reject planner shapes that push-down alone cannot express safely (for example,
+   * operators that land above the scan but outside its push-down contract and would be executed
+   * after the scan has already returned a bounded result set). Default is no-op.
+   *
+   * @param root the fully optimized logical plan containing this scan builder
+   */
+  public void validatePlan(LogicalPlan root) {
+    // no-op by default
+  }
+
   @Override
   public <R, C> R accept(LogicalPlanNodeVisitor<R, C> visitor, C context) {
     return visitor.visitTableScanBuilder(this, context);
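
To illustrate how a scan builder might use the post-optimization hook, here is a minimal, self-contained sketch. It does not use the real `org.opensearch.sql` classes: `PlanNode`, `ScanNode`, `BoundedScanNode`, and `PlanValidator` are hypothetical stand-ins, and rejecting an ancestor named "offset" is only an example of a shape a bounded scan might refuse.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a logical plan node (not the real LogicalPlan class).
class PlanNode {
  final String name;
  final List<PlanNode> children;

  PlanNode(String name, PlanNode... children) {
    this.name = name;
    this.children = Arrays.asList(children);
  }
}

// Hypothetical stand-in for a scan builder with a no-op validation hook.
class ScanNode extends PlanNode {
  ScanNode(String name) {
    super(name);
  }

  // Default does nothing, like the hook in the diff above.
  void validatePlan(PlanNode root) {}
}

// A scan that returns a bounded result set and so rejects an "offset" ancestor.
class BoundedScanNode extends ScanNode {
  BoundedScanNode(String name) {
    super(name);
  }

  @Override
  void validatePlan(PlanNode root) {
    if (isBelow(root, "offset", false)) {
      throw new IllegalStateException("offset above " + name + " is not supported");
    }
  }

  // True when a node named target lies on the path from node down to this scan.
  private boolean isBelow(PlanNode node, String target, boolean seenTarget) {
    if (node == this) {
      return seenTarget;
    }
    boolean seen = seenTarget || node.name.equals(target);
    for (PlanNode child : node.children) {
      if (isBelow(child, target, seen)) {
        return true;
      }
    }
    return false;
  }
}

// Walks the tree once and calls the hook on every scan, always passing the root.
class PlanValidator {
  static void validate(PlanNode root) {
    visit(root, root);
  }

  private static void visit(PlanNode node, PlanNode root) {
    if (node instanceof ScanNode) {
      ((ScanNode) node).validatePlan(root);
      return;
    }
    for (PlanNode child : node.children) {
      visit(child, root);
    }
  }
}
```

Passing the root rather than the parent is what lets the scan look at all of its ancestors; the trade-off is that each scan has to locate itself in the tree again.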
331 changes: 331 additions & 0 deletions docs/user/dql/vector-search.rst
@@ -0,0 +1,331 @@

==============================
Vector Search [Experimental]
==============================

.. rubric:: Table of contents

.. contents::
   :local:
   :depth: 2

Introduction
============

``vectorSearch()`` is an experimental feature. Syntax, options, and
pushdown behavior may change in future releases based on feedback.

The ``vectorSearch()`` table function runs a k-NN query against a ``knn_vector``
field and exposes the matching documents as a relation in the ``FROM`` clause.
It relies on the OpenSearch `k-NN plugin
<https://docs.opensearch.org/latest/vector-search/>`_. The target index must
map the vector field as ``knn_vector`` and the index must be created with
``index.knn: true``.

The SQL layer translates ``vectorSearch()`` into an OpenSearch search
request whose body is native k-NN query DSL; the query vector is parsed
into a numeric array before that DSL is emitted.

Relevance is expressed through the OpenSearch ``_score`` metadata field, and
results are returned ordered by ``_score DESC`` by default.

vectorSearch
============

Description
-----------

``vectorSearch(table='<index>', field='<vector-field>', vector='<array>', option='<key=value[,key=value]*>')``

All four arguments are required and must be passed by name as string
literals. Positional arguments, or a mix of positional and named
arguments, are not supported. For example, the following is invalid
because ``table`` is passed positionally::

    FROM vectorSearch('my_vectors', field='embedding',
                      vector='[0.1,0.2]', option='k=5') AS v

A table alias is required. Projected fields are referenced through the
alias (``v._id``, ``v._score``, ``v.category``).

If the ``opensearch-knn`` plugin is not installed on the target cluster,
query execution fails with a ``vectorSearch() requires the k-NN plugin``
error. ``_explain`` continues to work without the plugin.

Arguments
---------

- ``table``: single concrete index or alias to search. Wildcards
(``*``), comma-separated multi-index targets, ``_all``, ``.``, and
``..`` are not supported. The target index must have
``index.knn: true`` and map the target field as ``knn_vector``. A
normal alias name is accepted. If the alias resolves to multiple
backing indices, the SQL layer does not prevalidate that every
backing index has a compatible ``knn_vector`` mapping, dimension, or
engine; OpenSearch execution remains the source of truth for those
checks.
- ``field``: name of the ``knn_vector`` field.
- ``vector``: query vector as a JSON-style array of numbers, passed as a
string (for example, ``'[0.1, 0.2, 0.3]'``). Components must be
comma-separated finite numbers. Semicolon, colon, and pipe separators
are not supported, and empty components (for example, ``'[1.0,,2.0]'``
or ``'[1.0,]'``) return an error. The vector dimension must match the
``knn_vector`` mapping on the target index.
- ``option``: comma-separated ``key=value`` pairs. Exactly one of ``k``,
``max_distance``, or ``min_score`` is required. ``filter_type`` is
optional.
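
As an illustration of the ``vector`` rules above, the following sketch parses a vector literal and rejects empty components, alternative separators, and non-finite values. ``VectorLiteral`` is a hypothetical helper written for this page, not the plugin's parser, and its error messages are invented.

```java
// Hypothetical helper that mirrors the documented vector rules; not the plugin's parser.
final class VectorLiteral {
  private VectorLiteral() {}

  static double[] parse(String text) {
    String trimmed = text.trim();
    if (trimmed.length() < 2 || !trimmed.startsWith("[") || !trimmed.endsWith("]")) {
      throw new IllegalArgumentException("vector must be a bracketed array: " + text);
    }
    String body = trimmed.substring(1, trimmed.length() - 1);
    if (body.trim().isEmpty()) {
      throw new IllegalArgumentException("vector must not be empty");
    }
    // The -1 limit keeps trailing empty segments so that "[1.0,]" is rejected.
    String[] parts = body.split(",", -1);
    double[] values = new double[parts.length];
    for (int i = 0; i < parts.length; i++) {
      String part = parts[i].trim();
      if (part.isEmpty()) {
        throw new IllegalArgumentException("empty component at position " + i);
      }
      double value;
      try {
        // A segment such as "0.1;0.2" fails here because ';' is not a separator.
        value = Double.parseDouble(part);
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("not a number: " + part);
      }
      if (!Double.isFinite(value)) {
        throw new IllegalArgumentException("component must be finite: " + part);
      }
      values[i] = value;
    }
    return values;
  }
}
```

The sketch does not check the dimension, because that depends on the ``knn_vector`` mapping of the target index.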

Supported option keys
---------------------

Option keys are lower-case and case-sensitive. ``K=5`` or
``Filter_Type=post`` returns an "Unknown option key" error.

- ``k``: top-k mode. Integer between 1 and 10000. The query returns up to
``k`` nearest neighbors.
- ``max_distance``: radial mode. Non-negative number. Matches documents
within the given distance of the query vector. ``LIMIT`` is required and
caps the returned rows.
- ``min_score``: radial mode. Non-negative number. Matches documents with
score at or above the given threshold. ``LIMIT`` is required and caps
the returned rows.
- ``filter_type``: ``post`` or ``efficient``. Controls how a ``WHERE``
clause is applied. See `Filtering`_.

``k``, ``max_distance``, and ``min_score`` are mutually exclusive; specify
exactly one.

Native k-NN tuning options (for example, ``method_parameters.ef_search``,
``method_parameters.nprobes``, ``rescore.oversample_factor``) are not
supported through ``vectorSearch()`` and return an "Unknown option
key" error.
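
The option rules above can be sketched as a small validator. ``VectorSearchOptions`` is a hypothetical class, not the plugin's implementation; the error text, the inclusive bounds on ``k``, and the rejection of duplicate keys are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper that mirrors the documented option rules; not the plugin's parser.
final class VectorSearchOptions {
  private static final Set<String> ALLOWED =
      new HashSet<>(Arrays.asList("k", "max_distance", "min_score", "filter_type"));

  private VectorSearchOptions() {}

  static Map<String, String> parse(String option) {
    Map<String, String> parsed = new LinkedHashMap<>();
    // The -1 limit keeps empty segments so that "k=5," is rejected.
    for (String segment : option.split(",", -1)) {
      int eq = segment.indexOf('=');
      if (eq < 0) {
        throw new IllegalArgumentException("Malformed option segment: '" + segment + "'");
      }
      String key = segment.substring(0, eq).trim();
      String value = segment.substring(eq + 1).trim();
      // Keys are case-sensitive, so "K" is not the same key as "k".
      if (!ALLOWED.contains(key)) {
        throw new IllegalArgumentException("Unknown option key: " + key);
      }
      if (value.isEmpty()) {
        throw new IllegalArgumentException("Missing value for option key: " + key);
      }
      // Assumption of this sketch: a repeated key is an error.
      if (parsed.put(key, value) != null) {
        throw new IllegalArgumentException("Duplicate option key: " + key);
      }
    }
    int modes = 0;
    for (String mode : new String[] {"k", "max_distance", "min_score"}) {
      if (parsed.containsKey(mode)) {
        modes++;
      }
    }
    if (modes != 1) {
      throw new IllegalArgumentException("Specify exactly one of k, max_distance, min_score");
    }
    if (parsed.containsKey("k")) {
      // NumberFormatException is a subclass of IllegalArgumentException.
      int k = Integer.parseInt(parsed.get("k"));
      if (k < 1 || k > 10000) { // bounds treated as inclusive (assumption)
        throw new IllegalArgumentException("k must be between 1 and 10000");
      }
    }
    for (String radial : new String[] {"max_distance", "min_score"}) {
      if (parsed.containsKey(radial)) {
        double threshold = Double.parseDouble(parsed.get(radial));
        if (!Double.isFinite(threshold) || threshold < 0) {
          throw new IllegalArgumentException(radial + " must be a non-negative number");
        }
      }
    }
    String filterType = parsed.get("filter_type");
    if (filterType != null && !filterType.equals("post") && !filterType.equals("efficient")) {
      throw new IllegalArgumentException("filter_type must be post or efficient");
    }
    return parsed;
  }
}
```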

Syntax
------

::

    SELECT <projection>
    FROM vectorSearch(
        table='<index>',
        field='<vector-field>',
        vector='<array>',
        option='<key=value[,key=value]*>'
    ) AS <alias>
    [WHERE <predicate on alias non-vector fields>]
    [ORDER BY <alias>._score DESC]
    [LIMIT <n>]

Example 1: Top-k
----------------

Return the five nearest neighbors of a query vector::

    POST /_plugins/_sql
    {
      "query" : """
        SELECT v._id, v._score
        FROM vectorSearch(
          table='my_vectors',
          field='embedding',
          vector='[0.1, 0.2, 0.3]',
          option='k=5'
        ) AS v
      """
    }

In top-k mode, the request size defaults to ``k``; adding ``LIMIT n`` further
reduces the row count, but ``n`` must not exceed ``k``.

Example 2: Radial search (``max_distance``)
-------------------------------------------

Return at most ``LIMIT`` documents that lie within a maximum distance
of the query vector. ``LIMIT`` is required for radial searches; without
it the result set would be unbounded::

    POST /_plugins/_sql
    {
      "query" : """
        SELECT v._id, v._score
        FROM vectorSearch(
          table='my_vectors',
          field='embedding',
          vector='[0.1, 0.2, 0.3]',
          option='max_distance=0.5'
        ) AS v
        LIMIT 100
      """
    }

Example 3: Radial search (``min_score``)
----------------------------------------

Return at most ``LIMIT`` documents whose score is at or above the
given threshold. ``LIMIT`` is required for radial searches; without it
the result set would be unbounded::

    POST /_plugins/_sql
    {
      "query" : """
        SELECT v._id, v._score
        FROM vectorSearch(
          table='my_vectors',
          field='embedding',
          vector='[0.1, 0.2, 0.3]',
          option='min_score=0.8'
        ) AS v
        LIMIT 100
      """
    }

Filtering
=========

A ``WHERE`` clause on non-vector fields of the ``vectorSearch()`` alias is
pushed down to OpenSearch when it can be translated to an OpenSearch filter.
Two placement strategies are available via the ``filter_type`` option:

- ``efficient`` (default): the ``WHERE`` predicate is embedded directly
inside the k-NN query (``knn.filter``), enabling native efficient
k-NN filtering during vector search. Efficient filtering depends on
native k-NN engine and method support; if the target index does not
support ``knn.filter`` for the configured engine and method, set
``filter_type=post``. See the `k-NN filtering guide
<https://docs.opensearch.org/latest/vector-search/filter-search-knn/efficient-knn-filtering/>`_
for engine and method requirements.
- ``post``: the k-NN query is placed in a scoring (``bool.must``)
context and the ``WHERE`` predicate is placed as a non-scoring
``bool.filter`` outside the k-NN clause. This is Boolean filter
placement, not the REST ``post_filter`` parameter, and may return
fewer than ``k`` rows when the filter is selective.

Full-text predicates (``match``, ``match_phrase``, ``multi_match``, and
the rest of the full-text family) under a ``WHERE`` clause are used as
filters, not as hybrid keyword-vector score fusion. Their placement
follows ``filter_type``: the default (``efficient``) embeds supported
full-text predicates under ``knn.filter``, while ``post`` places them
in ``bool.filter`` outside the k-NN clause. In both cases they restrict
which candidates are retained but their text relevance score does not
combine with the vector ``_score``. ``vectorSearch()`` is not a hybrid
vector + text relevance scorer.

Behavior depends on whether ``filter_type`` is specified:

- **Omitted** (default, ``efficient``): the ``WHERE`` predicate is
embedded under ``knn.filter`` so the k-NN engine applies native
efficient filtering during vector search. A query with no ``WHERE``
clause is valid. ``efficient`` supports simple native filters:
``term``, ``range``, ``wildcard``, ``exists``, full-text family
(``match``, ``match_phrase``, ``match_phrase_prefix``,
``match_bool_prefix``, ``multi_match``, ``query_string``,
``simple_query_string``), and boolean combinations of those filters.
Predicates that compile to script queries (arithmetic, function calls
on indexed fields, ``CASE``, date math), nested predicates, and other
query shapes are not supported under ``knn.filter`` and return an
error. Set ``filter_type=post`` to apply such predicates as a
``bool.filter`` outside the k-NN clause. If the predicate cannot be
translated to an OpenSearch
filter query at all (a distinct translation failure from the
unsupported-shape cases above), the default path falls back to
evaluating the ``WHERE`` clause in memory after the k-NN results are
returned.
- **Explicit** ``efficient``: same contract as the default. Specifying
it is useful when a query should be explicit about the placement
strategy and should fail if the predicate cannot be safely embedded
under ``knn.filter``.
- **Explicit** ``post``: a ``WHERE`` clause is required and must be
translatable to an OpenSearch filter query. Predicates that translate
to native OpenSearch queries are pushed down as a ``bool.filter``
alongside the k-NN query. Predicates that do not have a native
equivalent (for example, arithmetic or function calls on indexed
fields) are pushed down as an OpenSearch script query and evaluated
server-side. If predicate translation itself fails, the query returns
an error; there is no silent in-memory fallback under explicit
``post``. Use ``filter_type=post`` when the predicate shape is not
supported by efficient filtering.
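
To make the two placements concrete, the sketch below assembles the query shape for each ``filter_type`` as a JSON string. ``KnnDslSketch`` is a hypothetical helper; the layout follows the ``knn.filter``, ``bool.must``, and ``bool.filter`` placement described above, but the exact DSL emitted by the SQL layer may differ (use ``_explain`` to see the real request).

```java
// Hypothetical helper that shows where the WHERE filter lands for each filter_type.
final class KnnDslSketch {
  private KnnDslSketch() {}

  // Builds {"knn":{<field>:{"vector":...,"k":...[,"filter":...]}}}; filterJson may be null.
  static String knnClause(String field, String vectorJson, int k, String filterJson) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"knn\":{\"").append(field).append("\":{\"vector\":").append(vectorJson);
    sb.append(",\"k\":").append(k);
    if (filterJson != null) {
      sb.append(",\"filter\":").append(filterJson);
    }
    return sb.append("}}}").toString();
  }

  // efficient: the filter is embedded inside the k-NN clause.
  static String efficient(String field, String vectorJson, int k, String filterJson) {
    return knnClause(field, vectorJson, k, filterJson);
  }

  // post: the k-NN clause scores under bool.must; the filter sits beside it in bool.filter.
  static String post(String field, String vectorJson, int k, String filterJson) {
    return "{\"bool\":{\"must\":["
        + knnClause(field, vectorJson, k, null)
        + "],\"filter\":["
        + filterJson
        + "]}}";
  }
}
```

With ``efficient`` the filter narrows the candidates while the vector search runs, whereas with ``post`` it is applied to the ``k`` neighbors the k-NN clause produced, which is why ``post`` may return fewer than ``k`` rows.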

Example 4: Default efficient filtering (no ``filter_type``)
-----------------------------------------------------------

::

    POST /_plugins/_sql
    {
      "query" : """
        SELECT v._id, v._score, v.category
        FROM vectorSearch(
          table='my_vectors',
          field='embedding',
          vector='[0.1, 0.2, 0.3]',
          option='k=10'
        ) AS v
        WHERE v.category = 'books'
      """
    }

The predicate is embedded under ``knn.filter`` so the k-NN engine
applies native efficient filtering during vector search.

Example 5: Post-filtering for predicates not supported by efficient mode
------------------------------------------------------------------------

Use ``filter_type=post`` for predicates that do not fit the ``efficient``
allow-list, such as arithmetic or function calls on indexed fields::

    POST /_plugins/_sql
    {
      "query" : """
        SELECT v._id, v._score, v.category
        FROM vectorSearch(
          table='my_vectors',
          field='embedding',
          vector='[0.1, 0.2, 0.3]',
          option='k=10,filter_type=post'
        ) AS v
        WHERE v.price * 1.1 < 100
      """
    }

Scoring, sorting, and limits
============================

- ``vectorSearch()`` exposes the OpenSearch ``_score`` metadata field on the
alias. For an alias ``v``, select it as ``v._score``.
- ``_score`` can be selected and referenced in ``ORDER BY``, but it cannot
appear in ``WHERE``. Use ``option='min_score=...'`` for score-threshold
vector search.
- Results are returned in ``_score DESC`` order by default. The only
supported ``ORDER BY`` expression is ``<alias>._score DESC`` (for
example, ``v._score DESC``).
- In top-k mode (``k=N``), ``LIMIT n`` is optional; when present, ``n`` must
be ``≤ k``.
- In radial mode (``max_distance`` or ``min_score``), ``LIMIT`` is required.
- ``OFFSET`` is not supported on ``vectorSearch()``. Use ``LIMIT`` only.
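
The ``LIMIT`` and ``OFFSET`` rules above can be summarized in a few lines of code. ``LimitRules`` is a hypothetical helper, the messages are invented, and returning ``LIMIT`` as the request size when it is present is an assumption of this sketch.

```java
// Hypothetical helper that mirrors the documented LIMIT and OFFSET rules.
final class LimitRules {
  private LimitRules() {}

  // k is null in radial mode; limit is null when the query has no LIMIT clause.
  static int resolveSize(Integer k, Integer limit, int offset) {
    if (offset != 0) {
      throw new IllegalArgumentException("OFFSET is not supported on vectorSearch()");
    }
    if (k != null) {
      // Top-k mode: LIMIT is optional and must not exceed k.
      if (limit == null) {
        return k;
      }
      if (limit > k) {
        throw new IllegalArgumentException("LIMIT " + limit + " exceeds k=" + k);
      }
      return limit;
    }
    // Radial mode: without LIMIT the result set would be unbounded.
    if (limit == null) {
      throw new IllegalArgumentException("LIMIT is required for radial vector search");
    }
    return limit;
  }
}
```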

Limitations
===========

``vectorSearch()`` has the following limitations:

- ``GROUP BY`` and aggregations directly over a ``vectorSearch()``
relation are not supported and return an error.
- Operators wrapped around a ``vectorSearch()`` subquery are rejected
when they would run after ``vectorSearch()`` has already produced a
finite result set, because they can silently yield zero, skipped, or
incorrectly ordered rows. Specifically, an outer ``WHERE``,
``ORDER BY``, ``OFFSET`` (non-zero), ``GROUP BY``, aggregation, or
``DISTINCT`` applied to a ``vectorSearch()`` subquery returns an
error. Place ``WHERE`` predicates inside the subquery, directly on
the ``vectorSearch()`` alias, so that they participate in ``WHERE``
pushdown. A plain outer ``LIMIT`` (without ``OFFSET``) wrapping a
``vectorSearch()`` subquery is allowed and caps the returned rows.
- ``JOIN`` between a ``vectorSearch()`` relation and another relation is
not supported.
- ``UNION`` / ``INTERSECT`` / ``EXCEPT`` combining a ``vectorSearch()``
relation with another relation is not supported.
- Multiple ``vectorSearch()`` calls in the same query are not supported.
- The query vector must be supplied as a literal. Parameterized vectors
(for example, values bound from another column) are not supported.
- Indexes that define a user field named ``_score`` cannot be queried
with ``vectorSearch()`` because ``_score`` is reserved for the
synthetic vector score exposed on the alias. Rename the field or query
the index with a plain ``SELECT``.