Merged
* [Feature] Add table function relation to SQL grammar for vectorSearch()
Add table function relation support to the SQL parser:
- New `tableFunctionRelation` alternative in `relation` grammar rule
- Named argument syntax: `key=value` (e.g., table='index', field='vec')
- Alias is required by grammar (FROM func(...) AS alias)
- AstBuilder emits existing TableFunction + SubqueryAlias AST nodes
- 3 parser unit tests: basic parse, with WHERE/ORDER BY/LIMIT, alias required
This is a pure grammar change — no execution support yet. Queries will
parse successfully but fail at the Analyzer with "unsupported function".
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
* Address review feedback on table function relation grammar
1. Canonicalize argument names at parser boundary:
unquoteIdentifier + toLowerCase(Locale.ROOT) in visitTableFunctionRelation
so FIELD='x' and `field`='x' both produce argName="field"
2. Make AS keyword optional (AS? alias) for consistency with
tableAsRelation and subqueryAsRelation grammar rules
3. Strengthen test coverage:
- Full structural AST assertion for WHERE + ORDER BY + LIMIT
(verifies Sort, Limit, Filter nodes, not just toString)
- Argument reorder test proves names resolve by name not position
- Case canonicalization test (TABLE= → table=)
- Alias-without-AS test (FROM func(...) v)
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
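The argument-name canonicalization described in item 1 can be sketched roughly as below. This is a minimal illustration, not the actual `visitTableFunctionRelation` code: the class and method names are hypothetical, and it assumes quoting means a single pair of surrounding backticks.

```java
import java.util.Locale;

/** Hypothetical helper; the real logic lives in visitTableFunctionRelation. */
final class ArgNameCanonicalizer {
  private ArgNameCanonicalizer() {}

  /** Strips one pair of surrounding backticks, then lower-cases with Locale.ROOT. */
  static String canonicalize(String rawName) {
    String name = rawName;
    if (name.length() >= 2 && name.startsWith("`") && name.endsWith("`")) {
      name = name.substring(1, name.length() - 1);
    }
    // Locale.ROOT avoids locale-specific case mappings (for example the Turkish dotless i).
    return name.toLowerCase(Locale.ROOT);
  }
}
```

With this sketch, `FIELD` and a backtick-quoted `field` both canonicalize to `field`, which is the property the case canonicalization test checks.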
* Apply spotless formatting
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
---------
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Maps knn_vector fields to ExprCoreType.ARRAY so they appear in DESCRIBE output and can be referenced in projections. This is a visibility shim, not a full vector type.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
VectorSearchIndex.createScanBuilder() needs to construct an OpenSearchIndexScanBuilder with a custom VectorSearchQueryBuilder delegate. The existing constructor was protected (test-only).
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Introduces the core execution pipeline for vectorsearch():
- VectorSearchTableFunctionResolver: registers vectorsearch with 4 STRING args
- VectorSearchTableFunctionImplementation: parses named args, vector literal, options string, validates search mode (k/max_distance/min_score)
- VectorSearchIndex: extends OpenSearchIndex with knn query seeding, score tracking, and WrapperQueryBuilder DSL construction
- VectorSearchQueryBuilder: keeps knn in must (scoring) context, WHERE filters in filter (non-scoring) context
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Override getFunctions() to expose vectorsearch() table function to the query analysis pipeline.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Verifies knn query is placed in scoring (must) context, not wrapped in bool.filter when no WHERE clause is present.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Add pushDownFilter() unit test asserting knn stays in bool.must (scoring) and WHERE predicate goes to bool.filter (non-scoring)
- Add option key allowlist (k, max_distance, min_score) to reject unknown/unsupported keys before they reach DSL generation
- Add field name validation to reject characters that could corrupt the WrapperQueryBuilder JSON (allows alphanumeric, dots, underscores, hyphens)
- Add named-arg type guard to reject non-NamedArgumentExpression args early with a clear error message
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
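A rough sketch of the option key allowlist and the field name check described in this commit. The class name, method names, error messages, and exception type are assumptions made for illustration; only the allowed keys and the allowed character set come from the commit message.

```java
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical guards; names and messages are illustrative only. */
final class OptionGuards {
  // Allowlist taken from the commit message: k, max_distance, min_score.
  private static final Set<String> ALLOWED_KEYS = Set.of("k", "max_distance", "min_score");

  // Alphanumeric characters, dots, underscores, and hyphens only.
  private static final Pattern SAFE_FIELD = Pattern.compile("[A-Za-z0-9._-]+");

  private OptionGuards() {}

  /** Rejects any option key that is not on the allowlist. */
  static String requireKnownOptionKey(String key) {
    if (!ALLOWED_KEYS.contains(key)) {
      throw new IllegalArgumentException("Unknown option key: " + key);
    }
    return key;
  }

  /** Rejects field names containing characters that could break hand-built JSON. */
  static String requireSafeFieldName(String field) {
    if (field == null || !SAFE_FIELD.matcher(field).matches()) {
      throw new IllegalArgumentException("Invalid field name: " + field);
    }
    return field;
  }
}
```

Because quotes, braces, and colons are outside the allowed character set, a field name cannot terminate the JSON string it is spliced into.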
Parse k as integer, max_distance and min_score as double before they reach buildKnnQuery(). Rejects non-numeric and non-finite values with clear errors. This closes the residual JSON-injection path through option values without requiring full XContent migration. Also fixes toString() to be consistent with the named-arg guard (no longer blindly casts to NamedArgumentExpression).
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
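The numeric canonicalization above might look something like the following. This is a sketch under assumptions: the helper names are hypothetical, and it throws IllegalArgumentException where the real code may use a project-specific exception.

```java
/** Hypothetical parser for option values; not the actual implementation. */
final class OptionValueParser {
  private OptionValueParser() {}

  /** Parses k as an int, rejecting anything Integer.parseInt cannot read. */
  static int parseK(String raw) {
    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("k must be an integer, got: " + raw, e);
    }
  }

  /** Parses max_distance or min_score as a double, rejecting NaN and infinities. */
  static double parseFiniteDouble(String key, String raw) {
    double value;
    try {
      value = Double.parseDouble(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(key + " must be a number, got: " + raw, e);
    }
    // Double.parseDouble accepts "NaN" and "Infinity", so check explicitly.
    if (!Double.isFinite(value)) {
      throw new IllegalArgumentException(key + " must be finite, got: " + raw);
    }
    return value;
  }
}
```

Once a value has round-tripped through a numeric type, rendering it back into JSON cannot introduce quotes or braces, which is why this closes the injection path.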
- parseOptions: reject malformed segments and duplicate keys
- parseVector: wrap errors in ExpressionEvaluationException, reject non-finite floats (Infinity, NaN)
- VectorSearchIndex: default requestedTotalSize to k via pushDownLimitToRequestTotal so queries without LIMIT return k results
- Add 5 new tests: malformed option, duplicate key, empty vector, malformed vector component, non-finite vector component
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
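For illustration, the parseOptions and parseVector checks could be sketched as follows. The input syntax is an assumption (comma-separated `key=value` segments for options, a bracketed comma-separated list for the vector), and IllegalArgumentException stands in for the project's ExpressionEvaluationException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parsers; the option and vector syntax shown here is assumed. */
final class OptionStringParser {
  private OptionStringParser() {}

  /** Splits "k=5,..." into a map, rejecting malformed segments and duplicate keys. */
  static Map<String, String> parseOptions(String options) {
    Map<String, String> result = new LinkedHashMap<>();
    // Limit -1 keeps trailing empty segments so "k=5," is rejected too.
    for (String segment : options.split(",", -1)) {
      int eq = segment.indexOf('=');
      String key = eq < 0 ? "" : segment.substring(0, eq).trim();
      String value = eq < 0 ? "" : segment.substring(eq + 1).trim();
      if (key.isEmpty() || value.isEmpty()) {
        throw new IllegalArgumentException("Malformed option segment: '" + segment + "'");
      }
      if (result.putIfAbsent(key, value) != null) {
        throw new IllegalArgumentException("Duplicate option key: " + key);
      }
    }
    return result;
  }

  /** Parses "[1.0, 2.0]" into floats, rejecting empty, malformed, and non-finite input. */
  static float[] parseVector(String literal) {
    String body = literal.trim();
    if (body.startsWith("[") && body.endsWith("]")) {
      body = body.substring(1, body.length() - 1);
    }
    if (body.trim().isEmpty()) {
      throw new IllegalArgumentException("Vector must not be empty");
    }
    String[] parts = body.split(",", -1);
    float[] vector = new float[parts.length];
    for (int i = 0; i < parts.length; i++) {
      try {
        vector[i] = Float.parseFloat(parts[i].trim());
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("Malformed vector component: '" + parts[i] + "'", e);
      }
      if (!Float.isFinite(vector[i])) {
        throw new IllegalArgumentException("Non-finite vector component: " + parts[i].trim());
      }
    }
    return vector;
  }
}
```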
- validateNamedArgs() now rejects null/empty arg names defensively, closing a potential NPE if the shared table-function path is later wired into PPL
- OpenSearchStorageEngineTest uses contains-check instead of exact collection size assertion
- Add testNullArgNameThrows test
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Remove unused VECTOR_OPTION constant from VectorSearchIndex
- Clarify buildKnnQuery() comment: quoted fallback is for forward compatibility, all P0 values are already canonicalized as numeric
- Rename testMissingSearchModeOptionThrows to testUnknownOptionKeyOnlyThrows to match what it actually tests
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Enforce exactly one of k, max_distance, or min_score
- Validate k is in [1, 10000] range
- Add 6 tests: mutual exclusivity (3 combos), k too small, k too large, k boundary values (1 and 10000)
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
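The two rules in this commit (exactly one search mode, and k within [1, 10000]) can be expressed compactly. The sketch below uses hypothetical names and a generic exception type; only the rules themselves are taken from the commit message.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical validator for the search-mode rules described above. */
final class SearchModeValidator {
  private static final List<String> MODES = List.of("k", "max_distance", "min_score");

  private SearchModeValidator() {}

  /** Returns the single mode present in the options, or throws if zero or several are set. */
  static String requireSingleMode(Map<String, ?> options) {
    String found = null;
    for (String mode : MODES) {
      if (options.containsKey(mode)) {
        if (found != null) {
          throw new IllegalArgumentException(
              "Only one of k, max_distance, or min_score may be set, got: " + found + " and " + mode);
        }
        found = mode;
      }
    }
    if (found == null) {
      throw new IllegalArgumentException("One of k, max_distance, or min_score is required");
    }
    return found;
  }

  /** Enforces the inclusive [1, 10000] range for k. */
  static int requireKInRange(int k) {
    if (k < 1 || k > 10_000) {
      throw new IllegalArgumentException("k must be between 1 and 10000, got: " + k);
    }
    return k;
  }
}
```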
VectorSearchQueryBuilder now accepts options map and rejects pushDownLimit when LIMIT exceeds k. Radial modes (max_distance, min_score) have no LIMIT restriction.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Create VectorSearchIndexTest: 7 tests covering buildKnnQueryJson() for top-k, max_distance, min_score, nested fields, multi-element and single-element vectors, numeric option rendering
- Add edge case tests to VectorSearchTableFunctionImplementationTest: NaN vector component, empty option key/value, negative k, NaN for max_distance and min_score (6 new tests)
- Add VectorSearchQueryBuilderTest: min_score radial mode LIMIT, pushDownSort delegation to parent (2 new tests)
- Extract buildKnnQueryJson() as package-private for direct testing
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Test too-many (5) and zero arguments paths in VectorSearchTableFunctionResolver to complement existing too-few (2) test.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Cap radial mode (max_distance/min_score) results at maxResultWindow to prevent unbounded result sets
- Reject ORDER BY on non-_score fields and _score ASC in vectorSearch since knn results are naturally sorted by _score DESC
- Add 12 integration tests: 4 _explain DSL shape verification tests and 8 validation error path tests
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
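The ORDER BY restriction amounts to accepting only `_score DESC` sort keys. A minimal sketch follows; it assumes a simplified SortKey type in place of the plugin's real sort expressions, and the names and messages are hypothetical.

```java
import java.util.List;

/** Hypothetical validator; SortKey is a simplified stand-in for the real sort expressions. */
final class ScoreSortValidator {
  /** One ORDER BY item: a field name plus its direction. */
  static final class SortKey {
    final String field;
    final boolean ascending;

    SortKey(String field, boolean ascending) {
      this.field = field;
      this.ascending = ascending;
    }
  }

  private ScoreSortValidator() {}

  /** Accepts only _score DESC; any other field or direction is rejected. */
  static void validate(List<SortKey> keys) {
    for (SortKey key : keys) {
      if (!"_score".equals(key.field)) {
        throw new IllegalArgumentException(
            "vectorSearch only supports ORDER BY _score DESC, got field: " + key.field);
      }
      if (key.ascending) {
        throw new IllegalArgumentException("vectorSearch does not support ORDER BY _score ASC");
      }
    }
  }
}
```

Checking every key, rather than only the first, is what makes a multi-key sort such as `_score DESC, name ASC` fail on the second key.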
- Add multi-sort expression test: ORDER BY _score DESC, name ASC correctly rejects the non-_score field (VectorSearchQueryBuilderTest)
- Add case-insensitive argument name lookup test to verify TABLE='x' resolves same as table='x' (Implementation test)
- Add non-numeric option fallback test: verifies string options are quoted in JSON output (VectorSearchIndexTest)
- Add 4 integration tests: ORDER BY _score DESC succeeds, ORDER BY non-score rejects, ORDER BY _score ASC rejects, LIMIT within k succeeds (VectorSearchIT, now 16 tests)
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
The base OpenSearchIndexScanQueryBuilder.pushDownSort() pushes sort.getCount() as a limit when non-zero. Our override validated _score DESC and returned true, but did not preserve this contract. SQL always sets count=0, so this was not reachable today, but PPL or future callers may set a non-zero count to combine sort+limit in one LogicalSort node. Preserve the behavior defensively. Add focused test: LogicalSort(count=7) with _score DESC verifies the count is pushed down as request size.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Unit test: compound AND predicate survives pushdown into bool.filter
- Integration test: compound WHERE (term + range) produces bool query
- Integration test: radial max_distance with WHERE produces bool query
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
pushDownSort() called requestBuilder.pushDownLimit() directly, bypassing the LIMIT > k guard in pushDownLimit(). Extract validateLimitWithinK() helper and call it from both paths so the invariant holds when PPL or future callers set a non-zero sort count.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
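The shared guard might be shaped like the sketch below. The real validateLimitWithinK() signature is not shown in this PR text, so the parameters, the string-valued options map, and the exception type here are assumptions.

```java
import java.util.Map;

/** Hypothetical stand-in for the validateLimitWithinK() helper. */
final class LimitGuard {
  private LimitGuard() {}

  /**
   * Rejects a limit larger than k in top-k mode. Radial modes (max_distance, min_score)
   * carry no k option, so they pass through unrestricted.
   */
  static void validateLimitWithinK(Map<String, String> options, int limit) {
    String k = options.get("k");
    if (k == null) {
      return; // radial mode: no LIMIT restriction
    }
    if (limit > Integer.parseInt(k)) {
      throw new IllegalArgumentException("LIMIT " + limit + " exceeds k=" + k);
    }
  }
}
```

Routing both the LIMIT pushdown and the sort-count pushdown through one helper means a new caller cannot reintroduce the bypass by skipping one of the two paths.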
Move all explainQuery()-based DSL shape tests into a dedicated VectorSearchExplainIT suite. VectorSearchIT now contains only validation and error-path tests.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…SearchIndex Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…on in VectorSearchQueryBuilder Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…fficient mode Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…matting Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…iltering
Mark the vectorSearch() table function as experimental in the user doc
following the repo convention (title [Experimental] suffix), and flip the
default WHERE filter placement from post-filtering to efficient
pre-filtering so a query without filter_type embeds the predicate under
knn.filter for ANN-time pruning.
Production code: filterType=null in VectorSearchIndex now resolves to
FilterType.EFFICIENT, and VectorSearchQueryBuilder's full constructor
defaults to EFFICIENT when passed null. The test-only 3-arg constructor
stays pinned to POST because it does not wire a rebuildKnnWithFilter
callback and EFFICIENT mode requires one.
Allow-list error messages are reworded to neutral wording
("vectorSearch WHERE pre-filtering does not support...") so default-path
users never see internal filter_type=efficient terminology and get a
clear "set filter_type=post" fallback hint.
Doc updates the Filtering section to describe Omitted=efficient as the
default, with post framed as the opt-in fallback for predicates outside
the efficient allow-list. Example 4 shows the default knn.filter shape;
Example 5 shows filter_type=post for arithmetic predicates.
Tests: BETWEEN / NOT IN regression guards pin filter_type=post
explicitly so they continue to assert the post-filter DSL shape.
testPostFilterReturnsOnlyMatchingDocs pins filter_type=post so the test
name still reflects what it exercises. New default-shape IT coverage
asserts knn.filter embeds the predicate and there is no outer bool
wrapping.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
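To make the two filter placements concrete, the sketch below builds both query shapes as JSON strings. It is illustrative only: the class and method names are hypothetical, the JSON is assembled by hand for brevity, and the exact DSL the plugin emits may differ in detail. The nesting of `filter` under the vector field follows the OpenSearch k-NN query DSL.

```java
/** Hypothetical builder contrasting the post and efficient filter shapes. */
final class KnnDslShapes {
  private KnnDslShapes() {}

  /** filter_type=post: knn in bool.must (scoring), WHERE in bool.filter (non-scoring). */
  static String post(String field, String vectorJson, int k, String whereJson) {
    return "{\"bool\":{\"must\":[" + knn(field, vectorJson, k, null)
        + "],\"filter\":[" + whereJson + "]}}";
  }

  /** Default (efficient): WHERE embedded inside the knn clause, no outer bool. */
  static String efficient(String field, String vectorJson, int k, String whereJson) {
    return knn(field, vectorJson, k, whereJson);
  }

  private static String knn(String field, String vectorJson, int k, String filterJson) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"knn\":{\"").append(field).append("\":{\"vector\":").append(vectorJson)
        .append(",\"k\":").append(k);
    if (filterJson != null) {
      sb.append(",\"filter\":").append(filterJson);
    }
    return sb.append("}}}").toString();
  }
}
```

In the post shape the k-NN search picks its k candidates before the filter applies, so a selective predicate can leave fewer than k rows; in the efficient shape the predicate is handed to the k-NN engine itself.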
- Move translation-failure in-memory fallback note from explicit `post` to the default (omitted) section; under explicit `post` the query errors instead.
- Expand Limitations to cover outer ORDER BY, non-zero OFFSET, GROUP BY, aggregation, and DISTINCT over a vectorSearch() subquery (matches the expanded rejection landed via #5385); note that plain outer LIMIT without OFFSET is allowed.
- Add engine/method caveat for default `efficient` filtering and soften "pre-filtering during ANN search" phrasing to "native efficient k-NN filtering".
- Clarify that full-text predicates under WHERE act as filters, not as hybrid relevance scorers.
- Rename Example 5 to "Post-filtering for predicates not supported by efficient mode".
- Tighten explicit `efficient` wording to emphasize it fails closed.
- Reword radial examples and supported option keys to say "matches / returns up to the specified LIMIT documents" instead of "returns all".
- Add alias fan-out note under the `table` argument.
- Sweep remaining em dashes in the file to plain text.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Introduction: add a sentence noting that the SQL layer translates vectorSearch() into an OpenSearch search request whose body is native k-NN query DSL, with the query vector parsed into a numeric array before emission.
- Soften the multi-backing alias note: SQL validates the table string shape only; it does not prevalidate per-backing-index mapping, dimension, or engine compatibility. OpenSearch execution remains the source of truth for those checks.
- Rewrite the full-text paragraph: placement now follows filter_type, so under default `efficient` full-text predicates are embedded under `knn.filter` (not only "alongside" the k-NN query). Keep the not-hybrid-scorer clarification.
- Reword `post` bullet to describe Boolean filter placement (`bool.must(knn)` + `bool.filter(where)`) instead of "runs first"; explicitly contrast with the REST `post_filter` parameter, and note that selective filters can yield fewer than k rows.
- Rename Example 4 to "Default efficient filtering (no filter_type)" and replace the remaining "pre-filtering" mention with "efficient filtering" to align with OpenSearch k-NN terminology.
- Scoring section: use a concrete `v._score` example for readability alongside the `<alias>._score` form.
- Limitations: replace "top-k rows" with "finite result set" to cover both top-k and radial modes.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
ahkcs approved these changes Apr 29, 2026
Swiddis approved these changes Apr 29, 2026
This was referenced May 4, 2026
Merge the `feature/vector-search-p0` branch into `main`. Adds the experimental `vectorSearch()` SQL table function (k-NN pushdown, efficient/post filtering, radial and top-k modes). See individual PRs in the stack for details.