Merged
* [Feature] Add table function relation to SQL grammar for vectorSearch()

  Add table function relation support to the SQL parser:
  - New `tableFunctionRelation` alternative in `relation` grammar rule
  - Named argument syntax: `key=value` (e.g., table='index', field='vec')
  - Alias is required by grammar (FROM func(...) AS alias)
  - AstBuilder emits existing TableFunction + SubqueryAlias AST nodes
  - 3 parser unit tests: basic parse, with WHERE/ORDER BY/LIMIT, alias required

  This is a pure grammar change with no execution support yet. Queries will parse successfully but fail at the Analyzer with "unsupported function".

  Signed-off-by: Eric Wei <[email protected]>

* Address review feedback on table function relation grammar

  1. Canonicalize argument names at parser boundary: unquoteIdentifier + toLowerCase(Locale.ROOT) in visitTableFunctionRelation so FIELD='x' and `field`='x' both produce argName="field"
  2. Make AS keyword optional (AS? alias) for consistency with tableAsRelation and subqueryAsRelation grammar rules
  3. Strengthen test coverage:
     - Full structural AST assertion for WHERE + ORDER BY + LIMIT (verifies Sort, Limit, Filter nodes, not just toString)
     - Argument reorder test proves names resolve by name not position
     - Case canonicalization test (TABLE= → table=)
     - Alias-without-AS test (FROM func(...) v)

  Signed-off-by: Eric Wei <[email protected]>

* Apply spotless formatting

  Signed-off-by: Eric Wei <[email protected]>

---------

Signed-off-by: Eric Wei <[email protected]>
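The argument-name canonicalization described in the review-feedback commit can be sketched roughly as below. This is a minimal illustration, not the PR's code: `ArgNameCanonicalizer` and its `unquote` helper are hypothetical stand-ins for the parser's `unquoteIdentifier`, and the sketch assumes backticks are the only quoting style.

```java
import java.util.Locale;

final class ArgNameCanonicalizer {
  // Hypothetical stand-in for the parser's unquoteIdentifier:
  // strips one surrounding pair of backticks, if present.
  static String unquote(String identifier) {
    if (identifier.length() >= 2
        && identifier.startsWith("`")
        && identifier.endsWith("`")) {
      return identifier.substring(1, identifier.length() - 1);
    }
    return identifier;
  }

  // Locale.ROOT keeps lowercasing independent of the JVM default locale.
  static String canonicalize(String rawName) {
    return unquote(rawName).toLowerCase(Locale.ROOT);
  }
}
```

With this shape, `FIELD` and a backtick-quoted `field` both canonicalize to `field`, which is the behavior the commit message describes.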
Maps knn_vector fields to ExprCoreType.ARRAY so they appear in DESCRIBE output and can be referenced in projections. This is a visibility shim — not a full vector type. Signed-off-by: Eric Wei <[email protected]>
VectorSearchIndex.createScanBuilder() needs to construct an OpenSearchIndexScanBuilder with a custom VectorSearchQueryBuilder delegate. The existing constructor was protected (test-only). Signed-off-by: Eric Wei <[email protected]>
Introduces the core execution pipeline for vectorsearch():
- VectorSearchTableFunctionResolver: registers vectorsearch with 4 STRING args
- VectorSearchTableFunctionImplementation: parses named args, vector literal, options string, validates search mode (k/max_distance/min_score)
- VectorSearchIndex: extends OpenSearchIndex with knn query seeding, score tracking, and WrapperQueryBuilder DSL construction
- VectorSearchQueryBuilder: keeps knn in must (scoring) context, WHERE filters in filter (non-scoring) context

Signed-off-by: Eric Wei <[email protected]>
Override getFunctions() to expose vectorsearch() table function to the query analysis pipeline. Signed-off-by: Eric Wei <[email protected]>
Verifies knn query is placed in scoring (must) context, not wrapped in bool.filter when no WHERE clause is present. Signed-off-by: Eric Wei <[email protected]>
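The query shape these commits describe (knn alone when there is no WHERE clause, otherwise knn under `bool.must` and the WHERE predicate under `bool.filter`) can be illustrated with plain string building. This is only a sketch: the PR builds the DSL through OpenSearch query builders, and `KnnBoolShape` with its method names is hypothetical.

```java
final class KnnBoolShape {
  // Renders a top-k k-NN clause: {"knn":{"<field>":{"vector":[...],"k":<k>}}}
  static String knn(String field, float[] vector, int k) {
    StringBuilder sb =
        new StringBuilder("{\"knn\":{\"").append(field).append("\":{\"vector\":[");
    for (int i = 0; i < vector.length; i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append(vector[i]);
    }
    return sb.append("],\"k\":").append(k).append("}}}").toString();
  }

  // No WHERE clause: the knn query stays unwrapped, in scoring context.
  // With a WHERE clause: knn goes to bool.must (scoring), WHERE to bool.filter (non-scoring).
  static String withWhere(String knnJson, String whereJson) {
    if (whereJson == null) {
      return knnJson;
    }
    return "{\"bool\":{\"must\":[" + knnJson + "],\"filter\":[" + whereJson + "]}}";
  }
}
```

The field name is concatenated into JSON unescaped here, which is why input validation on the field name matters for any string-built DSL.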
- Add pushDownFilter() unit test asserting knn stays in bool.must (scoring) and WHERE predicate goes to bool.filter (non-scoring)
- Add option key allowlist (k, max_distance, min_score) to reject unknown/unsupported keys before they reach DSL generation
- Add field name validation to reject characters that could corrupt the WrapperQueryBuilder JSON (allows alphanumeric, dots, underscores, hyphens)
- Add named-arg type guard to reject non-NamedArgumentExpression args early with a clear error message

Signed-off-by: Eric Wei <[email protected]>
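The allowlist and field-name checks listed above might look roughly like this. The class and method names are hypothetical, and the regular expression is one plausible reading of "alphanumeric, dots, underscores, hyphens", not necessarily the pattern used in the PR.

```java
import java.util.Set;
import java.util.regex.Pattern;

final class VectorSearchGuards {
  // Option keys named in the commit message.
  static final Set<String> ALLOWED_OPTION_KEYS = Set.of("k", "max_distance", "min_score");

  // Assumed pattern: one or more ASCII letters, digits, dots, underscores, or hyphens.
  static final Pattern FIELD_NAME = Pattern.compile("[A-Za-z0-9._-]+");

  static boolean isAllowedOptionKey(String key) {
    // Set.of(...) throws on contains(null), so guard explicitly.
    return key != null && ALLOWED_OPTION_KEYS.contains(key);
  }

  static boolean isValidFieldName(String field) {
    return field != null && FIELD_NAME.matcher(field).matches();
  }
}
```

Rejecting quotes and braces at this boundary means a field name can be embedded in a JSON string without being able to close it early.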
Parse k as integer, max_distance and min_score as double before they reach buildKnnQuery(). Rejects non-numeric and non-finite values with clear errors. This closes the residual JSON-injection path through option values without requiring full XContent migration. Also fixes toString() to be consistent with the named-arg guard (no longer blindly casts to NamedArgumentExpression). Signed-off-by: Eric Wei <[email protected]>
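A sketch of this kind of numeric canonicalization follows. `OptionValueParser` is a hypothetical name and `IllegalArgumentException` stands in for whatever exception type the PR raises. The explicit finiteness check is needed because `Double.parseDouble` accepts the strings `NaN` and `Infinity`.

```java
final class OptionValueParser {
  // Parses k as an integer and reports non-numeric input with a clear message.
  static int parseK(String raw) {
    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("k must be an integer, got: " + raw, e);
    }
  }

  // Parses max_distance / min_score as a double and rejects NaN and infinities.
  static double parseFiniteDouble(String key, String raw) {
    double value;
    try {
      value = Double.parseDouble(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(key + " must be a number, got: " + raw, e);
    }
    if (!Double.isFinite(value)) {
      throw new IllegalArgumentException(key + " must be finite, got: " + raw);
    }
    return value;
  }
}
```

Once values are held as `int` or `double`, rendering them back into the query body cannot introduce quotes or braces.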
- parseOptions: reject malformed segments and duplicate keys
- parseVector: wrap errors in ExpressionEvaluationException, reject non-finite floats (Infinity, NaN)
- VectorSearchIndex: default requestedTotalSize to k via pushDownLimitToRequestTotal so queries without LIMIT return k results
- Add 5 new tests: malformed option, duplicate key, empty vector, malformed vector component, non-finite vector component

Signed-off-by: Eric Wei <[email protected]>
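The parsing rules in this commit can be sketched as follows. Several details are assumptions rather than facts from the PR: the options string is taken to be comma-separated `key=value` pairs, the vector literal is taken to be a bracketed comma-separated list, and `IllegalArgumentException` stands in for `ExpressionEvaluationException`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class VectorSearchArgParser {
  // Assumed format: "key=value,key=value". Rejects malformed segments and duplicate keys.
  static Map<String, String> parseOptions(String options) {
    Map<String, String> parsed = new LinkedHashMap<>();
    for (String segment : options.split(",", -1)) {
      String[] kv = segment.split("=", -1);
      if (kv.length != 2 || kv[0].trim().isEmpty() || kv[1].trim().isEmpty()) {
        throw new IllegalArgumentException("Malformed option segment: '" + segment + "'");
      }
      String key = kv[0].trim();
      if (parsed.putIfAbsent(key, kv[1].trim()) != null) {
        throw new IllegalArgumentException("Duplicate option key: " + key);
      }
    }
    return parsed;
  }

  // Assumed format: "[1.0, 2.0, 3.0]". Rejects empty vectors and non-finite components.
  static float[] parseVector(String literal) {
    String body = literal.trim();
    if (body.startsWith("[") && body.endsWith("]")) {
      body = body.substring(1, body.length() - 1);
    }
    if (body.trim().isEmpty()) {
      throw new IllegalArgumentException("Vector must not be empty");
    }
    String[] parts = body.split(",", -1);
    float[] vector = new float[parts.length];
    for (int i = 0; i < parts.length; i++) {
      float value;
      try {
        value = Float.parseFloat(parts[i].trim());
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("Invalid vector component: '" + parts[i] + "'", e);
      }
      if (!Float.isFinite(value)) {
        throw new IllegalArgumentException("Non-finite vector component: '" + parts[i] + "'");
      }
      vector[i] = value;
    }
    return vector;
  }
}
```

Splitting with a limit of `-1` keeps trailing empty segments, so inputs such as `k=5,` or `[1.0,]` are reported as malformed instead of being silently accepted.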
- validateNamedArgs() now rejects null/empty arg names defensively, closing a potential NPE if the shared table-function path is later wired into PPL
- OpenSearchStorageEngineTest uses contains-check instead of exact collection size assertion
- Add testNullArgNameThrows test

Signed-off-by: Eric Wei <[email protected]>
- Remove unused VECTOR_OPTION constant from VectorSearchIndex
- Clarify buildKnnQuery() comment: quoted fallback is for forward compatibility, all P0 values are already canonicalized as numeric
- Rename testMissingSearchModeOptionThrows to testUnknownOptionKeyOnlyThrows to match what it actually tests

Signed-off-by: Eric Wei <[email protected]>
Signed-off-by: Eric Wei <[email protected]>
- Enforce exactly one of k, max_distance, or min_score
- Validate k is in [1, 10000] range
- Add 6 tests: mutual exclusivity (3 combos), k too small, k too large, k boundary values (1 and 10000)

Signed-off-by: Eric Wei <[email protected]>
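The exactly-one-mode rule and the k range check could be expressed along these lines. `SearchModeValidator` is a hypothetical name, the error messages are illustrative, and the sketch assumes option values arrive as strings that have already passed numeric parsing.

```java
import java.util.List;
import java.util.Map;

final class SearchModeValidator {
  static final List<String> MODES = List.of("k", "max_distance", "min_score");
  static final int MIN_K = 1;
  static final int MAX_K = 10_000;

  // Returns the single search mode present in the options, or throws.
  static String requireSingleMode(Map<String, String> options) {
    String selected = null;
    for (String mode : MODES) {
      if (options.containsKey(mode)) {
        if (selected != null) {
          throw new IllegalArgumentException(
              "Only one of k, max_distance, or min_score may be specified");
        }
        selected = mode;
      }
    }
    if (selected == null) {
      throw new IllegalArgumentException("One of k, max_distance, or min_score is required");
    }
    if (selected.equals("k")) {
      // NumberFormatException is itself an IllegalArgumentException.
      int k = Integer.parseInt(options.get("k"));
      if (k < MIN_K || k > MAX_K) {
        throw new IllegalArgumentException(
            "k must be between " + MIN_K + " and " + MAX_K + ", got: " + k);
      }
    }
    return selected;
  }
}
```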
VectorSearchQueryBuilder now accepts options map and rejects pushDownLimit when LIMIT exceeds k. Radial modes (max_distance, min_score) have no LIMIT restriction. Signed-off-by: Eric Wei <[email protected]>
- Create VectorSearchIndexTest: 7 tests covering buildKnnQueryJson() for top-k, max_distance, min_score, nested fields, multi-element and single-element vectors, numeric option rendering
- Add edge case tests to VectorSearchTableFunctionImplementationTest: NaN vector component, empty option key/value, negative k, NaN for max_distance and min_score (6 new tests)
- Add VectorSearchQueryBuilderTest: min_score radial mode LIMIT, pushDownSort delegation to parent (2 new tests)
- Extract buildKnnQueryJson() as package-private for direct testing

Signed-off-by: Eric Wei <[email protected]>
Test too-many (5) and zero arguments paths in VectorSearchTableFunctionResolver to complement existing too-few (2) test. Signed-off-by: Eric Wei <[email protected]>
- Cap radial mode (max_distance/min_score) results at maxResultWindow to prevent unbounded result sets
- Reject ORDER BY on non-_score fields and _score ASC in vectorSearch since knn results are naturally sorted by _score DESC
- Add 12 integration tests: 4 _explain DSL shape verification tests and 8 validation error path tests

Signed-off-by: Eric Wei <[email protected]>
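The ORDER BY restriction above amounts to checking every sort key. A minimal sketch, with a hypothetical `SortKey` holder in place of the plan's real sort expressions and illustrative error messages:

```java
import java.util.List;

final class ScoreSortValidator {
  // Hypothetical, simplified view of one ORDER BY key.
  static final class SortKey {
    final String field;
    final boolean ascending;

    SortKey(String field, boolean ascending) {
      this.field = field;
      this.ascending = ascending;
    }
  }

  // Accepts only _score DESC; every key is checked, so
  // "ORDER BY _score DESC, name ASC" is rejected because of the second key.
  static void validate(List<SortKey> keys) {
    for (SortKey key : keys) {
      if (!"_score".equals(key.field)) {
        throw new IllegalArgumentException(
            "vectorSearch only supports ORDER BY _score DESC, got field: " + key.field);
      }
      if (key.ascending) {
        throw new IllegalArgumentException("vectorSearch does not support ORDER BY _score ASC");
      }
    }
  }
}
```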
- Add multi-sort expression test: ORDER BY _score DESC, name ASC correctly rejects the non-_score field (VectorSearchQueryBuilderTest)
- Add case-insensitive argument name lookup test to verify TABLE='x' resolves same as table='x' (Implementation test)
- Add non-numeric option fallback test: verifies string options are quoted in JSON output (VectorSearchIndexTest)
- Add 4 integration tests: ORDER BY _score DESC succeeds, ORDER BY non-score rejects, ORDER BY _score ASC rejects, LIMIT within k succeeds (VectorSearchIT, now 16 tests)

Signed-off-by: Eric Wei <[email protected]>
The base OpenSearchIndexScanQueryBuilder.pushDownSort() pushes sort.getCount() as a limit when non-zero. Our override validated _score DESC and returned true, but did not preserve this contract. SQL always sets count=0, so this was not reachable today, but PPL or future callers may set a non-zero count to combine sort+limit in one LogicalSort node. Preserve the behavior defensively. Add focused test: LogicalSort(count=7) with _score DESC verifies the count is pushed down as request size. Signed-off-by: Eric Wei <[email protected]>
- Unit test: compound AND predicate survives pushdown into bool.filter
- Integration test: compound WHERE (term + range) produces bool query
- Integration test: radial max_distance with WHERE produces bool query

Signed-off-by: Eric Wei <[email protected]>
pushDownSort() called requestBuilder.pushDownLimit() directly, bypassing the LIMIT > k guard in pushDownLimit(). Extract validateLimitWithinK() helper and call it from both paths so the invariant holds when PPL or future callers set a non-zero sort count. Signed-off-by: Eric Wei <[email protected]>
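The fix described here is to route both pushdown paths through one guard. The sketch below shows that shape only. `LimitGuardSketch` and its fields are hypothetical, a `null` k is used to model the radial modes, and the real builder delegates to a request builder rather than storing a size itself.

```java
final class LimitGuardSketch {
  private final Integer k; // null models radial modes (max_distance / min_score)
  private int requestSize;

  LimitGuardSketch(Integer k, int initialRequestSize) {
    this.k = k;
    this.requestSize = initialRequestSize;
  }

  // Single place where the LIMIT <= k invariant is enforced.
  private void validateLimitWithinK(int limit) {
    if (k != null && limit > k) {
      throw new IllegalArgumentException("LIMIT " + limit + " exceeds k=" + k);
    }
  }

  boolean pushDownLimit(int limit) {
    validateLimitWithinK(limit);
    requestSize = limit;
    return true;
  }

  // A non-zero sort count acts as a limit, so it goes through the same guard.
  boolean pushDownSort(int sortCount) {
    if (sortCount != 0) {
      validateLimitWithinK(sortCount);
      requestSize = sortCount;
    }
    return true;
  }

  int requestSize() {
    return requestSize;
  }
}
```

Keeping the check in one helper means a future caller that combines sort and limit in a single node cannot bypass it.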
Move all explainQuery()-based DSL shape tests into a dedicated VectorSearchExplainIT suite. VectorSearchIT now contains only validation and error-path tests. Signed-off-by: Eric Wei <[email protected]>
Signed-off-by: Eric Wei <[email protected]>
Signed-off-by: Eric Wei <[email protected]>
…SearchIndex Signed-off-by: Eric Wei <[email protected]>
Signed-off-by: Eric Wei <[email protected]>
…on in VectorSearchQueryBuilder Signed-off-by: Eric Wei <[email protected]>
Signed-off-by: Eric Wei <[email protected]>
…fficient mode Signed-off-by: Eric Wei <[email protected]>
…matting Signed-off-by: Eric Wei <[email protected]>
- Introduction: add a sentence noting that the SQL layer translates vectorSearch() into an OpenSearch search request whose body is native k-NN query DSL, with the query vector parsed into a numeric array before emission.
- Soften the multi-backing alias note: SQL validates the table string shape only; it does not prevalidate per-backing-index mapping, dimension, or engine compatibility. OpenSearch execution remains the source of truth for those checks.
- Rewrite the full-text paragraph: placement now follows filter_type, so under default `efficient` full-text predicates are embedded under `knn.filter` (not only "alongside" the k-NN query). Keep the not-hybrid-scorer clarification.
- Reword `post` bullet to describe Boolean filter placement (`bool.must(knn)` + `bool.filter(where)`) instead of "runs first"; explicitly contrast with the REST `post_filter` parameter, and note that selective filters can yield fewer than k rows.
- Rename Example 4 to "Default efficient filtering (no filter_type)" and replace the remaining "pre-filtering" mention with "efficient filtering" to align with OpenSearch k-NN terminology.
- Scoring section: use a concrete `v._score` example for readability alongside the `<alias>._score` form.
- Limitations: replace "top-k rows" with "finite result set" to cover both top-k and radial modes.

Signed-off-by: Eric Wei <[email protected]>
ahkcs approved these changes Apr 29, 2026
Swiddis approved these changes Apr 29, 2026
This was referenced May 4, 2026
ahkcs added a commit to ahkcs/sql that referenced this pull request May 8, 2026
Single squashed commit on top of feature/mustang-ppl-integration that absorbs upstream/main's commits not yet on the feature branch. Replaces the prior catch-up squash (opensearch-project#5396 base + the original af831d3 rebase commit) so this PR is a fast-forward into feature/mustang-ppl-integration. Squashed (rather than a merge commit) because upstream main commits were authored by many contributors with inconsistent or missing Signed-off-by trailers; DCO would otherwise reject those commits. Main commits absorbed (54 since divergence; 4 since the original catch-up squash was made on 2026-04-30): - opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec) - opensearch-project#5408 (datetime type normalization) - opensearch-project#5414 (Gradle wrapper bump + @ignore exclusion) - opensearch-project#5399 (FGAC-scoped SQL cursor continuation) - opensearch-project#5394 (SQL Vector Search), opensearch-project#5361 (OpenSearch 3.7), opensearch-project#5360 (unified SQL language spec), opensearch-project#5240 (PPL Union), and 46 others. Conflict resolutions: api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec, UnifiedSqlSpec): took main. Main is a strict superset — adds postAnalysisRules and preCompilationRules extension points, the new FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/ date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension wiring on SQL spec. RELEVANCE category is preserved unchanged. api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java: took main. Both adopt the new postAnalysisRules / preCompilationRules hooks introduced in opensearch-project#5408 / opensearch-project#5419. core/executor/QueryService.java: composed both sides — kept feature's CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's StageErrorHandler stage tracking; both improvements are orthogonal. legacy/plugin/RestSqlAction.java: took feature. 
The 3-way merge produced a duplicated handleException/getRawErrorCode block; feature already contained both the delegateToV2Engine refactor and the ErrorReport unwrap from main, so feature is the correct superset. CLAUDE.md, docs/user/ppl/functions/condition.md: took main. explain_streamstats_global{,_null_bucket}.yaml: took main (post-opensearch-project#5359 shape). core/CalciteRelNodeVisitor + utils/PlanUtils: took main (collation utility hoisted from CalciteRelNodeVisitor.backtrackForCollation into PlanUtils.findInputCollation). integ-test/CalciteNoPushdownIT.java: added CalciteMixedFieldTypeIT. ppl/antlr/OpenSearchPPLParser.g4: added unionCommand. ppl/calcite/CalcitePPLStreamstatsTest.java: added testMultipleStreamstatsWithWindow. integ-test/build.gradle: took feature. Both sides added the same @ignore exclusion block; feature has alphabetical ordering and a more detailed comment explaining the Gradle 9.4.1 cast bug. integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took feature's helper-managed test_eval provisioning (createIndexByRestClient + isIndexExist guard, from opensearch-project#5407) so analytics-engine compatibility runs get a parquet-backed index. Added the test_eval_agent setup (needed by the dotted-path eval tests for opensearch-project#5351) wrapped in its own isIndexExist guard for the same parquet-aware idempotency. plugin/.../TransportPPLQueryAction.java, plugin/.../SQLPlugin.java: took feature. PR opensearch-project#5403 made analytics-engine an optional dependency by moving QueryPlanExecutor from a required constructor parameter to an @Inject(optional=true) setter, and removed the loadExtensions / EngineExtensionsHolder / executionEngineExtensions plumbing. Feature retains the createSqlAnalyticsRouter method this catch-up introduced. plugin/.../config/EngineExtensionsHolder.java: deleted (unreferenced post-opensearch-project#5403; not present on feature). 
Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava + :integ-test compileTestJava all pass; unit tests pass; spotlessCheck clean. Signed-off-by: Kai Huang <[email protected]>
ahkcs added a commit that referenced this pull request May 8, 2026
Merge the `feature/vector-search-p0` branch into `main`. Adds the experimental `vectorSearch()` SQL table function (k-NN pushdown, efficient/post filtering, radial and top-k modes). See individual PRs in the stack for details.