Add UDT normalizer for unified query planner #5355
dai-chen wants to merge 3 commits into opensearch-project:main
Bridge the type mismatch between PPL UDT types (String-based) and standard Calcite types (int/long-based) in the unified query API path by contributing two post-analysis rules to PPL's LanguageSpec:

1. DatetimeUdtNormalizeRule rewrites UDT calls: it replaces UDT return types with standard Calcite types and wraps UDF implementors to convert values at input (int/long -> String) and output (String -> int/long).
2. DatetimeUdtOutputCastRule wraps the plan root with a projection that casts remaining datetime output columns to VARCHAR so the wire format matches PPL's String datetime contract.

Both rules are registered via DatetimeUdtExtension, which encapsulates the ordering invariant (normalize before cast). The extension plugs into the LanguageExtension mechanism introduced in opensearch-project#5360 via a new postAnalysisRules hook, applied once at the top of UnifiedQueryPlanner.plan() after the language-specific strategy returns.

Applied only on the PPL path; zero impact on the SQL or OpenSearch plugin paths.

Signed-off-by: Chen Dai <daichen@amazon.com>
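For readers unfamiliar with the mismatch, the value conversion described above can be sketched in plain Java without any Calcite dependency. This is only an illustration under stated assumptions, not the PR's implementation: the class name `UdtBridgeSketch`, the helper names, the UTC zone, and the `uuuu-MM-dd HH:mm:ss` string format are all hypothetical, and sub-second precision is dropped. It relies on Calcite's convention of representing TIMESTAMP values as a long of milliseconds since the epoch.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.function.LongUnaryOperator;
import java.util.function.UnaryOperator;

// Hypothetical sketch: bridges a String-based datetime UDF to a long-based caller.
final class UdtBridgeSketch {
    // Assumed String layout for the UDT timestamp (not taken from the PR).
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Input side (long -> String): epoch millis to the String form the UDF expects.
    static String toUdtString(long epochMillis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC).format(FMT);
    }

    // Output side (String -> long): the UDF's String result back to epoch millis.
    static long toEpochMillis(String udtValue) {
        return LocalDateTime.parse(udtValue, FMT).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Wraps a String-based UDF so callers only ever see long values.
    static LongUnaryOperator wrap(UnaryOperator<String> stringUdf) {
        return millis -> toEpochMillis(stringUdf.apply(toUdtString(millis)));
    }

    // Stand-in for a String-based UDF: adds one day to a timestamp string.
    static String addOneDay(String ts) {
        return LocalDateTime.parse(ts, FMT).plusDays(1).format(FMT);
    }
}
```

Calling `UdtBridgeSketch.wrap(UdtBridgeSketch::addOneDay).applyAsLong(0L)` passes `1970-01-01 00:00:00` to the String-based function and converts its result back to milliseconds. The output-cast step corresponds to applying `toUdtString` once more to any datetime column that is still a long when it reaches the plan root.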
Add DatetimeUdtLiteralCoercionRule as a post-analysis rule that wraps VARCHAR operands with CAST(... AS <datetime>) inside comparisons, IN, SEARCH, BETWEEN, and COALESCE when the call has a standard Calcite DATE/TIME/TIMESTAMP operand alongside. This closes the gap left by DatetimeUdtNormalizeRule, which only rewrites operators backed by ImplementableUDFunction.

The rule only modifies operand subtrees inside RexCall nodes; no RelNode rowType or RexInputRef slot identity is altered, so Calcite's cached RexInputRef types cannot be invalidated (unlike an in-place ref rewrite). Registered first in the extension's rule list so normalize and output-cast see homogeneous types downstream.

Known scope limits: IN and BETWEEN with cross-type value lists are rejected by CalciteRexNodeVisitor before any post-analysis rule can run, and datetime+interval arithmetic requires a separate function signature registration; both are out of scope for this rule.

Signed-off-by: Chen Dai <daichen@amazon.com>
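The coercion described above can be illustrated on a toy expression tree. The sketch below is deliberately independent of Calcite: `LiteralCoercionSketch`, `Expr`, `Type`, and the helper methods are hypothetical stand-ins rather than the PR's classes or Calcite's RexNode API, and only comparison operators are handled. It shows the core idea of rebuilding a call with new operand subtrees while leaving input references untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model of the literal-coercion idea; not Calcite's RexNode API.
final class LiteralCoercionSketch {
    enum Type { VARCHAR, DATE, TIME, TIMESTAMP, BOOLEAN }

    static final List<String> COMPARISONS = Arrays.asList("=", "<>", "<", "<=", ">", ">=");

    // A leaf has no operands and uses `op` as its display text.
    static final class Expr {
        final String op;
        final Type type;
        final List<Expr> operands;

        Expr(String op, Type type, List<Expr> operands) {
            this.op = op;
            this.type = type;
            this.operands = operands;
        }

        @Override
        public String toString() {
            if (operands.isEmpty()) return op;
            if (op.equals("CAST")) return "CAST(" + operands.get(0) + " AS " + type + ")";
            StringBuilder sb = new StringBuilder(op).append('(');
            for (int i = 0; i < operands.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(operands.get(i));
            }
            return sb.append(')').toString();
        }
    }

    static Expr leaf(String text, Type type) {
        return new Expr(text, type, Collections.<Expr>emptyList());
    }

    static Expr call(String op, Type type, Expr... operands) {
        return new Expr(op, type, Arrays.asList(operands));
    }

    static boolean isDatetime(Type t) {
        return t == Type.DATE || t == Type.TIME || t == Type.TIMESTAMP;
    }

    // Returns a new tree; leaves (including input references) are returned as-is.
    static Expr coerce(Expr e) {
        if (e.operands.isEmpty()) return e;
        List<Expr> rewritten = new ArrayList<>();
        for (Expr operand : e.operands) rewritten.add(coerce(operand));
        if (COMPARISONS.contains(e.op)) {
            // Find the first datetime-typed operand, if any, to use as the cast target.
            Type target = null;
            for (Expr operand : rewritten) {
                if (isDatetime(operand.type)) { target = operand.type; break; }
            }
            if (target != null) {
                for (int i = 0; i < rewritten.size(); i++) {
                    Expr operand = rewritten.get(i);
                    if (operand.type == Type.VARCHAR) {
                        rewritten.set(i, call("CAST", target, operand));
                    }
                }
            }
        }
        return new Expr(e.op, e.type, rewritten);
    }
}
```

In this model, a comparison between a TIMESTAMP reference `$0` and a VARCHAR literal is rebuilt with the literal wrapped in a CAST, while a comparison between two VARCHAR operands is left unchanged.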
Signed-off-by: Chen Dai <daichen@amazon.com>
Description [WIP]
Fix type mismatch between PPL UDT types (String-based) and standard Calcite types (int/long-based) in the unified query API path.
UdtNormalizer post-processes the logical plan to:
Related Issues
Resolves #5250
Check List
Commits are signed per the DCO using --signoff or -s.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.