Add UDT normalizer for unified query planner by dai-chen · Pull Request #5355 · opensearch-project/sql

dai-chen · 2026-04-15T19:32:53Z

Description [WIP]

Fix type mismatch between PPL UDT types (String-based) and standard Calcite types (int/long-based) in the unified query API path.

UdtNormalizer post-processes the logical plan to:

Replace UDT return types with standard Calcite types in UDF signatures
Wrap UDF implementors to convert between value representations at input (int/long → String) and output (String → int/long)

Related Issues

Resolves #5250

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-04-15T19:34:43Z

PR Reviewer Guide 🔍

(Review updated until commit `9887cf5`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Introduce PostAnalysisRule extension point in LanguageSpec and UnifiedQueryPlanner Relevant files: core/src/main/java/org/opensearch/sql/expression/function/ImplementorUDF.java api/src/main/java/org/opensearch/sql/api/spec/LanguageSpec.java api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java Sub-PR theme: Add DatetimeUDT normalizer rules and register with PPL spec Relevant files: api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtExtension.java api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtLiteralCoercionRule.java api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtOutputCastRule.java api/src/main/java/org/opensearch/sql/api/spec/UnifiedPplSpec.java api/src/test/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtExtensionTest.java api/src/test/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtLiteralCoercionRuleTest.java
⚡ Recommended focus areas for review Unsafe Cast In `getOperandMetadata()`, the result of `udf.getOperandTypeChecker()` is cast directly to `UDFOperandMetadata` without a null check or instanceof guard. If the UDF was constructed with a different or null operand type checker, this will throw a ClassCastException at runtime. public UDFOperandMetadata getOperandMetadata() { return (UDFOperandMetadata) udf.getOperandTypeChecker(); } Operand Count Mismatch In `normalizeImplementation`, the loop iterates over `operands` (from the rewritten `rexCall`) but indexes into `originalCall.getOperands()` for type lookup. If the operand lists differ in size (e.g., due to a prior rewrite), this will throw an `IndexOutOfBoundsException`. List<Expression> converted = new ArrayList<>(operands.size()); for (int i = 0; i < operands.size(); i++) { Expression operand = operands.get(i); RelDataType opType = originalCall.getOperands().get(i).getType(); converted.add( UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand)); Nullability Loss The `varcharType` created at line 38 does not carry nullability. While nullability is applied per-field at line 44-47, if `field.getType().isNullable()` is false, the cast target is non-nullable VARCHAR. This may be correct, but it should be verified that non-nullable datetime columns are intentionally cast to non-nullable VARCHAR and that this doesn't conflict with downstream consumers expecting nullable types. RelDataType varcharType = rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR); List<RexNode> projects = new ArrayList<>(fields.size()); List<String> names = new ArrayList<>(fields.size()); for (RelDataTypeField field : fields) { RexNode ref = rexBuilder.makeInputRef(plan, field.getIndex()); if (UdtMapping.fromStdType(field.getType()).isPresent()) { RelDataType nullableVarchar = rexBuilder .getTypeFactory() .createTypeWithNullability(varcharType, field.getType().isNullable()); projects.add(rexBuilder.makeCast(nullableVarchar, ref)); Incomplete Coverage The `isTargetOperator` method includes `SqlKind.IN`, `SqlKind.BETWEEN`, and `SqlKind.SEARCH`, but the test file explicitly documents that `IN` and `BETWEEN` are NOT covered because the PPL visitor rejects them before this rule runs. This creates dead code in the operator check and may give a false sense of coverage. Consider removing these kinds from `isTargetOperator` or adding a comment explaining why they are included despite not being reachable. private static boolean isTargetOperator(RexCall call) { SqlKind kind = call.getKind(); return kind == SqlKind.EQUALS \|\| kind == SqlKind.NOT_EQUALS \|\| kind == SqlKind.GREATER_THAN \|\| kind == SqlKind.GREATER_THAN_OR_EQUAL \|\| kind == SqlKind.LESS_THAN \|\| kind == SqlKind.LESS_THAN_OR_EQUAL \|\| kind == SqlKind.IN \|\| kind == SqlKind.SEARCH \|\| kind == SqlKind.BETWEEN \|\| kind == SqlKind.COALESCE; }

github-actions · 2026-04-15T19:35:18Z

PR Code Suggestions ✨

Latest suggestions up to 9887cf5

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Use runtime call operand types in implementor lambda The lambda captures `originalCall` to look up operand types, but `rexCall` passed to the implementor at code-generation time may have a different number of operands than `originalCall` (e.g., after further rewrites or if Calcite adds implicit operands). This could cause an `IndexOutOfBoundsException` at code generation time. The operand types should be looked up from `rexCall` instead of `originalCall` to ensure consistency. api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [103-110] return (translator, rexCall, operands) -> { List<Expression> converted = new ArrayList<>(operands.size()); for (int i = 0; i < operands.size(); i++) { Expression operand = operands.get(i); - RelDataType opType = originalCall.getOperands().get(i).getType(); + RelDataType opType = rexCall.getOperands().get(i).getType(); converted.add( UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand)); } Suggestion importance[1-10]: 6 __ Why: This is a valid concern: the lambda captures `originalCall` for operand type lookup, but the `rexCall` passed at code-generation time is the correct source for operand types. Using `rexCall.getOperands()` instead of `originalCall.getOperands()` is more robust and avoids potential `IndexOutOfBoundsException` if operand counts differ between analysis and code-generation time.	Low
General	Guard UDT normalization to avoid false positives The condition checks if the return type is not a UDT AND no operand has a standard datetime type, but the intent should be to skip normalization when neither the return type is a UDT nor any operand is a standard datetime type. The current logic is correct, but the `fromStdType` check on operands is used to detect when a UDF receives standard datetime inputs that need conversion back to UDT values. However, this could also match non-datetime UDFs that happen to have standard datetime operands passed in from other sources, potentially causing incorrect wrapping. Consider adding a guard to ensure the UDF actually expects UDT inputs by checking the original operand types against UDT types as well. api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [65-70] Optional<UdtMapping> returnType = UdtMapping.fromUdtType(visited.getType()); -if (returnType.isEmpty() - && visited.getOperands().stream() - .noneMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent())) { +boolean anyUdtOperand = visited.getOperands().stream() + .anyMatch(op -> UdtMapping.fromUdtType(op.getType()).isPresent() + \|\| UdtMapping.fromStdType(op.getType()).isPresent()); +if (returnType.isEmpty() && !anyUdtOperand) { return visited; } Suggestion importance[1-10]: 3 __ Why: The suggestion changes the operand check to also include UDT-typed operands, but the original code already handles the case correctly: it checks if any operand has a standard datetime type (which is what happens after `DatetimeUdtLiteralCoercionRule` runs). The improved code adds `fromUdtType` check which may be redundant since UDT operands would be handled by `fromUdtType` on the return type check. The concern about false positives is speculative and not clearly demonstrated.	Low
General	Avoid reusing shared mutable type instance in loop The `varcharType` created outside the loop has no precision specified, which may result in a VARCHAR with default or unbounded precision. When casting datetime values to VARCHAR, the resulting string representation (e.g., "2020-03-15") has a known maximum length. Using an unbounded VARCHAR could cause issues with some downstream consumers or type inference. Consider creating the VARCHAR type with an explicit precision or reusing the nullable variant inside the loop instead of creating a base type outside. api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtOutputCastRule.java [38-53] -RelDataType varcharType = rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR); List<RexNode> projects = new ArrayList<>(fields.size()); List<String> names = new ArrayList<>(fields.size()); for (RelDataTypeField field : fields) { RexNode ref = rexBuilder.makeInputRef(plan, field.getIndex()); if (UdtMapping.fromStdType(field.getType()).isPresent()) { RelDataType nullableVarchar = rexBuilder .getTypeFactory() - .createTypeWithNullability(varcharType, field.getType().isNullable()); + .createTypeWithNullability( + rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR), + field.getType().isNullable()); projects.add(rexBuilder.makeCast(nullableVarchar, ref)); } else { projects.add(ref); } names.add(field.getName()); } Suggestion importance[1-10]: 2 __ Why: The `varcharType` object created by `createSqlType` is typically immutable in Calcite's type factory, and `createTypeWithNullability` creates a new type object anyway. The suggestion moves the `varcharType` creation inside the loop but doesn't actually change the precision or behavior, making it a marginal style change with no functional impact.	Low

Previous suggestions

Suggestions up to commit 9887cf5

Category	Suggestion	Impact
Possible issue	Fix incorrect operand UDT check in normalize rule The condition checks if any operand has a standard datetime type to decide whether to proceed, but the intent is to normalize UDT operands (i.e., operands whose type is a UDT). Using `fromStdType` here means the rule will also trigger on calls that already have standard datetime operands but no UDT involvement, potentially double-wrapping implementors. The guard should use `fromUdtType` to check operands. api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [65-70] Optional<UdtMapping> returnType = UdtMapping.fromUdtType(visited.getType()); if (returnType.isEmpty() && visited.getOperands().stream() - .noneMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent())) { + .noneMatch(op -> UdtMapping.fromUdtType(op.getType()).isPresent())) { return visited; } Suggestion importance[1-10]: 7 __ Why: The guard condition uses `fromStdType` to check operands, but the intent is to detect UDT operands that need normalization. Using `fromStdType` could cause the rule to trigger on calls with standard datetime operands that have no UDT involvement, potentially double-wrapping implementors. Changing to `fromUdtType` would correctly scope the rule to only UDT-involved calls.	Medium
Possible issue	Use code-gen call operand types instead of captured original The implementor lambda captures `originalCall` (the pre-rewrite call) to look up operand types, but at code-generation time the actual operands passed in may differ in count or order if Calcite rewrites the call further. The operand types should be derived from `rexCall` (the call at code-gen time) rather than `originalCall` to stay consistent. api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [103-110] return (translator, rexCall, operands) -> { List<Expression> converted = new ArrayList<>(operands.size()); for (int i = 0; i < operands.size(); i++) { Expression operand = operands.get(i); - RelDataType opType = originalCall.getOperands().get(i).getType(); + RelDataType opType = rexCall.getOperands().get(i).getType(); converted.add( UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand)); } Suggestion importance[1-10]: 6 __ Why: The lambda captures `originalCall` to look up operand types at code-generation time, but `rexCall` (the call at code-gen time) is more appropriate since Calcite may further rewrite the call. Using `rexCall.getOperands().get(i).getType()` ensures the operand types are consistent with what's actually being compiled.	Low
General	Avoid reusing shared VARCHAR type across fields The `varcharType` created outside the loop has no precision specified, which may result in a VARCHAR with default (often unlimited or implementation-defined) precision. For consistency and to avoid potential type mismatches downstream, the nullable varchar should be created directly per field without reusing a shared base type, or an explicit precision should be set. api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtOutputCastRule.java [38-53] -RelDataType varcharType = rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR); List<RexNode> projects = new ArrayList<>(fields.size()); List<String> names = new ArrayList<>(fields.size()); for (RelDataTypeField field : fields) { RexNode ref = rexBuilder.makeInputRef(plan, field.getIndex()); if (UdtMapping.fromStdType(field.getType()).isPresent()) { RelDataType nullableVarchar = rexBuilder .getTypeFactory() - .createTypeWithNullability(varcharType, field.getType().isNullable()); + .createTypeWithNullability( + rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR), + field.getType().isNullable()); projects.add(rexBuilder.makeCast(nullableVarchar, ref)); } else { projects.add(ref); } names.add(field.getName()); } Suggestion importance[1-10]: 2 __ Why: The suggestion moves `varcharType` creation inside the loop, but the existing code already creates a fresh `nullableVarchar` per field using `createTypeWithNullability`. The shared `varcharType` is only used as a base for that call, so there's no actual type-sharing issue. The improvement is marginal at best.	Low

Suggestions up to commit 65c549f

Category	Suggestion	Impact
Possible issue	Fix incorrect UDT detection on operands The condition checks `fromStdType` on operands to decide whether to rewrite, but `fromStdType` matches standard Calcite types (DATE, TIME, TIMESTAMP), not UDT types. This means any UDF call that happens to receive a plain DATE/TIME/TIMESTAMP operand (e.g., from a table column) will be rewritten even if neither the return type nor the operand is a UDT. The operand check should use `fromUdtType` to detect actual UDT inputs that need conversion. api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [65-70] Optional<UdtMapping> returnType = UdtMapping.fromUdtType(visited.getType()); if (returnType.isEmpty() && visited.getOperands().stream() - .noneMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent())) { + .noneMatch(op -> UdtMapping.fromUdtType(op.getType()).isPresent())) { return visited; } Suggestion importance[1-10]: 7 __ Why: The suggestion points out that `fromStdType` on operands would match plain DATE/TIME/TIMESTAMP columns (not just UDT types), potentially triggering unnecessary rewrites. Using `fromUdtType` would correctly limit the check to actual UDT inputs. However, the intent of the rule is to handle UDFs that accept standard datetime types as inputs (converting from standard to UDT), so the correctness depends on the broader design intent.	Medium
Possible issue	Fix operand UDT-to-standard conversion direction Inside `normalizeImplementation`, the operand conversion uses `fromStdType` to detect which operands need conversion from standard to UDT values. However, the intent is to convert UDT-typed inputs (String) into standard values before passing them to the UDF. The check should use `fromUdtType` on the original operand type to correctly identify UDT inputs that need to be converted to standard representation. api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [104-110] List<Expression> converted = new ArrayList<>(operands.size()); for (int i = 0; i < operands.size(); i++) { Expression operand = operands.get(i); RelDataType opType = originalCall.getOperands().get(i).getType(); converted.add( - UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand)); + UdtMapping.fromUdtType(opType).map(u -> u.toStdValue(operand)).orElse(operand)); } Suggestion importance[1-10]: 7 __ Why: The suggestion identifies a potential directional mismatch: `fromStdType` with `fromStdValue` converts standard→UDT, but the intent is to convert UDT inputs to standard values before passing to the UDF. If operands are UDT-typed strings, `fromUdtType` with `toStdValue` would be the correct conversion. This could be a real logic bug depending on the actual operand types at this stage.	Medium
General	Document potential over-casting of native datetime columns The cast to VARCHAR is applied to all standard datetime types (DATE, TIME, TIMESTAMP) in the output, including columns that were originally plain table columns (not UDT columns). This may unintentionally cast non-UDT datetime columns to VARCHAR. The rule should only cast columns that originated as UDT types, but since provenance is lost after normalization, consider limiting the cast to columns that were originally UDT-typed by tracking them, or document this known behavior explicitly. api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtOutputCastRule.java [38-53] +// Note: after DatetimeUdtNormalizeRule, UDT-originated columns are indistinguishable from +// native datetime columns. If native datetime columns should not be cast, track UDT column +// indices before normalization and pass them here. Otherwise, this behavior is intentional. RelDataType varcharType = rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR); List<RexNode> projects = new ArrayList<>(fields.size()); List<String> names = new ArrayList<>(fields.size()); for (RelDataTypeField field : fields) { RexNode ref = rexBuilder.makeInputRef(plan, field.getIndex()); if (UdtMapping.fromStdType(field.getType()).isPresent()) { RelDataType nullableVarchar = rexBuilder .getTypeFactory() .createTypeWithNullability(varcharType, field.getType().isNullable()); projects.add(rexBuilder.makeCast(nullableVarchar, ref)); } else { projects.add(ref); } names.add(field.getName()); } Suggestion importance[1-10]: 2 __ Why: The suggestion only adds a comment to the existing code without changing any logic. The `improved_code` is functionally identical to the `existing_code`, making this a documentation-only suggestion with minimal impact.	Low

Suggestions up to commit 4bc4176

Category	Suggestion	Impact
Possible issue	Guard against operand list size mismatch The `normalizeImplementation` lambda captures `originalCall` to get operand types, but at runtime the `rexCall` passed to the implementor is the new normalized call (with updated operand types). Using `originalCall.getOperands()` for type lookup is correct for determining which operands need conversion, but if the operand list sizes differ (e.g., due to variadic functions), this will throw an `IndexOutOfBoundsException`. Add a bounds check or assert that sizes match. api/src/main/java/org/opensearch/sql/api/udt/UdtNormalizer.java [97-103] List<Expression> converted = new ArrayList<>(operands.size()); +List<RexNode> originalOperands = originalCall.getOperands(); for (int i = 0; i < operands.size(); i++) { Expression operand = operands.get(i); - RelDataType opType = originalCall.getOperands().get(i).getType(); - converted.add( - UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand)); + if (i < originalOperands.size()) { + RelDataType opType = originalOperands.get(i).getType(); + converted.add( + UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand)); + } else { + converted.add(operand); + } } Suggestion importance[1-10]: 4 __ Why: The suggestion adds a bounds check to prevent a potential `IndexOutOfBoundsException` when `operands.size()` exceeds `originalCall.getOperands().size()` for variadic functions. While this is a valid defensive programming concern, it's an edge case that may not occur in practice given the current UDF usage patterns.	Low
General	Clarify operand conversion condition logic The condition checks `fromStdType` on operands to decide whether to normalize, but `normalizeImplementation` also uses `fromStdType` to convert operands. However, `fromUdtType` is used in `assertNoUdtInRexCalls` to verify no UDT return types remain. If a UDF has a UDT return type but no standard-type operands, the early-return condition would still proceed (since `returnType` is present). The logic is correct in that case, but the comment says "inputs (standard → UDT)" which is misleading — it should say "standard → UDT" means converting standard Calcite types back to UDT string values for the UDF's input. Consider clarifying the comment and verifying the operand conversion direction is correct (i.e., that `fromStdValue` wraps a standard int/long into a UDT string as expected by the UDF). api/src/main/java/org/opensearch/sql/api/udt/UdtNormalizer.java [59-64] Optional<UdtMapping> returnType = UdtMapping.fromUdtType(visited.getType()); -if (returnType.isEmpty() - && visited.getOperands().stream() - .noneMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent())) { +boolean anyOperandNeedsConversion = + visited.getOperands().stream() + .anyMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent()); +if (returnType.isEmpty() && !anyOperandNeedsConversion) { return visited; } Suggestion importance[1-10]: 2 __ Why: The `improved_code` is functionally equivalent to the `existing_code` — it just extracts the stream condition into a named variable `anyOperandNeedsConversion`. This is a minor readability improvement with no correctness impact, and the suggestion's main content is about clarifying a comment rather than fixing a bug.	Low
General	Clarify discarded shuttle result in test The `RexShuttle` returned by `other.accept(new RexShuttle(){...})` is discarded — `RexShuttle.accept` returns a new `RelNode` with rewritten expressions, but the assertions inside `visitCall` are side effects that do run. However, `super.visitCall(call)` recurses into nested calls, so nested UDT types will be checked. This is fine for assertion purposes, but the result of `other.accept(...)` is silently dropped, which could be confusing. This is not a bug since assertions are the goal, but consider adding a comment to clarify intent. api/src/test/java/org/opensearch/sql/api/udt/UdtNormalizerTest.java [59-77] private void assertNoUdtInRexCalls(RelNode plan) { plan.accept( new RelHomogeneousShuttle() { @Override public RelNode visit(RelNode other) { + // Use RexShuttle for its traversal side-effects (assertions); result is discarded. other.accept( new RexShuttle() { @Override public RexNode visitCall(RexCall call) { assertFalse( "RexCall " + call + " has UDT return type: " + call.getType(), UdtMapping.fromUdtType(call.getType()).isPresent()); return super.visitCall(call); } }); return super.visit(other); } }); } Suggestion importance[1-10]: 1 __ Why: This suggestion only adds a comment to clarify that the `RexShuttle` result is intentionally discarded. The `existing_code` and `improved_code` are functionally identical, making this a purely cosmetic change with minimal impact.	Low

Suggestions up to commit c5c311d

Category	Suggestion	Impact
Possible issue	Fix operand UDT detection direction in early-exit check The condition checks `fromStdType` on operands to decide whether conversion is needed, but `normalizeImplementation` also uses `fromStdType` on operand types to wrap inputs. However, the early-exit condition uses `fromStdType` (standard → UDT direction) while the intent for operand conversion is to convert standard Calcite types back to UDT values for the UDF. If the operand check is meant to detect UDT operands that need conversion, it should use `fromUdtType` instead of `fromStdType`, otherwise UDT-typed operands will be silently skipped. api/src/main/java/org/opensearch/sql/api/udt/UdtNormalizer.java [59-64] Optional<UdtMapping> returnType = UdtMapping.fromUdtType(visited.getType()); if (returnType.isEmpty() && visited.getOperands().stream() - .noneMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent())) { + .noneMatch(op -> UdtMapping.fromUdtType(op.getType()).isPresent())) { return visited; } Suggestion importance[1-10]: 7 __ Why: The early-exit condition uses `fromStdType` on operands to detect if conversion is needed, but `normalizeImplementation` also uses `fromStdType` to convert operands. The suggestion argues that UDT-typed operands should be detected with `fromUdtType` instead. However, looking at the design more carefully: the normalizer converts UDT-returning UDFs to standard types, and operands that are standard Calcite types (DATE, TIME, TIMESTAMP) need to be converted back to UDT strings for the UDF. The `fromStdType` check on operands seems intentional for detecting standard-typed operands that need UDT conversion. The suggestion may be incorrect about the direction, but raises a valid concern worth investigating.	Medium
Possible issue	Guard against unsafe cast of operand type checker The cast `(UDFOperandMetadata) udf.getOperandTypeChecker()` will throw a `ClassCastException` at runtime if the original UDF's operand type checker is not an instance of `UDFOperandMetadata`. A guard check or safe cast should be used to avoid unexpected failures. api/src/main/java/org/opensearch/sql/api/udt/UdtNormalizer.java [78-80] @Override public UDFOperandMetadata getOperandMetadata() { - return (UDFOperandMetadata) udf.getOperandTypeChecker(); + if (udf.getOperandTypeChecker() instanceof UDFOperandMetadata meta) { + return meta; + } + throw new IllegalStateException( + "Expected UDFOperandMetadata but got: " + udf.getOperandTypeChecker()); } Suggestion importance[1-10]: 5 __ Why: The unchecked cast `(UDFOperandMetadata) udf.getOperandTypeChecker()` could throw a `ClassCastException` at runtime if the UDF's operand type checker is not a `UDFOperandMetadata`. The improved code provides a safer pattern with a meaningful error message, though the context suggests this path is only reached for `ImplementorUDFFunction` UDFs which should always have `UDFOperandMetadata`.	Low
General	Use current call's operand types in implementation lambda The lambda captures `originalCall` to get operand types, but at implementation time `rexCall` (the normalized call) is available and its operands reflect the actual types being processed. Using `originalCall` operand types is fragile if the shuttle has already rewritten nested calls. Additionally, `fromStdValue` converts a standard value to a UDT string, but the UDF likely expects UDT string inputs — verify that `fromStdType` is the correct lookup direction here (it maps standard types, not UDT types), as it may never match UDT-typed operands. api/src/main/java/org/opensearch/sql/api/udt/UdtNormalizer.java [94-103] private NotNullImplementor normalizeImplementation( RexCall originalCall, ImplementorUDFFunction func, Optional<UdtMapping> returnUdt) { return (translator, rexCall, operands) -> { List<Expression> converted = new ArrayList<>(operands.size()); for (int i = 0; i < operands.size(); i++) { Expression operand = operands.get(i); - RelDataType opType = originalCall.getOperands().get(i).getType(); + RelDataType opType = rexCall.getOperands().get(i).getType(); converted.add( UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand)); } Suggestion importance[1-10]: 5 __ Why: Using `originalCall` operand types in the lambda is potentially fragile since `rexCall` is available at implementation time. However, the `originalCall` is captured specifically to preserve the pre-normalization types, which may be intentional since the normalized call's operand types may have changed. The suggestion is reasonable but may not be strictly necessary depending on the design intent.	Low

github-actions · 2026-04-15T20:18:43Z

Persistent review updated to latest commit 4bc4176

Bridge the type mismatch between PPL UDT types (String-based) and standard Calcite types (int/long-based) in the unified query API path by contributing two post-analysis rules to PPL's LanguageSpec: 1. DatetimeUdtNormalizeRule rewrites UDT calls — replaces UDT return types with standard Calcite types and wraps UDF implementors to convert values at input (int/long -> String) and output (String -> int/long). 2. DatetimeUdtOutputCastRule wraps the plan root with a projection that casts remaining datetime output columns to VARCHAR so the wire format matches PPL's String datetime contract. Both rules are registered via DatetimeUdtExtension, which encapsulates the ordering invariant (normalize before cast). The extension plugs into the LanguageExtension mechanism introduced in opensearch-project#5360 via a new postAnalysisRules hook, applied once at the top of UnifiedQueryPlanner.plan() after the language-specific strategy returns. Applied only on the PPL path; zero impact on the SQL or OpenSearch plugin paths. Signed-off-by: Chen Dai <daichen@amazon.com>

github-actions · 2026-04-17T21:42:05Z

Persistent review updated to latest commit 65c549f

Add DatetimeUdtLiteralCoercionRule as a post-analysis rule that wraps VARCHAR operands with CAST(... AS <datetime>) inside comparisons, IN, SEARCH, BETWEEN, and COALESCE when the call has a standard Calcite DATE/TIME/TIMESTAMP operand alongside. This closes the gap left by DatetimeUdtNormalizeRule, which only rewrites operators backed by ImplementableUDFunction. The rule only modifies operand subtrees inside RexCall nodes; no RelNode rowType or RexInputRef slot identity is altered, so Calcite's cached RexInputRef types cannot be invalidated (unlike an in-place ref rewrite). Registered first in the extension's rule list so normalize and output-cast see homogeneous types downstream. Known scope limits: IN and BETWEEN with cross-type value lists are rejected by CalciteRexNodeVisitor before any post-analysis rule can run, and datetime+interval arithmetic requires a separate function signature registration; both are out of scope for this rule. Signed-off-by: Chen Dai <daichen@amazon.com>

Signed-off-by: Chen Dai <daichen@amazon.com>

github-actions · 2026-04-29T23:20:20Z

Persistent review updated to latest commit 9887cf5

github-actions · 2026-04-29T23:20:21Z

Persistent review updated to latest commit 9887cf5

dai-chen self-assigned this Apr 15, 2026

dai-chen added the bugFix label Apr 15, 2026

dai-chen force-pushed the poc/datetime-late-binding branch from c5c311d to 4bc4176 Compare April 15, 2026 20:17

dai-chen added the PPL Piped processing language label Apr 16, 2026

dai-chen force-pushed the poc/datetime-late-binding branch from 4bc4176 to 65c549f Compare April 17, 2026 21:40

dai-chen closed this Apr 17, 2026

This was referenced Apr 17, 2026

Add datetime UDT extension to bridge PPL datetime UDF and standard types #5368

Closed

[BUG] Unified query API fails with date types #5250

Open

dai-chen added 2 commits April 29, 2026 16:13

style(api): Apply spotless formatting to coercion rule test

9887cf5

Signed-off-by: Chen Dai <daichen@amazon.com>

dai-chen reopened this Apr 29, 2026

dai-chen closed this Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add UDT normalizer for unified query planner#5355

Add UDT normalizer for unified query planner#5355
dai-chen wants to merge 3 commits intoopensearch-project:mainfrom
dai-chen:poc/datetime-late-binding

dai-chen commented Apr 15, 2026

Uh oh!

github-actions Bot commented Apr 15, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 15, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 15, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 29, 2026

Uh oh!

github-actions Bot commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dai-chen commented Apr 15, 2026

Description [WIP]

Related Issues

Check List

Uh oh!

github-actions Bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 9887cf5)

Uh oh!

github-actions Bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented Apr 15, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 29, 2026

Uh oh!

github-actions Bot commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Apr 15, 2026 •

edited

Loading

(Review updated until commit `9887cf5`)

github-actions Bot commented Apr 15, 2026 •

edited

Loading