Skip to content

Add UDT normalizer for unified query planner#5355

Closed
dai-chen wants to merge 3 commits intoopensearch-project:mainfrom
dai-chen:poc/datetime-late-binding
Closed

Add UDT normalizer for unified query planner#5355
dai-chen wants to merge 3 commits intoopensearch-project:mainfrom
dai-chen:poc/datetime-late-binding

Conversation

@dai-chen
Copy link
Copy Markdown
Collaborator

Description [WIP]

Fix type mismatch between PPL UDT types (String-based) and standard Calcite types (int/long-based) in the unified query API path.

UdtNormalizer post-processes the logical plan to:

  1. Replace UDT return types with standard Calcite types in UDF signatures
  2. Wrap UDF implementors to convert between value representations at input (int/long → String) and output (String → int/long)

Related Issues

Resolves #5250

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen self-assigned this Apr 15, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 15, 2026

PR Reviewer Guide 🔍

(Review updated until commit 9887cf5)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Introduce PostAnalysisRule extension point in LanguageSpec and UnifiedQueryPlanner

Relevant files:

  • core/src/main/java/org/opensearch/sql/expression/function/ImplementorUDF.java
  • api/src/main/java/org/opensearch/sql/api/spec/LanguageSpec.java
  • api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java

Sub-PR theme: Add DatetimeUDT normalizer rules and register with PPL spec

Relevant files:

  • api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtExtension.java
  • api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java
  • api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtLiteralCoercionRule.java
  • api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtOutputCastRule.java
  • api/src/main/java/org/opensearch/sql/api/spec/UnifiedPplSpec.java
  • api/src/test/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtExtensionTest.java
  • api/src/test/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtLiteralCoercionRuleTest.java

⚡ Recommended focus areas for review

Unsafe Cast

In getOperandMetadata(), the result of udf.getOperandTypeChecker() is cast directly to UDFOperandMetadata without a null check or instanceof guard. If the UDF was constructed with a different or null operand type checker, this will throw a ClassCastException at runtime.

public UDFOperandMetadata getOperandMetadata() {
  return (UDFOperandMetadata) udf.getOperandTypeChecker();
}
Operand Count Mismatch

In normalizeImplementation, the loop iterates over operands (from the rewritten rexCall) but indexes into originalCall.getOperands() for type lookup. If the operand lists differ in size (e.g., due to a prior rewrite), this will throw an IndexOutOfBoundsException.

List<Expression> converted = new ArrayList<>(operands.size());
for (int i = 0; i < operands.size(); i++) {
  Expression operand = operands.get(i);
  RelDataType opType = originalCall.getOperands().get(i).getType();
  converted.add(
      UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand));
Nullability Loss

The varcharType created at line 38 does not carry nullability. While nullability is applied per-field at line 44-47, if field.getType().isNullable() is false, the cast target is non-nullable VARCHAR. This may be correct, but it should be verified that non-nullable datetime columns are intentionally cast to non-nullable VARCHAR and that this doesn't conflict with downstream consumers expecting nullable types.

RelDataType varcharType = rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR);
List<RexNode> projects = new ArrayList<>(fields.size());
List<String> names = new ArrayList<>(fields.size());
for (RelDataTypeField field : fields) {
  RexNode ref = rexBuilder.makeInputRef(plan, field.getIndex());
  if (UdtMapping.fromStdType(field.getType()).isPresent()) {
    RelDataType nullableVarchar =
        rexBuilder
            .getTypeFactory()
            .createTypeWithNullability(varcharType, field.getType().isNullable());
    projects.add(rexBuilder.makeCast(nullableVarchar, ref));
Incomplete Coverage

The isTargetOperator method includes SqlKind.IN, SqlKind.BETWEEN, and SqlKind.SEARCH, but the test file explicitly documents that IN and BETWEEN are NOT covered because the PPL visitor rejects them before this rule runs. This creates dead code in the operator check and may give a false sense of coverage. Consider removing these kinds from isTargetOperator or adding a comment explaining why they are included despite not being reachable.

private static boolean isTargetOperator(RexCall call) {
  SqlKind kind = call.getKind();
  return kind == SqlKind.EQUALS
      || kind == SqlKind.NOT_EQUALS
      || kind == SqlKind.GREATER_THAN
      || kind == SqlKind.GREATER_THAN_OR_EQUAL
      || kind == SqlKind.LESS_THAN
      || kind == SqlKind.LESS_THAN_OR_EQUAL
      || kind == SqlKind.IN
      || kind == SqlKind.SEARCH
      || kind == SqlKind.BETWEEN
      || kind == SqlKind.COALESCE;
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 15, 2026

PR Code Suggestions ✨

Latest suggestions up to 9887cf5

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Use runtime call operand types in implementor lambda

The lambda captures originalCall to look up operand types, but rexCall passed to the
implementor at code-generation time may have a different number of operands than
originalCall (e.g., after further rewrites or if Calcite adds implicit operands).
This could cause an IndexOutOfBoundsException at code generation time. The operand
types should be looked up from rexCall instead of originalCall to ensure
consistency.

api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [103-110]

 return (translator, rexCall, operands) -> {
   List<Expression> converted = new ArrayList<>(operands.size());
   for (int i = 0; i < operands.size(); i++) {
     Expression operand = operands.get(i);
-    RelDataType opType = originalCall.getOperands().get(i).getType();
+    RelDataType opType = rexCall.getOperands().get(i).getType();
     converted.add(
         UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand));
   }
Suggestion importance[1-10]: 6

__

Why: This is a valid concern: the lambda captures originalCall for operand type lookup, but the rexCall passed at code-generation time is the correct source for operand types. Using rexCall.getOperands() instead of originalCall.getOperands() is more robust and avoids potential IndexOutOfBoundsException if operand counts differ between analysis and code-generation time.

Low
General
Guard UDT normalization to avoid false positives

The condition checks if the return type is not a UDT AND no operand has a standard
datetime type, but the intent should be to skip normalization when neither the
return type is a UDT nor any operand is a standard datetime type. The current logic
is correct, but the fromStdType check on operands is used to detect when a UDF
receives standard datetime inputs that need conversion back to UDT values. However,
this could also match non-datetime UDFs that happen to have standard datetime
operands passed in from other sources, potentially causing incorrect wrapping.
Consider adding a guard to ensure the UDF actually expects UDT inputs by checking
the original operand types against UDT types as well.

api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [65-70]

 Optional<UdtMapping> returnType = UdtMapping.fromUdtType(visited.getType());
-if (returnType.isEmpty()
-    && visited.getOperands().stream()
-        .noneMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent())) {
+boolean anyUdtOperand = visited.getOperands().stream()
+    .anyMatch(op -> UdtMapping.fromUdtType(op.getType()).isPresent()
+        || UdtMapping.fromStdType(op.getType()).isPresent());
+if (returnType.isEmpty() && !anyUdtOperand) {
   return visited;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion changes the operand check to also include UDT-typed operands, but the original code already handles the case correctly: it checks if any operand has a standard datetime type (which is what happens after DatetimeUdtLiteralCoercionRule runs). The improved code adds fromUdtType check which may be redundant since UDT operands would be handled by fromUdtType on the return type check. The concern about false positives is speculative and not clearly demonstrated.

Low
Avoid reusing shared mutable type instance in loop

The varcharType created outside the loop has no precision specified, which may
result in a VARCHAR with default or unbounded precision. When casting datetime
values to VARCHAR, the resulting string representation (e.g., "2020-03-15") has a
known maximum length. Using an unbounded VARCHAR could cause issues with some
downstream consumers or type inference. Consider creating the VARCHAR type with an
explicit precision or reusing the nullable variant inside the loop instead of
creating a base type outside.

api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtOutputCastRule.java [38-53]

-RelDataType varcharType = rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR);
 List<RexNode> projects = new ArrayList<>(fields.size());
 List<String> names = new ArrayList<>(fields.size());
 for (RelDataTypeField field : fields) {
   RexNode ref = rexBuilder.makeInputRef(plan, field.getIndex());
   if (UdtMapping.fromStdType(field.getType()).isPresent()) {
     RelDataType nullableVarchar =
         rexBuilder
             .getTypeFactory()
-            .createTypeWithNullability(varcharType, field.getType().isNullable());
+            .createTypeWithNullability(
+                rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR),
+                field.getType().isNullable());
     projects.add(rexBuilder.makeCast(nullableVarchar, ref));
   } else {
     projects.add(ref);
   }
   names.add(field.getName());
 }
Suggestion importance[1-10]: 2

__

Why: The varcharType object created by createSqlType is typically immutable in Calcite's type factory, and createTypeWithNullability creates a new type object anyway. The suggestion moves the varcharType creation inside the loop but doesn't actually change the precision or behavior, making it a marginal style change with no functional impact.

Low

Previous suggestions

Suggestions up to commit 9887cf5
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect operand UDT check in normalize rule

The condition checks if any operand has a standard datetime type to decide whether
to proceed, but the intent is to normalize UDT operands (i.e., operands whose type
is a UDT). Using fromStdType here means the rule will also trigger on calls that
already have standard datetime operands but no UDT involvement, potentially
double-wrapping implementors. The guard should use fromUdtType to check operands.

api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [65-70]

 Optional<UdtMapping> returnType = UdtMapping.fromUdtType(visited.getType());
 if (returnType.isEmpty()
     && visited.getOperands().stream()
-        .noneMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent())) {
+        .noneMatch(op -> UdtMapping.fromUdtType(op.getType()).isPresent())) {
   return visited;
 }
Suggestion importance[1-10]: 7

__

Why: The guard condition uses fromStdType to check operands, but the intent is to detect UDT operands that need normalization. Using fromStdType could cause the rule to trigger on calls with standard datetime operands that have no UDT involvement, potentially double-wrapping implementors. Changing to fromUdtType would correctly scope the rule to only UDT-involved calls.

Medium
Use code-gen call operand types instead of captured original

The implementor lambda captures originalCall (the pre-rewrite call) to look up
operand types, but at code-generation time the actual operands passed in may differ
in count or order if Calcite rewrites the call further. The operand types should be
derived from rexCall (the call at code-gen time) rather than originalCall to stay
consistent.

api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [103-110]

 return (translator, rexCall, operands) -> {
   List<Expression> converted = new ArrayList<>(operands.size());
   for (int i = 0; i < operands.size(); i++) {
     Expression operand = operands.get(i);
-    RelDataType opType = originalCall.getOperands().get(i).getType();
+    RelDataType opType = rexCall.getOperands().get(i).getType();
     converted.add(
         UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand));
   }
Suggestion importance[1-10]: 6

__

Why: The lambda captures originalCall to look up operand types at code-generation time, but rexCall (the call at code-gen time) is more appropriate since Calcite may further rewrite the call. Using rexCall.getOperands().get(i).getType() ensures the operand types are consistent with what's actually being compiled.

Low
General
Avoid reusing shared VARCHAR type across fields

The varcharType created outside the loop has no precision specified, which may
result in a VARCHAR with default (often unlimited or implementation-defined)
precision. For consistency and to avoid potential type mismatches downstream, the
nullable varchar should be created directly per field without reusing a shared base
type, or an explicit precision should be set.

api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtOutputCastRule.java [38-53]

-RelDataType varcharType = rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR);
 List<RexNode> projects = new ArrayList<>(fields.size());
 List<String> names = new ArrayList<>(fields.size());
 for (RelDataTypeField field : fields) {
   RexNode ref = rexBuilder.makeInputRef(plan, field.getIndex());
   if (UdtMapping.fromStdType(field.getType()).isPresent()) {
     RelDataType nullableVarchar =
         rexBuilder
             .getTypeFactory()
-            .createTypeWithNullability(varcharType, field.getType().isNullable());
+            .createTypeWithNullability(
+                rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR),
+                field.getType().isNullable());
     projects.add(rexBuilder.makeCast(nullableVarchar, ref));
   } else {
     projects.add(ref);
   }
   names.add(field.getName());
 }
Suggestion importance[1-10]: 2

__

Why: The suggestion moves varcharType creation inside the loop, but the existing code already creates a fresh nullableVarchar per field using createTypeWithNullability. The shared varcharType is only used as a base for that call, so there's no actual type-sharing issue. The improvement is marginal at best.

Low
Suggestions up to commit 65c549f
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect UDT detection on operands

The condition checks fromStdType on operands to decide whether to rewrite, but
fromStdType matches standard Calcite types (DATE, TIME, TIMESTAMP), not UDT types.
This means any UDF call that happens to receive a plain DATE/TIME/TIMESTAMP operand
(e.g., from a table column) will be rewritten even if neither the return type nor
the operand is a UDT. The operand check should use fromUdtType to detect actual UDT
inputs that need conversion.

api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [65-70]

 Optional<UdtMapping> returnType = UdtMapping.fromUdtType(visited.getType());
 if (returnType.isEmpty()
     && visited.getOperands().stream()
-        .noneMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent())) {
+        .noneMatch(op -> UdtMapping.fromUdtType(op.getType()).isPresent())) {
   return visited;
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion points out that fromStdType on operands would match plain DATE/TIME/TIMESTAMP columns (not just UDT types), potentially triggering unnecessary rewrites. Using fromUdtType would correctly limit the check to actual UDT inputs. However, the intent of the rule is to handle UDFs that accept standard datetime types as inputs (converting from standard to UDT), so the correctness depends on the broader design intent.

Medium
Fix operand UDT-to-standard conversion direction

Inside normalizeImplementation, the operand conversion uses fromStdType to detect
which operands need conversion from standard to UDT values. However, the intent is
to convert UDT-typed inputs (String) into standard values before passing them to the
UDF. The check should use fromUdtType on the original operand type to correctly
identify UDT inputs that need to be converted to standard representation.

api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [104-110]

 List<Expression> converted = new ArrayList<>(operands.size());
 for (int i = 0; i < operands.size(); i++) {
   Expression operand = operands.get(i);
   RelDataType opType = originalCall.getOperands().get(i).getType();
   converted.add(
-      UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand));
+      UdtMapping.fromUdtType(opType).map(u -> u.toStdValue(operand)).orElse(operand));
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion identifies a potential directional mismatch: fromStdType with fromStdValue converts standard→UDT, but the intent is to convert UDT inputs to standard values before passing to the UDF. If operands are UDT-typed strings, fromUdtType with toStdValue would be the correct conversion. This could be a real logic bug depending on the actual operand types at this stage.

Medium
General
Document potential over-casting of native datetime columns

The cast to VARCHAR is applied to all standard datetime types (DATE, TIME,
TIMESTAMP) in the output, including columns that were originally plain table columns
(not UDT columns). This may unintentionally cast non-UDT datetime columns to
VARCHAR. The rule should only cast columns that originated as UDT types, but since
provenance is lost after normalization, consider limiting the cast to columns that
were originally UDT-typed by tracking them, or document this known behavior
explicitly.

api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtOutputCastRule.java [38-53]

+// Note: after DatetimeUdtNormalizeRule, UDT-originated columns are indistinguishable from
+// native datetime columns. If native datetime columns should not be cast, track UDT column
+// indices before normalization and pass them here. Otherwise, this behavior is intentional.
 RelDataType varcharType = rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR);
 List<RexNode> projects = new ArrayList<>(fields.size());
 List<String> names = new ArrayList<>(fields.size());
 for (RelDataTypeField field : fields) {
   RexNode ref = rexBuilder.makeInputRef(plan, field.getIndex());
   if (UdtMapping.fromStdType(field.getType()).isPresent()) {
     RelDataType nullableVarchar =
         rexBuilder
             .getTypeFactory()
             .createTypeWithNullability(varcharType, field.getType().isNullable());
     projects.add(rexBuilder.makeCast(nullableVarchar, ref));
   } else {
     projects.add(ref);
   }
   names.add(field.getName());
 }
Suggestion importance[1-10]: 2

__

Why: The suggestion only adds a comment to the existing code without changing any logic. The improved_code is functionally identical to the existing_code, making this a documentation-only suggestion with minimal impact.

Low
Suggestions up to commit 4bc4176
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against operand list size mismatch

The normalizeImplementation lambda captures originalCall to get operand types, but
at runtime the rexCall passed to the implementor is the new normalized call (with
updated operand types). Using originalCall.getOperands() for type lookup is correct
for determining which operands need conversion, but if the operand list sizes differ
(e.g., due to variadic functions), this will throw an IndexOutOfBoundsException. Add
a bounds check or assert that sizes match.

api/src/main/java/org/opensearch/sql/api/udt/UdtNormalizer.java [97-103]

 List<Expression> converted = new ArrayList<>(operands.size());
+List<RexNode> originalOperands = originalCall.getOperands();
 for (int i = 0; i < operands.size(); i++) {
   Expression operand = operands.get(i);
-  RelDataType opType = originalCall.getOperands().get(i).getType();
-  converted.add(
-      UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand));
+  if (i < originalOperands.size()) {
+    RelDataType opType = originalOperands.get(i).getType();
+    converted.add(
+        UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand));
+  } else {
+    converted.add(operand);
+  }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion adds a bounds check to prevent a potential IndexOutOfBoundsException when operands.size() exceeds originalCall.getOperands().size() for variadic functions. While this is a valid defensive programming concern, it's an edge case that may not occur in practice given the current UDF usage patterns.

Low
General
Clarify operand conversion condition logic

The condition checks fromStdType on operands to decide whether to normalize, but
normalizeImplementation also uses fromStdType to convert operands. However,
fromUdtType is used in assertNoUdtInRexCalls to verify no UDT return types remain.
If a UDF has a UDT return type but no standard-type operands, the early-return
condition would still proceed (since returnType is present). The logic is correct in
that case, but the comment says "inputs (standard → UDT)" which is misleading — it
should say "standard → UDT" means converting standard Calcite types back to UDT
string values for the UDF's input. Consider clarifying the comment and verifying the
operand conversion direction is correct (i.e., that fromStdValue wraps a standard
int/long into a UDT string as expected by the UDF).

api/src/main/java/org/opensearch/sql/api/udt/UdtNormalizer.java [59-64]

 Optional<UdtMapping> returnType = UdtMapping.fromUdtType(visited.getType());
-if (returnType.isEmpty()
-    && visited.getOperands().stream()
-        .noneMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent())) {
+boolean anyOperandNeedsConversion =
+    visited.getOperands().stream()
+        .anyMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent());
+if (returnType.isEmpty() && !anyOperandNeedsConversion) {
   return visited;
 }
Suggestion importance[1-10]: 2

__

Why: The improved_code is functionally equivalent to the existing_code — it just extracts the stream condition into a named variable anyOperandNeedsConversion. This is a minor readability improvement with no correctness impact, and the suggestion's main content is about clarifying a comment rather than fixing a bug.

Low
Clarify discarded shuttle result in test

The RexShuttle returned by other.accept(new RexShuttle(){...}) is discarded —
RexShuttle.accept returns a new RelNode with rewritten expressions, but the
assertions inside visitCall are side effects that do run. However,
super.visitCall(call) recurses into nested calls, so nested UDT types will be
checked. This is fine for assertion purposes, but the result of other.accept(...) is
silently dropped, which could be confusing. This is not a bug since assertions are
the goal, but consider adding a comment to clarify intent.

api/src/test/java/org/opensearch/sql/api/udt/UdtNormalizerTest.java [59-77]

 private void assertNoUdtInRexCalls(RelNode plan) {
   plan.accept(
       new RelHomogeneousShuttle() {
         @Override
         public RelNode visit(RelNode other) {
+          // Use RexShuttle for its traversal side-effects (assertions); result is discarded.
           other.accept(
               new RexShuttle() {
                 @Override
                 public RexNode visitCall(RexCall call) {
                   assertFalse(
                       "RexCall " + call + " has UDT return type: " + call.getType(),
                       UdtMapping.fromUdtType(call.getType()).isPresent());
                   return super.visitCall(call);
                 }
               });
           return super.visit(other);
         }
       });
 }
Suggestion importance[1-10]: 1

__

Why: This suggestion only adds a comment to clarify that the RexShuttle result is intentionally discarded. The existing_code and improved_code are functionally identical, making this a purely cosmetic change with minimal impact.

Low
Suggestions up to commit c5c311d
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix operand UDT detection direction in early-exit check

The condition checks fromStdType on operands to decide whether conversion is needed,
but normalizeImplementation also uses fromStdType on operand types to wrap inputs.
However, the early-exit condition uses fromStdType (standard → UDT direction) while
the intent for operand conversion is to convert standard Calcite types back to UDT
values for the UDF. If the operand check is meant to detect UDT operands that need
conversion, it should use fromUdtType instead of fromStdType, otherwise UDT-typed
operands will be silently skipped.

api/src/main/java/org/opensearch/sql/api/udt/UdtNormalizer.java [59-64]

 Optional<UdtMapping> returnType = UdtMapping.fromUdtType(visited.getType());
 if (returnType.isEmpty()
     && visited.getOperands().stream()
-        .noneMatch(op -> UdtMapping.fromStdType(op.getType()).isPresent())) {
+        .noneMatch(op -> UdtMapping.fromUdtType(op.getType()).isPresent())) {
   return visited;
 }
Suggestion importance[1-10]: 7

__

Why: The early-exit condition uses fromStdType on operands to detect if conversion is needed, but normalizeImplementation also uses fromStdType to convert operands. The suggestion argues that UDT-typed operands should be detected with fromUdtType instead. However, looking at the design more carefully: the normalizer converts UDT-returning UDFs to standard types, and operands that are standard Calcite types (DATE, TIME, TIMESTAMP) need to be converted back to UDT strings for the UDF. The fromStdType check on operands seems intentional for detecting standard-typed operands that need UDT conversion. The suggestion may be incorrect about the direction, but raises a valid concern worth investigating.

Medium
Guard against unsafe cast of operand type checker

The cast (UDFOperandMetadata) udf.getOperandTypeChecker() will throw a
ClassCastException at runtime if the original UDF's operand type checker is not an
instance of UDFOperandMetadata. A guard check or safe cast should be used to avoid
unexpected failures.

api/src/main/java/org/opensearch/sql/api/udt/UdtNormalizer.java [78-80]

 @Override
 public UDFOperandMetadata getOperandMetadata() {
-  return (UDFOperandMetadata) udf.getOperandTypeChecker();
+  if (udf.getOperandTypeChecker() instanceof UDFOperandMetadata meta) {
+    return meta;
+  }
+  throw new IllegalStateException(
+      "Expected UDFOperandMetadata but got: " + udf.getOperandTypeChecker());
 }
Suggestion importance[1-10]: 5

__

Why: The unchecked cast (UDFOperandMetadata) udf.getOperandTypeChecker() could throw a ClassCastException at runtime if the UDF's operand type checker is not a UDFOperandMetadata. The improved code provides a safer pattern with a meaningful error message, though the context suggests this path is only reached for ImplementorUDFFunction UDFs which should always have UDFOperandMetadata.

Low
General
Use current call's operand types in implementation lambda

The lambda captures originalCall to get operand types, but at implementation time
rexCall (the normalized call) is available and its operands reflect the actual types
being processed. Using originalCall operand types is fragile if the shuttle has
already rewritten nested calls. Additionally, fromStdValue converts a standard value
to a UDT string, but the UDF likely expects UDT string inputs — verify that
fromStdType is the correct lookup direction here (it maps standard types, not UDT
types), as it may never match UDT-typed operands.

api/src/main/java/org/opensearch/sql/api/udt/UdtNormalizer.java [94-103]

 private NotNullImplementor normalizeImplementation(
     RexCall originalCall, ImplementorUDFFunction func, Optional<UdtMapping> returnUdt) {
   return (translator, rexCall, operands) -> {
     List<Expression> converted = new ArrayList<>(operands.size());
     for (int i = 0; i < operands.size(); i++) {
       Expression operand = operands.get(i);
-      RelDataType opType = originalCall.getOperands().get(i).getType();
+      RelDataType opType = rexCall.getOperands().get(i).getType();
       converted.add(
           UdtMapping.fromStdType(opType).map(u -> u.fromStdValue(operand)).orElse(operand));
     }
Suggestion importance[1-10]: 5

__

Why: Using originalCall operand types in the lambda is potentially fragile since rexCall is available at implementation time. However, the originalCall is captured specifically to preserve the pre-normalization types, which may be intentional since the normalized call's operand types may have changed. The suggestion is reasonable but may not be strictly necessary depending on the design intent.

Low

@dai-chen dai-chen force-pushed the poc/datetime-late-binding branch from c5c311d to 4bc4176 Compare April 15, 2026 20:17
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 4bc4176

@dai-chen dai-chen added the PPL Piped processing language label Apr 16, 2026
Bridge the type mismatch between PPL UDT types (String-based) and standard
Calcite types (int/long-based) in the unified query API path by contributing
two post-analysis rules to PPL's LanguageSpec:

1. DatetimeUdtNormalizeRule rewrites UDT calls — replaces UDT return types
   with standard Calcite types and wraps UDF implementors to convert values
   at input (int/long -> String) and output (String -> int/long).
2. DatetimeUdtOutputCastRule wraps the plan root with a projection that
   casts remaining datetime output columns to VARCHAR so the wire format
   matches PPL's String datetime contract.

Both rules are registered via DatetimeUdtExtension, which encapsulates the
ordering invariant (normalize before cast). The extension plugs into the
LanguageExtension mechanism introduced in opensearch-project#5360 via a new postAnalysisRules
hook, applied once at the top of UnifiedQueryPlanner.plan() after the
language-specific strategy returns.

Applied only on the PPL path; zero impact on the SQL or OpenSearch plugin
paths.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen force-pushed the poc/datetime-late-binding branch from 4bc4176 to 65c549f Compare April 17, 2026 21:40
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 65c549f

Add DatetimeUdtLiteralCoercionRule as a post-analysis rule that wraps
VARCHAR operands with CAST(... AS <datetime>) inside comparisons, IN,
SEARCH, BETWEEN, and COALESCE when the call has a standard Calcite
DATE/TIME/TIMESTAMP operand alongside. This closes the gap left by
DatetimeUdtNormalizeRule, which only rewrites operators backed by
ImplementableUDFunction.

The rule only modifies operand subtrees inside RexCall nodes; no
RelNode rowType or RexInputRef slot identity is altered, so Calcite's
cached RexInputRef types cannot be invalidated (unlike an in-place
ref rewrite).

Registered first in the extension's rule list so normalize and
output-cast see homogeneous types downstream.

Known scope limits: IN and BETWEEN with cross-type value lists are
rejected by CalciteRexNodeVisitor before any post-analysis rule can
run, and datetime+interval arithmetic requires a separate function
signature registration; both are out of scope for this rule.

Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen reopened this Apr 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 9887cf5

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 9887cf5

@dai-chen dai-chen closed this Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugFix PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Unified query API fails with date types

1 participant