Skip to content

Normalize datetime types for unified query API#5408

Draft
dai-chen wants to merge 1 commit intoopensearch-project:mainfrom
dai-chen:feature/ppl-udt-cleanup
Draft

Normalize datetime types for unified query API#5408
dai-chen wants to merge 1 commit intoopensearch-project:mainfrom
dai-chen:feature/ppl-udt-cleanup

Conversation

@dai-chen
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen commented May 6, 2026

Description

Add postAnalysisRules to LanguageSpec.LanguageExtension and register DatetimeExtension in UnifiedPplSpec with two post-analysis rules:

  1. DatetimeUdtNormalizeRule: Rewrites PPL datetime UDT return types (EXPR_DATE/TIME/TIMESTAMP) on RexCall nodes to standard Calcite types, enabling downstream consumers to process the plan without UDT-related failures.
  2. DatetimeOutputCastRule: Adds a final LogicalProject that casts standard datetime output columns to VARCHAR, aligning with PPL's wire-format contract (ISO string representation).

Examples

UDT Normalize Rule — source=events | eval d = DATE(name) | fields d

 Before:  LogicalProject(d=[DATE($1):EXPR_DATE])
 After:   LogicalProject(d=[DATE($1):DATE])

Output Cast Rule — source=events | fields hire_date, start_time, created_at

 Before:  LogicalProject(hire_date=[$2], start_time=[$3], created_at=[$4])
            LogicalTableScan(table=[[events]])

 After:   LogicalProject(hire_date=[CAST($0):VARCHAR], start_time=[CAST($1):VARCHAR], created_at=[CAST($2):VARCHAR])
            LogicalProject(hire_date=[$2], start_time=[$3], created_at=[$4])
              LogicalTableScan(table=[[events]])

Notes

This PR partially resolves #5250. After the UDT normalize rule rewrites datetime function signatures to standard Calcite types, a mismatch exists between the RexCall return type (now standard DATE/TIME/TIMESTAMP) and the underlying PPL UDF implementation (which still expects and produces String values). This does not affect the Analytics Engine path (which reimplements UDFs in DataFusion), but the UnifiedQueryCompiler path will still fail for queries involving datetime UDFs. A follow-up PR will replace/wrap the function implementations properly.

Related Issues

Part of #5250

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…uery API

Add postAnalysisRules (List<RelShuttle>) to LanguageSpec.LanguageExtension
and register DatetimeExtension in UnifiedPplSpec with two rules:

1. DatetimeUdtNormalizeRule rewrites datetime UDT return types
   (EXPR_DATE/TIME/TIMESTAMP) on RexCall nodes to standard Calcite
   DATE/TIME/TIMESTAMP types via call.clone().

2. DatetimeOutputCastRule adds a final LogicalProject that casts
   standard datetime output columns to VARCHAR, aligning with PPL's
   wire-format contract (ISO string representation).

Both rules run as postAnalysisRules after the planning strategy produces
the RelNode, applied uniformly to both SQL and PPL paths.

No changes to core/ module — UDF definitions and implementors are
untouched. The mismatch between rewritten signatures and UDF
implementations is a known limitation addressed separately.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen self-assigned this May 6, 2026
@dai-chen dai-chen added PPL Piped processing language bugFix labels May 6, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Add postAnalysisRules extension point to LanguageSpec and apply in planner

Relevant files:

  • api/src/main/java/org/opensearch/sql/api/spec/LanguageSpec.java
  • api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java

Sub-PR theme: Implement and register DatetimeExtension with UDT normalize and output cast rules

Relevant files:

  • api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeExtension.java
  • api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java
  • api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeOutputCastRule.java
  • api/src/main/java/org/opensearch/sql/api/spec/UnifiedPplSpec.java
  • api/src/test/java/org/opensearch/sql/api/spec/datetime/DatetimeExtensionTest.java

⚡ Recommended focus areas for review

Shallow Traversal

DatetimeOutputCastRule.visit(RelNode other) only wraps the single node it visits and does not recurse into child nodes via super.visit(other). This means only the root node of the plan gets the cast projection applied. If the root is not a datetime-producing node but a child is, the cast will be silently skipped. Compare with DatetimeUdtNormalizeRule which correctly calls super.visit(other) first.

public RelNode visit(RelNode other) {
  List<RelDataTypeField> fields = other.getRowType().getFieldList();
  if (fields.stream().noneMatch(f -> isDatetimeType(f.getType().getSqlTypeName()))) {
    return other;
  }

  RexBuilder rexBuilder = other.getCluster().getRexBuilder();
  List<RexNode> projects = new ArrayList<>(fields.size());
  List<String> names = new ArrayList<>(fields.size());

  for (RelDataTypeField field : fields) {
    RexNode ref = rexBuilder.makeInputRef(other, field.getIndex());
    if (isDatetimeType(field.getType().getSqlTypeName())) {
      projects.add(castToVarchar(rexBuilder, ref, field.getType()));
    } else {
      projects.add(ref);
    }
    names.add(field.getName());
  }
  return LogicalProject.create(other, List.of(), projects, names);
}
Idempotency Risk

If postAnalysisRules() is called more than once (e.g., in retry or re-planning scenarios), DatetimeOutputCastRule will wrap the plan in an additional LogicalProject(CAST → VARCHAR) on each invocation, since VARCHAR columns also satisfy isDatetimeType check — wait, they don't, but the outer CAST project itself has VARCHAR fields. However, if the rule is applied twice, the second application will see VARCHAR fields and skip, so this is safe. Still worth verifying that the rule is truly idempotent under repeated application.

public RelNode visit(RelNode other) {
  List<RelDataTypeField> fields = other.getRowType().getFieldList();
  if (fields.stream().noneMatch(f -> isDatetimeType(f.getType().getSqlTypeName()))) {
    return other;
  }

  RexBuilder rexBuilder = other.getCluster().getRexBuilder();
  List<RexNode> projects = new ArrayList<>(fields.size());
  List<String> names = new ArrayList<>(fields.size());

  for (RelDataTypeField field : fields) {
    RexNode ref = rexBuilder.makeInputRef(other, field.getIndex());
    if (isDatetimeType(field.getType().getSqlTypeName())) {
      projects.add(castToVarchar(rexBuilder, ref, field.getType()));
    } else {
      projects.add(ref);
    }
    names.add(field.getName());
  }
  return LogicalProject.create(other, List.of(), projects, names);
}
Assertion Gap

assertNoUdt only checks RexCall nodes for UDT types but does not check RexInputRef or RexLiteral nodes. If a UDT type leaks through a field reference or literal, the assertion would not catch it. Consider also asserting on RelDataTypeField types in the row type of each visited RelNode.

private static void assertNoUdt(RelNode plan) {
  plan.accept(
      new RelHomogeneousShuttle() {
        @Override
        public RelNode visit(RelNode other) {
          RelNode visited = super.visit(other);
          visited.accept(
              new RexShuttle() {
                @Override
                public RexNode visitCall(RexCall call) {
                  assertFalse(
                      "Found UDT type in RexCall: " + call,
                      call.getType() instanceof AbstractExprRelDataType<?>);
                  return super.visitCall(call);
                }
              });
          return visited;
        }
      });
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Missing recursive traversal of child nodes

The visit method does not call super.visit(other) before processing the node,
meaning child nodes are never recursively visited. This causes the rule to only
apply to the root node and miss datetime fields in nested RelNodes. Call
super.visit(other) first and operate on the returned node, consistent with how
DatetimeUdtNormalizeRule handles this.

api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeOutputCastRule.java [28-32]

 public RelNode visit(RelNode other) {
-    List<RelDataTypeField> fields = other.getRowType().getFieldList();
+    RelNode visited = super.visit(other);
+    List<RelDataTypeField> fields = visited.getRowType().getFieldList();
     if (fields.stream().noneMatch(f -> isDatetimeType(f.getType().getSqlTypeName()))) {
-      return other;
+      return visited;
     }
Suggestion importance[1-10]: 7

__

Why: The DatetimeOutputCastRule.visit method doesn't call super.visit(other) unlike DatetimeUdtNormalizeRule, meaning child nodes are never recursively visited. However, the rule is designed to wrap the root output with CAST for wire-format compatibility, so it may be intentionally applied only at the top level. The inconsistency with DatetimeUdtNormalizeRule is worth noting, but the impact depends on the intended behavior.

Medium
General
ResultSet resource not closed after use

The ResultSet is never closed after use, which can leak JDBC resources. Wrap
resultSet in a try-with-resources block or close it in a finally block to ensure
proper cleanup.

api/src/test/java/org/opensearch/sql/api/spec/datetime/DatetimeExtensionTest.java [167-174]

-ResultSet resultSet = statement.executeQuery();
-verify(resultSet)
-    .expectSchema(
-        col("hire_date", VARCHAR), col("start_time", VARCHAR), col("created_at", VARCHAR))
-    .expectData(
-        row("2024-01-16", "12:00:00", "2024-01-15 08:00:00"),
-        row("2024-06-20", "14:00:00", "2024-06-20 00:00:00"));
+try (ResultSet resultSet = statement.executeQuery()) {
+  verify(resultSet)
+      .expectSchema(
+          col("hire_date", VARCHAR), col("start_time", VARCHAR), col("created_at", VARCHAR))
+      .expectData(
+          row("2024-01-16", "12:00:00", "2024-01-15 08:00:00"),
+          row("2024-06-20", "14:00:00", "2024-06-20 00:00:00"));
+}
Suggestion importance[1-10]: 5

__

Why: The ResultSet is not closed after use, which can leak JDBC resources. The suggested fix of wrapping in try-with-resources is correct and the improved_code accurately reflects the change, though this is a test file so the impact is limited.

Low
Shared shuttle singletons risk concurrency issues

RelShuttle.accept returns a RelNode, but RelHomogeneousShuttle instances are
stateful — reusing the same INSTANCE singleton across multiple concurrent queries
could cause race conditions if the shuttle holds any mutable state. Consider
instantiating a new shuttle per invocation rather than using shared singletons
(DatetimeUdtNormalizeRule.INSTANCE, DatetimeOutputCastRule.INSTANCE), or ensure the
shuttles are truly stateless.

api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java [66-70]

+RelNode plan = strategy.plan(query);
+for (var shuttle : context.getLangSpec().postAnalysisRules()) {
+  plan = plan.accept(shuttle);
+}
+return plan;
 
-
Suggestion importance[1-10]: 4

__

Why: The concern about singleton INSTANCE shuttles being stateful is valid in principle, but the improved_code is identical to existing_code, making this more of a verification suggestion. The actual fix would need to be in DatetimeExtension.postAnalysisRules() to return new instances, not in UnifiedQueryPlanner.

Low

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugFix PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Unified query API fails with date types

1 participant