Datetime output cast format diverges between Calcite and DataFusion engines #5420

@ahkcs

Description

The DatetimeOutputCastRule introduced in #5408 wraps datetime
output columns in CAST(... AS VARCHAR) so PPL responses are
rendered as ISO strings. The string produced by that cast is
engine-dependent:

  • PPL Calcite path (UnifiedQueryCompiler) — emits ANSI SQL
    format 2024-01-15 12:00:00 (space separator, no T),
    consistent with SparkSQL, PostgreSQL, MySQL, Oracle, and
    SQL Server.
  • Analytics-engine path (DataFusion native runtime) — emits
    ISO 8601 format 2024-01-15T12:00:00 (with T separator).

This means the same PPL query against the same data returns
two different string formats depending on which execution
engine handled it, breaking the wire-format contract that
#5408 set out to enforce.

Reproduction

source=events | fields created_at
  • Calcite engine → "2024-01-15 12:00:00"
  • DataFusion engine → "2024-01-15T12:00:00"
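A minimal sketch of how the divergence could be asserted in a test, using only java.time. The class and method names (DatetimeWireFormatCheck, parseEither) are hypothetical and not part of the plugin; the two literals are the ones reported above. It shows that the strings denote the same local datetime yet differ as raw text, which is what a client doing string comparison would see.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the plugin: compares the two reported outputs.
class DatetimeWireFormatCheck {
    // Space-separated form reported for the Calcite path.
    private static final DateTimeFormatter SPACE_SEPARATED =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Parses either reported form into a LocalDateTime.
    static LocalDateTime parseEither(String value) {
        return value.indexOf('T') >= 0
            ? LocalDateTime.parse(value)                   // ISO form with 'T'
            : LocalDateTime.parse(value, SPACE_SEPARATED); // space-separated form
    }

    public static void main(String[] args) {
        String calcite = "2024-01-15 12:00:00";
        String dataFusion = "2024-01-15T12:00:00";
        // Raw strings differ, so string-level comparisons break.
        System.out.println(calcite.equals(dataFusion));
        // Parsed values are equal, so only the rendering diverges.
        System.out.println(parseEither(calcite).equals(parseEither(dataFusion)));
    }
}
```

The sketch assumes second precision and no zone offset, as in the reproduction; fractional seconds would need a different pattern.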

Context

This divergence was called out as Note 2 of #5408. It is filed here
as a follow-up so that it is tracked outside that already-merged PR.

Possible directions

  1. Replace the implicit CAST(... AS VARCHAR) with an explicit
    DATE_FORMAT(..., '<pattern>') (or to_char /
    format_datetime) that pins the wire format on both engines.
  2. Push the formatting into the response formatter rather than
    the logical plan, so engine-side cast semantics never leak
    to the wire.
  3. Add a DataFusion-side cast adapter that emits the ANSI SQL
    string format to match Calcite.
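Direction 2 could look roughly like the following: the response formatter renders datetime values itself with a single pinned pattern, so neither engine's cast semantics reach the wire. This is only a sketch under stated assumptions. PinnedDatetimeFormatter and toWire are hypothetical names, the space-separated pattern is chosen here only because it matches the Calcite output described above, and the sketch assumes the formatter receives a typed LocalDateTime rather than an already-cast string.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical response-side formatter; not an existing class in the plugin.
class PinnedDatetimeFormatter {
    // Assumed wire pattern: space separator, second precision, no zone.
    static final DateTimeFormatter WIRE_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Renders a datetime value identically regardless of which engine produced it.
    static String toWire(LocalDateTime value) {
        return value.format(WIRE_FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime createdAt = LocalDateTime.of(2024, 1, 15, 12, 0, 0);
        System.out.println(toWire(createdAt)); // prints 2024-01-15 12:00:00
    }
}
```

Note that this pattern drops fractional seconds; whether the wire format should carry them is a separate decision that the pattern would have to encode explicitly.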
