You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As a result, the current function implementation must have an override for each parameter type it supports, even when they could be implicitly cast. For example, a function foo that accepts a date as an argument should also accept "2020-12-10" as an argument. However, without implicit type coercion, we have to define both foo(date) and foo(string).
How Calcite Implements Implicit Type Coercion
Calcite checks operand types and attempts to coerce them if necessary during the validation phase.
For example, in FamilyOperandTypeChecker, when an argument type doesn't match expectations, Calcite tries to determine if the argument can be coerced to the expected type. The simplified code snippet below illustrates this process:
publicbooleancheckOperandTypes(
SqlCallBindingcallBinding,
booleanthrowOnFailure){
...
if (!checkSingleOperandType(callBinding, op.e, op.i, families.get(op.i), false)) {
// try to coerce type if it is allowed.booleancoerced = false;
if (callBinding.isTypeCoercionEnabled()) {
TypeCoerciontypeCoercion = callBinding.getValidator().getTypeCoercion();
ImmutableList.Builder<RelDataType> builder = ImmutableList.builder();
for (inti = 0; i < callBinding.getOperandCount(); i++) {
builder.add(callBinding.getOperandType(i));
}
ImmutableList<RelDataType> dataTypes = builder.build();
coerced = typeCoercion.builtinFunctionCoercion(callBinding, dataTypes, families);
}
}
In this process, builtinFunctionCoercion calls coerceOperandType to rewrite the operands when needed. The coerceOperandType method rewrites the query by wrapping the argument with a CAST call:
To determine whether an argument can be cast to the target type, Calcite defines rules in SqlTypeCoercionRule and SqlTypeAssignmentRule. Additionally, the TypeCoercion interface defines how a type should be cast to another.
Challenges in Applying Calcite's Type Coercion
The validation phase of Calcite SQL validates a parsed SqlNode AST. It resolves identifiers, derives return types, validates argument types, etc. However, since we created a new language interface outside of SQL, this phase isn't directly applicable and is skipped in our current implementation.
As a result, we can't directly utilize Calcite's type coercion mechanism.
Proposal
We faced a similar issue with function argument type checking. We implemented our own argument validation initially. However, later we discovered that other language interfaces—like calcite-piglet—successfully leverage Calcite's type validation.Update: with a deeper inspection, I found that piglet actually uses its own type check, without re-using Calcite's type checking mechanism at all.
Calcite-piglet translates its logical plans into a SQL AST (represented with SqlNode) to reuse Calcite's type checking. It first constructs its own AST via a custom parser, then converts each node into a corresponding SqlNode. This approach allows it to access Calcite's standard validation flow and leverage its built-in type checking. The key implementation is in PigRelToSqlConverter.
I propose we adopt a similar approach by converting PPL to SqlNode. However, we could still try doing so. This should eliminate many future issues by reusing more of Calcite's functionality.
My concern over this proposal
With this approach, we re-introduce the SqlNode layer. This contradicts our previous design, which skips the layer.
flowchart LR
PPL -->|parse| AST -->|convert| RelNode1[RelNode] -->|optimize| RelNode2[RelNode]
Loading
With this proposal (the blocks in blue are added phases)
flowchart LR
PPL -->|parse| AST -->|convert| RelNode1[RelNode] -->|convert| SqlNode1[SqlNode]:::blue -->|validate| SqlNode2[SqlNode]:::blue -->|convert| RelNode2[RelNode]:::blue -->|optimize| RelNode3[RelNode]
classDef blue fill:#2196F3,stroke:#0D47A1,color:white
Loading
It's like converting PPL to SQL for validation purpose, making people doubting why is the SqlNode phase skipped at the first place.
Approach
This solution requires the following steps:
Convert PPL's logical plan to SqlNode by creating a similar PplRelToSqlConverter
The handling of UDT needs more investigation
Refactor the current argument type validation with the converted SQL AST
Implement function type coercion based on this
Long-Term Goals
Automatic coercion makes defining functions simpler and easier to maintain.
Make use of more Calcite's validation mechanism to reduce maintenance effort and make the implementation more robust.
Alternative
flowchart LR
PPL -->|parse| AST -->|convert & cast| RelNode1[RelNode]:::blue -->|optimize| RelNode3[RelNode]
classDef blue fill:#2196F3,stroke:#0D47A1,color:white
Loading
Make coercion directly at the RelNode level
Merits: Aligns with our existing design
Potential drawbacks: We make less use of what's already there in Calcite. It may leads to more investment in validation in the future. For example, we could have eliminated the effort to implement type checkers if we introduced the SqlNode layer.
Problem Statement
Implicit type coercion is missing from the current PPL validation. It is useful in at least two scenarios:
Relevant issue: #3761
Status update:
SqlNode(Update: picked up with Use Calcite's validation system for type checking & coercion #4892)Current State
As a result, the current function implementation must have an override for each parameter type it supports, even when they could be implicitly cast. For example, a function
foothat accepts a date as an argument should also accept"2020-12-10"as an argument. However, without implicit type coercion, we have to define bothfoo(date)andfoo(string).How Calcite Implements Implicit Type Coercion
Calcite checks operand types and attempts to coerce them if necessary during the validation phase.
For example, in FamilyOperandTypeChecker, when an argument type doesn't match expectations, Calcite tries to determine if the argument can be coerced to the expected type. The simplified code snippet below illustrates this process:
In this process,
builtinFunctionCoercioncallscoerceOperandTypeto rewrite the operands when needed. The coerceOperandType method rewrites the query by wrapping the argument with aCASTcall:To determine whether an argument can be cast to the target type, Calcite defines rules in
SqlTypeCoercionRuleandSqlTypeAssignmentRule. Additionally, theTypeCoercioninterface defines how a type should be cast to another.Challenges in Applying Calcite's Type Coercion
The validation phase of Calcite SQL validates a parsed
SqlNodeAST. It resolves identifiers, derives return types, validates argument types, etc. However, since we created a new language interface outside of SQL, this phase isn't directly applicable and is skipped in our current implementation.As a result, we can't directly utilize Calcite's type coercion mechanism.
Proposal
We faced a similar issue with function argument type checking. We implemented our own argument validation initially.
However, later we discovered that other language interfaces—like calcite-piglet—successfully leverage Calcite's type validation.Update: with a deeper inspection, I found that piglet actually uses its own type check, without re-using Calcite's type checking mechanism at all.Calcite-piglet translates its logical plans into a SQL AST (represented with
SqlNode) to reuse Calcite's type checking. It first constructs its own AST via a custom parser, then converts each node into a correspondingSqlNode.This approach allows it to access Calcite's standard validation flow and leverage its built-in type checking. The key implementation is in PigRelToSqlConverter.I propose we adopt a similar approach by converting PPL toHowever, we could still try doing so. This should eliminate many future issues by reusing more of Calcite's functionality.SqlNode.My concern over this proposal
With this approach, we re-introduce the
SqlNodelayer. This contradicts our previous design, which skips the layer.It's like converting PPL to SQL for validation purpose, making people doubting why is the SqlNode phase skipped at the first place.
Approach
This solution requires the following steps:
PplRelToSqlConverterLong-Term Goals
Alternative
RelNodelevelSqlNodelayer.Implementation Discussion