Conversation

ZorinAnton
Contributor

@ZorinAnton ZorinAnton commented Sep 11, 2025

Minimal reproducing query:

select charcol from (
  select charcol, count(*) from (values('a')) as t(charcol) group by rollup(charcol)
)

If this query is converted to a Substrait proto and then to a Substrait POJO, the latter looks as follows:

Plan{version=Version{major=0, minor=74, patch=0, producer=isthmus}, roots=[Root{input=Project{remap=Remap{indices=[2]}, input=Aggregate{input=VirtualTableScan{initialSchema=NamedStruct{struct=Struct{nullable=false, fields=[FixedChar{nullable=false, length=1}]}, names=[CHARCOL]}, rows=[StructLiteral{nullable=false, fields=[FixedCharLiteral{nullable=false, value=a}]}]}, groupings=[Grouping{expressions=[FieldReference{segments=[StructField{offset=0}], type=FixedChar{nullable=false, length=1}}]}, Grouping{expressions=[]}], measures=[Measure{function=AggregateFunctionInvocation{declaration=count:, arguments=[], options=[], aggregationPhase=INITIAL_TO_RESULT, sort=[], outputType=I64{nullable=false}, invocation=ALL}}]}, expressions=[FieldReference{segments=[StructField{offset=0}], type=FixedChar{nullable=false, length=1}}]}, names=[CHARCOL]}], expectedTypeUrls=[]}

The ROLLUP function makes grouping columns nullable. This caused a Calcite validation error because the outer projection's CHARCOL column was incorrectly typed as non-nullable. This fix corrects the column type, allowing the conversion to succeed. With the fix, the outer field reference is typed FixedChar{nullable=true, length=1}:

Plan{version=Version{major=0, minor=74, patch=0, producer=isthmus}, roots=[Root{input=Project{remap=Remap{indices=[2]}, input=Aggregate{input=VirtualTableScan{initialSchema=NamedStruct{struct=Struct{nullable=false, fields=[FixedChar{nullable=false, length=1}]}, names=[CHARCOL]}, rows=[StructLiteral{nullable=false, fields=[FixedCharLiteral{nullable=false, value=a}]}]}, groupings=[Grouping{expressions=[FieldReference{segments=[StructField{offset=0}], type=FixedChar{nullable=false, length=1}}]}, Grouping{expressions=[]}], measures=[Measure{function=AggregateFunctionInvocation{declaration=count:, arguments=[], options=[], aggregationPhase=INITIAL_TO_RESULT, sort=[], outputType=I64{nullable=false}, invocation=ALL}}]}, expressions=[FieldReference{segments=[StructField{offset=0}], type=FixedChar{nullable=true, length=1}}]}, names=[CHARCOL]}], expectedTypeUrls=[]}

@ZorinAnton ZorinAnton marked this pull request as draft September 11, 2025 11:09
@vbarua vbarua self-requested a review September 11, 2025 15:40
@ZorinAnton ZorinAnton marked this pull request as ready for review September 12, 2025 15:55
Member

@vbarua vbarua left a comment

Just saw your new update from this morning. So this issue is associated with the presence of multiple grouping sets? It would be helpful to have a direct, minimal reproducer for this to understand exactly what the issue is. The TPC-DS test hits the issue, but it doesn't really help identify it.

We should also update the title to indicate the issue that's being fixed. Fixing TPC-DS 67 is a side effect of fixing the actual underlying issue.

@ZorinAnton ZorinAnton changed the title from "fix tpcds query 67" to "fix(isthmus) : make arguments of rollup function nullable in substrait proto->Rel conversion" Sep 13, 2025
@ZorinAnton
Contributor Author

I've updated the PR title and commit message from "fix tpcds query 67" to be more descriptive.

I've added the new test case to the Substrait2SqlTest class. I wasn't entirely sure this was the right spot, as the class name is a bit confusing (it seems to cover a full sql<->substrait roundtrip). However, since it already contains tests for various SQL clauses, I thought adding the rollup test here would be consistent.

Please let me know if you think there's a better place for this test.

@ZorinAnton ZorinAnton requested a review from vbarua September 13, 2025 14:42
@vbarua
Member

vbarua commented Sep 15, 2025

Thanks for the reproducer for this. I'm digging into this a bit because there's a general pattern of bugs we've also noticed due to disagreement between Calcite and Substrait as to nullabilities.

@vbarua
Member

vbarua commented Sep 23, 2025

I haven't found time to dig into this, but I figured I would explain why I haven't merged. Effectively, tweaking the return type like you have isn't really in keeping with the spec for AggregateRel, which as defined now should output all input types as given without tweaking nullabilities. This probably isn't quite correct, and the issue arises when Calcite and Substrait have different expectations around the nullabilities of the columns. Calcite is probably correct in this case, but I think it's worth understanding exactly why the current behavior is wrong, so that we can update the core spec and make our AggregateRel more correct.

Variants of this have arisen before (#336), and there is actually an issue to look into this further (#379).

@ZorinAnton
Contributor Author

> AggregateRel, which as defined now should output all inputs types as given without tweaking nullabilities.

I couldn't find that statement in the spec. However, the logic applied in the PR differed slightly from the spec, which says:

> The values for the grouping expression columns that are not part of the grouping set for a particular record will be set to null.

Instead, the PR made all grouping columns nullable whenever there was more than one grouping set. I have fixed the logic to follow the spec. This change doesn't affect the ROLLUP function (the target of this PR), because ROLLUP generates grouping sets in such a way that no single grouping column appears in all of them.
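
To illustrate the rule described above, here is a minimal standalone sketch, not the code from this PR. The class and method names (GroupingNullability, rollup, outputNullability) are hypothetical, and grouping expressions are represented only by their integer positions. A grouping column becomes nullable when it is missing from at least one grouping set; ROLLUP always includes the empty grouping set, so every ROLLUP column ends up nullable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; none of these names come from substrait-java.
public class GroupingNullability {

    // ROLLUP over n expressions yields the prefixes {0..n-1}, {0..n-2}, ..., {}.
    static List<Set<Integer>> rollup(int n) {
        List<Set<Integer>> sets = new ArrayList<>();
        for (int size = n; size >= 0; size--) {
            Set<Integer> set = new TreeSet<>();
            for (int i = 0; i < size; i++) {
                set.add(i);
            }
            sets.add(set);
        }
        return sets;
    }

    // A grouping column is nullable in the output if its expression was already
    // nullable, or if it is absent from at least one grouping set.
    static List<Boolean> outputNullability(
            List<Boolean> inputNullable, List<Set<Integer>> groupingSets) {
        List<Boolean> result = new ArrayList<>();
        for (int i = 0; i < inputNullable.size(); i++) {
            final int index = i;
            boolean inEverySet = groupingSets.stream().allMatch(s -> s.contains(index));
            result.add(inputNullable.get(i) || !inEverySet);
        }
        return result;
    }

    public static void main(String[] args) {
        // ROLLUP(charcol): grouping sets {charcol} and {} -> charcol becomes nullable.
        System.out.println(outputNullability(List.of(false), rollup(1))); // [true]

        // GROUPING SETS ((a, b), (a)): a is in every set and keeps its nullability.
        System.out.println(outputNullability(
                List.of(false, false), List.of(Set.of(0, 1), Set.of(0)))); // [false, true]
    }
}
```

The second call shows the case where the earlier "more than one grouping set" logic and the spec disagree: column a appears in every grouping set, so it stays non-nullable.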

ZorinAnton and others added 9 commits September 23, 2025 11:44
Replace the deprecated com.palantir.graal plugin, which was last
released in June 2022 and is no longer maintained.

Signed-off-by: Mark S. Lewis <[email protected]>
Alternate forms for TPC-DS queries 27, 36, 70 and 86. These forms
rewrite queries that use the GROUPING aggregate function, which does not
have a direct Substrait equivalent. These test cases now pass.

Signed-off-by: Mark S. Lewis <[email protected]>
There is an info-level log in FunctionConverter for every Calcite
operator for which there is no direct Substrait mapping. This message
makes the logs extremely busy and provides no value to end users. This
change reclassifies the message as a debug log.

Signed-off-by: Mark S. Lewis <[email protected]>
substrait-io#517)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Benefits from Java 25 performance optimizations.

Signed-off-by: Mark S. Lewis <[email protected]>
@vbarua
Member

vbarua commented Sep 23, 2025

Oh, you're right, we do actually have something in the spec for grouping sets:

The columns for grouping expressions that do not appear in all grouping sets will be nullable (regardless of the nullability of the type returned by the grouping expression) to accomodate the null insertion.

and

To further disambiguate which record belongs to which grouping set, an aggregate relation with more than one grouping set receives an extra i32 column on the right-hand side. The value of this field will be the zero-based index of the grouping set that yielded the record.
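
As a rough sketch of the column layout these two passages describe (grouping columns first, then measures, then the extra i32 index when there is more than one grouping set), consider the following. The class name, method name, and the "grouping_set_index" label are hypothetical and are not taken from substrait-java or the spec.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the aggregate output layout quoted above.
public class AggregateOutputLayout {

    static List<String> outputColumns(
            List<String> groupingColumns, List<String> measures, int groupingSetCount) {
        List<String> out = new ArrayList<>(groupingColumns);
        out.addAll(measures);
        // The extra i32 column is only present with more than one grouping set.
        if (groupingSetCount > 1) {
            out.add("grouping_set_index");
        }
        return out;
    }

    public static void main(String[] args) {
        // ROLLUP(charcol) has two grouping sets: {charcol} and {}.
        System.out.println(outputColumns(List.of("CHARCOL"), List.of("CNT"), 2));
        // A plain GROUP BY charcol has a single grouping set and no extra column.
        System.out.println(outputColumns(List.of("CHARCOL"), List.of("CNT"), 1));
    }
}
```

For the reproducer query, the first call lists three columns and the second lists two, which is why field offsets in an enclosing relation depend on the number of grouping sets.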
