-
Notifications
You must be signed in to change notification settings - Fork 90
fix(isthmus) : make arguments of rollup function nullable in substrait proto->Rel conversion #508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
1a9fd66
to
c7ece87
Compare
c7ece87
to
c48a180
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saw your new update from this morning. So this issue is associated with the presence of multiple grouping set? It would be helpful to have a direct/minimal reproducer for this to be able to understand what exactly the issue is. The TPCDS test hits the issue, but it doesn't really help identify it.
We should also update the title indicate the issue that's being fixed. Fixing TCPDs 67 is a side-effect of fixing an actual underlying issue.
c48a180
to
62f552d
Compare
I've updated the PR title and commit message from "fix tpcds query 67" to be more descriptive. I've added the new test case to the Substrait2SqlTest class. I wasn't entirely sure this was the right spot, as the class name is a bit confusing (it seems to cover a full sql<->substrait roundtrip). However, since it already contains tests for various SQL clauses, I thought adding the rollup test here would be consistent. Please let me know if you think there's a better place for this test. |
Thanks for the reproducer for this. I'm digging into this a bit because there's a general pattern of bugs we've also noticed due to disagreement between Calcite and Substrait as to nullabilities. |
Haven't found time to dig into this, but I figured I would update on why I haven't merged. Effectively, tweaking the return type like you have isn't really in keeping with the spec for AggregateRel, which as defined now should output all inputs types as given without tweaking nullabilities. This isn't probably quite correct, and the issue arises when Calcite and Substrait have different expectations around the nullabilities of the columns. Calcite is probably correct in this case, but I think it's worth understanding exactly why it's wrong to update the core spec and make our AggregateRel more correct. Variants of this have arisen before #336, and there is actually an issue to look into this further #379. |
I couldn't find that statement in the spec. However, the logic applied in the PR was a bit different than in the spec, that says
Instead, all grouping columns were made nullable in case of more than one grouping sets. I have fixed the logic according to the spec. This change doesn't affect the ROLLUP function (the target of this PR) because ROLLUP generates grouping sets in a way that no single grouping column appears in all of them. |
Replace the deprecated com.palantir.graal plugin, which was last released in June 2022 and is no longer maintained. Signed-off-by: Mark S. Lewis <[email protected]>
Alternate forms for TPC-DS queries 27, 36, 70 and 86. These forms rewrite queries that use the GROUPING aggregate function, which does not have a direct Substrait equivalent. These test cases now pass. Signed-off-by: Mark S. Lewis <[email protected]>
Signed-off-by: Mark S. Lewis <[email protected]>
There is an info level log in FunctionConverter for every every Calcite operator for which there is no direct Substrait mapping. This message makes the logs extremely busy and provides no value to end users. This change reclassifies the message as debug log. Signed-off-by: Mark S. Lewis <[email protected]>
substrait-io#517) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Benefits from Java 25 performance optimizations. Signed-off-by: Mark S. Lewis <[email protected]>
5c6f19b
to
00e7786
Compare
Oh, you're right we do actually have something in the spec for grouping sets:
and
|
Minimal reproducing query:
If this query gets converted to
subtrait proto
and then tosubstrait pojo
, the latter looks as follows:The ROLLUP function makes grouping columns nullable. This was causing a Calcite validation error because the outer projection's CHARCOL column was incorrectly defined as non-nullable. This fix corrects the column type, allowing the conversion to succeed.