Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6778] SOME rewrite for correlated queries does not handle null values correctly #4144

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

racevedoo
Copy link

No description provided.

@@ -405,8 +405,9 @@ private static RexNode rewriteSome(RexSubQuery e, Set<CorrelationId> variablesSe
// from emp group by name) as q on e.name = q.name
builder.push(e.rel)
.aggregate(builder.groupKey(),
builder.count(true, "c"),
builder.count(true, "d", builder.field(0)),
builder.count(false, "c"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the comment above describing the rewrite accurate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, forgot to update it. nice catch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this enough? where is dd used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it's fully updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NobiGo did you happen to author this code?
Then perhaps you are the most suitable person to approve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these would be very nice comments to add in the code.

@racevedoo racevedoo force-pushed the fix-calcite-6778 branch 2 times, most recently from 641299e to 398e761 Compare January 13, 2025 19:30
@@ -392,21 +392,22 @@ private static RexNode rewriteSome(RexSubQuery e, Set<CorrelationId> variablesSe
// then false // sub-query is empty for corresponding corr value
// when q.c = 0 then false // sub-query is empty
// when e.deptno is null then unknown
// when q.c <> q.d && q.d <= 1
// when q.c <> q.d && q.dd <= 1
Copy link
Contributor

@suibianwanwank suibianwanwank Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested it, but should it bewhen q.d <> q.dd && q.d <= 1 here?
I guess what is trying to be expressed here is the scenario when deptno is null

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the q.c <> q.d comparison identifies if there are any null values, since count(*) counts null values and count(deptno) does not.
the q.dd <= 1 part (previously q.d <= 1) remains the with the same semantics (true if there is a single non-null value), since in this patch dd is count(distinct deptno), which is equivalent to d in the main branch.

so this when clause means "if there are any null values and there is a single non-null value".

in main, c = count(distinct *) and d = count(distinct deptno), which means that when there are null values and a single non-null value, c = d and this clause evaluates to false.

I know it's a tricky case, so let me know if additional clarifications are needed 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well explained. A little problem

q.dd <= 1: is true if there is a single non-null value

should indicate that there is at most one non-null value. I think it can be true when there is no null value. WDYT

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, you're right, it means there is at most one non-null value.

if there's no null value, c should be equal to d, right? I can't think of a case with no null value that yield c <> d

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that looks good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, these are very nice comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. The logic here is a bit complicated, you can add corresponding code comments @racevedoo.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added the comments. IMO the whole context would be too long to fit in a comment, so I wrote a summary.

People can always refer to this PR to get a more complete understanding. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the summary you have written?
I think it's always best to have the code be self-explanatory.
Once someone makes any edit to this code finding the information residing in other tools like git will become harder.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's in the case statement comment, but I'll write the additional details.

@racevedoo racevedoo requested a review from NobiGo January 15, 2025 10:47
@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Jan 16, 2025
@mihaibudiu
Copy link
Contributor

Can you please rebase this on main to resolve the conflicts?

@racevedoo
Copy link
Author

Can you please rebase this on main to resolve the conflicts?

sure, I'll do it later today/tomorrow

@racevedoo
Copy link
Author

just rebased. also added the detailed comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants