-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-6778] SOME rewrite for correlated queries does not handle null values correctly #4144
base: main
Are you sure you want to change the base?
Conversation
@@ -405,8 +405,9 @@ private static RexNode rewriteSome(RexSubQuery e, Set<CorrelationId> variablesSe | |||
// from emp group by name) as q on e.name = q.name | |||
builder.push(e.rel) | |||
.aggregate(builder.groupKey(), | |||
builder.count(true, "c"), | |||
builder.count(true, "d", builder.field(0)), | |||
builder.count(false, "c"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the comment above describing the rewrite accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, forgot to update it. nice catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this enough? where is dd used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now it's fully updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NobiGo did you happen to author this code?
Then perhaps you are the most suitable person to approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these would be very nice comments to add in the code.
641299e
to
398e761
Compare
@@ -392,21 +392,22 @@ private static RexNode rewriteSome(RexSubQuery e, Set<CorrelationId> variablesSe | |||
// then false // sub-query is empty for corresponding corr value | |||
// when q.c = 0 then false // sub-query is empty | |||
// when e.deptno is null then unknown | |||
// when q.c <> q.d && q.d <= 1 | |||
// when q.c <> q.d && q.dd <= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested it, but should it bewhen q.d <> q.dd && q.d <= 1
here?
I guess what is trying to be expressed here is the scenario when deptno is null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the q.c <> q.d
comparison identifies if there are any null values, since count(*)
counts null values and count(deptno)
does not.
the q.dd <= 1
part (previously q.d <= 1
) remains the with the same semantics (true if there is a single non-null value), since in this patch dd
is count(distinct deptno)
, which is equivalent to d
in the main branch.
so this when
clause means "if there are any null values and there is a single non-null value".
in main
, c = count(distinct *)
and d = count(distinct deptno)
, which means that when there are null values and a single non-null value, c = d
and this clause evaluates to false.
I know it's a tricky case, so let me know if additional clarifications are needed 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well explained. A little problem
q.dd <= 1: is true if there is a single non-null value
should indicate that there is at most one non-null value. I think it can be true when there is no null value. WDYT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, you're right, it means there is at most one non-null value.
if there's no null value, c
should be equal to d
, right? I can't think of a case with no null value that yield c <> d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, these are very nice comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. The logic here is a bit complicated, you can add corresponding code comments @racevedoo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added the comments. IMO the whole context would be too long to fit in a comment, so I wrote a summary.
People can always refer to this PR to get a more complete understanding. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the summary you have written?
I think it's always best to have the code be self-explanatory.
Once someone makes any edit to this code finding the information residing in other tools like git will become harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's in the case statement comment, but I'll write the additional details.
398e761
to
e201bfd
Compare
e201bfd
to
5b22225
Compare
Can you please rebase this on main to resolve the conflicts? |
sure, I'll do it later today/tomorrow |
…ll values correctly
5b22225
to
4b002e9
Compare
Quality Gate passedIssues Measures |
just rebased. also added the detailed comments |
No description provided.