-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Fuse CASE(a > 0, b / a)
#14200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fuse CASE(a > 0, b / a)
#14200
Conversation
This fuses the expression `CASE(a > 0, b / a)` into a single expressions that skips the partial evaluation required by the other implementations of `CASE`.
@@ -164,3 +166,177 @@ pub fn concat_elements_utf8view( | |||
} | |||
Ok(result.finish()) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#11570 talks about implementing a new kernel for the divide-if-gt-zero operation and I took a stab at it but realized it was pretty involved. Or maybe I'm not reading the right bits. It looks to me like the existing kernels for binary operations in arrow-rs
don't support Option
as a return. I've left this here to illustrate the kind of things that I think have to be added to make a whole new kernel for this. Though arrow-rs
is a big project and I'm probably missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If y'all this is the right way and it's worth it I can try to finish this, but it feels well outside the bounds of a good-first-issue
. Not that the code is hard, it's just that I don't know the benchmarking tools y'all have to see how it's performing.
/// | ||
/// It is executed as though `x / NULL_IF(y <= 0, y)` which is safe because arrow-rs | ||
/// skips evaluating fallible functions if one of their inputs is null. | ||
fn div_gt_0(&self, batch: &RecordBatch) -> Result<ColumnarValue> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If y'all like this path I'm happy to continue down it, adding the missing tests and removing the rough edges. I'm certainly plugging things in wrong, though the one test case we have now is passing.
when_then.1.evaluate(batch) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sort of wondering if it makes sense to just emit NULLIF
directly from try_new
. It feels awful heavy to have to pick apart the CASE arms at runtime.
div, | ||
) | ||
} | ||
ColumnarValue::Scalar(mask) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we only get here if the divisor is a literal? If so, we really don't need this branch at all - we just won't end up here.
Is there a constant folding phase that I'm dodging in these tests?
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
This fuses the expression
CASE(a > 0, b / a)
into a single expressions that skips the partial evaluation required by the other implementations ofCASE
.Which issue does this PR close?
Closes #11570
Rationale for this change
See issue.
What changes are included in this PR?
Two proposed solutions - one via
NULLIF
and another a hand rolled kernel for the entire operation.Are these changes tested?
The existing tests hit the new code path and I've added an assertion to confirm it. I've left NOCOMMITs for the two or three new tests that I'll need to write before merging this.
Are there any user-facing changes?
None.