Fuse CASE(a > 0, b / a) #14200

Closed
wants to merge 5 commits into from

nik9000 commented Jan 19, 2025

This fuses the expression `CASE(a > 0, b / a)` into a single expression that skips the partial evaluation required by the other implementations of `CASE`.

Which issue does this PR close?

Closes #11570

Rationale for this change

See issue.

What changes are included in this PR?

Two proposed solutions - one via NULLIF and another a hand rolled kernel for the entire operation.
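
To illustrate the NULLIF route, here is a minimal sketch of the semantics on a toy column model. It assumes `Option<i64>` stands in for a nullable value; `Column`, `null_if_not_positive`, `div_nullable`, and `case_div_gt_0` are hypothetical names for illustration, not the arrow-rs or DataFusion API.

```rust
/// Toy model of a nullable Int64 column: `None` stands for SQL NULL.
/// (Hypothetical stand-in, not an Arrow array.)
type Column = Vec<Option<i64>>;

/// NULLIF-style masking: null out every divisor that is NULL or <= 0.
fn null_if_not_positive(a: &[Option<i64>]) -> Column {
    a.iter()
        .map(|v| match *v {
            Some(x) if x > 0 => Some(x),
            _ => None,
        })
        .collect()
}

/// Null-propagating division: a row is divided only when both inputs are
/// non-null, so a masked (non-positive) divisor is never evaluated.
fn div_nullable(b: &[Option<i64>], a: &[Option<i64>]) -> Column {
    b.iter()
        .zip(a)
        .map(|(n, d)| match (n, d) {
            (Some(n), Some(d)) => n.checked_div(*d),
            _ => None,
        })
        .collect()
}

/// `CASE WHEN a > 0 THEN b / a END`, evaluated as a division by the masked divisor.
fn case_div_gt_0(b: &[Option<i64>], a: &[Option<i64>]) -> Column {
    div_nullable(b, &null_if_not_positive(a))
}
```

With `b = [10, 7, NULL, 5]` and `a = [2, 0, 3, -1]`, only the first row has a positive divisor and a non-null numerator, so only that row produces a value; every other row comes out NULL without a division being attempted.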

Are these changes tested?

The existing tests hit the new code path and I've added an assertion to confirm it. I've left NOCOMMITs for the two or three new tests that I'll need to write before merging this.

Are there any user-facing changes?

None.

github-actions bot added the physical-expr label (Changes to the physical-expr crates) on Jan 19, 2025
@@ -164,3 +166,177 @@ pub fn concat_elements_utf8view(
}
Ok(result.finish())
}
Author

#11570 talks about implementing a new kernel for the divide-if-gt-zero operation and I took a stab at it but realized it was pretty involved. Or maybe I'm not reading the right bits. It looks to me like the existing kernels for binary operations in arrow-rs don't support Option as a return. I've left this here to illustrate the kind of things that I think have to be added to make a whole new kernel for this. Though arrow-rs is a big project and I'm probably missing something.
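
A rough sketch of what a binary kernel with an `Option` return could look like, written against a toy array type rather than arrow-rs. `NullableI64`, `binary_opt`, and `div_if_gt_0` are hypothetical names; a real kernel would work on Arrow buffers and a packed validity bitmap.

```rust
/// Toy nullable array: values plus one validity flag per row.
/// (Hypothetical stand-in for an Arrow primitive array.)
#[derive(Debug, PartialEq)]
struct NullableI64 {
    values: Vec<i64>,
    valid: Vec<bool>,
}

/// Apply `op` to each pair of valid rows. Unlike a plain binary kernel,
/// `op` may return `None`, which marks the output row as null.
fn binary_opt<F>(left: &NullableI64, right: &NullableI64, op: F) -> NullableI64
where
    F: Fn(i64, i64) -> Option<i64>,
{
    assert_eq!(left.values.len(), right.values.len());
    let len = left.values.len();
    let mut values = Vec::with_capacity(len);
    let mut valid = Vec::with_capacity(len);
    for i in 0..len {
        // Skip `op` entirely when either input row is null.
        let out = if left.valid[i] && right.valid[i] {
            op(left.values[i], right.values[i])
        } else {
            None
        };
        valid.push(out.is_some());
        values.push(out.unwrap_or(0)); // placeholder stored under a null slot
    }
    NullableI64 { values, valid }
}

/// The fused operation: divide only where the divisor is positive.
fn div_if_gt_0(b: &NullableI64, a: &NullableI64) -> NullableI64 {
    binary_opt(b, a, |n, d| if d > 0 { n.checked_div(d) } else { None })
}
```

The point of the sketch is that the closure decides per row whether the output is null, so the condition and the division happen in one pass with no intermediate mask array.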

Author

If y'all think this is the right way and it's worth it, I can try to finish this, but it feels well outside the bounds of a good-first-issue. Not that the code is hard; it's just that I don't know the benchmarking tools y'all have to see how it's performing.

///
/// It is executed as though `x / NULL_IF(y <= 0, y)` which is safe because arrow-rs
/// skips evaluating fallible functions if one of their inputs is null.
fn div_gt_0(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
Author

If y'all like this path I'm happy to continue down it, adding the missing tests and removing the rough edges. I'm certainly plugging things in wrong, though the one test case we have now is passing.

when_then.1.evaluate(batch)
}
}
}
Author

I'm sort of wondering if it makes sense to just emit NULLIF directly from try_new. It feels awful heavy to have to pick apart the CASE arms at runtime.
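
As a sketch of that idea, the rewrite could happen once when the expression is built instead of on every batch. The `Expr` enum and `fuse_case_div` below are a toy model invented for illustration (not DataFusion's `PhysicalExpr` or `CaseExpr::try_new`), and `NullIf(condition, value)` follows the `NULL_IF(y <= 0, y)` notation used in this PR's doc comment.

```rust
/// Toy expression tree (hypothetical; not DataFusion's `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    LtEq(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
    /// NullIf(condition, value): NULL when the condition holds, else value.
    NullIf(Box<Expr>, Box<Expr>),
    /// Single-arm `CASE WHEN .. THEN ..` with no ELSE.
    Case { when: Box<Expr>, then: Box<Expr> },
}

/// Rewrite `CASE WHEN a > 0 THEN b / a END` into `b / NullIf(a <= 0, a)`
/// once, at construction time. Any other shape is returned unchanged.
fn fuse_case_div(expr: Expr) -> Expr {
    if let Expr::Case { when, then } = &expr {
        if let (Expr::Gt(lhs, rhs), Expr::Div(num, den)) = (&**when, &**then) {
            // Only fuse when the guard is `x > 0` and `x` is also the divisor.
            if **rhs == Expr::Literal(0) && lhs == den {
                let cond = Expr::LtEq(lhs.clone(), rhs.clone());
                let masked = Expr::NullIf(Box::new(cond), den.clone());
                return Expr::Div(num.clone(), Box::new(masked));
            }
        }
    }
    expr
}
```

Doing the pattern match at construction means the evaluation path sees an ordinary division over a masked divisor and never has to inspect the CASE arms at runtime.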

div,
)
}
ColumnarValue::Scalar(mask) => {
Author

Can we only get here if the divisor is a literal? If so, we really don't need this branch at all, because we just won't end up here.

Is there a constant folding phase that I'm dodging in these tests?

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions bot added the Stale label (PR has not had any activity for some time) on Mar 22, 2025
github-actions bot closed this on Mar 29, 2025
Labels
physical-expr Changes to the physical-expr crates Stale PR has not had any activity for some time
Development

Successfully merging this pull request may close these issues.

Potential optimization for CASE WHEN for protecting against divide by zero
1 participant