Fuse `CASE(a > 0, b / a)` #14200

nik9000 · 2025-01-19T15:43:09Z

This fuses the expression `CASE(a > 0, b / a)` into a single expression that skips the partial evaluation required by the other implementations of `CASE`.

Which issue does this PR close?

Closes #11570

Rationale for this change

See issue.

What changes are included in this PR?

Two proposed solutions: one via `NULLIF`, the other a hand-rolled kernel for the entire operation.

Are these changes tested?

The existing tests hit the new code path and I've added an assertion to confirm it. I've left NOCOMMITs for the two or three new tests that I'll need to write before merging this.

Are there any user-facing changes?

None.

nik9000 · 2025-01-19T15:45:25Z

datafusion/physical-expr/src/expressions/binary/kernels.rs

@@ -164,3 +166,177 @@ pub fn concat_elements_utf8view(
    }
    Ok(result.finish())
 }


#11570 talks about implementing a new kernel for the divide-if-gt-zero operation and I took a stab at it but realized it was pretty involved. Or maybe I'm not reading the right bits. It looks to me like the existing kernels for binary operations in arrow-rs don't support Option as a return. I've left this here to illustrate the kind of things that I think have to be added to make a whole new kernel for this. Though arrow-rs is a big project and I'm probably missing something.

If y'all think this is the right way and it's worth it, I can try to finish this, but it feels well outside the bounds of a good-first-issue. Not that the code is hard, it's just that I don't know the benchmarking tools y'all have to see how it's performing.
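
To make the "Option as a return" point concrete, here is a minimal sketch in plain Rust with no arrow-rs dependency. It is an illustration under assumptions, not the kernel from this PR: nullable columns are modelled as slices of `Option<i64>`, and the name `div_if_gt_zero` is made up. The per-row step has to be able to return `None` for rows where the guard fails, which is the shape the comment above says the existing binary kernels don't seem to offer.

```rust
/// Hypothetical fused kernel: computes `CASE WHEN a > 0 THEN b / a END` in one pass.
/// Rows where either input is null, or where `a <= 0`, produce null. The division
/// is never attempted for those rows, so a zero divisor cannot raise an error.
fn div_if_gt_zero(a: &[Option<i64>], b: &[Option<i64>]) -> Vec<Option<i64>> {
    assert_eq!(a.len(), b.len(), "columns must have the same length");
    a.iter()
        .zip(b.iter())
        .map(|(a, b)| match (a, b) {
            // The guard `a > 0` rules out a zero divisor before dividing.
            (Some(a), Some(b)) if *a > 0 => b.checked_div(*a),
            // Null input or failed guard: emit null instead of evaluating.
            _ => None,
        })
        .collect()
}
```

A real kernel would work on Arrow arrays and null buffers rather than `Vec<Option<i64>>`; the sketch only shows the per-row contract.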

nik9000 · 2025-01-19T15:48:31Z

datafusion/physical-expr/src/expressions/case.rs

+    ///
+    /// It is executed as though `x / NULL_IF(y <= 0, y)` which is safe because arrow-rs
+    /// skips evaluating fallible functions if one of their inputs is null.
+    fn div_gt_0(&self, batch: &RecordBatch) -> Result<ColumnarValue> {


If y'all like this path I'm happy to continue down it, adding the missing tests and removing the rough edges. I'm certainly plugging things in wrong, though the one test case we have now is passing.
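
For readers following along, the idea in the doc comment above can be sketched in plain Rust. This is a toy model under assumptions, not the code in this PR and not the arrow-rs API: columns are `Vec<Option<i64>>`, and `null_if`, `div`, and this `div_gt_0` are hypothetical stand-ins. The divisor is nulled out wherever it is `<= 0`, and the fallible division then skips every row that has a null input.

```rust
/// Toy stand-in for a mask-driven NULLIF: nulls out `values[i]` wherever `mask[i]` is true.
fn null_if(values: &[Option<i64>], mask: &[bool]) -> Vec<Option<i64>> {
    values
        .iter()
        .zip(mask)
        .map(|(v, &is_masked)| if is_masked { None } else { *v })
        .collect()
}

/// Toy null-propagating division: the fallible divide only runs when both inputs
/// are non-null, and any failure aborts the whole column with an error.
fn div(lhs: &[Option<i64>], rhs: &[Option<i64>]) -> Result<Vec<Option<i64>>, String> {
    lhs.iter()
        .zip(rhs)
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => l
                .checked_div(*r)
                .map(Some)
                .ok_or_else(|| format!("cannot divide {} by {}", l, r)),
            _ => Ok(None),
        })
        .collect()
}

/// Evaluates `x / y` only where `y > 0`, as though written `x / NULLIF(y <= 0, y)`.
fn div_gt_0(x: &[Option<i64>], y: &[Option<i64>]) -> Result<Vec<Option<i64>>, String> {
    // True for rows whose divisor must be hidden from the division.
    let mask: Vec<bool> = y.iter().map(|v| matches!(v, Some(n) if *n <= 0)).collect();
    div(x, &null_if(y, &mask))
}
```

Calling `div` directly on a column containing a zero divisor returns an error, while `div_gt_0` on the same inputs returns null for that row.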

nik9000 · 2025-01-19T15:50:38Z

datafusion/physical-expr/src/expressions/case.rs

+                    when_then.1.evaluate(batch)
+                }
+            }
+        }


I'm sort of wondering if it makes sense to just emit NULLIF directly from try_new. It feels awful heavy to have to pick apart the CASE arms at runtime.
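
A rough sketch of what rewriting at construction time could look like, using a toy expression tree rather than DataFusion's real types. Everything here is hypothetical: the `Expr` enum, the `NullIf` variant taking a mask, and `rewrite_case` (standing in for the decision `try_new` would make). It only matches a single-arm `CASE` with no `ELSE`, where the guard is `a > 0` and the result divides by the same `a`.

```rust
/// Toy expression tree, not DataFusion's `PhysicalExpr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    LtEq(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
    /// NULL where the first expression is true, otherwise the second expression.
    NullIf(Box<Expr>, Box<Expr>),
    /// Single-arm CASE with no ELSE: `CASE WHEN when THEN then END`.
    Case { when: Box<Expr>, then: Box<Expr> },
}

/// Hypothetical stand-in for the choice made when the CASE is built: if the arms
/// have the shape `WHEN a > 0 THEN b / a`, emit `b / NULLIF(a <= 0, a)` once,
/// so nothing has to pick the arms apart at evaluation time.
fn rewrite_case(when: Expr, then: Expr) -> Expr {
    if let (Expr::Gt(a, zero), Expr::Div(b, divisor)) = (&when, &then) {
        // The guard must compare against literal 0 and guard the same divisor.
        if **zero == Expr::Literal(0) && a == divisor {
            let mask = Expr::LtEq(a.clone(), zero.clone());
            let safe_divisor = Expr::NullIf(Box::new(mask), divisor.clone());
            return Expr::Div(b.clone(), Box::new(safe_divisor));
        }
    }
    // Anything else stays a regular CASE.
    Expr::Case { when: Box::new(when), then: Box::new(then) }
}
```

The trade-off is that the pattern match runs once per expression instead of once per batch, at the cost of the plan no longer showing a `CASE`.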

nik9000 · 2025-01-19T21:05:26Z

datafusion/physical-expr/src/expressions/case.rs

+                    div,
+                )
+            }
+            ColumnarValue::Scalar(mask) => {


Can we only get here if the divisor is a literal? If so, we really don't need this branch at all - we just won't end up here.

Is there a constant folding phase that I'm dodging in these tests?

github-actions · 2025-03-22T02:01:47Z

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

Fuse CASE(a > 0, b / a)

c7d0179

This fuses the expression `CASE(a > 0, b / a)` into a single expression that skips the partial evaluation required by the other implementations of `CASE`.

github-actions bot added the `physical-expr` label (Changes to the physical-expr crates) on Jan 19, 2025

nik9000 commented Jan 19, 2025

nik9000 added 4 commits January 19, 2025 12:28

Missing tests

1569bb4

more test

16f5740

dry

b05561e

One more

851d1c1

nik9000 commented Jan 20, 2025

github-actions bot added the `Stale` label (PR has not had any activity for some time) on Mar 22, 2025

github-actions bot closed this Mar 29, 2025
