perf[arrow-select]: add specialized REE interleave#9856
Conversation
|
cc @alamb or any other maintainer that owns this code |
| for (out_pos, &(arr, row)) in indices.iter().enumerate() { | ||
| let row = R::Native::from_usize(row).ok_or_else(|| { | ||
| ArrowError::InvalidArgumentError(format!( | ||
| "interleave_run_end: row index {row} out of range" |
There was a problem hiding this comment.
Similarly here, I don't think this check looks correct
There was a problem hiding this comment.
This is the same as above, this checks that the usize from interleave is in fact a valid index into the input arrays. Since the input arrays are REE<R>, this check must pass, otherwise the interleave indexes have been incorrectly formed.
There was a problem hiding this comment.
Im also a bit confused about this check, your checking if the row is out of bounds but couldn't you do this by checking the size of the array like
let current_array = values_arrays[arr]
if current_array.len() >= row { return arrow error( "row index {row} out of range"}
for example
` let mut builder = PrimitiveRunBuilder::<Int16Type, Int16Type>::new();
builder.extend([0, 0, 0, 1, 1, 0, 0, 1, 1, 1].into_iter().map(Some));
let a = builder.finish();
let mut builder = PrimitiveRunBuilder::<Int16Type, Int16Type>::new();
builder.extend([2, 2, 1, 1, 1, 0, 1, 0, 0, 0].into_iter().map(Some));
let b = builder.finish();
// logical: [1, 1, 1, 1, 1] across an a→b boundary; should compact to one run.
// greater than int16::max
let result = interleave(&[&a, &b], &[(0, 32766), (0, 4), (1, 2), (1, 3), (1, 4)]).unwrap();
let result = result.as_run::<Int16Type>();`
This code returns an error but the error comes from the call to get_physical_indices()
let phys = runs[arr_idx].get_physical_indices(&logical_rows)?; not the validation step that your doing within the loop.
There was a problem hiding this comment.
Yeah, I think the confusion is the error message. I will change that. What I'm really doing here is a usize->R conversion based on need so that I can use it in get_physical_indices below and erroring if it fails. I'm checking whether the index is even representable in the array's type not whether the index is out of bounds on the input.
|
|
||
| // Coalesce by physical-pair equality only: emit a new run when the | ||
| // (array_idx, physical_idx) pair changes between adjacent output rows. | ||
| // TODO: We could perform an equality check across sources to extend the |
There was a problem hiding this comment.
Yes, exactly. That PR would make sense in this block so we don't compact in the interleave fallback. This also means that the equality cost is only paid when interleave pairs select from different input run arrays (assumption is input run arrays are well formed). I'm concerned about the per-row slicing cost though. I think ideally you would have a cache of comparators but I believe that require some crate readjusting.
|
hey 😃 , Im working on #9865 which works to resolve #7710. I added a test case from my branch that isn't working on this branch currently. Im going to pull your changes down and push up a revised branch. |
|
@asubiotto I made a PR could you check it out? #9919 |
Hi, thanks for pushing that up. While I think we eventually want to do this I would prefer an incremental approach which is already much better than what we have today (deduping logical runs within the same source). The reason is that while I think we should eventually dedup based on values, I'm not too keen on the slice cost per value and I think we can probably work on a much more performant approach by building and reusing a comparator to reduce the dynamic dispatch overhead. This is why I think we should decouple the two changes: 1) Merge the specialized REE interleave and 2) Optimize the interleave by value deduplication across sources |
f103e47 to
6fd0803
Compare
|
I'm also seeing a need for better value equality checks for |
|
@asubiotto yea that makes sense to me. ill close the PR. |
|
run benchmark interleave_kernels |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing asubiotto/specializedreeinterleave (6fd0803) to b114241 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
@asubiotto could you merge up from main so we can compare the benchmark? |
The specialized interleave works by preserving run ends as much as possible by coalescing groups of adjacent logical indices pointing to the same source and calling interleave on the run end values. Future work could additionally coalesce values across sources, but this requires a value equality check. Signed-off-by: Alfonso Subiotto Marques <[email protected]>
6fd0803 to
b8165b1
Compare
|
Oops, done. |
|
run benchmark interleave_kernels |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing asubiotto/specializedreeinterleave (b8165b1) to 97ff198 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
Thanks @asubiotto and @Jefffrey |
Benchmarks for this PR are in #9849. They have been separated out so we can compare this PR to main once the benchmarks have merged.
The specialized interleave works by preserving run ends as much as possible by coalescing groups of adjacent logical indices pointing to the same source and calling interleave on the run end values.
Future work could additionally coalesce values across sources, but this requires a value equality check.
Which issue does this PR close?
Rationale for this change
interleave_fallback on REE arrays is slow
What changes are included in this PR?
A specialized REE interleave implementation
Are these changes tested?
Yes, by existing tests.
Are there any user-facing changes?