feat: Implement custom RecordBatch serde for shuffle for improved performance #1190
Conversation
use std::io::Write;
use std::sync::Arc;

pub fn write_batch_fast(
Are you going to end up implementing a form of arrow (stream) IPC?
I discovered that we may be able to just use https://docs.rs/arrow-ipc/latest/arrow_ipc/writer/struct.IpcDataGenerator.html#method.encoded_batch and am going to look into that next
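For reference, a minimal sketch of what that could look like, assuming the current arrow-ipc API (IpcDataGenerator::encoded_batch plus write_message); the helper name encode_one_batch is hypothetical:

use std::io::Write;
use arrow::error::ArrowError;
use arrow::ipc::writer::{write_message, DictionaryTracker, IpcDataGenerator, IpcWriteOptions};
use arrow::record_batch::RecordBatch;

// Encode a single batch via IpcDataGenerator, skipping the stream-level
// schema and end-of-stream framing that StreamWriter would add.
fn encode_one_batch<W: Write>(batch: &RecordBatch, mut out: W) -> Result<(), ArrowError> {
    let generator = IpcDataGenerator::default();
    let mut dict_tracker = DictionaryTracker::new(false);
    let options = IpcWriteOptions::default();
    // encoded_batch returns any dictionary messages plus the batch message itself
    let (dictionaries, batch_message) =
        generator.encoded_batch(batch, &mut dict_tracker, &options)?;
    for dictionary in dictionaries {
        write_message(&mut out, dictionary, &options)?;
    }
    write_message(&mut out, batch_message, &options)?;
    Ok(())
}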
Are you going to end up implementing a form of arrow (stream) IPC?
Yes, but without using flatbuffers to align and encode anything, just the raw bytes, and without the metadata messages.
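To make the idea concrete, here is an illustrative sketch of raw-bytes encoding for a single Int32 column; it is not the code in this PR, and the framing (little-endian u64 length prefixes) is an assumption:

use std::io::Write;
use arrow::array::{Array, Int32Array};

// Illustrative only: length-prefixed raw-buffer encoding for one Int32
// column, with no flatbuffers framing and no IPC metadata messages.
fn write_i32_raw<W: Write>(array: &Int32Array, out: &mut W) -> std::io::Result<()> {
    // row count
    out.write_all(&(array.len() as u64).to_le_bytes())?;
    // validity bitmap, if any (bit offsets ignored for simplicity)
    match array.nulls() {
        Some(nulls) => {
            let bitmap = nulls.validity();
            out.write_all(&(bitmap.len() as u64).to_le_bytes())?;
            out.write_all(bitmap)?;
        }
        None => out.write_all(&0u64.to_le_bytes())?,
    }
    // value buffer: the raw little-endian i32 bytes, copied verbatim
    let values = array.values().inner().as_slice();
    out.write_all(&(values.len() as u64).to_le_bytes())?;
    out.write_all(values)?;
    Ok(())
}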
native/core/benches/batch_serde.rs
fn create_batch() -> RecordBatch {
    let schema = Arc::new(Schema::new(vec![
        Field::new("c0", DataType::Utf8, true),
Thanks @andygrove. It would be interesting to see whether other data types keep the same performance benefit.
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

@@             Coverage Diff              @@
##              main    #1190       +/-  ##
=============================================
- Coverage     56.94%   34.83%   -22.11%
- Complexity      929      990       +61
=============================================
  Files           112      116        +4
  Lines         10985    43844    +32859
  Branches       2119     9564     +7445
=============================================
+ Hits           6255    15274     +9019
- Misses         3617    25599    +21982
- Partials       1113     2971     +1858

☔ View full report in Codecov by Sentry.
Which issue does this PR close?
Closes #1189
Builds on #1192
Rationale for this change
Arrow IPC is a good general-purpose serde framework, but we can get better performance with specialized code optimized for Comet's use case: encoding single batches into shuffle blocks.
This PR implements a new BatchWriter and BatchReader and updates the shuffle writer to use them when possible (when all data types are supported), falling back to Arrow IPC for other cases.
Specializations include skipping the flatbuffers alignment and encoding (writing raw buffer bytes directly) and omitting the IPC metadata messages.
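As a sketch of the fallback dispatch described above, a gate like the following could decide per-schema whether the fast path applies; the exact set of supported types here is an assumption, not the PR's list:

use arrow::datatypes::{DataType, Schema};

// Hypothetical gate for the fast path: use the custom serde only when
// every column has a supported type, otherwise fall back to Arrow IPC.
fn all_types_supported(schema: &Schema) -> bool {
    schema.fields().iter().all(|field| {
        matches!(
            field.data_type(),
            DataType::Boolean
                | DataType::Int8
                | DataType::Int16
                | DataType::Int32
                | DataType::Int64
                | DataType::Float32
                | DataType::Float64
                | DataType::Utf8
                | DataType::Binary
        )
    })
}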
Microbenchmarks (encoding only, no compression)
Without compression, we see an almost 3x speedup in writes.
Note that the time saved is tiny compared to compression costs, but it still helps. With this PR I am seeing a TPC-H time of 329s compared to 336s in #1192, which this PR builds on.
Spark takes 644s, so with this PR, we are 1.96x faster than Spark. We need to shave off another 7 seconds now to get to 2x (we may get this with the new ParquetExec work).
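For context, a self-contained criterion microbenchmark of the Arrow IPC baseline (the write path being improved here) could be structured as follows; the batch shape and row count are illustrative assumptions, not the PR's actual benchmark:

use std::sync::Arc;

use arrow::array::StringArray;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::ipc::writer::StreamWriter;
use arrow::record_batch::RecordBatch;
use criterion::{criterion_group, criterion_main, Criterion};

// Build a single-column Utf8 batch, mirroring the c0 column in the diff above.
fn create_batch() -> RecordBatch {
    let schema = Arc::new(Schema::new(vec![Field::new("c0", DataType::Utf8, true)]));
    let values: Vec<Option<String>> = (0..8192).map(|i| Some(format!("value-{i}"))).collect();
    RecordBatch::try_new(schema, vec![Arc::new(StringArray::from(values))]).unwrap()
}

// Measure encoding only: write one batch through the Arrow IPC StreamWriter.
fn bench_ipc_write(c: &mut Criterion) {
    let batch = create_batch();
    c.bench_function("arrow_ipc_stream_write", |b| {
        b.iter(|| {
            let mut buf: Vec<u8> = Vec::new();
            let mut writer = StreamWriter::try_new(&mut buf, batch.schema().as_ref()).unwrap();
            writer.write(&batch).unwrap();
            writer.finish().unwrap();
        })
    });
}

criterion_group!(benches, bench_ipc_write);
criterion_main!(benches);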
Benchmark Results
Single node TPC-H.
Single node TPC-DS with optimized version of q72 (better join order).
TPC-H q3
Encoding + compression is now much closer to Gluten + Velox for the lineitem exchange (8.6s versus 6.2s).

Comet:

Gluten + Velox:
What changes are included in this PR?
How are these changes tested?