Add hook for sharing join state in distributed execution #12523
Set as draft for now as I still need to integrate into |
let join_metrics = BuildProbeJoinMetrics::new(partition, &self.metrics);
let left_fut = match self.mode {
    PartitionMode::CollectLeft => self.left_fut.once(|| {
        let reservation =
            MemoryConsumer::new("HashJoinInput").register(context.memory_pool());

        let probe_threads = shared_state
If shared state is available, we use the num_task_partitions instead of the output partitioning
to determine the number of local probe threads.
    return Poll::Ready(Ok(StatefulStreamResult::Continue));
}

if let Some(shared_state) = build_side.left_data.shared_state.as_ref() {
When all local probe threads are complete and there is shared state, we need to probe that before completing.
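A rough sketch of the bookkeeping the two comments above describe: pick the local probe thread count, then detect the last thread to finish so that exactly one thread goes on to consult the shared state. `ProbeCounter` and its method names are hypothetical stand-ins, not the types in this PR, and the sketch uses only the standard library.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the per-task probe bookkeeping.
struct ProbeCounter {
    remaining: AtomicUsize,
}

impl ProbeCounter {
    /// Prefer the task-local partition count from shared state (if any),
    /// otherwise fall back to the operator's output partition count.
    fn new(num_task_partitions: Option<usize>, output_partitions: usize) -> Self {
        let probe_threads = num_task_partitions.unwrap_or(output_partitions);
        Self {
            remaining: AtomicUsize::new(probe_threads),
        }
    }

    /// Returns true only for the final thread to report completion.
    /// Assumes it is called at most once per probe thread; extra calls
    /// would wrap the counter.
    fn report_probe_completed(&self) -> bool {
        self.remaining.fetch_sub(1, Ordering::AcqRel) == 1
    }
}
```

Only the thread that sees `true` would then exchange its bitmap with the shared state before emitting unmatched build-side rows.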
@@ -2224,6 +2354,179 @@ mod tests {
        assert_batches_sorted_eq!(expected, &batches);
    }

    struct Coordinator {
Usage example here. In a real-world scenario the Coordinator
would likely be an external service which we communicate with through an RPC call.
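For readers without the diff open, here is a minimal in-process sketch of what such a coordinator might do. It is an assumption-laden illustration, not the test code from this PR: the fields, `new`, and `report` are all hypothetical, and a `Vec<bool>` stands in for the Arrow bitmap.

```rust
use std::sync::Mutex;

/// Hypothetical coordinator state: outstanding probe tasks and the
/// union of every visited bitmap reported so far.
struct CoordinatorState {
    remaining_tasks: usize,
    merged: Vec<bool>,
}

struct Coordinator {
    state: Mutex<CoordinatorState>,
}

impl Coordinator {
    fn new(num_tasks: usize, build_rows: usize) -> Self {
        Self {
            state: Mutex::new(CoordinatorState {
                remaining_tasks: num_tasks,
                merged: vec![false; build_rows],
            }),
        }
    }

    /// A probe task reports its local visited bitmap. The last task to
    /// report gets the merged bitmap back; earlier tasks get `None`.
    /// Reports after completion, or with the wrong length, are ignored.
    fn report(&self, visited: &[bool]) -> Option<Vec<bool>> {
        let mut state = self.state.lock().unwrap();
        if state.remaining_tasks == 0 || visited.len() != state.merged.len() {
            return None;
        }
        for (m, v) in state.merged.iter_mut().zip(visited) {
            *m |= *v;
        }
        state.remaining_tasks -= 1;
        if state.remaining_tasks == 0 {
            Some(state.merged.clone())
        } else {
            None
        }
    }
}
```

With an external service, `report` would become the RPC, and the task that receives the merged bitmap would be the one responsible for emitting unmatched build-side rows.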
use datafusion_expr::Operator;
use datafusion_physical_expr_common::datum::compare_op_for_nested;
use futures::{ready, Stream, StreamExt, TryStreamExt};
use parking_lot::Mutex;

/// `SharedJoinState` provides an extension point allowing
/// `HashJoinStream` to share the `visited_indices_bitmap` of the build side of a join
is it left side or right side indices?
Left
@korowa FYI
/// across probe tasks without shared memory.
///
/// This can be used to, for example, implement a left outer join efficiently as a broadcast join
/// if the left side is small
Is it the left side that is small, or the right? My feeling was that the left (driving) table is huge and the right is small.
No, it's the opposite. The left (build) side is small and can be efficiently broadcast. Then the right (probe) side can be partitioned across multiple nodes with the build side broadcast to all of them.
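To make the broadcast shape concrete, here is a small standard-library sketch, assuming integer join keys and `Vec<bool>` bitmaps. `probe_partition` and `unmatched_build_rows` are hypothetical helper names, not functions from this PR. Every probe task holds the full build side, probes only its own slice of the right side, and records which build rows it matched. A build row is unmatched for the left outer join only if no task visited it.

```rust
use std::collections::HashMap;

/// Probe one partition of the right (probe) side against the broadcast
/// left (build) side. Returns (build_row_index, probe_key) matches and
/// this task's visited bitmap over the build rows.
fn probe_partition(build: &[(i32, &str)], probe: &[i32]) -> (Vec<(usize, i32)>, Vec<bool>) {
    // Hash the build side by join key.
    let mut index: HashMap<i32, Vec<usize>> = HashMap::new();
    for (i, (key, _)) in build.iter().enumerate() {
        index.entry(*key).or_default().push(i);
    }
    let mut visited = vec![false; build.len()];
    let mut matches = Vec::new();
    for &key in probe {
        if let Some(rows) = index.get(&key) {
            for &i in rows {
                visited[i] = true;
                matches.push((i, key));
            }
        }
    }
    (matches, visited)
}

/// Build rows that no task matched; these are emitted once, padded with
/// nulls. Assumes every bitmap has the same length.
fn unmatched_build_rows(bitmaps: &[Vec<bool>]) -> Vec<usize> {
    let len = bitmaps.first().map_or(0, |b| b.len());
    (0..len)
        .filter(|&i| !bitmaps.iter().any(|b| b[i]))
        .collect()
}
```

No single task can decide on its own that a build row is unmatched, which is why the bitmaps have to be combined across tasks before the final rows are produced.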
    }
}

fn merge_bitmap(m1: &mut BooleanBufferBuilder, m2: BooleanBuffer) -> Result<()> {
Is the bitmap here a boolean bitmask of what was visited or matched by the join/filter?
Correct. What will be shared and merged is the JoinLeftData::visited_indices_bitmap
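As an illustration of the merge semantics only (the PR's `merge_bitmap` operates on Arrow's `BooleanBufferBuilder` and `BooleanBuffer`, whose body is not shown here), a standard-library sketch with plain `bool` slices could look like the following. `merge_visited` is a hypothetical name. The merge is a bitwise OR: a build row counts as visited if any task visited it.

```rust
/// Std-only stand-in for merging a remote visited bitmap into the local one.
/// Returns an error if the two bitmaps cover a different number of build rows.
fn merge_visited(local: &mut [bool], remote: &[bool]) -> Result<(), String> {
    if local.len() != remote.len() {
        return Err(format!(
            "bitmap length mismatch: {} vs {}",
            local.len(),
            remote.len()
        ));
    }
    for (l, r) in local.iter_mut().zip(remote) {
        // A row stays visited once any task has marked it.
        *l |= *r;
    }
    Ok(())
}
```

Because OR is commutative and idempotent, the order in which task bitmaps arrive, or a repeated delivery of the same bitmap, does not change the result.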
Maybe it's easier to build a diagram in draw.io or something?
Yeah, good idea, I'll work something up.
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
Which issue does this PR close?
Closes #12454
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?