-
Notifications
You must be signed in to change notification settings - Fork 1.4k
feat: introduce JoinSetTracer
trait for tracing context propagation in spawned tasks
#14547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: introduce JoinSetTracer
trait for tracing context propagation in spawned tasks
#14547
Conversation
f7001cb
to
615dc39
Compare
234deb9
to
bea0836
Compare
README.md
Outdated
@@ -126,6 +126,7 @@ Optional features: | |||
- `backtrace`: include backtrace information in error messages | |||
- `pyarrow`: conversions between PyArrow and DataFusion types | |||
- `serde`: enable arrow-schema's `serde` feature | |||
- `tracing`: propagates the current span across thread boundaries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Is the new tracing
feature required? Or should we just make it the default?
} | ||
impl<T: 'static> JoinSet<T> { | ||
/// [JoinSet::spawn](tokio::task::JoinSet::spawn) - Spawn a new task. | ||
pub fn spawn<F>(&mut self, task: F) -> AbortHandle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: All public stable methods of the tokio::task::JoinSet
are wrapped. Should only the methods used in DataFusion be wrapped for conciseness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapping all methods makes sense to me. The developer would expect using this wrapper to be the same as the original one.
bea0836
to
6a025d8
Compare
@alamb and @erratic-pattern |
Hi @geoffreyclaude @NGA-TRAN -- I will plan to review this PR shortly. Sorry for the delay. |
ccf9a9c
to
703aa32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this @geoffreyclaude -- I am sorry for the delay in responding.
Primarily my concern about this PR is that it adds new more dependencies (albiet optional ones) to DataFusion. We already spend a non trivial amount of maintenance time keeping up with dependencies / fixing related issues, etc .
I also worry that not everyone uses the tracing crate.
New features also increase the compile time, such as
the "TracingExec", but it's a pretty wonky workaround. It also fails when the source nodes of the plan spawn new tasks, as in the ParquetSink, which prevent the >underlying object_store from linking its spans to the main trace.
FWIW the way we have handled this is to pass a tracing span along in PartitionedFile::extensions, but that is non ideal
Questions / Suggestions
Is there some way to avoid adding a direct dependency on tracing
but still keep the API flexibility you need?
For example, perhaps we could create a different JoinSet
that instead of hard coding tracing
could provide the callbacks for spawning tasks 🤔
And thank you @alamb for the detailed response! I'll try to address your points one by one.
the
This is why I added a feature guard, as tracing definitely adds some compilation overhead. With the feature disabled, there should be no impact at all.
This works if you want to link the object store calls to the root span, but I believe it won't allow you to link them to the "real" parent span, the
I really like that idea, as it doesn't need to pull in additional dependencies, and lets users chose their instrumentation solution.
To me, I don't see a simpler way to open up full instrumentation of execution plans than what's done in this PR. It's super unfortunate that the Since the minimal change is pretty small, we can keep our fork of the change internally. But it also seemed like a good reason to push it upstream and contribute the full execution plan instrumentation code on top of it! |
61805b1
to
3cd150c
Compare
@alamb: I've gone ahead and refactored to allow "injecting" the tracing behavior at runtime, as a separate new commit. As predicted, the code is a bit thorny, especially due to the Boxing/Unboxing dance needed to get static Sized closures. But it works (see Note that I've removed the |
7cd9209
to
69a5c78
Compare
157a6c6
to
933c88b
Compare
Thanks @geoffreyclaude -- I will try and look at this. I think @goldmedal was also interested in this capability (using tracing via the |
I'll take a look this PR in a few day. |
933c88b
to
be86f76
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @geoffreyclaude. It makes sense to me. The way to inject the join set tracer is awesome.
} | ||
impl<T: 'static> JoinSet<T> { | ||
/// [JoinSet::spawn](tokio::task::JoinSet::spawn) - Spawn a new task. | ||
pub fn spawn<F>(&mut self, task: F) -> AbortHandle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapping all methods makes sense to me. The developer would expect using this wrapper to be the same as the original one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @geoffreyclaude and @goldmedal -- this looks really nice. The example is also 👨🍳 👌 and I think will permit other people to use this functionality more easily
Relates to #9415. Does not fully close the issue, but moves forward with a pre-requisite.
From my perspective this PR now closes that issue's stated request:
I would like to make it easy for people to add DataFusion ExecutionPlan level tracing to their systems as well.
Is it ok if we close it as well after this is merged?
(this is a very nice first contribution @geoffreyclaude -- thank you for sticking with it 🙏 ) |
@alamb and @goldmedal thanks for the review!
Ok for me at least! There'll probably be some minor adjustments to make once we start playing with it (maybe some other futures to wrap, for instance in I've also updated the PR title and description to match the latest changes, otherwise it could be confusing the readers that didn't follow the PR review. |
tracing
feature is enabledJoinSetTracer
trait for context propagation in spawned tasks
JoinSetTracer
trait for context propagation in spawned tasksJoinSetTracer
trait for tracing context propagation in spawned tasks
I also added this feature to the list of things we should document with the next release Thanks again @geoffreyclaude and @goldmedal |
Which issue does this PR close?
This PR lays the groundwork for optional instrumentation of async tasks in DataFusion.
Rationale for this change
This PR introduces a general mechanism enabling DataFusion to propagate user-defined context (such as tracing spans, logging, or metrics) across thread boundaries without depending on any specific instrumentation library.
Previously, tasks spawned on new threads—such as those performing repartitioning or Parquet file reads—would lose thread-local context, making instrumentation challenging for users. The introduced approach addresses this gap by allowing users to inject custom instrumentation via the new
JoinSetTracer
trait. This ensures context is preserved seamlessly, keeping DataFusion lightweight by not adding any direct instrumentation dependencies.What changes are included in this PR?
JoinSetTracer
trait: Defines how to instrument futures or blocking closures when tasks are spawned on threads.set_join_set_tracer
function for registering a custom tracer at startup. If no tracer is set, a no-op implementation is used by default.JoinSet
: Introduces a wrapper around Tokio'sJoinSet
that leverages the registered tracer (if available) to instrument spawned tasks transparently.datafusion-examples/examples/tracing.rs
, demonstrating how users can integrate their tracing implementations. This example does not impose any direct tracing dependency on DataFusion users.Are these changes tested?
Yes. There are no dedicated unit tests specifically for the tracer injection, but the example in
datafusion-examples/examples/tracing.rs
shows a working end-to-end setup usingtracing
. By running that example, you can confirm that tasks spawned on multiple threads inherit whichever span is active at the moment they are spawned—if a tracer is registered.Are there any user-facing changes?
JoinSetTracer
and callset_join_set_tracer(...)
. This approach is fully optional.The upshot is that DataFusion now provides a pluggable way to connect with tracing or other instrumentation without pulling those dependencies into DataFusion by default.