[branch-46] feat: introduce JoinSetTracer trait for tracing context propagation in spawned tasks #9
Which issue does this PR close?
Relates to apache#9415, but does not fully close it. It lays groundwork for optional instrumentation of async tasks in DataFusion.
Approved upstream PR: apache#14547
Rationale for this change
This PR introduces a general mechanism enabling DataFusion to propagate user-defined context (such as tracing spans, logging, or metrics) across thread boundaries without depending on any specific instrumentation library.
Previously, tasks spawned on new threads (such as those performing repartitioning or Parquet file reads) would lose thread-local context, making instrumentation challenging for users. The introduced approach addresses this gap by allowing users to inject custom instrumentation via the new JoinSetTracer trait. This ensures context is preserved seamlessly, keeping DataFusion lightweight by not adding any direct instrumentation dependencies.
What changes are included in this PR?
JoinSetTracer trait: Defines how to instrument futures or blocking closures when tasks are spawned on threads.
set_join_set_tracer function: Registers a custom tracer at startup. If no tracer is set, a no-op implementation is used by default.
JoinSet: Introduces a wrapper around Tokio's JoinSet that leverages the registered tracer (if available) to instrument spawned tasks transparently.
datafusion-examples/examples/tracing.rs: An example demonstrating how users can integrate their tracing implementations. This example does not impose any direct tracing dependency on DataFusion users.
Are these changes tested?
Yes. There are no dedicated unit tests specifically for the tracer injection, but the example in datafusion-examples/examples/tracing.rs shows a working end-to-end setup using tracing. By running that example, you can confirm that tasks spawned on multiple threads inherit whichever span is active at the moment they are spawned, provided a tracer is registered.
Are there any user-facing changes?
To opt in, implement JoinSetTracer and call set_join_set_tracer(...). This approach is fully optional.
The upshot is that DataFusion now provides a pluggable way to connect with tracing or other instrumentation without pulling those dependencies into DataFusion by default.
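The register-once, wrap-on-spawn pattern described in this PR can be sketched with the standard library alone. This is a simplified stand-in, not DataFusion's actual API: Tracer, set_tracer, spawn_traced, and ContextTracer are hypothetical names chosen for illustration. It handles only blocking closures on std::thread (the PR's trait also covers futures and wraps Tokio's JoinSet), and a thread-local string stands in for a tracing span.

```rust
use std::cell::RefCell;
use std::sync::OnceLock;
use std::thread::{self, JoinHandle};

/// Type-erased closure that a tracer may wrap (simplified: no return value).
pub type BoxedTask = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical, simplified analogue of the tracer trait (closures only).
pub trait Tracer: Send + Sync + 'static {
    fn trace_block(&self, task: BoxedTask) -> BoxedTask;
}

/// Default used when nothing is registered: returns the task unchanged.
struct NoopTracer;

impl Tracer for NoopTracer {
    fn trace_block(&self, task: BoxedTask) -> BoxedTask {
        task
    }
}

static NOOP: NoopTracer = NoopTracer;
static GLOBAL_TRACER: OnceLock<&'static dyn Tracer> = OnceLock::new();

/// Register a tracer once at startup; a second call returns an error.
pub fn set_tracer(tracer: &'static dyn Tracer) -> Result<(), &'static str> {
    GLOBAL_TRACER.set(tracer).map_err(|_| "tracer already set")
}

/// The registered tracer, or the no-op default if none was set.
fn current_tracer() -> &'static dyn Tracer {
    match GLOBAL_TRACER.get() {
        Some(tracer) => *tracer,
        None => &NOOP,
    }
}

/// Spawn a thread, letting the current tracer wrap the closure first.
pub fn spawn_traced<F>(f: F) -> JoinHandle<()>
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(current_tracer().trace_block(Box::new(f)))
}

thread_local! {
    /// Stand-in for a tracing span: a per-thread label.
    pub static CONTEXT: RefCell<String> = RefCell::new(String::from("none"));
}

/// Example tracer: captures the spawning thread's label at spawn time and
/// restores it inside the spawned thread before running the task.
pub struct ContextTracer;

impl Tracer for ContextTracer {
    fn trace_block(&self, task: BoxedTask) -> BoxedTask {
        let captured = CONTEXT.with(|c| c.borrow().clone());
        Box::new(move || {
            CONTEXT.with(|c| *c.borrow_mut() = captured);
            task()
        })
    }
}
```

In this sketch, the context is captured when spawn_traced is called on the parent thread, which mirrors the PR's statement that spawned tasks inherit whichever span is active at the moment they are spawned. Without a registered tracer, the closure runs unwrapped and the spawned thread sees only its own default label.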