Skip to content

refactor(joins::utils): Replace OnceAsync/OnceFut with tokio's OnceCell #15431

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ctsk
Copy link
Contributor

@ctsk ctsk commented Mar 26, 2025

Rationale for this change

I wrote this PR mostly to show that it can be done. Replacing OnceFut with a OnceCell has some advantages though:

  • It no longer stores a Future in the ExecutionPlan, but just the synchronization primitive. In theory this can allow to make the ExecutionPlan implement Clone too.
  • One fewer custom synchronization primitive to maintain

What changes are included in this PR?

  1. Introduce SharedResultOnceCell - A wrapper around tokio's OnceCell<SharedResult<Arc<T>>>. Arc's are used all over the place to avoid lifetime issues
  2. Replace usages of OnceAsync(OnceFut) with SharedResultOnceCell (SharedResultPending)
  3. Adjust variable names and doc comments in all users of OnceFut
  4. Remove OnceAsync + OnceFut

Are these changes tested?

This PR should not change the behaviour of the join implementations.

Are there any user-facing changes?

No.

@ctsk ctsk changed the title Refactor(joins::utils): Replace OnceAsync/OnceFut with tokio's OnceCell refactor(joins::utils): Replace OnceAsync/OnceFut with tokio's OnceCell Mar 26, 2025
@alamb
Copy link
Contributor

alamb commented Mar 27, 2025

Thanks @ctsk -- this idea sounds good -- I think we should run the benchmarks to make sure they don't cause issues. I'll do so shortly

@ctsk
Copy link
Contributor Author

ctsk commented Apr 1, 2025

I'm moving this back to the draft stage, because it will conflict with #15476 once I fix that PR.

I can't see a way to achieve the goals of this PR and #15476 at the same time with a single OnceCell. I suspect that if I pursue #15476, I can not avoid storing a SendableRecordBatchStream or some other Future in the ExecutionPlan.

So overall, it seems to me like this PR is less clear of a win.

@ctsk ctsk marked this pull request as draft April 1, 2025 13:37
@alamb
Copy link
Contributor

alamb commented Apr 1, 2025

@ctsk thank you so much for this PR (and others) can you share more about what you are doing with DataFusion? If you would like to have a brief introduction video call, feel free to email me at [email protected] and we can setup something

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants