chore: Add more criterion benchmarks for shuffle writer #1180

Closed (12 commits)

andygrove (Member) commented Dec 18, 2024

Which issue does this PR close?

Part of #1123

Rationale for this change

With the metrics that were added in #1175, we can now see how much time is spent in different areas of shuffle write. The next step is to optimize this code, which will be easier if we have microbenchmarks for each area:

  • Evaluating partitioning expressions (there is an opportunity for a small saving with a fast path for simple column references)
  • Hashing and calculating partition ids
  • Repartitioning the input batches
  • Encoding and compressing
  • Spilling
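
To illustrate what isolating two of these areas could look like, here is a minimal sketch that times "hashing and calculating partition ids" separately from "repartitioning". It is not the code in this PR: the helper names `partition_ids` and `repartition` are hypothetical, `DefaultHasher` stands in for whatever hash function the shuffle writer actually uses, plain `i64` keys stand in for input batches, and `std::time::Instant` is used instead of criterion so the example has no external dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Instant;

/// Hypothetical helper: hash each key and map it to a partition id
/// in the range 0..num_partitions.
fn partition_ids(keys: &[i64], num_partitions: usize) -> Vec<usize> {
    keys.iter()
        .map(|k| {
            // DefaultHasher is an assumption; it is only a stand-in hash.
            let mut hasher = DefaultHasher::new();
            k.hash(&mut hasher);
            (hasher.finish() % num_partitions as u64) as usize
        })
        .collect()
}

/// Hypothetical helper: group row indices by partition id
/// (a simplified version of the repartitioning step).
fn repartition(ids: &[usize], num_partitions: usize) -> Vec<Vec<usize>> {
    let mut buckets = vec![Vec::new(); num_partitions];
    for (row, &p) in ids.iter().enumerate() {
        buckets[p].push(row);
    }
    buckets
}

fn main() {
    let keys: Vec<i64> = (0..8192).collect();
    let num_partitions = 16;

    // Time each area on its own so regressions can be attributed.
    let start = Instant::now();
    let ids = partition_ids(&keys, num_partitions);
    let hash_time = start.elapsed();

    let start = Instant::now();
    let buckets = repartition(&ids, num_partitions);
    let repartition_time = start.elapsed();

    println!("hash + partition ids: {:?}", hash_time);
    println!("repartition: {:?}", repartition_time);
    println!("rows placed: {}", buckets.iter().map(Vec::len).sum::<usize>());
}
```

A single `Instant` measurement like this is noisy. A benchmark harness such as criterion runs each function many times and reports statistics, which is why separating the areas into individually callable functions matters.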

What changes are included in this PR?

  • Refactor shuffle write code to allow for microbenchmarking
  • Add new benchmarks

How are these changes tested?

No functional changes. Rely on existing tests.
