-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf: Support automatically concat_batches for sort which will improve performance #15375
Comments
take |
cc @alamb ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ concat_batches_for_sort ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1 │ 2241.04ms │ 1816.69ms │ +1.23x faster │
│ Q2 │ 1841.01ms │ 1496.73ms │ +1.23x faster │
│ Q3 │ 12755.85ms │ 12770.18ms │ no change │
│ Q4 │ 4433.49ms │ 3278.70ms │ +1.35x faster │
│ Q5 │ 4414.15ms │ 4409.04ms │ no change │
│ Q6 │ 4543.09ms │ 4597.32ms │ no change │
│ Q7 │ 8012.85ms │ 9026.30ms │ 1.13x slower │
│ Q8 │ 6572.37ms │ 6049.51ms │ +1.09x faster │
│ Q9 │ 6734.63ms │ 6345.69ms │ +1.06x faster │
│ Q10 │ 9896.16ms │ 9564.17ms │ no change │
└──────────────┴────────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main) │ 61444.64ms │
│ Total Time (concat_batches_for_sort) │ 59354.33ms │
│ Average Time (main) │ 6144.46ms │
│ Average Time (concat_batches_for_sort) │ 5935.43ms │
│ Queries Faster │ 5 │
│ Queries Slower │ 1 │
│ Queries with No Change │ 4 │
└────────────────────────────────────────┴────────────┘ |
This is a great observation, and the POC optimization has a high ROI. Here are some additional thoughts: This is just intuition, but I think the sorting phase should be faster than the sort-preserving merge phase. Because the sorting implementation is much simpler, it can rely on the existing optimized arrow row format and also quick sort implementation from the standard library. On the contrary, merging phase we have an in-house implementation of a loser tree heap, I think it's a bit complex so maybe it is also hard to optimize manually. ExampleThere is a sort query to run in 4 partitions, and each partition will process 100 input batches
ImplementationThe POC will copy the batches with |
Thank you @2010YOUY01 for review and good suggestion, i will improve my POC code and add more testing. |
Is your feature request related to a problem or challenge?
We should investigate and improve the sort code to support concat_batches for more cases besides the following case:
See details about the performance improvement:
#15348 (comment)
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: