Optimize TPC-H q16 to use columnar shuffle rather than native shuffle #1208

Open
Tracked by #391
andygrove opened this issue Jan 1, 2025 · 0 comments · May be fixed by #1209
Labels
enhancement New feature or request performance

What is the problem the feature request solves?

By default, Comet always chooses Native Shuffle over Columnar Shuffle. However, TPC-H q16 performs better with Columnar Shuffle.
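
To reproduce the columnar shuffle plan, the shuffle mode can be forced through Spark configuration. The sketch below assumes the Comet settings `spark.comet.exec.shuffle.enabled` and `spark.comet.exec.shuffle.mode` (with `jvm` selecting columnar shuffle and `native` selecting native shuffle); these names are not stated in this issue, so check them against the configuration guide for your Comet version. The helper `to_submit_flags` is hypothetical and only renders `--conf` flags for `spark-submit`.

```python
# Assumed Comet setting names and values; verify against your Comet version.
COLUMNAR_SHUFFLE_CONF = {
    "spark.comet.exec.shuffle.enabled": "true",
    # Assumption: "jvm" selects columnar shuffle, "native" selects native shuffle.
    "spark.comet.exec.shuffle.mode": "jvm",
}


def to_submit_flags(conf):
    """Render a dict of Spark settings as spark-submit --conf arguments."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    print(" ".join(to_submit_flags(COLUMNAR_SHUFFLE_CONF)))
```

Running q16 once with each mode and comparing the resulting plans should show the difference described in this issue.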

Columnar Shuffle Plan

CometSort
+- AQEShuffleRead
   +- CometSinkPlaceHolder
      +- CometColumnarExchange
         +- CometHashAggregate
            +- AQEShuffleRead
               +- CometColumnarExchange
                  +- CometHashAggregate
                     +- CometHashAggregate
                        +- AQEShuffleRead
                           +- CometColumnarExchange
                              +- CometHashAggregate
                                 +- CometProject
                                    +- CometHashJoin
                                       :- AQEShuffleRead
                                       :  +- CometColumnarExchange
                                       :     +-  BroadcastHashJoin [COMET: BroadcastHashJoin is not enabled because the following children are not native (BroadcastExchange)]
                                       :        :- ColumnarToRow
                                       :        :  +- CometFilter
                                       :        :     +- CometScan parquet
                                       :        +- BroadcastExchange
                                       :           +- ColumnarToRow
                                       :              +- CometProject
                                       :                 +- CometFilter
                                       :                    +- CometScan parquet
                                       +- AQEShuffleRead
                                          +- CometColumnarExchange
                                             +- CometFilter
                                                +- CometScan parquet

Native Shuffle Plan

 Sort [COMET: Sort is not native because the following children are not native (Exchange), Sort is not native because the following children are not native (AQEShuffleRead)]
+- AQEShuffleRead
   +-  Exchange [COMET: Native shuffle: unsupported Spark partitioning: org.apache.spark.sql.catalyst.plans.physical.RangePartitioning]
      +- HashAggregate
         +- AQEShuffleRead
            +-  Exchange [COMET: Exchange is not native because the following children are not native (HashAggregate)]
               +-  HashAggregate [COMET: HashAggregate is not native because the following children are not native (HashAggregate)]
                  +-  HashAggregate [COMET: HashAggregate is not native because the following children are not native (Exchange), HashAggregate is not native because the following children are not native (AQEShuffleRead)]
                     +- AQEShuffleRead
                        +-  Exchange [COMET: Exchange is not native because the following children are not native (HashAggregate)]
                           +-  HashAggregate [COMET: HashAggregate is not native because the following children are not native (Project)]
                              +-  Project [COMET: Project is not native because the following children are not native (ShuffledHashJoin)]
                                 +-  ShuffledHashJoin [COMET: ShuffleHashJoin disabled because the following children are not native (Exchange), ShuffleHashJoin disabled because the following children are not native (AQEShuffleRead, AQEShuffleRead)]
                                    :- AQEShuffleRead
                                    :  +-  Exchange [COMET: Exchange is not native because the following children are not native (BroadcastHashJoin)]
                                    :     +-  BroadcastHashJoin [COMET: BroadcastHashJoin is not enabled because the following children are not native (BroadcastExchange)]
                                    :        :- ColumnarToRow
                                    :        :  +- CometFilter
                                    :        :     +- CometScan parquet
                                    :        +- BroadcastExchange
                                    :           +- ColumnarToRow
                                    :              +- CometProject
                                    :                 +- CometFilter
                                    :                    +- CometScan parquet
                                    +- ColumnarToRow
                                       +- AQEShuffleRead
                                          +- CometExchange
                                             +- CometFilter
                                                +- CometScan parquet

Describe the potential solution

Ideally, Comet should choose columnar shuffle automatically in this case, since it keeps far more of the q16 plan running as Comet operators.

Additional context

No response
