Skip to content

Optimize TPC-H q16 to use columnar shuffle rather than native shuffle #1208

Closed
@andygrove

Description

@andygrove

What is the problem the feature request solves?

By default, Comet always chooses Native Shuffle over Columnar Shuffle. However, TPC-H q16 performs better with Columnar Shuffle.

Columnar Shuffle Plan

CometSort
+- AQEShuffleRead
   +- CometSinkPlaceHolder
      +- CometColumnarExchange
         +- CometHashAggregate
            +- AQEShuffleRead
               +- CometColumnarExchange
                  +- CometHashAggregate
                     +- CometHashAggregate
                        +- AQEShuffleRead
                           +- CometColumnarExchange
                              +- CometHashAggregate
                                 +- CometProject
                                    +- CometHashJoin
                                       :- AQEShuffleRead
                                       :  +- CometColumnarExchange
                                       :     +-  BroadcastHashJoin [COMET: BroadcastHashJoin is not enabled because the following children are not native (BroadcastExchange)]
                                       :        :- ColumnarToRow
                                       :        :  +- CometFilter
                                       :        :     +- CometScan parquet
                                       :        +- BroadcastExchange
                                       :           +- ColumnarToRow
                                       :              +- CometProject
                                       :                 +- CometFilter
                                       :                    +- CometScan parquet
                                       +- AQEShuffleRead
                                          +- CometColumnarExchange
                                             +- CometFilter
                                                +- CometScan parquet

Native Shuffle Plan

 Sort [COMET: Sort is not native because the following children are not native (Exchange), Sort is not native because the following children are not native (AQEShuffleRead)]
+- AQEShuffleRead
   +-  Exchange [COMET: Native shuffle: unsupported Spark partitioning: org.apache.spark.sql.catalyst.plans.physical.RangePartitioning]
      +- HashAggregate
         +- AQEShuffleRead
            +-  Exchange [COMET: Exchange is not native because the following children are not native (HashAggregate)]
               +-  HashAggregate [COMET: HashAggregate is not native because the following children are not native (HashAggregate)]
                  +-  HashAggregate [COMET: HashAggregate is not native because the following children are not native (Exchange), HashAggregate is not native because the following children are not native (AQEShuffleRead)]
                     +- AQEShuffleRead
                        +-  Exchange [COMET: Exchange is not native because the following children are not native (HashAggregate)]
                           +-  HashAggregate [COMET: HashAggregate is not native because the following children are not native (Project)]
                              +-  Project [COMET: Project is not native because the following children are not native (ShuffledHashJoin)]
                                 +-  ShuffledHashJoin [COMET: ShuffleHashJoin disabled because the following children are not native (Exchange), ShuffleHashJoin disabled because the following children are not native (AQEShuffleRead, AQEShuffleRead)]
                                    :- AQEShuffleRead
                                    :  +-  Exchange [COMET: Exchange is not native because the following children are not native (BroadcastHashJoin)]
                                    :     +-  BroadcastHashJoin [COMET: BroadcastHashJoin is not enabled because the following children are not native (BroadcastExchange)]
                                    :        :- ColumnarToRow
                                    :        :  +- CometFilter
                                    :        :     +- CometScan parquet
                                    :        +- BroadcastExchange
                                    :           +- ColumnarToRow
                                    :              +- CometProject
                                    :                 +- CometFilter
                                    :                    +- CometScan parquet
                                    +- ColumnarToRow
                                       +- AQEShuffleRead
                                          +- CometExchange
                                             +- CometFilter
                                                +- CometScan parquet

Describe the potential solution

Ideally Comet should choose columnar shuffle in this case

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions