Skip to content

fix: prevent native sort crash for Struct(Map(...)) keys#4157

Merged
andygrove merged 5 commits into
apache:mainfrom
0lai0:feat-4123-Struct/Map
May 7, 2026
Merged

fix: prevent native sort crash for Struct(Map(...)) keys#4157
andygrove merged 5 commits into
apache:mainfrom
0lai0:feat-4123-Struct/Map

Conversation

@0lai0
Copy link
Copy Markdown
Contributor

@0lai0 0lai0 commented Apr 30, 2026

Which issue does this PR close?

Closes #4123

Rationale for this change

Spark 4.1.1 SQL coverage exposed a case where Comet allows native aggregation for grouping keys that contain nested MapType (for example StructType(MapType(...))).
Although planning succeeds, downstream native sort/row-format handling does not support this shape, so the query fails at execution time instead of falling back to Spark during planning.

This PR makes aggregate planning reject map-containing grouping key types so these queries reliably fall back to Spark.

What changes are included in this PR?

  • add a recursive containsMapType helper in CometBaseAggregate
  • make aggregate planning fall back when any grouping expression contains MapType directly or nested in StructType/ArrayType
  • add regression test in CometAggregateSuite for grouping on Struct(Map(...))
  • add regression test in CometAggregateSuite for grouping on Array(Map(...))

How are these changes tested?

  • ./mvnw test -Pspark-4.1 -Dtest=none -Dsuites="org.apache.comet.exec.CometAggregateSuite grouping on struct containing map should fallback to Spark"
  • ./mvnw test -Pspark-4.1 -Dtest=none -Dsuites="org.apache.comet.exec.CometAggregateSuite group by array of map falls back to Spark (issue #4123)"
  • ./mvnw test -Pspark-4.1 -Dtest=none -Dsuites="org.apache.comet.exec.CometAggregateSuite"

Copy link
Copy Markdown
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @0lai0

}
}

test("grouping on struct containing map should fallback to Spark") {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may need to skip this for Spark 3.x with assume(isSpark40Plus)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually isSpark41Plus

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for point this out. I missed that. I'll add it in next commit.

@andygrove
Copy link
Copy Markdown
Member

@0lai0 test failure on 3.4:

- grouping on struct containing map should fallback to Spark *** FAILED *** (35 milliseconds)
  org.apache.spark.sql.AnalysisException: expression t.col1 cannot be used as a grouping expression because its data type struct<data:map<string,string>> is not an orderable data type.; line 6 pos 9;
Project [col1.data[key]#3959]
+- Sort [col1#3957.data[key] ASC NULLS FIRST], true
   +- Project [col1.data[key]#3959, col1#3957]
      +- Filter isnotnull(col1#3957.data[num])
         +- Aggregate [col1#3957], [col1#3957.data[key] AS col1.data[key]#3959, col1#3957]
            +- SubqueryAlias t
               +- LocalRelation [col1#3957]

@andygrove andygrove added this to the 0.16.0 milestone May 6, 2026
@0lai0
Copy link
Copy Markdown
Contributor Author

0lai0 commented May 6, 2026

Thanks @andygrove for review. PR updated.

@andygrove
Copy link
Copy Markdown
Member

@0lai0 I think you need to skip more tests with assume(isSpark41Plus, ...) ?

@0lai0
Copy link
Copy Markdown
Contributor Author

0lai0 commented May 6, 2026

That sounds like a good idea; I could try it locally. Thanks @andygrove.

@andygrove andygrove moved this from Todo to In progress in Comet Development May 6, 2026
@andygrove
Copy link
Copy Markdown
Member

CI is green now @0lai0 🎉

Could you fix the conflict?

@0lai0
Copy link
Copy Markdown
Contributor Author

0lai0 commented May 7, 2026

Sure, Thanks @andygrove. PR updated! 🎉

@andygrove andygrove merged commit 7cf04c8 into apache:main May 7, 2026
282 of 286 checks passed
@github-project-automation github-project-automation Bot moved this from In progress to Done in Comet Development May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Comet native sort lacks row-format support for Struct(Map(...)) sort keys

3 participants