feat: [1941-Part2]: Introduce map_to_list scalar function #2312

rishvin · 2025-09-05T05:36:17Z

Which issue does this PR close?

Addresses Part of #1941

Rationale for this change

Introduces map_to_list which converts a MapArray to ListArray<Struct<K, V>.

This PR has been split from original PR #2221. Please see that for details.

What changes are included in this PR?

A new map_to_list physical function.

How are these changes tested?

Added unit tests.

rishvin · 2025-09-05T05:37:09Z

@comphead / @andygrove/ @parthchandra split the second function.

codecov-commenter · 2025-09-05T18:53:05Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.62%. Comparing base (f09f8af) to head (da1a397).
⚠️ Report is 488 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2312      +/-   ##
============================================
+ Coverage     56.12%   57.62%   +1.49%     
- Complexity      976     1297     +321     
============================================
  Files           119      147      +28     
  Lines         11743    13497    +1754     
  Branches       2251     2390     +139     
============================================
+ Hits           6591     7777    +1186     
- Misses         4012     4451     +439     
- Partials       1140     1269     +129

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

comphead · 2025-09-09T17:39:12Z

Sorry @rishvin it looks like some conflicts in there

rishvin · 2025-09-10T03:24:51Z

Sorry @rishvin it looks like some conflicts in there

Thanks @comphead , resolved the conflict. I forced pushed to change my commit message from chore to feat.

rishvin · 2025-09-11T00:33:35Z

Spark SQL Tests / spark-sql-sql_core-1/ubuntu-24.04/spark-3.5.6/java-11 (pull_request)

This failure looks unrelated to the PR changes.

comphead

Thanks @rishvin what exactly spark sql function this PR addresses?
I checked https://spark.apache.org/docs/latest/api/sql/search.html?q=map I dont see map_to_list

Are you referring to https://spark.apache.org/docs/latest/api/sql/#map_entries ?
If so this function already supported

rishvin · 2025-09-12T01:01:09Z

Thanks @rishvin what exactly spark sql function this PR addresses? I checked https://spark.apache.org/docs/latest/api/sql/search.html?q=map I dont see map_to_list

Are you referring to https://spark.apache.org/docs/latest/api/sql/#map_entries ? If so this function already supported

Hey @comphead : This is not spark sql function. We are using it internally to support grouping on map type.

I know there has been some confusion around MapSort lately, hence providing the full context here regarding the original PR: #2221. There is also subtle difference between map_to_list and map_entries, mentioned later.

Spark4.0+ when doing grouping on Map type applies a logical rule to wrap the grouping column with MapSort expression. This expression class sorts the Map elements by keys, such that the output performed after evaluation is an ordered map. This ordered map can then be supplied for grouping because it will produce the correct result. For eg. these 2 map elements - {a: 1, b: 2} and {b:2, a:1}, regardless of ordering of keys in the map are essentially the same, hence should be part of the same group. The MapSort will sort them as {a: 1, b: 2} and groupby will put them in the same group {a: 1, b: 2}.

Today, writing only the equivalent MapSort expression in Comet along will not work. This is because DataFusion does not directly support grouping on Map type. If we attempt to pass a Map type to DataFusion, it will throw with Not implemented ... error. However, DataFusion does support grouping on a more canonical form like List. So, if we change the Map type to List type, then Datafusion will work and give the correct result. To do so, I added this internal function map_to_list that will convert the Map type to the List type before passing it to the grouping expression. This translation from map to list type is done here, when building the grouping expressions.

But now passing map_to_list to the grouping expression has one problem, the output grouping keys produced by the HashAggregate will be a List instead of Map. The result produced cannot be directly consumed by the downstream operators because they expect the grouping key type to be Map, however, we are passing them List type. We will hit schema-mismatch exception. So, we must convert the List type back to Map again. This translation is done by map_from_list internal function, which convert the output grouping keys back to Map type. This function is implemented in part3 here. And the conversion of List to Map type happens here by applying a projection on top of HashAggregate.

These conversion are totally done in native code, localized to HashAggregate. Since, we maintain the schema requirements, the result produced by the HashAggregate can be consumed downstream Spark operators also.

--
Regarding the second part of using map_entries:

I checked the implementation in Datafusion, here, which has one difference. It doesn't carry forward the field names. It is creating key field with name key and value field with name value. This means, if the original map has field names as keys and values (plural), they will not be retained when doing this conversion. Also, it does not carry forward sorted flag which is very specific to Map type.

The map_to_list addresses this by cloning the incoming Map's field and piggybacking the sorted flag in the metadata hashmap of the field. This is being done here.

And when converting the list back to map, the original field and metadata is consumed to construct the map type back. This is done here.

Retaining these metadata information is important, otherwise, we may run into schema mismatch issues. I ran into some schema related issue when implementing this.

So, I think we may be able to use map_entries but we should retain the field and map specific metadata.

Let me know if you have any further questions and your thoughts around this.

comphead · 2025-09-12T14:54:36Z

Thanks @rishvin appreciate if you can take #2388 fill up the context and we can go through PRs once again

rishvin changed the title ~~chore: [1941-Part2]: Introduce map_to_list scalar function~~ feat: [1941-Part2]: Introduce map_to_list scalar function Sep 7, 2025

feat: [1941-Part2]: Introduce map_to_list scalar function

da1a397

rishvin force-pushed the 1941-part2-introduce-map-to-list-function branch from d90c182 to da1a397 Compare September 10, 2025 03:23

comphead reviewed Sep 11, 2025

View reviewed changes

rishvin mentioned this pull request Sep 13, 2025

Support map types for aggregation #2388

Open

rishvin marked this pull request as draft September 13, 2025 15:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: [1941-Part2]: Introduce map_to_list scalar function #2312

feat: [1941-Part2]: Introduce map_to_list scalar function #2312

Uh oh!

rishvin commented Sep 5, 2025

Uh oh!

rishvin commented Sep 5, 2025 •

edited

Loading

Uh oh!

codecov-commenter commented Sep 5, 2025 •

edited

Loading

Uh oh!

comphead commented Sep 9, 2025

Uh oh!

rishvin commented Sep 10, 2025

Uh oh!

rishvin commented Sep 11, 2025

Uh oh!

comphead left a comment

Uh oh!

rishvin commented Sep 12, 2025 •

edited

Loading

Uh oh!

comphead commented Sep 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: [1941-Part2]: Introduce map_to_list scalar function #2312

Are you sure you want to change the base?

feat: [1941-Part2]: Introduce map_to_list scalar function #2312

Uh oh!

Conversation

rishvin commented Sep 5, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Uh oh!

rishvin commented Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

comphead commented Sep 9, 2025

Uh oh!

rishvin commented Sep 10, 2025

Uh oh!

rishvin commented Sep 11, 2025

Uh oh!

comphead left a comment

Choose a reason for hiding this comment

Uh oh!

rishvin commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

comphead commented Sep 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rishvin commented Sep 5, 2025 •

edited

Loading

codecov-commenter commented Sep 5, 2025 •

edited

Loading

rishvin commented Sep 12, 2025 •

edited

Loading