feat: support array_distinct#1306

Closed
NoeB wants to merge 1 commit into apache:main from NoeB:feat/array_distinct
NoeB commented Jan 18, 2025

Which issue does this PR close?

Part of #1042

Rationale for this change

What changes are included in this PR?

How are these changes tested?

spark.read.parquet(path.toString).createOrReplaceTempView("t1")
checkSparkAnswerAndOperator(spark.sql("Select array_distinct(array(_2, _3, _4)) from t1"))
checkSparkAnswerAndOperator(
  spark.sql("Select array_distinct(array(_2, _4, null)) from t1"))
Contributor Author

The results for this check do not match on my machine, but I am unsure why: in the Spark shell (with and without Comet) and in the DataFusion CLI, it works as expected.

Contributor Author

The reason for the different order is that DataFusion sorts the data before it removes duplicates.
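
To illustrate the ordering difference described above, here is a minimal sketch in plain Scala. It does not call Spark, Comet, or DataFusion; ordinary collections stand in for the two behaviors. The object and method names are hypothetical, and it assumes (as I understand it, not verified in this thread) that Spark's array_distinct keeps elements in first-occurrence order.

```scala
// Hypothetical sketch: plain Scala collections stand in for the two engines.
object DistinctOrderSketch {
  // Keeps the first occurrence of each element, preserving input order
  // (the behavior assumed here for Spark's array_distinct).
  def firstOccurrenceDistinct(xs: Seq[Int]): Seq[Int] = xs.distinct

  // Sorts first, then removes duplicates, so the output is in sorted order
  // (the behavior the comment above attributes to DataFusion).
  def sortedDistinct(xs: Seq[Int]): Seq[Int] = xs.sorted.distinct

  // Compares two sequences after sorting both, so element order is ignored.
  def sameIgnoringOrder(a: Seq[Int], b: Seq[Int]): Boolean =
    a.sorted == b.sorted

  def main(args: Array[String]): Unit = {
    val input = Seq(3, 1, 3, 2)
    val a = firstOccurrenceDistinct(input) // elements 3, 1, 2
    val b = sortedDistinct(input)          // elements 1, 2, 3
    println(a == b)                        // false: same elements, different order
    println(sameIgnoringOrder(a, b))       // true
  }
}
```

Both results contain the same elements, so an order-sensitive comparison fails while an order-insensitive one passes, which would be consistent with the mismatch seen in the check.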

@andygrove
Member

Thanks for the contribution @NoeB, but this PR has become stale and there is now #1923, so I will close this one.

andygrove closed this Jun 24, 2025