0.4.0
Pre-release
Pre-release
DataFusion Comet 0.4.0 Changelog
This release consists of 51 commits from 10 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
- fix: Use the number of rows from underlying arrays instead of logical row count from RecordBatch #972 (viirya)
- fix: The spilled_bytes metric of CometSortExec should be size instead of time #984 (Kontinuation)
- fix: Properly handle Java exceptions without error messages; fix loading of comet native library from java.library.path #982 (Kontinuation)
- fix: Fallback to Spark if scan has meta columns #997 (viirya)
- fix: Fallback to Spark if named_struct contains duplicate field names #1016 (viirya)
- fix: Make comet-git-info.properties optional #1027 (andygrove)
- fix: TopK operator should return correct results on dictionary column with nulls #1033 (viirya)
- fix: need default value for getSizeAsMb(EXECUTOR_MEMORY.key) #1046 (neyama)
Performance related:
- perf: Remove one redundant CopyExec for SMJ #962 (andygrove)
- perf: Add experimental feature to replace SortMergeJoin with ShuffledHashJoin #1007 (andygrove)
- perf: Cache jstrings during metrics collection #1029 (mbutrovich)
Implemented enhancements:
- feat: Support
GetArrayStructFields
expression #993 (Kimahriman) - feat: Implement bloom_filter_agg #987 (mbutrovich)
- feat: Support more types with BloomFilterAgg #1039 (mbutrovich)
- feat: Implement CAST from struct to string #1066 (andygrove)
- feat: Use official DataFusion 43 release #1070 (andygrove)
- feat: Implement CAST between struct types #1074 (andygrove)
- feat: support array_append #1072 (NoeB)
- feat: Require offHeap memory to be enabled (always use unified memory) #1062 (andygrove)
Documentation updates:
- doc: add documentation interlinks #975 (comphead)
- docs: Add IntelliJ documentation for generated source code #985 (mbutrovich)
- docs: Update tuning guide #995 (andygrove)
- docs: Various documentation improvements #1005 (andygrove)
- docs: clarify that Maven central only has jars for Linux #1009 (andygrove)
- doc: fix K8s links and doc #1058 (comphead)
- docs: Update benchmarking.md #1085 (rluvaton-flarion)
Other:
- chore: Generate changelog for 0.3.0 release #964 (andygrove)
- chore: fix publish-to-maven script #966 (andygrove)
- chore: Update benchmarks results based on 0.3.0-rc1 #969 (andygrove)
- chore: update rem expression guide #976 (kazuyukitanimura)
- chore: Enable additional CreateArray tests #928 (Kimahriman)
- chore: fix compatibility guide #978 (kazuyukitanimura)
- chore: Update for 0.3.0 release, prepare for 0.4.0 development #970 (andygrove)
- chore: Don't transform the HashAggregate to CometHashAggregate if Comet shuffle is disabled #991 (viirya)
- chore: Make parquet reader options Comet options instead of Hadoop options #968 (parthchandra)
- chore: remove legacy comet-spark-shell #1013 (andygrove)
- chore: Reserve memory for native shuffle writer per partition #988 (viirya)
- chore: Bump arrow-rs to 53.1.0 and datafusion #1001 (kazuyukitanimura)
- chore: Revert "chore: Reserve memory for native shuffle writer per partition (#988)" #1020 (viirya)
- minor: Remove hard-coded version number from Dockerfile #1025 (andygrove)
- chore: Reserve memory for native shuffle writer per partition #1022 (viirya)
- chore: Improve error handling when native lib fails to load #1000 (andygrove)
- chore: Use twox-hash 2.0 xxhash64 oneshot api instead of custom implementation #1041 (NoeB)
- chore: Refactor Arrow Array and Schema allocation in ColumnReader and MetadataColumnReader #1047 (viirya)
- minor: Refactor binary expr serde to reduce code duplication #1053 (andygrove)
- chore: Upgrade to DataFusion 43.0.0-rc1 #1057 (andygrove)
- chore: Refactor UnaryExpr and MathExpr in protobuf #1056 (andygrove)
- minor: use defaults instead of hard-coding values #1060 (andygrove)
- minor: refactor UnaryExpr handling to make code more concise #1065 (andygrove)
- chore: Refactor binary and math expression serde code #1069 (andygrove)
- chore: Simplify CometShuffleMemoryAllocator to use Spark unified memory allocator #1063 (viirya)
- test: Restore one test in CometExecSuite by adding COMET_SHUFFLE_MODE config #1087 (viirya)
Credits
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
19 Andy Grove
13 Matt Butrovich
8 Liang-Chi Hsieh
3 KAZUYUKI TANIMURA
2 Adam Binford
2 Kristin Cowalcijk
1 NoeB
1 Oleks V
1 Parth Chandra
1 neyama
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.