HIVE-28665: Iceberg: Upgrade iceberg version to 1.9.1 #5846

Open: kasakrisz wants to merge 26 commits into base: master

Conversation

kasakrisz
Contributor

What changes were proposed in this pull request?

Backport patches from iceberg:

mr
+ ec2c2e978e40a4bf7bf6a3a0bccbe7e3a68a0cc1 mr:Fix ugi not correct in WORKER_POOL (#10661)
+ a5c8f9cd4557639d39eed716d499e9837d13b88e Build: Remove unused variables, fields and parameters (#11101)
+ 168a9839425f9d3e2518ed345b7847ecb7cabb89 ThreadPools introduce newExitingWorkerPool and newFixedThreadPool for clearer semantics (#11073)
+ 5e279c868f8e2087f88ae779d8cc8474768bc5e2 Build, Spark, Flink: Bump junit from 5.10.1 to 5.11.1 (#11262)
+ 0280885ac95bdf763556a84bb9d7c6fd9c8c5e2a Pig: Remove iceberg-pig (#11380)
+ b38951db6a7061a595605229c21c1a1912a3a4c1 Data, Flink, MR, Spark: Test deletes with format-version=3 (#11538)
+ da53495bc1bb52db37cdd1ced5c2377001c9d482 Core, Flink, Spark, KafkaConnect: Remove usage of deprecated path API (#11744)
+ b9b61b1d72ebb192d5e90453ff7030ece73d2603 Avro: Support default values for generic data (#11786)

hive-metastore
+ d17a7f189afa25c6be37df1415f4e2f8594effbe Core: Remove deprecated APIs for 1.7.0 (#10818)
+ cf02ffac4329141b30bca265cafb9987f64f6cc4 AWS, Core, Hive: Extract FileIO closing into separate FileIOTracker class (#10893)
+ e449d3405cfdb304c94835845bd8f34a73b4a517 Hive: Add View support for HIVE catalog (#9852)
+ 6a5ae1ae6a01f1395ee70e046537cd87b990c4ae Core: Switch usage to DataFileSet / DeleteFileSet (#11158)
+ 6e9e07aa0f35197bb23b218d23716e928d1c2814 Hive: Bugfix for incorrect Deletion of Snapshot Metadata Due to OutOfMemoryError (#11576)
+ a95943e5561c78c18062852e7f8027a191562e08 Core: Propagate custom metrics reporter when table is created/replaced through Transaction (#11671)
+ da53495bc1bb52db37cdd1ced5c2377001c9d482 Core, Flink, Spark, KafkaConnect: Remove usage of deprecated path API (#11744)
+ a3dcfd19fd1b2a709f7bdf013b83836953d49c6f Hive: Optimize tableExists API in hive catalog (#11597)
+ e1d2271ad911d4224ad53ac2e0142b28984e5f0b Hive: Optimize viewExists API in hive catalog (#11813)
+ f129588461ad02c2fa2021af30b4e9bca70eee93 Core: Add support for view-default property in catalog (#11064)
- a05b2b53b792a6215732aa8b1118adfba0cf3317 Hive: Use correct classloader to load SQL script (#12140)
+ c02ebe4740b22d6f5a78b636aea2d918037b2751 Core: Set missing table-default property in RESTSessionCatalog (#11646)
+ d35cf23eb63cc88d4fdb6ae1db8e63630b073d13 Core: Add missing table-override property to REST catalog (#12548)
+ 8f6ebb5b36a0263edfcb04e0c104b26225f95b07 Core: Add `view-override` catalog property (#12534)
+ cbf34a6ab980ab9b7bb01e45c9e2bcfad8e2986b Build: Enforce error message check on Exception assertions (#12624)
+ c661a71091e496393c743ddd879d9e1a0f2747b2 Core, Hive: Double check commit status in case of commit conflict for NoLock (#12637)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

By running the existing and backported tests.

@kasakrisz kasakrisz marked this pull request as draft June 4, 2025 13:37
@deniskuzZ
Member

deniskuzZ commented Jun 4, 2025

@kasakrisz, please drop the following

  1. from iceberg-handler
    org.apache.iceberg.data.PartitionStatsHandler
    org.apache.iceberg.data.TestPartitionStatsHandler

  2. from patched-iceberg-core
    org.apache.iceberg.avro.*
    org.apache.iceberg.BaseScan.java
    org.apache.iceberg.Partitioning.java
    org.apache.iceberg.PartitionStats.java
    org.apache.iceberg.PartitionStatsUtil.java

  3. clean up the excludes in the patched-iceberg-core pom (remove the dropped classes)

<excludes>
    **/HadoopInputFile.class
    **/HadoopTableOperations.class
    **/StructLikeMap.class
    **/StructLikeWrapper.class
    org.apache.iceberg.PartitionsTable.class
</excludes>
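
To make the effect of the remaining excludes concrete, here is a minimal sketch of how the four `**/` patterns above would select class files. It assumes glob matching via the JDK's `PathMatcher`, which only approximates whatever pattern matching the build plugin actually applies; the class name `ExcludePatternSketch` and the sample entries are hypothetical and not taken from the Hive build.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;

public class ExcludePatternSketch {
    // The four "**/" patterns quoted in the comment above.
    static final List<String> EXCLUDES = List.of(
        "**/HadoopInputFile.class",
        "**/HadoopTableOperations.class",
        "**/StructLikeMap.class",
        "**/StructLikeWrapper.class");

    // Returns true when the relative class-file path matches any exclude pattern.
    // Assumption: JDK glob semantics stand in for the plugin's own matcher.
    static boolean isExcluded(String entry) {
        Path path = Path.of(entry);
        for (String pattern : EXCLUDES) {
            PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + pattern);
            if (matcher.matches(path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical jar entries used only for illustration.
        List<String> entries = List.of(
            "org/apache/iceberg/hadoop/HadoopInputFile.class",
            "org/apache/iceberg/util/StructLikeMap.class",
            "org/apache/iceberg/BaseScan.class");
        for (String entry : entries) {
            System.out.println(entry + " -> " + (isExcluded(entry) ? "excluded" : "kept"));
        }
    }
}
```

In this sketch `BaseScan.class` is not matched, which is consistent with the request: once a patched class is dropped, its exclude entry should go too, so the upstream version is used as-is.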

Member

@deniskuzZ deniskuzZ left a comment

LGTM, pending tests

@kasakrisz kasakrisz marked this pull request as ready for review June 5, 2025 11:44
import org.apache.thrift.TException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

-public class HiveCatalog extends BaseMetastoreCatalog implements SupportsNamespaces, Configurable {
+public class HiveCatalog extends BaseMetastoreViewCatalog
Contributor

This is so nice. The lack of view support is one of the bottlenecks when I try HIVE-28059 with other query engines.

Member

@deniskuzZ deniskuzZ Jun 5, 2025

@okumin, are you using RestCatalog locally? If so, please suggest default configs for the docker image: https://github.com/apache/hive/pull/5834/files#diff-cfa8481579367e1e9127939f845eba4d2ee8c796d6858bcd0ca0a4b5fbfb8019R35-R42

Contributor

Yes, but I'm only halfway there and don't have any concrete suggestions yet. I have not yet managed to integrate it with other query engines.

@Aggarwal-Raghav
Contributor

Should we upgrade parquet to 1.15.2 to stay in sync with iceberg 1.9.1? Let me know and I can raise a PR for it.

@kasakrisz
Contributor Author

kasakrisz commented Jun 10, 2025

> Should we upgrade parquet to 1.15.2 to stay in sync with iceberg 1.9.1? Let me know and I can raise a PR for it.

@Aggarwal-Raghav
Yes, please go ahead.

kasakrisz added 25 commits June 10, 2025 15:29
… execution TestHiveIcebergRestrictDataFiles.testRestrictDataFiles
6 participants