sagarlakshmipathy
Contributor

Important Read

#548

What is the purpose of the pull request

  • Update Glue and Snowflake docs to show better catalog sync methods for Iceberg tables

Brief change log

  • Updated Glue catalog doc
  • Updated Snowflake integration doc

Verify this pull request

  • Trivial docs work
  • Checked with `npm start` locally

@sagarlakshmipathy
Contributor Author

@vinishjail97 can you review?

Contributor

@ashvina left a comment

Thanks for the PR.

#### Pre-requisites:

* Build Apache XTable™ (Incubating) from [source](https://github.com/apache/incubator-xtable)
* Download `iceberg-aws-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws)

Contributor

[Clarification] Are AWS libraries required?

Contributor Author

Would you suggest keeping it cloud agnostic? I have only tried with AWS S3 for Snowflake. I'm not even sure what libraries would be needed for GCP and Azure.

Contributor

For Snowflake, we don't need `iceberg-aws`; it contains integrations with Glue, DynamoDB, etc.
https://github.com/apache/iceberg/tree/main/aws/src/integration/java/org/apache/iceberg/aws

> I'm not even sure what libraries would be needed for GCP and Azure

For Snowflake, we need permissions (IAM for AWS, a service account for GCP, etc.) and an external volume setup.
https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume#create-an-external-volume

XTable can already read from S3/GCS/Azure Blob/HDFS using the Hadoop library dependencies.
https://github.com/apache/incubator-xtable/blob/main/pom.xml#L360

Contributor

> For snowflake we need permissions (IAM for AWS, service account for GCP etc.) and external volume setup.

Please confirm if my understanding below is correct.
Iceberg supports various catalogs, including JDBC and REST. The Snowflake catalog appears to be JDBC-based [1]. Therefore, when connecting XTable to the Snowflake catalog and updating Iceberg tables, a Snowflake JDBC driver should be a dependency [2]. Iceberg’s JDBC catalog clients should not need Spark or AWS dependencies. However, anyone who wants to follow this tutorial end-to-end may need the Spark runtime and AWS libraries.

If this is correct, it would be helpful to separate the prereqs into two sections: one for what XTable needs and another for the tutorial prerequisites.

[1] https://www.snowflake.com/en/blog/iceberg-tables-catalog-support-available-now/
[2] https://iceberg.apache.org/docs/1.5.0/jdbc/
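
To make the proposed split concrete, here is a minimal sketch that checks a local jar directory against two separate prerequisite groups. The grouping is an assumption that follows the unconfirmed understanding above, not a verified dependency list; `XTABLE_JARS`, `TUTORIAL_JARS`, `missing_jars`, and the `jars` directory are hypothetical names, and `iceberg-aws` is left out because the earlier comment says Snowflake does not need it.

```python
from pathlib import Path

# ASSUMPTION: illustrative grouping only, based on the unconfirmed split
# discussed in this thread. Jar name prefixes come from the PR's prerequisite list.
XTABLE_JARS = ["snowflake-jdbc"]
TUTORIAL_JARS = ["iceberg-spark-runtime", "bundle"]


def missing_jars(jar_dir, prefixes):
    """Return the prefixes that have no matching `<prefix>-*.jar` file in jar_dir."""
    jar_names = [path.name for path in Path(jar_dir).glob("*.jar")]
    return [
        prefix
        for prefix in prefixes
        if not any(name.startswith(prefix + "-") for name in jar_names)
    ]


if __name__ == "__main__":
    # "jars" is a hypothetical directory holding the downloaded jar files.
    for label, prefixes in (("XTable", XTABLE_JARS), ("Tutorial", TUTORIAL_JARS)):
        print(label, "missing:", missing_jars("jars", prefixes))
```

Keeping the two lists separate mirrors the suggested doc layout: a reader who only needs the catalog sync can ignore the tutorial group.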

Contributor

@vinishjail97 left a comment

@sagarlakshmipathy Added comments.


**Pre-requisites:**
* Download iceberg-aws-X.X.X.jar from the [Maven repository](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws)
* Download bundle-X.X.X.jar from the [Maven repository](https://mvnrepository.com/artifact/software.amazon.awssdk/bundle)

Contributor

Should this say "Download AWS Java SDK `bundle-X.X.X.jar`"?

Contributor

[Attached image]

This is unclear from the docs.

Comment on lines +64 to +66
* Download `bundle-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/software.amazon.awssdk/bundle)
* Download `iceberg-spark-runtime-3.X_2.12/X.X.X.jar` from [here](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.2_2.12/1.4.2/)
* Download `snowflake-jdbc-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/net.snowflake/snowflake-jdbc)

Contributor

Mention "AWS Java SDK" in the AWS bundle download step.
