feat: support S3 Table Buckets with S3TablesCatalog #1429
Conversation
I was able to work around the issue above by using …
Thanks for working on this @felixscherz. Feel free to tag me when it's ready for review :)
I think you can now review this PR if you have time @kevinjqliu :) I currently run tests by setting the …
Thanks for the PR, I added a few comments to clarify the catalog behaviors.
I'm a little hesitant to merge this in given that we have to run tests against a production S3 endpoint. Maybe we can mock the endpoint?
I ran the tests locally, and these 3 tests failed; everything else is ✅.
Thank you for the review! I removed tests related to boto3 and set the AWS region explicitly for the test run.
Added a few more comments.
I was able to run the test locally
AWS_REGION=us-east-2 ARN=... poetry run pytest tests/catalog/test_s3tables.py
after making a few local changes:
- poetry update boto3
- add an aws_region fixture (see the sketch below)
- pass aws_region to the catalog
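For reference, a guess at what such a fixture could look like (the environment-variable name mirrors the command above; this is an assumption, not necessarily the fixture that landed in the PR):

import os

import pytest


@pytest.fixture
def aws_region() -> str:
    # Read the target region from the environment so the tests can run against any supported region.
    return os.environ["AWS_REGION"]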
Could you update the PR description so others can test this PR out?
@felixscherz Thanks for the great contribution! Looking forward to adding this to PyIceberg! I left some comments. Please let me know what you think.
def commit_table(
    self, table: Table, requirements: Tuple[TableRequirement, ...], updates: Tuple[TableUpdate, ...]
) -> CommitTableResponse:
I did not find the logic for the case when the table does not exist, which means create_table_transaction will not be supported in the current version.
iceberg-python/pyiceberg/catalog/__init__.py
Lines 754 to 765 in e41c428
def create_table_transaction(
    self,
    identifier: Union[str, Identifier],
    schema: Union[Schema, "pa.Schema"],
    location: Optional[str] = None,
    partition_spec: PartitionSpec = UNPARTITIONED_PARTITION_SPEC,
    sort_order: SortOrder = UNSORTED_SORT_ORDER,
    properties: Properties = EMPTY_DICT,
) -> CreateTableTransaction:
    return CreateTableTransaction(
        self._create_staged_table(identifier, schema, location, partition_spec, sort_order, properties)
    )
We do not have to support everything in the initial PR, but it would be good to override create_table_transaction as "Not Implemented" for the s3tables catalog, e.g. along the lines of the sketch below.
I added exceptions for this case for now, along with a test. I will have a look at how to implement this properly.
pyiceberg/catalog/s3tables.py (outdated)
try:
    self.s3tables.create_table(
        tableBucketARN=self.table_bucket_arn, namespace=namespace, name=table_name, format="ICEBERG"
    )
If anything goes wrong after this point, I think we should clean up the created S3 table via the s3tables delete_table endpoint.
I added a try/except to delete the s3 table in case something goes wrong with writing the initial metadata.
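Roughly, that cleanup pattern could look like this (only the s3tables create_table/delete_table calls are taken from the snippets in this thread; the metadata-writing step is a placeholder):

self.s3tables.create_table(
    tableBucketARN=self.table_bucket_arn, namespace=namespace, name=table_name, format="ICEBERG"
)
try:
    # write the initial Iceberg metadata for the freshly created table (placeholder call)
    self._write_metadata(...)
except Exception:
    # roll back the s3tables entry so no empty, unusable table is left behind
    self.s3tables.delete_table(
        tableBucketARN=self.table_bucket_arn, namespace=namespace, name=table_name
    )
    raise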
Can you run …
@felixscherz could you rebase this against main? I see that getmoto/moto#8470 is now merged, thanks for driving that!
I rebased onto main. I prepared the unit tests using the new …
@kevinjqliu
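For anyone curious, a rough idea of what a moto-backed unit test could look like (this assumes moto >= 5 with the S3 Tables support added in getmoto/moto#8470 and a recent boto3; it is an illustration, not the PR's actual test):

import boto3
from moto import mock_aws


@mock_aws
def test_against_mocked_s3tables() -> None:
    # create a mocked table bucket and grab its ARN
    client = boto3.client("s3tables", region_name="us-east-2")
    arn = client.create_table_bucket(name="pyiceberg-test-bucket")["arn"]
    # instantiate S3TablesCatalog with this ARN and exercise create_namespace, create_table, etc.
    assert arn.startswith("arn:")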
LGTM! Thanks for adding both the unit test and integration test, and for driving the downstream dependency to add support for S3 Tables!! (getmoto/moto#8470)
I pushed a few changes to resolve merge conflicts, and I verified the integration test locally:
AWS_REGION=us-east-2 AWS_TEST_S3_TABLE_BUCKET_ARN=arn:aws:s3tables:us-east-2:033327485438:bucket/s3-table poetry run pytest tests/catalog/integration_test_s3tables.py
@geruh would you have another look at this PR when you have time?
Hey @felixscherz, thanks for following up on this PR. I haven't forgotten about this! :) I'll do another review soon.
I believe this PR is no longer relevant since AWS started supporting the REST Catalog: #1404 (comment) :)
@felixscherz does this mean that nothing special is needed to use PyIceberg against an S3 Table bucket now? Thank you for your work on this -- our team has been waiting to use table buckets specifically until we could migrate some pyiceberg-based services.
Yes! You can use the REST catalog and should be able to follow this guide: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-integrating-open-source.html.
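Based on that guide, loading an S3 table bucket through PyIceberg's REST catalog looks roughly like this (the region, account id, and bucket name below are placeholders):

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "s3tables",
    **{
        "type": "rest",
        "uri": "https://s3tables.us-east-2.amazonaws.com/iceberg",
        "warehouse": "arn:aws:s3tables:us-east-2:111122223333:bucket/my-table-bucket",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-2",
    },
)
print(catalog.list_namespaces())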
Hi, this is in regard to #1404. I created a first draft of an S3TablesCatalog that uses the S3 Table Buckets API for catalog operations.

How to run tests
Since moto does not support mocking the S3 Tables API yet (WIP: getmoto/moto#8470), we have to run the tests against a live AWS account. To do that, create an S3 Tables bucket in one of the supported regions and then set the table bucket ARN and AWS region as environment variables.
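For example (the variable names are taken from the integration-test command quoted earlier in this thread; the ARN is a placeholder):

AWS_REGION=us-east-2 AWS_TEST_S3_TABLE_BUCKET_ARN=arn:aws:s3tables:us-east-2:111122223333:bucket/my-table-bucket poetry run pytest tests/catalog/integration_test_s3tables.py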