PyIceberg appending data creates snapshots incompatible with Athena/Spark #1424
Comments
Hi @Samreay, thanks for reporting this issue! Very odd that it's 1 + MAX_VALUE. I took a look at the write path and didn't see anything that stood out that would cause this issue.
I'm also not seeing how this could happen. To test this, I also ran this script:
I think we should add some checks to ensure that the generated snapshot ID stays within the valid range.
The above replicates the logic of iceberg-python/pyiceberg/table/metadata.py, lines 322 to 333 at commit b981780.
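For reference, a minimal sketch of that kind of check, assuming the generation logic XOR-folds a random UUID into a signed 64-bit value as the referenced `metadata.py` lines do; the helper below is modeled on that approach rather than copied from pyiceberg, so details may differ:

```python
import uuid

JAVA_LONG_MAX = (1 << 63) - 1  # 9223372036854775807


def generate_snapshot_id() -> int:
    """XOR-fold a random UUID into a signed 64-bit integer and take its
    absolute value, mirroring the approach in the referenced metadata.py lines."""
    rnd = uuid.uuid4()
    folded = int.from_bytes(
        bytes(lhs ^ rhs for lhs, rhs in zip(rnd.bytes[:8], rnd.bytes[8:])),
        byteorder="little",
        signed=True,
    )
    return folded if folded >= 0 else -folded


# Brute-force check: every generated ID should fit in a Java long.
for _ in range(1_000_000):
    snapshot_id = generate_snapshot_id()
    assert 0 <= snapshot_id <= JAVA_LONG_MAX, f"out of range: {snapshot_id}"
```

One observation about the sketch itself: Python integers are unbounded, so the single folded value `-2**63` would map to `2**63`, which is exactly 1 + MAX_VALUE; hitting that value at random is astronomically unlikely, and whether pyiceberg's actual code behaves the same way would need to be confirmed against the linked lines.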
I'll see if I can track down the snapshot metadata. I'm also not sure how this would happen, but we've been exclusively using PyIceberg to create, remove, and append data to our Iceberg tables. Granted, the creation is using the Glue catalog, so I suppose there's potential for Amazon to be muddying the waters here.
I think the snapshot ID is generated on the client side, so it's possible only if Glue is also committing to the table. If you can share the metadata JSON, that would be helpful!
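If the metadata JSON is available, one quick way to spot out-of-range snapshot IDs is to scan the top-level `snapshots` list. A minimal sketch, assuming a hypothetical local copy of the metadata file (the filename below is illustrative):

```python
import json

JAVA_LONG_MAX = (1 << 63) - 1

# Hypothetical local copy of the table's metadata file (name is illustrative).
with open("00003-abc123.metadata.json") as f:
    metadata = json.load(f)

for snapshot in metadata.get("snapshots", []):
    snapshot_id = snapshot["snapshot-id"]
    if not (0 <= snapshot_id <= JAVA_LONG_MAX):
        print(f"snapshot-id {snapshot_id} does not fit in a signed 64-bit long")
```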
Apache Iceberg version
0.8.0
Please describe the bug 🐞
We append data to our Iceberg table using the `Table.overwrite` function, and this is saving out snapshots which have IDs that cannot be parsed by Athena's `OPTIMIZE` command, or by Spark. Java's long max value is 9223372036854775807. PyIceberg (or something under the hood, it might not be PyIceberg) has created a snapshot with ID 9223372036854775808, literally 1 + MAX_VALUE.
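To make the overflow concrete, a small check using the values from the report:

```python
JAVA_LONG_MAX = 2**63 - 1                    # 9223372036854775807, Java's Long.MAX_VALUE
reported_snapshot_id = 9223372036854775808   # the snapshot ID written to the table, i.e. 2**63

# The reported ID is exactly one past the largest value a signed 64-bit long can hold,
# so engines that parse snapshot IDs as Java longs (Athena, Spark) cannot represent it.
assert reported_snapshot_id == JAVA_LONG_MAX + 1
```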
Willingness to contribute