Equality delete column constraints are not enforced #12971
Comments
Hello, just checking—are you planning to work on a PR for this yourself? |
Do we say in the spec you can’t use those types? I was checking and didn’t
see anything specific
|
@RussellSpitzer I was referencing the equality-delete doc, which says it has the same constraints as the identifier-fields, with the exception that optional columns and fields nested under optional structs are allowed. The identifier-fields doc says that only primitive fields are allowed, and I assume this means they should be of primitive type (struct is a nested type)? Do I understand the doc correctly? |
@JeonDaehong Yes I would like to give it a try. Thanks for checking! |
The problem is that the identifier fields rule (e.g. excluding FLOAT and DOUBLE types) is not enforced in your case? Yes, I would agree that it should be enforced, as floating-point numbers can exhibit differences across platforms and hence are not suitable for.
I am not sure we should add such enforcement to the Parquet/ORC writers. It seems that in your case someone is producing the equality delete file with custom code using the Iceberg Java SDK. Are the identifier fields set for the table schema? The validation above should fail such a schema change. If a writer is not conforming to the spec, it is an implementation bug of the writer. |
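One platform-independent way to see why floating-point columns are awkward as equality keys is that Java itself offers several notions of "equal" for doubles that disagree on NaN and signed zero. The sketch below is only an illustration of that ambiguity; the class and method names are hypothetical and nothing here is Iceberg code or a statement about how any engine compares values.

```java
// Hypothetical demo class; not part of Iceberg.
final class FloatEqualityDemo {
  // IEEE 754 comparison: NaN never equals NaN, and 0.0 equals -0.0.
  static boolean primitiveEquals(double a, double b) {
    return a == b;
  }

  // Total-order comparison (same semantics as Double.equals):
  // NaN equals NaN, but 0.0 and -0.0 are different.
  static boolean totalOrderEquals(double a, double b) {
    return Double.compare(a, b) == 0;
  }

  // Raw bit comparison: distinguishes 0.0 from -0.0 and
  // distinguishes NaN values with different bit patterns.
  static boolean bitwiseEquals(double a, double b) {
    return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
  }

  public static void main(String[] args) {
    System.out.println(primitiveEquals(Double.NaN, Double.NaN));  // false
    System.out.println(totalOrderEquals(Double.NaN, Double.NaN)); // true
    System.out.println(primitiveEquals(0.0, -0.0));               // true
    System.out.println(totalOrderEquals(0.0, -0.0));              // false
    System.out.println(bitwiseEquals(0.0, -0.0));                 // false
  }
}
```

A writer and a reader that pick different definitions would disagree on which rows an equality delete matches, which is one plausible reading of why such columns are excluded.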
I was cheering for the PR back then ! :D |
@stevenzwu If we directly use the file format writers to write the delete files, then yes I think
Or is it allowed when that Is it allowed to have an equality delete, e.g.
if column |
Ah interesting I take it you mean that when we use
But I am curious isn't this method validating the constraints for the identifier fields? From the equality-delete spec it says there are some exceptions right?
FWIW the case I am seeing is directly using the delete writer, e.g. you can see this usage for Trino testing; I was working on this Trino PR to allow equality deletes on columns of struct type (row type in Trino). |
Apache Iceberg version
1.8.1
Query engine
None
Please describe the bug 🐞
From the equality-delete-files doc, only primitive types are allowed to be used as equality delete columns, excluding FLOAT and DOUBLE types.
We bumped into a use case where downstream is generating delete files directly with DeleteWriteBuilder and the delete column is not supported. This issue was only discovered when reading the table.
It seems that DeleteWriteBuilder::buildEqualityWriter for all the formats should check on the constraints based on the schema and equalityFieldIds.
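The kind of check being proposed could look roughly like the sketch below. It is a self-contained model, not Iceberg code: the Kind enum, the EqualityFieldCheck class and its methods are all hypothetical stand-ins, the schema is reduced to a map from field ID to type kind, and the rule encoded (primitive types only, no FLOAT or DOUBLE) follows the reading of the doc given in this issue, which the thread itself has not settled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; these are NOT Iceberg classes.
final class EqualityFieldCheck {
  // Stand-in for a column type; real schemas have more types than this.
  enum Kind { INT, LONG, STRING, FLOAT, DOUBLE, STRUCT, LIST, MAP }

  // Assumed rule: nested types and floating-point types are rejected.
  static boolean isAllowed(Kind kind) {
    switch (kind) {
      case FLOAT:
      case DOUBLE:
      case STRUCT:
      case LIST:
      case MAP:
        return false;
      default:
        return true;
    }
  }

  // Collects one message per equality field that breaks the assumed rule.
  static List<String> violations(Map<Integer, Kind> schema, List<Integer> equalityFieldIds) {
    List<String> problems = new ArrayList<>();
    for (Integer id : equalityFieldIds) {
      Kind kind = schema.get(id);
      if (kind == null) {
        problems.add("field " + id + " is not in the schema");
      } else if (!isAllowed(kind)) {
        problems.add("field " + id + " has unsupported type " + kind);
      }
    }
    return problems;
  }

  // Fails fast at write time instead of surfacing the problem on read.
  static void validate(Map<Integer, Kind> schema, List<Integer> equalityFieldIds) {
    List<String> problems = violations(schema, equalityFieldIds);
    if (!problems.isEmpty()) {
      throw new IllegalArgumentException("Invalid equality delete columns: " + problems);
    }
  }

  public static void main(String[] args) {
    Map<Integer, Kind> schema = Map.of(1, Kind.LONG, 2, Kind.DOUBLE, 3, Kind.STRUCT);
    validate(schema, List.of(1)); // passes
    System.out.println(violations(schema, List.of(1, 2, 3)));
  }
}
```

Running the check when the writer is built would move the failure from read time to write time, which is the behavior asked for above; whether it belongs in the format writers or in the caller is the open question in the comments.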
Willingness to contribute