Validate encoded Thrift lists match the schema #9924
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -98,7 +98,7 @@ fn test_arrow_gh_41317() { | |
| let err = read_file("ARROW-GH-41317.parquet").unwrap_err(); | ||
| assert_eq!( | ||
| err.to_string(), | ||
| "External: Parquet argument error: Parquet error: StructArrayReader out of sync in read_records, expected 5 read, got 2" | ||
| "Parquet error: Expected list element type of I32 but got I16" | ||
|
Contributor
Author
When first added, this test expected
Contributor
This seems like a reasonable error message to me. The data is still rejected (and, in this case, earlier).
Contributor
Author
My concern is that the file is intended to test a completely different error. I wonder if it's worth going in with a hex editor to fix the bad list so the only error is the column mismatch. I think I'll file an issue on parquet-testing. |
||
| ); | ||
| } | ||
The data for this test is
The `x01` at the start decodes as a delta of 0 with a field type of `BooleanTrue`. Because the delta is 0, a varint is read to obtain the field id, which consumes the `0` and returns a field id of 0, which is then skipped as unknown. The `50` (hex `0x32`) encodes a delta of 3, with a field type of `BooleanFalse`. Because an `i64` is expected, the `82` (hex `0x52`) is consumed and returned as the value for `num_rows` (field 3). `65` (hex `0x41`) is delta 4 -> field 7, with a type of `BooleanTrue`. Field 7 is a list of structs, so the `73` (hex `0x49`) encodes 4 elements of type `List`. With the fix in this PR, the `List` is compared to the expected `Struct` type, and the decoder errors. Without the fix, decoding continues until a different error is detected.

Checking the expected field types would detect this even earlier (when the 3rd byte is consumed), but that is left for future work.
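
The byte-level walk-through above can be reproduced with a small standalone sketch. It is not the code from this PR: the function names and the error string format are hypothetical, and the type ids are assumed to follow the Thrift compact protocol (high nibble is the field-id delta or short list size, low nibble is the type id).

```rust
/// Split a compact-protocol field header byte into (field-id delta, type id).
/// A delta of 0 means the field id follows as a separate varint (not handled here).
fn split_field_header(byte: u8) -> (u8, u8) {
    (byte >> 4, byte & 0x0f)
}

/// Split a short-form list header byte into (element count, element type id).
/// A count nibble of 0xF means the real count follows as a varint (not handled here).
fn split_list_header(byte: u8) -> (u8, u8) {
    (byte >> 4, byte & 0x0f)
}

/// Human-readable names for compact-protocol type ids (illustrative naming).
fn type_name(id: u8) -> &'static str {
    match id {
        0 => "Stop",
        1 => "BooleanTrue",
        2 => "BooleanFalse",
        3 => "Byte",
        4 => "I16",
        5 => "I32",
        6 => "I64",
        7 => "Double",
        8 => "Binary",
        9 => "List",
        10 => "Set",
        11 => "Map",
        12 => "Struct",
        _ => "Unknown",
    }
}

/// Type id for a struct element in the compact protocol.
const STRUCT: u8 = 12;

/// Hypothetical validation step: reject a list whose encoded element type
/// differs from the type the schema expects; otherwise return the element count.
fn check_list_element_type(header: u8, expected: u8) -> Result<u8, String> {
    let (count, actual) = split_list_header(header);
    if actual != expected {
        return Err(format!(
            "Expected list element type of {} but got {}",
            type_name(expected),
            type_name(actual)
        ));
    }
    Ok(count)
}
```

Calling `check_list_element_type(73, STRUCT)` on the list header byte from the comment yields an error, because `73` (`0x49`) announces 4 elements of type `List` where a `Struct` is expected; a header of `0x4c` would pass and return a count of 4.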