sclmn commented Nov 19, 2025

Currently, the code panics, which prevents handling Parquet files that have a 0-length dictionary and null values in the column.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Nov 19, 2025
etseidl (Contributor) commented Nov 20, 2025

Thanks @sclmn. Do you have a reproducer for this? I tried running your added test without your changes, and the result is not a panic, but the expected Err. Is something else panicking because it doesn't handle the error correctly?

sclmn (Author) commented Nov 20, 2025

Sorry, I didn't realize this got submitted.

The ideal fix is not to return an error. For a column chunk with just nulls a dictionary with 0 length seems fine.

etseidl (Contributor) commented Nov 20, 2025

> For a column chunk with just nulls a dictionary with 0 length seems fine.

But the Parquet spec mandates that at least the bitwidth is written to the data page. https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8

0 would be a valid bitwidth, followed by no encoded data.
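To make the layout concrete: per the linked Encodings document, the values section of a dictionary-encoded data page starts with one byte holding the bit width of the dictionary indices (maximum 32), followed by the RLE/bit-packed indices. The following is a minimal sketch of that rule, not the parquet crate's implementation; `IndexPageError` and `split_dictionary_indices` are hypothetical names introduced only for illustration.

```rust
/// Hypothetical error type for this sketch; not the parquet crate's error type.
#[derive(Debug, PartialEq)]
pub enum IndexPageError {
    /// The values section was empty, so the bit-width byte is missing.
    MissingBitWidth,
    /// The bit-width byte exceeds the documented maximum of 32.
    BitWidthTooLarge(u8),
}

/// Splits the values section of a dictionary-encoded data page into
/// (bit width, remaining RLE/bit-packed index bytes).
pub fn split_dictionary_indices(values: &[u8]) -> Result<(u8, &[u8]), IndexPageError> {
    // The first byte is the bit width used to encode the dictionary indices.
    let (&bit_width, rest) = values
        .split_first()
        .ok_or(IndexPageError::MissingBitWidth)?;
    // The format documentation caps the bit width at 32.
    if bit_width > 32 {
        return Err(IndexPageError::BitWidthTooLarge(bit_width));
    }
    Ok((bit_width, rest))
}
```

With this sketch, a single `0` byte parses as bit width 0 followed by no encoded data, while a completely empty values section is reported as an error.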

@sclmn
Copy link
Author

sclmn commented Nov 20, 2025

Thanks, that makes sense.

The C++ version allows this and some writers produce files like this.
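For illustration only, one way a reader could tolerate such files is to treat a missing bit-width byte as bit width 0 when the page contains no non-null values. This sketch assumes the problem files omit the bit-width byte from the data page; it does not describe what the C++ reader or the parquet crate actually does, and `lenient_bit_width` and its parameters are hypothetical names.

```rust
/// Hypothetical helper: returns the bit width for a dictionary index page,
/// tolerating an empty values section only when the page has no non-null values.
pub fn lenient_bit_width(values: &[u8], non_null_count: usize) -> Option<u8> {
    match values.first() {
        // Normal case: the first byte is the bit width.
        Some(&width) => Some(width),
        // Assumption: with nothing to decode, a missing byte is read as width 0.
        None if non_null_count == 0 => Some(0),
        // Values are expected but the bit-width byte is missing: reject the page.
        None => None,
    }
}
```

The trade-off is that this accepts pages the format documentation does not allow, so a stricter reader would still return an error here.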

@sclmn sclmn closed this Nov 20, 2025
