-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-48168: [C++][Parquet] Fix setting column-specific options when writing an encrypted Dataset #48170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This duplicates the Python test so we don't really need both.
cpp/src/parquet/properties.h
Outdated
| content_defined_chunking_options_( | ||
| properties.content_defined_chunking_options()) {} | ||
| properties.content_defined_chunking_options()) { | ||
| for (const auto& [col_path, col_props] : properties.column_properties_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that WriterProperties and WriterProperties::Builder use different forms to represent column properties. Is it possible to simplify this by using the same form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the builder needs to work this way so that different column-specific properties can be changed as well as changing default column properties without the order of operations affecting the result. I don't think the WriterProperties can be changed to use the Builder's representation without breaking the API as ColumnProperties is part of the public API and returned as a reference:
arrow/cpp/src/parquet/properties.h
Line 911 in e6ad2c4
| const ColumnProperties& column_properties( |
rok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, strange we don't currently do this. Just a minor change request.
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 39c865e. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Fixes creation of writer properties when writing a Parquet Dataset with encryption, so that column-specific settings aren't lost.
What changes are included in this PR?
Changes the
WriterProperties::Builderconstructor that uses existing properties so that it also sets any column-specific settings.Are these changes tested?
Yes.
Are there any user-facing changes?
Yes, this fixes a user-facing bug.