Conversation

@Vishwanatha-HD Vishwanatha-HD commented Nov 21, 2025

Rationale for this change

This PR enables Parquet DB support on big-endian (s390x) systems by fixing the column reader and writer logic. The column reader and writer are exercised by most of the parquet and arrow-parquet test cases.

What changes are included in this PR?

The fix includes changes to the following files:

- cpp/src/parquet/column_reader.cc
- cpp/src/parquet/column_writer.cc
- cpp/src/parquet/column_writer.h

Are these changes tested?

Yes. The changes were tested on the s390x architecture to verify correct behavior, and also on x86 to confirm that no new regressions were introduced.

Are there any user-facing changes?

No

GitHub main Issue link: #48151

@github-actions

⚠️ GitHub issue #48204 has been automatically assigned in GitHub to PR creator.

@kou kou changed the title GH-48204 Fix Column Reader & Writer logic to enable Parquet DB suppor… GH-48204: [C++][Parquet] Fix Column Reader & Writer logic to enable Parquet DB support on s390x Nov 22, 2025
Comment on lines 269 to +273
auto last_day_nanos = last_day_units * NanosecondsPerUnit;
#if ARROW_LITTLE_ENDIAN
// impala_timestamp will be unaligned every other entry so do memcpy instead
// of assign and reinterpret cast to avoid undefined behavior.
std::memcpy(impala_timestamp, &last_day_nanos, sizeof(int64_t));
#else
(*impala_timestamp).value[0] = static_cast<uint32_t>(last_day_nanos);
(*impala_timestamp).value[1] = static_cast<uint32_t>(last_day_nanos >> 32);
Member:

Can we use the following instead of #if?

auto last_day_nanos = last_day_units * NanosecondsPerUnit;
auto last_day_nanos_little_endian = ::arrow::bit_util::ToLittleEndian(last_day_nanos);
std::memcpy(impala_timestamp, &last_day_nanos_little_endian, sizeof(int64_t));

Comment on lines 137 to 141
#if ARROW_LITTLE_ENDIAN
if (num_bytes < 0 || num_bytes > data_size - 4) {
#else
if (num_bytes < 0 || num_bytes > data_size) {
#endif
Member:

@pitrou You added `- 4` in #6848. Do you think that we need `- 4` with big endian too?

Member:

Thanks for the ping @kou . I've re-read through this code and I now think the original change was a mistake. I'll submit a separate issue/PR to fix it.

Member:

@Vishwanatha-HD Can you rebase/merge from git main and remove this change?


k8ika0s commented Nov 23, 2025

@Vishwanatha-HD

Working through this one, I’m reminded how many odd little corners show up when Arrow’s layout meets Parquet’s expectations — especially around levels, decimals, and the legacy INT96 bits.

Looking at the pieces that overlap with the work I’ve been doing, the overall direction makes sense. A few notes from what I’ve seen on real s390x hardware:

• BIT_PACKED level headers
Your patch keeps the data_size - 4 bound under ARROW_LITTLE_ENDIAN, whereas my tree accepts the full BIT_PACKED buffer and logs failures rather than rejecting early. Neither approach is wrong, but on BE machines I've found that the "minus 4" guard sometimes rejects buffers that are actually fine, depending on how many values the upstream encoder produced.

• Decimal serialization
This is one of the trickier spots. Parquet expects decimals in a big-endian 128-bit payload, but Arrow materializes them in little-endian limbs even on BE hardware. In my implementation I reverse the Arrow words ([low, high] → [high, low]) before handing them to the writer so the final byte stream matches the canonical Parquet format.
Your patch uses ToBigEndian on each limb directly in host order, which works for many cases but can produce a differently ordered representation when Arrow’s in-memory layout doesn’t match the 128-bit big-endian wire format. Just sharing that in case you’ve seen similar behavior when mixing different decimal widths.

• Half-floats in FLBA
The BE path you added with ToLittleEndian(values[i]) aligns with the intent. I ended up staging the FLBA structs and the 2-byte payloads together in one scratch buffer, mostly because some downstream consumers treat the pointer lifetime very strictly. Either way, normalizing those 2-byte halves before page assembly helps avoid the cross-architecture drift I’ve run into.

• Paging / DoInBatches
Your rewrite to enforce max_rows_per_page is a meaningful cleanup. My patches didn’t touch this area, so no conflicts there — but just to mention it, keeping the paging logic predictable on BE made debugging the level stream quite a bit easier for me.

• INT96 (Impala timestamp)
Your implementation writes host-order limbs on BE and memcpy on LE. In my case I leaned heavily on always emitting LE limbs so the decode path doesn’t have to branch on architecture. Both approaches work as long as the corresponding reader expects the same convention.

None of this is blocking — just trying to pass along the details I’ve seen crop up when running the full parquet-encode → parquet-decode cycle on big-endian hardware.

if constexpr (std::is_same_v<ArrowType, ::arrow::Decimal64Type>) {
*p++ = ::arrow::bit_util::ToBigEndian(u64_in[0]);
} else if constexpr (std::is_same_v<ArrowType, ::arrow::Decimal128Type>) {
#if ARROW_LITTLE_ENDIAN
Member:

Please take a step back and read the comments above:

// Requires a custom serializer because decimal in parquet are in big-endian
// format. Thus, a temporary local buffer is required.

If we're on a big-endian system, this entire code is unnecessary and we can just use the FIXED_LEN_BYTE_ARRAY SerializeFunctor.

Contributor Author:

@pitrou, thanks for your review comments. I will work on this change in the next pass. Thanks.

Member:

@Vishwanatha-HD Please don't resolve discussions until they are actually resolved. This one hasn't been addressed.

Contributor Author:

@pitrou, OK, sure. Thanks.

> @Vishwanatha-HD Please don't resolve discussions until they are actually resolved. This one hasn't been addressed.

@Vishwanatha-HD (Contributor Author), Nov 28, 2025:

@pitrou, I have rebased this onto git main and removed the piece of code below:

-#if ARROW_LITTLE_ENDIAN
-      if (num_bytes < 0 || num_bytes > data_size - 4) {
-#else
       if (num_bytes < 0 || num_bytes > data_size) {  // <-- only retaining this line now
-#endif

@github-actions github-actions bot added awaiting committer review Awaiting committer review awaiting changes Awaiting changes and removed awaiting review Awaiting review awaiting committer review Awaiting committer review labels Nov 24, 2025

pitrou commented Nov 24, 2025

I'm frankly surprised that so few changes are required, given that Parquet C++ was never successfully tested on BE systems before. @Vishwanatha-HD Did you try to read the files in https://github.com/apache/parquet-testing/tree/master/data and check the contents were properly decoded?

@Vishwanatha-HD

> I'm frankly surprised that so few changes are required, given that Parquet C++ was never successfully tested on BE systems before. @Vishwanatha-HD Did you try to read the files in https://github.com/apache/parquet-testing/tree/master/data and check the contents were properly decoded?

Hi @pitrou,
Thanks for your comments. Please note that this PR is not the only change required to enable support on s390x; I have raised 12 other PRs. The main issue is #48151.
Please check that issue for links to all the remaining PRs.


pitrou commented Nov 24, 2025

Thanks @Vishwanatha-HD, and thanks for splitting it up like this.

@Vishwanatha-HD

> Thanks @Vishwanatha-HD, and thanks for splitting it up like this.

@pitrou, sure, my pleasure. Thanks a lot for spending so much time reviewing the code change and giving your comments. I appreciate it!

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 24, 2025
@Vishwanatha-HD left a comment:

I have addressed all the review comments. Thanks.

@Vishwanatha-HD Vishwanatha-HD force-pushed the fixColumnReaderWriter branch 2 times, most recently from 3e0b644 to f05e139 Compare November 28, 2025 15:26