@Vishwanatha-HD Vishwanatha-HD commented Nov 21, 2025

…t DB support on s390x

Rationale for this change

This PR enables Parquet DB support on big-endian (s390x) systems. It fixes all the remaining test-case failures, mainly by byte-swapping values where needed to handle endianness differences.
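
As a rough illustration of the kind of byte swapping involved (a sketch, not code taken from this PR): Parquet's PLAIN encoding stores fixed-width values in little-endian order, so a test can build its expected bytes with a helper that does not depend on the host byte order. The names `ToLittleEndianBytes` and `FloatToLittleEndianBytes` are hypothetical and are not Arrow APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical test helper (not part of Arrow): serialize an unsigned integer
// into little-endian bytes, independent of the host byte order.
template <typename T>
std::vector<uint8_t> ToLittleEndianBytes(T value) {
  static_assert(std::is_unsigned<T>::value, "use an unsigned integer type");
  std::vector<uint8_t> out(sizeof(T));
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    // Shifting extracts byte i by significance, so the result is the same
    // on little-endian (x86) and big-endian (s390x) hosts.
    out[i] = static_cast<uint8_t>(value >> (8 * i));
  }
  return out;
}

// Hypothetical helper: little-endian bytes of a float's IEEE 754 bit pattern.
inline std::vector<uint8_t> FloatToLittleEndianBytes(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // copy the bit pattern safely
  return ToLittleEndianBytes(bits);
}
```

Because the bytes are taken by shifting rather than by reinterpreting memory, an expectation built this way should be identical on both architectures.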

What changes are included in this PR?

The fix includes changes to the following test files:
cpp/src/arrow/dataset/file_parquet_test.cc
cpp/src/arrow/util/byte_stream_split_test.cc
cpp/src/parquet/arrow/arrow_reader_writer_test.cc
cpp/src/parquet/column_writer_test.cc
cpp/src/parquet/encoding_test.cc
cpp/src/parquet/level_conversion_test.cc
cpp/src/parquet/metadata_test.cc
cpp/src/parquet/reader_test.cc
cpp/src/parquet/statistics_test.cc
cpp/src/parquet/types_test.cc

Are these changes tested?

Yes. The changes were tested on s390x to confirm that the tests pass, and on x86 to confirm that no new regressions were introduced.

Are there any user-facing changes?

No

GitHub main Issue link: #48151

@github-actions

⚠️ GitHub issue #48198 has been automatically assigned in GitHub to PR creator.

k8ika0s commented Nov 23, 2025

@Vishwanatha-HD

Looking over this batch of tests, a few patterns in how the expected bytes are formed caught my eye. Nothing alarming — just places where BE ends up taking a slightly different journey than LE depending on where the normalization happens.

• Half-floats: the assertions use the raw bits from the array, so BE naturally ends up comparing against host-order patterns unless those values get flipped before they reach the check.

• INT96 and other scalar stats: some expected strings are built straight from native-order limbs, which means the two architectures diverge a bit unless they’re pushed through one consistent conversion point.

• ByteStreamSplit: since the inputs to the encoder come in as native-order integers, the resulting split streams follow whatever the host layout is unless they’re normalized up front.

• Page-index and stats checks: a few comparisons still assume host-order for floats/doubles, while others already expect swapped limbs.

Individually these are all reasonable, but they do lead to small differences on BE because the tests anchor themselves at slightly different spots in the pipeline. Pulling the expectations toward one shared byte layout in a couple of places would likely smooth that out across hosts.
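
To make the ByteStreamSplit point concrete, here is a minimal sketch of a reference encoder that anchors expectations to one byte layout. It assumes the streams are defined over the little-endian representation of each value, as with Parquet's PLAIN encoding; `ReferenceByteStreamSplit` is a hypothetical name and is not Arrow's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference encoder for tests: stream k holds byte k (counted
// from the least significant byte) of every value, and the streams are
// concatenated in order.
std::vector<uint8_t> ReferenceByteStreamSplit(const std::vector<uint32_t>& values) {
  const std::size_t n = values.size();
  constexpr std::size_t kWidth = sizeof(uint32_t);
  std::vector<uint8_t> out(n * kWidth);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < kWidth; ++k) {
      // Extract bytes by shifting rather than by reading host memory, so
      // the expected streams do not depend on the host byte order.
      out[k * n + i] = static_cast<uint8_t>(values[i] >> (8 * k));
    }
  }
  return out;
}
```

For the inputs `0x11223344` and `0xAABBCCDD`, this yields the streams `44 DD`, `33 CC`, `22 BB`, `11 AA` on any host.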

@Vishwanatha-HD
Contributor Author

@k8ika0s, thanks for your review comments. Please note that the required conversion is already handled in the encoders and decoders, so the corresponding test cases need to be adjusted accordingly.
I have tested my changes on s390x systems and also with OpenShift AI workloads, and they work properly, so I see no concern with these changes.
