GH-48198: [C++][Parquet] Fix all the testcase issues to enable Parque… #48200

Vishwanatha-HD · 2025-11-21T12:11:36Z

…t DB support on s390x

Rationale for this change

This PR enables Parquet DB support on big-endian (s390x) systems by fixing all the remaining test case failures. The fixes mainly involve byte swapping to handle endianness differences.
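As a rough illustration of the kind of byte swapping involved, here is a minimal standalone sketch, not code from this PR. The helper names (`HostIsLittleEndian`, `ByteSwap32`, `LittleEndianBytes`) are hypothetical and are not Arrow APIs. They show how a test could build its expected bytes in little-endian order on any host.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an Arrow API): true when the host stores the
// least significant byte first.
inline bool HostIsLittleEndian() {
  const std::uint16_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Hypothetical helper: reverse the byte order of a 32-bit value.
inline std::uint32_t ByteSwap32(std::uint32_t v) {
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// Hypothetical helper: the little-endian byte layout of `v`, regardless of
// the host byte order. On a big-endian host the value is swapped first.
inline std::array<unsigned char, 4> LittleEndianBytes(std::uint32_t v) {
  if (!HostIsLittleEndian()) v = ByteSwap32(v);
  std::array<unsigned char, 4> out{};
  std::memcpy(out.data(), &v, out.size());
  return out;
}

// Swapping twice must return the original value.
inline void CheckRoundTrip() {
  assert(ByteSwap32(ByteSwap32(0xDEADBEEFu)) == 0xDEADBEEFu);
}
```

With helpers like these, an expected buffer for `0x11223344` would be `44 33 22 11` on both x86 and s390x, so the same assertion can run on either architecture.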

What changes are included in this PR?

The fix changes the following test files:
cpp/src/arrow/dataset/file_parquet_test.cc
cpp/src/arrow/util/byte_stream_split_test.cc
cpp/src/parquet/arrow/arrow_reader_writer_test.cc
cpp/src/parquet/column_writer_test.cc
cpp/src/parquet/encoding_test.cc
cpp/src/parquet/level_conversion_test.cc
cpp/src/parquet/metadata_test.cc
cpp/src/parquet/reader_test.cc
cpp/src/parquet/statistics_test.cc
cpp/src/parquet/types_test.cc

Are these changes tested?

Yes. The changes were tested on s390x to confirm that the tests pass, and on x86 to confirm that no regressions were introduced.

Are there any user-facing changes?

No

GitHub main Issue link: #48151

GitHub Issue: [C++][Parquet] Fix all the Testcase issues to enable Parquet DB support on s390x #48198

github-actions · 2025-11-21T12:12:01Z

⚠️ GitHub issue #48198 has been automatically assigned in GitHub to PR creator.

k8ika0s · 2025-11-23T22:47:31Z

@Vishwanatha-HD

Looking over this batch of tests, a few patterns in how the expected bytes are formed caught my eye. Nothing alarming — just places where BE ends up taking a slightly different journey than LE depending on where the normalization happens.

• Half-floats: the assertions use the raw bits from the array, so BE naturally ends up comparing against host-order patterns unless those values get flipped before they reach the check.

• INT96 and other scalar stats: some expected strings are built straight from native-order limbs, which means the two architectures diverge a bit unless they’re pushed through one consistent conversion point.

• ByteStreamSplit: since the inputs to the encoder come in as native-order integers, the resulting split streams follow whatever the host layout is unless they’re normalized up front.

• Page-index and stats checks: a few comparisons still assume host-order for floats/doubles, while others already expect swapped limbs.

Individually these are all reasonable, but they do lead to small differences on BE because the tests anchor themselves at slightly different spots in the pipeline. Pulling the expectations toward one shared byte layout in a couple of places would likely smooth that out across hosts.
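To make the ByteStreamSplit point concrete, here is a minimal standalone sketch, not taken from the Arrow sources. It assumes 32-bit values and the usual byte-stream-split layout, where stream k holds byte k of every value. The helper names are hypothetical. Bytes are extracted with shifts rather than by reading host memory, so the split streams come out the same on LE and BE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: byte `i` of `v`, where i == 0 is the least
// significant byte. Shifts operate on the value, not on its memory
// layout, so the result does not depend on host byte order.
inline std::uint8_t LittleEndianByte(std::uint32_t v, std::size_t i) {
  assert(i < 4);
  return static_cast<std::uint8_t>((v >> (8 * i)) & 0xFFu);
}

// Sketch of a byte-stream-split layout for 32-bit values: four streams of
// length n are concatenated, and stream k holds byte k of every value.
inline std::vector<std::uint8_t> ByteStreamSplitEncode32(
    const std::vector<std::uint32_t>& values) {
  const std::size_t n = values.size();
  std::vector<std::uint8_t> out(4 * n);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < 4; ++k) {
      out[k * n + i] = LittleEndianByte(values[i], k);
    }
  }
  return out;
}
```

An encoder that instead copied bytes straight out of host memory would emit the streams in reverse order on a big-endian host, which is the kind of divergence described in the bullets above.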

Vishwanatha-HD · 2025-11-24T13:28:03Z

@k8ika0s, thanks for your review comments. Please note that the required conversion is already handled in the encoders and decoders, so the corresponding test cases need to be adjusted to match.
I have tested my changes on s390x systems and also on OpenShift AI workloads, and they work properly, so I see no concern with these changes.

Vishwanatha-HD requested a review from wgtmac as a code owner November 21, 2025 12:11

github-actions bot added Component: Parquet Component: C++ awaiting review Awaiting review labels Nov 21, 2025

This was referenced Nov 21, 2025

[C++][Parquet] Fix all the Testcase issues to enable Parquet DB support on s390x #48198

Open

[C++][Parquet] Enable Parquet DB support on Big Endian (IBM Z) systems #48151

Open

apacheGH-48198: [C++][Parquet] Fix all the testcase issues to enable …

21c045e

…Parquet DB support on s390x

Vishwanatha-HD force-pushed the fixTestCaseIssues branch from f0370d0 to 21c045e Compare November 22, 2025 04:33
