Skip to content

Conversation

@Vishwanatha-HD
Copy link
Contributor

@Vishwanatha-HD Vishwanatha-HD commented Nov 21, 2025

Rationale for this change

This PR is intended to enable Parquet DB support on Big-endian (s390x) systems. The fix in this PR fixes the "util & level_conversion" logic.

What changes are included in this PR?

The fix includes changes to following files:
cpp/src/parquet/level_conversion_inc.h
cpp/src/parquet/test_util.h

Are these changes tested?

Yes. The changes are tested on s390x arch to make sure things are working fine. The fix is also tested on x86 arch, to make sure there is no new regression introduced.

Are there any user-facing changes?

No.

GitHub main Issue link: #48151

@github-actions
Copy link

⚠️ GitHub issue #48218 has been automatically assigned in GitHub to PR creator.

@Vishwanatha-HD Vishwanatha-HD force-pushed the fixUtilLevelConvrsnInfra branch from c668e32 to b3539a4 Compare November 22, 2025 05:06
@kou kou changed the title GH-48218 Fix Util & Level Conversion logic to enable Parquet DB suppo… GH-48218: [C++][Parquet] Fix Util & Level Conversion logic to enable Parquet DB support on s390x Nov 22, 2025
@k8ika0s
Copy link

k8ika0s commented Nov 23, 2025

@Vishwanatha-HD

Something I’ve bumped into on s390x is that the swap decision sometimes needs to follow the encoded byte-order marker more closely than the compile-time ARROW_LITTLE_ENDIAN macro. A couple of the downstream readers (especially in geospatial stats) assume “LE on disk” is its own concept, even if the host is BE. So I’m curious how this change behaves when the host is big-endian but the consuming path is trying to normalize everything toward a canonical LE interpretation.

On the level-bitmap side, I’ve occasionally seen the bitmap comparison logic behave differently depending on whether the input originated on a BE machine or came directly from Arrow buffers that were already LE-canonical. Wrapping GreaterThanBitmap with a swap might be exactly what’s needed here — I’m mostly wondering how consistently the upstream paths feed this comparison LE vs. host-order data.

Not flagging this as wrong; just passing along some of the odd little behaviors I’ve seen on the BE side of things. Happy to help run this through a few combinations on actual s390x hardware if that would be useful.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you

@pitrou pitrou changed the title GH-48218: [C++][Parquet] Fix Util & Level Conversion logic to enable Parquet DB support on s390x GH-48218: [C++][Parquet] Fix Util & Level Conversion logic on big-endian Nov 24, 2025
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Nov 24, 2025
@pitrou pitrou merged commit 1044022 into apache:main Nov 24, 2025
49 of 53 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Nov 24, 2025
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 2 benchmarking runs that have been run so far on merge-commit 1044022.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Nov 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants