14 changes: 14 additions & 0 deletions cpp/src/arrow/util/bit_stream_utils_internal.h
@@ -365,6 +365,9 @@ inline bool BitReader::GetVlqInt(Int* v) {
// In all case, we read a byte-aligned value, skipping remaining bits
const uint8_t* data = NULLPTR;
int max_size = 0;
#if ARROW_LITTLE_ENDIAN
// The data that we will pass to the LEB128 parser.
// In all case, we read a byte-aligned value, skipping remaining bits.

// Number of bytes left in the buffered values, not including the current
// byte (i.e., there may be an additional fraction of a byte).
@@ -381,6 +384,17 @@ inline bool BitReader::GetVlqInt(Int* v) {
max_size = bytes_left();
data = buffer_ + (max_bytes_ - max_size);
}
#else
Member

I agree with @k8ika0s that this adds a bunch of BE-specific code where we could instead simplify the existing code: the "cache" isn't really useful here, since the decoding step is the same either way.

I'm experimenting with this simplification in this PR: #48237, mostly to run benchmarks, as the code should be correct anyway.
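
To illustrate why the decoding step does not depend on host endianness, here is a minimal sketch of an unsigned LEB128 decoder. The name `ParseLeb128Sketch` and its signature are hypothetical and only for illustration; this is not Arrow's `bit_util::ParseLeadingLEB128`, and it assumes a 32-bit unsigned output with no overflow check on the final byte.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only; NOT Arrow's implementation.
// Decodes an unsigned LEB128 value from `data`, reading at most `max_size` bytes.
// Returns the number of bytes consumed, or 0 if the input is truncated or
// longer than a 32-bit value allows.
inline std::size_t ParseLeb128Sketch(const std::uint8_t* data, std::size_t max_size,
                                     std::uint32_t* out) {
  constexpr std::size_t kMaxBytes = 5;  // ceil(32 / 7) bytes for a 32-bit value
  std::uint32_t result = 0;
  for (std::size_t i = 0; i < max_size && i < kMaxBytes; ++i) {
    const std::uint8_t byte = data[i];
    // The low 7 bits carry payload. Bytes are read one at a time, so the
    // host's byte order never enters the computation.
    result |= static_cast<std::uint32_t>(byte & 0x7F) << (7 * i);
    if ((byte & 0x80) == 0) {  // a clear high bit marks the last byte
      *out = result;
      return i + 1;
    }
  }
  return 0;  // ran out of input (or exceeded kMaxBytes) before a terminator
}
```

For example, the bytes `0xE5 0x8E 0x26` decode to 624485 with 3 bytes consumed, on both little-endian and big-endian hosts, because only a byte pointer and a length are involved.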

Contributor Author

@kou, I modified the s390x code path as suggested above, i.e.:
max_size = bytes_left();
data = buffer_ + (max_bytes_ - max_size);

Unfortunately, that didn't work. It seems that we are setting max_size and data the same way as on LE systems, but things behave differently on s390x. Hence, I am unable to make this change. Thanks.

// For VLQ reading, always read directly from buffer to avoid endianness issues
// with buffered_values_ on big-endian systems like s390x.
// Calculate current position in buffer accounting for bit offset.
const int current_byte_offset = byte_offset_ + bit_util::BytesForBits(bit_offset_);
const int bytes_left_in_buffer = max_bytes_ - current_byte_offset;

// Always read from buffer directly to avoid endianness issues
data = buffer_ + current_byte_offset;
max_size = bytes_left_in_buffer;
#endif

const auto bytes_read = bit_util::ParseLeadingLEB128(data, max_size, v);
if (ARROW_PREDICT_FALSE(bytes_read == 0)) {