Support nested repeated protobuf non-message fields#4748
Conversation
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
There was a problem hiding this comment.
Pull request overview
This PR extends the from_protobuf JNI decode path to support repeated non-message fields nested inside protobuf messages, adding the necessary nested count/scan kernels on the CUDA side and expanding Java tests to cover nested packed repeated scalars, enum-as-string, strings, and bytes.
Changes:
- Add CUDA kernel launch APIs + implementations to count/scan repeated field occurrences inside nested parent message locations.
- Update nested struct building to construct LIST children for nested repeated scalar/string/bytes/enum-as-string fields (while continuing to defer repeated nested
MessageType). - Add Java test coverage for nested repeated decoding, including presence/absence semantics and wire-type mismatch handling.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/test/java/com/nvidia/spark/rapids/jni/ProtobufTest.java | Adds nested repeated tests + a helper to assert LIST offsets for nested repeated outputs. |
| src/main/cpp/src/protobuf/protobuf_kernels.cuh | Declares new host launch wrappers for nested repeated count/scan kernels. |
| src/main/cpp/src/protobuf/protobuf_kernels.cu | Implements nested repeated count/scan kernels and updates nested repeated validation behavior. |
| src/main/cpp/src/protobuf/protobuf_host_helpers.hpp | Adjusts nested repeated child builder declaration signature to match new implementation. |
| src/main/cpp/src/protobuf/protobuf_builders.cu | Builds LIST children for nested repeated non-message fields using the new nested repeated kernels. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Greptile SummaryThis PR adds support for decoding repeated non-message protobuf fields that are nested inside parent struct messages, completing the JNI-3b.5/JNI-3b.6 slice. It introduces a two-pass count/scan GPU kernel pair that enumerates repeated-field occurrences within nested parent message boundaries, then builds LIST children for scalar, string, bytes, and enum-as-string repeated fields.
Confidence Score: 5/5The new count/scan kernel pair is well-guarded: null parents are handled early, over- and under-write mismatches set the error flag, bounds are checked via CUDF_EXPECTS, and buffer ownership through list_offsets.release() is safe because all operations share the same CUDA stream. All new GPU kernels correctly initialise output arrays, guard against null parent locations, and cross-check their count/scan passes through REPEATED_COUNT_MISMATCH detection. The list offset construction, null-mask derivation from parent locations, and lazy top_row_indices computation are logically sound. The one structural concern — one host-device sync per repeated field rather than a batched approach — is a future performance consideration, not a correctness issue. protobuf_builders.cu deserves a second look when schema batching for nested repeated fields is considered in a follow-up PR, since build_repeated_child_list_column currently inserts a thrust::reduce host sync per field. Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Host as build_repeated_child_list_column
participant CK as count_repeated_in_nested_kernel (GPU)
participant H as Host (reduce)
participant SK as scan_repeated_in_nested_kernel (GPU)
participant EK as extract/enum kernels (GPU)
participant Col as cudf::make_lists_column
Host->>CK: launch (parent.num_rows threads) fill d_rep_info[row].count
CK-->>Host: async on stream
Host->>H: thrust::transform d_rep_info to d_rep_counts then thrust::reduce to total_count_64 host sync
H-->>Host: total_count int64 to int32 after check
Host->>Host: make_list_offsets_from_counts exclusive_scan d_rep_counts plus fill sentinel
Host->>SK: launch (parent.num_rows threads) fill d_occurrences write_idx to write_end
SK-->>Host: async on stream
Host->>EK: extract_typed_column or extract_varint or extract_and_build_string_or_bytes_column total_count threads nested_repeated_location_provider
EK-->>Host: child_values column
Host->>Col: make_list_column_with_parent_nulls offsets_col plus child_values plus parent null mask
Col-->>Host: LIST column
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant Host as build_repeated_child_list_column
participant CK as count_repeated_in_nested_kernel (GPU)
participant H as Host (reduce)
participant SK as scan_repeated_in_nested_kernel (GPU)
participant EK as extract/enum kernels (GPU)
participant Col as cudf::make_lists_column
Host->>CK: launch (parent.num_rows threads) fill d_rep_info[row].count
CK-->>Host: async on stream
Host->>H: thrust::transform d_rep_info to d_rep_counts then thrust::reduce to total_count_64 host sync
H-->>Host: total_count int64 to int32 after check
Host->>Host: make_list_offsets_from_counts exclusive_scan d_rep_counts plus fill sentinel
Host->>SK: launch (parent.num_rows threads) fill d_occurrences write_idx to write_end
SK-->>Host: async on stream
Host->>EK: extract_typed_column or extract_varint or extract_and_build_string_or_bytes_column total_count threads nested_repeated_location_provider
EK-->>Host: child_values column
Host->>Col: make_list_column_with_parent_nulls offsets_col plus child_values plus parent null mask
Col-->>Host: LIST column
Reviews (2): Last reviewed commit: "address comments" | Re-trigger Greptile |
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
| // message Outer { Inner inner = 1; } | ||
| byte[] packedIds = concatBytes(encodeVarint(10), encodeVarint(20), encodeVarint(30)); | ||
| Byte[] inner = concat( | ||
| box(tag(1, WT_LEN)), box(encodeVarint(packedIds.length)), box(packedIds)); |
There was a problem hiding this comment.
This is just encodeBytes, right? Also for all the other packed-payload sites in the new tests…
| box(tag(1, WT_LEN)), box(encodeVarint(packedIds.length)), box(packedIds)); | |
| box(tag(1, WT_LEN)), encodeBytes(packedIds)); |
| // message Outer { Inner inner = 1; } | ||
| Byte[][] rows = new Byte[][] { | ||
| concat(box(tag(1, WT_LEN)), encodeMessage(new Byte[]{})), | ||
| new Byte[] {} |
There was a problem hiding this comment.
[Optional] Should we be consistent about the spaces in these? We have a bunch with no space between ] and {, and a bunch with. Let's pick one?
| assertEquals(0, inner.getNullCount(), "Nested struct rows should stay present"); | ||
| assertFalse(hostInner.isNull(0)); | ||
| assertFalse(hostInner.isNull(1)); | ||
| assertListOffsets(priorities, 0, 2, 4); |
There was a problem hiding this comment.
[Really optional] I wonder if we should be using a higher-level comparison, something like assertStructColumnsAreEqual , or even assertColumnsAreEqual…
| struct protobuf_input_view { | ||
| uint8_t const* message_data; | ||
| cudf::size_type message_data_size; | ||
| cudf::size_type const* row_offsets; |
There was a problem hiding this comment.
Simplifying multi-argument functions by grouping arguments into structs is a good call, but row_offsets requires num_rows, so I would move the latter to protobuf_input_view to guarantee they are provided together. nested_parent_view would be impossible to use without protobuf_input_view, but it's never used without protobuf_input_view anyway.
Also, protobuf_input_view and protobuf_decode_state can now be reused in many other functions — are you leaving that cleanup for a follow-up?
| struct protobuf_decode_state { | ||
| rmm::device_uvector<bool>* row_force_null; | ||
| rmm::device_uvector<protobuf_error>* error; | ||
| bool propagate_invalid_enum_rows; |
There was a problem hiding this comment.
Since the associated parameter defaulted to true, you could default this one to true as well — then you'd only need to initialize it when overriding to false…
| auto const expected_wire_type = get_expected_wire_type(f); | ||
| // Values come from the dedicated nested repeated path; here we only validate compatible | ||
| // occurrences so malformed payloads surface while wrong-wire occurrences remain unknown. | ||
| if (!repeated_wire_type_matches(wt, expected_wire_type)) { return true; } |
There was a problem hiding this comment.
[Optional] This is the same check as at the start of walk_repeated_element, so this is just to prevent that from reporting protobuf_error::WIRE_TYPE, right? Is it possible to change walk_repeated_element to have a mode where it simply returns true instead of reporting an error? Especially since this appears more than once…
| * schema slice whose repeated fields are already known to be direct children of the parent | ||
| * message location. | ||
| */ | ||
| CUDF_KERNEL void count_repeated_in_nested_kernel( |
There was a problem hiding this comment.
How much overlap is there with count_repeated_fields_kernel?
This is the same overlap I flagged earlier in #4644; we can keep deferring, but the eventual refactor surface grows as a result…
| count_action); | ||
| }; | ||
|
|
||
| if (!scan_message_field_locations(msg_start, |
There was a problem hiding this comment.
[Optional] This effectively ignores the return value, so should we just assign it to a dummy variable?
| auto const parent = | ||
| nested_parent_view{d_parent_locs.data(), d_parent_locs.size(), num_rows, nullptr}; |
There was a problem hiding this comment.
This local doesn't help you — you can inline it in the call (and lose the type)…
| state.error->data(), | ||
| stream); | ||
|
|
||
| rmm::device_uvector<int32_t> d_rep_counts(parent.num_rows, stream, scratch_mr); |
There was a problem hiding this comment.
Can we use make_transform_iterator instead (#4504 (comment))?
Description
This adds the JNI-3b.5/JNI-3b.6 slice for
from_protobuf: repeated non-message fields inside nested protobuf messages.The change:
MessageTypeunsupported for a later repeated-message PRThis is part of NVIDIA/cudf-spark#14069.
Testing
git diff --check