Skip to content

Support nested repeated protobuf non-message fields#4748

Open
thirtiseven wants to merge 7 commits into
NVIDIA:mainfrom
thirtiseven:protobuf-jni-3b5-3b6-nested-repeated
Open

Support nested repeated protobuf non-message fields#4748
thirtiseven wants to merge 7 commits into
NVIDIA:mainfrom
thirtiseven:protobuf-jni-3b5-3b6-nested-repeated

Conversation

@thirtiseven

Copy link
Copy Markdown
Collaborator

Description

This adds the JNI-3b.5/JNI-3b.6 slice for from_protobuf: repeated non-message fields inside nested protobuf messages.

The change:

  • adds count/scan kernels for repeated fields inside nested parent messages
  • builds LIST children for nested repeated scalar, string, bytes, and enum-as-string fields
  • keeps nested repeated MessageType unsupported for a later repeated-message PR
  • adds Java coverage for nested packed repeated ints, enum-as-string, strings, and bytes

This is part of NVIDIA/cudf-spark#14069.

Testing

  • git diff --check
  • JNI build/local protobuf tests passed locally

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the from_protobuf JNI decode path to support repeated non-message fields nested inside protobuf messages, adding the necessary nested count/scan kernels on the CUDA side and expanding Java tests to cover nested packed repeated scalars, enum-as-string, strings, and bytes.

Changes:

  • Add CUDA kernel launch APIs + implementations to count/scan repeated field occurrences inside nested parent message locations.
  • Update nested struct building to construct LIST children for nested repeated scalar/string/bytes/enum-as-string fields (while continuing to defer repeated nested MessageType).
  • Add Java test coverage for nested repeated decoding, including presence/absence semantics and wire-type mismatch handling.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/test/java/com/nvidia/spark/rapids/jni/ProtobufTest.java Adds nested repeated tests + a helper to assert LIST offsets for nested repeated outputs.
src/main/cpp/src/protobuf/protobuf_kernels.cuh Declares new host launch wrappers for nested repeated count/scan kernels.
src/main/cpp/src/protobuf/protobuf_kernels.cu Implements nested repeated count/scan kernels and updates nested repeated validation behavior.
src/main/cpp/src/protobuf/protobuf_host_helpers.hpp Adjusts nested repeated child builder declaration signature to match new implementation.
src/main/cpp/src/protobuf/protobuf_builders.cu Builds LIST children for nested repeated non-message fields using the new nested repeated kernels.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/main/cpp/src/protobuf/protobuf_kernels.cu Outdated
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven marked this pull request as ready for review June 26, 2026 09:35
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds support for decoding repeated non-message protobuf fields that are nested inside parent struct messages, completing the JNI-3b.5/JNI-3b.6 slice. It introduces a two-pass count/scan GPU kernel pair that enumerates repeated-field occurrences within nested parent message boundaries, then builds LIST children for scalar, string, bytes, and enum-as-string repeated fields.

  • Adds count_repeated_in_nested_kernel / scan_repeated_in_nested_kernel with a shared repeated_wire_type_matches predicate; wire-type mismatches (non-packed) are now silently skipped rather than raising an error in strict mode (a deliberate trade-off discussed in prior review threads).
  • Introduces three lightweight view structs (protobuf_input_view, nested_parent_view, protobuf_decode_state) that replace the long flat argument lists of build_nested_struct_column / build_repeated_child_list_column, and adds six Java test methods covering packed ints, enum-as-string, strings, bytes, empty/absent parents, and deep nesting.

Confidence Score: 5/5

The new count/scan kernel pair is well-guarded: null parents are handled early, over- and under-write mismatches set the error flag, bounds are checked via CUDF_EXPECTS, and buffer ownership through list_offsets.release() is safe because all operations share the same CUDA stream.

All new GPU kernels correctly initialise output arrays, guard against null parent locations, and cross-check their count/scan passes through REPEATED_COUNT_MISMATCH detection. The list offset construction, null-mask derivation from parent locations, and lazy top_row_indices computation are logically sound. The one structural concern — one host-device sync per repeated field rather than a batched approach — is a future performance consideration, not a correctness issue.

protobuf_builders.cu deserves a second look when schema batching for nested repeated fields is considered in a follow-up PR, since build_repeated_child_list_column currently inserts a thrust::reduce host sync per field.

Important Files Changed

Filename Overview
src/main/cpp/src/protobuf/protobuf_builders.cu Adds make_list_offsets_from_counts, make_null_mask_from_parent_locations, make_single_repeated_field_schema, make_list_column_with_parent_nulls, and the main build_repeated_child_list_column. The num_rows>0
src/main/cpp/src/protobuf/protobuf_kernels.cu Adds count_repeated_in_nested_kernel, scan_repeated_in_nested_kernel, and repeated_wire_type_matches. Null parents are handled early, over/under-write mismatches set REPEATED_COUNT_MISMATCH, and message bounds are checked in both kernels.
src/main/cpp/src/protobuf/protobuf_host_helpers.hpp Introduces protobuf_input_view, nested_parent_view, and protobuf_decode_state view structs. Raw-pointer members in protobuf_decode_state are semantically non-owning mutable aliases (noted in prior review thread as intentional). Validate helpers provide clear pre-condition checking at each entry point.
src/main/cpp/src/protobuf/protobuf_kernels.cuh Adds launch_count_repeated_in_nested and launch_scan_repeated_in_nested declarations, and defines nested_repeated_location_provider which correctly computes data_offset as row_start + parent_offset + occ.offset.
src/main/cpp/src/protobuf/protobuf.cu Mechanical refactor of the build_nested_struct_column call site to use the new view types. Correct.
src/test/java/com/nvidia/spark/rapids/jni/ProtobufTest.java Adds six test methods covering packed ints, enum-as-string, strings, bytes, empty/absent parents, and deep nesting. Wrong-wire-type test correctly updated to reflect the new skip-not-error behaviour. assertListOffsets helper is a clean addition.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Host as build_repeated_child_list_column
    participant CK as count_repeated_in_nested_kernel (GPU)
    participant H as Host (reduce)
    participant SK as scan_repeated_in_nested_kernel (GPU)
    participant EK as extract/enum kernels (GPU)
    participant Col as cudf::make_lists_column

    Host->>CK: launch (parent.num_rows threads) fill d_rep_info[row].count
    CK-->>Host: async on stream
    Host->>H: thrust::transform d_rep_info to d_rep_counts then thrust::reduce to total_count_64 host sync
    H-->>Host: total_count int64 to int32 after check
    Host->>Host: make_list_offsets_from_counts exclusive_scan d_rep_counts plus fill sentinel
    Host->>SK: launch (parent.num_rows threads) fill d_occurrences write_idx to write_end
    SK-->>Host: async on stream
    Host->>EK: extract_typed_column or extract_varint or extract_and_build_string_or_bytes_column total_count threads nested_repeated_location_provider
    EK-->>Host: child_values column
    Host->>Col: make_list_column_with_parent_nulls offsets_col plus child_values plus parent null mask
    Col-->>Host: LIST column
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Host as build_repeated_child_list_column
    participant CK as count_repeated_in_nested_kernel (GPU)
    participant H as Host (reduce)
    participant SK as scan_repeated_in_nested_kernel (GPU)
    participant EK as extract/enum kernels (GPU)
    participant Col as cudf::make_lists_column

    Host->>CK: launch (parent.num_rows threads) fill d_rep_info[row].count
    CK-->>Host: async on stream
    Host->>H: thrust::transform d_rep_info to d_rep_counts then thrust::reduce to total_count_64 host sync
    H-->>Host: total_count int64 to int32 after check
    Host->>Host: make_list_offsets_from_counts exclusive_scan d_rep_counts plus fill sentinel
    Host->>SK: launch (parent.num_rows threads) fill d_occurrences write_idx to write_end
    SK-->>Host: async on stream
    Host->>EK: extract_typed_column or extract_varint or extract_and_build_string_or_bytes_column total_count threads nested_repeated_location_provider
    EK-->>Host: child_values column
    Host->>Col: make_list_column_with_parent_nulls offsets_col plus child_values plus parent null mask
    Col-->>Host: LIST column
Loading

Reviews (2): Last reviewed commit: "address comments" | Re-trigger Greptile

Comment thread src/main/cpp/src/protobuf/protobuf_kernels.cu
Comment thread src/main/cpp/src/protobuf/protobuf_host_helpers.hpp
Comment thread src/main/cpp/src/protobuf/protobuf_builders.cu
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven self-assigned this Jun 26, 2026
// message Outer { Inner inner = 1; }
byte[] packedIds = concatBytes(encodeVarint(10), encodeVarint(20), encodeVarint(30));
Byte[] inner = concat(
box(tag(1, WT_LEN)), box(encodeVarint(packedIds.length)), box(packedIds));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just encodeBytes, right? Also for all the other packed-payload sites in the new tests…

Suggested change
box(tag(1, WT_LEN)), box(encodeVarint(packedIds.length)), box(packedIds));
box(tag(1, WT_LEN)), encodeBytes(packedIds));

// message Outer { Inner inner = 1; }
Byte[][] rows = new Byte[][] {
concat(box(tag(1, WT_LEN)), encodeMessage(new Byte[]{})),
new Byte[] {}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Optional] Should we be consistent about the spaces in these? We have a bunch with no space between ] and {, and a bunch with. Let's pick one?

assertEquals(0, inner.getNullCount(), "Nested struct rows should stay present");
assertFalse(hostInner.isNull(0));
assertFalse(hostInner.isNull(1));
assertListOffsets(priorities, 0, 2, 4);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Really optional] I wonder if we should be using a higher-level comparison, something like assertStructColumnsAreEqual , or even assertColumnsAreEqual

struct protobuf_input_view {
uint8_t const* message_data;
cudf::size_type message_data_size;
cudf::size_type const* row_offsets;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplifying multi-argument functions by grouping arguments into structs is a good call, but row_offsets requires num_rows, so I would move the latter to protobuf_input_view to guarantee they are provided together. nested_parent_view would be impossible to use without protobuf_input_view, but it's never used without protobuf_input_view anyway.

Also, protobuf_input_view and protobuf_decode_state can now be reused in many other functions — are you leaving that cleanup for a follow-up?

struct protobuf_decode_state {
rmm::device_uvector<bool>* row_force_null;
rmm::device_uvector<protobuf_error>* error;
bool propagate_invalid_enum_rows;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the associated parameter defaulted to true, you could default this one to true as well — then you'd only need to initialize it when overriding to false

auto const expected_wire_type = get_expected_wire_type(f);
// Values come from the dedicated nested repeated path; here we only validate compatible
// occurrences so malformed payloads surface while wrong-wire occurrences remain unknown.
if (!repeated_wire_type_matches(wt, expected_wire_type)) { return true; }

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Optional] This is the same check as at the start of walk_repeated_element, so this is just to prevent that from reporting protobuf_error::WIRE_TYPE, right? Is it possible to change walk_repeated_element to have a mode where it simply returns true instead of reporting an error? Especially since this appears more than once…

* schema slice whose repeated fields are already known to be direct children of the parent
* message location.
*/
CUDF_KERNEL void count_repeated_in_nested_kernel(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much overlap is there with count_repeated_fields_kernel?

This is the same overlap I flagged earlier in #4644; we can keep deferring, but the eventual refactor surface grows as a result…

count_action);
};

if (!scan_message_field_locations(msg_start,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Optional] This effectively ignores the return value, so should we just assign it to a dummy variable?

Comment on lines +1224 to +1225
auto const parent =
nested_parent_view{d_parent_locs.data(), d_parent_locs.size(), num_rows, nullptr};

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This local doesn't help you — you can inline it in the call (and lose the type)…

state.error->data(),
stream);

rmm::device_uvector<int32_t> d_rep_counts(parent.num_rows, stream, scratch_mr);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use make_transform_iterator instead (#4504 (comment))?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants