
Add JNI Variant extraction support #4756

Open
nartal1 wants to merge 5 commits into NVIDIA:main from nartal1:variant-support-jni

@nartal1 nartal1 commented Jun 26, 2026

Collaborator

Description

Adds VariantUtils Java/JNI bindings for cuDF's experimental Parquet Variant extraction APIs:

  • get_variant_field → extract raw Variant LIST<UINT8> value bytes for a path
  • cast_variant → decode raw Variant value bytes to a supported cuDF type
  • extract_variant_field → extract and decode in one native call
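To make "decode raw Variant value bytes" concrete, here is a minimal CPU-side sketch of decoding a single integer value. It is not the cuDF implementation. The header layout (low 2 bits for the basic type, high 6 bits for the primitive type id, little-endian payload) and the type ids follow the Parquet Variant encoding spec as I understand it. The class name `VariantIntDecodeSketch` is hypothetical, and returning an empty result for anything that is not an integer primitive is an assumption, not a statement about what `cast_variant` does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.OptionalLong;

/** Illustration only: decodes one Variant integer primitive on the CPU. */
final class VariantIntDecodeSketch {
  // Primitive type ids, assumed from the Parquet Variant encoding spec.
  private static final int INT8 = 3;
  private static final int INT16 = 4;
  private static final int INT32 = 5;
  private static final int INT64 = 6;

  static OptionalLong decodeInteger(byte[] value) {
    if (value == null || value.length == 0) {
      return OptionalLong.empty();
    }
    int header = value[0] & 0xFF;
    if ((header & 0x3) != 0) {
      return OptionalLong.empty(); // basic type is not "primitive"
    }
    int typeId = header >>> 2; // high 6 bits carry the primitive type id
    int width;
    switch (typeId) {
      case INT8: width = 1; break;
      case INT16: width = 2; break;
      case INT32: width = 4; break;
      case INT64: width = 8; break;
      default: return OptionalLong.empty(); // not an integer primitive
    }
    ByteBuffer body =
        ByteBuffer.wrap(value, 1, value.length - 1).order(ByteOrder.LITTLE_ENDIAN);
    if (body.remaining() < width) {
      return OptionalLong.empty(); // truncated payload
    }
    switch (width) {
      case 1: return OptionalLong.of(body.get());
      case 2: return OptionalLong.of(body.getShort());
      case 4: return OptionalLong.of(body.getInt());
      default: return OptionalLong.of(body.getLong());
    }
  }
}
```

Under these assumptions, the bytes `0x14 2A 00 00 00` decode as the INT32 value 42, because `0x14 >>> 2` is 5 and the low two bits are 0.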

The Spark RAPIDS integration will be added in a follow-up PR after this JNI API lands.

The wrapper currently supports the cuDF path grammar for object fields, e.g. x, $.x, and $.a.b. Supported target types are STRING, INT8, INT16, INT32, and INT64.
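As a rough illustration of the path shapes listed above, the sketch below splits a path into object field names. It assumes only an optional `$.` prefix and dot-separated, non-empty field names. The real grammar lives in cuDF and may accept or reject more than this. `VariantPathSketch` and `splitPath` are hypothetical names, and the exception types are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustration only: a simplified reading of the object-field path shapes. */
final class VariantPathSketch {
  static List<String> splitPath(String path) {
    if (path == null) {
      throw new NullPointerException("path");
    }
    // Assumption: "$." is an optional root prefix and carries no field name.
    String body = path.startsWith("$.") ? path.substring(2) : path;
    if (body.isEmpty()) {
      throw new IllegalArgumentException("empty path");
    }
    List<String> fields = new ArrayList<>();
    // limit -1 keeps trailing empty segments so "a." is rejected too
    for (String field : body.split("\\.", -1)) {
      if (field.isEmpty()) {
        throw new IllegalArgumentException("malformed path: " + path);
      }
      fields.add(field);
    }
    return Collections.unmodifiableList(fields);
  }
}
```

With this reading, `x` and `$.x` both name the single field `x`, and `$.a.b` names field `b` nested inside field `a`.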

  • Input is expected to be cuDF's Variant materialization: STRUCT(metadata LIST<UINT8>, value LIST<UINT8>, ...).
  • Java validates null arguments and unsupported target types before crossing JNI where applicable.
  • VariantUtils.isAvailable() lets Spark RAPIDS check whether the loaded JNI library includes the Variant extraction symbols.
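The availability check can be sketched as a probe that calls a trivial native method and treats a link failure as "not available". The class below is a stand-in, not the PR's code: `AvailabilityProbeSketch` is a hypothetical name, and no native library is loaded here, so the probe deliberately takes the failure branch.

```java
/** Illustration only: capability probe via a trivial native call. */
final class AvailabilityProbeSketch {
  // Declared but never linked in this sketch, so calling it throws
  // UnsatisfiedLinkError, which mimics an older library without the symbol.
  private static native boolean isAvailableNative();

  static boolean isAvailable() {
    try {
      return isAvailableNative();
    } catch (UnsatisfiedLinkError e) {
      // The loaded library (if any) does not export the symbol.
      return false;
    }
  }
}
```

A caller can then guard the feature with `if (AvailabilityProbeSketch.isAvailable()) { ... }` and fall back to another code path otherwise.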

Validation

Added a new test suite VariantUtilsTest for:

  • string and integer extraction
  • nested paths and $-prefixed paths
  • two-step getVariantFieldValue + castVariantValue
  • empty inputs
  • null parent rows
  • null, empty, and malformed paths
  • unsupported target types

Tests run: Success:17, Failures: 0, Errors: 0, Skipped: 0

@nartal1 nartal1 self-assigned this Jun 26, 2026
@nartal1 nartal1 force-pushed the variant-support-jni branch from ec12d3c to 5df586d Compare June 26, 2026 00:18
@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Contributor

Greptile Summary

This PR introduces VariantUtils, a new JNI bridge exposing cuDF's experimental Parquet Variant extraction APIs (get_variant_field, cast_variant, extract_variant_field) to Java/Spark RAPIDS. It adds the C++ JNI implementation, the Java wrapper class, and a comprehensive test suite.

  • VariantUtils.java exposes three public methods with Java-side null and type validation before crossing the JNI boundary; an isAvailable() capability probe is included for optional feature detection.
  • VariantUtilsJni.cpp implements the four native entry points following established JNI conventions (JNI_NULL_CHECK, JNI_TRY/JNI_CATCH, auto_set_device, release_as_jlong).
  • VariantUtilsTest.java covers 13 test cases including string/integer extraction, nested paths, two-step get+cast, empty inputs, null structs, null/malformed paths, and unsupported type rejection.

Confidence Score: 5/5

This PR is safe to merge. It introduces new JNI bindings with no changes to existing code paths.

All three public methods have correct Java-side null guards, validateTargetType uses List.contains() which delegates to DType.equals() (confirmed overridden in cudf-java), the JNI implementation follows established conventions, and 13 tests cover the primary and edge-case paths. No regressions are possible since nothing existing is modified.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/main/java/com/nvidia/spark/rapids/jni/VariantUtils.java | New JNI wrapper with correct null-checks on all public methods; validateTargetType uses List.contains() (value equality via DType.equals()). The SUPPORTED_TYPES list is small (5 elements), making the O(n) scan negligible in practice. |
| src/main/cpp/src/VariantUtilsJni.cpp | Follows established JNI conventions for null-checks, device setup, stream/memory-resource passing, and ownership transfer via release_as_jlong. No issues found. |
| src/test/java/com/nvidia/spark/rapids/jni/VariantUtilsTest.java | Broad coverage across 13 test cases; one test assertion (nullCastArgumentsThrow: castVariantValue(null, DType.FLOAT64) -> NPE) implicitly depends on the order that null and type checks execute in the implementation. |
| src/main/cpp/CMakeLists.txt | Adds VariantUtilsJni.cpp to the library source list in alphabetical order. No issues. |
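The order dependence noted for nullCastArgumentsThrow can be shown with a small stand-in. In the sketch below the null check runs before the type check, so a null input with an unsupported type raises NullPointerException rather than a type error. `ValidationOrderSketch`, the `TargetType` enum (standing in for cuDF's DType), and the choice of IllegalArgumentException for unsupported types are all assumptions for illustration. The PR does not state which exception the real wrapper throws.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Illustration only: null checks first, then target-type validation. */
final class ValidationOrderSketch {
  // Stand-in for DType; FLOAT64 is included only as an unsupported example.
  enum TargetType { STRING, INT8, INT16, INT32, INT64, FLOAT64 }

  private static final List<TargetType> SUPPORTED_TYPES = Arrays.asList(
      TargetType.STRING, TargetType.INT8, TargetType.INT16,
      TargetType.INT32, TargetType.INT64);

  static void checkCastArguments(Object valueBytes, TargetType targetType) {
    // Step 1: null checks. A null input fails here regardless of the type.
    Objects.requireNonNull(valueBytes, "valueBytes");
    Objects.requireNonNull(targetType, "targetType");
    // Step 2: type check via List.contains(), which relies on equals().
    if (!SUPPORTED_TYPES.contains(targetType)) {
      throw new IllegalArgumentException("Unsupported target type: " + targetType);
    }
  }
}
```

If the two steps were swapped, the same call would fail with the type error instead, which is why a test asserting NPE for `(null, FLOAT64)` pins down the check order.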

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Caller as Spark RAPIDS Caller
    participant Java as VariantUtils.java
    participant JNI as VariantUtilsJni.cpp
    participant cuDF as cuDF Experimental API

    alt extractVariantField (one-shot)
        Caller->>Java: extractVariantField(variantStruct, path, targetType)
        Java->>Java: requireNonNull(variantStruct, path)
        Java->>Java: validateTargetType(targetType)
        Java->>JNI: extractVariantField(handle, path, typeId)
        JNI->>JNI: JNI_NULL_CHECK(handle, path)
        JNI->>cuDF: extract_variant_field(col, path, data_type)
        cuDF-->>JNI: unique_ptr column
        JNI-->>Java: jlong handle
        Java-->>Caller: ColumnVector
    end

    alt two-step get + cast
        Caller->>Java: getVariantFieldValue(variantStruct, path)
        Java->>JNI: getVariantFieldValue(handle, path)
        JNI->>cuDF: get_variant_field(col, path)
        cuDF-->>JNI: unique_ptr column LIST UINT8
        JNI-->>Java: jlong handle
        Java-->>Caller: ColumnVector (raw bytes)

        Caller->>Java: castVariantValue(valueBytes, targetType)
        Java->>Java: validateTargetType(targetType)
        Java->>JNI: castVariantValue(handle, typeId)
        JNI->>cuDF: cast_variant(col, data_type)
        cuDF-->>JNI: unique_ptr column
        JNI-->>Java: jlong handle
        Java-->>Caller: ColumnVector (decoded)
    end

    alt isAvailable check
        Caller->>Java: isAvailable()
        Java->>JNI: isAvailableNative()
        JNI-->>Java: JNI_TRUE
        Java-->>Caller: true
        Note over Java: UnsatisfiedLinkError -> false (old library)
    end

Reviews (5): Last reviewed commit: "addressed review comment"

Comment thread src/main/java/com/nvidia/spark/rapids/jni/VariantUtils.java Outdated
Comment thread src/main/java/com/nvidia/spark/rapids/jni/VariantUtils.java
Add native and Java bindings for cuDF Variant field extraction.

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1 nartal1 marked this pull request as draft June 26, 2026 00:24
@nartal1 nartal1 force-pushed the variant-support-jni branch from 5df586d to c0c8348 Compare June 26, 2026 00:32
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1 nartal1 marked this pull request as ready for review June 26, 2026 18:08
@nartal1 nartal1 requested a review from Copilot June 26, 2026 18:09

Copilot AI left a comment

Contributor

Pull request overview

Adds Java/JNI bindings to expose cuDF’s experimental Parquet Variant extraction/casting APIs to the cudf-spark-jni layer, along with a new JUnit test suite to validate basic extraction behavior and error handling.

Changes:

  • Introduces VariantUtils Java API wrapping new native Variant extraction/casting entry points.
  • Adds corresponding JNI implementations (VariantUtilsJni.cpp) calling cuDF experimental Variant APIs.
  • Adds VariantUtilsTest coverage for common paths, null handling, and unsupported types; wires native source into the C++ build.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/main/java/com/nvidia/spark/rapids/jni/VariantUtils.java | New Java-facing API for Variant field extraction/casting with basic argument/type validation. |
| src/main/cpp/src/VariantUtilsJni.cpp | JNI layer that calls cuDF experimental Variant APIs and exposes availability symbol. |
| src/main/cpp/CMakeLists.txt | Adds the new JNI source file to the native build. |
| src/test/java/com/nvidia/spark/rapids/jni/VariantUtilsTest.java | New JUnit suite validating Variant extraction behavior and error cases. |

Comment thread src/test/java/com/nvidia/spark/rapids/jni/VariantUtilsTest.java
Comment thread src/test/java/com/nvidia/spark/rapids/jni/VariantUtilsTest.java
Signed-off-by: Niranjan Artal <nartal@nvidia.com>

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread src/main/java/com/nvidia/spark/rapids/jni/VariantUtils.java
Comment thread src/main/java/com/nvidia/spark/rapids/jni/VariantUtils.java
Comment thread src/main/java/com/nvidia/spark/rapids/jni/VariantUtils.java
nartal1 added 2 commits June 26, 2026 15:12
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1

nartal1 commented Jun 26, 2026

Collaborator Author

build
