Add JNI Variant extraction support #4756
Open
nartal1 wants to merge 5 commits into
ec12d3c to 5df586d
Contributor
Add native and Java bindings for cuDF Variant field extraction. Signed-off-by: Niranjan Artal <nartal@nvidia.com>
5df586d to c0c8348
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
Contributor
Pull request overview
Adds Java/JNI bindings to expose cuDF’s experimental Parquet Variant extraction/casting APIs to the cudf-spark-jni layer, along with a new JUnit test suite to validate basic extraction behavior and error handling.
Changes:
- Introduces `VariantUtils` Java API wrapping new native Variant extraction/casting entry points.
- Adds corresponding JNI implementations (`VariantUtilsJni.cpp`) calling cuDF experimental Variant APIs.
- Adds `VariantUtilsTest` coverage for common paths, null handling, and unsupported types; wires native source into the C++ build.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/main/java/com/nvidia/spark/rapids/jni/VariantUtils.java | New Java-facing API for Variant field extraction/casting with basic argument/type validation. |
| src/main/cpp/src/VariantUtilsJni.cpp | JNI layer that calls cuDF experimental Variant APIs and exposes availability symbol. |
| src/main/cpp/CMakeLists.txt | Adds the new JNI source file to the native build. |
| src/test/java/com/nvidia/spark/rapids/jni/VariantUtilsTest.java | New JUnit suite validating Variant extraction behavior and error cases. |
Collaborator
Author
build
Description
Adds `VariantUtils` Java/JNI bindings for cuDF's experimental Parquet Variant extraction APIs:
- `get_variant_field` → extract raw Variant `LIST<UINT8>` value bytes for a path
- `cast_variant` → decode raw Variant value bytes to a supported cuDF type
- `extract_variant_field` → extract and decode in one native call

The Spark RAPIDS integration will be added in a follow-up PR after this JNI API lands.
The wrapper currently supports:
- The cuDF path grammar for object fields, e.g. `x`, `$.x`, and `$.a.b`.
- Target types `STRING`, `INT8`, `INT16`, `INT32`, and `INT64`.
- `STRUCT(metadata LIST<UINT8>, value LIST<UINT8>, ...)`.

`VariantUtils.isAvailable()` lets Spark RAPIDS check whether the loaded JNI library includes the Variant extraction symbols.
Validation
Added a new test suite `VariantUtilsTest` for:
- `$`-prefixed paths
- `getVariantFieldValue` + `castVariantValue`

Tests run: Success: 17, Failures: 0, Errors: 0, Skipped: 0
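For reviewers unfamiliar with the object-field path shapes listed in the description (`x`, `$.x`, `$.a.b`), here is a minimal Java sketch of how such a path might be split into field names. It is illustrative only: `VariantPathSketch` and `parseObjectPath` are hypothetical names, not part of `VariantUtils` or cuDF, and how the sketch treats anything beyond the three example shapes (a bare `$`, empty segments) is an assumption rather than documented cuDF behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; NOT the parser used by cuDF or VariantUtils.
public class VariantPathSketch {

    // Splits an object-field path such as "x", "$.x", or "$.a.b" into field names.
    // Assumption: an optional "$." prefix denotes the root, and every segment
    // between dots must be non-empty.
    static List<String> parseObjectPath(String path) {
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("path must be non-empty");
        }
        String rest = path;
        if (rest.startsWith("$.")) {
            rest = rest.substring(2); // drop the root marker
        } else if (rest.startsWith("$")) {
            // Assumption of this sketch: "$" must be followed by ".field".
            throw new IllegalArgumentException("unsupported path: " + path);
        }
        List<String> fields = new ArrayList<>();
        // Limit -1 keeps trailing empty segments so "a." is rejected below.
        for (String part : rest.split("\\.", -1)) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("empty field name in path: " + path);
            }
            fields.add(part);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseObjectPath("x"));
        System.out.println(parseObjectPath("$.x"));
        System.out.println(parseObjectPath("$.a.b"));
    }
}
```

Under these assumptions, `x` and `$.x` resolve to the same single-field path, which is the equivalence the `$`-prefixed path tests appear to exercise.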