diff --git a/LICENSE b/LICENSE
index 261eeb9e9f8..884b334376f 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,3 +1,406 @@
+Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+This code is dual-licensed with documentation/skills under the CC-BY-4.0 AND source code under Apache-2.0 license terms.
+
+
+Attribution 4.0 International
+
+=======================================================================
+
+Creative Commons Corporation ("Creative Commons") is not a law firm and
+does not provide legal services or legal advice. Distribution of
+Creative Commons public licenses does not create a lawyer-client or
+other relationship. Creative Commons makes its licenses and related
+information available on an "as-is" basis. Creative Commons gives no
+warranties regarding its licenses, any material licensed under their
+terms and conditions, or any related information. Creative Commons
+disclaims all liability for damages resulting from their use to the
+fullest extent possible.
+
+Using Creative Commons Public Licenses
+
+Creative Commons public licenses provide a standard set of terms and
+conditions that creators and other rights holders may use to share
+original works of authorship and other material subject to copyright
+and certain other rights specified in the public license below. The
+following considerations are for informational purposes only, are not
+exhaustive, and do not form part of our licenses.
+
+ Considerations for licensors: Our public licenses are
+ intended for use by those authorized to give the public
+ permission to use material in ways otherwise restricted by
+ copyright and certain other rights. Our licenses are
+ irrevocable. Licensors should read and understand the terms
+ and conditions of the license they choose before applying it.
+ Licensors should also secure all rights necessary before
+ applying our licenses so that the public can reuse the
+ material as expected. Licensors should clearly mark any
+ material not subject to the license. This includes other CC-
+ licensed material, or material used under an exception or
+ limitation to copyright. More considerations for licensors:
+ wiki.creativecommons.org/Considerations_for_licensors
+
+ Considerations for the public: By using one of our public
+ licenses, a licensor grants the public permission to use the
+ licensed material under specified terms and conditions. If
+ the licensor's permission is not necessary for any reason--for
+ example, because of any applicable exception or limitation to
+ copyright--then that use is not regulated by the license. Our
+ licenses grant only permissions under copyright and certain
+ other rights that a licensor has authority to grant. Use of
+ the licensed material may still be restricted for other
+ reasons, including because others have copyright or other
+ rights in the material. A licensor may make special requests,
+ such as asking that all changes be marked or described.
+ Although not required by our licenses, you are encouraged to
+ respect those requests where reasonable. More considerations
+ for the public:
+ wiki.creativecommons.org/Considerations_for_licensees
+
+=======================================================================
+
+Creative Commons Attribution 4.0 International Public License
+
+By exercising the Licensed Rights (defined below), You accept and agree
+to be bound by the terms and conditions of this Creative Commons
+Attribution 4.0 International Public License ("Public License"). To the
+extent this Public License may be interpreted as a contract, You are
+granted the Licensed Rights in consideration of Your acceptance of
+these terms and conditions, and the Licensor grants You such rights in
+consideration of benefits the Licensor receives from making the
+Licensed Material available under these terms and conditions.
+
+
+Section 1 -- Definitions.
+
+ a. Adapted Material means material subject to Copyright and Similar
+ Rights that is derived from or based upon the Licensed Material
+ and in which the Licensed Material is translated, altered,
+ arranged, transformed, or otherwise modified in a manner requiring
+ permission under the Copyright and Similar Rights held by the
+ Licensor. For purposes of this Public License, where the Licensed
+ Material is a musical work, performance, or sound recording,
+ Adapted Material is always produced where the Licensed Material is
+ synched in timed relation with a moving image.
+
+ b. Adapter's License means the license You apply to Your Copyright
+ and Similar Rights in Your contributions to Adapted Material in
+ accordance with the terms and conditions of this Public License.
+
+ c. Copyright and Similar Rights means copyright and/or similar rights
+ closely related to copyright including, without limitation,
+ performance, broadcast, sound recording, and Sui Generis Database
+ Rights, without regard to how the rights are labeled or
+ categorized. For purposes of this Public License, the rights
+ specified in Section 2(b)(1)-(2) are not Copyright and Similar
+ Rights.
+
+ d. Effective Technological Measures means those measures that, in the
+ absence of proper authority, may not be circumvented under laws
+ fulfilling obligations under Article 11 of the WIPO Copyright
+ Treaty adopted on December 20, 1996, and/or similar international
+ agreements.
+
+ e. Exceptions and Limitations means fair use, fair dealing, and/or
+ any other exception or limitation to Copyright and Similar Rights
+ that applies to Your use of the Licensed Material.
+
+ f. Licensed Material means the artistic or literary work, database,
+ or other material to which the Licensor applied this Public
+ License.
+
+ g. Licensed Rights means the rights granted to You subject to the
+ terms and conditions of this Public License, which are limited to
+ all Copyright and Similar Rights that apply to Your use of the
+ Licensed Material and that the Licensor has authority to license.
+
+ h. Licensor means the individual(s) or entity(ies) granting rights
+ under this Public License.
+
+ i. Share means to provide material to the public by any means or
+ process that requires permission under the Licensed Rights, such
+ as reproduction, public display, public performance, distribution,
+ dissemination, communication, or importation, and to make material
+ available to the public including in ways that members of the
+ public may access the material from a place and at a time
+ individually chosen by them.
+
+ j. Sui Generis Database Rights means rights other than copyright
+ resulting from Directive 96/9/EC of the European Parliament and of
+ the Council of 11 March 1996 on the legal protection of databases,
+ as amended and/or succeeded, as well as other essentially
+ equivalent rights anywhere in the world.
+
+ k. You means the individual or entity exercising the Licensed Rights
+ under this Public License. Your has a corresponding meaning.
+
+
+Section 2 -- Scope.
+
+ a. License grant.
+
+ 1. Subject to the terms and conditions of this Public License,
+ the Licensor hereby grants You a worldwide, royalty-free,
+ non-sublicensable, non-exclusive, irrevocable license to
+ exercise the Licensed Rights in the Licensed Material to:
+
+ a. reproduce and Share the Licensed Material, in whole or
+ in part; and
+
+ b. produce, reproduce, and Share Adapted Material.
+
+ 2. Exceptions and Limitations. For the avoidance of doubt, where
+ Exceptions and Limitations apply to Your use, this Public
+ License does not apply, and You do not need to comply with
+ its terms and conditions.
+
+ 3. Term. The term of this Public License is specified in Section
+ 6(a).
+
+ 4. Media and formats; technical modifications allowed. The
+ Licensor authorizes You to exercise the Licensed Rights in
+ all media and formats whether now known or hereafter created,
+ and to make technical modifications necessary to do so. The
+ Licensor waives and/or agrees not to assert any right or
+ authority to forbid You from making technical modifications
+ necessary to exercise the Licensed Rights, including
+ technical modifications necessary to circumvent Effective
+ Technological Measures. For purposes of this Public License,
+ simply making modifications authorized by this Section 2(a)
+ (4) never produces Adapted Material.
+
+ 5. Downstream recipients.
+
+ a. Offer from the Licensor -- Licensed Material. Every
+ recipient of the Licensed Material automatically
+ receives an offer from the Licensor to exercise the
+ Licensed Rights under the terms and conditions of this
+ Public License.
+
+ b. No downstream restrictions. You may not offer or impose
+ any additional or different terms or conditions on, or
+ apply any Effective Technological Measures to, the
+ Licensed Material if doing so restricts exercise of the
+ Licensed Rights by any recipient of the Licensed
+ Material.
+
+ 6. No endorsement. Nothing in this Public License constitutes or
+ may be construed as permission to assert or imply that You
+ are, or that Your use of the Licensed Material is, connected
+ with, or sponsored, endorsed, or granted official status by,
+ the Licensor or others designated to receive attribution as
+ provided in Section 3(a)(1)(A)(i).
+
+ b. Other rights.
+
+ 1. Moral rights, such as the right of integrity, are not
+ licensed under this Public License, nor are publicity,
+ privacy, and/or other similar personality rights; however, to
+ the extent possible, the Licensor waives and/or agrees not to
+ assert any such rights held by the Licensor to the limited
+ extent necessary to allow You to exercise the Licensed
+ Rights, but not otherwise.
+
+ 2. Patent and trademark rights are not licensed under this
+ Public License.
+
+ 3. To the extent possible, the Licensor waives any right to
+ collect royalties from You for the exercise of the Licensed
+ Rights, whether directly or through a collecting society
+ under any voluntary or waivable statutory or compulsory
+ licensing scheme. In all other cases the Licensor expressly
+ reserves any right to collect such royalties.
+
+
+Section 3 -- License Conditions.
+
+Your exercise of the Licensed Rights is expressly made subject to the
+following conditions.
+
+ a. Attribution.
+
+ 1. If You Share the Licensed Material (including in modified
+ form), You must:
+
+ a. retain the following if it is supplied by the Licensor
+ with the Licensed Material:
+
+ i. identification of the creator(s) of the Licensed
+ Material and any others designated to receive
+ attribution, in any reasonable manner requested by
+ the Licensor (including by pseudonym if
+ designated);
+
+ ii. a copyright notice;
+
+ iii. a notice that refers to this Public License;
+
+ iv. a notice that refers to the disclaimer of
+ warranties;
+
+ v. a URI or hyperlink to the Licensed Material to the
+ extent reasonably practicable;
+
+ b. indicate if You modified the Licensed Material and
+ retain an indication of any previous modifications; and
+
+ c. indicate the Licensed Material is licensed under this
+ Public License, and include the text of, or the URI or
+ hyperlink to, this Public License.
+
+ 2. You may satisfy the conditions in Section 3(a)(1) in any
+ reasonable manner based on the medium, means, and context in
+ which You Share the Licensed Material. For example, it may be
+ reasonable to satisfy the conditions by providing a URI or
+ hyperlink to a resource that includes the required
+ information.
+
+ 3. If requested by the Licensor, You must remove any of the
+ information required by Section 3(a)(1)(A) to the extent
+ reasonably practicable.
+
+ 4. If You Share Adapted Material You produce, the Adapter's
+ License You apply must not prevent recipients of the Adapted
+ Material from complying with this Public License.
+
+
+Section 4 -- Sui Generis Database Rights.
+
+Where the Licensed Rights include Sui Generis Database Rights that
+apply to Your use of the Licensed Material:
+
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right
+ to extract, reuse, reproduce, and Share all or a substantial
+ portion of the contents of the database;
+
+ b. if You include all or a substantial portion of the database
+ contents in a database in which You have Sui Generis Database
+ Rights, then the database in which You have Sui Generis Database
+ Rights (but not its individual contents) is Adapted Material; and
+
+ c. You must comply with the conditions in Section 3(a) if You Share
+ all or a substantial portion of the contents of the database.
+
+For the avoidance of doubt, this Section 4 supplements and does not
+replace Your obligations under this Public License where the Licensed
+Rights include other Copyright and Similar Rights.
+
+
+Section 5 -- Disclaimer of Warranties and Limitation of Liability.
+
+ a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
+ EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
+ AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
+ ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
+ IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
+ WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
+ PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
+ ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
+ KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
+ ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
+
+ b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
+ TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
+ NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
+ INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
+ COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
+ USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
+ ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
+ DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
+ IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
+
+ c. The disclaimer of warranties and limitation of liability provided
+ above shall be interpreted in a manner that, to the extent
+ possible, most closely approximates an absolute disclaimer and
+ waiver of all liability.
+
+
+Section 6 -- Term and Termination.
+
+ a. This Public License applies for the term of the Copyright and
+ Similar Rights licensed here. However, if You fail to comply with
+ this Public License, then Your rights under this Public License
+ terminate automatically.
+
+ b. Where Your right to use the Licensed Material has terminated under
+ Section 6(a), it reinstates:
+
+ 1. automatically as of the date the violation is cured, provided
+ it is cured within 30 days of Your discovery of the
+ violation; or
+
+ 2. upon express reinstatement by the Licensor.
+
+ For the avoidance of doubt, this Section 6(b) does not affect any
+ right the Licensor may have to seek remedies for Your violations
+ of this Public License.
+
+ c. For the avoidance of doubt, the Licensor may also offer the
+ Licensed Material under separate terms or conditions or stop
+ distributing the Licensed Material at any time; however, doing so
+ will not terminate this Public License.
+
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
+ License.
+
+
+Section 7 -- Other Terms and Conditions.
+
+ a. The Licensor shall not be bound by any additional or different
+ terms or conditions communicated by You unless expressly agreed.
+
+ b. Any arrangements, understandings, or agreements regarding the
+ Licensed Material not stated herein are separate from and
+ independent of the terms and conditions of this Public License.
+
+
+Section 8 -- Interpretation.
+
+ a. For the avoidance of doubt, this Public License does not, and
+ shall not be interpreted to, reduce, limit, restrict, or impose
+ conditions on any use of the Licensed Material that could lawfully
+ be made without permission under this Public License.
+
+ b. To the extent possible, if any provision of this Public License is
+ deemed unenforceable, it shall be automatically reformed to the
+ minimum extent necessary to make it enforceable. If the provision
+ cannot be reformed, it shall be severed from this Public License
+ without affecting the enforceability of the remaining terms and
+ conditions.
+
+ c. No term or condition of this Public License will be waived and no
+ failure to comply consented to unless expressly agreed to by the
+ Licensor.
+
+ d. Nothing in this Public License constitutes or may be interpreted
+ as a limitation upon, or waiver of, any privileges and immunities
+ that apply to the Licensor or You, including from the legal
+ processes of any jurisdiction or authority.
+
+
+=======================================================================
+
+Creative Commons is not a party to its public
+licenses. Notwithstanding, Creative Commons may elect to apply one of
+its public licenses to material it publishes and in those instances
+will be considered the “Licensor.” The text of the Creative Commons
+public licenses is dedicated to the public domain under the CC0 Public
+Domain Dedication. Except for the limited purpose of indicating that
+material is shared under a Creative Commons public license or as
+otherwise permitted by the Creative Commons policies published at
+creativecommons.org/policies, Creative Commons does not authorize the
+use of the trademark "Creative Commons" or any other trademark or logo
+of Creative Commons without its prior written consent including,
+without limitation, in connection with any unauthorized modifications
+to any of its public licenses or any other arrangements,
+understandings, or agreements concerning use of licensed material. For
+the avoidance of doubt, this paragraph does not form part of the
+public licenses.
+
+Creative Commons may be contacted at creativecommons.org.
+
+
+
+
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
@@ -186,7 +589,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.
- Copyright [yyyy] [name of copyright owner]
+ Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
diff --git a/build/make-scala-version-build-files.sh b/build/make-scala-version-build-files.sh
index 21bf4471147..295ff44de1e 100755
--- a/build/make-scala-version-build-files.sh
+++ b/build/make-scala-version-build-files.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash
#
-# Copyright (c) 2023-2025, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2023-2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -78,6 +78,11 @@ for f in $(git ls-files '**pom.xml'); do
echo "Skipping $f"
continue
fi
+ # Skills package their own pom.xml templates. Ignore those.
+ if [[ $f == skills/* ]]; then
+ echo "Skipping $f"
+ continue
+ fi
echo $f
tof="$TO_DIR/$f"
mkdir -p $(dirname $tof)
diff --git a/pom.xml b/pom.xml
index 450211bcc4a..cc51b5215a0 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1654,6 +1654,8 @@ This will force full Scala code rebuild in downstream modules.
**/target/**/***/cufile.log**/cudf_log.txt
+
+ skills/**thirdparty/parquet-testing/**
@@ -1704,6 +1706,7 @@ This will force full Scala code rebuild in downstream modules.
+
diff --git a/scala2.13/pom.xml b/scala2.13/pom.xml
index 6b9a9aa8d68..69882d86cc1 100644
--- a/scala2.13/pom.xml
+++ b/scala2.13/pom.xml
@@ -1654,6 +1654,8 @@ This will force full Scala code rebuild in downstream modules.
**/target/**/***/cufile.log**/cudf_log.txt
+
+ skills/**thirdparty/parquet-testing/**
@@ -1704,6 +1706,7 @@ This will force full Scala code rebuild in downstream modules.
+
diff --git a/skills/.gitignore b/skills/.gitignore
new file mode 100644
index 00000000000..16a0e72b69f
--- /dev/null
+++ b/skills/.gitignore
@@ -0,0 +1,165 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[codz]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/_build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py.cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pdm
+.pdm-python
+.pdm-build/
+
+# pixi
+.pixi
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.envrc
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# Abstra
+.abstra/
+
+# Visual Studio Code
+.vscode/
+.cursor/
+.claude/
+
+# Ruff stuff:
+.ruff_cache/
+
+# PyPI configuration file
+.pypirc
+
+# Marimo
+marimo/_static/
+marimo/_lsp/
+__marimo__/
+
+# Streamlit
+.streamlit/secrets.toml
+
+# Scala
+.scala-build/
+.metals/
+.bsp/
+
+# Maven config under skills is source, not generated output.
+!**/.mvn/
+!**/.mvn/**
diff --git a/skills/README.md b/skills/README.md
new file mode 100644
index 00000000000..ca60dcfdad8
--- /dev/null
+++ b/skills/README.md
@@ -0,0 +1,174 @@
+# Project Aether Agent Skills
+
+Aether Agent is a set of skills to convert Apache Spark User-Defined Functions (UDFs) for GPU acceleration with the [RAPIDS Accelerator for Apache Spark](https://github.com/NVIDIA/spark-rapids). It provides:
+
+1. **Test generation** -- Create unit tests and test data for existing UDFs.
+2. **Conversion** -- Convert a UDF to a GPU-compatible implementation (SQL, cuDF RapidsUDF, or native CUDA RapidsUDF).
+3. **Benchmarking** -- Generate synthetic data and benchmark the original UDF against the GPU implementation.
+4. **Optimization** -- Iteratively profile and optimize a cuDF RapidsUDF for GPU performance.
+
+
+Table of Contents
+
+- [Installation](#installation)
+- [Supported Formats](#supported-formats)
+- [Prerequisites](#prerequisites)
+- [Selecting an LLM](#selecting-an-llm)
+- [Quick Start](#quick-start)
+ - [Using Skills](#using-skills)
+ - [Try the Workflow](#try-the-workflow)
+
+
+
+## Installation
+
+Install via the [skills CLI](https://github.com/vercel-labs/skills). Installing all skills is recommended, as they are designed to work together.
+
+```bash
+npx skills add NVIDIA/spark-rapids --skill '*' [--agent <agent>]
+```
+
+## Supported Formats
+
+| UDF Type | cuDF RapidsUDF | CUDA RapidsUDF | Spark SQL |
+|-----------|----------------|------------------------|-----------|
+| Java UDF | Yes | Yes | Yes |
+| Hive UDF | Yes | Yes | Yes |
+| Scala UDF | Yes | Yes | Yes |
+| Java UDAF | -- | -- | Yes |
+| Hive UDAF | -- | -- | Yes |
+| Scala UDAF | -- | -- | Yes |
+
+## Prerequisites
+
+- **[Maven](https://maven.apache.org/install.html)** is required to build/compile UDFs.
+- **[JDK](https://docs.oracle.com/en/java/javase/index.html)** must be installed on the system.
+- **Local GPU** with [CUDA toolkit](https://developer.nvidia.com/cuda/toolkit) is required (see [Spark RAPIDS compatibility](https://nvidia.github.io/spark-rapids/docs/download.html) for version requirements).
+
+If a local GPU is not available, another option is to run Aether Agent from a cloud instance, such as AWS EC2.
+
+## Selecting an LLM
+
+For best results, we recommend the latest reasoning models from OpenAI, Anthropic, or Google. As a good proxy, models near the top of the [Terminal-Bench 2.0 leaderboard](https://www.tbench.ai/leaderboard/terminal-bench/2.0) tend to perform well.
+
+## Quick Start
+
+Skills require any IDE or LLM that supports the [agent skills spec](https://agentskills.io) (e.g., Cursor, Codex, Claude Code).
+
+### Using Skills
+
+Skills follow a multi-step workflow:
+
+1. **[udf-gen-test](udf-gen-test/SKILL.md)** -- Generate a unit test for the UDF
+2. **[udf-convert-to-cudf](udf-convert-to-cudf/SKILL.md)**, **[udf-convert-to-cuda](udf-convert-to-cuda/SKILL.md)**, or **[udf-convert-to-sql](udf-convert-to-sql/SKILL.md)** -- Convert the UDF to a GPU-compatible implementation
+3. **[udf-judge-conversion](udf-judge-conversion/SKILL.md)** -- Review generated tests and implementations for coverage gaps, bugs, and edge cases
+4. **[udf-benchmark](udf-benchmark/SKILL.md)** -- Benchmark CPU vs GPU performance
+5. **[udf-optimize-cudf](udf-optimize-cudf/SKILL.md)** -- Iteratively profile and optimize the cuDF RapidsUDF
+
+To invoke a skill, use your IDE's skill command, or simply describe the task and let the agent load the skill automatically.
+
+```bash
+# Manual invocation
+❯ Use the /udf-gen-test skill to generate a unit test for @FormatPhoneUDF.java
+
+# Automatic invocation
+❯ Generate a unit test for @FormatPhoneUDF.java
+```
+
+Each skill builds on the output of the previous one -- udf-gen-test produces a project with a passing unit test, which the conversion skills use as input, and the udf-benchmark skill uses the conversion output.
+
+You can invoke multiple steps in a single prompt:
+
+```bash
+❯ Generate a unit test for @FormatPhoneUDF.java, then convert it to cuDF, native CUDA, or SQL and benchmark
+```
+
+### Try the Workflow
+
+Once you've installed the skills, try the workflow with one of the provided example UDFs:
+- Java: [FormatPhoneUDF.java](examples/FormatPhoneUDF.java)
+- Hive: [IntegerMultiplyBy2UDF.java](examples/IntegerMultiplyBy2UDF.java)
+- Scala: [CalculateRiskUDF.scala](examples/CalculateRiskUDF.scala)
+
+#### Workflow
+
+(The example dialogue is from Claude Code.)
+
+1. Open your project in your IDE of choice with the skills installed.
+2. Ask the agent to generate a unit test:
+
+ ```bash
+ ❯ Please generate a unit test for @FormatPhoneUDF.java
+ ⎿ Read FormatPhoneUDF.java (23 lines)
+
+ ● Skill(udf-gen-test)
+ ⎿ Successfully loaded skill
+ ● I'll generate a unit test for FormatPhoneUDF.java. Let me start by setting up the project.
+ ● ...
+ ● Unit test generation is complete. Here's a summary:
+
+ Results
+
+ - Project directory: FormatPhoneUDF/
+ - Unit test: FormatPhoneUDF/src/test/java/com/udf/UnitTest.java
+ - Status: All tests passing
+ - Test coverage: 12 cases
+ ```
+
+3. Once the test passes, convert the UDF:
+
+ ```bash
+ ❯ Please convert the UDF to cuDF
+
+ ● I'll use the udf-convert-to-cudf skill to convert this UDF to a GPU-accelerated RapidsUDF.
+ ● Skill(udf-convert-to-cudf)
+ ⎿ Successfully loaded skill
+ ● ...
+ ● Results
+
+ The RapidsUDF conversion is complete.
+
+ - RapidsUDF: FormatPhoneUDF/src/main/java/com/udf/FormatPhoneRapidsUDF.java
+ - Comparison test: passes with no memory leaks
+ ```
+
+ Alternatively, ask for a judge pass as part of the conversion request:
+
+ ```bash
+ ❯ Please convert the UDF to cuDF and run the judge review agent on the result
+ ```
+
+ **Note:** The conversion agent will only invoke the judge when you explicitly ask for it. Otherwise, the agent performs a local self-review. You can also invoke the udf-judge-conversion skill on its own, at any time.
+
+4. Finally, benchmark the result:
+
+ ```bash
+ ❯ Please benchmark the implementations on 100M rows.
+
+ ● Skill(udf-benchmark)
+ ⎿ Successfully loaded skill
+ ● ...
+ ● Benchmark Results — 100M rows
+
+ ┌─────────┬─────────────┐
+ │ Mode │ Runtime (s) │
+ ├─────────┼─────────────┤
+ │ CPU │ 16.27 │
+ ├─────────┼─────────────┤
+ │ GPU │ 6.52 │
+ ├─────────┼─────────────┤
+ │ Speedup │ 2.50x │
+ └─────────┴─────────────┘
+
+ The GPU RapidsUDF implementation is 2.5x faster than the CPU UDF on 100 million rows.
+ ```
+
+5. Optionally, for cuDF RapidsUDF conversions, optimize the implementation:
+
+ ```bash
+ ❯ Please optimize the implementation
+
+ ● Skill(udf-optimize-cudf)
+ ⎿ Successfully loaded skill
+ ● ...
+ ```
diff --git a/skills/docs/dev/VERSIONS.md b/skills/docs/dev/VERSIONS.md
new file mode 100644
index 00000000000..83956663ca3
--- /dev/null
+++ b/skills/docs/dev/VERSIONS.md
@@ -0,0 +1,61 @@
+# Version Update Guide
+
+## Files To Update
+
+### Java udf-gen-test Maven template
+
+File: `skills/udf-gen-test/templates/java/pom.xml`
+
+Update these properties together:
+
+- ``
+- ``
+- ``
+- `` if the RAPIDS artifact classifier changes
+- ``
+- ``
+
+### Scala udf-gen-test Maven template
+
+File: `skills/udf-gen-test/templates/scala/pom.xml`
+
+Update these properties together:
+
+- ``
+- ``
+- ``
+- ``
+- `` if the RAPIDS artifact classifier changes
+- ``
+- ``
+
+### Native CUDA build image
+
+File: `skills/udf-convert-to-cuda/templates/cuda/Dockerfile`
+
+Update this default value:
+
+- `CUDA_VERSION` must match the CUDA toolkit version spark-rapids is built against (the same version the native build uses on the host).
+
+### Native CUDA dependency extraction
+
+File: `skills/udf-convert-to-cuda/templates/cuda/native/scripts/extract-cudf-libs.sh`
+
+Update these default values:
+
+- `SCALA_VERSION`
+- `RAPIDS4SPARK_VERSION`
+- `CUDA_VERSION` if the RAPIDS artifact classifier changes
+- `CUDF_BRANCH`
+
+### Native CUDA CMake template
+
+File: `skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/CMakeLists.txt`
+
+Update these values:
+
+- `RAPIDS_CMAKE_BRANCH`
+- `project(RAPIDSUDFJNI VERSION ...)`
+- `rapids_cpm_find(cudf ...)`
+
+`RAPIDS_CMAKE_BRANCH` should generally match the RAPIDS/cuDF branch or tag used by the Maven templates and `extract-cudf-libs.sh`. The `rapids_cpm_find(cudf...)` version should use the RAPIDS major/minor CPM version, for example `26.04.00` for `26.04.0`.
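Review note: the `26.04.0` to `26.04.00` mapping described in VERSIONS.md can be sketched as a small helper. This is only an illustration under one assumption: that the CPM version keeps the major and minor components and always writes the patch component as `00`, which is how the single example in the guide reads. The class and method names are hypothetical and are not part of this change.

```java
// Hypothetical helper, not part of this change. It assumes the CPM version is
// "<major>.<minor>.00", derived from a RAPIDS version such as "26.04.0".
class CpmVersionSketch {
    static String toCpmVersion(String rapidsVersion) {
        String[] parts = rapidsVersion.split("\\.");
        if (parts.length < 2 || !parts[0].matches("\\d+") || !parts[1].matches("\\d+")) {
            throw new IllegalArgumentException("Expected <major>.<minor>[.<patch>], got: " + rapidsVersion);
        }
        // Keep major and minor as written (including the leading zero in "04").
        return parts[0] + "." + parts[1] + ".00";
    }

    public static void main(String[] args) {
        System.out.println(toCpmVersion("26.04.0"));
    }
}
```

A check like this could live in a release script so the CMake template and the Maven templates are less likely to drift apart.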
diff --git a/skills/examples/CalculateRiskUDF.scala b/skills/examples/CalculateRiskUDF.scala
new file mode 100644
index 00000000000..8e9dc6070c8
--- /dev/null
+++ b/skills/examples/CalculateRiskUDF.scala
@@ -0,0 +1,24 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package examples
+
+/**
+ * Map a credit score to a risk category.
+ *
+ * @param creditScore Credit score, or null if unknown
+ * @return "LOW", "MEDIUM", "HIGH", or "VERY_HIGH" by threshold; "UNKNOWN" for null
+ */
+class CalculateRiskUDF extends Function1[Integer, String] with Serializable {
+ override def apply(creditScore: Integer): String = {
+ Option(creditScore) match {
+ case Some(score) if score >= 750 => "LOW"
+ case Some(score) if score >= 650 => "MEDIUM"
+ case Some(score) if score >= 500 => "HIGH"
+ case Some(_) => "VERY_HIGH"
+ case None => "UNKNOWN"
+ }
+ }
+}
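Review note: the threshold boundaries in `CalculateRiskUDF` are the cases most worth covering in a generated unit test. The sketch below is a plain Java rendering of the same buckets, written only to make the boundary values easy to check without Spark or Scala on the classpath. `RiskBucketSketch` and `bucket` are hypothetical names, not part of this change.

```java
// Hypothetical Java port of the CalculateRiskUDF thresholds, for illustration only.
class RiskBucketSketch {
    static String bucket(Integer creditScore) {
        if (creditScore == null) {
            return "UNKNOWN"; // mirrors the `case None` branch
        }
        int score = creditScore;
        if (score >= 750) return "LOW";
        if (score >= 650) return "MEDIUM";
        if (score >= 500) return "HIGH";
        return "VERY_HIGH"; // everything below 500
    }

    public static void main(String[] args) {
        // Values on both sides of each threshold, plus the null case.
        Integer[] samples = {750, 749, 650, 649, 500, 499, null};
        for (Integer s : samples) {
            System.out.println(s + " -> " + bucket(s));
        }
    }
}
```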
diff --git a/skills/examples/FormatPhoneUDF.java b/skills/examples/FormatPhoneUDF.java
new file mode 100644
index 00000000000..f747887911d
--- /dev/null
+++ b/skills/examples/FormatPhoneUDF.java
@@ -0,0 +1,26 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package examples;
+
+import org.apache.spark.sql.api.java.UDF1;
+
+/** Strip non-digit characters and format as (XXX) XXX-XXXX. */
+public class FormatPhoneUDF implements UDF1<String, String> {
+ @Override
+ public String call(String phone) throws Exception {
+ if (phone == null) {
+ return null;
+ }
+ String digits = phone.replaceAll("[^0-9]", "");
+ if (digits.length() != 10) {
+ return null;
+ }
+ return String.format("(%s) %s-%s",
+ digits.substring(0, 3),
+ digits.substring(3, 6),
+ digits.substring(6));
+ }
+}
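Porting note for `FormatPhoneUDF`: the Java logic keeps only the ASCII digits `0`-`9`, requires exactly ten of them, and otherwise returns null. A byte-wise host sketch in C++ behaves the same way on UTF-8 input, because every byte of a multi-byte character falls outside the `'0'`..`'9'` range. The function name `format_phone` and the `std::optional` modelling of nulls are assumptions made for illustration.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical host reference (not part of the PR). std::nullopt stands in
// for a null row on both the input and the output side.
std::optional<std::string> format_phone(std::optional<std::string_view> phone)
{
  if (!phone.has_value()) { return std::nullopt; }

  // Keep ASCII digits only, like replaceAll("[^0-9]", "") in the Java UDF.
  std::string digits;
  for (char c : *phone) {
    if (c >= '0' && c <= '9') { digits.push_back(c); }
  }
  if (digits.size() != 10) { return std::nullopt; }

  return "(" + digits.substr(0, 3) + ") " + digits.substr(3, 3) + "-" + digits.substr(6);
}
```

Unlike the previous example, nulls here do propagate, and valid inputs can also produce null outputs, so the output null mask has to be computed rather than copied.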
diff --git a/skills/examples/IntegerMultiplyBy2UDF.java b/skills/examples/IntegerMultiplyBy2UDF.java
new file mode 100644
index 00000000000..c6b0cb0055f
--- /dev/null
+++ b/skills/examples/IntegerMultiplyBy2UDF.java
@@ -0,0 +1,78 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package examples;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.log4j.Logger;
+
+@Description(name = "integer_multiply_by_2", value = "_FUNC_(x) - Returns x * 2 for integer values")
+public class IntegerMultiplyBy2UDF extends GenericUDF {
+ private static final Logger LOG = Logger.getLogger(IntegerMultiplyBy2UDF.class);
+ private PrimitiveObjectInspector inputOI;
+
+ @Override
+ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
+ if (arguments.length != 1) {
+ throw new UDFArgumentException("Exactly one argument is expected.");
+ }
+
+ ObjectInspector oi = arguments[0];
+ if (oi.getCategory() != ObjectInspector.Category.PRIMITIVE) {
+ throw new UDFArgumentTypeException(0, "Argument must be PRIMITIVE, but " + oi.getCategory().name() + " was passed.");
+ }
+
+ inputOI = (PrimitiveObjectInspector) oi;
+
+ // Check that the input is an integral type (INT, LONG, SHORT, or BYTE)
+ if (inputOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.INT &&
+ inputOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.LONG &&
+ inputOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.SHORT &&
+ inputOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.BYTE) {
+ throw new UDFArgumentTypeException(0, "Argument must be numeric (INT/LONG/SHORT/BYTE), but " + inputOI.getPrimitiveCategory().name() + " was passed.");
+ }
+
+ // Return LongWritable type for the result
+ return PrimitiveObjectInspectorFactory.writableLongObjectInspector;
+ }
+
+ @Override
+ public Object evaluate(DeferredObject[] arguments) throws HiveException {
+ if (arguments == null || arguments.length != 1) {
+ return null;
+ }
+
+ Object input = arguments[0].get();
+ if (input == null) {
+ return null;
+ }
+
+ long value = getLongValue(input);
+ return new LongWritable(value * 2);
+ }
+
+ @Override
+ public String getDisplayString(String[] children) {
+ return "integer_multiply_by_2(" + (children != null ? String.join(",", children) : "") + ")";
+ }
+
+ private long getLongValue(Object obj) {
+ if (obj instanceof Number) {
+ return ((Number) obj).longValue();
+ } else {
+ throw new IllegalArgumentException("Cannot convert " + obj.getClass().getName() + " to long");
+ }
+ }
+}
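Porting note for `IntegerMultiplyBy2UDF`: the UDF widens every accepted input to `long` before doubling, so INT, SHORT, and BYTE inputs cannot overflow, while LONG inputs wrap silently as Java `long` arithmetic does. Signed overflow is undefined behavior in C++, so a native port would need to do the multiply in unsigned arithmetic to reproduce the wrap. The sketch below shows one way to do that; `multiply_by_2` is a hypothetical helper name.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the PR): doubles a value that has already
// been widened to 64 bits, wrapping on overflow like Java's long arithmetic.
// The multiply happens in uint64_t because signed overflow is undefined in C++.
// The conversion back to int64_t is modular (guaranteed since C++20, and the
// behavior of mainstream compilers before that).
inline int64_t multiply_by_2(int64_t value)
{
  return static_cast<int64_t>(static_cast<uint64_t>(value) * 2u);
}
```

The same expression can be used inside a `__device__` functor, since it relies only on fixed-width integer arithmetic.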
diff --git a/skills/udf-benchmark/CUDF_MICROBENCHMARKS.md b/skills/udf-benchmark/CUDF_MICROBENCHMARKS.md
new file mode 100644
index 00000000000..703384979de
--- /dev/null
+++ b/skills/udf-benchmark/CUDF_MICROBENCHMARKS.md
@@ -0,0 +1,30 @@
+
+
+# cuDF Microbenchmarks
+
+Measures fine-grained CPU vs. GPU performance on in-memory data, without Spark overhead.
+
+## Contents
+- [ ] Implement MicroBenchRunner
+- [ ] Run microbenchmarks
+
+## Implement MicroBenchRunner
+
+Fill in the three TODO methods following the docstrings.
+
+## Run Microbenchmarks
+
+Generate data first (reuse from GenData output), then run:
+```bash
+./run_micro_benchmark.sh --mode all --data-path data/bench_data__rows.parquet --rows
+```
+
+Note that the specified number of rows will be coalesced into a single cuDF table.
+A large table size (>1GB) will demonstrate better GPU performance.
+
+## Next Steps
+
+To profile and iteratively optimize GPU performance, use the **udf-optimize-cudf** skill.
diff --git a/skills/udf-benchmark/SKILL.md b/skills/udf-benchmark/SKILL.md
new file mode 100644
index 00000000000..d1efc8c8d24
--- /dev/null
+++ b/skills/udf-benchmark/SKILL.md
@@ -0,0 +1,82 @@
+---
+name: udf-benchmark
+description: Assists with benchmarking and profiling the performance of an Apache Spark UDF on the GPU. This is step 3 of 3 in the UDF conversion workflow (udf-gen-test -> udf-convert-to-* -> udf-benchmark). Use this skill when you have a CPU UDF and a RapidsUDF or SQL implementation, and need to benchmark the performance of the CPU UDF against the GPU implementation.
+license: CC-BY-4.0 AND Apache-2.0
+metadata:
+ spdx-file-copyright-text: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+model: inherit
+---
+
+# UDF Benchmark
+
+## Workflow
+
+- [ ] Step 1: Implement BenchUtils (fill in TODO methods)
+- [ ] Step 2: Validate with a small dataset
+- [ ] Step 3: Generate full benchmark data and run benchmarks
+- [ ] Step 4: cuDF microbenchmarks (skip for SQL targets)
+
+**Before making any edits, create a visible TODO checklist for every workflow step in this skill and keep it updated.** Do not produce a final answer until every required checklist item is marked complete.
+
+## Prerequisites
+
+- Project directory from Steps 1-2 (udf-gen-test, udf-convert-to-*) with passing tests
+
+Derive `` and `` from the UDF class name.
+
+> **Note:** Commands require access to `/tmp` (Spark temp storage) and `/dev` (GPU device). If commands fail due to sandbox restrictions, re-run them unsandboxed.
+
+## Step 1: Implement BenchUtils
+
+Read `src/main//com/udf/bench/BenchUtils.`. Replace placeholders with the actual camel/snake UDF name.
+
+Fill in the TODO methods following the docstrings. For variable-length inputs, generate sizable rows representative of enterprise-scale data. Refer to the unit test for schema and example data.
+
+## Step 2: Validate
+
+Make scripts executable:
+```bash
+chmod +x *.sh
+```
+
+Run validation mode to test with a small dataset:
+```bash
+./run_gen_data.sh --rows 1000 --validate
+```
+
+This runs both the CPU and GPU implementations on the dataset.
+If validation fails, analyze the error and fix the BenchUtils implementation.
+
+## Step 3: Generate Data and Run Benchmarks
+
+The scripts set the default heap size to 16g in `.mvn/jvm.config`; adjust depending on data size.
+
+### Generate benchmark data (10M rows):
+```bash
+./run_gen_data.sh --rows 10000000
+```
+
+### Run benchmarks:
+```bash
+# CPU benchmark
+./run_spark_benchmark.sh --mode cpu --data-path data/bench_data_10000000_rows.parquet
+
+# GPU benchmark
+./run_spark_benchmark.sh --mode gpu --data-path data/bench_data_10000000_rows.parquet
+```
+
+Results are saved to the `results/` directory as JSON files.
+
+## Step 4: cuDF Microbenchmarks
+
+> Skip this step for SQL targets. This only applies to cuDF RapidsUDF conversions.
+
+Follow [CUDF_MICROBENCHMARKS.md](CUDF_MICROBENCHMARKS.md) to implement and run in-memory microbenchmarks.
+
+## Output
+
+Upon successful completion:
+- Benchmark utilities: `src/main//com/udf/bench/BenchUtils.`
+- Microbenchmarks (cuDF): `src/main//com/udf/bench/MicroBenchRunner.`
+- Generated data: `data/`
+- Benchmark results: `results/`
diff --git a/skills/udf-convert-to-cuda/SKILL.md b/skills/udf-convert-to-cuda/SKILL.md
new file mode 100644
index 00000000000..f43f7a98fb3
--- /dev/null
+++ b/skills/udf-convert-to-cuda/SKILL.md
@@ -0,0 +1,169 @@
+---
+name: udf-convert-to-cuda
+description: Assists with converting a non-aggregating Apache Spark UDF to a native CUDA RapidsUDF using JNI and libcudf. This is step 2 of 3 in the UDF conversion workflow (udf-gen-test -> udf-convert-to-cuda -> udf-benchmark). Use this skill when you have a CPU UDF with a unit test and need to convert it to a native CUDA implementation. Prefer udf-convert-to-cudf unless a CUDA implementation is necessary for performance or correctness, or if requested by the user.
+license: CC-BY-4.0 AND Apache-2.0
+metadata:
+ spdx-file-copyright-text: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+model: inherit
+---
+
+# Convert UDF to Native CUDA RapidsUDF
+
+## Workflow
+
+- [ ] Step 1: Copy CUDA add-on templates into the udf-gen-test project
+- [ ] Step 2: Create the Java RapidsUDF/JNI bridge
+- [ ] Step 3: Implement the CUDA/libcudf native function
+- [ ] Step 4: Build with the `cuda-native-udf` Maven profile
+- [ ] Step 5: Fill in the comparison test and iterate
+- [ ] Step 6: Run judge subagent if requested
+- [ ] Step 7: Review conversion
+
+**Before making any edits, create a visible TODO checklist for every workflow step in this skill and keep it updated.** Do not produce a final answer until every required checklist item is marked complete.
+
+## Prerequisites
+
+- Project directory from Step 1 (`udf-gen-test`) with a passing unit test
+- Native build tools: CMake 3.30.4+, a CUDA-compatible C++ compiler, `git`, and `unzip`
+- Docker is optional, but can be used for a stable native build environment
+
+Derive `` and `` from the UDF class name.
+
+> **Note:** Commands require access to `/tmp` (Spark temp storage) and `/dev` (GPU device). If commands fail due to sandbox restrictions, re-run them unsandboxed.
+
+## Step 1: Copy CUDA Add-On Templates
+
+Copy this skill's CUDA templates into the existing project:
+```bash
+cp -r templates/cuda/* //
+chmod +x //native/scripts/extract-cudf-libs.sh
+```
+
+The `udf-gen-test` Maven template already contains an inactive `cuda-native-udf` profile. The native profile is activated only when you build with `-Pcuda-native-udf`.
+
+Read [NATIVE_BUILD_ENV.md](references/NATIVE_BUILD_ENV.md) before changing build configuration.
+Read `examples/` for native RapidsUDF examples.
+
+## Step 2: Create the RapidsUDF/JNI Bridge
+
+Use `src/main/java/com/udf/PlaceholderUDFNameNativeRapidsUDF.java` as a starting point:
+
+1. Rename it to `NativeRapidsUDF.java`.
+2. Rename the class to `NativeRapidsUDF`.
+3. Copy the original CPU UDF interface and row-by-row method onto the class.
+4. Implement `evaluateColumnar` to validate column count/types and call the native method.
+5. Rename the native method to a descriptive operation name, e.g. `cosineSimilarityNative`.
+
+For Scala projects, keep this Java wrapper under `src/main/java/com/udf/` and register it from the Scala test/project. JNI can be used from Scala, but the Java wrapper keeps native symbol names and examples simpler.
+If the Java wrapper's CPU fallback needs to call a Scala object, direct references can fail before `scala-maven-plugin` compiles the Scala classes; use reflection in the row-by-row fallback only, and keep `evaluateColumnar` on the normal JNI path.
+
+Read [JNI_CUDA_GUIDE.md](references/JNI_CUDA_GUIDE.md) for the `evaluateColumnar` contract, type mapping, pointer ownership, `NativeDepsLoader`, and native memory rules.
+**Note:** memory allocations must use the active RMM resource; avoid direct usage of ad hoc CUDA or Thrust allocators.
+
+## Step 3: Implement Native CUDA Code
+
+Rename and edit:
+- `native/src/main/cpp/src/PlaceholderUDFNameJni.cpp`
+- `native/src/main/cpp/src/placeholder_udf_name.cu`
+- `native/src/main/cpp/src/placeholder_udf_name.hpp`
+
+Update `native/src/main/cpp/CMakeLists.txt` `SOURCE_FILES` to match the renamed files. If libcudf ABI/version compatibility is unclear, defer to the user.
+
+Read [JNI_CUDA_GUIDE.md](references/JNI_CUDA_GUIDE.md) before writing kernels.
+
+Verify cuDF header names before choosing includes or APIs. After dependency extraction, the active header tree will be cloned under `target/cudf-repo/cpp/include`.
+
+### Critical Requirements
+
+- **NEVER use `copyToHost()` or native methods that copy inputs from GPU to CPU.** This defeats the purpose of GPU acceleration.
+- **Do NOT hardcode test values.** The RapidsUDF must implement the actual business logic for ANY potential input.
+
+## Step 4: Build
+
+The native Maven profile uses the RAPIDS dependency already declared in `pom.xml`.
+
+```bash
+mvn package -Pcuda-native-udf -DskipTests
+```
+
+To use the Docker build environment:
+```bash
+docker build -t cuda-udf-build .
+mkdir -p "$HOME/.m2"
+docker run --rm --gpus all \
+ --user "$(id -u):$(id -g)" \
+ -e HOME=/workspace \
+ -v "$PWD":/workspace \
+ -v "$HOME/.m2":/workspace/.m2 \
+ -w /workspace \
+ cuda-udf-build \
+ -c "mvn -B -Dmaven.repo.local=/workspace/.m2/repository package -Pcuda-native-udf -DskipTests -Dnative.build.path=/workspace/target/native-build-docker"
+```
+
+If the build fails while resolving cuDF headers or RAPIDS CMake, check network access and the generated `cudf.git.branch` / `rapids.cmake.branch` properties. These properties may contain either a branch or a tag.
+
+## Step 5: Fill In the Comparison Test and Iterate
+
+Fill in the target-specific TODOs in `src/test//com/udf/CudfComparisonTest.`:
+- Register `NativeRapidsUDF` as the GPU implementation
+- Replace placeholder UDF names
+
+Run:
+```bash
+# Java
+mvn test -Dtest=CudfComparisonTest -Pcuda-native-udf
+
+# Scala project using a Java native RapidsUDF wrapper
+mvn test -Dsuites=com.udf.CudfComparisonTest -Pcuda-native-udf
+```
+
+To run the tests inside the Docker build environment:
+
+```bash
+docker run --rm --gpus all \
+ --user "$(id -u):$(id -g)" \
+ -e HOME=/workspace \
+ -v "$PWD":/workspace \
+ -v "$HOME/.m2":/workspace/.m2 \
+ -v /etc/passwd:/etc/passwd:ro \
+ -v /etc/group:/etc/group:ro \
+ -w /workspace \
+ cuda-udf-build \
+ -c "mvn -B -Dmaven.repo.local=/workspace/.m2/repository test -Dtest=CudfComparisonTest -Pcuda-native-udf -Dnative.build.path=/workspace/target/native-build-docker -DskipCudfExtraction=true"
+```
+
+If tests fail, iterate on the Java bridge or native implementation.
+
+### Difficult Test Failures
+
+Treat the unit test as the CPU behavior specification. Do not weaken or remove test cases silently.
+
+- Tests that check for CPU errors may not be directly applicable to a columnar implementation: the GPU path typically evaluates a whole column and may produce nulls for invalid rows instead of throwing row-level exceptions. If this causes an unavoidable mismatch, add a clear comment in the test and a `TODO/NOTE` in the implementation explaining the mismatch.
+- If a test case does not pass because of inherent CUDA/libcudf/API limitations or low-level GPU/CPU semantic differences, comment out the conflicting assertion/test only after documenting how you tried to make the behavior match and why those attempts failed. Add a note to the user.
+- If the behavior is important, common, or part of the documented input domain, **always prefer fixing the implementation** over commenting out the test case. The exception is a performance-vs-correctness tradeoff that the user explicitly approves.
+
+## Step 6: Run Judge Subagent If Requested
+
+If the user explicitly asked for the judge, a judge subagent, or a review agent, treat that as an explicit request for delegation: you **MUST** launch a separate subagent with `model: inherit` and instruct it to use the **udf-judge-conversion** skill. Ask it to review the `UnitTest`, `CudfComparisonTest`, Java bridge, and JNI/CUDA sources.
+
+If the user did not request a judge/review agent, mark this step as skipped and continue to Step 7. If a required judge subagent is blocked by tool policy, stop and tell the user that explicit permission/instruction is needed.
+
+If you run the judge, wait for it to complete and review its report. If the judge finds any issues, 1) fix the issues, 2) re-run the tests, and 3) re-run the judge subagent.
+
+## Step 7: Review Conversion
+
+Review your own work to ensure:
+- The test runs on the GPU and directly compares CPU-GPU outputs
+- The implementation does not overfit to test cases
+- No `copyToHost()` or row-by-row GPU-to-CPU copying is used for computation
+- No debug statements (e.g., `TableDebug.get().debug(...)`) remain in final output
+
+## Output
+
+Upon successful completion:
+- Native RapidsUDF file at `src/main/java/com/udf/NativeRapidsUDF.java`
+- JNI/CUDA sources under `native/src/main/cpp/src/`
+- Packaged native library in the generated UDF JAR
+- Comparison test passes
+
+These outputs are required for step 3 of the conversion workflow, the **udf-benchmark** skill.
diff --git a/skills/udf-convert-to-cuda/examples/CosineSimilarityJni.cpp b/skills/udf-convert-to-cuda/examples/CosineSimilarityJni.cpp
new file mode 100644
index 00000000000..39cc570d5e4
--- /dev/null
+++ b/skills/udf-convert-to-cuda/examples/CosineSimilarityJni.cpp
@@ -0,0 +1,67 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#include "cosine_similarity.hpp"
+
+#include
+#include
+#include
+
+#include
+
+#include
+#include
+
+namespace {
+
+constexpr char const* RUNTIME_ERROR_CLASS = "java/lang/RuntimeException";
+constexpr char const* ILLEGAL_ARG_CLASS = "java/lang/IllegalArgumentException";
+
+void throw_java_exception(JNIEnv* env, char const* class_name, char const* message)
+{
+ jclass ex_class = env->FindClass(class_name);
+ if (ex_class != nullptr) {
+ env->ThrowNew(ex_class, message);
+ }
+}
+
+} // namespace
+
+extern "C" {
+
+JNIEXPORT jlong JNICALL
+Java_com_udf_CosineSimilarityNativeRapidsUDF_cosineSimilarity(JNIEnv* env,
+ jclass,
+ jlong j_view1,
+ jlong j_view2)
+{
+ try {
+ auto v1 = reinterpret_cast<cudf::column_view const*>(j_view1);
+ auto v2 = reinterpret_cast<cudf::column_view const*>(j_view2);
+ if (v1 == nullptr || v2 == nullptr) {
+ throw_java_exception(env, ILLEGAL_ARG_CLASS, "input column view is null");
+ return 0;
+ }
+ if (v1->type().id() != v2->type().id() || v1->type().id() != cudf::type_id::LIST) {
+ throw_java_exception(env, ILLEGAL_ARG_CLASS, "inputs are not list columns");
+ return 0;
+ }
+
+ auto lv1 = cudf::lists_column_view(*v1);
+ auto lv2 = cudf::lists_column_view(*v2);
+ std::unique_ptr<cudf::column> result = cosine_similarity(lv1, lv2);
+ return reinterpret_cast<jlong>(result.release());
+ } catch (std::bad_alloc const& e) {
+ auto message = std::string("Unable to allocate native memory: ") + e.what();
+ throw_java_exception(env, RUNTIME_ERROR_CLASS, message.c_str());
+ } catch (std::invalid_argument const& e) {
+ throw_java_exception(env, ILLEGAL_ARG_CLASS, e.what());
+ } catch (std::exception const& e) {
+ throw_java_exception(env, RUNTIME_ERROR_CLASS, e.what());
+ }
+ return 0;
+}
+
+}
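Note on the exported symbol: when the Java native method or its class is renamed (Step 2 of the skill), the C function name above has to change with it, because the JVM resolves native methods by a name derived from the fully qualified class and the method name. The sketch below illustrates the basic JNI naming rules for ASCII names. The helpers `jni_mangle` and `jni_symbol` are hypothetical, and the sketch deliberately omits the `_0xxxx` escape for non-ASCII characters and `$`, as well as the signature suffix used for overloaded native methods.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: escapes one name component using the JNI rules for
// ASCII input ('.' or '/' -> "_", '_' -> "_1", ';' -> "_2", '[' -> "_3").
std::string jni_mangle(std::string_view name)
{
  std::string out;
  for (char c : name) {
    switch (c) {
      case '.':
      case '/': out += '_'; break;
      case '_': out += "_1"; break;
      case ';': out += "_2"; break;
      case '[': out += "_3"; break;
      default: out += c; break;
    }
  }
  return out;
}

// Builds the short (non-overloaded) JNI symbol name for a native method.
std::string jni_symbol(std::string_view fq_class, std::string_view method)
{
  return "Java_" + jni_mangle(fq_class) + "_" + jni_mangle(method);
}
```

For the example in this file, `jni_symbol("com.udf.CosineSimilarityNativeRapidsUDF", "cosineSimilarity")` yields the function name declared above. A mismatch between the two is only detected at call time, when the JVM throws `UnsatisfiedLinkError`.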
diff --git a/skills/udf-convert-to-cuda/examples/CosineSimilarityNativeRapidsUDF.java b/skills/udf-convert-to-cuda/examples/CosineSimilarityNativeRapidsUDF.java
new file mode 100644
index 00000000000..af953a35516
--- /dev/null
+++ b/skills/udf-convert-to-cuda/examples/CosineSimilarityNativeRapidsUDF.java
@@ -0,0 +1,56 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package com.udf;
+
+import ai.rapids.cudf.ColumnVector;
+import com.nvidia.spark.RapidsUDF;
+import org.apache.spark.sql.api.java.UDF2;
+
+import scala.collection.mutable.WrappedArray;
+
+/**
+ * Native CUDA RapidsUDF example for cosine similarity over two LIST(FLOAT32) columns.
+ */
+public class CosineSimilarityNativeRapidsUDF
+ implements UDF2<WrappedArray<Float>, WrappedArray<Float>, Float>, RapidsUDF {
+ @Override
+ public Float call(WrappedArray<Float> v1, WrappedArray<Float> v2) {
+ if (v1 == null || v2 == null) {
+ return null;
+ }
+ if (v1.length() != v2.length()) {
+ throw new IllegalArgumentException("Array lengths must match: "
+ + v1.length() + " != " + v2.length());
+ }
+
+ double dotProduct = 0;
+ double magnitude1 = 0;
+ double magnitude2 = 0;
+ for (int i = 0; i < v1.length(); i++) {
+ float f1 = v1.apply(i);
+ float f2 = v2.apply(i);
+ dotProduct += f1 * f2;
+ magnitude1 += f1 * f1;
+ magnitude2 += f2 * f2;
+ }
+ return (float) (dotProduct / (Math.sqrt(magnitude1) * Math.sqrt(magnitude2)));
+ }
+
+ @Override
+ public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
+ if (args.length != 2) {
+ throw new IllegalArgumentException("Unexpected argument count: " + args.length);
+ }
+ if (numRows != args[0].getRowCount() || numRows != args[1].getRowCount()) {
+ throw new IllegalArgumentException("Input row count mismatch");
+ }
+
+ NativeUDFLoader.ensureLoaded();
+ return new ColumnVector(cosineSimilarity(args[0].getNativeView(), args[1].getNativeView()));
+ }
+
+ private static native long cosineSimilarity(long vectorView1, long vectorView2);
+}
diff --git a/skills/udf-convert-to-cuda/examples/cosine_similarity.cu b/skills/udf-convert-to-cuda/examples/cosine_similarity.cu
new file mode 100644
index 00000000000..e36e3c17cfc
--- /dev/null
+++ b/skills/udf-convert-to-cuda/examples/cosine_similarity.cu
@@ -0,0 +1,119 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#include "cosine_similarity.hpp"
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include
+
+#include
+#include
+#include
+
+namespace {
+
+struct cosine_similarity_functor {
+ float const* const v1;
+ float const* const v2;
+ int32_t const* const v1_offsets;
+ int32_t const* const v2_offsets;
+
+ __device__ float operator()(cudf::size_type row_idx)
+ {
+ auto const v1_start_idx = v1_offsets[row_idx];
+ auto const v1_num_elems = v1_offsets[row_idx + 1] - v1_start_idx;
+ auto const v2_start_idx = v2_offsets[row_idx];
+ auto const v2_num_elems = v2_offsets[row_idx + 1] - v2_start_idx;
+
+ double magnitude1 = 0;
+ double magnitude2 = 0;
+ double dot_product = 0;
+ for (auto i = 0; i < v1_num_elems; i++) {
+ float const f1 = v1[v1_start_idx + i];
+ float const f2 = v2[v2_start_idx + i];
+ magnitude1 += f1 * f1;
+ magnitude2 += f2 * f2;
+ dot_product += f1 * f2;
+ }
+ return static_cast<float>(dot_product / (cuda::std::sqrt(magnitude1) * cuda::std::sqrt(magnitude2)));
+ }
+};
+
+} // namespace
+
+std::unique_ptr<cudf::column> cosine_similarity(cudf::lists_column_view const& lv1,
+ cudf::lists_column_view const& lv2,
+ rmm::cuda_stream_view stream,
+ rmm::device_async_resource_ref mr)
+{
+ if (!cudf::have_same_types(lv1.child(), lv2.child()) ||
+ lv1.child().type().id() != cudf::type_id::FLOAT32) {
+ throw std::invalid_argument("inputs are not lists of floats");
+ }
+
+ auto const row_count = lv1.size();
+ if (row_count != lv2.size()) {
+ throw std::invalid_argument("input row counts do not match");
+ }
+ if (row_count == 0) {
+ return cudf::make_empty_column(cudf::data_type{cudf::type_id::FLOAT32});
+ }
+ if (lv1.child().null_count() != 0 || lv2.child().null_count() != 0) {
+ throw std::invalid_argument("null floats are not supported");
+ }
+
+ auto const lv1_offsets_ptr = lv1.offsets().data<int32_t>();
+ auto const lv2_offsets_ptr = lv2.offsets().data<int32_t>();
+ auto const lv1_null_mask = lv1.parent().null_mask();
+ auto const lv2_null_mask = lv2.parent().null_mask();
+
+ bool const are_offsets_equal =
+ thrust::all_of(rmm::exec_policy_nosync(stream),
+ thrust::make_counting_iterator(0),
+ thrust::make_counting_iterator(row_count),
+ [lv1_offsets_ptr, lv2_offsets_ptr, lv1_null_mask, lv2_null_mask]
+ __device__(cudf::size_type idx) -> bool {
+ bool const lv1_is_null =
+ lv1_null_mask != nullptr && !cudf::bit_is_set(lv1_null_mask, idx);
+ bool const lv2_is_null =
+ lv2_null_mask != nullptr && !cudf::bit_is_set(lv2_null_mask, idx);
+ if (lv1_is_null || lv2_is_null) {
+ return true;
+ }
+ return (lv1_offsets_ptr[idx + 1] - lv1_offsets_ptr[idx]) ==
+ (lv2_offsets_ptr[idx + 1] - lv2_offsets_ptr[idx]);
+ });
+ if (!are_offsets_equal) {
+ throw std::invalid_argument("input list lengths do not match for every row");
+ }
+
+ rmm::device_uvector<float> float_results(row_count, stream, mr);
+ thrust::transform(rmm::exec_policy_nosync(stream),
+ thrust::make_counting_iterator(0),
+ thrust::make_counting_iterator(row_count),
+ float_results.data(),
+ cosine_similarity_functor({lv1.child().data<float>(),
+ lv2.child().data<float>(),
+ lv1.offsets().data<int32_t>(),
+ lv2.offsets().data<int32_t>()}));
+
+ auto [null_mask, null_count] =
+ cudf::bitmask_and(cudf::table_view({lv1.parent(), lv2.parent()}), stream, mr);
+ return std::make_unique<cudf::column>(cudf::data_type{cudf::type_id::FLOAT32},
+ row_count,
+ float_results.release(),
+ std::move(null_mask),
+ null_count);
+}
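Note on the data layout used by `cosine_similarity_functor`: a LIST(FLOAT32) column is addressed through an offsets array of `row_count + 1` entries plus a flat child array, and row `i` covers child elements `[offsets[i], offsets[i + 1])`. The host-only sketch below repeats the kernel's arithmetic on plain vectors, which can be handy for checking expected values in a comparison test. The function name `cosine_similarity_host` is hypothetical, the vectors merely stand in for device buffers, and the sketch assumes that per-row lengths already match and ignores null rows, which the GPU version handles through the combined null mask.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host reference (not part of the PR): same arithmetic as the
// device functor, with std::vector standing in for offsets and child buffers.
std::vector<float> cosine_similarity_host(std::vector<float> const& v1,
                                          std::vector<int32_t> const& v1_offsets,
                                          std::vector<float> const& v2,
                                          std::vector<int32_t> const& v2_offsets)
{
  std::size_t const num_rows = v1_offsets.empty() ? 0 : v1_offsets.size() - 1;
  std::vector<float> out(num_rows);
  for (std::size_t row = 0; row < num_rows; ++row) {
    int32_t const start1 = v1_offsets[row];
    int32_t const start2 = v2_offsets[row];
    int32_t const len    = v1_offsets[row + 1] - start1;  // assumed equal in v2

    double dot = 0, mag1 = 0, mag2 = 0;
    for (int32_t i = 0; i < len; ++i) {
      float const f1 = v1[start1 + i];
      float const f2 = v2[start2 + i];
      dot += f1 * f2;   // float product accumulated in double, as in the kernel
      mag1 += f1 * f1;
      mag2 += f2 * f2;
    }
    out[row] = static_cast<float>(dot / (std::sqrt(mag1) * std::sqrt(mag2)));
  }
  return out;
}
```

An empty list (or an all-zero vector) divides zero by zero and yields NaN, which matches what the CPU `call` method in `CosineSimilarityNativeRapidsUDF.java` computes for the same input.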
diff --git a/skills/udf-convert-to-cuda/examples/cosine_similarity.hpp b/skills/udf-convert-to-cuda/examples/cosine_similarity.hpp
new file mode 100644
index 00000000000..99b78ede0f7
--- /dev/null
+++ b/skills/udf-convert-to-cuda/examples/cosine_similarity.hpp
@@ -0,0 +1,22 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#pragma once
+
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+#include
+
+std::unique_ptr<cudf::column> cosine_similarity(
+ cudf::lists_column_view const& lv1,
+ cudf::lists_column_view const& lv2,
+ rmm::cuda_stream_view stream = cudf::get_default_stream(),
+ rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref());
diff --git a/skills/udf-convert-to-cuda/references/JNI_CUDA_GUIDE.md b/skills/udf-convert-to-cuda/references/JNI_CUDA_GUIDE.md
new file mode 100644
index 00000000000..2137b79837b
--- /dev/null
+++ b/skills/udf-convert-to-cuda/references/JNI_CUDA_GUIDE.md
@@ -0,0 +1,162 @@
+
+
+# JNI and CUDA RapidsUDF Guide
+
+## RapidsUDF Contract
+
+The `RapidsUDF` interface lets a CPU UDF run on the GPU under the RAPIDS Accelerator for Apache Spark. It declares a single method to override, `evaluateColumnar`. The CPU UDF method must remain on the native RapidsUDF class so that Spark can fall back to the CPU when the surrounding plan cannot run on the GPU.
+
+`evaluateColumnar(int numRows, ColumnVector... args)` receives columnar forms of the same inputs as the CPU UDF. All input columns should have `numRows` rows. Scalar inputs may be expanded into full columns by the RAPIDS Accelerator, so do not rely on detecting scalar-vs-column input.
+
+The returned `ColumnVector` must have `numRows` rows and a cuDF type that matches the Spark return type:
+
+| Spark Type | cuDF Type |
+|---|---|
+| BooleanType | BOOL8 |
+| ByteType | INT8 |
+| ShortType | INT16 |
+| IntegerType | INT32 |
+| LongType | INT64 |
+| FloatType | FLOAT32 |
+| DoubleType | FLOAT64 |
+| DecimalType | DECIMAL32, DECIMAL64, DECIMAL128 * |
+| DateType | TIMESTAMP_DAYS |
+| TimestampType | TIMESTAMP_MICROSECONDS |
+| StringType | STRING |
+| ArrayType | LIST of element type |
+| MapType | LIST of STRUCT(key, value) |
+| StructType | STRUCT of fields |
+
+For example, if the CPU UDF returns the Spark type `ArrayType(MapType(StringType, StringType))`, then `evaluateColumnar` must return a column of type `LIST(LIST(STRUCT(STRING,STRING)))`.
+
+*Note: cuDF's DECIMAL32 corresponds to precision <= 9 digits, DECIMAL64 corresponds to 9 < precision <= 18 digits, and DECIMAL128 corresponds to 18 < precision <= 38 digits. Precision greater than 38 digits is unsupported.
+cuDF decimal scales are the negation of Spark's: for example, Spark `DecimalType(precision=11, scale=2)` translates to the cuDF type `DECIMAL64` with scale `-2`.
+
+For `ArrayType(elementType, containsNull)`, the LIST parent null mask represents null arrays. Child nulls represent null array elements and must match the `containsNull` contract. Either preserve child nulls deliberately or reject them explicitly.
+
+## Java Wrapper
+
+Use `NativeDepsLoader.loadNativeDeps(new String[] {"rapidsudfjni"})` from a synchronized loader. Call it from `evaluateColumnar`, not a static initializer, because the Spark driver may not have the executor CUDA runtime.
+
+Pass input columns to JNI with `ColumnVector.getNativeView()`. Wrap the native result with `new ColumnVector(nativeHandle)`.
+
+Do not close input `ColumnVector`s. The RAPIDS Accelerator owns them. Closing inputs can cause double-close errors.
+
+## JNI and Native Ownership
+
+JNI arguments are non-owning pointers:
+```cpp
+auto input = reinterpret_cast<cudf::column_view const*>(j_input);
+```
+
+The native function must allocate and return an owning `cudf::column`:
+```cpp
+std::unique_ptr<cudf::column> result = compute(*input);
+return reinterpret_cast<jlong>(result.release());
+```
+
+Never return a pointer to an input view, child view, stack object, or a column owned by a temporary that will be destroyed before Java wraps it.
+
+Catch `std::bad_alloc`, `std::invalid_argument`, and `std::exception`, then throw Java exceptions with `JNIEnv::ThrowNew`.
+
+## CUDA/libcudf Implementation
+
+Start with libcudf column APIs before writing custom kernels. Use custom CUDA kernels when the operation requires fused logic, custom reductions, or logic unavailable in cuDF Java/libcudf primitives.
+
+### Checklist
+
+- Validate input types and row counts in Java before crossing JNI when possible
+- Validate libcudf types again in JNI for native safety
+- Preserve Spark null semantics
+- Prefer `cudf::column_view`/`cudf::lists_column_view` for input views
+- Return `std::unique_ptr<cudf::column>`
+- Avoid host copies in the final implementation
+- Prefer public libcudf APIs; avoid using `cudf::detail`
+- Keep one native function focused on one UDF operation
+
+### Correctness Pitfalls
+
+- **Null values of fixed-width columns are undefined memory.** Check the null mask (`cudf::bit_is_set(...)` or `column_device_view::is_valid(...)`) before reading element values.
+- **Empty list/string columns have no offsets.** Accessing the offsets child of an empty list or string column is undefined behavior. Handle the empty case early (e.g., return `cudf::make_empty_column(...)`).
+- **Use `cudf::have_same_types(a, b)` for type comparison**, not `a.type() == b.type()` — equality misses differences such as decimal scale.
+- **`cudf::size_type` is `int32_t`. LIST offsets are always `int32_t`.** String offsets may be `int32_t` or `int64_t` for large strings.
+- **Nested column null masks must agree across levels.** When constructing LIST/STRUCT output yourself, ensure parent and child null masks are consistent.
+- **`CUDF_EXPECTS` conditions must be pure predicates** — side effects inside the condition may only execute in debug builds.
+
+### Useful Patterns
+
+- `rmm::device_uvector`: temporary device output buffers that can be released into a `cudf::column`
+- `rmm::exec_policy_nosync(stream)`: pass the intended CUDA stream to Thrust algorithms (prefer the `_nosync` variant unless you need an implicit host-device sync)
+- `cudf::make_empty_column(...)`: return correctly typed empty outputs
+- `cudf::make_numeric_column(...)`: allocate fixed-width output columns with a null mask
+- `cudf::bitmask_and(cudf::table_view({...}))`: combine input validity masks for output null semantics
+- `cudf::lists_column_view`: inspect list offsets, child columns, parent null masks, and nested list shapes
+- `cudf::strings_column_view`: inspect string chars/offsets when implementing string kernels
+- `cudf::create_null_mask(...)`: create all-valid, all-null, or uninitialized masks for new outputs
+- CUB and Thrust APIs: useful for scans, reductions, transforms, selection, and sorting when libcudf does not provide the exact operation
+
+### Memory Allocation
+
+- All device allocations must go through the active RMM memory resource.
+- Use libcudf factories or RMM types such as `rmm::device_uvector` and `rmm::device_buffer`; avoid direct calls to `cudaMalloc`, `cudaMallocAsync`, or other ad hoc device allocators.
+- Use the output MR for returned columns when the API exposes one; use `cudf::get_current_device_resource_ref()` for short-lived temporary buffers.
+- Use RMM pinned memory for large host buffers. Small CPU-only metadata may use normal C++ containers.
+
+Example allocating CUB scratch buffers through RMM:
+
+```cpp
+size_t temp_storage_bytes = 0;
+cub::DeviceScan::InclusiveSum(nullptr, temp_storage_bytes, in, out, n, stream.value());
+rmm::device_buffer temp_storage(temp_storage_bytes, stream, cudf::get_current_device_resource_ref());
+cub::DeviceScan::InclusiveSum(temp_storage.data(), temp_storage_bytes, in, out, n, stream.value());
+```
+
+### Stream and MR Plumbing
+
+Top-level native functions should accept stream and MR as the last two parameters, with defaults:
+
+```cpp
+std::unique_ptr<cudf::column> my_op(
+ cudf::column_view const& input,
+ rmm::cuda_stream_view stream = cudf::get_default_stream(),
+ rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref());
+```
+
+Use the passed-in `mr` for the returned column and `cudf::get_current_device_resource_ref()` for short-lived temporaries. Propagate `stream` to every libcudf call, Thrust call, and kernel launch — do not introduce `rmm::cuda_stream_default` inside the implementation.
+
+### Kernel Launch Discipline
+
+Always check kernel launches; silent launch failures cause downstream corruption.
+
+```cpp
+my_kernel<<<grid, block, 0, stream.value()>>>(args);  // grid, block: placeholder launch dimensions
+CUDF_CHECK_CUDA(stream.value());
+```
+
+Prefer `cuda::std::` (e.g. `cuda::std::min`, `cuda::std::sqrt`, `cuda::std::numeric_limits`) over `std::` inside `__device__` and `CUDF_HOST_DEVICE` code.
+
+Avoid synchronizing in the hot path except when required to fetch output sizes or while debugging.
+
+### Output Construction
+
+For variable-size list outputs:
+1. Compute per-row child sizes on device, using zero for null parent rows.
+2. Prefix-sum sizes into an `INT32` offsets column of length `numRows + 1`.
+3. Allocate the child column from the final offset, fill it on device, and set child nulls if `containsNull=true`.
+4. Assemble the LIST column from offsets, child column, parent null mask, and parent null count.
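+
+Steps 1 and 2 above can be sketched on the host as follows. This is only an illustration of the offsets arithmetic under stated assumptions: `build_offsets` is a hypothetical helper, and a real implementation would compute sizes and run the scan on device (for example with Thrust or CUB on the passed-in stream).
+
+```cpp
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+// Hypothetical host-side helper: builds LIST offsets from per-row child sizes.
+// sizes[i] is the child count for row i; valid[i] is false for null parent rows.
+std::vector<std::int32_t> build_offsets(std::vector<std::int32_t> const& sizes,
+                                        std::vector<bool> const& valid)
+{
+  // Offsets have one more entry than there are rows; offsets[0] is always 0.
+  std::vector<std::int32_t> offsets(sizes.size() + 1, 0);
+  for (std::size_t i = 0; i < sizes.size(); ++i) {
+    // Null parent rows contribute zero children.
+    std::int32_t const row_size = valid[i] ? sizes[i] : 0;
+    offsets[i + 1] = offsets[i] + row_size;
+  }
+  return offsets;
+}
+```
+
+The last offset is the total child count, which is the size to allocate for the child column in step 3.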
+
+For string outputs, construct proper offsets, chars, and null masks. For scalar numeric outputs, prefer libcudf transforms/reductions where possible.
+
+## Debugging
+
+Rerun tests with `-Ddebug.memory.leaks=true` to enable Java refcount debugging; this catches leaked `ColumnVector`, `Table`, `Scalar`, and Java-owned buffer objects.
+Note that it does **not** catch native memory leaks; use RMM RAII patterns to ensure all native allocations are freed.
+
+For native kernel memory errors, run the comparison test under Compute Sanitizer:
+
+```bash
+compute-sanitizer --tool memcheck mvn test
+```
diff --git a/skills/udf-convert-to-cuda/references/NATIVE_BUILD_ENV.md b/skills/udf-convert-to-cuda/references/NATIVE_BUILD_ENV.md
new file mode 100644
index 00000000000..56af56fd34d
--- /dev/null
+++ b/skills/udf-convert-to-cuda/references/NATIVE_BUILD_ENV.md
@@ -0,0 +1,92 @@
+
+
+# Native CUDA UDF Build Environment
+
+## Dependency Model
+
+The native build uses the RAPIDS JAR already resolved by Maven. The `cuda-native-udf` profile asks Maven to copy `rapids-4-spark_<scala>-<version>-<cuda>.jar` and `rapids-4-spark_<scala>-<version>.jar` into `target/rapids-jar`. The `native/scripts/extract-cudf-libs.sh` script then extracts `libcudf.so*` and `libnvcomp.so*`, clones matching cuDF headers, builds `librapidsudfjni.so`, and packages it in the UDF JAR for `NativeDepsLoader`.
+
+No separate manual JAR download is required. Maven should resolve the RAPIDS dependency declared in `pom.xml`; the native profile reuses the same coordinates and copies the resolved JAR into `target/rapids-jar`.
+
+The profile first tries the CUDA-classified artifact (`-cuda12`) and then the unclassified artifact. If extraction fails, the selected JAR probably does not contain Linux native CUDA libraries or the Maven cache/repository is inconsistent with the generated version properties.
+
+## Required Tools
+
+- CUDA toolkit matching the spark-rapids build, and a compatible NVIDIA driver
+- CMake 3.30.4+
+- C++ compiler compatible with the selected CUDA toolkit
+- JDK 17
+- Maven
+- `git`
+- `unzip`
+
+## CUDA Toolkit Version
+
+The native build compiles against the prebuilt libcudf in the spark-rapids jar, so the local CUDA toolkit must match the version spark-rapids was built against.
+
+1. Get the CUDA version(s) spark-rapids is built against:
+
+```bash
+curl -fsSL https://nvidia.github.io/spark-rapids/docs/download.html \
+ | grep -Eo '[^<>]*built against CUDA[^<>]*'
+```
+
+2. Check the active toolkit (`nvcc --version`). CMake uses `$CUDACXX`, else `nvcc` on `PATH`, else `$CUDAToolkit_ROOT/bin/nvcc` — the default `PATH` `nvcc` may not be the one you want.
+
+3. If it doesn't match, point the build at a matching toolkit that's already installed; otherwise install one that matches:
+
+```bash
+export CUDACXX=/usr/local/cuda-<version>/bin/nvcc
+export CUDAToolkit_ROOT=/usr/local/cuda-<version>
+export PATH="$CUDAToolkit_ROOT/bin:$PATH"
+```
+
+Docker is optional. Use it when local compiler/CMake/CUDA versions drift or when the build needs to be reproducible across machines.
+
+The provided Dockerfile installs JDK 17 and sets it via `/etc/profile.d/java17.sh`. If a modified Dockerfile or alternate entrypoint bypasses the login shell and `mvn` reports Java 8, export `JAVA_HOME=/usr/lib/jvm/java-17-openjdk` and prepend `$JAVA_HOME/bin` to `PATH` explicitly.
+
+Use the full Docker command listed in SKILL.md. It runs as the calling user to avoid root-owned artifacts, mounts the project and Maven cache, and uses a Docker-specific native build path so CMake cache paths do not conflict with host builds.
+
+If a previous root container run already wrote `target/` artifacts, fix ownership or clean them before rerunning as a non-root user.
+
+CMake stores absolute source and build paths in `CMakeCache.txt`. A host-generated `target/native-build` cannot be reused from `/workspace/target/native-build` inside Docker. Use `mvn clean`, remove the stale native build directory, or pass a Docker-specific path such as `-Dnative.build.path=/workspace/target/native-build-docker`.
+
+## Version Alignment
+
+Keep these values aligned:
+- Spark version
+- Scala binary version
+- `rapids4spark.version`
+- `cuda.version`
+- `cudf.git.branch`
+- `rapids.cmake.branch`
+- JDK version
+
+The generated template maps RAPIDS `<YY>.<MM>.<patch>` to the `v<YY>.<MM>.00` cuDF and rapids-cmake tags (for example, `26.04.0` maps to `v26.04.00`). If building a snapshot, a custom RAPIDS JAR, or a patch release with known native ABI changes, verify the matching cuDF/RMM/CCCL versions with the user.
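+
+The version-to-tag mapping can be sketched as a small string transform. This is a minimal illustration, assuming the RAPIDS version always has at least two dot-separated components; `cudf_tag_for` is a hypothetical name and is not part of the template.
+
+```cpp
+#include <stdexcept>
+#include <string>
+
+// Hypothetical helper: keeps the first two version components and appends ".00".
+std::string cudf_tag_for(std::string const& rapids_version)
+{
+  auto const first = rapids_version.find('.');
+  auto const second =
+    (first == std::string::npos) ? std::string::npos : rapids_version.find('.', first + 1);
+  if (second == std::string::npos) {
+    throw std::invalid_argument("expected a version with three components: " + rapids_version);
+  }
+  // The patch component is dropped; cuDF and rapids-cmake tags end in ".00".
+  return "v" + rapids_version.substr(0, second) + ".00";
+}
+```
+
+Under this sketch, both `26.04.0` and a patch release such as `26.04.1` resolve to the same tag, which is why patch releases with native ABI changes need a manual check.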
+
+## Fast Rebuilds and Verification
+
+After the first successful extraction, use `-DskipCudfExtraction=true` while iterating on Java/JNI/CUDA source:
+
+```bash
+mvn package -Pcuda-native-udf -DskipCudfExtraction=true -DskipTests
+```
+
+Verify deployable packaging with:
+
+```bash
+jar tf target/*.jar | grep librapidsudfjni.so
+```
+
+## Build Modes
+
+Default: `USE_PREBUILT_CUDF=ON`.
+
+This extracts `libcudf` from the RAPIDS JAR and builds only the UDF JNI/CUDA library. This is the stable, fast path.
+
+Escape hatch: `-DUSE_PREBUILT_CUDF=OFF`.
+
+This builds cuDF from source through RAPIDS CMake/CPM. It is slow and more sensitive to branch drift; ask the user before using it.
diff --git a/skills/udf-convert-to-cuda/templates/cuda/Dockerfile b/skills/udf-convert-to-cuda/templates/cuda/Dockerfile
new file mode 100644
index 00000000000..f75a5f19376
--- /dev/null
+++ b/skills/udf-convert-to-cuda/templates/cuda/Dockerfile
@@ -0,0 +1,65 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+# Reproducible build image for native CUDA RapidsUDF code.
+ARG CUDA_VERSION=12.9.1
+ARG LINUX_VERSION=rockylinux8
+
+FROM nvidia/cuda:${CUDA_VERSION}-devel-${LINUX_VERSION}
+
+ARG TOOLSET_VERSION=14
+ARG CMAKE_VERSION=3.30.4
+ARG CMAKE_ARCH=x86_64
+ARG CCACHE_VERSION=4.11.2
+ARG PARALLEL_LEVEL=10
+
+ENV TOOLSET_VERSION=${TOOLSET_VERSION}
+ENV PARALLEL_LEVEL=${PARALLEL_LEVEL}
+ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk
+
+RUN dnf --enablerepo=powertools install -y \
+ gcc-toolset-${TOOLSET_VERSION} \
+ git \
+ java-17-openjdk-devel \
+ maven \
+ ninja-build \
+ patch \
+ python39 \
+ scl-utils \
+ tar \
+ unzip \
+ wget \
+ zlib-devel \
+ && alternatives --set python /usr/bin/python3
+
+RUN cd /usr/local && \
+ wget --quiet https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}-linux-${CMAKE_ARCH}.tar.gz && \
+ tar zxf cmake-${CMAKE_VERSION}-linux-${CMAKE_ARCH}.tar.gz && \
+ rm cmake-${CMAKE_VERSION}-linux-${CMAKE_ARCH}.tar.gz
+ENV PATH=${JAVA_HOME}/bin:/usr/local/cmake-${CMAKE_VERSION}-linux-${CMAKE_ARCH}/bin:${PATH}
+
+# Bake the SCL activation and Java 17 environment into /etc/profile.d so they are restored by `bash -l` on every container start.
+RUN printf 'source /opt/rh/gcc-toolset-%s/enable\n' "${TOOLSET_VERSION}" \
+ > /etc/profile.d/scl-gcc-toolset.sh && \
+ printf '%s\n%s\n' \
+ 'export JAVA_HOME=/usr/lib/jvm/java-17-openjdk' \
+ 'export PATH=$JAVA_HOME/bin:$PATH' \
+ > /etc/profile.d/java17.sh
+
+RUN cd /tmp && \
+ wget --quiet https://github.com/ccache/ccache/releases/download/v${CCACHE_VERSION}/ccache-${CCACHE_VERSION}.tar.gz && \
+ tar zxf ccache-${CCACHE_VERSION}.tar.gz && \
+ rm ccache-${CCACHE_VERSION}.tar.gz && \
+ cd ccache-${CCACHE_VERSION} && \
+ mkdir build && \
+ cd build && \
+ scl enable gcc-toolset-${TOOLSET_VERSION} \
+ "cmake .. \
+ -DCMAKE_BUILD_TYPE=Release \
+ -DZSTD_FROM_INTERNET=ON \
+ -DREDIS_STORAGE_BACKEND=OFF && \
+ cmake --build . --parallel ${PARALLEL_LEVEL} --target install" && \
+ cd ../.. && \
+ rm -rf ccache-${CCACHE_VERSION}
+
+ENTRYPOINT ["bash", "-l"]
diff --git a/skills/udf-convert-to-cuda/templates/cuda/native/scripts/extract-cudf-libs.sh b/skills/udf-convert-to-cuda/templates/cuda/native/scripts/extract-cudf-libs.sh
new file mode 100644
index 00000000000..52020e71c3c
--- /dev/null
+++ b/skills/udf-convert-to-cuda/templates/cuda/native/scripts/extract-cudf-libs.sh
@@ -0,0 +1,81 @@
+#!/bin/bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_DIR="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+TARGET_DIR="${TARGET_DIR:-${PROJECT_DIR}/target}"
+NATIVE_DEPS_DIR="${TARGET_DIR}/native-deps"
+CUDF_REPO_DIR="${TARGET_DIR}/cudf-repo"
+RAPIDS_JAR_DIR="${TARGET_DIR}/rapids-jar"
+
+SCALA_VERSION="${SCALA_VERSION:-2.12}"
+RAPIDS4SPARK_VERSION="${RAPIDS4SPARK_VERSION:-26.04.0}"
+CUDA_VERSION="${CUDA_VERSION:-cuda12}"
+CUDF_BRANCH="${CUDF_BRANCH:-v26.04.00}"
+
+mkdir -p "${NATIVE_DEPS_DIR}" "${CUDF_REPO_DIR}"
+
+choose_rapids_jar() {
+ local candidates=(
+ "${RAPIDS_JAR_DIR}/rapids-4-spark_${SCALA_VERSION}-${RAPIDS4SPARK_VERSION}-${CUDA_VERSION}.jar"
+ "${RAPIDS_JAR_DIR}/rapids-4-spark_${SCALA_VERSION}-${RAPIDS4SPARK_VERSION}.jar"
+ "${HOME}/.m2/repository/com/nvidia/rapids-4-spark_${SCALA_VERSION}/${RAPIDS4SPARK_VERSION}/rapids-4-spark_${SCALA_VERSION}-${RAPIDS4SPARK_VERSION}-${CUDA_VERSION}.jar"
+ "${HOME}/.m2/repository/com/nvidia/rapids-4-spark_${SCALA_VERSION}/${RAPIDS4SPARK_VERSION}/rapids-4-spark_${SCALA_VERSION}-${RAPIDS4SPARK_VERSION}.jar"
+ )
+
+ for candidate in "${candidates[@]}"; do
+ if [[ -f "${candidate}" ]]; then
+ echo "${candidate}"
+ return 0
+ fi
+ done
+
+ echo "ERROR: Could not find a rapids-4-spark jar." >&2
+ echo "Tried target/rapids-jar and ~/.m2 for version ${RAPIDS4SPARK_VERSION} (${CUDA_VERSION})." >&2
+ echo "Run the build through Maven with -Pcuda-native-udf so the profile can copy the RAPIDS dependency first." >&2
+ return 1
+}
+
+JAR_PATH="$(choose_rapids_jar)"
+
+echo "Using RAPIDS jar: ${JAR_PATH}"
+echo "Using cuDF header ref: ${CUDF_BRANCH}"
+
+TEMP_DIR="${TARGET_DIR}/cudf-extract"
+rm -rf "${TEMP_DIR}"
+mkdir -p "${TEMP_DIR}"
+
+if ! unzip -o "${JAR_PATH}" "*/libcudf.so*" "*/libnvcomp.so*" -d "${TEMP_DIR}"; then
+ echo "ERROR: Failed to extract libcudf/libnvcomp from ${JAR_PATH}" >&2
+ echo "The selected RAPIDS jar may not include native Linux CUDA libraries." >&2
+ rm -rf "${TEMP_DIR}"
+ exit 1
+fi
+
+while IFS= read -r source_file; do
+ cp -f "${source_file}" "${NATIVE_DEPS_DIR}/$(basename "${source_file}")"
+done < <(find "${TEMP_DIR}" -name "*.so*")
+rm -rf "${TEMP_DIR}"
+
+if [[ ! -f "${NATIVE_DEPS_DIR}/libcudf.so" ]]; then
+ echo "ERROR: libcudf.so was not extracted into ${NATIVE_DEPS_DIR}" >&2
+ exit 1
+fi
+
+if [[ ! -d "${CUDF_REPO_DIR}/.git" ]]; then
+ git clone --depth 1 --branch "${CUDF_BRANCH}" https://github.com/rapidsai/cudf.git "${CUDF_REPO_DIR}"
+else
+ echo "Using existing cuDF headers at ${CUDF_REPO_DIR}"
+fi
+
+if [[ ! -d "${CUDF_REPO_DIR}/cpp/include" ]]; then
+ echo "ERROR: cuDF headers not found at ${CUDF_REPO_DIR}/cpp/include" >&2
+ exit 1
+fi
+
+echo "Native dependencies ready:"
+echo " Libraries: ${NATIVE_DEPS_DIR}"
+echo " Headers: ${CUDF_REPO_DIR}/cpp/include"
diff --git a/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/CMakeLists.txt b/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/CMakeLists.txt
new file mode 100644
index 00000000000..a9398b4938c
--- /dev/null
+++ b/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/CMakeLists.txt
@@ -0,0 +1,119 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+cmake_minimum_required(VERSION 3.30.4 FATAL_ERROR)
+
+set(RAPIDS_CMAKE_BRANCH "v26.04.00" CACHE STRING "rapids-cmake branch or tag")
+if(RAPIDS_CMAKE_BRANCH MATCHES "^v(.+)")
+ set(rapids-cmake-version "${CMAKE_MATCH_1}")
+ set(rapids-cmake-tag "${RAPIDS_CMAKE_BRANCH}")
+else()
+ set(rapids-cmake-branch "${RAPIDS_CMAKE_BRANCH}")
+endif()
+set(NATIVE_LIBRARY_NAME "rapidsudfjni" CACHE STRING "JNI shared library target name")
+set(NATIVE_DEPS_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../../../../target/native-deps" CACHE PATH "Directory containing prebuilt libcudf")
+set(CUDF_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../../../../target/cudf-repo/cpp" CACHE PATH "cuDF source directory for headers")
+set(GPU_ARCHS "RAPIDS" CACHE STRING "CUDA architectures")
+
+file(DOWNLOAD
+ https://raw.githubusercontent.com/rapidsai/rapids-cmake/${RAPIDS_CMAKE_BRANCH}/RAPIDS.cmake
+ ${CMAKE_BINARY_DIR}/RAPIDS.cmake
+)
+include(${CMAKE_BINARY_DIR}/RAPIDS.cmake)
+
+include(rapids-cmake)
+include(rapids-cpm)
+include(rapids-cuda)
+
+if(DEFINED ENV{CXX} AND NOT "$ENV{CXX}" STREQUAL "")
+ set(CMAKE_CXX_COMPILER "$ENV{CXX}" CACHE FILEPATH "C++ compiler" FORCE)
+endif()
+
+if(DEFINED GPU_ARCHS)
+ set(CMAKE_CUDA_ARCHITECTURES "${GPU_ARCHS}")
+endif()
+rapids_cuda_init_architectures(RAPIDSUDFJNI)
+
+project(RAPIDSUDFJNI VERSION 26.04.0 LANGUAGES C CXX CUDA)
+
+set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
+set(CMAKE_POSITION_INDEPENDENT_CODE ON)
+set(CMAKE_CXX_STANDARD 20)
+set(CMAKE_CXX_STANDARD_REQUIRED ON)
+set(CMAKE_CUDA_STANDARD 20)
+set(CMAKE_CUDA_STANDARD_REQUIRED ON)
+set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -w --expt-extended-lambda --expt-relaxed-constexpr")
+
+option(USE_PREBUILT_CUDF "Use libcudf extracted from the rapids-4-spark jar" ON)
+option(PER_THREAD_DEFAULT_STREAM "Build with per-thread default stream" ON)
+option(CUDF_ENABLE_ARROW_S3 "Enable Arrow S3 support in source-build mode" OFF)
+
+if(USE_PREBUILT_CUDF)
+ if(NOT EXISTS "${NATIVE_DEPS_DIR}")
+ message(FATAL_ERROR "NATIVE_DEPS_DIR does not exist: ${NATIVE_DEPS_DIR}")
+ endif()
+ if(NOT EXISTS "${CUDF_SOURCE_DIR}/include")
+ message(FATAL_ERROR "CUDF_SOURCE_DIR headers not found: ${CUDF_SOURCE_DIR}/include")
+ endif()
+
+ find_library(CUDF_LIBRARY NAMES cudf PATHS "${NATIVE_DEPS_DIR}" NO_DEFAULT_PATH REQUIRED)
+
+ get_property(rapids-cmake-dir GLOBAL PROPERTY rapids-cmake-dir)
+ if(NOT rapids-cmake-dir)
+ set(rapids-cmake-dir "${CMAKE_BINARY_DIR}/_deps/rapids-cmake-src")
+ endif()
+
+ rapids_cpm_init()
+ include("${rapids-cmake-dir}/cpm/cccl.cmake")
+ rapids_cpm_cccl()
+ include("${rapids-cmake-dir}/cpm/rmm.cmake")
+ rapids_cpm_rmm()
+
+ if(NOT TARGET rmm::rmm)
+ message(FATAL_ERROR "rmm::rmm target was not created")
+ endif()
+
+ get_target_property(RMM_INCLUDE_DIRS rmm::rmm INTERFACE_INCLUDE_DIRECTORIES)
+
+ add_library(cudf_imported SHARED IMPORTED GLOBAL)
+ set_target_properties(cudf_imported PROPERTIES IMPORTED_LOCATION "${CUDF_LIBRARY}")
+ target_include_directories(cudf_imported INTERFACE
+ "${CUDF_SOURCE_DIR}/include"
+ ${RMM_INCLUDE_DIRS}
+ )
+ target_link_libraries(cudf_imported INTERFACE rmm::rmm)
+ add_library(cudf::cudf ALIAS cudf_imported)
+else()
+ rapids_cpm_init()
+ rapids_cpm_find(cudf 26.04.00
+ CPM_ARGS
+ GIT_REPOSITORY https://github.com/rapidsai/cudf.git
+ GIT_TAG ${RAPIDS_CMAKE_BRANCH}
+ GIT_SHALLOW TRUE
+ SOURCE_SUBDIR cpp
+ OPTIONS "BUILD_TESTS OFF"
+ "BUILD_BENCHMARKS OFF"
+ "CUDF_ENABLE_ARROW_S3 ${CUDF_ENABLE_ARROW_S3}"
+ "CUDF_KVIKIO_REMOTE_IO OFF"
+ "DISABLE_DEPRECATION_WARNING ON"
+ "AUTO_DETECT_CUDA_ARCHITECTURES OFF"
+ )
+endif()
+
+find_package(JNI REQUIRED)
+
+set(SOURCE_FILES
+ "src/PlaceholderUDFNameJni.cpp"
+ "src/placeholder_udf_name.cu"
+)
+
+add_library(${NATIVE_LIBRARY_NAME} SHARED ${SOURCE_FILES})
+set_target_properties(${NATIVE_LIBRARY_NAME} PROPERTIES BUILD_RPATH "\$ORIGIN")
+
+if(PER_THREAD_DEFAULT_STREAM)
+ target_compile_definitions(${NATIVE_LIBRARY_NAME} PRIVATE CUDA_API_PER_THREAD_DEFAULT_STREAM)
+endif()
+
+target_include_directories(${NATIVE_LIBRARY_NAME} PRIVATE ${JNI_INCLUDE_DIRS})
+target_compile_definitions(${NATIVE_LIBRARY_NAME} PUBLIC SPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_OFF)
+target_link_libraries(${NATIVE_LIBRARY_NAME} cudf::cudf)
diff --git a/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/src/PlaceholderUDFNameJni.cpp b/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/src/PlaceholderUDFNameJni.cpp
new file mode 100644
index 00000000000..1b07459b9fd
--- /dev/null
+++ b/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/src/PlaceholderUDFNameJni.cpp
@@ -0,0 +1,58 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#include "placeholder_udf_name.hpp"
+
+#include <cudf/column/column.hpp>
+#include <cudf/column/column_view.hpp>
+
+#include <jni.h>
+
+#include <stdexcept>
+#include <string>
+
+namespace {
+
+constexpr char const* RUNTIME_ERROR_CLASS = "java/lang/RuntimeException";
+constexpr char const* ILLEGAL_ARG_CLASS = "java/lang/IllegalArgumentException";
+
+void throw_java_exception(JNIEnv* env, char const* class_name, char const* message)
+{
+ jclass ex_class = env->FindClass(class_name);
+ if (ex_class != nullptr) {
+ env->ThrowNew(ex_class, message);
+ }
+}
+
+} // namespace
+
+extern "C" {
+
+JNIEXPORT jlong JNICALL
+Java_com_udf_PlaceholderUDFNameNativeRapidsUDF_evaluateNative(JNIEnv* env,
+ jclass,
+ jlong input_view)
+{
+ try {
+ auto input = reinterpret_cast<cudf::column_view const*>(input_view);
+ if (input == nullptr) {
+ throw_java_exception(env, ILLEGAL_ARG_CLASS, "input column view is null");
+ return 0;
+ }
+
+ std::unique_ptr<cudf::column> result = placeholder_udf_name(*input);
+ return reinterpret_cast<jlong>(result.release());
+ } catch (std::bad_alloc const& e) {
+ auto message = std::string("Unable to allocate native memory: ") + e.what();
+ throw_java_exception(env, RUNTIME_ERROR_CLASS, message.c_str());
+ } catch (std::invalid_argument const& e) {
+ throw_java_exception(env, ILLEGAL_ARG_CLASS, e.what());
+ } catch (std::exception const& e) {
+ throw_java_exception(env, RUNTIME_ERROR_CLASS, e.what());
+ }
+ return 0;
+}
+
+}
diff --git a/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/src/placeholder_udf_name.cu b/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/src/placeholder_udf_name.cu
new file mode 100644
index 00000000000..5e0de8f20ff
--- /dev/null
+++ b/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/src/placeholder_udf_name.cu
@@ -0,0 +1,18 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#include "placeholder_udf_name.hpp"
+
+#include <cudf/column/column_factories.hpp>
+#include <cudf/null_mask.hpp>
+#include <cudf/types.hpp>
+
+std::unique_ptr<cudf::column> placeholder_udf_name(cudf::column_view const& input)
+{
+ // TODO: Replace this placeholder with the actual CUDA/libcudf implementation.
+ auto null_mask = cudf::create_null_mask(input.size(), cudf::mask_state::ALL_NULL);
+ return cudf::make_numeric_column(
+ cudf::data_type{cudf::type_id::INT32}, input.size(), std::move(null_mask), input.size());
+}
diff --git a/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/src/placeholder_udf_name.hpp b/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/src/placeholder_udf_name.hpp
new file mode 100644
index 00000000000..d34ac2f8828
--- /dev/null
+++ b/skills/udf-convert-to-cuda/templates/cuda/native/src/main/cpp/src/placeholder_udf_name.hpp
@@ -0,0 +1,13 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#pragma once
+
+#include <cudf/column/column.hpp>
+#include <cudf/column/column_view.hpp>
+
+#include <memory>
+
+std::unique_ptr<cudf::column> placeholder_udf_name(cudf::column_view const& input);
diff --git a/skills/udf-convert-to-cuda/templates/cuda/src/main/java/com/udf/NativeUDFLoader.java b/skills/udf-convert-to-cuda/templates/cuda/src/main/java/com/udf/NativeUDFLoader.java
new file mode 100644
index 00000000000..d5469882951
--- /dev/null
+++ b/skills/udf-convert-to-cuda/templates/cuda/src/main/java/com/udf/NativeUDFLoader.java
@@ -0,0 +1,29 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package com.udf;
+
+import ai.rapids.cudf.NativeDepsLoader;
+
+import java.io.IOException;
+
+/** Loads JNI libraries packaged in this UDF jar. */
+public final class NativeUDFLoader {
+ private static boolean loaded;
+
+ private NativeUDFLoader() {
+ }
+
+ public static synchronized void ensureLoaded() {
+ if (!loaded) {
+ try {
+ NativeDepsLoader.loadNativeDeps(new String[] {"rapidsudfjni"});
+ loaded = true;
+ } catch (IOException e) {
+ throw new RuntimeException("Failed to load native CUDA UDF library", e);
+ }
+ }
+ }
+}
diff --git a/skills/udf-convert-to-cuda/templates/cuda/src/main/java/com/udf/PlaceholderUDFNameNativeRapidsUDF.java b/skills/udf-convert-to-cuda/templates/cuda/src/main/java/com/udf/PlaceholderUDFNameNativeRapidsUDF.java
new file mode 100644
index 00000000000..8212de0b399
--- /dev/null
+++ b/skills/udf-convert-to-cuda/templates/cuda/src/main/java/com/udf/PlaceholderUDFNameNativeRapidsUDF.java
@@ -0,0 +1,45 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package com.udf;
+
+import ai.rapids.cudf.ColumnVector;
+import com.nvidia.spark.RapidsUDF;
+// TODO: add imports for CPU UDF's base type, e.g.:
+// import org.apache.hadoop.hive.ql.exec.UDF;
+// import org.apache.spark.sql.api.java.UDFn;
+
+/**
+ * Template for a native CUDA RapidsUDF.
+ *
+ * 1. Rename this class and file to {@code NativeRapidsUDF}.
+ * 2. Match the CPU UDF's Spark contract:
+ * - Hive UDF : add {@code extends org.apache.hadoop.hive.ql.exec.UDF}
+ * - Java typed UDF : add {@code implements UDFn} alongside {@code RapidsUDF}
+ * - Scala CPU UDF : implement the equivalent {@code UDFn<...>} contract.
+ * Invoke the Scala UDF via reflection from {@code call(...)}.
+ * 3. Add the CPU evaluation method.
+ * 4. Update {@code evaluateColumnar} and {@code evaluateNative} as needed to match the signature.
+ */
+public class PlaceholderUDFNameNativeRapidsUDF implements RapidsUDF {
+
+ // TODO: copy the original CPU evaluation method here (evaluate / call).
+
+ @Override
+ public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
+ if (args.length != 1) {
+ throw new IllegalArgumentException("Unexpected argument count: " + args.length);
+ }
+ if (numRows != args[0].getRowCount()) {
+ throw new IllegalArgumentException(
+ "Expected " + numRows + " rows, received " + args[0].getRowCount());
+ }
+
+ NativeUDFLoader.ensureLoaded();
+ return new ColumnVector(evaluateNative(args[0].getNativeView()));
+ }
+
+ private static native long evaluateNative(long inputView);
+}
diff --git a/skills/udf-convert-to-cudf/SKILL.md b/skills/udf-convert-to-cudf/SKILL.md
new file mode 100644
index 00000000000..58d4b01d2b7
--- /dev/null
+++ b/skills/udf-convert-to-cudf/SKILL.md
@@ -0,0 +1,128 @@
+---
+name: udf-convert-to-cudf
+description: Assists with converting an Apache Spark UDF to a GPU-accelerated RapidsUDF using cuDF Java APIs. This is step 2 of 3 in the UDF conversion workflow (udf-gen-test -> udf-convert-to-cudf -> udf-benchmark). Use this skill when you have a CPU UDF with a unit test and need to convert it to a RapidsUDF.
+license: CC-BY-4.0 AND Apache-2.0
+metadata:
+ spdx-file-copyright-text: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+model: inherit
+---
+
+# Convert UDF to cuDF RapidsUDF
+
+## Workflow
+
+- [ ] Step 1: Create the RapidsUDF file
+- [ ] Step 2: Implement the `evaluateColumnar` method
+- [ ] Step 3: Build and test
+- [ ] Step 4: Check for memory leaks
+- [ ] Step 5: Run judge subagent if requested
+- [ ] Step 6: Review conversion
+
+**Before making any edits, create a visible TODO checklist for every workflow step in this skill and keep it updated.** Do not produce a final answer until every required checklist item is marked complete.
+
+## Prerequisites
+
+- Project directory from Step 1 (udf-gen-test) with passing unit test
+
+Derive `` and `` from the UDF class name.
+
+> **Note:** Commands require access to `/tmp` (Spark temp storage) and `/dev` (GPU device). If commands fail due to sandbox restrictions, re-run them unsandboxed.
+
+## Step 1: Create the RapidsUDF File
+
+Create a copy of the original UDF file in the same source directory (`src/main//com/udf/`), then modify it:
+
+1. Add imports:
+ Java: `import ai.rapids.cudf.*;`, `import com.nvidia.spark.RapidsUDF;`
+ Scala: `import ai.rapids.cudf._`, `import com.nvidia.spark.RapidsUDF`, `import Arm.{withResource, closeOnExcept}`
+2. Add `implements RapidsUDF` to the class declaration
+3. Add the `evaluateColumnar` method stub:
+ Java: `public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) { }`
+ Scala: `def evaluateColumnar(numRows: Int, args: ColumnVector*): ColumnVector = { }`
+4. Rename the class and the file to `RapidsUDF`
+
+## Step 2: Implement the `evaluateColumnar` method
+
+### Background
+
+**Read `references/RAPIDS_UDF.md`** for detailed background on:
+- How RapidsUDF and `evaluateColumnar` work
+- Input ColumnVector types and output type mapping
+- Debugging techniques and GPU memory management
+
+**Read `examples/` for example RapidsUDF implementations for the target language.**
+
+### Implementation
+
+1. Clone https://github.com/rapidsai/cudf (branch matching spark-rapids version) to `~/.cache/aether_agent/` if not already present. Explore `java/src//java/ai/rapids/cudf` for relevant methods and usage patterns.
+2. Implement the `evaluateColumnar` method using cuDF APIs.
+
+### Critical Requirements
+
+- **NEVER use `copyToHost()` or methods that copy data GPU→CPU.** This defeats the purpose of GPU acceleration.
+- **Do NOT hardcode test values.** The RapidsUDF must implement actual business logic for ANY potential input
+
+## Step 3: Build and Test
+
+Fill in the target-specific TODOs in `src/test//com/udf/CudfComparisonTest.`:
+- Implement `registerRapidsUDF` to register the new RapidsUDF class.
+- Replace placeholders with the actual camel/snake UDF name.
+
+Then run the test:
+```bash
+# Java
+mvn test -Dtest=CudfComparisonTest
+
+# Scala
+mvn test -Dsuites=com.udf.CudfComparisonTest
+```
+
+If the test fails, analyze the error and iterate on the RapidsUDF implementation.
+
+### Difficult Test Failures
+
+Treat the unit test as the CPU behavior specification. Do not weaken or remove test cases silently.
+
+- Tests that check for CPU errors may not be directly applicable to a columnar implementation: the GPU path typically evaluates a whole column and may produce nulls for invalid rows instead of throwing row-level exceptions. If this causes an unavoidable mismatch, add a clear comment in the test and a `TODO/NOTE` in the implementation explaining the mismatch.
+- If a test case does not pass because of inherent cuDF/libcudf/API limitations or low-level GPU/CPU semantic differences, comment out the conflicting assertion/test only after documenting how you tried to make the behavior match and why those attempts failed. Add a note to the user.
+- If the behavior is important, common, or part of the documented input domain, **always prefer fixing the implementation** over commenting out the test case. The exception is a performance-vs-correctness tradeoff that the user explicitly approves.
+
+## Step 4: Memory Leak Check
+
+Re-run with memory leak detection:
+```bash
+# Java
+mvn test -Dtest=CudfComparisonTest -Ddebug.memory.leaks=true > /tmp/memleak.log 2>&1
+
+# Scala
+mvn test -Dsuites=com.udf.CudfComparisonTest -Ddebug.memory.leaks=true > /tmp/memleak.log 2>&1
+
+# Check for leaks
+grep "LEAKED" /tmp/memleak.log | head -5
+```
+
+If leaks are found, ensure all GPU objects are properly closed.
+
+## Step 5: Run Judge Subagent If Requested
+
+If the user explicitly asked for the judge, a judge subagent, or a review agent, treat that as an explicit request for delegation: you **MUST** launch a separate subagent with `model: inherit` and instruct it to use the **udf-judge-conversion** skill. Ask it to review the `UnitTest`, `CudfComparisonTest`, and RapidsUDF implementation.
+
+If the user did not request a judge/review agent, mark this step as skipped and continue to Step 6. If a required judge subagent is blocked by tool policy, stop and tell the user that explicit permission/instruction is needed.
+
+If you run the judge, wait for it to complete and review its report. If the judge finds any issues, 1) fix the issues, 2) re-run the tests and leak checks, and 3) re-run the judge subagent.
+
+## Step 6: Review Conversion
+
+Review your own work to ensure:
+- The test runs on the GPU and directly compares CPU-GPU outputs
+- The implementation does not overfit to test cases
+- No `copyToHost()` or row-by-row GPU-to-CPU copying is used for computation
+- No debug statements (e.g., `TableDebug.get().debug(...)`) remain in final output
+
+## Output
+
+Upon successful completion:
+- RapidsUDF file at `src/main//com/udf/RapidsUDF.`
+- Comparison test passes with no memory leaks
+
+These outputs are required for **Step 3: Benchmark**.
diff --git a/skills/udf-convert-to-cudf/examples/URLDecode.java b/skills/udf-convert-to-cudf/examples/URLDecode.java
new file mode 100644
index 00000000000..7122e15a68e
--- /dev/null
+++ b/skills/udf-convert-to-cudf/examples/URLDecode.java
@@ -0,0 +1,57 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import ai.rapids.cudf.*;
+import com.nvidia.spark.RapidsUDF;
+import org.apache.spark.sql.api.java.UDF1;
+
+import java.io.UnsupportedEncodingException;
+import java.net.URLDecoder;
+
+/** Decode URL-encoded strings. */
+public class URLDecode implements UDF1<String, String>, RapidsUDF {
+ /** Row-by-row implementation that executes on the CPU */
+ @Override
+ public String call(String s) {
+ String result = null;
+ if (s != null) {
+ try {
+ result = URLDecoder.decode(s, "utf-8");
+ } catch (IllegalArgumentException ignored) {
+ result = s;
+ } catch (UnsupportedEncodingException e) {
+ // utf-8 is a builtin, standard encoding, so this should never happen
+ throw new RuntimeException(e);
+ }
+ }
+ return result;
+ }
+
+ /** Columnar implementation that runs on the GPU */
+ @Override
+ public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
+ // The CPU implementation takes a single string argument, so similarly
+ // there should only be one column argument of type STRING.
+ if (args.length != 1) {
+ throw new IllegalArgumentException("Unexpected argument count: " + args.length);
+ }
+ ColumnVector input = args[0];
+ if (numRows != input.getRowCount()) {
+ throw new IllegalArgumentException("Expected " + numRows + " rows, received " + input.getRowCount());
+ }
+ if (!input.getType().equals(DType.STRING)) {
+ throw new IllegalArgumentException("Argument type is not a string column: " +
+ input.getType());
+ }
+
+ // The cudf urlDecode does not convert '+' to a space, so do that as a pre-pass first.
+ // All intermediate results are closed to avoid leaking GPU resources.
+ try (Scalar plusScalar = Scalar.fromString("+");
+ Scalar spaceScalar = Scalar.fromString(" ");
+ ColumnVector replaced = input.stringReplace(plusScalar, spaceScalar)) {
+ return replaced.urlDecode();
+ }
+ }
+}
diff --git a/skills/udf-convert-to-cudf/examples/URLDecodeExtendsFunction.scala b/skills/udf-convert-to-cudf/examples/URLDecodeExtendsFunction.scala
new file mode 100644
index 00000000000..4a5e4f086f0
--- /dev/null
+++ b/skills/udf-convert-to-cudf/examples/URLDecodeExtendsFunction.scala
@@ -0,0 +1,44 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import java.net.URLDecoder
+
+import ai.rapids.cudf._
+import com.nvidia.spark.RapidsUDF
+import Arm.{withResource, closeOnExcept}
+
+/** Decode URL-encoded strings. */
+class URLDecode extends Function1[String, String] with RapidsUDF with Serializable {
+ /** Row-by-row implementation that executes on the CPU */
+ override def apply(s: String): String = {
+ Option(s).map { s =>
+ try {
+ URLDecoder.decode(s, "utf-8")
+ } catch {
+ case _: IllegalArgumentException => s
+ }
+ }.orNull
+ }
+
+ /** Columnar implementation that runs on the GPU */
+ override def evaluateColumnar(numRows: Int, args: ColumnVector*): ColumnVector = {
+ // The CPU implementation takes a single string argument, so similarly
+ // there should only be one column argument of type STRING.
+ require(args.length == 1, s"Unexpected argument count: ${args.length}")
+ val input = args.head
+ require(numRows == input.getRowCount, s"Expected $numRows rows, received ${input.getRowCount}")
+ require(input.getType == DType.STRING, s"Argument type is not a string: ${input.getType}")
+
+ // The cudf urlDecode does not convert '+' to a space, so do that as a pre-pass first.
+ // All intermediate results are closed using withResource to avoid leaking GPU resources.
+ withResource(Scalar.fromString("+")) { plusScalar =>
+ withResource(Scalar.fromString(" ")) { spaceScalar =>
+ withResource(input.stringReplace(plusScalar, spaceScalar)) { replaced =>
+ replaced.urlDecode()
+ }
+ }
+ }
+ }
+}
diff --git a/skills/udf-convert-to-cudf/examples/URLDecodeHive.java b/skills/udf-convert-to-cudf/examples/URLDecodeHive.java
new file mode 100644
index 00000000000..d5b571e7085
--- /dev/null
+++ b/skills/udf-convert-to-cudf/examples/URLDecodeHive.java
@@ -0,0 +1,57 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import ai.rapids.cudf.*;
+import com.nvidia.spark.RapidsUDF;
+import org.apache.hadoop.hive.ql.exec.UDF;
+
+import java.io.UnsupportedEncodingException;
+import java.net.URLDecoder;
+
+/** Decode URL-encoded strings. */
+public class URLDecode extends UDF implements RapidsUDF {
+
+ /** Row-by-row implementation that executes on the CPU */
+ public String evaluate(String s) {
+ String result = null;
+ if (s != null) {
+ try {
+ result = URLDecoder.decode(s, "utf-8");
+ } catch (IllegalArgumentException ignored) {
+ result = s;
+ } catch (UnsupportedEncodingException e) {
+ // utf-8 is a builtin, standard encoding, so this should never happen
+ throw new RuntimeException(e);
+ }
+ }
+ return result;
+ }
+
+ /** Columnar implementation that runs on the GPU */
+ @Override
+ public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
+ // The CPU implementation takes a single string argument, so similarly
+ // there should only be one column argument of type STRING.
+ if (args.length != 1) {
+ throw new IllegalArgumentException("Unexpected argument count: " + args.length);
+ }
+ ColumnVector input = args[0];
+ if (numRows != input.getRowCount()) {
+ throw new IllegalArgumentException("Expected " + numRows + " rows, received " + input.getRowCount());
+ }
+ if (!input.getType().equals(DType.STRING)) {
+ throw new IllegalArgumentException("Argument type is not a string column: " +
+ input.getType());
+ }
+
+ // The cudf urlDecode does not convert '+' to a space, so do that as a pre-pass first.
+ // All intermediate results are closed to avoid leaking GPU resources.
+ try (Scalar plusScalar = Scalar.fromString("+");
+ Scalar spaceScalar = Scalar.fromString(" ");
+ ColumnVector replaced = input.stringReplace(plusScalar, spaceScalar)) {
+ return replaced.urlDecode();
+ }
+ }
+}
diff --git a/skills/udf-convert-to-cudf/examples/URLDecodeWithField.scala b/skills/udf-convert-to-cudf/examples/URLDecodeWithField.scala
new file mode 100644
index 00000000000..7dc4f122dcd
--- /dev/null
+++ b/skills/udf-convert-to-cudf/examples/URLDecodeWithField.scala
@@ -0,0 +1,48 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import java.net.URLDecoder
+
+import ai.rapids.cudf._
+import com.nvidia.spark.RapidsUDF
+import org.apache.spark.sql.functions.udf
+import Arm.{withResource, closeOnExcept}
+
+/** Decode URL-encoded strings. */
+object URLDecode {
+ val myUDF = udf(
+ new Function1[String, String] with RapidsUDF with Serializable {
+ /** Row-by-row implementation that executes on the CPU */
+ override def apply(s: String): String = {
+ Option(s).map { s =>
+ try {
+ URLDecoder.decode(s, "utf-8")
+ } catch {
+ case _: IllegalArgumentException => s
+ }
+ }.orNull
+ }
+
+ /** Columnar implementation that runs on the GPU */
+ override def evaluateColumnar(numRows: Int, args: ColumnVector*): ColumnVector = {
+ // The CPU implementation takes a single string argument, so similarly
+ // there should only be one column argument of type STRING.
+ require(args.length == 1, s"Unexpected argument count: ${args.length}")
+ val input = args.head
+ require(numRows == input.getRowCount, s"Expected $numRows rows, received ${input.getRowCount}")
+ require(input.getType == DType.STRING, s"Argument type is not a string: ${input.getType}")
+
+ // The cudf urlDecode does not convert '+' to a space, so do that as a pre-pass first.
+ // All intermediate results are closed using withResource to avoid leaking GPU resources.
+ withResource(Scalar.fromString("+")) { plusScalar =>
+ withResource(Scalar.fromString(" ")) { spaceScalar =>
+ withResource(input.stringReplace(plusScalar, spaceScalar)) { replaced =>
+ replaced.urlDecode()
+ }
+ }
+ }
+ }
+ }
+ )
+}
diff --git a/skills/udf-convert-to-cudf/references/RAPIDS_UDF.md b/skills/udf-convert-to-cudf/references/RAPIDS_UDF.md
new file mode 100644
index 00000000000..737aa640d41
--- /dev/null
+++ b/skills/udf-convert-to-cudf/references/RAPIDS_UDF.md
@@ -0,0 +1,111 @@
+
+
+# Background: RAPIDS Accelerated UDFs
+
+These instructions document how to implement a GPU version of an existing CPU UDF using the RapidsUDF interface. The RapidsUDF interface provides a way to run a CPU UDF on the GPU when using the RAPIDS Accelerator for Apache Spark.
+
+## Implementation
+
+The original CPU implementation is in the `evaluate` method. To make a UDF run on the GPU, you must implement the RapidsUDF interface, which has a single method to override: `evaluateColumnar`. The `evaluateColumnar` method should use existing cuDF methods from the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy) to perform the UDF computation by operating on cuDF ColumnVectors.
+
+Note that you must keep both CPU and GPU evaluate methods, so that the UDF will still work if a higher-level operation involving the Rapids UDF falls back to the CPU.
+
+Refer to examples/ for example RapidsUDF implementations.
+
+## Interpreting Inputs
+
+The RAPIDS Accelerator passes the same inputs that the CPU version of the UDF receives, in columnar form, through the `args` array. For example, if the CPU UDF expects two inputs, a String and an Integer, then the evaluateColumnar method will be invoked with an array of two cuDF ColumnVector instances of type STRING and INT32 respectively.
+
+Note that passing scalar inputs to a RAPIDS accelerated UDF is supported with limitations. The scalar value will be replicated into a full column before being passed to evaluateColumnar. Therefore the UDF implementation cannot easily detect the difference between a scalar input and a columnar input.
+
+The implementation of evaluateColumnar must return a column with the specified numRows, equal to the input number of rows. All input columns will contain the same number of rows.
+
+## Generating Output
+
+evaluateColumnar must return a ColumnVector of an appropriate cuDF type to match the result type of the original UDF.
+
+The following table shows the mapping of Spark types to equivalent cuDF columnar types:
+
+| Spark Type | cuDF Type |
+|---------------|--------------------------------------------|
+| BooleanType | BOOL8 |
+| ByteType | INT8 |
+| ShortType | INT16 |
+| IntegerType | INT32 |
+| LongType | INT64 |
+| FloatType | FLOAT32 |
+| DoubleType | FLOAT64 |
+| DecimalType | DECIMAL32, DECIMAL64, DECIMAL128 * |
+| DateType | TIMESTAMP_DAYS |
+| TimestampType | TIMESTAMP_MICROSECONDS |
+| StringType | STRING |
+| NullType | INT8 |
+| ArrayType | LIST of the underlying element type |
+| MapType | LIST of STRUCT of the key and value types |
+| StructType | STRUCT of all the field types |
+
+For example, if the CPU UDF returns the Spark type `ArrayType(MapType(StringType, StringType))` then evaluateColumnar must return a column of type `LIST(LIST(STRUCT(STRING,STRING)))`.
+
+*Note: cuDF's DECIMAL32 corresponds to precision <= 9 digits, DECIMAL64 corresponds to 9 < precision <= 18 digits, and DECIMAL128 corresponds to 18 < precision <= 38 digits. Precision greater than 38 digits is unsupported.
+
+Note that cuDF decimals use a negative scale relative to Spark DecimalType. For example, Spark DecimalType(precision=11, scale=2) would translate to cuDF type DECIMAL64(scale=-2).
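+
+The precision and scale rules above can be sketched in plain Java. This is a minimal illustration only: the class `CudfDecimalMapping` and its `describe` method are hypothetical names, not part of cuDF or the RAPIDS Accelerator, and the method returns a descriptive string rather than a real cuDF `DType`.
+
+```java
+/** Hypothetical helper for illustration; not part of cuDF or the RAPIDS Accelerator. */
+class CudfDecimalMapping {
+    /** Describes the cuDF decimal type for Spark DecimalType(precision, scale). */
+    static String describe(int precision, int scale) {
+        if (precision < 1 || precision > 38) {
+            throw new IllegalArgumentException("Unsupported precision: " + precision);
+        }
+        String width;
+        if (precision <= 9) {
+            width = "DECIMAL32";
+        } else if (precision <= 18) {
+            width = "DECIMAL64";
+        } else {
+            width = "DECIMAL128";
+        }
+        // The cuDF scale is the negation of the Spark scale.
+        return width + "(scale=" + (-scale) + ")";
+    }
+
+    public static void main(String[] args) {
+        // Spark DecimalType(precision=11, scale=2)
+        System.out.println(describe(11, 2)); // prints DECIMAL64(scale=-2)
+    }
+}
+```
+
+When building the real output column, the same two decisions apply: choose the decimal width from the Spark precision, and negate the Spark scale.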
+
+## Debugging
+
+When debugging, it may be helpful to print data type information about cuDF objects. For example, to get information about a ColumnVector:
+
+```java
+System.out.println("Param 1 info: " + param1Column);
+```
+
+Example output:
+
+```text
+Param 1 info: ColumnVector{rows=10, type=INT32, nullCount=Optional.empty, offHeap=(ID: 880 7d1d4c5951e0)}
+```
+
+To print the actual values in a column or table, use `TableDebug`:
+
+```java
+TableDebug debugger = TableDebug.get();
+debugger.debug("Param 1 data:", param1Column);
+```
+
+Note that you should NEVER call this from production code, since it causes a device-to-host copy.
+
+## Managing Memory
+
+The JVM assumes that the memory it manages lives on the Java heap, so it does not account for GPU memory held by cuDF objects. **Therefore, you must free GPU resources in a timely manner with try-finally or try-with-resources blocks**, calling `close()` to release a GPU resource and `incRefCount()` to increment its reference count when it needs an additional owner.
+
+The JVM's garbage collector is generally triggered when the JVM heap runs out of free space, but not necessarily when GPU memory runs out.
+To help detect the resulting GPU memory leaks, the cuDF Java code tracks these objects. If the garbage collector frees the memory instead of a proper `close()` call, cuDF logs an error like the following:
+
+```text
+ERROR ColumnVector: A DEVICE COLUMN VECTOR WAS LEAKED (ID: 15 7fb5f94d8fa0)
+```
+
+These messages are an indication that an object on the GPU was not properly closed. Once a leak is detected, the Spark driver/executor `extraJavaOptions` can be set to `-Dai.rapids.refcount.debug=true -ea` to get a stack trace for the leak.
+
+The user will run the unit test and provide tracebacks if memory leaks occur to help you debug the issue.
+
+For Scala, use `withResource` and `closeOnExcept` from the `Arm` object for resource management.
+
+**Note:** Avoid placing the input ColumnVectors (those passed in `args`) in try-finally or try-with-resources blocks. The RAPIDS Accelerator will close the input columns for you. For example, avoid doing this:
+
+```java
+ColumnVector param1 = args[0];
+try {
+ // Do something with param1
+} finally {
+ param1.close();
+}
+```
+
+This will result in a double-close error:
+
+```text
+java.lang.IllegalStateException: Close called too many times ColumnVector{rows=10, type=INT32, nullCount=Optional.empty, offHeap=(ID: 637 0)}
+```
diff --git a/skills/udf-convert-to-sql/SKILL.md b/skills/udf-convert-to-sql/SKILL.md
new file mode 100644
index 00000000000..a55f464e555
--- /dev/null
+++ b/skills/udf-convert-to-sql/SKILL.md
@@ -0,0 +1,87 @@
+---
+name: udf-convert-to-sql
+description: Assists with converting an Apache Spark UDF to a functionally equivalent Spark SQL expression. This is step 2 of 3 in the UDF conversion workflow (udf-gen-test -> udf-convert-to-sql -> udf-benchmark). Use this skill when you have a CPU UDF with a unit test and need to convert it to SQL for GPU acceleration.
+license: CC-BY-4.0 AND Apache-2.0
+metadata:
+ spdx-file-copyright-text: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+model: inherit
+---
+
+# Convert UDF to Spark SQL
+
+## Workflow
+
+- [ ] Step 1: Implement the SQL expression
+- [ ] Step 2: Fill in the comparison test and iterate
+- [ ] Step 3: Run judge subagent if requested
+- [ ] Step 4: Review conversion
+
+**Before making any edits, create a visible TODO checklist for every workflow step in this skill and keep it updated.** Do not produce a final answer until every required checklist item is marked complete.
+
+## Prerequisites
+
+- Project directory from Step 1 (udf-gen-test) with passing unit test
+
+Derive `` and `` from the UDF class name.
+
+> **Note:** Commands require access to `/tmp` (Spark temp storage) and `/dev` (GPU device). If commands fail due to sandbox restrictions, re-run them unsandboxed.
+
+## Step 1: Implement the SQL Expression
+
+Implement the SQL expression in a file at `src/main/resources/.sql`.
+
+**Read `examples/` for example UDF-to-SQL conversions for the target language.**
+
+### Guidelines
+
+- Focus on correctness FIRST, then GPU compatibility — the test will report which operators are not GPU-compatible
+- Avoid expensive joins; prefer window functions, CTEs, and built-in array/map functions over explode-and-aggregate patterns
+
+**Do NOT hardcode test sample values or outputs.** The SQL expression must work correctly for ANY potential input.
+
+## Step 2: Fill in Test and Iterate
+
+Update `src/test//com/udf/SqlComparisonTest.`:
+- Update the SQL file path to point to your `src/main/resources/.sql` file
+- Replace placeholders with the actual camel/snake UDF name
+
+Then run the test:
+```bash
+# Java
+mvn test -Dtest=SqlComparisonTest
+
+# Scala
+mvn test -Dsuites=com.udf.SqlComparisonTest
+```
+
+If the test fails, analyze the error and iterate on the SQL expression.
+
+### Difficult Test Failures
+
+Treat the unit test as the CPU behavior specification. Do not weaken or remove test cases silently.
+
+- Tests that check for CPU errors may not be directly applicable to SQL operators: Spark RAPIDS typically evaluates a whole column/batch and may produce nulls for invalid rows instead of throwing one row-level exception. Make an explicit judgment call about the UDF contract. Add a clear comment in the test and a `TODO/NOTE` in the SQL statement explaining the mismatch.
+- In rare cases, the Spark RAPIDS Plugin has known discrepancies in certain SQL operators. If a test case does not pass because of these discrepancies, notify the user and comment out the conflicting assertion/test only after documenting how you tried to make the behavior match and why those attempts failed.
+- If the behavior is important, common, or part of the documented input domain, **always prefer fixing the SQL expression** over commenting out the test case. The exception is a performance-vs-correctness tradeoff that the user explicitly approves.
+
+## Step 3: Run Judge Subagent If Requested
+
+If the user explicitly asked for the judge, a judge subagent, or a review agent, treat that as an explicit request for delegation: you **MUST** launch a separate subagent with `model: inherit` and instruct it to use the **udf-judge-conversion** skill. Ask it to review the `UnitTest`, `SqlComparisonTest`, and SQL expression.
+
+If the user did not request a judge/review agent, mark this step as skipped and continue to Step 4. If a required judge subagent is blocked by tool policy, stop and tell the user that explicit permission/instruction is needed.
+
+If you run the judge, wait for it to complete and review its report. If the judge finds any issues, 1) fix the issues, 2) re-run the tests, and 3) re-run the judge subagent.
+
+## Step 4: Review Conversion
+
+Review your own work to ensure:
+- The test runs on the GPU and directly compares CPU-SQL outputs
+- The implementation does not overfit to test cases
+
+## Output
+
+Upon successful completion:
+- SQL file at `src/main/resources/.sql`
+- Comparison test passes
+
+These outputs are required for **Step 3: Benchmark**.
diff --git a/skills/udf-convert-to-sql/examples/FormatPhone.java b/skills/udf-convert-to-sql/examples/FormatPhone.java
new file mode 100644
index 00000000000..1b0d227199a
--- /dev/null
+++ b/skills/udf-convert-to-sql/examples/FormatPhone.java
@@ -0,0 +1,27 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import org.apache.spark.sql.api.java.UDF1;
+
+/**
+ * Strip non-digit characters and format as (XXX) XXX-XXXX.
+ * See format_phone.sql for equivalent SQL expression.
+ */
+public class FormatPhone implements UDF1<String, String> {
+ @Override
+ public String call(String phone) throws Exception {
+ if (phone == null) {
+ return null;
+ }
+ String digits = phone.replaceAll("[^0-9]", "");
+ if (digits.length() != 10) {
+ return null;
+ }
+ return String.format("(%s) %s-%s",
+ digits.substring(0, 3),
+ digits.substring(3, 6),
+ digits.substring(6));
+ }
+}
diff --git a/skills/udf-convert-to-sql/examples/FormatPhone.scala b/skills/udf-convert-to-sql/examples/FormatPhone.scala
new file mode 100644
index 00000000000..0aebe0ef11d
--- /dev/null
+++ b/skills/udf-convert-to-sql/examples/FormatPhone.scala
@@ -0,0 +1,22 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import org.apache.spark.sql.functions.udf
+
+/**
+ * Strip non-digit characters and format as (XXX) XXX-XXXX.
+ * See format_phone.sql for equivalent SQL expression.
+ */
+object FormatPhone {
+ val formatPhone = udf((phone: String) => {
+ Option(phone).flatMap { p =>
+ val digits = p.replaceAll("[^0-9]", "")
+ if (digits.length == 10)
+ Some(s"(${digits.substring(0, 3)}) ${digits.substring(3, 6)}-${digits.substring(6)}")
+ else
+ None
+ }.orNull
+ })
+}
diff --git a/skills/udf-convert-to-sql/examples/FormatPhoneHive.java b/skills/udf-convert-to-sql/examples/FormatPhoneHive.java
new file mode 100644
index 00000000000..4609b7254ee
--- /dev/null
+++ b/skills/udf-convert-to-sql/examples/FormatPhoneHive.java
@@ -0,0 +1,26 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import org.apache.hadoop.hive.ql.exec.UDF;
+
+/**
+ * Strip non-digit characters and format as (XXX) XXX-XXXX.
+ * See format_phone.sql for equivalent SQL expression.
+ */
+public class FormatPhone extends UDF {
+ public String evaluate(String phone) {
+ if (phone == null) {
+ return null;
+ }
+ String digits = phone.replaceAll("[^0-9]", "");
+ if (digits.length() != 10) {
+ return null;
+ }
+ return String.format("(%s) %s-%s",
+ digits.substring(0, 3),
+ digits.substring(3, 6),
+ digits.substring(6));
+ }
+}
diff --git a/skills/udf-convert-to-sql/examples/NormalizeTags.java b/skills/udf-convert-to-sql/examples/NormalizeTags.java
new file mode 100644
index 00000000000..152d63bb480
--- /dev/null
+++ b/skills/udf-convert-to-sql/examples/NormalizeTags.java
@@ -0,0 +1,37 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import org.apache.spark.sql.api.java.UDF1;
+import scala.collection.Seq;
+import scala.collection.Iterator;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.TreeSet;
+
+/**
+ * Lowercase, deduplicate, and sort a variable-length tag array.
+ * See normalize_tags.sql for equivalent SQL expression.
+ */
+public class NormalizeTags implements UDF1<Seq<String>, List<String>> {
+ @Override
+ public List<String> call(Seq<String> tags) throws Exception {
+ if (tags == null) {
+ return null;
+ }
+ TreeSet<String> result = new TreeSet<>();
+ Iterator<String> it = tags.iterator();
+ while (it.hasNext()) {
+ String tag = it.next();
+ if (tag != null) {
+ String stripped = tag.replaceAll("^ +| +$", "").toLowerCase();
+ if (!stripped.isEmpty()) {
+ result.add(stripped);
+ }
+ }
+ }
+ return result.isEmpty() ? null : new ArrayList<>(result);
+ }
+}
diff --git a/skills/udf-convert-to-sql/examples/NormalizeTags.scala b/skills/udf-convert-to-sql/examples/NormalizeTags.scala
new file mode 100644
index 00000000000..92a2eee6954
--- /dev/null
+++ b/skills/udf-convert-to-sql/examples/NormalizeTags.scala
@@ -0,0 +1,25 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import org.apache.spark.sql.expressions.UserDefinedFunction
+import org.apache.spark.sql.functions.udf
+
+/**
+ * Lowercase, deduplicate, and sort a variable-length tag array.
+ * See normalize_tags.sql for equivalent SQL expression.
+ */
+object NormalizeTags {
+ val normalizeTags: UserDefinedFunction = udf((tags: Seq[String]) => {
+ Option(tags).flatMap { ts =>
+ val cleaned = ts
+ .filter(_ != null)
+ .map(_.replaceAll("^ +| +$", "").toLowerCase)
+ .filter(_.nonEmpty)
+ .distinct
+ .sorted
+ if (cleaned.isEmpty) None else Some(cleaned)
+ }.orNull
+ })
+}
diff --git a/skills/udf-convert-to-sql/examples/NormalizeTagsHive.java b/skills/udf-convert-to-sql/examples/NormalizeTagsHive.java
new file mode 100644
index 00000000000..058bd210c5e
--- /dev/null
+++ b/skills/udf-convert-to-sql/examples/NormalizeTagsHive.java
@@ -0,0 +1,32 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import org.apache.hadoop.hive.ql.exec.UDF;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.TreeSet;
+
+/**
+ * Lowercase, deduplicate, and sort a variable-length tag array.
+ * See normalize_tags.sql for equivalent SQL expression.
+ */
+public class NormalizeTags extends UDF {
+ public List<String> evaluate(List<String> tags) {
+ if (tags == null) {
+ return null;
+ }
+ TreeSet<String> result = new TreeSet<>();
+ for (String tag : tags) {
+ if (tag != null) {
+ String stripped = tag.replaceAll("^ +| +$", "").toLowerCase();
+ if (!stripped.isEmpty()) {
+ result.add(stripped);
+ }
+ }
+ }
+ return result.isEmpty() ? null : new ArrayList<>(result);
+ }
+}
diff --git a/skills/udf-convert-to-sql/examples/format_phone.sql b/skills/udf-convert-to-sql/examples/format_phone.sql
new file mode 100644
index 00000000000..6a35040c0e7
--- /dev/null
+++ b/skills/udf-convert-to-sql/examples/format_phone.sql
@@ -0,0 +1,17 @@
+-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+-- SPDX-License-Identifier: Apache-2.0
+
+SELECT
+ CASE
+ WHEN phone IS NULL THEN NULL
+ WHEN LENGTH(REGEXP_REPLACE(phone, '[^0-9]', '')) != 10 THEN NULL
+ ELSE CONCAT(
+ '(',
+ SUBSTR(REGEXP_REPLACE(phone, '[^0-9]', ''), 1, 3),
+ ') ',
+ SUBSTR(REGEXP_REPLACE(phone, '[^0-9]', ''), 4, 3),
+ '-',
+ SUBSTR(REGEXP_REPLACE(phone, '[^0-9]', ''), 7, 4)
+ )
+ END AS result
+FROM __table__
diff --git a/skills/udf-convert-to-sql/examples/normalize_tags.sql b/skills/udf-convert-to-sql/examples/normalize_tags.sql
new file mode 100644
index 00000000000..385d6adfca8
--- /dev/null
+++ b/skills/udf-convert-to-sql/examples/normalize_tags.sql
@@ -0,0 +1,15 @@
+-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+-- SPDX-License-Identifier: Apache-2.0
+
+SELECT
+ CASE
+ WHEN tags IS NULL THEN NULL
+ WHEN SIZE(FILTER(tags, x -> x IS NOT NULL AND TRIM(x) != '')) = 0 THEN NULL
+ ELSE ARRAY_SORT(ARRAY_DISTINCT(
+ TRANSFORM(
+ FILTER(tags, x -> x IS NOT NULL AND TRIM(x) != ''),
+ x -> LOWER(TRIM(x))
+ )
+ ))
+ END AS result
+FROM __table__
diff --git a/skills/udf-gen-test/SKILL.md b/skills/udf-gen-test/SKILL.md
new file mode 100644
index 00000000000..44b668dfd12
--- /dev/null
+++ b/skills/udf-gen-test/SKILL.md
@@ -0,0 +1,148 @@
+---
+name: udf-gen-test
+description: Assists with generating a unit test for an Apache Spark UDF. This is step 1 of 3 in the UDF conversion workflow (udf-gen-test -> udf-convert-to-* -> udf-benchmark). Use this skill when you have a CPU UDF and need to create a unit test for the UDF before converting it into a GPU-compatible implementation.
+license: CC-BY-4.0 AND Apache-2.0
+metadata:
+ spdx-file-copyright-text: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+model: inherit
+---
+
+# UDF Unit Test Generation
+
+## Workflow
+
+- [ ] Step 1: Set up project (copy template, add UDF source)
+- [ ] Step 2: Implement the unit test (fill in TODO methods)
+- [ ] Step 3: Compile and test until passing
+- [ ] Step 4: Run coverage and inspect gaps
+- [ ] Step 5: Verify outputs
+
+**Before making any edits, create a visible TODO checklist for every workflow step in this skill and keep it updated.** Do not produce a final answer until every required checklist item is marked complete.
+
+## Prerequisites
+
+- Path to the input UDF file (Java or Scala)
+
+Derive `` and `` from the UDF class name.
+
+> **Note:** Commands require access to `/tmp` (Spark temp storage) and `/dev` (GPU device). If commands fail due to sandbox restrictions, re-run them unsandboxed.
+
+## Step 1: Set Up the Project
+
+### 1a. Copy the template project
+
+The project can be found under this skill's templates directory.
+```bash
+cp -r templates///
+```
+
+This provides a complete Maven project with all test and benchmark infrastructure.
+
+### 1b. Copy or extract the UDF source
+
+Before copying code, decide whether the input UDF is already self-contained:
+- If the UDF file contains only the target UDF and local helpers it directly needs, copy it as-is.
+- If the UDF is part of a larger project or a file containing unrelated UDFs/classes, extract only the target UDF class/object and all local helper classes/methods required for that UDF to compile and run (modifying package declarations as needed).
+
+The template project should contain the smallest self-contained implementation of the target CPU UDF.
+
+Place the resulting source file(s) in the source directory:
+- Java: `/src/main/java/com/udf/`
+- Scala: `/src/main/scala/com/udf/`
+
+Set the package declaration to `com.udf`:
+- Java: `package com.udf;`
+- Scala: `package com.udf`
+
+## Step 2: Implement the Unit Test
+
+Read `src/test//com/udf/UnitTest.`. Replace placeholders with the actual camel/snake UDF name.
+
+Fill in the TODO methods following the docstrings. Include diverse edge cases in `createTestData` (nulls, empty strings, malformed inputs, varying lengths).
+
+### Test Data Coverage
+
+The generated tests should serve as a strong specification of the CPU UDF behavior over a documented input domain, and are intended to prove that a GPU or SQL implementation preserves the CPU UDF behavior.
+For each input type and visible UDF branch, include applicable examples from these coverage dimensions:
+- null inputs and null elements
+- empty strings, arrays, maps, or structs
+- malformed or unparsable inputs
+- edges of input boundaries, such as min/max valid values, string length, or array length
+- numeric sign/identity cases, such as negative, zero, and positive values
+- string variety, such as unicode, ASCII, and encoding-sensitive inputs
+- date/time boundaries, such as epoch, end-of-day/month/year, leap day, and DST/timezone transitions
+- decimal precision and scale
+- duplicate rows and repeated values
+- mixed valid/invalid rows in the same DataFrame
+- nested empty and nested null values
+
+Assertions should verify schema, row count, deterministic ordering, output values, null propagation, and exception/default behavior. Every visible UDF branch should be covered by the unit test or explicitly documented as out of scope.
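+
+As a rough sketch of how a few of these coverage dimensions translate into test data, the snippet below pairs edge-case inputs with a stand-in CPU function. The class `EdgeCaseSpec` and its methods are hypothetical and are not part of the template project; the stand-in mirrors the URL-decoding behavior used in the conversion examples (malformed input is returned unchanged).
+
+```java
+import java.net.URLDecoder;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+import java.util.List;
+
+/** Hypothetical illustration; not part of the template project. */
+class EdgeCaseSpec {
+    /** Stand-in CPU UDF: URL-decode, returning the input unchanged when it is malformed. */
+    static String decodeOrSelf(String s) {
+        if (s == null) {
+            return null;
+        }
+        try {
+            return URLDecoder.decode(s, StandardCharsets.UTF_8);
+        } catch (IllegalArgumentException e) {
+            return s;
+        }
+    }
+
+    /** Null, empty, plain, '+' handling, valid escapes, malformed escape, multi-byte UTF-8. */
+    static List<String> edgeCases() {
+        return Arrays.asList(null, "", "plain", "a+b", "%41%42", "%zz", "caf%C3%A9");
+    }
+
+    public static void main(String[] args) {
+        for (String input : edgeCases()) {
+            System.out.println(input + " -> " + decodeOrSelf(input));
+        }
+    }
+}
+```
+
+Each input exercises a different branch of the stand-in function (null check, normal decode, exception fallback), which is the kind of branch-by-branch coverage the unit test should aim for.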
+
+### Critical Requirements
+
+- Do NOT hardcode the UDF name; use the provided `udfName` argument. This ensures the correct registered UDF is exercised.
+- Assume the user's UDF implementation is correct; the assertions should reflect its actual behavior.
+
+## Step 3: Compile and Test
+
+```bash
+# Java
+mvn test -Dtest=UnitTest
+
+# Scala
+mvn test -Dsuites=com.udf.UnitTest
+```
+
+If it fails, analyze the error output (stdout/stderr) and fix the test code. Continue iterating until the test passes.
+
+## Step 4: Coverage Report
+
+The template projects use JaCoCo (Java) / scoverage (Scala) code coverage tools.
+
+```bash
+# Java
+mvn -Pcoverage test jacoco:report -Dtest=UnitTest
+
+# Scala
+mvn -Pcoverage scoverage:report -Dsuites=com.udf.UnitTest
+```
+
+For Java, read `target/site/jacoco/jacoco.csv` and inspect LINE, BRANCH, and METHOD counters for the target CPU UDF class and local helper classes. In `jacoco.xml`, counters appear as `` elements, and source-line misses appear under ``.
+
+For Scala, read `target/scoverage.xml` and inspect statement, branch, and method-level coverage for the target CPU UDF class/object and local helper classes/objects. scoverage XML stores package/class/method `statement-rate` and `branch-rate` attributes, and each executable statement has `line`, `branch`, and `invocation-count` attributes.
+
+Use the coverage report as actionable feedback:
+1. Inspect missed Java line, branch, and method coverage, or missed Scala statement, branch, and method-level coverage.
+2. Add test cases and assertions that exercise those paths.
+3. Re-run the unit test and coverage report.
+4. Repeat until important CPU UDF branches are covered.
+
+If a missed line, statement, branch, or method path cannot or should not be tested, add a clear comment explaining why. Examples include:
+- unreachable defensive code
+- unsupported input domains
+- unrelated template infrastructure
+
+Report the relevant counters for the target CPU UDF and local helper classes/objects:
+- Java: LINE, BRANCH, and METHOD counters from JaCoCo.
+- Scala: statement and branch coverage from scoverage, plus method-level statement/branch rates from `` elements.
+
+NOTE: JaCoCo and scoverage will not track source-level coverage in external JARs. If the UDF relies on external JAR business logic, make a note of this residual coverage gap.
+
+## Step 5: Verify Outputs
+
+After the test passes, verify that:
+1. The test data covers various edge cases and reflects realistic input formats
+2. The assertions reflect actual UDF behavior (no "cheating" by hardcoding values)
+3. The coverage report shows strong coverage of the target CPU UDF and local helper logic
+4. Any uncovered lines, branches, or methods are explicitly explained
+5. Any external JAR logic invoked by the UDF is called out as outside the coverage scope
+
+If any quality checks fail, revise the test code and re-run.
+
+## Output
+
+Upon successful completion:
+- Project directory: `//`
+- Unit test: `src/test//com/udf/UnitTest.`
+
+These outputs are required for **Step 2: Convert UDF**.
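
The JaCoCo CSV inspection described in the coverage step can be sketched as a small standalone program. This is a minimal illustration, not part of the skill or its templates: the class name `JacocoCsvSummary` and the substring class filter are hypothetical, and it assumes the standard JaCoCo CSV header (`CLASS`, `LINE_MISSED`, `LINE_COVERED`, and so on) with no commas inside field values.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: summarizes LINE, BRANCH, and METHOD counters per class.
class JacocoCsvSummary {

    /** Covered percentage for one counter; 100.0 when there is nothing to cover. */
    static double percent(long missed, long covered) {
        long total = missed + covered;
        return total == 0 ? 100.0 : 100.0 * covered / total;
    }

    /** Returns one summary line per class whose name contains classFilter. */
    static List<String> summarize(List<String> csvLines, String classFilter) {
        // Look columns up by header name instead of relying on fixed positions.
        List<String> header = Arrays.asList(csvLines.get(0).split(","));
        int classCol = header.indexOf("CLASS");
        List<String> out = new ArrayList<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (line.trim().isEmpty()) continue;
            String[] f = line.split(",");
            if (!f[classCol].contains(classFilter)) continue;
            StringBuilder sb = new StringBuilder(f[classCol]);
            for (String counter : new String[] {"LINE", "BRANCH", "METHOD"}) {
                long missed = Long.parseLong(f[header.indexOf(counter + "_MISSED")]);
                long covered = Long.parseLong(f[header.indexOf(counter + "_COVERED")]);
                sb.append(String.format(Locale.ROOT, " %s=%.1f%% (missed %d)",
                        counter, percent(missed, covered), missed));
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Path csv = Paths.get(args.length > 0 ? args[0] : "target/site/jacoco/jacoco.csv");
        String filter = args.length > 1 ? args[1] : "";
        summarize(Files.readAllLines(csv), filter).forEach(System.out::println);
    }
}
```

Classes with a non-zero missed count are the ones to revisit in the test, or to annotate with a comment explaining why the path is left uncovered.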
diff --git a/skills/udf-gen-test/templates/java/.mvn/jvm.config b/skills/udf-gen-test/templates/java/.mvn/jvm.config
new file mode 100644
index 00000000000..0ae13fa9a86
--- /dev/null
+++ b/skills/udf-gen-test/templates/java/.mvn/jvm.config
@@ -0,0 +1,16 @@
+-Xmx16g
+-ea
+--add-opens=java.base/java.lang=ALL-UNNAMED
+--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
+--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
+--add-opens=java.base/java.io=ALL-UNNAMED
+--add-opens=java.base/java.net=ALL-UNNAMED
+--add-opens=java.base/java.nio=ALL-UNNAMED
+--add-opens=java.base/java.util=ALL-UNNAMED
+--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
+--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
+--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
+--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
+--add-opens=java.base/sun.security.action=ALL-UNNAMED
+--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
+--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED
diff --git a/skills/udf-gen-test/templates/java/pom.xml b/skills/udf-gen-test/templates/java/pom.xml
new file mode 100644
index 00000000000..6925eb2f55f
--- /dev/null
+++ b/skills/udf-gen-test/templates/java/pom.xml
@@ -0,0 +1,342 @@
+
+
+
+ 4.0.0
+ com.udf
+ aether-agent-udfs
+ 1.0.0
+ Aether UDF Conversion
+ This project contains UDFs that will be converted from CPU to GPU.
+ jar
+
+
+ 17
+ 17
+ UTF-8
+ UTF-8
+ UTF-8
+ 2.12
+
+ 3.5.5
+ 26.04.0
+ 0.8.14
+
+ cuda12
+ v26.04.00
+ v26.04.00
+
+
+ false
+ off
+
+ ON
+ RAPIDS
+ 10
+ ON
+ OFF
+ false
+ rapidsudfjni
+ ${project.build.directory}/native-build
+
+
+ -Xmx5g -ea
+ -Dai.rapids.refcount.debug=${debug.memory.leaks}
+ -Dorg.slf4j.simpleLogger.defaultLogLevel=off
+ -Dorg.slf4j.simpleLogger.log.ai.rapids.cudf=${cudf.log.level}
+ --add-opens=java.base/java.lang=ALL-UNNAMED
+ --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
+ --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
+ --add-opens=java.base/java.io=ALL-UNNAMED
+ --add-opens=java.base/java.net=ALL-UNNAMED
+ --add-opens=java.base/java.nio=ALL-UNNAMED
+ --add-opens=java.base/java.util=ALL-UNNAMED
+ --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
+ --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
+ --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
+ --add-opens=java.base/sun.nio.cs=ALL-UNNAMED
+ --add-opens=java.base/sun.security.action=ALL-UNNAMED
+ --add-opens=java.base/sun.util.calendar=ALL-UNNAMED
+ --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED
+
+
+
+
+ debug-leaks
+
+
+ debug.memory.leaks
+ true
+
+
+
+ error
+
+
+
+ coverage
+
+
+
+ org.jacoco
+ jacoco-maven-plugin
+ ${jacoco.version}
+
+
+ HTML
+ XML
+ CSV
+
+
+ com/udf/bench/*
+ com/udf/SparkUtils*
+
+
+
+
+ prepare-agent
+
+ prepare-agent
+
+
+ jacoco.agent.argLine
+
+
+
+ report
+ verify
+
+ report
+
+
+
+
+
+
+
+
+ cuda-native-udf
+
+
+
+ org.apache.maven.plugins
+ maven-dependency-plugin
+ 3.6.1
+
+
+ copy-rapids-jar-with-classifier
+ generate-sources
+
+ copy
+
+
+
+
+ com.nvidia
+ rapids-4-spark_${scala.binary.version}
+ ${rapids4spark.version}
+ ${cuda.version}
+ jar
+ false
+ ${project.build.directory}/rapids-jar
+
+
+ true
+
+
+
+ copy-rapids-jar-no-classifier
+ generate-sources
+
+ copy
+
+
+
+
+ com.nvidia
+ rapids-4-spark_${scala.binary.version}
+ ${rapids4spark.version}
+ jar
+ false
+ ${project.build.directory}/rapids-jar
+
+
+ true
+
+
+
+
+
+ org.apache.maven.plugins
+ maven-antrun-plugin
+ 3.1.0
+
+
+ extract-cuda-native-dependencies
+ generate-sources
+
+ ${skipCudfExtraction}
+
+
+
+
+
+
+
+
+
+
+
+
+ run
+
+
+
+ cmake-cuda-native-udf
+ compile
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ run
+
+
+
+
+
+ org.apache.maven.plugins
+ maven-resources-plugin
+ 3.3.1
+
+
+ copy-cuda-native-library-to-classes
+ process-classes
+
+ copy-resources
+
+
+ true
+ ${project.build.outputDirectory}/${os.arch}/${os.name}
+
+
+ ${native.build.path}
+
+ lib${native.library.name}.so
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ org.apache.spark
+ spark-hive_${scala.binary.version}
+ ${spark.version}
+ provided
+
+
+
+ com.nvidia
+ rapids-4-spark_${scala.binary.version}
+ ${rapids4spark.version}
+ provided
+
+
+
+ junit
+ junit
+ 4.13.2
+ test
+
+
+
+ org.slf4j
+ slf4j-simple
+ 1.7.36
+ test
+
+
+
+
+
+
+
+ org.apache.maven.plugins
+ maven-surefire-plugin
+ 3.1.2
+
+
+ **/*Test.java
+
+ @{jacoco.agent.argLine} ${test.jvm.args}
+
+
+
+
+ org.codehaus.mojo
+ exec-maven-plugin
+ 3.1.0
+
+
+
+ org.apache.maven.plugins
+ maven-shade-plugin
+ 3.2.4
+
+
+ package
+
+ shade
+
+
+ false
+
+
+ *:*
+
+ META-INF/*.SF
+ META-INF/*.DSA
+ META-INF/*.RSA
+
+
+
+
+
+
+
+
+
+
diff --git a/skills/udf-gen-test/templates/java/run_gen_data.sh b/skills/udf-gen-test/templates/java/run_gen_data.sh
new file mode 100644
index 00000000000..1a7b4b2adbc
--- /dev/null
+++ b/skills/udf-gen-test/templates/java/run_gen_data.sh
@@ -0,0 +1,72 @@
+#!/bin/bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+# Generate or validate benchmark data
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$SCRIPT_DIR"
+
+print_usage() {
+ echo "Usage: $0 --rows NUM [--validate] [--output-path PATH] [--mvn-arg ARG]..."
+}
+
+ROWS=""
+VALIDATE=""
+OUTPUT_PATH=""
+MAVEN_ARGS=()
+
+while [[ $# -gt 0 ]]; do
+ case $1 in
+ --rows) ROWS="$2"; shift 2;;
+ --validate) VALIDATE="true"; shift;;
+ --output-path) OUTPUT_PATH="$2"; shift 2;;
+ --mvn-arg) MAVEN_ARGS+=("$2"); shift 2;;
+ *)
+ echo "Unknown option: $1"
+ print_usage
+ exit 1
+ ;;
+ esac
+done
+
+if [ -z "$ROWS" ]; then
+ echo "Error: --rows is required"
+ print_usage
+ exit 1
+fi
+
+SPARK_CONFS=(
+ --spark-conf spark.master="local[8]"
+ --spark-conf spark.rapids.sql.enabled="true"
+ --spark-conf spark.plugins="com.nvidia.spark.SQLPlugin"
+ --spark-conf spark.locality.wait="0s"
+ --spark-conf spark.sql.cache.serializer="com.nvidia.spark.ParquetCachedBatchSerializer"
+ --spark-conf spark.rapids.sql.format.parquet.reader.type="MULTITHREADED"
+ --spark-conf spark.rapids.sql.reader.batchSizeBytes="1000MB"
+ --spark-conf spark.sql.files.maxPartitionBytes="512MB"
+ --spark-conf spark.rapids.sql.metrics.level="DEBUG"
+)
+
+EXEC_ARGS="--rows $ROWS --partitions 32"
+for arg in "${SPARK_CONFS[@]}"; do
+ EXEC_ARGS="$EXEC_ARGS $arg"
+done
+
+if [ -n "$VALIDATE" ]; then
+ EXEC_ARGS="$EXEC_ARGS --validate"
+ echo "Running GenData in validation mode with $ROWS rows..."
+else
+ if [ -z "$OUTPUT_PATH" ]; then
+ OUTPUT_PATH="data/bench_data_${ROWS}_rows.parquet"
+ fi
+ EXEC_ARGS="$EXEC_ARGS --output-path $OUTPUT_PATH"
+ echo "Running GenData to generate $ROWS rows -> $OUTPUT_PATH..."
+fi
+
+mvn "${MAVEN_ARGS[@]}" compile exec:java \
+ -Dexec.mainClass="com.udf.bench.GenData" \
+ -Dexec.classpathScope=compile \
+ -Dexec.args="$EXEC_ARGS"
diff --git a/skills/udf-gen-test/templates/java/run_micro_benchmark.sh b/skills/udf-gen-test/templates/java/run_micro_benchmark.sh
new file mode 100644
index 00000000000..6cac6d3e8c9
--- /dev/null
+++ b/skills/udf-gen-test/templates/java/run_micro_benchmark.sh
@@ -0,0 +1,60 @@
+#!/bin/bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+# Run in-memory microbenchmark for RapidsUDFs.
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$SCRIPT_DIR"
+
+print_usage() {
+ echo "Usage: $0 --mode cpu|gpu|all --data-path PATH [--rows N] [--warmup N] [--measured N] [--pool-fraction F] [--profile] [--mvn-arg ARG]..."
+}
+
+MODE=""
+DATA_PATH=""
+PROFILE=""
+MAVEN_ARGS=()
+RUNNER_ARGS=()
+
+while [[ $# -gt 0 ]]; do
+ case $1 in
+ --mode) MODE="$2"; RUNNER_ARGS+=("$1" "$2"); shift 2;;
+ --data-path) DATA_PATH="$2"; RUNNER_ARGS+=("$1" "$2"); shift 2;;
+ --profile) PROFILE="true"; RUNNER_ARGS+=("$1"); shift;;
+ --mvn-arg) MAVEN_ARGS+=("$2"); shift 2;;
+ *) RUNNER_ARGS+=("$1"); shift;;
+ esac
+done
+
+if [ -z "$MODE" ] || [ -z "$DATA_PATH" ]; then
+ echo "Error: --mode and --data-path are required"
+ print_usage
+ exit 1
+fi
+
+MVN_CMD=(
+ mvn "${MAVEN_ARGS[@]}" compile exec:java
+ -Dexec.mainClass=com.udf.bench.MicroBenchRunner
+ -Dexec.classpathScope=compile
+ "-Dexec.args=${RUNNER_ARGS[*]}"
+)
+
+if [ -n "$PROFILE" ]; then
+ REPORT_PATH="results/microbench_$(date +%Y%m%d_%H%M%S)"
+ mkdir -p results
+ echo "Running microbenchmark (mode=$MODE) on $DATA_PATH with nsys profiling..."
+ echo "nsys report will be saved to: ${REPORT_PATH}.nsys-rep"
+ nsys profile \
+ -c cudaProfilerApi \
+ --capture-range-end=stop \
+ --trace=cuda,nvtx \
+ --nvtx-domain-include="libcudf" \
+ -o "$REPORT_PATH" \
+ "${MVN_CMD[@]}"
+else
+ echo "Running microbenchmark (mode=$MODE) on $DATA_PATH..."
+ "${MVN_CMD[@]}"
+fi
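
The `--warmup N` and `--measured N` options in the usage string suggest a warm-up-then-measure loop. The following is a minimal sketch of that general pattern in plain Java, not the actual `MicroBenchRunner` loop; the class `TimingLoopSketch` and its methods are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of a warm-up + measured timing loop.
class TimingLoopSketch {

    /** Runs warm-up iterations untimed, then returns one elapsed time (ns) per measured iteration. */
    static long[] measure(Runnable work, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            work.run(); // untimed: lets JIT compilation and caches settle
        }
        long[] elapsedNanos = new long[measured];
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            work.run();
            elapsedNanos[i] = System.nanoTime() - start;
        }
        return elapsedNanos;
    }

    /** Median in milliseconds; expects at least one sample. */
    static double medianMillis(long[] elapsedNanos) {
        long[] sorted = elapsedNanos.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        double medianNanos = sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
        return medianNanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        long[] samples = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            if (sum < 0) throw new IllegalStateException("unexpected overflow");
        }, 2, 4);
        System.out.printf("median = %.3f ms over %d runs%n", medianMillis(samples), samples.length);
    }
}
```

Reporting the median rather than the mean keeps a single slow iteration from dominating the result when only a handful of runs are measured.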
diff --git a/skills/udf-gen-test/templates/java/run_spark_benchmark.sh b/skills/udf-gen-test/templates/java/run_spark_benchmark.sh
new file mode 100644
index 00000000000..d8b2b1d1b70
--- /dev/null
+++ b/skills/udf-gen-test/templates/java/run_spark_benchmark.sh
@@ -0,0 +1,69 @@
+#!/bin/bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+# Run CPU or GPU Spark benchmark.
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$SCRIPT_DIR"
+
+print_usage() {
+ echo "Usage: $0 --mode cpu|gpu --data-path PATH [--result-path PATH] [--mvn-arg ARG]..."
+}
+
+MODE=""
+DATA_PATH=""
+RESULT_PATH=""
+MAVEN_ARGS=()
+
+while [[ $# -gt 0 ]]; do
+ case $1 in
+ --mode) MODE="$2"; shift 2;;
+ --data-path) DATA_PATH="$2"; shift 2;;
+ --result-path) RESULT_PATH="$2"; shift 2;;
+ --mvn-arg) MAVEN_ARGS+=("$2"); shift 2;;
+ *)
+ echo "Unknown option: $1"
+ print_usage
+ exit 1
+ ;;
+ esac
+done
+
+if [ -z "$MODE" ] || [ -z "$DATA_PATH" ]; then
+ echo "Error: --mode and --data-path are required"
+ print_usage
+ exit 1
+fi
+
+DATA_BASENAME=$(basename "$DATA_PATH" .parquet)
+TIMESTAMP=$(date +%Y%m%d_%H%M%S)
+if [ -z "$RESULT_PATH" ]; then
+ RESULT_PATH="results/${MODE}_${DATA_BASENAME}_${TIMESTAMP}_result.json"
+fi
+
+SPARK_CONFS=(
+ --spark-conf spark.master="local[8]"
+ --spark-conf spark.rapids.sql.enabled="true"
+ --spark-conf spark.plugins="com.nvidia.spark.SQLPlugin"
+ --spark-conf spark.locality.wait="0s"
+ --spark-conf spark.sql.cache.serializer="com.nvidia.spark.ParquetCachedBatchSerializer"
+ --spark-conf spark.rapids.sql.format.parquet.reader.type="MULTITHREADED"
+ --spark-conf spark.rapids.sql.reader.batchSizeBytes="1000MB"
+ --spark-conf spark.sql.files.maxPartitionBytes="512MB"
+ --spark-conf spark.rapids.sql.metrics.level="DEBUG"
+)
+
+EXEC_ARGS="--mode $MODE --data-path $DATA_PATH --result-path $RESULT_PATH"
+for arg in "${SPARK_CONFS[@]}"; do
+ EXEC_ARGS="$EXEC_ARGS $arg"
+done
+EXEC_ARGS="$EXEC_ARGS --spark-conf spark.app.name=${MODE}_${DATA_BASENAME}_${TIMESTAMP}"
+
+echo "Running $MODE benchmark on $DATA_PATH..."
+mvn "${MAVEN_ARGS[@]}" compile exec:java \
+ -Dexec.mainClass="com.udf.bench.SparkBenchRunner" \
+ -Dexec.classpathScope=compile \
+ -Dexec.args="$EXEC_ARGS"
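
The scripts pass each Spark setting as `--spark-conf key=value`, and `SparkUtils.applySparkConfs` splits each entry on the first `=` only. The sketch below reproduces that splitting with plain JDK collections so the behavior is easy to check without Spark; the class `SparkConfSplitSketch` and the use of a `Map` in place of `SparkSession.Builder` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for applySparkConfs that collects into a Map instead of a builder.
class SparkConfSplitSketch {

    static Map<String, String> splitConfs(List<String> sparkConfs) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String conf : sparkConfs) {
            // Limit 2: only the first '=' separates key from value,
            // so values that themselves contain '=' stay intact.
            String[] kv = conf.split("=", 2);
            if (kv.length == 2) {
                out.put(kv[0], kv[1]);
            }
            // Entries without '=' are skipped silently, as in the template.
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> confs = splitConfs(Arrays.asList(
                "spark.master=local[8]",
                "spark.app.name=a=b",
                "no-equals-sign"));
        confs.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Because malformed entries are dropped without an error, a typo such as a missing `=` in a `--spark-conf` value will not fail the run; it simply has no effect.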
diff --git a/skills/udf-gen-test/templates/java/src/main/java/com/udf/SparkUtils.java b/skills/udf-gen-test/templates/java/src/main/java/com/udf/SparkUtils.java
new file mode 100644
index 00000000000..d50816e6fdf
--- /dev/null
+++ b/skills/udf-gen-test/templates/java/src/main/java/com/udf/SparkUtils.java
@@ -0,0 +1,126 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package com.udf;
+
+import com.nvidia.spark.rapids.ExplainPlan;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Spark utility methods.
+ */
+public class SparkUtils {
+
+ /**
+ * Apply key=value Spark configs to a builder.
+ *
+ * @param builder the SparkSession builder to configure
+ * @param sparkConfs "spark.key=value" config strings
+ * @return the same builder, for chaining
+ */
+ public static SparkSession.Builder applySparkConfs(
+ SparkSession.Builder builder, List<String> sparkConfs) {
+ for (String conf : sparkConfs) {
+ String[] kv = conf.split("=", 2);
+ if (kv.length == 2) builder.config(kv[0], kv[1]);
+ }
+ return builder;
+ }
+
+ /**
+ * Get a required argument from a parsed argument map, or throw.
+ *
+ * @param parsed the parsed argument map
+ * @param key the argument key (without "--" prefix)
+ * @return the argument value
+ * @throws IllegalArgumentException if the key is missing
+ */
+ public static String requireArg(Map<String, String> parsed, String key) {
+ String val = parsed.get(key);
+ if (val == null) {
+ throw new IllegalArgumentException("--" + key + " is required");
+ }
+ return val;
+ }
+
+ /**
+ * Ops that cause fallback but can be ignored, since they are strictly used for testing:
+ * - RDDScanExec/LocalTableScanExec: surfaces due to spark.createDataFrame()
+ * - CollectLimitExec: surfaces during dataframe collection (e.g. df.show())
+ * - ToPrettyString: surfaces due to df.show()
+ */
+ private static final Set<String> IGNORE_OPERATIONS = new HashSet<>(
+ Arrays.asList("RDDScanExec", "LocalTableScanExec", "CollectLimitExec", "ToPrettyString")
+ );
+
+ /**
+ * Assert that the DataFrame's plan can run on GPU.
+ * NOTE: This is only reliable in explainOnly mode, with AQE disabled.
+ *
+ * @param df the DataFrame to check
+ * @throws RuntimeException if any operations cannot run on GPU
+ */
+ public static void assertPlanRunsOnGpu(Dataset<Row> df) {
+ assertPlanRunsOnGpu(df, false);
+ }
+
+ /**
+ * Assert that the DataFrame's plan can run on GPU.
+ * NOTE: This is only reliable in explainOnly mode, with AQE disabled.
+ *
+ * @param df the DataFrame to check
+ * @param returnFullPlan if true, include the full plan in the error message
+ * @throws RuntimeException if any operations cannot run on GPU
+ */
+ public static void assertPlanRunsOnGpu(Dataset<Row> df, boolean returnFullPlan) {
+ String plan = getGpuPlan(df);
+ List<String> unsupportedOps = getUnsupportedOps(plan);
+ if (!unsupportedOps.isEmpty()) {
+ StringBuilder sb = new StringBuilder();
+ sb.append("Some operations cannot run on GPU.\nFound the following unsupported ops:\n");
+ for (String op : unsupportedOps) {
+ sb.append("- ").append(op).append("\n");
+ }
+ if (returnFullPlan) {
+ sb.append("\nFull physical plan:\n").append(plan);
+ }
+ throw new RuntimeException(sb.toString());
+ }
+ }
+
+ /** Get the potential GPU plan using the RAPIDS ExplainPlan API. */
+ private static String getGpuPlan(Dataset<Row> df) {
+ return ExplainPlan.explainPotentialGpuPlan(df, "NOT_ON_GPU");
+ }
+
+ /** Parse the plan for unsupported operations (lines starting with '!'). */
+ private static List<String> getUnsupportedOps(String plan) {
+ List<String> result = new ArrayList<>();
+ for (String line : plan.split("\n")) {
+ // Each unsupported line looks like: ![Exec] <OpName> cannot run on GPU
+ String trimmed = line.trim();
+ if (trimmed.startsWith("!")) {
+ int start = trimmed.indexOf('<');
+ int end = trimmed.indexOf('>');
+ if (start >= 0 && end > start) {
+ String op = trimmed.substring(start + 1, end);
+ if (!IGNORE_OPERATIONS.contains(op)) {
+ result.add(trimmed);
+ }
+ }
+ }
+ }
+ return result;
+ }
+}
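
The filtering done by `getUnsupportedOps` can be exercised without Spark or the RAPIDS plugin. The sketch below copies the same parsing logic into a standalone class; `UnsupportedOpsSketch` is a hypothetical name, and the plan text in `main` is an illustrative sample rather than captured `ExplainPlan` output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical standalone copy of the plan-parsing logic, for experimentation only.
class UnsupportedOpsSketch {

    static final Set<String> IGNORE = new HashSet<>(
            Arrays.asList("RDDScanExec", "LocalTableScanExec", "CollectLimitExec", "ToPrettyString"));

    /** Keeps lines that start with '!' and whose <OpName> is not in the ignore set. */
    static List<String> unsupportedOps(String plan) {
        List<String> result = new ArrayList<>();
        for (String line : plan.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("!")) continue;
            int start = trimmed.indexOf('<');
            int end = trimmed.indexOf('>');
            if (start >= 0 && end > start
                    && !IGNORE.contains(trimmed.substring(start + 1, end))) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative sample plan, not real plugin output.
        String plan = String.join("\n",
                "!Exec <CollectLimitExec> cannot run on GPU because ...",
                "  !Exec <ProjectExec> cannot run on GPU because ...",
                "    *Exec <FilterExec> will run on GPU");
        unsupportedOps(plan).forEach(System.out::println);
    }
}
```

Note that a line starting with `!` but containing no `<...>` pair is dropped rather than reported, which matches the template's behavior.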
diff --git a/skills/udf-gen-test/templates/java/src/main/java/com/udf/bench/BenchUtils.java b/skills/udf-gen-test/templates/java/src/main/java/com/udf/bench/BenchUtils.java
new file mode 100644
index 00000000000..8fbf2fa2fa5
--- /dev/null
+++ b/skills/udf-gen-test/templates/java/src/main/java/com/udf/bench/BenchUtils.java
@@ -0,0 +1,112 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package com.udf.bench;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.types.DataTypes;
+import static org.apache.spark.sql.functions.*;
+
+/**
+ * Benchmark utilities.
+ * - generateSyntheticData: Create benchmark data for the UDF
+ * - executeCpu: Register and run the CPU UDF
+ * - executeGpu: Register and run the GPU implementation
+ */
+public class BenchUtils {
+
+ // ---------------------------------------------------------------------------
+ // Data generation
+ // ---------------------------------------------------------------------------
+
+ /**
+ * TODO: Generate a synthetic DataFrame matching the unit test schema.
+ *
+ * Use {@code spark.range(0, numRows, 1, numPartitions)} as the base, then apply
+ * randomized column generators to produce data matching the UDF's expected input.
+ *
+ * Requirements:
+ * - Column names and types MUST match the unit test dataset schema
+ * - Data should be realistic and varied (different lengths, edge cases, etc.)
+ * - For variable-length inputs, generate sizable rows representative of
+ * enterprise-scale data
+ *
+ * Example:
+ *
+ *
+ * @param spark active SparkSession
+ * @param numRows number of rows to generate
+ * @param numPartitions number of output partitions
+ * @return DataFrame with the same schema as the unit test data
+ */
+ public static Dataset<Row> generateSyntheticData(
+ SparkSession spark, long numRows, int numPartitions) {
+ return null; // TODO
+ }
+
+ // ---------------------------------------------------------------------------
+ // Execution
+ // ---------------------------------------------------------------------------
+
+ /**
+ * TODO: Execute the CPU UDF on the benchmark DataFrame.
+ * 1. Register the CPU UDF with Spark
+ * 2. Execute it on {@code df}
+ * 3. Return the result DataFrame
+ *
+ * Example:
+ *
+ * <pre>{@code
+ * df.createOrReplaceTempView("bench_table");
+ * spark.sql("CREATE TEMPORARY FUNCTION calculate_risk AS 'com.udf.CalculateRiskUDF'");
+ * return spark.sql("SELECT *, calculate_risk(credit_score) AS risk_level FROM bench_table");
+ * }</pre>
+ *
+ * @param spark active SparkSession
+ * @param df input benchmark DataFrame
+ * @return result DataFrame after applying the CPU UDF
+ */
+ public static Dataset<Row> executeCpu(SparkSession spark, Dataset<Row> df) {
+ return null; // TODO
+ }
+
+ /**
+ * TODO: Execute the GPU implementation on the benchmark DataFrame.
+ *
+ * For RapidsUDF - register the RapidsUDF and run the same query as executeCpu:
+ *
+ * <pre>{@code
+ * df.createOrReplaceTempView("bench_table");
+ * spark.sql("CREATE TEMPORARY FUNCTION calculate_risk_rapids AS 'com.udf.CalculateRiskRapidsUDF'");
+ * return spark.sql("SELECT *, calculate_risk_rapids(credit_score) AS risk_level FROM bench_table");
+ * }</pre>
+ *
+ * For SQL - read the SQL file from src/main/resources/ and adapt it for
+ * benchmarking. The SQL was written for the unit test, so you must:
+ * 1. Replace "test_table" with "bench_table"
+ * 2. Replace the SELECT column list with "SELECT *" to avoid referencing
+ * columns that may not exist in the benchmark DataFrame
+ *
+ * <pre>{@code
+ * df.createOrReplaceTempView("bench_table");
+ * String sqlContent = new String(Files.readAllBytes(Paths.get("src/main/resources/calculate_risk.sql")));
+ * String benchSql = sqlContent.replace("test_table", "bench_table");
+ * // Also replace the SELECT column list with SELECT * if needed
+ * return spark.sql(benchSql);
+ * }</pre>
+ *
+ * @param spark active SparkSession
+ * @param df input benchmark DataFrame
+ * @return result DataFrame after applying the GPU implementation
+ */
+ public static Dataset<Row> executeGpu(SparkSession spark, Dataset<Row> df) {
+ return null; // TODO
+ }
+}
diff --git a/skills/udf-gen-test/templates/java/src/main/java/com/udf/bench/GenData.java b/skills/udf-gen-test/templates/java/src/main/java/com/udf/bench/GenData.java
new file mode 100644
index 00000000000..94a22eea753
--- /dev/null
+++ b/skills/udf-gen-test/templates/java/src/main/java/com/udf/bench/GenData.java
@@ -0,0 +1,110 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package com.udf.bench;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import com.udf.SparkUtils;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+
+/**
+ * Generates benchmark data and optionally validates by running
+ * BenchUtils.executeCpu and BenchUtils.executeGpu.
+ *
+ * Usage:
+ * mvn exec:java -Dexec.mainClass=com.udf.bench.GenData \
+ * -Dexec.args="--rows 1000 --validate --spark-conf k=v ..."
+ */
+public class GenData {
+
+ public static void main(String[] args) {
+ Map<String, String> argMap = new HashMap<>();
+ List<String> sparkConfs = new ArrayList<>();
+ parseArgs(args, argMap, sparkConfs);
+
+ long rows = Long.parseLong(SparkUtils.requireArg(argMap, "rows"));
+ int partitions = Integer.parseInt(argMap.getOrDefault("partitions", "32"));
+ boolean validate = argMap.containsKey("validate");
+ String outputPath = argMap.get("output-path");
+
+ // Build Spark session
+ SparkSession.Builder builder = SparkSession.builder().appName("GenData");
+ SparkUtils.applySparkConfs(builder, sparkConfs);
+ SparkSession spark = builder.enableHiveSupport().getOrCreate();
+
+ try {
+ // Generate synthetic data
+ Dataset<Row> df = BenchUtils.generateSyntheticData(spark, rows, partitions);
+
+ // Verify row count
+ long actualRows = df.count();
+ if (actualRows != rows) {
+ System.err.println("Row count mismatch: expected=" + rows
+ + ", actual=" + actualRows);
+ System.exit(1);
+ }
+ System.out.println("Generated " + actualRows + " rows across "
+ + partitions + " partitions");
+
+ if (validate) {
+ // Validation mode — run both CPU and GPU execute, don't write
+ for (String label : new String[]{"cpu", "gpu"}) {
+ try {
+ if ("cpu".equals(label)) {
+ BenchUtils.executeCpu(spark, df).collect();
+ } else {
+ BenchUtils.executeGpu(spark, df).collect();
+ }
+ System.out.println("Validation (" + label + ") passed.");
+ } catch (Exception e) {
+ System.err.println("Validation (" + label + ") failed: "
+ + e.getClass().getSimpleName() + ": " + e.getMessage());
+ e.printStackTrace(System.err);
+ System.exit(1);
+ }
+ }
+ } else {
+ // Generation mode — write to output path
+ if (outputPath == null) {
+ throw new IllegalArgumentException(
+ "--output-path is required when not in validation mode");
+ }
+ df.write().mode("overwrite").parquet(outputPath);
+ System.out.println("Successfully generated dataset and saved to: " + outputPath);
+ }
+ } catch (Exception e) {
+ System.err.println("Failed to generate dataset: "
+ + e.getClass().getSimpleName());
+ e.printStackTrace(System.err);
+ System.exit(1);
+ } finally {
+ spark.stop();
+ }
+
+ System.exit(0);
+ }
+
+ /** Parse CLI arguments. */
+ private static void parseArgs(String[] args, Map<String, String> map, List<String> sparkConfs) {
+ int i = 0;
+ while (i < args.length) {
+ switch (args[i]) {
+ case "--rows": map.put("rows", args[i + 1]); i += 2; break;
+ case "--partitions": map.put("partitions", args[i + 1]); i += 2; break;
+ case "--validate": map.put("validate", "true"); i += 1; break;
+ case "--output-path": map.put("output-path", args[i + 1]); i += 2; break;
+ case "--spark-conf": sparkConfs.add(args[i + 1]); i += 2; break;
+ default:
+ throw new IllegalArgumentException("Unknown argument: " + args[i]);
+ }
+ }
+ }
+}
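
`parseArgs` reads `args[i + 1]` without checking that a value follows the flag, so a trailing `--rows` with no value would surface as an `ArrayIndexOutOfBoundsException`. One possible hardening is sketched below; the class `ArgParsingSketch` and the helper `valueAt` are hypothetical names, not part of the template.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical variant of GenData.parseArgs with an explicit missing-value check.
class ArgParsingSketch {

    /** Returns the value following the flag at index i, or fails with a readable message. */
    static String valueAt(String[] args, int i) {
        if (i + 1 >= args.length) {
            throw new IllegalArgumentException(args[i] + " requires a value");
        }
        return args[i + 1];
    }

    static void parseArgs(String[] args, Map<String, String> map, List<String> sparkConfs) {
        int i = 0;
        while (i < args.length) {
            String flag = args[i];
            switch (flag) {
                case "--validate": // boolean flag, takes no value
                    map.put("validate", "true");
                    i += 1;
                    break;
                case "--rows":
                case "--partitions":
                case "--output-path": // stored under the flag name without the "--" prefix
                    map.put(flag.substring(2), valueAt(args, i));
                    i += 2;
                    break;
                case "--spark-conf": // repeatable, collected in order
                    sparkConfs.add(valueAt(args, i));
                    i += 2;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown argument: " + flag);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        List<String> confs = new ArrayList<>();
        parseArgs(new String[] {"--rows", "1000", "--validate", "--spark-conf", "k=v"}, map, confs);
        System.out.println(map.get("rows") + " " + map.containsKey("validate") + " " + confs);
    }
}
```

Unknown flags are still rejected before the value check, so the error message distinguishes a misspelled option from a missing value.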
diff --git a/skills/udf-gen-test/templates/java/src/main/java/com/udf/bench/MicroBenchRunner.java b/skills/udf-gen-test/templates/java/src/main/java/com/udf/bench/MicroBenchRunner.java
new file mode 100644
index 00000000000..877bfe214c4
--- /dev/null
+++ b/skills/udf-gen-test/templates/java/src/main/java/com/udf/bench/MicroBenchRunner.java
@@ -0,0 +1,320 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package com.udf.bench;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+
+import ai.rapids.cudf.ColumnVector;
+import ai.rapids.cudf.Cuda;
+import ai.rapids.cudf.CudaMemInfo;
+import ai.rapids.cudf.HostColumnVector;
+import ai.rapids.cudf.Rmm;
+import ai.rapids.cudf.RmmAllocationMode;
+import ai.rapids.cudf.Table;
+
+/**
+ * Microbenchmark runner for CPU vs. RapidsUDF. Measures UDF execution time on in-memory dataset.
+ *
+ * Reads Parquet file (produced by GenData) via cuDF Table.readParquet.
+ * Benchmarks CPU (row-by-row evaluate) and GPU (evaluateColumnar) paths.
+ * Data loading and host/device transfers are not part of timing.
+ *
+ * Usage:
+ * mvn exec:java -Dexec.mainClass=com.udf.bench.MicroBenchRunner \
+ * -Dexec.args="--mode all --data-path data/bench_data --rows 1000000"
+ */
+public class MicroBenchRunner {
+
+ private static final int DEFAULT_WARMUP = 2;
+ private static final int DEFAULT_MEASURED = 4;
+ private static final float DEFAULT_RMM_ALLOC_FRACTION = 0.9f;
+
+ /**
+ * TODO: Extract column data from host memory into Java objects.
+ *
+ * Called once before CPU timing loop. Convert HostColumnVectors to
+ * array of Java objects for executeCpu.
+ * Use hostColumns[i].getJavaString(row), .getInt(row), .getDouble(row),
+ * .getStruct(row), .getList(row), etc. to extract values into typed arrays.
+ *
+ * This is outside of the timing loop due to overhead of extracting/boxing
+ * Java types from cuDF.
+ *
+ * Example for a UDF that takes (String, int):
+ *
+ * <pre>{@code
+ * String[] col0 = new String[numRows];
+ * int[] col1 = new int[numRows];
+ * for (int i = 0; i < numRows; i++) {
+ * col0[i] = hostColumns[0].getJavaString(i);
+ * col1[i] = hostColumns[1].getInt(i);
+ * }
+ * return new Object[] { col0, col1 };
+ * }</pre>
+ *
+ * @param hostColumns all columns copied to host memory
+ * @param numRows number of rows in the dataset
+ * @return array of typed Java arrays, one per UDF input column
+ */
+ public static Object[] prepareCpuData(HostColumnVector[] hostColumns, int numRows) {
+ // TODO: Extract columns to Java arrays
+ return null; // TODO
+ }
+
+ /**
+ * TODO: Execute the CPU UDF on Java data row-by-row.
+ *
+ * Example:
+ *
+ *
+ * @param data Java arrays from {@link #prepareCpuData}
+ * @param numRows number of rows in the dataset
+ */
+ public static void executeCpu(Object[] data, int numRows) {
+ // TODO: Cast arrays and call CPU UDF evaluate() per row
+ }
+
+ /**
+ * TODO: Execute the GPU UDF via evaluateColumnar.
+ *
+ * Example:
+ *
+ */
+ public static Dataset<Row> createTestData(SparkSession spark) {
+ return null; // TODO
+ }
+
+ /**
+ * TODO: Register the UDF with Spark.
+ *
+ * Example (Hive UDF):
+ *
+ * <pre>{@code
+ * spark.sql("CREATE TEMPORARY FUNCTION " + udfName
+ * + " AS 'com.udf.CalculateRiskUDF'");
+ * }</pre>
+ */
+ public static void registerUDF(SparkSession spark, String udfName) {
+ // TODO
+ }
+
+ /**
+ * TODO: Execute the UDF on the test DataFrame and return the result.
+ *
+ * Example:
+ *