Publish UDF agent skills [skip ci] by rishic3 · Pull Request #15058 · NVIDIA/cudf-spark

rishic3 · 2026-06-11T03:05:50Z

Closes #15014. See the epic #14977 for the next phases.

Description

Publishes the UDF skills, docs, and tests. The tests are yet to be wired up to CI, which is planned in #15013.

Installation from this branch can be tested like so:

npx skills add "https://github.com/rishic3/spark-rapids/tree/aether-udf-skills"

Once merged, the following will work:

npx skills add NVIDIA/spark-rapids

Note that only directories with a SKILL.md file will be detected as skills.

Changes taking effect outside of skills/ are:

- updating the root pom.xml to ignore skills/ in RAT and scalastyle checks
- ignoring template pom.xml files under skills/ in make-scala-version-build-files.sh
- updating the root LICENSE with the dual CC-BY-4.0 and Apache-2.0 license

Checklists

Documentation

Updated for new or modified user-facing features or behaviors
No user-facing change

Testing

Added or modified tests to cover new code paths
Covered by existing tests
(Please provide the names of the existing tests in the PR description.)
Not required

Performance

Tests ran and results are added in the PR description
Issue filed with a link in the PR description
Not required

Signed-off-by: Rishi Chandra <rishic@nvidia.com>

greptile-apps · 2026-06-11T04:45:03Z

Greptile Summary

This PR publishes the UDF agent skills — a collection of AI-assistant skill packages (SKILL.md-based) covering SQL conversion, cuDF/CUDA conversion, benchmarking, and test generation — along with dual-licensed documentation/examples and the build changes needed to keep the main Maven project from processing the skills subtree.

Build wiring: pom.xml and scala2.13/pom.xml exclude skills/** from RAT and scalastyle checks; make-scala-version-build-files.sh skips skills/ pom templates to prevent them from being transformed for Scala 2.13.
Scala test templates (UnitTest, CudfComparisonTest, SqlComparisonTest): all three are missing the installMutableClassLoader() setup that the Java counterparts include and that is required to avoid a RAPIDS ShimLoader failure on Java 17 under a forked Surefire JVM; the Scala TestUtils class does not define this method at all.
BenchUtils.scala Scaladoc example: the executeGpu doc block shows scala.io.Source.fromFile(...).mkString without a close, while the actual template code (SqlComparisonTest.scala) correctly uses try/finally.

Confidence Score: 4/5

Safe to merge for build infrastructure and documentation; the Scala test templates distributed to users will silently fail on Java 17 with the RAPIDS plugin until the missing classloader setup is added.

The Scala test scaffolding omits the URLClassLoader setup that the Java counterpart explicitly documents as required for Java 17 / RAPIDS ShimLoader. Every Scala-template user who runs mvn test with spark.plugins = com.nvidia.spark.SQLPlugin on Java 17 will get a ShimLoader exception before a single test runs. The Java templates, build changes, and documentation are all clean.

skills/udf-gen-test/templates/scala/src/test/scala/com/udf/TestUtils.scala and all three Scala test classes (UnitTest.scala, CudfComparisonTest.scala, SqlComparisonTest.scala) need the installMutableClassLoader setup added.
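
For readers unfamiliar with this kind of setup, the following is a minimal sketch of what a "mutable class loader" install step could look like. It is an assumption based only on the summary above (a URLClassLoader put in place before the tests run), not the template's actual code: the class name `ClassLoaderSetupSketch` is hypothetical, and the real `installMutableClassLoader()` in TestUtils.java may differ. The one fact relied on is that since Java 9 the application class loader is no longer a `URLClassLoader`.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch; the Java template's real installMutableClassLoader() may differ.
final class ClassLoaderSetupSketch {

    /**
     * Wraps the current thread's context class loader in a URLClassLoader
     * (with no extra URLs) unless it already is one. Safe to call repeatedly.
     */
    static void installMutableClassLoader() {
        Thread thread = Thread.currentThread();
        ClassLoader current = thread.getContextClassLoader();
        if (!(current instanceof URLClassLoader)) {
            // The empty URL array means class lookups still delegate to `current`.
            thread.setContextClassLoader(new URLClassLoader(new URL[0], current));
        }
    }

    public static void main(String[] args) {
        installMutableClassLoader();
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        System.out.println(cl instanceof URLClassLoader); // prints "true"
    }
}
```

In a test class this would presumably be called once from a before-all hook, ahead of creating the SparkSession.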

Important Files Changed

Filename	Overview
skills/udf-gen-test/templates/scala/src/main/scala/com/udf/Arm.scala	Adds ARM helpers (withResource, closeOnExcept, closeAll); closeAll now wraps each close in try/catch as requested, addressing the previous review concern.
skills/udf-gen-test/templates/scala/src/main/scala/com/udf/bench/MicroBenchRunner.scala	Microbenchmark runner; copyAllToHost correctly catches exceptions and calls closeAll; readParquetData and limitTable resource handling looks correct.
skills/udf-gen-test/templates/scala/src/test/scala/com/udf/UnitTest.scala	Scala test template missing installMutableClassLoader() setup that the Java counterpart requires for Java 17 RAPIDS ShimLoader compatibility.
skills/udf-gen-test/templates/scala/src/test/scala/com/udf/TestUtils.scala	Missing the installMutableClassLoader() method that Java TestUtils.java provides for Java 17 / RAPIDS ShimLoader compatibility.
skills/udf-gen-test/templates/scala/src/test/scala/com/udf/SqlComparisonTest.scala	scala.io.Source is now correctly closed with try/finally, addressing the previous review comment.
skills/udf-gen-test/templates/scala/src/main/scala/com/udf/bench/BenchUtils.scala	Scaladoc example for executeGpu shows scala.io.Source without a close; the actual production logic is a correct stub, so this is a P2 documentation issue.
pom.xml	Excludes skills/ from RAT license checks and scalastyle checks; correctly motivated by dual CC-BY-4.0/Apache-2.0 licensing in the skills subtree.
build/make-scala-version-build-files.sh	Adds a guard to skip skills/** pom.xml template files, preventing them from being processed by the Scala version build file generator.
skills/udf-convert-to-cudf/examples/URLDecode.java	RapidsUDF example with proper try-with-resources for GPU intermediates; correct null handling on CPU path.
skills/udf-gen-test/templates/java/src/main/java/com/udf/bench/MicroBenchRunner.java	Java microbenchmark template; resource management (closeAll, try-with-resources) is correct throughout.
skills/udf-gen-test/templates/java/src/test/java/com/udf/TestUtils.java	Includes installMutableClassLoader() with clear explanation for Java 17 RAPIDS ShimLoader compatibility; assertDataFrameEquals properly sorts and compares rows.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User: npx skills add NVIDIA/spark-rapids] --> B[Skill discovery: SKILL.md files]
    B --> C1[udf-convert-to-sql]
    B --> C2[udf-convert-to-cudf]
    B --> C3[udf-convert-to-cuda]
    B --> C4[udf-gen-test]
    B --> C5[udf-judge-conversion]
    B --> C6[udf-optimize-cudf]
    B --> C7[udf-benchmark]
    C4 --> D1[Java template]
    C4 --> D2[Scala template]
    D1 --> E1[TestUtils.installMutableClassLoader ✅]
    D2 --> E2[TestUtils - method missing ❌]
    E1 --> F1[RAPIDS ShimLoader works on Java 17]
    E2 --> F2[RAPIDS ShimLoader fails on Java 17]

Reviews (6): Last reviewed commit: "add a note on heap size"

pxLi · 2026-06-12T06:14:47Z

I recommend starting with a skills-only change first. The example and test code should be added together with the CI updates in phase 1.

Refer to #14977

rishic3 · 2026-06-12T15:49:27Z

> I recommend starting with a skills-only change first. The example and test code should be added together with the CI updates in phase 1.
>
> Refer to #14977

Sounds good, deferring tests/ to a follow-up. Can you clarify which examples you meant to defer? Some examples are bundled in the skill and given to the agent for reference; the top-level examples are for users to test on example UDFs.

rishic3 · 2026-06-15T15:18:46Z

build

revans2 · 2026-06-15T18:33:52Z

+        if (resources != null) {
+            for (AutoCloseable r : resources) {
+                if (r != null) {
+                    try { r.close(); } catch (Exception ignore) {}


Should we have the ignored exceptions be added to the original exception with addSuppressed?
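
A minimal sketch of what that suggestion could look like, assuming the helper is handed the original exception. The class name `CloseAllSketch` and the `closeAll(Throwable, AutoCloseable...)` signature are hypothetical and not taken from the template; only `Throwable.addSuppressed` is standard Java.

```java
// Hypothetical helper, not the template's actual code.
final class CloseAllSketch {

    /**
     * Closes every non-null resource. A failure from close() is attached to
     * `primary` via addSuppressed instead of being dropped. If there is no
     * primary exception, the first close failure takes that role.
     *
     * @return the exception the caller should rethrow, or null if nothing failed
     */
    static Throwable closeAll(Throwable primary, AutoCloseable... resources) {
        Throwable result = primary;
        if (resources != null) {
            for (AutoCloseable r : resources) {
                if (r == null) {
                    continue;
                }
                try {
                    r.close();
                } catch (Exception e) {
                    if (result == null) {
                        result = e;
                    } else {
                        result.addSuppressed(e);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Throwable original = new IllegalStateException("copy to host failed");
        AutoCloseable ok = () -> {};
        AutoCloseable bad = () -> { throw new java.io.IOException("close failed"); };
        Throwable t = closeAll(original, ok, null, bad);
        // prints "copy to host failed / suppressed: 1"
        System.out.println(t.getMessage() + " / suppressed: " + t.getSuppressed().length);
    }
}
```

With this shape the close failure still shows up in the stack trace of the original exception, so nothing is silently lost.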

abellina · 2026-06-15T21:53:10Z

+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+package com.udf.bench;


It would be nice if we could reuse DBGen() here (see datagen module). If not it would be good to discuss why we can't use it.

I didn't realize it supports custom generators, GeneratorFunction looks like what we'd need. But the README says it is not published on Maven? I could file a follow-up on this.

abellina · 2026-06-15T22:19:48Z

+        <scala.binary.version>2.12</scala.binary.version>
+        <scala.version>2.12.15</scala.version>
+        <!-- Spark/RAPIDS versions -->
+        <spark.version>3.5.5</spark.version>


So why are we hardcoding these variables here (scala.binary.version, scala.version, spark.version)? Is it because these are meant to be replaced by the LLM?

In other words, why are these poms so verbose compared to the regular module poms?

These skill templates are meant to be standalone projects, copied out of the repo and into the user's space. So we'll try to keep the version pins up to date with the latest GA release, but they can be adjusted at runtime if the user so specifies.
As I understand it, spark-rapids' root pom.xml is where most of the complexity (and all the version pins) lives, and the modules just inherit from it, which is why they are much simpler.

abellina · 2026-06-15T22:25:48Z

+   * @param f The function to execute with the resource
+   * @return The result of the function
+   */
+  def withResource[T <: AutoCloseable, R](resource: T)(f: T => R): R = {


We should move Arm.* from sql-plugin to the sql-plugin-api module so we don't need to clone parts of it here.

Yeah, I attempted this a while ago in #13424 but never pushed it through 🙂. May be a good time to revisit.

abellina

I left some comments on my first pass today, I'll take another look once there are more updates to the change.

abellina · 2026-06-16T13:55:54Z

+            if (runGpu) {
+                try {
+                    long[] times = runBenchmark(warmup, measured, profile, () -> {
+                        try (ColumnVector result = executeGpu(table, numRows)) {}


Why does executeGpu return a result when executeCpu doesn't?

Why do we just ignore the result here and close it?

We don't care about either result as this is just for measurement, but having executeGpu return something is just to make sure we don't leak the output column.

Then executeGpu should close it inside and return void.

I'll follow up on this
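
The shape being asked for can be sketched as follows. This is an illustration only: `FakeColumn` stands in for a GPU column, and `evaluate` and this simplified `runBenchmark` are hypothetical names that do not mirror the template's real signatures.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a void executeGpu that owns and closes its own result.
final class ExecuteGpuSketch {

    // Counts live columns so leaks are visible in this toy example.
    static final AtomicInteger OPEN = new AtomicInteger();

    // Stand-in for a GPU column; the real type is not modeled here.
    static final class FakeColumn implements AutoCloseable {
        FakeColumn() { OPEN.incrementAndGet(); }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    // Stand-in for running the UDF on the GPU.
    static FakeColumn evaluate(int numRows) {
        return new FakeColumn();
    }

    // The result is only needed for timing, so it is closed here and never returned.
    static void executeGpu(int numRows) {
        try (FakeColumn result = evaluate(numRows)) {
            // Intentionally empty: the measurement only covers producing the result.
        }
    }

    // Runs `body` warmup + measured times and returns the measured durations in ns.
    static long[] runBenchmark(int warmup, int measured, Runnable body) {
        for (int i = 0; i < warmup; i++) {
            body.run();
        }
        long[] times = new long[measured];
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            body.run();
            times[i] = System.nanoTime() - start;
        }
        return times;
    }

    public static void main(String[] args) {
        long[] times = runBenchmark(2, 5, () -> executeGpu(1_000));
        System.out.println("runs=" + times.length + " open=" + OPEN.get()); // prints "runs=5 open=0"
    }
}
```

The call site then no longer needs an empty try-with-resources block, and the CPU and GPU paths have the same void shape.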

revans2

I don't see any more blockers. There is more that would be nice to clean up. But we can look into it later.

revans2 · 2026-06-16T14:55:59Z

build

abellina · 2026-06-16T15:05:15Z

+            mapper.writer(printer).writeValue(new File(path), report);
+            System.err.println("Report written to: " + path);
+        } catch (Exception e) {
+            System.err.println("Failed to write report: " + e.getMessage());


This would be another swallowed-error case.

Will follow up
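
One way to avoid swallowing the failure is to let the I/O exception propagate to the caller. The sketch below assumes a plain string report and uses `java.nio.file.Files` rather than the Jackson mapper in the diff; `ReportWriterSketch` and `writeReport` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: the write failure is propagated instead of logged and dropped.
final class ReportWriterSketch {

    /** Writes the report, letting any IOException reach the caller. */
    static void writeReport(Path path, String reportJson) throws IOException {
        Files.write(path, reportJson.getBytes(StandardCharsets.UTF_8));
        // Only reached when the write succeeded.
        System.err.println("Report written to: " + path);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("bench-report", ".json");
        writeReport(out, "{\"status\":\"ok\"}");
    }
}
```

A benchmark run whose report cannot be written then fails visibly instead of exiting as if it had succeeded.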

abellina · 2026-06-16T15:14:00Z

+     *   Assert.assertEquals("UNKNOWN", results[2].getAs("risk_level"));
+     * }</pre>
+     */
+    public static void verifyUDFResults(Dataset<Row> resultDF, Dataset<Row> testDF) {


nit, would be nice if this was "assertUDFResultsEqual". I had to read the comment to realize this was going to assert not throw.

Makes sense, will follow up

abellina · 2026-06-16T15:15:25Z

+  }
+
+  test("UDF vs SQL expression") {
+    val testDF = UnitTest.createTestData(spark).repartition(1)


In the past, single-partition execution has yielded some nasty bugs, especially with our GPU execs (like hash aggregate). Having multiple tasks with splits is also closer to natural Spark execution. Why repartition(1)? This implies a single task will run the UDF, which seems odd, especially since you have 4 cores total.

This was in response to cases where we saw degenerate execution because the test DataFrame was too small, i.e. we were passing the UDF single-row columns and not actually exercising columnar execution. Maybe we can replace it with repartition(2) or require a minimum number of test rows.

abellina

I think we can take care of my comments as follow ups

rishic3 · 2026-06-16T18:24:31Z

Thanks @abellina @revans2 for shepherding this!

… in skill templates [skip ci] (#15116)

Description

This is a follow-up to address a few comments on #15058.

- (comment on #15058) executeGpu now closes its result internally and returns void
- (comment on #15058) no longer catching write errors and letting it throw (more easily reviewable with whitespace off)
- (comment on #15058) rename `verifyUDFResults` -> `assertUDFResults` to clarify that this calls assertions
- (comment on #15058) bumped to `repartition(2)` and encouraging 10+ test cases, for more realistic parallelism while ensuring the UDF is actually fed multi-row columns

Checklists

Documentation

- [ ] Updated for new or modified user-facing features or behaviors
- [X] No user-facing change

Testing

- [ ] Added or modified tests to cover new code paths
- [ ] Covered by existing tests (Please provide the names of the existing tests in the PR description.)
- [X] Not required

Performance

- [ ] Tests ran and results are added in the PR description
- [ ] Issue filed with a link in the PR description
- [X] Not required

Signed-off-by: Rishi Chandra <rishic@nvidia.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

rishic3 added 3 commits June 10, 2026 19:30

add aether udf skills

f545601

cleanups.

dc9b25b

update installation

cce3011

rishic3 requested a review from a team as a code owner June 11, 2026 03:05

signoff

474c971

Signed-off-by: Rishi Chandra <rishic@nvidia.com>

rishic3 changed the title ~~Publish UDF agent skills~~ Publish UDF agent skills [skip ci] Jun 11, 2026

pxLi requested review from GaryShen2008, YanxuanLiu, gerashegalov, revans2, sameerz and yinqingh June 11, 2026 03:07

rishic3 added 5 commits June 10, 2026 20:15

ignore pom files under skills/ in scala build

d8ae9dc

license header

0f1998e

ignore skills in rat, fix readme link

f48b531

regenerate scala 2.13 pom

446ec50

skip skills in scala style

6a709e5

greptile-apps Bot reviewed Jun 11, 2026

Comment thread skills/udf-gen-test/templates/scala/src/main/scala/com/udf/Arm.scala

Comment thread skills/tests/test_export/scala_fixtures.py Outdated

Comment thread skills/tests/test_export/utils.py Outdated

address comments

53e1fef

pxLi reviewed Jun 12, 2026

Comment thread skills/docs/dev/VERSIONS.md Outdated

rishic3 added 3 commits June 12, 2026 06:39

fix name

81f6cf9

remove tests from initial publish

d7438d5

defer pyproject to follow-up

99c0795

sameerz added the feature request New feature or request label Jun 12, 2026

sameerz previously approved these changes Jun 12, 2026
