Update databricks 17.3 [skip ci] by muzza-lovelytics · Pull Request #1017 · NVIDIA/cuml-spark

muzza-lovelytics · 2026-05-20T21:59:57Z

Databricks sent an automated email this morning warning me that my Spark Rapids ML cluster runs a Databricks LTS runtime (13.3 LTS ML) that will reach end-of-life this August. That means, in a few months, new users will not be able to follow these instructions.

Rectifying this was relatively straightforward in terms of getting the new Spark 4 and Scala 2.13 dependencies sussed out and updating to the newest 26.04.2 build.

Also note that the manual CUDA installation is no longer necessary and would in fact create a regression. For simplicity, that installation step was removed.

Signed-off-by: Murray Todd Williams <murray.williams@lovelytics.com>

greptile-apps · 2026-05-20T22:01:31Z

Greptile Summary

This PR updates the Databricks notebook setup from DBR 13.3 LTS to 17.3 LTS, upgrading the Spark/Scala artifact suffix from _2.12 to _2.13, bumping SPARK_RAPIDS_VERSION to 26.04.2, and removing the now-unnecessary manual CUDA installation block.

- Runtime upgrade: README and init script updated to target Databricks 17.3 LTS ML GPU Runtime, reflecting the end-of-life timeline for 13.3 LTS.
- CUDA block removed: The wget/sh CUDA runfile installer and /usr/local/cuda symlink reset are removed, as DBR 17.3 ships CUDA directly and re-installing would regress the environment.
- Artifact update: JAR download now targets rapids-4-spark_2.13-26.04.2, and the PYTHONPATH in the README is updated to match; a new comment explains that SPARK_RAPIDS_VERSION and RAPIDS_VERSION may legitimately differ and directs users to the compatibility matrix.

Confidence Score: 4/5

Safe to merge for the runtime upgrade and CUDA removal; the pip RAPIDS packages are pinned to 25.12.0 while the in-script comment examples reference 26.4.0, which an earlier review round already flagged as potentially stale.

The JAR artifact, PYTHONPATH, and Scala suffix are all consistently updated to 26.04.2 / _2.13. The CUDA removal is intentional and correctly documented. The one remaining open question — whether RAPIDS_VERSION=25.12.0 is the deliberate, verified-compatible version for DBR 17.3 or an oversight — was raised in a prior review cycle and has not yet been explicitly confirmed or corrected by the author.

notebooks/databricks/init-pip-cuda-12.sh — specifically the RAPIDS_VERSION value and whether it matches the intended RAPIDS Python package version for DBR 17.3.

Important Files Changed

| Filename | Overview |
|---|---|
| notebooks/databricks/README.md | Updated runtime version to 17.3, removed stale CUDA bullet, and corrected PYTHONPATH jar to rapids-4-spark_2.13-26.04.2.jar; all consistent with the init script changes. |
| notebooks/databricks/init-pip-cuda-12.sh | SPARK_RAPIDS_VERSION bumped to 26.04.2 and jar artifact updated to _2.13; CUDA install block removed. RAPIDS_VERSION remains at 25.12.0; author comments indicate this divergence may be intentional, but the in-code comment example says 26.4.0. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Cluster node starts\nDBR 17.3 LTS ML GPU] --> B[init-pip-cuda-12.sh runs]
    B --> C[curl: download rapids-4-spark_2.13-26.04.2-cuda12.jar\n→ /databricks/jars/]
    C --> D[pip upgrade]
    D --> E[pip install cudf-cu12~=25.12.0\ncuml-cu12~=25.12.0\ncuvs-cu12~=25.12.0\npylibraft-cu12~=25.12.0\nraft-dask-cu12~=25.12.0\nnumpy~=1.0]
    E --> F[pip install spark-rapids-ml]
    F --> G{SPARK_RAPIDS_ML_NO_IMPORT_ENABLED == 1?}
    G -->|Yes| H[Write IPython startup\nscript for no-import-change UX]
    G -->|No| I[Done]
    H --> I

    J[Spark Executor starts] --> K[PYTHONPATH includes\n/databricks/jars/rapids-4-spark_2.13-26.04.2.jar]
    K --> L[SQLPlugin loaded\nGPU acceleration active]

Reviews (7): Last reviewed commit: "Syncing appropriate rapids and spark_rap..."

eordentlich

Thanks for this update. I will test it out.

A couple of minor comments.

eordentlich · 2026-05-21T15:03:59Z

+# 
+# Note also that sometimes the jar and python packages will have different patch versions published and available at any time,
+# so the versions may not perfectly align. This is expected and should not cause issues.
+RAPIDS_VERSION=26.4.0


I think we need to keep this at 25.12.0 for now, till we do an update to match cuML changes. This is forthcoming for 26.6.0.

The Spark rapids version bump is fine.

eordentlich · 2026-05-21T15:08:16Z

Please update copyright date to 2026 while you are changing this file.

eordentlich · 2026-05-22T16:55:39Z

+# Note also that sometimes the jar and python packages will have different patch versions published and available at any time,
+# so the versions may not perfectly align. This is expected and should not cause issues.
 RAPIDS_VERSION=25.12.0
 SPARK_RAPIDS_VERSION=25.12.0


Looks like my previous comment might have been a bit confusing. Please keep SPARK_RAPIDS_VERSION=26.04.2 and RAPIDS_VERSION=25.12.0 (it is OK for these to mismatch, as they are completely disjoint packages). It seems Spark Rapids 25.12.0 is technically not fully compatible with DB 17.3, per the release notes.

And also sync the version in the jar name in the README.
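For illustration, the two independent pins requested here can be sketched in shell. This is a minimal sketch and not the actual contents of init-pip-cuda-12.sh: the `pip_spec` helper is hypothetical, and only the variable names, the version values, and the `-cu12~=` requirement format are taken from this thread.

```shell
# Sketch only: the real init script may be laid out differently.
RAPIDS_VERSION=25.12.0        # pins the pip packages (cudf, cuml, ...)
SPARK_RAPIDS_VERSION=26.04.2  # pins the Spark RAPIDS jar; note the leading zero

# Hypothetical helper: print the pip requirement for one RAPIDS package.
pip_spec() {
    printf '%s-cu12~=%s\n' "$1" "$RAPIDS_VERSION"
}

pip_spec cudf   # prints cudf-cu12~=25.12.0
pip_spec cuml   # prints cuml-cu12~=25.12.0

# The jar file name is derived from the other, unrelated pin.
echo "rapids-4-spark_2.13-${SPARK_RAPIDS_VERSION}.jar"
```

Keeping the two values in separate variables makes the intended mismatch explicit, so a later bump of one pin does not silently drag the other along.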

greptile-apps · 2026-05-22T17:52:30Z

      spark.databricks.delta.preview.enabled true
      spark.python.worker.reuse true
-      spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-25.12.0.jar:/databricks/spark/python
+      spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.13-26.4.2.jar:/databricks/spark/python


The PYTHONPATH in the README references rapids-4-spark_2.13-26.4.2.jar (no leading zero), but the init script sets SPARK_RAPIDS_VERSION=26.04.2 and downloads the jar to /databricks/jars/rapids-4-spark_2.13-26.04.2.jar (with leading zero). The comment in the script even documents this convention: "SPARK_RAPIDS_VERSION (jar) should have leading 0 in month/minor (e.g. 26.04.2 and not 26.4.2)". At runtime, Spark executors will fail to start because the PYTHONPATH points to a file that was never written to disk.

Suggested change

-      spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.13-26.4.2.jar:/databricks/spark/python
+      spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.13-26.04.2.jar:/databricks/spark/python
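A mismatch of this kind could in principle be caught before a cluster starts. The following is a minimal sketch under stated assumptions: `is_padded_version` and `jar_path` are hypothetical helpers that are not part of this PR, and the path layout is the one quoted in the comment above.

```shell
# Hypothetical helpers, not part of the PR.

# Succeeds when the version starts with YY.MM.P using a two-digit month
# (e.g. 26.04.2) and fails for 26.4.2. This is a loose prefix check,
# not a full version parser.
is_padded_version() {
    case "$1" in
        [0-9][0-9].[0-9][0-9].[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Path that the README's PYTHONPATH entry is expected to reference.
jar_path() {
    printf '/databricks/jars/rapids-4-spark_2.13-%s.jar\n' "$1"
}

for v in 26.04.2 26.4.2; do
    if is_padded_version "$v"; then
        echo "ok:  $(jar_path "$v")"
    else
        echo "bad: $v is missing the leading zero"
    fi
done
```

Running the check against both spellings shows why the README and the init script have to agree on the zero-padded form.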

eordentlich

one more iteration.

eordentlich · 2026-05-22T21:55:41Z

build

eordentlich

Looks good. I tried it out and it worked.
Thanks!

muzza-lovelytics added 3 commits May 20, 2026 16:55

Updated versions to match Databricks Spark 4 and Scala 2.13

2e37fa2

Signed-off-by: Murray Todd Williams <murray.williams@lovelytics.com>

Removing unnecessary CUDA regression.

39bab9a

Databricks 17.3 ML LTS already natively provisions a highly optimized CUDA 12.6 environment, meaning the script causes an unstable driver/toolkit regression. Signed-off-by: Murray Todd Williams <murray.williams@lovelytics.com>

Syncing spark config instructions to match new updated jar version.

7a0a824

Signed-off-by: Murray Todd Williams <murray.williams@lovelytics.com>

Fixed documentation line about CUDA runtime updates.

c71e947

Signed-off-by: Murray Todd Williams <murray.williams@lovelytics.com>

eordentlich reviewed May 21, 2026

Changed version as per eordentilich request

2621a40

Signed-off-by: Murray Todd Williams <murray.williams@lovelytics.com>

greptile-apps Bot reviewed May 22, 2026

Comment thread notebooks/databricks/init-pip-cuda-12.sh Outdated

Forgot to sync the Spark config in the readme

d9a4cca

Signed-off-by: Murray Todd Williams <murray.williams@lovelytics.com>

eordentlich reviewed May 22, 2026

Reverting back to 26.4.2 and 26.4.0

9e5cab0

Signed-off-by: Murray Todd Williams <murray.williams@lovelytics.com>

greptile-apps Bot reviewed May 22, 2026

Fixing leading 0 in JAR path.

cd18ca4

Signed-off-by: Murray Todd Williams <murray.williams@lovelytics.com>

eordentlich reviewed May 22, 2026

Comment thread notebooks/databricks/init-pip-cuda-12.sh Outdated

Syncing appropriate rapids and spark_rapids versions.

fe653b1

Signed-off-by: Murray Todd Williams <murray.williams@lovelytics.com>

eordentlich changed the title ~~Update databricks 17.3~~ Update databricks 17.3 [skip ci] May 22, 2026

eordentlich approved these changes May 22, 2026

eordentlich merged commit c51743b into NVIDIA:main May 22, 2026
4 checks passed

eordentlich merged 9 commits into NVIDIA:main from muzza-lovelytics:update-databricks-17.3
