5 changes: 2 additions & 3 deletions notebooks/databricks/README.md
@@ -14,11 +14,10 @@ If you already have a Databricks account, you can run the example notebooks on a
databricks workspace import --format AUTO --file init-pip-cuda-12.sh ${WS_SAVE_DIR}/init-pip-cuda-12.sh --profile ${PROFILE}
```
**Note**: the init script does the following on each Spark node:
-- updates the CUDA runtime (required for Spark Rapids ML dependencies).
- downloads and installs the [Spark-Rapids](https://github.com/NVIDIA/spark-rapids) plugin for accelerating data loading and Spark SQL.
- installs various `cuXX` dependencies via pip.
- if the cluster environment variable `SPARK_RAPIDS_ML_NO_IMPORT_ENABLED=1` is defined (see below), the init script also modifies a Databricks notebook kernel startup script to enable the no-import-change UX for the cluster. See [no-import-change](../README.md#no-import-change).
-- Create a cluster using **Databricks 13.3 LTS ML GPU Runtime** using at least two single-gpu workers and add the following configurations to the **Advanced options**.
+- Create a cluster using **Databricks 17.3 LTS ML GPU Runtime** using at least two single-gpu workers and add the following configurations to the **Advanced options**.
- **Init Scripts**
- add the workspace path to the uploaded init script `${WS_SAVE_DIR}/init-pip-cuda-12.sh` as set above (but substitute variables manually in the form).
- **Spark**
@@ -27,7 +26,7 @@ If you already have a Databricks account, you can run the example notebooks on a
spark.task.resource.gpu.amount 0.125
spark.databricks.delta.preview.enabled true
spark.python.worker.reuse true
-spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-25.12.0.jar:/databricks/spark/python
+spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.13-25.12.0.jar:/databricks/spark/python
spark.sql.execution.arrow.maxRecordsPerBatch 100000
spark.plugins com.nvidia.spark.SQLPlugin
spark.locality.wait 0s
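The README's Spark configuration sets `spark.task.resource.gpu.amount 0.125`. Spark floors `1 / amount` to get the number of task slots per GPU, so with one GPU per executor (the single-GPU workers the README asks for), up to 8 tasks can share that GPU. The executor's CPU cores and `spark.task.cpus` can lower that number further. The helper below is a hypothetical sketch of the arithmetic and is not part of the repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): how many tasks Spark can place on
# the GPUs of one executor for a given spark.task.resource.gpu.amount.
# Spark floors 1 / amount per GPU, so 0.125 gives 8 task slots per GPU.
set -euo pipefail

gpu_task_slots() {
  local executor_gpus="$1" task_gpu_amount="$2"
  # int() truncates, which equals floor for the positive values used here.
  awk -v e="$executor_gpus" -v t="$task_gpu_amount" \
    'BEGIN { printf "%d\n", e * int(1 / t) }'
}

# One GPU per executor, as on the single-GPU workers the README recommends.
gpu_task_slots 1 0.125   # prints 8
```

Because of the flooring, a value such as 0.2222 yields 4 slots per GPU, not 5.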
21 changes: 8 additions & 13 deletions notebooks/databricks/init-pip-cuda-12.sh

Collaborator

Please update copyright date to 2026 while you are changing this file.

@@ -1,5 +1,5 @@
#!/bin/bash
-# Copyright (c) 2025, NVIDIA CORPORATION.
+# Copyright (c) 2026, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -15,21 +15,16 @@

set -ex

-# IMPORTANT: specify RAPIDS_VERSION fully 23.10.0 and not 23.10
-# also in general, RAPIDS_VERSION (python) fields should omit any leading 0 in month/minor field (i.e. 23.8.0 and not 23.08.0)
-# while SPARK_RAPIDS_VERSION (jar) should have leading 0 in month/minor (e.g. 23.08.2 and not 23.8.2)
+# IMPORTANT: specify RAPIDS_VERSION fully 26.4.0 and not 26.4
+# also in general, RAPIDS_VERSION (python) fields should omit any leading 0 in month/minor field (i.e. 26.4.0 and not 26.04.0)
+# while SPARK_RAPIDS_VERSION (jar) should have leading 0 in month/minor (e.g. 26.04.2 and not 26.4.2)
#
# Note also that sometimes the jar and python packages will have different patch versions published and available at any time,
# so the versions may not perfectly align. This is expected and should not cause issues.
RAPIDS_VERSION=25.12.0
SPARK_RAPIDS_VERSION=25.12.0
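The comment block above encodes two formatting rules that are easy to get wrong by hand: the Python `RAPIDS_VERSION` drops the leading zero in the month/minor field, while the jar `SPARK_RAPIDS_VERSION` keeps it. The following is a minimal sketch that checks both rules. The helper names are hypothetical and are not part of `init-pip-cuda-12.sh`:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of init-pip-cuda-12.sh) encoding the version conventions
# described in the script comments:
#   RAPIDS_VERSION (python):    fully specified, no leading 0 in month/minor, e.g. 26.4.0
#   SPARK_RAPIDS_VERSION (jar): fully specified, two-digit month/minor, e.g. 26.04.2
set -euo pipefail

# Succeeds only for X.Y.Z where the month/minor field has no leading zero.
is_python_style_version() {
  [[ "$1" =~ ^[0-9]+\.[1-9][0-9]?\.[0-9]+$ ]]
}

# Succeeds only for X.YY.Z where the month/minor field is exactly two digits.
is_jar_style_version() {
  [[ "$1" =~ ^[0-9]+\.[0-9]{2}\.[0-9]+$ ]]
}

# Rewrites a jar-style version into python style by dropping the leading zero of the
# month/minor field. Only the format changes: the jar and python packages may still
# publish different patch versions, as the script comments warn.
jar_to_python_style() {
  local major minor patch
  IFS=. read -r major minor patch <<< "$1"
  echo "${major}.$((10#${minor})).${patch}"
}

for v in 26.4.0 26.04.0 26.4; do
  if is_python_style_version "$v"; then
    echo "$v: valid RAPIDS_VERSION"
  else
    echo "$v: invalid RAPIDS_VERSION"
  fi
done
jar_to_python_style 26.04.2   # prints 26.4.2
```

The `10#` prefix forces base-10 arithmetic, so a field such as `08` is not parsed as an invalid octal number.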
muzza-lovelytics marked this conversation as resolved.
Outdated

Collaborator

Looks like my previous comment might have been a bit confusing. Please keep SPARK_RAPIDS_VERSION=26.04.2 (and RAPIDS_VERSION=25.12.0; it is OK for them to mismatch here, as they are completely disjoint packages). It seems Spark RAPIDS 25.12.0 is technically not fully compatible with DB 17.3, per the release notes.

Collaborator

And also sync the version in the jar name in the README.
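One way to keep the README's jar name from drifting away from the init script again is to compare the two automatically. The following is a hypothetical sketch, not a script in the repository. It assumes that the README names the jar as `rapids-4-spark_<scala>-<version>.jar` and that the init script assigns `SPARK_RAPIDS_VERSION=` at the start of a line, as in this diff:

```shell
#!/usr/bin/env bash
# Hypothetical consistency check (not part of the repository): confirm that the jar
# version in the README's PYTHONPATH entry matches SPARK_RAPIDS_VERSION in the init script.
set -euo pipefail

check_jar_version_sync() {
  local readme="$1" init_script="$2"
  local script_version readme_version

  # Value of the first "SPARK_RAPIDS_VERSION=..." assignment in the init script.
  script_version=$(awk -F= '/^SPARK_RAPIDS_VERSION=/ { print $2; exit }' "$init_script")

  # Version part of the first "rapids-4-spark_<scala>-<version>.jar" name in the README.
  readme_version=$(awk 'match($0, /rapids-4-spark_[0-9.]+-[0-9.]+\.jar/) {
      jar = substr($0, RSTART, RLENGTH)
      sub(/^rapids-4-spark_[0-9.]+-/, "", jar)
      sub(/\.jar$/, "", jar)
      print jar
      exit
    }' "$readme")

  if [[ -n "$script_version" && "$script_version" == "$readme_version" ]]; then
    echo "in sync: ${script_version}"
  else
    echo "mismatch: init script has '${script_version}', README has '${readme_version}'" >&2
    return 1
  fi
}
```

From the repository root, `check_jar_version_sync notebooks/databricks/README.md notebooks/databricks/init-pip-cuda-12.sh` exits non-zero on a mismatch, so it could run as a CI step. The check only looks at the first occurrence in each file.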


-curl -L https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/${SPARK_RAPIDS_VERSION}/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}-cuda12.jar -o /databricks/jars/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar
-
-# install cudatoolkit 12.2 via runfile approach
-wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux.run
-sh cuda_12.2.2_535.104.05_linux.run --silent --toolkit
-
-# reset symlink and update library loading paths
-rm /usr/local/cuda
-ln -s /usr/local/cuda-12.2 /usr/local/cuda
+curl -L https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/${SPARK_RAPIDS_VERSION}/rapids-4-spark_2.13-${SPARK_RAPIDS_VERSION}-cuda12.jar -o /databricks/jars/rapids-4-spark_2.13-${SPARK_RAPIDS_VERSION}.jar

# upgrade pip
/databricks/python/bin/pip install --upgrade pip
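The new download line fetches the Scala 2.13 build (`rapids-4-spark_2.13`) and saves it under `/databricks/jars/`. The README's `spark.executorEnv.PYTHONPATH` must point at that same file. The minimal sketch below derives the Maven URL, the local jar path, and the PYTHONPATH value from the same two variables. It assumes a hypothetical `SCALA_BINARY_VERSION` variable (the script itself hard-codes `_2.13`) and uses the `SPARK_RAPIDS_VERSION=26.04.2` value the reviewer asks to keep:

```shell
#!/usr/bin/env bash
# Sketch: build the jar URL, local path, and executor PYTHONPATH from shared variables.
# SCALA_BINARY_VERSION and these helper functions are hypothetical; the init script
# hard-codes the _2.13 suffix instead.
set -euo pipefail

SCALA_BINARY_VERSION=2.13
SPARK_RAPIDS_VERSION=26.04.2

rapids_jar_url() {
  local artifact="rapids-4-spark_${SCALA_BINARY_VERSION}"
  echo "https://repo1.maven.org/maven2/com/nvidia/${artifact}/${SPARK_RAPIDS_VERSION}/${artifact}-${SPARK_RAPIDS_VERSION}-cuda12.jar"
}

rapids_jar_path() {
  echo "/databricks/jars/rapids-4-spark_${SCALA_BINARY_VERSION}-${SPARK_RAPIDS_VERSION}.jar"
}

executor_pythonpath() {
  echo "$(rapids_jar_path):/databricks/spark/python"
}

# Print (do not run) the download command and the matching Spark config line.
echo "curl -L $(rapids_jar_url) -o $(rapids_jar_path)"
echo "spark.executorEnv.PYTHONPATH $(executor_pythonpath)"
```

Because all three strings come from the same two variables, a version or Scala bump touches a single line. The review comments point out that failure mode when the script and README get out of step.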