cumulative updates since 25.12 to align with cuML 26.06 by eordentlich · Pull Request #1019 · NVIDIA/cuml-spark

eordentlich · 2026-06-03T20:14:13Z

In addition:

drops support for Spark 3.3, since cuML's minimum Python version (3.11) is incompatible with PySpark 3.3
drops Databricks (DB) 13.3 and 14.3 support in the benchmark and notebook examples

eordentlich · 2026-06-03T20:33:13Z

build

greptile-apps · 2026-06-03T20:37:41Z

Greptile Summary

This PR aligns spark-rapids-ml with cuML 26.06, bumps the package version to 26.6.0, drops Spark 3.3 / Python 3.10 support, and updates Databricks benchmark tooling to target runtimes 15.4–17.3.

cuML API updates: Handle() is now explicitly passed to LogisticRegressionMG, LinearRegressionMG, and PCAMG; treelite deserialization is removed from the RF inference path (model bytes are assigned directly); radius is added to KNN defaults; device_ids and force_serial_epochs are added to UMAP defaults.
Spark 3.3 cleanup: All version-guarded compatibility shims for PySpark < 3.4 are removed from source, tests, and scripts; PySpark requirement widened to >=3.4.1,<4.0.
Bug fixes: UMAP model loading now uses pdf["data"] (column access) instead of pdf.data (attribute access); _load_sparse_data in tests correctly handles zero-row normalization to avoid division by zero.

Confidence Score: 4/5

The PR is safe to merge; the cuML API changes are straightforward adapter updates and the Spark 3.3 cleanup is well-scoped.

Changes are primarily version-bump bookkeeping plus targeted API adapter fixes for cuML 26.06. The max_depth deprecated placeholder in _get_cuml_params_default is unconventional and could be confusing, but is functionally safe because _initialize_cuml_params always overwrites it with the Spark default before any cuML call. The undocumented n_components=1 to 2 change in test_pipeline.py mildly reduces test coverage of the 1-component UMAP case without explanation.

python/src/spark_rapids_ml/tree.py (the deprecated placeholder and the treelite removal) and python/tests/test_pipeline.py (silent n_components change) deserve a second look.

Important Files Changed

Filename	Overview
python/src/spark_rapids_ml/tree.py	Removes treelite deserialization (assigns model bytes directly to treelite_model_bytes), sets n_features_in explicitly on RF models, and marks max_depth default as deprecated in _get_cuml_params_default.
python/src/spark_rapids_ml/classification.py	Adds explicit Handle() construction for LogisticRegressionMG to match new cuML 26.06 API requirement.
python/src/spark_rapids_ml/regression.py	Removes deprecated normalize/standardization mapping for LinearRegression and adds explicit Handle() to LinearRegressionMG, aligning with cuML 26.06.
python/src/spark_rapids_ml/umap.py	Adds device_ids and force_serial_epochs to default params; fixes pdf["data"] column access (was pdf.data); imports cast from typing.
python/src/spark_rapids_ml/knn.py	Adds radius: 1.0 default parameter to both NearestNeighborsClass and ApproximateNearestNeighborsClass to match new cuML 26.06 API.
python/pyproject.toml	Version bumped to 26.6.0; minimum Python raised from 3.10 to 3.11; classifiers updated to 3.11/3.12; PySpark range widened to >=3.4.1,<4.0.
python/benchmark/databricks/run_benchmark.sh	Adds dynamic SCALA_VERSION selection (2.12 vs 2.13) based on Databricks runtime version; updates SPARK_RAPIDS_VERSION to 26.06.0; drops support for DB 13.3 and 14.3.
python/tests/test_pipeline.py	Changes UMAP test parameter from n_components=1 to n_components=2 without explanation.
python/tests/test_logistic_regression.py	Removes all PySpark < 3.4 version-guard early returns; tests now run the full sparse path unconditionally.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[RandomForest model bytes from training worker] --> B{cuML 25.12}
    A --> C{cuML 26.06}
    B --> D[treelite.Model.deserialize_bytes]
    D --> E[rf._treelite_model_bytes = treelite_obj]
    C --> F[rf._treelite_model_bytes = model bytes directly]
    E --> G[rf.predict]
    F --> G
    G --> H[Transform output]

Reviews (1): Last reviewed commit: "make copyrights more consistent, make ex..."

greptile-apps · 2026-06-03T20:37:46Z

+if [[ $db_version > 16.4 ]]; then
+    SCALA_VERSION=2.13
+fi


Lexicographic version comparison is fragile for DB runtime versions

[[ $db_version > 16.4 ]] uses bash string ordering, not numeric ordering. For the currently listed versions (15.4, 16.4, 17.3) this happens to be correct, but a future runtime like "16.10" would compare as less than "16.4" lexicographically (since "1" < "4"), incorrectly keeping SCALA_VERSION=2.12. The same pattern applies in init-pip-cuda-12.sh at the [[ $DATABRICKS_RUNTIME_VERSION < "17.3" ]] guards. Consider splitting on . and doing integer comparisons, or using sort -V-based logic.
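A minimal sketch of the integer comparison suggested above, assuming db_version is a plain MAJOR.MINOR string such as 16.4. The helper names version_gt and scala_version_for are hypothetical and do not come from the PR; the 16.4 threshold and the 2.12/2.13 values are taken from the diff.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): succeeds when version $1 is
# strictly greater than version $2, comparing major and minor as integers.
# Assumes both arguments look like MAJOR.MINOR with numeric components.
version_gt() {
    local a_major a_minor b_major b_minor
    IFS=. read -r a_major a_minor _ <<< "$1"
    IFS=. read -r b_major b_minor _ <<< "$2"
    # Force base 10 so a component such as "08" is not parsed as octal;
    # a missing minor component is treated as 0.
    a_major=$((10#${a_major:-0})); a_minor=$((10#${a_minor:-0}))
    b_major=$((10#${b_major:-0})); b_minor=$((10#${b_minor:-0}))
    (( a_major > b_major || (a_major == b_major && a_minor > b_minor) ))
}

# Hypothetical wrapper mirroring the threshold used in the diff above.
scala_version_for() {
    if version_gt "$1" 16.4; then
        echo 2.13
    else
        echo 2.12
    fi
}

# Example: SCALA_VERSION=$(scala_version_for "$db_version")
```

With this sketch, the listed runtimes 15.4 and 16.4 select 2.12 and 17.3 selects 2.13, the same as the string comparison. A hypothetical runtime 16.10 selects 2.13, whereas [[ 16.10 > 16.4 ]] is false and would leave 2.12 in place.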

greptile-apps · 2026-06-03T20:37:47Z

 # install cuML
-ARG RAPIDS_VERSION=25.12
-RUN conda install -y -c rapidsai -c conda-forge -c nvidia cuml=$RAPIDS_VERSION cuvs=$RAPIDS_VERSION python=3.10 pylibraft=$RAPIDS_VERSION raft-dask=$RAPIDS_VERSION cuda-version=12.2 numpy~=1.0 \
+ARG RAPIDS_VERSION=26.06


CI now uses rapidsai-nightly channel

The channel was changed from rapidsai (stable) to rapidsai-nightly for 26.06. If this is temporary until the stable release is published, it should be tracked — nightly packages can change daily and may introduce unintended breakage in CI. Consider adding a comment noting this should be switched back to rapidsai once 26.06 is released to the stable channel.

eordentlich · 2026-06-03T21:32:57Z

build

eordentlich · 2026-06-03T22:29:08Z

build

eordentlich · 2026-06-13T16:51:39Z

build

eordentlich · 2026-06-24T05:05:33Z

build

eordentlich · 2026-06-24T16:39:32Z

build

rishic3

LGTM, very minor comment

eordentlich · 2026-06-25T01:15:08Z

build

eordentlich · 2026-06-25T02:06:13Z

build

greptile-apps Bot reviewed Jun 3, 2026

rishic3 reviewed Jun 8, 2026

Comment thread .claude/skills/update-rapids-version/SKILL.md

Comment thread ci/test.sh

rishic3 previously approved these changes Jun 9, 2026

Comment thread .claude/skills/update-rapids-version/SKILL.md

eordentlich added 10 commits June 11, 2026 22:01

preliminary rapids 26.02 updates to pass tests

7e845df

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

updates for 26.04 + claude skill for this update

96077ce

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

more 26.04 updates

6d573b9

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

updates for rapids 26.06

dc518ac

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

drop spark 3.3 test, as rapids minimum python is 3.11 which is not compatible with pyspark 3.3

ba7f67e

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

update databricks benchmark scripts

3bed382

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

update copyright years

b2982ae

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

bumpy python version in ci Docker

ab5a9cb

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

add some TODOs to track official 26.06 rapids release

624ebcb

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

updates to align with official 26.06 rapids release. update emr and dataproc

71c8197

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

eordentlich dismissed rishic3’s stale review via 71c8197 June 13, 2026 16:48

eordentlich force-pushed the eo-26.06-updates branch from 64e60eb to 71c8197 June 13, 2026 16:48

rishic3 reviewed Jun 22, 2026

Comment thread python/benchmark/databricks/gpu_etl_cluster_spec.sh Outdated

eordentlich added 2 commits June 23, 2026 16:11

update more pyspark 3.3 drop related code,docs,tests, update Docker files to use python > 3.10, fix databricks 17.3 with plugin, update plugin to 26.06

67d2f01

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

license

4d7490b

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

rishic3 reviewed Jun 24, 2026

Comment thread notebooks/aws-emr/init-bootstrap-action.sh Outdated

make copyrights more consistent, make example Dockerfile.pip use python 3.11 for compatibility with spark < 4

7492b96

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>

rishic3 approved these changes Jun 25, 2026

eordentlich merged commit b6cb77a into NVIDIA:main Jun 25, 2026
4 checks passed

eordentlich deleted the eo-26.06-updates branch June 25, 2026 18:34
