cumulative updates since 25.12 to align with cuML 26.06 #1019
build
Greptile Summary
This PR aligns spark-rapids-ml with cuML 26.06, bumps the package version to 26.6.0, drops Spark 3.3 / Python 3.10 support, and updates Databricks benchmark tooling to target runtimes 15.4–17.3.
Confidence Score: 4/5
The PR is safe to merge; the cuML API changes are straightforward adapter updates and the Spark 3.3 cleanup is well-scoped. Changes are primarily version-bump bookkeeping plus targeted API adapter fixes for cuML 26.06. The deprecated max_depth placeholder in _get_cuml_params_default is unconventional and could be confusing, but it is functionally safe because _initialize_cuml_params always overwrites it with the Spark default before any cuML call. The undocumented change from n_components=1 to n_components=2 in test_pipeline.py slightly reduces test coverage of the single-component UMAP case without explanation. python/src/spark_rapids_ml/tree.py (the deprecated placeholder and the treelite removal) and python/tests/test_pipeline.py (the silent n_components change) deserve a second look.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[RandomForest model bytes from training worker] --> B{cuML 25.12}
A --> C{cuML 26.06}
B --> D[treelite.Model.deserialize_bytes]
D --> E[rf._treelite_model_bytes = treelite_obj]
C --> F[rf._treelite_model_bytes = model bytes directly]
E --> G[rf.predict]
F --> G
G --> H[Transform output]
Reviews (1): Last reviewed commit: "make copyrights more consistent, make ex..."
if [[ $db_version > 16.4 ]]; then
    SCALA_VERSION=2.13
fi
Lexicographic version comparison is fragile for DB runtime versions
[[ $db_version > 16.4 ]] uses bash string ordering, not numeric ordering. For the currently listed versions (15.4, 16.4, 17.3) this happens to give the right answer, but a future runtime such as "16.10" would sort before "16.4" (the first differing characters are "1" and "4"), so SCALA_VERSION would incorrectly stay at 2.12. The same pattern appears in init-pip-cuda-12.sh in the [[ $DATABRICKS_RUNTIME_VERSION < "17.3" ]] guards. Consider splitting the version on "." and comparing the parts as integers, or using sort -V.
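A minimal sketch of the suggested fix, assuming GNU coreutils `sort -V` is available on the cluster image. The helper names version_gt, version_lt and version_gt_int are hypothetical and not taken from the PR; version_lt illustrates how the `< "17.3"` guards in init-pip-cuda-12.sh could be expressed, and version_gt_int is the integer-splitting alternative for images without `sort -V`.

```shell
#!/usr/bin/env bash
# Version-aware comparisons for MAJOR.MINOR runtime strings such as "16.4".
# Helper names are hypothetical, not part of the PR.

# version_gt A B: exit status 0 when A is strictly newer than B.
# Relies on GNU `sort -V`, which orders "16.4" before "16.10".
version_gt() {
    [[ "$1" != "$2" ]] &&
        [[ "$(printf '%s\n' "$1" "$2" | sort -V | tail -n 1)" == "$1" ]]
}

# version_lt A B: exit status 0 when A is strictly older than B.
version_lt() {
    version_gt "$2" "$1"
}

# Alternative without sort -V: split on "." and compare the parts as integers.
# Assumes exactly MAJOR.MINOR with no leading zeros (e.g. not "16.04").
version_gt_int() {
    local a_major a_minor b_major b_minor
    IFS=. read -r a_major a_minor <<< "$1"
    IFS=. read -r b_major b_minor <<< "$2"
    (( a_major > b_major || (a_major == b_major && a_minor > b_minor) ))
}

# Hypothetical future runtime that the plain string comparison gets wrong.
db_version="16.10"
SCALA_VERSION=2.12
if version_gt "$db_version" 16.4; then
    SCALA_VERSION=2.13
fi
echo "SCALA_VERSION=$SCALA_VERSION"   # prints SCALA_VERSION=2.13
```

With `[[ "16.10" > 16.4 ]]` the same check would be false; the version-aware helper treats 16.10 as newer than 16.4.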
# install cuML
ARG RAPIDS_VERSION=25.12
RUN conda install -y -c rapidsai -c conda-forge -c nvidia cuml=$RAPIDS_VERSION cuvs=$RAPIDS_VERSION python=3.10 pylibraft=$RAPIDS_VERSION raft-dask=$RAPIDS_VERSION cuda-version=12.2 numpy~=1.0 \
ARG RAPIDS_VERSION=26.06
CI now uses rapidsai-nightly channel
The channel was changed from rapidsai (stable) to rapidsai-nightly for 26.06. If this is temporary until the stable release is published, it should be tracked: nightly packages can change daily and may introduce unintended breakage in CI. Consider adding a comment noting that this should be switched back to rapidsai once 26.06 is released to the stable channel.
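One way to keep the temporary switch visible and easy to revert is to hold the channel in a single overridable variable with a TODO next to it. This is a sketch in shell rather than Dockerfile syntax (in the Dockerfile the same idea would be an ARG); RAPIDS_CHANNEL is a hypothetical name, and the package list is trimmed to the RAPIDS packages shown in the diff. It only prints the resulting conda command, so it runs without conda installed.

```shell
#!/usr/bin/env bash
# Sketch: keep the RAPIDS channel in one overridable variable so the
# temporary switch to nightly packages is explicit and trivial to revert.
# RAPIDS_CHANNEL is a hypothetical name, not one used by the PR.
RAPIDS_VERSION="${RAPIDS_VERSION:-26.06}"
# TODO: switch back to "rapidsai" once 26.06 is published to the stable channel.
RAPIDS_CHANNEL="${RAPIDS_CHANNEL:-rapidsai-nightly}"

# Build the argument list as an array so each package spec stays one word.
conda_args=(
    install -y
    -c "$RAPIDS_CHANNEL" -c conda-forge -c nvidia
    "cuml=$RAPIDS_VERSION" "cuvs=$RAPIDS_VERSION"
    "pylibraft=$RAPIDS_VERSION" "raft-dask=$RAPIDS_VERSION"
)

# Print the command instead of running it so the sketch works without conda.
echo conda "${conda_args[@]}"
```

Once 26.06 reaches the stable channel, reverting is a one-line change to the default (or a `RAPIDS_CHANNEL=rapidsai` override in CI) rather than an edit to the install command itself.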
build
build
Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
…mpatible with pyspark 3.3 Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
…ataproc Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
64e60eb to 71c8197
build
…iles to use python > 3.10, fix databricks 17.3 with plugin, update plugin to 26.06 Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
build
build
rishic3 left a comment
LGTM, very minor comment
…on 3.11 for compatibility with spark < 4 Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
build
build
In addition: