[BUG] PCA nightly example fails: Py4JException Method mapInPandas does not exist (pyspark 3.5.8 vs Spark 3.4.3 mismatch)

**Describe the bug**
Build: examples_PCA_build_nightly/1550

The PCA example notebook (pca.ipynb) fails at `gpu_pca_loaded.fit(data_df)` during the `jupyter nbconvert --execute` step. The error is a Py4JException reporting that the JVM method `mapInPandas` taking a Boolean (barrier) argument does not exist.

The cause is a client/runtime version mismatch. The installed Python pyspark is 3.5.8 (resolved from the spark-rapids-ml requirements.txt constraint 'pyspark>=3.4.1,<4.0'), while the Spark distribution started as the standalone cluster is 3.4.3. PySpark 3.5.x `mapInPandas` passes an extra 'barrier' Boolean argument that Spark 3.4.3 `Dataset.mapInPandas` does not accept. The failure is deterministic: it occurs whenever pip resolves pyspark to 3.5.x against a 3.4.x runtime.
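A preflight check in the notebook or CI script could surface this mismatch before `fit` is called. The following is a minimal sketch, not something that exists in spark-rapids-ml; the helper names `major_minor` and `check_spark_versions` are hypothetical, and comparing only major.minor is an assumption about what counts as compatible.

```python
def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '3.5.8'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def check_spark_versions(client_version: str, runtime_version: str) -> None:
    """Raise if the Python pyspark and the Spark runtime differ in major.minor.

    Hypothetical helper: the comparison rule (major.minor must match) is an
    assumption, chosen because the failure here is 3.5.x against 3.4.x.
    """
    if major_minor(client_version) != major_minor(runtime_version):
        raise RuntimeError(
            f"pyspark {client_version} does not match Spark runtime "
            f"{runtime_version}; pin pyspark to the runtime's version"
        )


# Versions taken from this report: this pair is rejected.
try:
    check_spark_versions("3.5.8", "3.4.3")
except RuntimeError as err:
    print(err)
```

In a live session the two inputs would typically be `pyspark.__version__` and `spark.version` from the active SparkSession.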

Error logs:
```
Collecting pyspark<4.0,>=3.4.1 (from requirements.txt line 17)
  Downloading pyspark-3.5.8.tar.gz (317.8 MB)
...
wget https://.../org/apache/spark/3.4.3/spark-3.4.3-bin-hadoop3.tgz
...
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
gpu_pca_model = gpu_pca_loaded.fit(data_df)
...
File ".../spark_rapids_ml/core.py", line 1000, in _call_cuml_fit_func
    dataset.mapInPandas(_train_udf, schema=self._out_schema())
File ".../pyspark/sql/pandas/map_ops.py", line 112, in mapInPandas
    jdf = self._jdf.mapInPandas(udf_column._jc.expr(), barrier)
Py4JError: An error occurred while calling o665.mapInPandas. Trace:
py4j.Py4JException: Method mapInPandas([class org.apache.spark.sql.catalyst.expressions.PythonUDF, class java.lang.Boolean]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
	at py4j.Gateway.invoke(Gateway.java:274)
```
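The log above shows pip resolving pyspark independently of the downloaded Spark distribution. One possible workaround, sketched here under assumptions, is to derive an exact pyspark pin from the distribution file name so that both sides agree. The function name `pyspark_pin_from_dist` is hypothetical, and this is not necessarily how the nightly job should be fixed.

```python
import re


def pyspark_pin_from_dist(dist_name: str) -> str:
    """Build a pip requirement matching a Spark distribution file name.

    Hypothetical helper: assumes the name follows the
    'spark-<major>.<minor>.<patch>-bin-...' pattern seen in the log.
    """
    match = re.search(r"spark-(\d+\.\d+\.\d+)-bin", dist_name)
    if match is None:
        raise ValueError(f"cannot find a Spark version in {dist_name!r}")
    return f"pyspark=={match.group(1)}"


# File name taken from the wget line in the log.
print(pyspark_pin_from_dist("spark-3.4.3-bin-hadoop3.tgz"))
```

The resulting string could be passed to `pip install` after the requirements.txt step, or used as a constraint, so that the range 'pyspark>=3.4.1,<4.0' no longer resolves to 3.5.x on a 3.4.3 cluster.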

**Environment details**
- spark-rapids-ml 26.6.0; requirements.txt pyspark>=3.4.1,<4.0 -> installs pyspark 3.5.8
- Spark runtime distribution 3.4.3 (bin-hadoop3) standalone cluster
- conda env pca-nightly, python 3.11
