
[BUG] PCA nightly example fails: Py4JException Method mapInPandas does not exist (pyspark 3.5.8 vs Spark 3.4.3 mismatch) #15155

Description

@hyperbolic2346

Describe the bug
Build: examples_PCA_build_nightly/1550

The PCA example notebook (pca.ipynb) fails at gpu_pca_loaded.fit(data_df) during the jupyter nbconvert --execute step. It raises a Py4JException because the JVM has no mapInPandas method that accepts the extra Boolean (barrier) argument. On the Python side, pip installed pyspark 3.5.8, resolved from the spark-rapids-ml requirements.txt constraint 'pyspark>=3.4.1,<4.0'. The standalone cluster, however, runs the Spark 3.4.3 distribution. In PySpark 3.5.x, mapInPandas passes a barrier Boolean to the JVM, and Dataset.mapInPandas in Spark 3.4.3 does not accept that argument. This is a deterministic client/server version mismatch, so the failure reproduces on every run.
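A minimal fail-fast guard, assuming the notebook can run a check right after it creates its SparkSession: compare the PySpark client version with the JVM Spark version before any mapInPandas-based fit runs. The helper names major_minor and check_spark_versions are hypothetical and not part of spark-rapids-ml or PySpark. Treating a matching major.minor as compatible is also an assumption; pinning the exact version is stricter.

```python
# Hypothetical guard (not a spark-rapids-ml API): fail early with a clear
# message instead of hitting an opaque Py4JException inside mapInPandas.

def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '3.5.8'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def check_spark_versions(client_version: str, server_version: str) -> None:
    """Raise RuntimeError if the PySpark client and the JVM Spark runtime
    differ in major.minor version (assumed compatibility rule)."""
    if major_minor(client_version) != major_minor(server_version):
        raise RuntimeError(
            f"pyspark {client_version} does not match Spark runtime "
            f"{server_version}; install pyspark=={server_version}"
        )
```

In the notebook it would be called as check_spark_versions(pyspark.__version__, spark.version), where spark is the active SparkSession. With the versions from this build (3.5.8 vs 3.4.3), it raises before gpu_pca_loaded.fit(data_df) is reached.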

Error logs:

Collecting pyspark<4.0,>=3.4.1 (from requirements.txt line 17)
  Downloading pyspark-3.5.8.tar.gz (317.8 MB)
...
wget https://.../org/apache/spark/3.4.3/spark-3.4.3-bin-hadoop3.tgz
...
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
gpu_pca_model = gpu_pca_loaded.fit(data_df)
...
File ".../spark_rapids_ml/core.py", line 1000, in _call_cuml_fit_func
    dataset.mapInPandas(_train_udf, schema=self._out_schema())
File ".../pyspark/sql/pandas/map_ops.py", line 112, in mapInPandas
    jdf = self._jdf.mapInPandas(udf_column._jc.expr(), barrier)
Py4JError: An error occurred while calling o665.mapInPandas. Trace:
py4j.Py4JException: Method mapInPandas([class org.apache.spark.sql.catalyst.expressions.PythonUDF, class java.lang.Boolean]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
	at py4j.Gateway.invoke(Gateway.java:274)

Environment details

  • spark-rapids-ml 26.6.0; requirements.txt pyspark>=3.4.1,<4.0 -> installs pyspark 3.5.8
  • Spark runtime distribution 3.4.3 (bin-hadoop3) standalone cluster
  • conda env pca-nightly, python 3.11
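The sketch below, assuming plain X.Y.Z release strings, reproduces why the requirements.txt range picks 3.5.8 (pip installs the newest release that satisfies the specifier). It then derives a pin from the Spark runtime that the job actually downloads. The helper names and the candidate list are illustrative only. Real tooling would use packaging.specifiers.SpecifierSet rather than hand-rolled tuple comparison.

```python
# Illustrative only: shows how 'pyspark>=3.4.1,<4.0' admits 3.5.8 while the
# cluster runs Spark 3.4.3, and builds a requirement pinned to the runtime.

def parse(version: str) -> tuple[int, ...]:
    """Parse a plain numeric release string like '3.4.3' into a tuple."""
    return tuple(int(p) for p in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper (plain numeric versions only)."""
    return parse(lower) <= parse(version) < parse(upper)


def pyspark_requirement_for(runtime_version: str) -> str:
    """Return a requirement line pinning pyspark to the Spark runtime."""
    return f"pyspark=={runtime_version}"


RUNTIME = "3.4.3"  # Spark distribution started as the standalone cluster
candidates = ["3.4.1", "3.4.3", "3.5.8"]  # hypothetical subset of releases
allowed = [v for v in candidates if in_range(v, "3.4.1", "4.0")]
newest = max(allowed, key=parse)  # what an unpinned install would pick
pinned = pyspark_requirement_for(RUNTIME)
```

Pinning the client is one option. Downloading a Spark 3.5.x distribution to match pyspark 3.5.8 would remove the mismatch from the other side.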

Metadata


Assignees

No one assigned

    Labels

    ? - Needs Triage (Need team to review and classify), bot_watch (Slack bot watched issue for LLM analyzer), bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
