Skip to content

Add Spark 3.3.0 SQL shim module sources#15043

Open
gerashegalov wants to merge 1 commit into
codex/unshim-stack-02b-fileiofrom
codex/unshim-stack-02c-shims-330
Open

Add Spark 3.3.0 SQL shim module sources#15043
gerashegalov wants to merge 1 commit into
codex/unshim-stack-02b-fileiofrom
codex/unshim-stack-02c-shims-330

Conversation

@gerashegalov

@gerashegalov gerashegalov commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It adds the Spark 3.3.0 SQL shim module sources after the shared helper module wiring is in place. Keeping this as its own layer makes the Spark-family source population easy to inspect.

Stack context

Testing and validation notes

  • No standalone behavior change is intended in this layer. It is covered by the full-stack packaging/build validation described in Add default common unshim packaging flow #15025 and the existing tests for the affected subsystem.
  • The full split stack was verified to be tree-equivalent to the pre-split stack top.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Covered by the validation notes in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02c-shims-330 branch from de49e32 to e6f5a41 Compare June 13, 2026 12:13
@gerashegalov gerashegalov marked this pull request as ready for review June 13, 2026 12:48
@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR populates the sql-plugin-shims module with Spark 3.3.0 and Databricks 3.3.0 (11.3) SQL shim sources as one reviewable layer in the unshim stack. All files are pure additions with no deletions.

  • Spark 3.3.0 shims: SparkShimServiceProvider, ShuffleManagerShims, ShuffleClientShims, FileCommitProtocolShims, OriginContextShim, SparkUpgradeExceptionShims, SequenceSizeTooLongErrorBuilder, CreateDataSourceTableAsSelectRules, and TrampolineConnectShims are added; most carry broad version headers covering 330–411.
  • Databricks 3.3.0 shims: DatabricksShimServiceProvider, SparkShimServiceProvider (330db), OriginContextShim, SparkDateTimeExceptionShims, and SparkUpgradeExceptionShims (330db-specific constructor) are added.
  • Shared helpers: ConvUtils (reflection-based overflowInConvError dispatcher) and SparkSessionUtils (reflection-based session/leafNodeDefaultParallelism accessor) land in the non-version-gated source tree.

Confidence Score: 3/5

The bulk of the shim wiring is safe; two files have defects that could surface silently at runtime.

SparkDateTimeExceptionShims on Databricks 3.3 silently discards messageParameters, producing exceptions with empty substitution values. SparkSessionUtils.invokeNoArg leaves NoSuchMethodException unhandled so callers targeting Spark 3.3 see a raw propagated exception with no fallback opportunity.

SparkDateTimeExceptionShims.scala (spark330db) and SparkSessionUtils.scala (shared)

Important Files Changed

Filename Overview
sql-plugin-shims/src/main/spark330db/scala/org/apache/spark/sql/rapids/shims/SparkDateTimeExceptionShims.scala DB-specific SparkDateTimeException factory; the messageParameters argument is silently dropped, producing exceptions with blank substitution values.
sql-plugin-shims/src/main/scala/org/apache/spark/sql/rapids/shims/SparkSessionUtils.scala Reflection-based helper; only catches InvocationTargetException, leaving NoSuchMethodException unhandled for Spark 3.4+-only methods like leafNodeDefaultParallelism.
sql-plugin-shims/src/main/scala/org/apache/spark/sql/errors/ConvUtils.scala Shared reflection shim for overflowInConvError; passes null for single parameter without type validation, risking NPE on primitive-boxed parameter types.
sql-plugin-shims/src/main/spark330/scala/org/apache/spark/sql/rapids/shims/TrampolineConnectShims.scala SparkSession/Dataset bridge for pre-Connect Spark; type aliases shadow local imports with identical types, minor redundancy only.
sql-plugin-shims/src/main/spark330db/scala/com/nvidia/spark/rapids/DatabricksShimServiceProvider.scala Databricks runtime detection via BuildInfo; correctly uses ignoreExceptions guard.

Comments Outside Diff (1)

  1. sql-plugin-shims/src/main/scala/org/apache/spark/sql/rapids/shims/SparkSessionUtils.scala, line 86-93 (link)

    P1 NoSuchMethodException not caught in invokeNoArg

    Class.getMethod(name) throws NoSuchMethodException when the method is absent, which this helper does not catch. leafNodeDefaultParallelism was introduced in Spark 3.4; calling it against a Spark 3.3 runtime will propagate a raw NoSuchMethodException with no fallback. Catching NoSuchMethodException | IllegalAccessException and wrapping in UnsupportedOperationException would make version-gated call sites fail more informatively.

Reviews (1): Last reviewed commit: "Add SQL shim module sources for Spark 3...." | Re-trigger Greptile

Comment on lines +21 to +33

import org.apache.spark.{QueryContext, SparkDateTimeException}

object SparkDateTimeExceptionShims {

def newSparkDateTimeException(
errorClass: String,
messageParameters: Map[String, String],
context: Array[QueryContext],
summary: String): SparkDateTimeException = {
new SparkDateTimeException(
errorClass,
None,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 messageParameters silently dropped

The messageParameters: Map[String, String] argument is accepted by the method but never forwarded to the SparkDateTimeException constructor — Array.empty is passed in its place. Any caller that supplies message parameters expecting them to appear in the exception message will silently get an exception with empty/blank substitution values, producing a misleading or incomplete error string at runtime on Databricks 3.3.0.

Comment on lines +33 to +38
} catch {
case _: ClassNotFoundException | _: NoSuchFieldException =>
throw new UnsupportedOperationException()
case e: InvocationTargetException =>
throw e.getCause
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Hardcoded null passed to unknown parameter type

The code finds any overflowInConvError overload with exactly one parameter and invokes it with null. If that parameter is a primitive-boxed type, JVM unboxing will throw a NullPointerException inside invoke, masking the intended error. Either matching the exact parameter type in the find predicate or adding a comment documenting the nullable assumption would reduce the ambiguity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant