Add Spark 3.5 and 4.0 SQL shim module sources #15042

Open
gerashegalov wants to merge 2 commits into codex/unshim-stack-02d-shims-334 from codex/unshim-stack-02e-shims-35-40

gerashegalov commented Jun 10, 2026

Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It adds the Spark 3.5 and Spark 4.0 SQL shim module sources as the final source-population layer before columnar/helper class movement begins.

Stack context

Testing and validation notes

  • No standalone behavior change is intended in this layer. It is covered by the full-stack packaging/build validation described in #15025 ("Add default common unshim packaging flow") and by the existing tests for the affected subsystem.
  • The full split stack was verified to be tree-equivalent to the top of the pre-split stack.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Covered by the validation notes in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
gerashegalov force-pushed the codex/unshim-stack-02e-shims-35-40 branch from ac60c19 to 8288b83 on June 13, 2026 12:13
gerashegalov marked this pull request as ready for review on June 13, 2026 12:49

greptile-apps Bot commented Jun 13, 2026

Contributor

Greptile Summary

This PR populates the new sql-plugin-shims module with Spark 3.5.x and 4.x SQL shim module sources as part of the "unshim stack" refactoring (#15025). All 21 added files are faithful copies of their counterparts already present in sql-plugin/src/main/spark{VERSION}/: they match character-for-character except for copyright years, which are updated to include 2026.

  • SparkShimServiceProvider stubs for spark350–358, spark400–411, and Databricks variants (spark350db143, spark400db173) are added, each registering the correct SparkShimVersion/DatabricksShimVersion and matchesVersion logic.
  • Utility shims (SequenceSizeExceededLimitErrorBuilder, OriginContextShim, TrampolineConnectShims, ShuffleManagerShims, ShuffleClientShims, FileCommitProtocolShims) are migrated with their per-version shim-json scope headers intact and consistent with the originals.
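
To make the provider pattern above concrete, here is a minimal sketch of how a provider could pair a shim version with matchesVersion logic. The type and method names SparkShimVersion, SparkShimServiceProvider, and matchesVersion come from the summary; their definitions below, along with Spark400Provider and ShimLookup, are simplified, hypothetical stand-ins and not the actual spark-rapids sources.

```scala
// Hypothetical, simplified stand-ins: the real types in the plugin carry more members.
final case class SparkShimVersion(major: Int, minor: Int, patch: Int) {
  override def toString: String = s"$major.$minor.$patch"
}

trait SparkShimServiceProvider {
  def getShimVersion: SparkShimVersion
  def matchesVersion(version: String): Boolean
}

// Sketch of an OSS provider that accepts exactly one Spark version string.
object Spark400Provider extends SparkShimServiceProvider {
  private val shimVersion = SparkShimVersion(4, 0, 0)
  private val versionNames = Seq(shimVersion.toString)

  override def getShimVersion: SparkShimVersion = shimVersion
  override def matchesVersion(version: String): Boolean = versionNames.contains(version)
}

// Hypothetical lookup helper: returns the first provider claiming the running version.
object ShimLookup {
  def find(
      providers: Seq[SparkShimServiceProvider],
      sparkVersion: String): Option[SparkShimServiceProvider] =
    providers.find(_.matchesVersion(sparkVersion))
}
```

With exact string matching like this, a provider compiled for several shim scopes still claims only its own version at runtime, which is the property the file overview relies on for spark400 versus spark400db173.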

Confidence Score: 4/5

Safe to merge — all 21 files are faithful copies of pre-existing sql-plugin sources with no behavioral changes introduced.

Every added file was verified against its sql-plugin original and matches character-for-character (modulo copyright year). The one flagged item — the misleading // disabled by default comment in the Databricks 14.3 shim — is a pre-existing inaccuracy carried from the original source; no new defects are introduced. The migration to sql-plugin-shims is structurally correct and the shim-json headers are consistent with the originals.

sql-plugin-shims/src/main/spark350db143/scala/com/nvidia/spark/rapids/shims/spark350db143/SparkShimServiceProvider.scala — the // disabled by default comment contradicts the getOrElse(true) default, though this was copied from the original location.
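
The mismatch between the comment and the default is easy to see in isolation. The sketch below is not the provider's code: the config key and the ShimEnablement helper are hypothetical, and only the getOrElse(true) default is taken from the review. It shows that an absent setting resolves to enabled, so a "disabled by default" comment would be inaccurate.

```scala
object ShimEnablement {
  // Hypothetical config key, used only for illustration.
  val EnableKey = "example.shim.enabled"

  // An absent key falls through to getOrElse(true), so the shim is enabled by default.
  def isEnabled(conf: Map[String, String]): Boolean =
    conf.get(EnableKey).map(_.toBoolean).getOrElse(true)
}
```

Under these assumptions, ShimEnablement.isEnabled(Map.empty) is true, and only an explicit "false" value turns the shim off.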

Important Files Changed

| Filename | Overview |
| --- | --- |
| sql-plugin-shims/src/main/spark350db143/scala/com/nvidia/spark/rapids/shims/spark350db143/SparkShimServiceProvider.scala | Faithful copy of sql-plugin counterpart; registers Databricks 14.3 shim. The "disabled by default" comment contradicts getOrElse(true); carried from the original but worth fixing here. |
| sql-plugin-shims/src/main/spark350db143/scala/org/apache/spark/sql/rapids/shims/SequenceSizeExceededLimitErrorBuilder.scala | Faithful copy of sql-plugin counterpart; bridges QueryExecutionErrors.createArrayWithElementsExceedLimitError for DB 14.3 and Spark 4.x builds. |
| sql-plugin-shims/src/main/spark400/scala/com/nvidia/spark/rapids/shims/spark400/SparkShimServiceProvider.scala | Faithful copy; shim-json header includes both spark 400 and 400db173 (compilation scope), but matchesVersion only matches "4.0.0" at runtime, so there is no conflict with the spark400db173 provider. |
| sql-plugin-shims/src/main/spark400/scala/org/apache/spark/sql/rapids/shims/OriginContextShim.scala | Faithful copy; shims Spark 4.0 widening of Origin.context to QueryContext, narrowing back to SQLQueryContext for QueryExecutionErrors callers. |
| sql-plugin-shims/src/main/spark400/scala/org/apache/spark/sql/rapids/shims/TrampolineConnectShims.scala | Faithful copy of sql-plugin counterpart (extra blank line before shim header also present in original); provides Spark 4 classic session API adapters and Avro 1.12 Schema.Parser shim. |
| sql-plugin-shims/src/main/spark400db173/scala/com/nvidia/spark/rapids/shims/spark400db173/SparkShimServiceProvider.scala | Faithful copy; registers Databricks 17.3 shim service provider with enabled-by-default pattern consistent with the spark350db143 provider. |
| sql-plugin-shims/src/main/spark400db173/scala/org/apache/spark/sql/rapids/ShuffleManagerShims.scala | Faithful copy; wraps DB 17.3 ShuffleManager.getReader with the extra boolean parameter that differs from the OSS Spark 4 API. |
| sql-plugin-shims/src/main/spark411/scala/org/apache/spark/sql/rapids/shims/FileCommitProtocolShims.scala | Faithful copy; adapts Spark 4.1.0+ FileCommitProtocol.newTaskTempFile/newTaskTempFileAbsPath to use the FileNameSpec-based signatures. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph sql-plugin-shims ["sql-plugin-shims (new module)"]
        SP350[spark350 SparkShimServiceProvider] --> SV350[SparkShimVersion 3.5.0]
        SP351[spark351] --> SV351[SparkShimVersion 3.5.1 + SNAPSHOT]
        SP352[spark352..358] --> SV35x[SparkShimVersion 3.5.x]
        SP400[spark400] --> SV400[SparkShimVersion 4.0.0]
        SP401[spark401..402] --> SV40x[SparkShimVersion 4.0.x]
        SP411[spark411] --> SV411[SparkShimVersion 4.1.1]
        SP350DB[spark350db143 SparkShimServiceProvider] --> DBV143[DatabricksShimVersion 3.5.0 / 14.3]
        SP400DB[spark400db173 SparkShimServiceProvider] --> DBV173[DatabricksShimVersion 4.0.0 / 17.3]
    end
    SP350DB -->|runtime check: dbrVersion startsWith 14.3.x| DBV143
    SP400DB -->|runtime check: dbrVersion startsWith 17.3.x| DBV173
    sql-plugin-shims -->|exact copy of sources| existing["sql-plugin/src/main/spark{VERSION}"]
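
The two runtime-check edges in the flowchart can be sketched as a predicate that requires both the Spark version and a Databricks runtime prefix to match. Everything here is an assumption made for illustration: the DatabricksVersionCheck object, its parameter names, the prefix form "14.3.", and the sample version strings are not taken from the actual providers, whose comparison may differ.

```scala
object DatabricksVersionCheck {
  // Assumed shape of the check: exact Spark version plus a DBR version prefix.
  def matchesDbr(
      sparkVersion: String,
      dbrVersion: String,
      expectedSpark: String,
      dbrPrefix: String): Boolean =
    sparkVersion == expectedSpark && dbrVersion.startsWith(dbrPrefix)
}
```

For example, under these assumptions DatabricksVersionCheck.matchesDbr("3.5.0", "14.3.x-scala2.12", "3.5.0", "14.3.") holds, while the same call with a "17.3."-prefixed runtime string does not, which keeps the 14.3 and 17.3 providers from claiming each other's runtimes.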

Reviews (1): Last reviewed commit: "Add SQL shim module sources for Spark 4"
