Add Spark 3.3.1 through 3.4 SQL shim module sources #15041

Open
gerashegalov wants to merge 1 commit into codex/unshim-stack-02c-shims-330 from codex/unshim-stack-02d-shims-334

Conversation

@gerashegalov gerashegalov commented Jun 10, 2026

Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It adds the Spark 3.3.1 through Spark 3.4 SQL shim module sources as a focused shim-family source population step.

Stack context

Testing and validation notes

  • No standalone behavior change is intended in this layer. It is covered by the full-stack packaging/build validation described in "Add default common unshim packaging flow" (#15025) and by the existing tests for the affected subsystem.
  • The full split stack was verified to be tree-equivalent to the pre-split stack top.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Covered by the validation notes in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02c-shims-330 branch from de49e32 to e6f5a41 Compare June 13, 2026 12:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02d-shims-334 branch from 80a2612 to 271c29e Compare June 13, 2026 12:13
@gerashegalov gerashegalov marked this pull request as ready for review June 13, 2026 12:49
greptile-apps Bot commented Jun 13, 2026

Contributor

Greptile Summary

This PR is one layer in the "unshim stack" restructuring (#15025) that splits shim sources out of sql-plugin into the new sql-plugin-shims module. It adds Spark 3.3.1 through 3.4.4 (including Databricks 12.2 and 13.3 variants) as a focused population step with no standalone behavior changes.

  • 11 SparkShimServiceProvider files are added — one per Spark/Databricks version target — each declaring the correct SparkShimVersion/DatabricksShimVersion, a single-entry VERSIONNAMES (or DatabricksShimServiceProvider.matchesVersion delegation for DB builds), and shim json-lines annotations. All are faithful copies of their sql-plugin counterparts.
  • 6 multi-version utility shim files are migrated: CreateDataSourceTableAsSelectRules, WriteFilesExecShims, SparkDateTimeExceptionShims, and SparkUpgradeExceptionShims (all four covering spark332db → spark411); SequenceSizeTooLongUnsuccessfulErrorBuilder (spark334, 342–344, 351–358, intentionally skipping 340/341 as in the original); and OriginContextShim (spark340 → spark358, handling the SQLQueryContext vs wider QueryContext split introduced at Spark 3.4).
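
For readers unfamiliar with the provider pattern summarized above, here is a minimal, self-contained sketch of how a per-version provider with a single-entry VERSIONNAMES list might answer a version check. The `ShimVersion`, `ShimServiceProvider`, `Spark331Provider`, and `ShimLookup` names are simplified stand-ins invented for illustration; they are not the plugin's actual classes, and the real files in this PR may differ in detail.

```scala
// Stand-in types: simplified, hypothetical versions of a shim API.
sealed trait ShimVersion

final case class SparkShimVersion(major: Int, minor: Int, patch: Int) extends ShimVersion {
  // Render as "3.3.1" so the version can be compared with a Spark version string.
  override def toString: String = s"$major.$minor.$patch"
}

trait ShimServiceProvider {
  def getShimVersion: ShimVersion
  def matchesVersion(version: String): Boolean
}

// One provider per Spark target; this one stands in for a spark331 shim.
object Spark331Provider extends ShimServiceProvider {
  val VERSION: SparkShimVersion = SparkShimVersion(3, 3, 1)

  // Single-entry list of the version strings this shim accepts.
  val VERSIONNAMES: Seq[String] = Seq(VERSION.toString)

  override def getShimVersion: ShimVersion = VERSION

  override def matchesVersion(version: String): Boolean =
    VERSIONNAMES.contains(version)
}

object ShimLookup {
  // Return the first provider that claims the running Spark version, if any.
  def find(
      providers: Seq[ShimServiceProvider],
      sparkVersion: String): Option[ShimServiceProvider] =
    providers.find(_.matchesVersion(sparkVersion))
}
```

Under these assumptions, a Databricks variant would differ only in how `matchesVersion` is answered (by delegating to a shared helper rather than consulting a local list), which is why the per-version files can remain near-identical copies.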

Confidence Score: 5/5

All 17 files are mechanical copies of existing sql-plugin sources into the new sql-plugin-shims module with no logic changes; safe to merge as part of the unshim stack sequence.

Every added file is a verbatim copy of its counterpart already shipping in sql-plugin, validated by the author as tree-equivalent at the full-stack level. The shim json-lines annotations, version constants, and DB delegation patterns all match the established conventions in the codebase. No resource management, GPU operations, or data-path logic is touched.

No files require special attention. The multi-version shim files in spark332db (SparkUpgradeExceptionShims, CreateDataSourceTableAsSelectRules, WriteFilesExecShims) have the widest applicability (332db → 411) and are the most worth spot-checking against the sql-plugin originals, which they match exactly.

Important Files Changed

| Filename | Overview |
| --- | --- |
| sql-plugin-shims/src/main/spark331/scala/com/nvidia/spark/rapids/shims/spark331/SparkShimServiceProvider.scala | New SparkShimServiceProvider for Spark 3.3.1; faithful copy of sql-plugin counterpart with correct version constants and shim json-lines |
| sql-plugin-shims/src/main/spark332/scala/com/nvidia/spark/rapids/shims/spark332/SparkShimServiceProvider.scala | New SparkShimServiceProvider for Spark 3.3.2; faithful copy of sql-plugin counterpart |
| sql-plugin-shims/src/main/spark332db/scala/com/nvidia/spark/rapids/shims/spark332db/SparkShimServiceProvider.scala | New SparkShimServiceProvider for Databricks 12.2 (Spark 3.3.2 DB); correct DatabricksShimVersion(3,3,2,"12.2") and matchesVersion delegation |
| sql-plugin-shims/src/main/spark332db/scala/com/nvidia/spark/rapids/shims/CreateDataSourceTableAsSelectRules.scala | Multi-version shim rule for CreateDataSourceTableAsSelectCommand covering spark332db through spark411; exact copy of sql-plugin source |
| sql-plugin-shims/src/main/spark332db/scala/com/nvidia/spark/rapids/shims/WriteFilesExecShims.scala | Multi-version shim exec rule for WriteFilesExec covering spark332db through spark411; exact copy of sql-plugin source |
| sql-plugin-shims/src/main/spark332db/scala/org/apache/spark/sql/rapids/shims/SparkDateTimeExceptionShims.scala | Multi-version shim factory for SparkDateTimeException using the Spark 3.3+ Map-based constructor; exact copy of sql-plugin source |
| sql-plugin-shims/src/main/spark332db/scala/org/apache/spark/sql/rapids/shims/SparkUpgradeExceptionShims.scala | Multi-version shim factory for SparkUpgradeException using Map(version->message) messageParameters; exact copy of the existing sql-plugin source |
| sql-plugin-shims/src/main/spark333/scala/com/nvidia/spark/rapids/shims/spark333/SparkShimServiceProvider.scala | New SparkShimServiceProvider for Spark 3.3.3; faithful copy of sql-plugin counterpart |
| sql-plugin-shims/src/main/spark334/scala/com/nvidia/spark/rapids/shims/spark334/SparkShimServiceProvider.scala | New SparkShimServiceProvider for Spark 3.3.4; faithful copy of sql-plugin counterpart |
| sql-plugin-shims/src/main/spark334/scala/org/apache/spark/sql/rapids/shims/SequenceSizeTooLongUnsuccessfulErrorBuilder.scala | Multi-version trait for sequence-size error messages covering spark334, 342-344, 351-358 (intentionally skips 340/341); exact copy of sql-plugin source with copyright year updated to 2026 |
| sql-plugin-shims/src/main/spark340/scala/com/nvidia/spark/rapids/shims/spark340/SparkShimServiceProvider.scala | New SparkShimServiceProvider for Spark 3.4.0; faithful copy with correct SparkShimVersion(3,4,0) |
| sql-plugin-shims/src/main/spark340/scala/org/apache/spark/sql/rapids/shims/OriginContextShim.scala | Multi-version OriginContextShim for Spark 3.4.x–3.5.x handling SQLQueryContext vs wider QueryContext; null return on non-SQLQueryContext match is intentional and consistent with sql-plugin source |
| sql-plugin-shims/src/main/spark341/scala/com/nvidia/spark/rapids/shims/spark341/SparkShimServiceProvider.scala | New SparkShimServiceProvider for Spark 3.4.1; correct version and override matchesVersion |
| sql-plugin-shims/src/main/spark341db/scala/com/nvidia/spark/rapids/shims/spark341db/SparkShimServiceProvider.scala | New SparkShimServiceProvider for Databricks 13.3 (Spark 3.4.1 DB); correct DatabricksShimVersion(3,4,1,"13.3") and delegation to DatabricksShimServiceProvider |
| sql-plugin-shims/src/main/spark342/scala/com/nvidia/spark/rapids/shims/spark342/SparkShimServiceProvider.scala | New SparkShimServiceProvider for Spark 3.4.2; correct version constant and override modifier |
| sql-plugin-shims/src/main/spark343/scala/com/nvidia/spark/rapids/shims/spark343/SparkShimServiceProvider.scala | New SparkShimServiceProvider for Spark 3.4.3; correct version constant and override modifier |
| sql-plugin-shims/src/main/spark344/scala/com/nvidia/spark/rapids/shims/spark344/SparkShimServiceProvider.scala | New SparkShimServiceProvider for Spark 3.4.4; correct version constant and override modifier |
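
The OriginContextShim entry mentions a deliberate null return when the context is not a SQLQueryContext. As a rough illustration of that narrowing pattern only, the sketch below uses locally defined stand-in types; `OtherQueryContext`, `OriginContextSketch`, and both method names are hypothetical, and the traits here are not Spark's real `QueryContext` classes.

```scala
// Hypothetical stand-ins for the wider and narrower context types.
trait QueryContext { def summary: String }

final case class SQLQueryContext(summary: String) extends QueryContext

// Any context that is not SQL-specific (name invented for this sketch).
final case class OtherQueryContext(summary: String) extends QueryContext

object OriginContextSketch {
  // Narrow the wider type to the SQL-specific one. Returning null on a
  // mismatch mirrors the behavior the review describes as intentional.
  def toSqlContext(ctx: QueryContext): SQLQueryContext = ctx match {
    case sql: SQLQueryContext => sql
    case _ => null
  }

  // Null-safe wrapper for callers that prefer Option over a null check.
  def toSqlContextOpt(ctx: QueryContext): Option[SQLQueryContext] =
    Option(toSqlContext(ctx))
}
```

Keeping the null-returning form matches what the review says the original source does; the `Option` wrapper is shown only as one way a caller could guard against it.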

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[sql-plugin-shims module] --> B[spark331 / spark332 / spark333 / spark334]
    A --> C[spark332db]
    A --> D[spark340 / spark341 / spark341db]
    A --> E[spark342 / spark343 / spark344]

    B --> B1[SparkShimServiceProvider\nSparkShimVersion 3.3.x]
    C --> C1[SparkShimServiceProvider\nDatabricksShimVersion 3.3.2 DB12.2]
    C --> C2[CreateDataSourceTableAsSelectRules\nmulti-version: 332db to 411]
    C --> C3[WriteFilesExecShims\nmulti-version: 332db to 411]
    C --> C4[SparkDateTimeExceptionShims\nmulti-version: 332db to 411]
    C --> C5[SparkUpgradeExceptionShims\nmulti-version: 332db to 411]
    D --> D1[SparkShimServiceProvider\nSparkShimVersion 3.4.0/3.4.1\nor DatabricksShimVersion DB13.3]
    D --> D2[OriginContextShim\nmulti-version: 340 to 358\nSQLQueryContext typed]
    E --> E1[SparkShimServiceProvider\nSparkShimVersion 3.4.2-3.4.4]

    B1 -.->|also covers via spark334| F[SequenceSizeTooLongUnsuccessfulErrorBuilder\n334 / 342-344 / 351-358]
    E1 -.-> F

Reviews (1): Last reviewed commit: "Add SQL shim module sources for Spark 3...."
