Skip to content

Remove old Spark 3.3 DBR shim sources#15053

Open
gerashegalov wants to merge 1 commit into
codex/unshim-stack-02s-shim-cleanup-330from
codex/unshim-stack-02t-shim-cleanup-330-dbr
Open

Remove old Spark 3.3 DBR shim sources#15053
gerashegalov wants to merge 1 commit into
codex/unshim-stack-02s-shim-cleanup-330from
codex/unshim-stack-02t-shim-cleanup-330-dbr

Conversation

@gerashegalov

@gerashegalov gerashegalov commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It removes old Spark 3.3 DBR shim sources as a separate cleanup layer. DBR changes are intentionally isolated because they are not locally buildable and tend to be review outliers.

Stack context

Testing and validation notes

  • No standalone DBR validation was run locally because Databricks shims require proprietary Databricks nodes/images. Keeping this cleanup isolated makes Databricks-node validation targeted.
  • The full split stack was verified to be tree-equivalent to the pre-split stack top.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Covered by the validation notes in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch 2 times, most recently from c0d071d to c6d9166 Compare June 10, 2026 21:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch 2 times, most recently from d54de00 to fad5d01 Compare June 10, 2026 21:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch 3 times, most recently from b930032 to 5febf4b Compare June 10, 2026 22:20
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from 499dac2 to 04da28c Compare June 10, 2026 22:20
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from 5febf4b to 7bb6083 Compare June 10, 2026 22:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from 04da28c to 5c9d602 Compare June 10, 2026 22:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from 7bb6083 to 796810e Compare June 10, 2026 22:41
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch 2 times, most recently from 6828be5 to bb0cf14 Compare June 10, 2026 22:46
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from 796810e to 2663bb3 Compare June 10, 2026 22:46
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from bb0cf14 to b2a7a9e Compare June 10, 2026 22:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch 2 times, most recently from d746dd5 to cd226f8 Compare June 10, 2026 23:12
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from b2a7a9e to 8bb1ce8 Compare June 10, 2026 23:12
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from cd226f8 to e4d544e Compare June 10, 2026 23:15
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from 8bb1ce8 to e23faf4 Compare June 10, 2026 23:15
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from e4d544e to 7ee72ed Compare June 10, 2026 23:29
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from e23faf4 to 77b338e Compare June 10, 2026 23:29
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from 7ee72ed to 3d5e85e Compare June 10, 2026 23:33
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch 2 times, most recently from 2bc489a to 52ebd8c Compare June 10, 2026 23:48
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch 2 times, most recently from 47cc5a9 to c2f0e73 Compare June 10, 2026 23:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from 52ebd8c to e8921a0 Compare June 10, 2026 23:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from c2f0e73 to 3c2a442 Compare June 11, 2026 00:25
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from e8921a0 to 0ac124a Compare June 11, 2026 00:25
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from 3c2a442 to f27a5af Compare June 11, 2026 00:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from 0ac124a to 9965c09 Compare June 11, 2026 00:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from f27a5af to df68682 Compare June 11, 2026 00:51
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch 2 times, most recently from b0a7d69 to 9f4924d Compare June 11, 2026 01:18
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch 2 times, most recently from 9d8e2a1 to d8581c4 Compare June 11, 2026 01:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from 9f4924d to 101e6cd Compare June 11, 2026 01:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from d8581c4 to 9973398 Compare June 11, 2026 01:43
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from 101e6cd to a0d26e1 Compare June 11, 2026 01:43
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from 9973398 to 9bb4096 Compare June 11, 2026 01:58
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch 2 times, most recently from f0a54b4 to 36f44d8 Compare June 11, 2026 02:26
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch 2 times, most recently from 903afa7 to 0b03989 Compare June 13, 2026 12:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from 36f44d8 to 0e24722 Compare June 13, 2026 12:13
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02t-shim-cleanup-330-dbr branch from 0b03989 to 9103bb7 Compare June 13, 2026 12:20
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02s-shim-cleanup-330 branch from 0e24722 to 4ec5b87 Compare June 13, 2026 12:20
@gerashegalov gerashegalov marked this pull request as ready for review June 13, 2026 12:49
@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR removes several old Spark 3.3 DBR shim source files from sql-plugin/src/main/spark330db/ as one isolated layer of the broader unshim stack (#15025). It also introduces a new consolidated CheckOverflowInTableInsertShims.scala covering all supported Spark versions, and applies minor cleanups to Spark330PlusDBShims, SparkShims, GpuGroupedPythonRunnerFactory, and arithmetic.scala.

  • Deleted files that covered other DB versions: DatabricksShimServiceProvider.scala had a shim-json header listing 332db, 341db, 350db143, and 400db173 in addition to 330db; its removal leaves those four DB builds referencing a non-existent object. Similarly, the deleted OriginContextShim.scala covered both 330db and 332db, leaving 332db without an implementation for OriginContextShim.
  • New shared shim: CheckOverflowInTableInsertShims.scala cleanly consolidates the CheckOverflowInTableInsert expr rule across the full supported version range; the merge order in Spark330PlusDBShims.getExprs is correct.
  • GpuGroupedPythonRunnerFactory: Changed from case class to class + Serializable and the default argNames = None parameter was dropped; both known callers already pass argNames explicitly so no breakage at those sites.

Confidence Score: 3/5

The changes to modified files are clean, but two deleted files had shim-json coverage extending to other DB versions still active in the tree; those builds reference objects that no longer exist anywhere in the codebase.

The original DatabricksShimServiceProvider.scala was the sole definition of the DatabricksShimServiceProvider object for 332db, 341db, 350db143, and 400db173. All four of those service providers still call DatabricksShimServiceProvider.matchesVersion(...) in the working tree, pointing to a compilation gap if this layer is merged independently. The OriginContextShim deletion poses the same risk for 332db. The PR author notes that DBR validation was not run locally and frames this as a stack layer only valid when the full stack is applied together, which limits confidence when reviewing this diff in isolation.

The two deleted files — DatabricksShimServiceProvider.scala and OriginContextShim.scala — are the most important to scrutinize; a second look should confirm that a prior or subsequent stack layer supplies their replacements for 332db and other affected DB variants.

Important Files Changed

Filename Overview
sql-plugin/src/main/spark330db/scala/com/nvidia/spark/rapids/DatabricksShimServiceProvider.scala Deleted — the object was the sole definition of DatabricksShimServiceProvider and its shim-json covered 332db/341db/350db143/400db173; those DB service providers still reference it and cannot compile without a replacement
sql-plugin/src/main/spark330db/scala/org/apache/spark/sql/rapids/shims/OriginContextShim.scala Deleted — covered both 330db and 332db; no replacement for 332db exists in the working tree, leaving GpuCast and mathExpressions without an OriginContextShim for that build
sql-plugin/src/main/spark330db/scala/com/nvidia/spark/rapids/shims/CheckOverflowInTableInsertShims.scala New shared shim file consolidating CheckOverflowInTableInsert expr rule across 330db through 411; moves the inline definition out of Spark330PlusDBShims cleanly
sql-plugin/src/main/spark330db/scala/com/nvidia/spark/rapids/shims/Spark330PlusDBShims.scala Removes inline CheckOverflowInTableInsert rule and delegates to CheckOverflowInTableInsertShims.exprs; merge order preserved correctly
sql-plugin/src/main/spark330db/scala/com/nvidia/spark/rapids/shims/SparkShims.scala Switches from GpuOverrides.dataWriteCmd to GpuOverrides.dataWriteCmdFromShim + CreateDataSourceTableAsSelectRules; removes now-unused CreateDataSourceTableAsSelectCommand import
sql-plugin/src/main/spark330db/scala/org/apache/spark/sql/rapids/execution/python/shims/GpuGroupedPythonRunnerFactory.scala Changed from case class to class + Serializable; removes default argNames=None — both callers already pass argNames explicitly, so no breakage
sql-plugin/src/main/spark330db/scala/org/apache/spark/sql/rapids/arithmetic.scala Removes unused Logging mixin and import from GpuDecimalRemainder; no log calls remain in the class body
sql-plugin/src/main/spark330db/scala/org/apache/spark/sql/rapids/shims/SparkDateTimeExceptionShims.scala Deleted — was 330db-only; assumes replacement exists elsewhere in the stack for any remaining callers
sql-plugin/src/main/spark330db/scala/org/apache/spark/sql/rapids/shims/SparkUpgradeExceptionShims.scala Deleted — was 330db-only; assumes replacement exists elsewhere in the stack for any remaining callers
sql-plugin/src/main/spark330db/scala/com/nvidia/spark/rapids/shims/Spark321PlusDBShims.scala Minor style change: Nil replaced with List.empty in InputCheck; no functional impact
sql-plugin/src/main/spark330db/scala/com/nvidia/spark/rapids/shims/spark330db/SparkShimServiceProvider.scala Deleted — removes the service provider registration for the spark330db shim variant; correct for retiring 330db support as part of the unshim stack

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Deleted["Deleted from spark330db/"]
        A[DatabricksShimServiceProvider.scala\nshim-json: 330db,332db,341db,350db143,400db173]
        B[SparkShimServiceProvider.scala\nshim-json: 330db only]
        C[OriginContextShim.scala\nshim-json: 330db, 332db]
        D[SparkDateTimeExceptionShims.scala\nshim-json: 330db]
        E[SparkUpgradeExceptionShims.scala\nshim-json: 330db]
    end

    subgraph Added["Added to spark330db/"]
        F[CheckOverflowInTableInsertShims.scala\nshim-json: 330db,331,332,...,411]
    end

    subgraph Modified["Modified in spark330db/"]
        G[Spark330PlusDBShims.scala\nDelegates to CheckOverflowInTableInsertShims.exprs]
        H[SparkShims.scala\nUses dataWriteCmdFromShim]
        I[GpuGroupedPythonRunnerFactory.scala\ncase class → class + Serializable]
        J[arithmetic.scala\nRemoves Logging from GpuDecimalRemainder]
    end

    subgraph StillReferencing["Still reference deleted DatabricksShimServiceProvider"]
        K[spark332db/SparkShimServiceProvider]
        L[spark341db/SparkShimServiceProvider]
        M[spark350db143/SparkShimServiceProvider]
        N[spark400db173/SparkShimServiceProvider]
    end

    A -->|deleted, breaks| K
    A -->|deleted, breaks| L
    A -->|deleted, breaks| M
    A -->|deleted, breaks| N
    C -->|deleted, 332db now missing OriginContextShim| K
    F --> G
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    subgraph Deleted["Deleted from spark330db/"]
        A[DatabricksShimServiceProvider.scala\nshim-json: 330db,332db,341db,350db143,400db173]
        B[SparkShimServiceProvider.scala\nshim-json: 330db only]
        C[OriginContextShim.scala\nshim-json: 330db, 332db]
        D[SparkDateTimeExceptionShims.scala\nshim-json: 330db]
        E[SparkUpgradeExceptionShims.scala\nshim-json: 330db]
    end

    subgraph Added["Added to spark330db/"]
        F[CheckOverflowInTableInsertShims.scala\nshim-json: 330db,331,332,...,411]
    end

    subgraph Modified["Modified in spark330db/"]
        G[Spark330PlusDBShims.scala\nDelegates to CheckOverflowInTableInsertShims.exprs]
        H[SparkShims.scala\nUses dataWriteCmdFromShim]
        I[GpuGroupedPythonRunnerFactory.scala\ncase class → class + Serializable]
        J[arithmetic.scala\nRemoves Logging from GpuDecimalRemainder]
    end

    subgraph StillReferencing["Still reference deleted DatabricksShimServiceProvider"]
        K[spark332db/SparkShimServiceProvider]
        L[spark341db/SparkShimServiceProvider]
        M[spark350db143/SparkShimServiceProvider]
        N[spark400db173/SparkShimServiceProvider]
    end

    A -->|deleted, breaks| K
    A -->|deleted, breaks| L
    A -->|deleted, breaks| M
    A -->|deleted, breaks| N
    C -->|deleted, 332db now missing OriginContextShim| K
    F --> G
Loading

Comments Outside Diff (2)

  1. sql-plugin/src/main/spark330db/scala/com/nvidia/spark/rapids/DatabricksShimServiceProvider.scala

    P1 DatabricksShimServiceProvider deleted but still referenced by remaining DB shim service providers

    The deleted file carried a shim-json header listing 330db, 332db, 341db, 350db143, and 400db173, meaning it provided the matchesVersion logic for all of those DB builds. After this deletion, a grep of the working tree shows that spark332db, spark341db, spark350db143, and spark400db173 each still call DatabricksShimServiceProvider.matchesVersion(...) in their own SparkShimServiceProvider.scala, but the object definition no longer exists anywhere in the codebase. Attempting to compile any of those four DB shim variants in isolation would produce a "not found: value DatabricksShimServiceProvider" error. The PR notes that no local DBR validation was run; if this layer is applied independently (e.g., cherry-picked or merged before the next stack layer that supplies a replacement), those DB builds will be broken.

  2. sql-plugin/src/main/spark330db/scala/org/apache/spark/sql/rapids/shims/OriginContextShim.scala

    P1 OriginContextShim removal leaves 332db without an implementation

    The deleted file covered both {"spark": "330db"} and {"spark": "332db"}. The surviving OriginContextShim.scala lives in spark340/ and covers 340358 (including 341db and 350db143), but 332db is not in its shim-json list. GpuCast.scala and mathExpressions.scala in the shared scala/ source root both reference OriginContextShim, so 332db builds that pull in those files would also fail to compile after this deletion. Worth confirming that a previous stack layer already supplies an OriginContextShim for 332db, or that the next layer does so.

Reviews (1): Last reviewed commit: "Remove old Spark 3.3 DBR shim sources" | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants