Minimize the amount of version-specific classes [databricks] #14834
Closed
gerashegalov wants to merge 4 commits into
Collaborator
Author
|
build |
Collaborator
|
NOTE: release/26.06 has been created from main. Please retarget your PR to release/26.06 if it should be included in the release. |
pxLi
reviewed
May 29, 2026
<parent>
  <groupId>com.nvidia</groupId>
  <artifactId>rapids-4-spark-shim-deps-parent_2.13</artifactId>
  <version>26.06.0-SNAPSHOT</version>
Member
Since this is still in draft, just a reminder:
If this one targets main, please update all the versions to 26.08. Thanks.
Collaborator
Author
|
build |
2 similar comments
Collaborator
Author
|
build |
Collaborator
Author
|
build |
c4dd209 to
ca09fdc
Compare
Collaborator
Author
|
build |
3e6e9fe to
3cf1779
Compare
This was referenced Jun 10, 2026
Collaborator
Author
|
Official GitHub stacked PRs are now enabled, but GitHub stacks cannot include fork-head PRs. I created the stack from upstream branch refs instead:
This PR is now the old fork-head record for the same top branch tip and should no longer be the primary review target. |
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
aabd751 to
6ebd35c
Compare
3cf1779 to
c99cc88
Compare
6ebd35c to
f608c12
Compare
2db9df3 to
e557937
Compare
Collaborator
Author
|
This big PR was split into Stack #15054 |
Description
This PR continues isolating Spark-version-specific shim bytecode from the common
RAPIDS SQL plugin artifact so more classes can live in the conventional root jar
layout instead of the parallel-world layout.
The final approach keeps the existing Maven module layout. It does not add a new
module in this PR, and it does not use reflection to force bytecode identity.
Instead, packaging now treats bitwise-identical common classes as unshimmed by
default and keeps only explicit exceptions shimmed.
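The default-unshim rule described above can be sketched roughly as follows. This is an illustration only, not the logic in `dist/scripts/binary-dedupe.sh`: the function name `plan_promotion`, the use of SHA-256, and the requirement that a class be present in every selected shim are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes (checksum choice is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_promotion(shim_dirs, keep_list=frozenset()):
    """Split class files into root-promoted and still-shimmed sets.

    shim_dirs: mapping of shim name -> directory holding that shim's class files.
    keep_list: relative paths that stay shimmed even when bitwise-identical
               (plays the role of an explicit exception list).
    """
    digests = defaultdict(dict)  # relative path -> {shim name: digest}
    for shim, root in shim_dirs.items():
        root = Path(root)
        for class_file in root.rglob("*.class"):
            rel = class_file.relative_to(root).as_posix()
            digests[rel][shim] = sha256_of(class_file)

    promoted, shimmed = set(), set()
    for rel, by_shim in digests.items():
        in_every_shim = len(by_shim) == len(shim_dirs)    # assumed requirement
        identical = len(set(by_shim.values())) == 1       # single checksum
        if in_every_shim and identical and rel not in keep_list:
            promoted.add(rel)
        else:
            shimmed.add(rel)
    return promoted, shimmed
```

With an empty keep list, every class that is byte-for-byte identical across the selected shims lands in `promoted`; adding a path to the keep list is the only way to hold an identical class back.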
Review Stack
This branch has been reconstructed from the final diff against current NVIDIA/main into four logical review layers. The old monolithic branch tip is preserved at gerashegalov:codex/isolate-sql-plugin-shims-original.
Native gh-stack metadata could not be created because GitHub reported:
The reviewable branch stack is:
1. Packaging/default-unshim flow
   Branch: gerashegalov:codex/unshim-stack-01-packaging
   Diff: main...gerashegalov:spark-rapids:codex/unshim-stack-01-packaging
   Adds the default common-class root promotion flow, keep-list support, analyzer diagnostics, private build-info root-resource promotion, buildall fast-path support, and shim documentation.
2. SQL plugin helper module reshaping
   Branch: gerashegalov:codex/unshim-stack-02-sql-plugin-modules
   Diff: gerashegalov/spark-rapids@codex/unshim-stack-01-packaging...codex/unshim-stack-02-sql-plugin-modules
   Moves Java-only helper surfaces into dedicated Java modules and updates SQL plugin, shuffle plugin, shims, tests, and Scala 2.13 build wiring.
3. Delta/Iceberg adaptation
   Branch: gerashegalov:codex/unshim-stack-03-delta-iceberg
   Diff: gerashegalov/spark-rapids@codex/unshim-stack-02-sql-plugin-modules...codex/unshim-stack-03-delta-iceberg
   Adapts Delta Lake and Iceberg integration code to the shared helper layout.
4. UDF/docs cleanup
   Branch: gerashegalov:codex/unshim-stack-04-udf-docs
   Diff: gerashegalov/spark-rapids@codex/unshim-stack-03-delta-iceberg...codex/unshim-stack-04-udf-docs
   Updates the UDF compiler and related documentation for the shared helper layout.
Ultimate Approach
The migration now uses these mechanisms:
1. Promote bitwise-identical common classes by default.
   dist/scripts/binary-dedupe.sh already computes which files have a single checksum across the selected Spark shims. Common spark-shared class files from that proven-identical set are now promoted into the root jar layout by default.
   This inverts the old maintenance model. New common classes should normally remain unshimmed automatically as long as binary dedupe proves they are identical across shims.
2. Keep a small explicit exclusion list.
   dist/keep-in-spark-shared.txt is the exception list for bitwise-identical common classes that still must remain in spark-shared for compatibility or packaging reasons. It is intentionally empty today.
3. Keep resource and per-shim root lists narrow.
   dist/unshimmed-common-from-single-shim.txt now contains only root-layout resources that are not selected by default class promotion, currently META-INF files, the spark-rapids-private build-info resource, and Python worker files.
   dist/unshimmed-from-each-spark3xx.txt remains the mechanism for per-shim root artifacts. Those are not common spark-shared class files and are not replaced by the default common-class promotion.
4. Retain verification and diagnostics.
   Binary dedupe still verifies that root-layout classes requiring shared identity are bitwise-identical across shims. The dependency analyzer now prints diagnostic output and writes root-safe-spark-shared.txt, but it is not the gate for default class promotion. The gate is binary identity.
5. Normalize small shim implementations only when source changes are clean.
   When a class is semantically common but bytecode differs because Spark changed a helper signature, inherited trait shape, or constant value, the source is moved to common code only if the version-specific part can be expressed with stable public APIs or local VersionUtils checks.
   Recent examples include:
   - BridgeUnsafeProjectionCodegen: replaced Spark's CodeGeneratorWithInterpretedFallback dependency with local codegen-then-interpreted fallback logic.
   - GpuPythonFunction: uses Spark's stable PythonFuncExpression/sql rendering path instead of calling the version-sensitive toPrettySQL helper directly.
   - BloomFilterConstantsShims: common object with a runtime Spark-version predicate for the bloom filter format version.
   - ArrayInvalidArgumentErrorUtils: common trait that constructs the stable Spark runtime exception shape directly for the changed length-error API.
   - DecimalMultiply128: common helper with the existing Spark/DB version predicate selecting the correct JNI overload.
   - CastTimeToIntShim: common helper with a direct call from GpuCast; the previous reflective lookup was removed.
   - GetJsonObjectShim: common helper with a Spark 4.x predicate for the JSON path quoted-name regexp.
   Classes remain in spark-shared or sparkXYZ when they require Spark APIs that are absent in another supported Spark line, or when commonizing them would require reflection or a broader bridge design.
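The "runtime Spark-version predicate" pattern used by several of the examples above can be illustrated with a minimal sketch. The actual plugin code is Scala and relies on its own VersionUtils checks; the Python below only shows the shape of the idea. The names `parse_version`, `at_least`, and `pick_impl`, and the default 4.0.0 threshold, are hypothetical and not taken from the PR.

```python
def parse_version(version: str) -> tuple:
    """Parse 'major.minor.patch[-suffix]' into a comparable tuple of ints."""
    core = version.split("-", 1)[0]            # drop suffixes such as -SNAPSHOT
    parts = (core.split(".") + ["0", "0"])[:3]  # pad missing minor/patch with 0
    return tuple(int(p) for p in parts)


def at_least(version: str, major: int, minor: int = 0, patch: int = 0) -> bool:
    """True when the given version is >= major.minor.patch."""
    return parse_version(version) >= (major, minor, patch)


def pick_impl(spark_version: str, legacy, modern, threshold=(4, 0, 0)):
    """Select one of two behaviours with a runtime predicate.

    A single common class holds both branches, so its bytecode is the same
    for every shim. The threshold here is hypothetical, for illustration.
    """
    return modern if at_least(spark_version, *threshold) else legacy
```

For example, `pick_impl("3.5.8", "legacy", "modern")` selects the legacy branch, while `pick_impl("4.0.0-SNAPSHOT", "legacy", "modern")` selects the modern one. The trade-off is a small runtime check in exchange for one source file and one class file shared by all shims.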
Quantification
Source-level shim code, defined as tracked .scala and .java files under production src/main/spark* source roots:
NVIDIA/main at b1eebc41a | 3e6e9fe95
src/test/spark* shim source is unchanged at 149 files and 16,238 lines.
The larger impact of this PR is binary placement: most classes that still come
from shim-built artifacts no longer stay in the shim classloader layout when
binary dedupe proves they are identical.
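Jar-layout counts of the kind reported in this section can be approximated with a short script. The sketch below is an assumption-laden illustration, not the tooling used for the PR: it assumes shim directories are named like `spark330/` or `spark358/`, treats everything outside `spark-shared/` and those directories as root layout, and computes unique class paths by stripping the leading `sparkXYZ/` or `spark-shared/` prefix. The function names are hypothetical.

```python
import re
import zipfile

# Assumed naming: per-shim directories look like spark330/, spark358/, ...
SHIM_DIR = re.compile(r"^spark\d+[^/]*/")
SHARED_DIR = "spark-shared/"


def summarize_layout(entry_names):
    """Count class entries per layout and unique shimmed class paths."""
    counts = {"root": 0, "spark-shared": 0, "sparkXYZ": 0}
    unique_paths = set()
    for name in entry_names:
        if not name.endswith(".class"):
            continue  # only class files are counted here
        if name.startswith(SHARED_DIR):
            counts["spark-shared"] += 1
            unique_paths.add(name[len(SHARED_DIR):])
            continue
        match = SHIM_DIR.match(name)
        if match:
            counts["sparkXYZ"] += 1  # one copy per Spark shim
            unique_paths.add(name[match.end():])
        else:
            counts["root"] += 1
    counts["unique"] = len(unique_paths)
    return counts


def summarize_jar(jar_path):
    """Apply summarize_layout to the entries of a jar (zip) file."""
    with zipfile.ZipFile(jar_path) as jar:
        return summarize_layout(jar.namelist())
```

Calling `summarize_jar` on a dist jar returns the four counts as a dictionary. Raw entry counts and unique class paths differ whenever the same class is stored once per shim.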
For the validated 330,358 fast parallel-world assembly:
spark-shared | sparkXYZ | spark330 and spark358
The same assembly selected 4,557 class files and 4 non-class resources for root promotion. Local noSnapshots builds on this branch validate the full OSS shim matrix for both Scala lines, excluding Databricks shims, which require proprietary Databricks build images:
spark-shared class entries | sparkXYZ class entries | sparkXYZ class paths
The root class-entry counts include copied JNI/UCX dependency classes. The root-promoted common classes column is the plugin-side class promotion count from default-unshimmed-spark-shared.txt.
For comparison, current OSS nightly snapshot jars still carry most classes in shim-loader layout:
spark-shared entries | sparkXYZ entries
26.08.0-20260605.172952-9-cuda12
26.08.0-20260605.164124-9-cuda12
These nightly counts are jar-entry counts. sparkXYZ counts include one copy per Spark shim; the unique class-path column deduplicates those copies by removing the leading sparkXYZ/ or spark-shared/ prefix.
Build-Time Iteration Support
For repeated unshim analysis, build/buildall has a cheaper fast path:
In fast mode the build skips expensive Maven work, disables dist jar compression, and ignores stale shim revision metadata caused by rapids.build.info.skip=true.
The revision mismatch is still printed for visibility.
Validation
Focused ScalaTests were rerun after fixing the CI-reported serialization failure:
- MortgageSparkSuite and MortgageAdaptiveSparkSuite: 12 tests passed.
- HashAggregateRetrySuite: 7 tests passed.
The serialization fix makes GpuHashAggregateMetrics serializable so GpuHashAggregateExec.internalDoExecuteColumnar no longer captures a non-serializable metrics holder in the Spark task closure.
Packaging validation for the inverted default-unshim logic:
Regression validation for the private build-info resource reported missing by CI:
Result:
- 330,358 fast assembly produced dist/target/parallel-world/rapids4spark-private-version-info.properties at root, so root-loaded RapidsPluginUtils can read the private dependency build metadata.
- root-safe-spark-shared.txt diagnostics.
Additional local checks:
- bash -n dist/scripts/binary-dedupe.sh
- bash -O extglob -n build/buildall
- python3 -m py_compile dist/scripts/build-unshim-parallel-world.py dist/scripts/analyze-parallel-world-deps.py
- git diff --check
Follow-Up Direction
The separate Java-only Maven module idea remains a follow-up cleanup path, not
this PR's primary mechanism. Good follow-up pilots are still:
com/nvidia/spark/rapids/format/*.java,
GpuTypeShims are untangled.
Checklists
Documentation
Testing
Performance