Skip to content

Adapt core execution callers to columnar helpers#15051

Open
gerashegalov wants to merge 6 commits into
codex/unshim-stack-02n-shuffle-spill-callersfrom
codex/unshim-stack-02o-core-execution-callers
Open

Adapt core execution callers to columnar helpers#15051
gerashegalov wants to merge 6 commits into
codex/unshim-stack-02n-shuffle-spill-callersfrom
codex/unshim-stack-02o-core-execution-callers

Conversation

@gerashegalov

@gerashegalov gerashegalov commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It updates core execution callers to use the moved columnar helpers. This keeps the central execution-path adaptations in one reviewable layer.

Stack context

Testing and validation notes

  • No standalone behavior change is intended in this layer. It is covered by the full-stack packaging/build validation described in Add default common unshim packaging flow #15025 and the existing tests for the affected subsystem.
  • The full split stack was verified to be tree-equivalent to the pre-split stack top.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Covered by the validation notes in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02o-core-execution-callers branch from 9792db2 to fb9aa40 Compare June 10, 2026 20:49
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from 739aece to e0350c2 Compare June 10, 2026 21:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02o-core-execution-callers branch from fb9aa40 to 14eaeb5 Compare June 10, 2026 21:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from e0350c2 to afac240 Compare June 10, 2026 21:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02o-core-execution-callers branch 2 times, most recently from 830cf4e to b6bb8fe Compare June 10, 2026 21:36
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from 2b813f5 to ef56038 Compare June 10, 2026 22:20
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02o-core-execution-callers branch 3 times, most recently from c7d443e to b8602d4 Compare June 10, 2026 22:41
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from 05bae3b to 3cfcc3a Compare June 10, 2026 22:41
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02o-core-execution-callers branch from b8602d4 to b0f5b13 Compare June 10, 2026 22:46
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from 2db1b2a to 733c9d0 Compare June 10, 2026 23:12
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02o-core-execution-callers branch 2 times, most recently from 05f51a5 to cac80aa Compare June 10, 2026 23:15
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from 4150e69 to ea52688 Compare June 10, 2026 23:33
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02o-core-execution-callers branch 2 times, most recently from 43e499a to 94fc626 Compare June 10, 2026 23:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from ea52688 to 762a24f Compare June 10, 2026 23:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02o-core-execution-callers branch 2 times, most recently from eb42a47 to 916ddb5 Compare June 11, 2026 00:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from 6f6f3d3 to ad5eccd Compare June 13, 2026 12:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02o-core-execution-callers branch from 0434223 to 98459fb Compare June 13, 2026 12:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02o-core-execution-callers branch from 98459fb to 8b9f3f0 Compare June 13, 2026 12:20
@gerashegalov gerashegalov marked this pull request as ready for review June 13, 2026 12:49
@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This is one reviewable layer in the unshim stack (introduced by #15025). It adapts core execution callers in sql-plugin to use class definitions that have been moved to the sql-plugin-columnar Java module in previous layers.

  • case classclass migration: Dozens of Scala case classes (GpuSortEachBatchIterator, BatchedByKey, BatchesToCoalesce, GpuHashAggregateMetrics, AcquireFailed, JoinInfo, OutOfCoreBatch, etc.) are converted to plain classes with explicit val fields and Serializable markers, mirroring Java-friendly definitions now living in sql-plugin-columnar.
  • Logging trait removal: org.apache.spark.internal.Logging is removed from ~20 classes/objects and replaced with per-class SLF4J loggers and private logXxx wrappers, eliminating the dependency on Spark's internal logging API.
  • New ShimDataWritingCommand trait: Consolidates the run()runColumnar() dispatch into a shared shim in TreeNode.scala, removing the duplicate implementation from GpuDataWritingCommand; callers now go through runColumnarFromAny for cross-module type safety.

Confidence Score: 4/5

The PR is a clean structural refactoring with no intended behavior change; the mechanical case-class-to-class conversions are correctly applied across all call sites.

Two minor concerns keep this from a perfect score: the reflection-based SparkPlan.session lookup loses compile-time safety compared to the shimmed approach, and the assert(batches.isEmpty) in ShimDataWritingCommand.run() silently swallows unexpected results when JVM assertions are disabled. Neither should affect current behavior since the stack is validated end-to-end, but both could mask future regressions.

GpuExec.scala (new reflection path for sessionFromPlan) and shims/TreeNode.scala (new ShimDataWritingCommand trait with the assertion).

Important Files Changed

Filename Overview
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuExec.scala Replaces shimmed SparkSessionUtils.sessionFromPlan with reflection-based lookup; imports adjusted to use org.apache.spark.sql.SparkSession directly.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/shims/TreeNode.scala Adds new ShimDataWritingCommand trait consolidating run() and runColumnarFromAny() dispatch; cleanly extracted from GpuDataWritingCommand.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala Converts case class/case object types to plain classes and removes Logging mixin; callers updated from AggregateModeInfo(...) apply to AggregateModeInfo.from(...) (Java factory defined in sql-plugin-columnar).
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RmmRapidsRetryIterator.scala Removes Logging mixin; switches to direct SLF4J logger; AutoCloseableTargetSize case class removed (now in sql-plugin-columnar as Java class); callers updated to new constructor.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSortExec.scala GpuSortEachBatchIterator and GpuOutOfCoreSortIterator converted from case class to class; default metric parameters removed, all call sites updated; TrampolineUtil.fromAttributesDataTypeUtilsShim.fromAttributes.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCoalesceBatches.scala BatchedByKey converted from case class to class with explicit Product implementation; BatchesToCoalesce similarly converted; Logging removed and SLF4J added.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSemaphore.scala AcquireFailed converted from case class to plain class; GpuTaskMemoryEstimator TIME_WINDOW constant inlined; SemaphoreTaskInfo and GpuSemaphore remove Logging mixin with SLF4J replacements.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuDataWritingCommandExec.scala GpuDataWritingCommand now extends ShimDataWritingCommand instead of duplicating run(); dispatch via runColumnarFromAny.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuRunnableCommandExec.scala Adds asInstanceOf[GpuSparkSession] cast because GpuExec.sparkSession now returns org.apache.spark.sql.SparkSession; safe at runtime since classic.SparkSession extends SparkSession in Spark 4.x.
tests/src/test/scala/com/nvidia/spark/rapids/GpuSortRetrySuite.scala Tests updated to use explicit new GpuSortEachBatchIterator(...) with all required NoopMetric arguments now that default parameters are gone.

Sequence Diagram

sequenceDiagram
    participant GpuDataWritingCommandExec
    participant GpuDataWritingCommand
    participant ShimDataWritingCommand
    participant ColumnarHelpers as sql-plugin-columnar (Java)

    Note over GpuDataWritingCommand,ShimDataWritingCommand: ShimDataWritingCommand new trait in TreeNode.scala

    GpuDataWritingCommandExec->>GpuDataWritingCommand: runColumnarFromAny(sparkSession, child)
    GpuDataWritingCommand->>ShimDataWritingCommand: cast AnyRef to SparkSession + runColumnar()
    ShimDataWritingCommand->>GpuDataWritingCommand: runColumnar(SparkSession, SparkPlan)
    GpuDataWritingCommand-->>GpuDataWritingCommandExec: Seq[ColumnarBatch]

    Note over ColumnarHelpers: Moved definitions
    GpuDataWritingCommandExec->>ColumnarHelpers: new HostAllocResult
    GpuDataWritingCommandExec->>ColumnarHelpers: AggregateModeInfo.from
    GpuDataWritingCommandExec->>ColumnarHelpers: new AutoCloseableTargetSize
Loading

Reviews (1): Last reviewed commit: "Update Spark 4.1 tool metadata" | Re-trigger Greptile

Comment on lines 133 to +139

object GpuExec {
@transient private[this] lazy val sparkPlanSessionMethod =
classOf[SparkPlan].getMethod("session")

def sessionFromPlan(plan: SparkPlan): SparkSession =
sparkPlanSessionMethod.invoke(plan).asInstanceOf[SparkSession]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Bare reflection lookup without NoSuchMethodException handling

classOf[SparkPlan].getMethod("session") is called inside a lazy val, so any NoSuchMethodException is deferred until the first access of sparkSession and surfaces as an opaque failure. The previous shimmed path (SparkSessionUtils.sessionFromPlan) called plan.session with full Scala type-safety; here we lose that at the cost of no visible error if the public method name changes or its visibility tightens in a future Spark version. Consider wrapping in a Try with a meaningful error message, or retaining a comment explaining why reflection is intentionally used instead of the shim.

Comment on lines +83 to +88
override def run(sparkSession: SparkSession, child: SparkPlan): Seq[org.apache.spark.sql.Row] = {
com.nvidia.spark.rapids.Arm.withResource(runColumnar(sparkSession, child)) { batches =>
assert(batches.isEmpty)
}
Seq.empty[org.apache.spark.sql.Row]
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Silent batch discard when JVM assertions are disabled

assert(batches.isEmpty) is a JVM assertion that can be suppressed with -da. When disabled, non-empty batches returned by runColumnar are silently closed and discarded without any log message or exception. This same pattern existed in the old GpuDataWritingCommand.run(), but moving it here makes it the default for all ShimDataWritingCommand implementors. A plain if (!batches.isEmpty) throw new IllegalStateException(...) (or at minimum a logError) would make the failure visible in production.

@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Greptile encountered an error while reviewing this PR. Please reach out to support@greptile.com for assistance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant