Skip to content

Adapt shuffle and spill callers to Java helpers#15050

Open
gerashegalov wants to merge 2 commits into
codex/unshim-stack-02m-kudo-file-scansfrom
codex/unshim-stack-02n-shuffle-spill-callers
Open

Adapt shuffle and spill callers to Java helpers#15050
gerashegalov wants to merge 2 commits into
codex/unshim-stack-02m-kudo-file-scansfrom
codex/unshim-stack-02n-shuffle-spill-callers

Conversation

@gerashegalov

@gerashegalov gerashegalov commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It updates shuffle and spill callers to use the moved Java-friendly helper modules. This isolates the shuffle/spill call-site migration from unrelated execution changes.

Stack context

Testing and validation notes

  • No standalone behavior change is intended in this layer. It is covered by the full-stack packaging/build validation described in Add default common unshim packaging flow #15025 and the existing tests for the affected subsystem.
  • The full split stack was verified to be tree-equivalent to the pre-split stack top.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Covered by the validation notes in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02m-kudo-file-scans branch from 0850c48 to df44354 Compare June 10, 2026 20:49
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch 3 times, most recently from e0350c2 to afac240 Compare June 10, 2026 21:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02m-kudo-file-scans branch 2 times, most recently from 8eeb39d to 8717459 Compare June 10, 2026 21:36
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch 2 times, most recently from 2b813f5 to ef56038 Compare June 10, 2026 22:20
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02m-kudo-file-scans branch from 1ffdf02 to 9fb1bad Compare June 10, 2026 22:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from ef56038 to 05bae3b Compare June 10, 2026 22:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02m-kudo-file-scans branch from 9fb1bad to 78de8f8 Compare June 10, 2026 22:41
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from 05bae3b to 3cfcc3a Compare June 10, 2026 22:41
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02m-kudo-file-scans branch from 78de8f8 to d617692 Compare June 10, 2026 22:46
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from 3cfcc3a to 2db1b2a Compare June 10, 2026 22:46
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02m-kudo-file-scans branch from d617692 to f5e488f Compare June 10, 2026 23:12
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch 2 times, most recently from 733c9d0 to 4150e69 Compare June 10, 2026 23:15
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02m-kudo-file-scans branch from f5e488f to a206a28 Compare June 10, 2026 23:15
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from 4150e69 to ea52688 Compare June 10, 2026 23:33
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02m-kudo-file-scans branch from a206a28 to eb2e327 Compare June 10, 2026 23:33
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from ea52688 to 762a24f Compare June 10, 2026 23:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02m-kudo-file-scans branch from 8cd5d42 to df09b9c Compare June 13, 2026 12:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02n-shuffle-spill-callers branch from 6f6f3d3 to ad5eccd Compare June 13, 2026 12:13
@gerashegalov gerashegalov marked this pull request as ready for review June 13, 2026 12:49
@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR migrates shuffle and spill callers from Scala case classes and Spark's internal Logging trait to Java-friendly helpers, enabling Java interoperability in the broader unshim stack. It is a purely mechanical refactor with no intended behavioral changes.

  • Converts case class definitions to explicit class with val fields and extends Serializable, replacing Scala-generated apply/unapply/equals/hashCode with new construction and type-pattern matching throughout.
  • Replaces org.apache.spark.internal.Logging with inline SLF4J loggers (org.slf4j.LoggerFactory) across 20+ files, and adds buffer.asInstanceOf[Buffer].flip/clear/rewind/limit() casts to fix Java 9+ covariant ByteBuffer return types.

Confidence Score: 4/5

Safe to merge; all changes are mechanical caller-side adaptations with no behavioral intent, backed by a full tree-equivalence build check.

The entire PR is a structural migration: case class → class, Spark Logging → SLF4J, and ByteBuffer covariant-return casts. The pattern-match rewrites (extractor patterns to typed patterns) are semantically identical. The only finding is two logWarning helpers that miss the isWarnEnabled guard present elsewhere in the same PR, causing by-name msg lambdas to be evaluated even when warn logging is off — a minor consistency gap, not a correctness issue.

SpillFramework.scala and SpillablePartialFileHandle.scala have the logging guard inconsistency noted above; all other files look clean.

Important Files Changed

Filename Overview
sql-plugin/src/main/scala/com/nvidia/spark/rapids/io/async/AsyncRunners.scala Converts state sealed-trait implementations (Init, ScheduleFailed, ExecFailed, Closed, HostResource, AcquireSuccessful, AcquireExcepted) from case classes to plain classes with Serializable; pattern matching updated to typed patterns (_: T) throughout.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/io/async/ResourceBoundedThreadExecutor.scala Replaces Spark Logging with SLF4J; updates all state construction sites (new ExecFailed, new Init, new Closed, etc.) and corrects pattern match from extractor patterns to typed patterns.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/spill/SpillFramework.scala Replaces Logging with SLF4J across multiple classes/traits; fixes ByteBuffer.flip() cast to Buffer.flip(); moves HostByteBufferIterator import from internal to public package. Minor: SpillableHandle.logWarning missing isWarnEnabled guard.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/spill/SpillablePartialFileHandle.scala Replaces Logging with SLF4J; fixes ByteBuffer.limit() to Buffer.limit() in two write paths. Minor: logWarning missing isWarnEnabled guard.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/shuffle/RapidsShuffleClient.scala PendingTransferRequest converted from case class to class with Serializable; Logging replaced with SLF4J; construction site updated to new PendingTransferRequest(...).
sql-plugin/src/main/scala/com/nvidia/spark/rapids/shuffle/WindowedBlockIterator.scala BlockWithSize trait and BlockRange case class removed (moved to Java helpers); BlockWindow, BlockWithOffset, BlocksForWindow converted to private classes; all construction sites updated to new T(...).
sql-plugin/src/main/scala/com/nvidia/spark/rapids/shuffle/RapidsShuffleTransport.scala TransactionStats case class removed; Logging replaced with SLF4J; multiple ByteBuffer covariant-return casts added (rewind, clear, limit) for Java 9+ compatibility.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/MultithreadedShuffleBufferCatalog.scala PartitionSegment converted from case class to class; ByteBuffer.flip() fixed to Buffer.flip(); Logging replaced with SLF4J; java.nio.Buffer added to imports.
sql-plugin/src/main/scala/org/apache/spark/shuffle/sort/io/RapidsLocalDiskShuffleMapOutputWriter.scala Logging replaced with @transient lazy SLF4J logger, correctly avoiding serialization of the logger field.
sql-plugin/src/main/scala/org/apache/spark/storage/RapidsShuffleBlockFetcherIterator.scala Logging replaced with inline SLF4J logger with full suite of helpers (logInfo, logWarning, logDebug, logTrace, logError with and without Throwable).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Before["Before (Scala case classes + Spark Logging)"]
        A1["case class PendingTransferRequest\n(auto equals/hashCode/unapply)"]
        A2["case class BlockRange\n(in WindowedBlockIterator.scala)"]
        A3["case class CompressedTable\n(in TableCompressionCodec.scala)"]
        A4["extends Logging\n(org.apache.spark.internal.Logging)"]
        A5["ByteBuffer.flip()\nReturns Buffer (Java 8)\nReturns ByteBuffer (Java 9+)"]
    end

    subgraph After["After (Java-friendly helpers + SLF4J)"]
        B1["class PendingTransferRequest\nextends Serializable\n(val fields, new keyword)"]
        B2["class BlockRange\n(moved to Java helper, same package)"]
        B3["class CompressedTable\n(moved to Java helper)"]
        B4["private val log = SLF4J\nprivate def logX(msg: => String)"]
        B5["buffer.asInstanceOf[Buffer].flip()\nExplicit cast for covariant return"]
    end

    A1 -->|"case class → class"| B1
    A2 -->|"moved to Java"| B2
    A3 -->|"moved to Java"| B3
    A4 -->|"replaced with"| B4
    A5 -->|"fixed for Java 9+"| B5
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    subgraph Before["Before (Scala case classes + Spark Logging)"]
        A1["case class PendingTransferRequest\n(auto equals/hashCode/unapply)"]
        A2["case class BlockRange\n(in WindowedBlockIterator.scala)"]
        A3["case class CompressedTable\n(in TableCompressionCodec.scala)"]
        A4["extends Logging\n(org.apache.spark.internal.Logging)"]
        A5["ByteBuffer.flip()\nReturns Buffer (Java 8)\nReturns ByteBuffer (Java 9+)"]
    end

    subgraph After["After (Java-friendly helpers + SLF4J)"]
        B1["class PendingTransferRequest\nextends Serializable\n(val fields, new keyword)"]
        B2["class BlockRange\n(moved to Java helper, same package)"]
        B3["class CompressedTable\n(moved to Java helper)"]
        B4["private val log = SLF4J\nprivate def logX(msg: => String)"]
        B5["buffer.asInstanceOf[Buffer].flip()\nExplicit cast for covariant return"]
    end

    A1 -->|"case class → class"| B1
    A2 -->|"moved to Java"| B2
    A3 -->|"moved to Java"| B3
    A4 -->|"replaced with"| B4
    A5 -->|"fixed for Java 9+"| B5
Loading

Reviews (1): Last reviewed commit: "Fix shuffle test helper construction" | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant