Skip to content

Move shared API and shuffle format helpers to Java modules#15040

Open
gerashegalov wants to merge 2 commits into
codex/unshim-stack-02-sql-plugin-modulesfrom
codex/unshim-stack-02a-api-format
Open

Move shared API and shuffle format helpers to Java modules#15040
gerashegalov wants to merge 2 commits into
codex/unshim-stack-02-sql-plugin-modulesfrom
codex/unshim-stack-02a-api-format

Conversation

@gerashegalov

@gerashegalov gerashegalov commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It moves shared API, UCX, and shuffle-format helper code into Java-friendly module locations. This keeps cross-shim helper bytecode available from shared modules before the Hadoop/file I/O and columnar moves build on it.

Stack context

Testing and validation notes

  • No standalone behavior change is intended in this layer. It is covered by the full-stack packaging/build validation described in Add default common unshim packaging flow #15025 and the existing tests for the affected subsystem.
  • The full split stack was verified to be tree-equivalent to the pre-split stack top.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Covered by the validation notes in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@gerashegalov gerashegalov changed the base branch from codex/unshim-stack-01-packaging to codex/unshim-stack-02-sql-plugin-modules June 10, 2026 15:45
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02a-api-format branch from 99d64a2 to 84ed0c3 Compare June 10, 2026 20:49
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02-sql-plugin-modules branch from 24e3c48 to a5598d9 Compare June 10, 2026 20:49
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02-sql-plugin-modules branch from a5598d9 to b4ebb9d Compare June 13, 2026 12:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02a-api-format branch from 84ed0c3 to 85f0628 Compare June 13, 2026 12:13
@gerashegalov gerashegalov marked this pull request as ready for review June 13, 2026 12:48
@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR is the third layer of a modular unshim stack refactor. It relocates shared helpers into more Java-accessible modules: FlatBuffers format classes move from sql-plugin to sql-plugin-format, HashedPriorityQueue and a new ThreadFactoryBuilder land in sql-plugin-api, and three UCX Scala case classes (Rkeys, UCXActiveMessage, UCXError) are rewritten as plain Java classes in shuffle-plugin.

  • UCX case-class → Java: Rkeys, UCXActiveMessage, and UCXError are converted from Scala case class to hand-written Java POJOs with explicit equals/hashCode/toString; all call-sites updated from companion-object apply to new.
  • sql-pluginsql-plugin-format: Ten FlatBuffers-generated classes plus TransferState move to the dedicated format module, gaining Apache license headers.
  • sql-pluginsql-plugin-api: HashedPriorityQueue is renamed in-place; ThreadFactoryBuilder is rewritten in Java (functionally equivalent to the deleted Scala version); a new ShimCommandRules.scala introduces typed shim-command rule wrappers.

Confidence Score: 4/5

This is a pure structural refactor with no intended behavioral changes; the Scala call-sites are mechanically updated and the Java replacements faithfully replicate the original logic.

The conversion is clean and well-scoped. Two minor issues stand out: Rkeys.java exposes scala.collection.Seq in its public API despite the PR's Java-friendliness goal, and UCXActiveMessage.toString inlines hex-format strings that duplicate UCX.formatAmIdAndHeader/TransportUtils.toHex — a silent divergence risk if that utility is ever updated.

shuffle-plugin/src/main/java/com/nvidia/spark/rapids/shuffle/ucx/Rkeys.java and UCXActiveMessage.java are worth a second look.

Important Files Changed

Filename Overview
shuffle-plugin/src/main/java/com/nvidia/spark/rapids/shuffle/ucx/Rkeys.java New Java class replacing Scala case class Rkeys; uses scala.collection.Seq<ByteBuffer> which limits Java-friendliness despite the stated goal of this refactor.
shuffle-plugin/src/main/java/com/nvidia/spark/rapids/shuffle/ucx/UCXActiveMessage.java New Java class replacing Scala case class UCXActiveMessage; toString inlines hex-format logic that previously delegated to UCX.formatAmIdAndHeader/TransportUtils.toHex, creating a maintenance divergence point.
shuffle-plugin/src/main/java/com/nvidia/spark/rapids/shuffle/ucx/UCXError.java New Java class replacing Scala case class UCXError; clean conversion with correct Objects.equals/Objects.hash null-safe implementations.
shuffle-plugin/src/main/scala/com/nvidia/spark/rapids/shuffle/ucx/UCX.scala Removes the three Scala case classes (Rkeys, UCXActiveMessage, UCXError) that moved to Java; call-sites updated from companion-object apply to new constructor calls — straightforward mechanical change.
shuffle-plugin/src/main/scala/com/nvidia/spark/rapids/shuffle/ucx/UCXConnection.scala Removes UCXError case class definition and extends Logging from the companion object (unused there); adds Buffer super-type casts on ByteBuffer for Java 8/9 covariant-return compatibility — correct and intentional.
shuffle-plugin/src/main/scala/com/nvidia/spark/rapids/shuffle/ucx/UCXShuffleTransport.scala Converts SendBounceBuffers companion-apply calls to new, and ClientAndBufferReceiveState from case class to class with explicit val fields — both correct since neither is used for structural equality.
sql-plugin-api/src/main/java/com/nvidia/spark/rapids/ThreadFactoryBuilder.java Clean Java rewrite of the deleted Scala ThreadFactoryBuilder; format-string validation in setNameFormat and the nullable Boolean daemon field are functionally equivalent to the Scala Option-based approach.
sql-plugin-api/src/main/scala/com/nvidia/spark/rapids/ShimCommandRules.scala New file introducing ShimExecRule, ShimDataWritingCommandRule, and ShimRunnableCommandRule — lightweight typed wrappers for the shim command-rule registry, no logic concerns.
sql-plugin-format/src/main/java/com/nvidia/spark/rapids/format/TransferState.java Exact content move from sql-plugin to sql-plugin-format, adding the missing Apache license header; the old file in sql-plugin is correctly deleted.
sql-plugin-api/src/main/java/com/nvidia/spark/rapids/HashedPriorityQueue.java Pure rename from sql-plugin to sql-plugin-api with no content changes.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Before["Before (source locations)"]
        A["sql-plugin/.../HashedPriorityQueue.java"]
        B["sql-plugin/.../ThreadFactoryBuilder.scala"]
        C["sql-plugin/.../format/TransferState.java\n+ 10 other FlatBuffers classes"]
        D["UCX.scala\ncase class Rkeys\ncase class UCXActiveMessage\ncase class UCXError"]
        E["UCXConnection.scala\ncase class UCXError"]
    end

    subgraph After["After (destination locations)"]
        F["sql-plugin-api/.../HashedPriorityQueue.java"]
        G["sql-plugin-api/.../ThreadFactoryBuilder.java"]
        H["sql-plugin-api/.../ShimCommandRules.scala NEW"]
        I["sql-plugin-format/.../TransferState.java\n+ 10 other FlatBuffers classes"]
        J["shuffle-plugin/.../Rkeys.java NEW Java class"]
        K["shuffle-plugin/.../UCXActiveMessage.java NEW Java class"]
        L["shuffle-plugin/.../UCXError.java NEW Java class"]
    end

    A -->|rename| F
    B -->|rewrite as Java| G
    C -->|move + add license header| I
    D -->|extract to Java| J
    D -->|extract to Java| K
    E -->|extract to Java| L
Loading

Reviews (1): Last reviewed commit: "Move shuffle format metadata to Java mod..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants