Skip to content

Move aggregate and shuffle stats to the columnar module#15047

Open
gerashegalov wants to merge 1 commit into
codex/unshim-stack-02j-columnar-profile-statsfrom
codex/unshim-stack-02k-columnar-aggregate-stats
Open

Move aggregate and shuffle stats to the columnar module#15047
gerashegalov wants to merge 1 commit into
codex/unshim-stack-02j-columnar-profile-statsfrom
codex/unshim-stack-02k-columnar-aggregate-stats

Conversation

@gerashegalov

@gerashegalov gerashegalov commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It moves aggregate and shuffle stat value classes into the columnar helper module, completing the focused statistics-helper movement before caller updates.

Stack context

Testing and validation notes

  • No standalone behavior change is intended in this layer. It is covered by the full-stack packaging/build validation described in Add default common unshim packaging flow #15025 and the existing tests for the affected subsystem.
  • The full split stack was verified to be tree-equivalent to the pre-split stack top.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Covered by the validation notes in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02j-columnar-profile-stats branch from da19e3a to bc92d8e Compare June 10, 2026 20:49
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02k-columnar-aggregate-stats branch from d922d5a to b94b8e4 Compare June 10, 2026 20:49
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02j-columnar-profile-stats branch from bc92d8e to b758f91 Compare June 10, 2026 21:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02k-columnar-aggregate-stats branch from b94b8e4 to 4e3e434 Compare June 10, 2026 21:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02j-columnar-profile-stats branch from b758f91 to 68788a1 Compare June 10, 2026 21:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02k-columnar-aggregate-stats branch from 4e3e434 to 0a315f6 Compare June 10, 2026 21:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02j-columnar-profile-stats branch from 68788a1 to cd3b1c4 Compare June 10, 2026 21:36
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02k-columnar-aggregate-stats branch 2 times, most recently from b614cf0 to 413b5ee Compare June 10, 2026 22:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02j-columnar-profile-stats branch from cd3b1c4 to 3cbe182 Compare June 10, 2026 22:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02k-columnar-aggregate-stats branch from 413b5ee to 7a668b3 Compare June 10, 2026 22:46
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02j-columnar-profile-stats branch from 596d45e to 767f014 Compare June 13, 2026 12:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02k-columnar-aggregate-stats branch from 7a668b3 to 1db64a9 Compare June 13, 2026 12:13
@gerashegalov gerashegalov marked this pull request as ready for review June 13, 2026 12:49
@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This is an intermediate layer in a stacked refactor that migrates aggregate and shuffle statistics value classes from Scala case classes in sql-plugin into Java classes in the new sql-plugin-columnar module. No logic changes are intended.

  • Eight new Java value classes are added to sql-plugin-columnar covering aggregate mode info, async/throttling metrics, shuffle block/transaction types, and join options/cardinality stats — all porting the Scala originals faithfully with matching validation messages and floating-point comparison via Double.compare.
  • WindowedBlockIteratorSuite is updated from the Scala companion-object call style (BlockRange(...)) to explicit constructor syntax (new BlockRange(...)) so the tests compile correctly once the Scala case class is removed in a subsequent stack layer.
  • BlockRange and BlockWithSize temporarily exist in both sql-plugin (Scala) and sql-plugin-columnar (Java) under the same package; this intermediate state is intentional and resolves when the originals are deleted in the next layer.

Confidence Score: 4/5

Safe to merge as part of the full stack; the Java value classes are faithful ports with no behavioral change.

All eight new Java classes are straightforward read-only value types with no GPU memory allocation, no concurrency changes, and no altered business logic. The only noteworthy inconsistency is that JoinCardinalityStats and JoinOptions declare serialVersionUID = 0L while the five other new Serializable classes in the same PR use 1L; this does not affect correctness at runtime but could cause silent InvalidClassException failures in an unlikely rolling-upgrade scenario.

JoinCardinalityStats.java and JoinOptions.java — the serialVersionUID = 0L choice differs from peer classes and is worth confirming as intentional before the Scala originals are removed.

Important Files Changed

Filename Overview
sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/AggregateModeInfo.java New Java value class for aggregate mode info, ported from Scala case class; all fields private final with correct equals/hashCode/toString.
sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/io/async/AsyncMetrics.java New Java value class for async task scheduling/execution timings; immutable, Serializable, correct equals/hashCode.
sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/io/async/ThrottlingExecutorStats.java New Java mutable stats class matching the original Scala case class with var fields; public mutable fields are intentional to allow in-place accumulation by ThrottlingExecutor.
sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/shuffle/BlockRange.java Java port of the Scala case class BlockRange; preserves validation logic and error message ("requirement failed:") from the Scala original.
sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/shuffle/BlockWithSize.java New Java interface corresponding to the Scala trait BlockWithSize; minimal and correct.
sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/shuffle/TransactionStats.java New Java value class for shuffle transaction stats; correctly uses Double.compare for floating-point equality; not Serializable (consistent with its local-logging purpose).
sql-plugin-columnar/src/main/java/org/apache/spark/sql/rapids/execution/JoinCardinalityStats.java New Java port of JoinCardinalityStats; leftNullCounts and rightNullCounts use Seq<?> (required due to Scala Long erasure), losing the Seq[Long] type contract; serialVersionUID is 0L inconsistently with peer classes.
sql-plugin-columnar/src/main/java/org/apache/spark/sql/rapids/execution/JoinOptions.java New Java port of JoinOptions; serialVersionUID is 0L inconsistently with peer classes; otherwise correct.
tests/src/test/scala/com/nvidia/spark/rapids/shuffle/WindowedBlockIteratorSuite.scala Test updated to use new BlockRange(…) constructor syntax instead of Scala companion-object apply; semantically equivalent and forward-compatible with the Java class replacement.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph sql-plugin-columnar [sql-plugin-columnar NEW Java classes]
        AM[AggregateModeInfo]
        AMX[AsyncMetrics]
        TE[ThrottlingExecutorStats]
        BR[BlockRange]
        BW[BlockWithSize]
        TS[TransactionStats]
        JC[JoinCardinalityStats]
        JO[JoinOptions]
        BW --> BR
    end

    subgraph sql-plugin [sql-plugin Scala originals - removed in later layer]
        AM2[AggregateModeInfo case class]
        TE2[ThrottlingExecutorStats case class]
        BR2[BlockRange case class]
        BW2[BlockWithSize trait]
        JC2[JoinCardinalityStats case class]
        JO2[JoinOptions case class]
        WBI[WindowedBlockIterator]
        BW2 --> BR2
        BR2 --> WBI
    end

    subgraph tests
        WBIS[WindowedBlockIteratorSuite\nnew BlockRange - updated syntax]
    end

    sql-plugin-columnar --> sql-plugin
    sql-plugin --> tests
Loading

Reviews (1): Last reviewed commit: "Add columnar aggregate and shuffle stat ..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants