
Move profile and write stats to the columnar module #15046

Open
gerashegalov wants to merge 1 commit into codex/unshim-stack-02i-columnar-table-values from codex/unshim-stack-02j-columnar-profile-stats

Conversation

@gerashegalov

@gerashegalov gerashegalov commented Jun 10, 2026

Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It moves profile and write-stat value classes into the columnar helper module. This isolates another group of Java-compatible runtime values before aggregate and shuffle stat values are moved.

Stack context

Testing and validation notes

  • No standalone behavior change is intended in this layer. It is covered by the full-stack packaging/build validation described in #15025 (Add default common unshim packaging flow) and the existing tests for the affected subsystem.
  • The full split stack was verified to be tree-equivalent to the pre-split stack top.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Covered by the validation notes in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02i-columnar-table-values branch from 25ca800 to c43daa4 Compare June 10, 2026 20:49
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02j-columnar-profile-stats branch 2 times, most recently from bc92d8e to b758f91 Compare June 10, 2026 21:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02i-columnar-table-values branch 2 times, most recently from 1b0c44a to f1b1f2a Compare June 10, 2026 21:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02j-columnar-profile-stats branch 2 times, most recently from 68788a1 to cd3b1c4 Compare June 10, 2026 21:36
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02i-columnar-table-values branch 2 times, most recently from 8ca4d3c to a4baab8 Compare June 10, 2026 22:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02j-columnar-profile-stats branch from cd3b1c4 to 3cbe182 Compare June 10, 2026 22:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02i-columnar-table-values branch from a4baab8 to 65aa39a Compare June 10, 2026 22:46
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02j-columnar-profile-stats branch from 3cbe182 to 596d45e Compare June 10, 2026 22:46
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02i-columnar-table-values branch from 65aa39a to 7c9b4f7 Compare June 13, 2026 12:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02j-columnar-profile-stats branch from 596d45e to 767f014 Compare June 13, 2026 12:13
@gerashegalov gerashegalov marked this pull request as ready for review June 13, 2026 12:49
@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Contributor

Greptile Summary

This PR moves profile-messaging value classes (ProfileMsg and its five implementations) and write-stat helper classes (BasicColumnarWriteTaskStats, NanoTime, SizeInBytes) from the main sql-plugin Scala module into the sql-plugin-columnar Java module as part of a staged unshim refactor.

  • Six profile message classes are translated from Scala case class/trait to Java; five of the six (the ProfileMsg marker interface and the four String-field variants) are clean translations.
  • ProfileJobStageQueryMsg carries two int[] fields and its hand-written equals/hashCode/toString use Objects.equals/Objects.hash/concatenation, which operate on array identity rather than content.
  • Three write-stat classes (BasicColumnarWriteTaskStats, NanoTime, SizeInBytes) are straightforward translations with no issues.

Confidence Score: 4/5

Safe to merge after addressing the array-equality issue in ProfileJobStageQueryMsg; all other files are clean translations.

Eight of the nine new files are straightforward, correct translations of Scala value classes with no logic changes. ProfileJobStageQueryMsg explicitly hand-writes equals/hashCode/toString but uses Objects.equals/hash on int[] arrays, which compares by reference rather than by content. While the original Scala case class had the same limitation via auto-generation, an explicitly written Java equals method is expected to provide content equality. The current code will silently return false when two independently-created instances carry the same job/stage data, which could mislead future consumers of this API.

sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/ProfileJobStageQueryMsg.java

Important Files Changed

| Filename | Overview |
| --- | --- |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/ProfileMsg.java | Marker interface extending Serializable; clean translation of the original Scala trait. |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/ProfileEndMsg.java | Two-field String value class with correct equals/hashCode using Objects.equals; clean translation of the Scala case class. |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/ProfileErrorMsg.java | Two-field String value class with correct equals/hashCode; clean translation of the Scala case class. |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/ProfileInitMsg.java | Two-field String value class with correct equals/hashCode; clean translation of the Scala case class. |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/ProfileJobStageQueryMsg.java | Two int-array fields; equals/hashCode use Objects.equals/hash, which compares arrays by reference instead of by content. This breaks value semantics for this explicitly hand-written Java equals. |
| sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/ProfileStatusMsg.java | Two-field String value class; clean translation of the Scala case class. |
| sql-plugin-columnar/src/main/java/org/apache/spark/sql/rapids/BasicColumnarWriteTaskStats.java | WriteTaskStats implementation with five fields; equals/hashCode correct, Scala Seq interop preserved; clean translation. |
| sql-plugin-columnar/src/main/java/org/apache/spark/sql/rapids/NanoTime.java | Boxed Long wrapper with formatted toString; clean translation of the Scala case class with consistent null-handling behaviour. |
| sql-plugin-columnar/src/main/java/org/apache/spark/sql/rapids/SizeInBytes.java | Boxed Long wrapper with human-readable byte-size toString; clean translation of the Scala case class. |

Class Diagram

%%{init: {'theme': 'neutral'}}%%
classDiagram
    class ProfileMsg {
        <<interface>>
        +Serializable
    }
    class ProfileInitMsg {
        -String executorId
        -String path
    }
    class ProfileEndMsg {
        -String executorId
        -String path
    }
    class ProfileStatusMsg {
        -String executorId
        -String msg
    }
    class ProfileErrorMsg {
        -String executorId
        -String msg
    }
    class ProfileJobStageQueryMsg {
        -int[] activeJobs
        -int[] activeStages
    }
    class BasicColumnarWriteTaskStats {
        <<WriteTaskStats>>
        -Seq~InternalRow~ partitions
        -int numFiles
        -int numWriters
        -long numBytes
        -long numRows
    }
    class NanoTime {
        <<Serializable>>
        -Long value
    }
    class SizeInBytes {
        <<Serializable>>
        -Long value
    }
    ProfileMsg <|.. ProfileInitMsg
    ProfileMsg <|.. ProfileEndMsg
    ProfileMsg <|.. ProfileStatusMsg
    ProfileMsg <|.. ProfileErrorMsg
    ProfileMsg <|.. ProfileJobStageQueryMsg

Reviews (1): Last reviewed commit: "Add columnar profile and write stat valu..."

Comment on lines +44 to +54
        }
        if (!(other instanceof ProfileJobStageQueryMsg)) {
          return false;
        }
        ProfileJobStageQueryMsg that = (ProfileJobStageQueryMsg) other;
        return Objects.equals(activeJobs, that.activeJobs) &&
            Objects.equals(activeStages, that.activeStages);
      }

      @Override
      public int hashCode() {


P1 Array equality and hashing by reference, not by content

Objects.equals(activeJobs, that.activeJobs) delegates to Object.equals() on int[], which is identity (reference) comparison — two independently-constructed ProfileJobStageQueryMsg instances with identical contents will return false from equals. Objects.hash(activeJobs, activeStages) has the same problem: it hashes the array references, not their contents, so hash values differ even for content-equal instances. The original Scala case class had the same limitation because Scala auto-generates array-field equality as reference equality, but now that this equals is hand-written in Java there is a clean opportunity to fix it with Arrays.equals / Arrays.hashCode. The toString on line 58 has the same issue and would print [I@<hex> for both fields.
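
For illustration, here is a minimal standalone sketch of the behaviour described above. The class `ArrayEqualityDemo` and its two helper methods are hypothetical and not part of this PR; only the `java.util.Objects` and `java.util.Arrays` calls are real JDK APIs.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical demo class for illustration only; not part of the PR.
class ArrayEqualityDemo {
  // Same call shape as the reviewed equals(): falls through to Object.equals,
  // so it is true only when both arguments are the same array object.
  static boolean referenceEquals(int[] a, int[] b) {
    return Objects.equals(a, b);
  }

  // Same call shape as the suggested fix: compares length and elements.
  static boolean contentEquals(int[] a, int[] b) {
    return Arrays.equals(a, b);
  }

  public static void main(String[] args) {
    int[] first = {1, 2, 3};
    int[] second = {1, 2, 3}; // same contents, distinct object

    System.out.println(referenceEquals(first, second)); // false
    System.out.println(contentEquals(first, second));   // true
    // Content-based hash codes agree for content-equal arrays.
    System.out.println(Arrays.hashCode(first) == Arrays.hashCode(second)); // true
    // Content-based string form instead of the default [I@<hex>.
    System.out.println(Arrays.toString(first)); // [1, 2, 3]
  }
}
```

`Arrays.equals` and `Arrays.hashCode` also accept `null` arrays, so switching to them should not introduce a new null-handling path compared with `Objects.equals` / `Objects.hash`.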

Suggested change
        }
        if (!(other instanceof ProfileJobStageQueryMsg)) {
          return false;
        }
        ProfileJobStageQueryMsg that = (ProfileJobStageQueryMsg) other;
        return Objects.equals(activeJobs, that.activeJobs) &&
            Objects.equals(activeStages, that.activeStages);
      }

      @Override
      public int hashCode() {

    import java.util.Arrays;
    import java.util.Objects;

    public class ProfileJobStageQueryMsg implements ProfileMsg {
      private static final long serialVersionUID = 1L;

      private final int[] activeJobs;
      private final int[] activeStages;

      public ProfileJobStageQueryMsg(int[] activeJobs, int[] activeStages) {
        this.activeJobs = activeJobs;
        this.activeStages = activeStages;
      }

      public int[] activeJobs() {
        return activeJobs;
      }

      public int[] activeStages() {
        return activeStages;
      }

      @Override
      public boolean equals(Object other) {
        if (this == other) {
          return true;
        }
        if (!(other instanceof ProfileJobStageQueryMsg)) {
          return false;
        }
        ProfileJobStageQueryMsg that = (ProfileJobStageQueryMsg) other;
        return Arrays.equals(activeJobs, that.activeJobs) &&
            Arrays.equals(activeStages, that.activeStages);
      }

      @Override
      public int hashCode() {
        return Objects.hash(Arrays.hashCode(activeJobs), Arrays.hashCode(activeStages));
      }

      @Override
      public String toString() {
        return "ProfileJobStageQueryMsg(" + Arrays.toString(activeJobs) + "," +
            Arrays.toString(activeStages) + ")";
      }
    }
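
As a rough check of what content-based equality buys a consumer, the sketch below uses a hypothetical stand-in class, `JobStageSnapshot`, that mirrors the shape of the suggested `equals`/`hashCode`/`toString`. It is an assumption-laden illustration, not the PR's class: it omits `ProfileMsg` and serialization so that it compiles on its own.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in mirroring the suggested value semantics; not the PR's class.
final class JobStageSnapshot {
  private final int[] jobs;
  private final int[] stages;

  JobStageSnapshot(int[] jobs, int[] stages) {
    this.jobs = jobs;
    this.stages = stages;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof JobStageSnapshot)) {
      return false;
    }
    JobStageSnapshot that = (JobStageSnapshot) other;
    return Arrays.equals(jobs, that.jobs) && Arrays.equals(stages, that.stages);
  }

  @Override
  public int hashCode() {
    // Combine the two content-based hashes; any consistent mix would do.
    return 31 * Arrays.hashCode(jobs) + Arrays.hashCode(stages);
  }

  @Override
  public String toString() {
    return "JobStageSnapshot(" + Arrays.toString(jobs) + "," + Arrays.toString(stages) + ")";
  }

  public static void main(String[] args) {
    Set<JobStageSnapshot> seen = new HashSet<>();
    seen.add(new JobStageSnapshot(new int[] {1, 2}, new int[] {7}));

    // A separately constructed instance with the same contents is found.
    JobStageSnapshot probe = new JobStageSnapshot(new int[] {1, 2}, new int[] {7});
    System.out.println(seen.contains(probe)); // true
    System.out.println(probe); // JobStageSnapshot([1, 2],[7])
  }
}
```

With reference-based `equals`/`hashCode`, the `contains` lookup above would be expected to miss. Note that the arrays are stored without copying, so a caller that mutates an array after the instance has been added to a hash-based collection can still break lookups.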
