Move profile and write stats to the columnar module#15046
Conversation
25ca800 to
c43daa4
Compare
bc92d8e to
b758f91
Compare
1b0c44a to
f1b1f2a
Compare
68788a1 to
cd3b1c4
Compare
8ca4d3c to
a4baab8
Compare
cd3b1c4 to
3cbe182
Compare
a4baab8 to
65aa39a
Compare
3cbe182 to
596d45e
Compare
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
65aa39a to
7c9b4f7
Compare
596d45e to
767f014
Compare
Greptile SummaryThis PR moves profile-messaging value classes (
Confidence Score: 4/5Safe to merge after addressing the array-equality issue in ProfileJobStageQueryMsg; all other files are clean translations. Eight of the nine new files are straightforward, correct translations of Scala value classes with no logic changes. ProfileJobStageQueryMsg explicitly hand-writes equals/hashCode/toString but uses Objects.equals/hash on int[] arrays, which compares by reference rather than by content. While the original Scala case class had the same limitation via auto-generation, an explicitly written Java equals method is expected to provide content equality. The current code will silently return false when two independently-created instances carry the same job/stage data, which could mislead future consumers of this API. sql-plugin-columnar/src/main/java/com/nvidia/spark/rapids/ProfileJobStageQueryMsg.java Important Files Changed
Class Diagram%%{init: {'theme': 'neutral'}}%%
classDiagram
class ProfileMsg {
<<interface>>
+Serializable
}
class ProfileInitMsg {
-String executorId
-String path
}
class ProfileEndMsg {
-String executorId
-String path
}
class ProfileStatusMsg {
-String executorId
-String msg
}
class ProfileErrorMsg {
-String executorId
-String msg
}
class ProfileJobStageQueryMsg {
-int[] activeJobs
-int[] activeStages
}
class BasicColumnarWriteTaskStats {
<<WriteTaskStats>>
-Seq~InternalRow~ partitions
-int numFiles
-int numWriters
-long numBytes
-long numRows
}
class NanoTime {
<<Serializable>>
-Long value
}
class SizeInBytes {
<<Serializable>>
-Long value
}
ProfileMsg <|.. ProfileInitMsg
ProfileMsg <|.. ProfileEndMsg
ProfileMsg <|.. ProfileStatusMsg
ProfileMsg <|.. ProfileErrorMsg
ProfileMsg <|.. ProfileJobStageQueryMsg
Reviews (1): Last reviewed commit: "Add columnar profile and write stat valu..." | Re-trigger Greptile |
| } | ||
| if (!(other instanceof ProfileJobStageQueryMsg)) { | ||
| return false; | ||
| } | ||
| ProfileJobStageQueryMsg that = (ProfileJobStageQueryMsg) other; | ||
| return Objects.equals(activeJobs, that.activeJobs) && | ||
| Objects.equals(activeStages, that.activeStages); | ||
| } | ||
|
|
||
| @Override | ||
| public int hashCode() { |
There was a problem hiding this comment.
Array equality and hashing by reference, not by content
Objects.equals(activeJobs, that.activeJobs) delegates to Object.equals() on int[], which is identity (reference) comparison — two independently-constructed ProfileJobStageQueryMsg instances with identical contents will return false from equals. Objects.hash(activeJobs, activeStages) has the same problem: it hashes the array references, not their contents, so hash values differ even for content-equal instances. The original Scala case class had the same limitation because Scala auto-generates array-field equality as reference equality, but now that this equals is hand-written in Java there is a clean opportunity to fix it with Arrays.equals / Arrays.hashCode. The toString on line 58 has the same issue and would print [I@<hex> for both fields.
| } | |
| if (!(other instanceof ProfileJobStageQueryMsg)) { | |
| return false; | |
| } | |
| ProfileJobStageQueryMsg that = (ProfileJobStageQueryMsg) other; | |
| return Objects.equals(activeJobs, that.activeJobs) && | |
| Objects.equals(activeStages, that.activeStages); | |
| } | |
| @Override | |
| public int hashCode() { | |
| import java.util.Arrays; | |
| import java.util.Objects; | |
| public class ProfileJobStageQueryMsg implements ProfileMsg { | |
| private static final long serialVersionUID = 1L; | |
| private final int[] activeJobs; | |
| private final int[] activeStages; | |
| public ProfileJobStageQueryMsg(int[] activeJobs, int[] activeStages) { | |
| this.activeJobs = activeJobs; | |
| this.activeStages = activeStages; | |
| } | |
| public int[] activeJobs() { | |
| return activeJobs; | |
| } | |
| public int[] activeStages() { | |
| return activeStages; | |
| } | |
| @Override | |
| public boolean equals(Object other) { | |
| if (this == other) { | |
| return true; | |
| } | |
| if (!(other instanceof ProfileJobStageQueryMsg)) { | |
| return false; | |
| } | |
| ProfileJobStageQueryMsg that = (ProfileJobStageQueryMsg) other; | |
| return Arrays.equals(activeJobs, that.activeJobs) && | |
| Arrays.equals(activeStages, that.activeStages); | |
| } | |
| @Override | |
| public int hashCode() { | |
| return Objects.hash(Arrays.hashCode(activeJobs), Arrays.hashCode(activeStages)); | |
| } | |
| @Override | |
| public String toString() { | |
| return "ProfileJobStageQueryMsg(" + Arrays.toString(activeJobs) + "," + | |
| Arrays.toString(activeStages) + ")"; | |
| } | |
| } |
Related to #14834.
Description
This PR is one reviewable layer in the unshim stack introduced by #15025. It moves profile and write-stat value classes into the columnar helper module. This isolates another group of Java-compatible runtime values before aggregate and shuffle stat values are moved.
Stack context
Testing and validation notes
Checklists
Documentation
Testing
(Covered by the validation notes in the PR description.)
Performance