@fwyzard
Contributor

@fwyzard fwyzard commented Oct 2, 2025

For the changes since ROCm 6.4.3, see the ROCm 7.0.2 release notes.

@fwyzard
Contributor Author

fwyzard commented Oct 2, 2025

enable gpu

@fwyzard
Contributor Author

fwyzard commented Oct 2, 2025

please test

@cmsbuild
Contributor

cmsbuild commented Oct 2, 2025

A new Pull Request was created by @fwyzard for branch IB/CMSSW_16_0_X/master.

@akritkbehera, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Oct 2, 2025

cms-bot internal usage

@fwyzard
Contributor Author

fwyzard commented Oct 2, 2025

test parameters:

  • full_cmssw = true
  • enable = gpu

@fwyzard
Contributor Author

fwyzard commented Oct 2, 2025

please test

@fwyzard
Contributor Author

fwyzard commented Oct 2, 2025

To do:

  • check if the crashes reported with ROCm 7.0.0 are fixed by 7.0.1
  • check the performance of ROCm 7.0.x vs 6.4.3

@cmsbuild
Contributor

cmsbuild commented Oct 2, 2025

Pull request has been put on hold by @fwyzard
They need to issue an unhold command to remove the hold state or L1 can unhold it for all

@cmsbuild cmsbuild added the hold label Oct 2, 2025
@cmsbuild
Contributor

cmsbuild commented Oct 3, 2025

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-276af9/48435/summary.html
COMMIT: 4181c91
CMSSW: CMSSW_16_0_X_2025-10-02-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10106/48435/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-276af9/48435/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-276af9/48435/git-merge-result

Comparison Summary

Summary:

  • You potentially removed 157 lines from the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3924341
  • DQMHistoTests: Total failures: 41
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3924280
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 216.5 KiB (50 files compared)
  • DQMHistoSizes: changed ( 10224.0,... ): 4.356 KiB EgammaV/ElectronMcSignalValidatorPt1000
  • DQMHistoSizes: changed ( 10224.0,... ): 4.304 KiB EgammaV/ElectronMcSignalValidator
  • Checked 218 log files, 188 edm output root files, 51 DQM output files
  • TriggerResults: no differences found
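
The DQMHistoTests counters above can be cross-checked mechanically. The following is a minimal sketch, not part of cms-bot: the helper name `parse_dqm_totals` is hypothetical, and the assumption that successes, failures, nulls and skipped add up to the number of histograms compared is only an observation from the figures in this comment, not documented bot behaviour.

```python
import re

# Counter lines copied from the comparison summary above.
SUMMARY = """\
DQMHistoTests: Total files compared: 51
DQMHistoTests: Total histograms compared: 3924341
DQMHistoTests: Total failures: 41
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3924280
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
"""


def parse_dqm_totals(text):
    """Return {label: count} for each 'DQMHistoTests: Total <label>: <n>' line.

    Hypothetical helper for illustration; labels are lower-cased.
    """
    pattern = re.compile(r"DQMHistoTests: Total (.+?): (\d+)\s*$")
    totals = {}
    for line in text.splitlines():
        match = pattern.search(line)
        if match:
            totals[match.group(1).lower()] = int(match.group(2))
    return totals


totals = parse_dqm_totals(SUMMARY)

# Assumption: every compared histogram lands in exactly one of these bins.
accounted = (
    totals["successes"] + totals["failures"] + totals["nulls"] + totals["skipped"]
)
failure_fraction = totals["failures"] / totals["histograms compared"]

print("counters consistent:", accounted == totals["histograms compared"])
print(f"failure fraction: {failure_fraction:.2e}")
```

With these figures the four bins do sum to the 3924341 histograms compared, so the 41 failures are a very small fraction of the total.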

AMD_MI300X Comparison Summary

Summary:

AMD_W7900 Comparison Summary

Summary:

NVIDIA_H100 Comparison Summary

Summary:

NVIDIA_L40S Comparison Summary

Summary:

NVIDIA_T4 Comparison Summary

Summary:

@fwyzard
Contributor Author

fwyzard commented Oct 6, 2025

please test for el9_amd64_gcc13

@cmsbuild
Contributor

cmsbuild commented Oct 6, 2025

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-276af9/48471/summary.html
COMMIT: 4181c91
CMSSW: CMSSW_16_0_X_2025-10-05-0000/el9_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10106/48471/install.sh to create a dev area with all the needed externals and cmssw changes.

@fwyzard fwyzard force-pushed the IB/CMSSW_16_0_X/master_rocm701 branch from 4181c91 to 60630c9 Compare October 26, 2025 06:02
@fwyzard fwyzard changed the title Update ROCm to version 7.0.1 Update ROCm to version 7.0.2 Oct 26, 2025
@cmsbuild
Contributor

Pull request #10106 was updated.

@fwyzard
Contributor Author

fwyzard commented Oct 26, 2025

please test

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-276af9/48815/summary.html
COMMIT: 60630c9
CMSSW: CMSSW_16_0_X_2025-10-26-0000/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10106/48815/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation warning when building. See details on the summary page.

@fwyzard
Contributor Author

fwyzard commented Nov 4, 2025

the DQM errors should be unrelated ...

@fwyzard
Contributor Author

fwyzard commented Nov 4, 2025

please test for el9_amd64_gcc13

@cmsbuild
Contributor

cmsbuild commented Nov 4, 2025

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-276af9/49227/summary.html
COMMIT: 0b87588
CMSSW: CMSSW_16_0_X_2025-11-02-2300/el9_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10106/49227/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-276af9/49227/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-276af9/49227/git-merge-result

@fwyzard
Contributor Author

fwyzard commented Nov 4, 2025

please test

@cmsbuild
Contributor

cmsbuild commented Nov 4, 2025

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-276af9/49238/summary.html
COMMIT: 0b87588
CMSSW: CMSSW_16_0_X_2025-11-03-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10106/49238/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-276af9/49238/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-276af9/49238/git-merge-result

Comparison Summary

Summary:

AMD_MI300X Comparison Summary

There are some workflows for which there are errors in the baseline:
29834.402 step 3
29834.403 step 3
29834.404 step 3
The results for the comparisons for these workflows could be incomplete
This most likely means that the IB has errors in the relvals. The error does NOT come from this pull request.

Summary:

  • You potentially removed 310 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 162 differences found in the comparisons
  • DQMHistoTests: Total files compared: 9
  • DQMHistoTests: Total histograms compared: 116142
  • DQMHistoTests: Total failures: 16202
  • DQMHistoTests: Total nulls: 10
  • DQMHistoTests: Total successes: 99930
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 8 files compared)
  • Checked 40 log files, 45 edm output root files, 9 DQM output files
  • TriggerResults: no differences found

NVIDIA_H100 Comparison Summary

There are some workflows for which there are errors in the baseline:
29834.402 step 3
29834.403 step 3
29834.404 step 3
The results for the comparisons for these workflows could be incomplete
This most likely means that the IB has errors in the relvals. The error does NOT come from this pull request.

Summary:

  • You potentially removed 288 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 166 differences found in the comparisons
  • DQMHistoTests: Total files compared: 9
  • DQMHistoTests: Total histograms compared: 116142
  • DQMHistoTests: Total failures: 10553
  • DQMHistoTests: Total nulls: 8
  • DQMHistoTests: Total successes: 105581
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 8 files compared)
  • Checked 40 log files, 45 edm output root files, 9 DQM output files
  • TriggerResults: no differences found

NVIDIA_L40S Comparison Summary

There are some workflows for which there are errors in the baseline:
29834.402 step 3
29834.403 step 3
29834.404 step 3
The results for the comparisons for these workflows could be incomplete
This most likely means that the IB has errors in the relvals. The error does NOT come from this pull request.

Summary:

  • You potentially removed 303 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 186 differences found in the comparisons
  • DQMHistoTests: Total files compared: 9
  • DQMHistoTests: Total histograms compared: 116142
  • DQMHistoTests: Total failures: 11530
  • DQMHistoTests: Total nulls: 9
  • DQMHistoTests: Total successes: 104603
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 8 files compared)
  • Checked 40 log files, 45 edm output root files, 9 DQM output files
  • TriggerResults: found differences in 1 / 8 workflows

NVIDIA_T4 Comparison Summary

There are some workflows for which there are errors in the baseline:
29834.402 step 3
29834.403 step 3
29834.404 step 3
The results for the comparisons for these workflows could be incomplete
This most likely means that the IB has errors in the relvals. The error does NOT come from this pull request.

Summary:

  • You potentially removed 274 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 194 differences found in the comparisons
  • DQMHistoTests: Total files compared: 9
  • DQMHistoTests: Total histograms compared: 116142
  • DQMHistoTests: Total failures: 11300
  • DQMHistoTests: Total nulls: 10
  • DQMHistoTests: Total successes: 104832
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 8 files compared)
  • Checked 40 log files, 45 edm output root files, 9 DQM output files
  • TriggerResults: no differences found
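
To put the four per-GPU summaries side by side, the failure counts can be turned into fractions of the 116142 histograms compared on each device. This is only a rough sketch using the numbers quoted in this comment; the dictionary layout and variable names are made up for illustration. The fractions alone do not say whether the differences come from this pull request, and the bot notes that the baseline itself has errors in workflows 29834.402 to 29834.404.

```python
# (failures, nulls, successes, histograms compared), copied from the
# per-GPU comparison summaries in this comment.
GPU_RESULTS = {
    "AMD_MI300X": (16202, 10, 99930, 116142),
    "NVIDIA_H100": (10553, 8, 105581, 116142),
    "NVIDIA_L40S": (11530, 9, 104603, 116142),
    "NVIDIA_T4": (11300, 10, 104832, 116142),
}

failure_fractions = {}
for device, (failures, nulls, successes, compared) in GPU_RESULTS.items():
    # Sanity check: with zero skipped histograms, the three bins
    # should add up to the number of histograms compared.
    assert failures + nulls + successes == compared, device
    failure_fractions[device] = failures / compared

# Print the devices from the highest to the lowest failure fraction.
for device, fraction in sorted(
    failure_fractions.items(), key=lambda item: item[1], reverse=True
):
    print(f"{device:12s} {fraction:6.2%}")

worst_device = max(failure_fractions, key=failure_fractions.get)
```

On these figures the failure fraction ranges from roughly 9% to 14%, with AMD_MI300X the highest.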

@fwyzard
Contributor Author

fwyzard commented Nov 5, 2025

@akritkbehera, @iarspider, @smuzaffar this should (finally) be ready to be merged

@iarspider
Contributor

+externals
LGTM.

@iarspider iarspider merged commit 9f9ee8c into cms-sw:IB/CMSSW_16_0_X/master Nov 5, 2025
35 checks passed
@cmsbuild
Contributor

cmsbuild commented Nov 5, 2025

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_16_0_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @ftenchini, @sextonkennedy, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)
