@rovere rovere commented Oct 7, 2025

PR description:

Remove unnecessary memset operations in all cases where the memory is fully written by the kernels without assuming any previous value. In all other cases, keep the memset operations to preserve the correctness of the algorithms.
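
As an illustration of the distinction drawn above, here is a minimal host-side C++ sketch. It is not the code touched by this PR: the function names `fillSquares`, `countPerBin`, `makeSquares` and `makeCounts` are hypothetical stand-ins for device kernels and their launch code. The first kernel overwrites every element of its output, so zeroing the buffer beforehand is wasted work; the second increments counters in place and therefore still needs the memset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical "kernel" that writes every output element without
// reading its previous value.
void fillSquares(int* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = static_cast<int>(i * i);
  }
}

// Hypothetical "kernel" that accumulates into its output (a histogram):
// it reads the previous value of each counter before updating it.
void countPerBin(const unsigned* bins, std::size_t n, int* counts, std::size_t nBins) {
  for (std::size_t i = 0; i < n; ++i) {
    assert(bins[i] < nBins);
    ++counts[bins[i]];
  }
}

// The buffer is allocated uninitialised, as a device allocation would be.
std::unique_ptr<int[]> makeSquares(std::size_t n) {
  std::unique_ptr<int[]> out(new int[n]);  // contents are indeterminate here
  fillSquares(out.get(), n);               // no memset: every element is overwritten
  return out;
}

std::unique_ptr<int[]> makeCounts(const unsigned* bins, std::size_t n, std::size_t nBins) {
  std::unique_ptr<int[]> counts(new int[nBins]);
  std::memset(counts.get(), 0, nBins * sizeof(int));  // required: counters are incremented
  countPerBin(bins, n, counts.get(), nBins);
  return counts;
}
```

Dropping the memset in `makeCounts` would leave the counters starting from indeterminate values, which is the kind of case where the memset has to stay.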

PR validation:

Run on a set of events: the output, in terms of Tracksters, is identical.
The performance has improved both when running on GPU and when using the alpaka Serial CPU backend (thanks @mmusich for the test results!)

[Images: Screenshot from 2025-10-06 16-06-34; Screenshot from 2025-10-06 10-56-56]

To properly test this PR, we also need an updated version of the CLUE external library:

cms-sw/cmsdist#10114

cmsbuild commented Oct 7, 2025

cms-bot internal usage

mmusich commented Oct 7, 2025

enable gpu

mmusich commented Oct 7, 2025

test parameters:

cmsbuild commented Oct 7, 2025

A new Pull Request was created by @rovere for master.

It involves the following packages:

  • RecoLocalCalo/HGCalRecProducers (reconstruction, upgrade)

@Moanwar, @cmsbuild, @jfernan2, @mandrenguyen, @srimanob, @subirsarkar can you please review it and eventually sign? Thanks.
@apsallid, @bsunanda, @cseez, @denizsun, @edjtscott, @felicepantaleo, @hatakeyamak, @lecriste, @lgray, @pfs, @salimcerci, @sameasy, @sethzenz, @vandreev11, @youyingli this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

mmusich commented Oct 7, 2025

@cmsbuild, please test

mmusich commented Oct 7, 2025

type performance-improvements

cmsbuild commented Oct 7, 2025

-1

Failed Tests: RelVals-AMD_MI300X
Size: This PR adds an extra 24KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0d3348/48510/summary.html
COMMIT: cecd70b
CMSSW: CMSSW_16_0_X_2025-10-06-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49078/48510/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0d3348/48510/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0d3348/48510/git-merge-result

RelVals-AMD_MI300X

  • 29834.404: 29834.404_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka_Profiling/step2_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka_Profiling.log
  • 29834.403: 29834.403_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka_Validation/step2_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka_Validation.log
  • 29834.402: 29834.402_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka/step2_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka.log

Comparison Summary

Summary:

AMD_W7900 Comparison Summary

Summary:

NVIDIA_H100 Comparison Summary

Summary:

NVIDIA_L40S Comparison Summary

Summary:

NVIDIA_T4 Comparison Summary

Summary:

mmusich commented Oct 8, 2025

@cmsbuild, please test

cmsbuild commented Oct 8, 2025

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0d3348/48536/summary.html
COMMIT: cecd70b
CMSSW: CMSSW_16_0_X_2025-10-07-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49078/48536/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0d3348/48536/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0d3348/48536/git-merge-result

Comparison Summary

Summary:

  • You potentially added 23 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3940073
  • DQMHistoTests: Total failures: 424
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3939629
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (50 files compared)
  • Checked 218 log files, 188 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

AMD_W7900 Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 275 differences found in the comparisons
  • DQMHistoTests: Total files compared: 11
  • DQMHistoTests: Total histograms compared: 146621
  • DQMHistoTests: Total failures: 28205
  • DQMHistoTests: Total nulls: 12
  • DQMHistoTests: Total successes: 118404
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (10 files compared)
  • Checked 42 log files, 45 edm output root files, 11 DQM output files
  • TriggerResults: found differences in 1 / 10 workflows

NVIDIA_H100 Comparison Summary

There are some workflows for which there are errors in the baseline:
29834.402 step 2
29834.403 step 2
29834.404 step 2
29834.704 step 2
The comparison results for these workflows could be incomplete.
This most likely means that the IB has errors in the relvals. The error does NOT come from this pull request.

Summary:

NVIDIA_L40S Comparison Summary

Summary:

NVIDIA_T4 Comparison Summary

Summary:

mmusich commented Oct 8, 2025

For the record, the performance of this PR in combination with the latest commit of cms-sw/cmsdist#10114 has been checked, showing a huge timing improvement when running the HLT Phase 2 timing menu HLT:75e33_timing (with the alpaka process modifier enabled) on the CPU backend [1]:

[Image: Screenshot from 2025-10-08 13-30-21]

[1]

#!/bin/bash -ex

ALL_FILES_TTBAR='file:/shared/data/012dcc7c-fc39-45ad-b603-7cb987156456.root,file:/shared/data/02e8911a-095a-40c5-9200-a1b5efbfad45.root,file:/shared/data/08fb354c-6ed3-481e-b230-17822759dcdf.root,file:/shared/data/093f401c-5bc6-4101-9722-8f487b36d4d6.root,file:/shared/data/6058b392-1f46-4247-bf54-c753640131f8.root,file:/shared/data/613b476d-0041-4514-a0e6-911bc96c6516.root,file:/shared/data/6328ef1a-9228-442b-a427-45612bb7ce54.root,file:/shared/data/ac6c7f8a-f32d-4983-89c8-8029533e379c.root,file:/shared/data/aef94c88-cf94-4863-8f05-d7684b38a409.root,file:/shared/data/afbcecf7-2f9b-4bd3-9bb4-76cc8204cad1.root'

cmsDriver.py step2 -s L1P2GT,HLT:75e33_timing \
             --conditions auto:phase2_realistic_T33 \
             --datatier DQMIO,NANOAODSIM \
             -n 1000 \
             --eventcontent DQMIO,NANOAODSIM \
             --geometry ExtendedRun4D110 \
             --era Phase2C17I13M9 \
             --procModifier alpaka \
             --filein "$ALL_FILES_TTBAR" \
             --nThreads 24 \
             --process HLTX \
             --inputCommands='keep *, drop *_hlt*_*_HLT, drop triggerTriggerFilterObjectWithRefs_l1t*_*_HLT' \
             --no_exec \
             --python_filename 75e33_timing_config_ONCPU.py

cat <<@EOF >> 75e33_timing_config_ONCPU.py
process.options.accelerators = ['cpu']
@EOF

jfernan2 commented Oct 9, 2025

assign heterogeneous

jfernan2 commented Oct 9, 2025

+1

cmsbuild commented Oct 9, 2025

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

Moanwar commented Oct 9, 2025

+Upgrade

fwyzard commented Oct 9, 2025

+heterogeneous

fwyzard commented Oct 9, 2025

Any idea why the "other" fraction goes down?

mmusich commented Oct 9, 2025

> any idea why the other fraction goes down ?

No, we've been wondering about that ourselves and have no clear explanation.

fwyzard commented Oct 9, 2025

I guess that's one more reason to try and break down the "other" fraction into meaningful components 🤷🏻‍♂️

cmsbuild commented Oct 9, 2025

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @sextonkennedy, @ftenchini (and backports should be raised in the release meeting by the corresponding L2)

@ftenchini

+1

@cmsbuild cmsbuild merged commit 55f271d into cms-sw:master Oct 10, 2025
26 checks passed

mmusich commented Oct 13, 2025

type ngt

@cmsbuild cmsbuild added the ngt label Oct 13, 2025