
@slava77 (Contributor) commented Jul 8, 2025

The monitoring sequence is added in combination with the trackingLST procModifier for the offline setup.
When the DQM (TrackingOfflineDQMClient) sequence is present, this triggers running a clone of the regular LST tracking in HighPtTripletStepTaskSerialSync. A configuration without DQM (e.g. one used for timing) therefore does not run the extra variant of this iteration.

The DQM plots should show up in the /DQMData/Run 1/Tracking/Run summary/TrackBuilding/ValidationWRTSerialSync/highPtTripletStep folder.

I attempted a somewhat generic setup by introducing a trackToTrackCPUSequence, which can later be populated with other comparisons.

The implementation and population of the HighPtTripletStepTaskSerialSync task are a bit tedious.
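The on-demand behavior described above (the serial-sync clone only runs when the DQM comparison needs it) can be illustrated with a small plain-Python model. This is a sketch, not CMSSW code: it assumes, as in CMSSW's unscheduled execution of modules placed in Tasks, that a module runs only when something scheduled consumes its products. All module labels and the helper name modules_to_run are hypothetical.

```python
# Hypothetical model of on-demand execution: a module runs only if a
# scheduled module consumes its products, directly or transitively.
# All module labels here are illustrative, not the real CMSSW labels.


def modules_to_run(scheduled, consumes):
    """Return every module needed to satisfy the scheduled ones.

    scheduled: module labels placed on a path or sequence
    consumes:  dict mapping a label to the labels whose products it reads
    """
    needed = set()
    stack = list(scheduled)
    while stack:
        label = stack.pop()
        if label in needed:
            continue
        needed.add(label)
        stack.extend(consumes.get(label, []))
    return needed


# Hypothetical dependency graph for one tracking iteration.
CONSUMES = {
    # regular (possibly GPU-offloaded) LST-based iteration
    "highPtTripletStepTracks": ["highPtTripletStepLST"],
    # serial-sync (CPU) clone of the same iteration
    "highPtTripletStepTracksSerialSync": ["highPtTripletStepLSTSerialSync"],
    # track-to-track comparison: the only consumer of the serial-sync clone
    "highPtTripletStepTrackToTrack": [
        "highPtTripletStepTracks",
        "highPtTripletStepTracksSerialSync",
    ],
}

# Reconstruction only (e.g. a timing configuration without DQM).
reco_only = modules_to_run(["highPtTripletStepTracks"], CONSUMES)

# Reconstruction plus the comparison placed in the DQM sequence.
with_dqm = modules_to_run(
    ["highPtTripletStepTracks", "highPtTripletStepTrackToTrack"], CONSUMES
)

# Modules that only run because the DQM comparison was scheduled.
print(sorted(with_dqm - reco_only))
```

In this model the serial-sync clone costs nothing in a no-DQM configuration, because nothing scheduled ever requests its products.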

Tested with workflow 29834.704. The efficiency and delta-eta plots are shown below.

[Images: efficiency and delta-eta plots]

This PR is meant to address the Implement a "GPU vs CPU" workflow part of #46746 (a follow-up to the LST integration PR #45117 (comment)), and it was fairly high on the wish list following the presentation at the GPU meeting on May 19.

The specific implementation was not well specified, so this may turn into an RFC.

@VourMa @fwyzard

@cmsbuild (Contributor) commented Jul 8, 2025

cms-bot internal usage

@slava77 (Contributor, Author) commented Jul 8, 2025

test parameters:

  • enable_tests = gpu
  • workflows_gpu = 29634.704,29834.704
  • workflows = 29634.703,29834.703,29834.755,29634.757,29834.757
  • relvals_opt = -w upgrade,standard
  • relvals_opt_gpu = -w upgrade,standard

@cmsbuild (Contributor) commented Jul 8, 2025

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48508/45442

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild (Contributor) commented Jul 8, 2025

A new Pull Request was created by @slava77 for master.

It involves the following packages:

  • Configuration/PyReleaseValidation (upgrade, pdmv)
  • DQM/TrackingMonitorClient (dqm)
  • DQM/TrackingMonitorSource (dqm)
  • RecoTracker/IterativeTracking (reconstruction)

@AdrianoDee, @Moanwar, @antoniovagnerini, @cmsbuild, @ctarricone, @DickyChant, @jfernan2, @mandrenguyen, @miquork, @rseidita, @srimanob, @subirsarkar can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @Martin-Grunewald, @VinInn, @VourMa, @arossi83, @dgulhan, @fabiocos, @felicepantaleo, @fioriNTU, @gpetruc, @idebruyn, @jandrea, @makortel, @missirol, @mmusich, @mtosi, @richa2710, @rovere, @slomeo, @sroychow, @threus this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@slava77 (Contributor, Author) commented Jul 8, 2025

@cmsbuild please test

@cmsbuild (Contributor) commented Jul 9, 2025

-1

Failed Tests: RelVals rocmUnitTests
Size: This PR adds an extra 56KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8c869e/47138/summary.html
COMMIT: 3d93e7e
CMSSW: CMSSW_15_1_X_2025-07-08-2300/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48508/47138/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

----- Begin Fatal Exception 09-Jul-2025 04:18:05 CEST-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
   [1] Validating configuration of module: class=MergeClusterProducer label='hltMergeLayerClusters'
Exception Message:
Illegal parameters found in configuration.  The parameters are named:
 'layerClustersEE'
 'time_layerclustersEE'
You could be trying to use parameter names that are not
allowed for this plugin or they could be misspelled.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 09-Jul-2025 04:18:04 CEST-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
   [1] Validating configuration of module: class=MergeClusterProducer label='hltMergeLayerClusters'
Exception Message:
Illegal parameters found in configuration.  The parameters are named:
 'layerClustersEE'
 'time_layerclustersEE'
You could be trying to use parameter names that are not
allowed for this plugin or they could be misspelled.
----- End Fatal Exception -------------------------------------------------

ROCm Unit Tests

I found 0 errors in the following unit tests:


CUDA Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 183 differences found in the comparisons
  • DQMHistoTests: Total files compared: 9
  • DQMHistoTests: Total histograms compared: 117626
  • DQMHistoTests: Total failures: 14384
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 103242
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 9638.794 KiB( 8 files compared)
  • DQMHistoSizes: changed ( 29634.704,... ): 4819.397 KiB Tracking/TrackBuilding
  • Checked 32 log files, 36 edm output root files, 9 DQM output files
  • TriggerResults: found differences in 1 / 8 workflows

ROCM Comparison Summary

Summary:

  • You potentially added 5 lines to the logs
  • Reco comparison results: 149 differences found in the comparisons
  • DQMHistoTests: Total files compared: 9
  • DQMHistoTests: Total histograms compared: 117626
  • DQMHistoTests: Total failures: 13183
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 104443
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 9638.794 KiB( 8 files compared)
  • DQMHistoSizes: changed ( 29634.704,... ): 4819.397 KiB Tracking/TrackBuilding
  • Checked 32 log files, 36 edm output root files, 9 DQM output files
  • TriggerResults: no differences found

@mmusich (Contributor) commented Jul 9, 2025

-1

Failures are unrelated; see #47859 (comment).

@slava77 (Contributor, Author) commented Jul 14, 2025

@cmsbuild please test

@cmsbuild (Contributor)

-1

Failed Tests: rocmUnitTests
Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8c869e/47220/summary.html
COMMIT: 3d93e7e
CMSSW: CMSSW_15_1_X_2025-07-14-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/48508/47220/install.sh to create a dev area with all the needed externals and cmssw changes.

ROCm Unit Tests

I found 0 errors in the following unit tests:


Comparison Summary

Summary:

  • You potentially added 3 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 57
  • DQMHistoTests: Total histograms compared: 4447845
  • DQMHistoTests: Total failures: 4
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4447821
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 9634.59 KiB( 56 files compared)
  • DQMHistoSizes: changed ( 29634.703,... ): 4817.295 KiB Tracking/TrackBuilding
  • Checked 240 log files, 206 edm output root files, 57 DQM output files
  • TriggerResults: no differences found

CUDA Comparison Summary

Summary:

  • You potentially added 4 lines to the logs
  • Reco comparison results: 175 differences found in the comparisons
  • DQMHistoTests: Total files compared: 9
  • DQMHistoTests: Total histograms compared: 117626
  • DQMHistoTests: Total failures: 12871
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 104755
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 9638.794 KiB( 8 files compared)
  • DQMHistoSizes: changed ( 29634.704,... ): 4819.397 KiB Tracking/TrackBuilding
  • Checked 32 log files, 36 edm output root files, 9 DQM output files
  • TriggerResults: no differences found

ROCM Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • Reco comparison results: 134 differences found in the comparisons
  • DQMHistoTests: Total files compared: 9
  • DQMHistoTests: Total histograms compared: 117626
  • DQMHistoTests: Total failures: 10754
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 106872
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 9638.794 KiB( 8 files compared)
  • DQMHistoSizes: changed ( 29634.704,... ): 4819.397 KiB Tracking/TrackBuilding
  • Checked 32 log files, 36 edm output root files, 9 DQM output files
  • TriggerResults: no differences found
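When reading these summaries, the DQMHistoTests outcome counts should account for every compared histogram. The Python sketch below checks that failures + nulls + successes + skipped equals the total and reports the failure fraction. The numbers are copied from the Jul 14 summaries above; the helper names are hypothetical and not part of cms-bot.

```python
# Cross-check of DQMHistoTests counts copied from the Jul 14 comparison
# summaries: failures + nulls + successes + skipped should equal the total.

SUMMARIES = {
    "Jul 14 comparison": dict(total=4447845, failures=4, nulls=0, successes=4447821, skipped=20),
    "Jul 14 CUDA": dict(total=117626, failures=12871, nulls=0, successes=104755, skipped=0),
    "Jul 14 ROCm": dict(total=117626, failures=10754, nulls=0, successes=106872, skipped=0),
}


def is_consistent(s):
    """True if the outcome categories account for every compared histogram."""
    return s["failures"] + s["nulls"] + s["successes"] + s["skipped"] == s["total"]


def failure_fraction(s):
    """Fraction of compared histograms flagged as failures."""
    return s["failures"] / s["total"]


for name, s in SUMMARIES.items():
    print(f"{name}: consistent={is_consistent(s)}, failures={failure_fraction(s):.1%}")
```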

@jfernan2 (Contributor)

+1

@Moanwar (Contributor) commented Jul 27, 2025

+Upgrade

@slava77 (Contributor, Author) commented Jul 31, 2025

It would be nice to get some feedback (or an expectation of when it may be coming).

@cms-sw/dqm-l2 @cms-sw/pdmv-l2 signatures are missing.

I'm still not sure whether @cms-sw/heterogeneous-l2 is going to assign this PR.

@fwyzard (Contributor) commented Jul 31, 2025

From a cursory look, I don't think this needs to be assigned to, or signed by, @cms-sw/heterogeneous-l2.

@slava77 (Contributor, Author) commented Aug 4, 2025

It would be nice to get some feedback (or an expectation of when it may be coming).

@cms-sw/dqm-l2 @cms-sw/pdmv-l2 signatures are missing.

@cms-sw/orp-l2 this has been open for 4 weeks without comments or feedback from DQM and PDMV. Are the L2s away, or is there an issue with this PR?

@AdrianoDee (Contributor)

+pdmv

@slava77 (Contributor, Author) commented Aug 12, 2025

@cms-sw/dqm-l2 (now also directly @antoniovagnerini @rseidita @ctarricone): your signature or comments are needed.
Please clarify the status of the review of this PR.
Thank you.

@rseidita (Contributor)

+dqm

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen (Contributor)

+1

cmsbuild merged commit d63a45c into cms-sw:master Aug 13, 2025
17 checks passed