@alejands
Contributor

@alejands alejands commented Nov 1, 2021

PR description:

PPD has asked the DQM subsystems to monitor several GPU-enabled collections being introduced in CMSSW_12_1_X. We have introduced a new EcalMonitorTask, GpuTask, which takes CPU- and GPU-generated rec hits and produces plots comparing several rec hit quantities for each run. To limit memory use, we do not plot the GPU rec hit values themselves. Additional plots may be added in the future.
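
As a rough illustration of the kind of comparison described above, here is a minimal Python sketch that computes GPU-minus-CPU differences per channel. It is not the GpuTask code (which is part of DQM/EcalMonitorTasks and is not shown in this thread). The `RecHit` class, its fields, the `compare_rec_hits` function, and the choice of energy, time, and multiplicity as the compared quantities are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecHit:
    """Hypothetical stand-in for an ECAL rec hit; not the CMSSW data format."""
    det_id: int    # channel identifier
    energy: float  # GeV
    time: float    # ns


def compare_rec_hits(cpu_hits, gpu_hits):
    """Return GPU-minus-CPU differences for channels present in both collections.

    Only differences are stored, mirroring the idea of plotting comparisons
    rather than the GPU values themselves.
    """
    cpu_by_id = {hit.det_id: hit for hit in cpu_hits}
    channel_diffs = []
    for gpu_hit in gpu_hits:
        cpu_hit = cpu_by_id.get(gpu_hit.det_id)
        if cpu_hit is None:
            continue  # no CPU counterpart for this channel
        channel_diffs.append({
            "det_id": gpu_hit.det_id,
            "energy_diff": gpu_hit.energy - cpu_hit.energy,
            "time_diff": gpu_hit.time - cpu_hit.time,
        })
    return {
        "multiplicity_diff": len(gpu_hits) - len(cpu_hits),
        "channel_diffs": channel_diffs,
    }


# Toy input: two channels, with small GPU/CPU discrepancies.
cpu = [RecHit(1, 10.0, 0.5), RecHit(2, 3.0, -0.25)]
gpu = [RecHit(1, 10.5, 0.5), RecHit(2, 3.0, 0.0)]
result = compare_rec_hits(cpu, gpu)
```

In a real task each difference would be filled into a histogram per run; the dictionary here only stands in for that step.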

This task will not run by default on the regular Online DQM workflow.

PR validation:

One caveat: we have not been able to test this version of the code on actual GPU rec hits, because no workflow currently appears to produce both types of rec hit collection. Thomas Reis has informed us that CPU and GPU ECAL rec hits are the same data type, so we tested by changing the input tags so that both collections are the CPU rec hits.
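
The workaround described above amounts to pointing both inputs at the same collection. The sketch below shows the idea in plain Python, with a dictionary standing in for the task's collection tags. The key names (`EBCpuRecHit` and so on), the GPU module label, and the CPU tag strings are placeholders assumed for illustration, not names confirmed in this thread; the real configuration is a CMSSW Python fragment.

```python
# Hypothetical collection tags; key names and tag strings are placeholders.
collection_tags = {
    "EBCpuRecHit": "ecalRecHit:EcalRecHitsEB",
    "EECpuRecHit": "ecalRecHit:EcalRecHitsEE",
    "EBGpuRecHit": "hypotheticalGpuRecHit:EcalRecHitsEB",
    "EEGpuRecHit": "hypotheticalGpuRecHit:EcalRecHitsEE",
}


def point_gpu_at_cpu(tags):
    """Return a copy in which each GPU tag reads the matching CPU collection."""
    patched = dict(tags)
    for key in tags:
        if "Gpu" in key:
            patched[key] = tags[key.replace("Gpu", "Cpu")]
    return patched


# The original mapping is left untouched so the change is easy to revert.
test_tags = point_gpu_at_cpu(collection_tags)
```

With both inputs identical, every difference plot should peak sharply at zero. That checks the plumbing of the task but not the GPU reconstruction itself.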

This code was run with the runTheMatrix workflow 10842.512 and the new plots look as expected.

Backport:

This is a backport of #35946 to include this code in the 12_1_X release.

@cmsbuild
Contributor

cmsbuild commented Nov 1, 2021

A new Pull Request was created by @alejands (Alejandro Sanchez) for CMSSW_12_1_X.

It involves the following packages:

  • DQM/EcalMonitorTasks (dqm)

@emanueleusai, @ahmad3213, @cmsbuild, @jfernan2, @pmandrik, @pbo0, @rvenditti can you please review it and eventually sign? Thanks.
@rchatter, @simonepigazzini, @thomreis, @argiro this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@emanueleusai
Member

please test

@cmsbuild
Contributor

cmsbuild commented Nov 2, 2021

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-788a01/20168/summary.html
COMMIT: a47505d
CMSSW: CMSSW_12_1_X_2021-11-01-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35947/20168/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test TestDQMServicesDemo had ERRORS

Comparison Summary

Summary:

  • You potentially added 4033 lines to the logs
  • Reco comparison results: 88 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 2901440
  • DQMHistoTests: Total failures: 32
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2901385
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 207.673 KiB( 41 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): 3.350 KiB EcalBarrel/EBGpuTask
  • DQMHistoSizes: changed ( 1000.0,... ): 3.350 KiB EcalEndcap/EEGpuTask
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@jfernan2
Contributor

jfernan2 commented Nov 2, 2021

backport

@perrotta
Contributor

perrotta commented Nov 2, 2021

backport of #35946

@alejands
Contributor Author

alejands commented Nov 2, 2021

We believe the tests failed because the GPU ecalRecHits are not currently enabled in CMSSW. The @cuda version is commented out here: RecoLocalCalo/EcalRecProducers/python/ecalRecHit_cff.py#43. For the time being, we'll modify the GPU input tags to read from the CPU ecalRecHits as well. This can be changed back quickly once the GPU rec hits are available.

@cmsbuild
Contributor

cmsbuild commented Nov 2, 2021

Pull request #35947 was updated. @emanueleusai, @ahmad3213, @cmsbuild, @jfernan2, @pmandrik, @pbo0, @rvenditti can you please check and sign again.

@jfernan2
Contributor

jfernan2 commented Nov 2, 2021

@alejands the unit test failed because of the issue fixed in PR #35921, not because of your code, so I would restore the code as you had it. Thanks.

@alejands
Contributor Author

alejands commented Nov 2, 2021

@jfernan2 I have removed the modification commit from the commit history

@jfernan2
Contributor

jfernan2 commented Nov 2, 2021

please test

@cmsbuild
Contributor

cmsbuild commented Nov 2, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-788a01/20200/summary.html
COMMIT: a47505d
CMSSW: CMSSW_12_1_X_2021-11-02-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35947/20200/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 3983 lines to the logs
  • Reco comparison results: 90 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 2901440
  • DQMHistoTests: Total failures: 33
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2901385
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 207.669 KiB( 41 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): 3.350 KiB EcalBarrel/EBGpuTask
  • DQMHistoSizes: changed ( 1000.0,... ): 3.350 KiB EcalEndcap/EEGpuTask
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Contributor

cmsbuild commented Nov 3, 2021

This pull request is fully signed and it will be integrated in one of the next CMSSW_12_1_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_12_2_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@jfernan2
Contributor

jfernan2 commented Nov 3, 2021

@perrotta @qliphy I am not sure whether a test with "enable gpu" would provide a useful check of this PR, given its nature.

@jfernan2
Contributor

jfernan2 commented Nov 3, 2021

-1
Same comment as in master PR

@jfernan2
Contributor

jfernan2 commented Nov 3, 2021

enable gpu

@cmsbuild
Contributor

cmsbuild commented Nov 3, 2021

Pull request #35947 was updated. @emanueleusai, @ahmad3213, @cmsbuild, @jfernan2, @pmandrik, @pbo0, @rvenditti can you please check and sign again.

@jfernan2
Contributor

jfernan2 commented Nov 3, 2021

please test

@cmsbuild
Contributor

cmsbuild commented Nov 3, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-788a01/20232/summary.html
COMMIT: 5c364e9
CMSSW: CMSSW_12_1_X_2021-11-03-1100/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35947/20232/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19782
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19776
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 3.064 KiB( 3 files compared)
  • DQMHistoSizes: changed ( 11634.512 ): 1.532 KiB EcalBarrel/EBGpuTask
  • DQMHistoSizes: changed ( 11634.512 ): 1.532 KiB EcalEndcap/EEGpuTask
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 2901440
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2901411
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 94.984 KiB( 41 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): 1.532 KiB EcalBarrel/EBGpuTask
  • DQMHistoSizes: changed ( 1000.0,... ): 1.532 KiB EcalEndcap/EEGpuTask
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@qliphy
Contributor

qliphy commented Jan 10, 2022

please test

@alejands
Contributor Author

@qliphy I have significant updates to this code that I'm about to submit as a new PR very soon. Perhaps it's best to close this PR and submit it all together

@qliphy
Contributor

qliphy commented Jan 10, 2022

please abort

@qliphy
Contributor

qliphy commented Jan 10, 2022

> @qliphy I have significant updates to this code that I'm about to submit as a new PR very soon. Perhaps it's best to close this PR and submit it all together

Ok, thanks @alejands

@alejands alejands closed this Jan 18, 2022
@alejands alejands deleted the gpuTask_12_1_X branch June 23, 2022 23:46