@thomreis
Contributor

PR description:

  • Place ECAL GPU vs. CPU validation plots for HLT inside the HLT/HeterogeneousComparison/ DQM directory.
  • Make a comparison of Alpaka serial vs. Alpaka instead of legacy vs. Alpaka when the alpakaValidationEcal modifier is active. The legacy vs. Alpaka comparison can be activated with the gpuValidationEcal modifier instead, which also takes care that the timing algorithms match.

PR validation:

Passes 18446.413.

…lpaka instead of legacy vs. Alpaka. Use gpuValidationEcal modifier for the latter instead.
@cmsbuild
Contributor

cmsbuild commented Jan 28, 2026

cms-bot internal usage

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49965/47746

@cmsbuild
Contributor

A new Pull Request was created by @thomreis for master.

It involves the following packages:

  • DQM/EcalMonitorTasks (dqm)
  • DQMOffline/Trigger (dqm)
  • EventFilter/EcalRawToDigi (reconstruction)
  • RecoLocalCalo/EcalRecProducers (reconstruction)

@Moanwar, @cmsbuild, @ctarricone, @gabrielmscampos, @jfernan2, @mandrenguyen, @nothingface0, @rseidita, @srimanob can you please review it and eventually sign? Thanks.
@Fedespring, @HuguesBrun, @Martin-Grunewald, @ReyerBand, @apsallid, @argiro, @cericeci, @denizsun, @jhgoh, @missirol, @mmusich, @mtosi, @rchatter, @rociovilar, @salimcerci, @thomreis, @trocino, @wang0jin this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@thomreis
Contributor Author

type ecal

@cmsbuild added the ecal label Jan 28, 2026
Uncalib2DJitterError = _ecalGpuTask.MEs.Uncalib2DJitterError.clone(path = _hltdir + '%(subdet)s/UncalibRecHits/%(prefix)sGT uncalib rec hit jitterError gpu-cpu map2D'),
Uncalib2DChi2 = _ecalGpuTask.MEs.Uncalib2DChi2.clone(path = _hltdir + '%(subdet)s/UncalibRecHits/%(prefix)sGT uncalib rec hit chi2 gpu-cpu map2D'),
Uncalib2DOOTAmp = _ecalGpuTask.MEs.Uncalib2DOOTAmp.clone(path = _hltdir + '%(subdet)s/UncalibRecHits/%(prefix)sGT uncalib rec hit OOT amplitude %(OOTAmp)s gpu-cpu map2D'),
Uncalib2DFlags = _ecalGpuTask.MEs.Uncalib2DFlags.clone(path = _hltdir + '%(subdet)s/UncalibRecHits/%(prefix)sGT uncalib rec hit flags gpu-cpu map2D')
Contributor

What if new plots are added in the cloned PSet? Should this be made automatic?

Contributor Author

If new plots are added, they will go to the default path. But since all variables of the data formats are already covered, I do not see why new plots would be added.

But if you have a suggestion to make the changes automatic (add _hltdir and remove %(prefix)sGpuTask/) that would surely be an improvement.

Contributor

This seems to work:

_hltdir = 'HLT/HeterogeneousComparison/'
_remove = '%(prefix)sGpuTask/'

def cloneMEsWithPathFix(srcMEs, prefix):
    clones = {}
    for name, me in srcMEs.parameters_().items():
        if hasattr(me, 'path'):
            old = me.path.value()
            # remove the unwanted component if present
            new = old.replace(_remove, '')
            # prepend the HLT directory if not already there
            if not new.startswith(prefix):
                new = prefix + new
            clones[name] = me.clone(path = new)
        else:
            clones[name] = me.clone()
    return clones

hltEcalGpuTask = _ecalGpuTask.clone(
    params = _ecalGpuTask.params.clone(
        runGpuTask = True,
        enableRecHit = False
    ),
    MEs = cloneMEsWithPathFix(_ecalGpuTask.MEs, _hltdir)
)

By the way I noticed that in your earlier implementation the ones that start with RecHit2DChi2 are not touched. Is it done purposefully?

Contributor Author

Great, thanks I'll put that in.
I did not change the RecHit monitoring paths because they are switched off with enableRecHit = False.
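
For reference, the path rewrite proposed in this thread can be tried out without a CMSSW environment. The following is only a minimal sketch on plain strings: the function name fix_path and the example default path are assumptions made up for illustration, not taken from the actual ecalGpuTask configuration. It removes the %(prefix)sGpuTask/ component and prepends the HLT directory only when it is not already present.

```python
# Standalone sketch of the path rewrite; no CMSSW imports needed.
HLT_DIR = 'HLT/HeterogeneousComparison/'
REMOVE = '%(prefix)sGpuTask/'


def fix_path(path, hlt_dir=HLT_DIR, remove=REMOVE):
    """Drop the GpuTask component and prepend the HLT directory once."""
    new = path.replace(remove, '')
    if not new.startswith(hlt_dir):
        new = hlt_dir + new
    return new


# Hypothetical default path, assumed here purely for illustration.
example = ('%(subdet)s/%(prefix)sGpuTask/UncalibRecHits/'
           '%(prefix)sGT uncalib rec hit chi2 gpu-cpu map2D')

print(fix_path(example))
```

Because of the startswith guard, applying fix_path to an already rewritten path leaves it unchanged, so the rewrite is safe to run more than once.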

@nothingface0
Contributor

@cmsbuild please test

@cmsbuild
Contributor

-1

Failed Tests: UnitTests RelVals RelVals-INPUT AddOn
Size: This PR adds an extra 36KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f721bb/50985/summary.html
COMMIT: b58036b
CMSSW: CMSSW_16_1_X_2026-01-28-2300/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49965/50985/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed Unit Tests

I found 2 errors in the following unit tests:

---> test test_MC_22_crosscheck had ERRORS
---> test test_MC_23_crosscheck had ERRORS

Failed RelVals

  • 135.4 135.4_ZEEFS_13/step1_ZEEFS_13.log
  • 2025.0000001 2025.0000001_RunZeroBias2025B_10k/step3_RunZeroBias2025B_10k.log
  • 2024.0070001 2024.0070001_RunTau2024I_10k/step3_RunTau2024I_10k.log

Failed RelVals-INPUT

  • 138.5 138.5_ExpressCollisions2021/step2_ExpressCollisions2021.log
  • 138.4 138.4_PromptCollisions2021/step2_PromptCollisions2021.log
  • 1030.0 1030.0_RunHLTPhy2017B/step2_RunHLTPhy2017B.log

Failed AddOn Tests

UNKNOWN
UNKNOWN
UNKNOWN

@cmsbuild
Contributor

Pull request #49965 was updated. @Moanwar, @cmsbuild, @ctarricone, @gabrielmscampos, @jfernan2, @mandrenguyen, @nothingface0, @rseidita, @srimanob can you please check and sign again.

@thomreis
Contributor Author

enable gpu

@cmsbuild
Contributor

Pull request #49965 was updated. @Moanwar, @cmsbuild, @ctarricone, @gabrielmscampos, @jfernan2, @mandrenguyen, @nothingface0, @rseidita, @srimanob can you please check and sign again.

@thomreis
Contributor Author

please test

@cmsbuild
Contributor

-1

Failed Tests: RelVals-AMD_MI300X
Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f721bb/51017/summary.html
COMMIT: 98e461d
CMSSW: CMSSW_16_1_X_2026-01-30-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49965/51017/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals-AMD_MI300X

  • 34634.403 34634.403_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation/step2_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation.log
  • 34634.402 34634.402_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka/step2_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka.log
  • 34634.751 34634.751_TTbar_14TeV+Run4D121PU_HLT75e33TimingAlpaka/step2_TTbar_14TeV+Run4D121PU_HLT75e33TimingAlpaka.log

Comparison Summary

Summary:

  • You potentially added 3 lines to the logs
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 52
  • DQMHistoTests: Total histograms compared: 4028550
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4028524
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 51 files compared)
  • Checked 222 log files, 193 edm output root files, 52 DQM output files
  • TriggerResults: no differences found

@thomreis
Contributor Author

The errors look unrelated to the changes in this PR.
