@Electricks94
Contributor

This PR removes PortableHostMultiCollection and PortableDeviceMultiCollection and replaces them with SoABlocks (see #48629).

This has two major advantages:

  1. Simplifications of dictionaries because variadic templates are not necessary anymore
  2. Code simplifications because BlocksView holds views to all sublayouts

In future PRs the code can be simplified even further, because kernels that take a BlocksView as input can work on the entire data. Hence, it is no longer necessary to pass every sub-view as input. In this PR, however, I tried to make as few changes as possible so as not to inflate the PR too much.
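
To illustrate the interface simplification described above, here is a minimal sketch in plain Python. It is not the CMSSW C++ API: `TracksView`, `HitsView`, `BlocksView`, and both kernel functions are hypothetical stand-ins, used only to show how a single aggregate view can replace one argument per sub-view.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-ins for two sub-layout views (not the CMSSW types).
@dataclass
class TracksView:
    pt: List[float]


@dataclass
class HitsView:
    charge: List[float]


# Hypothetical aggregate that holds a view to every sub-layout.
@dataclass
class BlocksView:
    tracks: TracksView
    hits: HitsView


def kernel_with_subviews(tracks: TracksView, hits: HitsView) -> float:
    # Style kept in this PR: every sub-view is a separate argument.
    return sum(tracks.pt) + sum(hits.charge)


def kernel_with_blocks(view: BlocksView) -> float:
    # Possible follow-up style: one argument gives access to all sub-views.
    return sum(view.tracks.pt) + sum(view.hits.charge)


blocks = BlocksView(TracksView(pt=[1.0, 2.0]), HitsView(charge=[0.5]))
print(kernel_with_subviews(blocks.tracks, blocks.hits))
print(kernel_with_blocks(blocks))
```

Both calls compute the same result; only the kernel signature changes, which is the kind of follow-up cleanup the description refers to.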

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

This is a backport of #49734 to 16_0_X

ATTN: @fwyzard

@felicepantaleo @leobeltra fyi

@cmsbuild
Contributor

cmsbuild commented Jan 21, 2026

A new Pull Request was created by @Electricks94 for CMSSW_16_0_X.

It involves the following packages:

  • DQM/SiPixelHeterogeneous (dqm)
  • DataFormats/HGCalDigi (simulation)
  • DataFormats/Portable (heterogeneous)
  • DataFormats/PortableTestObjects (heterogeneous)
  • DataFormats/SoATemplate (heterogeneous)
  • DataFormats/TrackSoA (heterogeneous, reconstruction)
  • DataFormats/TrackingRecHitSoA (heterogeneous, reconstruction)
  • DataFormats/VertexSoA (heterogeneous, reconstruction)
  • HeterogeneousCore/AlpakaTest (heterogeneous)
  • RecoLocalTracker/Phase2TrackerRecHits (reconstruction)
  • RecoLocalTracker/SiPixelRecHits (reconstruction)
  • RecoTauTag/HLTProducers (hlt)
  • RecoTracker/LST (reconstruction)
  • RecoTracker/LSTCore (reconstruction)
  • RecoTracker/PixelSeeding (reconstruction)
  • RecoTracker/PixelTrackFitting (reconstruction)
  • RecoVertex/PixelVertexFinding (reconstruction)

@Martin-Grunewald, @Moanwar, @civanch, @cmsbuild, @ctarricone, @fwyzard, @gabrielmscampos, @jfernan2, @kpedro88, @makortel, @mandrenguyen, @mdhildreth, @mmusich, @nothingface0, @rseidita, @srimanob can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @IzaakWN, @VinInn, @VourMa, @azotz, @dgulhan, @dkotlins, @elusian, @fabiocos, @felicepantaleo, @ferencek, @fioriNTU, @gpetruc, @idebruyn, @jandrea, @makortel, @martinamalberti, @mbluj, @missirol, @mmasciov, @mmusich, @mroguljic, @mtosi, @pfs, @rovere, @threus, @tsusa this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Jan 21, 2026

cms-bot internal usage

@fwyzard
Contributor

fwyzard commented Jan 21, 2026

backport #49734

@fwyzard
Contributor

fwyzard commented Jan 21, 2026

enable gpu

@fwyzard
Contributor

fwyzard commented Jan 21, 2026

please test

@fwyzard
Contributor

fwyzard commented Jan 28, 2026

test parameters:

  • enable = gpu
  • gpu = nvidia_t4

@fwyzard
Contributor

fwyzard commented Jan 28, 2026

The other tests seem fine.

Let's try to re-run only the tests on the NVIDIA T4 GPU ...

@fwyzard
Contributor

fwyzard commented Jan 28, 2026

please test

@fwyzard
Contributor

fwyzard commented Jan 28, 2026

urgent

aimed at CMSSW_16_0_0

@mmusich
Contributor

mmusich commented Jan 28, 2026

@cms-sw/reconstruction-l2 @cms-sw/simulation-l2 kind ping

@Moanwar
Contributor

Moanwar commented Jan 28, 2026

+1

@cmsbuild
Contributor

-1

Failed Tests: RelVals-NVIDIA_T4
Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-967374/50951/summary.html
COMMIT: c6fed3a
CMSSW: CMSSW_16_0_X_2026-01-27-2300/el8_amd64_gcc13
Additional Tests: GPU,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49882/50951/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals-NVIDIA_T4

ValueError: Undefined workflows: 17034.422, 17034.403, 17034.406, 17034.412, 17034.402, 17034.423, 29834.402, 29834.403, 29834.404, 29834.704, 29834.751

Comparison Summary

Summary:

  • You potentially removed 2 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 55
  • DQMHistoTests: Total histograms compared: 4382508
  • DQMHistoTests: Total failures: 114
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4382374
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 54 files compared)
  • Checked 235 log files, 208 edm output root files, 55 DQM output files
  • TriggerResults: no differences found

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 8 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 2025.0000001_RunZeroBias2025B_10k step2 max memory diff 531.9 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0000001_RunZeroBias2025B_10k step3 max memory diff 492.5 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step2 max memory diff 531.9 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step3 max memory diff 488.5 exceeds +/- 90.0 MiB
  • Error: Workflow 18434.0_TTbar_14TeV+2026 step3 max memory diff 488.2 exceeds +/- 90.0 MiB
  • Error: Workflow 18434.0_TTbar_14TeV+2026 step2 max memory diff 532.8 exceeds +/- 90.0 MiB
  • Error: Workflow 18634.0_TTbar_14TeV+2026PU step3 max memory diff 552.5 exceeds +/- 90.0 MiB
  • Error: Workflow 18634.0_TTbar_14TeV+2026PU step2 max memory diff 532.8 exceeds +/- 90.0 MiB
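
As a rough illustration of how the entries above can be read, the sketch below parses two of the reported lines and applies a symmetric threshold check. The regular expression, the function names, and the assumption that "+/-" means `abs(diff) > threshold` are mine; this is not the bot's actual code.

```python
import re

# Two report lines copied from the list above.
REPORT = """\
Error: Workflow 2025.0000001_RunZeroBias2025B_10k step2 max memory diff 531.9 exceeds +/- 90.0 MiB
Error: Workflow 18434.0_TTbar_14TeV+2026 step3 max memory diff 488.2 exceeds +/- 90.0 MiB
"""

# Hypothetical pattern for one report line.
LINE_RE = re.compile(
    r"Workflow (?P<workflow>\S+) (?P<step>step\d+) max memory diff "
    r"(?P<diff>-?\d+(?:\.\d+)?) exceeds \+/- (?P<threshold>\d+(?:\.\d+)?) MiB"
)


def parse_report(text):
    """Return (workflow, step, diff_mib, threshold_mib) for each matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append(
                (m["workflow"], m["step"], float(m["diff"]), float(m["threshold"]))
            )
    return rows


def exceeds(diff_mib, threshold_mib):
    # Assumption: the check is symmetric, so large decreases also count.
    return abs(diff_mib) > threshold_mib


for workflow, step, diff, threshold in parse_report(REPORT):
    print(workflow, step, diff, exceeds(diff, threshold))
```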

@fwyzard
Contributor

fwyzard commented Jan 28, 2026

Does anyone know why the bot tests failed with

ValueError: Undefined workflows: 17034.422, 17034.403, 17034.406, 17034.412, 17034.402, 17034.423, 29834.402, 29834.403, 29834.404, 29834.704, 29834.751

?

@kpedro88
Contributor

https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-967374/50951/runTheMatrixNVIDIA_T4-results/matrixTestsNVIDIA_T4.log includes:

ignoring relval_gpu from default matrix

which is where these workflows are defined. But that seems obviously wrong given the test parameters provided... @smuzaffar ?

@smuzaffar
Contributor

There was a \r character in cms-sw/cms-bot@178b963 (the file for on-demand GPU flavors); because of that, the bot failed to match nvidia_t4 as a valid GPU flavor and did not pass the -w gpu option to runTheMatrix. This is fixed now, and I have restarted the T4 relval job (https://cmssdt.cern.ch/jenkins/job/ib-run-pr-relvals/62036/console), which now properly uses -w gpu.
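
The failure mode described above can be reproduced in a few lines. This is a sketch under assumptions: the file content, the name `example_flavor`, and all three functions are hypothetical, and the real cms-bot parsing may look different.

```python
# Hypothetical file content: one GPU flavor per line, with a stray
# carriage return after the first entry (as left by a CRLF line ending).
RAW_FLAVORS = "nvidia_t4\r\nexample_flavor\n"


def parse_naive(text):
    # Splitting only on "\n" keeps the "\r" attached to the entry.
    return [line for line in text.split("\n") if line]


def parse_robust(text):
    # str.splitlines() treats "\r\n", "\r" and "\n" as line boundaries;
    # strip() additionally removes surrounding whitespace.
    return [line.strip() for line in text.splitlines() if line.strip()]


def wants_gpu_workflows(requested, valid_flavors):
    # Stand-in for the decision to pass "-w gpu" to runTheMatrix.
    return requested in valid_flavors


print(parse_naive(RAW_FLAVORS))
print(wants_gpu_workflows("nvidia_t4", parse_naive(RAW_FLAVORS)))
print(wants_gpu_workflows("nvidia_t4", parse_robust(RAW_FLAVORS)))
```

With the naive parser the stored entry is `"nvidia_t4\r"`, so the comparison against `"nvidia_t4"` fails even though the two look identical when printed in most logs.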

@fwyzard
Contributor

fwyzard commented Jan 28, 2026

Thanks @smuzaffar 🙇🏻‍♂️

@cmsbuild
Contributor

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-967374/50951/summary.html
COMMIT: c6fed3a
CMSSW: CMSSW_16_0_X_2026-01-27-2300/el8_amd64_gcc13
Additional Tests: GPU,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49882/50951/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 2 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 55
  • DQMHistoTests: Total histograms compared: 4382508
  • DQMHistoTests: Total failures: 114
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4382374
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 54 files compared)
  • Checked 235 log files, 208 edm output root files, 55 DQM output files
  • TriggerResults: no differences found

NVIDIA_T4 Comparison Summary

Summary:

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 8 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 2025.0000001_RunZeroBias2025B_10k step2 max memory diff 531.9 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0000001_RunZeroBias2025B_10k step3 max memory diff 492.5 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step2 max memory diff 531.9 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step3 max memory diff 488.5 exceeds +/- 90.0 MiB
  • Error: Workflow 18434.0_TTbar_14TeV+2026 step3 max memory diff 488.2 exceeds +/- 90.0 MiB
  • Error: Workflow 18434.0_TTbar_14TeV+2026 step2 max memory diff 532.8 exceeds +/- 90.0 MiB
  • Error: Workflow 18634.0_TTbar_14TeV+2026PU step3 max memory diff 552.5 exceeds +/- 90.0 MiB
  • Error: Workflow 18634.0_TTbar_14TeV+2026PU step2 max memory diff 532.8 exceeds +/- 90.0 MiB

@fwyzard
Contributor

fwyzard commented Jan 28, 2026

@cms-sw/orp-l2 IMHO the PR is ready to be merged.

@ftenchini

+1

@ftenchini

merge

@cmsbuild

cmsbuild merged commit 81e8d53 into cms-sw:CMSSW_16_0_X Jan 28, 2026
23 checks passed

@mmusich
Contributor

mmusich commented Jan 29, 2026

After this PR was merged, we observe a failure in TSG integration tests in the IB CMSSW_16_0_X_2026-01-28-2300 (see log).

...
05:55:51 hltIntegrationTests OnLine_HLT_GRun.py -x realData=0 -x globalTag=@ -d HLT_Integration_GRun_MC -i file:../RelVal_Raw_GRun_MC.root -n 100 -j 8 -a cpu >& HLT_Integration_GRun_MC.log
13656.188u 3939.475s 46:54.87 625.0%	0+0k 13776+2382552io 749pf+0w
06:42:46 exit status: 1

This generally means that running one of the paths standalone gives different results compared to running it within the whole HLT menu.
As the affected path seems to be HLT_Integration_GRun_MC_HLT_SinglePNetTauhPFJet130_Tight_L2NN_eta2p3_v11, it's not unreasonable to think this is due to the changes in RecoTauTag/HLTProducers.

@fwyzard
Contributor

fwyzard commented Jan 29, 2026

What about 16.1.x ?

@mmusich
Contributor

mmusich commented Jan 29, 2026

What about 16.1.x ?

no failures -- so far.
