SiStripClusterizer, an Alpaka port of the CUDA ClustersFromRawProducerGPU #47629
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47629/44152
Sorry, a very naive first trial of this via:
#!/bin/bash -ex
# cmsrel CMSSW_15_1_X_2025-03-18-2300
# cd CMSSW_15_1_X_2025-03-18-2300/src/
# cmsenv
# git cms-merge-topic 47629
# scram b -j 20
hltGetConfiguration /dev/CMSSW_15_0_0/GRun \
--globaltag 150X_dataRun3_HLT_v1 \
--data \
--unprescale \
--output minimal \
--max-events 100 \
--eras Run3_2024 --l1-emulator uGT --l1 L1Menu_Collisions2024_v1_3_0_xml \
--customise RecoLocalTracker/SiStripClusterizer/customizeStripClustersFromRaw.customizeHLTStripClustersFromRaw_alpaka \
--input /store/data/Run2024I/EphemeralHLTPhysics0/RAW/v1/000/386/593/00000/91a08676-199e-404c-9957-f72772ef1354.root \
> hltData.py
cmsRun hltData.py >& hltData.log
leads me to:
Was this ever tested on recent Run 3 real data? @cms-sw/trk-dpg-l2 FYI
This is the same behaviour that we found when running on real data today. After this independent check (thank you), I am afraid that another bullet point must be added: at the very least, the raw->digi kernel must be able to unpack ZS_LITE8 data. When I ran tests, I always used MC data, as for example
As far as I know (all of this information has been cross-checked with Tracker Ops):
Hi @dan131riley, can you comment on your motivation for the ZS readout mode used in your original PR with the CUDA implementation? Thank you! Pietro
This is in Italian, not very useful.
It is true: as you noted, the earlier discussion on the Mattermost channel is in Italian. It covers the steps from the CUDA version to the Alpaka one, leading to a 1:1 match of the CUDA product in Alpaka. This module is the subject of the PR. However, the PR is in draft mode because, as the previous comments made clear, integrating this module requires going beyond a 1:1 port of the CUDA code. In fact, the FED raw unpacking must support at least the 8-bit ZS mode. Because the material in the channel so far (indeed in Italian) concerns the relatively trivial stages of the CUDA-to-Alpaka porting, I shared the link to the channel directly. I took for granted that the discussion with you experts about the additional features for this PR would be in English by default. Now that this background is clearer, would you recommend creating a fresh channel from scratch?
I think it is more practical to discuss the code specifics that need updates in an existing PR thread (here). Maybe the Mattermost (MM) channel can still be useful for some future discussion.
assign heterogeneous
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47629/44214
please test
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47629/47429
Pull request #47629 was updated. @Moanwar, @jfernan2, @mandrenguyen, @srimanob can you please check and sign again.
+1
This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @ftenchini, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)
-1 Failed tests: RelVals, RelVals-NVIDIA_T4
ignore tests-rejected with ib-failure
+1
@pietroGru can you please prepare a backport for 16.0.X?
- Kernels work divider of 256u from optimization (cms-sw#47629 (comment))
- Bug fix for events with `nStrips` = 0 (https://gist.github.com/mmusich/a7928a000b4eb6ea00ac5ab9cfa2238e)
  - Fix for packet code of non-lite ZS buffers (https://gist.github.com/mmusich/a7928a000b4eb6ea00ac5ab9cfa2238e)
- Use Acc1D directly (cms-sw#47629 (comment))
- Fix errors in static analysis (cms-sw#47629 (comment))
Co-authored-by: Andrea Bocci <[email protected]>
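The "256u" work divider in the first commit item most likely sets the block size for the 1D strip kernels. Under that assumption, the number of blocks needed to cover all strips is an integer ceiling division. The minimal sketch below shows only that arithmetic. The names `divide_up_by` and `work_division` are illustrative Python, not the alpaka or CMSSW C++ API.

```python
from typing import Tuple


def divide_up_by(value: int, divisor: int) -> int:
    """Integer ceiling division: smallest n with n * divisor >= value."""
    return (value + divisor - 1) // divisor


# Assumption: "256u" is the number of threads per block for the strip kernels.
THREADS_PER_BLOCK = 256


def work_division(n_strips: int) -> Tuple[int, int]:
    """Hypothetical helper: (blocks, threads_per_block) covering n_strips."""
    return divide_up_by(n_strips, THREADS_PER_BLOCK), THREADS_PER_BLOCK


for n in (0, 1, 256, 257, 100_000):
    print(n, work_division(n))
```

An empty event (`n_strips = 0`) yields a grid of zero blocks, a corner case that any kernel-launch code has to handle explicitly.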
PR 47629
Description
Pull request to merge a heterogeneous implementation of the legacy silicon-strip unpacker/clusterizer module `SiStripClusterizerFromRaw`, whose purpose is to produce strip clusters from raw FED data. The legacy and heterogeneous implementations share both the unpacking and the clustering algorithms. The heterogeneous version adds the generalizations needed for a parallel implementation.
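Unpacking turns FED raw bytes into (strip, ADC) digis. The sketch below models this for a single channel, assuming a hypothetical byte-aligned 8-bit zero-suppressed layout. In that layout, each record is a first-strip byte, a width byte, then one ADC byte per strip. Real SiStrip FED buffers also carry headers and per-channel lengths. Some readout modes use other ADC bit widths, and the buffers follow a specific word ordering. None of this is modelled here, and `unpack_zs_channel` is an illustrative name, not a CMSSW function.

```python
from typing import List, Tuple


def unpack_zs_channel(payload: bytes) -> List[Tuple[int, int]]:
    """Decode one channel's payload into (strip, ADC) pairs.

    Assumed, simplified record layout (hypothetical):
      [first strip: 1 byte][width: 1 byte][width x 8-bit ADC values]
    """
    digis = []
    pos = 0
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated cluster header")
        first_strip, width = payload[pos], payload[pos + 1]  # indexing bytes gives ints
        pos += 2
        if pos + width > len(payload):
            raise ValueError("truncated ADC block")
        digis.extend((first_strip + k, payload[pos + k]) for k in range(width))
        pos += width
    return digis


print(unpack_zs_channel(bytes([10, 3, 20, 40, 25, 100, 1, 9])))
# [(10, 20), (11, 40), (12, 25), (100, 9)]
```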
Overview of the implementation
Details of the heterogeneous implementation can be found in [1], while a summary is presented below. The parallel implementation consists of:
- An ESProducer called `SiStripClusterizerConditionsESProducerAlpaka`, analogous to the legacy `SiStripClusterizerConditionsESProducer`. It reshuffles the strip conditions (good strips, cabling, etc.) into a portable format that is more convenient for the parallel algorithm;
- `SiStripRawToCluster` (Alpaka `stream::SynchronizingEDProducer`). It takes the input `RawFEDCollection` and unpacks the FED raw bytes into (strip, ADC) pairs, henceforth referred to as strip digis. It then clusterizes the strip digis according to the ThreeThreshold algorithm. Finally, the resulting digi and cluster data are exported as SoA collections;
- `SiStripClustersToLegacy` (edm `global::EDProducer`), a CPU producer that converts the cluster and digi SoA collections into the legacy format used downstream by other modules (`DetSetVector<SiStripCluster>`, with the digi amplitudes stored as `SiStripCluster` members).
Two DataFormats are introduced:
- `SiStripClusterSoA`, which contains the data members of the legacy `SiStripCluster` class reshuffled into a PortableCollection SoA;
- `SiStripDigiSoA`, as above but for the `SiStripDigi` class.
MaxSeedStrips
As explained in [1], the parallel implementation of the clusterizer must pre-allocate the cluster-candidate collection (a host operation) before the total number of cluster candidates is known. Therefore, the `MaxSeedStrips` parameter is introduced in the parallel version. It sets the maximum number of cluster candidates that can be produced per event and is configurable during module setup. The sensitivity of the produced clusters to this parameter is investigated with a PU MC run [2]. The default value is $2 \times 10^5$. The parameter can be customized with
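To make the role of `MaxSeedStrips` concrete, here is a serial, simplified sketch of ThreeThreshold-style clustering. The seed candidates go into a buffer whose capacity is fixed before any seed is counted, so candidates beyond that capacity are dropped. The channel, seed and cluster cuts (2, 3 and 5 times the strip noise) are assumed typical values. Holes, bad strips and gains are ignored. The function names are hypothetical, and the code does not reproduce the parallel kernels of this PR.

```python
import math
from dataclasses import dataclass
from typing import List

# Assumed ThreeThreshold cuts, in units of the per-strip noise.
CHANNEL_THRESHOLD = 2.0
SEED_THRESHOLD = 3.0
CLUSTER_THRESHOLD = 5.0


@dataclass
class Cluster:
    first_strip: int
    amplitudes: List[int]


def find_seed_candidates(adc, noise, max_seed_strips):
    """Write seed strips into a buffer whose size is fixed in advance."""
    buffer = [-1] * max_seed_strips  # allocated before any seed is counted
    n = 0
    for i, (a, s) in enumerate(zip(adc, noise)):
        if a >= SEED_THRESHOLD * s:
            if n == max_seed_strips:
                break  # capacity reached: remaining candidates are dropped
            buffer[n] = i
            n += 1
    return buffer[:n]


def cluster_strips(adc, noise, max_seed_strips):
    """Grow each seed over neighbours above the channel cut, then apply the cluster cut."""
    clusters = []
    last_right = -1  # right edge of the previously built cluster
    for seed in find_seed_candidates(adc, noise, max_seed_strips):
        if seed <= last_right:
            continue  # seed already inside the previous cluster
        left = right = seed
        while left > 0 and adc[left - 1] >= CHANNEL_THRESHOLD * noise[left - 1]:
            left -= 1
        while right + 1 < len(adc) and adc[right + 1] >= CHANNEL_THRESHOLD * noise[right + 1]:
            right += 1
        last_right = right
        charge = sum(adc[left:right + 1])
        noise_quadrature = math.sqrt(sum(s * s for s in noise[left:right + 1]))
        if charge >= CLUSTER_THRESHOLD * noise_quadrature:
            clusters.append(Cluster(left, adc[left:right + 1]))
    return clusters


adc = [0, 5, 0, 0, 7, 0]
noise = [1.0] * len(adc)
print(len(cluster_strips(adc, noise, max_seed_strips=2)))  # 2
print(len(cluster_strips(adc, noise, max_seed_strips=1)))  # 1
```

Lowering the capacity from 2 to 1 drops the second seed and, with it, the second cluster. The sensitivity study in [2] probes this kind of loss.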
Physics validation
The heterogeneous module is validated by comparing the track quality with respect to legacy and by looking at the strip DQM plots [7]. The validation uses 9400 events from the dataset `/RelValTTbar_14TeV/CMSSW_15_0_0-PU_142X_mcRun3_2025_realistic_v7_STD_2025_PU-v3/GEN-SIM-DIGI-RAW` with the menu `/dev/CMSSW_15_0_0/GRun/V76`, running only the `MC_ReducedIterativeTracking_v24` path in `CMSSW_15_0_6`. Validation plots are available in [8], and the recipe (and files) to reproduce them are reported below.
Recipe
Steps
Preparation of the step 1 (`hltx_trackingOnly_MC_1.py`) configuration files:
Step 2, the `HLTRACKVALIDATOR` process, is available in [9].
Step 3 is the standard harvester for `hltMerged` [10].
For example, the following script is used to generate the legacy/alpakaGPU/alpakaSerial files:
Files
Early validation (in CMSSW_15_1_0_pre1)
An early validation was done on a smaller number of events (731) of PU MC data from `/store/mc/Run3Winter25Digi/TT_TuneCP5_13p6TeV_powheg-pythia8/GEN-SIM-DIGI-RAW/TrkFEVT_142X_mcRun3_2025_realistic_v7-v2/910002/0157fbd9-e915-4a10-bfe2-11db61e2b70d.root`, in `CMSSW_15_1_0_pre1` with `/dev/CMSSW_15_1_0/GRun/V3`, running only the `MC_ReducedIterativeTracking_v22` path [5]. Deviations were found between the legacy and heterogeneous versions (see e.g. [5.1]) and discussed at the Tracking POG meeting [6]. The heterogeneous version shows an excess of O(1) clusters with respect to legacy, occurring in O(12/500) events. During [6], the tracking experts judged this legacy-heterogeneous deviation to have a negligible impact on performance.
Timing
The most recent timing measurement is reported below. It uses Run2025 data (run 392642) with the L1 menu `L1Menu_Collisions2025_v1_1_1-d2` and the HLT menu `/dev/CMSSW_15_0_0/GRun/V76`, with GlobalTag `150X_dataRun3_HLT_v1`, in the `CMSSW_15_0_6` release.
Unpacking+clustering (+legacy conversion):
- Legacy [D]: hltSiStripRawToClustersFacility (34.2 ms)
- Alpaka, GPU [E]: hltSiStripRawToClustersFacilityAlpaka (6.0 ms) + hltSiStripRawToClustersFacility (7.5 ms)
- Alpaka, serial CPU [F]: hltSiStripRawToClustersFacilityAlpaka (59.5 ms) + hltSiStripRawToClustersFacility (9.1 ms)
The measurements were done on the timing server one after the other. The jobs were launched in area mode from lxplus8, from a fresh CMSSW area with the PR rebased on 15_0_6.
Previous measurements (Run2024)
Using Run2024 data (run 392642) with the L1 menu `L1Menu_Collisions2024_v1_3_0-d1_xml` and the HLT menu `/dev/CMSSW_15_0_0/GRun/V79`, with GlobalTag `150X_dataRun3_HLT_v1`, in the `CMSSW_15_0_6` release.
Unpacking+clustering (+legacy conversion):
- Legacy [A]: hltSiStripRawToClustersFacility (32.1 ms)
- Alpaka, serial CPU [B]: hltSiStripRawToClustersFacilityAlpaka (46.9 ms) + hltSiStripRawToClustersFacility (7.9 ms)
- Alpaka, GPU [C]: hltSiStripRawToClustersFacilityAlpaka (5.8 ms) + hltSiStripRawToClustersFacility (7.7 ms)
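The per-variant totals and speed-ups follow directly from the two measurements above. This small sketch assumes, as the timing-report names in [A]-[F] suggest, that the ~6 ms Alpaka rows are GPU runs and the ~47-60 ms rows are serial CPU runs. In each heterogeneous row, the legacy-format conversion (`hltSiStripRawToClustersFacility`) is added to the Alpaka producer time.

```python
# Times in ms, copied from the two measurements above. For the heterogeneous
# variants: (Alpaka producer, legacy-format conversion).
# Assumption: the ~6 ms Alpaka rows are GPU runs, the others serial CPU runs.
TIMINGS = {
    "Run2025": {"legacy": 34.2, "alpaka_gpu": (6.0, 7.5), "alpaka_serial": (59.5, 9.1)},
    "Run2024": {"legacy": 32.1, "alpaka_gpu": (5.8, 7.7), "alpaka_serial": (46.9, 7.9)},
}


def summarize(entry):
    """Total time per heterogeneous variant and its speed-up w.r.t. legacy."""
    legacy = entry["legacy"]
    result = {}
    for variant in ("alpaka_gpu", "alpaka_serial"):
        total = sum(entry[variant])
        result[variant] = (round(total, 1), round(legacy / total, 2))
    return result


for run, entry in TIMINGS.items():
    print(run, summarize(entry))
```

Both GPU totals come to about 13.5 ms, roughly 2.4 to 2.5 times faster than legacy. The serial CPU backend is slower than legacy, with speed-ups of 0.50 and 0.59.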
Usage and customizer
Compilation
To create a `CMSSW_15_0_1_pre1` area:
scram project CMSSW_15_0_1_pre1
cd CMSSW_15_0_1_pre1/src
cmsenv
git cms-merge-topic pietroGru:47629
scram b
Timing measurements were done after rebasing in `CMSSW_15_0_6`:
scram project CMSSW_15_0_6
cd CMSSW_15_0_6/src
cmsenv
git cms-merge-topic --old-base CMSSW_15_0_1_pre1 pietroGru:47629
scram b
Customizer
A customizer called `customizeHLTStripClustersFromRaw_alpaka` replaces the legacy module with the heterogeneous one through the following operations:
(a) it attaches to `process` the `SiStripClusterizerConditionsESProducerAlpaka` ESProducer, which generates the device conditions;
(b) it attaches to `process` the cluster producer (`process.hltSiStripRawToClustersFacilityAlpaka`), which inherits all the arguments of the legacy module from `process.hltSiStripRawToClustersFacility`;
(c) it replaces the legacy `process.hltSiStripRawToClustersFacility` with the module that converts the cluster SoA into the legacy format (i.e., what the legacy module outputs).
For example
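Step (c) relies on turning the flat cluster SoA back into the per-detector legacy layout. Below is a minimal sketch of that regrouping. It assumes a hypothetical SoA with `det_id`, `first_strip` and `size` columns, plus a concatenated amplitude array indexed through a running offset. The real `SiStripClusterSoA` columns, and the way `SiStripClustersToLegacy` builds the `DetSetVector<SiStripCluster>`, may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ClusterSoA:
    """Hypothetical flat layout: entry i of each column describes cluster i."""
    det_id: List[int]
    first_strip: List[int]
    size: List[int]
    amplitudes: List[int]  # ADCs of all clusters, concatenated in cluster order


def to_legacy(soa: ClusterSoA) -> Dict[int, List[Tuple[int, Tuple[int, ...]]]]:
    """Regroup clusters per detector, DetSetVector-style (sorted by det id)."""
    per_det: Dict[int, List[Tuple[int, Tuple[int, ...]]]] = {}
    offset = 0
    for det, first, n in zip(soa.det_id, soa.first_strip, soa.size):
        adcs = tuple(soa.amplitudes[offset:offset + n])
        per_det.setdefault(det, []).append((first, adcs))
        offset += n  # the next cluster's ADCs start right after this one's
    return dict(sorted(per_det.items()))


soa = ClusterSoA(det_id=[9, 7, 7], first_strip=[3, 10, 40],
                 size=[3, 2, 1], amplitudes=[1, 2, 3, 5, 6, 8])
print(to_legacy(soa))
# {7: [(10, (5, 6)), (40, (8,))], 9: [(3, (1, 2, 3))]}
```

Keeping the legacy label for the converter means downstream consumers of `hltSiStripRawToClustersFacility` still receive the legacy format unchanged.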
To run the module serially on CPU
The following can be appended to the `hlt.py`:
History
CMSSW_14_2_0, then moved to CMSSW_15_0_1_pre1
References
[1] https://indico.cern.ch/event/1554466/#68-update-on-alpaka-strip-unpa
[2] using /store/mc/Run3Winter25Digi/TT_TuneCP5_13p6TeV_powheg-pythia8: https://pgrutta.web.cern.ch/siStripClusterizer_1510pre1/validation_v4_compare/clustersMonitor.html
[3] #34618
[4] https://mattermost.web.cern.ch/cms-exp/channels/sistrip-unpacking-on-gpu
[5] https://pgrutta.web.cern.ch/siStripClusterizer_1510pre1/validation_v2/
[5.1] https://pgrutta.web.cern.ch/siStripClusterizer_1510pre1/validation_v2/plots_hlt_hltMerged/effandfakePtEtaPhi.pdf
[6] https://indico.cern.ch/event/1549492/#66-strip-detector-unpacking-on
[7] DQM/HLTEvF/python/HLTSiStripMonitoring_cff.py
[8] https://pgrutta.web.cern.ch/siStripClusterizer_1510pre1/validation_v4.2
[9] https://pgrutta.web.cern.ch/siStripClusterizer_1510pre1/config/hltValidation_default.py
[10] https://pgrutta.web.cern.ch/siStripClusterizer_1510pre1/config/Harvesting.py, https://pgrutta.web.cern.ch/siStripClusterizer_1510pre1/config/Harvesting_all.py
[11] https://indico.cern.ch/event/1567945/
[12] https://github.com/pietroGru/cmssw/tree/backup/siStripClusterizer_1510pre1_preSquash
[13] https://pgrutta.web.cern.ch/siStripClusterizer_1510pre1/backup_47629.pdf
[A] https://cmshlttiming.app.cern.ch/display/pgrutta/CMSSW_15_0_6_LegacyGlobal_16.20250528_183634
[B] https://cmshlttiming.app.cern.ch/display/pgrutta/CMSSW_15_0_6_Heterogeneous_Serial_16.20250528_163330
[C] https://cmshlttiming.app.cern.ch/display/pgrutta/CMSSW_15_0_6_Heterogeneous_16.20250528_163207
[D] https://cmshlttiming.app.cern.ch/display/pgrutta/CMSSW_15_0_6_LegacyBaseline.20250725_151017
[E] https://cmshlttiming.app.cern.ch/display/pgrutta/CMSSW_15_0_6_Heterogeneous.20250725_150923
[F] https://cmshlttiming.app.cern.ch/display/pgrutta/CMSSW_15_0_6_Heterogeneous_serial.20250725_150820