
Revision of the PSimHit type storage, revert trackId to its basic meaning#49969

Open
fabiocos wants to merge 9 commits into cms-sw:master from fabiocos:fc-psimhit-20260122

@fabiocos
Contributor

PR description:

Following #49732 and the discussion in the Simulation meeting https://indico.cern.ch/event/1634520/contributions/6878108/attachments/3199713/5699141/SIM_20260116.pdf , this PR proposes a revision of the mechanism used to store and propagate the hit type classification, so far used only by MTD for Phase2, but in principle applicable to any interested sub-detector.

trackId is reverted to the pure Geant4 track id, with no offset and no limit on its maximum value beyond the range of uint32_t, so as not to interfere with exceptionally populated events or with software developments for GPUs.
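To illustrate why an offset-based encoding limits trackId, here is a minimal sketch of the legacy MTD scheme as it can be inferred from the schema-evolution rules discussed later in this thread (stored id = Geant4 id + hitType × 200000000). Only the 200000000 constant comes from those rules; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Offset used by the legacy MTD encoding, as it appears in the (commented-out)
// ioread rules: trackId % 200000000 and trackId / 200000000.
constexpr std::uint32_t kLegacyOffset = 200000000u;

// Hypothetical helper: fold a hit type into the track id (legacy scheme).
constexpr std::uint32_t legacyEncode(std::uint32_t g4Id, std::uint32_t hitType) {
  return g4Id + hitType * kLegacyOffset;
}

// Hypothetical helpers: split a legacy track id back into its two fields.
constexpr std::uint32_t legacyG4Id(std::uint32_t stored) { return stored % kLegacyOffset; }
constexpr std::uint32_t legacyHitType(std::uint32_t stored) { return stored / kLegacyOffset; }

// The round trip works only while the Geant4 id stays below the offset...
static_assert(legacyG4Id(legacyEncode(12345u, 2u)) == 12345u);
static_assert(legacyHitType(legacyEncode(12345u, 2u)) == 2u);

// ...but a Geant4 id >= 200000000 aliases a smaller id with a different hit type.
static_assert(legacyEncode(200000005u, 0u) == legacyEncode(5u, 1u));
```

Storing the hit type elsewhere removes this aliasing and leaves the full uint32_t range to the Geant4 id.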

The hit type is moved to a 7-bit subfield of the processType member of PSimHit. The process type is a user-defined integer, not a pseudo-random variable, and its maximum code is at present 403. The interface is adapted, the hit-type assignment in MTD is adjusted accordingly, and all dependent code (truth accumulator, validation) is updated.
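A minimal sketch of such a bit layout, assuming (consistently with the `<< 9` shift in the schema-evolution rules discussed later in this thread) that processType is a 16-bit unsigned short holding the process code in its low 9 bits, enough for codes up to 511, and the 7-bit hit type in the remaining upper bits. The helper names are hypothetical and are not the actual PSimHit interface.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the 16-bit processType word:
//   bits 0-8  : process code (0..511; the current maximum is 403)
//   bits 9-15 : hit type     (0..127, 7 bits)
constexpr unsigned kProcessBits = 9;
constexpr std::uint16_t kProcessMask = (1u << kProcessBits) - 1;  // 0x01FF
constexpr std::uint16_t kHitTypeMask = 0x7F;                       // 7 bits

// Hypothetical helper: combine a process code and a hit type into one word.
constexpr std::uint16_t packProcessType(std::uint16_t process, std::uint16_t hitType) {
  return static_cast<std::uint16_t>((process & kProcessMask) |
                                    ((hitType & kHitTypeMask) << kProcessBits));
}

// Hypothetical accessors: extract each subfield again.
constexpr std::uint16_t processOf(std::uint16_t word) {
  return static_cast<std::uint16_t>(word & kProcessMask);
}
constexpr std::uint16_t hitTypeOf(std::uint16_t word) {
  return static_cast<std::uint16_t>((word >> kProcessBits) & kHitTypeMask);
}

// Both extremes fit without overlapping.
static_assert(processOf(packProcessType(403, 127)) == 403);
static_assert(hitTypeOf(packProcessType(403, 127)) == 127);
```

Hits that never set a hit type keep the upper 7 bits at zero, so their processType value is unchanged.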

PR validation:

Tests on 100 single-pion events (wf 34506.0) are successful: the detailed Geant4 debug printout, when activated, shows the desired behaviour, and the usual DQM histogram comparison shows full agreement.

@cmsbuild
Contributor

cmsbuild commented Jan 28, 2026

cms-bot internal usage

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49969/47752

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Contributor

A new Pull Request was created by @fabiocos for master.

It involves the following packages:

  • RecoMTD/TimingIDTools (reconstruction)
  • SimDataFormats/CaloAnalysis (simulation)
  • SimDataFormats/TrackingHit (simulation)
  • SimG4CMS/Forward (simulation)
  • SimG4Core/Application (simulation)
  • SimG4Core/Notification (simulation)
  • SimGeneral/CaloAnalysis (simulation)
  • Validation/MtdValidation (dqm)

@Moanwar, @civanch, @cmsbuild, @ctarricone, @gabrielmscampos, @jfernan2, @kpedro88, @mandrenguyen, @mdhildreth, @nothingface0, @rseidita, @srimanob can you please review it and eventually sign? Thanks.
@ReyerBand, @VinInn, @VourMa, @apsallid, @bsunanda, @denizsun, @elusian, @felicepantaleo, @makortel, @martinamalberti, @missirol, @mmasciov, @mmusich, @mtosi, @rovere, @salimcerci, @slomeo, @thomreis, @wang0jin this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@fabiocos
Contributor Author

@civanch @kpedro88 this is a possible solution to the problem of removing the limits on trackId that preserves the existing functionality. Please comment.

@fabiocos
Contributor Author

please test

@cmsbuild
Contributor

-1

Failed Tests: UnitTests RelVals RelVals-INPUT
Size: This PR adds an extra 116KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-cea6a3/50974/summary.html
COMMIT: 5606fe1
CMSSW: CMSSW_16_1_X_2026-01-28-1100/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49969/50974/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed Unit Tests

I found 1 errors in the following unit tests:

---> test testPhase2PixelNtuple had ERRORS

Failed RelVals

  • 25202.0: A fatal system signal has occurred: segmentation violation
  • 14234.0: A fatal system signal has occurred: segmentation violation
  • 312.0: A fatal system signal has occurred: segmentation violation

Failed RelVals-INPUT

  • 11024.2: 11024.2_TTbar_13UP18HEfailINPUT/step2_TTbar_13UP18HEfailINPUT.log

@fabiocos
Contributor Author

There is apparently a problem in the MixingModule, likely when reading old PSimHit collections. The backward-compatibility mechanism needs to be verified.

@cmsbuild
Contributor

Pull request #49969 was updated. @Moanwar, @civanch, @cmsbuild, @ctarricone, @gabrielmscampos, @jfernan2, @kpedro88, @mandrenguyen, @mdhildreth, @nothingface0, @rseidita, @srimanob can you please check and sign again.

@fabiocos
Contributor Author

please test with cms-sw/cmsdist#10212

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-cea6a3/51342/summary.html
COMMIT: e5d8dbb
CMSSW: CMSSW_16_1_X_2026-02-15-0000/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/49969/51342/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed Unit Tests

I found 1 errors in the following unit tests:

---> test testDiMuonBiasesPlotting had ERRORS

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4167466
  • DQMHistoTests: Total failures: 154
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4167292
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
  • Checked 227 log files, 198 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 1 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step2 max memory diff -92.8 exceeds +/- 90.0 MiB

@fabiocos
Contributor Author

fabiocos commented Feb 20, 2026

I did some tests to investigate the crash observed when reading back old PSimHit collections. A rule that manipulates the output without accessing the onfile.xxx members, using only the newObj pointer, like

   <ioread sourceClass = "PSimHit" version="[10]" targetClass="PSimHit" source="" target="theTrackId">
     <![CDATA[
     newObj->setTrackId(999);
    ]]>
   </ioread>

lets the job run to the end, but then produces a crash. Conversely, using the old object content directly:

   <ioread sourceClass = "PSimHit" version="[10]" targetClass="PSimHit" source="unsigned int theTrackId" target="theTrackId">
     <![CDATA[
     newObj->setTrackId(onfile.theTrackId);
    ]]>
   </ioread>

produces an immediate failure. I tried to use newObj following the example and discussion about SimTrack in #47682; ideally, the implementation (now commented out) would be:

   <!--<ioread sourceClass = "PSimHit" version="[10]" targetClass="PSimHit" source="unsigned int theTrackId; unsigned int theDetUnitId" target="theTrackId">-->                                                                                                                                                              
     <!--<![CDATA[-->
     <!--if ((onfile.theDetUnitId >> 25) == 0x31) { newObj->setTrackId(onfile.theTrackId % 200000000); }-->
    <!--]]>-->
   <!--</ioread>-->
   <!--<ioread sourceClass = "PSimHit" version="[10]" targetClass="PSimHit" source="unsigned int theTrackId; unsigned int theDetUnitId; unsigned short theProcessType" target="theProcessType">-->                                                                                                                           
     <!--<![CDATA[-->
     <!--if ((onfile.theDetUnitId >> 25) == 0x31) {unsigned short tmp = onfile.theProcessType; tmp |= (onfile.theTrackId / 200000000) << 9 ; newObj->setHitProdType(tmp);}-->
    <!--]]>-->
   <!--</ioread>-->
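The logic these two commented-out rules aim at can be unit-tested outside ROOT with a plain function. A minimal sketch, assuming the constants used in the rules: MTD hits are selected by (theDetUnitId >> 25) == 0x31 (top 4 bits equal to 6, next 3 bits equal to 1 in the DetId layout), the legacy offset is 200000000, and the hit type is shifted by 9 bits into processType. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain-data mirror of the three persistent fields the rules read.
struct LegacyHitFields {
  std::uint32_t trackId;      // legacy MTD: Geant4 id + hitType * 200000000
  std::uint32_t detUnitId;    // raw DetId
  std::uint16_t processType;  // legacy: process code only
};

// Constants copied from the commented-out ioread rules.
constexpr std::uint32_t kLegacyOffset = 200000000u;
constexpr std::uint32_t kMtdTopBits = 0x31;  // value of (detUnitId >> 25) for MTD
constexpr unsigned kHitTypeShift = 9;

constexpr bool isMtd(std::uint32_t detUnitId) { return (detUnitId >> 25) == kMtdTopBits; }

// Hypothetical converter (C++14): returns the fields as the new schema stores them.
constexpr LegacyHitFields convertLegacy(LegacyHitFields in) {
  if (!isMtd(in.detUnitId))
    return in;  // non-MTD hits never carried the offset: left unchanged
  LegacyHitFields out = in;
  out.trackId = in.trackId % kLegacyOffset;  // pure Geant4 id
  // Move the hit type from the track id into bits 9-15 of processType.
  out.processType =
      static_cast<std::uint16_t>(in.processType | ((in.trackId / kLegacyOffset) << kHitTypeShift));
  return out;
}

static_assert(isMtd(0x62000001u));   // (0x62000001 >> 25) == 0x31
static_assert(!isMtd(0x12000000u));  // (0x12000000 >> 25) == 0x09
```

Keeping the rule body this thin makes it easy to compare with the ROOT rule; it does not address the crash discussed in this thread, which the trace below places inside ROOT's object creation.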

The failure always happens in the TClass call that creates the new object, when calling the default constructor of Point3DBase:

#9  0x00007ffff2a84db9 in TVirtualArray::TVirtualArray (this=0xc137a20, cl=0xc138eb0, size=34) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/core/meta/inc/TVirtualArray.h:36                                          
warning: 36     /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/core/meta/inc/TVirtualArray.h: No such file or directory                                                                                                    
(gdb) down
#8  0x00007ffff755df62 in TClass::NewObjectArray (this=0xc138eb0, nElements=34, defConstructor=TClass::kClassNew) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/core/meta/src/TClass.cxx:5414                          
warning: 5414   /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/core/meta/src/TClass.cxx: No such file or directory                                                                                                         
(gdb) down
#7  0x00007ffff25b6676 in TStreamerInfo::NewArray (this=0xc0e5e40, nElements=34, ary=0x0) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/io/io/src/TStreamerInfo.cxx:5158                                               
warning: 5158   /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/io/io/src/TStreamerInfo.cxx: No such file or directory                                                                                                      
(gdb) down
#6  0x00007ffff25b6367 in TStreamerInfo::New (this=0xc0e5e40, obj=0xc1d2460) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/io/io/src/TStreamerInfo.cxx:5084                                                            
5084    in /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/io/io/src/TStreamerInfo.cxx                                                                                                                                      
(gdb) down
#5  0x00007ffff755d459 in TClass::New (this=0x7d3cec0, arena=0xc1d2468, defConstructor=TClass::kClassNew) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/core/meta/src/TClass.cxx:5233                                  
warning: 5233   /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/core/meta/src/TClass.cxx: No such file or directory                                                                                                         
(gdb) down
#4  0x00007ffff755d550 in TClass::NewObject (this=0x7d3cec0, arena=0xc1d2468, defConstructor=TClass::kClassNew) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/core/meta/src/TClass.cxx:5257                            
5257    in /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/lcg/root/6.36.09-d082026c0b3c0e769837de82b04a743f/root-6.36.09/core/meta/src/TClass.cxx                                                                                                                                         
(gdb) down
#3  0x00007fffcdb2a524 in ROOT::new_Point3DBaselEfloatcOLocalTaggR (p=0xc1d2468) at tmp/el8_amd64_gcc13/src/DataFormats/GeometryVector/src/DataFormatsGeometryVector/lcgdict/DataFormatsGeometryVector_xr.cc:2701                                                                                                            
2701          return  p ? new(p) ::Point3DBase<float,LocalTag> : new ::Point3DBase<float,LocalTag>;
(gdb) down
#2  0x00007fffcdb2b596 in Point3DBase<float, LocalTag>::Point3DBase (this=<optimized out>, this=<optimized out>) at src/DataFormats/GeometryVector/interface/Point3DBase.h:23                                                                                                                                                
23        Point3DBase() {}
(gdb) down
#1  0x00007fffcdb2b526 in PV3DBase<float, PointTag, LocalTag>::PV3DBase (this=<optimized out>, this=<optimized out>) at src/DataFormats/GeometryVector/interface/PV3DBase.h:28                                                                                                                                               
28        PV3DBase() : theVector() {}
(gdb) down
#0  0x00007fffcdb2b428 in Basic3DVector<float>::Basic3DVector (this=<optimized out>, this=<optimized out>) at src/DataFormats/GeometryVector/interface/private/extBasic3DVector.h:45                                                                                                                                         
45        Basic3DVector() : v{0, 0, 0, 0} {}

at line 5233 in method New [Screenshot 2026-02-20 at 13 29 08],
calling NewObject, which fails when the pointer to the object is created at line 5257 [Screenshot 2026-02-20 at 13 31 38].

My guess is that the reading fails when accessing the first members of PSimHit: the first is a Local3DPoint and the second a Local3DVector. Both are built on PV3DBase, although from the call to new_Point3DBaselEfloatcOLocalTaggR I would naively conclude that the issue happens when accessing the former.

I am afraid that progressing further is beyond my current, limited knowledge of object storage in ROOT. As SimTrack was recently modified in a similar way, I tried to identify the differences, but both classes use non-native types; the difference is that in SimTrack these are ROOT vectors instead of GeometryVector objects.

@makortel @pcanal any ideas or suggestions? Do you see anything odd in the current structure of PSimHit that could explain such behaviour? As far as I understand, the constructor of the class as a whole does not play a role at this stage; am I correct?

In principle, giving up on backward compatibility would at present affect only MTD, and only in the performance validation that uses MC truth: important, but not a showstopper for continuing development in newer releases. Of course, being unable to use the standard schema evolution is unpleasant; on the other hand, I need to make progress on this one way or another. This PR does not really introduce new functionality; it just reimplements existing functionality in a way that is preferable for the future.

@fabiocos
Contributor Author

BTW, the above examples are based on the definition of a new PSimHit method:


  void setTrackId(unsigned int trkId) { theTrackId = trkId; }

@fabiocos
Contributor Author

There is one thing that looks odd to me: why does a Basic2DVector have 2 components while a Basic3DVector has 4? PV3DBase appears to manage 3 components, as expected from the name, but the underlying implementation in https://github.com/cms-sw/cmssw/blob/master/DataFormats/GeometryVector/interface/Basic3DVector.h has 4 components, and when they are initialized in the default constructor:

https://github.com/cms-sw/cmssw/blob/master/DataFormats/GeometryVector/interface/private/extBasic3DVector.h#L45

the crash happens exactly here...

@fabiocos
Contributor Author

In practice, this constructor https://github.com/cms-sw/cmssw/blob/master/DataFormats/GeometryVector/interface/private/extBasic3DVector.h#L78 should be called when the PSimHit content is created, and the fourth element should be stored as 0 by default, so the code should work correctly anyway.
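For context on the four components: a 3D vector padded to four float lanes is a common SIMD layout, because four floats fill exactly one 16-byte register, and this is what gives such a type a 16-byte alignment requirement. A minimal portable sketch, using alignas(16) as a stand-in for whatever SIMD-friendly storage the real Basic3DVector uses; the name Padded3DVector is hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for a 3D vector stored as four float lanes.
struct alignas(16) Padded3DVector {
  float v[4];
  // Mirrors Basic3DVector() : v{0, 0, 0, 0} {}: all four lanes (x, y, z, pad) are zeroed.
  Padded3DVector() : v{0, 0, 0, 0} {}
  Padded3DVector(float x, float y, float z) : v{x, y, z, 0} {}  // fourth lane kept at 0
  float x() const { return v[0]; }
  float y() const { return v[1]; }
  float z() const { return v[2]; }
};

// Four floats fill one 16-byte register; the alignment requirement follows from that.
static_assert(sizeof(Padded3DVector) == 16);
static_assert(alignof(Padded3DVector) == 16);
```

If such an object is constructed at an address that is not a multiple of 16, the compiler is entitled to emit aligned vector stores for the zero-initialisation, which can fault even though the fourth lane is correctly set to 0. We have not inspected the generated instructions, so this is only a plausible reading of the crash location.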

@pcanal
Contributor

pcanal commented Feb 20, 2026

@fabiocos @makortel Do you have a reproducer (with a debug build of ROOT) that I can use to track this down?

@makortel
Contributor

makortel commented Feb 20, 2026

cms-sw/cmsdist#10212 (comment) (i.e. /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10212/51353/install.sh) gives a relatively fresh ROOT debug build (should be available for about a week or so).

@fabiocos
Contributor Author

@pcanal on cmsdev45 I have used this PR built with the ROOT debug version mentioned by @makortel. Let me refresh it.

@fabiocos
Contributor Author

please test with cms-sw/cmsdist#10212

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-cea6a3/51493/summary.html
COMMIT: e5d8dbb
CMSSW: CMSSW_16_1_X_2026-02-20-2300/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/49969/51493/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-cea6a3/51493/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-cea6a3/51493/git-merge-result

Failed Unit Tests

I found 1 errors in the following unit tests:

---> test testDiMuonBiasesPlotting had ERRORS

Comparison Summary

Summary:

  • You potentially added 16 lines to the logs
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4172102
  • DQMHistoTests: Total failures: 229
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4171853
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
  • Checked 227 log files, 198 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 1 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step2 max memory diff -99.2 exceeds +/- 90.0 MiB

@pcanal
Contributor

pcanal commented Feb 22, 2026

I am able to reproduce the problem and I am investigating.

@pcanal
Contributor

pcanal commented Feb 23, 2026

The problem is tied to the fact that PV3DBase<float,VectorTag,LocalTag> must be 16-byte aligned. Even though this is properly declared in the C++ code, this information is not (yet) propagated to ROOT Core/Meta, and thus the part of the code that deals with the I/O customization rules, and needs to create partial objects, gets it 'wrong'. I have not yet found a workaround.
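This diagnosis can be cross-checked against the gdb trace above: TStreamerInfo::New receives obj=0xc1d2460, while TClass::New is asked to build the Point3DBase at arena=0xc1d2468, and 0xc1d2468 mod 16 = 8. A minimal sketch of the alignment test, plus the C++17 aligned allocation that a 16-byte-aligned type requires; the helper names are hypothetical and this is not ROOT code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// True if addr is a multiple of alignment (alignment must be a power of two).
constexpr bool isAligned(std::uintptr_t addr, std::size_t alignment) {
  return (addr & (alignment - 1)) == 0;
}

// Addresses taken from frames #6 (obj) and #5 (arena) of the trace.
static_assert(isAligned(0xc1d2460, 16));
static_assert(!isAligned(0xc1d2468, 16));  // 0xc1d2468 % 16 == 8

// Hypothetical helper: construct a T in storage that honours alignof(T),
// using the C++17 aligned overload of operator new.
template <typename T>
T* makeAligned() {
  void* raw = ::operator new(sizeof(T), std::align_val_t{alignof(T)});
  return new (raw) T();  // placement new into correctly aligned storage
}

// Matching cleanup: destroy, then release with the same alignment.
template <typename T>
void destroyAligned(T* p) {
  p->~T();
  ::operator delete(p, std::align_val_t{alignof(T)});
}
```

The sketch only illustrates the requirement; how ROOT could propagate alignof into Core/Meta for partially constructed objects remains the open question in this thread.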

@fabiocos
Copy link
Contributor Author

@pcanal thank you for the investigation. For the time being I am keeping this PR on hold, pending a possible solution on the ROOT side.


6 participants