Skip to content

Conversation

@slava77
Copy link
Contributor

@slava77 slava77 commented Jan 13, 2025

  • fill charge_ and barycenter_ for all SiStripClusters
  • use another bit in firstStrip_ to signal it's an approximate cluster (to preserve detection of approximate clusters)

motivated by technical performance: offline pp reco calls SiStripCluster::barycenter and ::charge around x100 times the number of clusters per event, adding up to around 0.88% of reco time.

Smaller speedup is expected in the HLT setup (different CPE and no strip seeding).

  • In a setup with mkFit running in HLT the run time of MkFitSiStripHitConverter goes down fractionally by 28% (0.45% of a job running tracking only HLT)

No changes are expected in workflows using regular clustering (although in pp I wouldn't be surprised if some math compiler optimization can create a difference).

There may be some change in (HI) workflows reading stored data with approximate clusters converted to regular clusters. From a brief discussion, it sounds like this is only in specialized workflows, probably not captured by anything in the bot tests.

@cms-sw/trk-dpg-l2 @cms-sw/tracking-pog-l2

…stStrip_ to signal it's an approximate cluster
@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 13, 2025

cms-bot internal usage

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @slava77 for master.

It involves the following packages:

  • DataFormats/SiStripCluster (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@VinInn, @alesaggio, @echabert, @jlidrych, @missirol, @mmusich, @robervalwalsh, @rovere, @threus this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor Author

slava77 commented Jan 13, 2025

@cmsbuild please test

@cmsbuild
Copy link
Contributor

-1

Failed Tests: UnitTests
Size: This PR adds an extra 32KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b8ba9c/43747/summary.html
COMMIT: 7ad80ba
CMSSW: CMSSW_15_0_X_2025-01-13-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47094/43747/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 3 errors in the following unit tests:

---> test DiElectronVertex had ERRORS
---> test DMRall had ERRORS
---> test PVall had ERRORS

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 96 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3819085
  • DQMHistoTests: Total failures: 61
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3819004
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 214 log files, 184 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@slava77
Copy link
Contributor Author

slava77 commented Jan 14, 2025

Reco comparison results: 96 differences found in the comparisons

I forgot the obvious that for the old saved data the charge_ and barycenter_ should be appropriately filled.
I will make an update that triggers a class version change (and an ioread rule).

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Pull request #47094 was updated. @cmsbuild, @jfernan2, @mandrenguyen can you please check and sign again.

Comment on lines 3 to 16
<class name="SiStripCluster" ClassVersion="14">
<version ClassVersion="14" checksum="1374720584"/>
<version ClassVersion="13" checksum="1374720584"/>
<version ClassVersion="12" checksum="2984011925"/>
<version ClassVersion="11" checksum="3702468681"/>
<version ClassVersion="10" checksum="3791198690"/>
<field name="error_x" transient="true"/>
</class>
<ioread sourceClass="SiStripCluster" version="[10]" targetClass="SiStripCluster" source="std::vector<uint8_t> amplitudes_; uint16_t firstStrip_;" target="barycenter_,charge_" >
<![CDATA[SiStripCluster::initQB_(onfile.amplitudes_, onfile.firstStrip_, charge_, barycenter_);]]>
</ioread>
<ioread sourceClass="SiStripCluster" version="[11-13]" targetClass="SiStripCluster" source="std::vector<uint8_t> amplitudes_; uint16_t firstStrip_; float barycenter_;" target="barycenter_,charge_" >
<![CDATA[if (onfile.barycenter_ == 0) SiStripCluster::initQB_(onfile.amplitudes_, onfile.firstStrip_, charge_, barycenter_);]]>
</ioread>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cms-sw/core-l2
this compiled for me, but the old data with FWLite Tree::Scan was crashing with a segfault.
So, I decided to push anyways to discuss

Something that I think may be unusual is that I updated the class version to 14 without changing the checksum (since the class data layout did not change).

... I tried to change the layout by adding a new data member, but for some reason there was no reaction from the reflex: the dictionary compiled no matter what I had in the version fields (e.g. using the old version 13 with the old checksum.
scram b updateclassversion or a regular scram build compiles without any complaints.
The log has
Skipped running EDM Class Version test. Is this expected?

@gartung
Copy link
Member

gartung commented Jan 16, 2025

@gartung
was igprof fixed recently (either on the running or the display side) or is that PR just in luck to run?
In the reco-profiling for 1500pre1 some wf links are empty, then some wfs have a healthy summary but the details are broken

No change has been made to IgProf. It is possible that your change fixed the heap error that was causing the segfault. If I run with IgProf enabled under valgrind massif or valgrind memcheck heap corruption is reported with CMSSW_14_2_1

@gartung
Copy link
Member

gartung commented Jan 16, 2025

@slava77
Copy link
Contributor Author

slava77 commented Jan 16, 2025

The failing fwlite comparisons are all apparently using AOD files. AODs are fully split, the symptoms are that the ioread rules are not executed.

I'm running a simple scan e->Scan("SiStripClusteredmNewDetSetVector_muonReducedTrackExtras__RECO.obj.m_data.barycenter()")

I confirmed with gdb, by setting a beak point at ROOT::read_SiStripCluster_1:

  • if I use the step2.root (RECO, not split) , gdb sees the breakpoint
  • if I read the step2_inAOD.root (AOD, fully split), there is no evidence of the rule call

@pcanal , is there some magic when calling the TTree::Scan to get the read rule to be executed?

@slava77
Copy link
Contributor Author

slava77 commented Jan 16, 2025

I'm running a simple scan e->Scan("SiStripClusteredmNewDetSetVector_muonReducedTrackExtras__RECO.obj.m_data.barycenter()")

@pcanal , is there some magic when calling the TTree::Scan to get the read rule to be executed?

just to clarify: the type of m_data is std::vector<SiStripCluster>

@slava77
Copy link
Contributor Author

slava77 commented Jan 16, 2025

@jfernan2
the last updates shouldn't affect the profiling results. So, the last iteration of tests with profiling enabled should be fine for inspection (so far no need to re-enable profiling).

@pcanal
Copy link
Contributor

pcanal commented Jan 16, 2025

is there some magic when calling the TTree::Scan to get the read rule to be executed?

What might (or might not) work would be to force TTree::Scan to read the whole object. To do so, one way is (as the first Scan items) to call a function on the object (which force it to read the whole object).

Whether this works or not, please send me a small file (10s of entries is enough) that contains the structure that fails to trigger the rules (and the Scan command that fails)

@slava77
Copy link
Contributor Author

slava77 commented Jan 16, 2025

is there some magic when calling the TTree::Scan to get the read rule to be executed?

What might (or might not) work would be to force TTree::Scan to read the whole object. To do so, one way is (as the first Scan items) to call a function on the object (which force it to read the whole object).

Whether this works or not, please send me a small file (10s of entries is enough) that contains the structure that fails to trigger the rules (and the Scan command that fails)

step2_inAOD.root is a link to the jenkins outputs, it can be directly downloaded with a browser; I used this file to do the test.

The Scan command is
Scan("SiStripClusteredmNewDetSetVector_muonReducedTrackExtras__RECO.obj.m_data.barycenter()")

Am I guessing correctly that somehow the fact that the barycenter() method is inlined is detected and as a result the whole object is not read?
If that would be the case, it sounds like a solution is to have some out-of-line no-op method to trigger the full object streaming.

@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b8ba9c/43798/summary.html
COMMIT: 77ad11b
CMSSW: CMSSW_15_0_X_2025-01-16-1100/el8_amd64_gcc12
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47094/43798/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 17 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3819085
  • DQMHistoTests: Total failures: 115
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3818950
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 214 log files, 184 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@slava77
Copy link
Contributor Author

slava77 commented Jan 17, 2025

If that would be the case, it sounds like a solution is to have some out-of-line no-op method to trigger the full object streaming.

I tried it (a mostly dummy function SiStripCluster::zero() const {return 0;} implemented in SiStripCluster.cc), but the ioread rule is still not called, as confirmed by running under gdb)

@slava77
Copy link
Contributor Author

slava77 commented Jan 17, 2025

disable profiling

@jfernan2
BTW, it doesn't look like this syntax works; from the instructions, one should issue enable none

@slava77
Copy link
Contributor Author

slava77 commented Jan 17, 2025

Reco comparison results: 17 differences found in the comparisons

this is from comparisons on AOD files (discussed above), supposedly affecting only FWLite.

DQMHistoTests: Total failures: 115

these are from non-repeatable workflows (phase-2 .911; all-mkFit .7, and messageLogger in PU wfs)

AFAIK, this is not the first issue when old fully split data is not read correctly with FWLite. So, I think that it should be acceptable to merge this change.
However, I'm open to a few days to investigate if the FWLite setup can be salvaged by forcing ioread rules in some way.

@jfernan2
Copy link
Contributor

enable none

@jfernan2
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @antoniovilela, @mandrenguyen, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 7172ca6 into cms-sw:master Jan 20, 2025
13 checks passed
@slava77
Copy link
Contributor Author

slava77 commented Jan 24, 2025

What might (or might not) work would be to force TTree::Scan to read the whole object. To do so, one way is (as the first Scan items) to call a function on the object (which force it to read the whole object).

@pcanal
I wanted to check if there is an update here. What needs to be in the function to trigger reading of the full object?
I tried to add an out of line method (below) that reads all values but the results are still the same (the read rule is not triggered).

float SiStripCluster::sumAll() const {
  return barycenter()+charge()+firstStrip()+filter()+size();
}

It seems like with the fully split mode the version check and schema evolution is just not running when calling via the FWLite.

@pcanal
Copy link
Contributor

pcanal commented Jan 25, 2025

@ slava77 I still need to reproduce this failures mode .. I'll let you know if I need more information.

@mmusich
Copy link
Contributor

mmusich commented Jan 29, 2025

type performance-improvements

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants