

@jeongeun
Contributor

PR description:

Based on the issue "Dataformat compatibility issue for HI SiStrip cluster in RAW" (#39106)

Aim :
Change the data format (edmNew::DetSetVector) stored in RAW to one simple enough to guarantee backwards compatibility indefinitely: it has to be readable by all future CMSSW releases.
Redefine the corresponding final data types directly in the ApproximatedClusters.
The new format needs to be straightforward to convert from edmNew::DetSetVector.
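As an illustration of what "straightforward to convert" can look like, here is a minimal sketch that flattens a nested per-detector container into three plain arrays: detector IDs, per-detector begin offsets, and all clusters stored detector by detector. The names `ToyCluster`, `FlatClusterCollection` and `flatten` are hypothetical, and the `std::map` merely stands in for edmNew::DetSetVector; this models the idea, not the actual CMSSW classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for an approximate strip cluster (not the CMSSW class).
struct ToyCluster {
  std::uint16_t barycenter;
  std::uint8_t width;
};

// Hypothetical flat layout: cluster i of detector k is stored at
// clusters[beginIndices[k] + i]. Only plain vectors, no nested containers.
struct FlatClusterCollection {
  std::vector<std::uint32_t> detIds;
  std::vector<std::uint32_t> beginIndices;  // one entry per detector
  std::vector<ToyCluster> clusters;         // all clusters, detector by detector
};

// Flatten a nested "detId -> clusters" container into the flat layout above.
FlatClusterCollection flatten(const std::map<std::uint32_t, std::vector<ToyCluster>>& nested) {
  FlatClusterCollection out;
  std::size_t nClusters = 0;
  for (const auto& entry : nested)
    nClusters += entry.second.size();
  out.detIds.reserve(nested.size());
  out.beginIndices.reserve(nested.size());
  out.clusters.reserve(nClusters);
  for (const auto& [detId, detClusters] : nested) {
    out.detIds.push_back(detId);
    // The begin offset of this detector is the number of clusters stored so far.
    out.beginIndices.push_back(static_cast<std::uint32_t>(out.clusters.size()));
    out.clusters.insert(out.clusters.end(), detClusters.begin(), detClusters.end());
  }
  return out;
}
```

Because the layout uses only vectors of fixed-width integers, it avoids depending on the internals of a templated container, which is the property the backwards-compatibility requirement is after.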

The simplified data format has been updated, as recommended by Matti in September 2022
(master...makortel:cmssw:siStripApproximateClusterCollection_v2)

Target: 13_2_X release (current working release: 13_2_0_pre2)

PR validation:

Tested in CMSSW_13_2_0_pre2; the basic tests from the CMSSW PR instructions passed.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-42022/35981

@cmsbuild
Contributor

A new Pull Request was created by @jeongeun (JeongEun Lee) for master.

It involves the following packages:

  • DataFormats/SiStripCluster (reconstruction)
  • RecoLocalTracker/SiStripClusterizer (reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks.
@echabert, @VourMa, @gbenelli, @mtosi, @yduhm, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @missirol, @felicepantaleo, @alesaggio, @gpetruc, @mmusich, @threus, @jlidrych, @robervalwalsh this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@mmusich
Contributor

mmusich commented Jun 20, 2023

test parameters:

  • workflows = 140.58, 140.60

@mmusich
Contributor

mmusich commented Jun 20, 2023

@cmsbuild, please test

Contributor

@makortel makortel left a comment


Thanks for the PR! I'd suggest adding some unit tests to ensure that SiStripApproximateClusterCollection functions as expected.

I'd also ask you to add backwards-compatibility tests similar to those we recently added for all other data products covered by the RAW backwards-compatibility guarantee, and to add a README.md stating that (see e.g. #41631 for an example).

* (like all RAW data). Any modifications need to be made with care.
* Please consult core software group if in doubt.
**/
using namespace std;
Contributor


using namespace is not allowed at global scope in header files.

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-42022/36036

ERROR: Build errors found during clang-tidy run.

DataFormats/SiStripCluster/interface/SiStripApproximateClusterCollection.h:90:16: error: unknown type name 'size_t'; did you mean 'std::size_t'? [clang-diagnostic-error]
  void reserve(size_t dets, size_t clusters);
               ^~~~~~
               std::size_t
--
DataFormats/SiStripCluster/interface/SiStripApproximateClusterCollection.h:90:29: error: unknown type name 'size_t'; did you mean 'std::size_t'? [clang-diagnostic-error]
  void reserve(size_t dets, size_t clusters);
                            ^~~~~~
                            std::size_t
--
DataFormats/SiStripCluster/src/SiStripApproximateClusterCollection.cc:3:51: error: unknown type name 'size_t'; did you mean 'std::size_t'? [clang-diagnostic-error]
void SiStripApproximateClusterCollection::reserve(size_t dets, size_t clusters) {
                                                  ^~~~~~
                                                  std::size_t
--
DataFormats/SiStripCluster/src/SiStripApproximateClusterCollection.cc:3:64: error: unknown type name 'size_t'; did you mean 'std::size_t'? [clang-diagnostic-error]
void SiStripApproximateClusterCollection::reserve(size_t dets, size_t clusters) {
                                                               ^~~~~~
                                                               std::size_t
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2
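Both review points can be addressed together: drop the global using namespace std; from the header, spell the type as std::size_t, and include <cstddef> so the name is guaranteed to be declared. The sketch below shows the pattern on a hypothetical ToyApproxClusterCollection; only the reserve(dets, clusters) signature comes from the log above, while the class name, members and body are illustrative assumptions.

```cpp
// Header-style code: no "using namespace std;" at global scope, and every
// standard-library name is qualified explicitly.
#include <cstddef>  // guarantees std::size_t is declared
#include <cstdint>
#include <vector>

class ToyApproxClusterCollection {  // hypothetical, for illustration only
public:
  // Same shape as the reserve(dets, clusters) declaration flagged by clang-tidy.
  void reserve(std::size_t dets, std::size_t clusters);

  std::size_t detCapacity() const { return detIds_.capacity(); }
  std::size_t clusterCapacity() const { return clusterData_.capacity(); }

private:
  std::vector<std::uint32_t> detIds_;
  std::vector<std::uint32_t> beginIndices_;
  std::vector<std::uint16_t> clusterData_;
};

// Out-of-line definition, as it would appear in the .cc file.
void ToyApproxClusterCollection::reserve(std::size_t dets, std::size_t clusters) {
  detIds_.reserve(dets);
  beginIndices_.reserve(dets);
  clusterData_.reserve(clusters);
}
```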

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-42022/36044

@mmusich
Contributor

mmusich commented Jul 10, 2023

I got some other errors (below) during my local test runTheMatrix.py -l 140.58,140.6 --ibeos

Have you tried rebasing your branch? 140.58 passes in IBs (log).

@perrotta
Contributor

If HGCal reconstruction is run in a 2018 HI workflow, there is definitely something that does not work in (your implementation of) the otherwise working workflow.

@mmusich
Contributor

mmusich commented Jul 10, 2023

there is definitely something that does not work in (your implementation) of the otherwise running workflow.

I don't see how this is possible. This PR is not touching any configuration fragment.

@jeongeun
Contributor Author

I'm redoing it with git cms-rebase-topic jeongeun:ApproxCluster_dataformat now (unfortunately it's very slow on my lxplus; I will check and update as soon as I can).

@jeongeun
Contributor Author

jeongeun commented Jul 11, 2023

@mmusich @mandrenguyen @perrotta
I'd like to let you know my current status.
Sorry for the delay, but I got the following errors when running scram b after git rebase.
The errors look like:

>> Compiling  /afs/cern.ch/work/j/jelee/ServiceWork/CMSSW_13_2_0_pre2/src/DataFormats/L1TGlobal/src/GlobalObjectMapRecord.cc
Entering library rule at src/DataFormats/L1TGlobal/test
error: class 'l1t::PFJet' has a different checksum for ClassVersion 4. Increment ClassVersion to 5 and assign it to checksum 2599349078
Suggestion: You can run 'scram build updateclassversion' to generate src/DataFormats/L1TParticleFlow/src/classes_def.xml.generated with updated ClassVersion
gmake: *** [config/SCRAM/GMake/Makefile.rules:1793: tmp/slc7_amd64_gcc11/src/DataFormats/L1TParticleFlow/src/DataFormatsL1TParticleFlow/libDataFormatsL1TParticleFlow.so] Error 1
gmake: *** Waiting for unfinished jobs....
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

As suggested, I fixed the DataFormats/L1TParticleFlow/src/classes_def.xml file and ran scram build updateclassversion.

I've just finished scram build updateclassversion.
There were some error messages during updateclassversion, like the ones below:

...
>> Checking src/SimDataFormats/Associations/src/classes_def.xml for EDM Class Version update
Error in <TCling::LoadPCM>: ROOT PCM /afs/cern.ch/work/j/jelee/ServiceWork/CMSSW_13_2_0_pre2/tmp/slc7_amd64_gcc11/src/SimDataFormats/Associations/src/SimDataFormatsAssociations/SimDataFormatsAssociations_xr_rdict.pcm file does not exist

>> Checking src/SimGeneral/TrackingAnalysis/src/classes_def.xml for EDM Class Version update
Error in <TCling::LoadPCM>: ROOT PCM /afs/cern.ch/work/j/jelee/ServiceWork/CMSSW_13_2_0_pre2/tmp/slc7_amd64_gcc11/src/SimGeneral/TrackingAnalysis/src/SimGeneralTrackingAnalysis/SimGeneralTrackingAnalysis_xr_rdict.pcm file does not exist

>> Checking src/CalibPPS/AlignmentRelative/src/classes_def.xml for EDM Class Version update
Error in <TCling::LoadPCM>: ROOT PCM /afs/cern.ch/work/j/jelee/ServiceWork/CMSSW_13_2_0_pre2/tmp/slc7_amd64_gcc11/src/CalibPPS/AlignmentRelative/src/CalibPPSAlignmentRelative/CalibPPSAlignmentRelative_xr_rdict.pcm file does not exist
...
...
...

Currently, I'm running scram b again.

@perrotta
Contributor

@jeongeun please start from the PR as it is.
Your errors are definitely coming from your own private version, and they do not show up if you start from this PR (which has errors related to the updates you applied to the code, as it can be seen in the PR test outputs, but definitely not the ones that you are listing here).

@mmusich
Contributor

mmusich commented Jul 11, 2023

Hi @jeongeun
I followed this recipe:

cmsrel  CMSSW_13_2_X_2023-07-10-2300
cd CMSSW_13_2_X_2023-07-10-2300/src/
git cms-merge-topic 42022
git rebase -i CMSSW_13_2_X_2023-07-10-2300
# solve merge conflicts manually in SiStripClusters2ApproxClusters.cc
git add RecoLocalTracker/SiStripClusterizer/plugins/SiStripClusters2ApproxClusters.cc
git rebase --continue
scramv1 b -j 20
git cms-init
git push my-cmssw +HEAD:ApproxCluster_dataformat

The branch prepared this way compiles (but fails when running 140.58).
You can find the corrected branch here.

@jeongeun
Contributor Author

jeongeun commented Jul 11, 2023

@mmusich @perrotta Thanks for your feedback and the corrections.
IIUC, the corrected branch here is final and already tested,
so I can submit a new commit with this correction without any pre-test. Is this correct?

@perrotta
Contributor

@mmusich @perrotta Thanks for your feedback and correcting. IIUC, this is the final corrected branch here already tested. So that I can submit a new commit with this correction without any pretest. Is this correct?

@jeongeun do whatever you think it is quicker.
Even this branch works (apart from the errors in the code, which I don't know whether you fixed in the new branch): just choose the one that is easier for you to deal with.

@jeongeun
Contributor Author

jeongeun commented Jul 11, 2023


I followed this recipe and checked with 140.58 first.

runTheMatrix.py -l 140.58 --ibeos
140.58_RunHI2018 Step0-PASSED Step1-PASSED Step2-FAILED Step3-NOTRUN  - time date Tue Jul 11 10:06:55 2023-date Tue Jul 11 10:02:27 2023; exit: 0 0 35584 0
1 1 0 0 tests passed, 0 0 1 0 failed

Error in 140.58_RunHI2018/step3_RunHI2018.log:

RAW2DIGI,L1Reco,RECO,ALCA:SiStripCalZeroBias+SiPixelCalZeroBias,SKIM:PbPbEMu+PbPbZEE+PbPbZMM+PbPbZMu,DQM:@commonFakeHLT+@standardDQMFakeHLT
entry file:step2.root
Step: RAW2DIGI Spec:
Step: L1Reco Spec:
Step: RECO Spec:
Step: ALCA Spec: ['SiStripCalZeroBias', 'SiPixelCalZeroBias']
Step: SKIM Spec: ['PbPbEMu', 'PbPbZEE', 'PbPbZMM', 'PbPbZMu']
Step: DQM Spec: ['@commonFakeHLT', '@standardDQMFakeHLT']
customising the process with customisePostEra_Run2_2018_pp_on_AA from Configuration/DataProcessing/RecoTLR
Starting  cmsRun  step3_RAW2DIGI_L1Reco_RECO_ALCA_SKIM_DQM.py
11-Jul-2023 10:13:56 CEST  Initiating request to open file file:step2.root
11-Jul-2023 10:13:56 CEST  Successfully opened file file:step2.root
Begin processing the 1st record. Run 326479, Event 1394020, LumiSection 7 on stream 0 at 11-Jul-2023 10:14:41.535 CEST
%MSG-e SiStripMonitorApproximateCluster:   SiStripMonitorApproximateCluster:SiStripMonitorApproximateCluster  11-Jul-2023 10:14:41 CEST Run: 326479 Event: 1394020
SiStripApproximate cluster collection is not valid!
%MSG
2023-07-11 10:14:43.714005: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-11 10:14:43.714196: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (34)
2023-07-11 10:14:43.714252: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (lxplus786.cern.ch): /proc/driver/nvidia/version does not exist
%MSG-w PixelCPEClusterRepair:   PixelCPEClusterRepairESProducer:templates2@callESModule  11-Jul-2023 10:14:51 CEST Run: 326479 Event: 1394020
different template ID between 1D and 2D 9701 9700
%MSG
%MSG-w PixelCPEClusterRepair:   PixelCPEClusterRepairESProducer:templates2_speed0@callESModule  11-Jul-2023 10:14:51 CEST Run: 326479 Event: 1394020
different template ID between 1D and 2D 9701 9700
%MSG

The error message comes from SiStripMonitorApproximateCluster/plugins/SiStripMonitorApproximateCluster.cc.

@mmusich
Contributor

mmusich commented Jul 11, 2023

Segmentation error In 140.58_RunHI2018/step3_RunHI2018.log

Yes, that's what I was writing above (#42022 (comment)), but AFAIK it comes from the changes in this PR.

@mmusich
Contributor

mmusich commented Jul 11, 2023

@jeongeun

error message coming from SiStripMonitorApproximateCluster/plugins/SiStripMonitorApproximateCluster.cc

I think this change is needed:

diff --git a/DQM/SiStripMonitorApproximateCluster/plugins/SiStripMonitorApproximateCluster.cc b/DQM/SiStripMonitorApproximateCluster/plugins/SiStripMonitorApproximateCluster.cc
index 7c0c8d5167c..b497c4513c8 100644
--- a/DQM/SiStripMonitorApproximateCluster/plugins/SiStripMonitorApproximateCluster.cc
+++ b/DQM/SiStripMonitorApproximateCluster/plugins/SiStripMonitorApproximateCluster.cc
@@ -21,6 +21,7 @@
 #include "DQMServices/Core/interface/MonitorElement.h"
 #include "DataFormats/Common/interface/DetSet.h"
 #include "DataFormats/Common/interface/DetSetVectorNew.h"
+#include "DataFormats/SiStripCluster/interface/SiStripApproximateClusterCollection.h"
 #include "DataFormats/SiStripCluster/interface/SiStripApproximateCluster.h"
 #include "DataFormats/SiStripCluster/interface/SiStripCluster.h"
 #include "FWCore/Framework/interface/Event.h"
@@ -102,7 +103,7 @@ private:
   MonitorElement* h_deltaEndStrip_{nullptr};
 
   // Event Data
-  edm::EDGetTokenT<edmNew::DetSetVector<SiStripApproximateCluster>> approxClustersToken_;
+  edm::EDGetTokenT<SiStripApproximateClusterCollection> approxClustersToken_;
   edm::EDGetTokenT<edmNew::DetSetVector<SiStripCluster>> stripClustersToken_;
   const edmNew::DetSetVector<SiStripCluster>* stripClusterCollection_;
 
@@ -117,7 +118,7 @@ SiStripMonitorApproximateCluster::SiStripMonitorApproximateCluster(const edm::Pa
     : folder_(iConfig.getParameter<std::string>("folder")),
       compareClusters_(iConfig.getParameter<bool>("compareClusters")),
       // Poducer name of input StripClusterCollection
-      approxClustersToken_(consumes<edmNew::DetSetVector<SiStripApproximateCluster>>(
+      approxClustersToken_(consumes<SiStripApproximateClusterCollection>(
           iConfig.getParameter<edm::InputTag>("ApproxClustersProducer"))) {
   tkGeomToken_ = esConsumes();
   if (compareClusters_) {
@@ -139,7 +140,7 @@ void SiStripMonitorApproximateCluster::analyze(const edm::Event& iEvent, const e
   const auto tkDets = tkGeom->dets();
 
   // get collection of DetSetVector of clusters from Event
-  edm::Handle<edmNew::DetSetVector<SiStripApproximateCluster>> approx_cluster_detsetvector;
+  edm::Handle<SiStripApproximateClusterCollection> approx_cluster_detsetvector;
   iEvent.getByToken(approxClustersToken_, approx_cluster_detsetvector);
   if (!approx_cluster_detsetvector.isValid()) {
     edm::LogError("SiStripMonitorApproximateCluster")
@@ -164,11 +165,11 @@ void SiStripMonitorApproximateCluster::analyze(const edm::Event& iEvent, const e
   }
 
   int nApproxClusters{0};
-  const edmNew::DetSetVector<SiStripApproximateCluster>* clusterCollection = approx_cluster_detsetvector.product();
+  const SiStripApproximateClusterCollection* clusterCollection = approx_cluster_detsetvector.product();
 
   for (const auto& detClusters : *clusterCollection) {
     edmNew::DetSet<SiStripCluster> strip_clusters_detset;
-    const auto& detid = detClusters.detId();  // get the detid of the current detset
+    const auto& detid = detClusters.id();  // get the detid of the current detset
 
     // starts here comaparison with regular clusters
     if (compareClusters_) {
@@ -233,6 +234,7 @@ void SiStripMonitorApproximateCluster::analyze(const edm::Event& iEvent, const e
 
     }  // loop on clusters in a detset
   }    // loop on the detset vector
+
   h_nclusters_->Fill(nApproxClusters);
 }

On the other hand, it still segfaults for me (in SiStripApprox2Clusters::produce).

@jeongeun
Contributor Author

@mmusich Thank you for your comments.
Without any changes, the segmentation fault suddenly disappeared, and I'm still stuck with the following error.

cat 140.58_RunHI2018/step2_RunHI2018.log
REPACK:DigiToApproxClusterRaw,ENDJOB
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
entry filelist:step1_dasquery.log
found files:  ['/store/hidata/HIRun2018A/HIHardProbes/RAW/v1/000/326/479/00000/0E2CC5D5-9D87-7348-9219-B00CD718C847.root', '/store/hidata/HIRun2018A/HIHardProbes/RAW/v1/000/326/479/00000/45001EBC-B4D4-9043-A276-8F3AF621C64A.root', '/store/hidata/HIRun2018A/HIHardProbes/RAW/v1/000/326/479/00000/7B3F72ED-E183-3F4B-9FE4-DAE6D911403E.root', '/store/hidata/HIRun2018A/HIHardProbes/RAW/v1/000/326/479/00000/853DBE29-53BA-9A44-9FDD-58E4E9064EB1.root']
Step: REPACK Spec: ['DigiToApproxClusterRaw']
Step: ENDJOB Spec:
customising the process with customisePostEra_Run2_2018_pp_on_AA from Configuration/DataProcessing/RecoTLR
Starting  cmsRun  step2_REPACK.py
12-Jul-2023 09:38:44 CEST  Initiating request to open file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/hidata/HIRun2018A/HIHardProbes/RAW/v1/000/326/479/00000/0E2CC5D5-9D87-7348-9219-B00CD718C847.root
12-Jul-2023 09:38:46 CEST  Successfully opened file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/hidata/HIRun2018A/HIHardProbes/RAW/v1/000/326/479/00000/0E2CC5D5-9D87-7348-9219-B00CD718C847.root
Begin processing the 1st record. Run 326479, Event 1394020, LumiSection 7 on stream 0 at 12-Jul-2023 09:38:52.567 CEST
%MSG-w SiStripRawToDigi:  SiStripRawToDigiModule:siStripDigisHLT  12-Jul-2023 09:40:26 CEST Run: 326479 Event: 1394020
NULL pointer to FEDRawData for FED: id 434
Note: further warnings of this type will be suppressed (this can be changed by enabling debugging printout)
%MSG
Begin processing the 2nd record. Run 326479, Event 1579493, LumiSection 7 on stream 0 at 12-Jul-2023 09:40:27.126 CEST
Begin processing the 3rd record. Run 326479, Event 1402087, LumiSection 7 on stream 0 at 12-Jul-2023 09:40:27.673 CEST
Begin processing the 4th record. Run 326479, Event 1328354, LumiSection 7 on stream 0 at 12-Jul-2023 09:40:28.155 CEST
Begin processing the 5th record. Run 326479, Event 1597351, LumiSection 7 on stream 0 at 12-Jul-2023 09:40:28.673 CEST
Begin processing the 6th record. Run 326479, Event 1404390, LumiSection 7 on stream 0 at 12-Jul-2023 09:40:29.013 CEST
Begin processing the 7th record. Run 326479, Event 1455655, LumiSection 7 on stream 0 at 12-Jul-2023 09:40:29.532 CEST
Begin processing the 8th record. Run 326479, Event 1570276, LumiSection 7 on stream 0 at 12-Jul-2023 09:40:29.897 CEST
Begin processing the 9th record. Run 326479, Event 1367000, LumiSection 7 on stream 0 at 12-Jul-2023 09:40:30.395 CEST
Begin processing the 10th record. Run 326479, Event 1568025, LumiSection 7 on stream 0 at 12-Jul-2023 09:40:30.524 CEST
12-Jul-2023 09:40:31 CEST  Closed file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/hidata/HIRun2018A/HIHardProbes/RAW/v1/000/326/479/00000/0E2CC5D5-9D87-7348-9219-B00CD718C847.root
%MSG-w SiStripRawToDigi:  SiStripRawToDigiModule:siStripDigisHLT@endStream  12-Jul-2023 09:40:33 CEST PostEndProcessBlock
[sistrip::RawToDigiUnpacker::createDigis] warnings:
NULL pointer to FEDRawData for FED (10)
%MSG

cat 140.58_RunHI2018/step3_RunHI2018.log
RAW2DIGI,L1Reco,RECO,ALCA:SiStripCalZeroBias+SiPixelCalZeroBias,SKIM:PbPbEMu+PbPbZEE+PbPbZMM+PbPbZMu,DQM:@commonFakeHLT+@standardDQMFakeHLT
entry file:step2.root
Step: RAW2DIGI Spec:
Step: L1Reco Spec:
Step: RECO Spec:
Step: ALCA Spec: ['SiStripCalZeroBias', 'SiPixelCalZeroBias']
Step: SKIM Spec: ['PbPbEMu', 'PbPbZEE', 'PbPbZMM', 'PbPbZMu']
Step: DQM Spec: ['@commonFakeHLT', '@standardDQMFakeHLT']
customising the process with customisePostEra_Run2_2018_pp_on_AA from Configuration/DataProcessing/RecoTLR
Starting  cmsRun  step3_RAW2DIGI_L1Reco_RECO_ALCA_SKIM_DQM.py
12-Jul-2023 09:44:30 CEST  Initiating request to open file file:step2.root
12-Jul-2023 09:44:30 CEST  Successfully opened file file:step2.root
Begin processing the 1st record. Run 326479, Event 1394020, LumiSection 7 on stream 0 at 12-Jul-2023 09:46:37.365 CEST
2023-07-12 09:46:41.771568: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-12 09:46:41.784753: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (34)
2023-07-12 09:46:41.785097: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (lxplus7116.cern.ch): /proc/driver/nvidia/version does not exist
%MSG-w PixelCPEClusterRepair:   PixelCPEClusterRepairESProducer:templates2@callESModule  12-Jul-2023 09:46:54 CEST Run: 326479 Event: 1394020
different template ID between 1D and 2D 9701 9700
%MSG
%MSG-w PixelCPEClusterRepair:   PixelCPEClusterRepairESProducer:templates2_speed0@callESModule  12-Jul-2023 09:46:55 CEST Run: 326479 Event: 1394020
different template ID between 1D and 2D 9701 9700
%MSG

@mmusich
Contributor

mmusich commented Jul 12, 2023

Without any changes, the segmentation fault error was suddenly disappear

this is extremely fishy...

I'm still stuck with following error.

I don't see sign of crashes. I think your job was killed.

@jeongeun
Contributor Author

jeongeun commented Jul 12, 2023

Without any changes, the segmentation fault error was suddenly disappear

this is extremely fishy...

I'm still stuck with following error.

I don't see sign of crashes. I think your job was killed.

Indeed, sorry for the confusion. The segmentation fault still exists.
TBH I don't know which part causes this error.

@mmusich
Contributor

mmusich commented Jul 12, 2023

TBH I don't know which part brings this error.

In my private tests it segfaults here:

ff.push_back(SiStripCluster(cluster, nStrips));

I haven't traced it further back than this.

@smuzaffar smuzaffar modified the milestones: CMSSW_13_2_X, CMSSW_13_3_X Jul 18, 2023
@makortel
Contributor

makortel commented Aug 1, 2023

Can I ask what the plan is for this development regarding HI data taking?

@mmusich
Contributor

mmusich commented Aug 4, 2023

@jeongeun @makortel So I had a second look, and I think this segfaults because in one particular event we exceed the maximum depth of SiStripApproximateClusterCollection::DetSet.

After applying the changes in #42022 (comment) and the patch below,

diff --git a/RecoLocalTracker/SiStripClusterizer/plugins/SiStripApprox2Clusters.cc b/RecoLocalTracker/SiStripClusterizer/plugins/SiStripApprox2Clusters.cc
index 803c8949f90..a76b11c9f0e 100644
--- a/RecoLocalTracker/SiStripClusterizer/plugins/SiStripApprox2Clusters.cc
+++ b/RecoLocalTracker/SiStripClusterizer/plugins/SiStripApprox2Clusters.cc
@@ -53,9 +53,14 @@ void SiStripApprox2Clusters::produce(edm::StreamID id, edm::Event& event, const
     const StripTopology& p = dynamic_cast<const StripGeomDetUnit*>(*det)->specificTopology();
     nStrips = p.nstrips() - 1;
 
+    std::cout << "before pushing back detId:" << detId << " n. clusters: " << detClusters.end() - detClusters.begin() << std::endl;
+    //int counter{0};
     for (const auto& cluster : detClusters) {
+      //std::cout << "pushed " << counter << " clusters" << std::endl;
       ff.push_back(SiStripCluster(cluster, nStrips));
+      //counter++;
     }
+    std::cout << "after pushing back" << std::endl;
   }

I re-ran wf. 140.58 and see the following in the step3 log file right before the crash:

...
after pushing back
before pushing back detId:470444268 n. clusters: 10
after pushing back
before pushing back detId:470444272 n. clusters: 9
after pushing back
before pushing back detId:470444276 n. clusters: -239807

@makortel
Contributor

makortel commented Aug 4, 2023

The sizes of SiStripApproximateClusterCollection should be completely dynamic. To me, the detClusters.end() - detClusters.begin() becoming negative suggests that the logic in

clusEnd_(detIndex == coll_->beginIndices_.size() - 1 ? coll_->beginIndices_.size()
                                                     : coll_->beginIndices_[detIndex + 1]) {}

is somehow flawed (or the indexing in SiStripApproximateClusterCollection::beginIndices_ got somehow screwed up).
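For the last detector there is no beginIndices_[detIndex + 1], so its end index presumably has to be the total number of stored clusters. The quoted expression falls back to beginIndices_.size(), i.e. the number of detectors, which is smaller than the last detector's begin offset whenever more clusters than detectors precede it; end - begin then goes negative, matching the kind of count printed above. The sketch below contrasts the two computations on a toy flat layout; clusEndQuoted, clusEndAssumed and nClustersInDet are hypothetical free functions, not the actual code of cef181a or #42495.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// End index as in the quoted code: for the last detector it falls back to the
// number of *detectors*, which is the wrong quantity.
std::size_t clusEndQuoted(const std::vector<std::uint32_t>& beginIndices,
                          std::size_t /*nClusters*/,
                          std::size_t detIndex) {
  return detIndex == beginIndices.size() - 1 ? beginIndices.size() : beginIndices[detIndex + 1];
}

// Assumed intended logic: the last detector's clusters run to the end of the
// flat cluster storage.
std::size_t clusEndAssumed(const std::vector<std::uint32_t>& beginIndices,
                           std::size_t nClusters,
                           std::size_t detIndex) {
  return detIndex == beginIndices.size() - 1 ? nClusters : beginIndices[detIndex + 1];
}

// Signed cluster count for one detector, mirroring detClusters.end() - detClusters.begin().
long long nClustersInDet(std::size_t begin, std::size_t end) {
  return static_cast<long long>(end) - static_cast<long long>(begin);
}
```

With begin offsets {0, 10, 19} and 25 clusters in total, the quoted logic gives the last detector an end index of 3 (a count of -16), while the assumed logic gives 25 (a count of 6).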

@mmusich
Contributor

mmusich commented Aug 7, 2023

To me the detClusters.end() - detClusters.begin() becoming negative suggests that the logic in cmssw/DataFormats/SiStripCluster/interface/SiStripApproximateClusterCollection.h is somehow flawed (or the indexing in SiStripApproximateClusterCollection::beginIndices_ got somehow screwed up).

Indeed, thanks for the suggestion. #42495 is a revamp of this PR with the necessary fixes. @jeongeun you might want to close this one.

@makortel
Contributor

makortel commented Aug 7, 2023

To me the detClusters.end() - detClusters.begin() becoming negative suggests that the logic in cmssw/DataFormats/SiStripCluster/interface/SiStripApproximateClusterCollection.h is somehow flawed (or the indexing in SiStripApproximateClusterCollection::beginIndices_ got somehow screwed up).

indeed, thanks for the suggestion. #42495 is a re-vamp of this PR with the necessary fixes. @jeongeun you might want to close this one.

Good catch! (cef181a)

@mandrenguyen
Contributor

-1
Deprecated by #42495

@perrotta
Contributor

-1 Deprecated by #42495

@perrotta perrotta closed this Aug 10, 2023