
Conversation

@LinaresToine
Contributor

Replay Request

Requestor
Team or person that requests this replay

Describe the configuration

  • Release:
  • Run:
  • GTs:
    • expressGlobalTag:
    • promptrecoGlobalTag:
  • Additional changes:

Purpose of the test
A replay test is costly, both in computational and human resources. Please describe the reason why this test is needed.

T0 Operations cmsTalk thread
If necessary, provide a link to the cmsTalk thread announcing the test to the relevant groups.
Tier0 Operations cmsTalk Forum

@LinaresToine LinaresToine force-pushed the CosmicsHLTMonitor branch 2 times, most recently from c8c20de to 4baf05e on March 19, 2025 09:16
@fabiocos fabiocos mentioned this pull request Mar 20, 2025
scenario=cosmicsScenario,
diskNode="T0_CH_CERN_Disk",
data_tiers=["FEVTHLTALL"],
write_dqm=True,
Contributor


@LinaresToine Sorry but this is still wrong:

     alca_producers=["TkAlHLTTracks", "TkAlHLTTracksZMuMu", "PromptCalibProdSiPixelAliHLTHGC"],

should be added to HLTMonitor and removed from CosmicsHLTMonitor.

@srimanob @fabiocos FYI
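
The requested split can be sketched as follows. This is not the real Tier0 helper: `addExpressConfig` here is a stand-in that only records its keyword arguments, so the intended `alca_producers` assignment (HLT AlCa producers on HLTMonitor, none on the cosmics PD) can be checked in isolation.

```python
# Stand-in for the Tier0 addExpressConfig helper: it only records the
# keyword arguments per PD name. The real helper takes many more
# parameters (scenario, data_tiers, ...).
def addExpressConfig(config, name, **kwargs):
    config[name] = kwargs

tier0Config = {}

# HLTMonitor keeps the HLT alignment/calibration producers...
addExpressConfig(tier0Config, "HLTMonitor",
                 alca_producers=["TkAlHLTTracks", "TkAlHLTTracksZMuMu",
                                 "PromptCalibProdSiPixelAliHLTHGC"])

# ...while the cosmics HLT monitoring PD runs none of them.
addExpressConfig(tier0Config, "CosmicsHLTMonitor",
                 alca_producers=[])

assert tier0Config["CosmicsHLTMonitor"]["alca_producers"] == []
```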

Contributor


Now, I see that "HLTMonitor" comes with
alca_producers=["TkAlHLTTracks", "TkAlHLTTracksZMuMu", "PromptCalibProdSiPixelAliHLTHGC"],
while the CosmicsHLTMonitor includes
alca_producers=[],

This is correct, right? @mmusich Thx.

Contributor


This is correct, right?

yes, after the last couple of pushes, this corresponds to my expectations.


@fabiocos
Contributor

@LinaresToine @mmusich I imagine we want to test this addition on cosmics, as it is meant for that, and using 15_0_2, where the corresponding monitoring cms-sw/cmssw#47568 has been activated. Am I correct?

@mmusich
Contributor

mmusich commented Mar 20, 2025

@fabiocos

Am I correct?

Not really: the list of alca producers here, alca_producers=["TkAlHLTTracks", "TkAlHLTTracksZMuMu", "PromptCalibProdSiPixelAliHLTHGC"], is wrong. We do not need to run them on cosmics.

@fabiocos
Contributor

@mmusich I am referring to the CosmicHLTMonitor addition itself, not to the alca_producers listed, sorry if I was not clear. This replay at present seems to run only on pp collisions; please correct me if that is not the case.

@mmusich
Contributor

mmusich commented Mar 20, 2025

sorry if I was not clear. This replay at present seems just run on pp collisions, please correct me in case

I don't know if that's the case or not. In any case this PR cannot be merged because of the reasons I mentioned above: #5049 (comment)

@LinaresToine
Contributor Author

The cosmics replay is almost over. I originally injected the run into the first replay, but due to an operational mistake I had to restart it. Reco is now starting on the new replay; it may be followed here: Replay monitoring

dataset_lifetime=3*30*24*3600,#lifetime for container rules. Default 3 months
versionOverride=expressVersionOverride)

addExpressConfig(tier0Config, "CosmicsHLTMonitor",
Contributor


@LinaresToine I just realized that there is yet another mistake here.
The name of the PD is CosmicHLTMonitor (without an "s"), not CosmicsHLTMonitor; see the description from @pietroGru in the Tier0 ops cmsTalk thread.
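
Combining the naming fix with the code excerpt above, the corrected entry would look roughly like the fragment below. This is only a sketch assembled from the parameters visible in this PR (scenario, disk node, data tiers, DQM flag, lifetime, version override); the real call may carry additional arguments.

```python
addExpressConfig(tier0Config, "CosmicHLTMonitor",  # PD name without the extra "s"
                 scenario=cosmicsScenario,
                 diskNode="T0_CH_CERN_Disk",
                 data_tiers=["FEVTHLTALL"],
                 write_dqm=True,
                 alca_producers=[],  # no HLT AlCa producers on cosmics
                 dataset_lifetime=3*30*24*3600,  # lifetime for container rules, ~3 months
                 versionOverride=expressVersionOverride)
```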

@LinaresToine LinaresToine merged commit d1aaf30 into master Mar 21, 2025
1 check passed
@LinaresToine
Contributor Author

Thanks a lot @mmusich for your help defining this new express PD. I am now merging.

LinaresToine added a commit that referenced this pull request Mar 21, 2025
LinaresToine added a commit that referenced this pull request Mar 21, 2025
@mmusich
Contributor

mmusich commented Mar 21, 2025

@LinaresToine will this replay be restarted in CMSSW_15_0_2 with run 389831 as discussed in mattermost?

@LinaresToine
Contributor Author

LinaresToine commented Mar 21, 2025

@mmusich
Contributor

mmusich commented Mar 22, 2025

I did not restart a replay, I simply injected the run into the existing replay. Please see:
https://monit-grafana.cern.ch/d/t_jr45h7k/cms-tier0-replayid-monitoring?orgId=11&refresh=1m&var-Bin=5m&var-ReplayID=250320235239&var-JobType=All&var-WorkflowType=All

I think something is not working correctly in this setup.
Looking at the Grafana monitoring, I see that 8% of the Express jobs have not been running:

[Screenshot from 2025-03-22 10:44:50: Grafana Express job monitoring]

and I don't see any output for the CosmicHLTMonitor that I would have expected.
Also, checking the test-bed tier0_wmstats instance, I see several jobs still in pending status after almost 24h:

[Screenshot from 2025-03-22 10:47:25: tier0_wmstats showing jobs in pending status]

Can you please check if there is some issue with the job submissions?

@LinaresToine
Contributor Author

LinaresToine commented Mar 22, 2025

Hello @mmusich, thank you for reporting this. The DQM sequence @HLTMon does not exist for the cosmicsEra_Run3 scenario, which prevents the new CosmicHLTMonitor express PD from being processed correctly.

Step: DQM Spec: ['@HLTMon']
Failed to load process from Scenario cosmicsEra_Run3 (<Configuration.DataProcessing.Impl.cosmicsEra_Run3.cosmicsEra_Run3 object at 0x14bf33f2ddc0>).
Traceback (most recent call last):
  File "/cvmfs/cms.cern.ch/share/overrides/bin/cmssw_wm_create_process.py", line 144, in <module>
    main()
  File "/cvmfs/cms.cern.ch/share/overrides/bin/cmssw_wm_create_process.py", line 135, in main
    process=create_process(args, func_args)
  File "/cvmfs/cms.cern.ch/share/overrides/bin/cmssw_wm_create_process.py", line 97, in create_process
    raise ex
  File "/cvmfs/cms.cern.ch/share/overrides/bin/cmssw_wm_create_process.py", line 93, in create_process
    process = my_func(*call_func_args, **func_args)
  File "/cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_2/src/Configuration/DataProcessing/python/Impl/cosmics.py", line 60, in expressProcessing
    process = Reco.expressProcessing(self,globalTag, **args)
  File "/cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_2/src/Configuration/DataProcessing/python/Reco.py", line 147, in expressProcessing
    cb.prepare()
  File "/cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_2/src/Configuration/Applications/python/ConfigBuilder.py", line 2293, in prepare
    self.addStandardSequences()
  File "/cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_2/src/Configuration/Applications/python/ConfigBuilder.py", line 832, in addStandardSequences
    getattr(self,"prepare_"+stepName)(stepSpec = '+'.join(stepSpec))
  File "/cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_2/src/Configuration/Applications/python/ConfigBuilder.py", line 2121, in prepare_DQM
    setattr(self.process,pathName, cms.EndPath( getattr(self.process,_sequence ) ) )
AttributeError: 'Process' object has no attribute 'HLTMonitoring'
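
The failure mode is the generic getattr in ConfigBuilder.prepare_DQM: the @HLTMon spec resolves to a sequence attribute named HLTMonitoring, which the process built for the cosmics scenario never defines. A minimal stand-alone illustration of that pattern (the Process class here is a stub, not the CMSSW one):

```python
class Process:
    """Stub standing in for a cms.Process; the cosmics scenario builds
    one without any HLTMonitoring sequence attribute."""
    pass

process = Process()

# ConfigBuilder resolves the '@HLTMon' DQM spec to the attribute name
# 'HLTMonitoring' and fetches it with getattr, which raises when the
# scenario never scheduled that sequence.
try:
    getattr(process, "HLTMonitoring")
except AttributeError as err:
    print(err)  # 'Process' object has no attribute 'HLTMonitoring'
```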

@LinaresToine
Contributor Author

I have created the cmssw issue: cms-sw/cmssw#47662

@mmusich
Contributor

mmusich commented Mar 22, 2025

I have created the cmssw issue: cms-sw/cmssw#47662

Thanks @LinaresToine for the follow-up.
I have opened PRs to CMSSW to resolve the issue reported above:

Let's try this again when we have a new (patch-)release with the fix.


5 participants