@smorovic
Contributor

PR description:

As seen in HLT during low-input-rate runs, the source can get stuck fetching files because event streams do not get a next event and therefore remain in the state of consuming the old file. This fix checks the FastMonitoringService status to verify that no event stream is still processing this file.
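One plausible reading of the failure mode and the fix is sketched below in C++. It is a minimal illustration under stated assumptions, not the actual FastMonitoringService or DAQ source code: the per-stream record `StreamState` and the helpers `beginEvent`, `endEvent`, `fileBusyNaive` and `fileBusy` are all hypothetical. The point it shows is that a check keyed only on the last file a stream consumed never clears when an idle stream gets no next event, whereas a check that counts only streams with an event in flight lets the source release the file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-stream state; names are illustrative only and are
// NOT the FastMonitoringService API.
struct StreamState {
  long lastFile = -1;       // input file the stream last took an event from (sticky)
  bool processing = false;  // true only while an event from lastFile is in flight
};

// A stream picks up an event from input file `file`.
void beginEvent(std::vector<StreamState>& streams, std::size_t id, long file) {
  assert(id < streams.size());
  streams[id].lastFile = file;
  streams[id].processing = true;
}

// A stream finishes its event. At low input rate it may then wait a long
// time for the next event, but lastFile keeps pointing at the old file.
void endEvent(std::vector<StreamState>& streams, std::size_t id) {
  assert(id < streams.size());
  streams[id].processing = false;
}

// Naive check: any stream whose last file matches blocks the release,
// so an idle stream that never gets a next event blocks it indefinitely.
bool fileBusyNaive(const std::vector<StreamState>& streams, long file) {
  for (const auto& s : streams)
    if (s.lastFile == file) return true;
  return false;
}

// Fixed check (hypothetical): only streams actively processing an event
// from this file prevent the source from releasing it.
bool fileBusy(const std::vector<StreamState>& streams, long file) {
  for (const auto& s : streams)
    if (s.processing && s.lastFile == file) return true;
  return false;
}
```

After `beginEvent` and `endEvent` on the same stream, `fileBusyNaive` still reports the file as busy while `fileBusy` does not, which is the difference between a source that hangs and one that moves on to the next file.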

PR validation:

Tested live in an emulator run in CDAQ.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

backport of: #47641
Reason for inclusion: causes occasional failures in the closure of lumisections in HLT

@cmsbuild
Contributor

A new Pull Request was created by @smorovic for CMSSW_15_0_X.

It involves the following packages:

  • EventFilter/Utilities (daq)

@cmsbuild, @emeschi, @smorovic can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @missirol this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Mar 20, 2025


@smorovic
Contributor Author

@cmsbuild please test

@smorovic
Contributor Author

type bug-fix

@missirol
Contributor

missirol commented Mar 20, 2025

@smorovic

Do you think we need to deploy a patch release for HLT with this right away, or could this wait for the next 15_0_X release and be deployed by the end of next week?

@smorovic
Contributor Author

@missirol,
I think it would be better to have a patch release (tomorrow is probably fine); otherwise there will be many runs with 1-2 open lumisections that need to be closed by hand, which is manageable but messy.

It is not happening in production right now with 15_0_1 (but it will again with 15_0_2) only because I deployed the fix in a dirty way for debugging, as the problem is not easy to reproduce on a smaller system. So far it has not reappeared since I deployed it in an emulator run and 3-4 production runs.

@missirol
Contributor

Okay, thanks @smorovic.

I guess this counts as validation of this PR.

Tagging ORM (@srimanob), @cms-sw/orp-l2 and @cms-sw/hlt-l2.

IIUC, this requires ORP to create a branch to make a patch on top of 15_0_2 (e.g. a branch named CMSSW_15_0_2_patchX), and someone to open a PR like this one targeting that branch, but I'll let the experts take it from here.

@srimanob
Contributor

srimanob commented Mar 20, 2025

@smorovic
We can't have the new release tomorrow, as CVMFS will be in read-only mode for the whole morning.

FYI @smuzaffar

By the way, let's first figure out whether we can use 15_0_X for the patch or need 15_0_2_patchX.

@srimanob
Contributor

urgent

@smuzaffar
Contributor

Is this fix already in master?

@srimanob
Contributor

Hi @smuzaffar
Master PR: #47641

@smuzaffar
Contributor

smuzaffar commented Mar 20, 2025

As we only want to get this PR into 15_0_X on top of 15_0_2, I am going to create the CMSSW_15_0_2_patchX branch now and move this PR to that branch.

Note that CMSSW_15_0_X already has a few cmssw and cmsdist PRs merged.

@srimanob
Contributor

By the way, asking in advance: does this change affect anything in Tier-0 operation? That is, do we need a full replay once the release for this patch is available?

FYI @LinaresToine @fabiocos

smuzaffar changed the base branch from CMSSW_15_0_X to CMSSW_15_0_2_patchX on March 20, 2025, 18:40
@cmsbuild
Contributor

Pull request #47644 was updated. @BenjaminRS, @emeschi, @quinnanm, @smorovic can you please check and sign again.

smuzaffar changed the base branch from CMSSW_15_0_2_patchX to CMSSW_15_0_X on March 20, 2025, 18:40
@smuzaffar
Contributor

smuzaffar commented Mar 20, 2025

@smorovic, I cannot update the base branch to CMSSW_15_0_2_patchX, as that brings in extra commits. Can you please open a separate PR for CMSSW_15_0_2_patchX?

@smorovic
Contributor Author

@smuzaffar
Here is the 15_0_2_patchX version: #47646
@srimanob
There is no effect on Tier-0, as this change does not touch the data content.

@smorovic
Contributor Author

+daq
(assuming tests pass also here)

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_15_0_X IBs after it passes the integration tests and once validation in the development release cycle CMSSW_15_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @sextonkennedy, @mandrenguyen, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild
Contributor

+1

Size: This PR adds an extra 56KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-18487e/45116/summary.html
COMMIT: d4151d5
CMSSW: CMSSW_15_0_X_2025-03-20-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47644/45116/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 3 lines from the logs
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4019958
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4019938
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor

Looks good. @cms-sw/orp-l2, I am merging this into 15_0_X and triggering an IB to test this change.

smuzaffar merged commit 04fd886 into cms-sw:CMSSW_15_0_X on Mar 20, 2025
9 checks passed
@smorovic
Contributor Author

Question: Is CVMFS needed for building, or only for availability?
We install releases locally with cmspkg on the HLT machines, and CVMFS is not used.

@smuzaffar
Contributor

@smorovic, CVMFS is not needed for building; it is only needed for installation.

How urgently do we need this release? I can already build it, and once the 20h00 IBs show no errors I can upload it later today, so that we can also install it on CVMFS before 9am tomorrow.

@smorovic
Contributor Author

If it is available in the yum repos (for cmspkg), I think that is already enough for us to deploy it (unless that now also depends on CVMFS somehow).

It would be good to have it available in the morning to get rid of the current hack.
But it is not extremely urgent and we can live with it for a day or two longer if it is difficult or inconvenient to do this shortly.
In any case, thanks to all people involved for the quick reaction.

@smuzaffar
Contributor

Building the release and making the RPMs available for download via cmspkg do not depend on CVMFS.

OK, I will start the build shortly, but will only upload it once the 20h00 15_0_X IBs are error-free. If everything goes fine, the patch will be available in the morning.

@mandrenguyen
Contributor

+1

@srimanob
Contributor

@smorovic
Release is now available. I am coordinating the change.
Thanks very much @smuzaffar

@smorovic
Contributor Author

Thank you, we are installing it in HLT.
