[DAQ] fix input source raw file deletion deadlock (15_0_X) #47644
A new Pull Request was created by @smorovic for CMSSW_15_0_X. It involves the following packages:
@cmsbuild, @emeschi, @smorovic can you please review it and eventually sign? Thanks. cms-bot commands are listed here
@cmsbuild please test
type bug-fix
Do you think we need to deploy a patch release for HLT with this right away, or could this wait for the next 15_0_X release and be deployed by the end of next week?
@missirol, it is not happening right now in production with 15_0_1 (but it will again with 15_0_2) only because I deployed the fix in a dirty way for debugging, as the problem is not easy to reproduce on a smaller system. So far it has not reappeared after I put it in an emulator run and 3-4 production runs.
Okay, thanks @smorovic.
I guess this counts as validation of this PR.
Tagging ORM (@srimanob), @cms-sw/orp-l2 and @cms-sw/hlt-l2. IIUC, this requires ORP to create a branch to make a patch off of 15_0_2 (e.g. a branch named
@smorovic FYI @smuzaffar. By the way, let's first figure out whether we can use 15_0_X to patch, or whether we need 15_0_2_patchX.
urgent
Do we have this fix already in master?
Hi @smuzaffar
As we want to get just this PR into 15.0.X on top of 15.0.2, I am going to create the 15_0_2_patchX branch now and move this PR to that branch. Note that CMSSW_15_0_X already has a few cmssw and cmsdist PRs merged.
By the way, I am asking in advance,
Pull request #47644 was updated. @BenjaminRS, @emeschi, @quinnanm, @smorovic can you please check and sign again. |
@smorovic, I cannot update the base branch to CMSSW_15_0_2_patchX as that brings in extra commits. Can you please open a separate PR for CMSSW_15_0_2_patchX?
@smuzaffar
+daq
This pull request is fully signed and it will be integrated in one of the next CMSSW_15_0_X IBs after it passes the integration tests and once validation in the development release cycle CMSSW_15_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @sextonkennedy, @mandrenguyen, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)
+1 Size: This PR adds an extra 56KB to repository. Comparison Summary:
Looks good. @cms-sw/orp-l2, I am merging this for 15_0_X and triggering an IB to test this change.
Question: Is cvmfs needed for building, or only for availability?
@smorovic, cvmfs is not needed for building; it is only needed for installation. How urgently do we need this release? I can already build it, and once the 20h00 IBs do not show any errors I can upload it later today, so that we can also install it on cvmfs before 9am tomorrow.
If it is available in the yum repos (for cmspkg), I think that is already enough for us to deploy it (unless that now also depends on cvmfs somehow).
It would be good to have it available in the morning so we can get rid of the current hack.
Building the release and making the RPMs available for download using cmspkg do not depend on cvmfs.
OK, I will start the build shortly, but will only upload it once the 20h00 15.0.X IBs are error-free. If everything goes fine, the patch will be available in the morning.
+1
@smorovic
Thank you, we are installing it in HLT.
PR description:
As seen in HLT in low-input-rate runs, the source gets stuck fetching files because the streams do not get a next event and therefore remain in the state of consuming the old file. This fix checks the FastMonitoringService status to make sure that no event stream is still processing this file before it is deleted.
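To illustrate the idea behind the fix, here is a minimal C++ sketch. It is not the CMSSW code: the class `StreamFileTracker`, its methods, and the per-stream "file currently being processed" slot are hypothetical stand-ins for the per-stream state that FastMonitoringService tracks. The point it shows is that a stream which has finished its last event, but has not yet received a new one, counts as idle and therefore no longer blocks deletion of the old raw file.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, NOT the FastMonitoringService API: each stream publishes
// the index of the raw file whose event it is processing right now, or kIdle.
class StreamFileTracker {
public:
  static constexpr long kIdle = -1;

  explicit StreamFileTracker(std::size_t nStreams) : current_(nStreams) {
    // Explicitly mark every stream idle at startup.
    for (auto& c : current_)
      c.store(kIdle);
  }

  // A stream starts processing an event read from raw file `fileIndex`.
  void beginEvent(std::size_t stream, long fileIndex) {
    assert(stream < current_.size());
    current_[stream].store(fileIndex);
  }

  // The stream finished its event. It no longer uses the file, even if its
  // next event has not arrived yet (the low-input-rate case).
  void endEvent(std::size_t stream) {
    assert(stream < current_.size());
    current_[stream].store(kIdle);
  }

  // Deletion check: true when no stream is processing an event from the file.
  // Assumes the source hands out no further events from `fileIndex` by now.
  bool safeToDelete(long fileIndex) const {
    for (auto const& c : current_)
      if (c.load() == fileIndex)
        return false;
    return true;
  }

private:
  std::vector<std::atomic<long>> current_;
};
```

For example, with two streams where stream 0 has finished its last event from file 0 and no new data has arrived, `safeToDelete(0)` returns true, so the source can delete the file and keep fetching. A condition keyed on the file each stream *last consumed* would stay false in that situation, which is consistent with the stall described above.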
PR validation:
Tested live in emulator run in CDAQ.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
backport of: #47641
Reason for inclusion: causes occasional failures in the closure of lumisections in HLT