[JENKINS-59152] - Reduce the default process soft-kill timeout from 2 minutes to 5 seconds #4225
The original issue that introduced it (JENKINS-17116) suggested 5 seconds, so we need to analyze/understand why it was changed to 2 minutes.
#3414 (comment) implemented it like that (and has drawn quite a lot of attention for this decision since).
Test currently passes. This is because the Jenkins test framework ignores exceptions that happen during the build process. However, if you look at the test output, you'll notice all the bad effects described in JENKINS-59152:

1. Pinging continues for a long time after "Sending interrupt signal to process".
2. FileMonitoringTask$FileMonitoringController.cleanup fails to remove a directory because the 20s timer in DurableTaskStep.Execution#stop works independently of SoftKillWaitSeconds and doesn't properly wait until processes are killed.
3. There's an InterruptedException because WindowsOSProcess#killSoftly was interrupted by the timeout in CpsThread#stop.
4. There's an IllegalStateException from CpsStepContext#completed because CpsThread#stop attempted to complete the step after it was already completed by stopTask in DurableTaskStep.Execution.

References: JENKINS-59152, JENKINS-17116, jenkinsci/jenkins#4216, jenkinsci/jenkins#4225
oleg-nenashev
left a comment
I think it will improve the situation.
We will likely need to mention it in the upgrade guidelines for LTS (CC @daniel-beck) so that users relying on the long timeout get a warning about the change, but I am fine with the change per se. Our process management is still a big area for improvement.
I will keep it for the next weekly so that reviewers have more time to react if they disagree with this change.
I plan to merge it tomorrow if there is no negative feedback.
See JENKINS-59152 and analysis done by @slonopotamus.
Let's reduce the default value for soft-kill to 5 seconds to avoid interference from other mechanisms that guard against stalled and stuck processes.
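
For readers unfamiliar with the pattern, here is a minimal sketch of a soft kill followed by a hard kill after a grace period. It is not the Jenkins implementation: it uses only the standard java.lang.Process API, and the class name, method name, and constant are hypothetical, chosen to mirror the 5-second default proposed here.

```java
import java.util.concurrent.TimeUnit;

public class SoftKillSketch {
    // Hypothetical constant mirroring the default proposed in this PR (not a Jenkins field).
    static final long SOFT_KILL_WAIT_SECONDS = 5;

    /**
     * Requests termination, waits up to waitSeconds for the process to exit,
     * and kills it forcibly if it is still alive afterwards.
     *
     * @return true if the process exited within the grace period
     */
    static boolean killWithGracePeriod(Process process, long waitSeconds)
            throws InterruptedException {
        // Soft kill: whether this is a graceful termination is platform dependent.
        process.destroy();
        if (process.waitFor(waitSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        // Grace period expired: fall back to a forcible kill and wait for the exit.
        process.destroyForcibly();
        process.waitFor();
        return false;
    }
}
```

The shorter the grace period, the less likely it is that an independent timer elsewhere (such as the 20s one mentioned in JENKINS-59152) fires while the soft kill is still waiting.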
Proposed changelog entries
Desired reviewers
@oleg-nenashev @slonopotamus