Skip to content

Conversation

@stephanreiter
Copy link

See JENKINS-59152 and analysis done by @slonopotamus.

Let's reduce the default value for soft-kill to 5 seconds to avoid interference by other mechanisms that want to avoid stalling and stuck processes.

Proposed changelog entries

Desired reviewers

@oleg-nenashev @slonopotamus

@MRamonLeon MRamonLeon added the needs-more-reviews Complex change, which would benefit from more eyes label Sep 26, 2019
@rsandell
Copy link
Member

The original issue that introduced it (JENKINS-17116) suggested 5 seconds, so we need to analyze/understand why it was changed to 2 minutes.

@daniel-beck
Copy link
Member

we need to analyze/understand why it was changed to 2 minutes

#3414 (comment) implemented it like that (and has drawn quite a lot of attention for this decision since).

slonopotamus added a commit to slonopotamus/workflow-durable-task-step-plugin that referenced this pull request Sep 27, 2019
Test currently passes. This is caused by the fact that Jenkins test framework ignores exceptions happening during build process.

However, if you look at test output, you'll notice all bad effects described in JENKINS-59152:

1. Pinging continues for a long time after "Sending interrupt signal to process"
2. FileMonitoringTask$FileMonitoringController.cleanup fails to remove a directory because 20s timer in DurableTaskStep.Execution#stop works independently of SoftKillWaitSeconds and doesn't properly wait until processes are killed
3. There's InterruptedException because WindowsOSProcess#killSoftly was interrupted by timeout in CpsThread#stop
4. There's IllegalStateException from CpsStepContext#completed because CpsThread#stop attempted to complete step after it was already completed by stopTask in DurableTaskStep.Execution

References: JENKINS-59152, JENKINS-17116, jenkinsci/jenkins#4216, jenkinsci/jenkins#4225
slonopotamus added a commit to slonopotamus/workflow-durable-task-step-plugin that referenced this pull request Sep 27, 2019
Test currently passes. This is caused by the fact that Jenkins test framework ignores exceptions happening during build process.

However, if you look at test output, you'll notice all bad effects described in JENKINS-59152:

1. Pinging continues for a long time after "Sending interrupt signal to process"
2. FileMonitoringTask$FileMonitoringController.cleanup fails to remove a directory because 20s timer in DurableTaskStep.Execution#stop works independently of SoftKillWaitSeconds and doesn't properly wait until processes are killed
3. There's InterruptedException because WindowsOSProcess#killSoftly was interrupted by timeout in CpsThread#stop
4. There's IllegalStateException from CpsStepContext#completed because CpsThread#stop attempted to complete step after it was already completed by stopTask in DurableTaskStep.Execution

References: JENKINS-59152, JENKINS-17116, jenkinsci/jenkins#4216, jenkinsci/jenkins#4225
Copy link
Member

@oleg-nenashev oleg-nenashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will improve the situation.
We will likely need to mention it in upgrade guidelines for LTS (CC @daniel-beck ) so that users of long timeout get a warning about the change, but I am fine with the change per se. Our process management is still big area for improvement

@oleg-nenashev oleg-nenashev added ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback rfe For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted and removed needs-more-reviews Complex change, which would benefit from more eyes labels Sep 28, 2019
@oleg-nenashev oleg-nenashev changed the title Reduce default SoftKillWaitSeconds from 2min to 5s [JENKINS-59152] - Reduce the default process soft-kill timeout from 2 minutes to 5 seconds Sep 28, 2019
@oleg-nenashev oleg-nenashev added the on-hold This pull request depends on another event/release, and it cannot be merged right now label Sep 28, 2019
@oleg-nenashev
Copy link
Member

I will keep it for the next weekly so that reviewers have more time to react if they disagree with this change

@timja timja closed this Sep 29, 2019
@timja timja reopened this Sep 29, 2019
@oleg-nenashev oleg-nenashev removed the on-hold This pull request depends on another event/release, and it cannot be merged right now label Sep 30, 2019
@oleg-nenashev
Copy link
Member

I plan to merge it tomorrow if no negative feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback rfe For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants