[JENKINS-17116][JENKINS-59152] Fix build abort not killing processes on Windows reliably #4216
…on Windows reliably
Sending CTRL+C to the process was not a proper way to terminate processes. Many programs, most importantly cmd.exe, do not terminate if you send them CTRL+C. cmd.exe asks for Y/N confirmation when you send it a CTRL+C. Instead, always use TerminateProcess WinAPI call.
This commit partially reverts changes from 31cd4d3.
Note that this commit doesn't fully fix JENKINS-59152 because there is still a race between process killing and REMOTE_TIMEOUT timer in DurableTaskStep. This problem needs to be fixed in workflow-durable-task-step-plugin.
Sending Ctrl+C alone is not a proper way to stop a process reliably, indeed. It's the first step only and basically asking the process kindly to terminate. In case it doesn't react to Ctrl+C within a reasonable amount of time, TerminateProcess is the next and final step: Windows will terminate the process, it won't have a chance to react. That's what was intended to be implemented in the commit you are partially reverting now:
On Linux, we'd first send SIGTERM and then SIGKILL. If that sequence doesn't work for some people, then I'd suggest spending time on investigating it instead of going with what's definitely the wrong approach: using only TerminateProcess, which doesn't allow graceful shutdowns.
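The two-step sequence described above (ask politely, then force after a grace period) can be sketched roughly as follows. This is a minimal illustration, not the Jenkins implementation: `TwoPhaseStop`, `Stoppable`, and their methods are hypothetical names introduced here so the logic can be shown without a real OS process.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative two-phase stop. All names are hypothetical, not Jenkins APIs. */
final class TwoPhaseStop {

    /** Minimal abstraction over a running process (assumption made for this sketch). */
    interface Stoppable {
        void requestStop();  // polite request, e.g. Ctrl+C or SIGTERM
        void forceStop();    // unconditional kill, e.g. TerminateProcess or SIGKILL
        boolean isAlive();
    }

    /**
     * Requests a graceful stop, polls until the grace period expires,
     * then forces termination if the process is still alive.
     *
     * @return true if the process exited within the grace period,
     *         false if it had to be force-stopped
     */
    static boolean stop(Stoppable p, long grace, TimeUnit unit, long pollMillis)
            throws InterruptedException {
        p.requestStop();
        long deadline = System.nanoTime() + unit.toNanos(grace);
        while (p.isAlive()) {
            if (System.nanoTime() - deadline >= 0) {
                p.forceStop();
                return false;
            }
            Thread.sleep(pollMillis);
        }
        return true;
    }
}
```

A process that honours the polite request never sees `forceStop()`. One that ignores it, as cmd.exe is reported to do with Ctrl+C, is force-stopped once the grace period runs out, so the length of that grace period directly determines how long an abort appears to hang.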
Hmm...
Okaaay, I need to think a bit about what to do with all this mess.
Just to make things clear: current Jenkins code fails to properly abort trivial It is not about "doesn't work for some people", it is "pipelines do not reliably abort on Windows". I believe virtually all pipelines doing something useful on Windows will contain at least one
How about reducing the time between Ctrl+C and TerminateProcess to something like 10 seconds? That may be more in line with expectations.
Okaaaaay, there's one more timer. Here's the callstack of how In So, I claim that 2min soft-kill timer totally breaks 30s timeout in CpsThread was added here.
Nice find! So some coordination is needed between these kill timers, or that CpsThread thing should go away. As an immediate remedy, you could set the default soft-kill timeout to 10 seconds.
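To make the timer interaction concrete, here is a small sketch that lists which outer timers would fire before the soft-kill wait has finished. The durations are the ones quoted in this thread (2min soft-kill wait, 30s in CpsThread, 20s in DurableTaskStep.Execution#stop); the class `KillTimerCheck` and its method are hypothetical helpers for illustration only and do not exist in Jenkins or its plugins.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative check of nested kill timers. Hypothetical helper, not a Jenkins API. */
final class KillTimerCheck {

    /** Returns the names of outer timers that expire strictly before the soft-kill wait ends. */
    static List<String> firingBeforeSoftKill(Duration softKillWait,
                                             Map<String, Duration> outerTimers) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Duration> e : outerTimers.entrySet()) {
            if (e.getValue().compareTo(softKillWait) < 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Durations as quoted in the discussion; the timer names are labels only.
        Map<String, Duration> outer = new LinkedHashMap<>();
        outer.put("CpsThread#stop", Duration.ofSeconds(30));
        outer.put("DurableTaskStep.Execution#stop", Duration.ofSeconds(20));

        // With a 2-minute soft-kill wait, both outer timers fire first.
        System.out.println(firingBeforeSoftKill(Duration.ofMinutes(2), outer));
        // With the suggested 10-second wait, neither does.
        System.out.println(firingBeforeSoftKill(Duration.ofSeconds(10), outer));
    }
}
```

Under these assumptions, a 10-second soft-kill wait finishes before either outer timer, which is why shortening it looks like a workable immediate remedy even without real coordination between the timers.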
Closing this PR. With the info found so far, it seems possible to fix this without touching Jenkins Core, although I still believe that the 2min timer is too long.
Test currently passes. This is caused by the fact that Jenkins test framework ignores exceptions happening during build process. However, if you look at test output, you'll notice all bad effects described in JENKINS-59152:
1. Pinging continues for a long time after "Sending interrupt signal to process"
2. FileMonitoringTask$FileMonitoringController.cleanup fails to remove a directory because 20s timer in DurableTaskStep.Execution#stop works independently of SoftKillWaitSeconds and doesn't properly wait until processes are killed
3. There's InterruptedException because WindowsOSProcess#killSoftly was interrupted by timeout in CpsThread#stop
4. There's IllegalStateException from CpsStepContext#completed because CpsThread#stop attempted to complete step after it was already completed by stopTask in DurableTaskStep.Execution
References: JENKINS-59152, JENKINS-17116, jenkinsci/jenkins#4216, jenkinsci/jenkins#4225
Sending CTRL+C to the process was not a proper way to terminate processes. Many programs,
most importantly cmd.exe, do not terminate if you send them CTRL+C. cmd.exe asks for Y/N confirmation
when you send it a CTRL+C. Instead, always use TerminateProcess WinAPI call now.
This commit partially reverts changes from 31cd4d3
This commit doesn't fully fix JENKINS-59152 because there is still a race between process killing and
the REMOTE_TIMEOUT timer in DurableTaskStep. This problem needs to be fixed in workflow-durable-task-step-plugin.
I'm not sure how to write a test for this change. What I really care about is that
bat/sh pipeline steps from workflow-durable-task-step-plugin are aborted properly. This is tested by the ShellStepTest.abort test, and it does fail currently on Windows. However, due to #4155, Jenkins tests have not been run on Windows since August. If you have any ideas how to write a test for this within Jenkins Core, please tell me.
Also, note that even though 31cd4d3 was committed more than a year ago, JENKINS-17116 was not resolved. Worse, people continue to complain there that process killing doesn't work as expected for them.
Additional thought:
hudson.util.ProcessTree.WindowsOSProcess#getParent is not implemented on Windows, which causes unpredictable process termination order.
Also, ShellStepTest.abort from workflow-durable-task-step-plugin has a race condition that I am fixing in jenkinsci/workflow-durable-task-step-plugin#118
Pinging @stephanreiter, an author of the changes in 31cd4d3 that I'm reverting.
Uh. This whole situation is so messed up :( I hope I provided enough info here so that whoever reviews these changes will be able to get the full picture.
Proposed changelog entries