Skip to content

Conversation

@slonopotamus
Copy link
Contributor

Sending CTRL+C to the process was not a proper way to terminate processes. Many programs,
most importantly cmd.exe, do not terminate if you send them CTRL+C. cmd.exe asks for Y/N confirmation
when you send it a CTRL+C. Instead, always use TerminateProcess WinAPI call now.

This commit partially reverts changes from 31cd4d3

This commit doesn't fully fix JENKINS-59152 because there is still a race between process killing and REMOTE_TIMEOUT timer in DurableTaskStep. This problem needs to be fixed in workflow-durable-task-step-plugin.

I'm not sure how to write a test for this change. What I really care about is that bat/sh pipeline steps from workflow-durable-task-step-plugin are aborted properly. This is tested by ShellStepTest.abort test and it does fail currently on Windows. However, due to #4155, Jenkins tests are not run on Windows since August. If you have any ideas hot to write a test for this within Jenkins Core, please tell me.

Also, note that even though 31cd4d3 was commited more than a year ago, JENKINS-17166 was not resolved. Worse, people continue to complain there that process killing doesn't work as expected for them.

Additional thought: hudson.util.ProcessTree.WindowsOSProcess#getParent is not implemented on Windows that causes unprediclable process termination order.

Also, ShellStepTest.abort from workflow-durable-task-step-plugin has a race condition that I am fixing in jenkinsci/workflow-durable-task-step-plugin#118

Pinging @stephanreiter, an author of changes in 31cd4d3 that I'm reverting.

Uh. All this situation is so much messed up :( Hope I provided enough info here so whoever reviews these changes will be able to get the full picture.

Proposed changelog entries

…on Windows reliably

Sending CTRL+C to the process was not a proper way to terminate processes. Many programs,
most importantly cmd.exe, do not terminate if you send them CTRL+C. cmd.exe asks for Y/N confirmation
when you send it a CTRL+C. Instead, always use TerminateProcess WinAPI call.

This commit partially reverts changes from 31cd4d3

Note that this commit doesn't fully fix JENKINS-59152 because there is still a race between process killing
and REMOTE_TIMEOUT timer in DurableTaskStep. This problem needs to  be fixed in workflow-durable-task-step-plugin.
@stephanreiter
Copy link

Sending Ctrl+C alone is not a proper way to stop a process reliably, indeed. It's the first step only and basically asking the process kindly to terminate. In case it doesn't react to Ctrl+C within a reasonable amount of time, TerminateProcess is the next and final step: Windows will terminate the process, it won't have a chance to react.

That's what was intended to be implemented in the commit you are partially reverting now:

  1. killSoftly()
  2. kill()

On Linux, we'd first send SIGTERM and then SIGKILL.

If that sequence doesn't work for some people, then I'd suggest to spend time on investigating it instead of going with what's definitely the wrong approach: Only using TerminateProcess, which doesn't allow graceful shutdowns.

@slonopotamus
Copy link
Contributor Author

slonopotamus commented Sep 23, 2019

Hmm...

  1. Do you know how does taskkill /pid <pid> work? Maybe we could use that as a graceful termination.
  2. CTRL+C is an equivalent of SIGINT, this is not SIGTERM that we actually need. As I already said, cmd.exe does not exit by CTRL+C.
  3. Your change made things much worse than they were due to how your 2min timeout interacts with 20s timeout in DurableTaskStep that handles bat in pipelines. And with a high chance, aborting a bat <whatever> pipeline hits cmd.exe, because there are two of them involved, so you wait for 2mins.

@slonopotamus
Copy link
Contributor Author

Okaaay, taskkill /pid cannot kill cmd /c <whatever>:

>taskkill /pid <pid>
ERROR: The process with PID <pid> could not be terminated.
Reason: This process can only be terminated forcefully (with /F option).

I need to think a bit what to do with all this mess.

@slonopotamus
Copy link
Contributor Author

Just to make things clear: current Jenkins code fails to properly abort trivial bat pipeline.

It is not about "doesn't work for some people", it is "pipelines do not reliably abort on Windows". I believe virtually all pipelines doing something useful on Windows will contain at least one bat step.

@oleg-nenashev oleg-nenashev self-requested a review September 24, 2019 06:19
@stephanreiter
Copy link

How about reducing the time between Ctrl+C and TerminateProcess to like 10 seconds. That may be more in line with expectations.

@slonopotamus
Copy link
Contributor Author

slonopotamus commented Sep 24, 2019

Okaaaaay, there's one more timer.

Here's the callstack of how killSoftly is invoked:

	at hudson.util.ProcessTree$WindowsOSProcess.killSoftly(ProcessTree.java:563) <------------- here you're waiting for reaction to CTRL+C before calling TerminateProcess
	at hudson.util.ProcessTree$WindowsOSProcess.killRecursively(ProcessTree.java:523)
	at hudson.util.ProcessTree$Windows.killAll(ProcessTree.java:669)
	at hudson.Launcher$LocalLauncher.kill(Launcher.java:956)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.stop(FileMonitoringTask.java:345)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.stop(DurableTaskStep.java:512)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.stop(CpsThread.java:308) <---------------- Attention here!
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:1151)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:1140)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:907)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
	at java.util.concurrent.FutureTask.run(FutureTask.java)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
	at java.util.concurrent.FutureTask.run(FutureTask.java)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

In CpsThread.stop you can see a lovely 30s timeout that happily interrupts your 2min timer by raising InterruptException so you neither hard-kill problematic process that didn't shutdown by CTRL+C nor attempt to kill other processes in the same tree at all:


        try (Timeout timeout = Timeout.limit(30, TimeUnit.SECONDS)) {
            s.stop(t);
        } catch (Exception e) {
            t.addSuppressed(e);
            s.getContext().onFailure(t);
        }

@stephanreiter

So, I claim that 2min soft-kill timer totally breaks bat aborting given how all other code that handles bat is organized.

30s timeout in CpsThread was added here.

@stephanreiter
Copy link

Nice find! So there's some coordination needed between these kill timers or that CpsThread thing should go away. As an immediate remedy, you could set the default soft-kill timeout to 10secs.

@slonopotamus
Copy link
Contributor Author

Closing this PR. With info found so far it seems to be possible to fix stuff without touching Jenkins Core. Although I believe that 2min timer is too long.

slonopotamus added a commit to slonopotamus/workflow-durable-task-step-plugin that referenced this pull request Sep 27, 2019
Test currently passes. This is caused by the fact that Jenkins test framework ignores exceptions happening during build process.

However, if you look at test output, you'll notice all bad effects described in JENKINS-59152:

1. Pinging continues for a long time after "Sending interrupt signal to process"
2. FileMonitoringTask$FileMonitoringController.cleanup fails to remove a directory because 20s timer in DurableTaskStep.Execution#stop works independently of SoftKillWaitSeconds and doesn't properly wait until processes are killed
3. There's InterruptedException because WindowsOSProcess#killSoftly was interrupted by timeout in CpsThread#stop
4. There's IllegalStateException from CpsStepContext#completed because CpsThread#stop attempted to complete step after it was already completed by stopTask in DurableTaskStep.Execution

References: JENKINS-59152, JENKINS-17116, jenkinsci/jenkins#4216, jenkinsci/jenkins#4225
slonopotamus added a commit to slonopotamus/workflow-durable-task-step-plugin that referenced this pull request Sep 27, 2019
Test currently passes. This is caused by the fact that Jenkins test framework ignores exceptions happening during build process.

However, if you look at test output, you'll notice all bad effects described in JENKINS-59152:

1. Pinging continues for a long time after "Sending interrupt signal to process"
2. FileMonitoringTask$FileMonitoringController.cleanup fails to remove a directory because 20s timer in DurableTaskStep.Execution#stop works independently of SoftKillWaitSeconds and doesn't properly wait until processes are killed
3. There's InterruptedException because WindowsOSProcess#killSoftly was interrupted by timeout in CpsThread#stop
4. There's IllegalStateException from CpsStepContext#completed because CpsThread#stop attempted to complete step after it was already completed by stopTask in DurableTaskStep.Execution

References: JENKINS-59152, JENKINS-17116, jenkinsci/jenkins#4216, jenkinsci/jenkins#4225
@slonopotamus slonopotamus deleted the killing-me-softly branch September 27, 2019 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants