[JENKINS-17116] - When aborting a build, wait up to 2min for process termination #3414
Maybe making the 30 seconds configurable (per project?) would be a good idea.

In line 558 of ProcessTree.java, we should replace "java" with the actual path to the java.exe of the JVM the code is running in, i.e. derive it from the java.home property.

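A minimal sketch of that suggestion, assuming the launcher path is built from the standard java.home system property of the running JVM. The class and method names are hypothetical, and the actual code around line 558 of ProcessTree.java is not reproduced here:

```java
import java.io.File;
import java.util.Locale;

public class CurrentJavaLauncher {

    /** Returns the path of the java launcher that belongs to the running JVM (hypothetical helper). */
    static String javaExecutable() {
        // java.home points at the installation directory of the running JVM
        String javaHome = System.getProperty("java.home");
        // The launcher is named java.exe on Windows and java elsewhere
        boolean windows = System.getProperty("os.name")
                .toLowerCase(Locale.ROOT).startsWith("windows");
        String launcher = windows ? "java.exe" : "java";
        return javaHome + File.separator + "bin" + File.separator + launcher;
    }

    public static void main(String[] args) {
        System.out.println(javaExecutable());
    }
}
```

Using the running JVM's own launcher avoids depending on whatever "java" happens to be first on the PATH.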
Seems like one of those options that would clutter up the UI further. Possibly good enough for a system property, but in general we should "just" get something like this right.

Good point. 30 seconds might be too short. At work we use SCons as the build system, and when you abort a build it sometimes sits there for a minute updating its database on disk. So maybe 5 minutes would be a safer choice here.

I have no idea why the test "Linux / Linux Publishing / considersKillingVetos – hudson.util.ProcessTreeKillerTest" fails. On Linux, we now merely wait after the SIGTERM and do not send any extra SIGTERMs.

The same test failure is observed in another pull request (#3417), so it was not introduced here.

@stephanreiter the test failure was fixed in #3419; looks like the build is good now.

Cool! I think I am quite happy with the change now. Please take a look.

    return;

    LOGGER.log(FINER, "Killing recursively {0}", getPid());
    killSoftly(getPid());

We'll send Ctrl+C only to the root of a process tree. That's fine for my use case.
If every process in the tree should receive Ctrl+C, we should change the implementation of WinProcess and send Ctrl+C in killRecursively and kill (a timeout would then have to be passed to them).

Can we proceed with this, please?

    private void killSoftly(int pid) {
        // send Ctrl+C to the process in the first iteration
        // after that just wait for it to cease to exist
        for (int i = 0; i < softkillWaitSeconds; ++i) {

Good comment from selckin on IRC: better to calculate deadline = nowTimestamp + softkillWaitSeconds once and loop while nowTimestamp < deadline.

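A sketch of the suggested pattern, assuming a hypothetical waitForExit helper that polls an isAlive check. The deadline is computed once, so time spent sleeping or checking the process counts against the budget, which counting loop iterations does not guarantee. It uses System.nanoTime, as the review further down recommends:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class DeadlineWait {

    /**
     * Polls isAlive until it reports false or the deadline passes (hypothetical helper).
     * Returns true if the process exited within the allotted time.
     */
    static boolean waitForExit(BooleanSupplier isAlive, long timeoutSeconds, long pollMillis) {
        // Compute the deadline once instead of counting iterations
        final long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
        while (isAlive.getAsBoolean()) {
            // Compare by subtraction, as the System.nanoTime Javadoc recommends
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; the caller may fall back to a hard kill
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        final long exitAt = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(200);
        // Simulated process that "exits" after roughly 200 ms
        boolean exited = waitForExit(() -> System.nanoTime() - exitAt < 0, 2, 50);
        System.out.println(exited);
    }
}
```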
    LOGGER.fine("Killing pid="+pid);
    UnixReflection.destroy(pid);
    // after sending SIGTERM, wait for the process to cease to exist
    for (int i = 0; i < softkillWaitSeconds; ++i) {

Good comment from selckin on IRC: better to calculate deadline = nowTimestamp + softkillWaitSeconds once and loop while nowTimestamp < deadline.

        catch (Exception e) {
            break;
        }
    } while(System.currentTimeMillis() < deadline);

You want to use System.nanoTime, not currentTimeMillis. See the Javadoc for why.

I found that nanoTime is immune to system clock changes and hence preferred for elapsed-time calculations. Will fix!
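Besides ignoring wall-clock changes, the System.nanoTime Javadoc points out that its values can overflow and should only be compared by subtraction (t1 - t0 < 0 rather than t1 < t0). A small illustration with made-up readings near Long.MAX_VALUE, chosen only to force the overflow; the method names are hypothetical:

```java
public class NanoTimeComparison {

    /** Overflow-safe check recommended for System.nanoTime values. */
    static boolean beforeDeadline(long now, long deadline) {
        return now - deadline < 0;
    }

    /** Naive check that breaks when the deadline wraps past Long.MAX_VALUE. */
    static boolean beforeDeadlineNaive(long now, long deadline) {
        return now < deadline;
    }

    public static void main(String[] args) {
        long now = Long.MAX_VALUE - 5; // hypothetical nanoTime reading
        long deadline = now + 10;      // wraps around to a negative number
        System.out.println(beforeDeadline(now, deadline));      // true
        System.out.println(beforeDeadlineNaive(now, deadline)); // false
    }
}
```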

    classPath = new ArrayList<String>();
    for (final String resourcePath : resourcePaths) {
        if (resourcePath.contains("jenkins-core") && resourcePath.endsWith(".jar")) {
            String jarPath = servletContext.getRealPath(resourcePath);

This does not smell right. Have you even tested this when using remote agents?

I tested this with a local installation only. The problem to solve is starting a new process that attaches to the process we want to send Ctrl+C to. I'd be glad for suggestions, since you have a discerning nose.

I suspect this entire PR is in the wrong repository. See: https://github.com/kohsuke/winp

The entire PR is certainly not in the wrong repository. The Windows-specific bits could be moved into winp; the Linux-specific bits cannot.

I'll try to move the killSoftly method into winp. There I will need to execute a function in winp.dll in a new process to send Ctrl+C. rundll should allow that... we'll see.

Alright, working on sending Ctrl+C on Windows via WinP here: jenkinsci/winp#49

@stephanreiter will try to take a look ASAP.

Updated this change based on what I hope will make it into WinP.

@oleg-nenashev after you have released a new version of WinP, could you help me update this pull request to include it, please?

@stephanreiter FYI, I have pushed a commit to retrigger the build after fixing Maven Central staging.

Seems to build now, yippee! I added exception handling around sendCtrlC: if Ctrl+C can't be sent, we just proceed with the regular hard kill. Catching exceptions became necessary because sendCtrlC now throws if something goes wrong instead of merely returning false.

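The control flow described here can be sketched as follows. GracefulKillable and its methods are hypothetical stand-ins, not the WinP API; the only assumption carried over from the comment is that sendCtrlC throws on failure, in which case the code falls through to the hard kill:

```java
public class CtrlCFallback {

    /** Hypothetical abstraction over a Windows process handle. */
    interface GracefulKillable {
        void sendCtrlC() throws Exception;        // assumed to throw on failure
        boolean waitForExit(long timeoutSeconds); // true if the process exited in time
        void killHard();
    }

    /** Returns true if the process ended after Ctrl+C, false if it had to be hard-killed. */
    static boolean terminate(GracefulKillable p, long softKillWaitSeconds) {
        try {
            p.sendCtrlC();
            if (p.waitForExit(softKillWaitSeconds)) {
                return true; // clean shutdown
            }
        } catch (Exception e) {
            // Ctrl+C could not be delivered: fall through to the hard kill
        }
        p.killHard();
        return false;
    }

    public static void main(String[] args) {
        GracefulKillable failing = new GracefulKillable() {
            public void sendCtrlC() throws Exception { throw new Exception("no console"); }
            public boolean waitForExit(long timeoutSeconds) { return false; }
            public void killHard() { System.out.println("hard kill"); }
        };
        System.out.println(terminate(failing, 120));
    }
}
```

Treating any failure to deliver Ctrl+C like a timeout keeps the abort path identical to the pre-existing behavior in the worst case.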
Only looked at this superficially, but I regularly press 'abort build' again if I don't see anything happening. If this change affects abort duration, a message should be logged saying that it might take a while.

This change gives processes a chance to shut down cleanly. Depending on the executable, aborting will therefore take longer than before. At work, we use SCons for building, and it sometimes takes a few seconds to shut down. (In the end, though, SCons does shut down cleanly; that was my motivation for fixing the referenced issue.) Logging when process termination starts and ends sounds like good feedback for the user. I don't know how to do that, though. Just regular logging at a certain log level?

The nearest TaskListener (http://javadoc.jenkins-ci.org/hudson/model/TaskListener.html), or rather its logger, would receive such messages.

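A rough sketch of such feedback, assuming the caller has a PrintStream at hand (for example the one returned by TaskListener.getLogger()). The method name and message wording are hypothetical, and the wait itself is abstracted as a BooleanSupplier that returns true on a clean exit:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.BooleanSupplier;

public class AbortFeedback {

    /** Brackets the soft-kill wait with user-visible messages (hypothetical helper). */
    static boolean terminateWithFeedback(PrintStream log, long softKillWaitSeconds,
                                         BooleanSupplier waitForCleanExit) {
        log.println("Waiting up to " + softKillWaitSeconds
                + " seconds for processes to shut down cleanly...");
        boolean clean = waitForCleanExit.getAsBoolean();
        log.println(clean
                ? "Processes terminated cleanly."
                : "Processes did not exit in time; killing them.");
        return clean;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream log = new PrintStream(buffer, true); // stand-in for a build log
        terminateWithFeedback(log, 120, () -> true);
        System.out.print(buffer);
    }
}
```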
@daniel-beck I don't know how to get to a relevant TaskListener logger at the time the user triggers the abort action. The call stack for that event is:

I pushed a change that allows tweaking the softkill time via a system property and makes sure that, when killing a process tree on Linux, the overall operation won't take longer than the softkill time (i.e. all processes in the tree share the 2-minute default softkill time; it is no longer per process in the tree). Thanks to Nick Talbot for pointing that out!
… and share time among processes when terminating a tree on Linux

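A minimal sketch of sharing one soft-kill budget across a whole tree. Assumptions: the wait is read with Integer.getInteger from a property named SoftKillWaitSeconds (the name used in the changelog entry at the end of this page), and SoftKillable is a hypothetical interface rather than the actual ProcessTree code. One deadline is computed for the entire tree, and each process only gets the time that is left, so the total wait stays bounded by the budget instead of growing with the number of processes:

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

public class SharedSoftKillBudget {

    /** Hypothetical per-process operations. */
    interface SoftKillable {
        void sendSoftSignal();                  // e.g. SIGTERM
        boolean waitForExit(long timeoutNanos); // true if the process exited in time
        void killHard();                        // e.g. SIGKILL
    }

    // Assumed property name; defaults to 2 minutes for the whole tree
    static final int SOFT_KILL_WAIT_SECONDS = Integer.getInteger("SoftKillWaitSeconds", 2 * 60);

    static void killTree(List<SoftKillable> tree, long budgetSeconds) {
        // A single deadline for the entire tree, not one per process
        final long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(budgetSeconds);
        for (SoftKillable p : tree) {
            p.sendSoftSignal();
            // Grant only the time remaining until the shared deadline
            long remaining = Math.max(0, deadline - System.nanoTime());
            if (!p.waitForExit(remaining)) {
                p.killHard();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Soft-kill budget for the whole tree: " + SOFT_KILL_WAIT_SECONDS + "s");
    }
}
```

Starting the JVM with -DSoftKillWaitSeconds=30 would make this sketch use 30 seconds instead of the 2-minute default.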
@oleg-nenashev @daniel-beck @dwnusbaum can you help me finish this PR, please?

@stephanreiter If you are fine with that, I will try to add logging on top of your pull request (this evening, I'd guess).

I would very much appreciate that, @oleg-nenashev. Thank you so much for this and for the work you already put into the PR!

Didn't get to it on the weekend, sorry. I will see if I have some time this week.

@daniel-beck I spent a few hours yesterday investigating your proposal w.r.t.
I have started assembling some API changes to enable I propose to...
@daniel-beck WDYT?

If it's not possible, then we'll need to accept this. Thanks for investigating.

Created https://issues.jenkins-ci.org/browse/JENKINS-53373 as a follow-up for logging.

@jenkinsci/sig-platform Just FYI, here we are dropping support for Windows 2000 (as discussed at the first meeting).

+1

https://issues.jenkins-ci.org/browse/JENKINS-55106 indicates that the lack of logging is a problem and that 2 minutes is excessive, since the waits appear to add up consecutively when multiple processes are involved.

Lack of information in the UI is indeed a problem.
Lack of logging, however, means this isn't discoverable, so it might as well not exist.

Until logging is implemented, maybe the default timeout should be set very short, to avoid regressions? As long as those of us who need a long timeout can set one explicitly, that shouldn't be a problem.

@stephanreiter https://issues.jenkins-ci.org/browse/JENKINS-54502 suggests that this change might have introduced a bug with aborting builds on Windows masters.

The 2-minute timeout introduced here doesn't actually work. Instead, Jenkins waits only 20 seconds. See https://issues.jenkins-ci.org/browse/JENKINS-59152
See JENKINS-17116.
When a build is aborted by the user, Jenkins will now terminate the involved processes gracefully, giving them up to 30 seconds to exit after receiving SIGTERM (on Linux) or Ctrl+C (on Windows).

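For a single child process launched from Java, the same SIGTERM, wait, then kill sequence can be sketched with the standard java.lang.Process API (Java 8+). This is an illustration, not the PR's implementation: with OpenJDK on Unix-like systems destroy() sends SIGTERM and destroyForcibly() sends SIGKILL, whereas on Windows both terminate the process without delivering Ctrl+C, which is why the Windows side of this PR relies on WinP. The class name is hypothetical:

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class GracefulTerminate {

    /**
     * Asks the process to exit, waits up to softKillWaitSeconds, then kills it.
     * Returns true if the process exited on its own after the polite request.
     */
    static boolean terminate(Process process, long softKillWaitSeconds) {
        process.destroy(); // SIGTERM on Unix-like systems with OpenJDK
        try {
            if (process.waitFor(softKillWaitSeconds, TimeUnit.SECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
        }
        process.destroyForcibly(); // SIGKILL on Unix-like systems
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Assumes a Unix-like system with a "sleep" command on the PATH
        Process sleeper = new ProcessBuilder("sleep", "60").start();
        System.out.println(terminate(sleeper, 30));
    }
}
```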
Proposed Changelog entries
SoftKillWaitSeconds system property; the default value is 2 minutes (for the entire process tree).

@jenkinsci/code-reviewers