-
Notifications
You must be signed in to change notification settings - Fork 96
Better fix to [JENKINS-25519] -- avoid errors when trying to read exitStatus from durable task before finished writing #66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
abayer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@reviewbybees |
| FilePath logFile = c.getLogFile(ws); | ||
| FilePath resultFile = c.getResultFile(ws); | ||
| if (resultFile.exists()) { | ||
| resultFile.delete(); // Maybe overly cautious, but better safe than sorry, similarly we should make sure no duplicate logfile? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the whole control directory should not exist before we start, or should be empty.
| return controlDir(ws).child("pid"); | ||
| } | ||
|
|
||
| // TODO run as one big MasterToSlaveCallable<Integer> to avoid extra network roundtrips |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Irrelevant after #60.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| listener.getLogger().println("still have " + pidFile + " so heartbeat checks unreliable; process may or may not be alive"); | ||
| } else { | ||
| listener.getLogger().println("wrapper script does not seem to be touching the log file in " + controlDir); | ||
| listener.getLogger().println("s " + controlDir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh…?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erm, I have no idea where that came from. Best guess It was a hotkeying error somewhere, but fixing now.
| } catch (NumberFormatException x) { | ||
| throw new IOException("corrupted content in " + status + ": " + x, x); | ||
| /** Avoids excess round-tripping when reading status file. */ | ||
| static class StatusCheck extends MasterToSlaveFileCallable<Integer> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just gratuitous merge conflict creation against #60. Better to keep the patch short and to the point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have one particular user who has longer latencies and this change likely helps keep them under the timeouts - thus it is hardly "gratuitous."
| public Integer invoke(File f, VirtualChannel channel) throws IOException, InterruptedException { | ||
| if (f.exists() && f.length() > 0) { | ||
| try { | ||
| String fileString = Files.readFirstLine(f, Charset.defaultCharset()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway you could just get rid of the exists check and use a single readToString call, catching FileNotFoundException and ignoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's all agent-local then it doesn't really matter, does it? Also: as a rule of thumb in Java one should not be throwing Exceptions to signal routine and expected conditions. They come with a higher than normal level of baggage.
As opposed to Python where that's the idiomatic way to signal things like an Iterator being done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, packaging all of this up in a Remoting call is a pretty high level of baggage, which would swamp any tiny overhead of Throwable.fillInStackTrace in this case.
| } | ||
| } | ||
|
|
||
| static final StatusCheck STATUS_CHECK_INSTANCE = new StatusCheck(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know act is doing object allocation anyway, right?
| String quotedResultFile = quote(c.getResultFile(ws)); | ||
| if (capturingOutput) { | ||
| cmd = String.format("@echo off \r\ncmd /c \"\"%s\"\" > \"%s\" 2> \"%s\"\r\necho %%ERRORLEVEL%% > \"%s\"\r\n", | ||
| cmd = String.format("@echo off \r\ncmd /c \"\"%s\"\" > \"%s\" 2> \"%s\"\r\necho %%ERRORLEVEL%% > \"%s.tmp\"\r\nmove \"%s.tmp\" \"%s\"\r\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And PowerShell?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sensible.
| public Integer invoke(File f, VirtualChannel channel) throws IOException, InterruptedException { | ||
| if (f.exists() && f.length() > 0) { | ||
| try { | ||
| String fileString = Files.readFirstLine(f, Charset.defaultCharset()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor note: use of defaultCharset could theoretically cause issues on z/OS. Since we are expecting this file to be ASCII (it should in fact just contain [0-9]+), it is safer and clearer to use StandardCharsets.ASCII.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A really critical point - it looks like this is already broken, per #28 -- I think we'd need to address it in more comprehensive work to get it to play nicely in any case.
…able hang bugs on windows
Superior version of #65
Proposed solution to error like this, encountered when statusCode file exists but is empty, potentially when created but not written to yet, or not written fully. Was confirmed that previous version removed the errors with the "corrupt" status file content and greatly reduced failure rates.
Variant of JENKINS-25519