Member

@svanoort svanoort commented Mar 3, 2018

Superior version of #65

Proposed fix for errors like the one below, encountered when the statusCode file exists but is empty, potentially because it has been created but not yet written to, or not written fully. The previous version was confirmed to remove the errors about "corrupt" status file content and to greatly reduce failure rates.

java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:592)
at java.lang.Integer.parseInt(Integer.java:615)
at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.exitStatus(FileMonitoringTask.java:168)
Caused: java.io.IOException: corrupted content in $SOMEPLACE
at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.exitStatus(FileMonitoringTask.java:170)
at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.exitStatus(BourneShellScript.java:211)

Variant of JENKINS-25519
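
For illustration, a minimal sketch of the kind of tolerant read described above, assuming the status file holds only an ASCII decimal exit code. `ExitStatusSketch`, `parse`, and `read` are hypothetical names and use plain `java.nio`; this is not the plugin's actual code, which runs through Remoting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not the plugin's code. */
final class ExitStatusSketch {

    /** Returns the exit code, or null when nothing usable has been written yet. */
    static Integer parse(String raw) {
        if (raw == null) {
            return null;
        }
        String trimmed = raw.trim(); // tolerate trailing spaces and line endings
        if (trimmed.isEmpty()) {
            return null; // file created but still empty: treat as "not finished"
        }
        return Integer.valueOf(trimmed); // NumberFormatException on real garbage
    }

    /** Reads the status file, treating a missing or zero-length file as "not finished". */
    static Integer read(Path statusFile) throws IOException {
        if (!Files.exists(statusFile) || Files.size(statusFile) == 0) {
            return null;
        }
        String raw = new String(Files.readAllBytes(statusFile), StandardCharsets.US_ASCII);
        try {
            return parse(raw);
        } catch (NumberFormatException x) {
            // Same message shape as in the stack trace above.
            throw new IOException("corrupted content in " + statusFile + ": " + x, x);
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("status", ".txt"); // starts out empty
        System.out.println(read(f)); // prints "null"
        Files.write(f, "0 \r\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(read(f)); // prints "0"
        Files.delete(f);
    }
}
```

Note that a length check alone cannot detect a partially written multi-digit code; it only covers the created-but-empty case.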

Member

@abayer abayer left a comment

LGTM

Member Author

svanoort commented Mar 5, 2018

@reviewbybees

FilePath logFile = c.getLogFile(ws);
FilePath resultFile = c.getResultFile(ws);
if (resultFile.exists()) {
resultFile.delete(); // Maybe overly cautious, but better safe than sorry, similarly we should make sure no duplicate logfile?
Member

Well, the whole control directory should not exist before we start, or should be empty.

return controlDir(ws).child("pid");
}

// TODO run as one big MasterToSlaveCallable<Integer> to avoid extra network roundtrips
Member

Irrelevant after #60.

Member Author

Assuming #60 is fundamentally reliable. We can always remove the comment if it tests clean. And we may have to do yet another emergency fix due to something introduced in #49 before that gets integrated.

listener.getLogger().println("still have " + pidFile + " so heartbeat checks unreliable; process may or may not be alive");
} else {
listener.getLogger().println("wrapper script does not seem to be touching the log file in " + controlDir);
listener.getLogger().println("s " + controlDir);
Member

Uh…?

Member Author

Erm, I have no idea where that came from. Best guess is that it was a hotkeying error somewhere, but fixing now.

} catch (NumberFormatException x) {
throw new IOException("corrupted content in " + status + ": " + x, x);
/** Avoids excess round-tripping when reading status file. */
static class StatusCheck extends MasterToSlaveFileCallable<Integer> {
Member

This is just gratuitous merge conflict creation against #60. Better to keep the patch short and to the point.

Member Author

We have one particular user who has longer latencies and this change likely helps keep them under the timeouts - thus it is hardly "gratuitous."

public Integer invoke(File f, VirtualChannel channel) throws IOException, InterruptedException {
if (f.exists() && f.length() > 0) {
try {
String fileString = Files.readFirstLine(f, Charset.defaultCharset());
Member

Anyway you could just get rid of the exists check and use a single readToString call, catching FileNotFoundException and ignoring.

Member Author

If it's all agent-local then it doesn't really matter, does it? Also: as a rule of thumb in Java one should not be throwing Exceptions to signal routine and expected conditions. They come with a higher than normal level of baggage.

As opposed to Python where that's the idiomatic way to signal things like an Iterator being done.

Member

@jglick jglick Mar 6, 2018

Well, packaging all of this up in a Remoting call is a pretty high level of baggage, which would swamp any tiny overhead of Throwable.fillInStackTrace in this case.
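
To make the alternative under discussion concrete, here is a rough sketch of the single-read approach: no `exists` check, one read, and a missing file handled in the `catch`. It assumes plain `java.nio` rather than `FilePath.readToString`, so the exception caught is `NoSuchFileException` instead of `FileNotFoundException`; `SingleReadSketch` and `readStatus` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical illustration of the "single read, ignore missing file" variant. */
final class SingleReadSketch {

    /** Returns the exit code, or null if the file is missing or still empty. */
    static Integer readStatus(Path statusFile) {
        String raw;
        try {
            raw = new String(Files.readAllBytes(statusFile), StandardCharsets.US_ASCII).trim();
        } catch (NoSuchFileException x) {
            return null; // not created yet: the process is presumably still running
        } catch (IOException x) {
            throw new UncheckedIOException(x); // any other I/O failure is a real error
        }
        // An empty file is still treated as "no status yet", as in the length check.
        return raw.isEmpty() ? null : Integer.valueOf(raw);
    }
}
```

This variant avoids the window between the `exists` check and the read, at the cost of using an exception for an expected condition, which is the trade-off argued over above.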

}
}

static final StatusCheck STATUS_CHECK_INSTANCE = new StatusCheck();
Member

You know act is doing object allocation anyway, right?

String quotedResultFile = quote(c.getResultFile(ws));
if (capturingOutput) {
cmd = String.format("@echo off \r\ncmd /c \"\"%s\"\" > \"%s\" 2> \"%s\"\r\necho %%ERRORLEVEL%% > \"%s\"\r\n",
cmd = String.format("@echo off \r\ncmd /c \"\"%s\"\" > \"%s\" 2> \"%s\"\r\necho %%ERRORLEVEL%% > \"%s.tmp\"\r\nmove \"%s.tmp\" \"%s\"\r\n",
Member

And PowerShell?

Member Author

PowerShell is caught by the 0-length check -- I have no idea how we'd even begin applying this to it and was planning to ask @gabloe or @jtnord because they know the Windows side of things.

Can at least still write a Batch script though, so I included that.

Member

Sensible.
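
The Batch change above writes the exit code to a `.tmp` file and then moves it into place, so a reader never observes a half-written result file. The same write-then-rename idea, sketched in Java purely for illustration (the plugin does this inside the generated script, not in Java; `AtomicStatusWriteSketch` and `writeStatus` are hypothetical names):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical illustration of the write-to-temp-then-move technique. */
final class AtomicStatusWriteSketch {

    /** Writes the exit code next to resultFile as "<name>.tmp", then moves it into place. */
    static void writeStatus(Path resultFile, int exitCode) {
        Path tmp = resultFile.resolveSibling(resultFile.getFileName() + ".tmp");
        try {
            // Readers only look at resultFile, so a partial write here is invisible to them.
            Files.write(tmp, Integer.toString(exitCode).getBytes(StandardCharsets.US_ASCII));
            // ATOMIC_MOVE asks for an atomic rename; it throws
            // AtomicMoveNotSupportedException where the file system cannot provide one.
            Files.move(tmp, resultFile, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException x) {
            throw new UncheckedIOException(x);
        }
    }
}
```

Both files live in the same directory on purpose: a rename within one directory is the case most likely to be atomic on common file systems.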

public Integer invoke(File f, VirtualChannel channel) throws IOException, InterruptedException {
if (f.exists() && f.length() > 0) {
try {
String fileString = Files.readFirstLine(f, Charset.defaultCharset());
Member

Minor note: use of defaultCharset could theoretically cause issues on z/OS. Since we are expecting this file to be ASCII (it should in fact just contain [0-9]+), it is safer and clearer to use StandardCharsets.US_ASCII.

Member Author

A really critical point - it looks like this is already broken, per #28 -- I think we'd need to address it in more comprehensive work to get it to play nicely in any case.

@svanoort svanoort merged commit 21f570e into jenkinsci:master Mar 6, 2018
@svanoort svanoort deleted the better-fix-to-exitStatus branch March 6, 2018 14:48
jglick added a commit to jglick/durable-task-plugin that referenced this pull request Mar 6, 2018
jglick added a commit to jglick/durable-task-plugin that referenced this pull request Mar 6, 2018
svanoort added a commit to svanoort/durable-task-plugin that referenced this pull request Mar 8, 2018