Member

@svanoort svanoort commented Mar 1, 2018

Proposed fix for errors like the one below, which occur when the statusCode file exists but is empty, potentially because it has been created but not yet written to, or has not been fully written:

java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:592)
    at java.lang.Integer.parseInt(Integer.java:615)
    at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.exitStatus(FileMonitoringTask.java:168)
Caused: java.io.IOException: corrupted content in $SOMEPLACE
    at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.exitStatus(FileMonitoringTask.java:170)
    at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.exitStatus(BourneShellScript.java:211)
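
For reference, the trace above is consistent with an empty string being parsed as an integer and the failure being wrapped in an `IOException`. A minimal sketch of that pattern, assuming a hypothetical `parseStatus` helper and a made-up path (this is an illustration, not the plugin's actual code):

```java
import java.io.IOException;

// Hypothetical illustration; not the actual FileMonitoringTask code.
final class EmptyStatusDemo {
    // Parses exit-status text and wraps parse failures, as the trace suggests.
    static int parseStatus(String content, String path) throws IOException {
        try {
            return Integer.parseInt(content.trim());
        } catch (NumberFormatException x) {
            throw new IOException("corrupted content in " + path, x);
        }
    }

    public static void main(String[] args) {
        try {
            // An empty file yields an empty string, which cannot be parsed.
            parseStatus("", "/hypothetical/exit-status-file");
        } catch (IOException x) {
            // The cause is the NumberFormatException raised by Integer.parseInt.
            System.out.println(x.getMessage() + " / cause: " + x.getCause());
        }
    }
}
```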

Variant of JENKINS-25519

In my opinion this may not be the best solution to the problem, but it should generally solve it.

Other options:

  • Return null until the file parses cleanly (the normal timeout will apply once the launching script dies and the log file is no longer touched).
  • Write and look for a delimiter on the exitStatus file to indicate that the file is fully written (avoids theoretical issues with only '1' being written when status code might actually be '102' or something).
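
The two alternatives above could be sketched roughly as follows. This is only an illustration: the class, the method names, and the `END` delimiter are hypothetical and do not come from the plugin. The first method returns null until the content parses cleanly; the second accepts the value only once a trailing delimiter is present.

```java
// Hypothetical sketch of the two alternatives; names and delimiter are assumptions.
final class ExitStatusOptions {
    // Assumed marker that the writer would append after the status code.
    static final String DELIMITER = "END";

    // Option 1: treat unparsable content as "not finished yet" and return null.
    static Integer parseOrNull(String content) {
        try {
            return Integer.valueOf(content.trim());
        } catch (NumberFormatException x) {
            return null; // caller would poll again later
        }
    }

    // Option 2: only trust the number once the delimiter follows it.
    static Integer parseDelimited(String content) {
        String trimmed = content.trim();
        if (!trimmed.endsWith(DELIMITER)) {
            return null; // write not complete, e.g. "1" of an eventual "102 END"
        }
        String number = trimmed.substring(0, trimmed.length() - DELIMITER.length());
        return parseOrNull(number);
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull(""));           // empty file: no status yet
        System.out.println(parseOrNull("1"));          // a partial write still parses
        System.out.println(parseDelimited("1"));       // delimiter missing: no status yet
        System.out.println(parseDelimited("102 END")); // complete write
    }
}
```

Note that the first option alone would accept a partially written `1`, which is the theoretical issue the delimiter is meant to avoid.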

Member

@jglick jglick left a comment


I think it is safe enough, if the problem was in fact diagnosed correctly. I swear there was some earlier attempt to fix this, but I cannot find it. JENKINS-25519 describes a similar-sounding case but note that the exception is repeatedly thrown—meaning that the file was created but stayed empty, for some reason. So this should perhaps be considered only a hotfix:

  • root cause not obvious (is it really just a simple race condition?)
  • caller should perhaps treat this as fatal

 @Override public Integer exitStatus(FilePath workspace, Launcher launcher, TaskListener listener) throws IOException, InterruptedException {
     FilePath status = getResultFile(workspace);
-    if (status.exists()) {
+    if (status.exists() && status.length() > 0) {
Member


Well this alone would fix the issue, right?

Member Author


Theoretically, yes, if there's no bogus whitespace or other stuff in there.
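
To illustrate the caveat: a file containing only a newline has a length greater than zero, yet still holds nothing that can be parsed. The sketch below uses plain `java.nio.file` rather than Jenkins' `FilePath`, and the `readStatus` helper is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch using plain java.nio instead of Jenkins' FilePath.
final class LengthCheckDemo {
    // Returns null unless the file exists and holds a cleanly parsable integer.
    static Integer readStatus(Path status) throws IOException {
        if (!Files.exists(status) || Files.size(status) == 0) {
            return null; // missing or empty file
        }
        String text = new String(Files.readAllBytes(status), StandardCharsets.UTF_8).trim();
        if (text.isEmpty()) {
            return null; // non-empty file, but whitespace only
        }
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException x) {
            return null; // other unexpected content
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("exit", ".txt");
        try {
            Files.write(f, "\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(Files.size(f) > 0); // the length check alone passes
            System.out.println(readStatus(f));     // but there is no status to parse
        } finally {
            Files.delete(f);
        }
    }
}
```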

@jglick jglick changed the title from "If unable to read exitStatus file then wait briefly and retry one more time to allow write to complete" to "[JENKINS-25519] If unable to read exitStatus file then wait briefly and retry one more time to allow write to complete" on Mar 1, 2018
Member

jglick commented Mar 1, 2018

Ah, maybe I was thinking of #37—similar problem (at least on the face of it), different file.

If the problem is indeed a simple race condition, tricks like this are not the best fix. Rather, the wrapper script should be changed from

…; echo $? > …/exit

to

…; echo $? > …/exit.tmp; mv …/exit.tmp …/exit

which ought to be atomic.
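
The same write-then-rename idea, sketched in Java purely for illustration (the wrapper script itself is shell, and the class and method names here are hypothetical). A reader should see either no `exit` file or a complete one, assuming the filesystem supports atomic renames within a directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical Java analogue of "echo $? > exit.tmp; mv exit.tmp exit".
final class AtomicExitWrite {
    static void writeExitStatus(Path dir, int code) throws IOException {
        Path tmp = dir.resolve("exit.tmp");
        Path target = dir.resolve("exit");
        // Write the full content to a temporary name first.
        Files.write(tmp, (code + "\n").getBytes(StandardCharsets.UTF_8));
        // Then publish it under the final name in a single step.
        // Throws AtomicMoveNotSupportedException if the filesystem cannot do this.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("durable-demo");
        writeExitStatus(dir, 102);
        Path exit = dir.resolve("exit");
        System.out.println(new String(Files.readAllBytes(exit), StandardCharsets.UTF_8).trim());
        Files.delete(exit);
        Files.delete(dir);
    }
}
```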

Member Author

svanoort commented Mar 5, 2018

@jglick has requested a test comment; please ignore this. @cloudbees/team-arc

Member Author

svanoort commented Mar 5, 2018

Closing in favor of #66

@svanoort svanoort closed this Mar 5, 2018
