-
Notifications
You must be signed in to change notification settings - Fork 200
Fix a whole wad of Resume issues and add a bunch of diagnostic logging for FlowHead problems etc #216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…smatches in the final FlowNode
abayer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not explicitly approving since this is still WIP and I don't want to confuse things for myself later, but looks good so far.
| * and for those we don't want to load the program state, so that check should be efficient. | ||
| */ | ||
| private boolean done; | ||
| Boolean done; // Only non-private for unit test use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extraneous leading space after indent? =)
| return iota.get(); | ||
| } | ||
|
|
||
| /** For diagnostic purposes only. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I personally appreciate it when diagnostics methods like this have javadoc or comments giving a quick explanation of what their diagnostics are meant to show - not a blocker by any means, just a casual request.
|
|
||
| if (this.heads != null && this.heads.size() > 0) { | ||
| if (LOGGER.isLoggable(Level.INFO)) { | ||
| LOGGER.log(Level.INFO, "Resetting heads to rebuild the Pipeline structure, tossing existing heads: "+getHeadsAsString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oooo, this is a nice addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here would reduce to just
if (heads != null && !heads.isEmpty()) {
LOGGER.log(Level.INFO, "Resetting heads to rebuild the Pipeline structure, tossing existing heads: {0}", heads);
heads.clear();
}| } | ||
|
|
||
| @Override | ||
| @SuppressFBWarnings(value = "RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN", justification = "We want to explicitly check for boolean not-null and true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it wrong that I love that with Boolean, you effectively can treat it as a, I dunno, troolean? What'd be the term for a boolean but for three possible states... =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A trilean, acc. to Wikipedia.
| /** Report a fatal error in the VM. */ | ||
| void croak(Throwable t) { | ||
| setResult(Result.FAILURE); | ||
| done = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Boolean.TRUE above but true here? Might be handy to mention why you're using one vs the other in one case (probably the Boolean.TRUE case, since that's not something you see as often).
| writeChild(w, context, "start", st.getId(), String.class); | ||
| } | ||
| if (e.done != null) { | ||
| writeChild(w, context, "done", e.done, Boolean.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
glee I can't remember off the top of my head why I care about having done persisted, but I know I do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So long as it is making some test pass, fine with me…
|
|
||
| void setNewHead(FlowNode v) { | ||
| void setNewHead(@Nonnull FlowNode v) { | ||
| if (v == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my brain briefly melts at the combination of @Nonnull and if (v == null) - I'm assuming this is defense-in-depth, with the @Nonnull covering compile-time scenarios and the if (v == null) covering even edgier-case scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's protecting against some things that shouldn't happen but yet were still happening in the wild, even though Findbugs didn't find issues at compile-time (probably it wasn't digging deeply enough).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I have no problem with that approach - probably worth adding a comment explaining that, though. It'd be awfully easy for a later maintainer (i.e., me, because I'm forgetful) to not know/remember that and just think "Oh, that's a pointless null check, lemme clear the dead code".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, this is package-private so we can trivially enumerate all the call sites and put in assertions there that the value is non-null unless that is trivially seen from code inspection, right?
| } catch (IOException e) { | ||
| LOGGER.log(Level.FINE, "Failed to record new head: " + v, e); | ||
| } | ||
| v.addAction(new TimingAction()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the idea here is that we want to still do the TimingAction, notifyNewHead, etc, even if we can't store the node?
pom.xml
Outdated
| <groupId>org.jenkins-ci.plugins.workflow</groupId> | ||
| <artifactId>workflow-job</artifactId> | ||
| <version>2.17</version> | ||
| <version>2.18-20180329.215807-4</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
polite to link to the upstream in a comment so we do not lose track
| * and for those we don't want to load the program state, so that check should be efficient. | ||
| */ | ||
| private boolean done; | ||
| Boolean done; // Only non-private for unit test use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best to document what a null value means here.
| } else if (myHeads.size() == 0) { | ||
| return "empty-heads"; | ||
| } else { | ||
| return myHeads.entrySet().stream().map(h->h.getKey()+"::"+h.getValue()).collect(Collectors.joining(",")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
myHeads.toString not good enough??
|
|
||
| if (this.heads != null && this.heads.size() > 0) { | ||
| if (LOGGER.isLoggable(Level.INFO)) { | ||
| LOGGER.log(Level.INFO, "Resetting heads to rebuild the Pipeline structure, tossing existing heads: "+getHeadsAsString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here would reduce to just
if (heads != null && !heads.isEmpty()) {
LOGGER.log(Level.INFO, "Resetting heads to rebuild the Pipeline structure, tossing existing heads: {0}", heads);
heads.clear();
}| h.setForDeserialize(storage.getNode(entry.getValue())); | ||
| heads.put(h.getId(), h); | ||
| } else { | ||
| LOGGER.log(Level.WARNING, "Tried to load head FlowNodes for execution "+this.owner+" but FlowNode was not found in storage for head id:FlowNodeId "+entry.getKey()+":"+entry.getValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just:
LOGGER.log(Level.WARNING, "Tried to load head FlowNodes for execution {0} but FlowNode was not found in storage for head id:FlowNodeId {1}", new Object[] {owner, entry});| writeChild(w, context, "start", st.getId(), String.class); | ||
| } | ||
| if (e.done != null) { | ||
| writeChild(w, context, "done", e.done, Boolean.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So long as it is making some test pass, fine with me…
| setField(result, "iota", new AtomicInteger(iota)); | ||
| } else if (nodeName.equals("done")) { | ||
| Boolean isDone = readChild(reader, context, Boolean.class, result); | ||
| setField(result, "done", isDone); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you are doing something with historical builds which omit this field, right? Is there some @LocalData test which fails if you comment out that “something”?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historical builds have this field. Also I've now dropped the use of Boolean (wasn't needed, can use primitive).
| } else { | ||
| stillRunnable |= t.isRunnable(); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
|
|
||
| void setNewHead(FlowNode v) { | ||
| void setNewHead(@Nonnull FlowNode v) { | ||
| if (v == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, this is package-private so we can trivially enumerate all the call sites and put in assertions there that the value is non-null unless that is trivially seen from code inspection, right?
| Assert.assertFalse("Should always be able to retrieve script", StringUtils.isEmpty(cpsFlow.getScript())); | ||
| Assert.assertNull("We should have no Groovy shell left or that's a memory leak", cpsFlow.getShell()); | ||
| Assert.assertNull("We should have no Groovy shell left or that's a memory leak", cpsFlow.getTrustedShell()); | ||
| assert cpsFlow.done == Boolean.TRUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertEquals preferred
…tests plus pull in latest WF-JOB fixes
| try { | ||
| saveOwner(); | ||
| } catch (Exception ex) { | ||
| LOGGER.log(Level.WARNING, "Failed to persist WorkflowRun after noting a serious failure for run:", owner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably log the actual exception as well.
jglick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess?
pom.xml
Outdated
| <groupId>org.jenkins-ci.plugins.workflow</groupId> | ||
| <artifactId>workflow-job</artifactId> | ||
| <version>2.18-20180329.215807-4</version> | ||
| <version>2.18-20180406.172304-8</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please link to upstream PRs
| <scm-api-plugin.version>2.0.8</scm-api-plugin.version> | ||
| <groovy-cps.version>1.24</groovy-cps.version> | ||
| <jenkins-test-harness.version>2.33</jenkins-test-harness.version> | ||
| <jenkins-test-harness.version>2.37</jenkins-test-harness.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file a request to update the parent POM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jglick jenkinsci/plugin-pom#101 - Done
| } | ||
| } | ||
| } catch (Exception ex) { | ||
| done = Boolean.TRUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true
| exec.runInCpsVmThread(new FutureCallback<CpsThreadGroup>() { | ||
| @Override public void onSuccess(CpsThreadGroup result) { | ||
| try { | ||
| // TODO keep track of whether the program was saved anyway after saveState was called but before now, and do not bother resaving it in that case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW how does durability interact with saveState? Important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jglick CpsStepContext#saveState() follows the same rules as overall Program durability - if your DurabilityHint has isPersistWithEveryStep() (i.e. MAX_SURVIVABILITY or SURVIVABLE_NONATOMIC) it saves normally.
Otherwise (for PERFORMANCE_OPTIMIZED) it doesn't -- your program is not providing guaranteed persistence in the face of catastrophic system failures.
Of course with a clean shutdown the pipeline will persist via the @Terminator as part of shutdown, so that's fine.
The main fixes are here jenkinsci/workflow-job-plugin#93 but this helps harden against the following and adds a bunch of diagnostic logging:
It also subsumes: #215