Skip to content

Conversation

@svanoort
Copy link
Member

@svanoort svanoort commented Mar 30, 2018

The main fixes are here jenkinsci/workflow-job-plugin#93 but this helps harden against the following and adds a bunch of diagnostic logging:

It also subsumes: #215

Copy link
Member

@abayer abayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not explicitly approving since this is still WIP and I don't want to confuse things for myself later, but looks good so far.

* and for those we don't want to load the program state, so that check should be efficient.
*/
private boolean done;
Boolean done; // Only non-private for unit test use.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extraneous leading space after indent? =)

return iota.get();
}

/** For diagnostic purposes only. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I personally appreciate it when diagnostics methods like this have javadoc or comments giving a quick explanation of what their diagnostics are meant to show - not a blocker by any means, just a casual request.


if (this.heads != null && this.heads.size() > 0) {
if (LOGGER.isLoggable(Level.INFO)) {
LOGGER.log(Level.INFO, "Resetting heads to rebuild the Pipeline structure, tossing existing heads: "+getHeadsAsString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooo, this is a nice addition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here would reduce to just

if (heads != null && !heads.isEmpty()) {
    LOGGER.log(Level.INFO, "Resetting heads to rebuild the Pipeline structure, tossing existing heads: {0}", heads);
    heads.clear();
}

}

@Override
@SuppressFBWarnings(value = "RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN", justification = "We want to explicitly check for boolean not-null and true")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it wrong that I love that with Boolean, you effectively can treat it as a, I dunno, troolean? What'd be the term for a boolean but for three possible states... =)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A trilean, acc. to Wikipedia.

/** Report a fatal error in the VM. */
void croak(Throwable t) {
setResult(Result.FAILURE);
done = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Boolean.TRUE above but true here? Might be handy to mention why you're using one vs the other in one case (probably the Boolean.TRUE case, since that's not something you see as often).

writeChild(w, context, "start", st.getId(), String.class);
}
if (e.done != null) {
writeChild(w, context, "done", e.done, Boolean.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glee I can't remember off the top of my head why I care about having done persisted, but I know I do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So long as it is making some test pass, fine with me…


void setNewHead(FlowNode v) {
void setNewHead(@Nonnull FlowNode v) {
if (v == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my brain briefly melts at the combination of @Nonnull and if (v == null) - I'm assuming this is defense-in-depth, with the @Nonnull covering compile-time scenarios and the if (v == null) covering even edgier-case scenarios?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's protecting against some things that shouldn't happen but yet were still happening in the wild, even though Findbugs didn't find issues at compile-time (probably it wasn't digging deeply enough).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I have no problem with that approach - probably worth adding a comment explaining that, though. It'd be awfully easy for a later maintainer (i.e., me, because I'm forgetful) to not know/remember that and just think "Oh, that's a pointless null check, lemme clear the dead code".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, this is package-private so we can trivially enumerate all the call sites and put in assertions there that the value is non-null unless that is trivially seen from code inspection, right?

} catch (IOException e) {
LOGGER.log(Level.FINE, "Failed to record new head: " + v, e);
}
v.addAction(new TimingAction());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea here is that we want to still do the TimingAction, notifyNewHead, etc, even if we can't store the node?

@jglick jglick changed the title [WIP] Fix a whole wad of Resume issues and add a bunch of diagnostic logging for FlowHead problems etc Fix a whole wad of Resume issues and add a bunch of diagnostic logging for FlowHead problems etc Apr 3, 2018
pom.xml Outdated
<groupId>org.jenkins-ci.plugins.workflow</groupId>
<artifactId>workflow-job</artifactId>
<version>2.17</version>
<version>2.18-20180329.215807-4</version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

polite to link to the upstream in a comment so we do not lose track

* and for those we don't want to load the program state, so that check should be efficient.
*/
private boolean done;
Boolean done; // Only non-private for unit test use.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to document what a null value means here.

} else if (myHeads.size() == 0) {
return "empty-heads";
} else {
return myHeads.entrySet().stream().map(h->h.getKey()+"::"+h.getValue()).collect(Collectors.joining(","));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

myHeads.toString not good enough??


if (this.heads != null && this.heads.size() > 0) {
if (LOGGER.isLoggable(Level.INFO)) {
LOGGER.log(Level.INFO, "Resetting heads to rebuild the Pipeline structure, tossing existing heads: "+getHeadsAsString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here would reduce to just

if (heads != null && !heads.isEmpty()) {
    LOGGER.log(Level.INFO, "Resetting heads to rebuild the Pipeline structure, tossing existing heads: {0}", heads);
    heads.clear();
}

h.setForDeserialize(storage.getNode(entry.getValue()));
heads.put(h.getId(), h);
} else {
LOGGER.log(Level.WARNING, "Tried to load head FlowNodes for execution "+this.owner+" but FlowNode was not found in storage for head id:FlowNodeId "+entry.getKey()+":"+entry.getValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just:

LOGGER.log(Level.WARNING, "Tried to load head FlowNodes for execution {0} but FlowNode was not found in storage for head id:FlowNodeId {1}", new Object[] {owner, entry});

writeChild(w, context, "start", st.getId(), String.class);
}
if (e.done != null) {
writeChild(w, context, "done", e.done, Boolean.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So long as it is making some test pass, fine with me…

setField(result, "iota", new AtomicInteger(iota));
} else if (nodeName.equals("done")) {
Boolean isDone = readChild(reader, context, Boolean.class, result);
setField(result, "done", isDone);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you are doing something with historical builds which omit this field, right? Is there some @LocalData test which fails if you comment out that “something”?

Copy link
Member Author

@svanoort svanoort Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historical builds have this field. Also I've now dropped the use of Boolean (wasn't needed, can use primitive).

} else {
stillRunnable |= t.isRunnable();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?


void setNewHead(FlowNode v) {
void setNewHead(@Nonnull FlowNode v) {
if (v == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, this is package-private so we can trivially enumerate all the call sites and put in assertions there that the value is non-null unless that is trivially seen from code inspection, right?

Assert.assertFalse("Should always be able to retrieve script", StringUtils.isEmpty(cpsFlow.getScript()));
Assert.assertNull("We should have no Groovy shell left or that's a memory leak", cpsFlow.getShell());
Assert.assertNull("We should have no Groovy shell left or that's a memory leak", cpsFlow.getTrustedShell());
assert cpsFlow.done == Boolean.TRUE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertEquals preferred

@svanoort svanoort requested a review from jglick April 6, 2018 13:20
try {
saveOwner();
} catch (Exception ex) {
LOGGER.log(Level.WARNING, "Failed to persist WorkflowRun after noting a serious failure for run:", owner);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably log the actual exception as well.

Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess?

pom.xml Outdated
<groupId>org.jenkins-ci.plugins.workflow</groupId>
<artifactId>workflow-job</artifactId>
<version>2.18-20180329.215807-4</version>
<version>2.18-20180406.172304-8</version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please link to upstream PRs

<scm-api-plugin.version>2.0.8</scm-api-plugin.version>
<groovy-cps.version>1.24</groovy-cps.version>
<jenkins-test-harness.version>2.33</jenkins-test-harness.version>
<jenkins-test-harness.version>2.37</jenkins-test-harness.version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file a request to update the parent POM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}
} catch (Exception ex) {
done = Boolean.TRUE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true

exec.runInCpsVmThread(new FutureCallback<CpsThreadGroup>() {
@Override public void onSuccess(CpsThreadGroup result) {
try {
// TODO keep track of whether the program was saved anyway after saveState was called but before now, and do not bother resaving it in that case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW how does durability interact with saveState? Important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jglick CpsStepContext#saveState() follows the same rules as overall Program durability - if your DurabilityHint has isPersistWithEveryStep() (i.e. MAX_SURVIVABILITY or SURVIVABLE_NONATOMIC) it saves normally.

Otherwise (for PERFORMANCE_OPTIMIZED) it doesn't -- your program is not providing guaranteed persistence in the face of catastrophic system failures.

Of course with a clean shutdown the pipeline will persist via the @Terminator as part of shutdown, so that's fine.

@svanoort svanoort merged commit 9b0848c into jenkinsci:master Apr 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants