Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -1638,25 +1638,23 @@ public void pause(final boolean v) throws IOException {
@Restricted(DoNotUse.class)
@Terminator(attains = FlowExecutionList.EXECUTIONS_SUSPENDED)
public static void suspendAll() {
CpsFlowExecution exec = null;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outermost catch clause seemed redundant. (Timeout.close does not throw exceptions, so it was not for that.)

try (Timeout t = Timeout.limit(3, TimeUnit.MINUTES)) { // TODO some complicated sequence of calls to Futures could allow all of them to run in parallel
LOGGER.fine("starting to suspend all executions");
for (FlowExecution execution : FlowExecutionList.get()) {
try {
if (execution instanceof CpsFlowExecution) {
CpsFlowExecution cpsExec = (CpsFlowExecution)execution;
if (execution instanceof CpsFlowExecution) {
CpsFlowExecution cpsExec = (CpsFlowExecution) execution;
try {
cpsExec.checkAndAbortNonresumableBuild();

LOGGER.log(Level.FINE, "waiting to suspend {0}", execution);
exec = (CpsFlowExecution) execution;
// Like waitForSuspension but with a timeout:
if (exec.programPromise != null) {
if (cpsExec.programPromise != null) {
Comment on lines -1645 to +1651
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplifying

LOGGER.log(Level.FINER, "Waiting for Pipeline to go to sleep for shutdown: "+execution);
try {
exec.programPromise.get(1, TimeUnit.MINUTES).scheduleRun().get(1, TimeUnit.MINUTES);
cpsExec.programPromise.get(1, TimeUnit.MINUTES).scheduleRun().get(1, TimeUnit.MINUTES);
LOGGER.log(Level.FINER, " Pipeline went to sleep OK: "+execution);
} catch (InterruptedException | TimeoutException ex) {
LOGGER.log(Level.WARNING, "Error waiting for Pipeline to suspend: "+exec, ex);
LOGGER.log(Level.WARNING, "Error waiting for Pipeline to suspend: " + cpsExec, ex);
}
}
cpsExec.checkpoint(true);
Expand All @@ -1671,15 +1669,15 @@ public static void suspendAll() {
}
});
}
cpsExec.getOwner().getListener().getLogger().close();
if (cpsExec.owner != null) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how this would be reachable? Corruption during resumption or something? Perhaps jenkinsci/workflow-api-plugin#304 has made it possible? If I understand right, in this case FlowExecutionList.runningTasks contains a FlowExecutionOwner which successfully returns a FlowExecution from .get(), but for which FlowExecution.owner on the returned object is null, which seems problematic and unexpected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid I do not know—just saw this in a log.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose [Workflow]Run.reload deserializes execution, but then WorkflowRun.onLoad fails before calling getExecution, or getExecution fails before calling fetchedExecution.onLoad(new Owner(this)), or unmarshal fails partway through, etc. Presumably the build was badly corrupted somehow. The point of this PR is just to avoid unnecessary stack traces after that.

Copy link
Member

@dwnusbaum dwnusbaum Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth though I saw another case of a CpsFlowExecution with a null owner recently, which is why I am wondering if something has changed things:

java.lang.IllegalStateException: List of flow heads unset for CpsFlowExecution[null]
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getCurrentHeads(CpsFlowExecution.java:1018)
        ... insignificant, just a user loading some build page that accessed the heads for an execution ...

I looked at the XML for that execution on disk (based on the thread name) and head was non-null and pointed to a FlowEndNode that did exist in workflow/, so it wasn't obvious what might have gone wrong.

cpsExec.owner.getListener().getLogger().close();
}
} catch (Exception ex) {
LOGGER.log(Level.WARNING, "Error persisting Pipeline execution at shutdown: " + cpsExec.owner, ex);
}
} catch (Exception ex) {
LOGGER.log(Level.WARNING, "Error persisting Pipeline execution at shutdown: "+((CpsFlowExecution) execution).owner, ex);
}
}
LOGGER.fine("finished suspending all executions");
} catch (Exception x) {
LOGGER.log(Level.WARNING, "problem suspending " + exec, x);
}
}

Expand Down