Skip to content

Conversation

@jglick
Copy link
Member

@jglick jglick commented Oct 19, 2017

See JENKINS-47517. Reverts an inessential-seeming, and extremely incompatible, part of #3038.

Proposed changelog entries

  • StackOverflowError thrown under some conditions when using Pipeline on 2.85.

@reviewbybees @jenkinsci/code-reviewers

@ghost
Copy link

ghost commented Oct 19, 2017

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

svanoort
svanoort previously approved these changes Oct 19, 2017
Copy link
Member

@svanoort svanoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's subtle... and nasty. Good catch.

stephenc
stephenc previously approved these changes Oct 19, 2017
@Jimilian
Copy link
Contributor

@jglick any suggestion how to avoid such bugs in future? maybe there is some way to grep through all related plugins?

@jglick
Copy link
Member Author

jglick commented Oct 19, 2017

@oleg-nenashev asked if Util.isOverridden could be used to retain the fallback only when it is safe. We use this trick a lot to avoid stack overflows when upgrading APIs, but I did not see a clear way to use it in this case, because we want to know not just whether isBuildBlocked is overridden (it must be), but whether it delegates to getCauseOfBlockage. Suggestions welcome; in the meantime, it seemed that this fallback was not actually being used by the original patch and could be removed.

Another solution is of course to patch AfterRestartTask to take advantage of new default methods, and just return false unconditionally from isBuildBlocked. We probably want to do that anyway, but we cannot assume people will have upgraded workflow-job.

any suggestion how to avoid such bugs in future?

Ultimately there should be some process running plugin-compat-tester and acceptance-test-harness against jenkinsci/jenkins@master at least just prior to cutting a weekly release, and perhaps more often than that. I am not sure if that would have caught this or not—I noticed the StackOverflowError by accident in pipeline-stage-step test output but it did not cause a test failure. (The error was in a background thread incidental to the behavior being tested, and plugin test cases generally do not do anything like StartupTest.noWarnings.)

maybe there is some way to grep through all related plugins?

It is definitely possible to find usages of an API whose behavior you are proposing to change, deprecate, etc., as in this example. Somewhere (where?) there is also an index of core APIs using actual code analysis, rather than plain-text search. We used to have Sourcerer running on Jenkins but this has been broken for a long time.

@jglick jglick added the ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback label Oct 19, 2017
@jglick
Copy link
Member Author

jglick commented Oct 19, 2017

@oleg-nenashev also asks whether we could have the equivalent of plugin-compat-tester run as part of jenkinsci/jenkins/Jenkinsfile (and thus even during core PR builds) for at least some “smoke” tests in selected plugins which are both important (~ in Setup Wizard recommended list, and with lots of active users); and likely to be broken by subtle core changes like this. Perhaps. plugin-compat-tester might be able to do it with some hackery; or jenkinsci/maven-hpi-plugin#66 might be a more maintainable path forward.

@daniel-beck
Copy link
Member

It is definitely possible to find usages of an API whose behavior you are proposing to change, deprecate, etc., as in this example. Somewhere (where?) there is also an index of core APIs using actual code analysis, rather than plain-text search.

FWIW I usually adapt https://github.com/jenkins-infra/deprecated-usage-in-plugins to find what I'm looking for, e.g. jenkins-infra/usage-in-plugins#7. Signatures can be hard-coded in DeprecatedApi, e.g. methods.add("hudson/model/Queue#clearLeftItems") (which I was looking for in a recent investigation about a reported bug).

@stephenc
Copy link
Member

if Util.isOverridden could be used to retain the fallback only when it is safe.

So the only way I can see to do that would be to use a ThreadLocal<?>

  1. Get the current value of the ThreadLocal
  2. If the current value != null, do not invoke
  3. If the current value == null, set to this, invoke and finally clear again

We could go even more fancy to see if there are nested calls... but that would have a bigger impact on the expected cost of the method.

@jglick
Copy link
Member Author

jglick commented Oct 19, 2017

Yes it is possible to break cycles using ThreadLocal if it is considered important to keep this fallback. For purposes of this patch I wanted to just keep it simple and revert to the 2.84 behavior because I did not see any particular reason for it to have been added in the first place. @Jimilian please correct me if this part of the original patch was essential for some reason.

@recampbell
Copy link
Member

Would be nice to have a test in core to make sure we don't regress this again.

@Jimilian
Copy link
Contributor

Jimilian commented Oct 19, 2017

@jglick I introduced it because if someone did not override getCauseOfBlockage it will just pass through all checks.

CauseOfBlockage causeOfBlockage = task.getCauseOfBlockage();
if (causeOfBlockage != null) { // here we need to check `task.isBlocked`
    return task.getCauseOfBlockage();

if (!canRun(task.getResourceList())) {
    ...
}  

Now I think it was better to add a check for task.isBlocked in getCauseOfBlockageForItem instead of changing default implementation in first place...

@jglick
Copy link
Member Author

jglick commented Oct 19, 2017

Having both methods seems like a bad design in the first place. Maybe we can simply deprecate isBuildBlocked (core no longer calls it anyway) and have it delegate to getCauseOfBlockage() != null. I think that would be more compatible. Let me try it.

@jglick jglick dismissed stale reviews from svanoort and stephenc October 19, 2017 17:08

Materially changed since review.

@jglick jglick added needs-more-reviews Complex change, which would benefit from more eyes and removed ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback labels Oct 19, 2017
* A project must be blocked if its own previous build is in progress,
* or if the blockBuildWhenUpstreamBuilding option is true and an upstream
* project is building, but derived classes can also check other conditions.
*/
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now no-op overrides.

*
* <p>
* Short-hand for {@code getCauseOfBlockageForItem()!=null}.
* @deprecated Use {@link #getCauseOfBlockage} != null
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of 2.85 was already not being called by core, so why keep it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be also restricted?

*/
@Deprecated
String getWhyBlocked();
default String getWhyBlocked() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other methods in this interface could benefit from default implementations but I will leave that to another day.

*/
@CheckForNull
default CauseOfBlockage getCauseOfBlockage() {
if (isBuildBlocked()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix.

@Test
public void causeOfBlockageOverrides() {
Queue.Task t = new LegacyTask();
assertFalse(t.isBuildBlocked());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Queue.java reverted to master version, reproduces the StackOverflowError.

assertNull(t.getWhyBlocked());
assertNull(t.getCauseOfBlockage());
}
static class LegacyTask extends AbstractQueueTask {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same essential structure as current org.jenkinsci.plugins.workflow.job.AfterRestartTask.

@Override public int hashCode() {
return cnt.hashCode();
}
@Override public boolean isBuildBlocked() {return isBlocked;}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was only a test task. A real one would already have needed to implement getCauseOfBlockage to provide a message to the user. (It was abstract prior to #2879 in 2.62 so it is unlikely that new code was written against 2.62–2.85 that assumed that only implementing isBuildBlocked was OK.)

queue.maintain();

assertEquals(1, r.jenkins.getQueue().getBlockedItems().size());
CauseOfBlockage actual = r.jenkins.getQueue().getBlockedItems().get(0).getCauseOfBlockage();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing something which is no longer wanted.

Copy link
Member

@oleg-nenashev oleg-nenashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐝 Maybe needs some polishing later, but it seems to be a good fix for now

*
* <p>
* Short-hand for {@code getCauseOfBlockageForItem()!=null}.
* @deprecated Use {@link #getCauseOfBlockage} != null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be also restricted?

@oleg-nenashev
Copy link
Member

I assume @Jimilian also approved it, am I right?

@oleg-nenashev oleg-nenashev self-assigned this Oct 20, 2017
@Jimilian
Copy link
Contributor

@oleg-nenashev, yes, fix itself looks fine for me. But I'm not 100% sure that it handles mentioned problem whenisBuildBlocked was overriden, but getCauseOfBlockage() - was not. Because isBuildBlocked is not called anymore, so, if old plugin that relays only on isBuildBlocked will be executed on new core -> it will just pass through the check as default implementation of getCauseOfBlockage() returns null.

p.s. Maybe I still do not get classpath/api magic in Jenkins.

@jglick
Copy link
Member Author

jglick commented Oct 20, 2017

Should it be also restricted?

No. Why? @Restricted is for things which must be public (or protected) for technical reasons but should not be considered part of the API. @Deprecated is for things which used to be part of the API but should no longer be used. They do not overlap.

@jglick
Copy link
Member Author

jglick commented Oct 20, 2017

I'm not 100% sure that it handles mentioned problem when isBuildBlocked was overridden, but getCauseOfBlockage() was not.

No, it does not. Already discussed above in #3099 (comment). I do not consider it a problem.

@stephenc
Copy link
Member

@Restricted will break the build on upgrade and accelerate the avoidance of the method. @Deprecated can (and consequently will) be easily ignored.

I think adding a new type of restriction, @Restricted(Legacy) would be the way to go

@jglick
Copy link
Member Author

jglick commented Oct 20, 2017

@Restricted will break the build on upgrade and accelerate the avoidance of the method.

At the cost of interrupting unrelated developer workflows, such as

mvn -Djenkins.version=2.99 clean test

Anyway I see no reason why avoidance of this method needs to be accelerated beyond any other deprecated API.

@Deprecated can (and consequently will) be easily ignored.

It can be, yes, but that is a separate issue, solvable using standard techniques such as compiler flags, or the Warnings plugin for Jenkins for that matter.

@jglick jglick added ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback and removed needs-more-reviews Complex change, which would benefit from more eyes labels Oct 20, 2017
@jglick
Copy link
Member Author

jglick commented Oct 20, 2017

@reviewbybees done

@jglick
Copy link
Member Author

jglick commented Oct 20, 2017

Merging this to make sure it gets into 2.86 since I think the StackOverflowError is too severe to let sit. Follow-ups are possible if necessary.

@jglick jglick merged commit 5e6b105 into jenkinsci:master Oct 20, 2017
@jglick jglick deleted the getCauseOfBlockage-JENKINS-47517 branch October 20, 2017 22:02
@oleg-nenashev
Copy link
Member

Thanks @jglick !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants