Skip to content

Conversation

@jglick
Copy link
Member

@jglick jglick commented Dec 10, 2018

Various tests which have prevented us from having a stable master build since Nov 27, which is intolerable. Exempting this DisablePluginCommandTest test where it just looks like all hell broke loose, not the fault of the test.

assertFalse("f1 exists", f1.exists());
}

@Ignore("TODO often fails in CI")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has long been flaky. #3787 forgot this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps better to assume not running in a CI environment? https://github.com/auchenberg/volkswagen/ light?

Copy link
Contributor

@Wadeck Wadeck Dec 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already in progress for this one in #3721. Otherwise we will have 2 @Ignore and the compilation will fail. (Sorry "ignore" user for the ping)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, would just be a trivial merge conflict.

@Rule
public JenkinsRule r = new JenkinsRule();

@Ignore("TODO observed to fail in CI with 404")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some sort of network outage, it seems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems unnecessary, haven't seen it fail a substantial number of times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I have seen it fail for reasons which were clearly tied to issues completely unrelated to core code. There are other UC-dependent tests which have been suppressed for the same reason.

}
}

@Ignore("TODO sometimes fails, in CI & locally")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3708 does not seem to have worked reliably.

* So using slightly more than the worse value obtained above should avoid making this flaky and still catch
* <strong>really</strong> bad performance regressions.
*/
@Ignore("TODO often fails in CI")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, perhaps just assume not running in CI?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea, because this test is unlikely to ever fail in a normal environment (I mean something built after ~1995), and at the same time I feel a bit bad if tests run locally are not the same as in CI. I see potential interesting headaches for some people if some tests fail locally but not in CI (though I suppose a comment would help about this...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is unlikely to ever fail in a normal environment

Possibly so, but performance tests are not suitable as merge/deploy gates—I could be patching README.md and the build might be marked as a failure for reasons unrelated to my change. Or, as in #3778, I could be trying to get a binary published for some unrelated fix and be stymied by this flake.

Better to have these in a separate job (repo?) where they are run say on a daily basis with some kind of tooling to track trends and alert developers to consistent regressions.

@daniel-beck
Copy link
Member

In general I'm not positive tooling picks up TODO as part of strings, would prefer an additional // TODO comment on the same line to ensure as much visibility as possible.

Other than that and inline suggestions for a few of these, 👍

Copy link
Member

@batmat batmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite a few ignores are probably too aggressive, as Daniel says some have not failed since some time, I think we can move forward here.
I think given the current situation where we didn't get a green build on the master branch since quite long, possibly a radical fix like this one is the way to go, then we take the time to fix it/re-enable tests one by one.

Overall, I think we should define a process more common for this. Flakes are a pain, and we should probably just go some way like either @Ignore them as soon as they behave flakily, and work on these to see if they can be later reintroduced, or move them to a special profile that would be run in a dedicated test run that would not fail the build, but would help us have a look afterwards if needed.

@jglick if this gets more approval and gets merged, could you please file a JIRA to work on fixing these flakes and ping me there? I'd like to see what we can do to improve the situation, once the master branch has durably gotten back to green, which is my first goal approving here.

Thanks!

@batmat batmat mentioned this pull request Dec 11, 2018
1 task
@jglick
Copy link
Member Author

jglick commented Dec 11, 2018

we should probably just go some way like either @Ignore them as soon as they behave flakily, and [various options]

Yes, that would be my preference. Get the master build blue ASAP, then work at our leisure on reintroducing test coverage which still seems relevant.

@jglick
Copy link
Member Author

jglick commented Dec 11, 2018

Not waiting for build 4 since build 2 was stable and there have been only trivial changes since then (which could only affect test compilation, which has already passed).

@jglick jglick merged commit 2656f7b into jenkinsci:master Dec 11, 2018
@jglick jglick deleted the failing-tests branch December 11, 2018 15:29
@jglick
Copy link
Member Author

jglick commented Dec 11, 2018

could you please file a JIRA to work on fixing these flakes

@batmat JENKINS-55122 and thanks!

jglick added a commit to jglick/jenkins that referenced this pull request Dec 12, 2018
@Rule
public JenkinsRule r = new JenkinsRule();

@Ignore("TODO observed to fail in CI with 404 due to external UC issues")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should perhaps get a workaround applied similar to #3962?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly. Does not look like I thought to record a copy of the failure message. Probably best to rewrite tests like these to not attempt to make an Internet connection to begin with, and instead use some hard-coded example JSON data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants