Skip to content

Conversation

@abayer
Copy link
Member

@abayer abayer commented Mar 7, 2018

JENKINS-26521

Replaces #18

Many thanks to @ikedam for the original version of this. I've just resurrected it and tweaked a couple things.

cc @reviewbybees

Many thanks to @ikedam for the original version of this. I've just
resurrected it and tweaked a couple things.
@abayer abayer requested review from ikedam, jglick and svanoort March 7, 2018 17:53
@jglick
Copy link
Member

jglick commented Mar 7, 2018

And what about #19?

Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will work for now, but is going to break in conjunction with jenkinsci/workflow-support-plugin#15 because the ConsoleLogFilterImpl, while Serializable to program.dat, will not be remotable. You would need to create some kind of exported object to signal the master side that the timer should be reset.

private long timeout = 0;
private long end = 0;

/** Used to track whether this is timing out on inactivity without needing to reference {@link #step}. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just delete the step field altogether at this point?

private static class ConsoleLogFilterImpl extends ConsoleLogFilter implements Serializable {
private static final long serialVersionUID = 1L;

private final ResetCallback callback;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easier to just make ConsoleLogFilterImpl nonstatic and inline logWritten.

+ " echo 'NotHere';\n"
+ " sleep 3;\n"
+ " echo 'NotHereYet';\n"
+ " sleep 3;\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test could be flaky on loaded CI systems if you have only a 1s buffer.

@svanoort
Copy link
Member

svanoort commented Mar 8, 2018

The reliance on reading the logs here makes me rather nervous -- I can't put a finger on something clearly wrong with it per se, but it seems like it might open up some potential stability issues.

We need to be really, really careful that we're not thrashing the system by forcing heavy log read activity.

@abayer
Copy link
Member Author

abayer commented Mar 8, 2018

@svanoort - Yeah, I agree. This works but I don't love it. I'd made some vague efforts in the direction of implementing this before finding #18 - nothing else I could come up with would actually work. Particularly because the actual intent here really is to emulate build-timeout's "timeout if nothing new is written to the log" functionality - there really isn't much we can do there but sniff the logs in some way.

@abayer
Copy link
Member Author

abayer commented Mar 9, 2018

@jglick - might it make sense to actually make "timeout on log inactivity" part of DurableTask, wait for the Controller.watch api and piggyback on that with a Handler that pushes back "hey, log activity" notifications? Feels like there should be some way to sneak in before we're actually getting to the log-goes-to-master-or-wherever-else phase if we go straight into DurableTask and DurableTaskStep...

@jglick
Copy link
Member

jglick commented Mar 12, 2018

@abayer perhaps. There could be some use cases for checking inactivity on miscellaneous long-running steps producing output—for example, if there is a step which tails a log from a grid engine. On the other hand I agree that DurableTaskStep is likely to account for the great majority of uses of this feature. jenkinsci/durable-task-plugin#60 would not need to be involved, I think (nor jenkinsci/durable-task-plugin#49, which also deals with something vaguely related); suffices for jenkinsci/workflow-durable-task-step-plugin#63 (or indeed the current code) to be amended to send back a heartbeat signal.

But I do not think such a proposal really simplifies anything. Indeed what I meant by my comment was not that this PR is fundamentally incompatible with an external log sink—only that the current implementation is. jenkinsci/workflow-durable-task-step-plugin#21 means that the ConsoleLogFilter needs to be safely remotable.¹ That could be satisfied without any major rewrite (I think) by calling Channel.export to create a proxy for a heartbeat interface if and when the filter is remoted (as opposed to saved to program.dat). Thus, when new output appears, whether the OutputStream receiving it is running on the master (most steps) or on some agent (sh etc., with above changes), the heartbeat method gets called in the master and updates the running TimeoutStepExecution.

¹Unfortunately while we have a SerializableOnlyOverRemoting marker interface, we currently lack something like SerializableAlsoOverRemoting, so this is currently not mechanically checkable. CC @oleg-nenashev.

@jglick
Copy link
Member

jglick commented Mar 12, 2018

we're not thrashing the system by forcing heavy log read activity

I see no reason for worry. This PR does not “read” anything. It merely intercepts log writes. Yes the LineTransformationOutputStream adds a little bit of overhead, but I expect it to be very small, and after all this feature is opt-in.

@abayer
Copy link
Member Author

abayer commented Mar 12, 2018

Where exactly would we be calling Channel.export?

@abayer
Copy link
Member Author

abayer commented Mar 12, 2018

And is there an existing example of this sort of thing you could point me to?

long now = System.currentTimeMillis();
end = now + GRACE_PERIOD;
setupTimer(now);
resetTimer();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something strange.
end will be overridden soon in resetTimer.

@henrik-koren
Copy link

@abayer , @ikedam when do you think that this PR will be merge
I am the initial requester of this change
thanks

@jglick
Copy link
Member

jglick commented Apr 11, 2018

Where exactly would we be calling Channel.export?

From either writeReplace or readResolve of ConsoleLogFilterImpl, iff there is indeed an active Channel (thus serialization is over Remoting rather than to program.dat). ResetCallback would need to be made to implement an interface you could export. (This would of course supersede my suggestion to make ConsoleLogFilterImpl nonstatic.)

@abayer
Copy link
Member Author

abayer commented Apr 11, 2018

@jglick - So would we need to behave differently depending on whether we have a Channel, then? Is there any good example of something like this? hudson.util.ProcessTree is the first place I've stumbled over use of Channel#export - not sure if that's actually a good example, though.

@jglick
Copy link
Member

jglick commented Apr 11, 2018

See abayer#1.

@abayer
Copy link
Member Author

abayer commented Apr 11, 2018

@jglick - Thanks! That was very similar to what I was hacking together anyway, just...well, simpler. =)

@jglick jglick self-requested a review April 11, 2018 17:46
Copy link
Member

@svanoort svanoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking briefly so I can give this one more pass before we release.

@abayer
Copy link
Member Author

abayer commented Apr 18, 2018

Ping @svanoort

@Override
public void onResume() {
setupTimer(System.currentTimeMillis());
setupTimer(System.currentTimeMillis(), false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abayer I wonder a bit about what the implications are for not resetting activity-based timers at resume... if we're looking at logs, won't the timer trip if Jenkins has been down for a while, and it takes a little bit to get the new log data?

Then again we also trip normal timers at restart too I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure what the ideal approach here would be, so I'd say let's go with this to start and change it in the future if needed. =)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found issue with timeout activity, described in:
https://issues.jenkins.io/browse/JENKINS-58752

How additional logs could be taken from Jenkins, to investigate that issue?

Copy link
Member

@svanoort svanoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel nervous about this one, but in the interests of unblocking will approve it, since I cannot point to anything specific that needs to be changed.

The logging approach at least seems reasonable (essentially intercepting writes).

@abayer abayer merged commit 2bf6e55 into jenkinsci:master Apr 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants