-
Notifications
You must be signed in to change notification settings - Fork 1.2k
✨ Implement warm replica support for controllers #3192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hi @godwinpang. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
5d1be84
to
5db61c7
Compare
/retest |
5db61c7
to
be1b1c2
Compare
/retest |
/retest |
/retest |
/retest |
e05677a
to
1d07efc
Compare
/retest |
1 similar comment
/retest |
1c32d37
to
7cc29dc
Compare
@godwinpang any plans on wrapping this up? |
Sorry, I missed this Github notification. I've updated the PR description and also duplicated the tests between |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
/hold
in case Stefan has anything to add
LGTM label has been added. Git tree hash: 781742318cfc9c5c72173ae51ebe9fb41673044d
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alvaroaleman, godwinpang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@godwinpang Thank you very much! We're almost there (only one idea for simplification in the prod code, otherwise just a few comments on tests) |
@godwinpang would be nice if we could get this PR done 😃 |
New changes are detected. LGTM label has been removed. |
…nables are started before leader election.
Sorry for the long wait, I've addressed the PR comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, just some very minor findings now.
@alvaroaleman Can you please also take another look?
q := c.Queue | ||
if err := watch.Start(ctx, q); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q := c.Queue | |
if err := watch.Start(ctx, q); err != nil { | |
if err := watch.Start(ctx, c.Queue); err != nil { |
By("Unblocking leader election") | ||
resourceLockWithHooks.UnblockLeaderElection() | ||
|
||
By("Waiting for the leader election runnable to be executed without leader election being won") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By("Waiting for the leader election runnable to be executed without leader election being won") | |
By("Waiting for the leader election runnable to be executed after leader election was won") |
|
||
// BlockLeaderElection blocks the leader election process when called. It will not be unblocked | ||
// until UnblockLeaderElection is called. | ||
// Not thread safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'm not sure I follow what makes this not thread safe (same in l.46)
Or maybe the comment is outdated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outdated, sorry!
This change implements the proposal for warm replicas as proposed in #3121.
It adds an
EnableWarmup
option for controllers to optionally start as warmed replicas, which means that sources will be started before leader election has been won by the instance.It also adds a new internal runnable interface called
warmupRunnable
with a singleWarmup
method that will be called before leader election. There is no guarantee that theWarmup
method returns before leader election, just that it is called.The controller's implementation of the
Warmup
method for thewarmupRunnable
starts the sources with a threadsafe methodstartEventSourcesAndQueueLocked
that also adds events to the queue. In the case of a non-leader elected controller, it is intended for theWarmup
method to race with theStart
method to start the sources. The methods are synchronized by thedidStartEventSourcesOnce
sync.Once used instartEventSourcesAndQueueLocked
.For the most part,
Warmup
behaves exactly the same asStart
with the following differencesWarmup
can be called, vs.Start
which can only be called once.Warmup
doesn't initialize metrics for the controller, nor does it start the worker goroutines.Shutdown
Warmup runnables are shutdown in a separate goroutine from the one that stops the leader election runnables, because there is no guarantee as to whether or not the
Warmup
or theStart
method is holding the lock for thedidStartEventSourcesOnce
sync.Once instance.Testing
On top of the unit + integration tests, this PR was tested with an internal project that ran this branch of controller-runtime for a week with (EnableWarmup=true`, and verified that the controllers on the non-leader elected replica had a populated workqueue.