fix: Ensure consistent breaker state for unhealthy hosts with infligh… #15811
This PR references the discussion in the CNCF knative-serving Slack channel, which can be found here: https://cloud-native.slack.com/archives/C04LMU0AX60/p1740420658746939
Issue
There has been a long-standing issue in Knative, with issues dating back to 2022, of the load balancer erroneously sending requests to pods that are already busy, even though their configured max concurrency should not permit the request to be sent through to that pod. I believe I have found (at least one of) the causes.
In specific scenarios, particularly long-running requests, the breaker state for a podTracker will be functionally reset if the revision backend temporarily registers the pod as "unhealthy". There is no explicit logic that dictates this behaviour; rather, it is a result of how `throttler.go` updates the state for its podTrackers (see `serving/pkg/activator/net/throttler.go`, line 335 at c09ff6c).
To summarise, the new state is determined by completely remaking the `[]podTrackers` list from scratch, using information retrieved from the respective revision backend. Only the healthy hosts reported by the revision backend are used to build this podTrackers list, so any unhealthy podTrackers are effectively removed. This is the case even if the "unhealthy" pod is currently busy with a request. If the pod then becomes healthy again in time, the revision backend will report it, and the throttler will re-add a brand-new podTracker for the pod, discarding any state previously held by the old podTracker (i.e. setting `InFlight` to 0). The pod therefore becomes a valid target for the load-balancing policy, even though in reality it is still busy with a request.
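For illustration, here is a minimal sketch of the pattern described above, using simplified, hypothetical names rather than the actual throttler code: the tracker list is rebuilt only from the currently healthy hosts, so a busy pod that flaps unhealthy and back comes back with a fresh, zero in-flight tracker.

```go
// Minimal sketch of the pattern described above; names are simplified and
// hypothetical, not the actual throttler code.
package main

import "fmt"

// podTracker is a simplified stand-in for the activator's per-pod tracker.
type podTracker struct {
	dest     string
	inFlight int64 // requests currently being handled by this pod
}

// rebuildTrackers rebuilds the tracker list from scratch using only the
// hosts the revision backend currently reports as healthy, so any prior
// in-flight state is discarded.
func rebuildTrackers(healthyDests []string) []*podTracker {
	trackers := make([]*podTracker, 0, len(healthyDests))
	for _, dest := range healthyDests {
		// A brand-new tracker is created even if a previous tracker for
		// this dest still had in-flight requests.
		trackers = append(trackers, &podTracker{dest: dest}) // inFlight == 0
	}
	return trackers
}

func main() {
	// The pod is busy with a long-running request.
	old := []*podTracker{{dest: "10.0.0.1:8012", inFlight: 1}}
	fmt.Println("before rebuild:", old[0].inFlight) // 1

	// The pod flaps unhealthy and then healthy again; the rebuild produces
	// a fresh tracker with inFlight == 0, so the busy pod looks idle to the
	// load-balancing policy.
	fresh := rebuildTrackers([]string{"10.0.0.1:8012"})
	fmt.Println("after rebuild:", fresh[0].inFlight) // 0
}
```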
Proposed Changes
This PR seeks to address this issue fairly simply, by not relinquishing the state of the podTrackers until a host is both unhealthy AND finished with its in-flight requests. Technically, this amounts to the following (a rough sketch of these changes follows the list):

- Storing the throttler's podTrackers in a `map[string]*podTracker` as opposed to a `[]*podTracker`. This allows us to easily manage viable podTrackers by referencing them with their `dest` at any point. It has the added benefit of not re-creating the map on every `assignSlice` call.
- Extending the `Breaker` interface with a new `Pending` method. This involves moving the current definition of `InFlight` to be the in-flight value associated with the semaphore (similarly to `Capacity`), and creating a new `Pending` method which references a new `pending` attribute (the old `inFlight` attribute) of a Breaker.
- Extending the `podTracker` struct with a new attribute, `healthy`, which a load-balancing policy can use to skip unhealthy podTrackers.
- Marking podTrackers whose hosts are no longer reported as healthy but which still have `inFlight` requests, as determined by the current state of the respective podTracker's breaker semaphore in-flight, as "unhealthy". If a podTracker is both unhealthy and has no in-flights, the tracker can be removed.
- Handling podTrackers that are `nil` (which can now be possible due to us removing the reference for the podTracker in `updateThrottlerState`).
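To make the shape of these changes concrete, here is a rough sketch under simplified, assumed names (not the PR's actual code):

```go
// Rough sketch of the shape of the proposed changes; field and method names
// are illustrative assumptions, not the exact code in this PR.
package main

import "sync/atomic"

// podTracker keeps per-pod state; it now carries a healthy flag so a
// load-balancing policy can skip pods the revision backend reports as
// unhealthy.
type podTracker struct {
	dest    string
	healthy atomic.Bool
	b       *breaker
}

// breaker separates the two counters the PR distinguishes:
//   - pending:  requests queued on the breaker (the old inFlight attribute)
//   - inFlight: requests currently holding a semaphore slot (like Capacity)
type breaker struct {
	pending  atomic.Int64
	inFlight atomic.Int64
	capacity atomic.Int64
}

func (b *breaker) Pending() uint64  { return uint64(b.pending.Load()) }
func (b *breaker) InFlight() uint64 { return uint64(b.inFlight.Load()) }
func (b *breaker) Capacity() uint64 { return uint64(b.capacity.Load()) }

// revisionThrottler keys trackers by dest instead of rebuilding a slice, so
// an unhealthy-but-busy pod keeps its tracker (and its breaker state) until
// its in-flight requests drain.
type revisionThrottler struct {
	podTrackers map[string]*podTracker
}

// updateTrackers marks trackers unhealthy instead of dropping them, and only
// removes a tracker once it is both unhealthy and has no in-flight requests.
func (rt *revisionThrottler) updateTrackers(healthyDests map[string]struct{}) {
	for dest := range healthyDests {
		if _, ok := rt.podTrackers[dest]; !ok {
			rt.podTrackers[dest] = &podTracker{dest: dest, b: &breaker{}}
		}
		rt.podTrackers[dest].healthy.Store(true)
	}
	for dest, t := range rt.podTrackers {
		if _, ok := healthyDests[dest]; ok {
			continue
		}
		t.healthy.Store(false)
		if t.b.InFlight() == 0 {
			delete(rt.podTrackers, dest) // unhealthy and drained: safe to remove
		}
	}
}

func main() {}
```

The key point of this shape is that a tracker is only removed once it is both unhealthy and drained, so breaker state survives a transient health flap of a busy pod.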
There are 5 more things to note in this PR that are tangential/important but not directly related to the functionality:
- Changing the `Capacity` and `InFlight` calls to return `uint64`, as indicated by a TODO comment.
- Using `golang.org/x/exp/maps` in order to do map manipulation more easily. (I believe this can be removed when the project updates its Go version.)
- Because a podTracker can now be `nil`, the RandomChoice policy may now not succeed. I have updated it to continue trying until it randomly selects a podTracker that is not `nil` (see the sketch after these notes). I'm not married to this implementation; we can definitely do it another way.
- I am aware the unit test for this is currently broken, but I will await feedback before changing the actual implementation here.
- The implementation of this functionality is open to debate. I am happy to refine it towards an implementation that the core maintainers are more amenable to.
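For illustration only, here is a minimal sketch of the retry-on-`nil` selection described in the notes above; the names and structure are assumptions, not the exact policy code in this PR.

```go
// Illustrative sketch of retrying random selection until a non-nil tracker
// is found; assumed names, not the PR's exact policy code.
package main

import (
	"fmt"
	"math/rand"
)

type podTracker struct {
	dest string
}

// randomChoice picks a random non-nil tracker, or returns nil if every slot
// is nil (so the retry loop cannot spin forever on an all-nil slice).
func randomChoice(trackers []*podTracker) *podTracker {
	anyNonNil := false
	for _, t := range trackers {
		if t != nil {
			anyNonNil = true
			break
		}
	}
	if !anyNonNil {
		return nil
	}
	for {
		if t := trackers[rand.Intn(len(trackers))]; t != nil {
			return t
		}
	}
}

func main() {
	trackers := []*podTracker{nil, {dest: "10.0.0.2:8012"}, nil}
	if t := randomChoice(trackers); t != nil {
		fmt.Println("picked:", t.dest)
	}
}
```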