fix: Ensure consistent breaker state for unhealthy hosts with inflight requests #15811

Open · wants to merge 1 commit into base: main
Conversation

@elijah-rou commented Mar 16, 2025

This PR references the discussion in the CNCF knative-serving Slack channel, which can be found here: https://cloud-native.slack.com/archives/C04LMU0AX60/p1740420658746939

Issue

There has been a long-standing issue in Knative, with issues dating back to 2022, of the Load Balancer erroneously sending requests to pods that are already busy, even though their configured max concurrency should not permit the request to be sent through to that pod. I believe I have found (at least one of) the causes.

In specific scenarios, particularly with long-running requests, the breaker state for a podTracker is functionally reset if the revision backend temporarily registers the pod as "unhealthy". There is no explicit logic that dictates this behaviour; rather, it is a consequence of how throttler.go updates the state of its podTrackers:

func (rt *revisionThrottler) updateThrottlerState(backendCount int, trackers []*podTracker, clusterIPDest *podTracker) {

To summarise, the new state is determined by rebuilding the []podTracker list from scratch, using information retrieved from the respective revision backend. Only the healthy hosts reported by the revision backend are used to build this list, so any unhealthy podTrackers are effectively removed, even if the "unhealthy" pod is currently busy with a request. If the pod becomes healthy again in time, the revision backend will report it and the throttler will add a brand-new podTracker for the pod, discarding any state the previous podTracker held (i.e. resetting InFlight to 0). The pod therefore becomes a valid target for the loadBalancing policy, even though in reality it is still busy with a request.
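
To make the failure mode concrete, here is a minimal standalone sketch (illustrative names only, not the upstream code) of why rebuilding the tracker list from healthy hosts alone resets in-flight state:

// Illustrative only: a stripped-down podTracker, not the real knative type.
type podTracker struct {
    dest     string
    inFlight int64 // requests this pod is currently handling
}

// rebuildFromHealthy mirrors the current behaviour: every tracker is
// created from scratch, so a pod that flaps unhealthy -> healthy while
// still holding a request reappears with inFlight == 0 and looks like a
// free target to the load-balancing policy.
func rebuildFromHealthy(healthyDests []string) []*podTracker {
    trackers := make([]*podTracker, 0, len(healthyDests))
    for _, dest := range healthyDests {
        trackers = append(trackers, &podTracker{dest: dest}) // inFlight starts at 0
    }
    return trackers
}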

Proposed Changes

This PR seeks to address the issue fairly simply: do not relinquish the state of a podTracker until its host is both unhealthy AND finished with its in-flight requests. Technically, this amounts to the following:

  • Change the core data structure for storing pod trackers from a []*podTracker to a map[string]*podTracker. This allows us to easily manage viable podTrackers by referencing them by their dest at any point, and has the added benefit of not re-creating the map on every assignSlice call.
  • Update the Breaker interface with a new Pending method. This moves the definition of InFlight to the in-flight value associated with the semaphore (similar to Capacity) and adds a Pending method backed by a new pending attribute (the old inFlight attribute) of a Breaker.
  • Update the podTracker struct with a new healthy attribute, which a load-balancing policy can use to skip unhealthy podTrackers.
  • When updating throttler state, use both the healthy pods reported by the revision backend and the current podTracker map: add podTrackers for genuinely new pods, and mark podTrackers that are unhealthy but still have in-flight requests (as determined by the respective podTracker's breaker semaphore in-flight count) as "unhealthy". If a podTracker is both unhealthy and has no in-flight requests, it can be removed. A sketch of this reconciliation follows the list.
  • Update all load-balancing policies to skip podTrackers that are unhealthy, and also to skip entries in the assignedTrackers list that are nil (which is now possible because updateThrottlerState may remove a podTracker's reference).
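
Taken together, the reconciliation would look roughly like the following sketch (type, field, and helper names here are illustrative assumptions, not the exact PR code):

// Illustrative types only; the real podTracker and Breaker live in the
// activator packages and differ in detail.
type breakerLike interface {
    InFlight() uint64 // semaphore in-flight count, per the new definition
}

type podTracker struct {
    dest    string
    healthy bool
    breaker breakerLike
}

func newPodTracker(dest string) *podTracker {
    return &podTracker{dest: dest, healthy: true}
}

// reconcileTrackers keeps unhealthy trackers alive while they still hold
// in-flight requests, so their breaker state is never reset mid-request.
func reconcileTrackers(current map[string]*podTracker, healthy map[string]struct{}) {
    // Healthy hosts: reuse the existing tracker (preserving its breaker
    // state) or create a tracker for a genuinely new pod.
    for dest := range healthy {
        if t, ok := current[dest]; ok {
            t.healthy = true
        } else {
            current[dest] = newPodTracker(dest)
        }
    }
    // Remaining hosts: demote while busy, drop once idle.
    for dest, t := range current {
        if _, ok := healthy[dest]; ok {
            continue
        }
        if t.breaker.InFlight() > 0 {
            t.healthy = false // load-balancing policies must skip it
        } else {
            delete(current, dest) // unhealthy and idle: safe to remove
        }
    }
}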

There are five more things to note in this PR that are tangential or important but not directly related to the core functionality:

  1. I have updated the Capacity and InFlight calls to return uint64, as indicated by a TODO comment.
  2. The default minimum number of activators for the KPA was 2. I have updated this value to 1, since there can be setups with a single activator in the proxy path.
  3. The changes introduce a dependency on golang.org/x/exp/maps to make map manipulation easier. (I believe this can be removed once the project updates its Go version.)
  4. Since a podTracker can now be nil, the RandomChoice policy may not succeed on its first pick. I have updated it to keep trying until it randomly selects a podTracker that is not nil (see the sketch after this list). I'm not married to this implementation; we can definitely do it another way. I am aware the unit test for this is currently broken, but I will await feedback before changing the implementation.
  5. I have added some debug logs for podTracker acquisition for a specific request. This requires that X-Request-Id be passed in for the logs to be meaningful.
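
For point 4, the shape of the change is roughly the following (illustrative only; as noted above, the exact retry strategy is still open to debate):

import "math/rand"

// randomChoice keeps drawing until it lands on a usable tracker. It
// assumes the slice contains at least one non-nil, healthy tracker;
// otherwise the loop never terminates, which is part of why this
// implementation is still up for discussion.
func randomChoice(trackers []*podTracker) *podTracker {
    for {
        t := trackers[rand.Intn(len(trackers))]
        if t != nil && t.healthy {
            return t
        }
    }
}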

The implementation of this functionality is open to debate, and I am happy to refine it towards an approach the core maintainers find more amenable.

fix: Ensure consistent breaker state for unhealthy hosts with inflight requests

add x/exp

capacity and inflight are uint64

change min kpa for test

knative-prow bot commented Mar 16, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: elijah-rou
Once this PR has been reviewed and has the lgtm label, please assign skonto for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot requested review from dsimansk and skonto March 16, 2025 21:15

knative-prow bot commented Mar 16, 2025

Hi @elijah-rou. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@knative-prow knative-prow bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 16, 2025