fix: clear stale pod-name annotation instead of hard error #521
noeljackson wants to merge 1 commit into kubernetes-sigs:main from
When the pod tracked by `agents.x-k8s.io/pod-name` doesn't exist (deleted during warm pool rotation, eviction, or image pull failure), the controller returned a hard error, leaving the Sandbox stuck in a reconcile loop and unable to create a replacement pod. Now the controller clears the stale annotation and falls through to pod creation, and the new pod is tracked via `ensurePodNameAnnotation`.
Summary
When the pod tracked by the `agents.x-k8s.io/pod-name` annotation doesn't exist, clear the stale annotation and fall through to pod creation instead of returning a hard error.
Problem
The `ensurePodNameAnnotation` function (commit 32cddd3) records the backing pod's name on the Sandbox CR so that the controller can track the same pod across reconciliations. However, when the annotated pod is deleted (warm pool rotation, eviction, image pull failure), `reconcilePod` returns a hard error.
The controller never reaches PATH 3 (create pod), so the Sandbox is stuck in a reconcile error loop and the warm pool never becomes ready.
Fix
When the annotated pod isn't found, clear the stale annotation and let `pod = nil` fall through to pod creation. The subsequent `ensurePodNameAnnotation` call after pod creation re-sets the annotation to track the new pod.
Test plan
`TestReconcilePodClearsStaleAnnotation`: a Sandbox whose stale annotation points to a non-existent pod gets a new pod, and the annotation is updated to track it.
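The fixed flow and the test scenario above can be illustrated with a minimal, self-contained Go sketch. It does not use the real controller: `Sandbox`, `podStore`, `errNotFound`, and the pod-naming rule are hypothetical stand-ins for the Sandbox CR, the API server, and a Kubernetes NotFound error (real code would typically check `apierrors.IsNotFound` from `k8s.io/apimachinery`), and `reconcilePod` and `ensurePodNameAnnotation` here only mimic the behavior described in this PR, not their actual signatures.

```go
package main

import (
	"errors"
	"fmt"
)

const podNameAnnotation = "agents.x-k8s.io/pod-name"

// errNotFound is a hypothetical stand-in for a NotFound API error.
var errNotFound = errors.New("not found")

// Sandbox is a hypothetical, trimmed-down stand-in for the Sandbox CR.
type Sandbox struct {
	Name        string
	Annotations map[string]string
}

// podStore is a hypothetical in-memory replacement for the API server.
type podStore map[string]bool

func (s podStore) get(name string) error {
	if !s[name] {
		return fmt.Errorf("get pod %q: %w", name, errNotFound)
	}
	return nil
}

// ensurePodNameAnnotation records the tracked pod on the Sandbox. The real
// controller would persist this through the API server; here it only
// mutates the in-memory object.
func ensurePodNameAnnotation(sb *Sandbox, pod string) {
	if sb.Annotations == nil {
		sb.Annotations = map[string]string{}
	}
	sb.Annotations[podNameAnnotation] = pod
}

// reconcilePod models the fixed flow: a NotFound for the annotated pod
// clears the annotation, leaves pod empty, and falls through to creation.
func reconcilePod(pods podStore, sb *Sandbox) (string, error) {
	pod := "" // empty string plays the role of pod == nil
	if name := sb.Annotations[podNameAnnotation]; name != "" {
		err := pods.get(name)
		switch {
		case err == nil:
			pod = name // tracked pod still exists
		case errors.Is(err, errNotFound):
			delete(sb.Annotations, podNameAnnotation) // stale: clear it
		default:
			return "", err // assumption: other lookup errors are still returned
		}
	}
	if pod == "" {
		pod = sb.Name // hypothetical naming rule for the replacement pod
		pods[pod] = true
	}
	ensurePodNameAnnotation(sb, pod)
	return pod, nil
}
```

Running the stale-annotation case once creates the replacement pod and re-points the annotation; a second reconcile finds the tracked pod and creates nothing. Only NotFound clears the annotation here, while other lookup errors are still returned; that split is an assumption of the sketch rather than something the PR text states.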