fix: stop codeinterpreter controller from spamming status updates#261
fix: stop codeinterpreter controller from spamming status updates#261Sanchit2662 wants to merge 2 commits intovolcano-sh:mainfrom
Conversation
Signed-off-by: Sanchit2662 <[email protected]>
|
Hi @hzxuzhonghu , Please review the fix. |
There was a problem hiding this comment.
Pull request overview
This PR fixes a reconciliation loop issue in the CodeInterpreter controller that was causing excessive status updates. The controller was continuously triggering itself due to status writes updating the resource version and LastTransitionTime on every reconciliation, regardless of whether the actual state changed. The fix uses two complementary approaches: replacing the manual condition handling with apimeta.SetStatusCondition() (which only updates LastTransitionTime on actual transitions) and adding a GenerationChangedPredicate to filter out status-only updates before they reach the reconciliation queue.
Changes:
- Replaced manual condition update logic with
apimeta.SetStatusCondition()which handles LastTransitionTime correctly - Added
GenerationChangedPredicateto the CodeInterpreter controller to filter status-only watch events - Added necessary imports for the new functionality
- Improved inline documentation explaining the fix
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/workloadmanager/codeinterpreter_controller.go | Refactored status update logic to use SetStatusCondition instead of manual condition handling, removing the problematic unconditional metav1.Now() calls |
| cmd/workload-manager/main.go | Added GenerationChangedPredicate to filter status-only updates and added supporting imports |
| // updateStatus updates the CodeInterpreter status | ||
| func (r *CodeInterpreterReconciler) updateStatus(ctx context.Context, ci *runtimev1alpha1.CodeInterpreter) error { | ||
| // Update status | ||
| ci.Status.Ready = true |
There was a problem hiding this comment.
The ci.Status.Ready field is set to true unconditionally on every reconciliation. This causes the status object to be marked as changed even when the condition hasn't transitioned, which means the status will still be written to etcd and trigger another reconciliation cycle. Consider only setting it if the current value is false or check if it has already been set to true before modifying it.
There was a problem hiding this comment.
Code Review
This pull request addresses an infinite reconciliation loop in the CodeInterpreter controller by introducing a GenerationChangedPredicate and replacing manual condition updates with apimeta.SetStatusCondition. I have identified a significant maintainability issue: the controller setup logic is duplicated between main.go and the reconciler's own SetupWithManager method, leading to potential inconsistencies and an uninitialized manager field that could cause runtime errors.
cmd/workload-manager/main.go
Outdated
| if err := ctrl.NewControllerManagedBy(mgr). | ||
| For(&runtimev1alpha1.CodeInterpreter{}). | ||
| For(&runtimev1alpha1.CodeInterpreter{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})). | ||
| Complete(codeInterpreterReconciler); err != nil { |
There was a problem hiding this comment.
The controller setup logic for CodeInterpreter is currently duplicated here and in the SetupWithManager method within pkg/workloadmanager/codeinterpreter_controller.go. This duplication has already led to an inconsistency: the GenerationChangedPredicate fix is applied here but is missing in the reconciler's own setup method. To improve maintainability and ensure the fix is applied consistently, consider consolidating this logic by having main.go call codeInterpreterReconciler.SetupWithManager(mgr) and moving the predicate configuration into that method. Additionally, the mgr field in the reconciler struct is currently uninitialized in main.go, which will cause GetCodeInterpreter to return nil unexpectedly.
There was a problem hiding this comment.
can you please also resolve this coment
There was a problem hiding this comment.
Moving the predicate configuration into SetupWithManager and calling that from main.go would keep the setup consistent.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #261 +/- ##
==========================================
+ Coverage 43.32% 43.37% +0.04%
==========================================
Files 30 30
Lines 2613 2610 -3
==========================================
Hits 1132 1132
+ Misses 1358 1355 -3
Partials 123 123
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // SetStatusCondition only updates LastTransitionTime when the condition | ||
| // Status actually changes, preventing spurious status writes that would | ||
| // trigger an infinite reconciliation loop. | ||
| apimeta.SetStatusCondition(&ci.Status.Conditions, metav1.Condition{ | ||
| Type: "Ready", | ||
| Status: metav1.ConditionTrue, | ||
| Reason: "Reconciled", | ||
| Message: "CodeInterpreter is ready", | ||
| LastTransitionTime: metav1.Now(), | ||
| ObservedGeneration: ci.Generation, | ||
| } | ||
|
|
||
| // Update or add condition | ||
| conditionIndex := -1 | ||
| for i, cond := range ci.Status.Conditions { | ||
| if cond.Type == "Ready" { | ||
| conditionIndex = i | ||
| break | ||
| } | ||
| } | ||
|
|
||
| if conditionIndex >= 0 { | ||
| ci.Status.Conditions[conditionIndex] = readyCondition | ||
| } else { | ||
| ci.Status.Conditions = append(ci.Status.Conditions, readyCondition) | ||
| } | ||
| }) |
There was a problem hiding this comment.
This method still calls Status().Update() on every reconcile? It would be safer to skip the status update when nothing actually changed.
cmd/workload-manager/main.go
Outdated
| if err := ctrl.NewControllerManagedBy(mgr). | ||
| For(&runtimev1alpha1.CodeInterpreter{}). | ||
| For(&runtimev1alpha1.CodeInterpreter{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})). | ||
| Complete(codeInterpreterReconciler); err != nil { |
There was a problem hiding this comment.
Moving the predicate configuration into SetupWithManager and calling that from main.go would keep the setup consistent.
Signed-off-by: Sanchit2662 <[email protected]>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: acsoto The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
so the codeinterpreter controller had a pretty nasty reconcile loop going on. every reconcile was calling updateStatus which was stamping a fresh metav1.Now() on the condition every single time, no matter what. kubernetes sees a different timestamp, treats it as a real change, writes it to etcd, bumps resourceversion, informer catches it, re-enqueues the object, reconcile fires again. rinse and repeat, forever, for every codeinterpreter object on the cluster.
no errors surfaced anywhere. controller looked perfectly healthy. it was just silently hammering the api server and etcd with pointless writes the entire time it was running. scales linearly with how many codeinterpreter objects you have so it gets worse over time.
fixed it two ways , swapped metav1.Now() out for apimeta.SetStatusCondition which only touches LastTransitionTime when the condition actually transitions, so when it's already ready nothing gets dirtied. also added GenerationChangedPredicate on the controller builder so status-only watch events get dropped before they even reach the reconcile queue.
Special notes for your reviewer:
the two changes together are belt-and-suspenders. SetStatusCondition is the real fix , it stops the status object from being dirty on no-op reconciles. the predicate is a second layer so even if something else causes a status diff it won't re-trigger reconciliation. both changes are tiny and self-contained, no behavior changes outside the reconcile loop.