fix: stop codeinterpreter controller from spamming status updates #261

Open
Sanchit2662 wants to merge 2 commits into volcano-sh:main from Sanchit2662:fix/codeinterpreter-reconcile-storm

Conversation

@Sanchit2662
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

so the codeinterpreter controller had a pretty nasty reconcile loop going on. every reconcile was calling updateStatus, which stamped a fresh metav1.Now() on the Ready condition every single time, no matter what. kubernetes sees a different timestamp, treats it as a real change, writes it to etcd, bumps the resourceVersion, the informer catches it and re-enqueues the object, and reconcile fires again. rinse and repeat, forever, for every codeinterpreter object on the cluster.
no errors surfaced anywhere. the controller looked perfectly healthy. it was just silently hammering the api server and etcd with pointless writes the entire time it was running. the load scales linearly with the number of codeinterpreter objects, so it gets worse as more of them are created.
fixed it two ways: swapped the hand-rolled metav1.Now() stamping for apimeta.SetStatusCondition, which only touches LastTransitionTime when the condition status actually transitions, so when it's already ready nothing gets dirtied. also added GenerationChangedPredicate on the controller builder so status-only watch events get dropped before they even reach the reconcile queue.

Special notes for your reviewer:

the two changes together are belt-and-suspenders. SetStatusCondition is the real fix: it stops the status object from being dirty on no-op reconciles. the predicate is a second layer, so even if something else causes a status diff it won't re-trigger reconciliation. both changes are tiny and self-contained, with no behavior changes outside the reconcile loop.
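
The second layer can be pictured with a small stand-alone sketch. It assumes the CRD has the status subresource enabled, so status writes bump resourceVersion but leave metadata.generation alone. The `object` type and `generationChanged` function are hypothetical names for illustration; the real filter is controller-runtime's `predicate.GenerationChangedPredicate`, which applies this comparison to update events.

```go
package main

import "fmt"

// object is a simplified, hypothetical stand-in for a Kubernetes object's metadata.
type object struct {
	Generation      int64
	ResourceVersion string
}

// generationChanged models the update filter: an update event is let
// through only when metadata.generation differs between old and new.
func generationChanged(oldObj, newObj object) bool {
	return oldObj.Generation != newObj.Generation
}

func main() {
	before := object{Generation: 1, ResourceVersion: "100"}
	statusOnly := object{Generation: 1, ResourceVersion: "101"} // status write
	specEdit := object{Generation: 2, ResourceVersion: "102"}   // spec change

	fmt.Println(generationChanged(before, statusOnly)) // dropped before the queue
	fmt.Println(generationChanged(before, specEdit))   // reconciled
}
```

A side effect worth keeping in mind is that metadata-only updates that do not bump generation are filtered the same way.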

Copilot AI review requested due to automatic review settings April 8, 2026 12:29
@Sanchit2662
Contributor Author

Hi @hzxuzhonghu, please review the fix.

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a reconciliation loop issue in the CodeInterpreter controller that was causing excessive status updates. The controller was continuously triggering itself due to status writes updating the resource version and LastTransitionTime on every reconciliation, regardless of whether the actual state changed. The fix uses two complementary approaches: replacing the manual condition handling with apimeta.SetStatusCondition() (which only updates LastTransitionTime on actual transitions) and adding a GenerationChangedPredicate to filter out status-only updates before they reach the reconciliation queue.

Changes:

  • Replaced manual condition update logic with apimeta.SetStatusCondition() which handles LastTransitionTime correctly
  • Added GenerationChangedPredicate to the CodeInterpreter controller to filter status-only watch events
  • Added necessary imports for the new functionality
  • Improved inline documentation explaining the fix

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pkg/workloadmanager/codeinterpreter_controller.go | Refactored status update logic to use SetStatusCondition instead of manual condition handling, removing the problematic unconditional metav1.Now() calls |
| cmd/workload-manager/main.go | Added GenerationChangedPredicate to filter status-only updates and added supporting imports |

// updateStatus updates the CodeInterpreter status
func (r *CodeInterpreterReconciler) updateStatus(ctx context.Context, ci *runtimev1alpha1.CodeInterpreter) error {
    // Update status
    ci.Status.Ready = true

Copilot AI Apr 8, 2026

The ci.Status.Ready field is set to true unconditionally on every reconciliation. This causes the status object to be marked as changed even when the condition hasn't transitioned, which means the status will still be written to etcd and trigger another reconciliation cycle. Consider only setting it if the current value is false or check if it has already been set to true before modifying it.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an infinite reconciliation loop in the CodeInterpreter controller by introducing a GenerationChangedPredicate and replacing manual condition updates with apimeta.SetStatusCondition. I have identified a significant maintainability issue: the controller setup logic is duplicated between main.go and the reconciler's own SetupWithManager method, leading to potential inconsistencies and an uninitialized manager field that could cause runtime errors.

Comment on lines 183 to 185
  if err := ctrl.NewControllerManagedBy(mgr).
-     For(&runtimev1alpha1.CodeInterpreter{}).
+     For(&runtimev1alpha1.CodeInterpreter{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
      Complete(codeInterpreterReconciler); err != nil {
Contributor

medium

The controller setup logic for CodeInterpreter is currently duplicated here and in the SetupWithManager method within pkg/workloadmanager/codeinterpreter_controller.go. This duplication has already led to an inconsistency: the GenerationChangedPredicate fix is applied here but is missing in the reconciler's own setup method. To improve maintainability and ensure the fix is applied consistently, consider consolidating this logic by having main.go call codeInterpreterReconciler.SetupWithManager(mgr) and moving the predicate configuration into that method. Additionally, the mgr field in the reconciler struct is currently uninitialized in main.go, which will cause GetCodeInterpreter to return nil unexpectedly.

Member

can you please also resolve this comment

Member

Moving the predicate configuration into SetupWithManager and calling that from main.go would keep the setup consistent.

@codecov-commenter

codecov-commenter commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.37%. Comparing base (27274ee) to head (57f6d84).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/workloadmanager/codeinterpreter_controller.go | 0.00% | 10 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #261      +/-   ##
==========================================
+ Coverage   43.32%   43.37%   +0.04%     
==========================================
  Files          30       30              
  Lines        2613     2610       -3     
==========================================
  Hits         1132     1132              
+ Misses       1358     1355       -3     
  Partials      123      123              
| Flag | Coverage Δ |
| --- | --- |
| unittests | 43.37% <0.00%> (+0.04%) ⬆️ |

Member

@hzxuzhonghu hzxuzhonghu left a comment

LGTM

Comment on lines +103 to +112
+   // SetStatusCondition only updates LastTransitionTime when the condition
+   // Status actually changes, preventing spurious status writes that would
+   // trigger an infinite reconciliation loop.
+   apimeta.SetStatusCondition(&ci.Status.Conditions, metav1.Condition{
        Type:               "Ready",
        Status:             metav1.ConditionTrue,
        Reason:             "Reconciled",
        Message:            "CodeInterpreter is ready",
-       LastTransitionTime: metav1.Now(),
        ObservedGeneration: ci.Generation,
-   }
-
-   // Update or add condition
-   conditionIndex := -1
-   for i, cond := range ci.Status.Conditions {
-       if cond.Type == "Ready" {
-           conditionIndex = i
-           break
-       }
-   }
-
-   if conditionIndex >= 0 {
-       ci.Status.Conditions[conditionIndex] = readyCondition
-   } else {
-       ci.Status.Conditions = append(ci.Status.Conditions, readyCondition)
-   }
+   })
Member

This method still calls Status().Update() on every reconcile? It would be safer to skip the status update when nothing actually changed.
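
One way to read this suggestion is a compare-before-write guard, sketched below with the standard library only. `Status`, `statusWriter`, `countingWriter`, and `syncStatus` are hypothetical names for illustration and are not taken from the PR; in a real controller the comparison would more likely use `equality.Semantic.DeepEqual` from k8s.io/apimachinery on a deep copy taken before mutation, with `Status().Update()` in place of the fake writer.

```go
package main

import (
	"fmt"
	"reflect"
)

// Status is a simplified, hypothetical stand-in for the CodeInterpreter status.
type Status struct {
	Ready      bool
	Conditions []string
}

// statusWriter is a hypothetical abstraction over the status update call.
type statusWriter interface {
	UpdateStatus(s Status) error
}

// countingWriter records how many writes would have reached the API server.
type countingWriter struct{ writes int }

func (w *countingWriter) UpdateStatus(Status) error {
	w.writes++
	return nil
}

// syncStatus writes desired only when it differs from observed and
// reports whether a write was issued.
func syncStatus(w statusWriter, observed, desired Status) (bool, error) {
	if reflect.DeepEqual(observed, desired) {
		return false, nil // nothing changed: skip the API call entirely
	}
	if err := w.UpdateStatus(desired); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	w := &countingWriter{}
	ready := Status{Ready: true, Conditions: []string{"Ready=True"}}

	_, _ = syncStatus(w, Status{}, ready) // first reconcile: status changes
	_, _ = syncStatus(w, ready, ready)    // steady state: no write issued

	fmt.Println(w.writes)
}
```

With a guard like this, a steady-state reconcile costs no status write at all, regardless of what the predicate lets through.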

Member

@acsoto acsoto left a comment

LGTM

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: acsoto
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

6 participants