optimization: replace .Update() with .Patch() for sandbox updateStatus#509
Conversation
✅ Deploy Preview for agent-sandbox canceled.
|
|
Duplicate of #508 or different? |
|
Never mind - different controllers! |
|
If there's a conflict here, does that mean a different controller is also updating Sandbox Status? |
I think the situation here is two claim requests trying to claim the same sandbox. |
|
Hello @justinsb @aditya-shantanu I ran two very small tests by having 1 claim with a warmpool of size 2. This is the timeline of api calls. This is from OSS main (no optimizations).
Also, because the r.Update() method causes K8s to throw 409 Conflicts, the reconcile loop is forced to restart and execute from the top five separate times. Every time it restarts, it fires a DELETE request for a Network Policy that doesn't exist, resulting in five wasted 404 Not Found API calls (maybe this is a separate issue to fix and get from cache)
On the other hand with .Patch() for the sandbox status we see the following behavior:
So technically, two controllers are acting on the exact same resource but they are acting on different sub-domains of that resource. The claim controls owns the metadata, and the sandbox controller owns the status. The 409 Conflicts weren't caused by a logic collision; they were caused by |
dfff39d to
3a8fbf2
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: barney-s, vicentefb The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@vicentefb do we see conflicts in status update behavior with single resource (not scale testing) |
In an effort to reduce "Operation cannot be fulfilled..." conflicts at scale, this PR switches to patching to the status of Sandbox resource status.
Tests from main without this change indicate:
526 operation cannot be fulfilled conflicts from sandboxclaim (protoPayload.resourceName="pods/sandboxclaim-" OR protoPayload.resourceName="sandboxclaims/")
With this change, it decreased to ~16 conflicts.
Test parameters:
Deployment args: