feat: support workspace resource overrides on SandboxClaims#459

Open
noeljackson wants to merge 4 commits into kubernetes-sigs:main from noeljackson:pr/workspace-resources-only

Conversation

@noeljackson
Contributor

@noeljackson noeljackson commented Mar 22, 2026

Summary

  • add spec.workspaceResources to SandboxClaim so a claim can request workspace-specific CPU, memory, and ephemeral-storage sizing
  • apply those workspace resource overrides when creating a sandbox from a claim
  • apply the same workspace resource overrides when adopting a sandbox from a warm pool
  • keep the behavior scoped to the container named workspace so non-workspace templates remain unchanged
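
As a rough illustration of the scoping rule in the last bullet, the sketch below applies overrides only to the container named workspace. It is not the PR's code: the `Container` type, the string-valued quantities, and the `applyWorkspaceOverrides` helper are all hypothetical stand-ins so the example needs nothing beyond the Go standard library.

```go
package main

import "fmt"

// Container is a simplified, hypothetical stand-in for corev1.Container.
// Quantities are plain strings to avoid depending on Kubernetes API types.
type Container struct {
	Name     string
	Requests map[string]string
}

// workspaceContainerName is the only container the overrides may touch.
const workspaceContainerName = "workspace"

// applyWorkspaceOverrides copies overrides onto the container named
// "workspace" and leaves every other container unchanged. It reports
// whether a workspace container was found, so templates without one
// pass through unmodified.
func applyWorkspaceOverrides(containers []Container, overrides map[string]string) bool {
	for i := range containers {
		if containers[i].Name != workspaceContainerName {
			continue
		}
		if containers[i].Requests == nil {
			containers[i].Requests = map[string]string{}
		}
		for name, qty := range overrides {
			containers[i].Requests[name] = qty
		}
		return true
	}
	return false
}

func main() {
	containers := []Container{
		{Name: "workspace", Requests: map[string]string{"cpu": "500m", "memory": "1Gi"}},
		{Name: "sidecar", Requests: map[string]string{"cpu": "100m"}},
	}
	found := applyWorkspaceOverrides(containers, map[string]string{"cpu": "2", "memory": "4Gi"})
	fmt.Println(found, containers[0].Requests, containers[1].Requests)
}
```

Returning a boolean rather than an error is one possible design: a template without a workspace container is treated as "nothing to do" instead of a failure.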

Why

Today SandboxClaim cannot carry per-claim workspace sizing through the CRD/controller path. In practice that means claims requesting larger workspaces still adopt or create sandboxes using the template defaults.

This change adds a narrow claim-level resource override for the workspace container so the requested sizing survives both cold create and warm-pool adoption.

Scope

  • includes workspaceResources only
  • does not reintroduce envOverrides
  • does not add generic per-container mutation

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2026
@netlify

netlify bot commented Mar 22, 2026

Deploy Preview for agent-sandbox ready!

Name Link
🔨 Latest commit 7dd1b10
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/69cfb3500a4f0d0008217a69
😎 Deploy Preview https://deploy-preview-459--agent-sandbox.netlify.app

@k8s-ci-robot k8s-ci-robot requested review from barney-s and igooch March 22, 2026 09:33
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 22, 2026
@k8s-ci-robot
Contributor

Hi @noeljackson. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 22, 2026
@noeljackson noeljackson force-pushed the pr/workspace-resources-only branch from 518b632 to 9347f1d Compare March 22, 2026 11:35
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 22, 2026
@noeljackson noeljackson marked this pull request as ready for review March 22, 2026 11:36
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2026
@k8s-ci-robot k8s-ci-robot requested a review from janetkuo March 22, 2026 11:36

@codebot-robot codebot-robot left a comment

Overall, this PR introduces a clean and narrow way to override workspace resources at the claim level. The struct additions, deep copying, and Sandbox template modifications are solidly implemented.

However, there is a critical flaw regarding the warm pool adoption path. The PR updates the Sandbox CR's PodTemplate upon adoption, but the sandbox_controller (which manages the actual Pod) currently ignores PodTemplate.Spec changes for existing, running pods. As a result, when a claim adopts a warm sandbox, the actual Pod will continue running with the template's default resources instead of the requested overrides.

To fix this, the system must either:

  1. Support in-place Pod resource resizing in sandbox_controller (requires Kubernetes 1.27+ with the InPlacePodVerticalScaling feature gate enabled), or
  2. Skip warm pool adoption if the claim's requested resources differ from the warm pool's default sizing, ensuring it falls back to a cold start to guarantee the correct sizing.
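
Option 2 above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Resources`, `WarmSandbox`, and `pickWarmSandbox` are hypothetical names, and sizing is reduced to two integer fields so that plain struct equality is enough.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified sizing record.
type Resources struct {
	CPUMillis int64
	MemoryMiB int64
}

// WarmSandbox is a hypothetical view of a pre-provisioned sandbox.
type WarmSandbox struct {
	Name      string
	Workspace Resources
}

// pickWarmSandbox returns the first warm sandbox whose workspace sizing
// already matches the claim's request. A nil request means the claim has no
// override, so any warm sandbox is acceptable. When ok is false the caller
// is expected to fall back to a cold start.
func pickWarmSandbox(pool []WarmSandbox, requested *Resources) (sandbox WarmSandbox, ok bool) {
	for _, s := range pool {
		if requested == nil || s.Workspace == *requested {
			return s, true
		}
	}
	return WarmSandbox{}, false
}

func main() {
	pool := []WarmSandbox{{Name: "warm-a", Workspace: Resources{CPUMillis: 500, MemoryMiB: 1024}}}

	// A larger request does not match the pool default, so no adoption happens.
	_, ok := pickWarmSandbox(pool, &Resources{CPUMillis: 2000, MemoryMiB: 4096})
	fmt.Println("adopt for larger request:", ok)

	// A claim without overrides can adopt any warm sandbox.
	s, ok := pickWarmSandbox(pool, nil)
	fmt.Println("adopt without override:", s.Name, ok)
}
```

The trade-off is latency: claims with non-default sizing always pay the cold-start cost, but the running pod is guaranteed to have the requested resources.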

I've left detailed inline comments on this and a few minor optimizations.

(This review was generated by Overseer)

for k, v := range sandbox.Spec.PodTemplate.ObjectMeta.Labels {
	if pod.Labels[k] != v {
		pod.Labels[k] = v
		changed = true

This correctly propagates annotations from the Sandbox to the Pod. However, note that any changes to Sandbox.Spec.PodTemplate.Spec (like the new WorkspaceResources) are completely ignored for existing pods in this reconcilePod path.

This breaks the warm-pool adoption of WorkspaceResources. If the intention is for adopted pods to resize, Pod patching must be implemented here for the .Spec.Containers[].Resources field.

Contributor Author

Correct. Rather than modifying the sandbox controller's reconcilePod path (which has broader implications), #487 handles this from the claim controller side by patching the pod directly when workspaceResources change. This keeps the sandbox controller's existing behavior intact.

Contributor Author

Handled from the claim controller side. Rather than changing sandbox_controller's reconcilePod behavior, the claim controller patches the pod directly when workspaceResources differ from what's running.

		"agents.x-k8s.io/sandbox-name-hash": nameHash,
		"custom-label":                      "label-val",
	},
	Annotations: map[string]string{

This test effectively verifies that the annotation map is propagated, which aligns with the new logic in reconcilePod. Good addition.

Contributor Author

Thanks.

@noeljackson noeljackson force-pushed the pr/workspace-resources-only branch from e68a2e3 to f361431 Compare March 26, 2026 11:24
@noeljackson
Contributor Author

Thanks for the thorough review.

Warm pool resource drift: The sandbox_controller.reconcilePod path now includes generic resource drift detection — it compares container resources between the Sandbox CR's PodTemplate and the running pod using equality.Semantic.DeepEqual. When they differ (e.g., after warm pool adoption with workspace resource overrides), the pod is deleted and recreated with the correct sizing. This handles the concern without requiring in-place resizing.
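
The drift check described above might look roughly like the sketch below. It is an approximation, not the controller code: `ContainerResources` and `driftedContainers` are hypothetical, and `reflect.DeepEqual` on string quantities is swapped in for `equality.Semantic.DeepEqual`, so equivalent spellings such as "1000m" and "1" would count as drift here even though a semantic comparison would treat them as equal.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// ContainerResources is a hypothetical stand-in for
// corev1.ResourceRequirements, with quantities kept as strings.
type ContainerResources struct {
	Requests map[string]string
	Limits   map[string]string
}

// driftedContainers returns the sorted names of containers whose desired
// resources (from the Sandbox pod template) differ from what the running pod
// reports, or that are missing from the running pod entirely. A non-empty
// result is the signal to delete and recreate the pod.
func driftedContainers(desired, running map[string]ContainerResources) []string {
	var drifted []string
	for name, want := range desired {
		got, ok := running[name]
		if !ok || !reflect.DeepEqual(want, got) {
			drifted = append(drifted, name)
		}
	}
	sort.Strings(drifted)
	return drifted
}

func main() {
	desired := map[string]ContainerResources{
		"workspace": {Requests: map[string]string{"cpu": "2", "memory": "4Gi"}},
		"sidecar":   {Requests: map[string]string{"cpu": "100m"}},
	}
	running := map[string]ContainerResources{
		"workspace": {Requests: map[string]string{"cpu": "500m", "memory": "1Gi"}},
		"sidecar":   {Requests: map[string]string{"cpu": "100m"}},
	}
	if drifted := driftedContainers(desired, running); len(drifted) > 0 {
		fmt.Println("recreate pod; drifted containers:", drifted)
	}
}
```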

resource.NewMilliQuantity / resource.NewQuantity: Applied — replaced fmt.Sprintf + MustParse with direct quantity constructors.

int32 vs *int32: Zero means "no override" by design. Removing limits set by a template is not a supported use case — the override is purely additive.
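
To make the trade-off concrete, here is a small sketch contrasting the two representations. The helper names `effective` and `effectivePtr` are hypothetical and the values are arbitrary; the point is only that a plain int32 cannot distinguish "unset" from an explicit zero, while a pointer can.

```go
package main

import "fmt"

// effective models the int32 design: zero means "no override", so the
// template value is kept and an override can never clear it.
func effective(templateValue, override int32) int32 {
	if override == 0 {
		return templateValue
	}
	return override
}

// effectivePtr models the *int32 alternative, shown only for contrast:
// nil means "no override", while an explicit zero is a distinct request.
func effectivePtr(templateValue int32, override *int32) int32 {
	if override == nil {
		return templateValue
	}
	return *override
}

func main() {
	zero := int32(0)
	fmt.Println(effective(500, 0))        // template value is kept
	fmt.Println(effective(500, 2000))     // override wins
	fmt.Println(effectivePtr(500, nil))   // template value is kept
	fmt.Println(effectivePtr(500, &zero)) // explicit zero is honored
}
```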

Redundant nil check: The ensureClaimIdentityLabels helper handles nil map initialization, so the manual check was removed (addressed in #455).

@noeljackson noeljackson force-pushed the pr/workspace-resources-only branch from f361431 to 71677e5 Compare March 29, 2026 09:47
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 29, 2026
@noeljackson noeljackson force-pushed the pr/workspace-resources-only branch from c62e038 to b3a0c58 Compare March 31, 2026 08:36
@noeljackson
Contributor Author

Friendly bump. All review comments addressed, tests pass. Could a maintainer run /ok-to-test when convenient?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 2, 2026
@noeljackson noeljackson force-pushed the pr/workspace-resources-only branch from b3a0c58 to 6ce4cd1 Compare April 2, 2026 20:03
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 2, 2026
@aditya-shantanu
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 2, 2026
@justinsb
Contributor

justinsb commented Apr 3, 2026

Thanks for this @noeljackson

I think the propagation of the annotations from the "parent" Sandbox podTemplate to the Pod is very interesting from a sig-apps perspective, particularly in light of in-place pod resize.

Historically, IIUC, labels and annotations can be changed on a Pod dynamically, and this does not cause a Pod restart. (The other mutable field is image, for "reasons"). On a Deployment/StatefulSet/DaemonSet, you can change the labels/annotations on a podTemplate, and this triggers new pods to be created (so it propagates, but in an expensive way).

Now that we have in-place pod updates, I honestly don't know how this works with Deployment/StatefulSet/DaemonSet. I think it still causes pod restarts.

I think there's a strong case for Sandbox to lead the way here and prove out that in-place pod mutation is a good strategy for labels/annotations/resources (and maybe other fields that are mutable in future). But it's really a sig-apps question, because I don't want to set the wrong precedent.

@janetkuo and @soltysh please chime in / keep me honest here.

Would it be helpful to have a different top-level issue for specifically this question?

And @noeljackson - it looks like maybe the annotation propagation from the Sandbox to the Pod is separable? If so, do you want to split it out so we can treat this PR as "only part of extensions/" - i.e. SandboxClaim only? I don't want to hold up your PR on what is a fairly fundamental design question, which might take a while to settle (unless it's already decided and I just don't know about it, which is certainly possible!)

@noeljackson
Contributor Author

Good call — the annotation propagation is separable. I'll split it out so this PR stays scoped to extensions/ (SandboxClaim workspaceResources only).

The Sandbox→Pod annotation/resource propagation can be a follow-up once the sig-apps design question is settled.

Will rebase and push a scoped version shortly.

When SandboxClaim.spec.workspaceResources is updated on a claim with an
existing running sandbox, the controller now patches the pod's container
resources in-place. On Kubernetes 1.27+ with InPlacePodVerticalScaling,
this triggers the kubelet to call UpdateContainerResources on the
container runtime, resizing the workload without restart.

Previously, resource changes on existing claims were silently ignored
('sandbox already exists, skipping update').
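
For reference, the body of such an in-place resource patch could be built roughly as below. This is a sketch, not the PR's implementation: the `buildResizePatch` helper and its types are hypothetical, the shape follows a strategic merge patch in which containers are matched by name, and depending on the cluster version the patch may need to be sent to the pod's resize subresource rather than to the pod itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourcesPatch, containerPatch, and podPatch are hypothetical helper types
// that mirror only the pod fields a resource patch needs.
type resourcesPatch struct {
	Requests map[string]string `json:"requests,omitempty"`
	Limits   map[string]string `json:"limits,omitempty"`
}

type containerPatch struct {
	Name      string         `json:"name"`
	Resources resourcesPatch `json:"resources"`
}

type podPatch struct {
	Spec struct {
		Containers []containerPatch `json:"containers"`
	} `json:"spec"`
}

// buildResizePatch returns a JSON patch body that changes the resources of a
// single container, identified by name, and mentions no other container.
func buildResizePatch(container string, requests, limits map[string]string) ([]byte, error) {
	var p podPatch
	p.Spec.Containers = []containerPatch{{
		Name:      container,
		Resources: resourcesPatch{Requests: requests, Limits: limits},
	}}
	return json.Marshal(p)
}

func main() {
	body, err := buildResizePatch("workspace", map[string]string{"cpu": "2", "memory": "4Gi"}, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```
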
Remove Sandbox→Pod annotation propagation from sandbox_controller.go
per maintainer feedback. The in-place annotation/resource propagation
is a sig-apps design question and will be a separate follow-up.

This PR now only touches extensions/ (SandboxClaim workspaceResources).
@noeljackson noeljackson force-pushed the pr/workspace-resources-only branch from 6ce4cd1 to 7dd1b10 Compare April 3, 2026 12:32
@noeljackson
Contributor Author

/retest

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: noeljackson
Once this PR has been reviewed and has the lgtm label, please ask for approval from janetkuo. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2026
@k8s-ci-robot
Contributor

@noeljackson: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| presubmit-agent-sandbox-lint-api | 7dd1b10 | link | true | /test presubmit-agent-sandbox-lint-api |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

area:extensions cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

7 participants