feat(chart,api): allow resource requests/limits for API-managed pods #423
Open
pkobielak wants to merge 1 commit into
The API-managed iron-proxy Pods (the API self-proxy and every per-sandbox
proxy), the tool-server sidecar, and the workflow-run pod had no dedicated way
to set Kubernetes resource requests/limits: the chart exposed no knobs and the
container specs in sandbox/kubernetes.py were hardcoded (the workflow-run pod
incidentally reused the sandbox sizing).
Mirror the existing sandbox pattern (chart values -> API env -> Pod spec) and
size every API-managed container independently:
- chart values: ironProxy.apiResources (API self-proxy), ironProxy.sandboxResources
(per-sandbox proxy), toolServer.resources (sidecar), and workflowRun.resources
(workflow-run pod).
- workloads template: emit KUBERNETES_{API_PROXY,SANDBOX_PROXY,TOOL_SERVER,
WORKFLOW_RUN}_* resource env from the api Deployment, guarded so unset keys
emit nothing.
- api: _resources_from_env() (no defaults) for the proxies and tool-server;
_resources_from_env_with_default_limits() for workloads that historically ran
with implicit cpu=2/memory=4Gi limits (the sandbox and workflow-run pods). The
proxy spec picks API vs per-sandbox values by sandbox id.
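The env-to-resources step above can be sketched roughly as follows. This is a minimal illustration, not the code in sandbox/kubernetes.py: the function names are stand-ins for _resources_from_env() and _resources_from_env_with_default_limits(), the four env suffixes (CPU_REQUEST, MEMORY_REQUEST, CPU_LIMIT, MEMORY_LIMIT) are assumptions since the description only shows KUBERNETES_<NAME>_*, and applying the default limits per key is also an assumption.

```python
import os
from typing import Mapping, Optional

# Hypothetical suffixes: the PR text only shows "KUBERNETES_<NAME>_*".
_SUFFIXES = {
    "CPU_REQUEST": ("requests", "cpu"),
    "MEMORY_REQUEST": ("requests", "memory"),
    "CPU_LIMIT": ("limits", "cpu"),
    "MEMORY_LIMIT": ("limits", "memory"),
}

# Implicit limits the description says the sandbox and workflow-run pods ran with.
_DEFAULT_LIMITS = {"cpu": "2", "memory": "4Gi"}


def resources_from_env(prefix: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a resources dict from the env vars that are set and non-empty.

    With nothing set, the result is {} and the container stays unconstrained.
    """
    env = os.environ if env is None else env
    resources: dict = {}
    for suffix, (section, name) in _SUFFIXES.items():
        value = env.get(f"{prefix}_{suffix}", "").strip()
        if value:
            resources.setdefault(section, {})[name] = value
    return resources


def resources_from_env_with_default_limits(
    prefix: str, env: Optional[Mapping[str, str]] = None
) -> dict:
    """Same mapping, but fall back to the historical limits for unset limit keys."""
    resources = resources_from_env(prefix, env)
    # Assumption: defaults fill in per key, so an explicit value always wins.
    resources["limits"] = {**_DEFAULT_LIMITS, **resources.get("limits", {})}
    return resources
```

Under these assumptions, an empty environment gives {} for the proxies and tool-server, while the workflow-run pod still gets limits of cpu=2 and memory=4Gi.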
Behavior is preserved when values are unset: the proxies and tool-server stay
unconstrained, and the workflow-run pod keeps its prior sandbox-equivalent
sizing. Tests cover full/partial/empty env mapping for all knobs.
Summary
The pods the API creates at runtime, namely the iron-proxy Pods (the API self-proxy and every per-sandbox proxy), the tool-server sidecar, and the workflow-run pod, had no way to set Kubernetes resource requests/limits. The chart exposed no knob and the container specs in services/api/api/sandbox/kubernetes.py were hardcoded, so these pods ran with no scheduler reservation or OOM/throttle ceiling (the workflow-run pod only incidentally inherited the sandbox sizing).

This wires per-pod resources through the existing sandbox pattern (chart values → API env → Pod spec), sizing each API-managed container independently:
- chart values: ironProxy.apiResources, ironProxy.sandboxResources, toolServer.resources, workflowRun.resources.
- workloads template: emit KUBERNETES_{API_PROXY,SANDBOX_PROXY,TOOL_SERVER,WORKFLOW_RUN}_* resource env from the api Deployment, guarded so unset keys emit nothing.
- api: _resources_from_env() (no defaults) for the proxies and tool-server; _resources_from_env_with_default_limits() for workloads that historically ran with implicit cpu=2/memory=4Gi limits (the sandbox and workflow-run pods). The shared proxy spec selects API vs per-sandbox values by sandbox id.

Behavior is preserved when values are unset: the proxies and tool-server stay unconstrained, and the workflow-run pod keeps its prior sandbox-equivalent sizing.
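The selection by sandbox id in the shared proxy spec might look roughly like the sketch below. Everything here is hypothetical: the function names, the dict-shaped container spec, and in particular the assumption that the API self-proxy is identified by a sandbox id of None. The PR does not state how the two cases are distinguished.

```python
from typing import Optional


def proxy_resources(
    sandbox_id: Optional[str], api_resources: dict, sandbox_resources: dict
) -> dict:
    """Pick the resources for one iron-proxy container.

    Assumption: sandbox_id is None for the API self-proxy and a real id
    for every per-sandbox proxy.
    """
    return api_resources if sandbox_id is None else sandbox_resources


def proxy_container_spec(
    sandbox_id: Optional[str], api_resources: dict, sandbox_resources: dict
) -> dict:
    """Build a minimal container spec; omit 'resources' entirely when unset."""
    spec: dict = {"name": "iron-proxy"}
    resources = proxy_resources(sandbox_id, api_resources, sandbox_resources)
    if resources:
        # Only attach resources when something was configured, so the
        # unset case stays unconstrained as before.
        spec["resources"] = resources
    return spec
```

Keeping the two value sets separate means sizing the API self-proxy never affects the per-sandbox proxies, and the reverse.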
Fixes #420
Testing
- helm template: each knob renders its KUBERNETES_* env independently; partial specs work; unset emits no resource env (and the workflow-run default reproduces the prior sandbox-equivalent sizing).
- ruff check: clean.
- pytest tests/test_sandbox_kubernetes_backend.py: new full/partial/empty tests for all four knobs pass; existing sandbox/_pod_resources tests unchanged.
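As a rough Python model of the guarded rendering that the helm template check exercises (this is not the chart's template), the sketch below turns one knob's values into env entries and skips any key that is unset. The function name and the env suffix scheme (CPU_REQUEST, MEMORY_LIMIT, and so on) are assumptions for illustration.

```python
from typing import Mapping


def resource_env(prefix: str, resources: Mapping[str, Mapping[str, str]]) -> list:
    """Return env entries for the keys that are set; unset keys emit nothing."""
    entries = []
    for section, suffix in (("requests", "REQUEST"), ("limits", "LIMIT")):
        for name in ("cpu", "memory"):
            value = (resources.get(section) or {}).get(name)
            if value:
                # Hypothetical naming: <PREFIX>_<CPU|MEMORY>_<REQUEST|LIMIT>.
                entries.append(
                    {"name": f"{prefix}_{name.upper()}_{suffix}", "value": str(value)}
                )
    return entries
```

With empty values this returns an empty list, which corresponds to the "unset emits no resource env" case. A partial spec yields only the entries for the keys it sets.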