feat(chart,api): allow resource requests/limits for API-managed pods #423

Open: pkobielak wants to merge 1 commit into paradigmxyz:main from duneanalytics:feat/api-managed-pod-resources

@pkobielak

Summary

There was no way to set Kubernetes resource requests/limits on the pods the API creates at runtime: the iron-proxy Pods (the API self-proxy and every per-sandbox proxy), the tool-server sidecar, and the workflow-run pod. The chart exposed no knob and the container specs in services/api/api/sandbox/kubernetes.py were hardcoded, so these pods ran with no scheduler reservation and no OOM/throttle ceiling. The one exception was the workflow-run pod, which only incidentally inherited the sandbox sizing.

This wires per-pod resources through the existing sandbox pattern (chart values → API env → Pod spec), sizing each API-managed container independently:

  • chart values: ironProxy.apiResources, ironProxy.sandboxResources, toolServer.resources, workflowRun.resources.
  • workloads template: emit KUBERNETES_{API_PROXY,SANDBOX_PROXY,TOOL_SERVER,WORKFLOW_RUN}_* resource env from the api Deployment, guarded so unset keys emit nothing.
  • api: _resources_from_env() (no defaults) for the proxies and tool-server; _resources_from_env_with_default_limits() for workloads that historically ran with implicit cpu=2/memory=4Gi limits (the sandbox and workflow-run pods). The shared proxy spec selects API vs per-sandbox values by sandbox id.
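
As a rough illustration of the env-to-resources mapping described in the list above, here is a minimal Python sketch. It is not the code from this PR: the function bodies, the `env` parameter, and the variable suffixes (`CPU_REQUEST`, `MEMORY_REQUEST`, `CPU_LIMIT`, `MEMORY_LIMIT`) are assumptions, since the description only names the `KUBERNETES_<POD>_*` prefixes. It also assumes the cpu=2/memory=4Gi defaults are filled in per key.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical suffixes; the PR description only shows KUBERNETES_<POD>_*.
_FIELDS = {
    ("requests", "cpu"): "CPU_REQUEST",
    ("requests", "memory"): "MEMORY_REQUEST",
    ("limits", "cpu"): "CPU_LIMIT",
    ("limits", "memory"): "MEMORY_LIMIT",
}

Resources = Dict[str, Dict[str, str]]


def resources_from_env(prefix: str, env: Optional[Mapping[str, str]] = None) -> Resources:
    """Build a resources dict from <prefix>_* variables, with no defaults.

    Unset or blank variables are skipped, so an empty environment yields {}.
    """
    env = os.environ if env is None else env
    resources: Resources = {}
    for (section, name), suffix in _FIELDS.items():
        value = env.get(f"{prefix}_{suffix}", "").strip()
        if value:
            resources.setdefault(section, {})[name] = value
    return resources


def resources_from_env_with_default_limits(
    prefix: str, env: Optional[Mapping[str, str]] = None
) -> Resources:
    """Same mapping, but missing limits fall back to cpu=2 / memory=4Gi."""
    resources = resources_from_env(prefix, env)
    # Assumption: defaults apply per key, so a partial override keeps the rest.
    resources["limits"] = {"cpu": "2", "memory": "4Gi", **resources.get("limits", {})}
    return resources
```

With an empty environment the first helper returns `{}` (nothing is set on the container), while the second returns only the default limits.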

Behavior is preserved when values are unset: the proxies and tool-server stay unconstrained, and the workflow-run pod keeps its prior sandbox-equivalent sizing.
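
To make the unset case concrete, the sketch below shows one way a shared proxy container spec could pick API vs per-sandbox values and stay unconstrained when nothing is configured. It is only an illustration: `proxy_container`, its parameters, the placeholder image, and the convention that the API self-proxy has `sandbox_id=None` are all assumptions, not details taken from this PR.

```python
from typing import Any, Dict, Optional

Resources = Dict[str, Dict[str, str]]


def proxy_container(
    sandbox_id: Optional[str],
    api_resources: Resources,
    sandbox_resources: Resources,
) -> Dict[str, Any]:
    """Return an iron-proxy container spec as a plain dict.

    Assumption: sandbox_id is None for the API self-proxy and a non-empty
    string for a per-sandbox proxy.
    """
    resources = api_resources if sandbox_id is None else sandbox_resources
    container: Dict[str, Any] = {
        "name": "iron-proxy",
        "image": "iron-proxy:example",  # placeholder image, not from the PR
    }
    # Omit the key entirely when nothing is configured, so the rendered
    # spec carries no resources field and the pod stays unconstrained.
    if resources:
        container["resources"] = resources
    return container
```

For example, `proxy_container(None, {"limits": {"cpu": "500m"}}, {})` carries the API limits, while `proxy_container("sbx-1", {"limits": {"cpu": "500m"}}, {})` has no `resources` key at all.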

Fixes #420

Testing

  • helm template: each knob renders its KUBERNETES_* env independently; partial specs work; unset emits no resource env (and the workflow-run default reproduces the prior sandbox-equivalent sizing).
  • ruff check: clean.
  • pytest tests/test_sandbox_kubernetes_backend.py: new full/partial/empty tests for all four knobs pass; existing sandbox/_pod_resources tests unchanged.
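
For readers who want a feel for the full/partial/empty cases, here is a hedged sketch of what such tests might look like. It does not reproduce tests/test_sandbox_kubernetes_backend.py: `read_resources` is a hypothetical stand-in for the function under test, and the variable suffixes are assumed. The tests isolate the environment with `unittest.mock.patch.dict(os.environ, ..., clear=True)` and are plain functions that pytest would collect.

```python
import os
from unittest import mock

PREFIXES = (
    "KUBERNETES_API_PROXY",
    "KUBERNETES_SANDBOX_PROXY",
    "KUBERNETES_TOOL_SERVER",
    "KUBERNETES_WORKFLOW_RUN",
)


def read_resources(prefix):
    """Hypothetical stand-in for the function under test (no defaults)."""
    out = {}
    for section, suffix in (("requests", "REQUEST"), ("limits", "LIMIT")):
        for name in ("cpu", "memory"):
            value = os.environ.get(f"{prefix}_{name.upper()}_{suffix}")
            if value:
                out.setdefault(section, {})[name] = value
    return out


def test_full_spec():
    for prefix in PREFIXES:
        env = {
            f"{prefix}_CPU_REQUEST": "100m",
            f"{prefix}_MEMORY_REQUEST": "128Mi",
            f"{prefix}_CPU_LIMIT": "1",
            f"{prefix}_MEMORY_LIMIT": "1Gi",
        }
        with mock.patch.dict(os.environ, env, clear=True):
            assert read_resources(prefix) == {
                "requests": {"cpu": "100m", "memory": "128Mi"},
                "limits": {"cpu": "1", "memory": "1Gi"},
            }


def test_partial_spec():
    for prefix in PREFIXES:
        with mock.patch.dict(os.environ, {f"{prefix}_MEMORY_LIMIT": "256Mi"}, clear=True):
            assert read_resources(prefix) == {"limits": {"memory": "256Mi"}}


def test_empty_spec():
    for prefix in PREFIXES:
        # Variables for a different knob must not leak into this one.
        other = {"KUBERNETES_UNRELATED_CPU_LIMIT": "1"}
        with mock.patch.dict(os.environ, other, clear=True):
            assert read_resources(prefix) == {}
```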

The API-managed iron-proxy Pods (the API self-proxy and every per-sandbox
proxy), the tool-server sidecar, and the workflow-run pod had no dedicated way
to set Kubernetes resource requests/limits: the chart exposed no knobs and the
container specs in sandbox/kubernetes.py were hardcoded (the workflow-run pod
incidentally reused the sandbox sizing).

Mirror the existing sandbox pattern (chart values -> API env -> Pod spec) and
size every API-managed container independently:

- chart values: ironProxy.apiResources (API self-proxy), ironProxy.sandboxResources
  (per-sandbox proxy), toolServer.resources (sidecar), and workflowRun.resources
  (workflow-run pod).
- workloads template: emit KUBERNETES_{API_PROXY,SANDBOX_PROXY,TOOL_SERVER,
  WORKFLOW_RUN}_* resource env from the api Deployment, guarded so unset keys
  emit nothing.
- api: _resources_from_env() (no defaults) for the proxies and tool-server;
  _resources_from_env_with_default_limits() for workloads that historically ran
  with implicit cpu=2/memory=4Gi limits (the sandbox and workflow-run pods). The
  proxy spec picks API vs per-sandbox values by sandbox id.

Behavior is preserved when values are unset: the proxies and tool-server stay
unconstrained, and the workflow-run pod keeps its prior sandbox-equivalent
sizing. Tests cover full/partial/empty env mapping for all knobs.

Development

Successfully merging this pull request may close these issues.

API-managed pods (iron-proxy, tool-server, workflow-run) can't set resource requests/limits
