
feat(api-rs): serve tools + overlay to agent sandboxes #443

Merged
Zygimantass merged 10 commits into paradigmxyz:api-rs-control-plane from 0xSplits:feat/api-rs-sandbox-tools-overlay
Jun 10, 2026
0xdiid commented Jun 8, 2026

Contributor

Agents spawned by the api-rs control plane come up with none of the deployment tools and no organization overlay. The control plane registers credentials at the egress proxy but never gets the tool sources or overlay tree into the agent container, so the boot-time installer has nothing to turn into commands. This affects every ingress riding api-rs.

This finishes the last mile, following the repo-cache architecture. At sandbox boot, an init container pulls the tools from their source repository at a pinned revision into the directory the boot-time installer scans. The organization overlay rides the same overlay mechanism workflow-host sandboxes already use (same image, same configuration, same mount point), extended to also stage the overlay's system prompt for the agent. One code path shapes the overlay for every kind of sandbox, so the two can't collide or drift. Adding or changing a tool becomes a push to the repository: no image rebuild, and the change is live in the next sandbox.

The pull rides the same per-sandbox egress proxy as everything else (the locked-down egress policies allow no direct outbound, and the source host is already on the baseline allowlist), so no network policy changes are needed. A token can be supplied for private sources. Secrets stay untouched: placeholders in the sandbox, real values injected at the proxy. The tools clone is opt-in and additive through the existing values; the overlay needs no new configuration at all — deployments that already configure an overlay image get it in agent sandboxes automatically.

flowchart TB
    GH[("tools repo<br/>(pinned revision)")]
    OI[("overlay image")]

    subgraph CP["control plane"]
        TD["tool discovery<br/>(own tools copy)"] --> ICTL["credential grants"]
    end

    subgraph POD["agent sandbox pod"]
        TB["tools bootstrap<br/>(init)"] --> TV[("tools dir")]
        OB["overlay bootstrap<br/>(init, shared with<br/>workflow hosts)"] --> OV[("overlay dir")]
        OB --> SP[("prompt overlay")]
        TV --> SI["boot-time installer"]
        OV --> SI
        SI -->|"command shims"| AG["agent"]
        SP -->|"appended to base prompt"| AG
    end

    PX["per-sandbox egress proxy"]

    TB -->|"git clone (via proxy)"| PX
    PX -->|"allowlisted"| GH
    OI -->|"image copy"| OB
    AG -->|"tool calls, placeholder creds"| PX
    PX -->|"real creds injected"| EXT["external APIs"]
    ICTL -.->|"renders proxy config"| PX

One caveat worth calling out: the control plane still reads its own copy of the tools to decide which credentials to grant, so the pinned revision should track the set that copy carries until discovery moves to the same source.
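
To make the caveat concrete, here is a minimal sketch of a drift check. It is not part of this PR: the function name and the tool names in the usage note are hypothetical, and it assumes both tool sets are available as plain name lists.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: tools present in the cloned revision that the
/// control plane's own copy does not know about. Such tools would be
/// installed in the sandbox but receive no credential grants.
fn tools_missing_grants(cloned: &[&str], control_plane_copy: &[&str]) -> Vec<String> {
    let known: BTreeSet<&str> = control_plane_copy.iter().copied().collect();
    let cloned: BTreeSet<&str> = cloned.iter().copied().collect();
    // BTreeSet keeps the result sorted, which makes the output stable.
    cloned
        .difference(&known)
        .map(|name| (*name).to_string())
        .collect()
}
```

With a cloned set of `github`, `slack`, `linear` and a control-plane copy of `github`, `slack`, the check would report `linear`. An empty result means the pinned revision and the control plane's copy agree on tool names.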

🤖 Generated with Claude Code

0xdiid marked this pull request as ready for review June 9, 2026 18:00
0xdiid marked this pull request as draft June 9, 2026 18:33
0xdiid marked this pull request as ready for review June 9, 2026 21:48
Zygimantass force-pushed the api-rs-control-plane branch from e2d3db4 to eb0969b June 10, 2026 14:35
0xdiid and others added 9 commits June 10, 2026 11:24

api-rs sandboxes had no tools and no overlay. Give api-rs-spawned agents the
same base + overlay tools and overlay system-prompt the chart already wires for
the api-rs pod, using upstream's CLI-shim tool model rather than a sidecar.

Upstream direction: tools are shell CLI shims, not an HTTP registry. The agent
image's install-tool-shims (services/sandbox/install_tool_shims.py) scans
TOOL_DIRS at entrypoint and `uvx`-installs each pyproject [project.scripts] as a
CLI; the SYSTEM_PROMPT points agents at those CLIs and `centaur-tools list`. The
old `call <tool>` HTTP registry is deprecated and limited to control-plane use. Tool
secrets are already handled upstream: codex_app_server_env_template pushes the
tool placeholder creds onto the agent env, iron-control grants the per-sandbox
principal the real secrets, and Postgres rides proxied `*_DSN` env from
apply_proxy_env. So the agent needs only the tool SOURCES at the right paths —
no sidecar, no HMAC sandbox token, no loopback tool server.

- tools.rs (replaces tool_server.rs): a `tools-bootstrap` init container copies
  /app/tools out of the shared centaur-api image into an emptyDir mounted at
  /app/tools in the agent, and an `overlay-bootstrap` init container copies the
  org overlay tree into overlay-root mounted at overlay.mountPath (the same path
  the api-rs Deployment uses) and stages the overlay's SYSTEM_PROMPT.md as
  $HOME/AGENTS_OVERLAY.md, which the sandbox entrypoint appends to the base
  prompt. TOOL_DIRS is set on the agent env to /app/tools (or
  /app/tools:<mountPath>/tools with the overlay) — identical to the value the
  api-rs pod computes for its own tool discovery, set deterministically in the
  spec builder rather than via passthrough env.
- lib.rs: build_agent_sandbox layers the tools/overlay env over spec.env, mounts
  the bootstrapped sources read-only into the agent, and appends the
  tools-bootstrap + overlay-bootstrap init containers and their volumes. No
  sidecar container, no token minting.
- args.rs: a minimal ToolsArgs (source image/pull-policy, reusing the
  KUBERNETES_TOOL_SERVER_IMAGE* env the chart sets from the shared api image) and
  OverlayArgs (image/pull-policy/source-path/mount-path) wired into
  AgentSandboxConfig. Explicit clap arg ids avoid id collisions with the other
  flattened arg structs.
- chart apirs.yaml: render the tools source image (api.image.*, gated on
  toolServer.enabled) and overlay (overlay.*) onto the api-rs env, replacing the
  KUBERNETES_TOOL_SERVER_* sidecar block.
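
The TOOL_DIRS rule from the tools.rs bullet can be sketched as below. This is an illustration under stated assumptions, not the code in tools.rs: `tool_dirs` and `BASE_TOOLS_DIR` are hypothetical names, and trimming a trailing slash from the mount path is an added assumption.

```rust
/// Base tools directory mounted into the agent (path taken from the commit message).
const BASE_TOOLS_DIR: &str = "/app/tools";

/// Hypothetical helper: build the TOOL_DIRS value for the agent env.
/// Without an overlay it is just the base directory; with one, the
/// overlay's `tools` subdirectory is appended after a colon.
fn tool_dirs(overlay_mount_path: Option<&str>) -> String {
    match overlay_mount_path {
        Some(mount) => format!(
            "{}:{}/tools",
            BASE_TOOLS_DIR,
            // Assumption: tolerate a trailing slash in the configured path.
            mount.trim_end_matches('/')
        ),
        None => BASE_TOOLS_DIR.to_string(),
    }
}
```

Computing the value in one place is what lets the agent and the api-rs pod agree on the same string instead of relying on passthrough env.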

Gone vs the sidecar port: tool_server.rs, the sbx1 HMAC token minting and its
SANDBOX_SIGNING_KEY requirement, CENTAUR_TOOLS_URL, the sidecar pg-DSN/proxy-env
collection, and the hmac/base64/sha2 dependency additions (nothing else in the
agent-k8s crate uses them).

Warm-pool sandboxes route through the same build_agent_sandbox path, so they get
the tools/overlay init containers and volumes for free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The tools-bootstrap init container mounted the tools emptyDir at
/app/tools — the same path it copies FROM. The mount shadows the source
image's tools tree, so the script self-copies the empty volume and GNU
cp rejects it (exit 1); every sandbox dies with 'reached terminal state
before running' and no agent ever starts.

Mount the volume at /tools-bootstrap instead (mirroring how
overlay-bootstrap stages to a distinct target) and copy the image's
/app/tools into it. The agent container keeps mounting the same volume
at /app/tools, so TOOL_DIRS and the shim installer are unchanged.
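
A guard against this class of bug could look like the following sketch. It is hypothetical and not part of this change: `mount_shadows_source` is an invented name, and it only compares paths lexically.

```rust
use std::path::Path;

/// Hypothetical check: true when mounting a volume at `mount_path` would
/// hide `source_path` in the same container, i.e. the mount point is the
/// source directory itself or one of its ancestors.
fn mount_shadows_source(mount_path: &str, source_path: &str) -> bool {
    // Path::starts_with compares whole components, so "/app/tool"
    // is not treated as a prefix of "/app/tools".
    Path::new(source_path).starts_with(mount_path)
}
```

Under this check the original layout (volume at /app/tools, copying from /app/tools) is flagged, while the fixed layout (volume at /tools-bootstrap) is not.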

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Gate overlay env, volumes, and mounts independently from the tools source image so overlay-only sandbox configs produce valid pod specs.

Set an fsGroup on sandbox pods that use tools or overlays so non-root bootstrap init containers can populate their emptyDir mounts.

The tools-bootstrap init container copied /app/tools from .Values.api.image
(centaur-api), but api-rs discovers its tools from /app/tools in its own
container (.Values.apiRs.image). Sourcing from a different image risked the
agent installing a different tool set than api-rs granted per-sandbox creds
for. Source from the same api-rs image the Deployment runs so the two match
by construction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replumb the tools-bootstrap init container to git-clone the tools repo at a
pinned ref into each sandbox's /app/tools (sparse on the tools subdir; GitHub
token via askpass for private repos), instead of copying /app/tools out of the
api-rs image. Mirrors the repo-cache architecture — clone a repo into a
pre-provisioned directory — without sharing its node-level cache, so adding a
tool is a push to the repo rather than an api-rs image rebuild.

api-rs still discovers its own /app/tools to grant proxy creds, so pin
toolServer.ref to the tool set the image carries to avoid drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The sandbox NetworkPolicy only allows egress to the sandbox's iron-proxy,
api-rs, and DNS, so the tools-bootstrap init container's direct git clone to
github.com is blocked whenever iron-proxy is enabled. Route the clone through
the proxy like all other sandbox egress: export HTTPS_PROXY (the resolved
per-sandbox proxy URL apply_proxy_env already put on the spec) and
GIT_SSL_CAINFO, and mount the pod's existing firewall-ca volume into the init
container. github.com/api.github.com are already in the baseline proxy
allowlist, so no policy or allowlist changes are needed. Without iron-proxy
the clone still goes direct.
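
The proxy-or-direct decision can be sketched roughly as follows. `clone_env` is a hypothetical name, and the proxy URL and CA path in the usage note are placeholders; only the HTTPS_PROXY and GIT_SSL_CAINFO variable names come from the commit message.

```rust
/// Hypothetical helper: env vars for the tools-bootstrap init container.
/// With a per-sandbox proxy, git is pointed at the proxy and at the CA
/// bundle that lets it trust the proxy's TLS interception. Without a
/// proxy nothing is set and the clone goes direct.
fn clone_env(proxy_url: Option<&str>, ca_bundle_path: &str) -> Vec<(String, String)> {
    match proxy_url {
        Some(url) => vec![
            ("HTTPS_PROXY".to_string(), url.to_string()),
            ("GIT_SSL_CAINFO".to_string(), ca_bundle_path.to_string()),
        ],
        None => Vec::new(),
    }
}
```

For example, `clone_env(Some("http://proxy.example:8080"), "/etc/firewall-ca/ca.crt")` yields both variables, and `clone_env(None, ...)` yields none. Setting the two together matters, since a proxy without the matching CA bundle would presumably fail TLS verification.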

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

These are operator config (helm values -> env -> clap), not user input, but
interpolating them bare into the /bin/sh -ec script means a stray space or
metacharacter is misparsed by the shell instead of failing loudly in git.
Quote them at every interpolation site.
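
One common way to do this quoting is POSIX single-quote wrapping, sketched below. This is an assumption about the approach, not the helper the commit actually adds: `shell_quote` and `fetch_command` are hypothetical names.

```rust
/// Hypothetical helper: wrap a value in single quotes for POSIX sh.
/// Inside single quotes nothing is special except the quote itself,
/// which is written as: close quote, escaped quote, reopen quote.
fn shell_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', r"'\''"))
}

/// Hypothetical example of an interpolation site: every operator-supplied
/// value passes through `shell_quote` before it reaches the script text.
fn fetch_command(dir: &str, git_ref: &str) -> String {
    format!(
        "git -C {} fetch --depth 1 origin {}",
        shell_quote(dir),
        shell_quote(git_ref)
    )
}
```

With this, a ref such as `v1; rm -rf /` reaches git as a single (invalid) argument, so the failure comes from git rather than from the shell running a second command.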

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The per-sandbox iron-proxy is created in the same reconcile as the Sandbox CR
and isn't accepting connections yet when the tools-bootstrap init container
first runs — the clone dies with connection-refused, and an init failure is
terminal for the Sandbox (no kubelet retry), so every cold spawn failed with
'reached terminal state before running'. Wrap the clone/sparse-checkout/ref
fetch in a bounded retry loop (30 x 2s) so the init container rides out the
proxy's startup instead of killing the sandbox.
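
The actual loop lives in the init container's shell script. The same bounded-retry logic is sketched here in Rust for illustration only; `retry` is a hypothetical helper, and the 30 attempts and 2 second delay from the message would be passed in by the caller.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: run `op` up to `max_attempts` times, sleeping
/// `delay` after each failure. `op` receives the 1-based attempt number.
/// The operation always runs at least once.
fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                attempt += 1;
            }
        }
    }
}
```

A call shaped like `retry(30, Duration::from_secs(2), |_| clone_once())`, where `clone_once` is a placeholder for the clone step, would wait roughly a minute at most before giving up. The bound matters because an init failure is terminal for the Sandbox, so the loop has to end on its own.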

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
(cherry picked from commit b1f274d)

0xdiid force-pushed the feat/api-rs-sandbox-tools-overlay branch from 4ef3522 to f1e41be June 10, 2026 17:25

…image plumbing

The base branch grew its own overlay mechanism (SandboxSpec.overlay +
overlay_json) for workflow-host sandboxes, configured by the same
CENTAUR_OVERLAY_* env this branch's OverlayArgs read — so a workflow-host
pod with an overlay configured got two init containers and two volumes
with identical names, which Kubernetes rejects.

Adopt the upstream plumbing wholesale: the backend default is now an
OverlayImage from the same env helper the workflow host uses (the
OverlayArgs flags are gone), a spec-level overlay takes precedence over
the backend default so only one overlay-bootstrap/overlay-root pair ever
exists, and agent sandboxes mount the overlay at /opt/centaur/overlay
like workflow hosts do. The AGENTS_OVERLAY.md prompt staging moves into
the shared overlay_json path, and the chart's duplicated CENTAUR_OVERLAY_*
env block is dropped — the upstream block already feeds it.
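
The precedence rule can be sketched as below. The `OverlayImage` struct here is a simplified stand-in with a single field, not the upstream type, and both function names are hypothetical.

```rust
/// Simplified stand-in for the upstream overlay image type (assumption:
/// the real type carries more fields, such as pull policy and paths).
#[derive(Debug, Clone, PartialEq)]
struct OverlayImage {
    image: String,
}

/// Hypothetical helper: a spec-level overlay wins; the backend default
/// is used only when the spec does not carry one.
fn effective_overlay(
    spec_overlay: Option<OverlayImage>,
    backend_default: Option<OverlayImage>,
) -> Option<OverlayImage> {
    spec_overlay.or(backend_default)
}

/// Hypothetical helper: because only one overlay is ever selected, at
/// most one overlay-bootstrap init container is emitted per pod.
fn overlay_init_containers(overlay: &Option<OverlayImage>) -> Vec<&'static str> {
    overlay.iter().map(|_| "overlay-bootstrap").collect()
}
```

Resolving to a single `Option` before any containers or volumes are built is what rules out the duplicate-name pod spec described above.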

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Zygimantass merged commit 60f7eac into paradigmxyz:api-rs-control-plane Jun 10, 2026
6 checks passed