diff --git a/docs/guides/centaur-install.md b/docs/guides/centaur-install.md new file mode 100644 index 00000000..17456050 --- /dev/null +++ b/docs/guides/centaur-install.md @@ -0,0 +1,151 @@ +# Install Centaur on Obol Stack + +[Centaur](https://github.com/paradigmxyz/centaur) is Paradigm's open-source +orchestrator of isolated coding agents — you @-mention a bot in Slack, the +bot spawns a per-conversation sandbox with shell + git + Python + Node, and +your harness of choice (codex / claude-code / amp / pi-mono) drives the work +back into the thread. + +Running it on the Obol stack gets you two things you don't get from a +raw-Kubernetes install: + +1. **Every Centaur agent runs through LiteLLM**, so any model in + `obol model list` — including `paid/` aliases purchased via x402 — + is available inside sandboxes with zero extra config. +2. **One-command install with auto-generated secrets**: postgres password, + firewall CA, signing keys, and the LiteLLM master key are wired up by a + chart-managed bootstrap Job. You only paste Slack tokens. + +> [!IMPORTANT] +> v1 supports a single LLM harness (`codex`, OpenAI-compatible through +> LiteLLM). Multi-harness selection, 1Password Connect, and gVisor sandboxing +> ship later. + +> [!IMPORTANT] +> The bootstrap Job copies the LiteLLM master key into the sandbox env so +> agents can call `paid/*` models. This is acceptable for single-user +> installs where your obol wallet bounds the spend. Multi-tenant deployments +> should wait for v2 (per-install LiteLLM virtual keys). + +## What you'll need + +- Obol stack running locally (`obol stack up`). +- A Slack app (instructions below) with three secrets ready: bot token, + signing secret, and a service-to-service API key. +- Cloudflare tunnel hostname (auto-provisioned by `obol stack up`). +- ~3 GB of free disk for the bundled Postgres PVC. + +## Step 1 — Create the Slack app + +1. 
Visit [api.slack.com/apps](https://api.slack.com/apps) → **Create New App** → **From scratch**. +2. **OAuth & Permissions** → add bot scopes: `app_mentions:read`, `chat:write`, + `channels:history`, `groups:history`, `im:history`, `mpim:history`. +3. **Install to Workspace** → copy the **Bot User OAuth Token** (`xoxb-...`). + That's your `SLACK_BOT_TOKEN`. +4. **Basic Information** → copy the **Signing Secret**. That's your + `SLACK_SIGNING_SECRET`. +5. Mint a random string for `SLACKBOT_API_KEY` (used between slackbot ↔ api + internally — not Slack-issued): + ```bash + openssl rand -hex 32 + ``` +6. **Event Subscriptions** → paste this URL once you have your tunnel hostname + (you'll get it back from `obol app sync centaur`): + ``` + https:///api/webhooks/slack + ``` +7. Subscribe to bot events: `app_mention`, `message.channels`, + `message.groups`, `message.im`, `message.mpim`. + +## Step 2 — Install + +```bash +export SLACK_BOT_TOKEN=xoxb-... +export SLACK_SIGNING_SECRET=... +export SLACKBOT_API_KEY=$(openssl rand -hex 32) + +obol app install obol/centaur \ + --set slack.botToken=$SLACK_BOT_TOKEN \ + --set slack.signingSecret=$SLACK_SIGNING_SECRET \ + --set slack.botApiKey=$SLACKBOT_API_KEY + +obol app sync centaur +``` + +The bootstrap Job runs automatically before the main pods come up; it +generates the postgres password, firewall CA, iron-proxy management key, +sandbox signing key, and reads the LiteLLM master key from `llm/litellm-secrets`. +No further secret juggling. + +When `obol app sync` finishes you'll see the slack webhook URL and the +internal REST endpoint: + +``` +tip: Configure your Slack app event subscription: + https:///api/webhooks/slack +tip: REST API: http://centaur.obol.stack +``` + +Paste the webhook URL into your Slack app (step 1.6 above). 
+ +## Step 3 — Use it + +In any Slack channel where you've added the bot, mention it: + +> @centaur write a quick prime-sieve in python and report wall-clock for 1e8 + +The bot opens a thread, the API spawns an isolated sandbox pod, the harness +runs against your preferred LiteLLM-routed model, and progress streams back +into the thread. + +## What model runs inside the sandbox? + +Whatever's at the head of `obol model list`. The sandbox calls +`litellm.llm.svc.cluster.local:4000` with OpenAI semantics; LiteLLM picks the +model. To change it: + +```bash +obol model prefer paid/aeon # use a paid x402 model you've bought +obol model prefer qwen3.5:9b # use local Ollama (free, slower) +obol model sync # propagate +``` + +Centaur picks up the new default on the next sandbox spawn. + +## Troubleshooting + +**Slack `url_verification` fails on event subscription.** Tunnel may not be +running. Check `obol tunnel status`. + +**Sandbox spawns but agent times out.** Likely LiteLLM-side. Port-forward and +inspect: `kubectl port-forward -n llm svc/litellm 14000:4000`, then +`curl http://127.0.0.1:14000/v1/models`. + +**`centaur-bootstrap` Job in Error state.** Almost always RBAC: the Job needs +read access to `llm/litellm-secrets`. Confirm `obol stack up` finished +successfully before installing. + +**Clock skew on Slack webhooks.** Slack rejects webhooks more than 5 minutes +out of date. If your k3d host has drifted (laptop suspend, VM clock issues), +slack integration silently fails. Resync your host clock. + +## Tearing down + +```bash +obol app delete centaur +``` + +Removes the namespace, PVC (deletes the postgres data), and all generated +secrets. Slack app definition stays in your Slack workspace and can be reused +for a future install. + +## Reconfiguring + +Edit `~/.config/obol/applications/centaur//values.yaml`, then: + +```bash +obol app sync centaur +``` + +There's no `obol app configure` wizard yet — for v1 the edit-and-sync loop is +the supported reconfigure path. 
diff --git a/internal/app/app.go b/internal/app/app.go index e2f899d2..fc3105e6 100644 --- a/internal/app/app.go +++ b/internal/app/app.go @@ -39,26 +39,41 @@ func Install(cfg *config.Config, u *ui.UI, chartRef string, opts InstallOptions) return err } - // 2. If repo/chart format, resolve via ArtifactHub + // 2. If repo/chart format, resolve the repo URL. Prefer the user's local + // `helm repo` config so charts published to repos already added during + // `obol stack up` (notably `obol` → obolnetwork.github.io/helm-charts) + // resolve without an ArtifactHub round-trip — and keep working when + // ArtifactHub is unreachable. if chart.NeedsResolution() { - u.Info("Resolving chart via ArtifactHub...") + helmBinary := filepath.Join(cfg.BinDir, "helm") + if repos, err := helmcmd.LocalRepos(helmBinary); err == nil { + if url, ok := repos[chart.RepoName]; ok { + chart.RepoURL = url + u.Detail("Resolved (local helm repo)", fmt.Sprintf("%s/%s", chart.RepoName, chart.ChartName)) + u.Detail("Repository URL", url) + } + } - client := NewArtifactHubClient() + if chart.RepoURL == "" { + u.Info("Resolving chart via ArtifactHub...") - info, err := client.ResolveChart(chartRef) - if err != nil { - return err - } + client := NewArtifactHubClient() + + info, err := client.ResolveChart(chartRef) + if err != nil { + return err + } - chart.RepoURL = info.RepoURL + chart.RepoURL = info.RepoURL - chart.RepoName = info.RepoName - if chart.Version == "" { - chart.Version = info.Version - } + chart.RepoName = info.RepoName + if chart.Version == "" { + chart.Version = info.Version + } - u.Detail("Resolved", fmt.Sprintf("%s/%s version %s", info.RepoName, info.ChartName, info.Version)) - u.Detail("Repository URL", info.RepoURL) + u.Detail("Resolved", fmt.Sprintf("%s/%s version %s", info.RepoName, info.ChartName, info.Version)) + u.Detail("Repository URL", info.RepoURL) + } } // Apply version override from CLI flag diff --git a/internal/helmcmd/helmcmd.go b/internal/helmcmd/helmcmd.go index 
7d04f714..bef69232 100644 --- a/internal/helmcmd/helmcmd.go +++ b/internal/helmcmd/helmcmd.go @@ -10,6 +10,8 @@ package helmcmd import ( + "encoding/json" + "errors" "fmt" "os" "os/exec" @@ -185,3 +187,39 @@ func UpdateRepos(helmBinary string, names []string) ([]byte, error) { out, err := cmd.CombinedOutput() return out, err } + +// LocalRepos returns the user's helm-CLI repo configuration as a name → URL +// map by running `helm repo list -o json`. When no repos are configured helm +// exits non-zero with "no repositories" on stderr; that case is reported as +// an empty map with a nil error so callers can treat it as "nothing matched" +// rather than a hard failure. +func LocalRepos(helmBinary string) (map[string]string, error) { + cmd := exec.Command(helmBinary, "repo", "list", "-o", "json") + out, err := cmd.Output() + if err != nil { + var ee *exec.ExitError + if errors.As(err, &ee) && strings.Contains(string(ee.Stderr), "no repositories") { + return map[string]string{}, nil + } + return nil, fmt.Errorf("helm repo list: %w", err) + } + return parseHelmRepoList(out) +} + +func parseHelmRepoList(data []byte) (map[string]string, error) { + var entries []struct { + Name string `json:"name"` + URL string `json:"url"` + } + if err := json.Unmarshal(data, &entries); err != nil { + return nil, fmt.Errorf("parse helm repo list json: %w", err) + } + repos := make(map[string]string, len(entries)) + for _, e := range entries { + if e.Name == "" || e.URL == "" { + continue + } + repos[e.Name] = e.URL + } + return repos, nil +} diff --git a/internal/helmcmd/helmcmd_test.go b/internal/helmcmd/helmcmd_test.go index aa68bba1..1be4f2bf 100644 --- a/internal/helmcmd/helmcmd_test.go +++ b/internal/helmcmd/helmcmd_test.go @@ -236,3 +236,41 @@ func contains(haystack, needle string) bool { } return false } + +func TestParseHelmRepoList(t *testing.T) { + cases := []struct { + name string + in string + want map[string]string + }{ + { + name: "two repos", + in: 
`[{"name":"obol","url":"https://obolnetwork.github.io/helm-charts/"},{"name":"ethpandaops","url":"https://ethpandaops.github.io/ethereum-helm-charts"}]`, + want: map[string]string{ + "obol": "https://obolnetwork.github.io/helm-charts/", + "ethpandaops": "https://ethpandaops.github.io/ethereum-helm-charts", + }, + }, + { + name: "empty array", + in: `[]`, + want: map[string]string{}, + }, + { + name: "entry without url skipped", + in: `[{"name":"broken","url":""},{"name":"obol","url":"https://example.com"}]`, + want: map[string]string{"obol": "https://example.com"}, + }, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + got, err := parseHelmRepoList([]byte(tc.in)) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if !reflect.DeepEqual(got, tc.want) { + t.Fatalf("got %v, want %v", got, tc.want) + } + }) + } +} diff --git a/plans/centaur-install.md b/plans/centaur-install.md new file mode 100644 index 00000000..43b37ea7 --- /dev/null +++ b/plans/centaur-install.md @@ -0,0 +1,317 @@ +# Centaur on Obol Stack — Implementation Plan + +**Status**: Planned, no code yet. + +## Summary + +Package [paradigmxyz/centaur](https://github.com/paradigmxyz/centaur) — an +orchestrator of isolated coding-agent sandboxes with a Slack frontend — as an +Obol app. Users install via: + +``` +obol app install obol/centaur \ + --set slack.botToken=<...> \ + --set slack.signingSecret=<...> \ + --set slack.botApiKey=<...> +``` + +This routes every Centaur sandbox's LLM traffic through LiteLLM in the obol +cluster, so the user inherits all of `obol model list`, including any +`paid/` purchased through x402. Centaur agents become first-class +consumers of the obol stack's paid-inference rails. + +## Why a wrapper, not a direct upstream install + +Upstream's `contrib/chart` runs as-is on raw k8s. To make it an Obol app we need: + +1. **Route LLM traffic through LiteLLM** instead of api.openai.com directly, + so x402-bought paid models work inside sandboxes. +2. 
**Wire HTTPRoutes** onto `centaur.obol.stack` (internal) + the tunnel + hostname (slack webhook only), following the public/private routing rules + in `CLAUDE.md`. +3. **Extra NetworkPolicy** to allow sandbox→LiteLLM egress (upstream's + sandbox policy only allows egress to its own API on :8000). +4. **Generate operator-managed secrets** (`IRON_MANAGEMENT_API_KEY`, + `SANDBOX_SIGNING_KEY`, postgres password, harness key) without making the + user juggle openssl + kubectl. Pre-install Job, see below. +5. **Hard-code image pulls** to `ghcr.io/paradigmxyz/centaur/...` (upstream + defaults to bare `centaur-api:latest` which docker.io can't pull). + +## Architecture + +``` + ┌─────────────────────────────────┐ + Slack ──────────▶ tunnel /api/webhooks/slack │ + │ └─▶ centaur-slackbot │ + │ │ │ + │ ▼ │ + │ centaur-api (control plane) │ + │ ├─ spawns sandbox pods │ + │ └─ centaur.obol.stack │ + │ ▲ │ + │ local-only HTTPRoute │ + │ │ + │ sandbox pod (per conversation)│ + │ ├─ user's harness (codex/...) │ + │ ├─ iron-proxy sidecar (MITM) │ + │ │ └─▶ external internet │ + │ └─ NO_PROXY bypass for │ + │ *.svc.cluster.local │ + │ │ │ + │ ▼ │ + │ litellm.llm.svc:4000 ◀───────┼─── all obol models, + │ │ paid/*, Ollama, etc. + └─────────────────────────────────┘ +``` + +## Catches we've already designed around + +### Iron-proxy is a vendored MITM proxy +`services/iron-proxy/Dockerfile` is `FROM ironsh/iron-proxy:0.42.0-rc.2`. We +can't change its DNS behaviour to forward `*.svc.cluster.local` to CoreDNS +without forking. + +**v0.1 sidestep**: set `NO_PROXY=*.svc.cluster.local,cluster.local` in the +sandbox env so HTTP clients bypass iron-proxy entirely for cluster-internal +calls. iron-proxy still gates *external* outbound (its security purpose). + +Cost of the sidestep: the LiteLLM master key sits in the sandbox env directly +rather than behind iron-proxy substitution. Acceptable for single-user +obol-stack installs where the user's wallet bounds the blast radius. 
+ +**v0.2 (waiting on upstream)**: [paradigmxyz/centaur#189](https://github.com/paradigmxyz/centaur/pull/189) +adds `CENTAUR_LLM_GATEWAY_HOST` — a single env var on the API container that +rewrites the iron-proxy host-allowlist for `ANTHROPIC_API_KEY` and +`OPENAI_API_KEY` to point at a gateway instead of `api.anthropic.com` / +`api.openai.com`. iron-proxy stays in the path; the master key never enters +the sandbox env. Once the PR lands, we drop the NO_PROXY hack AND the +`sandbox-egress-policy.yaml` NetworkPolicy override below, AND retire the +"LiteLLM virtual-key per install" v2 item — the master key staying in +iron-proxy makes that strictly less urgent. + +Open question on PR 189: the draft takes a bare hostname and the docs example +shows `https://...`, implying iron-proxy talks HTTPS to the gateway. LiteLLM +in our cluster is plain HTTP on `litellm.llm.svc:4000`. Asked upstream to +support `http://host:port` (or an explicit scheme) — the alternative of +fronting LiteLLM with TLS via Traefik just to satisfy iron-proxy is janky for +traffic that never leaves the cluster. + +### Sandbox NetworkPolicy blocks LiteLLM +*v0.1 only — disappears once PR 189 lands and sandboxes no longer reach +LiteLLM directly.* + +Upstream's `templates/networkpolicy.yaml:387` allows sandbox egress only to +the API on :8000. We layer one extra NetworkPolicy in our umbrella chart: + +```yaml +# allows centaur.ai/managed=true pods → llm/litellm:4000 +spec: + podSelector: + matchLabels: { centaur.ai/managed: "true" } + policyTypes: [Egress] + egress: + - to: + - namespaceSelector: + matchLabels: { kubernetes.io/metadata.name: llm } + podSelector: + matchLabels: { app.kubernetes.io/name: litellm } + ports: [{ protocol: TCP, port: 4000 }] +``` + +**Follow-up upstream PR**: add `sandbox.extraEgress` + `ironProxy.extraEgress` +value knobs so we can drop this override. 
+ +### Required Secrets not generated by upstream +`contrib/chart/templates/secrets.yaml` is a *validator* — it requires +`centaur-infra-env`, `centaur-firewall-ca`, `centaur-firewall-ca-key` to exist +before install. We generate them via a `helm pre-install` Job: + +- ServiceAccount `centaur-bootstrap` in the centaur namespace. +- Role `centaur-litellm-key-reader` in the `llm` namespace, scoped to + `get secret/litellm-secrets`. RoleBinding from the SA above. +- Job container: openssl-rand for `IRON_MANAGEMENT_API_KEY`, + `SANDBOX_SIGNING_KEY`, postgres password; reads LiteLLM master key from + `llm/litellm-secrets`; mints firewall CA via `openssl req -x509`; writes all + into `centaur-infra-env`, `centaur-firewall-ca`, `centaur-firewall-ca-key`. +- Job uses `helm.sh/hook: pre-install,pre-upgrade` + + `hook-delete-policy: hook-succeeded`. + +RBAC surface: one Role over one Secret in one foreign namespace. Bounded; +acceptable. + +### Images private vs public +All four `ghcr.io/paradigmxyz/centaur/*` are **public** (path includes the +mid-segment `centaur/`). No `imagePullSecret` needed. No semver tags in the +registry; pin to `sha-` tags + digest. Renovate config: + +```json +{ "datasourceTemplate": "docker", + "packageRules": [ + { "matchPackagePatterns": ["^ghcr.io/paradigmxyz/centaur/"], + "versioning": "loose" } ] } +``` + +### Sandbox runtimeClassName +Upstream defaults to `""` (default container runtime). We don't change it. +NetworkPolicy + iron-proxy outbound gating is the v1 defence. Users running +on production k3s with gVisor installed can `--set sandbox.runtimeClassName=gvisor`. + +### Slack signing & clock skew +Slack rejects webhooks with `>5 minute` timestamp skew. If the k3d host clock +drifts, slack integration silently fails. Not fixable from our side; document +as a known-fragile point in the user guide. 
+ +## Chart layout + +`obol-helm-charts/charts/centaur/` (publishes to +`https://obolnetwork.github.io/helm-charts/`): + +``` +Chart.yaml # name: centaur, deps: contrib subchart pinned to upstream ref +values.yaml # our overrides (image registry, sandbox env, defaults) +values.schema.json # requires slack.botToken, slack.signingSecret, slack.botApiKey +templates/ + bootstrap-job.yaml # pre-install Job + RBAC (see above) + httproute-api.yaml # centaur.obol.stack, hostnames-restricted + httproute-slack.yaml # tunnel host, /api/webhooks/slack only + sandbox-egress-policy.yaml # the extra NetworkPolicy for sandbox→LiteLLM + NOTES.txt # post-install dim tips: slack webhook URL, API URL +charts/ + contrib/ # vendored from /Users/oisinkyne/code/paradigmxyz/centaur/contrib/chart +``` + +Our `values.yaml` defaults (key bits): + +```yaml +contrib: + global: + imagePullSecrets: [] + ironProxy: + image: + repository: ghcr.io/paradigmxyz/centaur/centaur-iron-proxy + tag: sha- + secretSource: env # not onepassword + api: + image: + repository: ghcr.io/paradigmxyz/centaur/centaur-api + tag: sha- + defaultHarness: codex + sandbox: + image: + repository: ghcr.io/paradigmxyz/centaur/centaur-agent + tag: sha- + extraEnv: + OPENAI_BASE_URL: http://litellm.llm.svc.cluster.local:4000/v1 + ANTHROPIC_BASE_URL: http://litellm.llm.svc.cluster.local:4000 + NO_PROXY: "*.svc.cluster.local,cluster.local,127.0.0.1,localhost" + slackbot: + enabled: true + image: + repository: ghcr.io/paradigmxyz/centaur/centaur-slackbot + tag: sha- + postgres: + enabled: true # upstream default, paradedb/paradedb:0.23.0-pg16 + networkPolicy: + enabled: true + ingress: + enabled: false # we use Gateway API HTTPRoute, not Ingress + httpRoutes: [] # rendered by our wrapper templates instead + +slack: + botToken: "" # required + signingSecret: "" # required + botApiKey: "" # required +``` + +## Install UX + +Per user direction, no shortname CLI, no `obol app configure`/wizard in v1. 
+The existing `obol app install` resolver in `internal/app/resolve.go` needs to +learn one new repo prefix: `obol/` → `https://obolnetwork.github.io/helm-charts/`. + +User flow: + +``` +$ obol app install obol/centaur \ + --set slack.botToken=$SLACK_BOT_TOKEN \ + --set slack.signingSecret=$SLACK_SIGNING_SECRET \ + --set slack.botApiKey=$SLACKBOT_API_KEY +✓ Installed at ~/.config/obol/applications/centaur/ + +$ obol app sync centaur +Helmfile sync running... +✓ centaur-bootstrap Job completed (secrets generated) +✓ centaur-api Ready +✓ centaur-slackbot Ready + +tip: Configure your Slack app event subscription: + https:///api/webhooks/slack +tip: REST API: http://centaur.obol.stack:8080 +tip: Reconfigure by editing + ~/.config/obol/applications/centaur//values.yaml + then re-running 'obol app sync centaur' +``` + +(The dim-tip rendering uses `ui.UI.Detail` or equivalent; one upstream PR opens +to add a `--quiet` flag so scripts don't trip on it.) + +## V2+ (explicitly deferred) + +- **v0.2: adopt `CENTAUR_LLM_GATEWAY_HOST`** once + [paradigmxyz/centaur#189](https://github.com/paradigmxyz/centaur/pull/189) + merges + upstream accepts our HTTP-gateway feedback. Removes the NO_PROXY + hack, the sandbox-egress NetworkPolicy override, and the master-key-in-sandbox + exposure all in one go. +- 1Password Connect mode (`ironProxy.secretSource: onepassword-connect`) +- Multi-harness selection (today: codex only) +- gVisor as a first-class `obol stack` add-on +- Auto-renewal of `IRON_MANAGEMENT_API_KEY` / `SANDBOX_SIGNING_KEY` +- `obol app setup centaur` wizard (`setup.yaml` schema in chart) + +## Files touched + +**This repo (`obol-stack`)**: +- `internal/app/resolve.go` — recognize `obol/` prefix. +- `cmd/obol/main.go` — no change (existing `obol app install` already accepts arbitrary chart refs). +- `flows/flow-NN-centaur-install.sh` — release-smoke flow. +- `docs/guides/centaur-install.md` — user-facing how-to (written alongside this plan). 
+ +**`obol-helm-charts`**: +- New chart `charts/centaur/` per layout above. +- Renovate config update for ghcr centaur images. + +**Upstream tracking**: +- [paradigmxyz/centaur#189](https://github.com/paradigmxyz/centaur/pull/189) — + `CENTAUR_LLM_GATEWAY_HOST`. We're asking them to support + `http://host:port` (or an explicit scheme/port) so an in-cluster plain-HTTP + LiteLLM works without a TLS-fronting hop. Adopt in v0.2. +- Our own follow-up PR: add `sandbox.extraEgress` / `ironProxy.extraEgress` + value knobs to `templates/networkpolicy.yaml` and `values.schema.json` so we + can drop the `sandbox-egress-policy.yaml` override. Lower priority once PR + 189 lands (we won't need the override at all). +- Multi-arch image builds: staged on branch `feat/multi-arch-images` in the + upstream checkout — adds QEMU + `platforms: linux/amd64,linux/arm64`. Unblocks + Apple Silicon pulls; lets us re-add `@sha256:` pins in our values. + +## Smoke test (`flows/flow-NN-centaur-install.sh`) + +1. `obol stack up` (assumes already-running LiteLLM). +2. `obol app install obol/centaur --set slack.*=dummy`. +3. `obol app sync centaur`. +4. Wait pods Ready; assert `centaur-bootstrap` Job Completed. +5. `kubectl exec deploy/centaur-api -- curl -fsS http://localhost:8000/health`. +6. POST a Slack-signed payload to the tunnel webhook URL; assert sandbox spawns. +7. `kubectl exec -- curl -fsS http://litellm.llm.svc.cluster.local:4000/v1/models` — must succeed, proving NO_PROXY + NetworkPolicy correct. +8. `obol app delete centaur`; assert namespace gone. + +## Open risk + +The dynamically-spawned sandbox pods take their pod spec from the API +process (env vars `KUBERNETES_*` in `templates/workloads.yaml:268-364`). We +need to verify that `sandbox.extraEnv` from chart values actually propagates +into the spawned pod env — upstream may merge it into the PodSpec template, +or may need it via a different value key. 
**First chart-skeleton task is to +helm-template the upstream chart with our overrides and look at the API +deployment env to confirm the propagation path.** If it doesn't flow +automatically, we either need an upstream PR for that too or must set `OPENAI_BASE_URL` +via the Secret-injection path instead.