fix: gate broker credentials to enabled harnesses only by lyoungblood · Pull Request #398 · paradigmxyz/centaur

lyoungblood · 2026-06-04T05:25:58Z

Problem

ToolManager.collect_secrets() unconditionally appended a BrokeredTokenSecret for every (engine, auth_mode) in _HARNESS_SECRETS (anthropic-claude, openai-codex, …), so the iron-token-broker ConfigMap rendered by broker_config.render_broker_yaml() always carried credentials for harnesses a deployment may never run.

iron-token-broker (ironsh/iron-token-broker:0.0.1-rc.2) corrupts its shared 1Password Go-SDK client the first time it touches a credential whose referenced vault items are missing: the first item op succeeds, then every subsequent op fails with

an internal error occurred ... invalid client id

which breaks the rotation write path (Items.Get/Items.Put) for all credentials — including the one actually in use. Confirmed empirically: removing the phantom openai-codex credential made the broker healthy; re-adding it with missing vault items re-broke it.
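To show why one unused credential can break rotation for the credential that is actually in use, here is a toy Python model of the reported symptom. It is not the broker's code (iron-token-broker is Go). `ToyVaultClient`, `rotation_pass`, and the item name `CLAUDE_TOKEN_ITEM` are hypothetical. The model also assumes that looking up the missing item is what poisons the shared client.

```python
class ToyVaultClient:
    """Hypothetical stand-in for a shared vault SDK client.

    Models only the reported symptom, not iron-token-broker's implementation.
    """

    def __init__(self, items: dict[str, str]) -> None:
        self._items = dict(items)
        self._poisoned = False

    def get(self, ref: str) -> str:
        if self._poisoned:
            # After poisoning, every call fails, whichever item it asks for.
            raise RuntimeError("an internal error occurred ... invalid client id")
        if ref not in self._items:
            # Assumption: touching a missing item is what corrupts the shared client.
            self._poisoned = True
            raise KeyError(ref)
        return self._items[ref]


def rotation_pass(client: ToyVaultClient, credentials: dict[str, str]) -> dict[str, str]:
    """Read each credential's vault item once, as one rotation cycle would."""
    status = {}
    for name, ref in credentials.items():
        try:
            client.get(ref)
            status[name] = "ok"
        except KeyError:
            status[name] = "missing"
        except RuntimeError:
            status[name] = "invalid client id"
    return status


vault = {"CLAUDE_TOKEN_ITEM": "secret"}  # hypothetical item; no OPENAI_CODEX_BLOB present
all_creds = {"anthropic-claude": "CLAUDE_TOKEN_ITEM", "openai-codex": "OPENAI_CODEX_BLOB"}

client = ToyVaultClient(vault)
print(rotation_pass(client, all_creds))
# {'anthropic-claude': 'ok', 'openai-codex': 'missing'}
print(rotation_pass(client, all_creds))
# {'anthropic-claude': 'invalid client id', 'openai-codex': 'invalid client id'}
```

In the second cycle the healthy anthropic-claude credential fails too. A client that only ever sees `{"anthropic-claude": ...}` keeps returning `ok`. That is the property the fix relies on.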

The current workaround on the Moonwell deployment is to bootstrap placeholder OPENAI_CODEX_CLIENT_ID / OPENAI_CODEX_BLOB items in the ai-agents vault so resolution succeeds and codex is harmlessly marked unauthenticated.

Fix

Gate the harness loop in collect_secrets() on a new harness_config.enabled_harnesses() allowlist:

- CENTAUR_DEFAULT_HARNESS is always enabled.
- CENTAUR_ENABLED_HARNESSES (comma/whitespace-separated, same aliases as the default) adds any other harnesses the deployment can spawn.
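The allowlist rule above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the PR's harness_config code. The alias table `_ALIASES`, the fallback default `claude-code`, and the choice to raise `ValueError` on an unknown name are all hypothetical. The function takes the environment as a mapping so it can be tested without touching `os.environ`.

```python
import re
from collections.abc import Mapping

# Hypothetical alias table mapping harness names to engines; the real
# harness_config defines its own names and aliases.
_ALIASES = {
    "claude-code": "anthropic-claude",
    "claude": "anthropic-claude",
    "anthropic-claude": "anthropic-claude",
    "codex": "openai-codex",
    "openai-codex": "openai-codex",
}
# Assumption: the harness used when CENTAUR_DEFAULT_HARNESS is unset or empty.
_FALLBACK_DEFAULT = "claude-code"


def _canonical(name: str) -> str:
    try:
        return _ALIASES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown harness: {name!r}") from None


def enabled_harnesses(env: Mapping[str, str]) -> frozenset[str]:
    """Default harness, always enabled, plus anything in CENTAUR_ENABLED_HARNESSES."""
    enabled = {_canonical(env.get("CENTAUR_DEFAULT_HARNESS") or _FALLBACK_DEFAULT)}
    extra = env.get("CENTAUR_ENABLED_HARNESSES", "")
    # Split on any run of commas and/or whitespace; skip the empty tokens that
    # leading or trailing separators produce.
    for token in re.split(r"[,\s]+", extra):
        if token:
            enabled.add(_canonical(token))
    return frozenset(enabled)


print(sorted(enabled_harnesses({})))
# ['anthropic-claude']
print(sorted(enabled_harnesses({"CENTAUR_ENABLED_HARNESSES": "codex, claude-code"})))
# ['anthropic-claude', 'openai-codex']
```

Rejecting unknown names is one design option. It makes a typo in the chart value fail loudly instead of silently dropping a harness's broker credential.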

The shared API-side iron-proxy and iron-token-broker now manage credentials only for harnesses the deployment can actually reach, so a credential with missing vault items for an unused harness is never emitted. Both auth-mode variants of each enabled engine are still emitted (the broker manages a credential regardless of which mode a sandbox currently uses).
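The gating can be sketched as a filter over the (engine, auth_mode) pairs. This is a hypothetical standalone function, not ToolManager.collect_secrets() itself. `HARNESS_SECRET_KEYS` stands in for _HARNESS_SECRETS, and the two-field `BrokeredTokenSecret` dataclass is an assumed shape, because the real class's fields are not shown in this PR. The auth-mode names come from the migration note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrokeredTokenSecret:
    """Hypothetical shape; the real class's fields are not shown in the PR."""

    engine: str
    auth_mode: str


# Hypothetical stand-in for _HARNESS_SECRETS, keyed by (engine, auth_mode).
HARNESS_SECRET_KEYS = [
    ("anthropic-claude", "access_token"),
    ("anthropic-claude", "api_key"),
    ("openai-codex", "access_token"),
    ("openai-codex", "api_key"),
]


def collect_brokered_secrets(enabled: frozenset[str]) -> list[BrokeredTokenSecret]:
    secrets = []
    for engine, auth_mode in HARNESS_SECRET_KEYS:
        # Before the fix this loop had no filter, so every engine was emitted.
        if engine not in enabled:
            continue
        # Both auth-mode variants of an enabled engine are kept.
        secrets.append(BrokeredTokenSecret(engine, auth_mode))
    return secrets


for s in collect_brokered_secrets(frozenset({"anthropic-claude"})):
    print(s.engine, s.auth_mode)
# anthropic-claude access_token
# anthropic-claude api_key
```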

After this lands, the Moonwell deployment can run claude-code-only (default) and delete the placeholder vault items.

Migration note ⚠️

This changes the set emitted by collect_secrets(). Previously every harness credential was managed; now only CENTAUR_DEFAULT_HARNESS is, unless CENTAUR_ENABLED_HARNESSES lists more. A deployment that can spawn more than one harness must enumerate them in CENTAUR_ENABLED_HARNESSES (api.enabledHarnesses) — otherwise a sandbox spawned on a non-enabled harness in access_token mode has no broker credential. (api_key mode is unaffected; it doesn't use the broker.)
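The migration rule reduces to a simple predicate. Here it is as a hypothetical preflight check in Python. `sandboxes_missing_broker_credential` is not part of the PR. It lists which (engine, auth_mode) sandboxes a deployment can spawn that would start without a broker credential under a given enabled set.

```python
def sandboxes_missing_broker_credential(
    sandboxes: list[tuple[str, str]], enabled: frozenset[str]
) -> list[tuple[str, str]]:
    """Hypothetical check: spawnable sandboxes whose broker credential is not emitted."""
    return [
        (engine, auth_mode)
        for engine, auth_mode in sandboxes
        # api_key mode never goes through the broker, so only access_token can be affected.
        if auth_mode == "access_token" and engine not in enabled
    ]


spawnable = [
    ("anthropic-claude", "access_token"),
    ("openai-codex", "access_token"),
    ("openai-codex", "api_key"),
]
print(sandboxes_missing_broker_credential(spawnable, frozenset({"anthropic-claude"})))
# [('openai-codex', 'access_token')]
```

Any engine this returns is one that should be added to CENTAUR_ENABLED_HARNESSES (api.enabledHarnesses).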

Changes

- harness_config: add enabled_harnesses().
- tool_manager: gate collect_secrets() on the enabled set.
- chart: api.enabledHarnesses → CENTAUR_ENABLED_HARNESSES (rendered only when non-empty); documented in configuration.md.
- tests: default-harness-only deployments omit codex creds; explicitly enabled harnesses still emit them; broker config omits unenabled brokered creds; enabled_harnesses() parsing.

Testing

uv run pytest tests/test_harness_config.py tests/test_broker_config.py \
  tests/test_tool_manager.py::TestHarnessSecretSelection -q
# 31 passed

helm template renders CENTAUR_ENABLED_HARNESSES only when api.enabledHarnesses is non-empty. (There is one pre-existing, unrelated failure: test_tool_rest_router_lists_describes_and_invokes_tools, a 401 auth-env artifact that also occurs on main.)

Upstream

The robust fix is broker-side: iron-token-broker should tolerate a credential whose vault items are missing without poisoning the shared SDK client used by every other credential. Worth reporting to ironsh/iron-token-broker; this PR is the deployment-side mitigation.

🤖 Generated with Claude Code

ToolManager.collect_secrets() unconditionally emitted a BrokeredTokenSecret for every (engine, auth_mode) in _HARNESS_SECRETS, so the iron-token-broker ConfigMap always carried both anthropic-claude and openai-codex regardless of which harnesses a deployment actually runs.

iron-token-broker (0.0.1-rc.2) corrupts its shared 1Password Go-SDK client the first time it touches a credential whose vault items are missing: the initial item op succeeds, then every subsequent op fails with "an internal error occurred ... invalid client id", breaking token rotation for ALL credentials, including the one actually in use. A deployment that only runs claude-code therefore had to bootstrap placeholder OPENAI_CODEX_* vault items just to keep the broker healthy.

Gate the harness loop in collect_secrets() on enabled_harnesses(): the CENTAUR_DEFAULT_HARNESS plus any engines listed in the new CENTAUR_ENABLED_HARNESSES. The broker (and the shared API-side iron-proxy) now only manage credentials for harnesses the deployment can actually reach, so missing-vault-item poisoning can't happen for a harness you don't use. After this lands, the placeholder vault items can be deleted.

- harness_config: add enabled_harnesses()
- tool_manager: gate collect_secrets() on the enabled set (both auth-mode variants of each enabled engine are still emitted)
- chart: api.enabledHarnesses -> CENTAUR_ENABLED_HARNESSES (rendered only when set); documented in configuration.md
- tests: default-harness-only deployments omit codex creds; explicitly enabled harnesses still emit them; broker config omits unenabled brokered creds

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

lyoungblood · 2026-06-04T05:51:24Z

Reported the underlying broker bug upstream: ironsh/iron-proxy#176 (iron-token-broker's shared 1Password SDK client is poisoned by a credential with missing vault items). This PR is the deployment-side mitigation.

lyoungblood mentioned this pull request on Jun 4, 2026 in ironsh/iron-proxy#176 (Open): iron-token-broker: a credential with missing 1Password items poisons the shared SDK client ("invalid client id"), breaking rotation for all credentials

lyoungblood wants to merge 1 commit into paradigmxyz:main from moonwell-fi:gate-broker-creds-by-enabled-harness
