fix: gate broker credentials to enabled harnesses only#398
Open
lyoungblood wants to merge 1 commit into
Open
Conversation
ToolManager.collect_secrets() unconditionally emitted a BrokeredTokenSecret for every (engine, auth_mode) in _HARNESS_SECRETS, so the iron-token-broker ConfigMap always carried both anthropic-claude and openai-codex regardless of which harnesses a deployment actually runs. iron-token-broker (0.0.1-rc.2) corrupts its shared 1Password Go-SDK client the first time it touches a credential whose vault items are missing: the initial item op succeeds, then every subsequent op fails with "an internal error occurred ... invalid client id", breaking token rotation for ALL credentials — including the one actually in use. A deployment that only runs claude-code therefore had to bootstrap placeholder OPENAI_CODEX_* vault items just to keep the broker healthy. Gate the harness loop in collect_secrets() on enabled_harnesses(): the CENTAUR_DEFAULT_HARNESS plus any engines listed in the new CENTAUR_ENABLED_HARNESSES. The broker (and the shared API-side iron-proxy) now only manage credentials for harnesses the deployment can actually reach, so missing-vault-item poisoning can't happen for a harness you don't use. After this lands, the placeholder vault items can be deleted. - harness_config: add enabled_harnesses() - tool_manager: gate collect_secrets() on the enabled set (both auth-mode variants of each enabled engine are still emitted) - chart: api.enabledHarnesses -> CENTAUR_ENABLED_HARNESSES (rendered only when set); documented in configuration.md - tests: default-harness-only deployments omit codex creds; explicitly enabled harnesses still emit them; broker config omits unenabled brokered creds Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Author
|
Reported the underlying broker bug upstream: ironsh/iron-proxy#176 (iron-token-broker's shared 1Password SDK client is poisoned by a credential with missing vault items). This PR is the deployment-side mitigation. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
ToolManager.collect_secrets()unconditionally appended aBrokeredTokenSecretfor every(engine, auth_mode)in_HARNESS_SECRETS(anthropic-claude,openai-codex, …), so the iron-token-broker ConfigMap rendered bybroker_config.render_broker_yaml()always carried credentials for harnesses a deployment may never run.iron-token-broker (
ironsh/iron-token-broker:0.0.1-rc.2) corrupts its shared 1Password Go-SDK client the first time it touches a credential whose referenced vault items are missing: the first item op succeeds, then every subsequent op fails withwhich breaks the rotation write path (
Items.Get/Items.Put) for all credentials — including the one actually in use. Confirmed empirically: removing the phantomopenai-codexcredential made the broker healthy; re-adding it with missing vault items re-broke it.The current workaround on the Moonwell deployment is to bootstrap placeholder
OPENAI_CODEX_CLIENT_ID/OPENAI_CODEX_BLOBitems in theai-agentsvault so resolution succeeds and codex is harmlessly marked unauthenticated.Fix
Gate the harness loop in
collect_secrets()on a newharness_config.enabled_harnesses()allowlist:CENTAUR_DEFAULT_HARNESSis always enabled.CENTAUR_ENABLED_HARNESSES(comma/whitespace-separated, same aliases as the default) adds any other harnesses the deployment can spawn.The shared API-side iron-proxy and iron-token-broker now manage credentials only for harnesses the deployment can actually reach, so a credential with missing vault items for an unused harness is never emitted. Both auth-mode variants of each enabled engine are still emitted (the broker manages a credential regardless of which mode a sandbox currently uses).
After this lands, the Moonwell deployment can run claude-code-only (default) and delete the placeholder vault items.
Migration note⚠️
This changes the set emitted by
collect_secrets(). Previously every harness credential was managed; now onlyCENTAUR_DEFAULT_HARNESSis, unlessCENTAUR_ENABLED_HARNESSESlists more. A deployment that can spawn more than one harness must enumerate them inCENTAUR_ENABLED_HARNESSES(api.enabledHarnesses) — otherwise a sandbox spawned on a non-enabled harness inaccess_tokenmode has no broker credential. (api_keymode is unaffected; it doesn't use the broker.)Changes
harness_config: addenabled_harnesses().tool_manager: gatecollect_secrets()on the enabled set.api.enabledHarnesses→CENTAUR_ENABLED_HARNESSES(rendered only when non-empty); documented inconfiguration.md.enabled_harnesses()parsing.Testing
helm templaterendersCENTAUR_ENABLED_HARNESSESonly whenapi.enabledHarnessesis non-empty. (One pre-existing, unrelated failure —test_tool_rest_router_lists_describes_and_invokes_tools, a 401 auth-env artifact present onmaintoo.)Upstream
The robust fix is broker-side: iron-token-broker should tolerate a credential whose vault items are missing without poisoning the shared SDK client used by every other credential. Worth reporting to
ironsh/iron-token-broker; this PR is the deployment-side mitigation.🤖 Generated with Claude Code