
feat: add support for llm gateways#189

Open
nivye wants to merge 2 commits into
paradigmxyz:mainfrom
nivye:add-llm-gateway-support

@nivye nivye commented May 24, 2026

Summary

Adds opt-in support for routing harness LLM traffic through an LLM gateway (e.g. LiteLLM, Portkey, self-hosted) instead of directly to provider APIs. Activated by a single environment variable — when unset, behavior is unchanged.

# Default (unchanged):
#   ANTHROPIC_API_KEY → api.anthropic.com
#   OPENAI_API_KEY    → api.openai.com

export CENTAUR_LLM_GATEWAY_HOST=litellm.internal.example.com
#   ANTHROPIC_API_KEY → litellm.internal.example.com
#   OPENAI_API_KEY    → litellm.internal.example.com

The gateway is expected to be Anthropic- and/or OpenAI-API-compatible (LiteLLM, Portkey, custom). Operators put the gateway's API key under the existing ANTHROPIC_API_KEY / OPENAI_API_KEY secret slot — iron-proxy still injects it into the same headers (X-Api-Key / Authorization), it just does so for the gateway host instead of the provider host. No new secret types, no new injection paths.

Motivation

Enterprises routinely standardize on an LLM gateway for centralized cost accounting, audit logging, key rotation, model fallback, and policy enforcement. For those environments, any service that calls api.openai.com or api.anthropic.com directly is a blocker on installation. Today the host allowlist that iron-proxy uses for credential injection is hardcoded in _INFRA_SECRETS (services/api/api/tool_manager.py), so there is no supported way to point Centaur at a gateway.

Approach

ToolManager._INFRA_SECRETS was a ClassVar literal. This PR converts it to a method, _infra_secrets(), that reads CENTAUR_LLM_GATEWAY_HOST and substitutes it into the host tuple for ANTHROPIC_API_KEY and OPENAI_API_KEY when set. Everything else (other provider keys, GitHub, Slack) is unchanged.

collect_secrets() — the single call site — now invokes the method. The downstream proxy_config.py rendering and iron-proxy secrets transform are untouched.

The env var is read with os.getenv(...) inline, matching how other config knobs are read elsewhere in the same file (TOOL_BINARY_INLINE_MAX_BYTES, TOOL_CALL_TIMEOUT_S, etc.).
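A minimal sketch of the shape this conversion could take. The `HttpSecret` fields shown here (`name`, `hosts`, `header`) and the free-function form are assumptions for illustration, not the actual definitions in tool_manager.py; only the env var name, the two key names, the provider hosts, and the header names come from this PR.

```python
import os
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical stand-in for the real HttpSecret type; field names are assumed.
@dataclass(frozen=True)
class HttpSecret:
    name: str
    hosts: Tuple[str, ...]
    header: str


def infra_secrets() -> List[HttpSecret]:
    """Build the infra secret list, honoring CENTAUR_LLM_GATEWAY_HOST if set."""
    # Read inline on every call so the value is not frozen at import time.
    # Assumption: an empty string is treated the same as unset.
    gateway = os.getenv("CENTAUR_LLM_GATEWAY_HOST")
    anthropic_hosts = (gateway,) if gateway else ("api.anthropic.com",)
    openai_hosts = (gateway,) if gateway else ("api.openai.com",)
    return [
        HttpSecret("ANTHROPIC_API_KEY", anthropic_hosts, "X-Api-Key"),
        HttpSecret("OPENAI_API_KEY", openai_hosts, "Authorization"),
        # Entries for other provider keys, GitHub, and Slack would follow
        # here unchanged; they are omitted from this sketch.
    ]
```

Reading the variable inside the function, rather than in a class-level literal, is what lets tests toggle the gateway per call.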

Replacement, not addition

When CENTAUR_LLM_GATEWAY_HOST is set the provider host is replaced rather than appended. Rationale: if an operator has explicitly configured a gateway, they almost certainly want all LLM traffic to route through it — not "also allow direct calls". Replacement also narrows the surface where the gateway key gets injected, which is the safer default.
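To make the trade-off concrete, here is a small sketch contrasting the two options. `resolve_hosts` and its `append` flag are hypothetical and exist only for this comparison; the PR implements the replacement behavior alone.

```python
from typing import Optional, Tuple


def resolve_hosts(
    provider_host: str,
    gateway_host: Optional[str],
    append: bool = False,
) -> Tuple[str, ...]:
    """Return the hosts for which a key may be injected (illustrative only)."""
    if not gateway_host:
        # No gateway configured: original behavior.
        return (provider_host,)
    if append:
        # Rejected alternative: the gateway key could also be injected into
        # requests sent directly to the provider.
        return (provider_host, gateway_host)
    # Chosen behavior: the gateway is the only host that receives the key.
    return (gateway_host,)
```

With replacement, a request to api.anthropic.com no longer matches any injection host once a gateway is configured, so the gateway key is never sent to the provider.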

Testing

  • Two new unit tests in services/api/tests/test_tool_manager.py:
    • test_infra_secrets_default_to_provider_hosts — env var unset → original behavior preserved.
    • test_infra_secrets_route_llm_keys_through_gateway_host — env var set → Anthropic + OpenAI keys route to the gateway host.
  • Manual end-to-end verification against a real LiteLLM gateway:
    1. Set CENTAUR_LLM_GATEWAY_HOST=<gateway-host> on the API container, set ANTHROPIC_BASE_URL=https://<gateway-host> on the sandbox via sandbox.extraEnv, store the LiteLLM key in the ANTHROPIC_API_KEY secret slot.
    2. Spawn a Claude Code thread, send "Reply with PONG".
    3. Confirmed in iron-proxy audit logs: POST /v1/messages to the gateway host, status 200, secrets.swapped: ANTHROPIC_API_KEY in header X-Api-Key. Result text: "PONG".
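The two unit tests could look roughly like the sketch below. `llm_key_hosts` is a simplified, hypothetical stand-in for the real `_infra_secrets()` method, and the test names are shortened; the actual tests in services/api/tests/test_tool_manager.py may be structured differently.

```python
import os
from typing import Dict, Tuple
from unittest import mock

ENV_VAR = "CENTAUR_LLM_GATEWAY_HOST"


def llm_key_hosts() -> Dict[str, Tuple[str, ...]]:
    """Hypothetical stand-in: map each LLM key to its injection hosts."""
    gateway = os.getenv(ENV_VAR)
    return {
        "ANTHROPIC_API_KEY": (gateway or "api.anthropic.com",),
        "OPENAI_API_KEY": (gateway or "api.openai.com",),
    }


def test_default_to_provider_hosts() -> None:
    # patch.dict restores os.environ on exit, including the popped key.
    with mock.patch.dict(os.environ):
        os.environ.pop(ENV_VAR, None)
        hosts = llm_key_hosts()
    assert hosts["ANTHROPIC_API_KEY"] == ("api.anthropic.com",)
    assert hosts["OPENAI_API_KEY"] == ("api.openai.com",)


def test_route_llm_keys_through_gateway_host() -> None:
    gateway = "litellm.internal.example.com"
    with mock.patch.dict(os.environ, {ENV_VAR: gateway}):
        hosts = llm_key_hosts()
    assert hosts["ANTHROPIC_API_KEY"] == (gateway,)
    assert hosts["OPENAI_API_KEY"] == (gateway,)
```

Scoping the environment change with `mock.patch.dict` keeps the two cases from leaking state into each other or into the rest of the suite.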

Compatibility

  • Backwards-compatible: when CENTAUR_LLM_GATEWAY_HOST is unset, _infra_secrets() returns the exact same list _INFRA_SECRETS did. Any deployment that doesn't opt in sees no behavior change.
  • No new dependencies, no schema changes, no chart changes.
  • Existing secret types (HttpSecret), injection mechanism, and iron-proxy config rendering are untouched.

Files changed

File | Change
--- | ---
services/api/api/tool_manager.py | _INFRA_SECRETS ClassVar → _infra_secrets() method; reads CENTAUR_LLM_GATEWAY_HOST; removed now-unused ClassVar import.
services/api/tests/test_tool_manager.py | Two new tests covering default + gateway-routed behavior.
docs/pages/deploying-in-production.mdx | Documents the new environment variable that enables LLM gateway support.

@nivye nivye force-pushed the add-llm-gateway-support branch from 8a9e650 to 3847dfc May 25, 2026 06:56
# Conflicts:
#	docs/pages/deploying-in-production.mdx
#	services/api/api/tool_manager.py
#	services/api/tests/test_tool_manager.py
@OisinKyne

We could make use of this PR. One question, though: can we allow plain HTTP to a local LiteLLM? We can do some sort of TLS termination between iron-proxy and LiteLLM if we must, but that needs custom self-signed CA certs etc., and I don't know if it's meaningfully more secure in a local context.
