Add MCP OAuth authentication flow #411
Open
timvisher-dd wants to merge 1 commit into agentclientprotocol:main from
After query() initialization, check mcpServerStatus() for HTTP/SSE MCP servers in 'needs-auth' state and trigger the Claude Code CLI's built-in OAuth flow via the mcp_authenticate control message.

The CLI handles the full PKCE flow: RFC 9728 discovery, dynamic client registration, localhost callback server, token exchange, and keychain storage. The agent opens the user's browser for OAuth consent and polls until the server transitions to 'connected'.

Browser opening mirrors the CLI's internal approach: respects $BROWSER, uses rundll32 on Windows, open on macOS, xdg-open on Linux. In headless environments where opening fails, the auth URL is logged as an error and the server is skipped gracefully.

Previously, MCP servers requiring OAuth would silently fail to connect unless the ACP client pre-injected static Authorization headers.

Validated end-to-end with Datadog and Atlassian MCP servers.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
When ACP clients provide HTTP/SSE MCP servers without `Authorization` headers, the agent now detects servers that need authentication and triggers the Claude Code SDK's built-in OAuth flow automatically. Previously, MCP servers requiring OAuth would silently fail to connect.

Problem

MCP servers that require OAuth (e.g., Datadog, Atlassian) return 401 on connection. The SDK detects this and marks them as `needs-auth`, but `claude-agent-acp` never checked this status or triggered authentication. ACP clients had to work around this by performing OAuth themselves and injecting static `Authorization` headers at session creation time.

For example, we built `agent-shell-mcp-oauth`, a full MCP OAuth 2.1 implementation in Emacs Lisp (RFC 9728 discovery, dynamic client registration, PKCE, keychain-backed token storage), as an add-on for the agent-shell ACP client, specifically to paper over this gap on the client side. Every ACP client that wants OAuth MCP servers currently has to implement something similar.

I think this is inconsistent with the ACP spec's intent: the client specifies MCP servers, and the agent handles connecting to them, including auth. The spec could be clearer here, but the `headers` field on `McpServerHttp` reads as a convenience for static credentials, not the intended mechanism for OAuth. With this change, ACP clients can be truly agnostic about MCP auth: they just provide the server URL and the agent handles the rest.

Solution

After `query().initializationResult()`, check `mcpServerStatus()` for servers with `status === 'needs-auth'`. For each, send the SDK's `mcp_authenticate` control message, which triggers the CLI's PKCE flow: RFC 9728 discovery, dynamic client registration, a localhost callback server, token exchange, and keychain storage.

The agent opens the user's browser for OAuth consent and polls `mcpServerStatus()` until the server transitions to `connected`.

How we got here: empirical investigation
This change is backed by extensive empirical testing against the Claude Code SDK and CLI internals. Here's the investigation trail:
Phase 1: Can the SDK handle MCP OAuth?
Hypothesis: The Claude Agent SDK already implements MCP OAuth. If we pass an MCP server config without headers, the SDK should handle auth.
Test: Built a headless ACP client (`x.oauth-probe.mjs`) that spawns `claude-agent-acp`, creates a session with Datadog MCP (no headers), and sends a prompt.

Result (with cached tokens): MCP tools loaded and worked. The SDK found cached OAuth tokens in the macOS keychain under `Claude Code-credentials` -> `mcpOAuth` and used them silently.

Result (with cache cleared): MCP tools did NOT load. No `[OAUTH]` events, no browser prompt. The SDK detected `needs-auth` but silently gave up.

Conclusion: The SDK uses cached tokens when available but does NOT autonomously trigger OAuth when called via `query()`.

Phase 2: What does the SDK expose?
Test: Used `query()` directly (`x.sdk-direct-probe.mjs`) to call `mcpServerStatus()`.

Result:
The SDK correctly detects the 401 and reports `needs-auth`. But `reconnectMcpServer()` rejects with `"Server status: needs-auth"`; it only works for servers that have already been authenticated.

Phase 3: Does the CLI have an auth control message?
Investigation: Searched the minified `cli.js` for control message subtypes. All identifiers below are minified names from the bundled CLI; we infer their purpose from usage context.

Discovery: The CLI handles these undocumented MCP auth control messages:

- `mcp_authenticate`: triggers OAuth discovery + PKCE + localhost callback
- `mcp_oauth_callback_url`: provides the callback URL manually
- `mcp_clear_auth`: clears stored tokens

The CLI has an internal OAuth provider class (minified as `S_6`, likely something like `McpOAuthProvider`) with a `skipBrowserOpen` flag that defaults to `false`. When `false`, the provider opens the browser itself via an internal cross-platform open function (minified as `$Y`, likely `openUrl`). There are two code paths that invoke this provider:

- TUI: The terminal UI components call the provider with `skipBrowserOpen: true`, get back the auth URL, and render it in the terminal UI for the user to interact with. The TUI never uses the control message protocol.
- Control message (`mcp_authenticate`): The CLI's control message handler also calls the provider with `skipBrowserOpen: true` hardcoded, returning the URL as JSON. This is the path available to out-of-process callers like the SDK.

In both paths, `skipBrowserOpen` is `true`, so the browser is never opened by the CLI when auth is triggered programmatically. The TUI handles presentation in its React UI; we handle it by calling `open`/`xdg-open`/`rundll32`. The internal browser-opening function (`$Y`) is only used when the provider is called with the default `skipBrowserOpen: false`, which none of the current code paths actually do for MCP auth.

The SDK wrapper (`sdk.mjs`) doesn't expose `mcp_authenticate` in its TypeScript types, but the `Query` object's `request()` method can send arbitrary control messages to the CLI subprocess.

Phase 4: End-to-end validation
Test: `x.sdk-auth-spike.mjs` (clear keychain, call `query()`, detect `needs-auth`, send `mcp_authenticate`, open browser, poll).

Result:
Full PKCE flow completed in ~12 seconds.
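The spike's sequence (detect `needs-auth`, send `mcp_authenticate`, open the browser, poll) can be sketched roughly as below. This is an illustrative sketch, not the code in this patch: the `QueryLike` and `AuthDeps` interfaces, the `serverName` request field, and the `authUrl` response field are all assumptions, since the control message is undocumented. Only `mcpServerStatus()`, `request()`, the `mcp_authenticate` subtype, and the two status strings come from the investigation above.

```typescript
// Hypothetical minimal shapes; only mcpServerStatus() and request() are
// named in this PR. Field names on the messages are assumptions.
interface McpStatus { name: string; status: string }
interface QueryLike {
  mcpServerStatus(): Promise<McpStatus[]>;
  request(msg: Record<string, unknown>): Promise<unknown>;
}

// Side effects are injected so the flow can be exercised without a real CLI.
interface AuthDeps {
  openBrowser(url: string): Promise<boolean>; // false when opening fails (e.g. headless)
  sleep(ms: number): Promise<void>;
  log(msg: string): void;
}

type Outcome = "connected" | "skipped" | "timeout";

async function authenticateMcpServers(
  q: QueryLike,
  deps: AuthDeps,
  timeoutMs = 60_000, // matches the 60-second per-server budget described in this PR
  pollMs = 1_000,     // assumed polling interval
): Promise<Record<string, Outcome>> {
  const results: Record<string, Outcome> = {};
  const pending = (await q.mcpServerStatus()).filter((s) => s.status === "needs-auth");

  for (const server of pending) {
    // Assumed request/response shape for the undocumented control message.
    const reply = (await q.request({
      subtype: "mcp_authenticate",
      serverName: server.name,
    })) as { authUrl?: string } | undefined;

    const url = reply?.authUrl;
    if (!url || !(await deps.openBrowser(url))) {
      // Headless path: surface the URL and skip this server.
      deps.log(`MCP server ${server.name} needs auth: ${url ?? "(no URL returned)"}`);
      results[server.name] = "skipped";
      continue;
    }

    // Poll until the server reports 'connected' or the budget runs out.
    results[server.name] = "timeout";
    for (let waited = 0; waited < timeoutMs; waited += pollMs) {
      await deps.sleep(pollMs);
      const current = (await q.mcpServerStatus()).find((s) => s.name === server.name);
      if (current?.status === "connected") {
        results[server.name] = "connected";
        break;
      }
    }
  }
  return results;
}
```

Servers are handled one at a time here, which keeps the worst case at `timeoutMs` per server; whether the actual patch serializes or parallelizes them is not something this sketch claims.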
Phase 5: Patched ACP agent validation
Test: `x.acp-patched-spike.mjs` (applied this patch, launched via ACP, created session with empty headers).

Result:
Both Datadog and Atlassian authenticated automatically. MCP tools worked in the subsequent prompt.
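For reference, the browser-opening step that the patched agent performs (respect `$BROWSER`, otherwise `rundll32 url,OpenURL` on Windows, `open` on macOS, `xdg-open` on Linux) could be expressed as a small pure function like the one below. The function name, the `OpenCommand` type, and the choice to treat `$BROWSER` as a single executable name are assumptions made for illustration; the patch's actual helper may differ.

```typescript
// Hypothetical helper: picks the command used to open an auth URL.
interface OpenCommand { cmd: string; args: string[] }

function browserCommand(
  url: string,
  platform: string,                         // e.g. the value of process.platform
  env: Record<string, string | undefined>,  // e.g. process.env
): OpenCommand {
  // Assumption: $BROWSER holds a single executable, with the URL as its only argument.
  if (env.BROWSER) return { cmd: env.BROWSER, args: [url] };

  switch (platform) {
    case "win32":
      return { cmd: "rundll32", args: ["url,OpenURL", url] };
    case "darwin":
      return { cmd: "open", args: [url] };
    default:
      // Linux and other Unix-likes.
      return { cmd: "xdg-open", args: [url] };
  }
}
```

The result would typically be handed to Node's `child_process.spawn(cmd, args)`, with a spawn error or non-zero exit treated as the headless case in which the URL is logged and the server is skipped.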
Architecture
Caveats
- `mcp_authenticate` is undocumented: This control message is not in the SDK's public TypeScript types. It's used internally by the Claude Code CLI. Both the TUI and the control message handler call the CLI's internal OAuth provider with `skipBrowserOpen: true`, so the caller is always responsible for opening the browser or presenting the URL. See Phase 3 above for details. This message could change without notice; we use `@ts-ignore` and should add feature detection / fallback in a follow-up.
- Browser opening and headless environments: Because `mcp_authenticate` returns the auth URL without opening a browser, we handle browser opening ourselves, mirroring the CLI's internal `openUrl` function (minified as `$Y`): respects the `$BROWSER` env var, uses `rundll32 url,OpenURL` on Windows, `open` on macOS, and `xdg-open` on Linux. If opening fails (headless environments, containers, SSH, CI), the auth URL is logged as an error and the server is skipped; the client falls back to providing static `Authorization` headers at `session/new`. Supporting headless agents properly would require ACP protocol extensions (`ExtRequest`/`ExtNotification`) to surface auth URLs to the client, plus a mechanism for mid-session credential injection. That's orthogonal to this change.
- Blocking session creation: Polling blocks `session/new` for up to 60 seconds per server. This is acceptable for first-time auth (subsequent sessions use cached tokens) but could be improved with async notification to the ACP client.
- No token refresh: Inspection of the minified CLI (`cli.js`) reveals that while `refreshToken` and `expiresAt` are stored in the keychain alongside `accessToken`, the CLI does not use `grant_type=refresh_token` for MCP server tokens. The CLI does have 401→refresh→retry logic for its own API calls (e.g., `/api/oauth/claude_cli/client_data`), but this code path is not wired up for MCP HTTP requests. When a cached MCP token expires, the server returns 401, and the CLI simply marks it `needs-auth`: no refresh attempt, no retry. The ACP layer has no visibility into token state (only the binary `needs-auth` vs `connected` status), so it cannot paper over this. In practice, this means users will see a full browser re-auth flow every time their MCP tokens expire (typically ~1h15m for Datadog/Atlassian), rather than a silent background refresh. Compare with `agent-shell-mcp-oauth`, which proactively refreshes tokens 60 minutes before expiry and never requires re-login for cached servers. This needs an upstream SDK fix: the CLI should attempt `grant_type=refresh_token` before falling back to `needs-auth`.
- Backward compatibility: Clients that provide static `Authorization` headers at `session/new` are unaffected. When valid headers are provided, the MCP server connects normally and `mcpServerStatus()` never returns `needs-auth`, so none of the new auth code is triggered.

Testing
Validated end-to-end with:
- Datadog (https://mcp.datadoghq.com/api/unstable/mcp-server/mcp)
- Atlassian (https://mcp.atlassian.com/v1/mcp)

Both with cache cleared (fresh OAuth) and with cached tokens (automatic reconnection).
Future work
- Async auth notification via `ExtNotification` instead of blocking session creation
- Feature detection for `mcp_authenticate` with fallback to client-provided headers
- Upstream SDK fix: attempt `grant_type=refresh_token` for MCP OAuth tokens before falling back to `needs-auth`. The refresh plumbing already exists for the CLI's own API calls; it just needs to be wired up for MCP server connections. Without this, users face a full browser re-auth every ~1h15m.
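To make the refresh item concrete, here is a minimal sketch of the decision and the request body a refresh attempt would need, following the standard OAuth 2.0 refresh grant (form-encoded `grant_type=refresh_token` plus `refresh_token`). It is not the CLI's code. The `StoredToken` shape reuses the `accessToken`, `refreshToken`, and `expiresAt` field names reported above, but treating `expiresAt` as milliseconds since the epoch, the `leadMs` default, and sending `client_id` in the body are assumptions.

```typescript
// Field names follow the keychain entry described in this PR; units are assumed.
interface StoredToken {
  accessToken: string;
  refreshToken?: string;
  expiresAt: number; // assumed: milliseconds since the Unix epoch
}

// True when the token is expired or will expire within leadMs.
function needsRefresh(token: StoredToken, nowMs: number, leadMs = 60_000): boolean {
  return token.expiresAt - nowMs <= leadMs;
}

// Builds the application/x-www-form-urlencoded body for a refresh grant,
// or returns null when there is no refresh token (full re-auth is required).
function refreshRequestBody(token: StoredToken, clientId: string): string | null {
  if (!token.refreshToken) return null;
  const fields: Array<[string, string]> = [
    ["grant_type", "refresh_token"],
    ["refresh_token", token.refreshToken],
    ["client_id", clientId], // assumption: public client identified by client_id
  ];
  return fields
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
}
```

A caller would POST this body to the authorization server's token endpoint and fall back to `needs-auth` only if the request is rejected or `refreshRequestBody` returns `null`.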