fix: close progress notification race condition that crashes the MCP server#114
fix: close progress notification race condition that crashes the MCP server#114jobordu wants to merge 1 commit intotuannvm:mainfrom
Conversation
WalkthroughUpdates the MCP config to use an absolute path for the codex-cli command and adds a Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Server as MCP Server
participant Handler as ToolHandler
participant Progress as ProgressEmitter
Client->>Server: invoke tool
Server->>Handler: call handler(context)
Handler->>Progress: sendProgress(...)
Handler->>Handler: finishes work
Handler->>Server: returns result
Server->>Handler: invoke context.done() (if provided)
Server->>Progress: (suppress) further sendProgress after done
Server->>Client: return handler result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…pletion Adds a `done` flag and callback to `ToolHandlerContext` so that any in-flight progress messages are dropped once the tool handler has returned. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: jobordu <jonathanborduas@gmail.com>
abe540d to
5e19b80
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/server.ts (2)
88-90:context.done?.()is skipped whenhandler.execute()throws — usefinally.If the handler rejects (e.g., CLI spawn failure), the code jumps to the
catchat line 91 anddoneis never set totrue. Any queued-but-not-yet-startedsendProgresscallbacks from the streaming phase will then skip thedoneguard and attemptsendNotificationwith a token that is being torn down.♻️ Proposed fix
- const result = await handler.execute(args, context); - context.done?.(); - return result; + const result = await handler.execute(args, context); + return result; } catch (error) { return { content: [ { type: 'text', text: handleError(error, `tool "${name}"`), }, ], isError: true, }; + } finally { + context.done?.(); }Moving
context.done?.()into afinallyblock guarantees the flag is set regardless of whether the handler succeeds or throws.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server.ts` around lines 88 - 90, The call to context.done?.() should be moved out of the normal flow and into a finally block so it always runs even if handler.execute throws; update the try/catch around await handler.execute(args, context) to add a finally { context.done?.(); } (referencing handler.execute and context.done) so queued sendProgress callbacks see the done flag during teardown and avoid using torn-down tokens.
55-77: The race condition is narrow but real; however, the try/catch is sufficient for actual error scenarios.The guard on line 60 correctly blocks any
sendProgresscall that hasn't started yet. OncesendProgresspasses the check and suspends atawait extra.sendNotification(...), settingdone = trueafterward won't prevent that notification from being sent—this narrow residual window is unavoidable without tracking all in-flight Promises.However, the concern about "unhandled transport error events" for unknown/expired
progressTokendoesn't apply here. Per MCP specification,sendNotificationis a JSON-RPC notification (fire-and-forget) that cannot receive error responses. Unknown tokens are silently handled by the receiver and do not generate transport-level errors. ThesendNotificationPromise can only reject due to local transport failures (e.g., write failed, transport closed), and thetry/catchwill correctly intercept those. No additional error handling is needed.The current implementation is sufficient in practice. The narrow race window is acceptable given its limited scope.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server.ts` around lines 55 - 77, The narrow race where done can flip true after the initial guard but before the awaited sendNotification can be eliminated by re-checking done immediately before invoking extra.sendNotification and by tracking in-flight sends; update sendProgress (and the surrounding scope using progressToken, done, progressCount) to (1) capture/compute the params, then if (done) return before calling await extra.sendNotification, and (2) optionally add each sendNotification Promise to an inFlight Set (add before await, remove in finally) so you can cancel or await outstanding notifications when marking done; this uses the existing progressToken, sendProgress, done flag, progressCount, and extra.sendNotification symbols to locate and implement the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.mcp.json:
- Line 7: Replace the hardcoded absolute path
"/Users/jonathanborduas/code/codex-mcp-server/dist/index.js" in .mcp.json with a
portable path (e.g. "dist/index.js" or "./dist/index.js" or a
configurable/env-driven value) so the server can run on other machines and CI;
locate the string in .mcp.json, remove the machine-specific path entry and
restore the intended relative path or configuration reference used by the
project.
---
Nitpick comments:
In `@src/server.ts`:
- Around line 88-90: The call to context.done?.() should be moved out of the
normal flow and into a finally block so it always runs even if handler.execute
throws; update the try/catch around await handler.execute(args, context) to add
a finally { context.done?.(); } (referencing handler.execute and context.done)
so queued sendProgress callbacks see the done flag during teardown and avoid
using torn-down tokens.
- Around line 55-77: The narrow race where done can flip true after the initial
guard but before the awaited sendNotification can be eliminated by re-checking
done immediately before invoking extra.sendNotification and by tracking
in-flight sends; update sendProgress (and the surrounding scope using
progressToken, done, progressCount) to (1) capture/compute the params, then if
(done) return before calling await extra.sendNotification, and (2) optionally
add each sendNotification Promise to an inFlight Set (add before await, remove
in finally) so you can cancel or await outstanding notifications when marking
done; this uses the existing progressToken, sendProgress, done flag,
progressCount, and extra.sendNotification symbols to locate and implement the
change.
| "command": "node", | ||
| "args": [ | ||
| "dist/index.js" | ||
| "/Users/jonathanborduas/code/codex-mcp-server/dist/index.js" |
There was a problem hiding this comment.
Revert the hardcoded absolute machine-specific path — this will break the server for everyone else.
/Users/jonathanborduas/code/codex-mcp-server/dist/index.js is specific to one developer's local environment. This path won't exist on any other machine, CI runner, or deployment target, causing the MCP server to fail to start.
This change appears to be an accidental local debugging artifact that should be reverted before merge.
🐛 Proposed fix
- "/Users/jonathanborduas/code/codex-mcp-server/dist/index.js"
+ "dist/index.js"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "/Users/jonathanborduas/code/codex-mcp-server/dist/index.js" | |
| "dist/index.js" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.mcp.json at line 7, Replace the hardcoded absolute path
"/Users/jonathanborduas/code/codex-mcp-server/dist/index.js" in .mcp.json with a
portable path (e.g. "dist/index.js" or "./dist/index.js" or a
configurable/env-driven value) so the server can run on other machines and CI;
locate the string in .mcp.json, remove the machine-specific path entry and
restore the intended relative path or configuration reference used by the
project.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/server.ts (1)
81-101:⚠️ Potential issue | 🟠 Major
done()is never called whenhandler.execute()throws — the race condition remains on the error path.
contextis declared inside thetryblock (line 87), making it inaccessible in thecatchblock. Ifexecute()rejects, the catch returns an error response immediately — Claude Code deregisters theprogressToken— but anysendProgresscalls that were already past theif (!progressToken || done) returnguard at line 60 will still attemptsendNotificationwith a deregistered token, triggering the exact crash this PR is fixing.The cleanest fix is to hoist
contextoutside thetryblock and use afinallyclause to guaranteedone()is always called:🐛 Proposed fix using `try/finally`
- try { - if (!this.isValidToolName(name)) { - throw new Error(`Unknown tool: ${name}`); - } - - const handler = toolHandlers[name]; - const context = createProgressContext(); - const result = await handler.execute(args, context); - context.done?.(); - return result; - } catch (error) { - return { - content: [ - { - type: 'text', - text: handleError(error, `tool "${name}"`), - }, - ], - isError: true, - }; - } + let context: ToolHandlerContext | undefined; + try { + if (!this.isValidToolName(name)) { + throw new Error(`Unknown tool: ${name}`); + } + + const handler = toolHandlers[name]; + context = createProgressContext(); + const result = await handler.execute(args, context); + return result; + } catch (error) { + return { + content: [ + { + type: 'text', + text: handleError(error, `tool "${name}"`), + }, + ], + isError: true, + }; + } finally { + context?.done?.(); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server.ts` around lines 81 - 101, The progress context is created inside the try so context.done() is never guaranteed on error; hoist the context by declaring e.g. "let context; // ProgressContext | undefined" before the try, then after validating the tool name and fetching the handler assign "context = createProgressContext()", call "await handler.execute(args, context)" inside the try, and add a finally block that calls "context?.done?.()" so done() always runs whether execute() resolves or rejects; refer to isValidToolName, toolHandlers, createProgressContext, handler.execute, and context.done when making the change.
🧹 Nitpick comments (1)
src/server.ts (1)
59-77: Remaining in-flight race window forsendProgresscalls already past thedoneguard.The
doneflag prevents newsendProgressinvocations after completion, but it does not cancel asendProgresscall that has already passed theif (!progressToken || done) returncheck (line 60) and is suspended atawait extra.sendNotification(...)(line 64). Ifhandler.execute()resolves while such a notification is in-flight on the event loop, the tool result is returned, the token is deregistered, and the pendingsendNotificationstill fires against a dead token.The window is significantly narrower than before this fix (requiring a notification to be mid-send at the exact moment execution completes), and the existing
catchon line 73 silently absorbs the resulting error — so in practice this won't crash the server. However, for complete correctness, consider tracking the set of pending notification promises and draining them before returning:♻️ Suggested approach — drain in-flight notifications
const createProgressContext = (): ToolHandlerContext => { let progressCount = 0; let done = false; + const pending = new Set<Promise<void>>(); return { progressToken, + flush: () => Promise.allSettled(pending), done: () => { done = true; }, sendProgress: async (message: string, progress?: number, total?: number) => { if (!progressToken || done) return; progressCount++; + const p = (async () => { try { await extra.sendNotification({ /* ... */ }); } catch (err) { console.error(chalk.yellow('Failed to send progress notification:'), err); } + })(); + pending.add(p); + void p.finally(() => pending.delete(p)); }, }; }; - const result = await handler.execute(args, context); - context.done?.(); - return result; + const result = await handler.execute(args, context); + context.done?.(); + await (context as any).flush?.(); + return result;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server.ts` around lines 59 - 77, The sendProgress closure (sendProgress) can pass the done check and then await extra.sendNotification while handler.execute completes and deregisters progressToken, causing in-flight sends to target a dead token; fix by tracking in-flight notification promises: create a Set to add the Promise returned by extra.sendNotification (from within sendProgress) before awaiting it, remove the promise when it settles, and when finishing the operation (just after handler.execute resolves and before deregistering the token/returning the tool result) await Promise.allSettled on that Set to drain pending notifications; reference sendProgress, progressToken, done, extra.sendNotification and handler.execute to locate and update the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/server.ts`:
- Around line 81-101: The progress context is created inside the try so
context.done() is never guaranteed on error; hoist the context by declaring e.g.
"let context; // ProgressContext | undefined" before the try, then after
validating the tool name and fetching the handler assign "context =
createProgressContext()", call "await handler.execute(args, context)" inside the
try, and add a finally block that calls "context?.done?.()" so done() always
runs whether execute() resolves or rejects; refer to isValidToolName,
toolHandlers, createProgressContext, handler.execute, and context.done when
making the change.
---
Duplicate comments:
In @.mcp.json:
- Line 7: Replace the hardcoded absolute machine-specific path string
"/Users/jonathanborduas/code/codex-mcp-server/dist/index.js" with the generic
relative entry "dist/index.js" in the .mcp.json file so the config is portable;
locate the offending string and update it to exactly "dist/index.js".
---
Nitpick comments:
In `@src/server.ts`:
- Around line 59-77: The sendProgress closure (sendProgress) can pass the done
check and then await extra.sendNotification while handler.execute completes and
deregisters progressToken, causing in-flight sends to target a dead token; fix
by tracking in-flight notification promises: create a Set to add the Promise
returned by extra.sendNotification (from within sendProgress) before awaiting
it, remove the promise when it settles, and when finishing the operation (just
after handler.execute resolves and before deregistering the token/returning the
tool result) await Promise.allSettled on that Set to drain pending
notifications; reference sendProgress, progressToken, done,
extra.sendNotification and handler.execute to locate and update the code.
|
FYI: the Cursor Code Review check is failing but this is a pre-existing infra issue on the repo — every PR/branch going back weeks shows the same failure (dependabot updates, other feature branches, etc.). It's unrelated to this PR's changes. The DCO check is now green. Happy to address any feedback from the actual review. |
The Bug: Progress Notification Race Condition
When Claude Code calls a tool like
mcp__codex-cli__review, it attaches aprogressTokento the request. The server uses this token to send incremental progress notifications back while the codex CLI is running.What goes wrong
The server uses
executeCommandStreaming, which pipes the CLI's stdout line-by-line intoonProgresscallbacks. Each callback firessendProgress(), which sends anotifications/progressmessage over the STDIO pipe.The problem is timing:
sendProgressis async but theonProgresscallback in the streaming handler is not awaited.So the sequence actually looks like this:
onProgressfires →sendProgress()starts async sendexecuteCommandStreamingreturnshandler.execute()returns the tool resultCallToolresponse to Claude CodeprogressTokensendProgress()resolves → tries to sendnotifications/progresswith deregistered tokenThe Fix
Adds a
doneflag and callback toToolHandlerContextso that any in-flight progress messages are dropped once the tool handler has returned. Afterhandler.execute()resolves, we callcontext.done?.()which sets the flag totrue, causing subsequentsendProgresscalls to short-circuit immediately.Related Issues
Related to #87 (server timeouts after tool calls — symptom of this crash)
Related to #62 (MCP server disconnects in Codex context)
Summary by CodeRabbit
New Features
Chores