fix: close progress notification race condition that crashes the MCP server by jobordu · Pull Request #114 · tuannvm/codex-mcp-server

jobordu · 2026-02-20T16:44:02Z

The Bug: Progress Notification Race Condition

When Claude Code calls a tool like mcp__codex-cli__review, it attaches a progressToken to the request. The server uses this token to send incremental progress notifications back while the codex CLI is running.

What goes wrong

The server uses executeCommandStreaming, which pipes the CLI's stdout line-by-line into onProgress callbacks. Each callback fires sendProgress(), which sends a notifications/progress message over the STDIO pipe.

The problem is timing: sendProgress is async but the onProgress callback in the streaming handler is not awaited.

So the sequence actually looks like this:

1. codex CLI finishes
2. last onProgress fires → sendProgress() starts async send
3. executeCommandStreaming returns
4. handler.execute() returns the tool result
5. server sends the CallTool response to Claude Code
6. Claude Code receives the response, deregisters progressToken
7. in-flight sendProgress() resolves → tries to send notifications/progress with deregistered token
8. Claude Code: "unknown token" → drops the STDIO pipe
9. server process stdin closes → server exits
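
The ordering above can be reproduced in isolation. The following is a minimal sketch, not the server's actual code: `sleep`, `events`, `callTool`, and the simplified signatures of `sendProgress` and `executeCommandStreaming` are all assumptions made for illustration. It only shows that an un-awaited async callback can finish after the caller has already produced its response.

```typescript
// Hypothetical stand-ins; these simplified signatures are not the real server API.
type ProgressCallback = (line: string) => void;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Records the order of events so the race is observable.
const events: string[] = [];

// Simulates an async notification send that takes a moment to complete.
async function sendProgress(line: string): Promise<void> {
  await sleep(10);
  events.push(`progress sent: ${line}`);
}

// Simulates streaming: the callback is invoked per line, but nothing
// waits for the async work the callback kicks off.
async function executeCommandStreaming(
  lines: string[],
  onProgress: ProgressCallback,
): Promise<string> {
  for (const line of lines) {
    onProgress(line); // fire-and-forget
  }
  return "tool result";
}

async function callTool(): Promise<string[]> {
  events.length = 0;
  const result = await executeCommandStreaming(["last line"], (line) => {
    void sendProgress(line); // the promise is dropped, not awaited
  });
  events.push(`response sent: ${result}`);
  await sleep(30); // give the dangling send time to settle
  return [...events];
}
```

In this sketch the response is recorded before the progress message, which mirrors steps 5 through 7.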

The Fix

Adds a done flag and callback to ToolHandlerContext so that any in-flight progress messages are dropped once the tool handler has returned. After handler.execute() resolves, we call context.done?.() which sets the flag to true, causing subsequent sendProgress calls to short-circuit immediately.
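
As a rough illustration of the guard, here is a self-contained sketch. The `ToolHandlerContext` shape is reduced to the two members that matter here, and the `send` parameter is a hypothetical stand-in for the real transport call; the actual `createProgressContext` in `src/server.ts` also checks `progressToken` and handles send errors.

```typescript
// Simplified, hypothetical version of the context described above.
interface ToolHandlerContext {
  sendProgress: (message: string) => Promise<void>;
  done?: () => void;
}

// `send` stands in for the real notification call and is an assumption.
function createProgressContext(
  send: (message: string) => Promise<void>,
): ToolHandlerContext {
  let done = false;
  return {
    done: () => {
      done = true;
    },
    sendProgress: async (message: string) => {
      if (done) return; // drop anything that arrives after completion
      await send(message);
    },
  };
}
```

Calls made before `done()` go through `send`; calls made afterwards return without sending anything.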

Related Issues

Related to #87 (server timeouts after tool calls — symptom of this crash)
Related to #62 (MCP server disconnects in Codex context)

Summary by CodeRabbit

New Features
- Added a completion callback to tool handler progress reporting to stop further progress updates once an operation finishes, improving progress accuracy.
Chores
- Updated server startup configuration for improved reliability.

coderabbitai · 2026-02-20T16:44:17Z

Walkthrough

Updates the MCP config to use an absolute path for the codex-cli command and adds a done completion callback to the tool handler progress context; server code now suppresses progress after done() and invokes it when handlers finish.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration `.mcp.json` | Replaced relative `dist/index.js` CLI argument with an absolute path in the codex-cli command args. |
| Type System `src/types.ts` | Added optional `done?: () => void` to `ToolHandlerContext` to signal completion. |
| Server Implementation `src/server.ts` | Added completion flag handling and `done()` invocation: progress sending ignores messages after completion; server calls `context.done()` (if present) after handler finishes and returns the handler result. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Server as MCP Server
    participant Handler as ToolHandler
    participant Progress as ProgressEmitter

    Client->>Server: invoke tool
    Server->>Handler: call handler(context)
    Handler->>Progress: sendProgress(...)
    Handler->>Handler: finishes work
    Handler->>Server: returns result
    Server->>Handler: invoke context.done() (if provided)
    Server->>Progress: (suppress) further sendProgress after done
    Server->>Client: return handler result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat(mcp): add tool progress notifications and streaming command output #88 — Modifies ToolHandlerContext and progress streaming/management; touches related progress APIs and types, likely overlapping with the done() and progress suppression changes.

Poem

🐰 I hopped through logs and progress streams,
Poked a done() to end the dreams,
No more ticks after the bell,
Handlers finish — all is well,
A rabbit cheers in quiet beams!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly addresses the main change: fixing a progress notification race condition crash. It accurately reflects the core problem and solution implemented across src/server.ts, src/types.ts, and .mcp.json. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

…pletion Adds a `done` flag and callback to `ToolHandlerContext` so that any in-flight progress messages are dropped once the tool handler has returned. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: jobordu <jonathanborduas@gmail.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

src/server.ts (2)
88-90: context.done?.() is skipped when handler.execute() throws — use finally.

If the handler rejects (e.g., CLI spawn failure), control jumps to the catch at line 91 and done is never set to true. Any queued-but-not-yet-started sendProgress callbacks from the streaming phase will then get past the done guard and attempt sendNotification with a token that is being torn down.
♻️ Proposed fix
-        const result = await handler.execute(args, context);
-        context.done?.();
-        return result;
+        const result = await handler.execute(args, context);
+        return result;
       } catch (error) {
         return {
           content: [
             {
               type: 'text',
               text: handleError(error, `tool "${name}"`),
             },
           ],
           isError: true,
         };
+      } finally {
+        context.done?.();
       }
Moving context.done?.() into a finally block guarantees the flag is set regardless of whether the handler succeeds or throws.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server.ts` around lines 88 - 90, The call to context.done?.() should be
moved out of the normal flow and into a finally block so it always runs even if
handler.execute throws; update the try/catch around await handler.execute(args,
context) to add a finally { context.done?.(); } (referencing handler.execute and
context.done) so queued sendProgress callbacks see the done flag during teardown
and avoid using torn-down tokens.
55-77: The race condition is narrow but real; however, the try/catch is sufficient for actual error scenarios.

The guard on line 60 correctly blocks any sendProgress call that hasn't started yet. Once sendProgress passes the check and suspends at await extra.sendNotification(...), setting done = true afterward won't prevent that notification from being sent—this narrow residual window is unavoidable without tracking all in-flight Promises.

However, the concern about "unhandled transport error events" for unknown/expired progressToken doesn't apply here. Per MCP specification, sendNotification is a JSON-RPC notification (fire-and-forget) that cannot receive error responses. Unknown tokens are silently handled by the receiver and do not generate transport-level errors. The sendNotification Promise can only reject due to local transport failures (e.g., write failed, transport closed), and the try/catch will correctly intercept those. No additional error handling is needed.

The current implementation is sufficient in practice. The narrow race window is acceptable given its limited scope.
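
To make the fire-and-forget point concrete, here is a small sketch of what a progress notification looks like on the wire. The field names follow the MCP progress notification as I understand it (`progressToken`, `progress`, optional `total` and `message`); treat the exact shape as an assumption to check against the spec version in use. The helper names `buildProgressNotification` and `expectsResponse` are hypothetical.

```typescript
// Assumed wire shape of an MCP progress notification (verify against the spec).
interface ProgressNotification {
  jsonrpc: "2.0";
  method: "notifications/progress";
  params: {
    progressToken: string | number;
    progress: number;
    total?: number;
    message?: string;
  };
}

// Hypothetical helper that assembles the message.
function buildProgressNotification(
  progressToken: string | number,
  progress: number,
  total?: number,
  message?: string,
): ProgressNotification {
  const params: ProgressNotification["params"] = { progressToken, progress };
  if (total !== undefined) params.total = total;
  if (message !== undefined) params.message = message;
  return { jsonrpc: "2.0", method: "notifications/progress", params };
}

// In JSON-RPC 2.0, a message without an `id` member is a notification,
// so the receiver has no request to attach an error response to.
function expectsResponse(message: object): boolean {
  return "id" in message;
}
```

Because the notification carries no `id`, a rejected promise on the sender side can only come from the local transport, which is the case the existing try/catch covers.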
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server.ts` around lines 55 - 77, The narrow race where done can flip true
after the initial guard but before the awaited sendNotification can be
eliminated by re-checking done immediately before invoking
extra.sendNotification and by tracking in-flight sends; update sendProgress (and
the surrounding scope using progressToken, done, progressCount) to (1)
capture/compute the params, then if (done) return before calling await
extra.sendNotification, and (2) optionally add each sendNotification Promise to
an inFlight Set (add before await, remove in finally) so you can cancel or await
outstanding notifications when marking done; this uses the existing
progressToken, sendProgress, done flag, progressCount, and
extra.sendNotification symbols to locate and implement the change.

coderabbitai · 2026-02-20T16:49:58Z

.mcp.json

      "command": "node",
      "args": [
-        "dist/index.js"
+        "/Users/jonathanborduas/code/codex-mcp-server/dist/index.js"


⚠️ Potential issue | 🔴 Critical

Revert the hardcoded absolute machine-specific path — this will break the server for everyone else.

/Users/jonathanborduas/code/codex-mcp-server/dist/index.js is specific to one developer's local environment. This path won't exist on any other machine, CI runner, or deployment target, causing the MCP server to fail to start.

This change appears to be an accidental local debugging artifact that should be reverted before merge.

🐛 Proposed fix

-        "/Users/jonathanborduas/code/codex-mcp-server/dist/index.js"
+        "dist/index.js"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.mcp.json at line 7, Replace the hardcoded absolute path "/Users/jonathanborduas/code/codex-mcp-server/dist/index.js" in .mcp.json with a portable path (e.g. "dist/index.js" or "./dist/index.js" or a configurable/env-driven value) so the server can run on other machines and CI; locate the string in .mcp.json, remove the machine-specific path entry and restore the intended relative path or configuration reference used by the project.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/server.ts (1)

81-101: ⚠️ Potential issue | 🟠 Major

done() is never called when handler.execute() throws — the race condition remains on the error path.

context is declared inside the try block (line 87), making it inaccessible in the catch block. If execute() rejects, the catch returns an error response immediately — Claude Code deregisters the progressToken — but any sendProgress calls that were already past the if (!progressToken || done) return guard at line 60 will still attempt sendNotification with a deregistered token, triggering the exact crash this PR is fixing.

The cleanest fix is to hoist context outside the try block and use a finally clause to guarantee done() is always called:

🐛 Proposed fix using `try/finally`

-      try {
-        if (!this.isValidToolName(name)) {
-          throw new Error(`Unknown tool: ${name}`);
-        }
-
-        const handler = toolHandlers[name];
-        const context = createProgressContext();
-        const result = await handler.execute(args, context);
-        context.done?.();
-        return result;
-      } catch (error) {
-        return {
-          content: [
-            {
-              type: 'text',
-              text: handleError(error, `tool "${name}"`),
-            },
-          ],
-          isError: true,
-        };
-      }
+      let context: ToolHandlerContext | undefined;
+      try {
+        if (!this.isValidToolName(name)) {
+          throw new Error(`Unknown tool: ${name}`);
+        }
+
+        const handler = toolHandlers[name];
+        context = createProgressContext();
+        const result = await handler.execute(args, context);
+        return result;
+      } catch (error) {
+        return {
+          content: [
+            {
+              type: 'text',
+              text: handleError(error, `tool "${name}"`),
+            },
+          ],
+          isError: true,
+        };
+      } finally {
+        context?.done?.();
+      }
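
The control flow of this proposal can be checked in isolation. The sketch below is an assumption-laden reduction: `runTool`, its parameters, and the simplified result shape are hypothetical, and only the hoisted `context` plus `finally` pattern is taken from the diff above.

```typescript
// Reduced to the one member this sketch needs.
interface ToolHandlerContext {
  done?: () => void;
}

// Hypothetical wrapper mirroring the hoist-and-finally structure.
async function runTool(
  execute: (context: ToolHandlerContext) => Promise<string>,
  createContext: () => ToolHandlerContext,
): Promise<{ text: string; isError: boolean }> {
  let context: ToolHandlerContext | undefined;
  try {
    context = createContext();
    // Awaiting inside the try ensures a rejection reaches the catch below.
    return { text: await execute(context), isError: false };
  } catch (error) {
    return { text: String(error), isError: true };
  } finally {
    // Runs on both the success and the error path; safe if context was never created.
    context?.done?.();
  }
}
```

With this structure, `done()` runs once per call whether `execute` resolves or rejects.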

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/server.ts` around lines 81 - 101, The progress context is created inside
the try so context.done() is never guaranteed on error; hoist the context by
declaring e.g. "let context; // ProgressContext | undefined" before the try,
then after validating the tool name and fetching the handler assign "context =
createProgressContext()", call "await handler.execute(args, context)" inside the
try, and add a finally block that calls "context?.done?.()" so done() always
runs whether execute() resolves or rejects; refer to isValidToolName,
toolHandlers, createProgressContext, handler.execute, and context.done when
making the change.

🧹 Nitpick comments (1)

src/server.ts (1)
59-77: Remaining in-flight race window for sendProgress calls already past the done guard.

The done flag prevents new sendProgress invocations after completion, but it does not cancel a sendProgress call that has already passed the if (!progressToken || done) return check (line 60) and is suspended at await extra.sendNotification(...) (line 64). If handler.execute() resolves while such a notification is in-flight on the event loop, the tool result is returned, the token is deregistered, and the pending sendNotification still fires against a dead token.

The window is significantly narrower than before this fix (requiring a notification to be mid-send at the exact moment execution completes), and the existing catch on line 73 silently absorbs the resulting error — so in practice this won't crash the server. However, for complete correctness, consider tracking the set of pending notification promises and draining them before returning:
♻️ Suggested approach — drain in-flight notifications
 const createProgressContext = (): ToolHandlerContext => {
   let progressCount = 0;
   let done = false;
+  const pending = new Set<Promise<void>>();
   return {
     progressToken,
+    flush: () => Promise.allSettled(pending),
     done: () => { done = true; },
     sendProgress: async (message: string, progress?: number, total?: number) => {
       if (!progressToken || done) return;
       progressCount++;
+      const p = (async () => {
         try {
           await extra.sendNotification({ /* ... */ });
         } catch (err) {
           console.error(chalk.yellow('Failed to send progress notification:'), err);
         }
+      })();
+      pending.add(p);
+      void p.finally(() => pending.delete(p));
     },
   };
 };

-      const result = await handler.execute(args, context);
-      context.done?.();
-      return result;
+      const result = await handler.execute(args, context);
+      context.done?.();
+      await (context as any).flush?.();
+      return result;
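
For reference, a runnable reduction of the drain idea is sketched here. It is not the PR's code: the `send` parameter is a hypothetical stand-in for `extra.sendNotification`, the `progressToken` check and logging are omitted, and `Promise.allSettled` assumes an ES2020-or-later target.

```typescript
// Hypothetical reduction of the pending-set approach.
function createProgressContext(send: (message: string) => Promise<void>) {
  let done = false;
  const pending = new Set<Promise<void>>();
  return {
    done: () => {
      done = true;
    },
    // Resolves once every send that started before now has settled.
    flush: async (): Promise<void> => {
      await Promise.allSettled(pending);
    },
    sendProgress: (message: string): Promise<void> => {
      if (done) return Promise.resolve();
      // Swallow send failures so a failed notification never rejects the tracker.
      const p = send(message).catch(() => {});
      pending.add(p);
      void p.finally(() => pending.delete(p));
      return p;
    },
  };
}
```

A caller would invoke `done()` and then `await flush()` before returning the tool result, so no notification is still in flight when the response goes out.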
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server.ts` around lines 59 - 77, The sendProgress closure (sendProgress)
can pass the done check and then await extra.sendNotification while
handler.execute completes and deregisters progressToken, causing in-flight sends
to target a dead token; fix by tracking in-flight notification promises: create
a Set to add the Promise returned by extra.sendNotification (from within
sendProgress) before awaiting it, remove the promise when it settles, and when
finishing the operation (just after handler.execute resolves and before
deregistering the token/returning the tool result) await Promise.allSettled on
that Set to drain pending notifications; reference sendProgress, progressToken,
done, extra.sendNotification and handler.execute to locate and update the code.

jobordu · 2026-02-20T16:54:57Z

FYI: the Cursor Code Review check is failing but this is a pre-existing infra issue on the repo — every PR/branch going back weeks shows the same failure (dependabot updates, other feature branches, etc.). It's unrelated to this PR's changes.

The DCO check is now green. Happy to address any feedback from the actual review.

jobordu force-pushed the fix/progress-after-done branch from abe540d to 5e19b80 Compare February 20, 2026 16:49

coderabbitai bot reviewed Feb 20, 2026

jobordu wants to merge 1 commit into tuannvm:main from LangBlaze-AI:fix/progress-after-done
