Skip to content

Comments

fix: close progress notification race condition that crashes the MCP server#114

Open
jobordu wants to merge 1 commit intotuannvm:mainfrom
LangBlaze-AI:fix/progress-after-done
Open

fix: close progress notification race condition that crashes the MCP server#114
jobordu wants to merge 1 commit intotuannvm:mainfrom
LangBlaze-AI:fix/progress-after-done

Conversation

@jobordu
Copy link

@jobordu jobordu commented Feb 20, 2026

The Bug: Progress Notification Race Condition

When Claude Code calls a tool like mcp__codex-cli__review, it attaches a progressToken to the request. The server uses this token to send incremental progress notifications back while the codex CLI is running.

What goes wrong

The server uses executeCommandStreaming, which pipes the CLI's stdout line-by-line into onProgress callbacks. Each callback fires sendProgress(), which sends a notifications/progress message over the STDIO pipe.

The problem is timing: sendProgress is async but the onProgress callback in the streaming handler is not awaited.

So the sequence actually looks like this:

  1. codex CLI finishes
  2. last onProgress fires → sendProgress() starts async send
  3. executeCommandStreaming returns
  4. handler.execute() returns the tool result
  5. server sends the CallTool response to Claude Code
  6. Claude Code receives the response, deregisters progressToken
  7. in-flight sendProgress() resolves → tries to send notifications/progress with deregistered token
  8. Claude Code: "unknown token" → drops the STDIO pipe
  9. server process stdin closes → server exits

The Fix

Adds a done flag and callback to ToolHandlerContext so that any in-flight progress messages are dropped once the tool handler has returned. After handler.execute() resolves, we call context.done?.() which sets the flag to true, causing subsequent sendProgress calls to short-circuit immediately.

Related Issues

Related to #87 (server timeouts after tool calls — symptom of this crash)
Related to #62 (MCP server disconnects in Codex context)

Summary by CodeRabbit

  • New Features

    • Added a completion callback to tool handler progress reporting to stop further progress updates once an operation finishes, improving progress accuracy.
  • Chores

    • Updated server startup configuration for improved reliability.

@coderabbitai
Copy link

coderabbitai bot commented Feb 20, 2026

Walkthrough

Updates the MCP config to use an absolute path for the codex-cli command and adds a done completion callback to the tool handler progress context; server code now suppresses progress after done() and invokes it when handlers finish.

Changes

Cohort / File(s) Summary
Configuration
​.mcp.json
Replaced relative dist/index.js CLI argument with an absolute path in the codex-cli command args.
Type System
src/types.ts
Added optional done?: () => void to ToolHandlerContext to signal completion.
Server Implementation
src/server.ts
Added completion flag handling and done() invocation: progress sending ignores messages after completion; server calls context.done() (if present) after handler finishes and returns the handler result.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Server as MCP Server
    participant Handler as ToolHandler
    participant Progress as ProgressEmitter

    Client->>Server: invoke tool
    Server->>Handler: call handler(context)
    Handler->>Progress: sendProgress(...)
    Handler->>Handler: finishes work
    Handler->>Server: returns result
    Server->>Handler: invoke context.done() (if provided)
    Server->>Progress: (suppress) further sendProgress after done
    Server->>Client: return handler result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I hopped through logs and progress streams,
Poked a done() to end the dreams,
No more ticks after the bell,
Handlers finish — all is well,
A rabbit cheers in quiet beams!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the main change: fixing a progress notification race condition crash. It accurately reflects the core problem and solution implemented across src/server.ts, src/types.ts, and .mcp.json.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…pletion

Adds a `done` flag and callback to `ToolHandlerContext` so that any in-flight
progress messages are dropped once the tool handler has returned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: jobordu <jonathanborduas@gmail.com>
@jobordu jobordu force-pushed the fix/progress-after-done branch from abe540d to 5e19b80 Compare February 20, 2026 16:49
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/server.ts (2)

88-90: context.done?.() is skipped when handler.execute() throws — use finally.

If the handler rejects (e.g., CLI spawn failure), the code jumps to the catch at line 91 and done is never set to true. Any queued-but-not-yet-started sendProgress callbacks from the streaming phase will then skip the done guard and attempt sendNotification with a token that is being torn down.

♻️ Proposed fix
-        const result = await handler.execute(args, context);
-        context.done?.();
-        return result;
+        const result = await handler.execute(args, context);
+        return result;
       } catch (error) {
         return {
           content: [
             {
               type: 'text',
               text: handleError(error, `tool "${name}"`),
             },
           ],
           isError: true,
         };
+      } finally {
+        context.done?.();
       }

Moving context.done?.() into a finally block guarantees the flag is set regardless of whether the handler succeeds or throws.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server.ts` around lines 88 - 90, The call to context.done?.() should be
moved out of the normal flow and into a finally block so it always runs even if
handler.execute throws; update the try/catch around await handler.execute(args,
context) to add a finally { context.done?.(); } (referencing handler.execute and
context.done) so queued sendProgress callbacks see the done flag during teardown
and avoid using torn-down tokens.

55-77: The race condition is narrow but real; however, the try/catch is sufficient for actual error scenarios.

The guard on line 60 correctly blocks any sendProgress call that hasn't started yet. Once sendProgress passes the check and suspends at await extra.sendNotification(...), setting done = true afterward won't prevent that notification from being sent—this narrow residual window is unavoidable without tracking all in-flight Promises.

However, the concern about "unhandled transport error events" for unknown/expired progressToken doesn't apply here. Per MCP specification, sendNotification is a JSON-RPC notification (fire-and-forget) that cannot receive error responses. Unknown tokens are silently handled by the receiver and do not generate transport-level errors. The sendNotification Promise can only reject due to local transport failures (e.g., write failed, transport closed), and the try/catch will correctly intercept those. No additional error handling is needed.

The current implementation is sufficient in practice. The narrow race window is acceptable given its limited scope.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server.ts` around lines 55 - 77, The narrow race where done can flip true
after the initial guard but before the awaited sendNotification can be
eliminated by re-checking done immediately before invoking
extra.sendNotification and by tracking in-flight sends; update sendProgress (and
the surrounding scope using progressToken, done, progressCount) to (1)
capture/compute the params, then if (done) return before calling await
extra.sendNotification, and (2) optionally add each sendNotification Promise to
an inFlight Set (add before await, remove in finally) so you can cancel or await
outstanding notifications when marking done; this uses the existing
progressToken, sendProgress, done flag, progressCount, and
extra.sendNotification symbols to locate and implement the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.mcp.json:
- Line 7: Replace the hardcoded absolute path
"/Users/jonathanborduas/code/codex-mcp-server/dist/index.js" in .mcp.json with a
portable path (e.g. "dist/index.js" or "./dist/index.js" or a
configurable/env-driven value) so the server can run on other machines and CI;
locate the string in .mcp.json, remove the machine-specific path entry and
restore the intended relative path or configuration reference used by the
project.

---

Nitpick comments:
In `@src/server.ts`:
- Around line 88-90: The call to context.done?.() should be moved out of the
normal flow and into a finally block so it always runs even if handler.execute
throws; update the try/catch around await handler.execute(args, context) to add
a finally { context.done?.(); } (referencing handler.execute and context.done)
so queued sendProgress callbacks see the done flag during teardown and avoid
using torn-down tokens.
- Around line 55-77: The narrow race where done can flip true after the initial
guard but before the awaited sendNotification can be eliminated by re-checking
done immediately before invoking extra.sendNotification and by tracking
in-flight sends; update sendProgress (and the surrounding scope using
progressToken, done, progressCount) to (1) capture/compute the params, then if
(done) return before calling await extra.sendNotification, and (2) optionally
add each sendNotification Promise to an inFlight Set (add before await, remove
in finally) so you can cancel or await outstanding notifications when marking
done; this uses the existing progressToken, sendProgress, done flag,
progressCount, and extra.sendNotification symbols to locate and implement the
change.

"command": "node",
"args": [
"dist/index.js"
"/Users/jonathanborduas/code/codex-mcp-server/dist/index.js"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Revert the hardcoded absolute machine-specific path — this will break the server for everyone else.

/Users/jonathanborduas/code/codex-mcp-server/dist/index.js is specific to one developer's local environment. This path won't exist on any other machine, CI runner, or deployment target, causing the MCP server to fail to start.

This change appears to be an accidental local debugging artifact that should be reverted before merge.

🐛 Proposed fix
-        "/Users/jonathanborduas/code/codex-mcp-server/dist/index.js"
+        "dist/index.js"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"/Users/jonathanborduas/code/codex-mcp-server/dist/index.js"
"dist/index.js"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.mcp.json at line 7, Replace the hardcoded absolute path
"/Users/jonathanborduas/code/codex-mcp-server/dist/index.js" in .mcp.json with a
portable path (e.g. "dist/index.js" or "./dist/index.js" or a
configurable/env-driven value) so the server can run on other machines and CI;
locate the string in .mcp.json, remove the machine-specific path entry and
restore the intended relative path or configuration reference used by the
project.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/server.ts (1)

81-101: ⚠️ Potential issue | 🟠 Major

done() is never called when handler.execute() throws — the race condition remains on the error path.

context is declared inside the try block (line 87), making it inaccessible in the catch block. If execute() rejects, the catch returns an error response immediately — Claude Code deregisters the progressToken — but any sendProgress calls that were already past the if (!progressToken || done) return guard at line 60 will still attempt sendNotification with a deregistered token, triggering the exact crash this PR is fixing.

The cleanest fix is to hoist context outside the try block and use a finally clause to guarantee done() is always called:

🐛 Proposed fix using `try/finally`
-      try {
-        if (!this.isValidToolName(name)) {
-          throw new Error(`Unknown tool: ${name}`);
-        }
-
-        const handler = toolHandlers[name];
-        const context = createProgressContext();
-        const result = await handler.execute(args, context);
-        context.done?.();
-        return result;
-      } catch (error) {
-        return {
-          content: [
-            {
-              type: 'text',
-              text: handleError(error, `tool "${name}"`),
-            },
-          ],
-          isError: true,
-        };
-      }
+      let context: ToolHandlerContext | undefined;
+      try {
+        if (!this.isValidToolName(name)) {
+          throw new Error(`Unknown tool: ${name}`);
+        }
+
+        const handler = toolHandlers[name];
+        context = createProgressContext();
+        const result = await handler.execute(args, context);
+        return result;
+      } catch (error) {
+        return {
+          content: [
+            {
+              type: 'text',
+              text: handleError(error, `tool "${name}"`),
+            },
+          ],
+          isError: true,
+        };
+      } finally {
+        context?.done?.();
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server.ts` around lines 81 - 101, The progress context is created inside
the try so context.done() is never guaranteed on error; hoist the context by
declaring e.g. "let context; // ProgressContext | undefined" before the try,
then after validating the tool name and fetching the handler assign "context =
createProgressContext()", call "await handler.execute(args, context)" inside the
try, and add a finally block that calls "context?.done?.()" so done() always
runs whether execute() resolves or rejects; refer to isValidToolName,
toolHandlers, createProgressContext, handler.execute, and context.done when
making the change.
🧹 Nitpick comments (1)
src/server.ts (1)

59-77: Remaining in-flight race window for sendProgress calls already past the done guard.

The done flag prevents new sendProgress invocations after completion, but it does not cancel a sendProgress call that has already passed the if (!progressToken || done) return check (line 60) and is suspended at await extra.sendNotification(...) (line 64). If handler.execute() resolves while such a notification is in-flight on the event loop, the tool result is returned, the token is deregistered, and the pending sendNotification still fires against a dead token.

The window is significantly narrower than before this fix (requiring a notification to be mid-send at the exact moment execution completes), and the existing catch on line 73 silently absorbs the resulting error — so in practice this won't crash the server. However, for complete correctness, consider tracking the set of pending notification promises and draining them before returning:

♻️ Suggested approach — drain in-flight notifications
 const createProgressContext = (): ToolHandlerContext => {
   let progressCount = 0;
   let done = false;
+  const pending = new Set<Promise<void>>();
   return {
     progressToken,
+    flush: () => Promise.allSettled(pending),
     done: () => { done = true; },
     sendProgress: async (message: string, progress?: number, total?: number) => {
       if (!progressToken || done) return;
       progressCount++;
+      const p = (async () => {
         try {
           await extra.sendNotification({ /* ... */ });
         } catch (err) {
           console.error(chalk.yellow('Failed to send progress notification:'), err);
         }
+      })();
+      pending.add(p);
+      void p.finally(() => pending.delete(p));
     },
   };
 };

-      const result = await handler.execute(args, context);
-      context.done?.();
-      return result;
+      const result = await handler.execute(args, context);
+      context.done?.();
+      await (context as any).flush?.();
+      return result;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server.ts` around lines 59 - 77, The sendProgress closure (sendProgress)
can pass the done check and then await extra.sendNotification while
handler.execute completes and deregisters progressToken, causing in-flight sends
to target a dead token; fix by tracking in-flight notification promises: create
a Set to add the Promise returned by extra.sendNotification (from within
sendProgress) before awaiting it, remove the promise when it settles, and when
finishing the operation (just after handler.execute resolves and before
deregistering the token/returning the tool result) await Promise.allSettled on
that Set to drain pending notifications; reference sendProgress, progressToken,
done, extra.sendNotification and handler.execute to locate and update the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/server.ts`:
- Around line 81-101: The progress context is created inside the try so
context.done() is never guaranteed on error; hoist the context by declaring e.g.
"let context; // ProgressContext | undefined" before the try, then after
validating the tool name and fetching the handler assign "context =
createProgressContext()", call "await handler.execute(args, context)" inside the
try, and add a finally block that calls "context?.done?.()" so done() always
runs whether execute() resolves or rejects; refer to isValidToolName,
toolHandlers, createProgressContext, handler.execute, and context.done when
making the change.

---

Duplicate comments:
In @.mcp.json:
- Line 7: Replace the hardcoded absolute machine-specific path string
"/Users/jonathanborduas/code/codex-mcp-server/dist/index.js" with the generic
relative entry "dist/index.js" in the .mcp.json file so the config is portable;
locate the offending string and update it to exactly "dist/index.js".

---

Nitpick comments:
In `@src/server.ts`:
- Around line 59-77: The sendProgress closure (sendProgress) can pass the done
check and then await extra.sendNotification while handler.execute completes and
deregisters progressToken, causing in-flight sends to target a dead token; fix
by tracking in-flight notification promises: create a Set to add the Promise
returned by extra.sendNotification (from within sendProgress) before awaiting
it, remove the promise when it settles, and when finishing the operation (just
after handler.execute resolves and before deregistering the token/returning the
tool result) await Promise.allSettled on that Set to drain pending
notifications; reference sendProgress, progressToken, done,
extra.sendNotification and handler.execute to locate and update the code.

@jobordu
Copy link
Author

jobordu commented Feb 20, 2026

FYI: the Cursor Code Review check is failing but this is a pre-existing infra issue on the repo — every PR/branch going back weeks shows the same failure (dependabot updates, other feature branches, etc.). It's unrelated to this PR's changes.

The DCO check is now green. Happy to address any feedback from the actual review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant