fix: clear terminalEventSeen on task restart to prevent stuck-after-planning (#1828)#1840
fix: clear terminalEventSeen on task restart to prevent stuck-after-planning (#1828)#1840
Conversation
…lanning (#1828) The terminalEventSeen Set in TaskStateManager was never cleared when a task was restarted. When spec_runner.py emits PLANNING_COMPLETE, the taskId is added to terminalEventSeen. If the subsequent coding process (run.py) fails, handleProcessExited() returns early because terminalEventSeen.has(taskId) is true, silently swallowing the PROCESS_EXITED event. The XState actor never transitions, leaving the task permanently stuck in 'coding' state. Additionally, lastSequenceByTask from the old process would cause events from a new process (starting at sequence 0) to be dropped as duplicates. Fix: Add prepareForRestart(taskId) method that clears both terminalEventSeen and lastSequenceByTask without stopping the XState actor. Call it in all 4 locations where a new agent process is started: - TASK_START handler - TASK_STOP handler (so subsequent restart works) - TASK_UPDATE_STATUS auto-start path - TASK_RECOVER_STUCK auto-restart path Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of ChangesHello @AndyMik90, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical bug where tasks could become permanently stuck after the planning stage if they were restarted. The root cause was identified as stale tracking state ( Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
No actionable comments were generated in the recent review. 🎉 📝 WalkthroughWalkthroughAdded proactive clearing of per-task tracking state by calling TaskStateManager.prepareForRestart(taskId) at multiple task restart/start points so restarted tasks don't inherit stale terminalEventSeen or lastSequenceByTask entries. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request provides an excellent fix for a subtle but critical bug causing tasks to get stuck. The root cause analysis is clear, and the solution of introducing a prepareForRestart method in TaskStateManager is clean and effective. You've done a great job applying this fix consistently to the four identified task restart paths.
While reviewing, I identified one additional scenario where this state-clearing logic is needed but appears to be missing. In the TASK_REVIEW handler within apps/frontend/src/main/ipc-handlers/task/execution-handlers.ts, when a review is rejected (!approved), agentManager.startQAProcess is called. This also initiates a new agent process for the task. If the previous run ended with a terminal event (e.g., ALL_SUBTASKS_DONE), terminalEventSeen would be set. Without clearing it, a failure in the new QA process could lead to the same "stuck task" bug.
I recommend adding a call to taskStateManager.prepareForRestart(taskId) before agentManager.startQAProcess is called in the rejection path of the TASK_REVIEW handler. Since this part of the file is not included in the current diff, I am providing this feedback here in the general summary.
With that addition, this fix will be even more robust.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/frontend/src/main/ipc-handlers/task/execution-handlers.ts (1)
495-506:⚠️ Potential issue | 🟠 MajorMissing
prepareForRestartbeforestartQAProcesson review rejection.When a review is rejected,
startQAProcessspawns a new QA process for the sametaskId(line 499). If a terminal event (e.g.ALL_SUBTASKS_DONE,QA_PASSED) was previously recorded interminalEventSeenfor this task, the new QA process's exit will be swallowed byhandleProcessExited(which returns early at line 78 whenterminalEventSeen.has(taskId)) — the same class of bug this PR fixes at the other four call sites.Proposed fix
+ // Clear stale tracking state before starting a new QA process + taskStateManager.prepareForRestart(taskId); + // Restart QA process - use worktree path if it exists, otherwise main project // The QA process needs to run where the implementation_plan.json with completed subtasks is const qaProjectPath = hasWorktree ? worktreePath : project.path;
AndyMik90
left a comment
There was a problem hiding this comment.
🤖 Auto Claude PR Review
Merge Verdict: 🟠 NEEDS REVISION
🟠 Needs revision - 1 issue(s) require attention.
1 issue(s) must be addressed (0 required, 1 recommended)
Risk Assessment
| Factor | Level | Notes |
|---|---|---|
| Complexity | Low | Based on lines changed |
| Security Impact | None | Based on security findings |
| Scope Coherence | Good | Based on structural review |
Findings Summary
- Medium: 1 issue(s)
Generated by Auto Claude PR Review
Findings (1 selected of 1 total)
🟡 [4ae1f291d305] [MEDIUM] Missing prepareForRestart in TASK_REVIEW rejection path
📁 apps/frontend/src/main/ipc-handlers/task/execution-handlers.ts:499
The PR description states prepareForRestart is called in "all 4 locations where a new agent process is started," but there are actually 5 locations. The TASK_REVIEW rejection handler starts a new QA fixer process via agentManager.startQAProcess() (line 499) without calling prepareForRestart(). When a task reaches human_review, a terminal event (e.g., ALL_SUBTASKS_DONE or QA_PASSED) was recorded during the previous execution, so terminalEventSeen.has(taskId) is true. If the QA fixer process crashes without emitting a terminal event, handleProcessExited() (task-state-manager.ts line 77) checks terminalEventSeen.has(taskId) → returns early → XState actor never receives PROCESS_EXITED → task gets stuck. Additionally, lastSequenceByTask retains the old high-water mark, so events from the new QA process with lower sequence numbers would be dropped by isNewSequence() (task-state-manager.ts line 386). This is the exact same bug class this PR fixes in the other 4 locations. | The PR establishes a clear pattern: call prepareForRestart(taskId) before starting any new process for an existing task. This is applied in 4 locations (TASK_START, TASK_STOP, TASK_UPDATE_STATUS auto-start, TASK_RECOVER_STUCK auto-restart). However, the TASK_REVIEW rejection path at line 499 calls agentManager.startQAProcess() to start a new QA fixer process without calling prepareForRestart() first. This creates the exact same stale-state scenario the PR fixes: terminalEventSeen would still contain the taskId from the previous execution's QA_PASSED/ALL_SUBTASKS_DONE event, causing handleProcessExited() to swallow exit events if the QA fixer crashes.
Searched Grep('prepareForRestart|startQAProcess|startTaskExecution|startSpecCreation', 'apps/frontend/src/main/ipc-handlers/task/execution-handlers.ts') - confirmed startQAProcess at line 499 has no preceding prepareForRestart call, while all other start* calls do.
Suggested fix:
Add `taskStateManager.prepareForRestart(taskId);` before line 499 (the `agentManager.startQAProcess()` call), matching the pattern used at the other 4 call sites. For example:
```typescript
// Clear stale tracking state before starting new QA process
taskStateManager.prepareForRestart(taskId);
agentManager.startQAProcess(taskId, qaProjectPath, task.specId, project.id);
---
*This review was generated by Auto Claude.*
Add missing prepareForRestart(taskId) call before startQAProcess() in the TASK_REVIEW rejection handler. This is the 5th location where a new agent process is started for an existing task, but was missed in the original fix. Without this, if the QA fixer process crashes after a review rejection, terminalEventSeen would cause handleProcessExited() to swallow the exit event, leaving the task permanently stuck. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Fixes the root cause of tasks getting permanently stuck after the planning stage completes. The
terminalEventSeenSet inTaskStateManagerwas never cleared when a task was restarted, causinghandleProcessExited()to silently swallowPROCESS_EXITEDevents. The XState actor never received the transition, leaving the task permanently stuck in the 'coding' state with no way to recover.Root cause: When
spec_runner.pyemitsPLANNING_COMPLETE(a terminal event),terminalEventSeen.add(taskId)is called. If the subsequent coding process (run.py) fails or crashes,handleProcessExited()checksterminalEventSeen.has(taskId)and returns early, never sendingPROCESS_EXITEDto the XState actor. Restarting the task doesn't help becauseTASK_STARTnever cleared the stale entry.Changes
task-state-manager.ts: AddedprepareForRestart(taskId)method that clearsterminalEventSeenandlastSequenceByTaskfor a task without stopping the XState actorexecution-handlers.ts: CalledprepareForRestart(taskId)in all 4 locations where a new agent process is started:TASK_STARThandler - before XState event dispatchTASK_STOPhandler - after USER_STOPPED, so subsequent restart worksTASK_UPDATE_STATUSauto-start path - when dragging card to in_progressTASK_RECOVER_STUCKauto-restart path - when recovering stuck tasksCloses #1828
Summary by CodeRabbit