
fix: resolve QA validation deadlock when subtasks are stuck or failed #1853

Open

AndyMik90 wants to merge 1 commit into develop from fix/qa-validation-stuck-subtasks

Conversation

AndyMik90 (Owner) commented Feb 16, 2026

Summary

  • Fixes QA validation deadlock where stuck/failed subtasks block QA from ever starting
  • Adds is_build_ready_for_qa() that considers builds ready when all subtasks reach a terminal state (completed, failed, or stuck)
  • Updates mark_subtask_stuck() to sync implementation_plan.json status to "failed"
  • Reorders QA loop to honor human feedback (QA_FIX_REQUEST.md) before checking build completeness
  • Replaces is_build_complete() with is_build_ready_for_qa() in QA validation paths

Problem

When a coding agent marks a subtask as "stuck" (e.g., "Manual testing on multi-monitor setup"):

  1. get_next_subtask() skips stuck subtasks and returns None → coder exits
  2. should_run_qa() calls is_build_complete() which requires ALL subtasks status == "completed" → returns False
  3. QA never starts, creating a permanent deadlock (see the readiness sketch after this list)
  4. Even clicking "proceed with QA" was blocked because the build completeness check ran before the human feedback check
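
For orientation, here is a minimal sketch of the terminal-state readiness check this PR introduces, assuming the implementation_plan.json shape referenced in the review comments (phases containing subtasks with id and status fields) and the attempt_history.json stuck_subtasks list; the real is_build_ready_for_qa() in core/progress.py is the authoritative version and also covers the edge cases listed in the test plan.

import json
from pathlib import Path

# Assumption: "stuck" may appear directly as a plan status or only in attempt_history.json
TERMINAL_STATES = {"completed", "failed", "stuck"}


def is_build_ready_for_qa_sketch(spec_dir: Path) -> bool:
    """True when every subtask is terminal: completed, failed, or stuck."""
    stuck_ids: set[str] = set()
    history_file = spec_dir / "memory" / "attempt_history.json"
    if history_file.exists():
        try:
            history = json.loads(history_file.read_text(encoding="utf-8"))
            stuck_ids = {
                entry["subtask_id"]
                for entry in history.get("stuck_subtasks", [])
                if "subtask_id" in entry
            }
        except (OSError, json.JSONDecodeError, UnicodeDecodeError):
            pass  # best effort: fall back to plan statuses only

    try:
        plan = json.loads((spec_dir / "implementation_plan.json").read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError, UnicodeDecodeError):
        return False  # no readable plan means nothing to hand to QA

    for phase in plan.get("phases", []):
        for subtask in phase.get("subtasks", []):
            if subtask.get("status") in TERMINAL_STATES:
                continue
            if subtask.get("id") in stuck_ids:
                continue  # recorded as stuck even though the plan was not yet synced
            return False
    return True

The mark_subtask_stuck() change below keeps the plan and the attempt history in sync, so the stuck_ids fallback mainly covers plans written before this fix.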

Changes

File | Change
core/progress.py | Add is_build_ready_for_qa() — checks terminal states (completed/failed/stuck)
services/recovery.py | mark_subtask_stuck() now also sets status="failed" in implementation_plan.json
qa/loop.py | Reorder: check human feedback before build completeness; use is_build_ready_for_qa()
qa/criteria.py | should_run_qa() uses is_build_ready_for_qa() instead of is_build_complete()
cli/qa_commands.py | Updated error messages for new semantics
progress.py | Re-export is_build_ready_for_qa in module facade

Test plan

  • 17 new unit tests for is_build_ready_for_qa() covering: all completed, mixed terminal states, stuck subtasks via attempt_history, missing files, empty phases, invalid JSON, graceful fallbacks (one such case is sketched after this list)
  • 3 new tests for mark_subtask_stuck() plan update: verifies status change to "failed", missing subtask handling, missing plan file handling
  • Updated 5 existing test mock targets from is_build_complete to is_build_ready_for_qa
  • All 2582 tests pass (pre-commit hook verified)
  • Ruff lint and format clean
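
Not taken from the diff, but roughly how one of the mixed-terminal-state cases could look, assuming the plan and history field names discussed in the review comments below and that the function is importable as core.progress.is_build_ready_for_qa; the actual tests live in tests/test_progress_qa_readiness.py and tests/test_recovery.py and may differ in fixtures and naming.

import json

from core.progress import is_build_ready_for_qa  # import path assumed from this PR's changelog


def test_mixed_terminal_states_ready_for_qa(tmp_path):
    """Completed + failed + stuck (recorded only in attempt_history) counts as ready."""
    plan = {
        "phases": [
            {
                "subtasks": [
                    {"id": "st-1", "status": "completed"},
                    {"id": "st-2", "status": "failed"},
                    {"id": "st-3", "status": "pending"},  # stuck, tracked in history only
                ]
            }
        ]
    }
    (tmp_path / "implementation_plan.json").write_text(json.dumps(plan), encoding="utf-8")

    memory_dir = tmp_path / "memory"
    memory_dir.mkdir()
    history = {"stuck_subtasks": [{"subtask_id": "st-3", "reason": "manual step"}]}
    (memory_dir / "attempt_history.json").write_text(json.dumps(history), encoding="utf-8")

    assert is_build_ready_for_qa(tmp_path)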

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved QA validation messaging for incomplete builds with clearer status descriptions
    • Enhanced handling of failed and stuck subtasks in build readiness assessments
  • New Features

    • QA validation now proceeds when all subtasks reach terminal states (completed, failed, or stuck)
    • Stuck subtasks are now properly tracked and recorded in build plans

When a coding agent marks a subtask as "stuck", QA validation would never
start because is_build_complete() requires ALL subtasks to have status
"completed". This creates a deadlock: coder exits (no more subtasks to
work on), but QA never triggers.

Changes:
- Add is_build_ready_for_qa() that considers builds ready when all
  subtasks reach a terminal state (completed, failed, or stuck)
- Update mark_subtask_stuck() to also set status="failed" in
  implementation_plan.json, keeping plan file in sync with reality
- Reorder QA loop to check human feedback before build completeness,
  so QA_FIX_REQUEST.md bypasses the build gate as intended
- Replace is_build_complete() with is_build_ready_for_qa() in
  should_run_qa() and CLI qa commands
- Add 20 new tests covering is_build_ready_for_qa() edge cases and
  mark_subtask_stuck() plan update behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot added the labels area/backend (This is backend only), bug (Something isn't working), and size/L (Large: 500-999 lines) on Feb 16, 2026
gemini-code-assist bot (Contributor) commented:

Summary of Changes

Hello @AndyMik90, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a significant issue in the QA validation workflow where the system could enter a deadlock state if any subtasks were marked as stuck or failed. By introducing a more flexible definition of 'build readiness' that accounts for all terminal subtask states, and by ensuring that human feedback takes precedence in the QA loop, the system can now gracefully handle non-completed subtasks and proceed with QA. Additionally, the change improves data consistency by updating the implementation plan when subtasks become stuck.

Highlights

  • QA Deadlock Resolution: Fixed a critical QA validation deadlock where subtasks marked as 'stuck' or 'failed' would prevent the QA process from ever starting, leading to an infinite loop.
  • Enhanced Build Readiness Check: Introduced a new is_build_ready_for_qa() function that considers a build ready for QA if all subtasks have reached any terminal state (completed, failed, or stuck), rather than strictly requiring all to be 'completed'.
  • Subtask Status Synchronization: Modified mark_subtask_stuck() to automatically update the corresponding subtask's status to 'failed' within implementation_plan.json, ensuring consistency across the system.
  • QA Loop Logic Reordering: Reordered the QA validation loop to prioritize human feedback (via QA_FIX_REQUEST.md) before performing the build completeness check, allowing manual overrides to proceed.
  • Refactored QA Validation Paths: Replaced all instances of is_build_complete() with the more robust is_build_ready_for_qa() in relevant QA validation logic.
Changelog
  • apps/backend/cli/qa_commands.py
    • Updated error messages to reflect the new QA readiness criteria, informing users that all subtasks must reach a terminal state.
  • apps/backend/core/progress.py
    • Added is_build_ready_for_qa function, which checks if all subtasks are in a terminal state (completed, failed, or stuck).
  • apps/backend/progress.py
    • Exported is_build_ready_for_qa in the module's public interface.
  • apps/backend/qa/criteria.py
    • Modified should_run_qa to use is_build_ready_for_qa instead of is_build_complete.
  • apps/backend/qa/loop.py
    • Reordered the QA loop to check for human feedback (QA_FIX_REQUEST.md) before evaluating build readiness.
    • Replaced is_build_complete with is_build_ready_for_qa in the build readiness check.
  • apps/backend/services/recovery.py
    • Enhanced mark_subtask_stuck to also update the subtask's status to 'failed' in implementation_plan.json.
  • tests/agents/test_agent_flow.py
    • Updated test mocks to reference is_build_ready_for_qa instead of is_build_complete.
  • tests/test_cli_qa_commands.py
    • Modified test assertion for the updated 'Build not ready for QA' message.
  • tests/test_progress_qa_readiness.py
    • Added a new file containing 17 unit tests for the is_build_ready_for_qa function, covering various scenarios.
  • tests/test_qa_criteria.py
    • Updated existing tests to patch is_build_ready_for_qa instead of is_build_complete.
  • tests/test_recovery.py
    • Added 3 new tests for mark_subtask_stuck to verify its interaction with implementation_plan.json updates.
Activity
  • 17 new unit tests were added for the is_build_ready_for_qa() function, covering various terminal states, edge cases, and error handling.
  • 3 new tests were added for mark_subtask_stuck() to validate its new functionality of updating implementation_plan.json.
  • 5 existing test mock targets were updated to reflect the transition from is_build_complete to is_build_ready_for_qa.
  • All 2582 existing tests passed, and code quality checks (Ruff lint and format) were clean.

coderabbitai bot (Contributor) commented Feb 16, 2026

📝 Walkthrough

Walkthrough

The pull request introduces a new QA readiness criterion that accepts subtasks in any terminal state (completed, failed, or stuck), rather than requiring all subtasks to be strictly completed. It adds an is_build_ready_for_qa() function, updates QA gating logic to use this new criterion, persists stuck status to the plan, updates messaging, and adjusts tests accordingly.

Changes

Cohort / File(s) Summary
Build readiness logic
apps/backend/core/progress.py, apps/backend/progress.py
Introduces is_build_ready_for_qa() function that checks if all subtasks are in a terminal state (completed, failed, or stuck) by reading implementation_plan.json and attempt_history.json. Re-exported from core to public API.
QA gating updates
apps/backend/qa/criteria.py, apps/backend/qa/loop.py
Replaces is_build_complete with is_build_ready_for_qa in readiness checks. qa/loop.py additionally prioritizes human feedback: skips readiness check if QA_FIX_REQUEST.md exists, otherwise enforces the new readiness criterion.
CLI messaging
apps/backend/cli/qa_commands.py
Updates QA validation messages to reflect new readiness semantics: "Build not ready for QA" and "All subtasks must reach a terminal state" instead of referencing incomplete subtasks.
Recovery system
apps/backend/services/recovery.py
Enhances mark_subtask_stuck to propagate stuck status to implementation_plan.json by updating matching subtasks to "failed" status with notes.
Test updates
tests/agents/test_agent_flow.py, tests/test_cli_qa_commands.py, tests/test_progress_qa_readiness.py, tests/test_qa_criteria.py, tests/test_recovery.py
Updates all QA-related test patches from is_build_complete to is_build_ready_for_qa. Adds comprehensive test suite for new readiness function (418 lines) covering terminal states, missing files, malformed JSON, and stuck subtask scenarios. Adds tests for plan persistence in recovery system.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested labels

bug, area/backend, size/M

Poem

🐰 A rabbit hops through QA's gate,
No longer waiting for the "complete" state—
Terminal states are now the way,
Stuck or failed, let's QA today!
The plan remembers, the tests align,
Our builds now ready, and that's just fine! 🌿

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately describes the main change: resolving a QA validation deadlock by allowing QA to proceed when subtasks are stuck or failed, which is the core objective of the PR.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into develop.




gemini-code-assist bot left a comment:

Code Review

This pull request effectively resolves a critical deadlock in the QA validation loop by introducing a new is_build_ready_for_qa check that considers failed and stuck subtasks. The logic is sound, and the changes are applied consistently across the relevant modules. The reordering of the QA loop to prioritize human feedback is also a great improvement. The addition of comprehensive unit tests for the new and modified logic is excellent. I have one suggestion regarding data model consistency in services/recovery.py to prevent potential data loss.

Comment on lines +529 to +530
subtask["status"] = "failed"
subtask["notes"] = f"Marked as stuck: {reason}"

Severity: high

The introduction of a new notes field here is inconsistent with the Subtask data model defined in implementation_plan/subtask.py. The Subtask class does not have a notes field, which means this data will be lost if the implementation_plan.json is ever loaded and resaved using the ImplementationPlan model objects.

To maintain data model consistency, I recommend using the existing actual_output field, which seems suitable for this kind of information. The Subtask.fail() method already uses this field for failure reasons.

Please also update the new test test_mark_subtask_stuck_updates_plan in tests/test_recovery.py to assert on the actual_output field instead of notes.

Suggested change
-subtask["status"] = "failed"
-subtask["notes"] = f"Marked as stuck: {reason}"
+subtask["status"] = "failed"
+subtask["actual_output"] = f"Marked as stuck: {reason}"

Comment on apps/backend/core/progress.py (context):

                for entry in attempt_history.get("stuck_subtasks", [])
                if "subtask_id" in entry
            }
        except (OSError, json.JSONDecodeError, UnicodeDecodeError):

Check notice (Code scanning / CodeQL): Empty except (Note)

'except' clause does nothing but pass and there is no explanatory comment.

Copilot Autofix (AI, 1 day ago)

In general, empty except blocks should either be removed (letting the exception propagate) or should perform some explicit handling such as logging, metrics, or substituting a safe default with a clear explanation. If ignoring an exception is truly desired, there should at least be a comment and usually a log statement to document that decision.

For this specific case in is_build_ready_for_qa, we want to keep the existing functional behavior: if attempt_history.json cannot be read or parsed, the function should behave as though there are no stuck subtasks (stuck_subtask_ids remains an empty set) and continue. The best fix is therefore:

  • In the except (OSError, json.JSONDecodeError, UnicodeDecodeError): block around loading attempt_history_file, add a call to the existing logger to record the failure, and optionally a brief doc comment that it is intentionally falling back to assuming no stuck subtasks.
  • Keep returning with the same logic downstream; do not re‑raise.

Only the block at lines 140–149 in apps/backend/core/progress.py needs to change. No new imports are required because logging and logger are already present at the top of the file.

Suggested changeset 1: apps/backend/core/progress.py

Autofix patch: run the following command in your local git repository to apply it.
cat << 'EOF' | git apply
diff --git a/apps/backend/core/progress.py b/apps/backend/core/progress.py
--- a/apps/backend/core/progress.py
+++ b/apps/backend/core/progress.py
@@ -145,8 +145,13 @@
                 for entry in attempt_history.get("stuck_subtasks", [])
                 if "subtask_id" in entry
             }
-        except (OSError, json.JSONDecodeError, UnicodeDecodeError):
-            pass
+        except (OSError, json.JSONDecodeError, UnicodeDecodeError) as exc:
+            # Best-effort: if attempt history cannot be read, proceed assuming no stuck subtasks.
+            logger.warning(
+                "Failed to load attempt history from %s: %s",
+                attempt_history_file,
+                exc,
+            )
 
     try:
         with open(plan_file, encoding="utf-8") as f:
EOF
coderabbitai bot left a comment:

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_qa_criteria.py (1)

74-78: 🧹 Nitpick | 🔵 Trivial

Mock module doesn't explicitly set is_build_ready_for_qa — relies on MagicMock auto-attribute.

The module-level mock at line 77 explicitly sets is_build_complete but not is_build_ready_for_qa. Since qa.criteria now imports is_build_ready_for_qa from progress, it silently gets a MagicMock (truthy by default). Tests that forget to patch qa.criteria.is_build_ready_for_qa would see it return a truthy mock, potentially masking failures.

Consider explicitly setting the default:

Proposed fix
 mock_progress.is_build_complete = MagicMock(return_value=True)
+mock_progress.is_build_ready_for_qa = MagicMock(return_value=True)

Comment on lines +136 to +149

    # Load stuck subtask IDs from attempt_history.json
    stuck_subtask_ids = set()
    attempt_history_file = spec_dir / "memory" / "attempt_history.json"
    if attempt_history_file.exists():
        try:
            with open(attempt_history_file, encoding="utf-8") as f:
                attempt_history = json.load(f)
            stuck_subtask_ids = {
                entry["subtask_id"]
                for entry in attempt_history.get("stuck_subtasks", [])
                if "subtask_id" in entry
            }
        except (OSError, json.JSONDecodeError, UnicodeDecodeError):
            pass

🛠️ Refactor suggestion | 🟠 Major

Duplicated stuck-subtask loading logic — extract a shared helper.

The block that reads attempt_history.json and builds stuck_subtask_ids is duplicated almost verbatim at lines 479–493 in get_next_subtask(). Extracting a shared helper reduces maintenance burden and ensures both callers stay in sync.

Proposed helper
+def _load_stuck_subtask_ids(spec_dir: Path) -> set[str]:
+    """Load IDs of subtasks marked as stuck from attempt_history.json."""
+    stuck_subtask_ids: set[str] = set()
+    attempt_history_file = spec_dir / "memory" / "attempt_history.json"
+    if attempt_history_file.exists():
+        try:
+            with open(attempt_history_file, encoding="utf-8") as f:
+                attempt_history = json.load(f)
+            stuck_subtask_ids = {
+                entry["subtask_id"]
+                for entry in attempt_history.get("stuck_subtasks", [])
+                if "subtask_id" in entry
+            }
+        except (OSError, json.JSONDecodeError, UnicodeDecodeError):
+            pass
+    return stuck_subtask_ids

Then replace both call sites with stuck_subtask_ids = _load_stuck_subtask_ids(spec_dir).

🤖 Prompt for AI Agents
In `@apps/backend/core/progress.py` around lines 136 - 149, The duplicated logic
that reads attempt_history.json and builds stuck_subtask_ids appears in two
places (the shown block and inside get_next_subtask); extract that logic into a
single helper function named _load_stuck_subtask_ids(spec_dir) that returns a
set of subtask IDs, move the try/except, file path construction (spec_dir /
"memory" / "attempt_history.json"), JSON loading, and the comprehension into
that helper, keep the same exception handling (OSError, json.JSONDecodeError,
UnicodeDecodeError) and return an empty set on error, then replace both call
sites with stuck_subtask_ids = _load_stuck_subtask_ids(spec_dir) so both callers
use the shared implementation.

Comment on lines +117 to +133

    # Check if there's pending human feedback that needs to be processed
    fix_request_file = spec_dir / "QA_FIX_REQUEST.md"
    has_human_feedback = fix_request_file.exists()

    # Human feedback takes priority — if the user explicitly asked to proceed,
    # skip the build completeness gate entirely
    if not has_human_feedback:
        # Verify build is ready for QA (all subtasks in terminal state)
        if not is_build_ready_for_qa(spec_dir):
            debug_warning(
                "qa_loop", "Build is not ready for QA - subtasks still in progress"
            )
            print("\n❌ Build is not ready for QA validation.")
            completed, total = count_subtasks(spec_dir)
            debug("qa_loop", "Build progress", completed=completed, total=total)
            print(f" Progress: {completed}/{total} subtasks completed")
            return False

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed against AndyMik90/Auto-Claude to verify the readiness and counting helpers:

#!/bin/bash
# Find the count_subtasks function definition
rg -n 'def count_subtasks' --type=py -A 15

# Also search for is_build_ready_for_qa to understand the terminal state logic
rg -n 'def is_build_ready_for_qa' --type=py -A 20

# Check what files are in the qa directory
find apps/backend/qa -type f -name "*.py" | head -20

# Get full count_subtasks implementation
sed -n '33,65p' apps/backend/core/progress.py

# Get full is_build_ready_for_qa implementation
sed -n '118,170p' apps/backend/core/progress.py

# Check count_subtasks_detailed full implementation
sed -n '66,117p' apps/backend/core/progress.py

# Get the context around line 130 in loop.py
sed -n '115,135p' apps/backend/qa/loop.py

# Check if count_subtasks_detailed is used anywhere in loop.py or related files
rg -n 'count_subtasks_detailed' --type=py apps/backend/qa/


Update the progress message to reflect terminal state semantics.

When the QA readiness gate fails (line 130), the message shows count_subtasks() output which reports only completed subtasks. However, is_build_ready_for_qa() checks for terminal states (completed, failed, or stuck). This creates a mismatch: if 3 of 5 subtasks are completed, 1 failed, and 1 in-progress, the user sees "Progress: 3/5 subtasks completed" — which doesn't explain why QA is blocked.

Either update the message to show terminal/total count (e.g., "4/5 subtasks in terminal state") to align with the gate's semantics, or use count_subtasks_detailed() to show the breakdown of non-terminal subtasks.

🤖 Prompt for AI Agents
In `@apps/backend/qa/loop.py` around lines 117 - 133, The QA readiness failure
message currently prints only completed subtasks using count_subtasks(), which
mismatches is_build_ready_for_qa()'s terminal-state logic; update the failure
branch in qa_loop to call count_subtasks_detailed(spec_dir) (or otherwise
compute terminal states) and print a progress line like "Progress:
{terminal}/{total} subtasks in terminal state" and optionally include a
breakdown (completed/failed/stuck) so the message aligns with
is_build_ready_for_qa() semantics; keep debug_warning("qa_loop", ...) and
debug("qa_loop", ...) as-is but use the detailed counts when formatting the
user-facing print.
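
One possible shape for the adjusted failure branch, expressed as a diff against the snippet quoted above. The return type of count_subtasks_detailed() is not shown anywhere in this PR, so the per-status dict assumed in the added lines is hypothetical; only the message wording follows the suggestion.

-            completed, total = count_subtasks(spec_dir)
-            debug("qa_loop", "Build progress", completed=completed, total=total)
-            print(f" Progress: {completed}/{total} subtasks completed")
+            # Assumed return shape: {"completed": n, "failed": n, "stuck": n, "in_progress": n, ...}
+            counts = count_subtasks_detailed(spec_dir)
+            total = sum(counts.values())
+            terminal = sum(counts.get(s, 0) for s in ("completed", "failed", "stuck"))
+            debug("qa_loop", "Build progress", terminal=terminal, total=total)
+            print(f" Progress: {terminal}/{total} subtasks in terminal state")
+            print(
+                f"   (completed={counts.get('completed', 0)}, "
+                f"failed={counts.get('failed', 0)}, stuck={counts.get('stuck', 0)})"
+            )
             return False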

Comment on lines +517 to +542

        # Also update the subtask status in implementation_plan.json
        # so that other callers (like is_build_ready_for_qa) see accurate status
        try:
            plan_file = self.spec_dir / "implementation_plan.json"
            if plan_file.exists():
                with open(plan_file, encoding="utf-8") as f:
                    plan = json.load(f)

                updated = False
                for phase in plan.get("phases", []):
                    for subtask in phase.get("subtasks", []):
                        if subtask.get("id") == subtask_id:
                            subtask["status"] = "failed"
                            subtask["notes"] = f"Marked as stuck: {reason}"
                            updated = True
                            break
                    if updated:
                        break

                if updated:
                    with open(plan_file, "w", encoding="utf-8") as f:
                        json.dump(plan, f, indent=2)
        except (OSError, json.JSONDecodeError, UnicodeDecodeError) as e:
            logger.warning(
                f"Failed to update implementation_plan.json for stuck subtask {subtask_id}: {e}"
            )

⚠️ Potential issue | 🟡 Minor

Overwrites existing notes field — consider preserving prior notes.

Line 530 unconditionally sets subtask["notes"], discarding any pre-existing value. If subtasks carry notes from earlier stages (e.g., planner annotations), that context is silently lost.

Proposed fix: prepend stuck reason to existing notes
-                            subtask["notes"] = f"Marked as stuck: {reason}"
+                            stuck_note = f"Marked as stuck: {reason}"
+                            existing = subtask.get("notes", "")
+                            subtask["notes"] = (
+                                f"{stuck_note}\n{existing}" if existing else stuck_note
+                            )
🤖 Prompt for AI Agents
In `@apps/backend/services/recovery.py` around lines 517 - 542, The current
implementation in the block that opens implementation_plan.json unconditionally
overwrites subtask["notes"] when marking a subtask failed (in the loop
referencing self.spec_dir, subtask_id and plan["phases"]/subtasks), which loses
prior notes; change it to preserve existing notes by prepending or appending the
"Marked as stuck: {reason}" text to the existing subtask.get("notes", "") (e.g.,
build a new_notes variable that combines the new marker and the prior notes with
a separator only if prior notes exist), then set subtask["notes"] = new_notes
before writing the updated plan back to disk, leaving the rest of the error
handling (logger.warning on exceptions) intact.

AndyMik90 self-assigned this on Feb 16, 2026

AndyMik90 (Owner, Author) left a comment

🤖 Auto Claude PR Review

Merge Verdict: 🟠 NEEDS REVISION

🟠 Needs revision - 2 issue(s) require attention.

2 issue(s) must be addressed (0 required, 2 recommended)

Risk Assessment

Factor | Level | Notes
Complexity | High | Based on lines changed
Security Impact | None | Based on security findings
Scope Coherence | Good | Based on structural review

Findings Summary

  • Medium: 2 issue(s)

Generated by Auto Claude PR Review

Findings (2 selected of 2 total)

🟡 [2d0df14a360e] [MEDIUM] Duplicated stuck-subtask-loading logic within same file

📁 apps/backend/core/progress.py:136

The new is_build_ready_for_qa() (lines 136-149) duplicates the same ~15 lines of code that already exist in get_next_subtask() (lines 478-493) within the same file: both load stuck subtask IDs from spec_dir / 'memory' / 'attempt_history.json' with identical try/except/set-comprehension logic, filtering entries by the presence of a subtask_id key and returning a set. This should be extracted into a private helper such as _load_stuck_subtask_ids(spec_dir: Path) -> set[str] so that any future change to stuck-subtask detection is made in one place.

Suggested fix:

Extract a private helper at the top of the file:

def _load_stuck_subtask_ids(spec_dir: Path) -> set[str]:
    """Load IDs of subtasks marked as stuck from attempt_history.json."""
    attempt_history_file = spec_dir / "memory" / "attempt_history.json"
    if not attempt_history_file.exists():
        return set()
    try:
        with open(attempt_history_file, encoding="utf-8") as f:
            attempt_history = json.load(f)
        return {
            entry["subtask_id"]
            for entry in attempt_history.get("stuck_subtasks", [])
            if "subtask_id" in entry
        }
    except (OSError, json.JSONDecodeError, UnicodeDecodeError):
        return set()

Then use it in both `is_build_ready_for_qa()` and `get_next_subtask()`.

🟡 [193ab14ede41] [MEDIUM] Plan file written with raw json.dump instead of write_json_atomic

📁 apps/backend/services/recovery.py:537

The new code writes implementation_plan.json using raw open() + json.dump(), but the established codebase pattern for writing this specific file is to use write_json_atomic from core.file_utils. This utility prevents file corruption by writing to a temp file first and atomically replacing. Searched for write_json_atomic usage — found 8+ call sites across agents/session.py, agents/tools_pkg/tools/subtask.py, agents/tools_pkg/tools/qa.py, implementation_plan/plan.py, and spec/validate_pkg/auto_fix.py — all using it for plan file writes. The recovery.py file's own internal files (attempt_history.json, build_commits.json) use raw writes which is fine since they're less critical, but implementation_plan.json is a shared coordination file read by multiple subsystems and should use the atomic write pattern.

Suggested fix:

Import and use write_json_atomic:

from core.file_utils import write_json_atomic

# Then replace the write block:
if updated:
    write_json_atomic(plan_file, plan, indent=2)
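
For readers unfamiliar with the pattern, a generic sketch of what an atomic JSON write usually looks like (write to a temp file in the same directory, then os.replace); the actual behavior and signature of core.file_utils.write_json_atomic are defined by the codebase, and the helper name below is hypothetical.

import json
import os
import tempfile
from pathlib import Path


def write_json_atomic_sketch(path: Path, data: dict, indent: int = 2) -> None:
    """Write JSON to a sibling temp file, then atomically replace the target path."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=indent)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp_name, path)  # atomic on the same filesystem
    except BaseException:
        os.unlink(tmp_name)
        raise

The point of the pattern is that readers of a shared coordination file like implementation_plan.json never observe a half-written file; the project's existing helper should be preferred over this sketch.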

This review was generated by Auto Claude.
