fix: scope OpenSpec sentinel per-change to prevent stale task queue #151

Open
vishnujayvel wants to merge 1 commit into asklokesh:main from vishnujayvel:fix/openspec-stale-queue

Conversation

@vishnujayvel

Problem

When using OpenSpec with Loki Mode across multiple change proposals in the same repository, the OpenSpec task queue from a previous change persists into subsequent Loki runs -- even when the new run targets a completely different change or uses no --openspec flag at all.

This causes agents to silently work on wrong tasks, wasting tokens and creating incorrect PRs.

Root cause (3 parts)

  1. Boolean sentinel: .loki/queue/.openspec-populated is a touch file -- it knows whether OpenSpec tasks were loaded, but not which change they came from. populate_openspec_queue() in run.sh (line 8744) checks only [[ -f ".loki/queue/.openspec-populated" ]].

  2. No cleanup between runs: The CLI (autonomy/loki) overwrites openspec-tasks.json when --openspec is provided, but does NOT clear the sentinel or remove stale tasks from pending.json. When --openspec is absent entirely, nothing cleans up leftover OpenSpec state.

  3. Task ID collisions: All OpenSpec changes produce IDs in openspec-N.M format (e.g., openspec-1.1, openspec-2.3). The deduplication check in populate_openspec_queue() uses these IDs, so when switching changes, new tasks with colliding IDs are silently blocked from loading.
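
The collision in root cause 3 can be reproduced with a minimal sketch of ID-based deduplication. This is an illustration only, not the code in run.sh: the populate() helper, the task titles, and the assumption that the queue is a flat list of task objects are all hypothetical.

```python
def populate(pending, new_tasks):
    """Append tasks whose IDs are not already queued (ID-based dedup).

    Hypothetical helper: models the dedup check, not the real run.sh logic.
    """
    seen = {task["id"] for task in pending}
    added = []
    for task in new_tasks:
        if task["id"] in seen:
            # Treated as a duplicate even though it belongs to another change.
            continue
        pending.append(task)
        seen.add(task["id"])
        added.append(task["id"])
    return added


# A stale task from change A is still queued (title is made up).
pending = [{"id": "openspec-1.1", "title": "Resize windows"}]

# Change B reuses the same N.M numbering.
change_b = [
    {"id": "openspec-1.1", "title": "Add dark mode toggle"},
    {"id": "openspec-1.2", "title": "Persist theme"},
]

added = populate(pending, change_b)
print(added)                # ['openspec-1.2']
print(pending[0]["title"])  # Resize windows
```

Change B's task 1.1 never reaches the queue, while change A's stale 1.1 stays at the head of it.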

Reproduction

# Run 1: Load 55 tasks from change A
loki start --openspec ./openspec/changes/window-management-redesign

# Run 2: Different task entirely, no --openspec
loki start /tmp/settings-menu-prd.md
# BUG: 55 stale window-management tasks still in pending.json, served to agents

Impact

  • Severity: High -- agents silently work on wrong tasks
  • Frequency: Every time a user runs Loki more than once in a repo that has used --openspec
  • Workaround: Manually delete .loki/queue/, .loki/openspec-tasks.json, and .loki/openspec/ between runs

Solution

Three-pronged fix addressing each root cause:

Fix 1: Scoped sentinel (autonomy/run.sh)

The sentinel file now stores the full change path instead of being an empty marker. populate_openspec_queue() reads the stored path and compares it against the current $OPENSPEC_CHANGE_PATH:

  • Match: Skip (same change, already populated)
  • Mismatch: Purge all source: "openspec" tasks from pending.json, then repopulate
  • Missing: Fresh population (first run)

# Before (boolean check)
if [[ -f ".loki/queue/.openspec-populated" ]]; then
    return 0  # always skips, regardless of which change
fi
touch ".loki/queue/.openspec-populated"

# After (scoped check)
stored_change="$(cat ".loki/queue/.openspec-populated" 2>/dev/null)"
if [[ "$stored_change" == "$OPENSPEC_CHANGE_PATH" ]]; then
    return 0  # only skips if same change
fi
# ... purge stale tasks, then repopulate ...
echo "$OPENSPEC_CHANGE_PATH" > ".loki/queue/.openspec-populated"
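
The three outcomes listed above can also be modeled outside the shell. The following is a rough Python sketch of that decision, assuming the sentinel holds nothing but the change path; sentinel_action() and its return labels are hypothetical names, not part of the PR.

```python
import os
import tempfile


def sentinel_action(sentinel_path, current_change):
    """Return 'populate', 'skip', or 'purge_and_repopulate' (hypothetical labels)."""
    if not os.path.isfile(sentinel_path):
        return "populate"  # Missing: first run
    with open(sentinel_path) as fh:
        stored = fh.read().rstrip("\n")
    if stored == current_change:
        return "skip"  # Match: same change, already populated
    return "purge_and_repopulate"  # Mismatch: change switched


with tempfile.TemporaryDirectory() as tmp:
    sentinel = os.path.join(tmp, ".openspec-populated")
    first = sentinel_action(sentinel, "changes/A")

    # An old-style empty touch file never equals a real change path.
    open(sentinel, "w").close()
    legacy = sentinel_action(sentinel, "changes/A")

    with open(sentinel, "w") as fh:
        fh.write("changes/A\n")
    same = sentinel_action(sentinel, "changes/A")
    switched = sentinel_action(sentinel, "changes/B")

print(first, legacy, same, switched)
# populate purge_and_repopulate skip purge_and_repopulate
```

An empty sentinel left behind by the old touch-based code falls into the mismatch branch, so upgrading triggers a reload rather than a silent skip.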

Fix 2: Stale state cleanup (autonomy/loki)

Two new cleanup paths in cmd_start():

  • When --openspec IS provided: Clear the sentinel before running the adapter, so run.sh will repopulate for the current change
  • When --openspec is NOT provided: Proactively remove all leftover OpenSpec artifacts (sentinel, tasks JSON, normalized PRD, delta context directory) and purge any stale OpenSpec tasks from pending.json

Fix 3: Change-scoped task IDs (autonomy/openspec-adapter.py)

parse_tasks() now accepts a change_name parameter and produces IDs in the format openspec-{change_name}-N.M instead of openspec-N.M. This prevents cross-change collisions at the deduplication layer.

# Before
task_id = f"openspec-{task_id_num}"  # openspec-1.1

# After
task_id = f"openspec-{change_name}-{task_id_num}"  # openspec-add-dark-mode-1.1

Files changed (5)

| File | Lines | Change |
| --- | --- | --- |
| autonomy/run.sh | +33/-5 | Sentinel stores change path; compares on read; purges stale tasks on mismatch |
| autonomy/loki | +30/-0 | Clears sentinel before adapter; full OpenSpec cleanup when --openspec absent |
| autonomy/openspec-adapter.py | +10/-3 | parse_tasks() accepts change_name, scopes task IDs per-change |
| tests/test_openspec_adapter.py | +13/-5 | Updated test_task_ids_hierarchical for new ID format |
| skills/openspec-integration.md | +8/-4 | Updated documentation examples with new ID format |

Backward compatibility

The purge strategy filters by source == "openspec" (a metadata field), not by ID prefix. This means:

  • Old-format tasks (openspec-N.M) are correctly purged
  • New-format tasks (openspec-{name}-N.M) are correctly purged
  • No ID format parsing is needed during purge
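
As a minimal sketch of why filtering on the metadata field covers both ID formats, assuming the queue is a flat list of task objects (the real pending.json layout may differ). The purge_openspec() helper, the "prd" source value, and the non-OpenSpec IDs are made up for illustration.

```python
def purge_openspec(tasks):
    """Keep every task whose source field is not 'openspec' (hypothetical helper)."""
    return [task for task in tasks if task.get("source") != "openspec"]


queue = [
    {"id": "openspec-1.1", "source": "openspec"},                # old ID format
    {"id": "openspec-add-dark-mode-1.1", "source": "openspec"},  # new ID format
    {"id": "prd-7", "source": "prd"},                            # hypothetical non-OpenSpec task
    {"id": "manual-1"},                                          # no source field at all
]

remaining = purge_openspec(queue)
print([task["id"] for task in remaining])  # ['prd-7', 'manual-1']
```

The filter never inspects the ID string, so a future change to the ID format would not affect the purge.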

When change_name is empty (e.g., direct parse_tasks() call without context), the old openspec-N.M format is preserved as a fallback.
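
The ID scheme and its fallback could look roughly like the sketch below. make_task_id() is a hypothetical standalone helper used only to show the two formats; in the PR this logic lives inside parse_tasks().

```python
def make_task_id(task_id_num, change_name=""):
    """Build a task ID, scoping it by change name when one is available."""
    if change_name:
        return f"openspec-{change_name}-{task_id_num}"
    # Fallback: no change context, keep the old format.
    return f"openspec-{task_id_num}"


print(make_task_id("1.1", "add-dark-mode"))  # openspec-add-dark-mode-1.1
print(make_task_id("1.1"))                   # openspec-1.1
```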

Test plan

  • bash -n autonomy/run.sh -- shell syntax valid
  • bash -n autonomy/loki -- shell syntax valid
  • python3 -c "import ast; ast.parse(...)" -- Python syntax valid
  • pytest tests/test_openspec_adapter.py -- 28/28 tests pass
  • test_task_ids_hierarchical updated and passes with new ID format
  • Manual: loki start --openspec ./changes/A then loki start --openspec ./changes/B -- verify queue purged and repopulated
  • Manual: loki start --openspec ./changes/A then loki start prd.md (no --openspec) -- verify all OpenSpec artifacts cleaned up

Generated with Claude Code

@vishnujayvel vishnujayvel requested a review from asklokesh as a code owner April 6, 2026 21:15
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

All contributors have signed the CLA. Thank you.
Posted by the CLA Assistant Lite bot.

@vishnujayvel vishnujayvel force-pushed the fix/openspec-stale-queue branch from b28c63f to 53ff1d7 Compare April 6, 2026 21:22
@vishnujayvel
Author

I have read the CLA Document and I hereby sign the CLA

@vishnujayvel
Author

Decision Tree: Sentinel Logic Across All Scenarios

This diagram shows how the fix handles every combination of --openspec presence, sentinel state, and restart conditions. Each leaf node shows the outcome and why it's correct.

flowchart TD
    START["loki start invoked"] --> HAS_FLAG{"--openspec\nprovided?"}

    %% === No --openspec branch ===
    HAS_FLAG -->|No| CLEANUP["CLI cleanup block\n(autonomy/loki:941-965)"]
    CLEANUP --> DEL_SENTINEL["Delete sentinel\nDelete openspec-tasks.json\nDelete openspec/ dir"]
    DEL_SENTINEL --> PURGE_PENDING["Purge source:openspec\nfrom pending.json"]
    PURGE_PENDING --> RUN_SH_NO["run.sh starts\npopulate_openspec_queue()"]
    RUN_SH_NO --> CHECK_TASKS_FILE{"openspec-tasks.json\nexists?"}
    CHECK_TASKS_FILE -->|"No (deleted)"| SKIP_CLEAN["Return early\n--- No stale tasks served ---"]

    %% === --openspec provided branch ===
    HAS_FLAG -->|Yes| EXPORT["Export OPENSPEC_CHANGE_PATH\nRun adapter (regenerates artifacts)"]
    EXPORT --> RUN_SH_YES["run.sh starts\npopulate_openspec_queue()"]
    RUN_SH_YES --> CHECK_TASKS_YES{"openspec-tasks.json\nexists?"}
    CHECK_TASKS_YES -->|No| SKIP_NO_TASKS["Return early\n(adapter error?)"]
    CHECK_TASKS_YES -->|Yes| CHECK_SENTINEL{"Sentinel file\nexists?"}

    %% --- No sentinel (first ever run) ---
    CHECK_SENTINEL -->|No| POPULATE["Populate pending.json\nfrom openspec-tasks.json"]
    POPULATE --> WRITE_SENTINEL["Write change path\nto sentinel"]
    WRITE_SENTINEL --> DONE_FRESH["--- Fresh population done ---"]

    %% --- Sentinel exists ---
    CHECK_SENTINEL -->|Yes| READ_SENTINEL["Read stored path\nfrom sentinel"]
    READ_SENTINEL --> COMPARE{"Stored path ==\ncurrent path?"}

    %% Same change (crash-restart)
    COMPARE -->|"Match\n(same change)"| SKIP_RESUME["Skip repopulation\n--- Progress preserved ---\nCompleted tasks stay gone\nfrom pending.json"]

    %% Different change
    COMPARE -->|"Mismatch\n(change switched)"| PURGE_OLD["Purge all source:openspec\ntasks from pending.json"]
    PURGE_OLD --> REPOPULATE["Repopulate from new\nopenspec-tasks.json"]
    REPOPULATE --> UPDATE_SENTINEL["Overwrite sentinel\nwith new path"]
    UPDATE_SENTINEL --> DONE_SWITCH["--- Clean switch done ---\nOld tasks gone, new loaded"]

    %% Styling
    style SKIP_CLEAN fill:#2d6a2d,color:#fff
    style DONE_FRESH fill:#2d6a2d,color:#fff
    style SKIP_RESUME fill:#2d6a2d,color:#fff
    style DONE_SWITCH fill:#2d6a2d,color:#fff
    style SKIP_NO_TASKS fill:#8b6914,color:#fff

Scenario Coverage Matrix

| # | Scenario | Sentinel before | Sentinel after | pending.json effect | Correct? |
| --- | --- | --- | --- | --- | --- |
| 1 | First run with --openspec A | Missing | path/to/A | 55 tasks loaded | Yes |
| 2 | Crash-restart, same --openspec A | path/to/A | path/to/A (unchanged) | Untouched -- 35 remaining tasks preserved, 20 completed stay gone | Yes |
| 3 | Switch to --openspec B | path/to/A | path/to/B | Old A tasks purged, new B tasks loaded | Yes |
| 4 | No --openspec after previous run | path/to/A | Deleted | All openspec tasks purged from queue | Yes |
| 5 | No --openspec, never used openspec | Missing | Missing | No-op (openspec-tasks.json doesn't exist) | Yes |
| 6 | Direct run.sh invocation (bypass CLI) | Any | Handled by run.sh | Scoped comparison still works standalone | Yes |

Why the sentinel is NOT deleted when --openspec is provided

The sentinel doubles as a progress checkpoint. On crash-restart (scenario 2):

Run 1: 55 tasks loaded → 20 completed (removed from pending.json) → crash
Run 2: Same --openspec flag
  - Sentinel matches → skip repopulation
  - pending.json still has 35 remaining tasks
  - The 20 completed tasks are NOT re-added

If we deleted the sentinel in the CLI (our earlier approach), run.sh would repopulate from openspec-tasks.json (which still lists all 55 as pending). The 20 completed tasks would be re-queued because their IDs are no longer in pending.json's dedup set. Agents would redo completed work.
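
The crash-restart arithmetic above can be checked with a small simulation. This is a sketch under the assumption that dedup only looks at the IDs currently in pending.json; repopulate() and the generated task IDs are hypothetical.

```python
def repopulate(pending, all_tasks):
    """Re-add every task whose ID is not currently queued (hypothetical model)."""
    queued = {task["id"] for task in pending}
    for task in all_tasks:
        if task["id"] not in queued:
            pending.append(task)
    return pending


# openspec-tasks.json still lists all 55 tasks as pending.
all_tasks = [{"id": f"openspec-A-{n}"} for n in range(1, 56)]

# Run 1 completed the first 20 and then crashed.
pending = list(all_tasks[20:])

# Sentinel kept: repopulation is skipped, so the queue is untouched.
kept = len(pending)

# Sentinel deleted: repopulation runs and the completed tasks come back.
requeued = len(repopulate(list(pending), all_tasks))

print(kept, requeued)  # 35 55
```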

Task ID collision prevention

Old format: openspec-1.1 (collides across changes)
New format: openspec-{change-name}-1.1 (scoped per change)

Even if the sentinel logic were bypassed somehow, ID-level scoping prevents cross-change task confusion at the dedup layer.

@asklokesh
Owner

Hey @vishnujayvel -- great contribution! The root cause analysis is thorough, the three-pronged fix is well-structured, and the decision tree comment you posted is genuinely excellent documentation. The crash-restart fix in the second commit shows good attention to edge cases. Thank you for taking this on.

That said, I found a few issues during review that I'd like addressed before merging. Tagging you to take a look.


Requested Changes

1. Purge only targets pending.json, misses completed/in-progress (bug)

The Python purge blocks only filter pending.json. If OpenSpec tasks were already moved to completed.json or in-progress.json during a previous run, those ghost entries survive the change switch. Status reports and progress tracking would show phantom completed tasks from a completely different change.

The purge needs to clean all three queues (pending, completed, in-progress), not just pending.

2. No way to reload tasks after editing tasks.md for the same change (design gap)

The CLI always re-runs the adapter (regenerating openspec-tasks.json), then run.sh checks the sentinel, sees the same path, and skips loading. If a user edits tasks.md within the same change directory and re-runs loki start --openspec ./changes/A, the adapter produces updated output but it never gets loaded into pending.json. Edited tasks are silently ignored.

Your scenario matrix (scenario 2) accounts for crash-restart, but not intentional re-runs after editing. Consider either:

  • Adding a --force-reload flag that deletes the sentinel before run.sh starts
  • Storing a hash of openspec-tasks.json alongside the path in the sentinel, so content changes are detected

3. 2>/dev/null silently swallows purge failures (serious)

Both purge blocks redirect stderr to /dev/null. If python3 isn't on PATH, or pending.json contains malformed JSON, the purge silently fails. Stale tasks remain in the queue and agents work on wrong tasks -- the exact bug this PR is fixing. The safety net has a hole in it.

At minimum, let errors through to stderr so failures are visible:

# Instead of:
..." 2>/dev/null

# Use:
..." || log_warn "OpenSpec queue cleanup encountered an error"

4. Duplicated purge logic will diverge

The Python purge snippet appears in both autonomy/loki (lines 949-963) and autonomy/run.sh (lines 8758-8770). They're nearly identical but already differ in path construction (one uses os.path.join with an env var, the other uses a hardcoded relative path). Consider extracting to a shared script like autonomy/openspec-purge-queue.py that both locations call.


Suggestions (non-blocking)

  • rm -rf "$LOKI_DIR/openspec/" when --openspec is absent nukes all adapter output (normalized PRD, design docs, scenarios). If a user runs a quick non-openspec task then comes back to the same change, everything must be regenerated. The sentinel deletion already prevents stale task loading; consider keeping the adapter output as a cache.

  • The file=__import__('sys').stderr pattern in inline Python is clever but hard to read. A plain import sys; print(..., file=sys.stderr) is clearer.

  • The PR mentions BMAD and MiroFish have the same sentinel pattern. Worth filing follow-up issues so they don't get forgotten.


Overall this is solid work on a high-severity bug. Looking forward to the next revision. Thanks again for the contribution!

…e-wide purge

Problem: The OpenSpec sentinel (.openspec-populated) was a boolean touch
file that tracked whether tasks were loaded, but not which change or
content. Switching between OpenSpec changes or editing tasks.md left
stale tasks in the queue. Agents would silently execute the wrong plan.

Fix (three layers of defense):

1. Scoped sentinel with content hash -- sentinel stores change path
   (line 1) and md5 hash of openspec-tasks.json (line 2). Detects both
   change switches and tasks.md edits. Backward compatible with old
   single-line format (triggers safe reload on upgrade).

2. Queue-wide jq purge -- new purge_openspec_from_queue() function
   cleans all three queue files (pending, completed, in-progress) using
   jq. Replaces inline Python, removes python3 dependency for purge
   path. Errors surface via log_warn instead of being swallowed.

3. Scoped task IDs -- task IDs now include the change name
   (openspec-{change}-N.M) to prevent dedup collisions across changes.

Design decision: when --openspec is not passed, existing OpenSpec state
is left untouched. Purge only triggers when --openspec IS passed with a
different change path or different content hash.

Tests: 31 shell integration tests (test-openspec-sentinel.sh) covering
all state transitions + 28 existing adapter unit tests passing.

Closes: asklokesh#150
Follow-ups filed: asklokesh#154 (BMAD sentinel), asklokesh#155 (MiroFish sentinel)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@vishnujayvel vishnujayvel force-pushed the fix/openspec-stale-queue branch from 16669f1 to 5029fa6 Compare April 8, 2026 23:18
@vishnujayvel
Author

Review Response (v2)

Thanks for the thorough review @asklokesh! All 4 requested changes addressed, rebased on v6.75.3, squashed to a single commit.


1. Purge now cleans all 3 queue files

New purge_openspec_from_queue() function (jq-based, no Python dependency) is called on pending.json, completed.json, and in-progress.json. No more ghost tasks in status reports or progress tracking.

2. Content hash detects tasks.md edits

Sentinel now stores two lines: change path (line 1) and md5 hash of openspec-tasks.json (line 2). Editing tasks.md and re-running with the same --openspec path triggers automatic purge + reload.

  • Cross-platform: md5sum (Linux) / md5 -q (macOS) / "none" fallback
  • Backward compatible: old single-line sentinels trigger a safe reload (empty hash != any real hash)
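
A rough Python model of the two-line sentinel, shown only to illustrate the comparison; the PR implements this in shell with md5sum / md5 -q. The function names are hypothetical, and the "none" fallback is not modeled here.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """md5 hex digest, used here purely as a change detector."""
    return hashlib.md5(data).hexdigest()


def format_sentinel(change_path, tasks_hash):
    """Line 1: change path. Line 2: hash of openspec-tasks.json."""
    return f"{change_path}\n{tasks_hash}\n"


def parse_sentinel(text):
    lines = text.splitlines()
    stored_path = lines[0] if lines else ""
    stored_hash = lines[1] if len(lines) > 1 else ""  # legacy: no second line
    return stored_path, stored_hash


def needs_reload(sentinel_text, change_path, tasks_hash):
    stored_path, stored_hash = parse_sentinel(sentinel_text)
    return stored_path != change_path or stored_hash != tasks_hash


h1 = content_hash(b'[{"id": "openspec-A-1.1"}]')
h2 = content_hash(b'[{"id": "openspec-A-1.1"}, {"id": "openspec-A-1.2"}]')
sentinel = format_sentinel("changes/A", h1)

print(needs_reload(sentinel, "changes/A", h1))        # False: crash-restart
print(needs_reload(sentinel, "changes/A", h2))        # True: tasks edited
print(needs_reload(sentinel, "changes/B", h1))        # True: change switched
print(needs_reload("changes/A\n", "changes/A", h1))   # True: legacy one-line sentinel
```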

3. No more silent error swallowing

Replaced all inline Python purge blocks with the jq-based shell function. jq stderr goes to a separate .err file (not mixed with JSON output). Errors surface via log_warn. No 2>/dev/null on purge paths.
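
To make items 1 and 3 concrete, here is a Python sketch of a purge that walks all three queue files and reports failures instead of hiding them. It is not the jq-based purge_openspec_from_queue() from the PR: purge_queue_dir() is a hypothetical stand-in, and it assumes each queue file holds a flat JSON list of task objects.

```python
import json
import os
import tempfile

QUEUE_FILES = ("pending.json", "completed.json", "in-progress.json")


def purge_queue_dir(queue_dir):
    """Drop source == 'openspec' tasks from each queue file; return warnings."""
    warnings = []
    for name in QUEUE_FILES:
        path = os.path.join(queue_dir, name)
        if not os.path.isfile(path):
            continue  # nonexistent queue file: nothing to purge
        try:
            with open(path) as fh:
                tasks = json.load(fh)
            kept = [t for t in tasks if t.get("source") != "openspec"]
        except (json.JSONDecodeError, AttributeError, TypeError) as exc:
            # Surface the failure and leave the file untouched.
            warnings.append(f"{name}: purge failed ({exc})")
            continue
        with open(path, "w") as fh:
            json.dump(kept, fh)
    return warnings


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "pending.json"), "w") as fh:
        json.dump([{"id": "a", "source": "openspec"},
                   {"id": "b", "source": "prd"}], fh)
    with open(os.path.join(tmp, "completed.json"), "w") as fh:
        fh.write("{not json")  # malformed on purpose

    warnings = purge_queue_dir(tmp)
    with open(os.path.join(tmp, "pending.json")) as fh:
        pending_after = json.load(fh)

print([t["id"] for t in pending_after])  # ['b']
print(len(warnings))                     # 1
```

Returning the warnings to the caller mirrors the log_warn approach: a malformed file produces a visible message and is left as it was, while the other queue files are still cleaned.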

4. Single purge function, no duplication

purge_openspec_from_queue() defined once in run.sh. The CLI cleanup block in autonomy/loki was removed entirely.

Design decision: When --openspec is not passed, existing state is left untouched. populate_openspec_queue() has an early-return guard for empty OPENSPEC_CHANGE_PATH. Not passing a flag does not imply "undo previous work."

Non-blocking suggestions addressed

| Suggestion | Response |
| --- | --- |
| Keep adapter output as cache | Addressed -- no-openspec runs leave all state untouched |
| __import__('sys') readability | Moot -- inline Python removed, replaced with jq |
| File BMAD/MiroFish issues | Done: #154 (BMAD), #155 (MiroFish) |

Additional improvements (beyond requested)

  • Documentation: Updated skills/openspec-integration.md with Queue State Management section (sentinel format, state transitions table, purge behavior) and corrected complexity threshold (21+/11+ matching > 20/> 10 in code)
  • Shell integration tests: Added tests/test-openspec-sentinel.sh (36 tests, 13 scenarios) following the existing test-task-queue.sh pattern

Test coverage

28 unit tests (tests/test_openspec_adapter.py) + 36 integration tests (tests/test-openspec-sentinel.sh) = 64 total, all passing

Scenarios covered:

| # | Scenario | Tests |
| --- | --- | --- |
| 1 | Fresh run (no sentinel) | 3 |
| 2 | Crash-restart (path + hash match) | 2 |
| 3 | Change switch (purge all 3 queues) | 9 |
| 4 | Content edit (hash mismatch) | 1 |
| 5 | No --openspec (don't touch) | 3 |
| 6 | Direct run.sh (bypass CLI) | 1 |
| 7 | Legacy sentinel (backward compat) | 2 |
| 8 | Malformed JSON error handling | 2 |
| 9 | Empty queue files | 3 |
| 10 | Nonexistent queue file | 3 |
| 11 | Task ID scoping | 1 |
| 12 | Mixed-source preservation | 3 |
| 13 | Empty OPENSPEC_CHANGE_PATH | 5 |
