102 changes: 102 additions & 0 deletions commands/run/execute.md
@@ -11,6 +11,8 @@ Only run this step if gsd_route matches any of these (prefixed or unprefixed):
- `quick --full` or `gsd:quick --full`
- `plan-phase` or `gsd:plan-phase`

`plan-phase` follows the same lifecycle as `quick --full` (init → plan → check → execute → verify → publish) so it is handled here with FULL_MODE forced on.

**Retry loop initialization:**
@@ -175,6 +177,25 @@ dh.afterAgentSpawn({
" 2>/dev/null || true
```

4. **Checkpoint: record plan completion (atomic write):**
```bash
# Checkpoint: record plan completion and set resume to plan-check or execution.
# All checkpoint writes use atomicWriteJson() — write to .tmp then rename.
node -e "
const { updateCheckpoint } = require('${REPO_ROOT}/lib/state.cjs');
updateCheckpoint(${ISSUE_NUMBER}, {
pipeline_step: 'plan',
step_progress: { plan_path: '${QUICK_DIR}/${next_num}-PLAN.md', plan_checked: false, revision_count: 0 },
last_agent_output: '${QUICK_DIR}/${next_num}-PLAN.md',
artifacts: [{ path: '${QUICK_DIR}/${next_num}-PLAN.md', type: 'plan', created_at: new Date().toISOString() }],
step_history: [{ step: 'plan', completed_at: new Date().toISOString(), agent_type: 'gsd-planner', output_path: '${QUICK_DIR}/${next_num}-PLAN.md' }],
resume: { action: '${FULL_MODE ? \"run-plan-checker\" : \"spawn-executor\"}', context: { quick_dir: '${QUICK_DIR}', plan_num: ${next_num} } }
});
" 2>/dev/null || true
```

4b. **Publish plan comment (non-blocking):**

4. **Publish plan comment (non-blocking):**
@@ -359,6 +380,24 @@ dh.afterAgentSpawn({
" 2>/dev/null || true
```

8. **Checkpoint: record execution completion (atomic write):**
```bash
# Checkpoint: record execution completion and set resume to verification or PR creation.
node -e "
const { updateCheckpoint } = require('${REPO_ROOT}/lib/state.cjs');
updateCheckpoint(${ISSUE_NUMBER}, {
pipeline_step: 'execute',
step_progress: { tasks_completed: ${TASK_COUNT}, tasks_total: ${TASK_COUNT} },
last_agent_output: '${QUICK_DIR}/${next_num}-SUMMARY.md',
artifacts: [{ path: '${QUICK_DIR}/${next_num}-SUMMARY.md', type: 'summary', created_at: new Date().toISOString() }],
step_history: [{ step: 'execute', completed_at: new Date().toISOString(), agent_type: 'gsd-executor', output_path: '${QUICK_DIR}/${next_num}-SUMMARY.md' }],
resume: { action: '${FULL_MODE ? \"spawn-verifier\" : \"create-pr\"}', context: { quick_dir: '${QUICK_DIR}', plan_num: ${next_num} } }
});
" 2>/dev/null || true
```

8b. **Publish summary comment (non-blocking):**

8. **Publish summary comment (non-blocking):**
@@ -461,6 +500,24 @@ dh.afterAgentSpawn({
" 2>/dev/null || true
```

10. **Checkpoint: record verification completion (atomic write, --full only):**
```bash
# Checkpoint: record verification completion and set resume to PR creation.
node -e "
const { updateCheckpoint } = require('${REPO_ROOT}/lib/state.cjs');
updateCheckpoint(${ISSUE_NUMBER}, {
pipeline_step: 'verify',
step_progress: { verification_path: '${QUICK_DIR}/${next_num}-VERIFICATION.md', must_haves_checked: true },
last_agent_output: '${QUICK_DIR}/${next_num}-VERIFICATION.md',
artifacts: [{ path: '${QUICK_DIR}/${next_num}-VERIFICATION.md', type: 'verification', created_at: new Date().toISOString() }],
step_history: [{ step: 'verify', completed_at: new Date().toISOString(), agent_type: 'gsd-verifier', output_path: '${QUICK_DIR}/${next_num}-VERIFICATION.md' }],
resume: { action: 'create-pr', context: { quick_dir: '${QUICK_DIR}', plan_num: ${next_num} } }
});
" 2>/dev/null || true
```

10b. **Publish verification comment (non-blocking, --full only):**

10. **Publish verification comment (non-blocking, --full only):**
@@ -817,6 +874,21 @@ fi
" 2>/dev/null || true
```

**b1b. Checkpoint: record milestone phase plan completion (atomic write):**
```bash
# Checkpoint: record milestone plan completion for this phase.
node -e "
const { updateCheckpoint } = require('${REPO_ROOT}/lib/state.cjs');
updateCheckpoint(${ISSUE_NUMBER}, {
pipeline_step: 'plan',
step_progress: { gsd_phase: ${PHASE_NUMBER}, plan_path: '${phase_dir}' },
last_agent_output: '${phase_dir}',
step_history: [{ step: 'plan', completed_at: new Date().toISOString(), agent_type: 'gsd-planner', output_path: '${phase_dir}' }],
resume: { action: 'execute-phase', context: { phase_number: ${PHASE_NUMBER}, phase_dir: '${phase_dir}' } }
});
" 2>/dev/null || true
```

**b2. Publish plan comment (non-blocking):**
```bash
PLAN_FILES=$(ls ${phase_dir}/*-PLAN.md 2>/dev/null)
@@ -909,6 +981,21 @@ fi
" 2>/dev/null || true
```

**d1b. Checkpoint: record milestone phase execution completion (atomic write):**
```bash
# Checkpoint: record milestone execution completion for this phase.
node -e "
const { updateCheckpoint } = require('${REPO_ROOT}/lib/state.cjs');
updateCheckpoint(${ISSUE_NUMBER}, {
pipeline_step: 'execute',
step_progress: { gsd_phase: ${PHASE_NUMBER} },
last_agent_output: '${phase_dir}',
step_history: [{ step: 'execute', completed_at: new Date().toISOString(), agent_type: 'gsd-executor', output_path: '${phase_dir}' }],
resume: { action: 'verify-phase', context: { phase_number: ${PHASE_NUMBER}, phase_dir: '${phase_dir}' } }
});
" 2>/dev/null || true
```

**d2. Publish summary comment (non-blocking):**
```bash
SUMMARY_FILES=$(ls ${phase_dir}/*-SUMMARY.md 2>/dev/null)
@@ -997,6 +1084,21 @@ fi
" 2>/dev/null || true
```

**e1b. Checkpoint: record milestone phase verification completion (atomic write):**
```bash
# Checkpoint: record milestone verification completion for this phase.
node -e "
const { updateCheckpoint } = require('${REPO_ROOT}/lib/state.cjs');
updateCheckpoint(${ISSUE_NUMBER}, {
pipeline_step: 'verify',
step_progress: { gsd_phase: ${PHASE_NUMBER}, verification_complete: true },
last_agent_output: '${phase_dir}',
step_history: [{ step: 'verify', completed_at: new Date().toISOString(), agent_type: 'gsd-verifier', output_path: '${phase_dir}' }],
resume: { action: 'next-phase', context: { completed_phase: ${PHASE_NUMBER}, phase_dir: '${phase_dir}' } }
});
" 2>/dev/null || true
```

**e2. Publish verification comment (non-blocking):**
```bash
VERIFICATION_FILES=$(ls ${phase_dir}/*-VERIFICATION.md 2>/dev/null)
14 changes: 14 additions & 0 deletions commands/run/pr-create.md
@@ -241,6 +241,20 @@ dh.afterAgentSpawn({

Parse PR number and URL from agent response.

**Checkpoint: record PR creation (atomic write):**
```bash
# Checkpoint: record PR creation — final checkpoint before pipeline completion.
node -e "
const { updateCheckpoint } = require('${REPO_ROOT}/lib/state.cjs');
updateCheckpoint(${ISSUE_NUMBER}, {
pipeline_step: 'pr',
step_progress: { branch_pushed: true, pr_number: ${PR_NUMBER}, pr_url: '${PR_URL}' },
step_history: [{ step: 'pr', completed_at: new Date().toISOString(), agent_type: 'general-purpose', output_path: '${PR_URL}' }],
resume: { action: 'cleanup', context: { pr_number: ${PR_NUMBER}, pr_url: '${PR_URL}' } }
});
" 2>/dev/null || true
```

Update state (at `${REPO_ROOT}/.mgw/active/`):
- linked_pr = PR number
- pipeline_stage = "pr-created"
1 change: 1 addition & 0 deletions commands/run/triage.md
@@ -51,6 +51,7 @@ migrateProjectState();
# Checkpoint initialization — called once when pipeline execution begins.
# Sets pipeline_step to "triage" with route selection progress.
# Subsequent stages update the checkpoint via updateCheckpoint().
# All checkpoint writes are atomic (write to .tmp then rename).
node -e "
const { updateCheckpoint } = require('./lib/state.cjs');
updateCheckpoint(${ISSUE_NUMBER}, {
83 changes: 82 additions & 1 deletion commands/workflows/state.md
@@ -231,7 +231,8 @@ File: `.mgw/active/<number>-<slug>.json`
"comments_posted": [],
"linked_pr": null,
"linked_issues": [],
- "linked_branches": []
+ "linked_branches": [],
+ "checkpoint": null
}
```

@@ -436,6 +437,84 @@ blocked --> triaged (re-triage after blocker resolved)
Any stage --> failed (unrecoverable error)
```

## Pipeline Checkpoints

Checkpoints track fine-grained pipeline progress inside `.mgw/active/<number>-<slug>.json`.
The `checkpoint` field starts as `null` and is initialized when the pipeline first
transitions past triage. Each subsequent stage then writes an atomic checkpoint update.

### Checkpoint Schema

```json
{
"checkpoint": {
"schema_version": 1,
"pipeline_step": "triage|plan|execute|verify|pr",
"step_progress": {},
"last_agent_output": null,
"artifacts": [],
"resume": { "action": null, "context": {} },
"started_at": "ISO timestamp",
"updated_at": "ISO timestamp",
"step_history": []
}
}
```
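
The schema above can be checked with a small shape test. This is a minimal sketch, not code from `lib/state.cjs`: `isValidCheckpoint` and `PIPELINE_STEPS` are hypothetical names introduced here, and the checks only cover the fields listed in the schema.

```javascript
// Hypothetical helper (not part of lib/state.cjs): verifies that an object
// has the shape of the checkpoint schema documented above.
const PIPELINE_STEPS = ['triage', 'plan', 'execute', 'verify', 'pr'];

function isValidCheckpoint(cp) {
  // null is valid: the pipeline has not written a checkpoint yet.
  if (cp === null) return true;
  if (typeof cp !== 'object') return false;
  return (
    typeof cp.schema_version === 'number' &&
    PIPELINE_STEPS.includes(cp.pipeline_step) &&
    typeof cp.step_progress === 'object' && cp.step_progress !== null &&
    Array.isArray(cp.artifacts) &&
    Array.isArray(cp.step_history) &&
    typeof cp.resume === 'object' && cp.resume !== null
  );
}
```

A check like this could run before a resume attempt, so that a hand-edited or outdated state file is rejected early instead of failing mid-pipeline.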

| Field | Type | Merge Strategy | Description |
|-------|------|---------------|-------------|
| `schema_version` | number | — | Checkpoint format version (currently 1) |
| `pipeline_step` | string | overwrite | Current pipeline step: `triage`, `plan`, `execute`, `verify`, `pr` |
| `step_progress` | object | shallow merge | Step-specific progress (e.g., `{ plan_path: "...", plan_checked: false }`) |
| `last_agent_output` | string\|null | overwrite | Path or URL of the last agent's output |
| `artifacts` | array | append-only | `[{ path, type, created_at }]` — never removed, only appended |
| `resume` | object | full replace | `{ action, context }` — what to do if pipeline restarts |
| `started_at` | string | — | ISO timestamp when checkpoint was first created |
| `updated_at` | string | auto | ISO timestamp of last update (set automatically) |
| `step_history` | array | append-only | `[{ step, completed_at, agent_type, output_path }]` — audit trail |
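
The merge strategies in the table can be illustrated as follows. This is a sketch under the assumption that each field behaves exactly as the table describes; `mergeCheckpoint` is a hypothetical name, and the real logic lives in `updateCheckpoint()` in `lib/state.cjs`.

```javascript
// Illustrative only: mirrors the merge strategies listed in the table above.
// mergeCheckpoint is a hypothetical name, not an export of lib/state.cjs.
function mergeCheckpoint(existing, data) {
  const merged = { ...existing };

  // overwrite: scalar fields are replaced when provided
  if (data.pipeline_step !== undefined) merged.pipeline_step = data.pipeline_step;
  if (data.last_agent_output !== undefined) merged.last_agent_output = data.last_agent_output;

  // shallow merge: new keys win, keys absent from `data` survive
  if (data.step_progress) {
    merged.step_progress = { ...existing.step_progress, ...data.step_progress };
  }

  // append-only: earlier entries are never dropped
  if (Array.isArray(data.artifacts)) {
    merged.artifacts = [...(existing.artifacts || []), ...data.artifacts];
  }
  if (Array.isArray(data.step_history)) {
    merged.step_history = [...(existing.step_history || []), ...data.step_history];
  }

  // full replace: the resume object is swapped wholesale
  if (data.resume) merged.resume = data.resume;

  // auto: the timestamp is refreshed on every update
  merged.updated_at = new Date().toISOString();
  return merged;
}
```

The append-only arrays are what make `step_history` usable as an audit trail: a later step cannot erase the record of an earlier one.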

### Atomic Writes

All checkpoint writes use `atomicWriteJson()` from `lib/state.cjs`:

```bash
# atomicWriteJson(filePath, data) — write to .tmp then rename.
# POSIX rename is atomic on the same filesystem, so a crash mid-write
# never leaves a corrupt state file.
```
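
The write-then-rename pattern can be tried in isolation with the sketch below. It is a variation for illustration, not the project's implementation: `writeJsonAtomically` is a hypothetical name, and removing the leftover `.tmp` file on failure is an added assumption that the source does not describe.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Sketch of write-then-rename. writeJsonAtomically is a hypothetical name;
// the project's own helper is atomicWriteJson() in lib/state.cjs.
function writeJsonAtomically(filePath, data) {
  const tmpPath = filePath + '.tmp';
  try {
    fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2), 'utf-8');
    // On POSIX, rename is atomic within a single filesystem.
    fs.renameSync(tmpPath, filePath);
  } catch (err) {
    // Assumption: cleaning up the stray .tmp file on failure is desirable.
    fs.rmSync(tmpPath, { force: true });
    throw err;
  }
}

// Usage: two successive writes to a file in a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'checkpoint-demo-'));
const demoFile = path.join(demoDir, 'state.json');
writeJsonAtomically(demoFile, { checkpoint: null });
writeJsonAtomically(demoFile, { checkpoint: { pipeline_step: 'plan' } });
```

Because the temporary file sits next to the target, both paths are on the same filesystem, which is the condition the atomicity guarantee depends on.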

The `updateCheckpoint()` function uses `atomicWriteJson()` internally. Commands
should always use `updateCheckpoint()` rather than writing checkpoints directly:

```bash
node -e "
const { updateCheckpoint } = require('./lib/state.cjs');
updateCheckpoint(${ISSUE_NUMBER}, {
pipeline_step: 'plan',
step_progress: { plan_path: '...', plan_checked: false },
artifacts: [{ path: '...', type: 'plan', created_at: new Date().toISOString() }],
step_history: [{ step: 'plan', completed_at: new Date().toISOString(), agent_type: 'gsd-planner', output_path: '...' }],
resume: { action: 'spawn-executor', context: { quick_dir: '...' } }
});
" 2>/dev/null || true
```

### Checkpoint Lifecycle

| Pipeline Step | Checkpoint `pipeline_step` | Resume Action |
|--------------|---------------------------|---------------|
| Triage complete | `triage` | `begin-execution` |
| Planner complete | `plan` | `run-plan-checker` or `spawn-executor` |
| Executor complete | `execute` | `spawn-verifier` or `create-pr` |
| Verifier complete | `verify` | `create-pr` |
| PR created | `pr` | `cleanup` |
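
A restarted pipeline could use the recorded resume action roughly as sketched here. This is a hypothetical illustration: `RESUME_TARGETS` and `resolveResume` do not appear in the source, the stage names on the right-hand side are assumptions, and only the actions from the table above are covered.

```javascript
// Hypothetical mapping from the resume actions in the table above to the
// stage that would run next. The stage names are assumptions.
const RESUME_TARGETS = {
  'begin-execution': 'plan',
  'run-plan-checker': 'plan-check',
  'spawn-executor': 'execute',
  'spawn-verifier': 'verify',
  'create-pr': 'pr',
  'cleanup': 'cleanup',
};

function resolveResume(checkpoint) {
  const action = checkpoint?.resume?.action;
  // No checkpoint or no recorded action: start from the beginning.
  if (!action) return { stage: 'triage', context: {} };
  if (!Object.prototype.hasOwnProperty.call(RESUME_TARGETS, action)) {
    throw new Error(`Unknown resume action: ${action}`);
  }
  return { stage: RESUME_TARGETS[action], context: checkpoint.resume.context || {} };
}
```

Throwing on an unknown action keeps a newer checkpoint format from being silently misread by older tooling.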

### Migration

`migrateProjectState()` adds the `checkpoint: null` field to any issue state
files that predate checkpoint support. The field is initialized lazily — it
stays `null` until the pipeline actually runs.
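
The per-file step of that migration might look like the sketch below. `addCheckpointField` is a hypothetical name introduced for illustration; the actual work is done by `migrateProjectState()` in `lib/state.cjs`, whose body is not shown in this change.

```javascript
// Hypothetical sketch of the per-file migration step. The real migration is
// performed by migrateProjectState() in lib/state.cjs.
function addCheckpointField(issueState) {
  if (Object.prototype.hasOwnProperty.call(issueState, 'checkpoint')) {
    // Already migrated: leave any existing checkpoint untouched.
    return issueState;
  }
  // Lazy initialization: the field stays null until the pipeline runs.
  return { ...issueState, checkpoint: null };
}
```

Checking for the key rather than for a truthy value keeps the step idempotent, so running it again never resets a checkpoint that is already populated.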

## Slug Generation

Use gsd-tools for consistent slug generation:
@@ -582,3 +661,5 @@ a `phase_number`. In this case, `/mgw:run` falls back to the quick pipeline.
| Project state | milestone.md, next.md, ask.md |
| Gate result schema | issue.md (populate), run.md (validate) |
| Board status sync | board-sync.md (utility), issue.md (triage transitions), run.md (pipeline transitions) |
| Checkpoint writes | triage.md (init), execute.md (plan/execute/verify), pr-create.md (pr) |
| Atomic writes | lib/state.cjs (`atomicWriteJson`, `updateCheckpoint`) |
37 changes: 34 additions & 3 deletions lib/state.cjs
@@ -272,7 +272,9 @@

/**
* Create a new checkpoint object with default values.
* Called when pipeline execution begins (triage → executing transition).
*
* @param {string} [pipelineStep='triage'] - Initial pipeline step
* @returns {object} Fresh checkpoint object
@@ -295,8 +297,33 @@
};
}

/**
* Write a JSON state file atomically: serialize to a .tmp sibling, then rename.
* This prevents corruption from interrupts (SIGINT, crash, context timeout) by
* ensuring the file is either fully written or not written at all.
*
* @param {string} filePath - Absolute path to the target JSON file
* @param {object} data - Object to serialize as JSON
* @throws {Error} If the write or rename fails
*/
function atomicWriteJson(filePath, data) {
const tmpPath = filePath + '.tmp';
const content = JSON.stringify(data, null, 2);

// Write to temporary file first
fs.writeFileSync(tmpPath, content, 'utf-8');

// Atomic rename (on POSIX, rename is atomic within the same filesystem)
fs.renameSync(tmpPath, filePath);
}

/**
* Merge checkpoint data into an active issue state file.
*
* Performs a shallow merge of the provided data onto the existing checkpoint
* object — existing fields not present in `data` are preserved. The `artifacts`
@@ -305,6 +332,9 @@
*
* If the issue has no checkpoint yet, one is initialized first via initCheckpoint().
*
* Writes are atomic: data is written to a .tmp file first, then
* renamed to the target path. This prevents corruption from interrupts.
*
* @param {number|string} issueNumber - Issue number to update
* @param {object} data - Partial checkpoint data to merge
* @param {string} [data.pipeline_step] - Current pipeline step
@@ -327,7 +357,7 @@
try {
entries = fs.readdirSync(activeDir);
} catch (err) {
throw new Error(`Cannot read active directory: ${err.message}`);

Check failure on line 360 in lib/state.cjs (GitHub Actions / lint): There is no `cause` attached to the symptom error being thrown
}

const match = entries.find(f => f.startsWith(prefix) && f.endsWith('.json'));
@@ -340,7 +370,7 @@
try {
issueState = JSON.parse(fs.readFileSync(filePath, 'utf-8'));
} catch (err) {
throw new Error(`Cannot parse state file for #${issueNumber}: ${err.message}`);

Check failure on line 373 in lib/state.cjs (GitHub Actions / lint): There is no `cause` attached to the symptom error being thrown
}

// Initialize checkpoint if it does not exist
@@ -381,9 +411,9 @@
// Always update the timestamp
cp.updated_at = new Date().toISOString();

- // Write back
+ // Write back atomically (write to .tmp, then rename)
  issueState.checkpoint = cp;
- fs.writeFileSync(filePath, JSON.stringify(issueState, null, 2), 'utf-8');
+ atomicWriteJson(filePath, issueState);

return { updated: true, checkpoint: cp };
}
@@ -644,6 +674,7 @@
resolveActiveMilestoneIndex,
CHECKPOINT_SCHEMA_VERSION,
initCheckpoint,
atomicWriteJson,
updateCheckpoint,
loadCrossRefs,
VALID_LINK_TYPES,