feat(reliability): graceful degradation for non-critical agent failures #243

Merged
snipcodeit merged 3 commits into main from issue/234-implement-graceful-degradation-for-n
Mar 6, 2026

@snipcodeit
Owner

Summary

  • Classify all 9 Task() spawn points in mgw:run pipeline as critical or advisory
  • Advisory agents (comment classifier, plan-checker, verifier) now degrade gracefully on failure: log warning and continue pipeline
  • Critical agents (planner, executor, PR creator) trigger existing retry/fallback/dead-letter flow
  • Add lib/agent-criticality.cjs module with spawn point classification, wrapAdvisoryAgent() wrapper, and isAdvisory()/isCritical() query functions

Closes #234

Milestone Context

  • Milestone: v8 — Agent Reliability & Failure Recovery
  • Phase: 45 — Retry Policies & Fallback Strategies
  • Issue: 6 of 9 in milestone

Changes

New Files

  • lib/agent-criticality.cjs — Agent spawn point criticality classification module
    • CRITICALITY_MAP: maps all 9 spawn points to critical/advisory
    • isAdvisory(spawnPoint) / isCritical(spawnPoint): query functions
    • wrapAdvisoryAgent(fn, spawnPoint, context): wraps async functions with graceful degradation for advisory agents
    • AdvisoryAgentWarning: error subclass for degraded advisory failures
    • Safe import of lib/retry-policy.cjs (a dependency introduced in #232, "Implement configurable retry policy engine for agent failures", which may not be merged yet)
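
For reviewers who want a picture of the wrapper, here is a minimal sketch of how wrapAdvisoryAgent(), AdvisoryAgentWarning, and the safe import could fit together. It is not the merged code: the safeRequire helper, the context.warn hook, the ADVISORY set, and the { degraded, warning } return shape are all assumptions made for illustration.

```javascript
'use strict';

// Illustrative sketch only; the real lib/agent-criticality.cjs may differ.

// Error subclass that marks a failure downgraded to a warning.
class AdvisoryAgentWarning extends Error {
  constructor(spawnPoint, cause) {
    super(`advisory agent "${spawnPoint}" failed: ${cause && cause.message}`);
    this.name = 'AdvisoryAgentWarning';
    this.spawnPoint = spawnPoint;
    this.cause = cause;
  }
}

// Safe import (hypothetical helper): return null when the module is absent
// instead of crashing at load time.
function safeRequire(modulePath) {
  try {
    return require(modulePath);
  } catch (err) {
    if (err.code === 'MODULE_NOT_FOUND') return null;
    throw err;
  }
}
const retryPolicy = safeRequire('./lib/retry-policy.cjs');

// Advisory spawn points as listed in this PR (assumed to be the full set).
const ADVISORY = new Set([
  'comment-classifier',
  'plan-checker',
  'verifier',
  'milestone-verifier',
]);

// Wrap an async function so advisory failures are logged and swallowed,
// while failures at any other spawn point are rethrown unchanged.
function wrapAdvisoryAgent(fn, spawnPoint, context = {}) {
  return async (...args) => {
    try {
      return await fn(...args);
    } catch (err) {
      if (!ADVISORY.has(spawnPoint)) throw err;
      const warning = new AdvisoryAgentWarning(spawnPoint, err);
      (context.warn || console.warn)(warning.message);
      return { degraded: true, warning }; // assumed return shape
    }
  };
}
```

Rethrowing for non-advisory spawn points keeps the wrapper safe to apply everywhere: critical failures still reach the existing retry/fallback/dead-letter flow.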

Modified Files

  • commands/run/triage.md — Added <!-- mgw:criticality=advisory --> annotation to comment classifier spawn with degradation pattern
  • commands/run/execute.md — Added criticality annotations to all 7 Task() spawns; updated retry loop to distinguish advisory (warn-and-continue) from critical (retry/dead-letter) failures
  • commands/run/pr-create.md — Added <!-- mgw:criticality=critical --> annotation to PR creator spawn
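
The updated retry loop lives in a markdown command file, so there is no JavaScript to quote directly. As a rough model of the branching it describes, the sketch below warns and continues for advisory spawn points and retries, then dead-letters, for critical ones. The names runSpawnPoint, maxRetries, and deadLetter, and the result shapes, are hypothetical and not taken from the PR.

```javascript
'use strict';

// Illustrative model of the advisory vs. critical failure paths.
// All names and result shapes here are assumptions, not the merged code.
async function runSpawnPoint(spawnPoint, task, options) {
  const { isAdvisory, maxRetries = 2, deadLetter, warn = console.warn } = options;
  let lastError;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return { status: 'ok', value: await task() };
    } catch (err) {
      lastError = err;
      if (isAdvisory(spawnPoint)) {
        // Advisory: log a warning and let the pipeline continue, no retry.
        warn(`[advisory] ${spawnPoint} failed, continuing: ${err.message}`);
        return { status: 'degraded', error: err };
      }
      // Critical: fall through and retry until attempts are exhausted.
    }
  }

  // Critical and out of retries: hand off to the dead-letter flow.
  deadLetter(spawnPoint, lastError);
  return { status: 'dead-letter', error: lastError };
}
```

When the task succeeds on the first attempt, neither branch runs, which matches the claim that criticality only affects failure paths.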

Classification Matrix

| Spawn Point | Agent Type | Criticality |
| --- | --- | --- |
| comment-classifier | general-purpose | advisory |
| planner | gsd-planner | critical |
| plan-checker | gsd-plan-checker | advisory |
| executor | gsd-executor | critical |
| verifier | gsd-verifier | advisory |
| milestone-planner | gsd-planner | critical |
| milestone-executor | gsd-executor | critical |
| milestone-verifier | gsd-verifier | advisory |
| pr-creator | general-purpose | critical |
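
The matrix translates naturally into a lookup object. The sketch below is one plausible shape for CRITICALITY_MAP and the two query functions, including the fail-safe default to critical for unknown spawn points. The exact data structure in lib/agent-criticality.cjs is not shown in this PR, so treat the layout as an assumption.

```javascript
'use strict';

// Assumed layout: spawn point name -> 'critical' | 'advisory'.
const CRITICALITY_MAP = Object.freeze({
  'comment-classifier': 'advisory',
  'planner': 'critical',
  'plan-checker': 'advisory',
  'executor': 'critical',
  'verifier': 'advisory',
  'milestone-planner': 'critical',
  'milestone-executor': 'critical',
  'milestone-verifier': 'advisory',
  'pr-creator': 'critical',
});

// Only an explicit 'advisory' entry counts as advisory.
function isAdvisory(spawnPoint) {
  return CRITICALITY_MAP[spawnPoint] === 'advisory';
}

// Everything else, including unknown spawn points, is treated as critical.
function isCritical(spawnPoint) {
  return !isAdvisory(spawnPoint);
}

console.log(isAdvisory('comment-classifier'), isCritical('executor'));
console.log(isCritical('unknown'));
```

Defining isCritical() as the negation of isAdvisory() means a typo in a spawn point name can only make the pipeline stricter, never more lenient.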

Test Plan

  • node -e "require('./lib/agent-criticality.cjs')" exits 0
  • node -e "const ac = require('./lib/agent-criticality.cjs'); console.log(ac.isAdvisory('comment-classifier'), ac.isCritical('executor'))" outputs true true
  • grep -rn 'mgw:criticality' commands/run/ returns 9 matches (4 advisory, 5 critical)
  • Unknown spawn points default to critical: node -e "console.log(require('./lib/agent-criticality.cjs').isCritical('unknown'))" outputs true
  • Module loads even without lib/retry-policy.cjs present (safe import)
  • Verify no functional change when all agents succeed (criticality only affects failure paths)
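
The grep check counts matching lines, so it does not split the total by criticality. If a per-class breakdown is wanted, a small helper along these lines could tally the annotations in a command file's text. The function name countAnnotations and the tolerance for extra whitespace inside the comment are assumptions; the annotation syntax itself is taken from this PR.

```javascript
'use strict';

// Matches <!-- mgw:criticality=critical --> and <!-- mgw:criticality=advisory -->.
const ANNOTATION = /<!--\s*mgw:criticality=(critical|advisory)\s*-->/g;

// Count annotations of each class in a string (hypothetical helper).
function countAnnotations(text) {
  const counts = { critical: 0, advisory: 0 };
  for (const match of text.matchAll(ANNOTATION)) {
    counts[match[1]] += 1;
  }
  return counts;
}

const sample = [
  '<!-- mgw:criticality=advisory -->',
  'Task(...)',
  '<!-- mgw:criticality=critical -->',
].join('\n');
console.log(countAnnotations(sample));
```

Run over the concatenated contents of commands/run/, the expected totals per the test plan would be 4 advisory and 5 critical.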

Stephen Miller and others added 2 commits March 6, 2026 01:26
Classify each Task() spawn point as critical (blocks pipeline) or
advisory (pipeline can continue without). Advisory agents include
comment classifier, plan-checker, and verifier. Critical agents
include planner, executor, and PR creator. Unknown spawn points
default to critical (fail-safe).

Provides wrapAdvisoryAgent() wrapper that catches errors for advisory
agents and logs warnings instead of halting the pipeline.

Closes #234 (partial)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…ints

Annotate all 9 Task() spawn points in mgw:run pipeline with
<!-- mgw:criticality=critical|advisory --> comments that classify
each agent's impact on pipeline continuity.

Advisory agents (4): comment-classifier, plan-checker, verifier,
milestone-verifier. These provide quality-of-life improvements but
the pipeline can produce a valid PR without their output.

Critical agents (5): planner, executor, milestone-planner,
milestone-executor, pr-creator. These produce artifacts the pipeline
cannot proceed without.

The retry loop in execute.md now checks agent criticality before
deciding between retry/dead-letter (critical) or warn-and-continue
(advisory). Uses lib/agent-criticality.cjs for classification.

Closes #234 (partial)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@github-actions (bot) added the slash-commands (Changes to slash command files) and core (Changes to core library) labels on Mar 6, 2026
Resolve conflicts in command files: combine diagnostic hooks (from #240)
with criticality annotations (from #243). Both additions are complementary.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@snipcodeit snipcodeit merged commit 7fc1229 into main Mar 6, 2026
4 of 5 checks passed
@snipcodeit snipcodeit deleted the issue/234-implement-graceful-degradation-for-n branch March 6, 2026 07:43
snipcodeit pushed a commit that referenced this pull request Mar 6, 2026
…ine-ex

Resolve conflicts in command files: combine checkpoint writes (from #244)
with diagnostic hooks and criticality annotations (from #240, #243).

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Labels

  • core: Changes to core library
  • slash-commands: Changes to slash command files

Development

Successfully merging this pull request may close these issues.

Implement graceful degradation for non-critical agent failures

1 participant