Closed
Labels
backend; blocked-by:#140 (Blocked by issue #140); phase:35-retry-architecture (Phase 35: Retry Architecture); reliability (Error handling and retry logic)
Context
Phase 35 — Retry Architecture, integration. After #140 creates retry.cjs, this issue wires it into the two pipeline commands that handle failures. The goal: a transient GitHub API failure no longer immediately kills the pipeline; it retries up to 3 times with backoff. Permanent failures and needs-info cases still surface to the user immediately but with a failure class in the comment so they know whether to wait and retry or fix the issue.
What Already Exists
- lib/retry.cjs: created by #140 (Create lib/retry.cjs with backoff and failure taxonomy); exports classifyFailure, canRetry, incrementRetry, resetRetryState, getBackoffMs
- commands/run.md (1282 lines): current failure handling sets pipeline_stage to failed and applies the pipeline-failed label via gh issue edit ${ISSUE_NUMBER} --add-label "pipeline-failed"; no retry logic; no failure classification in posted comments
- commands/milestone.md (952 lines): failed-issue recovery at ~line 907 shows the Milestone NOT closed message with a results table; no automated retry, no retry_count tracking, no backoff
- .mgw/active/*.json: issue state files; will gain retry_count, last_failure_class, dead_letter fields after this issue
- lib/state.cjs migrateProjectState() (lines 145-178): will need a migration path to add retry_count: 0, dead_letter: false to existing active issue files
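The migration need noted for migrateProjectState() could look roughly like the sketch below. The helper name addRetryDefaults and the sample issue_number field are hypothetical; only retry_count, dead_letter, and pipeline_stage come from this issue, and the real code in lib/state.cjs may be organized differently.

```javascript
// Hypothetical helper, not the actual lib/state.cjs code.
// Defaults for the retry fields named in this issue.
const RETRY_DEFAULTS = { retry_count: 0, dead_letter: false };

// Returns a copy of an issue state object with any missing retry fields
// filled in. Fields that already exist are left untouched, so running the
// migration more than once does not overwrite real retry data.
function addRetryDefaults(issueState) {
  const migrated = { ...issueState };
  for (const [key, value] of Object.entries(RETRY_DEFAULTS)) {
    if (!(key in migrated)) migrated[key] = value;
  }
  return migrated;
}

// Example: a state file written before this change (issue_number is an
// assumed field name used only for illustration).
const legacy = { issue_number: 141, pipeline_stage: 'failed' };
console.log(addRetryDefaults(legacy));
```

Filling in only the missing keys keeps the step idempotent, which matters if the migration runs on every load.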
Description
Integrate lib/retry.cjs into commands/run.md and commands/milestone.md.
Technical Approach
- run.md validate_and_load step: add --retry flag handler; if retry.dead_letter === true and --retry flag: call resetRetryState(), clear pipeline-failed label, re-queue
- run.md failure handling: wrap the execute step in retry loop using canRetry() + getBackoffMs(); on failure call classifyFailure(); if transient and canRetry(): sleep(backoff), incrementRetry(), loop; if permanent or retry exhausted: set dead_letter=true, post comment with failure class
- milestone.md recovery prompt: surface failure_class from active issue state in the results table; "Retry" option calls resetRetryState() then re-invokes run for that issue
- migrateProjectState() in state.cjs: add migration step that sets retry_count: 0, dead_letter: false on active issues missing these fields
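The run.md failure-handling bullet above can be sketched as a small loop. The function names match the exports listed for lib/retry.cjs, but their signatures are not given in this issue, so the stand-ins below (pure functions over a state object, an err.transient marker, a 1000 ms base delay) are assumptions made only so the sketch runs on its own. runWithRetry is a hypothetical name.

```javascript
// Stand-ins for lib/retry.cjs. Names come from the issue; signatures and
// behaviour are assumed for illustration.
const MAX_RETRIES = 3;
const classifyFailure = (err) => (err.transient ? 'transient' : 'permanent');
const canRetry = (state) => !state.dead_letter && state.retry_count < MAX_RETRIES;
const incrementRetry = (state) => ({ ...state, retry_count: state.retry_count + 1 });
const getBackoffMs = (retryCount, baseMs = 1000) => baseMs * 2 ** retryCount;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `execute` and retries transient failures with exponential backoff.
// Resolves to the final issue state; a failed execute step is reported
// through the state (dead_letter, last_failure_class) rather than thrown.
async function runWithRetry(execute, initialState, { baseMs = 1000 } = {}) {
  let state = { ...initialState };
  for (;;) {
    try {
      await execute();
      return state;
    } catch (err) {
      const failureClass = classifyFailure(err);
      state = { ...state, last_failure_class: failureClass };
      if (failureClass === 'transient' && canRetry(state)) {
        await sleep(getBackoffMs(state.retry_count, baseMs));
        state = incrementRetry(state);
        continue;
      }
      // Permanent, needs-info, or retries exhausted: stop and flag the issue.
      return { ...state, dead_letter: true };
    }
  }
}
```

With these stand-ins, an execute step that keeps failing transiently is attempted four times in total (one initial attempt plus three retries) before dead_letter is set. The caller would then post the comment containing last_failure_class.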
Done When
- commands/run.md retries transient failures up to 3 times with exponential backoff before marking dead_letter=true
- commands/run.md failure comment includes failure_class (transient/permanent/needs-info) from classifyFailure()
- commands/milestone.md recovery display shows failure_class for each failed issue; Retry option calls resetRetryState()
- lib/state.cjs migrateProjectState() sets retry_count: 0 and dead_letter: false on active issue files that lack these fields
- After integration, the --retry flag in run.md clears dead_letter state and re-queues the issue
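As a rough illustration of the --retry criterion, the decision made in the validate_and_load step might be modelled as below. handleRetryFlag and the action names are hypothetical, resetRetryState is a stand-in with an assumed signature, and blocking a dead-lettered issue when --retry is absent is an assumption rather than something this issue specifies.

```javascript
// Stand-in for resetRetryState() from lib/retry.cjs. The real signature is
// not given in the issue, so this pure-function shape is an assumption.
const resetRetryState = (state) => ({
  ...state,
  retry_count: 0,
  dead_letter: false,
  last_failure_class: null,
});

// Decides what to do for an issue given the command-line arguments.
// Returns the next state, an action ('requeue', 'blocked', or 'proceed'),
// and the label the caller should remove, if any.
function handleRetryFlag(state, argv) {
  const retryRequested = argv.includes('--retry');
  if (state.dead_letter === true && retryRequested) {
    return { state: resetRetryState(state), action: 'requeue', removeLabel: 'pipeline-failed' };
  }
  if (state.dead_letter === true) {
    // Assumed behaviour: a dead-lettered issue is not re-run without --retry.
    return { state, action: 'blocked', removeLabel: null };
  }
  return { state, action: 'proceed', removeLabel: null };
}

// Example: a dead-lettered issue re-run with --retry.
const result = handleRetryFlag({ retry_count: 3, dead_letter: true }, ['141', '--retry']);
console.log(result.action, result.state);
```

When removeLabel is set, the caller could clear the label with gh issue edit ${ISSUE_NUMBER} --remove-label "pipeline-failed", mirroring the existing --add-label call.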
GSD Route
Quick task or GSD Phase 35 plan.
Phase Context
Phase 35 of 36 — Retry Architecture. Issues: #140 (blocks this), #141 (this).
Depends on
#140 — lib/retry.cjs must exist before integration.