Background
Issue #198 accepted a first-version product contract for a compact incident timeline that explains why an agent session became slow, expensive, failure-prone, or risky without requiring users to inspect raw logs first.
Evidence
User value
Users should be able to understand the main incident path in one glance before opening raw session details.
Adoption rationale
This improves daily debugging value by turning the generic health-score question into actionable evidence: where the run stalled, what repeated, what changed, and when burn diverged.
Suggested scope
- Add an incident timeline summary to the TUI/report surface, scoped to local post-run evidence.
- Include last visible milestone, longest idle gap, repeated failure loop, touched-surface summary, and burn divergence point when available.
- Keep the first view compact: one timeline strip plus a small number of evidence chips or report rows.
- Preserve existing local privacy behavior by summarizing or redacting sensitive paths/arguments where needed.
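The evidence items listed above could be derived from local post-run session events roughly as follows. This is a minimal sketch, not the project's implementation. It assumes a hypothetical `Event` record (timestamp, kind, label, cost), and the thresholds `min_loop`, `expected_burn_per_s`, and `burn_factor` are illustrative placeholders. None of these names come from the existing codebase.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical post-run session event; field names are illustrative only."""
    ts: float           # seconds since session start
    kind: str           # e.g. "milestone", "tool_call", "failure"
    label: str          # short, already-redacted description
    cost: float = 0.0   # incremental spend attributed to this event


@dataclass(frozen=True)
class IncidentSummary:
    last_milestone: Optional[str]
    longest_idle_gap: Optional[tuple[float, float]]  # (start_ts, end_ts)
    failure_loop: Optional[tuple[str, int]]          # (label, consecutive repeats)
    burn_divergence_ts: Optional[float]


def summarize(
    events: list[Event],
    *,
    min_loop: int = 3,
    expected_burn_per_s: float = 0.01,
    burn_factor: float = 2.0,
) -> IncidentSummary:
    # Sort by timestamp so the same input always yields the same summary.
    events = sorted(events, key=lambda e: e.ts)

    # Last visible milestone: the most recent "milestone" event, if any.
    milestones = [e.label for e in events if e.kind == "milestone"]
    last_milestone = milestones[-1] if milestones else None

    # Longest idle gap: the widest positive interval between consecutive events.
    gap: Optional[tuple[float, float]] = None
    for prev, cur in zip(events, events[1:]):
        width = cur.ts - prev.ts
        if width > 0 and (gap is None or width > gap[1] - gap[0]):
            gap = (prev.ts, cur.ts)

    # Repeated failure loop: longest run of identical consecutive failures
    # (non-failure events in between do not break the run).
    loop: Optional[tuple[str, int]] = None
    run_label: Optional[str] = None
    run_len = 0
    for e in events:
        if e.kind != "failure":
            continue
        run_len = run_len + 1 if e.label == run_label else 1
        run_label = e.label
        if run_len >= min_loop and (loop is None or run_len > loop[1]):
            loop = (e.label, run_len)

    # Burn divergence: first point where cumulative spend exceeds
    # burn_factor times an assumed steady burn rate.
    divergence: Optional[float] = None
    spent = 0.0
    for e in events:
        spent += e.cost
        if e.ts > 0 and spent > burn_factor * expected_burn_per_s * e.ts:
            divergence = e.ts
            break

    return IncidentSummary(last_milestone, gap, loop, divergence)
```

Every field is optional, so a session with no relevant evidence yields an all-`None` summary instead of a placeholder, which matches the "degrade gracefully" criterion.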
Non-goals
- Do not add hosted telemetry.
- Do not add live tracing while a model is streaming.
- Do not replace the health score; explain it.
- Do not introduce package/release promises.
Acceptance criteria
- TUI or exported report shows a compact incident timeline summary for sessions with relevant evidence.
- Empty or low-signal sessions degrade gracefully without noisy placeholders.
- The output is deterministic for a given session (stable ordering and formatting), so existing report/CI checks remain stable.
- Local-first privacy assumptions are preserved.
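One way to meet the graceful-degradation, determinism, and privacy criteria together is to render only the evidence rows that exist, in a fixed order, with fixed numeric formatting and sorted, redacted paths. This is a minimal sketch under those assumptions. `render_rows`, `redact_path`, the row wording, and the "keep only the last path component" redaction policy are all hypothetical and are not the tool's existing report format.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Iterable, Optional


def redact_path(path: str, keep: int = 1) -> str:
    """Keep only the last `keep` (>= 1) path components; hypothetical policy."""
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    tail = "/".join(parts[-keep:])
    return tail if len(parts) <= keep else ".../" + tail


def render_rows(
    *,
    last_milestone: Optional[str] = None,
    idle_gap_s: Optional[float] = None,
    failure_loop: Optional[tuple[str, int]] = None,
    touched_paths: Iterable[str] = (),
    burn_divergence_s: Optional[float] = None,
    max_touched: int = 3,
) -> list[str]:
    """Return compact report rows in a fixed order, skipping missing evidence."""
    rows: list[str] = []
    if last_milestone:
        rows.append(f"last milestone: {last_milestone}")
    if idle_gap_s is not None and idle_gap_s > 0:
        rows.append(f"longest idle gap: {idle_gap_s:.1f}s")
    if failure_loop:
        label, count = failure_loop
        rows.append(f"repeated failure: {label} x{count}")
    # Redact, de-duplicate, and sort so input order never changes the output.
    surfaces = sorted({redact_path(p) for p in touched_paths})
    if surfaces:
        shown = surfaces[:max_touched]
        extra = len(surfaces) - len(shown)
        text = ", ".join(shown) + (f" (+{extra} more)" if extra else "")
        rows.append(f"touched: {text}")
    if burn_divergence_s is not None:
        rows.append(f"burn diverged at: {burn_divergence_s:.1f}s")
    return rows  # empty list for low-signal sessions: no placeholder rows
```

Keeping only the last path component can merge distinct files that share a name. That is acceptable for a summary row, but a real policy would need to match the existing local redaction rules.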
Suggested lane
lane/product
Risk
Medium. The main risk is overloading the first screen. Keep the first version compact and evidence-led.
Source
Follow-up split from #198 and Discussion #2 feedback: #2 (comment)