Skip to content

fix: flush session traces to disk on crash#594

Open
anandgupta42 wants to merge 2 commits intomainfrom
fix/crash-trace-flush
Open

fix: flush session traces to disk on crash#594
anandgupta42 wants to merge 2 commits intomainfrom
fix/crash-trace-flush

Conversation

@anandgupta42
Copy link
Copy Markdown
Contributor

@anandgupta42 anandgupta42 commented Mar 30, 2026

What does this PR do?

Fixes session trace loss when altimate-code crashes. When the TUI worker process dies unexpectedly, all in-memory traces are now flushed to disk synchronously so users can find crash logs in subsequent sessions.

Key discovery: Bun Workers don't receive OS signals (SIGINT/SIGTERM/SIGHUP) — only the main thread does. The fix uses a 3-layer approach:

  • thread.ts (main thread): Signal handlers call worker.terminate() + Bun.sleepSync(50) to trigger the worker's exit event
  • worker.ts: process.once("exit") handler flushes all active traces via flushAllTracesSync() with idempotency guard
  • index.ts: Safety net for headless mode — flushes Trace.active on uncaughtException and SIGHUP

Upstream context: sst/opencode has 60+ open crash-related issues (anomalyco/opencode#19023, anomalyco/opencode#14291, anomalyco/opencode#12767).

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Issue for this PR

Closes #593
Related: #588

How did you verify your code works?

  • Empirically verified Bun Worker signal behavior (workers don't receive SIGINT/SIGTERM/SIGHUP)
  • Verified worker.terminate() triggers worker's exit event handler
  • Verified Bun.sleepSync(50) gives worker time to flush before main thread exits
  • Upstream marker check: passes (no upstream files modified)
  • Typecheck: passes (5/5 packages)
  • 6-model consensus code review: all 6 approved (Claude, GPT 5.2 Codex, Gemini 3.1 Pro, Kimi K2.5, MiniMax M2.5, GLM-5)

Checklist

  • My code follows the project's coding standards
  • I have performed a self-review of my code
  • New and existing tests pass locally
  • I have added altimate_change markers for upstream-shared files

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Enhanced application shutdown handling to ensure diagnostic traces are properly preserved during unexpected termination or system signals.

Bun Workers don't receive OS signals — only the main thread does.
When altimate-code crashes, all in-memory traces were lost because
the worker had no crash handlers.

- worker.ts: add `flushAllTracesSync()` with idempotency guard,
  called from `uncaughtException` and `process.once("exit")`
- thread.ts: add signal handlers (SIGINT/SIGTERM/SIGHUP) that call
  `worker.terminate()` + `Bun.sleepSync(50)` to trigger worker exit
- index.ts: safety net for headless mode — flush `Trace.active` on
  `uncaughtException` and SIGHUP

Closes #593

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 30, 2026

Warning

Rate limit exceeded

@anandgupta42 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 11 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 10 minutes and 11 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d882fb2f-5f16-45b1-8551-915b46802c0a

📥 Commits

Reviewing files that changed from the base of the PR and between 60039f8 and 60629d4.

📒 Files selected for processing (2)
  • packages/opencode/src/cli/cmd/tui/thread.ts
  • packages/opencode/src/cli/cmd/tui/worker.ts
📝 Walkthrough

Walkthrough

These changes add crash handlers across three modules to flush session traces to disk when the process terminates unexpectedly. Signal handlers in the main thread trigger worker termination, while worker and main processes implement synchronous trace flushing during exit and uncaught exception scenarios, ensuring trace data persists even after crashes.

Changes

Cohort / File(s) Summary
Emergency Signal Handling
packages/opencode/src/cli/cmd/tui/thread.ts
Added emergencyTerminate handler listening for SIGINT, SIGTERM, and SIGHUP signals that calls worker.terminate() and sleeps briefly to allow worker exit logic. Added corresponding process.off() calls during normal shutdown to prevent emergency handler triggering.
Worker Trace Flushing
packages/opencode/src/cli/cmd/tui/worker.ts
Introduced flushAllTracesSync(reason) guarded by hasFlushed flag that iterates over sessionTraces and calls trace.flushSync() while swallowing errors. Added synchronous trace flush on uncaughtException and registered process.once("exit") handler to flush traces with reason "Process exited".
Main Entry Point Trace Flushing
packages/opencode/src/index.ts
Added Trace import and best-effort flushing of active trace via Trace.active?.flushSync() in both uncaughtException and SIGHUP handlers. Modified SIGHUP handler to attempt trace flushing within try/catch before calling process.exit().

Sequence Diagram

sequenceDiagram
    participant OS as Operating System
    participant Main as Main Thread
    participant Worker as Worker Process
    participant Trace as Trace System
    participant Disk as Disk Storage

    OS->>Main: Signal (SIGINT/SIGTERM/SIGHUP)
    activate Main
    Main->>Main: emergencyTerminate handler
    Main->>Worker: terminate()
    activate Worker
    Worker->>Worker: uncaughtException or exit triggered
    Worker->>Trace: flushAllTracesSync()
    activate Trace
    loop for each sessionTrace
        Trace->>Trace: trace.flushSync(reason)
        Trace->>Disk: Write trace snapshot
    end
    deactivate Trace
    Worker->>Disk: Persist trace data
    deactivate Worker
    Main->>Trace: Trace.active?.flushSync()
    activate Trace
    Trace->>Disk: Flush active trace
    deactivate Trace
    Main->>OS: process.exit()
    deactivate Main
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

contributor

Poem

🐰 A rabbit's ode to crashing gracefully:

When signals crash through the morning dew,
And processes tumble, old and new,
Our traces flush to disk with care,
No data lost to thin, thin air! 🌙✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: ensuring session traces are flushed to disk when the application crashes.
Description check ✅ Passed The description covers what changed, why, how it was verified, includes type of change and related issues, though lacking explicit test plan and changelog update checklist items.
Linked Issues check ✅ Passed The PR directly addresses issue #593 by implementing a three-layer signal handling and trace flushing mechanism that ensures traces persist to disk on unexpected termination.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing trace flushing on crash across the three critical paths identified in issue #593.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/crash-trace-flush

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/opencode/src/cli/cmd/tui/thread.ts`:
- Around line 161-170: The signal handler emergencyTerminate currently
terminates the worker and sleeps but doesn't end the main process; update
emergencyTerminate (used by process.on("SIGINT") and process.on("SIGTERM")) to
explicitly terminate the process after cleanup by calling process.exit(code)
(choose 0 for normal shutdown or a nonzero code for errors) or alternatively
restore default handlers (remove the listeners) before exiting; ensure the call
happens after worker.terminate() and Bun.sleepSync(50) so cleanup completes and
the process won't continue running the TUI thread.

In `@packages/opencode/src/cli/cmd/tui/worker.ts`:
- Around line 345-351: The current flushAllTracesSync only iterates
sessionTraces and misses traces removed from that map while endTrace() is still
async; fix by keeping traces reachable until endTrace() completes: when starting
async endTrace() (the endTrace() call site that currently removes entries from
sessionTraces pre-await), add the trace to a new pendingFlushes collection
(e.g., Set or Map named pendingFlushes) or defer deleting from sessionTraces
until endTrace() settles, then remove from pendingFlushes after completion;
update flushAllTracesSync() to iterate both sessionTraces and pendingFlushes and
call trace.flushSync(reason) so traces in-flight are flushed on crash.
- Around line 37-42: The one-shot guard variable hasFlushed prevents later
crash-time flushes because after the uncaughtException handler calls
flushAllTracesSync it flips hasFlushed, making the process "exit" hook a no-op
for any subsequent fatal events; update the logic so the exit handler and
uncaughtException handler each independently ensure a flush without being
disabled by a single boolean—remove or change the one-shot behavior around
hasFlushed and instead make flushAllTracesSync idempotent or track per-event
flushes, ensuring both the uncaughtException handler (where
flushAllTracesSync(...) is already called) and the "exit" hook call
flushAllTracesSync when needed; touch the hasFlushed variable usage near the
uncaughtException handler, the code that sets hasFlushed (lines around 341-344),
and the "exit" hook (around line 354) to implement this fix.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f7b7eb9e-3b42-4fe8-a9be-1bb0a78e6325

📥 Commits

Reviewing files that changed from the base of the PR and between 99270e5 and 60039f8.

📒 Files selected for processing (3)
  • packages/opencode/src/cli/cmd/tui/thread.ts
  • packages/opencode/src/cli/cmd/tui/worker.ts
  • packages/opencode/src/index.ts

- thread.ts: re-raise signal after cleanup to restore default
  termination behavior (prevents process from hanging)
- worker.ts: replace one-shot `hasFlushed` guard with "preserve
  first reason" pattern so later flushes still run
- worker.ts: defer `sessionTraces.delete()` until `endTrace()`
  completes so crash flush can still reach in-flight traces

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Session traces lost after crash — no crash handlers flush traces to disk

1 participant