fix: flush session traces to disk on crash #594
Bun Workers don't receive OS signals — only the main thread does.
When altimate-code crashed, all in-memory traces were lost because
the worker had no crash handlers.
- worker.ts: add `flushAllTracesSync()` with idempotency guard,
called from `uncaughtException` and `process.once("exit")`
- thread.ts: add signal handlers (SIGINT/SIGTERM/SIGHUP) that call
`worker.terminate()` + `Bun.sleepSync(50)` to trigger worker exit
- index.ts: safety net for headless mode — flush `Trace.active` on
`uncaughtException` and SIGHUP
Closes #593
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
📝 Walkthrough
These changes add crash handlers across three modules to flush session traces to disk when the process terminates unexpectedly. Signal handlers in the main thread trigger worker termination, while worker and main processes implement synchronous trace flushing during exit and uncaught exception scenarios, ensuring trace data persists even after crashes.
Sequence Diagram
sequenceDiagram
participant OS as Operating System
participant Main as Main Thread
participant Worker as Worker Process
participant Trace as Trace System
participant Disk as Disk Storage
OS->>Main: Signal (SIGINT/SIGTERM/SIGHUP)
activate Main
Main->>Main: emergencyTerminate handler
Main->>Worker: terminate()
activate Worker
Worker->>Worker: uncaughtException or exit triggered
Worker->>Trace: flushAllTracesSync()
activate Trace
loop for each sessionTrace
Trace->>Trace: trace.flushSync(reason)
Trace->>Disk: Write trace snapshot
end
deactivate Trace
Worker->>Disk: Persist trace data
deactivate Worker
Main->>Trace: Trace.active?.flushSync()
activate Trace
Trace->>Disk: Flush active trace
deactivate Trace
Main->>OS: process.exit()
deactivate Main
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/cli/cmd/tui/thread.ts`:
- Around line 161-170: The signal handler emergencyTerminate currently
terminates the worker and sleeps but doesn't end the main process; update
emergencyTerminate (used by process.on("SIGINT") and process.on("SIGTERM")) to
explicitly terminate the process after cleanup by calling process.exit(code)
(choose 0 for normal shutdown or a nonzero code for errors) or alternatively
restore default handlers (remove the listeners) before exiting; ensure the call
happens after worker.terminate() and Bun.sleepSync(50) so cleanup completes and
the process won't continue running the TUI thread.
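The "restore default handlers" option from this comment can be sketched as follows. This is a minimal illustration, not the PR's actual code: `ProcessLike`, `WorkerLike`, `installEmergencyTerminate`, and the `waitForWorker` callback are hypothetical names, and the process object is injected so the pattern can be exercised without sending real signals. In the real handler, `waitForWorker` would presumably be something like `() => Bun.sleepSync(50)`.

```typescript
type SignalName = "SIGINT" | "SIGTERM" | "SIGHUP"

// Hypothetical slice of the process API that this sketch relies on.
interface ProcessLike {
  pid: number
  on(signal: SignalName, handler: () => void): unknown
  removeListener(signal: SignalName, handler: () => void): unknown
  kill(pid: number, signal: SignalName): unknown
}

// Hypothetical slice of the Worker API.
interface WorkerLike {
  terminate(): unknown
}

function installEmergencyTerminate(
  proc: ProcessLike,
  worker: WorkerLike,
  waitForWorker: () => void,
  signals: SignalName[] = ["SIGINT", "SIGTERM", "SIGHUP"],
): void {
  for (const signal of signals) {
    const handler = () => {
      // 1. Ask the worker to stop so its exit hook can flush traces.
      worker.terminate()
      // 2. Give the worker a short, synchronous window to finish flushing.
      waitForWorker()
      // 3. Remove our listener so the default signal behavior applies again,
      //    then re-send the signal to ourselves so the process really ends.
      proc.removeListener(signal, handler)
      proc.kill(proc.pid, signal)
    }
    proc.on(signal, handler)
  }
}
```

Re-sending the signal, rather than calling `process.exit(code)`, lets the parent shell observe that the process was terminated by the signal instead of seeing an arbitrary exit code.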
In `@packages/opencode/src/cli/cmd/tui/worker.ts`:
- Around line 345-351: The current flushAllTracesSync only iterates
sessionTraces and misses traces removed from that map while endTrace() is still
async; fix by keeping traces reachable until endTrace() completes: when starting
async endTrace() (the endTrace() call site that currently removes entries from
sessionTraces pre-await), add the trace to a new pendingFlushes collection
(e.g., Set or Map named pendingFlushes) or defer deleting from sessionTraces
until endTrace() settles, then remove from pendingFlushes after completion;
update flushAllTracesSync() to iterate both sessionTraces and pendingFlushes and
call trace.flushSync(reason) so traces in-flight are flushed on crash.
- Around line 37-42: The one-shot guard variable hasFlushed prevents later
crash-time flushes because after the uncaughtException handler calls
flushAllTracesSync it flips hasFlushed, making the process "exit" hook a no-op
for any subsequent fatal events; update the logic so the exit handler and
uncaughtException handler each independently ensure a flush without being
disabled by a single boolean—remove or change the one-shot behavior around
hasFlushed and instead make flushAllTracesSync idempotent or track per-event
flushes, ensuring both the uncaughtException handler (where
flushAllTracesSync(...) is already called) and the "exit" hook call
flushAllTracesSync when needed; touch the hasFlushed variable usage near the
uncaughtException handler, the code that sets hasFlushed (lines around 341-344),
and the "exit" hook (around line 354) to implement this fix.
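One way to address the one-shot guard is to remember only the reason of the first flush and still run every later flush. The sketch below is an assumption about what such a guard could look like, not the code in `worker.ts`: `FlushableTrace` and `firstFlushReason` are hypothetical names, and `flushSync(reason)` is assumed to be safe to call more than once on the same trace.

```typescript
// Hypothetical minimal trace shape; the real Trace class may differ.
interface FlushableTrace {
  flushSync(reason: string): void
}

const sessionTraces = new Map<string, FlushableTrace>()

// Remembers why the first crash-time flush happened (hypothetical variable).
let firstFlushReason: string | undefined

function flushAllTracesSync(reason: string): void {
  // Keep the first reason so a later "exit" flush does not overwrite
  // the more informative "uncaughtException" reason.
  const effectiveReason = firstFlushReason ?? reason
  firstFlushReason = effectiveReason

  // Always iterate: traces created after the first flush are still written.
  for (const trace of sessionTraces.values()) {
    try {
      trace.flushSync(effectiveReason)
    } catch {
      // Best effort: one failing trace must not block the others.
    }
  }
}
```

With this shape, both the `uncaughtException` handler and the `exit` hook can call `flushAllTracesSync(...)` unconditionally; the second call repeats the write instead of becoming a no-op.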
📒 Files selected for processing (3)
- packages/opencode/src/cli/cmd/tui/thread.ts
- packages/opencode/src/cli/cmd/tui/worker.ts
- packages/opencode/src/index.ts
- thread.ts: re-raise signal after cleanup to restore default termination
behavior (prevents process from hanging)
- worker.ts: replace one-shot `hasFlushed` guard with "preserve first reason"
pattern so later flushes still run
- worker.ts: defer `sessionTraces.delete()` until `endTrace()` completes so
crash flush can still reach in-flight traces
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
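The deferred deletion described in this commit can be sketched as below. It is an illustration under assumptions, not the actual `worker.ts` code: `TraceLike` and `endSession` are hypothetical names, and only the idea of deleting from `sessionTraces` after `endTrace()` settles comes from the commit message.

```typescript
// Hypothetical minimal trace shape; the real Trace class may differ.
interface TraceLike {
  endTrace(): Promise<void>
  flushSync(reason: string): void
}

const sessionTraces = new Map<string, TraceLike>()

// Hypothetical call site that ends a session's trace.
async function endSession(sessionID: string): Promise<void> {
  const trace = sessionTraces.get(sessionID)
  if (!trace) return
  try {
    // The entry stays in the map while the async end is in flight,
    // so a crash during this await can still reach the trace.
    await trace.endTrace()
  } finally {
    // Delete only after endTrace() has settled, and only if the map
    // still points at this same trace object.
    if (sessionTraces.get(sessionID) === trace) sessionTraces.delete(sessionID)
  }
}

function flushAllTracesSync(reason: string): void {
  for (const trace of sessionTraces.values()) {
    try {
      trace.flushSync(reason)
    } catch {
      // Best effort during a crash.
    }
  }
}
```

Compared with a separate `pendingFlushes` collection, deferring the delete keeps a single source of truth for the crash flush, at the cost of the map briefly holding traces that are already ending.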
What does this PR do?
Fixes session trace loss when altimate-code crashes. When the TUI worker process dies unexpectedly, all in-memory traces are now flushed to disk synchronously so users can find crash logs in subsequent sessions.
Key discovery: Bun Workers don't receive OS signals (SIGINT/SIGTERM/SIGHUP) — only the main thread does. The fix uses a 3-layer approach:
- Main thread (thread.ts): signal handlers call `worker.terminate()` + `Bun.sleepSync(50)` to trigger the worker's exit event
- Worker (worker.ts): `process.once("exit")` handler flushes all active traces via `flushAllTracesSync()` with idempotency guard
- Headless safety net (index.ts): flushes `Trace.active` on `uncaughtException` and SIGHUP
Upstream context: sst/opencode has 60+ open crash-related issues (anomalyco/opencode#19023, anomalyco/opencode#14291, anomalyco/opencode#12767).
Type of change
Issue for this PR
Closes #593
Related: #588
How did you verify your code works?
- `worker.terminate()` triggers the worker's `exit` event handler
- `Bun.sleepSync(50)` gives the worker time to flush before the main thread exits
Checklist
🤖 Generated with Claude Code
Summary by CodeRabbit