
feat: parallelize per-rig daemon heartbeat with bounded worker pool #3467

Open
oscarhermoso wants to merge 1 commit into gastownhall:main from oscarhermoso:feat/parallel-heartbeat

@oscarhermoso

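Summary

Run the daemon's per-rig heartbeat operations through a bounded worker pool so that one slow rig no longer delays health checks for all the others.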

Changes

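  • internal/daemon/worker.go: new RigWorkerPool that bounds per-rig goroutines with a semaphore (default concurrency 10) and gives each rig a context with a 30s timeout
  • internal/daemon/worker_test.go: unit tests (concurrency limit, context timeout cancellation, slow-rig isolation) and a 100-rig benchmark
  • Heartbeat: replace the serial for-range loops over getKnownRigs() with RigWorkerPool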

Test plan

  • With one rig artificially slowed (e.g. sleep 5 in its heartbeat hook), confirm that the remaining rigs still receive health checks within the normal tick interval
  • Confirm no regression on single-rig setups

🤖 Generated with Claude Code

…hq-bkc)

Replace serial for-range loops over getKnownRigs() in the heartbeat with a
bounded RigWorkerPool. Each per-rig operation now runs in a goroutine bounded
by a semaphore (default concurrency: 10) and receives a per-rig context with a
30s timeout.

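This bounds the heartbeat tick at O(⌈N/C⌉ × max_op_time) instead of O(N × max_op_time), where C is the pool's concurrency limit. For N ≤ C that is O(max_op_time), and the 30s timeout caps max_op_time for operations that honor their context. One slow or Dolt-blocked rig now occupies a single worker slot instead of delaying health checks for all the others.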

Affected operations:
- ensureWitnessesRunning / ensureRefineriesRunning
- killWitnessSessions / killRefinerySessions
- checkPolecatSessionHealth
- reapIdlePolecats
- pruneStaleBranches

New: internal/daemon/worker.go — RigWorkerPool with semaphore + context timeout.
New: internal/daemon/worker_test.go — unit tests (concurrency limit, context
  timeout cancellation, slow-rig isolation) and a 100-rig benchmark.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
github-actions bot added the status/needs-triage label (Inbox: we haven't looked at it yet) on Apr 1, 2026

Development

Successfully merging this pull request may close these issues.

perf: serial per-rig heartbeat — one slow rig blocks health checks for all others
