fix(shutdown): Improve gastown shutdown reliability #10

sauerdaniel · 2026-01-12T14:33:15Z

Summary

- Improves gastown shutdown reliability
- Addresses issues with gt down not properly stopping all components

Test plan

- Run gt down
- Verify all sessions are properly terminated
- Confirm clean shutdown

Changed from nuclear=true to nuclear=false when polecats self-destruct via gt done. The nuclear flag bypasses ALL safety checks including the cleanup_status field that was added as part of ZFC #10 to prevent accidental work loss. Now polecats will validate their self-reported cleanup_status before removing themselves, consistent with how the witness handler handles cleanup. Fixes steveyegge#360

Three related fixes for polecat lifecycle management:
1. Push branch to origin before self-nuke (done.go)
   - Ensures work is preserved on remote before worktree cleanup
   - Prevents orphaned local-only branches
2. Respect cleanup_status in selfNukePolecat (done.go)
   - Changed nuclear=true to nuclear=false
   - Validates cleanup_status before removal
   - Prevents destruction with uncommitted/unpushed work
3. Respawn done polecats with hooked work (manager.go, handlers.go)
   - loadFromBeads now checks hook_bead field
   - Added FindPolecatsWithHookedWork() and RespawnPolecatWithHookedWork()
   - Witness can auto-respawn polecats that have pending work
Fixes steveyegge#360

Two fixes for daemon-managed agent startup:
1. Boot watchdog CLAUDE.md creation (boot.go, templates.go)
   - Add CreateBootCLAUDEmd function to templates package
   - Add EnsureCLAUDEmd method to create context before session spawn
   - Enables Boot to perform intelligent triage decisions
2. Deacon startup auto-execution (deacon.go)
   - Execute gt prime directly via SendKeys instead of nudge message
   - Prevents text appearing in prompt area without execution
   - Fixes endless restart loop in Claude Code v2.1.4+

1. Attach new patrol wisp to hook for autonomous continuation
   - Ensures witness continues patrol after session restart
2. Add --hook flag to SessionStart hooks in createPatrolHooks
   - Properly signals hook attachment during session creation

After polecats push their work branches to origin before self-nuke, the refinery was only deleting local branches after merge, leaving stale remote branches accumulating. Added remote branch deletion in handleSuccess and handleSuccessFromQueue to clean up both local and remote copies after successful merge.

When convoy leg beads complete, they now record output_path metadata so synthesis workflows can discover and aggregate outputs without hunting through worktrees or guessing branch names.
Changes:
- formula.go: parse output section, include output_path in leg descriptions
- convoy.go: add Description field to issueDetails for metadata parsing
- synthesis.go: parse output_path from leg descriptions with template fallback
Fixes steveyegge#303

Adds support for configuring a separate push URL (fork) when the upstream repository is read-only. This allows polecats to push to a personal fork while still pulling from the upstream repository.
Changes:
- Added PushURL field to RigConfig and Rig struct
- Added PushURL to AddRigOptions
- Added ConfigurePushURL function to git package
- Configure push URL in bare repo when PushURL is set
Usage:
    gt rig add --git-url=https://github.com/upstream/repo \
      --push-url=https://github.com/user/fork \
      myrig
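A minimal sketch of how a separate push URL might be set on a bare repo, assuming the remote is named `origin`. It relies on the standard `git remote set-url --push` command; the helper name `pushURLCommand` is hypothetical and is not the `ConfigurePushURL` function from this PR, whose signature is not shown here.

```go
package main

import (
	"fmt"
	"os/exec"
)

// pushURLCommand builds (but does not run) the git command that sets a
// push-only URL on the "origin" remote of the repository at repoDir.
// Fetches keep using the original URL; only pushes go to pushURL.
// The helper name and the "origin" remote name are assumptions.
func pushURLCommand(repoDir, pushURL string) *exec.Cmd {
	cmd := exec.Command("git", "remote", "set-url", "--push", "origin", pushURL)
	cmd.Dir = repoDir
	return cmd
}

func main() {
	cmd := pushURLCommand("/path/to/rig/.repo.git", "https://github.com/user/fork")
	// Print the command instead of executing it, so the sketch has no side effects.
	fmt.Println(cmd.Args)
}
```

Calling `cmd.Run()` on the returned command would apply the change; the sketch only prints the arguments.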

Add Community section with link to Discord server for real-time support and collaboration. Fixes steveyegge#305

The post-startup nudges were arriving before Claude Code's input was ready, causing only the Enter key to make it through (empty input).
Changes:
- Pass "gt prime" as CLI argument to Claude Code startup command
- Remove unreliable post-startup nudges and timing delays
- The SessionStart hook provides a backup propulsion mechanism
The CLI prompt approach is more reliable because the prompt is queued before Claude even starts, avoiding timing issues entirely.
Fixes: gt-x7p3

The boot role was added but the test expectation wasn't updated, causing TestRoleNames to fail. Fixes: gt-j7wl

Apply the same fix as Mayor (d509f7c) to Deacon, Witness, Refinery, and Polecat. Post-startup nudges arrive before Claude Code's input is ready, causing only the Enter key to make it through (empty input).
Changes for each agent:
- Pass "gt prime" as CLI argument to startup command
- Remove unreliable post-startup nudges and timing delays
- Keep SessionStart hook as backup propulsion mechanism
The CLI prompt approach is more reliable because the prompt is queued before Claude even starts, avoiding timing issues entirely.
Fixes: gt-mghw

Boot agent was getting wrong settings template due to:
1. RoleTypeFor() missing "boot" - fell through to Interactive
2. spawnTmux() not calling EnsureSettingsForRole()
Add "boot" to autonomous roles list and call EnsureSettingsForRole() in spawnTmux() to create proper .claude/settings.json for Boot.
Fixes: gt-hnjp

Adds per-agent-type health tracking to the Mayor's tmux statusline, showing working/idle counts for Polecats, Witnesses, Refineries, and Deacon. All agent types are always displayed, even when no agents of that type are running (shows as '0/0 😺'). Format: active: 4/4 😺 6/10 👁️ 7/10 🏭 1/1 ⛪

- Abbreviate long rig names (design_forge→df, gastown→gt, etc.)
- Update tests for new abbreviations
- Addresses issue hq-dn15

- Add AgentCrew to tracked agent types in mayor statusline
- Show 👷 icon for crew agents
- Display crew count in statusline (e.g., 👷1/5)
- Removes crew from skip filter so they're properly tracked
Fixes issue where crew agents were not shown in statusline.

Active rigs now appear first (alphabetically), followed by parked/docked rigs (also alphabetically). This makes it easier to see which rigs are operational at a glance.
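The ordering described above can be sketched as a two-key sort: activity first, then name. The `rig` type and `sortRigs` function are hypothetical stand-ins for illustration, not the statusline's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// rig is a hypothetical stand-in for a statusline rig entry.
type rig struct {
	Name   string
	Active bool // false means parked or docked
}

// sortRigs orders active rigs first, then inactive ones,
// alphabetically by name within each group.
func sortRigs(rigs []rig) {
	sort.SliceStable(rigs, func(i, j int) bool {
		if rigs[i].Active != rigs[j].Active {
			return rigs[i].Active // active sorts before inactive
		}
		return rigs[i].Name < rigs[j].Name
	})
}

func main() {
	rigs := []rig{{"zeta", true}, {"alpha", false}, {"beta", true}, {"gamma", false}}
	sortRigs(rigs)
	for _, r := range rigs {
		fmt.Println(r.Name, r.Active)
	}
}
```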

Move dynamic status content from status-right to status-left to utilize available space and prevent rig name truncation.
- SetStatusFormat: Now sets status-right with compact identity
- SetDynamicStatus: Now sets status-left with dynamic content
- Increased status-left-length to 150 for more space
- Removed time from dynamic status (was %H:%M)
Fixes hq-s1il

The issue describes Mayor not monitoring convoys, but the root cause is that Deacon's patrol loop never called the existing infrastructure (gt convoy stranded + mol-convoy-feed). This implements the daemon-driven convoy progression approach (suggested option #1 in the issue).
Changes:
- Added feed-stranded-convoys step to mol-deacon-patrol formula
- Deacon now runs gt convoy stranded --json each patrol cycle
- For each stranded convoy, dispatches mol-convoy-feed dog
- Updated dependency chain
- Bumped formula version from 8 to 9

- Removed space between counts and emojis (e.g., "3 😺" → "3😺")
- Removed space between emojis and counts/subjects (e.g., "📬 3" → "📬3")
- Removed space between hook emoji and text (e.g., "🪝 work" → "🪝work")

IsClaudeRunning was calling IsAgentRunning (which calls GetPaneCommand), then immediately calling GetPaneCommand again. This duplicate subprocess call was slowing down gt startup and daemon heartbeat operations. Changed IsAgentRunning to return (bool, string) - the running status and the pane command it checked. IsClaudeRunning now reuses the command instead of making a redundant tmux subprocess call. Fixes gt-kpii: zombie session detection slows gt up
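A sketch of the change described above: the lookup result is returned alongside the boolean so the caller can reuse it. The lowercase function names, the `paneCommandFunc` type, the shell list, and the `claude`/`node` check are assumptions made for illustration; the real functions and their detection rules are not shown in this PR text.

```go
package main

import "fmt"

// paneCommandFunc stands in for the tmux lookup (GetPaneCommand in the
// commit message). Its signature here is an assumption.
type paneCommandFunc func(session string) (string, error)

// isAgentRunning reports whether a non-shell process owns the pane and
// also returns the command it observed, so callers need not look it up again.
func isAgentRunning(get paneCommandFunc, session string) (bool, string) {
	cmd, err := get(session)
	if err != nil || cmd == "" {
		return false, ""
	}
	switch cmd {
	case "bash", "zsh", "sh", "fish": // assumed list of idle shells
		return false, cmd
	}
	return true, cmd
}

// isClaudeRunning reuses the command returned by isAgentRunning instead of
// triggering a second subprocess call.
func isClaudeRunning(get paneCommandFunc, session string) bool {
	running, cmd := isAgentRunning(get, session)
	return running && (cmd == "claude" || cmd == "node")
}

func main() {
	calls := 0
	fake := func(string) (string, error) { calls++; return "node", nil }
	fmt.Println(isClaudeRunning(fake, "gt-mayor"), calls) // one lookup per check
}
```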

The notifyRecipient function was using NudgeSession which sends notifications to the input buffer. Changed to use SendNotificationBanner which displays the banner in the message history using echo. This fixes the issue where notification banners appeared in Claude Code's input buffer instead of in the conversation history. Fixes hq-nc9mr Replaces: hq-1qhj

Previously, the witness statusline only showed the crew count when it was greater than 0. Now all agent types (polecats 😺 and crew 👷) are always displayed, even when their count is 0.

For Claude Code sessions, mail notifications now use NudgeSession instead of SendNotificationBanner. This ensures notifications appear in the message history rather than being injected into the input buffer. Fixes: hq-1qhj

Changed the statusline format from "1/10😺" to "😺1/10" to match the documented format in the comment. This ensures the icon appears before the working/total counts for all agent types.
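For illustration, the icon-first format could be produced by a helper like the one below. The name `formatAgentCount` is hypothetical; only the resulting format comes from the commit message.

```go
package main

import "fmt"

// formatAgentCount renders one statusline segment with the icon first,
// followed by working/total. The function name is hypothetical.
func formatAgentCount(icon string, working, total int) string {
	return fmt.Sprintf("%s%d/%d", icon, working, total)
}

func main() {
	fmt.Println(formatAgentCount("😺", 1, 10)) // 😺1/10
	fmt.Println(formatAgentCount("👷", 0, 0))  // 👷0/0
}
```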

The warning when processes respawn after 'gt down --all' now includes more comprehensive troubleshooting guidance, including checking gt status and mentioning that the gt daemon itself could be the cause.

After the polecat self-nuke fix, branches are now pushed to origin before the polecat's worktree is deleted. The refinery was only deleting local branches after merge, leaving stale remote branches. Fix: Updated handleSuccess and handleSuccessFromQueue to also delete the remote branch from origin after deleting the local branch. Related to: hq-nju99, GitHub issue steveyegge#359

Boot was designed as an ephemeral triage agent that runs on each daemon tick, observes Deacon's state, and exits. However, Boot was getting stuck at interactive prompts after completing triage, which prevented the daemon from spawning fresh Boot instances.
Fix: Create CLAUDE.md for Boot that instructs it to:
1. Check Deacon status and heartbeat
2. Take action if needed (nudge/restart Deacon)
3. Exit immediately using `tmux kill-session -t gt-boot`
This ensures Boot functions as designed - ephemeral watchdog that runs triage and exits, allowing the daemon to spawn fresh Boot instances on each heartbeat.
Related: hq-6p7g4

Fixes hq-lglmw
When gt sling assigns work to a polecat, it now automatically attaches the mol-polecat-work molecule to the polecat's agent bead.
Changes:
- Added attachPolecatWorkMolecule() function that cooks the formula and attaches the molecule to the polecat's agent bead
- Added molecule attachment call after hooking work (single sling mode)
- Added molecule attachment call after hooking work (batch sling mode)
- Implementation is idempotent (checks if already attached)
- Non-blocking: logs warnings but doesn't fail sling operation

Issue steveyegge#197: Polecat fails to hook when slinging a bead with a molecule to a rig. Root cause: attachPolecatWorkMolecule was running 'bd cook' from the polecat's worktree (which doesn't have a .beads directory) instead of from the rig directory where the bead database lives. Fix: Use beads.ResolveHookDir() to resolve the correct rig directory for running bd commands, consistent with how the hook command works.

The sling command needs to handle .repo.git symlinks correctly for polecat spawning across all rigs. Related: hq-dp3ss

Fixes bug where work slung to 'done' polecats (no active tmux session) would never get processed. Now when gt sling resolves an existing polecat target and finds no active session, it spawns a fresh polecat instead of failing or leaving the work stuck. This addresses hq-50u3h: 43+ stale convoys were not progressing because polecats in 'done' state had work hooked to them but weren't processing it.

The health tracking loop in runMayorStatusLine was counting all agents regardless of whether their rig was registered in rigs.json. This caused count discrepancies when sessions existed for unregistered rigs. Now the health tracking loop applies the same registeredRigs filter that the earlier rig status loop uses, ensuring consistent counts across all statusline displays. Fixes hq-auhq
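A small sketch of the filtering idea, assuming the set of registered rigs is available as a map. The `agent` type and `countHealth` function are hypothetical; only the `registeredRigs` filter concept comes from the commit message.

```go
package main

import "fmt"

// agent is a hypothetical record for one running session.
type agent struct {
	Rig     string
	Working bool
}

// countHealth tallies working and total agents, skipping any agent whose
// rig is not in the registered set (rigs.json in the commit message).
func countHealth(agents []agent, registeredRigs map[string]bool) (working, total int) {
	for _, a := range agents {
		if !registeredRigs[a.Rig] {
			continue // session exists for an unregistered rig: do not count it
		}
		total++
		if a.Working {
			working++
		}
	}
	return working, total
}

func main() {
	agents := []agent{{"gastown", true}, {"gastown", false}, {"stale_rig", true}}
	working, total := countHealth(agents, map[string]bool{"gastown": true})
	fmt.Printf("%d/%d\n", working, total)
}
```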

Boot was designed to be a watchdog that runs on daemon ticks and manages Deacon lifecycle, but it wasn't functioning because Boot's CLAUDE.md context file was missing from the boot directory.
Changes:
- Add CreateBootCLAUDEmd function to templates package
- Add EnsureCLAUDEmd method to Boot to create CLAUDE.md from template
- Update spawnTmux to call EnsureCLAUDEmd before creating session
- Add "boot" to RoleNames list
This ensures Boot has proper context when spawned by the daemon, enabling it to perform intelligent triage (start/wake/nudge/interrupt decisions) instead of running without instructions.
Fixes: hq-6p7g4

Fixes steveyegge#210 - Creating a convoy as mayor results in prefix mismatch
The town-level beads database is initialized with issue_prefix=hq, but convoy creation was generating IDs with hq-cv- prefix, causing bd create to fail with prefix mismatch error.
Changed convoy ID generation from hq-cv-<hash> to hq-<hash>. Convoys are distinguished by type=convoy attribute, not by special ID prefix.

Comprehensive research on media processing optimization covering:
- Performance bottleneck analysis (I/O, CPU, memory)
- Parallel processing strategies (pipeline, data, hybrid)
- Multi-layer caching architecture (Redis + local SSD)
- Format optimization matrix and codec comparisons
- Cost reduction opportunities (40-60% estimated savings)
- 6-week proof of concept implementation plan
- Recommended technology stack and code examples
Deliverables complete: Performance audit, optimization recommendations, PoC plan.

Implements GitHub issue steveyegge#220 - Worktree setup hook for injecting local configurations.
When polecats are spawned, their worktrees are created from the rig's repo. Previously, there was no way to inject custom configurations during this process.
Now users can place executable hooks in <rig>/.runtime/setup-hooks/ to run custom scripts during worktree creation:
    rig/
      .runtime/
        setup-hooks/
          01-git-config.sh    <- Inject git config
          02-copy-secrets.sh  <- Copy secrets
          99-finalize.sh      <- Final setup
Features:
- Hooks execute in alphabetical order
- Non-executable files are skipped with a warning
- Hooks run with worktree as working directory
- Environment variables: GT_WORKTREE_PATH, GT_RIG_PATH
- Hook failures are non-fatal (warn but continue)
Example hook to inject git config:
    #!/bin/sh
    git config --local user.signingkey ~/.ssh/key.asc
    git config --local commit.gpgsign true
Related to: hq-fq2zg, GitHub issue steveyegge#220

Fixes steveyegge#291 - gastown is very hard to kill/shutdown/stop
Changes:
- Add shutdown coordination: daemon checks shutdown.lock and skips heartbeat auto-restarts during shutdown to prevent fighting shutdown
- Extend grace period from 100ms to 30 seconds for graceful session exit
- Add polling to detect when sessions exit gracefully before force kill
- Add orphaned Claude/node process detection in shutdown verification
The daemon's heartbeat now checks for shutdown.lock (created by gt down) and skips auto-restart logic when shutdown is in progress. This prevents the daemon from restarting agents that were intentionally killed during shutdown.
Sessions now receive Ctrl-C and have up to 30 seconds to exit cleanly, with polling every 500ms to detect graceful exit. Only sessions that don't exit within the grace period are force-killed.
Shutdown verification now includes detection of orphaned Claude/node processes that may be left behind when tmux sessions are killed but child processes don't terminate.

The sling refactor (cd2de6e) split the 1560-line sling.go into 7 focused modules, but left duplicate function declarations in the original file. This commit removes the duplicates, keeping only the implementations in the split files.
Also fixes related build issues:
- Remove unused claude import from boot/boot.go
- Fix IsAgentRunning() calls to handle multiple return values
- Fix atomic operation on startedAny counter in start.go
- Remove duplicate health tracking code in statusline.go
- Add missing imports (strings, config) to sling.go

sauerdaniel force-pushed the main branch from 0f3173d to d2d7dfe on January 12, 2026 17:12

sauerdaniel added 7 commits January 12, 2026 19:07

fix(witness): improve patrol hook handling

bb7621f

1. Attach new patrol wisp to hook for autonomous continuation
   - Ensures witness continues patrol after session restart
2. Add --hook flag to SessionStart hooks in createPatrolHooks
   - Properly signals hook attachment during session creation

docs(readme): add Discord server link

781073c

Add Community section with link to Discord server for real-time support and collaboration. Fixes steveyegge#305

sauerdaniel force-pushed the main branch from 78a9837 to d6dc439 on January 12, 2026 19:14

sauerdaniel and others added 19 commits January 12, 2026 20:23

Merge branches 'pr/polecat-lifecycle', 'pr/boot-deacon-watchdog', 'pr…

0174dc4

…/witness-improvements', 'pr/refinery-branch-cleanup', 'pr/synthesis-output-metadata', 'pr/push-url-config' and 'pr/discord-link'

fix(templates): add boot role to TestRoleNames expected list

db8a87a

The boot role was added but the test expectation wasn't updated, causing TestRoleNames to fail. Fixes: gt-j7wl

Merge branch 'polecat/test-rolenames-boot-68252641'

ee92571

Merge branch 'steveyegge:main' into main

07af705

chore: clean up beads formulas directory

8b3aedf

feat(statusline): use rig name abbreviations to save space

d6d5e57

- Abbreviate long rig names (design_forge→df, gastown→gt, etc.)
- Update tests for new abbreviations
- Addresses issue hq-dn15

feat(statusline): order rigs by activity (parked/stopped to the right)

ad5c32d

Active rigs now appear first (alphabetically), followed by parked/docked rigs (also alphabetically). This makes it easier to see which rigs are operational at a glance.

fix(statusline): always show all agent types even when count is 0

468f215

Previously, the witness statusline only showed the crew count when it was greater than 0. Now all agent types (polecats 😺 and crew 👷) are always displayed, even when their count is 0.

sauerdaniel added 18 commits January 13, 2026 03:48

fix(statusline): place icon before counts to match documented format

f4f0daf

Changed the statusline format from "1/10😺" to "😺1/10" to match the documented format in the comment. This ensures the icon appears before the working/total counts for all agent types.

fix(down): improve respawned process warning message

20d27f3

The warning when processes respawn after 'gt down --all' now includes more comprehensive troubleshooting guidance, including checking gt status and mentioning that the gt daemon itself could be the cause.

fix(sling): Update sling command for .repo.git symlink compatibility

89243b9

The sling command needs to handle .repo.git symlinks correctly for polecat spawning across all rigs. Related: hq-dp3ss

sauerdaniel force-pushed the polecat/organic-mkabz4tm branch from f26d421 to eea3230 on January 13, 2026 04:50

sauerdaniel force-pushed the main branch 4 times, most recently from a67da82 to 60ed204 on January 20, 2026 21:42
