Skip to content

fix(bin): forward signals from pty wrapper to inner CLI#36

Merged
myobie merged 1 commit into
myobie:mainfrom
schickling-assistant:fix/wrapper-forward-signals
May 3, 2026
Merged

fix(bin): forward signals from pty wrapper to inner CLI#36
myobie merged 1 commit into
myobie:mainfrom
schickling-assistant:fix/wrapper-forward-signals

Conversation

@schickling-assistant
Copy link
Copy Markdown
Contributor

Independent from #35.

Problem

bin/pty is a thin Node wrapper that re-execs the real CLI via
spawnSync(node, [cli.js, ...args]). spawnSync blocks the wrapper's
event loop, so any signal the wrapper receives is delivered to the
wrapper only — the inner cli.js child never sees it.

This is invisible under most service managers because the manager
SIGKILLs the whole cgroup. It bites hard under systemd with
KillMode=process:

  1. systemd tracks the wrapper as MainPID and SIGTERMs it on stop.
  2. Wrapper dies. The inner node cli.js supervisor start (which has
    its own SIGTERM handler that runs Supervisor.stop() and releases
    supervisor.lock) is reparented to PID 1 and keeps running.
  3. systemd starts a new unit invocation. The new wrapper spawns a new
    inner CLI. acquireLock(\"supervisor\") correctly sees the orphan is
    still alive and exits 1 with [supervisor] another supervisor is already running.
  4. systemd's Restart=on-failure loops forever. Reproduced on a NixOS
    host where the restart counter climbed past 500 with the original
    supervisor still alive after an hour.

The supervisor-side stale-lock check (src/sessions.ts:578-623) is
already correct — it inspects the holder pid via kill(pid, 0) and
reclaims dead locks. The bug is upstream: the holder is genuinely alive
because the wrapper never told it to shut down.

Change

bin/pty:

  • spawn instead of spawnSync so the wrapper's event loop stays live.
  • Forward SIGTERM/SIGINT/SIGHUP/SIGQUIT/SIGUSR1/SIGUSR2 to
    the child.
  • Drive the wrapper's own exit from the child's exit event so the
    child has time to flush before status is propagated.
  • Normal exit-code passthrough is unchanged. Signal-death maps to
    128 + signum, matching shell convention.
  • No new dependencies; still node:child_process + node:os.

tests/wrapper-signal-forwarding.test.ts:

  • Spawns the wrapper running supervisor start against an isolated
    PTY_SESSION_DIR, waits for supervisor.pid + supervisor.lock to
    appear, SIGTERMs the wrapper, and asserts:
    1. Wrapper exits within the timeout.
    2. The inner cli.js pid (read from supervisor.pid) dies — proves
      the signal was forwarded.
    3. supervisor.lock is gone — proves the child shut down gracefully
      rather than being SIGKILLed.

Verification

  • npx vitest run tests/wrapper-signal-forwarding.test.ts — passes.
  • Manually re-applied the old spawnSync wrapper and confirmed the
    test fails (orphan inner pid, leaked lock) — i.e. the test catches
    the regression it's meant to catch.
  • npx vitest run (full suite) — 6 pre-existing failures in
    tests/screenshot.test.ts (vim) and tests/shells.test.ts (zsh)
    caused by missing local binaries, unrelated to this change.
    Confirmed the same 6 fail on origin/main without this patch.
  • npx tsc -p tsconfig.build.json — clean.
Posted on behalf of @schickling
field value
agent_name 🪁 cl2-kite
agent_session_id 021feea5-3c35-4d56-a9d5-7d35b4bfcc74
agent_tool Claude Code
agent_tool_version 2.1.121
agent_runtime Claude Code 2.1.121
agent_model claude-opus-4-7
worktree pty/fix/wrapper-forward-signals
machine dev3
tooling_profile dotfiles@unknown-dirty

The bin/pty wrapper used spawnSync, which (a) blocks the wrapper's event
loop so any signal handlers we'd register would never run, and
(b) leaves no path for SIGTERM/SIGINT/etc. to reach the cli.js child.

Under systemd with KillMode=process, the wrapper is the unit's MainPID.
systemd SIGTERMs the wrapper, the wrapper dies, but the inner cli.js
supervisor — which has its own SIGTERM handler that calls
Supervisor.stop() and releases supervisor.lock — never sees the signal.
It gets reparented and keeps running. Every subsequent unit invocation
then exits with 'another supervisor is already running' because the
orphan still holds the lock. Reproduced on a NixOS host where the
restart counter climbed past 500 with the original supervisor still
alive after an hour.

The fix switches to spawn() and forwards SIGTERM/SIGINT/SIGHUP/SIGQUIT/
SIGUSR1/SIGUSR2 to the child, then waits for the child's exit event
before propagating its status. Normal exit-code passthrough is
unchanged. Signal-death is mapped to 128+signum, matching shell
convention.

Tests in tests/wrapper-signal-forwarding.test.ts spawn the wrapper
running 'supervisor start' against an isolated PTY_SESSION_DIR, SIGTERM
the wrapper, and assert the inner cli.js dies and supervisor.lock is
released. Verified the test fails against the old spawnSync wrapper
(inner pid survives, lock leaks).
@myobie myobie merged commit 0ce8431 into myobie:main May 3, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants