fix(list): keep sessions whose daemon outlives a socket-reachable failure #35

Merged
myobie merged 1 commit into myobie:main from schickling-assistant:fix/list-keep-alive-daemons
May 3, 2026

@schickling-assistant
Contributor

Closes #34.

listSessions() had two cleanup paths that fired even when the recorded
pid was still alive:

  1. The .sock branch deleted the socket file whenever isSocketReachable
    returned false. The probe has a 500ms timeout and trips on a busy
    daemon, transient EAGAIN, or a race with a service restart. Once the
    .sock was gone, the still-alive daemon became invisible to every
    future scan and there was no way to recover it short of kill -9.
  2. The .json branch unconditionally deleted metadata older than 24h.
    Long-lived daemons whose metadata wasn't refreshed (because nothing
    refreshes it) had their .json removed after a day even though they
    kept consuming memory.
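
To make the failure mode in item 1 concrete, here is a minimal sketch of a timeout-bounded socket probe. It is not the project's isSocketReachable: the name probeSocket and its signature are hypothetical, and only the 500ms default comes from the description above. Any connect error or a slow accept yields false, which is why a false result alone says nothing about whether the daemon is dead.

```typescript
import * as net from "node:net";

// Hypothetical stand-in for isSocketReachable; the real probe may differ.
function probeSocket(socketPath: string, timeoutMs = 500): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ path: socketPath });
    let settled = false;

    // Resolve exactly once, whichever of connect/error/timeout fires first.
    const finish = (reachable: boolean): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      socket.destroy();
      resolve(reachable);
    };

    // A daemon that is alive but slow to accept is reported as unreachable.
    const timer = setTimeout(() => finish(false), timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}
```

A busy daemon and a missing socket file both resolve to false here, so a caller that deletes the .sock on false cannot tell the two cases apart.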

This change gates both paths on isProcessAlive(pid):

  • If the socket isn't reachable but the pid is alive, keep the .sock
    and report the session as running. Calling code that needs a
    reachable socket will fail in the usual way; the lister no longer
    destroys recovery state.
  • If .json is older than 24h but the pid is alive, leave the metadata
    alone. The TTL is meant to age out known-dead sessions, not to garbage-
    collect unknown-state ones. The pid probe is cheap (a single
    kill(pid, 0) syscall) and runs at most once per dead-looking session
    per scan.
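
The two gates above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: shouldDeleteSocket, shouldDeleteMetadata, and TTL_MS are hypothetical names, and the body of isProcessAlive is a guess at a signal-0 probe built on Node's process.kill.

```typescript
// Assumed 24h TTL, taken from the description above.
const TTL_MS = 24 * 60 * 60 * 1000;

// Sketch of a kill(pid, 0) liveness probe. Signal 0 delivers nothing;
// it only checks whether the pid exists and can be signalled.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but is owned by another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Hypothetical helper: delete the .sock only if the socket is
// unreachable AND the recorded pid is gone.
function shouldDeleteSocket(reachable: boolean, alive: () => boolean): boolean {
  return !reachable && !alive();
}

// Hypothetical helper: expire metadata only if it is past the TTL
// AND the recorded pid is gone.
function shouldDeleteMetadata(ageMs: number, alive: () => boolean): boolean {
  return ageMs > TTL_MS && !alive();
}

// Usage: the probe is passed lazily, so it runs only when the first
// condition already makes the session look dead.
const pid = process.pid;
console.log(shouldDeleteSocket(false, () => isProcessAlive(pid)));
console.log(shouldDeleteMetadata(TTL_MS + 1, () => isProcessAlive(pid)));
```

Passing the liveness check as a callback relies on short-circuit evaluation to keep the syscall off the path for reachable sockets and fresh metadata, which matches the "at most once per dead-looking session" claim.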

Tests in tests/list-filters.test.ts lock in both behaviors using the
test runner's own pid as a stand-in for an alive daemon, plus a
matched negative case proving the 24h TTL still fires when the pid is
dead.
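
For readers who want the shape of those tests without opening the file, here is a standalone sketch. It uses node:assert instead of vitest so it runs on its own, and it models the TTL rule with a hypothetical ttlExpires helper rather than calling listSessions(). How the real tests obtain a dead pid is not stated in this PR; the reaped-child approach below is an assumption.

```typescript
import * as assert from "node:assert";
import { spawnSync } from "node:child_process";

const TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical liveness probe (signal 0 checks existence only).
const alive = (pid: number): boolean => {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as { code?: string }).code === "EPERM";
  }
};

// Hypothetical model of the gated TTL rule from this PR.
const ttlExpires = (ageMs: number, pid: number): boolean =>
  ageMs > TTL_MS && !alive(pid);

// The runner's own pid stands in for a live daemon.
const alivePid = process.pid;

// Assumption: a child that has already exited and been reaped stands in
// for a dead daemon. spawnSync returns only after the child is gone.
const deadPid = spawnSync(process.execPath, ["-e", ""]).pid;

// Positive case: stale metadata survives while the pid is alive.
assert.strictEqual(ttlExpires(TTL_MS + 1, alivePid), false);

// Matched negative case: the TTL still fires once the pid is dead.
assert.strictEqual(ttlExpires(TTL_MS + 1, deadPid), true);
```

Using the runner's own pid avoids spawning a long-lived fixture process just to have something alive to point at.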

Verification

  • npx vitest run tests/list-filters.test.ts — all 18 tests pass
    (including the 2 new ones).
  • npx vitest run tests/gc.test.ts tests/list-filters.test.ts — passes.
  • npx tsc -p tsconfig.build.json — clean.
Posted on behalf of @schickling
field               value
agent_name          🏔️ cl2-ridge
agent_session_id    d12285ba-b470-4bed-a9e8-0798cdcfc25b
agent_tool          Claude Code
agent_tool_version  2.1.121
agent_runtime       Claude Code 2.1.121
agent_model         claude-opus-4-7
worktree            pty/fix/list-keep-alive-daemons
machine             dev3
tooling_profile     dotfiles@635d85d

…achable

listSessions() previously deleted .sock files whenever a socket was
unreachable, even if the recorded pid was still alive. That made any
transient socket-reachable failure (busy daemon, EAGAIN, race with a
service restart) permanent: once the .sock is removed, the still-alive
daemon becomes invisible to all future scans.

It also called cleanupAll() on .json files older than 24h without
checking whether the daemon process was still running, so long-lived
daemons silently lost their metadata after a day and disappeared from
'pty list' even though they kept consuming RAM.

This commit gates both checks on isProcessAlive(pid):
- if pid is alive but socket is unreachable, keep both .sock and metadata,
  report status as running
- if .json is older than 24h, only cleanupAll() when pid is dead

Refs: myobie#34
@myobie myobie merged commit d08d20e into myobie:main May 3, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

pty list loses sessions across supervisor restarts and 24h cleanup, even when daemons are still alive