fix(health-monitor): guard Keychain read + fix silent-fail task prompt by alexli-77 · Pull Request #2548 · nanocoai/nanoclaw

alexli-77 · 2026-05-19T03:03:35Z

Summary

container-runner: Keychain read in buildMounts now only overwrites claude.json when the Keychain token is strictly newer than what's already on disk. Prevents the post-spawn Keychain read from rolling back a token that refreshOauthTokenIfNeeded just refreshed via the OAuth endpoint moments earlier.
health-monitor: Rewrote the injectTask prompt to reference mounted paths (/workspace/extra/nanoclaw-logs/, /workspace/extra/nanoclaw-data/) instead of raw macOS security find-generic-password commands, which (a) don't exist inside a Linux container and (b) were triggering the agent's security refusal.
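A minimal sketch of the "strictly newer" guard described in the first item, assuming both copies of the credentials carry an `expiresAt` timestamp in epoch milliseconds. The `OauthSnapshot` shape and both function names are hypothetical and are not taken from container-runner.ts.

```typescript
// Hypothetical credential shape; the real claude.json layout is not shown in this PR.
interface OauthSnapshot {
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // assumed to be epoch milliseconds
}

// True only when the Keychain copy expires strictly later than the on-disk copy.
function keychainIsStrictlyNewer(
  onDisk: OauthSnapshot | null,
  keychain: OauthSnapshot | null,
): boolean {
  if (keychain === null) return false; // nothing readable, e.g. a non-macOS host
  if (onDisk === null) return true; // no file yet, so the Keychain copy is all we have
  return keychain.expiresAt > onDisk.expiresAt; // equal timestamps keep the file
}

// Returns the snapshot that should be in claude.json after the guard runs.
function resolveSnapshot(
  onDisk: OauthSnapshot | null,
  keychain: OauthSnapshot | null,
): OauthSnapshot | null {
  return keychainIsStrictlyNewer(onDisk, keychain) ? keychain : onDisk;
}
```

With a strict comparison, a file that was refreshed moments before the Keychain read always wins a tie, so the read can only move the token forward.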

Test plan

Spawn a container for an agent whose token is near-expiry — verify refreshOauthTokenIfNeeded refreshes it and the subsequent Keychain read does not revert it
Trigger a silent-fail detection manually — verify the injected task no longer contains security find-generic-password and the health-monitor agent processes it without refusing

🤖 Generated with Claude Code

Wire @chat-adapter/discord through the Chat SDK bridge so Discord messages flow into the standard channel pipeline. Reads DISCORD_BOT_TOKEN, DISCORD_PUBLIC_KEY, and DISCORD_APPLICATION_ID from .env at adapter construction. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

….0 → 4.27.0 Aligns with upstream channels branch recommendation. The chat dep had to be bumped together because the new adapter requires the new chat.processOptionsLoad type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore: sync with upstream + Discord adapter bump

Detects "can't run" level failures that the existing stuck-container detection misses: sessions that produce a processing_ack=completed but zero messages_out, the signature of a silent 401 auth failure.

- src/modules/health-monitor/setup.ts: idempotent DB bootstrap (agent group, messaging group for Discord keepalive channel, wiring, named destination)
- src/modules/health-monitor/checks.ts: checkSilentFail() (ack with no output in 2h window, container stopped) + checkTokenExpiry()
- src/modules/health-monitor/alert.ts: direct Discord REST alert to keepalive channel + task injection into health-monitor session
- src/modules/health-monitor/index.ts: 5-min timer, 1h dedup per issue key, startHealthMonitor() (must run after initDb)
- src/index.ts: MODULE-HOOK to start health-monitor after DB init
- src/modules/index.ts: import health-monitor module

Also adds pre-spawn OAuth token refresh from macOS Keychain in buildMounts() (container-runner.ts): reads the 'Claude Code-credentials' keychain entry before every container spawn so tokens are always fresh. Wrapped in try/catch, no-op on non-macOS.

Upstream issues: nanocoai#730 (token expiry, macOS details added to comment), nanocoai#2492 (health-monitor feature proposal).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
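For illustration, the silent-fail rule and the 1h dedup from this commit message could be sketched as below. The `SessionSnapshot` fields are hypothetical stand-ins for whatever checks.ts actually reads from the DB, and reading "2h window" as "the completed ack is at most 2h old" is an assumption.

```typescript
// Hypothetical per-session view; field names are illustrative, not the real schema.
interface SessionSnapshot {
  sessionId: string;
  lastAckStatus: "completed" | "pending" | "failed";
  lastAckAt: number; // epoch ms of the last processing_ack
  messagesOutSinceAck: number;
  containerRunning: boolean;
}

const SILENT_FAIL_WINDOW_MS = 2 * 60 * 60 * 1000; // 2h window from the commit message
const DEDUP_MS = 60 * 60 * 1000; // 1h dedup per issue key

// Completed ack inside the window, no outbound messages, container already stopped.
function isSilentFail(s: SessionSnapshot, now: number): boolean {
  const age = now - s.lastAckAt;
  const ackedInWindow =
    s.lastAckStatus === "completed" && age >= 0 && age <= SILENT_FAIL_WINDOW_MS;
  return ackedInWindow && s.messagesOutSinceAck === 0 && !s.containerRunning;
}

// Returns true at most once per issue key per DEDUP_MS, recording the alert time.
function shouldAlert(
  issueKey: string,
  now: number,
  lastAlertAt: Map<string, number>,
): boolean {
  const previous = lastAlertAt.get(issueKey);
  if (previous !== undefined && now - previous < DEDUP_MS) return false;
  lastAlertAt.set(issueKey, now);
  return true;
}
```

A caller on the 5-min timer would run `isSilentFail` per session and only post to the keepalive channel when `shouldAlert` also returns true for that session's issue key.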

feat(health-monitor): host-side silent-fail detection and operator alerting

When the health-monitor detects a token expiring within 60 min, it now attempts an automatic refresh via the Anthropic token endpoint before sending any alert:

- POST https://platform.claude.com/v1/oauth/token with the stored refresh_token (RFC 6749 form-encoded)
- On success: writes the new access_token + refresh_token to claude.json and updates macOS Keychain so the next pre-spawn read is also fresh; posts an "auto-refreshed" confirmation to the keepalive channel
- On failure: posts the original warning with the failure reason and instructions to run `claude login` manually

This means token expiry is now fully silent in the normal case. The only time the user gets an alert is when the refresh_token itself has expired (i.e., the user hasn't opened Claude Code in weeks).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
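The refresh-before-alert flow above might look roughly like this. It is a sketch, not the module's code: `attemptRefresh` is a hypothetical injected callback (synchronous here for brevity, where the real call would be async), and the message wording is invented. Already-expired tokens are worded separately, which is one way to avoid printing a negative minute count.

```typescript
// Hypothetical result type for a refresh attempt.
type RefreshResult =
  | { ok: true; expiresAt: number }
  | { ok: false; reason: string };

const REFRESH_THRESHOLD_MIN = 60; // "expiring within 60 min" from the commit message

function minutesLeft(expiresAt: number, now: number): number {
  return Math.floor((expiresAt - now) / 60000);
}

// Avoids output such as "expires in -5 min" for tokens that are already expired.
function describeExpiry(left: number): string {
  return left >= 0 ? `expires in ${left} min` : `expired ${-left} min ago`;
}

// Returns the text to post to the keepalive channel, or null when nothing needs saying.
function expiryMessage(
  expiresAt: number,
  now: number,
  attemptRefresh: () => RefreshResult,
): string | null {
  const left = minutesLeft(expiresAt, now);
  if (left > REFRESH_THRESHOLD_MIN) return null; // token is healthy, stay silent

  const result = attemptRefresh();
  if (result.ok) {
    const newLeft = minutesLeft(result.expiresAt, now);
    return `OAuth token auto-refreshed (${describeExpiry(newLeft)}).`;
  }
  return (
    `OAuth token ${describeExpiry(left)}; automatic refresh failed: ` +
    `${result.reason}. Run \`claude login\` manually.`
  );
}
```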

…ets Cloudflare 403

feat(health-monitor): auto-refresh OAuth token on expiry

…re; fix negative minutes display

… container-runner

- Extract core refresh logic to src/oauth-token-refresh.ts so both the health-monitor and the container spawner can use the same code
- health-monitor/token-refresh.ts is now a thin wrapper around the shared util
- In spawnContainer(), refresh the token before buildMounts() if it's expiring within 60 min. This fixes the shutdown case where a token expires while the host is off and a task fires immediately on boot before the health-monitor's 5-min check has a chance to run
- Also fixes the remaining main-branch copy of token-refresh.ts which still had the old form-encoded body (Cloudflare 403); shared util uses JSON

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
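As a rough sketch of the JSON-bodied refresh request mentioned in the last item: the endpoint URL comes from the earlier commit message and `grant_type=refresh_token` is the standard RFC 6749 value, but the `client_id` field, the `RefreshRequest` type, and the function name are assumptions rather than the contents of src/oauth-token-refresh.ts.

```typescript
// Endpoint as quoted in the auto-refresh commit message.
const TOKEN_ENDPOINT = "https://platform.claude.com/v1/oauth/token";

// Plain description of the HTTP call, so it can be inspected without sending it.
interface RefreshRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a JSON-encoded refresh request (the form-encoded variant got a Cloudflare 403).
// clientId is a hypothetical parameter: whether and how the real util sends one is not shown.
function buildRefreshRequest(refreshToken: string, clientId: string): RefreshRequest {
  return {
    url: TOKEN_ENDPOINT,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      grant_type: "refresh_token",
      refresh_token: refreshToken,
      client_id: clientId,
    }),
  };
}

// Usage (not executed here):
//   const req = buildRefreshRequest(storedRefreshToken, someClientId);
//   const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```

Keeping the request builder separate from the network call makes it easy for both the health-monitor and the container spawner to share one encoding.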

…h util; pre-spawn refresh fix(health-monitor): remove redundant injectTask; shared OAuth refresh util; pre-spawn refresh

- Migration 016: token_status table (one row per agent group, upserted on each sweep; columns checked_at, expires_at, minutes_left, status, refreshed_at)
- src/db/token-status.ts: upsertTokenStatus() + getAllTokenStatuses()
- src/modules/health-monitor/token-sweep.ts: sweepAllTokens() iterates ALL groups, calls refreshOauthTokenIfNeeded() for each, and writes results to token_status. Returns array of results for alerting.
- Replaces per-alert token refresh in index.ts with sweepAllTokens(). Previously only groups that hit the 60-min alert threshold got checked; now every group is checked and proactively refreshed every 5 minutes.

Fixes the root cause of the ag-1778266708996-ipsjnc (Terminal Agent) stale-token issue: all groups are now covered regardless of which one triggered the alert.

Query status any time:

  pnpm exec tsx scripts/q.ts data/v2.db \
    "SELECT agent_group_id, datetime(checked_at/1000,'unixepoch','localtime') \
     as checked, minutes_left, status FROM token_status"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
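To make the sweep-and-upsert behavior concrete, here is a small in-memory sketch. A `Map` stands in for the token_status table, the status values (`ok`, `refreshed`, `failed`) are hypothetical, and `refreshIfNeeded` is an injected, synchronous stand-in for refreshOauthTokenIfNeeded().

```typescript
type TokenStatus = "ok" | "refreshed" | "failed"; // hypothetical status values

// Mirrors the columns listed for migration 016.
interface TokenStatusRow {
  agentGroupId: string;
  checkedAt: number;
  expiresAt: number;
  minutesLeft: number;
  status: TokenStatus;
  refreshedAt: number | null;
}

interface RefreshOutcome {
  status: TokenStatus;
  expiresAt: number;
}

// Checks every group, upserts one row per group, and returns the rows for alerting.
function sweepAllTokens(
  groupIds: string[],
  refreshIfNeeded: (groupId: string) => RefreshOutcome,
  table: Map<string, TokenStatusRow>,
  now: number,
): TokenStatusRow[] {
  return groupIds.map((agentGroupId) => {
    const outcome = refreshIfNeeded(agentGroupId);
    const previous = table.get(agentGroupId);
    const row: TokenStatusRow = {
      agentGroupId,
      checkedAt: now,
      expiresAt: outcome.expiresAt,
      minutesLeft: Math.floor((outcome.expiresAt - now) / 60000),
      status: outcome.status,
      // Keep the last refresh time unless this sweep refreshed the token.
      refreshedAt: outcome.status === "refreshed" ? now : (previous?.refreshedAt ?? null),
    };
    table.set(agentGroupId, row); // upsert keyed by agent group
    return row;
  });
}
```

Because the loop covers every group id rather than only the one that crossed the alert threshold, a group such as the Terminal Agent one mentioned above cannot be skipped.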

feat(health-monitor): token status table + sweep all groups every 5 min

…oding HEALTH_MONITOR_DISCORD_GUILD_ID and HEALTH_MONITOR_KEEPALIVE_CHANNEL_ID are now read from the .env file. If missing, Discord wiring is skipped with a warning rather than failing silently with wrong-server IDs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
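One way the "skip with a warning" behavior could be written is shown below. Only the two variable names come from the commit message; the `DiscordWiring` type, the function name, and the injected `env` and `warn` parameters are hypothetical.

```typescript
interface DiscordWiring {
  guildId: string;
  keepaliveChannelId: string;
}

// Returns the wiring config, or null (after warning) when either id is missing or blank.
function readDiscordWiring(
  env: Record<string, string | undefined>,
  warn: (message: string) => void,
): DiscordWiring | null {
  const guildId = env["HEALTH_MONITOR_DISCORD_GUILD_ID"]?.trim();
  const channelId = env["HEALTH_MONITOR_KEEPALIVE_CHANNEL_ID"]?.trim();

  const missing: string[] = [];
  if (!guildId) missing.push("HEALTH_MONITOR_DISCORD_GUILD_ID");
  if (!channelId) missing.push("HEALTH_MONITOR_KEEPALIVE_CHANNEL_ID");

  if (!guildId || !channelId) {
    warn(`health-monitor: skipping Discord wiring, missing ${missing.join(", ")}`);
    return null;
  }
  return { guildId, keepaliveChannelId: channelId };
}
```

Returning null instead of falling back to built-in ids means a fork can never post alerts to someone else's server by accident.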

refactor(health-monitor): read Discord IDs from .env instead of hardcoding

… refresh All agent groups share one macOS Keychain entry. When any group successfully refreshes, the refresh_token rotates and Keychain is updated, but other groups' claude.json files still carry the stale refresh_token. The next sweep would then attempt a refresh with the old RT and get rejected by Anthropic. syncFromKeychain() is now called before refreshOauthTokenIfNeeded() for each group, ensuring the latest refresh_token from Keychain is in place before the API call. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(health-monitor): sync claude.json from Keychain before each token refresh

…r on refresh

Three changes:

1. readKeychainOauth() reads the Keychain once before the group loop instead of once per group, since all groups share the same entry.
2. syncOauthToFile() only writes if the Keychain token is newer (expiresAt comparison), preventing the stale snapshot from overwriting a just-refreshed file mid-sweep.
3. After a successful refresh, restartAgentGroupContainers() stops any running container so the next spawn reads the new token from the mounted claude.json. A Discord alert is posted on restart.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
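The ordering of the three changes can be sketched as a single loop. Everything here is a stand-in: a `Map` plays the role of the per-group claude.json files, and `refreshIfNeeded` and `restartContainers` are hypothetical injected callbacks, not the real refreshOauthTokenIfNeeded() or restartAgentGroupContainers() signatures.

```typescript
interface Oauth {
  refreshToken: string;
  expiresAt: number; // assumed epoch ms
}

// files: groupId -> credentials in that group's claude.json (in-memory stand-in).
// keychain: snapshot read ONCE by the caller, before this loop starts.
// Returns the ids of groups whose containers were restarted.
function sweepWithKeychain(
  files: Map<string, Oauth>,
  keychain: Oauth | null,
  refreshIfNeeded: (current: Oauth) => Oauth | null, // null means no refresh happened
  restartContainers: (groupId: string) => void,
): string[] {
  const restarted: string[] = [];
  for (const groupId of Array.from(files.keys())) {
    let current = files.get(groupId);
    if (current === undefined) continue;

    // Change 2: copy from the snapshot only when it is newer than the file.
    if (keychain !== null && keychain.expiresAt > current.expiresAt) {
      current = keychain;
      files.set(groupId, current);
    }

    // Change 3: a successful refresh is followed by a container restart.
    const refreshed = refreshIfNeeded(current);
    if (refreshed !== null) {
      files.set(groupId, refreshed);
      restartContainers(groupId);
      restarted.push(groupId);
    }
  }
  return restarted;
}
```

In this sketch a group whose file is already ahead of the snapshot is left untouched, while a group with an older file is synced first and only then considered for a refresh.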

OWL_RADAR_CHANNEL_ID is now required in .env. OWL_RADAR_MANIFEST_URL and OWL_RADAR_PAGES_URL are optional with sensible defaults for the existing fork. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…token; fix silent-fail task prompt

- container-runner: only overwrite claude.json from Keychain when the Keychain token is strictly newer, preventing the Keychain read from rolling back a token that refreshOauthTokenIfNeeded just refreshed via the OAuth endpoint
- health-monitor: rewrite injectTask prompt to reference mounted log/data paths instead of raw macOS security commands (which fail inside a Linux container and triggered the agent's security refusal)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gavrielc · 2026-05-23T17:25:18Z

@alexli-77 Appreciate the PR but lots of unrelated changes here. Please reopen with one fix or feature per PR and clean focused change set.

alexli-77 and others added 21 commits May 8, 2026 23:54

Merge remote-tracking branch 'upstream/main'

669b6a2

Merge pull request #1 from alexli-77/sync/upstream-merge

371932f

chore: sync with upstream + Discord adapter bump

Merge pull request #2 from alexli-77/feat/health-monitor

a790963

feat(health-monitor): host-side silent-fail detection and operator alerting

fix(health-monitor): use JSON body for token refresh — form-encoded g…

d4a2372

…ets Cloudflare 403

Merge pull request #3 from alexli-77/feat/health-monitor-token-refresh

8b310e9

feat(health-monitor): auto-refresh OAuth token on expiry

fix(health-monitor): remove redundant injectTask on known token failu…

92870f8

…re; fix negative minutes display

fix(health-monitor): remove redundant injectTask; shared OAuth refres…

21d9e57

…h util; pre-spawn refresh fix(health-monitor): remove redundant injectTask; shared OAuth refresh util; pre-spawn refresh

feat(health-monitor): token status table + sweep all groups every 5 min

05764da

feat(health-monitor): token status table + sweep all groups every 5 min

Merge pull request #6 from alexli-77/feat/env-var-discord-ids

aa0eed8

refactor(health-monitor): read Discord IDs from .env instead of hardcoding

Merge pull request #7 from alexli-77/fix/token-sweep-keychain-sync

c796b75

fix(health-monitor): sync claude.json from Keychain before each token refresh

fix(owl-radar): move hardcoded channel/URLs to env vars

e67f770

OWL_RADAR_CHANNEL_ID is now required in .env. OWL_RADAR_MANIFEST_URL and OWL_RADAR_PAGES_URL are optional with sensible defaults for the existing fork. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

alexli-77 requested review from gabi-simons and gavrielc as code owners May 19, 2026 03:03

gavrielc closed this May 23, 2026

alexli-77 wants to merge 21 commits into nanocoai:main from alexli-77:fix/health-monitor-keychain-and-task-prompt
