fix(scripts): authenticate notification daemon requests by nnnet · Pull Request #653 · builderz-labs/mission-control

nnnet · 2026-05-06T08:32:18Z

Summary

The notification daemon's curl invocations against /api/notifications/deliver and /api/notifications (stats) are issued without any Authorization header. Both endpoints use requireRole(request, 'operator') (src/lib/auth.ts), which accepts the global API key via Authorization: Bearer … or x-api-key. Without it, every poll returns 401 and the script reports only Delivery failed: HTTP 401, with no hint about why.

Fix

Read MC_API_KEY (with API_KEY fallback so a single .env value works for both MC and the daemon) from env.
Validate it up-front; print an actionable error and exit 1 if unset, instead of silently 401-ing in a loop.
Pass it as Authorization: Bearer … on both the POST deliver call and the GET stats call.
Sync help-text default URL with the script's actual default (3005 → 3000).

No new dependencies. No behavior changes outside the auth path.
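
The key lookup and header construction described in the Fix list can be sketched as follows. The daemon itself is a shell script; this TypeScript version only illustrates the logic. The function names are hypothetical, and treating an empty value as unset is an assumption. Only MC_API_KEY, API_KEY and the Bearer header come from the description above.

```typescript
type Env = Record<string, string | undefined>;

// Prefer MC_API_KEY, fall back to API_KEY, and fail fast when neither is set
// (assumption: an empty or whitespace-only value counts as unset).
function resolveApiKey(env: Env): string {
  const key = (env["MC_API_KEY"] ?? "").trim() || (env["API_KEY"] ?? "").trim();
  if (!key) {
    throw new Error("MC_API_KEY (or API_KEY) is not set; the daemon cannot authenticate");
  }
  return key;
}

// The same header is attached to the POST deliver call and the GET stats call.
function authHeaders(key: string): Record<string, string> {
  return { Authorization: `Bearer ${key}` };
}
```

Failing at startup turns a loop of unexplained 401 responses into a single actionable error.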

Test plan

unset MC_API_KEY API_KEY; ./scripts/notification-daemon.sh → exits 1 with the actionable error message.
MC_API_KEY=<wrong> ./scripts/notification-daemon.sh --dry-run → 401 logged with response body (previously suppressed by 2>/dev/null).
MC_API_KEY=<right> ./scripts/notification-daemon.sh --dry-run → 200 with dry-run report.
bash -n scripts/notification-daemon.sh clean.

Make MC runnable as a single self-contained container that can drive the operator's host-installed Claude Code / Codex / OpenCode CLIs without forcing them to re-authenticate inside the container, and without OpenClaw gateway (which is macOS-only and not available on Linux hosts).

Changes:

- Dockerfile
  * Bake claude / codex CLIs into the image as a fallback so the Settings panel reports "Installed" even before the host bind-mounts attach. The host's /home/<user>/.local/bin gets first place in PATH, so an authenticated host install transparently shadows the baked one.
  * Add tmux + jq to the runtime image: required by /chat (PTY terminals) and by various agent runtime probes.
  * Reuse the slim image's existing uid 1000 user, renaming it to `nextjs`. This means bind-mounted host files (typical Linux uid 1000) are read/written without chown.
- docker-compose.yml
  * `user: "1000:1000"` so files written into bind-mounted host dirs stay owned by the host user.
  * Project the host user's $HOME into the container (`.local/bin`, `.bun`, `.claude`, `.claude.json`, `.local/share/claude`) so the container sees the same authenticated CLIs the operator uses on the host.
  * Mount /mnt and ${HOME} verbatim so file paths the user sees on the host work identically inside the container.
  * Bump memory limit to 2G (Next.js 16 + node-pty + the task-dispatch loop OOM at the upstream 512M when /chat opens a terminal).
  * Wire ANTHROPIC_API_KEY and OPENAI_API_KEY from .env so the direct-API dispatch path works without a gateway.
  * Make NEXT_PUBLIC_CHAT_POLL_INTERVAL_MS a build-time variable. Default 1000ms here gives near-live transcript updates while claude writes the jsonl line-by-line.
  * Add MC_HOST_SESSION_MODE env (see follow-up commit).
- Makefile (new)
  Replaces the previous mc-up.sh / mc-reset-db.sh helpers with a single entrypoint for the operator: `up`, `down`, `restart`, `recreate`, `build`, `rebuild` (no-cache), `ps`, `logs`, `shell`, `status`, `wait-ready`, `reset-db`, `nuke`. URL is fixed to 127.0.0.1:7012.
- .gitignore
  Ignore WALKTHROUGH.md (operator-local notes file kept on disk, never committed).

The strict CSP set in src/proxy.ts uses `script-src 'nonce-X' 'strict-dynamic'` which blocks every inline <script> that does not carry the matching nonce. next-themes injects a tiny inline script at render time to set the light/dark class before first paint (anti-FOUC). Without an explicit `nonce` prop it ships that script with no nonce attribute, so the browser blocks it. Strict-dynamic then refuses to load any further script (because the boot script is the trust root), and React never hydrates — the chat input field has no working Send button, fetch from client code throws TypeError, etc. Fix: pass the per-request nonce (already available in layout via the x-nonce request header that the proxy/middleware sets) into ThemeProvider's `nonce` prop. next-themes >=0.4 propagates this to its inline script tag. Also leaves two `// console.log('[DEBUG csp] ...')` lines (commented out) in proxy.ts and layout.tsx — useful for the next person who has to debug the nonce flow end-to-end.

When MC runs in Docker and the operator already has a live `claude` CLI on the host attached to a project session, /chat → "Send" used to fail with "spawn claude ENOENT" or "No conversation found" depending on which root cause was hit first. This commit makes the endpoint actually usable for that shared-session case.

What was broken:

1. `claude --resume <id>` only finds the transcript if the process cwd matches the project path encoded in `~/.claude/projects/<encoded>/`. MC's runCommand defaulted to /app inside the container, so resume silently picked up no conversation.
2. Decoding the directory name back to a path is unreliable: claude collapses both `/` and `_` to `-`, so `/foo/bar_baz` and `/foo/bar/baz` both encode to `-foo-bar-baz`. Naively decoding gave wrong cwd → cd failed → claude never ran.
3. Direct `spawn('claude', ...)` from the Next.js standalone server process produced ENOENT even though the binary was reachable from `which claude` and `node -e 'spawn(...)'` in the same container. Likely a Next 16 standalone runtime / process.env quirk; the exact root cause was not pinned down.
4. There was no policy for the operator-visible race when MC --resumes into a session that already has a live host CLI writing to the same jsonl.

Fix:

- resolveClaudeSessionCwd() now reads the actual `cwd` field from the first JSON line of the session jsonl. That field is authoritative and survives any encoding ambiguity.
- resolveExecutable() walks PATH with fs.access X_OK to pin the absolute path of `claude` before spawn, eliminating bare-name resolution as a variable.
- Spawn goes through `sh -c "cd <cwd> && exec <bin> --print --resume <id>"` instead of direct execvp. This sidesteps the ENOENT observed under Next 16 standalone, and shQuote() keeps the cd / bin paths safe.
- New env MC_HOST_SESSION_MODE controls how MC handles a session that may have a live host CLI:
  coexist (default): both write to the same jsonl. Each side picks up the other's writes on its next prompt.
  block-active: return 409 if jsonl mtime < 60s ago, so the operator only --resumes idle sessions.
  nudge: coexist + best-effort utimes() after the reply, so a tail-watching host CLI sees a fresh mtime.
- TODO marker for the proper fix to "long wait on Send": replace this blocking `claude --print` call with an SSE endpoint backed by `--output-format stream-json`, so the chat UI can render tokens incrementally.
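
A minimal sketch of the cwd recovery and quoting steps from this commit message. The helper bodies below are illustrative guesses, not the code in the PR: `encodeProjectDir` and `resumeCommand` are hypothetical names, and `cwdFromTranscript` takes the file contents as a string so the example stays self-contained. The `shQuote` name and the `cd ... && exec ... --print --resume` command shape come from the text above.

```typescript
// Lossy directory encoding as described above: "/" and "_" both become "-".
function encodeProjectDir(cwd: string): string {
  return cwd.replace(/[\/_]/g, "-");
}

// Read the `cwd` field from the first JSON line of a session transcript.
// Returns null when the line is missing, malformed, or has no string cwd.
function cwdFromTranscript(jsonl: string): string | null {
  const newline = jsonl.indexOf("\n");
  const firstLine = (newline === -1 ? jsonl : jsonl.slice(0, newline)).trim();
  if (!firstLine) return null;
  try {
    const entry: unknown = JSON.parse(firstLine);
    if (typeof entry === "object" && entry !== null && "cwd" in entry) {
      const cwd = (entry as { cwd?: unknown }).cwd;
      return typeof cwd === "string" && cwd.length > 0 ? cwd : null;
    }
  } catch {
    // Malformed first line: fall through and report "unknown".
  }
  return null;
}

// POSIX single-quote escaping: close the quote, emit \', reopen the quote.
function shQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Build the string handed to `sh -c`.
function resumeCommand(cwd: string, bin: string, sessionId: string): string {
  return `cd ${shQuote(cwd)} && exec ${shQuote(bin)} --print --resume ${shQuote(sessionId)}`;
}
```

Because `encodeProjectDir("/foo/bar_baz")` and `encodeProjectDir("/foo/bar/baz")` return the same string, the directory name cannot be decoded reliably, which is why the transcript's own `cwd` field is the safer source.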

Two related issues caused the /chat session list to drift from reality.

1. Upstream considered any jsonl touched in the last 90 minutes as "active". For an operator with several claude sessions across projects this surfaces almost everything as live, including sessions they finished with hours ago. Tightened to 15 minutes in both layers (the scanner-side `ACTIVE_THRESHOLD_MS` and the API-side derived recovery window `LOCAL_SESSION_ACTIVE_WINDOW_MS`) so they stay coherent.
2. Once a row was inserted into `claude_sessions`, it lived forever even after the underlying jsonl was deleted. The API still surfaced those orphans because the derived-active recovery in /api/sessions would happily flag them as "active" off the stale `last_message_at` column. Fix: after each scan cycle, also DELETE rows whose session_id is not present in the freshly-scanned set. The result message now reports `removed N orphan(s)` when this happens.
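
The two rules above amount to a time-window check and a set difference. The sketch below assumes session ids are plain strings and uses hypothetical function names. Only `ACTIVE_THRESHOLD_MS`, the 15-minute value and the `removed N orphan(s)` wording are taken from the commit message.

```typescript
const ACTIVE_THRESHOLD_MS = 15 * 60 * 1000; // tightened from 90 minutes

// A session counts as active only if its jsonl was touched within the window.
function isActive(lastTouchedMs: number, nowMs: number): boolean {
  return nowMs - lastTouchedMs <= ACTIVE_THRESHOLD_MS;
}

// Rows stored in the database whose session_id did not appear in the latest scan.
function findOrphans(storedIds: string[], scannedIds: Iterable<string>): string[] {
  const live = new Set(scannedIds);
  return storedIds.filter((id) => !live.has(id));
}

// Append the orphan count to a scan summary only when something was removed.
function withOrphanNote(summary: string, removed: number): string {
  return removed > 0 ? `${summary}, removed ${removed} orphan(s)` : summary;
}
```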

Three small UX fixes in the chat workspace: - Stop excluding `session:*` conversations from the polling fallback. Without this, `/chat` would freeze until the operator hit F5 every time SSE dropped. The polling cadence is parameterized via NEXT_PUBLIC_CHAT_POLL_INTERVAL_MS (default 1500ms in code, 1000ms in the docker-compose file) so it can be tuned without touching the component. - Drop the standalone "lastReply" panel that used to show the most recent assistant message above the input row. It duplicated the transcript and, on long replies, pushed the input field below the viewport. The reply now lands in the transcript via onRefreshTranscript() — same SessionMessage rendering, same formatting, same scroll behaviour as the rest of the conversation. This keeps the prompt input row anchored to the bottom regardless of reply size, matching how the host claude CLI itself displays things. - Leave the [DEBUG chat] console.log lines commented in place — these were how we traced the CSP/spawn pipeline end-to-end and the next person debugging a regression here will want them.

Self-contained Docker stack + /chat fixes for shared host claude sessions

When the OpenClaw gateway isn't available (Linux: it's macOS-only), the direct-API dispatch path was Anthropic-only: agents configured for OpenAI or a local model couldn't run. This adds two more direct providers without touching the existing Anthropic path.

Routing is by `dispatchModel` prefix on the agent config:

  `claude-*`, `anthropic/*` → ANTHROPIC_API_KEY
  `gpt-*`, `o1-*`, `o3-*`, `openai/*` → OPENAI_API_KEY
  `local/*`, `ollama/*`, `lmstudio/*`, `litellm/*` → LOCAL_LLM_ENDPOINT (+ optional LOCAL_LLM_API_KEY)

The "local" path is intentionally generic: it speaks the OpenAI `/v1/chat/completions` REST shape, which is what LMStudio, Ollama, vLLM, and liteLLM proxies all expose. Operators who run several local backends behind a single liteLLM endpoint can point LOCAL_LLM_ENDPOINT at it and fan everything out from one place; the rest of MC stays unchanged.

Both Aegis review and the main task dispatch now go through `callDirectly()` which picks the provider. The Aegis review path also now passes through the agent's `agent_config` so per-agent `dispatchModel` overrides reach the reviewer (previously hardcoded to `null`, which forced Anthropic).

Token usage from OpenAI-compatible responses (`usage.prompt_tokens` / `usage.completion_tokens`) is recorded in the same `token_usage` table the Anthropic path uses, with the model id verbatim, so cost reports cover all three providers without further plumbing.

`docker-compose.yml` adds `LOCAL_LLM_ENDPOINT` (default `http://host.docker.internal:1234/v1`, LMStudio's stock listener on the docker host) and `LOCAL_LLM_API_KEY`.
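
The prefix routing described above can be expressed as a small lookup. This is a sketch rather than the PR's `callDirectly()`: `providerFor` is a hypothetical name, and returning null for an unmatched or missing model is an assumption, since the commit message only describes the matched cases.

```typescript
type Provider = "anthropic" | "openai" | "local";

// Prefixes taken from the routing list in the commit message.
const ROUTES: ReadonlyArray<{ provider: Provider; prefixes: readonly string[] }> = [
  { provider: "anthropic", prefixes: ["claude-", "anthropic/"] },
  { provider: "openai", prefixes: ["gpt-", "o1-", "o3-", "openai/"] },
  { provider: "local", prefixes: ["local/", "ollama/", "lmstudio/", "litellm/"] },
];

// Return the provider whose prefix matches, or null when nothing matches
// (assumption: the caller decides what the fallback should be).
function providerFor(dispatchModel: string | null | undefined): Provider | null {
  if (!dispatchModel) return null;
  for (const route of ROUTES) {
    if (route.prefixes.some((prefix) => dispatchModel.startsWith(prefix))) {
      return route.provider;
    }
  }
  return null;
}
```

Keeping the table as data means a new provider prefix is a one-line change.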

…host session modes Cover the operator-path features from the docker-stack PR and this branch: - New "Self-contained Operator Setup" section explaining host $HOME bind-mounts, baked CLI fallback, the uid-1000 image constraint and the workaround for non-1000 hosts, the 2G memory floor, and why Makefile defaults to MC_PORT=7012. - "Direct API dispatch" subsection with the dispatchModel → provider routing table (Anthropic / OpenAI / OpenAI-compatible local via LMStudio, Ollama, or liteLLM proxy). - "Shared host Claude Code session" subsection describing the MC_HOST_SESSION_MODE env (coexist / block-active / nudge). - Environment Variables table extended with MC_PORT, ANTHROPIC_API_KEY, OPENAI_API_KEY, LOCAL_LLM_ENDPOINT, LOCAL_LLM_API_KEY, MC_HOST_SESSION_MODE, NEXT_PUBLIC_CHAT_POLL_INTERVAL_MS. No existing content removed or contradicted; all additions go before "Production Hardening" so the canonical hardened-compose flow is unchanged.

Step-by-step demo showing how to wire a 4-agent team with three different providers — architect on Claude Opus, implementor on gpt-4o-mini, linter on a local model via LMStudio, Aegis reviewer on Claude Sonnet — and run a single master task end to end through the dispatch / review loop. The walkthrough is intentionally exhaustive on per-field values (Display Name, Role, Soul, Settings → Agent Runtimes, dispatchModel, temperature, sandbox/network options) so an operator can copy-paste through it without guessing what the form expects. Includes: - .env preparation for all three providers - Workspace + project setup - Per-agent screens with full Soul prompt text - Master task description (login → JWT migration) the architect decomposes - Acceptance checklist (where to look, what to expect) - Troubleshooting for the common LMStudio / OPENAI_API_KEY / Aegis failure modes - Variants for Ollama, liteLLM proxy, Anthropic-only setups Lives under examples/ so it stays out of the production build but is discoverable next to the source.

Add `make dev` workflow that bind-mounts src/, public/, messages/, and configs into a Node 22 container, so day-to-day .ts/.tsx edits hit Turbopack hot-reload without rebuilding the image. Image is rebuilt only when package.json / pnpm-lock.yaml / Dockerfile.dev change. The dev compose shares the same `mission-control_mc-data` volume as production so admin user / workspaces / projects / agents created via `make up` are visible in `make dev` and vice versa (still single-writer SQLite — only run one stack at a time). Targets: dev, dev-down, dev-build, dev-rebuild, dev-logs, dev-shell, dev-ps.

The dispatch path silently fell back to `runOpenClaw` even when no gateway was actually installed, producing `spawn openclaw ENOENT` on every tick until the task was failed after 5 retries. This change adds the missing gateway-availability evidence checks and a CLI fallback for the Claude provider so MC functions standalone on Linux/Docker hosts where OpenClaw is not installed.

Changes in src/lib/task-dispatch.ts:

- isGatewayAvailable() now requires physical evidence: either a real openclaw.json on disk, or a registered gateway row whose status is in the healthy set (online/healthy/ready). Previously a truthy default config path was sufficient, and an onboarding-seeded gateway row with status='unknown' falsely satisfied the check.
- Added isClaudeCliAvailable() + callClaudeViaCli(): when the host Claude Code CLI is bind-mounted into the container (/.local/bin/claude with ~/.claude.json), the anthropic provider routes through it via spawn('claude', ['--print', '--output-format', 'json', '--model', X]). This uses the operator's existing login/plan with no API key.
- callDirectly() prefers CLI for anthropic; falls back to direct API only if CLI is absent.
- isDirectDispatchAvailable('anthropic') is true when EITHER an API key OR the CLI is present.
- requeueStaleTasks() now skips the offline-check entirely when MC is in direct-API mode, since direct-API agents have no heartbeat by design and would otherwise be failed after 5 stale-cycles before any dispatch could run.
- scoreAgentForTask() lifts the offline/error/sleeping rejection in direct-API mode for the same reason.

Changes in src/app/api/agents/[id]/route.ts:

- Save flow no longer attempts to write gateway_config when openclaw.json doesn't exist; guards prevent the ENOENT error that caused agent edits to be silently reverted on Linux without OpenClaw.

Changes in src/lib/claude-sessions.ts:

- Oversized session jsonl files are logged once per process per filePath at INFO instead of WARN-spammed on every 30s sync tick.

Changes in src/components/panels/agent-detail-tabs.tsx:

- Defensive coercion in 4 places against nested model.primary objects ({primary: {primary: "..."}}), which crashed Config tab with React error #31 when MC and gateway disagreed on the model schema.
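
The evidence check and the resulting order of preference for the Anthropic provider can be sketched as pure functions. The row shape, the boolean inputs and the `routeAnthropic` name are assumptions made for illustration. The healthy status set and the gateway, then CLI, then direct API ordering are taken from the commit message.

```typescript
// Assumed row shape: only the status column matters here.
interface GatewayRow {
  status: string;
}

const HEALTHY_STATUSES = new Set(["online", "healthy", "ready"]);

// Available only with physical evidence: a config file on disk, or a healthy row.
function isGatewayAvailable(configOnDisk: boolean, rows: GatewayRow[]): boolean {
  return configOnDisk || rows.some((row) => HEALTHY_STATUSES.has(row.status));
}

type AnthropicRoute = "gateway" | "claude-cli" | "direct-api" | "unavailable";

// Gateway first, then the bind-mounted CLI, then the API key.
function routeAnthropic(gateway: boolean, cliPresent: boolean, apiKeyPresent: boolean): AnthropicRoute {
  if (gateway) return "gateway";
  if (cliPresent) return "claude-cli";
  if (apiKeyPresent) return "direct-api";
  return "unavailable";
}
```

An onboarding-seeded row with status 'unknown' no longer counts as evidence, so `isGatewayAvailable(false, [{ status: "unknown" }])` is false and dispatch moves on to the CLI or API path.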

…through Rewrite examples/MULTI-PROVIDER-DEMO.md from a high-level outline into a 13-section step-by-step guide a non-MC operator can follow front-to-back without needing to read the source. Each UI step lists the exact label, the value to enter, and the expected result. Sections: - 0 prepare .env (API keys, host session mode) - 1 create workspace (8 fields, OpenClaw template note) - 2 add project (with expandable card) - 3-6 create 4 agents via the 3-step wizard with full text: Architect (Claude Opus), Aegis (Sonnet), Dev (OpenAI), Linter (Local LLM) - 7 master task with full description copy - 7.A explain Owner / "Awaiting Owner" gate - 8 pipeline execution + how to inspect results Adds a hard-won note at the top: in direct-API mode the agents stay in 'offline' status by design (no heartbeat), but tasks still dispatch via the direct provider — this is normal and not a failure mode.

Add a parallel docker-compose stack that brings up the OpenClaw gateway daemon (github.com/openclaw/openclaw) on host ports 18789/18790, the same defaults MC has long expected via OPENCLAW_GATEWAY_HOST/PORT in docker-compose-dev.yml.

The integration is strictly additive:

- docker-compose-openclaw.yml is independent of docker-compose.yml and docker-compose-dev.yml. MC reaches openclaw via host.docker.internal, no shared network or compose merging required.
- When openclaw is up and registered in MC's gateways table with status='online', isGatewayAvailable() returns true and dispatch automatically routes through runOpenClaw, so agents get persistent PTY-backed sessions with full tool-use.
- When openclaw is down (or never started), MC silently falls back to the direct-API/CLI path introduced in the previous commit. Operators can adopt or remove the gateway without changing any MC code.

Makefile targets:

  openclaw-clone      git-clones github.com/openclaw/openclaw to ./openclaw-src
  openclaw-build      docker builds the gateway image (5-10 min, one-time)
  openclaw-up         starts the gateway daemon
  openclaw-down       stops it (MC keeps running on direct-API fallback)
  openclaw-restart / -logs / -ps / -status
  openclaw-onboard    runs the upstream interactive provider/skills wizard
  openclaw-shell      drops into the CLI sidecar
  openclaw-doctor     runs 'openclaw doctor' for diagnostics
  openclaw-token      prints the auto-generated gateway token (for MC .env)

Files:

  docker-compose-openclaw.yml: gateway + cli sidecar, /healthz, persistent .openclaw-data volume, host.docker.internal alias
  .env.openclaw.example: minimal env template (token + provider keys)
  examples/OPENCLAW-INTEGRATION.md: 9-step walkthrough (clone, build, onboard, register gateway in MC UI, verify dispatch path, rollback procedure)
  .gitignore: ignore /openclaw-src/, /.openclaw-data/, /.env.openclaw, /.idea/, /.vscode/

…s, openclaw additive integration

Add openclaw npm package to the dev image so src/lib/command.ts:runOpenClaw finds a binary in PATH when MC is configured to dispatch through a sibling openclaw-gateway container. Also propagate OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_TOKEN, and OPENCLAW_ALLOW_INSECURE_PRIVATE_WS into the dev container so the CLI knows where the gateway lives. Status of the experiment (see docs/openclaw-experiment-notes.md to follow): - openclaw gateway daemon comes up via docker-compose-openclaw.yml - /healthz is reachable from MC, gateway-row in MC db transitions to status='online', isGatewayAvailable() returns true - HTTP /v1/chat/completions with Bearer-token auth WORKS — no pairing required, full operator scope. gpt-5-mini round-trip confirmed. - The CLI/WebSocket path used by runOpenClaw requires per-client device pairing approval (operator.admin scope), which is non-trivial in a cross-container topology. Next step is to add an HTTP fallback to task-dispatch.ts so MC routes via /v1/chat/completions when the gateway is online and a token is available, before falling back to the existing runOpenClaw CLI path or the direct-API/CLI path.

End-to-end success: MC dispatches LOGIN-001 through openclaw gateway with ZERO MC code changes. Path: MC task assigned → dispatchAssignedTasks → runOpenClaw → openclaw CLI → ws://host.docker.internal:18789 → gateway → openai/gpt-5-mini → 4322 chars resolution → MC tasks.resolution → status='review' Total wall-clock: ~70s. This commit consolidates the third-variant integration approach (shared docker bridge, separate compose files, persistent pair state): scripts/openclaw-auto-pair.py Patches openclaw's pending → paired pairing files transactionally on the host filesystem. Pairing tokens are documented as 32-byte base64url random secrets in openclaw-src/src/infra/pairing-token.ts (no crypto signing), so producing them client-side is safe and produces the same end-state as the official `openclaw devices approve` flow which is unavailable in our cross-container topology (operator.admin scope is not held by any auto-paired loopback CLI). Idempotent: if MC's deviceId is already in paired.json with a matching token in MC's device-auth.json, the script exits 0 without changes. DeviceId+publicKey match check ensures we only approve the specific pending request that matches MC's identity, not arbitrary pending entries. docker-compose-dev.yml Bind-mount ./.mc-openclaw/:/home/nextjs/.openclaw — gives MC's openclaw CLI a stable identity (private key, deviceId, paired token) that survives container recreate / dev-rebuild. Without this, every restart generates a new deviceId and re-pairing is needed. Makefile (openclaw-pair-mc target) One-shot wrapper: trigger pairing request from MC, run auto-pair.py, bind MC agent display names ("Architect (Claude Opus)") to openclaw agent ids declared in openclaw.json ("architect"), verify with a health call. Agent binding is config-only (writes agents.config JSON, no code change). Makefile (openclaw-unpair-mc target) Tear-down counterpart for CONFIRM=yes — clears MC-side identity files and removes MC's paired entry from the gateway side. 
Useful when experimenting or rotating identities. examples/OPENCLAW-INTEGRATION.md Adds Step 7.5 (Auto-pair) with explanation of why pairing is needed and how the script bypasses the standard interactive admin-approval flow safely. .gitignore Adds /.mc-openclaw/ to the openclaw-related local-only paths. What is NOT changed: - src/lib/task-dispatch.ts: untouched. The existing runOpenClaw path works as designed once a paired CLI is in place. - src/lib/command.ts: untouched. - src/lib/openclaw-gateway.ts: untouched. - Any MC API route, scheduler, or UI component: untouched. Verified test sequence (LOGIN-001 dispatch through gateway): $ make openclaw-up && make dev $ make openclaw-pair-mc # one-shot, idempotent $ # ...drag LOGIN-001 in UI or reset to status='assigned' $ # within 60s task transitions through in_progress to review $ # tasks.resolution contains the gpt-5-mini agent's response

…builds for upgrades Refactor both stacks so openclaw runs from a bind-mounted ./openclaw-src/ clone instead of a baked image. Updating openclaw is now `git pull` plus one builder run, no docker image rebuild required. Architecture before: - openclaw-gateway: built from openclaw-src via Dockerfile (baked dist + node_modules into image, ~3.5 GB image). Update = full image rebuild. - MC dev: `npm install -g openclaw` baked at image build time. Pinned to whatever was on npm at build time. Update = full image rebuild. - Result: MC frontend showed "update available v2026.4.27 (installed v2026.4.26)" because the gateway was rebuilt from a fresh clone but MC CLI was still on an older npm publish. Architecture after: - Stock node:24-bookworm image for both gateway and CLI sidecar — no custom build needed. - ./openclaw-src/ bind-mounted at /app inside both. The gateway runs `node /app/dist/index.js gateway`. dist + node_modules live on the host under openclaw-src/ (~3.1 GB, .gitignored). - New `openclaw-builder` compose service (profile=build) compiles dist + installs node_modules into the bind-mount via a one-shot container that has bun + pnpm. First run ~5 min, incremental rebuilds faster. - MC dev: dropped `npm install -g openclaw` from Dockerfile.dev. A shim at /usr/local/bin/openclaw runs `node /opt/openclaw-src/dist/index.js` where /opt/openclaw-src is a read-only bind-mount of the same host clone. Both gateway and MC CLI always run the exact same dist. - Both containers run as uid 1000:1000 so files written into bind-mounts are readable on the host without sudo. - Plugin runtime stage moved from a Docker named volume (root-owned by default, blocked uid 1000) into ./.openclaw-data/plugin-runtime-deps/ so it inherits the same uid 1000 ownership as the rest of state. Update workflow: $ make openclaw-update → git pulls openclaw-src to latest, runs openclaw-builder to recompile dist into the bind-mount, restarts the gateway. Takes ~30s-2min for incremental updates. 
No docker rebuild, no image churn. Verified after refactor: - MC CLI and gateway both report v2026.4.27 (the same dist). - LOGIN-001 dispatched end-to-end via openclaw gateway in 61s, outcome=success, resolution=3538 chars from openai/gpt-5-mini, then transitioned into quality_review (Aegis cycle picked it up). - make openclaw-pair-mc remained idempotent across the recreate (paired identity persisted in ./.mc-openclaw/).

Two operator-visible regressions after the openclaw integration:

1. WebSocket spam in /logs: "Handshake failed on root path. Retrying WebSocket via /gateway-ws." "Max reconnection attempts reached."

   The MC frontend reads gateway.host from the gateways table to build the WebSocket URL. Our row stored "host.docker.internal" (correct for the MC backend's HTTP probe) but a browser running on the host can't resolve that name; it's only injected into containers via Docker's `extra_hosts: host-gateway` mapping. The browser was hitting ws://host.docker.internal:18789, getting ENOTFOUND/refused, and exhausting the reconnect budget every few seconds.

   Fix: set NEXT_PUBLIC_GATEWAY_URL=ws://127.0.0.1:18789 in docker-compose-dev.yml. The /api/gateways/connect route already honours this env var as a browser-facing override (see src/app/api/gateways/connect/route.ts:156). MC backend continues to probe the gateway through host.docker.internal:18789 unchanged.

2. Orchestration → Command tab dropdown showing agents but not letting any be selected:

   <option ... disabled={!a.session_key}> (src/components/panels/orchestration-bar.tsx:275)

   Our agents were created in MC's setup wizard with session_key=null, and the Command UI disables every option whose session_key is null. The /agents page list looked correct but every option was greyed out.

   Fix: make the openclaw-pair-mc target also write session_key="mc-<id>" for each MC agent it knows about (architect/aegis/dev/linter). The value lines up with the openclaw agent ids declared in openclaw.json, so the dropdown becomes selectable and downstream openclaw routing has a valid identifier to work with.

Also: tighten the venv probe in the openclaw-pair-mc target so it falls back to system python3 cleanly when ./.venv is absent, instead of printing a stderr warning.

… shim MC's source uses an older openclaw CLI shape (`gateway sessions_send --session X --message Y`) for the wake-agent and agent-message endpoints. That shape is gone in openclaw 2026.4.x — `sessions_send` is no longer a gateway subcommand, only an internal RPC name. The current public RPC for sending into a session is `chat.send`, which additionally requires an `idempotencyKey` and a `deliver` flag. Symptom: clicking "Wake up" on a Linter (offline) agent in /agents returned `Failed to wake agent`. The MC frontend Orchestration → Command tab Send button would also fail silently with the same root cause. Fix without modifying MC code: route every `openclaw` invocation in the MC dev container through `scripts/openclaw-cli-shim.py`, which detects known retired shapes and rewrites them on the fly: legacy: gateway sessions_send --session X --message Y modern: gateway call chat.send --params '{ "sessionKey":"X","message":"Y", "idempotencyKey":"mc-shim-<pid>-<ts>","deliver":false }' --json Also handled: sessions_history (-> sessions.history with key) and sessions_list (-> sessions.list). The shim is bind-mounted from ./scripts/openclaw-cli-shim.py into the container at /usr/local/lib/openclaw-cli-shim.py via docker-compose-dev.yml, so edits to the rewriter don't need an image rebuild — same live-update ergonomics we now have for openclaw-src/dist itself. Verified: $ curl -X POST .../api/agents/Linter%20%28Local%20LLM%29/wake -d '{"message":"Wake up Linter"}' {"success":true,"session_key":"mc-linter","stdout":"{\"runId\":\"mc-shim-...\",\"status\":\"started\"}"} Pass-through of unknown shapes (`gateway call ...`, `agent ...`, `devices ...`, etc.) is unchanged — the shim just `os.execvp`s straight to the real `node /opt/openclaw-src/dist/index.js` for those.
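
The `sessions_send` rewrite can be sketched as an argv-to-argv function. The real shim is a Python script; this TypeScript version only illustrates the mapping. The `rewriteLegacyArgs` and `flagValue` names are hypothetical, and the unit of the timestamp passed as `ts` is an assumption. The parameter names and the `mc-shim-<pid>-<ts>` key format come from the commit message.

```typescript
// Return the value following a flag, or undefined when the flag is absent.
function flagValue(args: string[], flag: string): string | undefined {
  const i = args.indexOf(flag);
  return i >= 0 ? args[i + 1] : undefined;
}

// Rewrite the retired `gateway sessions_send` shape into `gateway call chat.send`.
// Any other shape is returned untouched so it can be passed straight through.
function rewriteLegacyArgs(args: string[], pid: number, ts: number): string[] {
  if (args[0] !== "gateway" || args[1] !== "sessions_send") return args;
  const session = flagValue(args, "--session");
  const message = flagValue(args, "--message");
  if (session === undefined || message === undefined) return args;
  const params = {
    sessionKey: session,
    message,
    idempotencyKey: `mc-shim-${pid}-${ts}`,
    deliver: false,
  };
  return ["gateway", "call", "chat.send", "--params", JSON.stringify(params), "--json"];
}
```

Serialising the params with `JSON.stringify` avoids hand-escaping quotes inside the message text.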

…Send Two operator-visible issues seen on /agents and the Orchestration → Command tab after the openclaw integration: 1. /agents → Command → Send returned "Validation failed" for any message. Root cause is a pre-existing MC bug, not openclaw-related: the form in src/components/panels/orchestration-bar.tsx:94 sends { to, content: message, from } while POST /api/agents/message validates against { to, message, from } // src/lib/validation.ts:137 So zod rejects every payload as "Validation failed" before the request ever reaches the agent. Strictly the smallest possible MC code edit: rename the request field `content` -> `message` to match the API contract. The Wake button on /agents already worked because it goes through a separate endpoint that reads `body.message` itself. 2. /logs continued to spam: "WebSocket error occurred" "Max reconnection attempts reached. Please reconnect manually." even after fixing NEXT_PUBLIC_GATEWAY_URL. Gateway logs show why: [ws] closed before connect ... peer=192.168.48.1 remote=192.168.48.1 The browser opens a WS to ws://127.0.0.1:18789, but from the gateway's side that connection arrives from the docker bridge IP, NOT loopback, so openclaw treats the browser as an unpaired device and aborts the handshake with code 1006. There's no clean way to pair every browser session non-interactively. Fix: set NEXT_PUBLIC_GATEWAY_OPTIONAL=true in docker-compose-dev.yml. The MC websocket client already has special-case behaviour for that flag (src/lib/websocket.ts:771): it gives up reconnecting silently and falls back to HTTP polling for live updates. Backend dispatch through openclaw is unaffected — the gateway is still the dispatch path; only the browser's live event stream is off.
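
The field-name mismatch in item 1 is easy to reproduce with a hand-written stand-in for the schema. The validator below is not the project's zod schema: `parseAgentMessage` is a hypothetical name, and treating all three fields as required strings is an assumption made for illustration.

```typescript
interface AgentMessage {
  to: string;
  message: string;
  from: string;
}

// Stand-in for the API-side validation: accept only { to, message, from } strings.
function parseAgentMessage(body: Record<string, unknown>): AgentMessage | null {
  const to = body["to"];
  const message = body["message"];
  const from = body["from"];
  if (typeof to !== "string" || typeof message !== "string" || typeof from !== "string") {
    return null; // the API would answer "Validation failed"
  }
  return { to, message, from };
}

// What the form used to send: the text travels under `content`, so validation fails.
const rejectedPayload = parseAgentMessage({ to: "Linter", content: "hello", from: "operator" });

// After the rename the same text passes.
const acceptedPayload = parseAgentMessage({ to: "Linter", message: "hello", from: "operator" });
```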

Add a local-only auto-approval worker for pending Control UI device requests so Connect succeeds after restart and new request IDs. Tune security scan messaging for Docker localhost topology and HTTPS-only flags to reduce false local warnings.

…otstrap

Make up/restart/down/status now run against MC_MODE and include OpenClaw when OPENCLAW_ENABLED=1, with compatibility aliases preserved. Update env examples, deployment docs, and integration plan notes to document the minimal operator workflow.

Project Telegram dmPolicy/allowlists/owner allowlists from .env/.env.openclaw into gateway and MC CLI state idempotently, while preserving legacy TELEGRAM_NUMERIC_USER_ID behavior. Add an env toggle to hide non-actionable doctor security info lines without masking real warnings/errors.

Drop MC_OPENCLAW_DOCTOR_HIDE_INFO plumbing so Mission Control always returns full OpenClaw doctor output, including informational security lines.

…ompose Two compose bugs blocking sandbox skill execution: 1. Path-equivalence for state dir. Gateway runs in a container but asks the host docker daemon to bind-mount ${state}/sandboxes/agent-X to /workspace in each sandbox. With ./.openclaw-data:/home/node/.openclaw, the source path the gateway passed was /home/node/... — nonexistent on the host, so docker silently mounted an empty dir and skills / AGENTS.md were invisible inside /workspace. Now state is also mounted at the same absolute path the host has; legacy /home/node/.openclaw alias preserved. 2. Drop ${VAR:-} declarations for TELEGRAM_*, OPENCLAW_GATEWAY_TOKEN, OPENCLAW_TOOLS_PROFILE in environment blocks. Compose merges environment ON TOP OF env_file, so empty fallbacks were blanking real values coming from .env.openclaw whenever the top-level .env (which compose interpolation reads) didn't define them.
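
The second bug follows from two Compose rules: `${VAR:-}` is interpolated from the shell environment and the top-level .env, and the resulting `environment` entries take precedence over `env_file`. A small model of that behaviour, with hypothetical function names and plain maps standing in for the files:

```typescript
type EnvMap = Record<string, string>;

// `${NAME:-fallback}`: use the fallback when the variable is unset or empty.
function interpolateWithDefault(interpolationEnv: EnvMap, name: string, fallback: string): string {
  const value = interpolationEnv[name];
  return value === undefined || value === "" ? fallback : value;
}

// The `environment` block is applied on top of `env_file`.
function containerEnv(envFile: EnvMap, environment: EnvMap): EnvMap {
  return { ...envFile, ...environment };
}

// .env.openclaw defines the token, the top-level .env does not.
const envOpenclaw: EnvMap = { OPENCLAW_GATEWAY_TOKEN: "secret" };
const topLevelEnv: EnvMap = {};

// Declaring OPENCLAW_GATEWAY_TOKEN: ${OPENCLAW_GATEWAY_TOKEN:-} blanks the real value.
const withDeclaration = containerEnv(envOpenclaw, {
  OPENCLAW_GATEWAY_TOKEN: interpolateWithDefault(topLevelEnv, "OPENCLAW_GATEWAY_TOKEN", ""),
});

// Dropping the declaration lets the env_file value through.
const withoutDeclaration = containerEnv(envOpenclaw, {});
```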

The proxy is a separate, independent microservice: its source lives in its own repo (github.com/nnnet/gpu-coordinator-proxy) and is checked out here as ./gpu-coordinator-proxy-src/, mirroring the openclaw-src/ pattern we already use. Both clones are gitignored; only the compose entry that references them lives in this tree.

The service fronts LMStudio (:1235) and Ollama (:11435) on separate ports with a shared VRAM lock so a 20B-class model in one runtime doesn't collide with a 20B in the other on a single GPU.

Behaviour is fully configurable via env (defaults are safest):

  GPU_AUTO_FREE_ENABLED   default 1 (master switch)
  GPU_FREE_STRATEGY       default spare-target (keep target warm)
                          alternative wipe-all (cold reload, gated)
  GPU_WIPE_ALL_ALLOWED    default 0 (safety gate for wipe-all)
  GPU_FREE_SETTLE_MS      default 800 (driver reclaim pause)

If the sibling clone is missing the service block can be commented out without affecting anything else in the MC stack.
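
A sketch of how such defaults might be read on the proxy side. The proxy's real parsing lives in its own repository and is not shown in this PR, so everything here except the variable names and default values is an assumption. In particular, falling back to spare-target when wipe-all is requested without the safety gate is a guess at what "gated" means.

```typescript
type ProxyEnv = Record<string, string | undefined>;
type FreeStrategy = "spare-target" | "wipe-all";

interface GpuProxyConfig {
  autoFreeEnabled: boolean;
  strategy: FreeStrategy;
  wipeAllAllowed: boolean;
  settleMs: number;
}

function loadGpuProxyConfig(env: ProxyEnv): GpuProxyConfig {
  const wipeAllAllowed = (env["GPU_WIPE_ALL_ALLOWED"] ?? "0") === "1";
  const wantsWipeAll = env["GPU_FREE_STRATEGY"] === "wipe-all";
  const settle = Number.parseInt(env["GPU_FREE_SETTLE_MS"] ?? "800", 10);
  return {
    autoFreeEnabled: (env["GPU_AUTO_FREE_ENABLED"] ?? "1") === "1",
    // Assumption: wipe-all only takes effect when the safety gate is open.
    strategy: wantsWipeAll && wipeAllAllowed ? "wipe-all" : "spare-target",
    wipeAllAllowed,
    settleMs: Number.isNaN(settle) ? 800 : settle,
  };
}
```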

…anner

Mission Control's `mission-control-dev` container couldn't reach the host docker daemon, so the in-MC `openclaw doctor` (called by src/app/api/openclaw/doctor/route.ts via runOpenClaw) failed its isDockerAvailable() check and emitted "Sandbox mode is enabled but Docker is not available", surfacing as a permanent warning banner in the UI even though docker on the host was healthy and the gateway sandbox flow worked.

Container-side docker access:
- Add docker.io to Dockerfile.dev so `docker` is on PATH inside the dev container; bind /var/run/docker.sock in docker-compose-dev.yml.
- Make uid 1000 reach the socket via `group_add` driven by a new DOCKER_SOCKET_GID env, auto-detected by the Makefile via `stat -c %g /var/run/docker.sock` (falls back to 994 for stock Debian/Ubuntu hosts; manually overridable on Fedora/Arch/colima/Rancher Desktop where the gid differs).

Mask host's stale ~/.openclaw from the container view:
- openclaw doctor's findOtherStateDirs() scans /home/*/.openclaw and trips "Multiple state directories detected" whenever the dev container's broad ${HOME}:${HOME}:rw bind exposes the host user's pre-fork .openclaw dir.
- Bind-mount an empty stub file (.docker-mask/openclaw-stub.empty) over ${HOME}/.openclaw so existsDir() returns false (the path is now a regular file, not a directory). Tmpfs is unsuitable here: it would still appear as a directory.

Doctor-banner suppression in the openclaw CLI shim:
- openclaw doctor unconditionally prints `Run "openclaw doctor --fix" to apply changes.` whenever shouldRepair is false, regardless of whether there is anything actually fixable (see openclaw-src flows/doctor-health-contributions.ts:580-582). The word "fix" trips MC's parseOpenClawDoctorOutput mentionsWarnings regex.
- The Plugins panel always prints "Errors: 0"; the substring "error" trips MC's level=error escalation regex.
- Both treated as upstream quirks. The shim (which already exists for legacy CLI compat) now intercepts plain `doctor` invocations, drops the footer line, and rewrites "Errors: 0" → "Errs: 0" only when the count is zero. Real plugin error counts pass through untouched so a genuine banner still surfaces if something breaks.

Brew-enabled gateway + sandbox images (carried over from prior pending work, finally committed):
- Dockerfile.openclaw.dockercli: gateway image now ships docker CLI + Linuxbrew so skills.install RPC can `brew install <formula>` for skills declaring brew deps.
- Dockerfile.openclaw.sandbox: overlays brew on the upstream openclaw-sandbox:bookworm-slim base (kept separate from openclaw-src to preserve read-only update flow via `make openclaw-update`).

Makefile UX simplification (589 lines → 124):
- Replace the legacy multi-mode lifecycle wrapper with a minimal docker compose driver: `make`, `make build [SVC...]`, `make up [SVC...]`, `make down`, `make logs`, `make ps`, `make clean`. Positional service args via $(filter-out $(firstword $(MAKECMDGOALS)),$(MAKECMDGOALS)).
- MODE=dev|prod toggles which compose file pair is used.
- The pre-rewrite Makefile is preserved as Makefile.legacy for anyone who still relies on the older recipe names.

Drop tracked .beads/dolt-monitor.pid.lock: it's a runtime lock file and is already covered by .gitignore.
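Two of the mechanisms in this commit message are small enough to sketch in shell. The function names below are hypothetical; the real shim is scripts/openclaw-cli-shim.py and the real detection lives in the Makefile, so this is an illustration of the described behaviour under those assumptions, not the committed code. It also assumes the footer sits on a line of its own.

```shell
# Hypothetical: detect the gid that owns the docker socket, falling back to
# 994 when stat fails (socket missing, or a stat without GNU's -c option).
detect_docker_gid() {
  local sock="${1:-/var/run/docker.sock}"
  stat -c %g "$sock" 2>/dev/null || echo 994
}

# Hypothetical: filter plain `openclaw doctor` output read from stdin.
#  - drop the unconditional "--fix" footer line
#  - rewrite "Errors: 0" to "Errs: 0" only when the count is exactly zero,
#    so "Errors: 2" or "Errors: 10" pass through untouched
filter_doctor_output() {
  sed -E \
    -e '/^[[:space:]]*Run "openclaw doctor --fix" to apply changes\.[[:space:]]*$/d' \
    -e 's/Errors: 0([^0-9]|$)/Errs: 0\1/'
}
```

A caller could export the detected gid before starting compose, for example `DOCKER_SOCKET_GID="$(detect_docker_gid)"`, and pipe doctor output through `filter_doctor_output` before it reaches the parser.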

The deliver_notifications() and get_delivery_stats() helpers in scripts/notification-daemon.sh issue HTTP requests against /api/notifications/deliver and /api/notifications without any Authorization header. Both endpoints use requireRole(request, 'operator') (src/lib/auth.ts), which accepts the global API key via the `Authorization: Bearer ...` or `x-api-key` header. Without it every poll silently returns 401 and the daemon reports "delivery failed" without explaining why. Add an MC_API_KEY env var (with API_KEY as a fallback so a single .env value works for both MC and this daemon), validate it before the run/stats path, and pass it as a Bearer header on both the POST deliver call and the GET stats call. Also fix the help text default URL (3005 -> 3000) to match the script's actual MISSION_CONTROL_URL default. Behavior change: the daemon now prints an actionable error and exits 1 if MC_API_KEY is not set, instead of running batches that silently 401. No new dependencies; no other behavior changes.
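The auth path described above can be sketched roughly as follows. The helper names (`resolve_api_key`, `require_api_key`, `auth_header`) and the error wording are hypothetical, not taken from the script; only the env var names, the Bearer header, and the endpoint paths come from the description.

```shell
# Hypothetical helper: MC_API_KEY wins, API_KEY is the fallback.
resolve_api_key() {
  printf '%s' "${MC_API_KEY:-${API_KEY:-}}"
}

# Hypothetical helper: fail fast with an actionable message instead of
# running batches that would each return HTTP 401.
require_api_key() {
  if [ -z "$(resolve_api_key)" ]; then
    echo "Error: MC_API_KEY (or API_KEY) is not set; the notification endpoints require the operator API key." >&2
    return 1
  fi
}

# Hypothetical helper: build the header value handed to curl via -H.
auth_header() {
  printf 'Authorization: Bearer %s' "$(resolve_api_key)"
}

# Intended use (not executed here; the localhost host name is an assumption,
# the port matches the script's stated default of 3000):
#   require_api_key || exit 1
#   curl -sS -X POST -H "$(auth_header)" \
#     "${MISSION_CONTROL_URL:-http://localhost:3000}/api/notifications/deliver"
#   curl -sS -H "$(auth_header)" \
#     "${MISSION_CONTROL_URL:-http://localhost:3000}/api/notifications"
```

Validating once before the loop keeps the failure visible at startup, which is the behavior change the commit message calls out.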

0xbrainkid

Thanks for fixing the notification daemon auth path; that specific shell change looks directionally right, and the targeted doctor test passes: pnpm vitest run src/lib/__tests__/openclaw-doctor.test.ts.

Blocking before merge: this PR contains a large amount of unrelated, machine-specific runtime/config material that should not be committed as part of a notification-daemon auth fix. In particular .opencode/opencode.json enables many third-party plugins and includes local absolute paths such as /home/uadmin/.local/bin/codebase-memory-mcp and /mnt/9/gt/.../node_modules/@playwright/mcp/cli.js. Committing that repo-level config will be broken for other contributors and may cause nondeterministic external plugin execution for anyone using opencode in this repo.

Please reduce this PR to the auth fix and directly related docs/tests, or move the opencode/beads/docker/demo additions into separate intentionally reviewed PRs with portable, opt-in config. Once the unrelated machine-local config is removed, I can re-review the notification-daemon change.

Note: full typecheck could not be used as a clean gate in this checkout because the worktree dependency set is missing unrelated deps/types (next-intl, xterm, node-pty, radix/cva), causing broad failures outside this PR.

0xNyk · 2026-05-07T04:21:54Z

Thanks — the scripts/notification-daemon.sh auth fix itself is exactly right (read MC_API_KEY / API_KEY from env, validate up front, pass as Authorization: Bearer, sync the help-text default URL). I'd merge that as a single-file PR tomorrow.

But the title says "authenticate notification daemon requests" and the diff has 57 files / 6,450 added lines covering:

.beads/ directory (51 lines, hooks, README, config)
.opencode/ directory (237-line opencode config + READMEs)
.docker-mask/, .dockerignore, .env.example, .env.openclaw.example, .gitignore
.vibe/development-plan-experiment-openclaw-integration.md (555 lines)
AGENTS.md (150 lines), Makefile (124 lines), Makefile.legacy (589 lines)
4 Dockerfiles, 3 docker-compose files, nginx config
docs/deployment.md (215 lines), telegram-onboarding doc, ops-cheatsheet.md
examples/MULTI-PROVIDER-DEMO.md (747 lines), examples/OPENCLAW-INTEGRATION.md (238 lines)
scripts/openclaw-auto-approve-control-ui.mjs, scripts/openclaw-auto-pair.py, scripts/openclaw-cli-shim.py
src/app/api/agents/[id]/route.ts, src/app/api/sessions/continue/route.ts (+222), src/app/api/sessions/route.ts, src/app/api/settings/route.ts, src/app/api/spawn/route.ts
src/components/chat/chat-workspace.tsx, src/components/panels/agent-detail-tabs.tsx, src/components/panels/orchestration-bar.tsx
src/lib/claude-sessions.ts, src/lib/openclaw-doctor.ts, src/lib/security-scan.ts, src/lib/task-dispatch.ts (+280)
src/proxy.ts

This PR is structurally three or four other PRs stacked together (and most of those overlap with #647, #648, #649 which I just reviewed/merged separately).

Could you split this into:

fix(scripts): authenticate notification daemon requests — just scripts/notification-daemon.sh and any minimal supporting changes (.env.example if you want to add MC_API_KEY). I'd merge that immediately.
The .beads/ / .opencode/ / .vibe/ / AGENTS.md additions — these look like personal/local tooling configs that probably shouldn't ship at all (and are already in your other compose PRs' .gitignore).
Whatever's left over after merging #647 (fix(chat,csp): nonce hydration + chat session continuity with host Claude CLI), #648 (feat(dispatch): direct multi-provider dispatch (Anthropic / OpenAI / local OpenAI-compatible)), and #649 (feat(openclaw): additive Docker integration with env-driven hardening + doctor cleanup): please rebase first to see what actually remains as new work.

Closing pending the focused split.

nnnet added 30 commits April 29, 2026 21:31

Merge pull request #1 from nnnet/feat/docker-stack

df0c255

Self-contained Docker stack + /chat fixes for shared host claude sessions

Merge feat/multi-provider-dispatch: dev compose, direct dispatch fixes, openclaw additive integration

b6c4921

bd init: initialize beads issue tracking

9c85264

mcp in opencode

a2fcae7

Fix prod Mission Control OpenClaw linkage and gateway startup warnings

cc2aa74

fix: harden prod OpenClaw linkage and expose dedicated control UI

6b1708c

fix: route OpenClaw control UI through WS-safe proxy

fd585c1

fix(openclaw): bootstrap control UI localhost origin allowlist

6213efe

Fix local OpenClaw Control UI pairing flow

119ec22

Add a local-only auto-approval worker for pending Control UI device requests so Connect succeeds after restart and new request IDs. Tune security scan messaging for Docker localhost topology and HTTPS-only flags to reduce false local warnings.

fix(openclaw): stabilize control-ui pairing and telegram allowlist bootstrap

72d14cf

refactor make/openclaw startup to env-driven runtime config

c470da4

chore(make): normalize prod/dev lifecycle and upgrade workflows

31dedd8

nnnet added 21 commits May 4, 2026 11:23

docs: add daily ops cheatsheet for make workflows

be6ddb0

chore(make): unify env-driven update/rebuild/upgrade workflows

2f0385b

feat(make): redesign universal Make UX with scope and mode flags

069cd5d

fix(make): prevent default restart from OpenClaw cold-start hangs

a3b148d

fix(make): make restart deterministic across scopes

ec2f7b7

fix(make): standardize lifecycle grammar to positional mode tokens

340ba78

fix(openclaw): project telegram owner policy in MC shim state

dc2e381

revert(openclaw): remove doctor info suppression toggle

5f996f1

Drop MC_OPENCLAW_DOCTOR_HIDE_INFO plumbing so Mission Control always returns full OpenClaw doctor output, including informational security lines.

fix(openclaw): enforce env-driven hardening defaults

7a1e819

Make OpenClaw tool posture env-driven

f1c6aed

chore(openclaw): prebuild dockercli image and run gateway as node

d0ddc19

chore: allow auto_backup default via env

b8f7dfe

chore: remove unused @playwright/mcp

573f92e

chore(openclaw): harden gateway and add lmstudio/openai models

2434bad

chore(openclaw): harden gateway and add lmstudio/openai models

b2a75fe

nnnet requested a review from 0xNyk as a code owner May 6, 2026 08:32

0xbrainkid suggested changes May 7, 2026

0xNyk closed this May 7, 2026

fix(scripts): authenticate notification daemon requests #653
nnnet wants to merge 51 commits into builderz-labs:main from nnnet:fix/notification-daemon-auth
