Diagnose with AI for Radar OSS (local, keyless, BYO-agent) by nadaverell · Pull Request #947 · skyhook-io/radar

nadaverell · 2026-06-19T06:02:58Z

Diagnose with AI — local, keyless, BYO-agent investigations for Radar OSS

Radar can see what's broken (the Issues queue, resource health, events). This adds the why and the fix: an AI SRE assistant that investigates a Kubernetes resource, explains the root cause in plain language, proposes remediation, and — on explicit confirmation — applies it. It runs entirely on the operator's machine against their own agent CLI (Claude Code, Codex, or Cursor) and Radar's in-process MCP: no Radar cloud, no API key, no account. Standalone Radar stays agent-free unless a supported CLI is already installed.

The presentational components are built to lift into @skyhook-io/k8s-ui so Radar Hub can reuse them for a managed-key, audited cloud variant — this PR is the OSS half plus the embedder slot the hub fills.

Investigation engine (`internal/ai/`, `internal/server/ai_diagnose.go`)

Agent discovery; headless streaming against MCP into a structured Diagnosis; process-group lifecycle so no agent child outlives a run.
Health-aware framing. The server captures the resource's live operational issue signal and, separately, its static audit posture at run start and threads it into the kickoff prompt — so the agent confirms-or-finds instead of being told it's "unhealthy" and manufacturing a problem.
Honest verdicts. The result is one of: a structured root cause + remediation; a green all-clear ("no problems found" — what the agent can actually assert); or a first-class inconclusive state ("investigated but couldn't determine — RBAC-denied, missing data, ambiguous"). The prompt separates verified healthy from couldn't tell ("absence of evidence is not health"), and the parser normalizes verdict precedence (finding › inconclusive › healthy) so the result is never self-contradictory. The recommended remediation step carries a one-clause reason (why it's the safe pick). Confidence is a coarse band (Low/Med/High), shown as discrete pips — deliberately not a fake percentage.
Durable, server-side runs (runs.go): investigations are jobs owned by the server, not the browser tab — they survive panel close / navigation / refresh. SSE replay via monotonic seq + Last-Event-ID; soft cap of 3 concurrent; focus-existing-run on duplicate; context-switch cancels + marks stale. (In-memory today — cleared on restart; the UI says so.)

Multi-agent + isolation

An Agent interface abstracts the CLI; Claude Code, Codex, and Cursor backends implement it. The server builds the set from every supported CLI on PATH.
Isolation (Codex): Isolated (default — --ignore-user-config, empty cwd, minimal env) vs My setup (the user's full config + their own MCP servers + local file reads — a deliberate trusted mode, surfaced with an explicit warning). Both keep the shell read-only, and Codex's built-in web_search is disabled — it reaches the public internet server-side (outside --sandbox), which would break the "reads only this cluster" guarantee. Optional model + reasoning effort (low/medium/high).
Cursor: drives cursor-agent headlessly with --sandbox enabled and a per-run workspace pointed at Radar's read-only MCP mount. Cursor has no flag to suppress the user's global MCP servers, so it cannot be made hermetically read-only like Claude/Codex — this is disclosed honestly in a Cursor-specific consent card (separate from the Claude/Codex consent, so prior approval can't bypass it). Free-form model override; no isolation toggle (cursor has none). Its --resume is workspace-scoped, so each run gets a stable scratch dir under a private per-process root.

Security: read-only mount + hardened apply

Investigations use a read-only MCP mount (write tools are not even discoverable; a regression test fails if one leaks). The user-confirmed apply runs in a fresh write-enabled session bound to the confirmed fix — so untrusted cluster data read during investigation can't steer the write. Apply without a confirmed fix is rejected.
POST endpoints are loopback-origin-checked; the feature is gated to no-auth standalone Radar.

The investigation experience (`web/src/components/diagnose/`)

Built to feel like a sharp SRE working beside you, not a spinner that dumps text:

Live transcript — reasoning reveals line-by-line, tool calls fade in with a live verb ("Reading logs…"), payloads render readably inline with a syntax-highlighted dialog. An elapsed timer + soft stall hint keep a long run legible (and a genuine hang distinguishable from progress).
Staged verdict — the result unfolds in beats (formulating root cause → weighing remediation) rather than dumping; the all-clear / inconclusive paths get their own calm treatment. Every verdict (root cause / remediation / healthy / inconclusive) animates on reveal with a semantic-colored border ring + glow. Sticky-bottom scroll follows while streaming, detaches on scroll-up, with jump-to-latest.
Comprehension aids — an "Explain simply" chip on the verdict fires a plain-language follow-up; the recommended step shows why it's recommended.
Acting, safely — one-click Apply opens a confirm dialog that previews the change, shows confidence, and gates GitOps/Helm-managed resources behind an explicit acknowledgment (a direct change is reverted on the next sync — the durable fix is a PR to the Git source, tracked in SKY-1075). After an apply, Radar auto-re-checks health to verify the outcome rather than asserting success. Follow-up composer for multi-turn; a hand-off that copies a command to resume the real agent session in the user's own CLI.
Adaptive entry point — in the resource detail header chrome (prominent "Diagnose" on a problem, quiet icon when fine), per-row on the Issues queue, and a top-bar entry. While an investigation is live, the resource's button shows an "Investigating…" spinner (clicking focuses the running run) and the top-bar entry shows a running count — runs poll while any is in flight, so the indicator is accurate even with the panel closed.
Resilient to lost runs — runs are in-memory, so an evicted (retention cap) or post-restart run 404s on its stream; the view renders a clear "no longer available — re-run Diagnose" state instead of a silent blank.

Architecture: DiagnoseProvider (always-mounted controller; layout state lives in a separate memoized context so run-polling doesn't re-render the app shell). The panel is an absolute slot in the app's body frame — the column under the header, right of the nav rail — so it docks below the global navbar and pushes only the content (navbar + rail stay static), sharing one right-inset contract with the resource/Helm drawers so they sit beside it rather than under it. Expand turns it into a focused, decision-first workspace: the transcript becomes scrollable evidence on the left while the latest structured verdict — root cause + remediation + Apply — pins to a right rail that stays in view as you read; the recent-investigations list is the always-present master pane on the left. Docked stays a sidecar; it overlays instead of crushing the app on small screens. InvestigationView is a pure subscriber that reconstructs turns from the event stream; presentational parts stay pure for the k8s-ui lift.

Trust framing + config

The first-run consent card leads with the BYO thesis — your own agent, on your machine, nothing to Radar — and is honest that data goes to the agent's model provider under the user's account. A documented seam marks where Radar Cloud overrides this copy for its (different, honest) cloud trust story.
Agent · isolation · model · effort live in a Settings "AI Diagnosis" section (staged, committed on Save). The panel header shows the active config with a gear into Settings.

Testing

Go unit tests pin: SSE replay/eviction, the begin-turn concurrency cap race, read-only mount tool exclusion, verdict precedence (no contradictory healthy+root-cause), recommended_index + recommended_reason parsing, apply-prompt binding, both agents' stream parsers, isolation flag construction, agent-name routing, focus-existing-run key, and per-resource managed-by/health detection.
Verified live against kind clusters: both agents investigate broken resources and confirm health without confabulating on healthy ones; durable runs survive panel close; the read-only mount holds; the adaptive entry, staged reveal, elapsed timer, and verified-apply behave; the header reflows under the docked panel without overlap.

Not included / follow-ups

SCM-connect → suggest a PR instead of a direct apply on managed resources (the durable fix; the acknowledgment gate ships as the interim) — Linear SKY-1075.
Persisting runs to disk (in-memory today) and the cloud (managed-key) variant via the k8s-ui lift, incl. the embedder-overridable consent/trust copy.
Apply is prompt-bound, not server-side per-resource tool-scoped — the main residual hardening before promoting past local standalone.

Note

High Risk
Spawns local agent processes with cluster access via MCP; apply turns enable write tools (mitigated by read-only mount, fresh-session apply, and fix binding). Context-switch teardown is critical to avoid cross-cluster writes.

Overview
Adds Diagnose with AI for standalone OSS Radar: investigations run on the user's machine via their own agent CLI (Claude Code, Codex, Cursor) against Radar's localhost MCP—no cloud or API key. Gated to no-auth standalone when /mcp is enabled and a supported CLI is on PATH.

Introduces internal/ai with an Agent interface, per-backend spawn/stream parsing, health-aware prompts, structured verdict parsing (healthy / inconclusive / root cause), and a RunManager for server-side jobs that survive tab close, with SSE replay, concurrency caps, and cancel-on-cluster-switch (via new OnBeforeContextSwitch).

Security: read-only investigations hit /mcp-readonly (write tools not registered; tested); apply uses a fresh write session bound to user-confirmed fix text. Loopback origin checks on mutating diagnose POSTs.

HTTP: /api/agents, /api/diagnose/runs/* (start, list, turn, stop, stream). Server attaches GitOps/Helm managed-by and issue/audit health at run start for prompts and Apply warnings.

UI: diagnose panel as a docked body-frame slot (contentGutter shared with resource drawers), entry points in header/Issues queue, Settings AI section, and DiagnoseCustomization embed slot in RadarApp for Hub.

^{Reviewed by Cursor Bugbot for commit 697a890. Bugbot is set up for automated code reviews on this repo. Configure here.}

cursor · 2026-06-24T07:47:38Z

+                    finalize();
+                  }
+                }, STEP),
+              );


Busy state outlives agent

Medium Severity

When a structured diagnosis uses the staged verdict animation, setBusy(false) runs only inside delayed finalize() callbacks (up to ~4s). The server turn is already finished, but the UI still shows Stop, blocks the follow-up composer, and treats the run as in-flight until the timers fire.

^{Reviewed by Cursor Bugbot for commit 1ee9e8d. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 6 total unresolved issues (including 5 from previous reviews).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 72e8f08. Configure here.}

cursor · 2026-06-24T19:45:11Z

+		}
+		cleanup = func() { _ = os.RemoveAll(dir) }
+		cmd.Dir = dir
+		cmd.Env = codexEnv()


Codex isolated breaks session resume

Medium Severity

In isolated Codex mode each turn uses a fresh temp working directory and ignores the per-run workdir from RunManager. Follow-up and apply turns resume the CLI session from a different cwd than the first turn, which can break multi-turn investigations under the default isolated setting.

^{Reviewed by Cursor Bugbot for commit 72e8f08. Configure here.}

Adds an AI SRE assistant that investigates a Kubernetes resource, explains the root cause, proposes remediation, and — on explicit confirmation — applies it. Runs entirely on the operator's machine against their OWN agent CLI (Claude Code or Codex) and Radar's in-process MCP: no Radar cloud, no API key, no account. Standalone Radar stays agent-free unless a supported CLI is already installed. Presentational parts are built to lift into @skyhook-io/k8s-ui so Radar Hub can reuse them for a managed-key cloud variant. Engine (internal/ai, internal/server/ai_diagnose.go): agent discovery; headless streaming against MCP into a structured Diagnosis (root cause, remediation, recommended_index, confidence); durable server-side runs (SSE replay, concurrency cap, focus-existing, context-switch cancel/stale). Health-aware framing — the server captures the resource's operational issue signal and (separately) audit posture at start and threads it into the prompt, so the agent confirms-or-finds instead of manufacturing a problem on a healthy resource; a healthy verdict renders a green all-clear card. Multi-agent + isolation: Agent interface with Claude Code and Codex backends; Codex Isolated vs My-setup modes (shell always read-only); optional model + Codex effort. Security: investigations use a read-only MCP mount (write tools not discoverable); the user-confirmed apply runs in a fresh write-enabled session bound to the confirmed fix; loopback origin checks; gated to no-auth standalone Radar. Experience (web/src/components/diagnose): live-streaming transcript (line-by-line reasoning reveal, tool-call fade-in, readable payloads + rich dialog), staged verdict reveal (root cause -> remediation) with sticky-bottom scroll; adaptive entry point in the resource detail header chrome (prominent Diagnose on a problem, quiet icon when fine), per-row on Issues, top-bar entry; config in a Settings "AI Diagnosis" section; hand-off that copies a command to resume the real agent session. Header uses container queries so it reflows when the docked panel pushes the app narrower.

High-confidence fixes from the product review, no behavior/scope changes that need a design call: Correctness: - Wire the run stream's onClosed: clear the spinner and mark an open turn terminal when a run is evicted/gone, instead of letting it shimmer forever. - Claude hand-off command now prefixes `cd ~` so the cwd-scoped --resume resolves wherever the user pastes it (Codex is cwd-independent, unchanged). - Never render a blank done turn: fall back to the agent's narration, or an explicit "finished without a clear result" note (gate mirrors ResultCard's branches, including follow-up vs structured). Honesty / trust: - Confidence shown as three discrete pips (tier of 3), not a continuous bar that faked a precise percentage; "Confidence not stated" when the model omits it. - All-clear card reads "No problems found" (what the agent can actually assert), not "Healthy"; trust disclaimers no longer truncate. - ConsentCard: headline is read-only only in Isolated mode; adds that the agent sends data to its own model provider under the user's account, not to Radar. - GitOps "managed by" warning drops the "reconcile" jargon for plain language. Legibility: - Running status shows elapsed seconds + a soft "still working" stall hint after ~30s with no new activity, so a long run reads as progress and a hang is legible. - Stale banner gets a "Re-run on current cluster" button (was instruction-only). - Recent-investigations list distinguishes Stopped (neutral) from Failed (red) and labels terminal states in text, not just a colored dot. Polish: - Isolated marked "(recommended)"; effort options de-duplicated ("Balanced" was on two); top-bar tooltip names the agent + "runs locally"; panel uses shadow-drawer; the diagnose slot health prop is typed, not stringly-typed.

…ust UX Acting on the product-review decisions (the items signed off as clear): Honesty: - First-class "inconclusive" verdict — the agent can now say "I investigated but couldn't determine this" (RBAC-denied, missing data, ambiguous) instead of defaulting to a false all-clear. New schema field + prompt retune that separates "verified healthy" from "couldn't tell"; parser normalizes verdict precedence (finding > inconclusive > healthy) so the result can't be self-contradictory; a distinct neutral InconclusiveCard renders it. Test pins the precedence. - recommended_reason: the agent explains WHY the recommended step is the safe pick (reversible / lowest blast radius); shown under the Recommended label. - "Explain simply" chip on the verdict — one-tap plain-language follow-up (reuses the follow-up machinery) so glossing is user-controllable, not prompt-only. Apply safety: - Verify the write: after an apply, automatically re-check health as a follow-up (replay-safe via a ref armed only on a live user apply) rather than asserting success blindly. Robust substitute for parsing write-tool results. - GitOps/Helm-managed resources require an explicit acknowledgment ("I understand {controller} may revert this") before Apply enables; copy hedges on auto-sync (we detect the owner, not whether sync is on). TODO left for the real fix: connect the user's SCM and open a PR instead of a direct change. Trust framing: - ConsentCard leads with the BYO thesis (your own agent, on your machine, nothing to Radar — and that data goes to the model provider under your account); the top-bar/entry tooltips already surface it on hover. Documented seam for Radar Cloud to override this copy (different, honest trust story). - "My setup" (non-isolated Codex) now shows an amber callout — it's a capability cliff (your MCP servers + local file reads), not a peer toggle. Layout / honesty: - Panel push is now panel-width-aware: reflow only while the app keeps a usable floor left of the (resizable) panel, else overlay — instead of a static viewport cutoff that ignored panel width and could crush the app to near-zero. - Recent-investigations copy no longer over-promises durability (runs are in-memory; kept until Radar restarts).

… tokens - Apply confirm + per-step Apply use Wrench (was the whimsical Wand2) — gravity on a cluster-mutating action; ties to the Remediation section's icon. - Panel resize handle: faint resting tint + cursor-ew-resize (consistent with the workload drawer) + a touch wider, so it's discoverable without being heavy. - Settings footer hint is now accurate per change type: "Applies to new investigations" for AI-only edits vs "Applies on next launch" for host config. - Recent-investigations status dots use the design-system StatusDot/tone instead of hand-rolled bg-*-400 (DESIGN.md): done=healthy, error=unhealthy, stale=degraded, stopped=neutral.

…ative-title)

Adds cursor-agent as a third BYO agent backend alongside Claude Code and Codex. Drives the user's installed cursor-agent CLI headlessly against Radar's MCP, with the same read-only-investigation / confirmed-apply model. Containment: investigation turns point at the read-only MCP mount (strips all write tools); --sandbox enabled blocks Cursor's own shell. Cursor can't suppress the user's global MCP servers, so it gets its own consent disclosure (separate from Claude/Codex consent) rather than the isolated/my-setup framing. Cursor's --resume is workspace-scoped, so each run gets a stable scratch dir under a private per-process root (collision-free across restarts/processes), cleaned on eviction, context-switch, and shutdown.

Panel docks below the top navbar (matching maximized) and reflows only the content region — the navbar and nav rail stay static. Provider no longer pushes layout; it exposes contentGutter, applied to the connected content. Dark-mode drawer shadow lightened with a hairline edge.

The AI panel was a document-level fixed overlay that DOM-measured the chrome (querySelector header/nav) and pushed the whole app. Move it into the column (the body frame, under the header / right of the rail) as an absolute slot, sharing that frame and its conventions with the resource/Helm drawers: - Panel is absolute in the column (top: header height); maximized fills the frame. No more querySelector chrome measurement. - Drawers gain an optional rightInset (= docked panel width) so they sit beside the panel, not under it. Shared right-side-surface contract; optional prop keeps @skyhook-io/k8s-ui backward-compatible. - All body states reflow in one body-frame wrapper (consistent push across connected/auth/error states, not just connected). - Layout state split into its own memoized context (useDiagnoseLayout) so the panel's 4s run-poll no longer re-renders the whole app shell.

…ator + verdict reveal Codex backend: - Disable the built-in web_search tool — it reaches the public internet server-side (outside --sandbox), breaking the contained read-only story. - Clamp 'minimal' reasoning effort to 'low': codex's always-on image_gen tool is rejected by the API at minimal (400), and can't be disabled. Removed 'Minimal' from the effort picker. Run lifecycle: - A run that's been evicted (retention cap) or lost on restart returns 404 on its event stream; EventSource was retrying it silently → blank panel. Surface the permanent failure and render a 'no longer available — re-run' state. UI polish: - Per-resource Diagnose button shows a live 'Investigating…' spinner while an investigation is running for that resource (clicking focuses the live run); global ✦ button gets a running count + pulse. Runs now poll while any run is live (not only while the panel is open) so the indicator stays accurate closed. - Verdict reveal now animates on ALL outcomes (root cause / remediation / healthy / inconclusive), each tinted to its semantic color, with a colored border ring and an uneven, longer breathing glow (was a single accent pulse on root-cause only).

Expanding an investigation used to just add margins — a max-w-3xl transcript stranded in empty gutters with a fixed w-72 recent-list rail. Make maximized a real workspace: - Two panes: the transcript (reasoning + tool calls) scrolls as evidence on the left; the latest structured verdict (root cause + remediation + Apply) pins to a rail on the right so the conclusion and action stay in view while you read the evidence. The pinned turn's card is suppressed inline (TurnView hideVerdict) so it isn't double-rendered; follow-up answers + apply outcomes stay inline. - Recent-investigations history is collapsible (PanelLeft toggle), not a permanent rail — shown by default only when there's no active run (the list is the content), otherwise off so the investigation gets the room. - ResultCard is exported for reuse in the rail. Docked mode is unchanged (sidecar); drawer-launch still opens docked, workspace only on Expand. Degrades cleanly: no structured verdict (e.g. an API error) → no rail, just the answer.

…ed run Reopening a completed investigation replayed the 'formulating root cause → weighing remediation' beats (a ~4s fake delay) before showing a verdict that was already known. The guard meant to detect replay (revealBufRef empty) was already flushed by the time the done event arrived, so it always read as live. Detect replay reliably from the run's status at subscribe time: if it wasn't 'running' when opened, we're rebuilding history — show every verdict immediately. The choreography now only plays for a verdict watched landing live.

In full-wide there's room for it — it's the master pane of the master-detail layout. Drops the collapsible/toggle behavior (and its header button) so the recent-investigations list is always visible when maximized.

… paths, dead code - Add the /mcp-readonly/.well-known/* 404 shim (the /mcp mount already had one): without it Claude Code's OAuth probe gets a 405, flips the server to needs-auth, and injects synthetic authenticate tools into every read-only investigation turn. - Forgive a nonzero agent exit only when a STRUCTURED verdict parsed (root cause / remediation / healthy / inconclusive) — free-text alone means the CLI died mid-stream and now surfaces as an error instead of a calm "done". - Cap a turn's wall-clock time (default 15m, RADAR_AI_TURN_TIMEOUT) so a wedged CLI can't hold one of the 3 concurrency slots forever; timeout renders as a friendly error, not "context deadline exceeded". - Make Run.finalize idempotent (context-switch + later eviction double-called it, appending a second "closed" sentinel to the replay log). - Disclose in the isolated-Codex consent copy that its sandboxed shell can still read local files (write/network stay blocked) — the My-setup copy already said so, the isolated copy overclaimed. - Guard rapid Diagnose clicks across two resources with a start sequence token so a late createRun response can't steal focus from the newer run. - Remove the dead "token" stream-event chain end-to-end (no backend ever emitted it; narration flows via "thinking"), the inert --include-partial-messages Claude flag, the unused verdict-glow CSS, and needless exports in parts.tsx.

nadaverell requested a review from hisco as a code owner June 19, 2026 06:02

nadaverell changed the title ~~Resource-action slot for embedders (renderDiagnoseAction)~~ Diagnose with AI for Radar OSS (local, keyless, BYO-agent) Jun 21, 2026