Diagnose with AI for Radar OSS (local, keyless, BYO-agent) #947
nadaverell wants to merge 15 commits into
      finalize();
    }
  }, STEP),
);
Busy state outlives agent
Medium Severity
When a structured diagnosis uses the staged verdict animation, setBusy(false) runs only inside delayed finalize() callbacks (up to ~4s). The server turn is already finished, but the UI still shows Stop, blocks the follow-up composer, and treats the run as in-flight until the timers fire.
Reviewed by Cursor Bugbot for commit 1ee9e8d.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 6 total unresolved issues (including 5 from previous reviews).
Reviewed by Cursor Bugbot for commit 72e8f08.
}
cleanup = func() { _ = os.RemoveAll(dir) }
cmd.Dir = dir
cmd.Env = codexEnv()
Codex isolated breaks session resume
Medium Severity
In isolated Codex mode each turn uses a fresh temp working directory and ignores the per-run workdir from RunManager. Follow-up and apply turns resume the CLI session from a different cwd than the first turn, which can break multi-turn investigations under the default isolated setting.
Reviewed by Cursor Bugbot for commit 72e8f08.
Adds an AI SRE assistant that investigates a Kubernetes resource, explains the root cause, proposes remediation, and, on explicit confirmation, applies it. Runs entirely on the operator's machine against their OWN agent CLI (Claude Code or Codex) and Radar's in-process MCP: no Radar cloud, no API key, no account. Standalone Radar stays agent-free unless a supported CLI is already installed. Presentational parts are built to lift into @skyhook-io/k8s-ui so Radar Hub can reuse them for a managed-key cloud variant.
Engine (internal/ai, internal/server/ai_diagnose.go):
- Agent discovery; headless streaming against MCP into a structured Diagnosis (root cause, remediation, recommended_index, confidence).
- Durable server-side runs (SSE replay, concurrency cap, focus-existing, context-switch cancel/stale).
- Health-aware framing: the server captures the resource's operational issue signal and (separately) audit posture at start and threads it into the prompt, so the agent confirms-or-finds instead of manufacturing a problem on a healthy resource; a healthy verdict renders a green all-clear card.
Multi-agent + isolation:
- Agent interface with Claude Code and Codex backends; Codex Isolated vs My-setup modes (shell always read-only); optional model + Codex effort.
Security:
- Investigations use a read-only MCP mount (write tools not discoverable); the user-confirmed apply runs in a fresh write-enabled session bound to the confirmed fix; loopback origin checks; gated to no-auth standalone Radar.
Experience (web/src/components/diagnose):
- Live-streaming transcript (line-by-line reasoning reveal, tool-call fade-in, readable payloads + rich dialog), staged verdict reveal (root cause -> remediation) with sticky-bottom scroll.
- Adaptive entry point in the resource detail header chrome (prominent Diagnose on a problem, quiet icon when fine), per-row on Issues, top-bar entry.
- Config in a Settings "AI Diagnosis" section.
- Hand-off that copies a command to resume the real agent session.
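The durable-run replay mentioned above (the PR description calls it SSE replay via a monotonic seq plus `Last-Event-ID`) can be pictured as an append-only log that hands a reconnecting client everything after the last id it saw. This is a minimal sketch under assumptions: `replayLog`, `event`, and the method names are illustrative, not the PR's types.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// event is one entry in a run's replay log (illustrative shape).
type event struct {
	Seq  uint64 // monotonic, starts at 1, doubles as the SSE "id:" field
	Data string
}

// replayLog is an append-only, in-memory event log for a single run.
type replayLog struct {
	mu     sync.Mutex
	events []event
}

// Append stores an event and returns the sequence number assigned to it.
func (l *replayLog) Append(data string) uint64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	seq := uint64(len(l.events) + 1)
	l.events = append(l.events, event{Seq: seq, Data: data})
	return seq
}

// Since returns every event after the given Last-Event-ID header value.
// A missing or unparseable id is treated as "replay from the start".
func (l *replayLog) Since(lastEventID string) []event {
	last, err := strconv.ParseUint(lastEventID, 10, 64)
	if err != nil {
		last = 0
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	if last >= uint64(len(l.events)) {
		return nil
	}
	// Seq n lives at index n-1, so events[last:] are exactly those with Seq > last.
	return append([]event(nil), l.events[last:]...)
}

func main() {
	var runLog replayLog
	runLog.Append("thinking")
	runLog.Append("tool_call")
	runLog.Append("done")

	// A client that reconnects after seeing id 1 gets ids 2 and 3.
	for _, e := range runLog.Since("1") {
		fmt.Printf("id: %d\ndata: %s\n\n", e.Seq, e.Data)
	}
}
```

Because the log belongs to the run rather than the connection, closing the panel or refreshing the tab only drops a subscriber; the next subscription replays from whatever id the browser sends.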
Header uses container queries so it reflows when the docked panel pushes the app narrower.
High-confidence fixes from the product review, no behavior/scope changes that
need a design call:
Correctness:
- Wire the run stream's onClosed: clear the spinner and mark an open turn
terminal when a run is evicted/gone, instead of letting it shimmer forever.
- Claude hand-off command now prefixes `cd ~` so the cwd-scoped --resume
resolves wherever the user pastes it (Codex is cwd-independent, unchanged).
- Never render a blank done turn: fall back to the agent's narration, or an
explicit "finished without a clear result" note (gate mirrors ResultCard's
branches, including follow-up vs structured).
Honesty / trust:
- Confidence shown as three discrete pips (tier of 3), not a continuous bar that
faked a precise percentage; "Confidence not stated" when the model omits it.
- All-clear card reads "No problems found" (what the agent can actually assert),
not "Healthy"; trust disclaimers no longer truncate.
- ConsentCard: headline is read-only only in Isolated mode; adds that the agent
sends data to its own model provider under the user's account, not to Radar.
- GitOps "managed by" warning drops the "reconcile" jargon for plain language.
Legibility:
- Running status shows elapsed seconds + a soft "still working" stall hint after
~30s with no new activity, so a long run reads as progress and a hang is legible.
- Stale banner gets a "Re-run on current cluster" button (was instruction-only).
- Recent-investigations list distinguishes Stopped (neutral) from Failed (red)
and labels terminal states in text, not just a colored dot.
Polish:
- Isolated marked "(recommended)"; effort options de-duplicated ("Balanced" was
on two); top-bar tooltip names the agent + "runs locally"; panel uses
shadow-drawer; the diagnose slot health prop is typed, not stringly-typed.
…ust UX
Acting on the product-review decisions (the items signed off as clear):
Honesty:
- First-class "inconclusive" verdict — the agent can now say "I investigated but
couldn't determine this" (RBAC-denied, missing data, ambiguous) instead of
defaulting to a false all-clear. New schema field + prompt retune that separates
"verified healthy" from "couldn't tell"; parser normalizes verdict precedence
(finding > inconclusive > healthy) so the result can't be self-contradictory; a
distinct neutral InconclusiveCard renders it. Test pins the precedence.
- recommended_reason: the agent explains WHY the recommended step is the safe pick
(reversible / lowest blast radius); shown under the Recommended label.
- "Explain simply" chip on the verdict — one-tap plain-language follow-up (reuses
the follow-up machinery) so glossing is user-controllable, not prompt-only.
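The verdict precedence described above (finding > inconclusive > healthy) could be normalized roughly as follows. This is a sketch under assumptions: the `Diagnosis` fields and the `Verdict` constants are hypothetical stand-ins for the PR's schema, and returning an empty verdict when nothing is set is a choice made only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Diagnosis is a hypothetical subset of the parsed agent output.
type Diagnosis struct {
	RootCause    string
	Inconclusive bool
	Healthy      bool
}

type Verdict string

const (
	VerdictFinding      Verdict = "finding"
	VerdictInconclusive Verdict = "inconclusive"
	VerdictHealthy      Verdict = "healthy"
	VerdictNone         Verdict = "" // assumption: no verdict fields were set
)

// Normalize applies the precedence finding > inconclusive > healthy and
// clears the losing flags so the result cannot contradict itself.
func Normalize(d *Diagnosis) Verdict {
	switch {
	case strings.TrimSpace(d.RootCause) != "":
		d.Inconclusive, d.Healthy = false, false
		return VerdictFinding
	case d.Inconclusive:
		d.Healthy = false
		return VerdictInconclusive
	case d.Healthy:
		return VerdictHealthy
	}
	return VerdictNone
}

func main() {
	// The agent contradicted itself: a root cause plus a "healthy" flag.
	d := Diagnosis{RootCause: "image pull backoff", Healthy: true}
	fmt.Println(Normalize(&d), d.Healthy)
}
```

Clearing the losing flags during parsing means the UI can branch on any single field without re-deriving the precedence.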
Apply safety:
- Verify the write: after an apply, automatically re-check health as a follow-up
(replay-safe via a ref armed only on a live user apply) rather than asserting
success blindly. Robust substitute for parsing write-tool results.
- GitOps/Helm-managed resources require an explicit acknowledgment ("I understand
{controller} may revert this") before Apply enables; copy hedges on auto-sync
(we detect the owner, not whether sync is on). TODO left for the real fix:
connect the user's SCM and open a PR instead of a direct change.
Trust framing:
- ConsentCard leads with the BYO thesis (your own agent, on your machine, nothing
to Radar — and that data goes to the model provider under your account); the
top-bar/entry tooltips already surface it on hover. Documented seam for Radar
Cloud to override this copy (different, honest trust story).
- "My setup" (non-isolated Codex) now shows an amber callout — it's a capability
cliff (your MCP servers + local file reads), not a peer toggle.
Layout / honesty:
- Panel push is now panel-width-aware: reflow only while the app keeps a usable
floor left of the (resizable) panel, else overlay — instead of a static viewport
cutoff that ignored panel width and could crush the app to near-zero.
- Recent-investigations copy no longer over-promises durability (runs are in-memory;
kept until Radar restarts).
… tokens
- Apply confirm + per-step Apply use Wrench (was the whimsical Wand2): gravity on a cluster-mutating action; ties to the Remediation section's icon.
- Panel resize handle: faint resting tint + cursor-ew-resize (consistent with the workload drawer) + a touch wider, so it's discoverable without being heavy.
- Settings footer hint is now accurate per change type: "Applies to new investigations" for AI-only edits vs "Applies on next launch" for host config.
- Recent-investigations status dots use the design-system StatusDot/tone instead of hand-rolled bg-*-400 (DESIGN.md): done=healthy, error=unhealthy, stale=degraded, stopped=neutral.
Adds cursor-agent as a third BYO agent backend alongside Claude Code and Codex. Drives the user's installed cursor-agent CLI headlessly against Radar's MCP, with the same read-only-investigation / confirmed-apply model. Containment: investigation turns point at the read-only MCP mount (strips all write tools); --sandbox enabled blocks Cursor's own shell. Cursor can't suppress the user's global MCP servers, so it gets its own consent disclosure (separate from Claude/Codex consent) rather than the isolated/my-setup framing. Cursor's --resume is workspace-scoped, so each run gets a stable scratch dir under a private per-process root (collision-free across restarts/processes), cleaned on eviction, context-switch, and shutdown.
Panel docks below the top navbar (matching maximized) and reflows only the content region — the navbar and nav rail stay static. Provider no longer pushes layout; it exposes contentGutter, applied to the connected content. Dark-mode drawer shadow lightened with a hairline edge.
The AI panel was a document-level fixed overlay that DOM-measured the chrome (querySelector header/nav) and pushed the whole app. Move it into the column (the body frame, under the header / right of the rail) as an absolute slot, sharing that frame and its conventions with the resource/Helm drawers:
- Panel is absolute in the column (top: header height); maximized fills the frame. No more querySelector chrome measurement.
- Drawers gain an optional rightInset (= docked panel width) so they sit beside the panel, not under it. Shared right-side-surface contract; optional prop keeps @skyhook-io/k8s-ui backward-compatible.
- All body states reflow in one body-frame wrapper (consistent push across connected/auth/error states, not just connected).
- Layout state split into its own memoized context (useDiagnoseLayout) so the panel's 4s run-poll no longer re-renders the whole app shell.
…ator + verdict reveal
Codex backend:
- Disable the built-in web_search tool: it reaches the public internet server-side (outside --sandbox), breaking the contained read-only story.
- Clamp 'minimal' reasoning effort to 'low': codex's always-on image_gen tool is rejected by the API at minimal (400), and can't be disabled. Removed 'Minimal' from the effort picker.
Run lifecycle:
- A run that's been evicted (retention cap) or lost on restart returns 404 on its event stream; EventSource was retrying it silently → blank panel. Surface the permanent failure and render a 'no longer available, re-run' state.
UI polish:
- Per-resource Diagnose button shows a live 'Investigating…' spinner while an investigation is running for that resource (clicking focuses the live run); global ✦ button gets a running count + pulse. Runs now poll while any run is live (not only while the panel is open) so the indicator stays accurate closed.
- Verdict reveal now animates on ALL outcomes (root cause / remediation / healthy / inconclusive), each tinted to its semantic color, with a colored border ring and an uneven, longer breathing glow (was a single accent pulse on root-cause only).
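The effort clamp described above is small enough to show in full. The following is a sketch, not the PR's code: `clampEffort` is a hypothetical name, and returning an empty string for unknown values (meaning "pass no effort setting") is an assumption about how the caller would treat it.

```go
package main

import (
	"fmt"
	"strings"
)

// clampEffort maps a requested reasoning effort to one this sketch
// considers safe to pass on. "minimal" is raised to "low" because, per
// the commit message, the API rejects it while image_gen is enabled.
func clampEffort(effort string) string {
	e := strings.ToLower(strings.TrimSpace(effort))
	switch e {
	case "minimal":
		return "low"
	case "low", "medium", "high":
		return e
	default:
		// Assumption: unknown or empty values mean "do not set effort".
		return ""
	}
}

func main() {
	for _, in := range []string{"minimal", "High", "", "turbo"} {
		fmt.Printf("%q -> %q\n", in, clampEffort(in))
	}
}
```

Clamping on the server as well as removing the option from the picker keeps a stale saved setting from reproducing the 400.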
Expanding an investigation used to just add margins: a max-w-3xl transcript stranded in empty gutters with a fixed w-72 recent-list rail. Make maximized a real workspace:
- Two panes: the transcript (reasoning + tool calls) scrolls as evidence on the left; the latest structured verdict (root cause + remediation + Apply) pins to a rail on the right so the conclusion and action stay in view while you read the evidence. The pinned turn's card is suppressed inline (TurnView hideVerdict) so it isn't double-rendered; follow-up answers + apply outcomes stay inline.
- Recent-investigations history is collapsible (PanelLeft toggle), not a permanent rail: shown by default only when there's no active run (the list is the content), otherwise off so the investigation gets the room.
- ResultCard is exported for reuse in the rail.
Docked mode is unchanged (sidecar); drawer-launch still opens docked, workspace only on Expand. Degrades cleanly: no structured verdict (e.g. an API error) → no rail, just the answer.
…ed run
Reopening a completed investigation replayed the 'formulating root cause → weighing remediation' beats (a ~4s fake delay) before showing a verdict that was already known. The guard meant to detect replay (revealBufRef empty) was already flushed by the time the done event arrived, so it always read as live. Detect replay reliably from the run's status at subscribe time: if it wasn't 'running' when opened, we're rebuilding history, so show every verdict immediately. The choreography now only plays for a verdict watched landing live.
In full-wide there's room for it — it's the master pane of the master-detail layout. Drops the collapsible/toggle behavior (and its header button) so the recent-investigations list is always visible when maximized.
… paths, dead code
- Add the /mcp-readonly/.well-known/* 404 shim (the /mcp mount already had one): without it Claude Code's OAuth probe gets a 405, flips the server to needs-auth, and injects synthetic authenticate tools into every read-only investigation turn.
- Forgive a nonzero agent exit only when a STRUCTURED verdict parsed (root cause / remediation / healthy / inconclusive): free-text alone means the CLI died mid-stream and now surfaces as an error instead of a calm "done".
- Cap a turn's wall-clock time (default 15m, RADAR_AI_TURN_TIMEOUT) so a wedged CLI can't hold one of the 3 concurrency slots forever; timeout renders as a friendly error, not "context deadline exceeded".
- Make Run.finalize idempotent (context-switch + later eviction double-called it, appending a second "closed" sentinel to the replay log).
- Disclose in the isolated-Codex consent copy that its sandboxed shell can still read local files (write/network stay blocked): the My-setup copy already said so, the isolated copy overclaimed.
- Guard rapid Diagnose clicks across two resources with a start sequence token so a late createRun response can't steal focus from the newer run.
- Remove the dead "token" stream-event chain end-to-end (no backend ever emitted it; narration flows via "thinking"), the inert --include-partial-messages Claude flag, the unused verdict-glow CSS, and needless exports in parts.tsx.
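The idempotent `Run.finalize` change above can be illustrated with `sync.Once`, which is one common way to make a teardown path safe to call from several places. The `Run` type below is a stripped-down stand-in for illustration only; the PR's actual fields and mechanism are not shown in this page.

```go
package main

import (
	"fmt"
	"sync"
)

// Run is a minimal stand-in for a server-side investigation run.
type Run struct {
	mu        sync.Mutex
	replay    []string // replay log; "closed" is the terminal sentinel
	closeOnce sync.Once
}

// emit appends one event to the replay log.
func (r *Run) emit(ev string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.replay = append(r.replay, ev)
}

// finalize appends the "closed" sentinel exactly once, no matter how many
// callers (context switch, eviction, shutdown) reach it.
func (r *Run) finalize() {
	r.closeOnce.Do(func() {
		r.emit("closed")
	})
}

func main() {
	r := &Run{}
	r.emit("done")
	r.finalize() // e.g. triggered by a context switch
	r.finalize() // e.g. triggered later by eviction
	fmt.Println(r.replay)
}
```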


Diagnose with AI — local, keyless, BYO-agent investigations for Radar OSS
Radar can see what's broken (the Issues queue, resource health, events). This adds the why and the fix: an AI SRE assistant that investigates a Kubernetes resource, explains the root cause in plain language, proposes remediation, and — on explicit confirmation — applies it. It runs entirely on the operator's machine against their own agent CLI (Claude Code, Codex, or Cursor) and Radar's in-process MCP: no Radar cloud, no API key, no account. Standalone Radar stays agent-free unless a supported CLI is already installed.
The presentational components are built to lift into `@skyhook-io/k8s-ui` so Radar Hub can reuse them for a managed-key, audited cloud variant; this PR is the OSS half plus the embedder slot the hub fills.
Investigation engine (`internal/ai/`, `internal/server/ai_diagnose.go`)
- Headless streaming against MCP into a structured `Diagnosis`; process-group lifecycle so no agent child outlives a run.
- Durable server-side runs (`runs.go`): investigations are jobs owned by the server, not the browser tab, so they survive panel close / navigation / refresh. SSE replay via monotonic seq + `Last-Event-ID`; soft cap of 3 concurrent; focus-existing-run on duplicate; context-switch cancels + marks stale. (In-memory today, cleared on restart; the UI says so.)
Multi-agent + isolation
- `Agent` interface abstracts the CLI; Claude Code, Codex, and Cursor backends implement it. The server builds the set from every supported CLI on `PATH`.
- Codex: Isolated (`--ignore-user-config`, empty cwd, minimal env) vs My setup (the user's full config + their own MCP servers + local file reads; a deliberate trusted mode, surfaced with an explicit warning). Both keep the shell read-only, and Codex's built-in `web_search` is disabled: it reaches the public internet server-side (outside `--sandbox`), which would break the "reads only this cluster" guarantee. Optional model + reasoning effort (low/medium/high).
- Cursor: drives `cursor-agent` headlessly with `--sandbox enabled` and a per-run workspace pointed at Radar's read-only MCP mount. Cursor has no flag to suppress the user's global MCP servers, so it cannot be made hermetically read-only like Claude/Codex; this is disclosed honestly in a Cursor-specific consent card (separate from the Claude/Codex consent, so prior approval can't bypass it). Free-form model override; no isolation toggle (cursor has none). Its `--resume` is workspace-scoped, so each run gets a stable scratch dir under a private per-process root.
Security: read-only mount + hardened apply
The investigation experience (`web/src/components/diagnose/`)
Built to feel like a sharp SRE working beside you, not a spinner that dumps text:
Architecture: `DiagnoseProvider` (always-mounted controller; layout state lives in a separate memoized context so run-polling doesn't re-render the app shell). The panel is an absolute slot in the app's body frame (the column under the header, right of the nav rail), so it docks below the global navbar and pushes only the content (navbar + rail stay static), sharing one right-inset contract with the resource/Helm drawers so they sit beside it rather than under it. Expand turns it into a focused, decision-first workspace: the transcript becomes scrollable evidence on the left while the latest structured verdict (root cause + remediation + Apply) pins to a right rail that stays in view as you read; the recent-investigations list is the always-present master pane on the left. Docked stays a sidecar; it overlays instead of crushing the app on small screens. `InvestigationView` is a pure subscriber that reconstructs turns from the event stream; presentational parts stay pure for the k8s-ui lift.
Trust framing + config
Testing
- … `recommended_index` + `recommended_reason` parsing, apply-prompt binding, both agents' stream parsers, isolation flag construction, agent-name routing, focus-existing-run key, and per-resource managed-by/health detection.
- … `kind` clusters: both agents investigate broken resources and confirm health without confabulating on healthy ones; durable runs survive panel close; the read-only mount holds; the adaptive entry, staged reveal, elapsed timer, and verified-apply behave; the header reflows under the docked panel without overlap.
Not included / follow-ups
Note
High Risk
Spawns local agent processes with cluster access via MCP; apply turns enable write tools (mitigated by read-only mount, fresh-session apply, and fix binding). Context-switch teardown is critical to avoid cross-cluster writes.
Overview
Adds Diagnose with AI for standalone OSS Radar: investigations run on the user's machine via their own agent CLI (Claude Code, Codex, Cursor) against Radar's localhost MCP, with no cloud or API key. Gated to no-auth standalone when `/mcp` is enabled and a supported CLI is on `PATH`.
Introduces `internal/ai` with an `Agent` interface, per-backend spawn/stream parsing, health-aware prompts, structured verdict parsing (healthy / inconclusive / root cause), and a `RunManager` for server-side jobs that survive tab close, with SSE replay, concurrency caps, and cancel-on-cluster-switch (via new `OnBeforeContextSwitch`).
Security: read-only investigations hit `/mcp-readonly` (write tools not registered; tested); apply uses a fresh write session bound to user-confirmed fix text. Loopback origin checks on mutating diagnose POSTs.
HTTP: `/api/agents`, `/api/diagnose/runs/*` (start, list, turn, stop, stream). Server attaches GitOps/Helm managed-by and issue/audit health at run start for prompts and Apply warnings.
UI: diagnose panel as a docked body-frame slot (`contentGutter` shared with resource drawers), entry points in header/Issues queue, Settings AI section, and `DiagnoseCustomization` embed slot in `RadarApp` for Hub.
Reviewed by Cursor Bugbot for commit 697a890. Bugbot is set up for automated code reviews on this repo.
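The loopback origin check on mutating diagnose POSTs, mentioned in the overview above, is not shown in this page. A minimal sketch of what such a check could look like follows; `isLoopbackOrigin` is a hypothetical helper, the port in the example is made up, and rejecting a missing `Origin` header is an assumption rather than the PR's documented behaviour.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// isLoopbackOrigin reports whether an Origin header value points at the
// local machine: "localhost" or any loopback IP, over http or https.
func isLoopbackOrigin(origin string) bool {
	u, err := url.Parse(origin)
	if err != nil || u.Host == "" {
		// Empty, "null", or malformed origins are rejected in this sketch.
		return false
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return false
	}
	host := u.Hostname() // strips the port and IPv6 brackets
	if strings.EqualFold(host, "localhost") {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, origin := range []string{
		"http://localhost:9280", // port is illustrative only
		"http://127.0.0.1:9280",
		"http://[::1]:9280",
		"https://localhost.evil.example",
		"null",
	} {
		fmt.Printf("%-32s %v\n", origin, isLoopbackOrigin(origin))
	}
}
```

Comparing the parsed hostname, rather than matching a string prefix, is what keeps a lookalike such as `localhost.evil.example` from passing.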