GET /api/vitals: slim scale + capacity endpoint #1094
}
resp.Memory = m
}
resp.MetricsServerAvailable = usage.ok
Empty node cache probes metrics
Medium Severity
The vitals handler still calls the memoized metrics-server probe when the node lister returns zero nodes, unlike getDashboardMetrics, which returns before probing. That can set metricsServerAvailable true while node totals and CPU/memory summaries are absent, and mark completeness complete during informer catch-up when counts are still empty.
Reviewed by Cursor Bugbot for commit 30068ca.
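One possible shape for the fix, sketched under assumptions: return before probing when the node lister is empty, so availability is never claimed while node data is absent. The types and the fillMetrics helper below are hypothetical stand-ins, not code from this PR.

```go
package main

import "fmt"

// usageResult is a hypothetical stand-in for the memoized probe result.
type usageResult struct {
	ok bool
}

// vitalsMetrics is a hypothetical slice of the vitals response.
type vitalsMetrics struct {
	MetricsServerAvailable bool
	NodeCount              int
}

// fillMetrics probes metrics-server only when the node cache has entries,
// mirroring the early return the review attributes to getDashboardMetrics.
func fillMetrics(nodeNames []string, probe func() usageResult) vitalsMetrics {
	resp := vitalsMetrics{NodeCount: len(nodeNames)}
	if len(nodeNames) == 0 {
		// The informer may still be catching up: skip the probe and
		// leave MetricsServerAvailable false.
		return resp
	}
	resp.MetricsServerAvailable = probe().ok
	return resp
}

func main() {
	probes := 0
	probe := func() usageResult { probes++; return usageResult{ok: true} }

	fmt.Println(fillMetrics(nil, probe), "probes:", probes)
	fmt.Println(fillMetrics([]string{"node-a"}, probe), "probes:", probes)
}
```

Whether an empty node cache should also hold back the complete flag is a separate decision this sketch does not cover.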
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit 197e6be.
}
if podsReadable && podsPartial {
	restricted = append(restricted, "pods")
}
Duplicate pods restricted entry
Low Severity
If the caller can list pods and pod scope is partial, but cache.Pods() is nil, vitals appends "pods" to completeness.restricted twice—once in the else branch and again for partial scope—yielding duplicate sentinel values in JSON.
Reviewed by Cursor Bugbot for commit 197e6be.
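A minimal sketch of one way to avoid the duplicate, assuming the two branches can share a set-like append. The appendUnique and restrictedKinds names, and the simplified branch conditions, are hypothetical and only approximate the handler's real control flow.

```go
package main

import "fmt"

// appendUnique appends kind to list only if it is not already present.
func appendUnique(list []string, kind string) []string {
	for _, existing := range list {
		if existing == kind {
			return list
		}
	}
	return append(list, kind)
}

// restrictedKinds is a hypothetical reduction of the two code paths the
// review describes: one for an unreadable or nil pod cache, one for
// partial pod scope. Both may fire for the same request.
func restrictedKinds(podsReadable, podsPartial, podCacheNil bool) []string {
	restricted := []string{}
	if !podsReadable || podCacheNil {
		restricted = appendUnique(restricted, "pods")
	}
	if podsReadable && podsPartial {
		restricted = appendUnique(restricted, "pods")
	}
	return restricted
}

func main() {
	// Readable, partial scope, nil cache: both branches fire, one entry results.
	fmt.Println(restrictedKinds(true, true, true))
}
```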
Purpose-built for Radar Cloud's fleet fan-out, which previously projected this corner out of the full /api/dashboard payload and discarded the rest. Returns node/pod phase counts (namespace-scoped to the caller's access), CPU/memory MetricSummary, and the metrics-server-availability flag. Completeness is typed, never stringly: accessRestricted, pending and restricted (k8score ResourceType values, filtered to vitals-relevant kinds), and a derived complete flag — including a "nodes" restricted sentinel the dashboard never had (consumers previously inferred node RBAC blindness from zero totals). Metrics-server usage is memoized for 15s per RBAC scope so fleet polling doesn't turn every request into a live probe; requests/capacity stay informer-derived.


Summary
New endpoint returning per-cluster scale and capacity (node/pod phase counts, CPU/memory usage and requests vs. allocatable, metrics-server availability) without the home-page payload. Purpose-built for Radar Cloud's fleet fan-out, which previously fetched the full /api/dashboard per cluster and discarded ~90% of it, and equally usable by any consumer that wants vitals without the kitchen sink.

What changed
- GET /api/vitals (internal/server/vitals.go), mounted beside /api/dashboard in the authed API group. Pod counts are namespace-scoped to the caller's access; node counts gate on the caller's own node RBAC, as the dashboard does.
- completeness.{accessRestricted, pending[], restricted[], complete}: sentinel lists carry k8score ResourceType values filtered to vitals-relevant kinds, and complete is derived server-side. Includes a "nodes" restricted sentinel the dashboard never had (consumers previously had to infer node RBAC blindness from zero totals). Metrics availability is deliberately not part of completeness; it is a capability, reported separately.
- No omitempty: fleet consumers get fixed fields, not fields that vanish at zero.

Testing
go test ./internal/server/ is green, including two new tests: the happy path against the smoke fixture (counts, completeness, no false metrics claim), and node-RBAC denial surfacing restricted: ["nodes"] + complete: false through the real permission-cache gate.

Part of the endpoint rework for cloud; the hub-side switch of its vitals fan-out to this endpoint lands separately.
Note
Medium Risk
New authenticated API surface with nuanced RBAC (pods vs namespace visibility, partial scopes) and cross-user metrics memoization; behavior is well-tested but wrong gates could leak aggregate cluster data.
Overview
Adds GET /api/vitals, a lightweight alternative to the full dashboard for fleet polling: node/pod phase counts, CPU/memory capacity vs requests vs live usage, and explicit completeness metadata (accessRestricted, pending, restricted, derived complete). Pod data is gated on pod list RBAC (not just namespace visibility); node metrics stay behind node list like the dashboard.
Shared metrics logic moves into node_metrics.go (fetchNodeUsage, listPodsScoped, computeCapacityRequests); dashboard metrics now call those helpers instead of inlining the same math.
For vitals, only the metrics-server usage probe is memoized 15s per kube-context + user (cleared on context switch); informer-derived capacity/requests stay fresh each request.
Tests cover namespace scoping sentinels, capacity math, vitals happy path, node RBAC denial, deployments-only users, and partial pod namespace access.
Reviewed by Cursor Bugbot for commit 24005b9.