
GET /api/vitals: slim scale + capacity endpoint #1094

Open: nadaverell wants to merge 1 commit into main from feat/vitals-endpoint


@nadaverell commented Jul 3, 2026


Summary

New endpoint returning per-cluster scale and capacity (node/pod phase counts, CPU/memory usage and requests vs. allocatable, and metrics-server availability) without the home-page payload. It is purpose-built for Radar Cloud's fleet fan-out, which previously fetched the full /api/dashboard per cluster and discarded ~90% of it, and is equally usable by any consumer that wants vitals without the kitchen sink.

What changed

  • GET /api/vitals (internal/server/vitals.go), mounted beside /api/dashboard in the authed API group. Pod counts are namespace-scoped to the caller's access; node counts gate on the caller's own node RBAC like the dashboard does.
  • Typed completeness contract: completeness.{accessRestricted, pending[], restricted[], complete}. The sentinel lists carry k8score ResourceType values filtered to vitals-relevant kinds, and complete is derived server-side. This includes a "nodes" restricted sentinel the dashboard never had (consumers previously had to infer node RBAC blindness from zero totals). Metrics availability is deliberately not part of completeness: it's a capability, reported separately.
  • Metrics probe memoized for 15s per RBAC scope: live usage hits metrics-server (8s timeout), which is fine per page load but wasteful for fleet polling. Requests/capacity stay informer-derived and fresh.
  • Stable JSON shape: pod phase counts have no omitempty, so fleet consumers get fixed fields rather than fields that vanish at zero.

Testing

go test ./internal/server/ passes, including two new tests: a happy path against the smoke fixture (counts, completeness, no false metrics claim) and node-RBAC denial surfacing restricted: ["nodes"] and complete: false through the real permission-cache gate.

Part of the endpoint rework for cloud; the hub-side switch of its vitals fan-out to this endpoint lands separately.


Note

Medium Risk
New authenticated API surface with nuanced RBAC (pods vs namespace visibility, partial scopes) and cross-user metrics memoization; behavior is well-tested but wrong gates could leak aggregate cluster data.

Overview
Adds GET /api/vitals, a lightweight alternative to the full dashboard for fleet polling: node/pod phase counts, CPU/memory capacity vs requests vs live usage, and explicit completeness metadata (accessRestricted, pending, restricted, derived complete). Pod data is gated on pod list RBAC (not just namespace visibility); node metrics stay behind node list like the dashboard.
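The distinction between pod list RBAC and mere namespace visibility might look like the sketch below. `resolvePodScope`, `podScope`, and the `canListPods` callback are hypothetical names; in the real server the check presumably goes through the permission cache, whose API is not shown in the PR text.

```go
package main

import (
	"fmt"
	"sort"
)

// podScope is a hypothetical summary of where the caller may list pods.
type podScope struct {
	Namespaces []string // namespaces whose pods are counted
	Readable   bool     // caller can list pods in at least one namespace
	Partial    bool     // caller can list pods in some, not all, visible namespaces
}

// resolvePodScope gates on pod list permission per namespace rather than on
// namespace visibility alone. canListPods is an assumed callback, not the
// PR's actual permission-cache API.
func resolvePodScope(visible []string, canListPods func(ns string) bool) podScope {
	var s podScope
	for _, ns := range visible {
		if canListPods(ns) {
			s.Namespaces = append(s.Namespaces, ns)
		}
	}
	sort.Strings(s.Namespaces) // deterministic order for stable output
	s.Readable = len(s.Namespaces) > 0
	s.Partial = s.Readable && len(s.Namespaces) < len(visible)
	return s
}

func main() {
	allowed := map[string]bool{"team-a": true}
	s := resolvePodScope([]string{"team-a", "team-b"}, func(ns string) bool { return allowed[ns] })
	fmt.Println(s.Namespaces, s.Readable, s.Partial) // [team-a] true true
}
```

A deployments-only user who can see a namespace but cannot list pods in it ends up with `Readable == false`, so pod counts would be reported as restricted rather than silently zero.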

Shared metrics logic moves into node_metrics.go (fetchNodeUsage, listPodsScoped, computeCapacityRequests); dashboard metrics now call those helpers instead of inlining the same math.
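The capacity math that `computeCapacityRequests` now shares between the dashboard and vitals is, in spirit, a pair of sums. The sketch below is a simplified stand-in rather than the helper itself: it uses plain `int64` millicores and bytes instead of `corev1` types and `resource.Quantity`, the struct and function names are hypothetical, and skipping Succeeded/Failed pods (while ignoring init containers and pod overhead) is an assumption.

```go
package main

import "fmt"

// Simplified stand-ins (hypothetical): CPU in millicores, memory in bytes.
type node struct {
	AllocatableMilliCPU int64
	AllocatableMemBytes int64
}

type container struct {
	RequestMilliCPU int64
	RequestMemBytes int64
}

type pod struct {
	Phase      string
	Containers []container
}

// MetricSummary pairs total allocatable with summed requests (field names
// are illustrative, not the PR's MetricSummary definition).
type MetricSummary struct {
	Allocatable int64
	Requests    int64
}

// Percent returns requests as a share of allocatable, guarding against zero.
func (m MetricSummary) Percent() float64 {
	if m.Allocatable == 0 {
		return 0
	}
	return float64(m.Requests) / float64(m.Allocatable) * 100
}

// computeCapacityRequestsSketch sums allocatable across nodes and container
// requests across pods. Assumption: terminal pods no longer hold resources.
func computeCapacityRequestsSketch(nodes []node, pods []pod) (cpu, mem MetricSummary) {
	for _, n := range nodes {
		cpu.Allocatable += n.AllocatableMilliCPU
		mem.Allocatable += n.AllocatableMemBytes
	}
	for _, p := range pods {
		if p.Phase == "Succeeded" || p.Phase == "Failed" {
			continue
		}
		for _, c := range p.Containers {
			cpu.Requests += c.RequestMilliCPU
			mem.Requests += c.RequestMemBytes
		}
	}
	return cpu, mem
}

func main() {
	nodes := []node{{4000, 16 << 30}, {4000, 16 << 30}}
	pods := []pod{
		{Phase: "Running", Containers: []container{{500, 1 << 30}, {250, 512 << 20}}},
		{Phase: "Succeeded", Containers: []container{{1000, 2 << 30}}},
	}
	cpu, mem := computeCapacityRequestsSketch(nodes, pods)
	fmt.Printf("cpu %d/%dm (%.1f%%), mem %d/%d bytes\n",
		cpu.Requests, cpu.Allocatable, cpu.Percent(), mem.Requests, mem.Allocatable)
}
```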

For vitals, only the metrics-server usage probe is memoized 15s per kube-context + user (cleared on context switch); informer-derived capacity/requests stay fresh each request.

Tests cover namespace scoping sentinels, capacity math, vitals happy path, node RBAC denial, deployments-only users, and partial pod namespace access.

Reviewed by Cursor Bugbot for commit 24005b9. Bugbot is set up for automated code reviews on this repo.

nadaverell requested a review from hisco as a code owner July 3, 2026 21:57
nadaverell force-pushed the feat/vitals-endpoint branch from 7adcc71 to e958724
nadaverell force-pushed the feat/vitals-endpoint branch from e958724 to 30068ca
Comment thread internal/server/vitals.go
}
resp.Memory = m
}
resp.MetricsServerAvailable = usage.ok


Empty node cache probes metrics

Medium Severity

The vitals handler still calls the memoized metrics-server probe when the node lister returns zero nodes, unlike getDashboardMetrics, which returns before probing. That can set metricsServerAvailable true while node totals and CPU/memory summaries are absent, and mark completeness complete during informer catch-up when counts are still empty.
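One way to address this finding, sketched under assumptions: return before the probe when the node cache is empty, as the finding says `getDashboardMetrics` does, and (an extra assumption beyond the finding) report nodes as pending while the informer has not synced, so `complete` cannot flip to true during catch-up. `fillMetrics`, `vitalsMetrics`, and the `nodesSynced` flag are hypothetical names, not the PR's code.

```go
package main

import "fmt"

// vitalsMetrics is a hypothetical slice of the vitals response.
type vitalsMetrics struct {
	NodeTotal              int
	MetricsServerAvailable bool
	Pending                []string
}

// fillMetrics mirrors the dashboard's ordering: with no nodes in the cache it
// returns before the (memoized) metrics-server probe runs, so
// MetricsServerAvailable cannot be true while node totals are absent.
func fillMetrics(nodeCount int, nodesSynced bool, probe func() bool) vitalsMetrics {
	var v vitalsMetrics
	if nodeCount == 0 {
		if !nodesSynced {
			// Informer still catching up: say so instead of implying completeness.
			v.Pending = append(v.Pending, "nodes")
		}
		return v
	}
	v.NodeTotal = nodeCount
	v.MetricsServerAvailable = probe()
	return v
}

func main() {
	probed := false
	probe := func() bool { probed = true; return true }
	v := fillMetrics(0, false, probe)
	fmt.Println(v.MetricsServerAvailable, v.Pending, probed) // false [nodes] false
}
```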


Reviewed by Cursor Bugbot for commit 30068ca.

nadaverell force-pushed the feat/vitals-endpoint branch 3 times, most recently from 05862dd to f82f48d
nadaverell force-pushed the feat/vitals-endpoint branch 4 times, most recently from 82fcaa6 to 197e6be

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).


Reviewed by Cursor Bugbot for commit 197e6be.

Comment thread internal/server/vitals.go
}
if podsReadable && podsPartial {
restricted = append(restricted, "pods")
}


Duplicate pods restricted entry

Low Severity

If the caller can list pods and pod scope is partial, but cache.Pods() is nil, vitals appends "pods" to completeness.restricted twice—once in the else branch and again for partial scope—yielding duplicate sentinel values in JSON.
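A sketch of a fix that folds the branches so the sentinel is emitted once. `buildRestricted` and its boolean parameters are hypothetical names modeled on the snippet (`podsReadable`, `podsPartial`, and whether `cache.Pods()` returned a lister); the surrounding handler code is not shown in the thread.

```go
package main

import "fmt"

// buildRestricted derives the restricted sentinel list with one condition per
// kind, so no kind can be appended twice. podCache reports whether the pod
// lister (cache.Pods() in the snippet) was available.
func buildRestricted(nodesReadable, podsReadable, podsPartial, podCache bool) []string {
	restricted := []string{}
	if !nodesReadable {
		restricted = append(restricted, "nodes")
	}
	// Unreadable pods, a missing pod lister, or partial pod scope all mean
	// pod counts are incomplete; one combined check emits "pods" at most once.
	if !podsReadable || !podCache || podsPartial {
		restricted = append(restricted, "pods")
	}
	return restricted
}

func main() {
	// Pods listable, scope partial, no pod lister: the case Bugbot flagged.
	fmt.Println(buildRestricted(true, true, true, false)) // [pods]
}
```

An `appendUnique` guard (for example with `slices.Contains` from Go 1.21) would also work, but deriving each sentinel from a single condition keeps the list duplicate-free by construction.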


Reviewed by Cursor Bugbot for commit 197e6be.

Purpose-built for Radar Cloud's fleet fan-out, which previously
projected this corner out of the full /api/dashboard payload and
discarded the rest. Returns node/pod phase counts (namespace-scoped to
the caller's access), CPU/memory MetricSummary, and the
metrics-server-availability flag.

Completeness is typed, never stringly: accessRestricted, pending and
restricted (k8score ResourceType values, filtered to vitals-relevant
kinds), and a derived complete flag — including a "nodes" restricted
sentinel the dashboard never had (consumers previously inferred node
RBAC blindness from zero totals). Metrics-server usage is memoized for
15s per RBAC scope so fleet polling doesn't turn every request into a
live probe; requests/capacity stay informer-derived.
nadaverell force-pushed the feat/vitals-endpoint branch from 197e6be to 24005b9