feat(dashboard): render per-prefill / per-decode / total measured power #411

Closed
arygupt wants to merge 1 commit into master from feat/dashboard-prefill-decode-power

arygupt (Collaborator) commented Jun 1, 2026

What

Adds three new selectable Y-axis metrics to the inference chart, surfacing the per-stage measured-power telemetry the runner already emits but the dashboard could not render:

| New metric | Source field (already on AggDataEntry) | Mirrors |
| --- | --- | --- |
| Measured Prefill Power per GPU (W) | prefill_avg_power_w | measuredAvgPower |
| Measured Decode Power per GPU (W) | decode_avg_power_w | measuredAvgPower |
| Measured J per Input Token (J/tok) | joules_per_input_token | measuredJPerOutputToken |

Net result in the gated "Measured Energy" dropdown: prefill / decode / total power + J input / output / total — i.e. the per-prefill/per-decode/per-total split for disaggregated runs.

Why

The disagg per-stage power data flows all the way through the pipeline (DB metrics JSONB + unofficial-run artifact path → rowToAggDataEntry → chart point) but only three measured metrics were ever wired as selectable Y-axis options (total power, J/output, J/total). The per-stage fields existed on the row but were never wrapped into metrics. This closes that gap.

How

Purely additive / mechanical — copies the existing measuredAvgPower trio pattern across every site:

  • inference-chart-config.json (both interactivity + e2e charts; J/input roofline lower_right/lower_left to match J/output)
  • Y_AXIS_METRICS, YAxisMetricKey, ChartDefinition, InferenceData (types)
  • createChartDataPoint + roofline machinery (both type unions, roof-reset, markRooflinePoints)
  • useInterpolatedTrendData lightweight trend-point builder
  • ChartControls gated "Measured Energy" group (stays gated: true)
  • chart-utils.test.ts — 7 new cases (emit / omit-legacy / zero-preservation / per-stage independence / full-disagg row)

No runner, ETL, or DB change — source fields already exist on AggDataEntry; packages/constants already lists the keys.

Still gated

The metrics remain behind the existing ↑ ↑ ↓ ↓ feature gate, so merging does not expose measured power publicly. Ungating is a separate, deliberate product decision (follow-up).

How to preview (no DB needed)

On the Vercel preview this PR posts:

  1. Open …/inference?unofficialrun=26607091549 (case-insensitive; fetches the GB300 run's GitHub artifacts directly — needs GITHUB_TOKEN in the preview env, which already powers the PR unofficial-run visualizer).
  2. Press ↑ ↑ ↓ ↓ to unlock "Measured Energy".
  3. In the Y-axis dropdown, pick Measured Prefill Power per GPU, then Measured Decode Power per GPU, then Measured Average Power per GPU; filter to the dsv4/gb300 points and screenshot each. Expect prefill to draw more per-GPU watts than decode, since prefill is compute-bound and decode is memory-bound.

Verification

  • tsc --noEmit — clean (the Y_AXIS_METRICS/YAxisMetricKey/ChartDefinition/InferenceData sites all agree).
  • Full app unit suite — 1996 passed (incl. 7 new cases).
  • oxlint on changed files — clean.
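
One way the type checker can enforce that kind of agreement is sketched below. This is an illustration, not the repository's code: the constant and function names are hypothetical, the two power keys and the output/total labels are guesses by analogy, and only the three `y_measuredJPer*` keys and the input-token label appear in this thread.

```typescript
// Hypothetical key list. Only the three y_measuredJPer* keys are named in this
// thread; the power keys are assumed by analogy.
const MEASURED_KEYS = [
  'y_measuredAvgPower',
  'y_measuredPrefillPower',
  'y_measuredDecodePower',
  'y_measuredJPerInputToken',
  'y_measuredJPerOutputToken',
  'y_measuredJPerTotalToken',
] as const;

// Union type derived from the list, so the list is the single source of truth.
type MeasuredKey = (typeof MEASURED_KEYS)[number];

// Record<MeasuredKey, string> requires exactly one entry per key: omitting a
// key, or adding one that is not in MEASURED_KEYS, fails under `tsc --noEmit`.
const MEASURED_LABELS: Record<MeasuredKey, string> = {
  y_measuredAvgPower: 'Measured Average Power per GPU (W)',
  y_measuredPrefillPower: 'Measured Prefill Power per GPU (W)',
  y_measuredDecodePower: 'Measured Decode Power per GPU (W)',
  y_measuredJPerInputToken: 'Measured J per Input Token (J/tok)',
  y_measuredJPerOutputToken: 'Measured J per Output Token (J/tok)', // assumed label
  y_measuredJPerTotalToken: 'Measured J per Total Token (J/tok)', // assumed label
};

// Lookup helper: the parameter type rejects unknown metric keys at compile time.
function labelFor(key: MeasuredKey): string {
  return MEASURED_LABELS[key];
}
```

With this shape, adding a seventh key to the list without a matching label is a compile error rather than a runtime gap.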

Out of scope (follow-ups)

  • Combined prefill+decode+total overlay as one multi-series chart.
  • Per-worker workers[] visualization (data is carried to the point, not yet rendered).
  • Ungating measured power for the public site.

🤖 Generated with Claude Code


Note

Low Risk
Additive frontend chart and type wiring with optional-field guards; no API, auth, or data-pipeline changes.

Overview
Adds three selectable measured-energy Y-axis metrics on the inference dashboard—prefill power, decode power, and J per input token—so disaggregated runs can show per-stage telemetry that already exists on benchmark rows but was not chartable.

The change mirrors the existing total measured-power pattern: map prefill_avg_power_w, decode_avg_power_w, and joules_per_input_token into chart points in createChartDataPoint and trend interpolation, register them in chart config and Y_AXIS_METRICS, extend roofline marking, and list them under the gated Measured Energy dropdown (prefill/decode/total power plus input/output/total J). Legacy rows without per-stage fields stay omitted (not zero-filled). Unit tests cover emit, omit, zero preservation, and full disagg rows.

No runner, DB, or ETL changes; the feature gate for measured energy is unchanged.

Reviewed by Cursor Bugbot for commit 0e85ae7.

Wire the per-stage measured-power telemetry the runner already emits
(prefill_avg_power_w, decode_avg_power_w, joules_per_input_token) as three
new selectable Y-axis metrics, mirroring the existing measuredAvgPower trio:

- Measured Prefill Power per GPU (W)
- Measured Decode Power per GPU (W)
- Measured J per Input Token (J/tok)

Added across both chart configs (interactivity + e2e), Y_AXIS_METRICS,
YAxisMetricKey, ChartDefinition, InferenceData, createChartDataPoint, the
roofline machinery (type unions / roof-reset / markRooflinePoints), the
lightweight trend-point builder, and the gated "Measured Energy" dropdown
group (stays behind the existing feature gate).

Purely additive — source fields already exist on AggDataEntry; no runner or
DB change. Closes the gap where disagg per-stage power was ingested but not
renderable. Validated against GB300 disagg data (run 26607091549).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
arygupt requested a review from adibarra as a code owner on June 1, 2026, 03:09
vercel Bot commented Jun 1, 2026

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| inferencemax-app | Ready | Preview, Comment | Jun 1, 2026 3:09am |

claude Bot commented Jun 1, 2026

Claude finished @arygupt's task in 2m 40s


🟢 LGTM — no blocking issues found

This is a clean, mechanical extension of the existing measured-power trio. I verified every sync point ends up consistent:

Sync points checked (all 11 in agreement):

  • Y_AXIS_METRICS + YAxisMetric type (chart-utils.ts:151-160)
  • YAxisMetricKey (types.ts:265-270)
  • ChartDefinition field set (types.ts:379-394)
  • InferenceData wrapper fields (types.ts:233-238)
  • createChartDataPoint source-field guards (chart-utils.ts:407-422)
  • calculateRoofline / computeAllRooflines union types (chart-utils.ts:581-663)
  • markRooflinePoints reset block + if/else chain (chart-utils.ts:706-795)
  • useInterpolatedTrendData lightweight builder (useInterpolatedTrendData.ts:83-91)
  • inference-chart-config.json (both interactivity + e2e blocks; J/input roofline = lower_right / lower_left to mirror J/output)
  • ChartControls.tsx Measured Energy group (gated: true preserved)
  • Tests: 7 new cases in chart-utils.test.ts:1297-1369 cover emit / omit-legacy / zero-preservation / per-stage independence / full-disagg row

Unofficial-run path: Source fields (prefill_avg_power_w, decode_avg_power_w, joules_per_input_token) are already mapped by benchmark-transform.ts:61-63 (covered by benchmark-transform.test.ts), and the unofficial-run provider uses the same transformBenchmarkRows. So overlay rooflines/points pick up the new metrics for free — overlay support is structurally present.

typeof === 'number' guards preserve 0 W values (explicitly asserted by the new "preserves a zero per-stage power value" test).
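
A minimal sketch of why that guard matters, assuming a trimmed-down row shape. The three source field names come from this PR; the interfaces, the chart-point field names, and both function names are hypothetical and do not reproduce `createChartDataPoint`.

```typescript
// Hypothetical, trimmed-down row shape; the real AggDataEntry has many more fields.
interface StageRow {
  prefill_avg_power_w?: number | null;
  decode_avg_power_w?: number | null;
  joules_per_input_token?: number | null;
}

// Hypothetical chart-point fields, named by analogy with measuredAvgPower.
interface StagePoint {
  measuredPrefillPower?: number;
  measuredDecodePower?: number;
  measuredJPerInputToken?: number;
}

// typeof guard: a genuine 0 passes, while undefined and null are skipped,
// so legacy rows get no field at all instead of a zero-filled one.
function mapPerStageMetrics(row: StageRow): StagePoint {
  const point: StagePoint = {};
  if (typeof row.prefill_avg_power_w === 'number') {
    point.measuredPrefillPower = row.prefill_avg_power_w;
  }
  if (typeof row.decode_avg_power_w === 'number') {
    point.measuredDecodePower = row.decode_avg_power_w;
  }
  if (typeof row.joules_per_input_token === 'number') {
    point.measuredJPerInputToken = row.joules_per_input_token;
  }
  return point;
}

// Contrast: a truthiness check treats 0 as "missing" and drops the reading.
function mapWithTruthiness(row: StageRow): StagePoint {
  const point: StagePoint = {};
  if (row.prefill_avg_power_w) {
    point.measuredPrefillPower = row.prefill_avg_power_w;
  }
  return point;
}
```

Each field is guarded on its own, so a row that carries only prefill power still yields a prefill point without inventing a decode value.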

Feature gate stays in place — merging does not expose measured power publicly.

Minor pre-existing observation (not introduced by this PR, not blocking): chart-display-helpers.tsx:117 uses startsWith('y_j') to gate the Joules disagg caveat, which doesn't match any y_measured* metric — including the existing y_measuredJPerOutputToken/y_measuredJPerTotalToken. Since the new y_measuredJPerInputToken follows the same pre-existing pattern, there's no new regression here. If/when the Measured Energy group is ungated, that's the place where a y_measured caveat branch would need to be added.
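
The prefix mismatch described above can be reproduced in isolation. In this sketch the three metric keys are taken from the review; both predicate names and the widened check are hypothetical, shown only as one possible shape for a future `y_measured` branch.

```typescript
// Metric keys named in the review above.
const measuredJoulesKeys: string[] = [
  'y_measuredJPerInputToken',
  'y_measuredJPerOutputToken',
  'y_measuredJPerTotalToken',
];

// The check as described in the review: a lowercase 'y_j' prefix.
// startsWith is case-sensitive, and these keys begin with 'y_m', so none match.
const matchesJoulesPrefix = (key: string): boolean => key.startsWith('y_j');

// Hypothetical widened check (an assumption, not the repository's code):
// also accept the measured-Joules prefix.
const matchesJoulesOrMeasuredJoules = (key: string): boolean =>
  key.startsWith('y_j') || key.startsWith('y_measuredJ');

for (const key of measuredJoulesKeys) {
  console.log(key, matchesJoulesPrefix(key), matchesJoulesOrMeasuredJoules(key));
}
```

The widened predicate still excludes power metrics, since a key such as `y_measuredAvgPower` does not start with `y_measuredJ`.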

  • Read changed files
  • Verify mechanical-copy claim across 11 sync points
  • Check unofficial-run path coverage
  • Check tests (7 new cases)
  • Post review
    • Branch: feat/dashboard-prefill-decode-power
