feat(api): add /api/run endpoint exposing run-identity metadata by matthewkotila · Pull Request #997 · ai-dynamo/aiperf

matthewkotila · 2026-05-26T23:14:35Z

`/api/run`: how a "regression fix" turned into a design choice

Branch: mkotila/api-run-endpoint · Commit: 047d59dd · Tickets: NVBugs 6204764 / JIRA DYN-699

The story

A QA test broke. The reporter framed it as a regression: curl /api/config | jq .cli_command used to return the redacted launch command, now it returns null. The bug ticket arrived with a one-paragraph "AI suggestion" recommending the obvious fix — stitch cli_command back onto /api/config and call it done.

The obvious fix was wrong. Here's why.

The break came from PR #912 (the YAML-native v2 config refactor). Before that refactor, a single UserConfig class held everything: declarative spec (endpoint, loadgen, tokenizer, …) and per-run identity (benchmark_id, cli_command, …) jammed together at one level. /api/config dumped the whole grab-bag. PR #912 split it cleanly — declarative parts into BenchmarkConfig, identity into a new BenchmarkRun envelope. That split is the right shape: a CLI command isn't part of "the config," it's run-execution metadata — the thing that produced a config. The endpoint was migrated to return only BenchmarkConfig, which is honest, but it silently dropped cli_command and benchmark_id along with it.

So you see the trap. Option A from the bug ticket — stitch cli_command back onto /api/config — would re-create the category error PR #912 just cleaned up. You'd paper over the architectural improvement to preserve a stale API shape.

Option B is the move: add a new endpoint, /api/run, that exposes the BenchmarkRun envelope. cli_command lives there because that's where the field actually lives in the model. /api/config stays pure. As a bonus, the response is shaped as RunInfo — the same model already used in profile_export_aiperf.json — so the live API and the on-disk export can't drift.

The cost is one line in QA's test (/api/config → /api/run). No in-tree consumers to migrate; the web dashboard that would have read it was on an unmerged branch and never landed.

What shipped

File	Change
`src/aiperf/common/models/export_models.py`	`RunInfo.from_run(run)` classmethod — single projection from `BenchmarkRun`
`src/aiperf/exporters/metrics_json_exporter.py`	Replaced private `_build_run_metadata` with the new classmethod (one source of truth)
`src/aiperf/api/routers/core.py`	New `GET /api/run` → `RunInfo`
`tests/unit/api/routers/test_core.py`	+5 tests: `/api/config` no-grab-bag guard; `/api/run` shape + no-secrets; end-to-end redaction test (`sys.argv` with `--api-key sk-secret-…` → asserts secret not in response)
`docs/reference/api-endpoints.md` (new)	Full HTTP API surface documented for the first time
`docs/index.yml`	Fern index registration for the new page

Verification

Check	Result
`pytest tests/unit/api/ tests/unit/exporters/`	347 passed, 1 skipped
`ruff format --check` + `ruff check` (touched files)	clean
`tools/check_docs_index.py`	OK: all 102 docs listed
Live FastAPI startup log	`/api/run` present in route table
`pre-commit run --all-files` (host)	passed
Every claim in `api-endpoints.md` re-checked against code	passed (2 minor edits applied)

Decisions worth remembering

Endpoint name /api/run — bare singular noun matches /api/config's convention; RunInfo is the internal model name, not the route's job.
Reference doc over tutorial — the API surface had been undocumented since PR feat(api): add FastAPI service with plugin-based router system #717 landed without the api-service.md from ajc/websocket-dashboard-ui; a compact reference table closes the gap without expanding scope.
Single commit — code + tests + docs in one self-contained unit.

Follow-ups

QA test: migrate test_api_key_redaction from /api/config → /api/run. The new in-tree test_run_does_not_leak_api_key_in_cli_command mirrors the same contract.
Comment on NVBugs 6204764 / DYN-699 with fix summary + the QA migration note.
If 0.9.0 backport is needed: cherry-pick 047d59dd.

Summary by CodeRabbit

New Features
- Added /api/run endpoint exposing active run identity/metadata with automatic redaction of API keys and sensitive CLI/credential values; omits unset optional fields and returns 503 when no active run.
Bug Fixes
- /api/config no longer exposes run-identity fields.
Documentation
- Added comprehensive HTTP API reference with endpoints, Prometheus/metrics, health checks, results/artifacts, OpenAPI and curl examples.

The YAML-native v2 config refactor (PR #912) correctly moved cli_command, benchmark_id, and sweep_id from UserConfig onto BenchmarkRun, leaving /api/config as a pure BenchmarkConfig projection. External consumers reading cli_command from /api/config broke as a result (NVBugs 6204764 / JIRA DYN-699). Restoring cli_command on /api/config would re-create the category error the refactor cleaned up (run-execution metadata on a declarative config response). Instead, expose run-identity via a dedicated endpoint that matches the underlying model and shares its shape with the run_info block in profile_export_aiperf.json, so live and on-disk readers see identical fields. - GET /api/run -> RunInfo (benchmark_id, sweep_id, trial, random_seed, run_label, variation_*, redacted cli_command). - RunInfo.from_run() classmethod as the single projection function shared by /api/run and the JSON exporter. - Tests: /api/config must not expose run-identity; /api/run must not expose config or secrets; end-to-end redaction test monkey-patches sys.argv with --api-key <secret> and asserts the secret never reaches the API response (mirrors QA's test_api_key_redaction). - New docs/reference/api-endpoints.md documenting all HTTP API routes (the surface has been undocumented on main since #717 landed without the api-service tutorial that only existed on the ajc/websocket-dashboard-ui branch).

github-actions · 2026-05-26T23:14:45Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@6c1f03d6947af25a416e73fd308baa9af08371b6

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@6c1f03d6947af25a416e73fd308baa9af08371b6

Last updated for commit: 6c1f03d • Browse code

github-actions · 2026-05-26T23:15:38Z

Fern Docs Preview: https://nvidia-preview-d8166f00-bec4-4d13-b0c4-274920879af7.docs.buildwithfern.com/aiperf/dev

coderabbitai · 2026-05-26T23:19:20Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5cdbf449-c86d-46ae-886a-65cc284533ad

📥 Commits

Reviewing files that changed from the base of the PR and between 047d59d and 1ab2383.

📒 Files selected for processing (2)

src/aiperf/api/routers/core.py
tests/unit/api/routers/test_core.py

Walkthrough

This PR adds a /api/run endpoint that returns RunInfo projected from the current BenchmarkRun (omitting unset fields), centralizes projection in RunInfo.from_run(), updates the metrics exporter to use it, and adds tests plus API documentation and navigation.

Changes

HTTP API Run Identity Endpoint

Layer / File(s)	Summary
RunInfo data model and type safety `src/aiperf/common/models/export_models.py`	Adds `RunInfo.from_run()` classmethod to construct run identity blocks from `BenchmarkRun` objects; introduces `from __future__ import annotations`, `TYPE_CHECKING`-guarded imports, and fixes a forward-reference return annotation.
GET /api/run endpoint implementation `src/aiperf/api/routers/core.py`	Adds the `/api/run` FastAPI endpoint returning `RunInfo` (with `response_model_exclude_none=True`) or raising HTTP 503 when no active run exists; updates module and router docstrings and imports.
Exporter integration with RunInfo model `src/aiperf/exporters/metrics_json_exporter.py`	Refactors `MetricsJsonExporter._generate_content` to use `RunInfo.from_run(self._run)` and removes the module-local `_build_run_metadata` helper.
Tests, API redaction verification, and documentation `tests/unit/api/routers/test_core.py`, `docs/index.yml`, `docs/reference/api-endpoints.md`	Adds tests validating `/api/run` returns only identity fields, omits unset optional fields, returns 503 when no run is active, verifies CLI secret redaction (e.g., `--api-key`), and adds API reference doc plus navigation entry.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped to the endpoint at /api/run,

where identities bloom and secrets are done.
The CLI trims its whispers to <redacted> light,
Metrics and health hum day and night,
A little rabbit guards the API’s sight.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: adding a new /api/run endpoint that exposes run-identity metadata, which is the primary objective of the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 86.67% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

tests/unit/api/routers/test_core.py (1)

40-97: ⚡ Quick win

Add a negative-path test for /api/run returning 503 when no run is active.

The router has an explicit 503 branch for missing run context, but this contract is not covered here yet. A single test would lock that behavior and prevent regressions.

Proposed test addition

 class TestRunEndpoint:
@@
+    def test_run_without_active_run_returns_503(
+        self,
+        api_test_client: TestClient,
+        mock_fastapi_service: FastAPIService,
+    ) -> None:
+        mock_fastapi_service.run = None
+        response = api_test_client.get("/api/run")
+        assert response.status_code == 503
+        assert response.json() == {"detail": "No active benchmark run."}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/api/routers/test_core.py` around lines 40 - 97, Add a
negative-path test to TestRunEndpoint that verifies GET /api/run returns 503
when no run is active: create a FastAPIService with no run context (e.g.,
FastAPIService(run=None, service_id="no-run-test")), wrap its app in TestClient,
call client.get("/api/run") and assert response.status_code == 503 and the body
indicates a missing run; place this alongside the other TestRunEndpoint tests to
lock the router's 503 behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/api/routers/test_core.py`:
- Around line 40-97: Add a negative-path test to TestRunEndpoint that verifies
GET /api/run returns 503 when no run is active: create a FastAPIService with no
run context (e.g., FastAPIService(run=None, service_id="no-run-test")), wrap its
app in TestClient, call client.get("/api/run") and assert response.status_code
== 503 and the body indicates a missing run; place this alongside the other
TestRunEndpoint tests to lock the router's 503 behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f684b964-faae-408d-a6cb-aeeb1d10fc69

📥 Commits

Reviewing files that changed from the base of the PR and between 03167d0 and 047d59d.

📒 Files selected for processing (6)

docs/index.yml
docs/reference/api-endpoints.md
src/aiperf/api/routers/core.py
src/aiperf/common/models/export_models.py
src/aiperf/exporters/metrics_json_exporter.py
tests/unit/api/routers/test_core.py

codecov · 2026-05-26T23:23:29Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Address review feedback from #997. response_model_exclude_none=True on the /api/run decorator so the live response matches profile_export_aiperf.json's exclude_none projection. The route's docstring promised shape parity; without this flag, unset Optional fields were serialized as null in the API but omitted on disk. Tests: - assert unset Optional fields (sweep_id, random_seed, variation_*) are absent from the response, not serialized as null - lock the 503 "no active benchmark run" contract by mutating mock_fastapi_service.run = None, defensively guarding a branch the type system makes unreachable today

dynamo-ops

Previous review comments have been addressed. Approving.

dynamo-ops

Previous review comments have been addressed. Approving.

…se/0.9.0 (#1003) Co-authored-by: Matthew Kotila <[email protected]>

matthewkotila requested review from FrankD412 and ajcasagrande May 26, 2026 23:14

github-actions Bot added the feat label May 26, 2026

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

dynamo-ops reviewed May 26, 2026

View reviewed changes

Comment thread src/aiperf/api/routers/core.py Outdated

matthewkotila commented May 27, 2026

View reviewed changes

Comment thread docs/reference/api-endpoints.md

dynamo-ops approved these changes May 27, 2026

View reviewed changes

FrankD412 approved these changes May 27, 2026

View reviewed changes

Merge branch 'main' into mkotila/api-run-endpoint

6c1f03d

FrankD412 enabled auto-merge (squash) May 27, 2026 02:11

FrankD412 merged commit ce24a54 into main May 27, 2026
29 of 30 checks passed

FrankD412 deleted the mkotila/api-run-endpoint branch May 27, 2026 02:13

dynamo-ops approved these changes May 27, 2026

View reviewed changes

debermudez mentioned this pull request May 27, 2026

feat(api): cherry-pick #997 (/api/run run-identity endpoint) to release/0.9.0 #1003

Merged

2 tasks

saturley-hall pushed a commit that referenced this pull request May 27, 2026

feat(api): cherry-pick #997 (/api/run run-identity endpoint) to relea…

f2c751c

…se/0.9.0 (#1003) Co-authored-by: Matthew Kotila <[email protected]>

matthewkotila mentioned this pull request May 27, 2026

fix(redact): tighten sensitive-token list to stop matching LLM token-count flags #1006

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(api): add /api/run endpoint exposing run-identity metadata#997

feat(api): add /api/run endpoint exposing run-identity metadata#997
FrankD412 merged 3 commits into
mainfrom
mkotila/api-run-endpoint

matthewkotila commented May 26, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

github-actions Bot commented May 26, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 26, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

codecov Bot commented May 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

dynamo-ops left a comment

Uh oh!

Uh oh!

dynamo-ops left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

matthewkotila commented May 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

/api/run: how a "regression fix" turned into a design choice

The story

What shipped

Verification

Decisions worth remembering

Follow-ups

Summary by CodeRabbit

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Try out this PR

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

dynamo-ops left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dynamo-ops left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

matthewkotila commented May 26, 2026 •

edited by coderabbitai Bot

Loading

`/api/run`: how a "regression fix" turned into a design choice

github-actions Bot commented May 26, 2026 •

edited

Loading

github-actions Bot commented May 26, 2026 •

edited

Loading

coderabbitai Bot commented May 26, 2026 •

edited

Loading

codecov Bot commented May 26, 2026 •

edited

Loading