Skip to content

feat(api): add /api/run endpoint exposing run-identity metadata#997

Merged
FrankD412 merged 3 commits into
mainfrom
mkotila/api-run-endpoint
May 27, 2026
Merged

feat(api): add /api/run endpoint exposing run-identity metadata#997
FrankD412 merged 3 commits into
mainfrom
mkotila/api-run-endpoint

Conversation

@matthewkotila
Copy link
Copy Markdown
Contributor

@matthewkotila matthewkotila commented May 26, 2026

/api/run: how a "regression fix" turned into a design choice

Branch: mkotila/api-run-endpoint · Commit: 047d59dd · Tickets: NVBugs 6204764 / JIRA DYN-699

The story

A QA test broke. The reporter framed it as a regression: curl /api/config | jq .cli_command used to return the redacted launch command, now it returns null. The bug ticket arrived with a one-paragraph "AI suggestion" recommending the obvious fix — stitch cli_command back onto /api/config and call it done.

The obvious fix was wrong. Here's why.

The break came from PR #912 (the YAML-native v2 config refactor). Before that refactor, a single UserConfig class held everything: declarative spec (endpoint, loadgen, tokenizer, …) and per-run identity (benchmark_id, cli_command, …) jammed together at one level. /api/config dumped the whole grab-bag. PR #912 split it cleanly — declarative parts into BenchmarkConfig, identity into a new BenchmarkRun envelope. That split is the right shape: a CLI command isn't part of "the config," it's run-execution metadata — the thing that produced a config. The endpoint was migrated to return only BenchmarkConfig, which is honest, but it silently dropped cli_command and benchmark_id along with it.

So you see the trap. Option A from the bug ticket — stitch cli_command back onto /api/config — would re-create the category error PR #912 just cleaned up. You'd paper over the architectural improvement to preserve a stale API shape.

Option B is the move: add a new endpoint, /api/run, that exposes the BenchmarkRun envelope. cli_command lives there because that's where the field actually lives in the model. /api/config stays pure. As a bonus, the response is shaped as RunInfo — the same model already used in profile_export_aiperf.json — so the live API and the on-disk export can't drift.

The cost is one line in QA's test (/api/config/api/run). No in-tree consumers to migrate; the web dashboard that would have read it was on an unmerged branch and never landed.

What shipped

File Change
src/aiperf/common/models/export_models.py RunInfo.from_run(run) classmethod — single projection from BenchmarkRun
src/aiperf/exporters/metrics_json_exporter.py Replaced private _build_run_metadata with the new classmethod (one source of truth)
src/aiperf/api/routers/core.py New GET /api/runRunInfo
tests/unit/api/routers/test_core.py +5 tests: /api/config no-grab-bag guard; /api/run shape + no-secrets; end-to-end redaction test (sys.argv with --api-key sk-secret-… → asserts secret not in response)
docs/reference/api-endpoints.md (new) Full HTTP API surface documented for the first time
docs/index.yml Fern index registration for the new page

Verification

Check Result
pytest tests/unit/api/ tests/unit/exporters/ 347 passed, 1 skipped
ruff format --check + ruff check (touched files) clean
tools/check_docs_index.py OK: all 102 docs listed
Live FastAPI startup log /api/run present in route table
pre-commit run --all-files (host) passed
Every claim in api-endpoints.md re-checked against code passed (2 minor edits applied)

Decisions worth remembering

  • Endpoint name /api/run — bare singular noun matches /api/config's convention; RunInfo is the internal model name, not the route's job.
  • Reference doc over tutorial — the API surface had been undocumented since PR feat(api): add FastAPI service with plugin-based router system #717 landed without the api-service.md from ajc/websocket-dashboard-ui; a compact reference table closes the gap without expanding scope.
  • Single commit — code + tests + docs in one self-contained unit.

Follow-ups

  • QA test: migrate test_api_key_redaction from /api/config/api/run. The new in-tree test_run_does_not_leak_api_key_in_cli_command mirrors the same contract.
  • Comment on NVBugs 6204764 / DYN-699 with fix summary + the QA migration note.
  • If 0.9.0 backport is needed: cherry-pick 047d59dd.

Summary by CodeRabbit

  • New Features

    • Added /api/run endpoint exposing active run identity/metadata with automatic redaction of API keys and sensitive CLI/credential values; omits unset optional fields and returns 503 when no active run.
  • Bug Fixes

    • /api/config no longer exposes run-identity fields.
  • Documentation

    • Added comprehensive HTTP API reference with endpoints, Prometheus/metrics, health checks, results/artifacts, OpenAPI and curl examples.

Review Change Stack

The YAML-native v2 config refactor (PR #912) correctly moved
cli_command, benchmark_id, and sweep_id from UserConfig onto
BenchmarkRun, leaving /api/config as a pure BenchmarkConfig
projection. External consumers reading cli_command from /api/config
broke as a result (NVBugs 6204764 / JIRA DYN-699).

Restoring cli_command on /api/config would re-create the category
error the refactor cleaned up (run-execution metadata on a
declarative config response). Instead, expose run-identity via a
dedicated endpoint that matches the underlying model and shares
its shape with the run_info block in profile_export_aiperf.json,
so live and on-disk readers see identical fields.

- GET /api/run -> RunInfo (benchmark_id, sweep_id, trial,
  random_seed, run_label, variation_*, redacted cli_command).
- RunInfo.from_run() classmethod as the single projection function
  shared by /api/run and the JSON exporter.
- Tests: /api/config must not expose run-identity; /api/run must
  not expose config or secrets; end-to-end redaction test
  monkey-patches sys.argv with --api-key <secret> and asserts the
  secret never reaches the API response (mirrors QA's
  test_api_key_redaction).
- New docs/reference/api-endpoints.md documenting all HTTP API
  routes (the surface has been undocumented on main since #717
  landed without the api-service tutorial that only existed on
  the ajc/websocket-dashboard-ui branch).
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 26, 2026

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@6c1f03d6947af25a416e73fd308baa9af08371b6

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@6c1f03d6947af25a416e73fd308baa9af08371b6

Last updated for commit: 6c1f03dBrowse code

@github-actions github-actions Bot added the feat label May 26, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 26, 2026

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5cdbf449-c86d-46ae-886a-65cc284533ad

📥 Commits

Reviewing files that changed from the base of the PR and between 047d59d and 1ab2383.

📒 Files selected for processing (2)
  • src/aiperf/api/routers/core.py
  • tests/unit/api/routers/test_core.py

Walkthrough

This PR adds a /api/run endpoint that returns RunInfo projected from the current BenchmarkRun (omitting unset fields), centralizes projection in RunInfo.from_run(), updates the metrics exporter to use it, and adds tests plus API documentation and navigation.

Changes

HTTP API Run Identity Endpoint

Layer / File(s) Summary
RunInfo data model and type safety
src/aiperf/common/models/export_models.py
Adds RunInfo.from_run() classmethod to construct run identity blocks from BenchmarkRun objects; introduces from __future__ import annotations, TYPE_CHECKING-guarded imports, and fixes a forward-reference return annotation.
GET /api/run endpoint implementation
src/aiperf/api/routers/core.py
Adds the /api/run FastAPI endpoint returning RunInfo (with response_model_exclude_none=True) or raising HTTP 503 when no active run exists; updates module and router docstrings and imports.
Exporter integration with RunInfo model
src/aiperf/exporters/metrics_json_exporter.py
Refactors MetricsJsonExporter._generate_content to use RunInfo.from_run(self._run) and removes the module-local _build_run_metadata helper.
Tests, API redaction verification, and documentation
tests/unit/api/routers/test_core.py, docs/index.yml, docs/reference/api-endpoints.md
Adds tests validating /api/run returns only identity fields, omits unset optional fields, returns 503 when no run is active, verifies CLI secret redaction (e.g., --api-key), and adds API reference doc plus navigation entry.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped to the endpoint at /api/run,

where identities bloom and secrets are done.
The CLI trims its whispers to <redacted> light,
Metrics and health hum day and night,
A little rabbit guards the API’s sight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding a new /api/run endpoint that exposes run-identity metadata, which is the primary objective of the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 86.67% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/api/routers/test_core.py (1)

40-97: ⚡ Quick win

Add a negative-path test for /api/run returning 503 when no run is active.

The router has an explicit 503 branch for missing run context, but this contract is not covered here yet. A single test would lock that behavior and prevent regressions.

Proposed test addition
 class TestRunEndpoint:
@@
+    def test_run_without_active_run_returns_503(
+        self,
+        api_test_client: TestClient,
+        mock_fastapi_service: FastAPIService,
+    ) -> None:
+        mock_fastapi_service.run = None
+        response = api_test_client.get("/api/run")
+        assert response.status_code == 503
+        assert response.json() == {"detail": "No active benchmark run."}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/api/routers/test_core.py` around lines 40 - 97, Add a
negative-path test to TestRunEndpoint that verifies GET /api/run returns 503
when no run is active: create a FastAPIService with no run context (e.g.,
FastAPIService(run=None, service_id="no-run-test")), wrap its app in TestClient,
call client.get("/api/run") and assert response.status_code == 503 and the body
indicates a missing run; place this alongside the other TestRunEndpoint tests to
lock the router's 503 behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/api/routers/test_core.py`:
- Around line 40-97: Add a negative-path test to TestRunEndpoint that verifies
GET /api/run returns 503 when no run is active: create a FastAPIService with no
run context (e.g., FastAPIService(run=None, service_id="no-run-test")), wrap its
app in TestClient, call client.get("/api/run") and assert response.status_code
== 503 and the body indicates a missing run; place this alongside the other
TestRunEndpoint tests to lock the router's 503 behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f684b964-faae-408d-a6cb-aeeb1d10fc69

📥 Commits

Reviewing files that changed from the base of the PR and between 03167d0 and 047d59d.

📒 Files selected for processing (6)
  • docs/index.yml
  • docs/reference/api-endpoints.md
  • src/aiperf/api/routers/core.py
  • src/aiperf/common/models/export_models.py
  • src/aiperf/exporters/metrics_json_exporter.py
  • tests/unit/api/routers/test_core.py

Comment thread src/aiperf/api/routers/core.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Address review feedback from #997.

response_model_exclude_none=True on the /api/run decorator so the
live response matches profile_export_aiperf.json's exclude_none
projection. The route's docstring promised shape parity; without
this flag, unset Optional fields were serialized as null in the
API but omitted on disk.

Tests:
- assert unset Optional fields (sweep_id, random_seed, variation_*)
  are absent from the response, not serialized as null
- lock the 503 "no active benchmark run" contract by mutating
  mock_fastapi_service.run = None, defensively guarding a branch
  the type system makes unreachable today
Comment thread docs/reference/api-endpoints.md
Copy link
Copy Markdown

@dynamo-ops dynamo-ops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous review comments have been addressed. Approving.

@FrankD412 FrankD412 enabled auto-merge (squash) May 27, 2026 02:11
@FrankD412 FrankD412 merged commit ce24a54 into main May 27, 2026
29 of 30 checks passed
@FrankD412 FrankD412 deleted the mkotila/api-run-endpoint branch May 27, 2026 02:13
Copy link
Copy Markdown

@dynamo-ops dynamo-ops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous review comments have been addressed. Approving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants