feat(api): add /api/run endpoint exposing run-identity metadata#997
Conversation
The YAML-native v2 config refactor (PR #912) correctly moved cli_command, benchmark_id, and sweep_id from UserConfig onto BenchmarkRun, leaving /api/config as a pure BenchmarkConfig projection. External consumers reading cli_command from /api/config broke as a result (NVBugs 6204764 / JIRA DYN-699). Restoring cli_command on /api/config would re-create the category error the refactor cleaned up (run-execution metadata on a declarative config response). Instead, expose run-identity via a dedicated endpoint that matches the underlying model and shares its shape with the run_info block in profile_export_aiperf.json, so live and on-disk readers see identical fields. - GET /api/run -> RunInfo (benchmark_id, sweep_id, trial, random_seed, run_label, variation_*, redacted cli_command). - RunInfo.from_run() classmethod as the single projection function shared by /api/run and the JSON exporter. - Tests: /api/config must not expose run-identity; /api/run must not expose config or secrets; end-to-end redaction test monkey-patches sys.argv with --api-key <secret> and asserts the secret never reaches the API response (mirrors QA's test_api_key_redaction). - New docs/reference/api-endpoints.md documenting all HTTP API routes (the surface has been undocumented on main since #717 landed without the api-service tutorial that only existed on the ajc/websocket-dashboard-ui branch).
Try out this PRQuick install: pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@6c1f03d6947af25a416e73fd308baa9af08371b6Recommended with virtual environment (using uv): uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@6c1f03d6947af25a416e73fd308baa9af08371b6Last updated for commit: |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThis PR adds a /api/run endpoint that returns RunInfo projected from the current BenchmarkRun (omitting unset fields), centralizes projection in RunInfo.from_run(), updates the metrics exporter to use it, and adds tests plus API documentation and navigation. ChangesHTTP API Run Identity Endpoint
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit/api/routers/test_core.py (1)
40-97: ⚡ Quick winAdd a negative-path test for
/api/runreturning503when no run is active.The router has an explicit
503branch for missing run context, but this contract is not covered here yet. A single test would lock that behavior and prevent regressions.Proposed test addition
class TestRunEndpoint: @@ + def test_run_without_active_run_returns_503( + self, + api_test_client: TestClient, + mock_fastapi_service: FastAPIService, + ) -> None: + mock_fastapi_service.run = None + response = api_test_client.get("/api/run") + assert response.status_code == 503 + assert response.json() == {"detail": "No active benchmark run."}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/api/routers/test_core.py` around lines 40 - 97, Add a negative-path test to TestRunEndpoint that verifies GET /api/run returns 503 when no run is active: create a FastAPIService with no run context (e.g., FastAPIService(run=None, service_id="no-run-test")), wrap its app in TestClient, call client.get("/api/run") and assert response.status_code == 503 and the body indicates a missing run; place this alongside the other TestRunEndpoint tests to lock the router's 503 behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unit/api/routers/test_core.py`:
- Around line 40-97: Add a negative-path test to TestRunEndpoint that verifies
GET /api/run returns 503 when no run is active: create a FastAPIService with no
run context (e.g., FastAPIService(run=None, service_id="no-run-test")), wrap its
app in TestClient, call client.get("/api/run") and assert response.status_code
== 503 and the body indicates a missing run; place this alongside the other
TestRunEndpoint tests to lock the router's 503 behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f684b964-faae-408d-a6cb-aeeb1d10fc69
📒 Files selected for processing (6)
docs/index.ymldocs/reference/api-endpoints.mdsrc/aiperf/api/routers/core.pysrc/aiperf/common/models/export_models.pysrc/aiperf/exporters/metrics_json_exporter.pytests/unit/api/routers/test_core.py
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Address review feedback from #997. response_model_exclude_none=True on the /api/run decorator so the live response matches profile_export_aiperf.json's exclude_none projection. The route's docstring promised shape parity; without this flag, unset Optional fields were serialized as null in the API but omitted on disk. Tests: - assert unset Optional fields (sweep_id, random_seed, variation_*) are absent from the response, not serialized as null - lock the 503 "no active benchmark run" contract by mutating mock_fastapi_service.run = None, defensively guarding a branch the type system makes unreachable today
dynamo-ops
left a comment
There was a problem hiding this comment.
Previous review comments have been addressed. Approving.
dynamo-ops
left a comment
There was a problem hiding this comment.
Previous review comments have been addressed. Approving.
…se/0.9.0 (#1003) Co-authored-by: Matthew Kotila <[email protected]>
/api/run: how a "regression fix" turned into a design choiceBranch:
mkotila/api-run-endpoint· Commit:047d59dd· Tickets: NVBugs 6204764 / JIRA DYN-699The story
A QA test broke. The reporter framed it as a regression:
curl /api/config | jq .cli_commandused to return the redacted launch command, now it returnsnull. The bug ticket arrived with a one-paragraph "AI suggestion" recommending the obvious fix — stitchcli_commandback onto/api/configand call it done.The obvious fix was wrong. Here's why.
The break came from PR #912 (the YAML-native v2 config refactor). Before that refactor, a single
UserConfigclass held everything: declarative spec (endpoint,loadgen,tokenizer, …) and per-run identity (benchmark_id,cli_command, …) jammed together at one level./api/configdumped the whole grab-bag. PR #912 split it cleanly — declarative parts intoBenchmarkConfig, identity into a newBenchmarkRunenvelope. That split is the right shape: a CLI command isn't part of "the config," it's run-execution metadata — the thing that produced a config. The endpoint was migrated to return onlyBenchmarkConfig, which is honest, but it silently droppedcli_commandandbenchmark_idalong with it.So you see the trap. Option A from the bug ticket — stitch
cli_commandback onto/api/config— would re-create the category error PR #912 just cleaned up. You'd paper over the architectural improvement to preserve a stale API shape.Option B is the move: add a new endpoint,
/api/run, that exposes theBenchmarkRunenvelope.cli_commandlives there because that's where the field actually lives in the model./api/configstays pure. As a bonus, the response is shaped asRunInfo— the same model already used inprofile_export_aiperf.json— so the live API and the on-disk export can't drift.The cost is one line in QA's test (
/api/config→/api/run). No in-tree consumers to migrate; the web dashboard that would have read it was on an unmerged branch and never landed.What shipped
src/aiperf/common/models/export_models.pyRunInfo.from_run(run)classmethod — single projection fromBenchmarkRunsrc/aiperf/exporters/metrics_json_exporter.py_build_run_metadatawith the new classmethod (one source of truth)src/aiperf/api/routers/core.pyGET /api/run→RunInfotests/unit/api/routers/test_core.py/api/configno-grab-bag guard;/api/runshape + no-secrets; end-to-end redaction test (sys.argvwith--api-key sk-secret-…→ asserts secret not in response)docs/reference/api-endpoints.md(new)docs/index.ymlVerification
pytest tests/unit/api/ tests/unit/exporters/ruff format --check+ruff check(touched files)tools/check_docs_index.py/api/runpresent in route tablepre-commit run --all-files(host)api-endpoints.mdre-checked against codeDecisions worth remembering
/api/run— bare singular noun matches/api/config's convention;RunInfois the internal model name, not the route's job.api-service.mdfromajc/websocket-dashboard-ui; a compact reference table closes the gap without expanding scope.Follow-ups
test_api_key_redactionfrom/api/config→/api/run. The new in-treetest_run_does_not_leak_api_key_in_cli_commandmirrors the same contract.047d59dd.Summary by CodeRabbit
New Features
Bug Fixes
Documentation