ROCm docs style IA and style guide updates #50
sinarafati-amd left a comment
Overall looks good. I left a few comments to address before merging.
| # Standalone gap analysis on existing traces
| python -m Magpie benchmark gap-analysis --trace-dir results/benchmark_vllm_<timestamp>/
According to Cursor, there is no gap-analysis subcommand. Magpie/main.py implements standalone gap analysis through the --trace-dir option of the benchmark command.
Suggested fix:
python -m Magpie benchmark --trace-dir results/benchmark_vllm_<timestamp>/
| Magpie's benchmark mode runs end-to-end performance tests against LLM inference frameworks (vLLM, SGLang, and Atom) and collects throughput and latency metrics in a structured JSON report. Benchmarks can run inside a Docker container, directly on the host, or on a remote Ray cluster, and can optionally capture torch profiler traces for deeper analysis with TraceLens and gap analysis. Use this mode to measure inference performance on AMD Instinct™ GPUs and identify the GPU kernels that dominate runtime.
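The quoted paragraph mentions throughput and latency metrics without defining them. As a hedged illustration of how such summaries are commonly computed (not Magpie's implementation or its report schema), the following Python sketch derives output-token throughput, request throughput, and nearest-rank p50/p99 latency from per-request timings. The `RequestTiming` record, the `summarize` helper, and the metric names are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    # Hypothetical per-request record; field names are assumptions, not Magpie's schema.
    start_s: float
    end_s: float
    output_tokens: int


def percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(requests):
    """Compute throughput over the whole run's wall-clock window plus latency percentiles."""
    wall_s = max(r.end_s for r in requests) - min(r.start_s for r in requests)
    latencies = [r.end_s - r.start_s for r in requests]
    return {
        "output_tokens_per_s": sum(r.output_tokens for r in requests) / wall_s,
        "requests_per_s": len(requests) / wall_s,
        "p50_latency_s": percentile(latencies, 50),
        "p99_latency_s": percentile(latencies, 99),
    }
```

Throughput is measured over the span from the first request start to the last request end, so overlapping (concurrent) requests raise throughput without changing per-request latency.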
| Review these topics for more information:
Suggested change:
- Review these topics for more information:
+ For more information, see the following topics:
| ## Benchmark report
| The primary summary file is **`benchmark_report.json`** in the run workspace (see `WorkspaceManager.save_report`). It aggregates throughput, latency, and optional `gap_analysis` / `tracelens_analysis` sections. A typical shape (abbreviated, with `...` marking elided values):
What is "see WorkspaceManager.save_report" directing the user to?
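The quoted paragraph says the report aggregates throughput, latency, and optional `gap_analysis` / `tracelens_analysis` sections. A minimal Python sketch of a consumer that tolerates those optional sections follows. It assumes only that `benchmark_report.json` holds a top-level JSON object and that the two optional names are top-level keys; that placement is an assumption, since the quoted example is abbreviated. The helpers `load_report` and `classify_sections` are hypothetical.

```python
import json
from pathlib import Path

# Optional section names taken from the quoted docs; assumed to be top-level keys.
OPTIONAL_SECTIONS = ("gap_analysis", "tracelens_analysis")


def load_report(report_path):
    """Hypothetical helper: parse benchmark_report.json from the run workspace."""
    return json.loads(Path(report_path).read_text())


def classify_sections(report):
    """Split report keys into core keys and the optional analysis sections present.

    Returns (core_keys, present_optional); present_optional follows the
    order of OPTIONAL_SECTIONS.
    """
    if not isinstance(report, dict):
        raise ValueError("expected a JSON object at the top level")
    present = [name for name in OPTIONAL_SECTIONS if name in report]
    core = sorted(key for key in report if key not in OPTIONAL_SECTIONS)
    return core, present
```

Usage, with the workspace path left as a placeholder: `core, optional = classify_sections(load_report("<run-workspace>/benchmark_report.json"))`. A run without profiling would simply report an empty `optional` list.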
| }
| ```
| ## More info
Not sure why we need both a More info section and a Related sources section.
| - [Automatic GPU selection in Magpie's benchmark mode](automatic-gpu.md): how Magpie picks idle GPUs before launching and how to override or disable selection
| - [Persistent server reuse (local) in Magpie's benchmark mode](persistent-server-reuse.md): keep a server alive across runs to avoid model reload overhead
| - [Profiling options in Magpie's benchmark mode](profiling-options.md): configure torch profiler, TraceLens, and gap analysis
| - [Analyze and compare kernels with Magpie](../analyze-compare.md): kernel evaluation modes (orthogonal to Benchmark)
Suggested change:
- - [Analyze and compare kernels with Magpie](../analyze-compare.md): kernel evaluation modes (orthogonal to Benchmark)
+ - [Analyze and compare kernels with Magpie](../analyze-compare.md): kernel evaluation modes independent of benchmark mode
| python -m Magpie benchmark gap-analysis \
|   --trace-dir results/benchmark_vllm_<timestamp>/torch_trace \
|   --start-pct 50 --end-pct 80 \
|   --top-k 15 \
|   --categories kernel gpu \
|   --ignore-categories gpu_user_annotation
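The quoted command filters a profiler trace with `--start-pct`, `--end-pct`, `--top-k`, `--categories`, and `--ignore-categories`. The following is a hypothetical Python sketch of that kind of filtering, not Magpie's implementation. It assumes the two percentages select a slice of the trace timeline (for example, to skip warmup), that `--top-k` keeps the event names with the largest total duration, and that the trace uses Chrome-trace-style complete events (`"ph": "X"` with `ts` and `dur` in microseconds) as exported by the PyTorch profiler. The function names are illustrative only.

```python
import gzip
import json
from collections import defaultdict


def load_trace_events(path):
    """Read a Chrome-trace JSON file (optionally gzipped) and return its event list."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as f:
        return json.load(f).get("traceEvents", [])


def top_k_in_window(events, start_pct, end_pct, top_k, categories, ignore_categories=()):
    """Hypothetical sketch: rank event names by total duration inside a time window."""
    complete = [e for e in events if e.get("ph") == "X"]
    if not complete:
        return []
    # The window is a percentage slice of the span covered by complete events.
    t0 = min(e["ts"] for e in complete)
    t1 = max(e["ts"] + e["dur"] for e in complete)
    lo = t0 + (t1 - t0) * start_pct / 100
    hi = t0 + (t1 - t0) * end_pct / 100
    totals = defaultdict(float)
    for e in complete:
        cat = e.get("cat", "")
        if cat not in categories or cat in ignore_categories:
            continue
        if lo <= e["ts"] < hi:  # count events that start inside the window
            totals[e["name"]] += e["dur"]
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```

Counting events by start time keeps the sketch simple; a real tool might instead clip durations that straddle the window edges.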
| # Magpie troubleshooting
| This topic covers errors and debugging techniques. Each section presents symptoms and their solutions in a table so you can quickly find the issue you're seeing. For benchmark configuration problems not listed here, enable verbose logging with `--log-level DEBUG` and check the output before filing a bug report.
Line 29 of api-reference.md mentions --verbose / -v for debug logging, but the Config settings section (line 162) discusses logging without describing any levels or options. From api-reference.md, it is not clear which approach is correct for enabling detailed debug logging.
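One possible way to reconcile the two spellings the comment mentions is to treat `-v` / `--verbose` as a shorthand for `--log-level DEBUG`. The following argparse sketch is hypothetical and does not describe Magpie's actual CLI; the program name, the `INFO` default, and the list of accepted levels are assumptions.

```python
import argparse
import logging


def resolve_log_level(argv=None):
    """Hypothetical CLI sketch: -v/--verbose is shorthand for --log-level DEBUG."""
    parser = argparse.ArgumentParser(prog="magpie-logging-sketch")  # hypothetical name
    parser.add_argument(
        "--log-level",
        default="INFO",  # assumed default
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        help="logging threshold",
    )
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        help="shorthand for --log-level DEBUG",
    )
    args = parser.parse_args(argv)
    # --verbose takes precedence so the two documented spellings never conflict.
    name = "DEBUG" if args.verbose else args.log_level
    return getattr(logging, name)


def configure_logging(argv=None):
    """Apply the resolved level to the root logger and return it."""
    level = resolve_log_level(argv)
    logging.basicConfig(level=level)
    return level
```

With this design, the docs would only need to describe one option and mention the other as an alias.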
Edited and restructured the docs according to our Style Guide and the Diátaxis information architecture guidance.