2 changes: 1 addition & 1 deletion README.md
@@ -72,7 +72,7 @@ python -m Magpie.mcp
| **Compare** | Multi-kernel comparison and ranking | ✅ |
| **Benchmark** | Framework-level benchmarking (vLLM/SGLang/Atom) with trace analysis | ✅ |

> 📖 See [Benchmark mode](docs/how-to/benchmark.md) for vLLM/SGLang/Atom usage.
> 📖 See [Benchmark mode](docs/how-to/benchmarking/benchmark.md) for vLLM/SGLang/Atom usage.
> 📖 See [Analyze vs Compare](docs/how-to/analyze-compare.md) for kernel evaluation modes.

## Configuration
2 changes: 1 addition & 1 deletion docs/README.md
@@ -32,7 +32,7 @@ python -m sphinx -T -b html docs docs/_build/html
| `reference/compatibility-matrix.md` | Compatibility Matrix | Verified hardware/software versions. Contains `TODO (verify)` markers. |
| `reference/api-reference.md` | API Reference | CLI commands and options, configuration schema, and MCP tools. |
| `how-to/analyze-compare.md` | How-to | Analyze vs compare kernel modes. |
| `how-to/benchmark.md` | How-to | vLLM/SGLang/Atom benchmarking, TraceLens, gap analysis. |
| `how-to/benchmarking/benchmark.md` | How-to | vLLM/SGLang/Atom benchmarking, TraceLens, gap analysis. |
| `how-to/ray.md` | How-to | Remote execution on a Ray cluster. |
| `how-to/mcp-and-skills.md` | How-to | MCP server and agent skill installation. |
| `how-to/kernel-source-finder.md` | How-to | Locating kernel sources from traces. |
13 changes: 7 additions & 6 deletions docs/about/license.md
@@ -1,10 +1,12 @@
# License
---
myst:
html_meta:
"description": "The full MIT License text for Magpie, an open-source GPU kernel evaluation framework developed by AMD-AGI."
"keywords": "Magpie, MIT license, open source, AMD-AGI, license text"
---

Magpie is released under the MIT License. The full license text below matches
the [`LICENSE`](https://github.com/AMD-AGI/Magpie/blob/main/LICENSE) file in the
Magpie GitHub repository.
# License

```text
MIT License

Copyright (c) 2026 AMD-AGI
@@ -26,4 +28,3 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
84 changes: 84 additions & 0 deletions docs/conceptual/benchmarking-architecture.md
@@ -0,0 +1,84 @@
---
myst:
html_meta:
"description": "Learn how Magpie's benchmark mode pipeline is structured, including components, execution flow, and integration with TraceLens and gap analysis."
"keywords": "Magpie, benchmark architecture, BenchmarkMode, TraceLens, gap analysis, vLLM, SGLang, ROCm, GPU, LLM inference"
---

# Magpie benchmarking mode architecture

Magpie's benchmark mode drives end-to-end performance evaluation of LLM inference frameworks—vLLM, SGLang, and Atom—by launching a server, running a client workload, and collecting throughput and latency metrics into a structured JSON report. Benchmarks can run inside a Docker container (the default), directly on the host, or on a remote Ray cluster, and they optionally capture torch profiler traces for downstream analysis with TraceLens and gap analysis. This page describes the components that make up the benchmark pipeline, the execution flow from configuration to report generation, and how the pieces connect.

## Architecture

Magpie benchmark mode is composed of the following key components that work together to run, profile, and analyze inference framework benchmarks.

### Components

Benchmark mode consists of the following Python modules.

| Component | File | Description |
|-----------|------|-------------|
| `BenchmarkMode` | `benchmarker.py` | Main orchestrator |
| `BenchmarkConfig` | `config.py` | Configuration dataclasses |
| `TraceLensAnalyzer` | `tracelens.py` | TraceLens CLI integration |
| `GapAnalyzer` | `gap_analysis.py` | Kernel bottleneck analysis |
| `BenchmarkResult` | `result.py` | Result data structures |
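
To make the role of `BenchmarkResult` concrete, here is a minimal sketch of a result record serialized to a structured JSON report. It is not Magpie's actual class: `BenchmarkResultSketch` and all of its field names (`throughput`, `latency_ms`, `trace_dir`) are hypothetical and only illustrate the idea of one dataclass that holds run metadata plus throughput and latency metrics.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class BenchmarkResultSketch:
    """Hypothetical stand-in for BenchmarkResult; every field name is illustrative."""

    framework: str                   # e.g. "vllm", "sglang", or "atom"
    run_mode: str                    # "docker" (the default), "local", or "ray"
    throughput: float                # hypothetical aggregate throughput metric
    latency_ms: Dict[str, float] = field(default_factory=dict)  # e.g. percentiles
    trace_dir: Optional[str] = None  # set only when profiler traces were captured

    def to_json(self) -> str:
        # Serialize every field to a JSON object with stable (sorted) key order.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

For example, `BenchmarkResultSketch("vllm", "docker", 100.0).to_json()` yields a JSON object whose `trace_dir` is `null` because no traces were recorded.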

### Execution flow

Each benchmark run proceeds through the following stages.

1. **Configuration loading**: Parse the YAML config into `BenchmarkConfig`.
2. **Runtime setup**: For `run_mode: docker`, prepare a container with InferenceX; for `local`, use the host environment.
3. **Server launch**: Start the vLLM or SGLang server (in the container or on the host, depending on `run_mode`).
4. **Client execution**: Run the benchmark client with profiling enabled.
5. **Trace collection**: Save torch profiler traces to the workspace.
6. **TraceLens analysis** (if enabled): Run TraceLens CLI commands inside the runtime image in Docker mode, or on the host in local/classic mode.
7. **Gap analysis** (if enabled): Analyze kernel bottlenecks within a time window.
8. **Result generation**: Aggregate metrics and generate reports.
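
The ordering above, including the two optional stages, can be sketched as a small planning function. This is a hypothetical illustration, not Magpie code: `plan_benchmark_stages` and the `tracelens` / `gap_analysis` config keys are assumed toggles, and the sketch only covers the `docker` and `local` run modes described in step 2.

```python
from typing import Any, Dict, List


def plan_benchmark_stages(cfg: Dict[str, Any]) -> List[str]:
    """Return the ordered stages a docker/local benchmark run would go through.

    Hypothetical sketch: the config keys "tracelens" and "gap_analysis" are
    illustrative toggles, not Magpie's actual schema.
    """
    run_mode = cfg.get("run_mode", "docker")  # docker is the documented default
    if run_mode not in ("docker", "local"):
        raise ValueError(f"sketch covers docker/local only, got {run_mode!r}")

    stages = [
        "configuration_loading",
        f"runtime_setup:{run_mode}",  # container with InferenceX, or the host
        "server_launch",
        "client_execution",           # client runs with profiling enabled
        "trace_collection",           # torch profiler traces land in the workspace
    ]
    if cfg.get("tracelens", False):
        # Docker runs analyze traces inside the runtime image; local runs use the host.
        stages.append("tracelens:" + ("runtime_image" if run_mode == "docker" else "host"))
    if cfg.get("gap_analysis", False):
        stages.append("gap_analysis")
    stages.append("result_generation")
    return stages
```

Keeping the optional stages as explicit conditionals mirrors the "(if enabled)" qualifiers: a run with neither toggle set goes straight from trace collection to result generation.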

### Architecture diagram

The following diagram shows how Magpie orchestrates the benchmark pipeline.

```text
┌─────────────────────────────────────────────────────────────────────┐
│ Benchmark Mode │
├─────────────────────────────────────────────────────────────────────┤
│ ┌───────────────┐ ┌───────────────┐ ┌────────────────────┐ │
│ │BenchmarkConfig│ → │ BenchmarkMode │ → │ BenchmarkResult │ │
│ │ (YAML) │ │ │ │ (JSON + CSV) │ │
│ └───────────────┘ └───────────────┘ └────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Runtime: docker │ local │ ray │ │
│ │ ┌─────────────┐ ┌─────────────────────────────────┐ │ │
│ │ │ InferenceX │ → │ vLLM / SGLang Server + Client │ │ │
│ │ │ scripts │ │ + Torch Profiler │ │ │
│ │ └─────────────┘ └─────────────────────────────────┘ │ │
│ │ Ray: Magpie driver → RayJobExecutor → GPU worker runs the │ │
│ │ same stack (local/docker on worker; NFS for cache/ │ │
│ │ results). See ray.md │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────┴────────┐ │
│ ▼ ▼ │
│ ┌────────────────────────┐ ┌─────────────────────────────────┐ │
│ │ Gap Analysis │ │ TraceLens Analysis │ │
│ │ • Time window filter │ │ • Perf report (per-rank) │ │
│ │ • Category filter │ │ • Multi-rank collective report │ │
│ │ • Kernel stats CSV │ │ │ │
│ └────────────────────────┘ └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
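
The gap analysis box in the diagram (time window filter, category filter, kernel stats CSV) can be approximated in a few lines. This sketch is not `GapAnalyzer`'s implementation. It assumes Chrome-trace-style JSON as exported by the torch profiler, where GPU kernels appear as complete events (`"ph": "X"`) with `"cat": "kernel"` and microsecond `ts`/`dur` fields; the function name and CSV columns are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict


def gap_analysis_sketch(trace: dict, start_us: float, end_us: float,
                        category: str = "kernel", top_k: int = 10) -> str:
    """Aggregate kernel time inside [start_us, end_us) and return CSV text.

    Assumes a Chrome-trace-style dict:
    {"traceEvents": [{"ph": "X", "cat": ..., "name": ..., "ts": ..., "dur": ...}]}
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for ev in trace.get("traceEvents", []):
        if ev.get("ph") != "X" or ev.get("cat") != category:
            continue  # category filter: keep complete events of the chosen category
        ts = float(ev.get("ts", 0.0))
        if not (start_us <= ts < end_us):
            continue  # time window filter on the event start timestamp
        name = ev.get("name", "<unknown>")
        totals[name] += float(ev.get("dur", 0.0))
        counts[name] += 1

    grand_total = sum(totals.values()) or 1.0  # avoid division by zero
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["kernel", "calls", "total_us", "pct"])
    for name, total in ranked:
        pct = 100.0 * total / grand_total
        writer.writerow([name, counts[name], f"{total:.1f}", f"{pct:.1f}"])
    return buf.getvalue()
```

Ranking by aggregated duration rather than per-call duration surfaces kernels that are cheap individually but dominate runtime through call count.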

## More info

- [Benchmark frameworks with Magpie](../how-to/benchmarking/benchmark.md) — how-to guide covering configuration, run modes, TraceLens, gap analysis, and examples
- [Magpie benchmark mode configuration](../reference/benchmark-config.md) — full YAML schema with all available options and defaults
- [Run Magpie on a Ray cluster](../how-to/ray.md) — running benchmarks on remote GPU nodes using Ray
- [Find kernel sources with Magpie](../how-to/kernel-source-finder.md) — mapping kernel names from gap analysis output to source files
- [Magpie troubleshooting](../reference/troubleshooting.md) — solutions for common benchmark errors
60 changes: 60 additions & 0 deletions docs/conceptual/ray-architecture.md
@@ -0,0 +1,60 @@
---
myst:
html_meta:
"description": "Understand Magpie's Ray integration driver-worker model, executor selection, and end-to-end task flow for running GPU workloads on remote Ray clusters."
"keywords": "Magpie, Ray architecture, RayJobExecutor, driver worker, remote GPU, distributed benchmark, ROCm, CUDA"
---

# Magpie on Ray architecture

Magpie's Ray integration offloads analyze, compare, and benchmark workloads from the machine running the CLI or MCP server onto GPU-capable worker nodes in a Ray cluster, without changing the evaluation logic itself. The integration is built around a driver-worker split: the driver process submits a remote function via `RayJobExecutor`, and the worker node executes the same `AnalyzeMode`, `CompareMode`, or `BenchmarkMode` code it would run locally. This page describes how executor selection works, how the task flows end-to-end, and where to find the relevant source files.

## Driver vs worker

Magpie's Ray integration uses two roles: the driver process that submits work, and the worker nodes that execute it.

- **Driver**: the process running `python -m Magpie …`, the MCP server, or your own script. It calls `Scheduler` or `BenchmarkMode`, connects to the cluster with `ray.init(address=…)`, and submits a remote function.
- **Worker**: Ray executes `Magpie.remote.tasks.run_task` on a GPU-capable node. That function dispatches to `_run_analyze`, `_run_compare`, or `_run_benchmark`.
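
The worker-side dispatch can be pictured as a lookup from task mode to handler. The sketch below is hypothetical: the task dictionary shape (`mode`, `payload`) and the stub handler bodies are assumptions, and the real `_run_*` functions in `Magpie.remote.tasks` run the full `AnalyzeMode`, `CompareMode`, or `BenchmarkMode` logic. It omits Ray so it runs standalone; with Ray installed, the driver would wrap the function with `ray.remote(run_task)`, submit it with `.remote(task)`, and fetch the result with `ray.get`.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


# Hypothetical stubs standing in for the worker-side helpers named above.
def _run_analyze(payload: Dict[str, Any]) -> Dict[str, Any]:
    return {"mode": "analyze", "payload": payload}


def _run_compare(payload: Dict[str, Any]) -> Dict[str, Any]:
    return {"mode": "compare", "payload": payload}


def _run_benchmark(payload: Dict[str, Any]) -> Dict[str, Any]:
    return {"mode": "benchmark", "payload": payload}


_DISPATCH: Dict[str, Handler] = {
    "analyze": _run_analyze,
    "compare": _run_compare,
    "benchmark": _run_benchmark,
}


def run_task(task: Dict[str, Any]) -> Dict[str, Any]:
    """Worker entry point sketch: choose the handler from the task's mode field."""
    try:
        handler = _DISPATCH[task["mode"]]
    except KeyError as exc:
        raise ValueError(f"unknown task mode: {task.get('mode')!r}") from exc
    return handler(task.get("payload", {}))
```

Because the worker only routes to the same mode code the driver would run locally, the evaluation logic does not need to know whether it is running on Ray.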

## Executor selection

The executor is chosen based on `SchedulerConfig.environment_type`.

| `SchedulerConfig.environment_type` | Executor | Execution |
|-----------------------------------|----------|-----------|
| `local` | `LocalExecutor` | Subprocesses on the driver machine (`Magpie/core/executor.py`). |
| `container` | Container executor | Isolated environment on the driver (kernel flows). |
| `ray` | `RayJobExecutor` | `ray.remote(run_task)` on a cluster node (`Magpie/core/ray_executor.py`). |

Benchmark mode additionally uses `BenchmarkConfig.run_mode`: `docker`, `local`, or `ray`. When `run_mode` is `ray`, `BenchmarkMode` builds a `Task` and uses `RayJobExecutor` internally (`Magpie/modes/benchmark/benchmarker.py`).
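
The selection rules in the table and the paragraph above reduce to two small lookups. This is an illustrative sketch, not Magpie's code: `select_executor`, `benchmark_uses_ray`, and the `ContainerExecutor` name (used for the table's "Container executor" row) are hypothetical.

```python
# Hypothetical mapping from SchedulerConfig.environment_type to executor names.
_EXECUTORS = {
    "local": "LocalExecutor",
    "container": "ContainerExecutor",  # hypothetical name for the container row
    "ray": "RayJobExecutor",
}

_RUN_MODES = ("docker", "local", "ray")


def select_executor(environment_type: str) -> str:
    """Return the executor name for a scheduler environment type."""
    try:
        return _EXECUTORS[environment_type]
    except KeyError:
        raise ValueError(f"unknown environment_type: {environment_type!r}") from None


def benchmark_uses_ray(run_mode: str = "docker") -> bool:
    """Benchmark mode routes through RayJobExecutor only when run_mode is "ray"."""
    if run_mode not in _RUN_MODES:
        raise ValueError(f"unsupported run_mode: {run_mode!r}")
    return run_mode == "ray"
```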

## End-to-end flow

```mermaid
flowchart LR
subgraph Driver
CLI[MCP / CLI]
SCH[Scheduler or BenchmarkMode]
RJE[RayJobExecutor]
CLI --> SCH --> RJE
end
subgraph Cluster
RT[run_task]
A[AnalyzeMode]
C[CompareMode]
B[BenchmarkMode]
RJE -->|ray.remote| RT
RT --> A
RT --> C
RT --> B
end
```

## More info

- [Magpie on Ray](../how-to/ray.md) — how-to guide covering cluster setup, configuration, shared storage, and troubleshooting
- [Benchmark frameworks with Magpie](../how-to/benchmarking/benchmark.md) — benchmark run modes including `run_mode: ray`
- [Magpie benchmarking mode architecture](benchmarking-architecture.md) — how the benchmark pipeline is designed and how components interact
- [Ray documentation](https://docs.ray.io/) — cluster setup, job submission, and runtime environments
86 changes: 56 additions & 30 deletions docs/conf.py
@@ -1,37 +1,63 @@
# Configuration file for the Sphinx documentation builder.
#
# Magpie documentation is built with rocm-docs-core, which configures the
# theme, navigation, MyST Markdown support, and shared ROCm options. Both
# Markdown (.md, via MyST) and reStructuredText (.rst) source files build out
# of the box.
#
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# https://rocm.docs.amd.com/projects/rocm-docs-core/en/latest/

# -- Project information ------------------------------------------------------
"""
html_theme is usually unchanged (rocm_docs_theme).
flavor defines the site header display; select the flavor that matches the target portal.
flavor options: rocm, rocm-docs-home, rocm-blogs, rocm-ds, instinct, ai-developer-hub, local, generic
"""

version_number = "0.1.0"

html_theme = "rocm_docs_theme"
html_theme_options = {
"flavor": "generic",
"header_title": f"Magpie {version_number}",
"header_link": False,
"version_list_link": False,
"nav_secondary_items": {
"GitHub": False,
"Community": False,
"Blogs": "https://rocm.blogs.amd.com/",
"ROCm Developer Hub": "https://www.amd.com/en/developer/resources/rocm-hub.html",
"Instinct™ Docs": "https://instinct.docs.amd.com/",
"Infinity Hub": "https://www.amd.com/en/developer/resources/infinity-hub.html",
"Support": False,
},
"link_main_doc": False,
}

# This section turns on/off article info
setting_all_article_info = True
all_article_info_os = ["linux"]
all_article_info_author = ""

# for PDF output on Read the Docs
project = "Magpie"
author = "Advanced Micro Devices, Inc."
copyright = "2026, Advanced Micro Devices, Inc."

# Single-sourced version. Update alongside pyproject.toml / package version.
version = "0.1.0"
release = version

# -- General configuration ----------------------------------------------------
copyright = "Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved."
version = version_number
release = version_number

external_toc_path = "./sphinx/_toc.yml" # Defines Table of Content structure definition path

"""
Doxygen Settings
Ensure Doxyfile is located at docs/doxygen.
If the component does not need doxygen, delete this section for optimal build time
"""
# doxygen_root = "doxygen"
# doxysphinx_enabled = True
# doxygen_project = {
# "name": "doxygen",
# "path": "doxygen/xml",
# }

# Add additional extension packages as needed
extensions = [
"rocm_docs",
"sphinxcontrib.mermaid"
]

extensions = ["rocm_docs", "sphinxcontrib.mermaid"]

# Render fenced ```mermaid code blocks in Markdown as diagrams.
myst_fence_as_directive = ["mermaid"]

external_toc_path = "./sphinx/_toc.yml"

# docs/README.md documents the build process for contributors and is not a
# published page; keep it out of the source build so it is not treated as an
# orphan document.
exclude_patterns = ["README.md"]
html_title = f"{project} {version_number} documentation"

# rocm-docs-core options.
html_theme = "rocm_docs_theme"
html_theme_options = {"flavor": "rocm-docs-home"}
external_projects_current_project = "Magpie"
23 changes: 15 additions & 8 deletions docs/examples/examples.md
@@ -1,14 +1,21 @@
# Examples
---
myst:
html_meta:
"description": "Step-by-step Magpie examples for analyzing HIP kernels, comparing implementations, benchmarking vLLM with TraceLens, and running standalone gap analysis on GPU traces."
"keywords": "Magpie, examples, HIP kernel, compare kernels, vLLM benchmark, TraceLens, gap analysis, ROCm, CUDA, GPU"
---

This page provides end-to-end, step-by-step examples for common Magpie use
# Magpie examples

This topic provides end-to-end, step-by-step examples for common Magpie use
cases. Each example lists the prerequisites, the exact commands to run, and the
expected output. All example configuration files referenced here live in the
[`examples/`](https://github.com/AMD-AGI/Magpie/tree/main/examples) directory of
the Magpie repository.

Run every command from the Magpie repository root unless noted otherwise.

## Example 1: Analyze a simple HIP kernel
## Analyze a simple HIP kernel

This example analyzes a minimal HIP `vector_add` kernel for correctness using a
testcase command.
@@ -55,7 +62,7 @@ Magpie reports a passing correctness state and writes a JSON report to
with an overall `score` of `1.0` when correctness succeeds and profiling is
skipped.

## Example 2: Compare two kernel implementations
## Compare two kernel implementations

This example compares BF16 and FP16 grouped GEMM kernels from Composable Kernel
and ranks them by performance.
@@ -108,7 +115,7 @@ Magpie evaluates both kernels, prints a ranked comparison against the baseline
implementation. See [Analyze and compare kernels](../how-to/analyze-compare.md)
for how scores and rankings are computed.

## Example 3: Benchmark vLLM with TraceLens analysis
## Benchmark vLLM with TraceLens analysis

This example runs a framework-level benchmark of vLLM and analyzes the resulting
traces.
@@ -139,10 +146,10 @@ traces.

Magpie launches the benchmark, collects throughput and latency metrics, and (for
the TraceLens config) produces a trace analysis report under the benchmark
workspace in `./results`. See [Benchmark frameworks](../how-to/benchmark.md) for
workspace in `./results`. See [Benchmark frameworks with Magpie](../how-to/benchmarking/benchmark.md) for
the full result layout and metric descriptions.

## Example 4: Standalone gap analysis on existing traces
## Standalone gap analysis on existing traces

If you already have torch profiler traces, you can run gap analysis without
launching a benchmark to find the kernels that dominate runtime.
@@ -161,7 +168,7 @@ Magpie writes a `gap_analysis/gap_analysis.csv` file (plus optional per-rank
CSVs) under the trace directory, listing the top bottleneck kernels by
aggregated duration. Add `--find-kernel-sources` to also locate kernel source
files and test commands for AMD kernels; see
[Find kernel sources](../how-to/kernel-source-finder.md).
[Find kernel sources with Magpie](../how-to/kernel-source-finder.md).

## More examples
