Feat/zero overhead latency by sinarafati-amd · Pull Request #22 · AMD-AGI/Magpie

sinarafati-amd · 2026-04-29T06:06:54Z

New Latency stage: wall-clock via CUDA/HIP-graph subprocess + kernel-only via rocprofv3 (HIP reuses existing pmc_perf.csv for free).
New magpie.bench (Python) + magpie_bench.hpp (C++) helpers; subprocess harness isolates Python/JIT/dispatch overhead, seeded for reproducibility.
Unified JSON report adds wall_median_ms, kernel_median_ms, dispatch_overhead_us, crosscheck_vs_rocprof_ratio.
Backward-compatible (opt-out via latency: {enabled: false});

irvineoy · 2026-05-07T01:11:00Z

I found a few issues worth addressing before merge (excluding the current merge conflicts):

primary_metric=kernel_median_ms can silently report wall-clock latency.

LatencyResult.get_primary_value() falls back to wall_stats when kernel_stats is missing, and _run_both() can return success=True when only the wall-clock path succeeds. That means a config asking for primary_metric: kernel_median_ms may still get a non-null primary_value_ms backed by wall latency if the kernel-trace side fails or produces no stats.

This is risky for compare/autotuning, because the result looks like a successful kernel-only ranking metric. I think this should either fail the latency result when the requested primary metric is unavailable, or report primary_value_ms: null with a clear warning instead of falling back to wall latency.

Latency appears to be enabled by default, which can break existing configs.

LatencyConfig.enabled defaults to True, so existing PyTorch/Triton/CUDA workflows that do not define a bench_target and do not emit a MAGPIE_LATENCY_JSON marker can now end up with latency_state=FAILED even though they never opted into the new latency stage.

For backward compatibility, it would be safer to default this to disabled, or auto-skip latency unless the user explicitly configures a latency target/method.

The prebuilt HIP example binary should not be committed.

examples/simple_hip_test/vector_add_bench is a checked-in prebuilt binary. This is not portable across systems/architectures and can be stale relative to vector_add_bench.hip. In testing, the committed binary failed on the MI300 machine, while rebuilding from the .hip source worked.

Can we remove the binary from the PR and rely on compile_command / docs to build it locally instead?

The relative pythonpath in the Triton examples is resolved from the runner working directory.

For example, examples/simple_triton_test/analyze_triton_latency.yaml sets:

working_dir: "examples/simple_triton_test"
latency:
  pythonpath:
    - "examples/simple_triton_test"

But the latency runner subprocess uses cwd=kernel_cfg.working_dir, so that relative PYTHONPATH becomes effectively examples/simple_triton_test/examples/simple_triton_test and the import target may not be found. This should either resolve relative pythonpath entries against the repo/config-file directory before passing them to the subprocess, or the example should use . for this case.

The HIP helper header is not included in the wheel/package data.

Magpie/bench/include/magpie_bench.hpp is added and referenced by the docs/examples, but pyproject.toml package-data currently only includes YAML/JSON files:

[tool.setuptools.package-data]
Magpie = ["*.yaml", "*.yaml.example", "*.json"]

A built wheel does not contain magpie_bench.hpp, so installed-package users will not have the documented include path. Please add the header path to package data, e.g. bench/include/*.hpp.

no overhead latency measurement

dc8f2e6

sinarafati-amd requested a review from haofrank April 29, 2026 06:09

sinarafati-amd marked this pull request as ready for review April 30, 2026 14:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feat/zero overhead latency#22

Feat/zero overhead latency#22
sinarafati-amd wants to merge 1 commit into
mainfrom
feat/zero-overhead-latency

sinarafati-amd commented Apr 29, 2026

Uh oh!

irvineoy commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

sinarafati-amd commented Apr 29, 2026

Uh oh!

irvineoy commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants