PyTorch2FlyDSL benchmark robustness + cost estimation + model bump by peyron-amd · Pull Request #268 · AMD-AGI/GEAK

peyron-amd · 2026-06-08T14:48:32Z

Summary

Robustness + accounting improvements to the PyTorch→FlyDSL translation harness, plus a model default bump. Scope is limited to translate.py and its config (4 commits, ~209/49 lines).

Median-over-N latency: replace single-shot CUDA timing with the median of bench_iters (30) timed passes, run after bench_warmup (10) warmup passes, to cut measurement noise.
Configurable PyTorch reference mode: reference_mode ∈ eager | compile | compile_fallback (default compile_fallback), so the speedup baseline is PyTorch at its best.
Cost persistence: derive token usage from the agent trajectory and write translation_cost_usd into translation_result.json using configurable per-Mtok rates (defaults = public Opus pricing).
Always record PyTorch reference latency, even when the candidate fails correctness (candidate latency/speedup still gated on a passing candidate).
Config hardening: pop bench/reference keys before constructing the agent so they don't reach TranslationAgentConfig.__init__ (fixes an unexpected-kwarg crash).
Default model → claude-opus-4.8.
All settings are driven from mini_kernel_pytorch_to_flydsl.yaml (no env vars).
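
The three reference_mode values can be sketched roughly as below. This is an illustration under assumptions, not the code in translate.py: `build_reference`, `compile_fn`, and `probe_args` are hypothetical names, and `compile_fn` stands in for `torch.compile` so the sketch runs without PyTorch installed. Because `torch.compile` compiles lazily on the first call, the sketch invokes the compiled callable once before trusting it.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

VALID_MODES = ("eager", "compile", "compile_fallback")


def build_reference(
    fn: Callable[..., Any],
    mode: str = "compile_fallback",
    compile_fn: Optional[Callable[[Callable[..., Any]], Callable[..., Any]]] = None,
    probe_args: Sequence[Any] = (),
) -> Tuple[Callable[..., Any], str]:
    """Return (reference callable, mode actually used).

    Hypothetical helper: compile_fn would be torch.compile in a real harness.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown reference_mode: {mode!r}")
    if mode == "eager":
        return fn, "eager"
    if compile_fn is None:
        raise ValueError("compile_fn is required for compile modes")
    try:
        compiled = compile_fn(fn)
        compiled(*probe_args)  # force the lazy compilation to happen now
        return compiled, "compile"
    except Exception:
        if mode == "compile":
            raise  # strict mode: surface the compile failure
        return fn, "eager"  # compile_fallback: use the eager callable
```

Returning the mode actually used lets the caller log whether a fallback happened, which matters when comparing speedups across runs.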

…ce mode

Replace the translation harness's single timed forward (after 3 warmups) with a median over N timed passes using CUDA events (no Triton), to remove the run-to-run speedup noise.

Configured via the existing translation YAML agent: section (bench_warmup=10, bench_iters=30, reference_mode), with no new env vars; bench_iters defaults to the shared optimization constant DEFAULT_EVAL_BENCHMARK_ITERATIONS when omitted so the two stages can't drift.

reference_mode (reference only; candidate unchanged): compile_fallback (default; torch.compile, then fall back to eager on failure, i.e. PyTorch at its best), compile, or eager (reproduces historical numbers). Print/parse/speedup contracts preserved.
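
A minimal sketch of the median-over-N scheme described above. It is not the harness code: `median_latency_ms` and `time_once` are hypothetical names, and the default timer uses `time.perf_counter` on the host so the example runs without a GPU. In the PR's setting each timed pass would instead be bracketed by a pair of `torch.cuda.Event(enable_timing=True)` events, followed by a synchronize before reading `elapsed_time`.

```python
import statistics
import time
from typing import Callable


def _host_time_ms(fn: Callable[[], object]) -> float:
    """Time one call on the host; a stand-in for CUDA-event timing."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1e3


def median_latency_ms(
    fn: Callable[[], object],
    warmup: int = 10,
    iters: int = 30,
    time_once: Callable[[Callable[[], object]], float] = _host_time_ms,
) -> float:
    """Run `warmup` untimed passes, then return the median of `iters` timed passes."""
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        fn()  # warmup passes are executed but not recorded
    samples = [time_once(fn) for _ in range(iters)]
    return statistics.median(samples)
```

The median is used rather than the mean because a single slow pass (for example a one-off stall) shifts the mean but leaves the median unchanged.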

translation_result.json now records spend and tokens regardless of outcome:
- translation_pytorch_latency_ms is always set when the harness prints it, even when the candidate fails correctness (parsed before the success/fail branch; candidate latency + speedup stay success-only since they're meaningless for an incorrect kernel).
- translation_cost_usd / translation_tokens / translation_model_calls / translation_cost_rates_per_mtok aggregated from the round trajectories (input/output/cache read+write), priced with configurable per-Mtok rates (model: cost_per_mtok_*, default public Claude Opus rates).
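
The cost arithmetic amounts to summing tokens per kind across model calls and multiplying by a per-million-token rate. The sketch below assumes a simplified trajectory shape (one dict of token counts per call); `aggregate_tokens`, `cost_usd`, and the rate keys are hypothetical, and the rates shown are placeholders, not real pricing.

```python
from typing import Dict, Iterable, Mapping

TOKEN_KINDS = ("input", "output", "cache_read", "cache_write")


def aggregate_tokens(calls: Iterable[Mapping[str, int]]) -> Dict[str, int]:
    """Sum token counts per kind over all model calls (missing kinds count as 0)."""
    totals = {kind: 0 for kind in TOKEN_KINDS}
    for call in calls:
        for kind in TOKEN_KINDS:
            totals[kind] += int(call.get(kind, 0))
    return totals


def cost_usd(tokens: Mapping[str, int], rates_per_mtok: Mapping[str, float]) -> float:
    """Price token totals with per-million-token rates, in USD."""
    return sum(tokens[k] * rates_per_mtok[k] for k in TOKEN_KINDS) / 1_000_000


# Placeholder rates for illustration only (USD per million tokens).
EXAMPLE_RATES = {"input": 1.0, "output": 2.0, "cache_read": 0.5, "cache_write": 4.0}

example_calls = [
    {"input": 1_000_000, "output": 500_000},
    {"input": 1_000_000, "cache_read": 2_000_000},
]
example_tokens = aggregate_tokens(example_calls)
example_cost = cost_usd(example_tokens, EXAMPLE_RATES)  # 2.0 + 1.0 + 1.0 + 0.0 = 4.0
```

Storing the rates alongside the cost, as the commit does with translation_cost_rates_per_mtok, makes a recorded figure reproducible after prices change.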

…agent

bench_warmup/bench_iters/reference_mode live in the agent: YAML section but are translation-harness settings, not agent fields. run_translation_agent splats agent_config into TranslationAgentConfig(**kwargs), so reading them with .get() left them in the dict and crashed every round with "TranslationAgentConfig.__init__() got an unexpected keyword argument". Use .pop() to consume them before the agent config is built.
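
The .get() versus .pop() distinction can be reproduced with a small stand-in. `AgentConfigStandIn` and `split_harness_settings` are hypothetical and only mimic the shape of the problem; the real TranslationAgentConfig fields are not shown in this PR text.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

HARNESS_KEYS = ("bench_warmup", "bench_iters", "reference_mode")


@dataclass
class AgentConfigStandIn:
    """Hypothetical stand-in for the agent config; accepts no harness keys."""

    model: str
    max_rounds: int = 3


def split_harness_settings(
    agent_config: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Pop harness-only keys out of a copy of the config.

    Returns (harness settings, remaining kwargs that are safe to splat).
    """
    remaining = dict(agent_config)  # copy so the caller's dict is untouched
    harness = {k: remaining.pop(k) for k in HARNESS_KEYS if k in remaining}
    return harness, remaining


raw = {"model": "some-model", "bench_warmup": 10, "bench_iters": 30,
       "reference_mode": "compile_fallback"}
harness_settings, agent_kwargs = split_harness_settings(raw)
agent = AgentConfigStandIn(**agent_kwargs)  # succeeds: harness keys were consumed
```

Passing `raw` directly, as happens when the keys are only read with .get(), raises a TypeError about an unexpected keyword argument.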

All translation-bench arms now run on claude-opus-4.8 by default (verified accepted by the amd_llm gateway via cond48/med48 runs).

run_translation parsed the PyTorch reference latency with re.search but re was never imported in that scope (the file uses function-local imports), so the always-record-reference-latency path raised NameError at runtime and ruff flagged F821. Add a local import and apply ruff format to the cost helper + bench log line.
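
A sketch of the parse-before-branch pattern with a function-local import. The log line formats ("PyTorch latency: … ms", "Candidate latency: … ms"), the function name `summarize_run`, and the candidate_latency_ms / speedup keys are all assumptions for illustration; only translation_pytorch_latency_ms is a key named in this PR.

```python
from typing import Dict


def summarize_run(stdout: str, candidate_correct: bool) -> Dict[str, float]:
    # Function-local import: without it, re.search raises NameError here.
    import re

    result: Dict[str, float] = {}

    # Parse the reference latency first so it is recorded on every outcome.
    ref = re.search(r"PyTorch latency:\s*([0-9.]+)\s*ms", stdout)
    if ref:
        result["translation_pytorch_latency_ms"] = float(ref.group(1))

    # Candidate latency and speedup only make sense for a correct kernel.
    if candidate_correct:
        cand = re.search(r"Candidate latency:\s*([0-9.]+)\s*ms", stdout)
        if cand and ref:
            cand_ms = float(cand.group(1))
            result["candidate_latency_ms"] = cand_ms  # hypothetical key
            if cand_ms > 0:
                result["speedup"] = float(ref.group(1)) / cand_ms  # hypothetical key
    return result
```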

peyron-amd added 4 commits June 8, 2026 15:23

feat(translate): default translation model to claude-opus-4.8

973485c


peyron-amd requested review from Umangatamd, amd-ethany, chao-xu-spec, iraj465, jianghui-jianghui, sdubagun-amd and yueliu14 as code owners June 8, 2026 14:48

yueliu14 approved these changes Jun 30, 2026

peyron-amd wants to merge 5 commits into main from feat/translation-bench-median

2 participants
