PyTorch2FlyDSL benchmark robustness + cost estimation + model bump#268
Open
peyron-amd wants to merge 5 commits into
…ce mode
Replace the translation harness's single timed forward pass (after 3 warmups) with a median over N timed passes measured with CUDA events (no Triton), to remove the run-to-run speedup noise. The settings come from the existing translation YAML agent: section (bench_warmup=10, bench_iters=30, reference_mode), with no new env vars. When omitted, bench_iters defaults to the shared optimization constant DEFAULT_EVAL_BENCHMARK_ITERATIONS so the two stages can't drift.
reference_mode applies to the reference only (the candidate is unchanged):
- compile_fallback (default): torch.compile, then fall back to eager on failure (PyTorch at its best)
- compile
- eager (reproduces historical numbers)
Print/parse/speedup contracts are preserved.
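The median-of-N timing described above can be sketched roughly as follows. This is an illustration, not the harness code: the function name `median_latency_ms` and its signature are hypothetical, and the `time.perf_counter` branch is an added fallback so the sketch also runs on a machine without a GPU.

```python
import statistics
import time


def median_latency_ms(fn, warmup=10, iters=30):
    """Return the median latency of fn() in milliseconds over `iters` timed passes."""
    try:
        import torch
        use_cuda = torch.cuda.is_available()
    except ImportError:
        use_cuda = False  # CPU-only fallback, not part of the PR description

    for _ in range(warmup):  # untimed passes absorb lazy init and compilation
        fn()
    if use_cuda:
        torch.cuda.synchronize()  # make sure warmup work has drained

    samples = []
    for _ in range(iters):
        if use_cuda:
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
            fn()
            end.record()
            torch.cuda.synchronize()  # wait until both events have completed
            samples.append(start.elapsed_time(end))  # elapsed_time is in ms
        else:
            t0 = time.perf_counter()
            fn()
            samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)
```

Taking the median rather than the mean keeps a single slow pass (for example, one hit by a clock ramp or a background job) from shifting the reported latency.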
translation_result.json now records spend and tokens regardless of outcome:
- translation_pytorch_latency_ms is always set when the harness prints it, even when the candidate fails correctness (it is parsed before the success/fail branch; candidate latency + speedup stay success-only since they're meaningless for an incorrect kernel).
- translation_cost_usd / translation_tokens / translation_model_calls / translation_cost_rates_per_mtok are aggregated from the round trajectories (input/output/cache read+write) and priced with configurable per-Mtok rates (model: cost_per_mtok_*, default public Claude Opus rates).
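As a rough sketch of the cost aggregation, assuming each model call exposes four token counters: the `Usage` class, the `aggregate_cost` helper, the shape of `translation_tokens`, and the rate values below are all hypothetical placeholders (the rates are not real pricing); only the four output key names come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Usage:  # hypothetical per-call token record
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


# Placeholder USD rates per million tokens; NOT actual model pricing.
PLACEHOLDER_RATES_PER_MTOK = {
    "input": 10.0, "output": 50.0, "cache_read": 1.0, "cache_write": 12.5,
}


def aggregate_cost(calls, rates):
    """Sum token counts over all calls and price them with per-Mtok rates."""
    totals = {"input": 0, "output": 0, "cache_read": 0, "cache_write": 0}
    for usage in calls:
        totals["input"] += usage.input_tokens
        totals["output"] += usage.output_tokens
        totals["cache_read"] += usage.cache_read_tokens
        totals["cache_write"] += usage.cache_write_tokens
    cost = sum(totals[kind] * rates[kind] for kind in totals) / 1_000_000
    return {
        "translation_cost_usd": round(cost, 6),
        "translation_tokens": totals,
        "translation_model_calls": len(calls),
        "translation_cost_rates_per_mtok": dict(rates),
    }
```

Storing the rates next to the computed cost lets a later reader re-price the same token counts if the rates change.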
…agent
bench_warmup, bench_iters and reference_mode live in the agent: YAML section but are translation-harness settings, not agent fields. run_translation_agent splats agent_config into TranslationAgentConfig(**kwargs), so reading them with .get() left them in the dict and crashed every round with "TranslationAgentConfig.__init__() got an unexpected keyword argument". Use .pop() to consume them before the agent config is built.
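The .get() versus .pop() difference can be reproduced in a few lines. This is a minimal sketch: the `TranslationAgentConfig` below is a stand-in with made-up fields, `split_harness_settings` is a hypothetical helper, and both the value of `DEFAULT_EVAL_BENCHMARK_ITERATIONS` and the fallback for `bench_warmup` are assumptions.

```python
from dataclasses import dataclass

DEFAULT_EVAL_BENCHMARK_ITERATIONS = 30  # assumed value, for illustration only


@dataclass
class TranslationAgentConfig:  # stand-in; the real class has other fields
    model: str = "claude-opus-4.8"
    max_rounds: int = 3


def split_harness_settings(agent_config):
    """Pop harness-only keys so they never reach TranslationAgentConfig."""
    cfg = dict(agent_config)  # copy so the caller's dict is left untouched
    harness = {
        "bench_warmup": cfg.pop("bench_warmup", 10),  # assumed fallback
        "bench_iters": cfg.pop("bench_iters", DEFAULT_EVAL_BENCHMARK_ITERATIONS),
        "reference_mode": cfg.pop("reference_mode", "compile_fallback"),
    }
    return harness, TranslationAgentConfig(**cfg)


raw = {"model": "claude-opus-4.8", "bench_warmup": 10, "reference_mode": "eager"}

# With .get(), the key stays in the dict and the constructor rejects it.
_ = raw.get("bench_warmup")
try:
    TranslationAgentConfig(**raw)
    crashed = False
except TypeError:
    crashed = True

harness, agent = split_harness_settings(raw)
```

Here `crashed` ends up `True` for the .get() path, while the .pop() path builds the agent config cleanly and returns the harness settings separately.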
All translation-bench arms now run on claude-opus-4.8 by default (verified accepted by the amd_llm gateway via cond48/med48 runs).
run_translation parsed the PyTorch reference latency with re.search but re was never imported in that scope (the file uses function-local imports), so the always-record-reference-latency path raised NameError at runtime and ruff flagged F821. Add a local import and apply ruff format to the cost helper + bench log line.
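A small sketch of the fixed path, under the assumption that the harness prints a line such as "PyTorch latency: 1.25 ms": that line format, the helper names, and the candidate/speedup key names are hypothetical (only translation_pytorch_latency_ms is named in the commits). It shows the function-local import and the reference latency being recorded before the success/fail branch.

```python
def parse_reference_latency_ms(stdout):
    """Return the reference latency in ms, or None if the line is absent."""
    import re  # function-local import, matching the file's style

    match = re.search(r"PyTorch latency:\s*([0-9]*\.?[0-9]+)\s*ms", stdout)
    return float(match.group(1)) if match else None


def build_result(stdout, candidate_correct, candidate_latency_ms=None):
    """Record the reference latency first, then the success-only fields."""
    result = {}
    reference_ms = parse_reference_latency_ms(stdout)
    if reference_ms is not None:
        result["translation_pytorch_latency_ms"] = reference_ms
    if candidate_correct and candidate_latency_ms and reference_ms is not None:
        # Hypothetical key names for the success-only fields.
        result["candidate_latency_ms"] = candidate_latency_ms
        result["speedup"] = reference_ms / candidate_latency_ms
    return result
```

Without the local `import re`, the first call to the parser would raise NameError, which is the runtime failure the commit describes.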
yueliu14
approved these changes
Jun 30, 2026
Summary
Robustness + accounting improvements to the PyTorch→FlyDSL translation harness, plus a model default bump. Scope is limited to translate.py and its config (4 commits, ~209/49 lines).
- bench_warmup (10) + bench_iters (30) timed passes to cut measurement noise.
- reference_mode ∈ eager|compile|compile_fallback (default compile_fallback), so the speedup baseline is PyTorch at its best.
- translation_cost_usd is written into translation_result.json using configurable per-Mtok rates (defaults = public Opus pricing).
- Harness-only settings are consumed before TranslationAgentConfig.__init__ (fixes an unexpected-kwarg crash).
- The default model is now claude-opus-4.8.
All settings are driven from mini_kernel_pytorch_to_flydsl.yaml (no env vars).
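One possible reading of the three reference_mode values, sketched with a hypothetical `build_reference` helper. The `compile_fn` parameter is an assumption added so the logic can be exercised without PyTorch; when omitted it resolves to `torch.compile`. Because `torch.compile` compiles lazily, the sketch triggers one call inside the `try` so that compile errors surface there.

```python
def build_reference(fn, example_args, mode="compile_fallback", compile_fn=None):
    """Return the callable to benchmark as the PyTorch reference."""
    if mode not in ("eager", "compile", "compile_fallback"):
        raise ValueError(f"unknown reference_mode: {mode!r}")
    if mode == "eager":
        return fn
    if compile_fn is None:
        import torch  # only needed when no compile_fn is injected

        compile_fn = torch.compile
    try:
        compiled = compile_fn(fn)
        compiled(*example_args)  # force compilation now, not during timing
        return compiled
    except Exception:
        if mode == "compile":
            raise  # strict mode: a compile failure is an error
        return fn  # compile_fallback: fall back to eager
```

With this shape, eager reproduces the uncompiled baseline, compile fails loudly, and compile_fallback always yields something that can be timed.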