fix(flydsl): make PyTorch→FlyDSL e2e reliable (gate, harness resolution, verify/opt wiring) #288

Open
Umangatamd wants to merge 1 commit into
feature/flydsl_translation_optimization from
fix/scoped-correctness-gate

@Umangatamd

Collaborator

✅ Errors fixed — PyTorch→FlyDSL e2e now runs end-to-end (translation → baseline → optimization)

Validated on KernelBench/level3/1_MLP.py (claude-opus-4.8 / amd_llm gateway). Reproduced the original FAIL_PREPROCESS/no-baseline/step_limit failures, traced the full chain, and fixed each root cause.

Root causes found & fixed

  1. Baseline correctness-gate false-abort: collect_baseline re-validated correctness on the stricter harness-generator harness and aborted on any non-zero exit, discarding kernels that translation had already validated → scoped skip when translation.success (env-independent; user-supplied harnesses still gate).
  2. Harness couldn't resolve the kernel (the LLM guessed a wrong rel-path, e.g. a spurious KernelBench/ segment) → deterministic kernel_relpath injection into the harness subagents.
  3. Silent no-baseline → loud "kernel not found at X" diagnostic (detect_kernel_resolution_failure).
  4. Spin to step_limit on a broken harness → fail-closed once the harness-generator retry budget is exhausted.
  5. Work-dir mismatch across verify/baseline/optimize → retarget preprocess (subagent sandbox + baseline work_dir) to the per-run _opt_repo, git-init it for patch capture, and fix _copy_repo_sandbox to copy a repo that lives under output_dir.
  6. Verifier never confirms a working harness → deterministic --correctness backstop (mark HARNESS_VERIFIED when the harness passes --correctness but the LLM verifier didn't confirm).
  7. mini.py: use the per-run _opt_repo as the optimization root for translation runs even when --repo is passed (otherwise preflight and optimization are rooted at the source repo, which has no translated kernel).
  8. RAG postprocessor built a provider-less LitellmModel (else: get_model()), raising litellm.BadRequestError: LLM Provider NOT provided and crashing the optimize loop → route through the gateway model (load_geak_model) and make post-processing non-fatal.
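
As a rough illustration of items 1 to 3, the gate scoping and the deterministic kernel resolution could look something like the sketch below. This is not the code in this PR: `TranslationResult`, `should_run_correctness_gate`, and `resolve_kernel` are hypothetical names chosen for the example, and only the `os.path.join(WORK_DIR, kernel_relpath)` pattern and the "kernel not found at X" wording come from the description.

```python
import os
from dataclasses import dataclass


@dataclass
class TranslationResult:
    # Hypothetical stand-in for the translation step's outcome.
    success: bool


def should_run_correctness_gate(translation, user_supplied_harness):
    """Return True when the baseline step should re-validate correctness."""
    # User-supplied harnesses always gate.
    if user_supplied_harness:
        return True
    # Skip only when translation ran and already validated the kernel.
    return not (translation is not None and translation.success)


def resolve_kernel(work_dir, kernel_relpath):
    """Join work_dir and kernel_relpath; fail loudly if the file is missing."""
    path = os.path.join(work_dir, kernel_relpath)
    if not os.path.isfile(path):
        # Loud diagnostic instead of a silent missing baseline.
        raise FileNotFoundError(f"kernel not found at {path}")
    return path
```

Passing the relative path in explicitly means the harness never has to guess the repository layout, and a wrong path surfaces immediately with the full location that was tried.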

Plus: raise the full-mode preprocess soft cap from 900s to 2400s (geak.yaml) so that translation, multi-round harness generation, and baseline collection fit within it.
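
Items 4 and 6 in the list above (fail-closed retry budget and the deterministic --correctness backstop) can be sketched roughly as follows. All function and class names here are hypothetical, as is the `HARNESS_UNVERIFIED` label and the assumption that the harness is a Python script whose exit code 0 means pass; only `--correctness` and `HARNESS_VERIFIED` come from the description.

```python
import subprocess
import sys


class HarnessGenerationFailed(RuntimeError):
    """Raised when the retry budget is spent without a verified harness."""


def passes_correctness(harness_path, timeout_s=600):
    """Run the harness with --correctness; exit code 0 counts as a pass."""
    try:
        proc = subprocess.run(
            [sys.executable, harness_path, "--correctness"],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # A hung harness is treated as a failure, not a pass.
        return False
    return proc.returncode == 0


def verify_harness(harness_path, llm_confirmed):
    """Accept the harness if the LLM verifier confirmed it or the backstop passes."""
    if llm_confirmed or passes_correctness(harness_path):
        return "HARNESS_VERIFIED"
    return "HARNESS_UNVERIFIED"  # hypothetical label for the negative case


def generate_until_verified(generate, max_retries):
    """Call generate() up to max_retries times; fail closed when none verifies.

    generate() is a hypothetical callback returning (harness_path, llm_confirmed).
    """
    for _ in range(max_retries):
        harness_path, llm_confirmed = generate()
        if verify_harness(harness_path, llm_confirmed) == "HARNESS_VERIFIED":
            return harness_path
    # Raise instead of looping on a broken harness until step_limit.
    raise HarnessGenerationFailed(
        f"no verified harness after {max_retries} attempts"
    )
```

The backstop only ever upgrades an unconfirmed harness to verified; it never overrides a positive confirmation, so it cannot make the pipeline stricter than before.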

Result

Optimize loop now produces real patches with verified speedups (round-1, over the translated FlyDSL baseline):

| Strategy | Speedup |
| --- | --- |
| gemm-tile-tuning-per-layer | 1.68× (0.609 → 0.363 ms) |
| fixed-pad-2 | 1.46× |
| fixed-pad-3 | 1.17× |

Cross-verified speedup 1.14×. Tests: 20 new preprocess_v3 bugfix unit tests + broader suite green.
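
For reference, the 1.68× figure for gemm-tile-tuning-per-layer follows directly from the two reported latencies. The helper below is illustrative only and is not part of the benchmark tooling:

```python
def speedup(baseline_ms, optimized_ms):
    """Speedup factor: baseline latency divided by optimized latency."""
    if optimized_ms <= 0:
        raise ValueError("optimized latency must be positive")
    return baseline_ms / optimized_ms


# Latencies reported for gemm-tile-tuning-per-layer.
factor = speedup(0.609, 0.363)
print(f"{factor:.2f}x")  # prints "1.68x"
```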

Note for main

Fixes 7 (mini.py) and 8 (RAG postprocessor) in the list above are general bugs that also exist on main (they affect any amd_llm + RAG or translation run), so they are worth landing independently of the FlyDSL branch.
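
The "non-fatal post-processing" half of fix 8 amounts to a wrapper along these lines. This is a minimal sketch under the assumption that post-processing is a callable over retrieved results; `postprocess_non_fatal` is a hypothetical name, not a function in the repository:

```python
import logging

logger = logging.getLogger(__name__)


def postprocess_non_fatal(postprocess, results):
    """Apply postprocess(results); on any error, log it and return results unchanged."""
    try:
        return postprocess(results)
    except Exception:
        # Post-processing is an enhancement, so a failure here should
        # degrade to the raw results rather than crash the optimize loop.
        logger.warning(
            "RAG post-processing failed; using raw results", exc_info=True
        )
        return results
```

Logging with `exc_info=True` keeps the original traceback visible, so a misconfigured provider is still diagnosable even though it no longer aborts the run.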

Made with Cursor

…ion, verify/opt wiring)

Fixes the chain of failures that stopped translated FlyDSL kernels from
completing preprocess + optimization end-to-end:

- baseline: scope the correctness-gate skip to translation runs (translation
  already validates correctness + perf-regression); keep gate for user harnesses
- baseline: detect_kernel_resolution_failure -> loud "kernel not found at X"
- tools: deterministic kernel_relpath injection into harness subagents so the
  harness resolves os.path.join(WORK_DIR, kernel_relpath) instead of guessing
- tools: fail-closed when harness-generator retry budget is exhausted (no spin
  to step_limit on a broken harness)
- tools: retarget preprocess (sandbox + baseline work_dir) to the per-run
  _opt_repo after translation + git-init it; fix _copy_repo_sandbox to copy a
  repo that lives under output_dir
- tools: deterministic harness-verifier backstop (mark HARNESS_VERIFIED when the
  harness passes --correctness but the LLM verifier did not confirm)
- mini: use the per-run _opt_repo as the optimization root for translation runs
  even when --repo is passed (else preflight/opt root at the source repo)
- rag_postprocessor: route through the gateway model (load_geak_model) instead
  of a provider-less LitellmModel, and make post-processing non-fatal
- geak.yaml: raise full-mode preprocess soft cap 900s -> 2400s

Adds 20 preprocess_v3 bugfix unit tests. Note: the mini.py and
rag_postprocessor fixes are general bugs present on main too.

Co-authored-by: Cursor <cursoragent@cursor.com>