Fix FFT live-in canonicalization without regressing ReLU II #320
Merged
Pull request overview
This PR tightens canonicalize-live-in so direct-dominating live-ins are not used when the def→use path crosses loop backedges (except for latch/merge-style use blocks), fixing an FFT correctness issue in later ctrl→dataflow lowering while preserving the ReLU II guardrail.
Changes:
- Add CFG reachability/backedge detection to reject unsafe direct-live-in fast paths in loop-nested control flow.
- Update FFT/SPMV end-to-end mapping checks to reflect the new lowered/mapped form (FFT mapping II changes accordingly).
- Adjust the ReLU pipeline test to write intermediate outputs to files and run FileCheck against those artifacts (still checking mapped II = 5).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| lib/NeuraDialect/Transforms/CanonicalizeLiveInPass.cpp | Adds reachability/backedge checks to make direct-live-in canonicalization safe across loop control-flow. |
| test/neura/for_loop/relu_test.mlir | Updates RUN lines to materialize intermediate outputs to files before running FileCheck, retaining the II regression check. |
| test/e2e/spmv/spmv_kernel.mlir | Updates expected mapping/YAML/ASM checks to match new lowered/mapped output. |
| test/e2e/fft/fft_kernel.mlir | Updates expected mapping/YAML/ASM checks to match new lowered/mapped output (including updated mapping II expectations). |
tancheng approved these changes on Jun 9, 2026
tancheng reviewed on Jun 9, 2026
tancheng reviewed on Jun 9, 2026
tancheng approved these changes on Jun 11, 2026
Background
This PR fixes an FFT correctness issue caused by an unsafe live-in canonicalization across nested loop control flow.
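For orientation, the FFT control structure can be pictured as a three-level loop nest. The C++ below is a hypothetical sketch, not the kernel source: the loop bounds, initial values, and the `coef_base` update rule are assumptions, and the `FftTrace` counters exist only to show how often each level runs. The stage-level names follow the variables discussed in this PR.

```cpp
#include <cassert>

// Illustrative counters (not part of the real kernel).
struct FftTrace {
  int stageUpdates = 0;  // also serves as the stage index
  int butterflies = 0;
  int coefBase = 0;      // final value of coef_base
};

// Hypothetical radix-2 FFT control skeleton for n a power of two (n >= 2).
// Only the loop structure is modeled; the butterfly arithmetic is omitted.
FftTrace traceFftLoopNest(int n) {
  FftTrace trace;
  int groupsPerStage = n / 2;  // assumed initial values
  int buttersPerGroup = 1;
  int coef_base = 0;
  while (groupsPerStage >= 1) {  // one iteration per FFT stage
    for (int g = 0; g < groupsPerStage; ++g) {
      for (int b = 0; b < buttersPerGroup; ++b) {
        // The butterfly body would read coefficients starting at coef_base.
        ++trace.butterflies;
      }
    }
    // Stage-level update: once per stage, after both inner loops finish.
    coef_base += buttersPerGroup;  // assumed update rule
    groupsPerStage /= 2;
    buttersPerGroup *= 2;
    ++trace.stageUpdates;  // advance the stage index
  }
  trace.coefBase = coef_base;
  return trace;
}
```

Under these assumptions, `traceFftLoopNest(8)` performs 3 stage updates and 12 butterfly iterations, which is the once-per-stage versus per-butterfly distinction at stake here.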
The FFT kernel has stage-level variables: `groupsPerStage`, `buttersPerGroup`, `coef_base`, and the stage index.
The important semantic requirement is that these variables are updated once per FFT stage, after both inner loops finish. They must not be updated once per butterfly iteration.
What went wrong
After `canonicalize-live-in`, the FFT control IR contained the following problematic pattern: block `^bb6` receives only `%79` as a block argument, but directly uses `%4`, `%5`, `%6`, and `%7` from the outer loop header `^bb1`.
This is valid under simple SSA dominance because `^bb1` dominates `^bb6`. However, it is not safe for our lowering pipeline. The later `transform-ctrl-to-data-flow` pass relies on branch/block arguments to preserve control predicates when CFG control flow is flattened into dataflow. Direct dominating live-ins bypass that predicate structure.
As a result, the FFT stage-update expressions became available too often after dataflow lowering. This explains the observed behavior: stage-level variables such as `groupsPerStage`, `buttersPerGroup`, and `coef_base` were updated at the inner butterfly rate instead of once per FFT stage.
Compiler bug
The problematic logic was in `lib/NeuraDialect/Transforms/CanonicalizeLiveInPass.cpp`.
Specifically, the direct-live-in detection was too permissive in these helpers:
- `isSingleSourceSingleSinkPattern(...)`
- `isDirectUnconditionalPattern(...)`
- `identifyDirectDominatingLiveIns(...)`
These helpers allowed a value defined in a dominating block to be used directly in a later block, even if the path from the definition to the use passed through nested loop/backedge control flow.
That optimization is safe for simple straight-line or acyclic control regions, but it is unsafe for the FFT pattern because the direct live-in crosses loop control that must become an explicit predicate in dataflow form.
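To see why a dominance-only criterion accepts the FFT pattern, here is a hypothetical C++ sketch that computes dominators on a toy CFG with the classic iterative dataflow formulation. The block layout is an assumption loosely modeled on the FFT loop nest: block 1 plays the role of `^bb1` (defining the stage values) and block 6 the role of `^bb6` (the stage update); the real IR's numbering and shape will differ. `permissiveDirectLiveIn` is a simplified stand-in for the old acceptance condition, not a reconstruction of the actual helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Cfg = std::vector<std::vector<int>>;  // succs[b] = successors of block b; 0 = entry

// Iterative dominators: Dom(entry) = {entry}, Dom(b) = {b} | intersection of Dom(p)
// over predecessors p. Sets are bitmasks, so this sketch assumes at most 32 blocks.
std::vector<std::uint32_t> computeDominators(const Cfg& succs) {
  const int n = static_cast<int>(succs.size());
  assert(n <= 32);
  std::vector<std::vector<int>> preds(n);
  for (int b = 0; b < n; ++b)
    for (int s : succs[b]) preds[s].push_back(b);

  const std::uint32_t all = (n == 32) ? ~0u : ((1u << n) - 1u);
  std::vector<std::uint32_t> dom(n, all);
  dom[0] = 1u;  // the entry block is dominated only by itself
  bool changed = true;
  while (changed) {
    changed = false;
    for (int b = 1; b < n; ++b) {
      std::uint32_t meet = all;
      for (int p : preds[b]) meet &= dom[p];
      const std::uint32_t updated = meet | (1u << b);
      if (updated != dom[b]) {
        dom[b] = updated;
        changed = true;
      }
    }
  }
  return dom;
}

bool dominates(const std::vector<std::uint32_t>& dom, int a, int b) {
  return ((dom[b] >> a) & 1u) != 0;
}

// Simplified stand-in for the old, permissive fast path: dominance alone decides.
bool permissiveDirectLiveIn(const std::vector<std::uint32_t>& dom, int defBlock, int useBlock) {
  return dominates(dom, defBlock, useBlock);
}

// Toy CFG shaped like the FFT loop nest (an assumption, not the real IR).
Cfg makeFftLikeCfg() {
  return {
      {1},     // 0: entry
      {2, 7},  // 1: stage loop header, defines the stage values
      {3, 6},  // 2: group loop header
      {4, 5},  // 3: butterfly loop header
      {3},     // 4: butterfly body, backedge 4->3
      {2},     // 5: group latch, backedge 5->2
      {1},     // 6: stage update, uses the stage values, backedge 6->1
      {},      // 7: exit
  };
}
```

Block 1 dominates block 6, so the dominance-only check accepts the direct use even though paths from block 1 to block 6 can pass through the inner backedges 4→3 and 5→2.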
Fix
The fix is to make the direct-live-in optimization more conservative only when it needs to be.
This PR adds a CFG reachability check that detects whether a path from the defining block crosses a backedge before it reaches the target use block.
Both direct-live-in fast paths then reject a candidate if such a backedge exists before the use.
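A hypothetical sketch of such a check on a toy CFG follows. It classifies backedges with a depth-first search from the entry block (adequate for structured, reducible loops) and then asks whether any backedge lies on a def→use path that stops at the use block. `hasBackedgeBeforeUse`, `classify`, `reach`, and the CFG layout are illustrative assumptions, not the PR's code.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Cfg = std::vector<std::vector<int>>;  // succs[b] = successors of block b; 0 = entry
using Edge = std::pair<int, int>;

// DFS from the entry: u->v is a backedge when v is still on the DFS stack.
static void classify(const Cfg& g, int b, std::vector<int>& state, std::set<Edge>& back) {
  state[b] = 1;  // on the DFS stack
  for (int s : g[b]) {
    if (state[s] == 1) back.insert({b, s});
    else if (state[s] == 0) classify(g, s, state, back);
  }
  state[b] = 2;  // finished
}

// Worklist reachability from `start`: follows successors if forward, else predecessors.
// `noExpand` (or -1 for none) is marked reachable but its edges are not followed.
static std::vector<bool> reach(const Cfg& g, int start, bool forward, int noExpand) {
  const int n = static_cast<int>(g.size());
  std::vector<std::vector<int>> adj(n);
  for (int b = 0; b < n; ++b)
    for (int s : g[b]) {
      if (forward) adj[b].push_back(s);
      else adj[s].push_back(b);
    }
  std::vector<bool> seen(n, false);
  std::vector<int> work{start};
  seen[start] = true;
  while (!work.empty()) {
    const int b = work.back();
    work.pop_back();
    if (b == noExpand) continue;
    for (int s : adj[b])
      if (!seen[s]) {
        seen[s] = true;
        work.push_back(s);
      }
  }
  return seen;
}

// Hypothetical check: does some def->use path take a backedge before it arrives
// at useBlock? Backedges leaving useBlock itself are not counted.
bool hasBackedgeBeforeUse(const Cfg& g, int defBlock, int useBlock) {
  std::vector<int> state(g.size(), 0);
  std::set<Edge> back;
  classify(g, 0, state, back);
  const std::vector<bool> fromDef = reach(g, defBlock, /*forward=*/true, useBlock);
  const std::vector<bool> toUse = reach(g, useBlock, /*forward=*/false, -1);
  for (const Edge& e : back)
    if (e.first != useBlock && fromDef[e.first] && toUse[e.second]) return true;
  return false;
}

// Toy CFG shaped like the FFT loop nest (an assumption, not the real IR).
Cfg makeFftLikeCfg() {
  return {{1}, {2, 7}, {3, 6}, {4, 5}, {3}, {2}, {1}, {}};
}
```

For the FFT-like CFG, the inner backedges 4→3 and 5→2 lie on paths from block 1 to block 6, so the check fires; the outer backedge 6→1 leaves the use block and is ignored.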
When this direct-live-in optimization is rejected, the existing canonicalization logic promotes the value through explicit block arguments instead.
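The promotion can be sketched on a hypothetical toy IR (not MLIR's API): each block that needs the value gets a new block argument, and every branch into it forwards whatever name the value has in the predecessor, recursing back toward the defining block. This resembles a simplified SSA-construction walk and is only an assumption about how such promotion can work, not the existing canonicalization code. It assumes the defining block dominates the use block, as direct-live-in candidates do.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR, not MLIR's API.
struct Branch {
  int dest;                           // successor block index
  std::vector<std::string> operands;  // values passed to dest's block arguments
};
struct Block {
  std::vector<std::string> args;  // block-argument names
  std::vector<std::string> uses;  // value names read by the block body
  std::vector<Branch> succs;      // outgoing branches
};
using Func = std::vector<Block>;

// Returns the name under which `value` (defined in defBlock) is visible in block b.
// Other blocks get a fresh block argument; every branch into them forwards the value.
static std::string materialize(Func& f, int b, int defBlock, const std::string& value,
                               std::map<int, std::string>& avail) {
  if (b == defBlock) return value;
  auto it = avail.find(b);
  if (it != avail.end()) return it->second;
  const std::string arg = value + "_bb" + std::to_string(b);
  f[b].args.push_back(arg);
  avail[b] = arg;  // record before recursing so loop backedges terminate
  for (int p = 0; p < static_cast<int>(f.size()); ++p)
    for (Branch& br : f[p].succs)
      if (br.dest == b) {
        // Recursion only appends to args/operands, never to succs, so `br` stays valid.
        const std::string incoming = materialize(f, p, defBlock, value, avail);
        br.operands.push_back(incoming);
      }
  return arg;
}

// Rewrite direct uses of `value` in useBlock so they go through a block argument.
void promoteLiveIn(Func& f, int defBlock, int useBlock, const std::string& value) {
  std::map<int, std::string> avail;
  const std::string local = materialize(f, useBlock, defBlock, value, avail);
  for (std::string& u : f[useBlock].uses)
    if (u == value) u = local;
}

// Toy loop nest (assumed shape): bb1 defines %4, bb2/bb3 form an inner loop,
// and bb4 uses %4 directly before branching back to bb1.
Func makeExample() {
  Func f(5);
  auto edge = [&f](int from, int to) { f[from].succs.push_back(Branch{to, {}}); };
  edge(0, 1);
  edge(1, 2);
  edge(2, 3);
  edge(2, 4);
  edge(3, 2);
  edge(4, 1);
  f[4].uses = {"%4"};
  return f;
}
```

After `promoteLiveIn(f, 1, 4, "%4")`, bb4 reads `%4_bb4`, and the value is threaded through the inner-loop header and body as `%4_bb2` and `%4_bb3`, so every hop carries an explicit branch operand.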
Conceptually, the FFT stage update should now take a shape in which the stage-update values are explicit block arguments of the stage-update block, rather than direct uses of values defined in `^bb1`. That gives `transform-ctrl-to-data-flow` the information it needs to attach the correct stage-boundary predicates.
Why this is not over-conservative
A simpler fix would be to reject every direct live-in whenever there is any backedge between the defining block and the using block. That would fix FFT, but it would also be too conservative for simple loop latch / merge patterns.
ReLU is the important example. Its loop has a latch/merge block where the value is consumed before the loop backedge leaves that same block. In that case, the backedge is after the live-in use, so rejecting the direct-live-in optimization would add unnecessary block arguments and data movement. That can increase the mapped recurrence/resource pressure and risk degrading the final II.
This PR avoids that by skipping the target block itself when looking for backedges.
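So the rule is: a direct live-in is rejected only if a path from the defining block crosses a backedge before reaching the use block; a backedge that leaves the use block itself, as in a latch/merge block, does not count.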
That is the distinction that fixes FFT without unnecessarily pessimizing ReLU.
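The difference between the over-conservative rule and the target-skipping rule can be checked on two toy CFGs in a hypothetical C++ sketch. The ReLU-like CFG (entry defining the value, loop header, conditional body, and a latch/merge block that uses the value and branches back) and the FFT-like CFG are assumed shapes, and `rejectDirectLiveIn` with its `skipUseBlock` flag is an illustrative helper rather than the pass's API.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Cfg = std::vector<std::vector<int>>;  // succs[b]; block 0 is the entry

// Backedges found by DFS from the entry (target still on the DFS stack).
static void dfs(const Cfg& g, int b, std::vector<int>& st, std::set<std::pair<int, int>>& back) {
  st[b] = 1;
  for (int s : g[b]) {
    if (st[s] == 1) back.insert({b, s});
    else if (st[s] == 0) dfs(g, s, st, back);
  }
  st[b] = 2;
}

// Hypothetical decision helper: reject the direct live-in when a backedge lies on some
// def->use path. With skipUseBlock=true the use block ends the path, so a backedge
// leaving it (a latch/merge block) does not count; with false, any such backedge counts.
bool rejectDirectLiveIn(const Cfg& g, int def, int use, bool skipUseBlock) {
  const int n = static_cast<int>(g.size());
  std::vector<int> st(n, 0);
  std::set<std::pair<int, int>> back;
  dfs(g, 0, st, back);

  // Forward reachability from def, optionally not expanding past the use block.
  std::vector<bool> fwd(n, false);
  std::vector<int> work{def};
  fwd[def] = true;
  while (!work.empty()) {
    const int b = work.back();
    work.pop_back();
    if (skipUseBlock && b == use) continue;
    for (int s : g[b])
      if (!fwd[s]) {
        fwd[s] = true;
        work.push_back(s);
      }
  }
  // Backward reachability: which blocks can still reach the use block?
  std::vector<bool> bwd(n, false);
  bwd[use] = true;
  for (bool changed = true; changed;) {
    changed = false;
    for (int b = 0; b < n; ++b)
      for (int s : g[b])
        if (bwd[s] && !bwd[b]) {
          bwd[b] = true;
          changed = true;
        }
  }
  for (const auto& e : back) {
    if (skipUseBlock && e.first == use) continue;
    if (fwd[e.first] && bwd[e.second]) return true;
  }
  return false;
}
```

With `skipUseBlock = false`, the ReLU-like latch backedge forces a rejection; with `skipUseBlock = true`, the ReLU-like candidate is accepted while the FFT-like candidate is still rejected, because its inner backedges are crossed before the stage-update block is reached.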
ReLU II protection
ReLU is covered by the existing full-pipeline mapping test, `test/neura/for_loop/relu_test.mlir`.
The test runs the relevant pipeline and checks that the final mapped function still has a mapped II of 5.
This guards against the fix regressing the ReLU mapped II from 5.