FFT generated ASM appears to update stage-level loop recurrences on the inner butterfly iteration
Summary
Running the generated FFT instruction YAML/ASM does not reproduce the CPU result from fft_int.c. The issue appears to be in the generated loop-control / PHI wiring for the nested FFT loops.
In the C source, groupsPerStage, buttersPerGroup, and coef_base are stage-level variables. They should update only after all j/k iterations for the current stage complete.
    for (i = 0; i < NSTAGES; ++i) {
        for (j = 0; j < groupsPerStage; ++j) {
            for (k = 0; k < buttersPerGroup; ++k) {
                ... butterfly ...
            }
        }
        groupsPerStage = groupsPerStage * 2;
        buttersPerGroup = buttersPerGroup / 2;
        coef_base = (coef_base << 1) + 1;
    }
Suspect generated ASM
In tmp-generated-instructions.asm, PE(1,2) contains a PHI_START id=36 followed by DIV / 2. The dynamic trace shows this value evolving as:
128 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1
This looks like buttersPerGroup is being updated every dynamic butterfly iteration, rather than at the stage boundary.
In PE(2,2), PHI_START id=37 followed by SHL << 1 evolves as:
1 -> 2 -> 4 -> 8 -> 16 -> 32 -> 64 -> 128
This looks like groupsPerStage is also being updated every dynamic butterfly iteration.
ICMP_EQ id=64 then sees both operands equal to 8 and triggers RETURN_VOID id=111 after only a few butterfly iterations:
Time 297: ICMP_EQ id=64 Src1=8 Src2=8 Result=1(true)
Time 299: RETURN_VOID id=111 Pred=true
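To make the suspected failure mode concrete, here is a minimal C sketch that counts how many butterflies run before the exit compare fires under the two wirings. It is a model of the hypothesis, not of the generated ASM: NSTAGES = 8 and a 256-point FFT are assumptions inferred from the trace values, the idea that ICMP_EQ id=64 compares a stage counter against NSTAGES is also an assumption, and butterflies_before_exit is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

#define NSTAGES  8    /* assumption: matches the operand value 8 seen at ICMP_EQ id=64 */
#define N_POINTS 256  /* assumption: stage 0 then has 128 butterflies, as in the trace */

/* Hypothetical helper: number of butterflies executed before the exit
 * compare (stage == NSTAGES) becomes true.
 *   update_every_butterfly == false: stage variables advance at the stage boundary.
 *   update_every_butterfly == true : suspected miswiring, they advance per butterfly. */
long butterflies_before_exit(bool update_every_butterfly)
{
    int groupsPerStage = 1;
    int buttersPerGroup = N_POINTS / 2;
    int stage = 0;
    long executed = 0;

    assert(NSTAGES > 0);

    if (update_every_butterfly) {
        /* Every dynamic butterfly also advances the stage-level recurrences. */
        while (stage != NSTAGES) {
            ++executed;
            groupsPerStage *= 2;
            buttersPerGroup /= 2;
            ++stage;
        }
        return executed;
    }

    /* Reference behaviour: same nesting as the C source. */
    for (; stage != NSTAGES; ++stage) {
        for (int j = 0; j < groupsPerStage; ++j) {
            for (int k = 0; k < buttersPerGroup; ++k) {
                ++executed; /* butterfly body would go here */
            }
        }
        groupsPerStage *= 2;
        buttersPerGroup /= 2;
    }
    return executed;
}
```

Under these assumptions the reference wiring executes 1024 butterflies (128 per stage over 8 stages), while the per-butterfly wiring exits after 8, which is consistent with the early RETURN_VOID in the trace.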
Address stream symptom
The data load addresses become a linear sweep:
real lower: 0, 1, 2, ...
real upper: 128, 129, 130, ...
imag lower: 256, 257, 258, ...
imag upper: 384, 385, 386, ...
The CPU nested loop should switch address patterns between FFT stages, so this linear stream indicates the stage-level recurrence/predicate is likely connected to the wrong loop level.
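For comparison, the sketch below computes the addresses one would expect per stage. The indexing formula (lower = j * 2 * buttersPerGroup + k, upper = lower + buttersPerGroup) is the common radix-2 layout and is an assumption here, since the body of fft_int.c is not shown in this report. N_POINTS = 256 and IMAG_BASE = 256 are likewise inferred from the observed addresses, and butterfly_addrs is a hypothetical helper.

```c
#include <assert.h>

#define N_POINTS  256  /* assumption: inferred from the upper offset of 128 */
#define IMAG_BASE 256  /* assumption: imaginary data starts where the trace shows 256 */

typedef struct {
    int real_lower, real_upper, imag_lower, imag_upper;
} ButterflyAddrs;

/* Hypothetical helper: addresses touched by butterfly (j, k) of a stage,
 * assuming the usual radix-2 layout described above. */
ButterflyAddrs butterfly_addrs(int stage, int j, int k)
{
    int buttersPerGroup = (N_POINTS / 2) >> stage; /* 128, 64, ..., 1 */
    int groupsPerStage = 1 << stage;               /* 1, 2, ..., 128 */
    ButterflyAddrs a;

    assert(buttersPerGroup > 0);
    assert(j >= 0 && j < groupsPerStage);
    assert(k >= 0 && k < buttersPerGroup);

    a.real_lower = j * 2 * buttersPerGroup + k;
    a.real_upper = a.real_lower + buttersPerGroup;
    a.imag_lower = IMAG_BASE + a.real_lower;
    a.imag_upper = IMAG_BASE + a.real_upper;
    return a;
}
```

With this layout, stage 0 reproduces the observed sweep (lower 0, 1, 2, ... and upper 128, 129, 130, ...), but stage 1 should already pair address 0 with 64 and start its second group at 128. A stream that never leaves the stage 0 pattern suggests the stage variables are not reaching the address generators with their stage-boundary values.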
Expected compiler behavior
The recurrences for:
groupsPerStage *= 2;
buttersPerGroup /= 2;
coef_base = (coef_base << 1) + 1;
should be guarded by the stage-boundary predicate, so that they fire only after the j/k loops finish. They should not be driven by the inner butterfly iteration.
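One way to picture the expected wiring is a flattened loop in which each stage-level recurrence is a select on the stage-boundary predicate. The sketch below is only an illustration in C, not the compiler's actual lowering: NSTAGES = 8, N_POINTS = 256, and the initial coef_base of 0 are assumptions, and RunStats and run_flattened are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NSTAGES  8    /* assumption */
#define N_POINTS 256  /* assumption */

typedef struct {
    long butterflies;    /* dynamic butterfly count */
    int  stage_updates;  /* how often the stage-level recurrences fired */
    int  final_coef_base;
} RunStats;

/* Hypothetical flattened form of the i/j/k nest: one iteration per butterfly,
 * with every recurrence expressed as a predicated select. */
RunStats run_flattened(void)
{
    int i = 0, j = 0, k = 0;
    int groupsPerStage = 1;
    int buttersPerGroup = N_POINTS / 2;
    int coef_base = 0; /* assumption: initial value is not shown in the report */
    RunStats s = {0, 0, 0};

    assert(NSTAGES > 0);

    while (i < NSTAGES) {
        ++s.butterflies; /* butterfly body would go here */

        /* Predicates are computed from the current (not yet updated) bounds. */
        bool k_done = (k + 1 == buttersPerGroup);
        bool stage_done = k_done && (j + 1 == groupsPerStage);

        /* Induction variables. */
        k = k_done ? 0 : k + 1;
        j = stage_done ? 0 : (k_done ? j + 1 : j);
        i = stage_done ? i + 1 : i;

        /* Stage-level recurrences hold their value unless stage_done is true. */
        groupsPerStage  = stage_done ? groupsPerStage * 2     : groupsPerStage;
        buttersPerGroup = stage_done ? buttersPerGroup / 2    : buttersPerGroup;
        coef_base       = stage_done ? (coef_base << 1) + 1   : coef_base;
        s.stage_updates += stage_done ? 1 : 0;
    }

    s.final_coef_base = coef_base;
    return s;
}
```

Under these assumptions the loop runs 1024 butterflies and the stage-level updates fire exactly 8 times. If PHI_START id=36 and id=37 were guarded by the equivalent of stage_done, the trace values 128 -> 64 -> ... and 1 -> 2 -> ... would each change once per 128 butterflies rather than on every iteration.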
Attached screenshots
01_c_source_loop.png: source loop and stage-level updates
02_asm_pe12_butters_recurrence.png: suspect buttersPerGroup recurrence
03_asm_pe22_groups_return.png: suspect groupsPerStage / return predicate
04_asm_address_generation_symptom.png: address-generation symptom
05_dfg_loop_control_subset.png: local DFG relation
