[P1] FFT kernel issue #316

@n0thingNoob

FFT generated ASM appears to update stage-level loop recurrences on the inner butterfly iteration

Summary

The generated FFT instruction YAML/ASM does not match the CPU result from fft_int.c. The issue appears to be in the generated loop-control / PHI wiring for the nested FFT loops.

In the C source, groupsPerStage, buttersPerGroup, and coef_base are stage-level variables. They should update only after all j/k iterations for the current stage complete.

for (i = 0; i < NSTAGES; ++i) {
  for (j = 0; j < groupsPerStage; ++j) {
    for (k = 0; k < buttersPerGroup; ++k) {
      /* ... butterfly ... */
    }
  }
  /* stage-level updates: run once per stage, after all j/k iterations */
  groupsPerStage = groupsPerStage * 2;
  buttersPerGroup = buttersPerGroup / 2;
  coef_base = (coef_base << 1) + 1;
}
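
To make the expected trip counts concrete, here is a minimal sketch that walks the same loop nest and counts how often the butterfly body and the stage-level updates run. The helper `count_reference_loop` and the `LoopCounts` struct are hypothetical names, and the values `NSTAGES = 8`, `groupsPerStage = 1`, `buttersPerGroup = 128` are assumptions inferred from the dynamic trace, not taken from fft_int.c.

```c
#include <stdio.h>

/* Counters gathered while walking the loop nest from the issue. */
typedef struct {
    int butterflies;    /* total executions of the inner (k) body */
    int stage_updates;  /* times the stage-level updates ran */
    int final_groups;   /* groupsPerStage after the last stage */
} LoopCounts;

/* Hypothetical helper, not part of fft_int.c. */
LoopCounts count_reference_loop(int nstages, int groups0, int butters0)
{
    LoopCounts c = {0, 0, 0};
    int groupsPerStage = groups0;
    int buttersPerGroup = butters0;

    for (int i = 0; i < nstages; ++i) {
        int before = c.butterflies;
        for (int j = 0; j < groupsPerStage; ++j) {
            for (int k = 0; k < buttersPerGroup; ++k) {
                ++c.butterflies;  /* stands in for the butterfly body */
            }
        }
        printf("stage %d: groups=%d butters=%d butterflies=%d\n",
               i, groupsPerStage, buttersPerGroup, c.butterflies - before);
        /* Stage-level updates: once per stage, after the j/k loops. */
        groupsPerStage = groupsPerStage * 2;
        buttersPerGroup = buttersPerGroup / 2;
        ++c.stage_updates;
    }
    c.final_groups = groupsPerStage;
    return c;
}
```

Under those assumed values, `count_reference_loop(8, 1, 128)` reports 128 butterflies in each of the 8 stages, 1024 in total, with the stage-level updates running exactly 8 times.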

Suspect generated ASM

In tmp-generated-instructions.asm, PE(1,2) contains PHI_START id=36 followed by a DIV by 2. The dynamic trace shows this value evolving as:

128 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1

This looks like buttersPerGroup is being updated every dynamic butterfly iteration, rather than at the stage boundary.

In PE(2,2), the value of PHI_START id=37, followed by a SHL by 1, evolves as:

1 -> 2 -> 4 -> 8 -> 16 -> 32 -> 64 -> 128

This looks like groupsPerStage is also being updated every dynamic butterfly iteration.

The operand of ICMP_EQ id=64 then reaches 8, the comparison evaluates to true, and RETURN_VOID id=111 fires after only a few butterfly iterations:

Time 297: ICMP_EQ id=64 Src1=8 Src2=8 Result=1(true)
Time 299: RETURN_VOID id=111 Pred=true
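
The traces above are consistent with a simple model in which every recurrence advances once per butterfly iteration. The sketch below is only that model, not the generated ASM: `simulate_per_iteration_update` is a hypothetical helper, and it assumes the operand of the exit compare is a stage counter incremented alongside the other two recurrences.

```c
#define MAX_TRACE 64

/* Hypothetical model of the suspected miswiring: all stage-level
 * recurrences advance on every butterfly iteration. Returns the number
 * of butterfly iterations executed before the exit compare fires. */
int simulate_per_iteration_update(int nstages, int groups0, int butters0,
                                  int groups_trace[MAX_TRACE],
                                  int butters_trace[MAX_TRACE])
{
    int groupsPerStage = groups0;
    int buttersPerGroup = butters0;
    int stage = 0;  /* assumed operand of the exit compare */
    int iters = 0;

    while (stage != nstages && iters < MAX_TRACE) {
        groups_trace[iters] = groupsPerStage;
        butters_trace[iters] = buttersPerGroup;
        ++iters;
        /* Updates that belong at the stage boundary, applied too early. */
        groupsPerStage <<= 1;
        buttersPerGroup /= 2;
        ++stage;
    }
    return iters;
}
```

With `nstages = 8`, `groups0 = 1`, and `butters0 = 128` (again inferred from the trace), the model exits after 8 iterations and reproduces both observed sequences, 128 down to 1 and 1 up to 128.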

Address stream symptom

The data load addresses become a linear sweep:

real lower: 0, 1, 2, ...
real upper: 128, 129, 130, ...
imag lower: 256, 257, 258, ...
imag upper: 384, 385, 386, ...

In the CPU nested loop, the address pattern changes from one FFT stage to the next, so a purely linear stream indicates that the stage-level recurrence/predicate is likely connected to the wrong loop level.
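
For comparison, a sketch of how the addresses would be expected to change between stages. The index formula here is an assumption (a common layout where each group spans `2 * buttersPerGroup` elements); the issue does not show the actual indexing in fft_int.c. The `imag_base` offset of 256 is inferred from the address listing above, and `butterfly_addrs` is a hypothetical helper.

```c
/* Element addresses touched by one butterfly. */
typedef struct {
    int real_lower, real_upper, imag_lower, imag_upper;
} ButterflyAddrs;

/* ASSUMED indexing: group j starts at 2 * j * buttersPerGroup, and the
 * upper element sits buttersPerGroup entries above the lower one. */
ButterflyAddrs butterfly_addrs(int j, int k, int buttersPerGroup,
                               int imag_base)
{
    ButterflyAddrs a;
    a.real_lower = 2 * j * buttersPerGroup + k;
    a.real_upper = a.real_lower + buttersPerGroup;
    a.imag_lower = imag_base + a.real_lower;
    a.imag_upper = imag_base + a.real_upper;
    return a;
}
```

Under this assumed formula, stage 0 (`buttersPerGroup = 128`) starts at 0, 128, 256, 384, matching the observed stream. Stage 1 (`buttersPerGroup = 64`) would restart at 0, 64, 256, 320 for `j = 0` and jump to 128, 192, 384, 448 for `j = 1`, so a correct run should not stay on a single linear sweep.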

Expected compiler behavior

The recurrence/predicate for:

groupsPerStage *= 2;
buttersPerGroup /= 2;
coef_base = (coef_base << 1) + 1;

should be guarded by the stage-boundary predicate, after the j/k loops finish. It should not be driven by the inner butterfly iteration.
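
One way to picture the expected wiring is a flattened loop in which every stage-level value is a select on the stage-boundary predicate. This is a sketch of the intended semantics only, not the compiler's actual lowering. `run_flattened`, `FlatResult`, and the starting `coef_base` value passed in are hypothetical.

```c
typedef struct {
    int butterflies;    /* butterfly iterations executed */
    int stage_updates;  /* times the stage-boundary predicate was true */
    int coef_base;      /* coef_base after the last stage */
} FlatResult;

/* Assumes groupsPerStage >= 1 and buttersPerGroup >= 1 in every
 * executed stage (zero-trip loops are not modeled). */
FlatResult run_flattened(int nstages, int groups0, int butters0, int coef0)
{
    FlatResult r = {0, 0, coef0};
    int groupsPerStage = groups0;
    int buttersPerGroup = butters0;
    int i = 0, j = 0, k = 0;

    while (i < nstages) {
        ++r.butterflies;  /* stands in for the butterfly body */

        /* True only on the last k of the last j in the current stage. */
        int k_wraps = (k + 1 == buttersPerGroup);
        int stage_boundary = k_wraps && (j + 1 == groupsPerStage);

        k = k_wraps ? 0 : k + 1;
        j = stage_boundary ? 0 : (k_wraps ? j + 1 : j);

        /* Stage-level recurrences hold their value unless guarded true. */
        groupsPerStage = stage_boundary ? groupsPerStage * 2 : groupsPerStage;
        buttersPerGroup = stage_boundary ? buttersPerGroup / 2 : buttersPerGroup;
        r.coef_base = stage_boundary ? (r.coef_base << 1) + 1 : r.coef_base;
        i += stage_boundary;
        r.stage_updates += stage_boundary;
    }
    return r;
}
```

With the values inferred from the trace, `run_flattened(8, 1, 128, 0)` executes 1024 butterflies and fires the stage-boundary predicate 8 times, whereas the trace in this issue returns after only a few butterfly iterations.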

Attached screenshots

  • 01_c_source_loop.png: source loop and stage-level updates
  • 02_asm_pe12_butters_recurrence.png: suspect buttersPerGroup recurrence
  • 03_asm_pe22_groups_return.png: suspect groupsPerStage / return predicate
  • 04_asm_address_generation_symptom.png: address-generation symptom
  • 05_dfg_loop_control_subset.png: local DFG relation

Labels: bug
