[AMDGPU] Add a target option to disable aggressive FMA fusion #161450

yiqian1 · 2025-09-30T21:36:15Z

Add a target feature to disable fusing fmul and fadd/fsub into fma. This allows preserving bitwise accuracy when needed.

llvmbot · 2025-09-30T21:36:53Z

@llvm/pr-subscribers-backend-amdgpu

Author: Yi Qian (yiqian1)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/161450.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+1-1)
(modified) llvm/lib/Target/AMDGPU/GCNSubtarget.cpp (+4)
(modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+4-2)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index eaa1870f4be28..62b4e32dbcb59 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1524,7 +1524,7 @@ def FeatureGFX9 : GCNSubtargetFeatureGeneration<"GFX9",
    FeatureGCN3Encoding, FeatureCIInsts, Feature16BitInsts,
    FeatureSMemRealTime, FeatureScalarStores, FeatureInv2PiInlineImm,
    FeatureApertureRegs, FeatureGFX9Insts, FeatureVOP3P, FeatureVGPRIndexMode,
-   FeatureFastFMAF32, FeatureDPP, FeatureIntClamp,
+   FeatureDPP, FeatureIntClamp,
    FeatureSDWA, FeatureSDWAOmod, FeatureSDWAScalar, FeatureSDWASdst,
    FeatureFlatInstOffsets, FeatureFlatGlobalInsts, FeatureFlatScratchInsts,
    FeatureAddNoCarryInsts, FeatureGFX8Insts, FeatureGFX7GFX8GFX9Insts,
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
index 7b94ea3ffbf1f..b7473e5ea4759 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
@@ -85,6 +85,10 @@ GCNSubtarget &GCNSubtarget::initializeSubtargetDependencies(const Triple &TT,
       FullFS += "-wavefrontsize64,";
   }
 
+  // GFX9 enables fast-fmaf by default
+  if (GPU.contains_insensitive("gfx9") && !FS.contains_insensitive("fast-fmaf"))
+    FullFS += "+fast-fmaf";
+
   FullFS += FS;
 
   ParseSubtargetFeatures(GPU, /*TuneCPU*/ GPU, FullFS);
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 16530087444d2..910b693c8c6a6 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -6502,10 +6502,12 @@ bool SITargetLowering::enableAggressiveFMAFusion(EVT VT) const {
   // When fma is quarter rate, for f64 where add / sub are at best half rate,
   // most of these combines appear to be cycle neutral but save on instruction
   // count / code size.
-  return true;
+  return Subtarget->hasFastFMAF32();
 }
 
-bool SITargetLowering::enableAggressiveFMAFusion(LLT Ty) const { return true; }
+bool SITargetLowering::enableAggressiveFMAFusion(LLT Ty) const {
+  return Subtarget->hasFastFMAF32();
+}
 
 EVT SITargetLowering::getSetCCResultType(const DataLayout &DL, LLVMContext &Ctx,
                                          EVT VT) const {

arsenm

You should not be scanning through feature names, But also this is baked into the device properties. I do not think this should be optional

yiqian1 · 2025-10-01T01:29:29Z

You should not be scanning through feature names, But also this is baked into the device properties. I do not think this should be optional

Sometimes we need to disable fusing fmul+fadd/fsub into fma, so making this flag optional is a simple way to achieve that. But it's debatable whether reusing this flag is proper, as it describes a target feature which should conceptually always be true.

arsenm · 2025-10-01T15:07:15Z

Sometimes we need to disable fusing fmul+fadd/fsub into fma,

Why? That's not something you should need to do

But it's debatable whether reusing this flag is proper

It's not

yiqian1 · 2025-10-02T15:46:48Z

Sometimes we need to disable fusing fmul+fadd/fsub into fma,

Why? That's not something you should need to do

This is because FMA fusion may cause numerical errors when we want bitwise accuracy. For example, one of our test cases includes q * k - q * k. It's lowered to fsub(fmul(q, k), fmul(q, k)) which does round(round(q * k) - round(q * k)). With FMA enabled, it's transformed into fma(q, k, -fmul(q, k)). or round(q * k - round(q * k)). In this case we want to disable fusing fmul + fsub into fma to ensure correct results.

arsenm · 2025-10-03T01:16:24Z

This is because FMA fusion may cause numerical errors when we want bitwise accuracy.

This is a property of the incoming IR and cannot be addressed by changing backend costing heuristics. What IR are you using? You have to change that

llvmbot added the backend:AMDGPU label Sep 30, 2025

arsenm requested changes Oct 1, 2025

View reviewed changes

yiqian1 marked this pull request as draft October 1, 2025 01:58

[AMDGPU] Add a target option to disable aggressive FMA fusion

aab5e41

yiqian1 force-pushed the support-fast-fmaf branch from 8dc707f to aab5e41 Compare October 2, 2025 02:01

yiqian1 added 2 commits October 2, 2025 05:00

Remove an old change

9a47421

Remove an unneeded check

ada0bb9

yiqian1 changed the title ~~[AMDGPU] Make fast-fmaf an optional flag, defaulting to True for GFX9~~ [AMDGPU] Add a target option to disable aggressive FMA fusion Oct 2, 2025

yiqian1 marked this pull request as ready for review October 2, 2025 15:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU] Add a target option to disable aggressive FMA fusion #161450

[AMDGPU] Add a target option to disable aggressive FMA fusion #161450

Uh oh!

yiqian1 commented Sep 30, 2025 •

edited

Loading

Uh oh!

llvmbot commented Sep 30, 2025

Uh oh!

arsenm left a comment

Uh oh!

yiqian1 commented Oct 1, 2025

Uh oh!

arsenm commented Oct 1, 2025

Uh oh!

yiqian1 commented Oct 2, 2025 •

edited

Loading

Uh oh!

arsenm commented Oct 3, 2025

Uh oh!

Uh oh!

[AMDGPU] Add a target option to disable aggressive FMA fusion #161450

Are you sure you want to change the base?

[AMDGPU] Add a target option to disable aggressive FMA fusion #161450

Uh oh!

Conversation

yiqian1 commented Sep 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Sep 30, 2025

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

yiqian1 commented Oct 1, 2025

Uh oh!

arsenm commented Oct 1, 2025

Uh oh!

yiqian1 commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm commented Oct 3, 2025

Uh oh!

Uh oh!

yiqian1 commented Sep 30, 2025 •

edited

Loading

yiqian1 commented Oct 2, 2025 •

edited

Loading