Skip to content

Conversation

yiqian1
Copy link
Contributor

@yiqian1 yiqian1 commented Sep 30, 2025

Add a target feature to disable fusing fmul and fadd/fsub into fma. This allows preserving bitwise accuracy when needed.

@llvmbot
Copy link
Member

llvmbot commented Sep 30, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Yi Qian (yiqian1)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/161450.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.cpp (+4)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+4-2)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index eaa1870f4be28..62b4e32dbcb59 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1524,7 +1524,7 @@ def FeatureGFX9 : GCNSubtargetFeatureGeneration<"GFX9",
    FeatureGCN3Encoding, FeatureCIInsts, Feature16BitInsts,
    FeatureSMemRealTime, FeatureScalarStores, FeatureInv2PiInlineImm,
    FeatureApertureRegs, FeatureGFX9Insts, FeatureVOP3P, FeatureVGPRIndexMode,
-   FeatureFastFMAF32, FeatureDPP, FeatureIntClamp,
+   FeatureDPP, FeatureIntClamp,
    FeatureSDWA, FeatureSDWAOmod, FeatureSDWAScalar, FeatureSDWASdst,
    FeatureFlatInstOffsets, FeatureFlatGlobalInsts, FeatureFlatScratchInsts,
    FeatureAddNoCarryInsts, FeatureGFX8Insts, FeatureGFX7GFX8GFX9Insts,
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
index 7b94ea3ffbf1f..b7473e5ea4759 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
@@ -85,6 +85,10 @@ GCNSubtarget &GCNSubtarget::initializeSubtargetDependencies(const Triple &TT,
       FullFS += "-wavefrontsize64,";
   }
 
+  // GFX9 enables fast-fmaf by default
+  if (GPU.contains_insensitive("gfx9") && !FS.contains_insensitive("fast-fmaf"))
+    FullFS += "+fast-fmaf";
+
   FullFS += FS;
 
   ParseSubtargetFeatures(GPU, /*TuneCPU*/ GPU, FullFS);
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 16530087444d2..910b693c8c6a6 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -6502,10 +6502,12 @@ bool SITargetLowering::enableAggressiveFMAFusion(EVT VT) const {
   // When fma is quarter rate, for f64 where add / sub are at best half rate,
   // most of these combines appear to be cycle neutral but save on instruction
   // count / code size.
-  return true;
+  return Subtarget->hasFastFMAF32();
 }
 
-bool SITargetLowering::enableAggressiveFMAFusion(LLT Ty) const { return true; }
+bool SITargetLowering::enableAggressiveFMAFusion(LLT Ty) const {
+  return Subtarget->hasFastFMAF32();
+}
 
 EVT SITargetLowering::getSetCCResultType(const DataLayout &DL, LLVMContext &Ctx,
                                          EVT VT) const {

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not be scanning through feature names, But also this is baked into the device properties. I do not think this should be optional

@yiqian1
Copy link
Contributor Author

yiqian1 commented Oct 1, 2025

You should not be scanning through feature names, But also this is baked into the device properties. I do not think this should be optional

Sometimes we need to disable fusing fmul+fadd/fsub into fma, so making this flag optional is a simple way to achieve that. But it's debatable whether reusing this flag is proper, as it describes a target feature which should conceptually always be true.

@yiqian1 yiqian1 marked this pull request as draft October 1, 2025 01:58
@arsenm
Copy link
Contributor

arsenm commented Oct 1, 2025

Sometimes we need to disable fusing fmul+fadd/fsub into fma,

Why? That's not something you should need to do

But it's debatable whether reusing this flag is proper

It's not

@yiqian1 yiqian1 force-pushed the support-fast-fmaf branch from 8dc707f to aab5e41 Compare October 2, 2025 02:01
@yiqian1 yiqian1 changed the title [AMDGPU] Make fast-fmaf an optional flag, defaulting to True for GFX9 [AMDGPU] Add a target option to disable aggressive FMA fusion Oct 2, 2025
@yiqian1
Copy link
Contributor Author

yiqian1 commented Oct 2, 2025

Sometimes we need to disable fusing fmul+fadd/fsub into fma,

Why? That's not something you should need to do

This is because FMA fusion may cause numerical errors when we want bitwise accuracy. For example, one of our test cases includes q * k - q * k. It's lowered to fsub(fmul(q, k), fmul(q, k)) which does round(round(q * k) - round(q * k)). With FMA enabled, it's transformed into fma(q, k, -fmul(q, k)). or round(q * k - round(q * k)). In this case we want to disable fusing fmul + fsub into fma to ensure correct results.

@yiqian1 yiqian1 marked this pull request as ready for review October 2, 2025 15:55
@arsenm
Copy link
Contributor

arsenm commented Oct 3, 2025

This is because FMA fusion may cause numerical errors when we want bitwise accuracy.

This is a property of the incoming IR and cannot be addressed by changing backend costing heuristics. What IR are you using? You have to change that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants