-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AMDGPU] Add a target option to disable aggressive FMA fusion #161450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Yi Qian (yiqian1) ChangesFull diff: https://github.com/llvm/llvm-project/pull/161450.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index eaa1870f4be28..62b4e32dbcb59 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1524,7 +1524,7 @@ def FeatureGFX9 : GCNSubtargetFeatureGeneration<"GFX9",
FeatureGCN3Encoding, FeatureCIInsts, Feature16BitInsts,
FeatureSMemRealTime, FeatureScalarStores, FeatureInv2PiInlineImm,
FeatureApertureRegs, FeatureGFX9Insts, FeatureVOP3P, FeatureVGPRIndexMode,
- FeatureFastFMAF32, FeatureDPP, FeatureIntClamp,
+ FeatureDPP, FeatureIntClamp,
FeatureSDWA, FeatureSDWAOmod, FeatureSDWAScalar, FeatureSDWASdst,
FeatureFlatInstOffsets, FeatureFlatGlobalInsts, FeatureFlatScratchInsts,
FeatureAddNoCarryInsts, FeatureGFX8Insts, FeatureGFX7GFX8GFX9Insts,
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
index 7b94ea3ffbf1f..b7473e5ea4759 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
@@ -85,6 +85,10 @@ GCNSubtarget &GCNSubtarget::initializeSubtargetDependencies(const Triple &TT,
FullFS += "-wavefrontsize64,";
}
+ // GFX9 enables fast-fmaf by default
+ if (GPU.contains_insensitive("gfx9") && !FS.contains_insensitive("fast-fmaf"))
+ FullFS += "+fast-fmaf";
+
FullFS += FS;
ParseSubtargetFeatures(GPU, /*TuneCPU*/ GPU, FullFS);
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 16530087444d2..910b693c8c6a6 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -6502,10 +6502,12 @@ bool SITargetLowering::enableAggressiveFMAFusion(EVT VT) const {
// When fma is quarter rate, for f64 where add / sub are at best half rate,
// most of these combines appear to be cycle neutral but save on instruction
// count / code size.
- return true;
+ return Subtarget->hasFastFMAF32();
}
-bool SITargetLowering::enableAggressiveFMAFusion(LLT Ty) const { return true; }
+bool SITargetLowering::enableAggressiveFMAFusion(LLT Ty) const {
+ return Subtarget->hasFastFMAF32();
+}
EVT SITargetLowering::getSetCCResultType(const DataLayout &DL, LLVMContext &Ctx,
EVT VT) const {
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not be scanning through feature names, But also this is baked into the device properties. I do not think this should be optional
Sometimes we need to disable fusing fmul+fadd/fsub into fma, so making this flag optional is a simple way to achieve that. But it's debatable whether reusing this flag is proper, as it describes a target feature which should conceptually always be true. |
Why? That's not something you should need to do
It's not |
8dc707f
to
aab5e41
Compare
This is because FMA fusion may cause numerical errors when we want bitwise accuracy. For example, one of our test cases includes |
This is a property of the incoming IR and cannot be addressed by changing backend costing heuristics. What IR are you using? You have to change that |
Add a target feature to disable fusing fmul and fadd/fsub into fma. This allows preserving bitwise accuracy when needed.