-
Notifications
You must be signed in to change notification settings - Fork 208
Add -fno-reciprocal-math -mrecip=none to Ofast to solve the AMD/Intel differences #8280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add -fno-reciprocal-math -mrecip=none to Ofast to solve the AMD/Intel differences #8280
Conversation
|
A new Pull Request was created by @silviodonato (Silvio Donato) for branch IB/CMSSW_13_0_X/master. @cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks. |
|
I suggest to run test using "enable profiling" (I think I don't have anymore the rights to start tests) |
|
test parameters:
|
|
test parameters:
|
|
please test |
|
This PR replaces cms-sw/cmssw#40649 |
|
Is this making the change globally? |
|
Hi everyone, the test was performed using |
|
cc @goys |
|
@Martin-Grunewald , yes, afaik this change should affect all modules, but I let people from @cms-sw/core-l2 to comment more on this. These are the PRs which moved all packages from |
|
@silviodonato @Martin-Grunewald , change will only apply to those packages which directly have |
|
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30324/summary.html Comparison SummarySummary:
|
|
I'm surprised to see so many differences even at simulation level in (eg. 13234.0 and 13434.0, see https://tinyurl.com/27gnfjt9 ), do you think they are just statistical fluctuation of the PU mixing? |
|
please test lets rerun the tests based on latest IB. There were a lot of other changes which were accumulated for last tests |
|
The timing seems ok, but I let @cms-sw/reconstruction-l2 comment more on this |
What is the impact on the HLT timing, running on 2022 data ? |
|
@cms-sw/reconstruction-l2 |
|
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30351/summary.html The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: Comparison SummarySummary:
|
As mentioned in yesterday's ORP, HLT would like to have this PR merged in time for @cms-sw/reconstruction-l2 , do you have any further comments ? (reco already signed in #8280 (comment), but that was before the latest round of tests.) |
|
+reconstruction |
|
This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_13_0_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
|
@perrotta, @rappoccio this is ready to go in , feel free to merge it for tonight's IB |
|
OK, I am merging it for tonights IB |
|
+1
|
|
@cms-sw/core-l2 @perrotta @rappoccio I think this PR caused the failure of the build in ARM ( I hope this can be solved with something similar to 70f1969 |
|
#8302 should fix the aarch64 issue. |
|
For reference, @gparida run in 13_0_0_pre4 the comparison Intel vs AMD and he found no difference as expected: |
-Ofast implies -funsafe-math-optimizations, which affects the global FP state for all programs: when used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. -Ofast enables (directly or indirectly): - ❌ -fallow-store-data-races - ✔ -fno-semantic-interposition - ✔ -fno-math-errno - ❌ -funsafe-math-optimizations - ❌ -ffinite-math-only - ✔ -fno-rounding-math (_default_) - ✔ -fno-signaling-nans (_default_) - ❔ -fcx-limited-range - ✔ -fexcess-precision=fast - ✔ -fno-signed-zeros - ✔ -fno-trapping-math - ✔ -fassociative-math - ❌ -freciprocal-math (cms-sw#8280) We should not use -fallow-store-data-races, -ffinite-math-only. We should not use -funsafe-math-optimizations at link time, as it affects the global FP state. It's probably easier to replace it with the individual options -fno-signed-zeros, -fno-trapping-math and -fassociative-math. We may revisit -fcx-limited-range if we do use Complex arithmetics. -freciprocal-math was already disabled explicitly, see cms-sw#8280.
-Ofast implies -funsafe-math-optimizations, which affects the global FP state for all programs: when used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. -Ofast enables (directly or indirectly): - ❌ -fallow-store-data-races - ✔ -fno-semantic-interposition - ✔ -fno-math-errno - ❌ -funsafe-math-optimizations - ❌ -ffinite-math-only - ✔ -fno-rounding-math (_default_) - ✔ -fno-signaling-nans (_default_) - ❔ -fcx-limited-range - ✔ -fexcess-precision=fast - ✔ -fno-signed-zeros - ✔ -fno-trapping-math - ✔ -fassociative-math - ❌ -freciprocal-math (disabled in cms-sw#8280) We should not use -fallow-store-data-races, -ffinite-math-only. We should not use -funsafe-math-optimizations at link time, as it affects the global FP state. It's probably easier to replace it with the individual options -fno-signed-zeros, -fno-trapping-math and -fassociative-math. We may revisit -fcx-limited-range if we do use Complex arithmetics. -freciprocal-math was already disabled explicitly, see cms-sw#8280.
About one year ago we discovered differences of few percents between AMD and Intel in few HLT paths when rerunning the HLT [1]. The situation got worse when we added the parking di-electron paths with dZ path, which gave a difference of ~8% [2].
After several failed attempts, @gparida managed to solve the differences following @VinInn suggestion [3] to use
-Ofast -fno-reciprocal-math -mno-recip.@gparida replaced
<use name="ofast-flag"/>with<flags CXXFLAGS="-Ofast -fno-reciprocal-math -mrecip=none"/>in allBuildFiles.xml(excepttestfolders to save compilation time)and found no AMD/Intel differences @gparida please share your results here (which release did you use?).
So I made this PR to implement this change directly at
cmsdistlevel(but I didn't test it compiling the externals).
I don't know exactly the impact on timing, but I assume it will be much smaller than 1%, that was the impact of the full removal of Ofast [4]
[1] https://its.cern.ch/jira/browse/CMSHLT-2257
[2] https://docs.google.com/spreadsheets/d/1XRu1sfIYJUQ0e7Z_yJWdGrxTDA31zr33uWPFY54-APc
[3] https://its.cern.ch/jira/browse/CMSHLT-2257?focusedCommentId=4686018&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-4686018
[4] cms-sw/cmssw#40089
cc @cms-sw/hlt-l2 @cms-sw/reconstruction-l2 @gparida @sanuvarghese @cms-sw/core-l2