Add -fno-reciprocal-math -mrecip=none to Ofast to solve the AMD/Intel differences #8280

silviodonato · 2023-02-01T10:40:33Z

About one year ago we discovered differences of few percents between AMD and Intel in few HLT paths when rerunning the HLT [1]. The situation got worse when we added the parking di-electron paths with dZ path, which gave a difference of ~8% [2].

After several failed attempts, @gparida managed to solve the differences following @VinInn suggestion [3] to use -Ofast -fno-reciprocal-math -mno-recip.

@gparida replaced <use name="ofast-flag"/> with <flags CXXFLAGS="-Ofast -fno-reciprocal-math -mrecip=none"/> in all BuildFiles.xml (except test folders to save compilation time)

"RecoEgamma/EgammaElectronAlgos/BuildFile.xml"
"RecoEgamma/EgammaPhotonAlgos/BuildFile.xml"
"RecoPixelVertexing/PixelTriplets/BuildFile.xml"
"RecoPixelVertexing/PixelTriplets/plugins/BuildFile.xml"
"RecoTracker/FinalTrackSelectors/BuildFile.xml"
"RecoTracker/FinalTrackSelectors/plugins/BuildFile.xml"
"RecoTracker/TkDetLayers/BuildFile.xml"
"RecoTracker/TkHitPairs/BuildFile.xml"
"RecoVertex/PrimaryVertexProducer/BuildFile.xml"
"TrackingTools/GsfTools/BuildFile.xml"
"TrackingTools/GsfTools/plugins/BuildFile.xml"
"TrackingTools/GsfTracking/BuildFile.xml"
"TrackingTools/KalmanUpdators/BuildFile.xml"
"TrackingTools/MaterialEffects/BuildFile.xml"
"TrackingTools/MaterialEffects/plugins/BuildFile.xml"
"TrackingTools/TrackFitters/BuildFile.xml"
"TrackingTools/TrajectoryState/BuildFile.xml"

and found no AMD/Intel differences @gparida please share your results here (which release did you use?).

So I made this PR to implement this change directly at cmsdist level
(but I didn't test it compiling the externals).

I don't know exactly the impact on timing, but I assume it will be much smaller than 1%, that was the impact of the full removal of Ofast [4]

[1] https://its.cern.ch/jira/browse/CMSHLT-2257
[2] https://docs.google.com/spreadsheets/d/1XRu1sfIYJUQ0e7Z_yJWdGrxTDA31zr33uWPFY54-APc
[3] https://its.cern.ch/jira/browse/CMSHLT-2257?focusedCommentId=4686018&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-4686018
[4] cms-sw/cmssw#40089

cc @cms-sw/hlt-l2 @cms-sw/reconstruction-l2 @gparida @sanuvarghese @cms-sw/core-l2

…ifferences

cmsbuild · 2023-02-01T10:40:56Z

A new Pull Request was created by @silviodonato (Silvio Donato) for branch IB/CMSSW_13_0_X/master.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.
cms-bot commands are listed here

silviodonato · 2023-02-01T10:46:09Z

I suggest to run test using "enable profiling" (I think I don't have anymore the rights to start tests)

iarspider · 2023-02-01T10:47:38Z

test parameters:

enable profiling

iarspider · 2023-02-01T10:48:31Z

test parameters:

enable = profiling

iarspider · 2023-02-01T10:49:09Z

please test

silviodonato · 2023-02-01T11:12:19Z

This PR replaces cms-sw/cmssw#40649

Martin-Grunewald · 2023-02-01T11:18:14Z

Is this making the change globally?
And will then be overwritten by the (different) Ofast setting still in the local package buildfiles?

gparida · 2023-02-01T12:08:28Z

Hi everyone, the test was performed using CMSSW_12_4_11 with the menu generated from
hltGetConfiguration /dev/CMSSW_12_4_0/GRun/V188 --full --offline --no-output --data --process MYHLT --type GRun --unprescale --globaltag 124X_dataRun3_HLT_v7 --max-events -1 - running on Run3 files.

silviodonato · 2023-02-01T12:55:40Z

cc @goys

silviodonato · 2023-02-01T13:00:17Z

@Martin-Grunewald , yes, afaik this change should affect all modules, but I let people from @cms-sw/core-l2 to comment more on this.

These are the PRs which moved all packages from <flags CXXFLAGS="-Ofast"/> to <use name="ofast-flag"/>
cmssw: cms-sw/cmssw#24969
cmsdist: #4438

smuzaffar · 2023-02-01T13:19:01Z

@silviodonato @Martin-Grunewald , change will only apply to those packages which directly have <use name="ofast-flag"/> . There should not be any explicit <flags CXXFLAGS="-Ofast"/> in cmssw/package/Buidlfile.xml but if there are any then those packages will just have a union of flags from ofast-flag tool and their local flags.

cmsbuild · 2023-02-01T17:11:12Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30324/summary.html
COMMIT: cf7c106
CMSSW: CMSSW_13_0_X_2023-01-31-2300/el8_amd64_gcc11
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8280/30324/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially added 55 lines to the logs
Reco comparison results: 7354 differences found in the comparisons
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 3555495
DQMHistoTests: Total failures: 36278
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3519195
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
Checked 211 log files, 162 edm output root files, 49 DQM output files
TriggerResults: found differences in 2 / 47 workflows

silviodonato · 2023-02-02T10:20:32Z

I'm surprised to see so many differences even at simulation level in (eg. 13234.0 and 13434.0, see https://tinyurl.com/27gnfjt9 ), do you think they are just statistical fluctuation of the PU mixing?
The differences in workflows without PU is minimal and related to mainly to tracking, as expected (see 140.56 https://tinyurl.com/2bbnbm53)

smuzaffar · 2023-02-02T10:23:23Z

please test

lets rerun the tests based on latest IB. There were a lot of other changes which were accumulated for last tests

silviodonato · 2023-02-02T11:14:43Z

The timing seems ok, but I let @cms-sw/reconstruction-l2 comment more on this

11834.21 TTbar_14TeV_TuneCP5_2021PU_GenSim+DigiPU_ProdLike_2021PU+RecoNanoPU_ProdLike_2021PU+MiniAODPU_ProdLike_2021PU+NanoPU_ProdLike_2021PU 
Cumulative: 546.4 -> 557.2
cms::CkfTrackCandidateMakerBase::produceBase: 11.7 -> 11.9
CAHitNtupletEDProducerT<CAHitQuadrupletGenerator>::produce: 7.9 -> 6.9

21034.21 TTbar_14TeV_TuneCP5_2026D88PU_GenSimHLBeamSpot14_ProdLike+DigiTriggerPU_ProdLike_2026D88PU+RecoGlobalPU_ProdLike_2026D88PU+MiniAODPU_ProdLike_2026D88PU 
Cumulative: 142.7 -> 139.1
cms::CkfTrackCandidateMakerBase::produceBase:  40.97 -> 40.38

fwyzard · 2023-02-02T12:55:37Z

The timing seems ok

What is the impact on the HLT timing, running on 2022 data ?

slava77 · 2023-02-02T13:22:48Z

@cms-sw/reconstruction-l2
the igprof results in the data workflow do not seem right
https://cmssdt.cern.ch/SDT/cgi-bin/igprof-navigator/CMSSW_13_0_X_2023-01-31-2300/el8_amd64_gcc11/profiling/136.889/PR-2deb6c/30324/igprofCPU_step3
this looks more like a harvesting step
Please check.

cmsbuild · 2023-02-02T13:53:55Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30351/summary.html
COMMIT: cf7c106
CMSSW: CMSSW_13_0_X_2023-02-01-1100/el8_amd64_gcc11
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8280/30351/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30351/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30351/git-merge-result

Comparison Summary

Summary:

You potentially added 16 lines to the logs
ROOTFileChecks: Some differences in event products or their sizes found
Reco comparison results: 1019 differences found in the comparisons
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 3555495
DQMHistoTests: Total failures: 21283
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3534190
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
Checked 211 log files, 162 edm output root files, 49 DQM output files
TriggerResults: found differences in 5 / 47 workflows

missirol · 2023-02-08T16:17:23Z

reconstruction-l2 please sign it if you are happy with the performance/profiling results

As mentioned in yesterday's ORP, HLT would like to have this PR merged in time for 13_0_0_pre4.

@cms-sw/reconstruction-l2 , do you have any further comments ?

(reco already signed in #8280 (comment), but that was before the latest round of tests.)

mandrenguyen · 2023-02-08T16:35:30Z

+reconstruction
Changes to reco timing look basically negligible in profiling

cmsbuild · 2023-02-08T16:35:55Z

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_13_0_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

smuzaffar · 2023-02-08T16:39:42Z

@perrotta, @rappoccio this is ready to go in , feel free to merge it for tonight's IB

smuzaffar · 2023-02-08T20:57:26Z

OK, I am merging it for tonights IB

perrotta · 2023-02-08T22:02:31Z

+1

Thank you @smuzaffar for having merged

silviodonato · 2023-02-09T08:41:29Z

@cms-sw/core-l2 @perrotta @rappoccio I think this PR caused the failure of the build in ARM (el8_aarch64_gcc11)

c++: error: unrecognized command-line option '-mrecip=none'

I hope this can be solved with something similar to 70f1969

smuzaffar · 2023-02-09T08:43:02Z

#8302 should fix the aarch64 issue.

silviodonato · 2023-03-03T13:14:28Z

For reference, @gparida run in 13_0_0_pre4 the comparison Intel vs AMD and he found no difference as expected:
https://docs.google.com/spreadsheets/d/1s2SZ-85s1-2_TcPYoU0cO-awApDJkj0eLSE0VzDAzU8/edit#gid=995113825

-Ofast implies -funsafe-math-optimizations, which affects the global FP state for all programs: when used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. -Ofast enables (directly or indirectly): - ❌ -fallow-store-data-races - ✔ -fno-semantic-interposition - ✔ -fno-math-errno - ❌ -funsafe-math-optimizations - ❌ -ffinite-math-only - ✔ -fno-rounding-math (_default_) - ✔ -fno-signaling-nans (_default_) - ❔ -fcx-limited-range - ✔ -fexcess-precision=fast - ✔ -fno-signed-zeros - ✔ -fno-trapping-math - ✔ -fassociative-math - ❌ -freciprocal-math (cms-sw#8280) We should not use -fallow-store-data-races, -ffinite-math-only. We should not use -funsafe-math-optimizations at link time, as it affects the global FP state. It's probably easier to replace it with the individual options -fno-signed-zeros, -fno-trapping-math and -fassociative-math. We may revisit -fcx-limited-range if we do use Complex arithmetics. -freciprocal-math was already disabled explicitly, see cms-sw#8280.

-Ofast implies -funsafe-math-optimizations, which affects the global FP state for all programs: when used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. -Ofast enables (directly or indirectly): - ❌ -fallow-store-data-races - ✔ -fno-semantic-interposition - ✔ -fno-math-errno - ❌ -funsafe-math-optimizations - ❌ -ffinite-math-only - ✔ -fno-rounding-math (_default_) - ✔ -fno-signaling-nans (_default_) - ❔ -fcx-limited-range - ✔ -fexcess-precision=fast - ✔ -fno-signed-zeros - ✔ -fno-trapping-math - ✔ -fassociative-math - ❌ -freciprocal-math (disabled in cms-sw#8280) We should not use -fallow-store-data-races, -ffinite-math-only. We should not use -funsafe-math-optimizations at link time, as it affects the global FP state. It's probably easier to replace it with the individual options -fno-signed-zeros, -fno-trapping-math and -fassociative-math. We may revisit -fcx-limited-range if we do use Complex arithmetics. -freciprocal-math was already disabled explicitly, see cms-sw#8280.

add -fno-reciprocal-math -mno-recip to Ofast to solve the AMD/Intel d…

cf7c106

…ifferences

cmsbuild added tests-pending externals-pending pending-signatures orp-pending labels Feb 1, 2023

cmsbuild added tests-started and removed tests-pending labels Feb 1, 2023

silviodonato mentioned this pull request Feb 1, 2023

Removal of ofast-flag (AMD/Intel differences) cms-sw/cmssw#40649

Closed

cmsbuild added tests-approved and removed tests-started labels Feb 1, 2023

cmsbuild added tests-started and removed tests-approved labels Feb 2, 2023

cmsbuild added hlt-approved and removed hlt-pending labels Feb 8, 2023

cmsbuild added reconstruction-approved fully-signed and removed reconstruction-pending pending-signatures labels Feb 8, 2023

smuzaffar merged commit 1be93a5 into cms-sw:IB/CMSSW_13_0_X/master Feb 8, 2023

cmsbuild added orp-approved and removed orp-pending labels Feb 8, 2023

cmsbuild mentioned this pull request Feb 9, 2023

CUDA, ROCm and Alpaka-related updates [13.0.x] cms-sw/cmssw#40725

Merged

cmsbuild mentioned this pull request Feb 9, 2023

Run3-sim138 Changes which will allow the usage of new shower library (the cfi file in Geometry/HcalSimData will be modified once the new library appears) cms-sw/cmssw#40703

Merged

mandrenguyen mentioned this pull request Mar 23, 2023

-Ofast and CPU reco timing cms-sw/cmssw#40089

Closed

srlantz mentioned this pull request Mar 31, 2023

mkFit: standalone build fixes 2 trackreco/cmssw#122

Merged

srlantz mentioned this pull request Apr 7, 2023

Make standalone compiler options match what CMSSW is doing for -ffast-math or similar trackreco/cmssw#123

Closed

srlantz mentioned this pull request Apr 14, 2023

Use ofast-flag rather than fast-math for mkFit trackreco/cmssw#124

Merged

mmasciov mentioned this pull request Apr 19, 2023

Use ofast-flag rather than fast-math for mkFit cms-sw/cmssw#41374

Merged

makortel mentioned this pull request May 24, 2023

re-evaluate the optimisation flags for liblzma.so cms-sw/cmssw#41773

Closed

slava77 mentioned this pull request Mar 26, 2024

HLT Validation of x86-64-v3 cms-sw/cmssw#44550

Closed

fwyzard mentioned this pull request Jan 11, 2026

Do not use -Ofast #10267

Open

Add -fno-reciprocal-math -mrecip=none to Ofast to solve the AMD/Intel differences #8280

Add -fno-reciprocal-math -mrecip=none to Ofast to solve the AMD/Intel differences #8280

Uh oh!

Conversation

silviodonato commented Feb 1, 2023 • edited by smuzaffar Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cmsbuild commented Feb 1, 2023

Uh oh!

silviodonato commented Feb 1, 2023

Uh oh!

iarspider commented Feb 1, 2023

Uh oh!

iarspider commented Feb 1, 2023

Uh oh!

iarspider commented Feb 1, 2023

Uh oh!

silviodonato commented Feb 1, 2023

Uh oh!

Martin-Grunewald commented Feb 1, 2023

Uh oh!

gparida commented Feb 1, 2023

Uh oh!

silviodonato commented Feb 1, 2023

Uh oh!

silviodonato commented Feb 1, 2023

Uh oh!

smuzaffar commented Feb 1, 2023

Uh oh!

cmsbuild commented Feb 1, 2023

Comparison Summary

Uh oh!

silviodonato commented Feb 2, 2023

Uh oh!

smuzaffar commented Feb 2, 2023

Uh oh!

silviodonato commented Feb 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fwyzard commented Feb 2, 2023

Uh oh!

slava77 commented Feb 2, 2023

Uh oh!

cmsbuild commented Feb 2, 2023

Comparison Summary

Uh oh!

missirol commented Feb 8, 2023

Uh oh!

mandrenguyen commented Feb 8, 2023

Uh oh!

cmsbuild commented Feb 8, 2023

Uh oh!

smuzaffar commented Feb 8, 2023

Uh oh!

smuzaffar commented Feb 8, 2023

Uh oh!

perrotta commented Feb 8, 2023

Uh oh!

silviodonato commented Feb 9, 2023

Uh oh!

smuzaffar commented Feb 9, 2023

Uh oh!

silviodonato commented Mar 3, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

14 participants

silviodonato commented Feb 1, 2023 •

edited by smuzaffar

Loading

silviodonato commented Feb 2, 2023 •

edited

Loading