Skip to content

Conversation

@silviodonato
Copy link
Contributor

@silviodonato silviodonato commented Feb 1, 2023

About one year ago we discovered differences of few percents between AMD and Intel in few HLT paths when rerunning the HLT [1]. The situation got worse when we added the parking di-electron paths with dZ path, which gave a difference of ~8% [2].

After several failed attempts, @gparida managed to solve the differences following @VinInn suggestion [3] to use -Ofast -fno-reciprocal-math -mno-recip.

@gparida replaced <use name="ofast-flag"/> with <flags CXXFLAGS="-Ofast -fno-reciprocal-math -mrecip=none"/> in all BuildFiles.xml (except test folders to save compilation time)

"RecoEgamma/EgammaElectronAlgos/BuildFile.xml"
"RecoEgamma/EgammaPhotonAlgos/BuildFile.xml"
"RecoPixelVertexing/PixelTriplets/BuildFile.xml"
"RecoPixelVertexing/PixelTriplets/plugins/BuildFile.xml"
"RecoTracker/FinalTrackSelectors/BuildFile.xml"
"RecoTracker/FinalTrackSelectors/plugins/BuildFile.xml"
"RecoTracker/TkDetLayers/BuildFile.xml"
"RecoTracker/TkHitPairs/BuildFile.xml"
"RecoVertex/PrimaryVertexProducer/BuildFile.xml"
"TrackingTools/GsfTools/BuildFile.xml"
"TrackingTools/GsfTools/plugins/BuildFile.xml"
"TrackingTools/GsfTracking/BuildFile.xml"
"TrackingTools/KalmanUpdators/BuildFile.xml"
"TrackingTools/MaterialEffects/BuildFile.xml"
"TrackingTools/MaterialEffects/plugins/BuildFile.xml"
"TrackingTools/TrackFitters/BuildFile.xml"
"TrackingTools/TrajectoryState/BuildFile.xml"

and found no AMD/Intel differences @gparida please share your results here (which release did you use?).

So I made this PR to implement this change directly at cmsdist level
(but I didn't test it compiling the externals).

I don't know exactly the impact on timing, but I assume it will be much smaller than 1%, that was the impact of the full removal of Ofast [4]

[1] https://its.cern.ch/jira/browse/CMSHLT-2257
[2] https://docs.google.com/spreadsheets/d/1XRu1sfIYJUQ0e7Z_yJWdGrxTDA31zr33uWPFY54-APc
[3] https://its.cern.ch/jira/browse/CMSHLT-2257?focusedCommentId=4686018&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-4686018
[4] cms-sw/cmssw#40089

cc @cms-sw/hlt-l2 @cms-sw/reconstruction-l2 @gparida @sanuvarghese @cms-sw/core-l2

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 1, 2023

A new Pull Request was created by @silviodonato (Silvio Donato) for branch IB/CMSSW_13_0_X/master.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.
cms-bot commands are listed here

@silviodonato
Copy link
Contributor Author

I suggest to run test using "enable profiling" (I think I don't have anymore the rights to start tests)

@iarspider
Copy link
Contributor

test parameters:

  • enable profiling

@iarspider
Copy link
Contributor

test parameters:

  • enable = profiling

@iarspider
Copy link
Contributor

please test

@silviodonato
Copy link
Contributor Author

This PR replaces cms-sw/cmssw#40649

@Martin-Grunewald
Copy link

Is this making the change globally?
And will then be overwritten by the (different) Ofast setting still in the local package buildfiles?

@gparida
Copy link

gparida commented Feb 1, 2023

Hi everyone, the test was performed using CMSSW_12_4_11 with the menu generated from
hltGetConfiguration /dev/CMSSW_12_4_0/GRun/V188 --full --offline --no-output --data --process MYHLT --type GRun --unprescale --globaltag 124X_dataRun3_HLT_v7 --max-events -1 - running on Run3 files.

@silviodonato
Copy link
Contributor Author

cc @goys

@silviodonato
Copy link
Contributor Author

@Martin-Grunewald , yes, afaik this change should affect all modules, but I let people from @cms-sw/core-l2 to comment more on this.

These are the PRs which moved all packages from <flags CXXFLAGS="-Ofast"/> to <use name="ofast-flag"/>
cmssw: cms-sw/cmssw#24969
cmsdist: #4438

@smuzaffar
Copy link
Contributor

@silviodonato @Martin-Grunewald , change will only apply to those packages which directly have <use name="ofast-flag"/> . There should not be any explicit <flags CXXFLAGS="-Ofast"/> in cmssw/package/Buidlfile.xml but if there are any then those packages will just have a union of flags from ofast-flag tool and their local flags.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 1, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30324/summary.html
COMMIT: cf7c106
CMSSW: CMSSW_13_0_X_2023-01-31-2300/el8_amd64_gcc11
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8280/30324/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 55 lines to the logs
  • Reco comparison results: 7354 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555495
  • DQMHistoTests: Total failures: 36278
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3519195
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: found differences in 2 / 47 workflows

@silviodonato
Copy link
Contributor Author

I'm surprised to see so many differences even at simulation level in (eg. 13234.0 and 13434.0, see https://tinyurl.com/27gnfjt9 ), do you think they are just statistical fluctuation of the PU mixing?
The differences in workflows without PU is minimal and related to mainly to tracking, as expected (see 140.56 https://tinyurl.com/2bbnbm53)

@smuzaffar
Copy link
Contributor

please test

lets rerun the tests based on latest IB. There were a lot of other changes which were accumulated for last tests

@silviodonato
Copy link
Contributor Author

silviodonato commented Feb 2, 2023

The timing seems ok, but I let @cms-sw/reconstruction-l2 comment more on this

11834.21 TTbar_14TeV_TuneCP5_2021PU_GenSim+DigiPU_ProdLike_2021PU+RecoNanoPU_ProdLike_2021PU+MiniAODPU_ProdLike_2021PU+NanoPU_ProdLike_2021PU 
Cumulative: 546.4 -> 557.2
cms::CkfTrackCandidateMakerBase::produceBase: 11.7 -> 11.9
CAHitNtupletEDProducerT<CAHitQuadrupletGenerator>::produce: 7.9 -> 6.9
21034.21 TTbar_14TeV_TuneCP5_2026D88PU_GenSimHLBeamSpot14_ProdLike+DigiTriggerPU_ProdLike_2026D88PU+RecoGlobalPU_ProdLike_2026D88PU+MiniAODPU_ProdLike_2026D88PU 
Cumulative: 142.7 -> 139.1
cms::CkfTrackCandidateMakerBase::produceBase:  40.97 -> 40.38

@fwyzard
Copy link
Contributor

fwyzard commented Feb 2, 2023

The timing seems ok

What is the impact on the HLT timing, running on 2022 data ?

@slava77
Copy link
Contributor

slava77 commented Feb 2, 2023

@cms-sw/reconstruction-l2
the igprof results in the data workflow do not seem right
https://cmssdt.cern.ch/SDT/cgi-bin/igprof-navigator/CMSSW_13_0_X_2023-01-31-2300/el8_amd64_gcc11/profiling/136.889/PR-2deb6c/30324/igprofCPU_step3
this looks more like a harvesting step
Please check.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 2, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30351/summary.html
COMMIT: cf7c106
CMSSW: CMSSW_13_0_X_2023-02-01-1100/el8_amd64_gcc11
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8280/30351/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30351/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30351/git-merge-result

Comparison Summary

Summary:

  • You potentially added 16 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 1019 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555495
  • DQMHistoTests: Total failures: 21283
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3534190
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: found differences in 5 / 47 workflows

@missirol
Copy link
Contributor

missirol commented Feb 8, 2023

reconstruction-l2 please sign it if you are happy with the performance/profiling results

As mentioned in yesterday's ORP, HLT would like to have this PR merged in time for 13_0_0_pre4.

@cms-sw/reconstruction-l2 , do you have any further comments ?

(reco already signed in #8280 (comment), but that was before the latest round of tests.)

@mandrenguyen
Copy link

+reconstruction
Changes to reco timing look basically negligible in profiling

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 8, 2023

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_13_0_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@smuzaffar
Copy link
Contributor

@perrotta, @rappoccio this is ready to go in , feel free to merge it for tonight's IB

@smuzaffar
Copy link
Contributor

OK, I am merging it for tonights IB

@smuzaffar smuzaffar merged commit 1be93a5 into cms-sw:IB/CMSSW_13_0_X/master Feb 8, 2023
@perrotta
Copy link
Contributor

perrotta commented Feb 8, 2023

+1

@silviodonato
Copy link
Contributor Author

@cms-sw/core-l2 @perrotta @rappoccio I think this PR caused the failure of the build in ARM (el8_aarch64_gcc11)

c++: error: unrecognized command-line option '-mrecip=none'

I hope this can be solved with something similar to 70f1969

@smuzaffar
Copy link
Contributor

#8302 should fix the aarch64 issue.

@silviodonato
Copy link
Contributor Author

For reference, @gparida run in 13_0_0_pre4 the comparison Intel vs AMD and he found no difference as expected:
https://docs.google.com/spreadsheets/d/1s2SZ-85s1-2_TcPYoU0cO-awApDJkj0eLSE0VzDAzU8/edit#gid=995113825

fwyzard added a commit to fwyzard/cmsdist that referenced this pull request Jan 11, 2026
-Ofast implies -funsafe-math-optimizations, which affects the global FP state
for all programs: when used at link time, it may include libraries or startup
files that change the default FPU control word or other similar optimizations.

-Ofast enables (directly or indirectly):
  - ❌ -fallow-store-data-races
  - ✔ -fno-semantic-interposition
  - ✔ -fno-math-errno
  - ❌ -funsafe-math-optimizations
  - ❌ -ffinite-math-only
  - ✔ -fno-rounding-math (_default_)
  - ✔ -fno-signaling-nans (_default_)
  - ❔ -fcx-limited-range
  - ✔ -fexcess-precision=fast
  - ✔ -fno-signed-zeros
  - ✔ -fno-trapping-math
  - ✔ -fassociative-math
  - ❌ -freciprocal-math (cms-sw#8280)

We should not use -fallow-store-data-races, -ffinite-math-only.

We should not use -funsafe-math-optimizations at link time, as it affects the
global FP state. It's probably easier to replace it with the individual options
-fno-signed-zeros, -fno-trapping-math and -fassociative-math.

We may revisit -fcx-limited-range if we do use Complex arithmetics.

-freciprocal-math was already disabled explicitly, see cms-sw#8280.
@fwyzard fwyzard mentioned this pull request Jan 11, 2026
fwyzard added a commit to fwyzard/cmsdist that referenced this pull request Jan 13, 2026
-Ofast implies -funsafe-math-optimizations, which affects the global FP state
for all programs: when used at link time, it may include libraries or startup
files that change the default FPU control word or other similar optimizations.

-Ofast enables (directly or indirectly):
  - ❌ -fallow-store-data-races
  - ✔ -fno-semantic-interposition
  - ✔ -fno-math-errno
  - ❌ -funsafe-math-optimizations
  - ❌ -ffinite-math-only
  - ✔ -fno-rounding-math (_default_)
  - ✔ -fno-signaling-nans (_default_)
  - ❔ -fcx-limited-range
  - ✔ -fexcess-precision=fast
  - ✔ -fno-signed-zeros
  - ✔ -fno-trapping-math
  - ✔ -fassociative-math
  - ❌ -freciprocal-math (disabled in cms-sw#8280)

We should not use -fallow-store-data-races, -ffinite-math-only.

We should not use -funsafe-math-optimizations at link time, as it affects the
global FP state. It's probably easier to replace it with the individual options
-fno-signed-zeros, -fno-trapping-math and -fassociative-math.

We may revisit -fcx-limited-range if we do use Complex arithmetics.

-freciprocal-math was already disabled explicitly, see cms-sw#8280.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.