
[TBB] Fix concurrent_[bounded_]queue correctness on weak memory models [13.0.x] #8358

Merged
smuzaffar merged 1 commit into cms-sw:IB/CMSSW_13_0_X/master from fwyzard:IB/CMSSW_13_0_X/master_fix_TBB_weak_memory
Mar 28, 2023

Conversation

@fwyzard
Contributor

@fwyzard fwyzard commented Mar 4, 2023

Applied the oneapi-src/oneTBB#782 patch on top of v2021.8.0.
This should fix the testFWCoreUtilities failure in the ARM IBs.

Co-authored-by: Ivan Razumov <ivan.razumov@cern.ch>
Co-authored-by: Andrea Bocci <andrea.bocci@cern.ch>
@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

type bugfix

@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

backport #8355

@cmsbuild
Contributor

cmsbuild commented Mar 4, 2023

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_13_0_X/master.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.
cms-bot commands are listed here

@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

please test

@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

please test for el8_ppc64le_gcc11

@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

please test for el8_aarch64_gcc11

@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

urgent

@perrotta, @rappoccio, as this is a bug fix, can we have it in 13.0.0 ?

@cmsbuild
Contributor

cmsbuild commented Mar 4, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31070/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-02-2300/el8_aarch64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8358/31070/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31070/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31070/git-merge-result

@cmsbuild
Contributor

cmsbuild commented Mar 4, 2023

-1

Failed Tests: UnitTests RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31071/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-02-2300/el8_ppc64le_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8358/31071/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31071/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31071/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test testONNXRuntime had ERRORS
---> test testFWCoreConcurrency had ERRORS

RelVals

  • 11634.0: 11634.0_TTbar_14TeV+2021/step1_TTbar_14TeV+2021.log
  • 11634.7: 11634.7_TTbar_14TeV+2021_trackingMkFit/step1_TTbar_14TeV+2021_trackingMkFit.log
  • 11634.911: 11634.911_TTbar_14TeV+2021_DD4hep/step1_TTbar_14TeV+2021_DD4hep.log

@cmsbuild
Contributor

cmsbuild commented Mar 5, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31069/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-04-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8358/31069/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3557934
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3557909
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Mar 7, 2023

please test for el8_ppc64le_gcc11

@cmsbuild
Contributor

cmsbuild commented Mar 8, 2023

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31134/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-05-2300/el8_ppc64le_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8358/31134/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test testONNXRuntime had ERRORS
---> test testFWCoreConcurrency had ERRORS

@smuzaffar
Contributor

please test

@smuzaffar
Contributor

please test for el8_aarch64_gcc11

@perrotta
Contributor

@fwyzard I started the build of CMSSW_13_0_1 this morning and forgot to test this PR and, if successful, include it.
I could stop the build if needed. However, as far as I understand, this fix is not strictly necessary and could go into a forthcoming 13_0_X release instead (expected soon, because PPS also needs to include some updates for data taking). If so, I would rather not delay the ongoing build of 13_0_1, requested by HLT and L1T.
Please let us know.

@fwyzard
Contributor Author

fwyzard commented Mar 23, 2023

Hi @perrotta,
you can go ahead with the ongoing build.

@fwyzard
Contributor Author

fwyzard commented Mar 23, 2023

The long version is: this PR fixes a problem in TBB queues on ARM and Power - but we've lived with it for years now, so empirically it should not break anything.

Given how TBB queues are used in the framework, it may lead to a non-optimal reuse of resources, but it should not introduce any incorrect behaviour.

While there may be other code that benefits from the optimal behaviour, a concurrent_queue cannot really guarantee that it is empty (another thread could push to it right after the check was made), so nothing should actually rely on it.

So, the fix is good to have, but not critical.
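
To illustrate the point about not relying on an emptiness check, here is a minimal sketch. `ToyQueue` is a hypothetical mutex-based stand-in (not TBB code) that mirrors only the `push`, `empty` and `try_pop` calls of `tbb::concurrent_queue`; `produce_and_drain` is likewise an illustrative name. The consumer acts on the result of `try_pop` rather than on a separate `empty()` check, since the answer from `empty()` can be stale by the time it is used.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical stand-in for tbb::concurrent_queue<int>: a mutex-protected
// deque exposing only the three calls used below. Not the TBB implementation.
class ToyQueue {
public:
  void push(int value) {
    std::lock_guard<std::mutex> lock(mutex_);
    items_.push_back(value);
  }

  // Only a snapshot: another thread may push right after this returns true.
  bool empty() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return items_.empty();
  }

  // Checks and removes in one step, so the result cannot go stale in between.
  bool try_pop(int& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) return false;
    out = items_.front();
    items_.pop_front();
    return true;
  }

private:
  mutable std::mutex mutex_;
  std::deque<int> items_;
};

// A producer thread pushes n items while this thread drains them.
// Returns the number of items consumed.
std::size_t produce_and_drain(std::size_t n) {
  ToyQueue queue;
  std::thread producer([&queue, n] {
    for (std::size_t i = 0; i < n; ++i) queue.push(static_cast<int>(i));
  });

  std::size_t consumed = 0;
  int value = 0;
  while (consumed < n) {
    if (queue.try_pop(value)) {
      ++consumed;                  // act on what was actually popped
    } else {
      std::this_thread::yield();   // a failed try_pop is not "finished"
    }
  }

  producer.join();
  // Meaningful only here, because no other thread can push any more.
  assert(queue.empty());
  return consumed;
}
```

The loop ends on the count of consumed items, not on an emptiness check, so a momentarily empty queue (or a spuriously wrong emptiness result) costs at most an extra iteration rather than a lost item.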

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31539/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-21-2300/el8_aarch64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8358/31539/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31539/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31539/git-merge-result

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31541/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-22-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8358/31541/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 8 lines from the logs
  • Reco comparison results: 6 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3552993
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3552968
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_13_0_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@smuzaffar
Contributor

merging it for next 13.0.X IB so that it can be part of next 13.0.2 release

@smuzaffar smuzaffar merged commit eb7f67d into cms-sw:IB/CMSSW_13_0_X/master Mar 28, 2023
@fwyzard fwyzard deleted the IB/CMSSW_13_0_X/master_fix_TBB_weak_memory branch February 2, 2025 11:43

4 participants