@dan131riley

PR description:

UBSAN jobs are failing in recent IBs with the runtime error "execution reached an unreachable program point". This is mostly seen in NANOAOD production jobs that read run products at BeginRun, but it also appears in the framework TestFWCoreFrameworkGlobalStreamOne unit test. See #49151 for details.

This PR disables UBSAN checks for the problematic routines. While "unreachable program point" usually indicates undefined behavior that the optimizer is exploiting, no UB has been found. Selectively disabling individual UBSAN checks also failed to work around the problem, so this PR disables UBSAN completely for the routines in question.
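For readers unfamiliar with per-function sanitizer control, here is a minimal sketch of how UBSAN instrumentation can be switched off for a single routine with GCC or Clang. The macro name `EXAMPLE_NO_UBSAN` and the struct `ExampleRunCacheHolder` are hypothetical and are not taken from this PR; the actual spelling used in FWCore/Framework is not shown in this conversation.

```cpp
#include <cassert>

// Hypothetical macro name, chosen for this sketch only.
// GCC and Clang both accept no_sanitize("undefined") on a function.
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
#define EXAMPLE_NO_UBSAN
#endif

// Hypothetical stand-in for a module that fills a per-run cache.
struct ExampleRunCacheHolder {
  int cachedRun = 0;

  // No UBSAN instrumentation is emitted for this function body;
  // the rest of the translation unit stays instrumented.
  EXAMPLE_NO_UBSAN int doBeginRun(int runNumber) {
    cachedRun = runNumber;
    return cachedRun;
  }
};

// Small usage check: behavior is unchanged, only the instrumentation differs.
inline void exampleUsage() {
  ExampleRunCacheHolder holder;
  assert(holder.doBeginRun(42) == 42);
}
```

The attribute affects only the annotated function, which is why it has to be repeated for every routine that shows the problem.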

Resolves #49151
Resolves cms-sw/framework-team#1610

PR validation:

Compiles, verified to fix the TestFWCoreFrameworkGlobalStreamOne unit test. Purely technical fix.

@cmsbuild
Contributor

cmsbuild commented Nov 3, 2025

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented Nov 3, 2025

A new Pull Request was created by @dan131riley for master.

It involves the following packages:

  • FWCore/Framework (core)

@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks.
@makortel, @wddgit this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@makortel
Contributor

makortel commented Nov 4, 2025

@cmsbuild, please test

@makortel
Contributor

makortel commented Nov 4, 2025

@cmsbuild, please test for CMSSW_16_0_UBSAN_X

@makortel
Contributor

makortel commented Nov 4, 2025

Visually looks ok to me. @smuzaffar What do you think?

@cmsbuild
Contributor

cmsbuild commented Nov 4, 2025

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4e6daf/49239/summary.html
COMMIT: fb59675
CMSSW: CMSSW_16_0_X_2025-11-04-1100/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49301/49239/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3939953
  • DQMHistoTests: Total failures: 50
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3939883
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 218 log files, 188 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Contributor

cmsbuild commented Nov 4, 2025

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4e6daf/49240/summary.html
COMMIT: fb59675
CMSSW: CMSSW_16_0_UBSAN_X_2025-11-03-2300/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49301/49240/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed Unit Tests

I found 1 errors in the following unit tests:

---> test TestFWCoreFrameworkGlobalStreamOne had ERRORS

@makortel
Contributor

makortel commented Nov 4, 2025

---> test TestFWCoreFrameworkGlobalStreamOne had ERRORS

The test failed with

[non-printable bytes]:1082210368:5331: runtime error: execution reached an unreachable program point
    #0 0x14d33fcbd03e in virtual thunk to edm::limited::impl::RunCacheHolder<edm::limited::EDFilterBase, edmtest::limited::(anonymous namespace)::Cache>::doBeginRun_(edm::Run const&, edm::EventSetup const&) (/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_16_0_UBSAN_X_2025-11-03-2300/lib/el8_amd64_gcc13/pluginTestLimitedModules.so+0x2bd03e) (BuildId: 0720788e68a71a3f67da93fa0ee69f8347df9dcc)
    #1 0x14d375c35242 in edm::limited::EDFilterBase::doBeginRun(edm::RunTransitionInfo const&, edm::ModuleCallingContext const*) src/FWCore/Framework/src/limited/EDFilterBase.cc:139
    #2 0x14d375b674cf in edm::WorkerT<edm::limited::EDFilterBase>::implDoBegin(edm::RunTransitionInfo const&, edm::ModuleCallingContext const*) src/FWCore/Framework/src/WorkerT.cc:464
    #3 0x14d3751e020e in edm::workerhelper::CallImpl<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::call(edm::Worker*, edm::StreamID, edm::RunTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::GlobalContext const*) src/FWCore/Framework/interface/maker/Worker.h:638
    #4 0x14d3751e020e in edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}::operator()() const src/FWCore/Framework/interface/maker/Worker.h:1183
    #5 0x14d3751f30d1 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}) /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/cms/cmssw/CMSSW_16_0_UBSAN_X_2025-11-03-2300/src/FWCore/Utilities/interface/ConvertException.h:21
    #6 0x14d3751f354b in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*) src/FWCore/Framework/interface/maker/Worker.h:1182
    #7 0x14d3751f354b in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*) src/FWCore/Framework/interface/maker/Worker.h:1096
    #8 0x14d3751f425a in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}::operator()() const src/FWCore/Framework/interface/maker/Worker.h:444
    #9 0x14d3751f425a in edm::LimitedTaskQueue::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(tbb::detail::d2::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&)::{lambda()#1}::operator()() /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/cms/cmssw/CMSSW_16_0_UBSAN_X_2025-11-03-2300/src/FWCore/Concurrency/interface/LimitedTaskQueue.h:121
    #10 0x14d3751f425a in edm::SerialTaskQueue::QueuedTask<edm::LimitedTaskQueue::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(tbb::detail::d2::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/cms/cmssw/CMSSW_16_0_UBSAN_X_2025-11-03-2300/src/FWCore/Concurrency/interface/SerialTaskQueue.h:177
    #11 0x14d373957b3b in operator() src/FWCore/Concurrency/src/SerialTaskQueue.cc:46
    #12 0x14d373957b3b in task_ptr_or_nullptr_impl<const edm::SerialTaskQueue::spawn(TaskBase&)::<lambda()>&> /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc13/external/tbb/v2022.3.0-be320b1b82e025fbed5ae22f47d43c2d/include/oneapi/tbb/task_group.h:149
    #13 0x14d373957b3b in task_ptr_or_nullptr<const edm::SerialTaskQueue::spawn(TaskBase&)::<lambda()>&> /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc13/external/tbb/v2022.3.0-be320b1b82e025fbed5ae22f47d43c2d/include/oneapi/tbb/task_group.h:159
    #14 0x14d373957b3b in execute /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc13/external/tbb/v2022.3.0-be320b1b82e025fbed5ae22f47d43c2d/include/oneapi/tbb/task_group.h:101
    #15 0x14d3774b62f2 in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) src/tbb/task_dispatcher.h:344
    #16 0x14d3774b62f2 in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) src/tbb/task_dispatcher.h:487
    #17 0x14d3774b62f2 in tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) src/tbb/task_dispatcher.cpp:169
    #18 0x14d375111655 in tbb::detail::d1::wait(tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/external/tbb/v2022.3.0-be320b1b82e025fbed5ae22f47d43c2d/include/oneapi/tbb/detail/_task.h:242
    #19 0x14d375111655 in tbb::detail::d2::task_group_base::wait()::{lambda()#1}::operator()() const /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/external/tbb/v2022.3.0-be320b1b82e025fbed5ae22f47d43c2d/include/oneapi/tbb/task_group.h:599
    #20 0x14d375111655 in void tbb::detail::d0::try_call_proxy<tbb::detail::d2::task_group_base::wait()::{lambda()#1}>::on_completion<tbb::detail::d2::task_group_base::wait()::{lambda()#2}>(tbb::detail::d2::task_group_base::wait()::{lambda()#2}) /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/external/tbb/v2022.3.0-be320b1b82e025fbed5ae22f47d43c2d/include/oneapi/tbb/detail/_template_helpers.h:234
    #21 0x14d375111655 in tbb::detail::d2::task_group_base::wait() /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/external/tbb/v2022.3.0-be320b1b82e025fbed5ae22f47d43c2d/include/oneapi/tbb/task_group.h:600
    #22 0x14d375111655 in edm::FinalWaitingTask::wait() /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/cms/cmssw/CMSSW_16_0_UBSAN_X_2025-11-03-2300/src/FWCore/Concurrency/interface/FinalWaitingTask.h:42
    #23 0x14d37505942c in edm::EventProcessor::processRuns() src/FWCore/Framework/src/EventProcessor.cc:1195
    #24 0x14d3750a4596 in processRuns src/FWCore/Framework/src/TransitionProcessors.icc:84
    #25 0x14d3750a4596 in processFiles src/FWCore/Framework/src/TransitionProcessors.icc:115
    #26 0x14d3750a4596 in operator() src/FWCore/Framework/src/EventProcessor.cc:949
    #27 0x14d3750a4596 in wrap<edm::EventProcessor::runToCompletion()::<lambda()> > /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/cms/cmssw/CMSSW_16_0_UBSAN_X_2025-11-03-2300/src/FWCore/Utilities/interface/ConvertException.h:21
    #28 0x14d3750a4596 in edm::EventProcessor::runToCompletion() src/FWCore/Framework/src/EventProcessor.cc:938
    #29 0x40a009 in operator() src/FWCore/Framework/bin/cmsRun.cpp:281
    #30 0x40a009 in operator() /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/external/tbb/v2022.3.0-be320b1b82e025fbed5ae22f47d43c2d/include/oneapi/tbb/task_arena.h:71
    #31 0x14d3774a58c1 in tbb::detail::r1::task_arena_impl::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) src/tbb/arena.cpp:860
    #32 0x40d463 in execute_impl<void, main(int, char const**)::<lambda()>::<lambda()> > /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/external/tbb/v2022.3.0-be320b1b82e025fbed5ae22f47d43c2d/include/oneapi/tbb/task_arena.h:304
    #33 0x40d463 in execute<main(int, char const**)::<lambda()>::<lambda()> > /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/external/tbb/v2022.3.0-be320b1b82e025fbed5ae22f47d43c2d/include/oneapi/tbb/task_arena.h:527
    #34 0x40d463 in operator() src/FWCore/Framework/bin/cmsRun.cpp:264
    #35 0x407f84 in wrap<main(int, char const**)::<lambda()> > /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02914/el8_amd64_gcc13/cms/cmssw/CMSSW_16_0_UBSAN_X_2025-11-03-2300/src/FWCore/Utilities/interface/ConvertException.h:21
    #36 0x407f84 in main src/FWCore/Framework/bin/cmsRun.cpp:104
    #37 0x14d3708637e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: 9846edf82646848f2857c47c5a2eb71c288059ec)
    #38 0x40805d in _start (/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_16_0_UBSAN_X_2025-11-03-2300/bin/el8_amd64_gcc13/cmsRun+0x40805d) (BuildId: f6112ffdc7c6766143da9e5135fb873567327930)

@dan131riley
Author

Interesting. I only did the global versions, not limited (or any others). It doesn't replicate on my development system, but does on lxplus. It seems to be somewhat probabilistic; I think it depends on the alignment of the this pointer after it has been offset in the thunk for the virtual inheritance.
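As background for the thunk mentioned above, here is a minimal sketch of how virtual inheritance leads to a this-pointer adjustment when a virtual function is called through the virtual base. All class names are hypothetical stand-ins, not the framework's classes, and the sketch does not reproduce the UBSAN failure. It assumes an Itanium C++ ABI compiler such as GCC, where the call through the base pointer typically goes through a virtual thunk that offsets this before entering the overrider.

```cpp
#include <cassert>

// Hypothetical stand-in for a framework base class.
struct ExampleModuleBase {
  virtual ~ExampleModuleBase() = default;
  virtual int doBeginRun() = 0;
  long baseData = 0;
};

// Hypothetical cache holder that inherits the base virtually
// and provides the final overrider.
struct ExampleCacheHolder : virtual ExampleModuleBase {
  int cache = 7;
  int doBeginRun() override { return cache; }
};

// A second virtual path to the same base, so the shared base
// subobject cannot sit at the start of ExampleCacheHolder.
struct ExampleMixin : virtual ExampleModuleBase {
  long mixinData = 0;
};

struct ExampleModule : ExampleMixin, ExampleCacheHolder {};

// Calls the overrider through the virtual base. The compiler must
// convert the base pointer back to an ExampleCacheHolder pointer
// at run time, which is the adjustment a virtual thunk performs.
inline int callThroughBase(ExampleModuleBase& base) { return base.doBeginRun(); }

inline void exampleThunkUsage() {
  ExampleModule module;
  assert(callThroughBase(module) == 7);
}
```

Whether the adjusted pointer's alignment actually plays a role in the reported error is the author's hypothesis; the sketch only shows where such an adjustment happens.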

@cmsbuild
Contributor

cmsbuild commented Nov 5, 2025

Pull request #49301 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

Successfully merging this pull request may close these issues:

  • UBSAN IBs failing with "unreachable program point"
  • Address UBSAN runtime errors in doBeginRunSummary_
