Skip to content

segfaults in PluginFactoryBase::findPMaker() #32344

@dan131riley

Description

@dan131riley

We've been getting occasional segfaults in PluginFactoryBase::findPMaker(), including a recent one in the CLANG IB which seems especially informative:

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc900/CMSSW_11_3_CLANG_X_2020-11-29-2300/pyRelValMatrixLogs/run/23434.99_TTbar_14TeV+2026D49PU_PMXS1S2+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU/step2_TTbar_14TeV+2026D49PU_PMXS1S2+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU.log#/

Looking at the stack trace I'm wondering if we've got a race window before PluginMakerInfo.m_ptr gets zeroed by the constructor. Should we be using the zero_allocator recommended here:

https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Advanced_Idiom_Waiting_on_an_Element.html

Also, I have doubts about the correctness of this comment:

//NOTE: this has to be last since once it is non zero it signals
// that the construction has finished
std::atomic<void*> m_ptr;

The constructor initializer lists zero m_ptr, so we want that first, then setting it to non-zero in the body of the constructor is last. However, that would still leave a window open before the constructor is even called, which is why I think we need the zeroing allocator so that it's zeroed before being put into the vector.

Stack trace:

Thread 11 (Thread 0x2b978a800700 (LWP 7164)):
#3  0x00002b9756a825af in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002b974e402600 in edmplugin::PluginFactoryBase::findPMaker(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#6  0x00002b977bc9b907 in AttachSD::create(edm::EventSetup const&, SensitiveDetectorCatalog const&, edm::ParameterSet const&, SimTrackManager const*, SimActivityRegistry&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreSensitiveDetector.so
#7  0x00002b977bc77673 in RunManagerMTWorker::initializeG4(RunManagerMT*, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreApplication.so
#8  0x00002b977bc25266 in OscarMTProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginSimG4CoreApplicationPlugins.so

Thread 10 (Thread 0x2b9789801700 (LWP 7163)):
#2  0x00002b9756a82260 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b97506cc54d in __lll_lock_wait () from /lib64/libpthread.so.0
#5  0x00002b97506c7eb6 in _L_lock_941 () from /lib64/libpthread.so.0
#6  0x00002b97506c7daf in pthread_mutex_lock () from /lib64/libpthread.so.0
#7  0x00002b974e4062b8 in edmplugin::PluginManager::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#8  0x00002b974e402596 in edmplugin::PluginFactoryBase::findPMaker(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#9  0x00002b977bc9b907 in AttachSD::create(edm::EventSetup const&, SensitiveDetectorCatalog const&, edm::ParameterSet const&, SimTrackManager const*, SimActivityRegistry&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreSensitiveDetector.so
#10 0x00002b977bc77673 in RunManagerMTWorker::initializeG4(RunManagerMT*, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreApplication.so
#11 0x00002b977bc25266 in OscarMTProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginSimG4CoreApplicationPlugins.so

Thread 9 (Thread 0x2b9788e00700 (LWP 7162)):
#2  0x00002b9756a82260 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b97506cc54d in __lll_lock_wait () from /lib64/libpthread.so.0
#5  0x00002b97506c7eb6 in _L_lock_941 () from /lib64/libpthread.so.0
#6  0x00002b97506c7daf in pthread_mutex_lock () from /lib64/libpthread.so.0
#7  0x00002b974e4062b8 in edmplugin::PluginManager::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#8  0x00002b974e402596 in edmplugin::PluginFactoryBase::findPMaker(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#9  0x00002b977bc9b907 in AttachSD::create(edm::EventSetup const&, SensitiveDetectorCatalog const&, edm::ParameterSet const&, SimTrackManager const*, SimActivityRegistry&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreSensitiveDetector.so
#10 0x00002b977bc77673 in RunManagerMTWorker::initializeG4(RunManagerMT*, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreApplication.so
#11 0x00002b977bc25266 in OscarMTProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginSimG4CoreApplicationPlugins.so


Thread 3 (Thread 0x2b9781b0b700 (LWP 6757)):
#0  0x00002b97506c9a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b975029185c in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_pre6-slc7_amd64_gcc900/build/CMSSW_11_1_0_pre6-build/BUILD/slc7_amd64_gcc900/external/gcc/9.3.0/gcc-9.3.0/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:865
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00002b977bc652bb in std::thread::_State_impl<std::thread::_Invoker<std::tuple<OscarMTMasterThread::OscarMTMasterThread(edm::ParameterSet const&)::$_0> > >::_M_run() () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreApplication.so
#4  0x00002b9750296af0 in std::execute_native_thread_routine (__p=0x2b9777c9bba0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#5  0x00002b97506c5ea5 in start_thread () from /lib64/libpthread.so.0
#6  0x00002b97509d896d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2b97525d6d00 (LWP 3644)):
#2  0x00002b9756a82260 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b9750a26f1d in __memcpy_ssse3 () from /lib64/libc.so.6
#5  0x00002b974e403252 in edmplugin::PluginFactoryBase::registerPMaker(void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#6  0x00002b97bf7de1a9 in _GLOBAL__sub_I_module.cc () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginSimG4CMSTrackerPlugins.so
#7  0x00002b974dd689c3 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#8  0x00002b974dd6d59e in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#9  0x00002b974dd687d4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#10 0x00002b974dd6cb8b in _dl_open () from /lib64/ld-linux-x86-64.so.2
#11 0x00002b974fda2fab in dlopen_doit () from /lib64/libdl.so.2
#12 0x00002b974dd687d4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x00002b974fda35ad in _dlerror_run () from /lib64/libdl.so.2
#14 0x00002b974fda3041 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#15 0x00002b974e409666 in edmplugin::SharedLibrary::SharedLibrary(std::filesystem::__cxx11::path const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#16 0x00002b974e40651e in edmplugin::PluginManager::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#17 0x00002b974e402596 in edmplugin::PluginFactoryBase::findPMaker(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#18 0x00002b977bc9b907 in AttachSD::create(edm::EventSetup const&, SensitiveDetectorCatalog const&, edm::ParameterSet const&, SimTrackManager const*, SimActivityRegistry&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreSensitiveDetector.so
#19 0x00002b977bc77673 in RunManagerMTWorker::initializeG4(RunManagerMT*, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreApplication.so
#20 0x00002b977bc25266 in OscarMTProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginSimG4CoreApplicationPlugins.so

Current Modules:

Module: OscarMTProducer:g4SimHits (crashed)
Module: OscarMTProducer:g4SimHits
Module: OscarMTProducer:g4SimHits
Module: OscarMTProducer:g4SimHits

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions