[GPU] Multiple RelVals failing with memory allocation error #43866

@aandvalenzuela

Hello,

Multiple RelVals are failing in the GPU IBs with the following exception:

----- Begin Fatal Exception 05-Feb-2024 04:19:50 CET-----------------------
An exception of category 'StdException' occurred while
   [0] Processing  Event run: 366727 lumi: 89 event: 131642946 stream: 3
   [1] Running path 'MC_Run3_PFScoutingPixelTracking_v22'
   [2] Calling method for module HBHERecHitProducerGPU/'hltHbherecoGPU'
Exception Message:
A std::exception was thrown.

/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/5569e690981e3c5d49d7743adaadedca/opt/cmssw/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_GPU_X_2024-02-04-2300/src/HeterogeneousCore/CUDAUtilities/src/CachingDeviceAllocator.h, line 489:
cudaCheck(error = cudaMalloc(&search_key.d_ptr, search_key.bytes));
cudaErrorMemoryAllocation: out of memory
----- End Fatal Exception -------------------------------------------------

It seems to be caused by the modifications in #43804.

FYI, @iarspider

Thanks,
Andrea
