
Running into CUDA OutOfMemory error #1058

Open
@M0E313

I always get a CUDA OutOfMemoryError:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.50 GiB. GPU 0 has a total capacity of 21.99 GiB of which 6.62 GiB is free. Including non-PyTorch memory, this process has 15.35 GiB memory in use. Of the allocated memory 6.15 GiB is allocated by PyTorch, and 8.71 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
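To make the figures in the message easier to read, here is a rough sketch that only redoes the arithmetic on the numbers quoted above. The helper name `read_oom_figures` and its keys are made up for illustration; nothing here calls PyTorch.

```python
def read_oom_figures(total, free, in_use, allocated, reserved_unallocated, requested):
    """Derive a few quantities (in GiB) from the figures in a CUDA OOM message."""
    # Everything PyTorch's caching allocator currently holds.
    reserved = allocated + reserved_unallocated
    # Memory used by this process outside the allocator (CUDA context, other libraries).
    non_pytorch = in_use - reserved
    # How much more free device memory the request would need.
    shortfall = requested - free
    return {
        "reserved_by_pytorch": round(reserved, 2),
        "non_pytorch": round(non_pytorch, 2),
        "shortfall": round(shortfall, 2),
        # True when the cached-but-unused memory alone is at least as big as the request.
        "cached_exceeds_request": reserved_unallocated >= requested,
    }


# Figures copied from the error message above.
figures = read_oom_figures(
    total=21.99, free=6.62, in_use=15.35,
    allocated=6.15, reserved_unallocated=8.71, requested=7.50,
)
print(figures)
```

Read this way, the 8.71 GiB that is reserved but unallocated is larger than the 7.50 GiB request, which is consistent with the fragmentation hint in the message. That is an interpretation of the numbers, not something the log proves.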

I have already reduced the batch size and called torch.cuda.empty_cache() throughout my script, but that is still not enough.
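For reference, a minimal sketch of how the setting suggested in the error message could be applied and how allocator usage could be checked around the failing step. It assumes the variable has to be set before the first CUDA allocation, so it is set before `torch` is imported; the helper `cuda_memory_gib` is a hypothetical name, and whether this resolves the error here is untested.

```python
import os

# Assumption: the allocator reads this when it is first initialised,
# so set it before importing torch or touching the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

try:
    import torch
except ImportError:  # keeps the sketch importable on machines without PyTorch
    torch = None


def cuda_memory_gib():
    """Return (allocated, reserved) in GiB for the current device, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    gib = 1024 ** 3
    return (
        torch.cuda.memory_allocated() / gib,  # memory held by live tensors
        torch.cuda.memory_reserved() / gib,   # memory held by the caching allocator
    )


print(cuda_memory_gib())
```

A large gap between the two values printed just before the failing allocation would match the "reserved but unallocated" figure in the error. The same variable can also be exported in the shell before launching the script.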

### I'm using:

pip list | grep cuda

nvidia-cuda-cupti-cu11 11.8.87
nvidia-cuda-nvrtc-cu11 11.8.89
nvidia-cuda-runtime-cu11 11.8.89

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

pip list | grep torch

pytorch-lightning 2.1.2
pytorch-triton 3.0.0+989adb9a29
torch 2.2.1+cu118
torchaudio 2.2.1+cu118
torchmetrics 1.3.2
torchvision 0.17.1+cu118
