
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle) #44

Closed
ThompsonHe opened this issue Oct 26, 2020 · 0 comments

ThompsonHe commented Oct 26, 2020

Sorry to bother you. I ran make.sh and it finished successfully, but when I run test.py something goes wrong. Here is the output:

Traceback (most recent call last):
File "test.py", line 255, in
example_dconv()
File "test.py", line 179, in example_dconv
error.backward()
File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 195, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/autograd/init.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/autograd/function.py", line 77, in apply
return self._forward_cls.backward(self, *args)
File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/autograd/function.py", line 189, in wrapper
outputs = fn(ctx, *args)
File "/data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/dcn_v2.py", line 44, in backward
ctx.deformable_groups)
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle) (createCublasHandle at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/ATen/cuda/CublasHandlePool.cpp:8)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7fac36e16627 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x4173335 (0x7fac3ccb4335 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: at::cuda::getCurrentCUDABlasHandle() + 0x458 (0x7fac3ccb4c18 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: + 0x416b092 (0x7fac3ccac092 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #4: THCudaBlas_Sgemm + 0x7e (0x7fac3d0b9a3e in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #5: dcn_v2_cuda_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0xe94 (0x7fac1b20e141 in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #6: dcn_v2_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0x9b (0x7fac1b1f987b in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #7: + 0x3f1f1 (0x7fac1b2071f1 in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #8: + 0x3f82e (0x7fac1b20782e in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #9: + 0x3af0e (0x7fac1b202f0e in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)

frame #22: torch::autograd::PyNode::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x178 (0x7fac68f94468 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #23: + 0x3bd3fb6 (0x7fac3c714fb6 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #24: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x1373 (0x7fac3c711413 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #25: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x4b2 (0x7fac3c712042 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #26: torch::autograd::Engine::thread_init(int) + 0x39 (0x7fac3c70b939 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #27: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fac68f8afaa in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #28: + 0xc819d (0x7fac6887719d in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/../../../libstdc++.so.6)
frame #29: + 0x76ba (0x7fac786076ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #30: clone + 0x6d (0x7fac7833d4dd in /lib/x86_64-linux-gnu/libc.so.6)

Segmentation fault (core dumped)

I don't know how to fix it. Could you please help and give me some ideas? Thank you!
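
For context, CUBLAS_STATUS_ALLOC_FAILED from cublasCreate(handle) usually means cuBLAS could not allocate its handle on the GPU, most often because the device is out of memory or already occupied by another process. A minimal sanity check along these lines (not part of the DCNv2 repo, and the device index 0 is an assumption) can show whether a plain cuBLAS call works outside of test.py:

```python
import torch

# Hypothetical sanity check: verify that CUDA is visible to PyTorch and that
# a small cuBLAS call succeeds before running test.py.
# Device index 0 is an assumption; change it to the GPU you intend to use.
assert torch.cuda.is_available(), "CUDA is not available to this PyTorch build"
device = torch.device("cuda:0")
print("device:", torch.cuda.get_device_name(device))
print("memory already allocated by this process:", torch.cuda.memory_allocated(device))

# A small matmul is routed through cuBLAS, so it fails the same way
# if the cuBLAS handle cannot be created (e.g. the GPU is out of memory).
a = torch.randn(64, 64, device=device)
b = torch.randn(64, 64, device=device)
print("matmul ok, output shape:", (a @ b).shape)
```

If this fails too, the problem is with the GPU/driver setup or free memory (check nvidia-smi) rather than with the DCNv2 extension itself.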
