Run Beersheba's convolve on GPUs using cupy #886
base: master
Conversation
Hi Stefano! I'm not sure how IC considers the specifications of the computer in use, but I believe you can avoid using a global flag by using the cupy function. I'm not too familiar with cupy, but I'm assuming it would require a CUDA toolkit to be installed before use, right? If so, the installation of such a toolkit should probably be included when building IC, and a check for a CUDA-compatible GPU during the build would help prevent those without a compatible GPU from installing it. I'm not sure how tricky it would be to check this within bash, but I can have a look.
@jwaiton the global flag is not needed to check if CUDA is available; I am already doing that. It is needed because, inside the code, there are a couple of `if`s that depend on the presence of CUDA, and I wanted to avoid calling
@soleti I think a nice way of doing it would be adding a new check into
Edit: checking specifically for a CUDA-compatible GPU is more complicated. I can see a couple of methods, but they're platform-specific; I'll keep looking into it.
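For the bash side of that check, one possible sketch (an illustration only, not part of the PR; it assumes `nvidia-smi` is an acceptable proxy for a CUDA-capable NVIDIA GPU, and other platforms would need their own probes):

```shell
#!/usr/bin/env bash
# Hypothetical build-time probe: succeed only if nvidia-smi exists and
# can actually list at least one GPU.
has_cuda_gpu() {
    command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1
}

if has_cuda_gpu; then
    echo "CUDA-capable GPU detected"
else
    echo "No CUDA-capable GPU found; skipping GPU extras"
fi
```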
Passing an environment variable is a possibility, but I want to double-check with IC experts whether this is a viable solution or there are other guidelines for this use case. @gonzaponte ?
To test what exactly?
We try not to rely on global flags in IC. If the reason to use one is:
I suggest replacing the variable with a function with cached output that checks for that (and can even do imports, etc.). Also, I would make this an opt-in feature, probably controlled from the config file.
That the GPU code is giving correct results.
I am trying to do this, but I am not sure how to do the import in the cached function and make that module available outside the function. The only way I see is to use:

```python
from functools import lru_cache

@lru_cache
def is_gpu_available() -> bool:
    '''
    Check if GPUs are available, import the necessary libraries and
    return True if they are available, False otherwise.
    '''
    try:
        import cupy as cp
        return cp.cuda.is_available()
    except ImportError:
        # fall back gracefully when cupy is not installed
        return False
```

Maybe making
Yes.
By definition this is not possible, right? We will need to find a way to run the tests in machines with GPUs...
Including the imports in the function was a stupid suggestion on my side, sorry. The function should only check the availability of GPUs. Can the libraries be installed on any machine, even on those without a GPU? If so, the try/except clause in the imports can be omitted when the libraries are included in the IC environment. Going back to...
is this slow, or are there any other reasons for that? Caching was the solution I proposed because I assumed slowness was the reason.
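To illustrate the caching point with a toy example (the probe body here is a stand-in, not the real CUDA check; only the `lru_cache` mechanics are the point):

```python
from functools import lru_cache

probe_runs = []

@lru_cache(maxsize=None)
def is_gpu_available() -> bool:
    # Stand-in for an expensive availability check (e.g. querying CUDA).
    probe_runs.append(1)   # record that the probe body actually executed
    return False           # pretend no GPU is present on this machine

is_gpu_available()
is_gpu_available()         # answered from the cache; probe body not re-run
print(len(probe_runs))     # -> 1
```

The second call never re-runs the probe, so even a slow check costs only one call per process.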
Ok, this was stuck for a bit, so I took the opportunity to finish it, since @jwaiton might be interested. It would be good to have an independent tester :)
Pull Request Overview
This PR adds GPU support for deconvolution operations by integrating Cupy into the convolution/FFT convolution functions.
- Added an optional GPU flag (use_gpu) to the deconvolution and satellite mask functions.
- Introduced an is_gpu_available helper function to check for available GPUs and adjusted array operations accordingly.
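A sketch of what such a dispatch could look like (not the PR's actual code; the GPU path assumes `cupyx.scipy.signal.fftconvolve` is available and falls back to SciPy whenever cupy or a usable device is absent):

```python
import numpy as np
from scipy.signal import fftconvolve as cpu_fftconvolve

try:
    import cupy as cp
    from cupyx.scipy.signal import fftconvolve as gpu_fftconvolve
    HAVE_GPU = cp.cuda.is_available()
except Exception:          # cupy missing or no usable CUDA device
    HAVE_GPU = False

def fftconvolve(image, kernel, use_gpu=False):
    """FFT-convolve on the GPU when requested and available, else on CPU."""
    if use_gpu and HAVE_GPU:
        out = gpu_fftconvolve(cp.asarray(image), cp.asarray(kernel), mode="same")
        return cp.asnumpy(out)          # hand a plain numpy array back
    return cpu_fftconvolve(image, kernel, mode="same")

# A delta image blurred with a 3x3 box kernel keeps its total weight.
img = np.zeros((5, 5)); img[2, 2] = 1.0
out = fftconvolve(img, np.full((3, 3), 1/9))
```

Returning a numpy array from both paths keeps the calling code oblivious to where the convolution actually ran.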
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| invisible_cities/reco/deconv_functions.py | Added GPU flag support, integrated Cupy functions, and updated array operations. |
| invisible_cities/cities/beersheba.py | Updated deconvolve_signal to accept and propagate the use_gpu parameter to deconvolution. |
Co-authored-by: Copilot <[email protected]>
Thanks for updating the PR again. I'll pull the branch and start testing to cross-check that everything works as expected. I'll also do a quick comparison soon of how it performs with larger bin sizes (a 2.5 mm bin size increased runtime by a factor of 20 on CPUs, with minimal difference in reconstruction). Do you happen to still have the data from your first plot? For NEW/N100 the number of iterations is on the order of 10s to 100s, where the difference in time complexity may be less drastic. Would be interesting to check 😸
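A small CPU-side harness along these lines would do for the runtime comparison (sizes and kernel are illustrative; absolute timings depend on the machine):

```python
import time
import numpy as np
from scipy.signal import fftconvolve

def best_time(n, kernel_size=15, repeats=3):
    """Best-of-`repeats` wall time for one n x n fftconvolve, in seconds."""
    rng    = np.random.default_rng(0)
    image  = rng.standard_normal((n, n))
    kernel = rng.standard_normal((kernel_size, kernel_size))
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fftconvolve(image, kernel, mode="same")
        times.append(time.perf_counter() - t0)
    return min(times)

for n in (128, 256, 512):      # finer binning means a larger image
    print(f"{n:4d}x{n:<4d}: {best_time(n):.4f} s")
```

Taking the best of several repeats reduces noise from caching and background load; the same loop pointed at the GPU path would give a like-for-like comparison.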
Oh, I didn't use real data, I just generated random arrays to produce the scaling plot... I agree it would be interesting to see it at work on real data; if you have a dataset at hand, I think that's the fastest way to verify it.
I added a functionality which allows running `convolve` and `fftconvolve` using Cupy on the GPU, if available. The performance gain starts to be significant only when the image is large (in my tests, when `bin_size` < 1, see plot). Two things I am not sure of: