Investigations in using GPU for pyresample #630

Open
mraspaud opened this issue Nov 4, 2024 · 4 comments

@mraspaud
Member

mraspaud commented Nov 4, 2024

Triggered by the recent availability of GPUs in computing resources I have access to, I started investigating how feasible it would be to use the GPU to speed up resampling of satellite imagery.

So that others can see how far we are on this, I thought I would open this issue to give some visibility to the investigations and work done so far. Feel free to add further investigations in the comments.

Cuproj, transforming coordinates

One requirement for resampling is the ability to convert coordinates, as we currently do with pyproj.
The rapidsai project has a cuproj library that provides an equivalent interface. However, it only supports EPSG:4326 and UTM-based projections.

Testing this shows a speed-up of about a factor of 100 on the GPU.

print("Numpy/CPU")
import numpy as np
from pyproj.transformer import Transformer


lons, lats = np.meshgrid(np.linspace(-180, 180, 20000), np.linspace(-90, 90, 10000))

print("creating transform")
tr = Transformer.from_crs(4326, 32630)
print("transforming")
# Without always_xy=True, EPSG:4326 input is expected in (lat, lon) order
x, y = tr.transform(lats, lons)

and

print("Cupy/GPU")
import cupy as cp
from cuproj.transformer import Transformer


lons, lats = cp.meshgrid(cp.linspace(-180, 180, 20000), cp.linspace(-90, 90, 10000))

print("creating transform")
tr = Transformer.from_crs("EPSG:4326", "EPSG:32630")
print("transforming")
# EPSG:4326 input is passed in (lat, lon) order, as in pyproj's default
x, y = tr.transform(lats.reshape(-1), lons.reshape(-1))
# meshgrid returns arrays of shape (10000, 20000), so restore that shape
x = x.reshape(lons.shape)
y = y.reshape(lons.shape)

These output, respectively (timed with the shell's time command):

Numpy/CPU
creating transform
transforming

real	1m31.581s
user	1m31.588s
sys	0m0.644s

and

Cupy/GPU
creating transform
transforming

real	0m0.981s
user	0m1.503s
sys	0m0.142s
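
Note that the figures above come from `time`, so they cover the whole script, including imports and building the coordinate grids. A minimal sketch for timing only the transform call could look like the following. `time_call` and its `sync` parameter are hypothetical names introduced here for illustration; on the GPU, CuPy launches work asynchronously, so a synchronization callable would need to be passed in for the measurement to be meaningful.

```python
import time


def time_call(func, *args, sync=None, repeats=3):
    """Return (result, best_seconds) for func(*args).

    `sync` is an optional zero-argument callable executed before the clock
    starts and again before it stops. It is a hypothetical hook meant for
    asynchronous backends (for CuPy one could pass
    cp.cuda.Stream.null.synchronize); leave it as None on the CPU.
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain pending work before starting the clock
        start = time.perf_counter()
        result = func(*args)
        if sync is not None:
            sync()  # wait for asynchronous work to finish
        best = min(best, time.perf_counter() - start)
    return result, best


# Stand-in CPU workload; replace with tr.transform and the coordinate arrays.
result, seconds = time_call(sum, range(1_000_000))
print(f"best of 3 runs: {seconds:.4f} s")
```

Taking the best of a few runs also keeps one-off costs, such as a first-call kernel compilation, out of the comparison.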

KDTree implementation

Cupy seems to have a GPU-optimized kdtree: https://docs.cupy.dev/en/latest/reference/generated/cupyx.scipy.spatial.KDTree.html
However, at the time of writing, this has not been released yet and would require building cupy manually to try it out (which I don't have time for right now).

Gradient search

Cupy makes it possible to define custom kernels, in which we could implement the gradient search. However, GPUs are good at doing things for each pixel in parallel, so we might need to implement a pixel-wise version of the algorithm. I haven't tested this.
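
To make the pixel-wise idea concrete, here is a rough sketch in plain Python of what the body of such a per-pixel kernel might do. This is not pyresample's gradient search implementation: it is a Newton-style simplification, assuming 2D source coordinate arrays (given here as nested lists) that vary smoothly, and `search_pixel` is a hypothetical name. Each call only reads the source arrays, so every target pixel could in principle be handled by its own GPU thread.

```python
def search_pixel(src_x, src_y, tx, ty, start=(0, 0), max_iter=20):
    """Return the (row, col) of the source pixel closest to target (tx, ty).

    Follows the local gradient of the source coordinates from `start`
    until the index stops changing or max_iter is reached.
    """
    n_rows, n_cols = len(src_x), len(src_x[0])
    row, col = start
    for _ in range(max_iter):
        # Forward differences, keeping the stencil inside the array.
        r0 = min(row, n_rows - 2)
        c0 = min(col, n_cols - 2)
        dx_dr = src_x[r0 + 1][c0] - src_x[r0][c0]
        dx_dc = src_x[r0][c0 + 1] - src_x[r0][c0]
        dy_dr = src_y[r0 + 1][c0] - src_y[r0][c0]
        dy_dc = src_y[r0][c0 + 1] - src_y[r0][c0]
        det = dx_dr * dy_dc - dx_dc * dy_dr
        if det == 0:
            break  # degenerate local geometry, give up at the current index
        # Remaining coordinate error at the current source pixel.
        ex = tx - src_x[row][col]
        ey = ty - src_y[row][col]
        # Solve the 2x2 linear system for the index step (Cramer's rule).
        d_row = (ex * dy_dc - dx_dc * ey) / det
        d_col = (dx_dr * ey - ex * dy_dr) / det
        new_row = min(max(round(row + d_row), 0), n_rows - 1)
        new_col = min(max(round(col + d_col), 0), n_cols - 1)
        if (new_row, new_col) == (row, col):
            break  # converged
        row, col = new_row, new_col
    return row, col


# Small regular source grid: x grows along columns, y decreases along rows.
src_x = [[10 + 2 * c for c in range(5)] for r in range(4)]
src_y = [[50 - 3 * r for c in range(5)] for r in range(4)]

# On the GPU this loop would be replaced by one thread per target pixel.
targets = [(16.2, 44.5), (10.0, 50.0)]
indices = [search_pixel(src_x, src_y, tx, ty) for tx, ty in targets]
print(indices)
```

Targets that fall outside the source grid are clamped to its edge in this sketch; a real implementation would need to flag them as invalid instead.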

@djhoese
Member

djhoese commented Nov 4, 2024

On mobile right now so I can't find the links easily, but I believe there are other GPU issues on this repository and/or pykdtree.

@mraspaud
Member Author

mraspaud commented Nov 4, 2024

Indeed, here is a previous issue of yours on the topic of kdtrees #174

@beckernick

beckernick commented Nov 19, 2024

I'm not well versed in pyresample but came across this issue due to the RAPIDS references here and in #174.

If nearest neighbors queries are important but you don't need exact neighbor guarantees, we recently implemented nn-descent on the GPU for approximate nearest neighbors.

We've started using it in cuML's UMAP to bring significant performance gains vs. the prior brute force (exact) KNN. Happy to share my info, if potentially relevant.
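
For readers less familiar with the terminology, the exact brute-force baseline mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not cuML's or pyresample's implementation, and `nearest_source_index` is a hypothetical name: every source point is compared against the target, which is exact but costs one distance computation per source point for each query.

```python
import math


def nearest_source_index(source_points, target_point):
    """Return (index, distance) of the source point closest to target_point."""
    best_index, best_distance = -1, math.inf
    for index, point in enumerate(source_points):
        distance = math.dist(point, target_point)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


source_points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
index, distance = nearest_source_index(source_points, (9.0, 1.0))
print(index, distance)
```

A kd-tree or an approximate method avoids visiting every source point per query, which is where the speed-up comes from.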

@mraspaud
Member Author

@beckernick thanks a lot for the heads up! We'll definitely check it out. I can think of applications where this would work, if the performance is significantly better than the exact nn.
