-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dask reproject #845
base: master
Are you sure you want to change the base?
Dask reproject #845
Conversation
this is mostly just copy-pasted at barely-past-the-idea stage. @astrofrog any chance you (or aperio folks) could find some time to work on this? partly, I'm wondering if we should be pushing the "don't put everything in memory" part into reproject or leave it here in spectral-cube? |
6def462
to
07a12b1
Compare
FYI I am currently working on dask support in reproject - more soon! |
07a12b1
to
0cc1125
Compare
@astrofrog can you help me out here - do we need this in spectral-cube or can spectral-cube just call reproject and not worry about it? |
@keflavich - right now I've added the ability in reproject to use dask behind the scenes to operate over data chunks, but I haven't yet made it possible to pass in/get out dask arrays. It shouldn't be too difficult to do that though, so if you are willing to wait a little longer, then it will be possible to just pass the dask array to reproject and get a dask array back with the reprojected cube. I can try and prioritize that for the next couple of weeks. |
Just curious, would you actually want the reprojection to be lazily done? That is, creating the new cube would be 'instantaneous' and reprojection would only happen as parts of the cube are accessed. Or would you want the reprojection to actually be computed in full? |
I'm not entirely sure yet - I'm working this out as I go. The active use case is:
#868 is my attempt to make this a bit possible ideally under the hood we'd reproject to the minimal overlap region... where should that step be done? |
@keflavich - what do you mean by minimal overlap region? The mosaic WCS, or the subset of the mosaic WCS that overlaps with each cube? |
I mean the corner-finding that is done in I'm refactoring to use coadd.py with some additions: astropy/reproject#351 |
Do you mean they have >1000 pixels on each side? Do you really mean 1Tb/cube or 1Gb/cube? |
output cube target size is 27800 x 10200 x 1000. Probably will limit 3rd dimension to ~200 pixels in some cases. |
I revised the original definition of the cube sizes above in reply to #845 (comment) |
@keflavich ok thanks! I have started work on supporting dask inputs to reproject functions, will open a draft PR soon. |
However, feel free to also try on your side if you want, I will keep an eye on your PR :) |
I think we need both of these together, no? I'm adding 3D capability to |
ah, no, you're talking about _this_PR. Yeah, I'm not going to work on this while you are - I'm focused on the other support infrastructure for now |
I did mean the reproject PR not this one - yes our work should end up being complementary! (I will just focus on the individual reproject functions not coadd) |
Codecov ReportPatch coverage is
📢 Thoughts on this report? Let us know!. |
@astrofrog I'm trying to test some of the upstream dask changes, but I need this PR as a frontend to those. I've run into a problem - the def reproject_interp_wrapper(img_slice, **kwargs):
# What exactly is the wrapper getting here?
# I think it is given a _cube_ that is a cutout?
# No, it is getting dask arrays (at least sometimes)
if filled:
data = img_slice.filled_data[:]
else:
data = img_slice._data
return reproject_interp((data, img_slice.header), is wrong - So I think this PR is totally wrong, taking the wrong general approach - instead, perhaps we should just directly call |
Yes I think the intermediate wrappers pre date some of the reproject improvements. We should be able to just call reproject with the full dask array and target WCS. |
e21b153
to
939bac6
Compare
@keflavich - I've rebased and pushed an additional commit that I think makes the |
939bac6
to
df32208
Compare
Ok so with astropy/reproject#390 I can get things to run. However, this is currently not optimal for two reasons:
It might be better to be pragmatic here and actually carry out the reprojection inside the reproject function rather than delaying it, at least by default, because it gives us more control over the points above. I'll try and work on this a little more shortly. |
Yes, I agree - the default should be least likely to cause harm on the user's computer (i.e., unlikely to fill up their HDs with tmpdir junk, unlikely to just crash), and exceptional cases needed for performance/other reasons should be described in the docs |
df32208
to
259dc4d
Compare
Agreed with @astrofrog on computing reprojection inside the reproject function #845 (comment). The way Adding onto that point, I think rechunking needs to be computed ahead of reprojection for the same reason (see astropy/reproject#485). It can be done internally within the function, just not "delayed" into the actual reprojection. |
WIP: trying to get dask reprojection working.