ValueError: Region (...) does not align with Zarr chunks (). #644
Thanks for raising an issue @ghislainp. Any chance you can share the input list of netCDF files used to create the file pattern?
Sure, you can download the data from here: https://filesender.renater.fr/?s=download&token=17666c2e-d738-4447-b338-406315b08aae The link is valid for 2 weeks.
This is almost certainly due to the presence of coordinates in the data variables. I know there are other similar issues but I can't find them. Anything in the data variables with a …
Thanks for the files @ghislainp. I moved them to a temp s3 bucket and added a transform to drop the offending dims/vars to get a working example. Hope this helps.

```python
import apache_beam as beam
import pandas as pd

from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.transforms import (
    Indexed,
    OpenURLWithFSSpec,
    OpenWithXarray,
    StoreToZarr,
    T,
)

year_list = [2004, 2005, 2006]


def make_url(time):
    return f"s3://carbonplan-scratch/pgf/melt-AMSRU-Antarctic-{time}-12km.nc"


concat_dim = ConcatDim("time", year_list)
pattern = FilePattern(make_url, concat_dim)


class DropDims(beam.PTransform):
    """Drop the dims/vars that do not depend on the concat dim."""

    @staticmethod
    def _drop_dims(item: Indexed[T]) -> Indexed[T]:
        index, ds = item
        ds = ds.drop_dims("nv")
        ds = ds[
            [
                "snow_status_wet_dry_19H_ASC_raw",
                "snow_status_wet_dry_19H_ASC_filter",
                "snow_status_wet_dry_19H_DSC_raw",
                "snow_status_wet_dry_19H_DSC_filter",
            ]
        ]
        return index, ds

    def expand(self, pcoll: beam.PCollection) -> beam.PCollection:
        return pcoll | beam.Map(self._drop_dims)


recipe = (
    beam.Create(pattern.items())
    | OpenURLWithFSSpec()
    | OpenWithXarray(file_type=pattern.file_type, xarray_open_kwargs={"decode_coords": "all"})
    | DropDims()
    | StoreToZarr(
        combine_dims=pattern.combine_dim_keys,
        target_root=".",
        store_name="out.zarr",
    )
)

with beam.Pipeline() as p:
    p | recipe
```
Thank you. I also obtained the same effect by removing the variables manually with NCO. However, is there a way to improve StoreToZarr to recover the previous behavior of XarrayZarrRecipe, which dealt correctly with variables that do not depend on the combine dim?
I don't think you have to drop all these variables. Just move them to coords instead of data variables.
Are you sure about that? In the previous version, would …
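For illustration, the "move them to coords" suggestion can be sketched on a synthetic dataset (the variable names below are made up, not taken from the original files): variables lacking the concat dim are reclassified as coordinates so a region-based Zarr write no longer tries to slice them along "time".

```python
# Minimal sketch of the set_coords fix; dataset is synthetic.
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "melt": (("time", "y", "x"), np.zeros((3, 2, 2))),
        "lat": (("y", "x"), np.zeros((2, 2))),  # static: no "time" dim
        "lon": (("y", "x"), np.zeros((2, 2))),  # static: no "time" dim
    }
)

# Any data variable that does not include the concat dim is a candidate.
static_vars = [v for v in ds.data_vars if "time" not in ds[v].dims]
ds = ds.set_coords(static_vars)

print(static_vars)         # ['lat', 'lon']
print(list(ds.data_vars))  # ['melt']
```

After `set_coords`, the static variables still travel with the dataset (as coordinates) instead of being dropped outright.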
I think I also ran into the same issue over at LEAP. ...
```python
| OpenWithXarray(xarray_open_kwargs={'preprocess': lambda ds: ds.set_coords(['list', 'of', 'offending', 'coords'])})
```
... In these relatively simple cases I wonder if we can provide a much more helpful error message by catching the …
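One way the "more helpful error message" idea could look, as a sketch: validate the dataset before the region write and name the offending variables. `check_concat_dim` is a hypothetical helper, not an existing pangeo-forge-recipes API.

```python
# Hypothetical pre-write check that names the variables which would
# otherwise surface as an opaque region-alignment ValueError.
import numpy as np
import xarray as xr


def check_concat_dim(ds: xr.Dataset, concat_dim: str = "time") -> None:
    bad = [v for v in ds.data_vars if concat_dim not in ds[v].dims]
    if bad:
        raise ValueError(
            f"Data variables {bad} do not depend on the concat dim "
            f"{concat_dim!r}; move them to coords with ds.set_coords({bad}) "
            "or drop them before StoreToZarr."
        )


# Synthetic dataset with one static variable to trigger the check.
ds = xr.Dataset(
    {
        "melt": (("time", "y", "x"), np.zeros((3, 2, 2))),
        "lat": (("y", "x"), np.zeros((2, 2))),
    }
)

try:
    check_concat_dim(ds)
except ValueError as err:
    print(err)  # names 'lat' and suggests set_coords / dropping it
```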
After a bit of snooping around in various feedstocks, I was able to cobble together a solution that seemed to do the job when this became an issue for me (forgive all the comments, which I added to help myself figure it out!):
Great to see this worked for you @mattjbr123.
I'm trying to merge netCDF files into a single Zarr.
The recipe is:
and the pattern is `pattern_from_file_sequence(ncfiles, 'time')`.
The structure of the ncfiles is:
I get the error:
```
ValueError: Region (slice(0, 365, None), slice(None, None, None), slice(None, None, None)) does not align with Zarr chunks (402, 462).
```
It seems that StoreToZarr tries to use the 'time' dimension to merge variables that do not depend on time.
When I remove the variables lat, lon, bounds_lat, and bounds_lon, it works fine.
How can I solve this problem? I did not have this problem with XarrayZarrRecipe.
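The situation described above can be reproduced on a synthetic dataset (dim sizes shrunk here; the real grid is 402 x 462, and the time-dependent variable name "swe" is an assumption). This also shows the xarray equivalent of the NCO workaround mentioned later in the thread: dropping the variables that lack the "time" dim.

```python
# Synthetic stand-in for the reported file structure; "nv" is the
# bounds dim carried by bounds_lat / bounds_lon.
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "swe": (("time", "y", "x"), np.zeros((5, 4, 4))),
        "lat": (("y", "x"), np.zeros((4, 4))),
        "lon": (("y", "x"), np.zeros((4, 4))),
        "bounds_lat": (("y", "x", "nv"), np.zeros((4, 4, 2))),
        "bounds_lon": (("y", "x", "nv"), np.zeros((4, 4, 2))),
    }
)

# These are the variables that a time-region write cannot slice.
static = [v for v in ds.data_vars if "time" not in ds[v].dims]
ds = ds.drop_vars(static)  # or ds.set_coords(static) to keep them

print(static)              # ['lat', 'lon', 'bounds_lat', 'bounds_lon']
print(list(ds.data_vars))  # ['swe']
```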