Support for groups #84
Comments
One thing I realized about this is that concatenating multiple Adding

vdt1 = open_virtual_datatree('file1.nc')
vdt2 = open_virtual_datatree('file2.nc')

but currently you can't do

combined_vdt = xr.concat([vdt1, vdt2], dim='time')

because

from datatree import map_over_subtree
concat_datatrees = map_over_subtree(xr.concat)
combined_vdt = concat_datatrees([vdt1, vdt2], dim='time')

but it raises the question of whether the xarray DataTree upstream integration should include generalizing

Hi @TomNicholas, I often have the problem that I want to concat different datatrees into a single xr.Dataset again.

Generate some sample data:

import datatree
import xarray as xr
ds1 = xr.Dataset(
data_vars=dict(a=("x", [11, 22, 33])),
coords=dict(x=[1,2,3])
)
ds2 = xr.Dataset(
data_vars=dict(a=("x", [111, 222, 333])),
coords=dict(x=[1,2,3])
)
mytree = datatree.DataTree.from_dict({"two_digits": ds1, "three_digits": ds2})
print(mytree)

output:

I tried:

# fails
concat_datatrees = datatree.map_over_subtree(xr.concat)
ds_concatenated = concat_datatrees(mytree, dim='digits')
# also fails:
# ds_concatenated = concat_datatrees([mytree.two_digits, mytree.three_digits], dim='digits')

output:

This would work, but it is not really nice:

# works
ds_concatenated = xr.concat([mytree[subtree].ds for subtree in mytree], dim="digits")
print(ds_concatenated)

output:

Do you have a suggestion of how to deal with such situations?

Thanks for trying this @jonas-spaeth ! I now realise that I didn't think hard enough before making this suggestion 😅 This is an issue with xarray-datatree, not with virtualizarr at all, so I will re-raise this on the xarray repo instead and we can continue discussion there.
I think your first attempt is just an incorrect use of the
This is more troubling, but I've realised why it doesn't work. Basically But I never thought to make We might imagine changing
For now I think this is your only option.
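
The list-comprehension workaround from earlier in the thread can be wrapped in a small helper so that the new dimension is also labelled with the group names. This is only a minimal sketch, assuming plain xarray and pandas: `concat_named_datasets` is a hypothetical name and is not part of xarray, datatree, or VirtualiZarr, and the sample data is copied from the comment above.

```python
import pandas as pd
import xarray as xr


def concat_named_datasets(datasets, dim):
    """Concatenate a {name: Dataset} mapping along a new dimension `dim`.

    Hypothetical helper: the mapping keys become the coordinate labels
    of the new dimension.
    """
    names = list(datasets)
    # Passing a named pandas Index as `dim` makes xr.concat create the new
    # dimension and attach the names as its coordinate values.
    index = pd.Index(names, name=dim)
    return xr.concat([datasets[name] for name in names], dim=index)


# Same sample data as in the comment above.
ds1 = xr.Dataset(data_vars=dict(a=("x", [11, 22, 33])), coords=dict(x=[1, 2, 3]))
ds2 = xr.Dataset(data_vars=dict(a=("x", [111, 222, 333])), coords=dict(x=[1, 2, 3]))

ds_concatenated = concat_named_datasets(
    {"two_digits": ds1, "three_digits": ds2}, dim="digits"
)
print(ds_concatenated)
```

With a tree, the mapping could presumably be built the same way as in the workaround, for example `{name: mytree[name].ds for name in mytree}`, which keeps the tree-specific part to a single line.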

Note that pydata/xarray#9077 might affect this - if datatree becomes slightly less general then there could in theory be some netcdf files that cannot be opened as a

Also this issue wouldn't really be closed until we also have the ability to do

EDIT: Serialization of DataTree objects is tracked separately in #244

We should support generating references from files containing multiple groups in the same way that xr.open_dataset and datatree.open_datatree work.

So we should add a new open_virtual_datatree function, and a new (optional) group kwarg to open_virtual_dataset.

This can be done right now using the datatree package (as an optional dependency imported inside open_virtual_datatree) but once that gets merged into xarray main (which is happening right now) we can get rid of that dependency.

See https://github.com/TomNicholas/VirtualiZarr/issues/78#issuecomment-2059737479 and #11.

cc @sharkinsspatial