Backward/forward compatibility wrt v0.6.0 group structure/variable name change #606
Comments
Thank you @leewujung for putting this together! I have been working on this conversion problem, so I am happy to work on several of these items. I think that once we establish the sensor-specific files in #596, then we can proceed with making the code forward compatible. The following is the first step I would take (input is welcome) to address this issue. Note that this is primarily focused on changes to downstream items related to
From my understanding, if we end up implementing datatree, we would only need to modify the above items slightly. |
Wow, thank you @leewujung !! This is really helpful.
I think what you describe as forward compatibility is what we had agreed on. That is, to temporarily retain (at least in 0.6.0) the current When using I think there are still challenges we need to work through, and let's discuss them at our meeting tomorrow. For example, all these group mappings and variable renamings (and other changes -- see below) have a consequence I hadn't thought about:
Actually, there's a third type that's related to "variable name changes" but a lot more consequential: changes to a dimension/coordinate that go beyond just a name change. Specifically:
|
I am not sure "what it takes" to rename the coordinate variables (i.e. does the renaming operation require actually loading data into memory?), but this part to hook up the v0.6.0 groups and subgroups to our current flattened For the third type of changes -- Thanks! I agree this is a different type that should be listed in the main issue description! I'll add it later today.
This is true, but I think the processing of the other sonar models can be achieved without much disturbance of the code by using
Agreed. I think this will be a tedious PR, though I don't immediately see major obstacles. |
100% correct. I'm glad you pointed it out, since it's easy (for me at least) to get lost in the details.
Oh, that's brilliant!!! I hadn't thought of that. Hopefully it will pan out that way. |
Thank you for pointing this out @emiliom, I will be sure to keep this in mind. Perhaps I can lay the groundwork for this forward compatibility by just implementing the restructuring of the groups and changing |
That will be great! This will also let us know if renaming a coordinate requires loading the coordinate data into memory. But I think even if it does, the penalty may not be a problem for the size of data files we are talking about at the moment -- until we fix the problem with |
That sounds great, @b-reyes ! Especially the range_bin > range_sample conversion (in |
I suggest that we talk this through and agree on who will do what in the call tomorrow morning. There are 2 steps to consider:
|
@emiliom yes, once we settle #596 I will start working on this, I plan to follow the outline I created above. I will be sure to discuss this with @lsetiawan in the meeting today. |
Done! |
Context
We are coming up against multiple planned breaking changes in v0.6.0. The changes include netCDF group structures as well as coordinates and variable names and attributes.
We know that many users, including both OOI and NCEI, have converted a non-trivial volume of data into zarr or nc using echopype v0.5.x, so it is imperative that all processing can accommodate this already converted data.
In light of the group structure changes, we also want to ease the transition for ourselves such that we do not disturb the entire codebase significantly all at once (so that we can focus on convention compliance for v0.6.0).
Problems and Approaches
There are two types of changes we need to deal with:
Group structure changes
After discussions (#567) we concluded that a cleaner approach is for us to use DataTree functionality to allow accessing the data like a dictionary or like in the netcdf4 library, such as `EchoData["Sonar/Beam_groupX"]`, because our current attribute setup would require `EchoData.sonar.beam_group1`, which can be quite involved to implement. The downsides to this are:
`open_tree` on v0.5.x data, the groups will not be in the right place
The last one requires some more discussion, so I wonder if an intermediate step we can take in v0.6.0 is to avoid this (for now) by making our code forward compatible to the v0.6.0 data:
Basically, we would hook up:
- `EchoData.beam` to `root/Sonar/Beam_group1`
- `EchoData.beam_power` (if exists) to `root/Sonar/Beam_group2`
- `EchoData.sonar` still to `root/Sonar` (no action)
- `EchoData.nmea` still to `root/Platform/NMEA` (no action)

This way all the downstream functions do not have to change in v0.6.0, and can be dealt with later once we figure out how to use DataTree to handle v0.5.x data.
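The hook-up described above could look roughly like the following. This is only a minimal sketch, not echopype's actual `EchoData` class: the class name `EchoDataSketch`, the `_ALIASES` table, and the use of plain strings as stand-ins for datasets are all assumptions made for illustration. It keeps the v0.6.0 groups keyed by path and resolves the existing attribute names against them, so downstream code that uses `.beam` would not need to change.

```python
# Hypothetical sketch; not echopype's actual EchoData implementation.
class EchoDataSketch:
    """Store v0.6.0-style groups by path and expose v0.5.x-style attributes."""

    # Existing attribute name -> v0.6.0 group path (from the list above).
    _ALIASES = {
        "beam": "Sonar/Beam_group1",
        "beam_power": "Sonar/Beam_group2",
        "sonar": "Sonar",
        "nmea": "Platform/NMEA",
    }

    def __init__(self, groups):
        # `groups` maps a group path to its dataset (any object in this sketch).
        self._groups = dict(groups)

    def __getitem__(self, path):
        # Dictionary-style access; tolerate a leading "/" or "root/".
        key = path.strip("/")
        if key.startswith("root/"):
            key = key[len("root/"):]
        return self._groups[key]

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        aliases = type(self)._ALIASES
        if name in aliases:
            # An optional group that is absent (e.g. beam_power) gives None.
            return self._groups.get(aliases[name])
        raise AttributeError(name)


# Strings stand in for xarray datasets here.
ed = EchoDataSketch({
    "Sonar": "sonar_ds",
    "Sonar/Beam_group1": "beam_ds",
    "Platform/NMEA": "nmea_ds",
})
beam_via_attribute = ed.beam
beam_via_path = ed["root/Sonar/Beam_group1"]
```

Both access styles return the same object, and `ed.beam_power` is `None` because this example has no `Sonar/Beam_group2` group. Whether a missing optional group should return `None` or raise is a design choice this sketch does not settle.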
Variable name changes
For these changes, I propose that instead of taking the forward compatibility route as above, here we actually change the variable names in the code once the correct groups are read into xarray datasets.
For example, we would:
- `range_bin` --> immediately change that to `range_sample` --> change all the downstream processing that uses `range_bin` also to `range_sample`
- `frequency` --> immediately change that to `channel` --> change all the downstream processing that uses `frequency` also to `channel`
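As a rough illustration of the rename-on-read step, the sketch below applies a rename table to a plain-dict stand-in for a dataset. The names `RENAMES` and `rename_legacy` and the `(dims, values)` layout are hypothetical, and the variable `backscatter_r` with its dimensions is used only as an example. With real xarray datasets the comparable call would presumably be `Dataset.rename` with the same mapping. Note that the comments on this issue point out that some of these changes go beyond a pure name change, which this sketch does not handle.

```python
# Hypothetical rename table built from the two examples above.
RENAMES = {"range_bin": "range_sample", "frequency": "channel"}


def rename_legacy(variables, renames=RENAMES):
    """Return a copy with v0.5.x variable and dimension names replaced.

    `variables` maps a variable name to a (dims, values) pair, a simplified
    stand-in for an xarray dataset.
    """
    renamed = {}
    for name, (dims, values) in variables.items():
        new_dims = tuple(renames.get(dim, dim) for dim in dims)
        renamed[renames.get(name, name)] = (new_dims, values)
    return renamed


# Illustrative v0.5.x-style input; values are omitted (None).
legacy = {
    "backscatter_r": (("frequency", "ping_time", "range_bin"), None),
    "range_bin": (("range_bin",), None),
}
converted = rename_legacy(legacy)
# Data already using the new names passes through unchanged.
converted_again = rename_legacy(converted)
```

Because names that are not in the table are left alone, the same function can be run on both v0.5.x and v0.6.0 data right after the groups are read, and the downstream processing then only ever sees the new names.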
Decision?
I want to list out all the changes we need to make in v0.6.0 here, but that requires that we decide what approach to take for the group structure changes problem. So, pinging @emiliom @b-reyes @lsetiawan for input!
Once we arrive at a decision I'll add specific tasks to this issue.