Implement DataTree for EchoData #611
Codecov Report
@@            Coverage Diff             @@
##              dev     #611      +/-   ##
==========================================
- Coverage   78.66%   78.41%   -0.26%
==========================================
  Files          40       42       +2
  Lines        3600     3743     +143
==========================================
+ Hits         2832     2935     +103
- Misses        768      808      +40
Demo

Open Raw
from echopype import open_raw
echodata = open_raw('echopype/test_data/ek60/ncei-wcsd/Summer2017-D20170615-T190214.raw', sonar_model='EK60')
In [5]: print(repr(echodata))
EchoData: standardized raw data from Internal Memory
> top: (Top-level) contains metadata about the SONAR-netCDF4 file format.
> environment: (Environment) contains information relevant to acoustic propagation through water.
> platform: (Platform) contains information about the platform on which the sonar is installed.
> nmea: (Platform/NMEA) contains information specific to the NMEA protocol.
> provenance: (Provenance) contains metadata about how the SONAR-netCDF4 version of the data were obtained.
> sonar: (Sonar) contains specific metadata for the sonar system.
> beam: (Sonar/Beam_group1) contains backscatter data (either complex samples or uncalibrated power samples) and other beam or channel-specific data, including split-beam angle data when they exist.
> vendor: (Vendor specific) contains vendor-specific information about the sonar and the data.
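The repr above pairs each EchoData attribute with its SONAR-netCDF4 group label or path and a one-line description. A minimal sketch of how such a listing could be produced from a plain lookup table, assuming a hypothetical `GROUPS` dict and `format_group_listing` helper (illustrative names, not echopype's API); groups missing from the data are simply skipped:

```python
# Hypothetical group table: attribute name -> (SONAR-netCDF4 label/path, description).
# Only a few entries are shown for illustration.
GROUPS = {
    "top": ("Top-level", "contains metadata about the SONAR-netCDF4 file format."),
    "environment": ("Environment", "contains information relevant to acoustic propagation through water."),
    "platform": ("Platform", "contains information about the platform on which the sonar is installed."),
    "nmea": ("Platform/NMEA", "contains information specific to the NMEA protocol."),
    "beam": (
        "Sonar/Beam_group1",
        "contains backscatter data (either complex samples or uncalibrated power samples) "
        "and other beam or channel-specific data, including split-beam angle data when they exist.",
    ),
}


def format_group_listing(source: str, groups: dict, present: set) -> str:
    """Build a repr-style listing, one line per group that exists in the data."""
    lines = [f"EchoData: standardized raw data from {source}"]
    for attr, (label, descr) in groups.items():  # dicts keep insertion order
        if attr in present:
            lines.append(f"> {attr}: ({label}) {descr}")
    return "\n".join(lines)


# Usage: list every group in the table.
print(format_group_listing("Internal Memory", GROUPS, set(GROUPS)))
```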
In [6]: print(echodata)
EchoData: standardized raw data from Internal Memory
DataTree('Top-level', parent=None)
│ Dimensions: ()
│ Data variables:
│ *empty*
│ Attributes:
│ conventions: CF-1.7, SONAR-netCDF4-1.0, ACDD-1.3
│ keywords: EK60
│ sonar_convention_authority: ICES
│ sonar_convention_name: SONAR-netCDF4
│ sonar_convention_version: 1.0
│ summary:
│ title:
│ date_created: 2017-06-15T19:02:14Z
│ survey_name:
├── DataTree('Environment')
│ Dimensions: (frequency: 3, ping_time: 19)
│ Coordinates:
│ * frequency (frequency) float64 1.8e+04 3.8e+04 1.2e+05
│ * ping_time (ping_time) datetime64[ns] 2017-06-15T19:02:14.20...
│ Data variables:
│ absorption_indicative (frequency, ping_time) float64 0.002226 ... 0.04069
│ sound_speed_indicative (frequency, ping_time) float64 1.507e+03 ... 1.50...
├── DataTree('Platform')
│ │ Dimensions: (location_time: 72, frequency: 3, ping_time: 19)
│ │ Coordinates:
│ │ * location_time (location_time) datetime64[ns] 2017-06-15T19:02:15.4450001...
│ │ * frequency (frequency) float64 1.8e+04 3.8e+04 1.2e+05
│ │ * ping_time (ping_time) datetime64[ns] 2017-06-15T19:02:14.206000128 ....
│ │ Data variables:
│ │ latitude (location_time) float64 dask.array<chunksize=(72,), meta=np.ndarray>
│ │ longitude (location_time) float64 dask.array<chunksize=(72,), meta=np.ndarray>
│ │ sentence_type (location_time) <U3 dask.array<chunksize=(72,), meta=np.ndarray>
│ │ pitch (frequency, ping_time) float64 dask.array<chunksize=(3, 19), meta=np.ndarray>
│ │ roll (frequency, ping_time) float64 dask.array<chunksize=(3, 19), meta=np.ndarray>
│ │ heave (frequency, ping_time) float64 dask.array<chunksize=(3, 19), meta=np.ndarray>
│ │ water_level (frequency, ping_time) float64 dask.array<chunksize=(3, 19), meta=np.ndarray>
│ └── DataTree('NMEA')
│ Dimensions: (location_time: 688)
│ Coordinates:
│ * location_time (location_time) datetime64[ns] 2017-06-15T19:02:14.2059996...
│ Data variables:
│ NMEA_datagram (location_time) <U73 '$SDVLW,2.084,N,2.084,N' ... '$INHDT,...
│ Attributes:
│ description: All NMEA sensor datagrams
├── DataTree('Provenance')
│ Dimensions: ()
│ Data variables:
│ *empty*
│ Attributes:
│ conversion_software_name: echopype
│ conversion_software_version: 0.5.6.dev42+gebb160b.d20220331
│ conversion_time: 2022-03-31T23:48:28Z
│ src_filenames: echopype/test_data/ek60/ncei-wcsd/Summer201...
│ duplicate_ping_times: 0
├── DataTree('Sonar')
│ │ Dimensions: (beam_group: 1)
│ │ Dimensions without coordinates: beam_group
│ │ Data variables:
│ │ beam_group_name (beam_group) <U11 'Beam_group1'
│ │ beam_group_descr (beam_group) <U131 'contains backscatter power (uncalib...
│ │ Attributes:
│ │ sonar_manufacturer: Simrad
│ │ sonar_model: ER60
│ │ sonar_serial_number:
│ │ sonar_software_name:
│ │ sonar_software_version: 2.4.3
│ │ sonar_type: echosounder
│ └── DataTree('Beam_group1')
│ Dimensions: (frequency: 3, ping_time: 19,
│ range_sample: 3888)
│ Coordinates:
│ * frequency (frequency) float64 1.8e+04 3.8e+04 1.2e+05
│ * ping_time (ping_time) datetime64[ns] 2017-06-15T19:...
│ * range_sample (range_sample) int64 0 1 2 ... 3886 3887
│ Data variables: (12/30)
│ channel_id (frequency) <U37 'GPT 18 kHz 009072058c8...
│ beam_type (frequency) int64 1 1 1
│ beamwidth_receive_alongship (frequency) float64 10.3 6.8 7.3
│ beamwidth_receive_athwartship (frequency) float64 10.3 6.8 7.2
│ beamwidth_transmit_alongship (frequency) float64 10.3 6.8 7.3
│ beamwidth_transmit_athwartship (frequency) float64 10.3 6.8 7.2
│ ... ...
│ data_type (frequency, ping_time) float64 3.0 ... 3.0
│ count (frequency, ping_time) float64 3.888e+03 ...
│ offset (frequency, ping_time) float64 0.0 ... 0.0
│ transmit_mode (frequency, ping_time) float64 0.0 ... 0.0
│ angle_athwartship (frequency, ping_time, range_sample) float64 ...
│ angle_alongship (frequency, ping_time, range_sample) float64 ...
│ Attributes:
│ beam_mode: vertical
│ conversion_equation_t: type_3
└── DataTree('Vendor')
Dimensions: (frequency: 3, pulse_length_bin: 5)
Coordinates:
* frequency (frequency) float64 1.8e+04 3.8e+04 1.2e+05
* pulse_length_bin (pulse_length_bin) int64 0 1 2 3 4
Data variables:
sa_correction (frequency, pulse_length_bin) float64 0.0 -0.83 ... -0.34
gain_correction (frequency, pulse_length_bin) float64 20.3 23.35 ... 26.62
pulse_length (frequency, pulse_length_bin) float64 0.000512 ... 0.00...

Open Converted
from echopype import open_converted
echodata = open_converted('echopype/test_data/ek60/ncei-wcsd/Summer2017-D20170615-T190214.zarr')
# Note: without backward compatibility, the current version misses the Beam group!
In [18]: print(repr(echodata))
EchoData: standardized raw data from /home/lsetiawan/GitRepos/GitHub/echopype/echopype/test_data/ek60/ncei-wcsd/Summer2017-D20170615-T190214.zarr
> top: (Top-level) contains metadata about the SONAR-netCDF4 file format.
> environment: (Environment) contains information relevant to acoustic propagation through water.
> platform: (Platform) contains information about the platform on which the sonar is installed.
> nmea: (Platform/NMEA) contains information specific to the NMEA protocol.
> provenance: (Provenance) contains metadata about how the SONAR-netCDF4 version of the data were obtained.
> sonar: (Sonar) contains specific metadata for the sonar system.
> vendor: (Vendor specific) contains vendor-specific information about the sonar and the data.
In [19]: print(echodata)
EchoData: standardized raw data from /home/lsetiawan/GitRepos/GitHub/echopype/echopype/test_data/ek60/ncei-wcsd/Summer2017-D20170615-T190214.zarr
DataTree('Top-level', parent=None)
│ Dimensions: ()
│ Data variables:
│ *empty*
│ Attributes:
│ conventions: CF-1.7, SONAR-netCDF4-1.0, ACDD-1.3
│ date_created: 2017-06-15T19:02:14Z
│ keywords: EK60
│ sonar_convention_authority: ICES
│ sonar_convention_name: SONAR-netCDF4
│ sonar_convention_version: 1.0
│ summary:
│ survey_name:
│ title:
├── DataTree('Beam')
│ Dimensions: (frequency: 3, ping_time: 19,
│ range_bin: 3888)
│ Coordinates:
│ * frequency (frequency) float64 1.8e+04 3.8e+04 1.2e+05
│ * ping_time (ping_time) datetime64[ns] 2017-06-15T19:...
│ * range_bin (range_bin) int64 0 1 2 3 ... 3885 3886 3887
│ Data variables: (12/30)
│ angle_alongship (frequency, ping_time, range_bin) float64 ...
│ angle_athwartship (frequency, ping_time, range_bin) float64 ...
│ angle_offset_alongship (frequency) float64 ...
│ angle_offset_athwartship (frequency) float64 ...
│ angle_sensitivity_alongship (frequency) float64 ...
│ angle_sensitivity_athwartship (frequency) float64 ...
│ ... ...
│ transducer_offset_y (frequency) float64 ...
│ transducer_offset_z (frequency) float64 ...
│ transmit_bandwidth (frequency, ping_time) float64 ...
│ transmit_duration_nominal (frequency, ping_time) float64 ...
│ transmit_mode (frequency, ping_time) float64 ...
│ transmit_power (frequency, ping_time) float64 ...
│ Attributes:
│ beam_mode: vertical
│ conversion_equation_t: type_3
├── DataTree('Environment')
│ Dimensions: (frequency: 3, ping_time: 19)
│ Coordinates:
│ * frequency (frequency) float64 1.8e+04 3.8e+04 1.2e+05
│ * ping_time (ping_time) datetime64[ns] 2017-06-15T19:02:14.20...
│ Data variables:
│ absorption_indicative (frequency, ping_time) float64 ...
│ sound_speed_indicative (frequency, ping_time) float64 ...
├── DataTree('Platform')
│ │ Dimensions: (frequency: 3, ping_time: 19, location_time: 72)
│ │ Coordinates:
│ │ * frequency (frequency) float64 1.8e+04 3.8e+04 1.2e+05
│ │ * location_time (location_time) datetime64[ns] 2017-06-15T19:02:15.4450001...
│ │ * ping_time (ping_time) datetime64[ns] 2017-06-15T19:02:14.206000128 ....
│ │ Data variables:
│ │ heave (frequency, ping_time) float64 ...
│ │ latitude (location_time) float64 ...
│ │ longitude (location_time) float64 ...
│ │ pitch (frequency, ping_time) float64 ...
│ │ roll (frequency, ping_time) float64 ...
│ │ sentence_type (location_time) <U3 ...
│ │ water_level (frequency, ping_time) float64 ...
│ └── DataTree('NMEA')
│ Dimensions: (location_time: 688)
│ Coordinates:
│ * location_time (location_time) datetime64[ns] 2017-06-15T19:02:14.2059996...
│ Data variables:
│ NMEA_datagram (location_time) <U73 ...
│ Attributes:
│ description: All NMEA sensor datagrams
├── DataTree('Provenance')
│ Dimensions: ()
│ Data variables:
│ *empty*
│ Attributes:
│ conversion_software_name: echopype
│ conversion_software_version: 0.4.1.dev438+g7aa2cd0.d20210409
│ conversion_time: 2021-04-16T17:48:07Z
│ src_filenames: ./echopype/test_data/ek60/ncei-wcsd/Summer2...
├── DataTree('Sonar')
│ Dimensions: ()
│ Data variables:
│ *empty*
│ Attributes:
│ sonar_manufacturer: Simrad
│ sonar_model: ER60
│ sonar_serial_number:
│ sonar_software_name:
│ sonar_software_version: 2.4.3
│ sonar_type: echosounder
└── DataTree('Vendor')
Dimensions: (frequency: 3, pulse_length_bin: 5)
Coordinates:
* frequency (frequency) float64 1.8e+04 3.8e+04 1.2e+05
* pulse_length_bin (pulse_length_bin) int64 0 1 2 3 4
Data variables:
gain_correction (frequency, pulse_length_bin) float64 ...
pulse_length (frequency, pulse_length_bin) float64 ...
sa_correction (frequency, pulse_length_bin) float64 ...
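The converted file above predates the Sonar/Beam_group1 layout: its backscatter data live in a root-level Beam group, which is why the repr has no beam entry without backward compatibility. A minimal sketch of one way to bridge this, assuming a hypothetical table of legacy-to-current group paths (`LEGACY_PATHS` and `resolve_group_path` are illustrative names, not echopype's implementation):

```python
from typing import Iterable, Optional

# Hypothetical mapping from legacy group paths to current-convention paths.
LEGACY_PATHS = {
    "Beam": "Sonar/Beam_group1",
}


def resolve_group_path(wanted: str, available: Iterable[str]) -> Optional[str]:
    """Return the path under which `wanted` is stored in a file, or None.

    `wanted` is a current-convention path such as "Sonar/Beam_group1".
    If that path is absent, fall back to any legacy path that maps onto it.
    """
    available = set(available)
    if wanted in available:
        return wanted
    for legacy, current in LEGACY_PATHS.items():
        if current == wanted and legacy in available:
            return legacy
    return None


# Group paths present in the older converted file shown above.
old_file_groups = ["Beam", "Environment", "Platform", "Platform/NMEA", "Provenance", "Sonar", "Vendor"]
print(resolve_group_path("Sonar/Beam_group1", old_file_groups))  # Beam
print(resolve_group_path("Environment", old_file_groups))  # Environment
```

Keeping the mapping in one table means new legacy layouts can be supported by adding entries rather than special-casing the open logic.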
From Slack conversations, access needs to change to match what @b-reyes asked for:
My current thoughts:
Regarding netcdf4 group path references, including (especially!) the "root group": First, a distinction. The encoding in groups:
top:
  name: Top-level
  description: contains metadata about the SONAR-netCDF4 file format.
  ep_group:
...
beam_power:
  name: Sonar/Beam_group2
  description: >-
    contains backscatter power (uncalibrated) and other beam or channel-specific data,
    including split-beam angle data when they exist.
    Only exists if complex backscatter data are already in Sonar/Beam_group1
  ep_group: Sonar/Beam_group2

Second, the root group is a special group. It can have all the elements of a group (variables, dimensions, other groups), but its path is simply "/". There's no group name per se. When you open a netCDF4 file with the Python netcdf4 package using
So, at one level, the root group is just another group. At another, it's a very special group with no name other than "/" (but that's its path and not really a name). SONAR-netCDF4 describes it in a way that's completely parallel to all other groups. The conventions give labels to all groups, b/c it's convenient, and the label for the root group is "Top-level". As you can see, netcdf4 makes an explicit analogy between group paths and unix directory paths, but I'd say its implementation is a bit loose. In xarray ... Whew. The ... Could this arrangement cause confusion? Sure. But does it have its own self-consistency and clear motivations and benefits? Absolutely. One possible change we could make, to be more explicit, is to set the top-level
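The unix-path analogy can be made concrete with the standard library's `pathlib.PurePosixPath`, used here purely as a stand-in for group paths (netCDF4 itself does not use pathlib). The root "/" has an empty name, every other group's name is its last path component, and "Top-level" is a separate convention-level label. `ROOT_LABEL` and `group_label` are hypothetical names for this sketch:

```python
from pathlib import PurePosixPath

# Convention-level label for the root; only the root needs a special case
# because it has a path ("/") but no name.
ROOT_LABEL = "Top-level"


def group_label(path: str) -> str:
    """Return a SONAR-netCDF4-style label for a group path."""
    p = PurePosixPath("/") / path  # anchor relative paths at the root
    if p == PurePosixPath("/"):
        return ROOT_LABEL  # the root group: path "/", no name per se
    return str(p.relative_to("/"))  # e.g. "Sonar/Beam_group1"


root = PurePosixPath("/")
beam = PurePosixPath("/Sonar/Beam_group1")
print(repr(root.name))  # '' (the root has no name)
print(beam.name)  # Beam_group1
print(beam.parent)  # /Sonar
print(group_label("/"))  # Top-level
print(group_label("/Platform/NMEA"))  # Platform/NMEA
```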
That's awesome, @lsetiawan! Both reprs look good. My only comment is about indentation, especially for the Beam_groupX groups. For both the text and HTML reprs, b/c the group description is long, the text wraps onto a new line. But the wrapped line is not indented, so it looks awkward and breaks the visual arrangement a bit. For the HTML repr it's a small, minor effect, b/c there's still an overall indentation. For the text repr the impact is bigger.
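One way to keep wrapped lines indented in the text repr is to wrap long descriptions at a fixed width with a hanging indent, so continuation lines line up under the start of the description. A minimal sketch with the standard-library `textwrap` module; the helper name `wrap_entry` and the width of 60 are assumptions, and a fixed width cannot react to the browser or terminal window size:

```python
import textwrap


def wrap_entry(attr: str, label: str, descr: str, width: int = 60) -> str:
    """Format one repr entry, indenting wrapped lines under the description."""
    prefix = f"> {attr}: ({label}) "
    return textwrap.fill(
        descr,
        width=width,
        initial_indent=prefix,  # first line starts with the entry prefix
        subsequent_indent=" " * len(prefix),  # later lines align under the text
    )


print(wrap_entry(
    "beam",
    "Sonar/Beam_group1",
    "contains backscatter data (either complex samples or uncalibrated power samples) "
    "and other beam or channel-specific data, including split-beam angle data when they exist.",
))
```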
Yeah... that's because of my small screen... I'm not sure how to fix that. It's the browser/window auto line wrap. If you or Brandon have any suggestions, please let me know. I'm at a loss on that one 😅 It almost seems like the code would need to figure out the window size in real time and insert line breaks in the text to make it work.
Well, the small screen exposes the wrapping issue, but the issue is there regardless. Looking at the example of the current (0.5.x) text repr you included earlier, the long text doesn't wrap; it just extends beyond the width of the window. Is that just the default markdown block behavior vs. the behavior in a terminal? One alternative is to clip the line at a certain maximum length. That's done in the DataTree text repr (you also included an example above last week), where the line gets clipped and a "..." string is added. Anyway, I'm not saying that's a better option, just an alternative to consider.
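The clipping alternative (cut the line at a maximum length and append "...") can be sketched with plain string slicing. `clip_line` and the 60-character limit are illustrative assumptions, not DataTree's actual code. The standard library's `textwrap.shorten` is a close alternative, although it collapses whitespace and cuts only at word boundaries.

```python
def clip_line(line: str, max_len: int = 60, marker: str = "...") -> str:
    """Clip `line` to at most `max_len` characters, ending with `marker` if cut."""
    if len(line) <= max_len:
        return line
    return line[: max_len - len(marker)] + marker


entry = (
    "> beam: (Sonar/Beam_group1) contains backscatter data (either complex "
    "samples or uncalibrated power samples) and other beam or channel-specific data"
)
print(clip_line(entry))  # 60 characters, ending in "..."
print(clip_line("> top: (Top-level)"))  # short lines are returned unchanged
```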
I'll take a look at the CSS and see if I can make adjustments there. I think the
That example is the default behaviour of the markdown block. If you try this in a terminal, it will wrap.
Maybe wait to see what @b-reyes and @leewujung think? I think I prefer the "..." behavior, but I could go either way.
I don't think the lack of indent is a problem in the HTML repr, since the arrowhead makes the expected behavior clear. I agree this is a bigger problem for the text repr, but I think terminals in general have that behavior and users expect it. Going with ... I would advocate for getting this merged and shelving the text repr "problem" as a separate issue that could be fixed in the future.
@lsetiawan what happens if you throw a newline in the string?
That would be fine with me, too.
Yeah. I think the repr behavior needs more thought, and it's probably better to put it in a separate PR since this PR is getting really big.
I just realized that when I generated a screenshot of the spacing between groups in the HTML repr just now, I ended up running my set of "tests" on this PR! That's because I ran the notebook where I use the datatree accessors in different ways. So, I can say that this PR is good to go!
Woo-hoo!!
@lsetiawan thank you for all of your great work, Don!
Two items that I think need to be reviewed:
- There are a couple of places that are trying to import from echopype. However, these references seem to be incorrect and will attempt to import using whatever echopype is in your environment. Please see my suggested changes.
- In a past meeting, I think we talked about changing the name group_paths to group_names or something like that.
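The first item is the usual absolute-vs-relative import distinction. A relative import such as `from .convention import ...` is resolved against the importing module's own package, whereas an absolute `from echopype... import ...` is looked up by its top-level name (via `sys.modules` and `sys.path`), so depending on how the code is run it can bind to a different echopype copy than the tree being edited. The standard library's `importlib.util.resolve_name` shows the resolution step; the module layout below is hypothetical and only for illustration:

```python
from importlib.util import resolve_name

# Hypothetical layout, for illustration only:
#   echopype/echodata/echodata.py   (its package is "echopype.echodata")
#   echopype/echodata/convention.py
#   echopype/utils/io.py
PACKAGE = "echopype.echodata"

# Relative names resolve against the importing module's package:
print(resolve_name(".convention", PACKAGE))  # echopype.echodata.convention
print(resolve_name("..utils.io", PACKAGE))  # echopype.utils.io

# Absolute names come back unchanged; at import time they are found by their
# top-level name ("echopype"), i.e. whichever copy sys.modules/sys.path yields.
print(resolve_name("echopype.utils.io", PACKAGE))  # echopype.utils.io
```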
Co-authored-by: b-reyes <[email protected]>
I think we are good to go now. Once I have @b-reyes' seal of approval, I will merge 😄
This PR implements xarray-datatree for the underlying EchoData structure to ease access. This work is part of issues #567 and #606.
TODO
- Allow echodata['Top-level'] or echodata['/'].
- Add group paths listing with /.
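The two TODO items can be sketched without xarray-datatree as a tiny container whose `__getitem__` treats the "Top-level" label and the root path "/" as the same group, and which lists group paths with a leading "/". `GroupStore`, its `group_paths` property, and the dict-of-dicts storage are hypothetical stand-ins for the DataTree-backed EchoData, not this PR's implementation:

```python
from typing import Dict, List

ROOT_LABEL = "Top-level"


class GroupStore:
    """Toy stand-in for a tree of groups keyed by path (no xarray/datatree)."""

    def __init__(self, groups: Dict[str, dict]):
        # Normalize every key to an absolute path such as "/" or "/Sonar/Beam_group1".
        self._groups = {self._normalize(k): v for k, v in groups.items()}

    @staticmethod
    def _normalize(key: str) -> str:
        if key in (ROOT_LABEL, "/", ""):
            return "/"
        return "/" + key.strip("/")

    def __getitem__(self, key: str) -> dict:
        # "Top-level" and "/" both resolve to the root group;
        # "Sonar/Beam_group1" and "/Sonar/Beam_group1" are equivalent.
        return self._groups[self._normalize(key)]

    @property
    def group_paths(self) -> List[str]:
        """All group paths, each with a leading '/', sorted (root first)."""
        return sorted(self._groups)


store = GroupStore({
    "/": {"keywords": "EK60"},
    "Environment": {},
    "Platform/NMEA": {},
    "Sonar/Beam_group1": {},
})
print(store["Top-level"] is store["/"])  # True
print(store.group_paths)  # ['/', '/Environment', '/Platform/NMEA', '/Sonar/Beam_group1']
```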