Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/obs interpolated position #104

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

jerabaul29
Copy link
Collaborator

Following recurrent user requests, this PR:

  • adds a robust function to interpolate messy / "with holes" time series, e.g. gnss positions; the idea is that real world data are messy and there may be NaNs, NaTs, holes, etc. So these are checked quite carefully, and a "robust best attempt" is done: interpolate as NaN if there are no close data points, as "nearest" if there is close data points only on one side in time, as "linear" if there is close data points on both sides in time
  • performs interpolation of the gnss lat and lon to the times of the wave measurements in the omb reader, and include these in the xr produced by the omb reader
  • document this in the in-depth omb example

@jerabaul29
Copy link
Collaborator Author

Looks like the CI test pipeline fails because of something else than this code for now:

Error: This request has been automatically failed because it uses a deprecated version of actions/upload-artifact: v2. Learn more: https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/

@jerabaul29
Copy link
Collaborator Author

(@gauteh are you familiar with this CI error? looks like this is something standard? something on trajan CI in general that could be updated? how do you want to proceed? If you push a fix to main I can rebase on it :) )

@jerabaul29
Copy link
Collaborator Author

Fixes #103 .

@gauteh
Copy link
Member

gauteh commented Sep 18, 2024

Hi Jean,

I'm looking at this now and thinking about how to do this so that we get a consistent set of features and methods in TrajAn. It seems that this functionality is very similar to that of gridtime, and that gridtime perhaps is a special case of interpolating observational data to a time coordinate.

The other question is how much post-processed data to include in these files. The goal should not be a one-liner in trajan in itself. You have to call that one-liner from some python-script anyway, so when you generate data for a particular end-user or purpose it is better to add the specific adaptations there as another line or two and keep the default in trajan as basic as possible. That will make the methods easier to compose into blocks for different purposes. We see in OpenDrift that adding a lot of particular functions or arguments to functions is a never-ending game, the next use case is slightly different and a new function or argument is needed. Better to have the user write a few lines, than one very specific one.

@knutfrode
Copy link
Contributor

Hi!
I dit not study in very much detail, but also get the impression that there is probably quite a bit overlap between existing gridtime and the new interpolate_variable_to_newtimes
And so I share Gautes concern that we might here fall into the trap of Duplication, which is the Root of all Evil:
https://dev.to/ivangavlik/dont-repeat-yourself-practical-advice-at-code-level-1mem
The problem is then that it is very hard to improve on this later, when there will be many internal and external cross-dependencies.

Thus small and well defined methods with very specific and clear task and input-output is desired, rather than larger methods which are very convenient ("oneliner") for a specific task, but just slightly unusable for similar tasks.
The existing gridtime was also made relatively quickly, so it is probably not ideal either.
So maybe we should sit down one day, i.e. a hackaton trying to outline a best strategy as clearly as possible?

@jerabaul29
Copy link
Collaborator Author

(note to myself: this PR currently breaks example_indepth_omb_use.py, this needs to be fixed if this would proceed any further :) )

@jerabaul29
Copy link
Collaborator Author

Hi again! :) Sorry it took a bit of time for me to come back to you!

I understand your comments about code duplication :) .

a few thoughts

  • I am not too fan of using the scipy .interp method (under the hood) for some of our applications, because it "just" performs very "simple / brutal" interpolation. For example, what if there is a hole in a time series for a specific variable because a sensor fails / is not reliable, but .interp silently interpolates wildly "inside" the hole? This could lead to generating bad interpolated data silently (I think, in the current state of things), which is a bit scary to me, and I think that this is not as "unrealistic" as it can sound, given how things are when instruments are deployed in the ocean :) . This is the reason for implementing my "own" .interp method, that ignores data that are outside of a given window around the point in time to which to interpolate, as done by finding data that have neighbors reasonably close before and / or after, and adapting the interpolation as a consequence:

https://github.com/jerabaul29/trajan/blob/8bd2bc4017370b2bf345dc63faf8f3da06cb7674/trajan/utils.py#L54-L72

and related uses of the there_is_a_time_close_after_newtime and there_is_a_time_close_before_newtime.

This will avoid, for example, to automatically interpolate a sensor that had been working intermittently. So instead of interpolating in case of missing data, NaNs will be introduced:

https://github.com/jerabaul29/trajan/blob/8bd2bc4017370b2bf345dc63faf8f3da06cb7674/tests/test_utils.py#L51-L70

I do not think that there is a way to do this with the scipy's interp function?

Would this kind of functionality be interesting for trajan? I agree that this is overkill for, e.g., model data output, but for actual buoy data, this may be useful?

  • another (sort of orthogonal) aspect is that at present gridtime only acts on the obs dimension, not on e.g. obs_waves_imu:
# %%

# generate a trajan dataset
path_to_test_data = Path.cwd().parent / "tests" / "test_data" / "csv" / "omb3.csv"
xr_buoys = read_omb_csv(path_to_test_data)

# %%

xr_buoys

# %%

xr_buoys.traj.gridtime('15min')

# %%

gives:

<xarray.Dataset> Size: 119kB                                
Dimensions:                              (trajectory: 3,                                                                  
                                          frequencies_waves_imu: 55, obs: 150,  
                                          obs_waves_imu: 26)                                                              
Coordinates:                                                                                                              
  * trajectory                           (trajectory) <U9 108B 'drifter_1' .....                                          
  * frequencies_waves_imu                (frequencies_waves_imu) float64 440B ...
Dimensions without coordinates: obs, obs_waves_imu                                                                        
Data variables: (12/14)                                                                                                   
    time                                 (trajectory, obs) datetime64[ns] 4kB ...           
    lat                                  (trajectory, obs) float64 4kB 79.4 ....                            
    lon                                  (trajectory, obs) float64 4kB 3.031 ...                            
    time_waves_imu                       (trajectory, obs_waves_imu) datetime64[ns] 624B ...                              
    accel_energy_spectrum                (trajectory, obs_waves_imu, frequencies_waves_imu) float64 34kB ...
    elevation_energy_spectrum            (trajectory, obs_waves_imu, frequencies_waves_imu) float64 34kB ...
    ...                                   ...                                                                             
    pHs0                                 (trajectory, obs_waves_imu) float64 624B ...
    pT02                                 (trajectory, obs_waves_imu) float64 624B ...
    pT24                                 (trajectory, obs_waves_imu) float64 624B ...
    Hs0                                  (trajectory, obs_waves_imu) float64 624B ...                                     
    T02                                  (trajectory, obs_waves_imu) float64 624B ...                                     
    T24                                  (trajectory, obs_waves_imu) float64 624B ...
Attributes: (12/14)                 
    Conventions:          CF-1.10   
    featureType:          trajectory
    geospatial_lat_min:   74.1878338
    geospatial_lat_max:   79.7326024
    geospatial_lon_min:   2.9766619
    geospatial_lon_max:   7.5205089
    ...                   ...    
    creator_name:         XX:TODO
    creator_email:        XX:TODO
    title:                XX:TODO                                                                                         
    summary:              XX:TODO                            
    creator_institution:  XX:TODO                                                                                         
    history:              created with trajan.reader.omb from a Rock7 Iridium...

and

<xarray.Dataset> Size: 126kB
Dimensions:                              (trajectory: 3, time: 304,
                                          obs_waves_imu: 26,
                                          frequencies_waves_imu: 55)
Coordinates:
  * frequencies_waves_imu                (frequencies_waves_imu) float64 440B ...
  * time                                 (time) datetime64[ns] 2kB 2022-06-15...
  * trajectory                           (trajectory) <U9 108B 'drifter_1' .....
Dimensions without coordinates: obs_waves_imu
Data variables: (12/13)
    lat                                  (trajectory, time) float64 7kB nan ....
    lon                                  (trajectory, time) float64 7kB nan ....
    time_waves_imu                       (trajectory, obs_waves_imu) datetime64[ns] 624B ...
    accel_energy_spectrum                (trajectory, obs_waves_imu, frequencies_waves_imu) float64 34kB ...
    elevation_energy_spectrum            (trajectory, obs_waves_imu, frequencies_waves_imu) float64 34kB ...
    processed_elevation_energy_spectrum  (trajectory, obs_waves_imu, frequencies_waves_imu) float64 34kB ...
    ...                                   ...
    pHs0                                 (trajectory, obs_waves_imu) float64 624B ...
    pT02                                 (trajectory, obs_waves_imu) float64 624B ...
    pT24                                 (trajectory, obs_waves_imu) float64 624B ...
    Hs0                                  (trajectory, obs_waves_imu) float64 624B ...
    T02                                  (trajectory, obs_waves_imu) float64 624B ...
    T24                                  (trajectory, obs_waves_imu) float64 624B ...
Attributes: (12/14)
    Conventions:          CF-1.10
    featureType:          trajectory
    geospatial_lat_min:   74.1878338
    geospatial_lat_max:   79.7326024
    geospatial_lon_min:   2.9766619
    geospatial_lon_max:   7.5205089
    ...                   ...
    creator_name:         XX:TODO
    creator_email:        XX:TODO
    title:                XX:TODO
    summary:              XX:TODO
    creator_institution:  XX:TODO
    history:              created with trajan.reader.omb from a Rock7 Iridium...

possible directions for PRs

  • would there be interest / should I look into extending .gridtime to also work on other kinds of obs? should this be done automatically or user-selected?

  • would there be interest in having an option for using different interpolators? For example we could have a function argument to .gridtime to provide one's own interpolator and .gridtime would apply it to the dataset?

@jerabaul29
Copy link
Collaborator Author

would there be interest / should I look into extending .gridtime to also work on other kinds of obs? should this be done automatically or user-selected?

if working in this direction, I could start by implementing this only for "scalar" quantities, not e.g. 1D spectra fields.

@jerabaul29
Copy link
Collaborator Author

Another consideration: this is actually something that is maybe more the responsibility of the OMB decoder than of trajan. Since the OMb decoder is part of trajan to allow to use it as a reader (which is very convenient, and I would like this to continue if you are ok with this :) ), would you be more willing that I still include this functionality, but moving the functions etc. fully to the omb decoder related files, so that this is "contained" there? This way, this function would not really be the responsibility of core trajan / "you".

Let me know if this would be an acceptable solution - if so, I can easily adapt the PR :) .

@gauteh
Copy link
Member

gauteh commented Oct 25, 2024

I don't have the complete overview, but here are my current thoughts so that you don't have to wait for too long for some feedback:

  • gridtime and interp are pretty close, they are trying to do something general: interpolate a variable which has an observation dimension rather than a time dimension (so it fits in the seltime, iseltime series)
  • gridtime currently tries to interpolate all obs variables on the same dimension as the trajectories, this makes a lot of sense on e.g. temperature.
  • for you gridtime doesn't work, the assumptions or methods are insufficient? though I am still a bit hazy on exactly why, but I believe you.

I am wondering if maybe we should rather try to make an obsinterp or obsgrid method, where gridtime and interp are special cases. This would then be general to any observational data, and you could select a different obs-variable. I think that by default gridtime (or the new obsinterp) should only work on the trajectory variables by default (and then you can specify a list of variables to interpolate in addition, e.g. temperature). In that way it is close to just interpolate non-trajectory, but observational data: ds.imu_waves.traj.obsinterp(...) would have all the context it need to pick out dimension and variable etc. We could have ds.traj.obsinterp(['imu..', 'temperature']) to do multiple variables at once.

To summarize:

  • maybe make an obsinterp that is general to whatever obs dimesnion or variable you choose
  • obsinterp should be on dataset level and/or on variable level
  • don't try to interpolate everything at once (like gridtime does by default)

@jerabaul29
Copy link
Collaborator Author

My concern is that I do not want to perform "brutal, blind" interpolation in my case, as we may easily have holes in the timeseries in particular for buoys with different sensors sending data over iridium, so interpolating "brutally" may lead to silently generating non-sense data.

Therefore, given that users ask for interpolated position at the time of the spectra for the OMB, I have to perform interpolation with a "careful interpolator". I asked to scipy (I think the "import chain" is trajan <- xarray <- scipy <- numpy, but that the numpy is too "fundamental" for adding lots of options) if this could be implemented "in general", but the response was that this is too use-case specific, so that it should be implemented ad-hoc by the user:

scipy/scipy#21755 .

This is why I would like to implement a separate interpolation function, that can be used to interpolate when there is a risk that there are holes in the data (like for buoys and other observations, and in particular in the OMB decoder to answer the user's request). I agree that if working with model data on a regular time (or space) grid, when data are of "perfect" quality (in the sense that there are never holes or missing data), then the functions that you mention and are already implemented work fine and it is fine to use these already existing functions to perform interpolation. But I think that the case I have here is a bit different, due to this holes issues, and that this will necessarily be a bit ad hoc :) . My hope is that the way I have written:

https://github.com/jerabaul29/trajan/blob/8bd2bc4017370b2bf345dc63faf8f3da06cb7674/trajan/utils.py#L5-L96

means that it should be easy to re-use other places in decoders / workflows where the same "careful interpolation in case of holes" use case arises :) .

@jerabaul29
Copy link
Collaborator Author

Thanks for the PR #136 showing how to do something similar with existing tooling @gauteh ! :)

I think that there are actually 2 different questions here:

So I think that we talk about slightly different things: your example focuses on the first point, but makes no guarantee about the second point. My PR here is actually mostly interested in the second point (though I also naturally have to consider technicalities so have some technical aspects to make the first point work too :) ).

@gauteh
Copy link
Member

gauteh commented Nov 12, 2024

Is the goal to interpolate / grid to a grid (like gridtime) or to the observation times of the IMU?

@jerabaul29
Copy link
Collaborator Author

The goal is to interpolate "the GPS position at the GPS times" to "the GPS position at the wave spectra times", and doing so without risking to do a wild invalid interpolation in case there are some holes in the data :) .

@gauteh
Copy link
Member

gauteh commented Nov 12, 2024

Ok. I think this should go in a ds.traj.gridobs(times) method. The interpolation method should be selected as an argument, yours can be the default. The times argument should be the observational time variable, e.g.: ds.imu_times (then I think it contains all the necessary information about dimension etc).

I can push up a small skeleton, I also added an to_1d() method which will be useful when iterating over trajectories.

On another note, the example_indepth_... example is getting pretty long, meaning it is easy to mess it up when changing unrelated parts of the code. Maybe it's better to have shorter example? Makes it easier for the user to find exactly what they are looking for (even if it means some duplication).

@gauteh gauteh force-pushed the feat/obs_interpolated_position branch from 59e268f to 54b5c7d Compare November 12, 2024 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants