Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/add contiguous #129

Merged
merged 12 commits into from
Oct 22, 2024
Merged

Feat/add contiguous #129

merged 12 commits into from
Oct 22, 2024

Conversation

jerabaul29
Copy link
Collaborator

This adds the possibility to open "simple" (i.e. with scalar variables, support for several-dimensional-per-sample variables would require some extra work) ragged contiguous dataset.

Includes test and example.

@jerabaul29
Copy link
Collaborator Author

(I will work some more on the failing tests; I think this is an issue with identifying if a ds is a candidate for "usual" or "ragged" traj classes).

@jerabaul29
Copy link
Collaborator Author

I think this should be ready for a bit of review and comments! :)

  • Honestly I do not like the logics I implemented in:

trajan/accessor.py

I think it is ugly and not very clear. But given how little we can assume about how "generic" nc files that should be readable by trajan may be formatted, I wonder if this is the best I can do so far?

  • This should be fine at reading e.g. spotter_data_bulk.nc , I have tested it in some other files and scripts

  • I have added a small (2 buoys) subset of the Sofar dataset in the test data - not sure if this poses any copyright issues. I guess not, we give proper credit and the attributes are kept:

✔ pc6012:~/Desktop/Git/trajan/tests/test_data [feat/add_contiguous|✔]> ncdump -h xr_spotter_bulk_test_data.nc 
netcdf xr_spotter_bulk_test_data {
...
// global attributes:
		:title = "Sofar Spotter Data Archive - Bulk Wave Parameters" ;
		:institution = "Sofar Ocean" ;
		:source = "Spotter wave buoy" ;
		:creation_date = "2023-10-18 00:43:55.333537" ;
		:author = "Isabel A. Houghton" ;
		:email = "[email protected]" ;
		:references = "https://content.sofarocean.com/hubfs/Spotter%20product%20documentation%20page/Sofar%20-%20Technical_Reference_Manual.pdf" ;
		:modified = "modified to only keep a couple of buoys, to be used as a test data file for the trajan package" ;
}

@gauteh
Copy link
Member

gauteh commented Oct 21, 2024

Cool! I will try to take a look tomorrow, I agree that the logics are messy. That's why I would like there to be a grid mapping with no guessing involved. Would be nice if a license attribute was standard. Another suggestion for CF.

trajan/accessor.py Outdated Show resolved Hide resolved
trajan/accessor.py Outdated Show resolved Hide resolved
@jerabaul29
Copy link
Collaborator Author

I think I implemented your comments :) .

I think this may be quite brittle until the following things are standardized (I do not think this is a standard yet?):

  • using index as the index dimension in case a ragged contiguous array is used (I do not think dimensions have standard_name attributes? If so we will have to rely on dimensions own name?
  • using rowsize as the lookup table for the trajectory length; I do not think there is a standard_name for this either yet? (and / or if so, it is not used in the .nc file I work with).

@jerabaul29
Copy link
Collaborator Author

I can try to robustify a bit the discovery of rowsize, it should have a dimension that is equal to trajectory. Doing it now :) .

@gauteh
Copy link
Member

gauteh commented Oct 22, 2024

One choice we should discuss is that this class converts the dataset on opening to traj2d. In trajan so far this requires an explicit operation (e.g. gridtime), meaning that as soon as something is done on this dataset (e.g. assign_cf_attrs) a 2d dataset is going to come out on the other end. Perhaps this should be behind a to_obs()/explode() method? Or there should be a @require_obs decorator on the methods that require this which conveniently convert the dataset first (but only when necessary)?

This may affect the API and require a lot of changes later if we decide to change.

@gauteh
Copy link
Member

gauteh commented Oct 22, 2024

I can try to robustify a bit the discovery of rowsize, it should have a dimension that is equal to trajectory. Doing it now :) .

A quick sanity check would be to see that it sums to length of the index dimension?

@knutfrode
Copy link
Contributor

I think I implemented your comments :) .

I think this may be quite brittle until the following things are standardized (I do not think this is a standard yet?):

  • using index as the index dimension in case a ragged contiguous array is used (I do not think dimensions have standard_name attributes? If so we will have to rely on dimensions own name?
  • using rowsize as the lookup table for the trajectory length; I do not think there is a standard_name for this either yet? (and / or if so, it is not used in the .nc file I work with).

Yes, I agree it is not at all clear from CF-convention.
Thus we should collect all our questions and suggestions and then forward to the CF-team as soon as we are ready (in cooperation with others)

@jerabaul29
Copy link
Collaborator Author

I can try to robustify a bit the discovery of rowsize, it should have a dimension that is equal to trajectory. Doing it now :) .

This should be implemented to the best I could in da1fe22 .

@jerabaul29
Copy link
Collaborator Author

One choice we should discuss is that this class converts the dataset on opening to traj2d. In trajan so far this requires an explicit operation (e.g. gridtime), meaning that as soon as something is done on this dataset (e.g. assign_cf_attrs) a 2d dataset is going to come out on the other end. Perhaps this should be behind a to_obs()/explode() method? Or there should be a @require_obs decorator on the methods that require this which conveniently convert the dataset first (but only when necessary)?

This may affect the API and require a lot of changes later if we decide to change.

I am not sure how to implement this, but I can try to get something this afternoon - I let you know :) .

@gauteh
Copy link
Member

gauteh commented Oct 22, 2024

I am not sure how to implement this, but I can try to get something this afternoon - I let you know :)

Ok! I can also make some code suggestions on this branch a bit later if you want? will coordinate with you before I start.

@jerabaul29
Copy link
Collaborator Author

I am not sure how to implement this, but I can try to get something this afternoon - I let you know :)

Ok! I can also make some code suggestions on this branch a bit later if you want? will coordinate with you before I start.

Btw, do you have an example / pointer of where this technique is used? :)

@jerabaul29
Copy link
Collaborator Author

@gauteh , regarding the discussion about the _exploding functionality :) .

Ok, maybe I can add a __require_array2D__ decorator, use it on all the functions in the Traj2d and Traj classes that need a "data format with 2D arrays containing the data", use a couple of abstract methods to propagate the calls?

This is quite a lot of machinery though - is it what you were thinking about / do you think the complexity is worth it?

Do you think it could be suitable to merge this PR (possibly with minor updates) first, and then add this extra functionality in an additional follow up PR, as this would touch quite a few extra classes and aspects about the architecture of the code? :) .

@jerabaul29
Copy link
Collaborator Author

Do you think it could be suitable to merge this PR (possibly with minor updates) first, and then add this extra functionality in an additional follow up PR, as this would touch quite a few extra classes and aspects about the architecture of the code? :) .

(this is just because separating this in different PR feels like a nice separation of concerns, and make it easier for me to think "atomatically" about these changes - also, it will be easier to play around with different designs / PRs with different approaches to this, if relevant :) ).

@gauteh
Copy link
Member

gauteh commented Oct 22, 2024

I made some attempts. I haven't found a good way to fall back on using to_2d().method without doing something weird. But it may be possible to solve using a wrapper or decorator of some sort.

@jerabaul29
Copy link
Collaborator Author

Ok! :)

Are you "happy for now" / ready to merge, and we follow this up in subsequent PRs? Or do you want some of the specific points to be improved before merging? :)

@gauteh
Copy link
Member

gauteh commented Oct 22, 2024

Yes 😊 feels like dataset type detection is a bit insecure, but it was in the first place anyway 😊

@gauteh gauteh merged commit 597927c into OpenDrift:main Oct 22, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants