Add Copernicus BGC DataWrangling #607

vtamsitt · 2025-08-29T20:35:30Z

Added GLORYS 1/4 degree BGC dataset metadata to Copernicus.jl

A couple of things:

There is no inpainting for sea ice variables because zeros there are physical, not missing values. I think this should also be the case for chlorophyll, phytoplankton, and primary productivity, so we should probably remove inpainting for these too.
I split the dicts of variable names up for the data streams that have different variables (similar to ECCO.jl), but the metadata_prefix variable naming hasn't been updated to reflect this yet. It would be good to add a download_dataset function for Copernicus.jl as well, and I'm not sure the file naming convention from Copernicus matches what's currently in metadata_prefix.

vtamsitt · 2025-08-29T20:44:09Z

Oh I see download_dataset is addressed in #522

src/DataWrangling/Copernicus/Copernicus.jl

navidcy · 2025-09-05T00:00:07Z

I'm happy to merge this.
@glwagner, @vtamsitt?

codecov · 2025-09-05T00:25:40Z

Codecov Report

❌ Patch coverage is 33.33333% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 18.69%. Comparing base (b3adc2f) to head (44c69f3).

Files with missing lines	Patch %	Lines
src/DataWrangling/Copernicus/Copernicus.jl	33.33%	12 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #607      +/-   ##
==========================================
- Coverage   23.06%   18.69%   -4.38%     
==========================================
  Files          47       47              
  Lines        2844     2857      +13     
==========================================
- Hits          656      534     -122     
- Misses       2188     2323     +135

navidcy · 2025-09-05T21:27:08Z

src/DataWrangling/Copernicus/Copernicus.jl

+available_variables(::GLORYSStatic) = copernicus_physics_dataset_variable_names
+available_variables(::GLORYSDaily) = copernicus_physics_dataset_variable_names
+available_variables(::GLORYSMonthly) = copernicus_physics_dataset_variable_names
+available_variables(::GLORYSBGCDaily) = copernicus_bgc_daily_dataset_variable_names
+available_variables(::GLORYSBGCMonthly) = copernicus_bgc_monthly_dataset_variable_names


How could available_variables return a different list of variables for each dataset type if all the variables are in one dictionary?

yeah that's why I did them separately, especially because the BGC variables are non-overlapping

I suggest using a single "master" copernicus_bgc_monthly_dataset_variable_names, and then writing out the available variable names explicitly in these functions. That way we have a single readable reference for all of the variables that can be downloaded from copernicus. It might become important as the number of variables grows.

Ok, I reverted to a 'master' dict copernicus_dataset_variable_names and wrote out the variables explicitly in the available_variables function. Is that what you were thinking?

@glwagner does this look ok now?

yes, thank you! I can eliminate the repeated code by creating a more organized type hierarchy. do you mind if I commit to your branch?

glwagner · 2025-09-11T21:45:28Z

@vtamsitt I made these changes:

I changed available_variables to return a tuple (a simple list of names) rather than a dictionary (pairs of values linking a long and short name). As far as I can tell, available_variables is supposed to be a list of names (more generally, we don't want to repeat information).
I added an abstract type GLORYSDataset, which allows us to express the fact that the GLORYS physics has one set of variables, while each GLORYSBGC dataset has different variables. There is no overlapping information now.
I changed available_variables for GLORYSBGCMonthly to call available_variables(GLORYSBGCDaily()). This documents how the monthly variables are a superset of the daily variables.
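
The three changes above can be sketched roughly as follows. This is a minimal illustration, not the code merged in the PR: the physics and BGC daily variable names are placeholders, the assumption that the BGC types subtype CopernicusDataset directly is a guess, and the three extra monthly names are taken from the diff excerpt in this thread.

```julia
# Sketch only: type layout and most variable names are assumptions.
abstract type CopernicusDataset end
abstract type GLORYSDataset <: CopernicusDataset end

struct GLORYSDaily <: GLORYSDataset end
struct GLORYSMonthly <: GLORYSDataset end
struct GLORYSBGCDaily <: CopernicusDataset end    # assumed supertype
struct GLORYSBGCMonthly <: CopernicusDataset end  # assumed supertype

# One method on the abstract type covers every physics dataset.
# Returns a tuple of names rather than a dictionary.
available_variables(::GLORYSDataset) = (:temperature, :salinity)  # placeholder names

available_variables(::GLORYSBGCDaily) = (:nitrate, :oxygen)  # placeholder names

# Splatting the daily tuple makes the superset relationship explicit.
available_variables(::GLORYSBGCMonthly) = (available_variables(GLORYSBGCDaily())...,
                                           :ph, :surface_co2, :total_phytoplankton)
```

With this layout, adding a variable to the daily BGC tuple automatically makes it available for the monthly dataset, so the two lists cannot drift apart.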

If you approve I think this is ready to merge. There are some downloading errors but they seem to be unrelated (the copernicus tests pass).

glwagner · 2025-09-11T21:46:14Z

src/DataWrangling/Copernicus/Copernicus.jl

+    :ph,
+    :surface_co2,
+    :total_phytoplankton,
+)


@vtamsitt this expresses how the monthly variables are a superset of the daily variables

glwagner · 2025-09-11T21:47:20Z

src/DataWrangling/Copernicus/Copernicus.jl


 # Datasets
 abstract type CopernicusDataset end
+abstract type GLORYSDataset <: CopernicusDataset end


This abstract type allows us to implement a single available_variables method for the two physics datasets, GLORYSMonthly and GLORYSDaily.

vtamsitt · 2025-09-12T03:42:37Z

This all looks good to me and fine to merge @glwagner! Thanks for documenting the changes.

glwagner · 2025-09-12T09:43:36Z

@navidcy do you know anything about these failures?

simone-silvestri · 2025-09-12T12:25:46Z

looks like the ecco username or ecco password is not recognized

Veronica Tamsitt and others added 5 commits August 28, 2025 11:45

added BGC products to Copernicus.jl (f9d3425)
fixed typo (7983fca)
split sizes and variable names (dd4c45c)
update available_variables (6b35151)
Merge branch 'CliMA:main' into add_bgc_datawrangling (c6c0312)

glwagner reviewed Aug 29, 2025: src/DataWrangling/Copernicus/Copernicus.jl

navidcy reviewed Sep 4, 2025: src/DataWrangling/Copernicus/Copernicus.jl

simplify + add test for download CopernicusBGC (ecb54a4)

navidcy marked this pull request as ready for review September 4, 2025 23:59

Merge branch 'main' into add_bgc_datawrangling (91b7663)

navidcy self-requested a review September 4, 2025 23:59

navidcy approved these changes Sep 4, 2025

navidcy added the "data wrangling" label (We must feed the models so they don't get cranky) Sep 5, 2025

navidcy reviewed Sep 5, 2025

vtamsitt and others added 5 commits September 8, 2025 11:57

Merge branch 'CliMA:main' into add_bgc_datawrangling (a659054)
master variable names dict (9fc87c1)
write out variable names (c9a505a)
new abstract types and cleanup (1ccacf4)
bugfix (44c69f3)

glwagner reviewed Sep 11, 2025

glwagner approved these changes Sep 11, 2025

4 participants
