Extracting ERA5-Land data with extract_clima() #48

Open
jonathanvonoppen opened this issue Jan 13, 2025 · 3 comments

Comments

@jonathanvonoppen

This concerns the compatibility of extract_clima() with ERA5-Land when extracting gridded data. Running the code below fails with the error shown after it:

# specify file
era5land_nc <- "era5land_alps_2011_1.nc"
# unzip (anchor the pattern so that only the literal ".nc" extension is replaced)
unzip(sub("\\.nc$", ".zip", era5land_nc), exdir = sub("\\.nc$", "", era5land_nc))
# extract data
mcera5::extract_clima(nc = era5land_nc, 
                      long_min = 10.0870, long_max = 10.1445,
                      lat_min = 46.6347, lat_max = 46.6720,
                      start_time = as.POSIXct("2011-01-01 00:00:00", tz = "UTC"), 
                      end_time = as.POSIXct("2011-01-31 23:00:00", tz = "UTC"),
                      format = "microclimf")

Error: [rast] tcc not found. Choose one of: t2m, d2m, sp, u10, v10, tp, ssrd, str, strd

Looking into the code, the varname_list object defined within the function does not yet account for the different sets of variables available in ERA5 and ERA5-Land. The function therefore fails in the subsequent loop, when it calls rast() on a layer (here tcc) that the ERA5-Land file does not contain.

A quick fix might be to provide alternative varname_lists conditional on a dataset argument (e.g. "era5" vs "era5land"), or perhaps to allow users to specify the list themselves?
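A minimal sketch of what such a dataset argument could look like, in base R only. The helper names get_varname_list() and missing_vars() are hypothetical and not part of mcera5. The variable sets are assumptions pieced together from this thread: the shared names appear in the error message above, and the ERA5-only names are the ones reported as absent from ERA5-Land.

```r
# Hypothetical helper, not part of mcera5: choose a variable list by dataset,
# or let the user pass their own list via `custom`.
get_varname_list <- function(dataset = c("era5", "era5land"), custom = NULL) {
  if (!is.null(custom)) return(custom)
  dataset <- match.arg(dataset)
  # Assumption: variables present in both datasets (all listed in the error message)
  shared <- c("t2m", "d2m", "sp", "u10", "v10", "tp", "ssrd")
  # Assumption: variables only available from ERA5
  era5_only <- c("tcc", "msnlwrf", "msdwlwrf", "fdir", "lsm")
  switch(dataset,
         era5 = c(shared, era5_only),
         era5land = shared)
}

# Report requested variables that a file does not contain, so the function
# could stop with an informative message before calling rast().
missing_vars <- function(requested, available) setdiff(requested, available)

# Variables reported as available in the ERA5-Land file by the error message
available <- c("t2m", "d2m", "sp", "u10", "v10", "tp", "ssrd", "str", "strd")
missing_vars(get_varname_list("era5"), available)
missing_vars(get_varname_list("era5land"), available)
```

Checking the requested names against the names actually present in the file would turn the rast() failure into an early, readable error, whichever list is chosen.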

Thanks a lot!

Link to zipped file (save it in the working directory)

@jonathanvonoppen
Author

Since not all variables needed for downstream calculations are available from ERA5-Land anyway, I have found a solution that integrates both ERA5 (nc_era5) and ERA5-Land (nc_era5l) files from corresponding time periods. The ERA5 layers tcc, msnlwrf, msdwlwrf, fdir and lsm are resampled to the ERA5-Land grid when importing the data layers:

var_list <- lapply(varname_list, function(v) {
    if (v == "lsm") {
      # only need one timestep for land-sea mask
      # "lsm" variable is not in ERA5-Land: load from ERA5 data and resample and crop to ERA5-Land grid
      r <- terra::rast(nc_era5[nc_era5 == gsub("land", "", nc_era5l)], subds = v)  # assuming file naming schemes that only differ by "land"
      rref <- terra::rast(nc_era5l, subds = "t2m")
      r <- terra::resample(r, rref, method = "bilinear")
      r <- terra::crop(r, rref)
    } else if(v %in% c("tcc", "msnlwrf", "msdwlwrf", "fdir")) {
      # variables not in ERA5-Land: load from ERA5 data, and resample and crop to ERA5-Land grid
      r <- terra::rast(nc_era5[nc_era5 == gsub("land", "", nc_era5l)], subds = v)
      rref <- terra::rast(nc_era5l, subds = "t2m")
      r <- terra::resample(r, rref, method = "bilinear")
      r <- terra::crop(r, rref)
      # subset down to desired time period
      r <- r[[as.POSIXct(nc_datetimes, tz = "UTC") %in% tme]]
      # Name layers as timesteps
      names(r) <- tme
    } else {
      # For all others, subset down to desired time period
      # terra::time() not identifying time data of ERA5 data from new CDS, so
      # use nc_datetimes
      r <- terra::rast(nc_era5l, subds = v)
      r <- r[[as.POSIXct(nc_datetimes, tz = "UTC") %in% tme]]
      # Name layers as timesteps
      names(r) <- tme
    }
    
    # Subset down to desired spatial extent
    r <- terra::crop(r, terra::ext(long_min, long_max, lat_min, lat_max))
    return(r)
  })

In addition, when extracting coordinates, we need to account for the fact that the t2m template layer may now include NAs, hence adding na.rm = TRUE (using a resampled ERA5 layer as the template instead would work as well).

coords <- as.data.frame(terra::crds(t2m[[1]]))

Some other additions might be useful such as an index-based selection of corresponding ERA5 files, or checking that both nc_era5 and nc_era5l inputs cover the same time period.
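As a rough sketch of the time-period check, assuming the timestamps of each file have already been read into POSIXct vectors (for example in the way nc_datetimes is used above). The helper name same_time_coverage() is hypothetical and not part of mcera5.

```r
# Hypothetical helper: TRUE only if both POSIXct vectors contain exactly the
# same timesteps, regardless of their order.
same_time_coverage <- function(times_a, times_b) {
  length(times_a) == length(times_b) &&
    all(sort(times_a) == sort(times_b))
}

# Illustrative hourly timestamps for January 2011
era5_times <- seq(as.POSIXct("2011-01-01 00:00:00", tz = "UTC"),
                  as.POSIXct("2011-01-31 23:00:00", tz = "UTC"),
                  by = "hour")
era5l_times <- era5_times

same_time_coverage(era5_times, era5l_times)        # identical coverage
same_time_coverage(era5_times, era5l_times[-1])    # one timestep missing
```

A check like this could run before any resampling, so that mismatched inputs stop early instead of producing misaligned layers.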

Hope this helps!

P.S. Unfortunately, the corresponding ERA5-Land file is too big to attach here.

@dklinges9
Owner

Hi Jonathan, I'm a bit backed up this week but I'll follow up here shortly!

@jonathanvonoppen
Author

jonathanvonoppen commented Feb 4, 2025

Just flagging here that, apparently and very annoyingly, CDS seems to have changed the structure of ERA5-Land .nc files.
Among other changes, when loading data with rast(nc_file), variables are now contained in names(rast) instead of the previous varnames(rast), for instance in the form
[1] "SFC (Ground or water surface); 2 metre temperature [C]"
so my solution above, which loops through varname_list, no longer works for ERA5-Land data.
I'm also looking into it and will let you know if I come up with a solution.
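One possible direction, sketched in base R only: map the new long layer names back to the short variable names by matching on the descriptive part of the string. The lookup table and the helper match_short_names() are hypothetical, and the d2m long name is an assumption based on the usual ECMWF parameter naming. Only the t2m example string comes from the output above.

```r
# Hypothetical lookup from long descriptive names to short variable names.
# Only the "2 metre temperature" entry is confirmed by the output above.
long_to_short <- c("2 metre temperature" = "t2m",
                   "2 metre dewpoint temperature" = "d2m")

# Return the short name for each layer name, or NA if there is no unique match.
match_short_names <- function(layer_names, lookup = long_to_short) {
  vapply(layer_names, function(nm) {
    # fixed = TRUE: treat the long name as a literal string, not a regex
    hits <- vapply(names(lookup), grepl, logical(1), x = nm, fixed = TRUE)
    hit <- names(lookup)[hits]
    if (length(hit) == 1) lookup[[hit]] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

layer_names <- c("SFC (Ground or water surface); 2 metre temperature [C]",
                 "some unknown layer")
match_short_names(layer_names)
```

Layers whose names do not match any entry come back as NA, which makes it easy to spot variables that still need to be added to the lookup.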
