This repository contains ETLs to IPFS for publicly available datasets.
- Zarr chunks are optimized for long timeseries calculations: they are long in time and small in the spatial dimensions
- Chunking is set to 400 in time, 25 in latitude, 25 in longitude
- With this setting, each uncompressed chunk is exactly 1 megabyte (400 × 25 × 25 values × 4 bytes = 1,000,000 bytes), since all data variables are float32
- Longitude is -180 to 180, latitude is -90 to 90. Both are sorted in ascending order.
- Source file metadata, including netCDF scaling and additive shift variables (which are applied and then removed), is largely dropped
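The chunk-size arithmetic and the longitude convention in the list above can be checked with a small sketch. This is only an illustration, not the repository's code: the helper names are hypothetical, the example assumes a source grid that uses 0 to 360 longitudes, and the wrap shown here places 180 at -180.

```python
from math import prod

FLOAT32_BYTES = 4  # size of one float32 value

# Chunk shape from the list above: (time, latitude, longitude)
CHUNK_SHAPE = (400, 25, 25)


def chunk_nbytes(shape, itemsize=FLOAT32_BYTES):
    """Uncompressed size in bytes of one chunk with the given shape."""
    return prod(shape) * itemsize


def wrap_longitude(lon):
    """Map a longitude in degrees onto the half-open range [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def to_ascending_longitudes(lons):
    """Wrap each longitude, then sort the result in ascending order."""
    return sorted(wrap_longitude(lon) for lon in lons)


print(chunk_nbytes(CHUNK_SHAPE))  # 1000000
print(to_ascending_longitudes([0.0, 90.0, 180.0, 270.0]))
```

Note that the 1 megabyte figure is the decimal unit (10^6 bytes), not a mebibyte (2^20 bytes).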
Each heading is a set of data variables from some source, with the data variables listed inside.
precip-conus
Daily precipitation data for the continental United States. https://psl.noaa.gov/mddb2/showDataset.html?datasetID=45
precip-global
Daily precipitation data for Earth. https://psl.noaa.gov/data/gridded/data.cpc.globalprecip.html
tmax
Daily maximum temperature for Earth. https://psl.noaa.gov/data/gridded/data.cpc.globaltemp.html
tmin
Daily minimum temperature for Earth. https://psl.noaa.gov/data/gridded/data.cpc.globaltemp.html
CHIRPS provides land precipitation estimates. For a more detailed overview, see https://data.chc.ucsb.edu/products/CHIRPS-2.0/README-CHIRPS.txt
"p05" and "p25" refer to the width of the grid cells: p05 means 0.05 degrees of latitude/longitude and p25 means 0.25 degrees.
(Based on this site stating that "CHIRPS incorporates 0.05° resolution satellite imagery", we assume that's what p05 means.)
A "final-" prefix refers to finalized data, as opposed to the "prelim-" prefix, which refers to data that may still be revised by CHIRPS but is often closer to real-time values.
final-p05
https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/netcdf/p05/
final-p25
https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/netcdf/p25/
prelim-p05
https://data.chc.ucsb.edu/products/CHIRPS-2.0/prelim/global_daily/netcdf/p05/
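As a rough illustration of the naming scheme, the sketch below splits a CHIRPS dataset name into its status and grid width and looks up the source URL listed above. The function and constant names are hypothetical and are not taken from the ETL code. Only the three datasets named in this README are included.

```python
# Grid cell width in degrees for each resolution code.
RESOLUTION_DEGREES = {"p05": 0.05, "p25": 0.25}

# Source directories exactly as listed in this README.
SOURCE_URLS = {
    "final-p05": "https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/netcdf/p05/",
    "final-p25": "https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/netcdf/p25/",
    "prelim-p05": "https://data.chc.ucsb.edu/products/CHIRPS-2.0/prelim/global_daily/netcdf/p05/",
}


def describe_dataset(name):
    """Return (status, grid width in degrees, source URL) for a dataset name."""
    if name not in SOURCE_URLS:
        raise KeyError(f"unknown CHIRPS dataset: {name}")
    status, resolution = name.split("-")
    return status, RESOLUTION_DEGREES[resolution], SOURCE_URLS[name]


for dataset in SOURCE_URLS:
    print(describe_dataset(dataset))
```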
ERA5 provides a wide variety of hourly climate data variables and stretches back to 1940-01-01. See more here on the Copernicus site.
The ETL is only configured to accept the data variables dClimate is concerned with, but the list is easy to extend; other variables should work the same way.
Running the ETL requires an S3-compatible bucket to cache raw data files. To configure it, take the template in era5-env.json.example and write it to an era5-env.json file.
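One way to create the config file from the template is sketched below. It assumes the template is valid JSON and that both files live in the same directory. The function name is hypothetical, and the keys inside the template are not shown here because they are defined by era5-env.json.example itself.

```python
import json
import shutil
from pathlib import Path


def ensure_env_file(directory="."):
    """Copy the template to era5-env.json if it is missing, then load it."""
    directory = Path(directory)
    template = directory / "era5-env.json.example"
    target = directory / "era5-env.json"
    if not target.exists():
        # Never overwrite an existing config; only seed it from the template.
        shutil.copyfile(template, target)
    with target.open() as f:
        return json.load(f)
```

After the copy, the placeholder values in era5-env.json still need to be replaced with the details of your own bucket.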
You can find the below variable pages and descriptions by starting your search from this wiki page.
2m_temperature
"temperature of air at 2m above the surface of land, sea or in-land waters"
10m_u_component_of_wind
"the eastward component of the 10m wind. It is the horizontal speed of air moving towards the east, at a height of ten metres above the surface of the Earth"
10m_v_component_of_wind
"northward component of the 10m wind. It is the horizontal speed of air moving towards the north, at a height of ten metres above the surface of the Earth"
100m_u_component_of_wind
"eastward component of the 100 m wind. It is the horizontal speed of air moving towards the east, at a height of 100 metres above the surface of the Earth"
100m_v_component_of_wind
"northward component of the 100 m wind. It is the horizontal speed of air moving towards the north, at a height of 100 metres above the surface of the Earth"
surface_pressure
"pressure (force per unit area) of the atmosphere on the surface of land, sea and in-land water"
surface_solar_radiation_downwards
"amount of solar radiation (also known as shortwave radiation) that reaches a horizontal plane at the surface of the Earth. This parameter comprises both direct and diffuse solar radiation"
"accumulated liquid and frozen water, comprising rain and snow, that falls to the Earth's surface"
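The u and v wind components above are usually combined before use. The following sketch is not part of the ETL. It shows the standard conversion from an eastward/northward pair to a wind speed and a meteorological direction (the direction the wind blows from, in degrees clockwise from north). The function names are hypothetical.

```python
import math


def wind_speed(u, v):
    """Horizontal wind speed from eastward (u) and northward (v) components."""
    return math.hypot(u, v)


def wind_direction_from(u, v):
    """Direction the wind blows FROM, in degrees clockwise from north."""
    # atan2(u, v) is the bearing the air moves towards; add 180 to flip it.
    return (180.0 + math.degrees(math.atan2(u, v))) % 360.0


# Air moving towards the east (u > 0, v = 0) is a westerly wind.
print(wind_speed(3.0, 4.0), wind_direction_from(1.0, 0.0))
```

The same functions apply to both the 10 m and 100 m component pairs, as long as u and v come from the same height.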
The PRISM organization provides precipitation and temperature data for the continental United States.
PRISM creates both 800 m and 4 km resolution versions. Only the 4 km versions are free, so we ETL those.
precip-4km
tmax-4km
tmin-4km
The one-shots folder contains ETLs for data that only needs to be ETLed once and does not need to be maintained. This is why many of the ETLs are also added as Jupyter notebooks, for ease of use by data scientists exploring and molding the data before it is provided to dClimate for storage in our system.