diff --git a/notebooks/2020_temp_precip_sta.nc b/notebooks/2020_temp_precip_sta.nc new file mode 100644 index 0000000..4d21c70 Binary files /dev/null and b/notebooks/2020_temp_precip_sta.nc differ diff --git a/notebooks/merge_climate_datasets_exercise.ipynb b/notebooks/merge_climate_datasets_exercise.ipynb index 9be1adf..bed0607 100644 --- a/notebooks/merge_climate_datasets_exercise.ipynb +++ b/notebooks/merge_climate_datasets_exercise.ipynb @@ -1,254 +1,3670 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "aa24b099", - "metadata": {}, - "source": [ - "# Merging Climate Datasets Exercise\n", - "\n", - "Work through this notebook to practice harmonizing and merging two climate datasets that differ in temporal cadence and spatial resolution.\n", - "\n", - "You will: \n", - "- Load two public NOAA datasets directly from the cloud\n", - "- Subset to the continental US (use 230°E–300°E in longitude since the data span 0–360°)\n", - "- Use `xr.resample` to aggregate time and `xr.interp` to match grids\n", - "- Combine the variables with `xr.merge` for joint analysis\n", - "\n", - "Refer back to the answer key after attempting each step.\n" - ] - }, - { - "cell_type": "markdown", - "id": "d6f677f5", - "metadata": {}, - "source": [ - "## 1. 
Setup\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0a656265", - "metadata": { - "tags": [ - "parameters" - ] - }, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "import xarray as xr\n", - "\n", - "try:\n", - " import cartopy.crs as ccrs\n", - " import cartopy.feature as cfeature\n", - "except ImportError:\n", - " ccrs = None\n", - " cfeature = None\n", - "\n", - "TEMP_URL = \"https://psl.noaa.gov/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.2020.nc\"\n", - "PRECIP_URL = \"https://psl.noaa.gov/thredds/dodsC/Datasets/cpc_global_precip/precip.2020.nc\"\n", - "\n", - "LAT_RANGE = (20, 50) # degrees North\n", - "LON_RANGE_360 = (230, 300) # degrees East (equivalent to -130° to -60°)\n", - "LON_RANGE_180 = (-130, -60) # convenience if a dataset uses -180° to 180°\n", - "\n", - "TIME_RANGE = slice(\"2020-06-01\", \"2020-06-30\")\n" - ] - }, - { - "cell_type": "markdown", - "id": "45f8536b", - "metadata": {}, - "source": [ - "## 2. Load the datasets\n", - "\n", - "Open both remote datasets with `xr.open_dataset`, passing a reasonable chunk size for the time dimension. 
Assign the resulting objects to `air` and `precip`.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3270985f", - "metadata": { - "tags": [ - "exercise" - ] - }, - "outputs": [], - "source": [ - "# TODO: load the air temperature and precipitation datasets.\n", - "# Example: air = xr.open_dataset(..., chunks={\"time\": 8})\n", - "raise NotImplementedError(\"Assign datasets to `air` and `precip`.\")\n" - ] - }, + "cells": [ + { + "cell_type": "markdown", + "id": "aa24b099", + "metadata": {}, + "source": [ + "# Merging Climate Datasets Exercise\n", + "\n", + "Work through this notebook to practice harmonizing and merging two climate datasets that differ in temporal cadence and spatial resolution.\n", + "\n", + "You will: \n", + "- Load two public NOAA datasets directly from the cloud\n", + "- Subset to the continental US (use 230°E–300°E in longitude since the data span 0–360°)\n", + "- Use `xr.resample` to aggregate time and `xr.interp` to match grids\n", + "- Combine the variables with `xr.merge` for joint analysis\n", + "\n", + "Refer back to the answer key after attempting each step.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d6f677f5", + "metadata": {}, + "source": [ + "## 1. 
Setup\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "0a656265", + "metadata": { + "tags": [ + "parameters" + ] + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import xarray as xr\n", + "\n", + "try:\n", + " import cartopy.crs as ccrs\n", + " import cartopy.feature as cfeature\n", + "except ImportError:\n", + " ccrs = None\n", + " cfeature = None\n", + "\n", + "TEMP_URL = \"https://psl.noaa.gov/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.2020.nc\"\n", + "PRECIP_URL = \"https://psl.noaa.gov/thredds/dodsC/Datasets/cpc_global_precip/precip.2020.nc\"\n", + "\n", + "LAT_RANGE = slice(50, 20) # degrees North\n", + "LON_RANGE_360 = slice(230, 300) # degrees East (equivalent to -130° to -60°)\n", + "LON_RANGE_180 = (-130, -60) # convenience if a dataset uses -180° to 180°\n", + "\n", + "TIME_RANGE = slice(\"2020-06-01\", \"2020-06-30\")" + ] + }, + { + "cell_type": "markdown", + "id": "45f8536b", + "metadata": {}, + "source": [ + "## 2. Load the datasets\n", + "\n", + "Open both remote datasets with `xr.open_dataset`, passing a reasonable chunk size for the time dimension. Assign the resulting objects to `air` and `precip`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "3270985f", + "metadata": { + "tags": [ + "exercise" + ] + }, + "outputs": [], + "source": [ + "# TODO: load the air temperature and precipitation datasets.\n", + "# Example: air = xr.open_dataset(..., chunks={\"time\": 8})\n", + "air = xr.open_dataset(TEMP_URL, chunks={\"time\": 8}, mask_and_scale=True, decode_cf=True)\n", + "precip = xr.open_dataset(PRECIP_URL, chunks={\"time\": 8}, mask_and_scale=True, decode_cf=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "3f7cec98-7710-4370-bb1b-9ea078b08693", + "metadata": {}, + "outputs": [], + "source": [ + "precip = precip.compute()" + ] + }, + { + "cell_type": "markdown", + "id": "761ec85f", + "metadata": {}, + "source": [ + "## 3. 
Subset to the continental United States and June 2020\n", + "\n", + "Select the bounding box provided above and limit the time range to June 2020 for both datasets. Store the results in `air_us` and `precip_us`.\n", + "Remember that longitude runs from 0° to 360°, so select 230°E–300°E. Check whether each coordinate is ascending or descending before building the slice.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "264d9641", + "metadata": { + "tags": [ + "exercise" + ] + }, + "outputs": [], + "source": [ + "# TODO: subset both datasets using `sel`, handling coordinate ordering as needed.\n", + "air_us = air.sel(lat=LAT_RANGE, lon=LON_RANGE_360, time=TIME_RANGE)\n", + "precip_us = precip.sel(lat=LAT_RANGE, lon=LON_RANGE_360, time=TIME_RANGE)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "5f58dea7-4dd9-411a-acc9-4ed5479631b4", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "761ec85f", - "metadata": {}, - "source": [ - "## 3. Subset to the continental United States and June 2020\n", - "\n", - "Select the bounding box provided above and limit the time range to June 2020 for both datasets. Store the results in `air_us` and `precip_us`.\n", - "Remember that longitude runs from 0° to 360°, so select 230°E–300°E. Check whether each coordinate is ascending or descending before building the slice.\n" + "data": { + "text/html": [ + "
<xarray.Dataset> Size: 182kB\n", + "Dimensions: (time: 120, lat: 13, lon: 29)\n", + "Coordinates:\n", + " * lat (lat) float32 52B 50.0 47.5 45.0 42.5 40.0 ... 27.5 25.0 22.5 20.0\n", + " * lon (lon) float32 116B 230.0 232.5 235.0 237.5 ... 295.0 297.5 300.0\n", + " * time (time) datetime64[ns] 960B 2020-06-01 ... 2020-06-30T18:00:00\n", + "Data variables:\n", + " air (time, lat, lon) float32 181kB dask.array<chunksize=(8, 13, 29), meta=np.ndarray>\n", + "Attributes:\n", + " Conventions: COARDS\n", + " title: 4x daily NMC reanalysis (2014)\n", + " history: created 2017/12 by Hoop (netCDF2.3)\n", + " description: Data is from NMC initialized reanalysis\\...\n", + " platform: Model\n", + " dataset_title: NCEP-NCAR Reanalysis 1\n", + " _NCProperties: version=2,netcdf=4.6.3,hdf5=1.10.5\n", + " References: http://www.psl.noaa.gov/data/gridded/dat...\n", + " DODS_EXTRA.Unlimited_Dimension: time
<xarray.Dataset> Size: 1MB\n", + "Dimensions: (time: 30, lat: 60, lon: 140)\n", + "Coordinates:\n", + " * lat (lat) float32 240B 49.75 49.25 48.75 48.25 ... 21.25 20.75 20.25\n", + " * lon (lon) float32 560B 230.2 230.8 231.2 231.8 ... 298.8 299.2 299.8\n", + " * time (time) datetime64[ns] 240B 2020-06-01 2020-06-02 ... 2020-06-30\n", + "Data variables:\n", + " precip (time, lat, lon) float32 1MB nan nan nan nan ... nan nan nan nan\n", + "Attributes:\n", + " _NCProperties: version=1|netcdflibversion=4.4.1.1|hdf5l...\n", + " Conventions: CF-1.0\n", + " version: V1.0\n", + " title: CPC GLOBAL PRCP V1.0 RT\n", + " dataset_title: CPC GLOBAL PRCP V1.0\n", + " Source: ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_U...\n", + " References: https://www.psl.noaa.gov/data/gridded/da...\n", + " history: Updated 2021-01-02 23:31:03\n", + " DODS_EXTRA.Unlimited_Dimension: time
<xarray.Dataset> Size: 12MB\n", + "Dimensions: (time: 366, lat: 60, lon: 140)\n", + "Coordinates:\n", + " * time (time) datetime64[ns] 3kB 2020-01-01 2020-01-02 ... 2020-12-31\n", + " * lat (lat) float32 240B 49.75 49.25 48.75 48.25 ... 21.25 20.75 20.25\n", + " * lon (lon) float32 560B 230.2 230.8 231.2 231.8 ... 298.8 299.2 299.8\n", + "Data variables:\n", + " air (time, lat, lon) float32 12MB dask.array<chunksize=(61, 20, 140), meta=np.ndarray>\n", + "Attributes:\n", + " Conventions: COARDS\n", + " title: 4x daily NMC reanalysis (2014)\n", + " history: created 2017/12 by Hoop (netCDF2.3)\n", + " description: Data is from NMC initialized reanalysis\\...\n", + " platform: Model\n", + " dataset_title: NCEP-NCAR Reanalysis 1\n", + " _NCProperties: version=2,netcdf=4.6.3,hdf5=1.10.5\n", + " References: http://www.psl.noaa.gov/data/gridded/dat...\n", + " DODS_EXTRA.Unlimited_Dimension: time
<xarray.Dataset> Size: 2MB\n", + "Dimensions: (time: 30, lat: 60, lon: 140)\n", + "Coordinates:\n", + " * time (time) datetime64[ns] 240B 2020-06-01 ... 2020-06-30\n", + " * lat (lat) float32 240B 49.75 49.25 48.75 ... 21.25 20.75 20.25\n", + " * lon (lon) float32 560B 230.2 230.8 231.2 ... 298.8 299.2 299.8\n", + "Data variables:\n", + " air_temperature (time, lat, lon) float32 1MB dask.array<chunksize=(30, 20, 140), meta=np.ndarray>\n", + " daily_precip (time, lat, lon) float32 1MB nan nan nan ... nan nan nan\n", + "Attributes:\n", + " Conventions: COARDS\n", + " title: 4x daily NMC reanalysis (2014)\n", + " history: created 2017/12 by Hoop (netCDF2.3)\n", + " description: Data is from NMC initialized reanalysis\\...\n", + " platform: Model\n", + " dataset_title: NCEP-NCAR Reanalysis 1\n", + " _NCProperties: version=2,netcdf=4.6.3,hdf5=1.10.5\n", + " References: http://www.psl.noaa.gov/data/gridded/dat...\n", + " DODS_EXTRA.Unlimited_Dimension: time