This repository allows you to recreate the figures and statistics from the following paper:
The code in this repository is structured so that all analysis can be done with Python notebooks in the ~/notebooks directory. The expected runtime for the full analysis is approximately one day. The steps are as follows:
- Download model data, metadata, and pre-calculated metrics from the associated Zenodo repository.
- Download and prepare Global Runoff Data Center (GRDC) streamflow observation data and model simulation data. This step is not necessary if you want to use the pre-calculated statistics included in the Zenodo repository.
- Run notebooks to calculate metrics. This step is not necessary if you want to use the pre-calculated statistics included in the Zenodo repository.
- Run notebooks to produce figures and analyses.
Detailed instructions for these steps are below.
The ~/notebooks directory also contains a subdirectory called 'backend', which holds much of the active (mostly functional) code used by the analysis notebooks. The only change you should need to make to the source code in this directory is setting your local working directory, as described in the instructions below.
Within the ~/notebooks/backend source directory is another source directory called return_period_calculator. This subdirectory contains Python code for fitting return period distributions and estimating return periods. These calculations are based loosely on the guidelines in USGS Bulletin 17C, with some differences in the statistical tests used to identify outliers.
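To make the idea of a return period concrete, here is a minimal sketch of a classic Bulletin 17-style calculation: fit a log-Pearson Type III distribution to annual maximum flows by the method of moments and read off the flow for a T-year return period. This is not the return_period_calculator API. The function names are hypothetical, the Pearson III frequency factor uses the Wilson-Hilferty approximation (reasonable for moderate skew), and Bulletin 17C refinements such as the Expected Moments Algorithm and low-outlier tests are omitted.

```python
import math
import statistics
from statistics import NormalDist


def sample_skew(values: list[float]) -> float:
    """Bias-corrected sample skewness coefficient."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values to estimate skew")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        return 0.0
    cubed = sum((v - mean) ** 3 for v in values)
    return n * cubed / ((n - 1) * (n - 2) * sd**3)


def lp3_quantile(annual_maxima: list[float], return_period: float) -> float:
    """Flow with the given return period (in years) from a log-Pearson III fit.

    Method-of-moments fit on log10 flows; the Pearson III frequency factor K
    is approximated with the Wilson-Hilferty transformation.
    """
    if return_period <= 1:
        raise ValueError("return period must exceed one year")
    if any(q <= 0 for q in annual_maxima):
        raise ValueError("log transform requires strictly positive flows")
    logs = [math.log10(q) for q in annual_maxima]
    mean = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    g = sample_skew(logs)

    # Standard normal variate for the non-exceedance probability 1 - 1/T.
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    if abs(g) < 1e-6:
        k = z  # zero skew: Pearson III reduces to the normal distribution
    else:
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)
    return 10.0 ** (mean + k * sd)
```

For example, `lp3_quantile([10, 100, 1000], 2)` returns 100.0: the log flows 1, 2, 3 have zero skew, so the 2-year flow is simply 10 raised to their mean.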
This repository is licensed under an Apache 2.0 open source license. Please see the LICENSE file in the root directory for the full license.
This is not an official Google product.
This repository should run on any computer and operating system that supports Python 3. It has been tested on the Debian GNU/Linux 11 (bullseye) operating system. Running the notebooks that calculate metrics requires 128 GB of memory (RAM).
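Before starting the metric notebooks, it can be worth confirming that the machine has enough memory. The sketch below reads total physical memory through os.sysconf, which works on Linux (including the tested Debian setup) and most Unix systems but not on Windows. The 128 GB figure comes from this README; the helper names are hypothetical, and the comparison uses decimal gigabytes (10^9 bytes) so that a nominal 128 GB machine, whose usable total is typically a little under 128 GiB, is not flagged.

```python
import os
from typing import Optional

REQUIRED_GB = 128  # requirement stated in this README for the metric notebooks


def total_memory_gb() -> Optional[float]:
    """Total physical memory in decimal GB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        num_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; some Unix systems lack these names.
        return None
    if page_size <= 0 or num_pages <= 0:
        return None  # sysconf can return -1 when a value is indeterminate
    return page_size * num_pages / 1e9


def has_enough_memory(required_gb: float = REQUIRED_GB) -> Optional[bool]:
    """True or False if memory can be measured, None otherwise."""
    total = total_memory_gb()
    return None if total is None else total >= required_gb
```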
No software installation is required beyond Python 3 and the Python libraries listed in the environment file. This repository is based on Python notebooks and can be run directly from a local clone:
git clone https://github.com/googlestaging/global_streamflow_model_paper.git
An environment file is included for installing the necessary Python dependencies. If you are using Anaconda (Miniconda, etc.), you can create this environment with the following command from inside the global_streamflow_model_paper directory that results from cloning this repository:
conda env create -f environment.yml
Detailed Steps to Recreate Results Reported in the Paper
In the file ~/notebooks/backend/data_paths.py, change the variable _WORKING_DIR to your current working directory.
You will need to download and unzip/untar the tarballs from the Zenodo repository listed in the Code and Data Availability section of the paper referenced at the top of this README. The DOI for the Zenodo repository is:
Your working directory should be the directory created by cloning this repository. Unpacking the tarballs from the Zenodo repository will result in the following subdirectories: ~/model_data, ~/metadata, ~/metrics, and ~/gauge_groups_for_paper. Place all of these subdirectories in the working directory, so that it contains ~/notebooks (and the other subdirectories included in this Github repository) as well as ~/model_data (and all other subdirectories from the Zenodo repository).
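A quick way to confirm this layout is to check for each expected subdirectory. The sketch below uses only the directory names listed in this README; the function name is hypothetical, and any path you pass should be the same one you assigned to _WORKING_DIR.

```python
from pathlib import Path

# Subdirectory names from this README: 'notebooks' comes from cloning the
# Github repository; the rest come from unpacking the Zenodo tarballs.
EXPECTED_SUBDIRS = ("notebooks", "model_data", "metadata", "metrics", "gauge_groups_for_paper")


def missing_subdirs(working_dir: str) -> list[str]:
    """Return the expected subdirectories that are absent from working_dir."""
    root = Path(working_dir).expanduser()
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

For example, `missing_subdirs("~/global_streamflow_model_paper")` (a placeholder path) returns an empty list when everything is in place.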
Due to licensing restrictions, we are not allowed to share streamflow observation data from the Global Runoff Data Center (GRDC). Using the GRDC Data Portal, download daily NetCDF files for all stations listed in the ~/gauge_groups/dual_lstm/grdc_filtered.txt file. This requires registering with the GRDC. You will likely have to download the data in multiple batches, resulting in multiple NetCDF files. If so, give each NetCDF file a unique name and put them all into a single directory on your local machine. Point to that directory with the GRDC_DATA_DOWNLOAD_DIRECTORY variable in the ~/notebooks/backend/data_paths.py file, and then run the ~/notebooks/concatenate_grdc_downloads.ipynb notebook to concatenate the downloaded files into one NetCDF file.
GRDC Data Portal: https://portal.grdc.bafg.de/applications/public.html?publicuser=PublicUser
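If you need to request stations in several batches, a small helper can split the station list and give each download a unique file name. This is a sketch under assumptions: it assumes grdc_filtered.txt holds one station ID per line, and the batch size you pass and the grdc_batch_NN.nc naming pattern are hypothetical choices, not requirements of the GRDC portal or of concatenate_grdc_downloads.ipynb.

```python
from pathlib import Path


def read_station_ids(path: str) -> list[str]:
    """Read station IDs, assuming one ID per line; blank lines are skipped."""
    lines = Path(path).expanduser().read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def make_batches(ids: list[str], batch_size: int) -> list[list[str]]:
    """Split station IDs into consecutive batches of at most batch_size IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


def batch_filename(batch_number: int) -> str:
    """Hypothetical unique name for the NetCDF file downloaded for one batch."""
    return f"grdc_batch_{batch_number:02d}.nc"
```

Each batch from `make_batches` can then be requested from the portal and saved as `batch_filename(n)` in the directory that GRDC_DATA_DOWNLOAD_DIRECTORY points to.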
Groups of streamflow gauges that were used for cross validation studies are contained in the ~/gauge_groups_for_paper directory (from the Zenodo repository). Code that shows how these cross validation splits were constructed is contained in the ~/notebooks/create_ungauged_experiments_gauge_groups.ipynb notebook. This notebook produces two products:
- Gauge groups as text files for various types of cross validation splits, stored in the ~/gauge_groups directory.
- Maps of the locations of gauges in each cross validation split.
You can create gauge splits with GRDC gauges, Caravan gauges, or both combined.
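The k-fold splits are drawn at random, which is why regenerating them gives different groups. The sketch below shows a generic seeded k-fold partition of gauge IDs. It illustrates the technique only; it is not the code in create_ungauged_experiments_gauge_groups.ipynb, whose seed and exact procedure are not documented here, and the function name is hypothetical. With the same IDs, k, and seed it always returns the same folds, while a different seed generally produces a different partition.

```python
import random


def kfold_gauge_groups(gauge_ids: list[str], k: int, seed: int) -> list[list[str]]:
    """Randomly partition gauge IDs into k disjoint folds of near-equal size."""
    if not 2 <= k <= len(gauge_ids):
        raise ValueError("k must be between 2 and the number of gauges")
    # Sort first so the result depends only on the set of IDs and the seed,
    # not on the order in which the IDs were read.
    shuffled = sorted(gauge_ids)
    random.Random(seed).shuffle(shuffled)
    # Deal gauges round-robin so fold sizes differ by at most one.
    return [sorted(shuffled[i::k]) for i in range(k)]
```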
Note that running this notebook overwrites any existing gauge groups with new ones. These new gauge groups will not match the ones used in the paper, since at least some of them (the k-fold cross validation splits and the hydrologically separated gauge splits) were created with a random number generator. Using gauge groups that you create yourself instead of the ones in the ~/gauge_groups_for_paper subdirectory will produce inaccurate statistics: the AI model will appear better than it really is, because results will be drawn from gauges that were not withheld during training. This notebook is included in this repository only so that you can see how the gauge groups were created.
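The leakage described above can be checked directly. Assuming the model run for a given fold was trained on every gauge outside that fold's original test group, any gauge in a new test group that was not in the original one was seen during training. The helper names below are hypothetical.

```python
from typing import Iterable


def leaked_gauges(original_test_group: Iterable[str], new_test_group: Iterable[str]) -> set[str]:
    """Gauges in a new test group that the fold's model saw during training.

    Assumes the model for this fold was trained on every gauge outside
    original_test_group, so anything new in the test group was not withheld.
    """
    return set(new_test_group) - set(original_test_group)


def leakage_fraction(original_test_group: Iterable[str], new_test_group: Iterable[str]) -> float:
    """Share of the new test group whose results would come from training gauges."""
    new = set(new_test_group)
    if not new:
        return 0.0
    return len(leaked_gauges(original_test_group, new)) / len(new)
```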
Once you have created the concatenated GRDC NetCDF file, run the ~/notebooks/calculate_standard_hydrograph_metrics.ipynb notebook to calculate a set of standard hydrological skill metrics on the modeled hydrographs. This notebook produces plots that appear in the paper's Supplementary Material.
Next, run the ~/notebooks/calculate_return_period_metrics.ipynb notebook to calculate precision and recall metrics for extreme events of different magnitudes in the modeled hydrographs.
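One common way to score event detection is to treat days above a return-period threshold as events and count a simulated event as correct if an observed event falls within a small time window. The sketch below follows that idea under stated assumptions: each series gets its own threshold (for example from a return-period fit to that series), every exceedance day counts as an event, and the one-day window is a hypothetical default. The notebook's exact event definition and matching window may differ.

```python
import math


def event_days(flows: list[float], threshold: float) -> list[int]:
    """Indices of days on which flow exceeds the threshold."""
    return [day for day, q in enumerate(flows) if q > threshold]


def precision_recall(obs: list[float], sim: list[float], obs_threshold: float,
                     sim_threshold: float, window: int = 1) -> tuple[float, float]:
    """Event-based precision and recall with a +/- window (in days) for matching.

    precision: share of simulated events with an observed event within the window.
    recall: share of observed events with a simulated event within the window.
    Returns NaN for a ratio whose denominator has no events.
    """
    obs_events = event_days(obs, obs_threshold)
    sim_events = event_days(sim, sim_threshold)

    def matched(day: int, others: list[int]) -> bool:
        return any(abs(day - other) <= window for other in others)

    hits_sim = sum(matched(d, obs_events) for d in sim_events)
    hits_obs = sum(matched(d, sim_events) for d in obs_events)
    precision = hits_sim / len(sim_events) if sim_events else math.nan
    recall = hits_obs / len(obs_events) if obs_events else math.nan
    return precision, recall
```

A fuller version would collapse runs of consecutive exceedance days into single events before matching.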
Run the various figure_*.ipynb notebooks to create the figures from the paper. These figures are saved in both PNG and vector graphics formats in the ~/results_figures directory.
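To run the whole sequence unattended, the notebooks can be executed in the order given in this README with Jupyter's nbconvert. This is a sketch with assumptions: it expects jupyter and nbconvert to be installed in the active environment, it leaves out create_ungauged_experiments_gauge_groups.ipynb on purpose because that notebook overwrites gauge groups, and it assumes the figure notebooks do not depend on one another's outputs. The function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Order follows this README. The data-preparation and metric notebooks can be
# skipped when relying on the pre-calculated metrics from Zenodo.
DATA_PREP = ["concatenate_grdc_downloads.ipynb"]
METRICS = [
    "calculate_standard_hydrograph_metrics.ipynb",
    "calculate_return_period_metrics.ipynb",
]


def notebook_order(notebooks_dir: str, use_precalculated: bool) -> list[Path]:
    """Notebooks to execute, in order; figure notebooks are sorted by name."""
    root = Path(notebooks_dir).expanduser()
    steps = [] if use_precalculated else DATA_PREP + METRICS
    figures = sorted(p.name for p in root.glob("figure_*.ipynb"))
    return [root / name for name in steps + figures]


def run_notebook(path: Path) -> None:
    """Execute a notebook in place with nbconvert; timeout=-1 disables the per-cell limit."""
    subprocess.run(
        [
            "jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
            "--ExecutePreprocessor.timeout=-1", str(path),
        ],
        check=True,  # stop at the first notebook that fails
    )
```

For example, looping `run_notebook` over `notebook_order("~/global_streamflow_model_paper/notebooks", use_precalculated=True)` (a placeholder path) would regenerate only the figures from the pre-calculated metrics.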