Snakemake workflow for preparing data and managing runs for the Geospatial Probabilistic Estimation Package (GPEP)
This repository contains a Snakemake workflow for setting up data preprocessing and run configuration for GPEP. The workflow structures data preparation and organizes run configurations in a transparent, reproducible manner.
The repository design follows a structure similar to other
Snakemake-driven hydrometeorological workflows, including
gpep_to_summa_snakemake.
Snakemake is a workflow management system that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean, modern domain-specific language (DSL) in Python style. It is widely used in bioinformatics for building data analysis pipelines, but it is useful in any context where a series of steps (i.e., a workflow) must be performed on input data to produce output data. See the official Snakemake documentation for more details.
Snakemake workflows consist of a set of rules, where each rule describes how to create a certain part of the output. The rules describe both what needs to be done (the actions or commands), and under what circumstances (the input files, output files, and conditions).
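The "under what circumstances" part of a rule can be illustrated in plain Python. The following is a minimal sketch of a timestamp-based check, not Snakemake's actual implementation: the function name `needs_rerun` and the file names are hypothetical, and Snakemake itself considers more than file modification times when deciding whether to rerun a rule.

```python
import os
import tempfile


def needs_rerun(inputs, outputs):
    """Return True if any output is missing or older than the newest input.

    Simplified illustration only: real workflow engines can also react to
    changes in parameters, code, or software environments.
    """
    if not all(os.path.exists(path) for path in outputs):
        return True
    if not inputs:
        return False
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return newest_input > oldest_output


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names, used only for this demonstration.
        full_dem = os.path.join(tmp, "full_dem.tif")
        domain_dem = os.path.join(tmp, "domain_dem.tif")

        open(full_dem, "w").close()
        print(needs_rerun([full_dem], [domain_dem]))  # output does not exist yet

        open(domain_dem, "w").close()
        os.utime(full_dem, (100, 100))    # input last modified at t=100
        os.utime(domain_dem, (200, 200))  # output produced later, at t=200
        print(needs_rerun([full_dem], [domain_dem]))  # output is up to date

        os.utime(full_dem, (300, 300))    # input changed after the output
        print(needs_rerun([full_dem], [domain_dem]))  # output is stale
```

The three calls report a missing output, an up-to-date output, and a stale output, in that order.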
Example rule:
rule clip_dem:
    input:
        full_dem = full_dem
    output:
        domain_dem = domain_dem
    params:
        dem_bbox = config['dem_bbox']
    run:
        clip_tif_with_bbox(input.full_dem, params.dem_bbox, output.domain_dem)

Preprocessing of data is a key initial task that requires transparency and reproducibility. In the generated DAG below, the steps taken in GPEP data prep are shown visually.
This corresponds to the overall rule run_gpep_prep.smk.
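To show how a DAG of this kind arises from rules, here is a minimal sketch in plain Python. It assumes a toy rule table: only `clip_dem` comes from this repository, while `compute_slope`, `build_run_config`, and all file names are hypothetical. The `plan` function is an illustration of dependency resolution, not Snakemake's own scheduler, and it assumes the rules contain no cycles.

```python
# Toy rule table: each rule lists its input files and the single file it produces.
# Only "clip_dem" mirrors a rule in this repository; the others are hypothetical.
RULES = {
    "clip_dem": {"input": ["full_dem.tif"], "output": "domain_dem.tif"},
    "compute_slope": {"input": ["domain_dem.tif"], "output": "slope.tif"},
    "build_run_config": {
        "input": ["domain_dem.tif", "slope.tif"],
        "output": "run_config.toml",
    },
}


def plan(target, rules):
    """Return rule names in an order that satisfies all dependencies of target."""
    producers = {rule["output"]: name for name, rule in rules.items()}
    order = []
    scheduled = set()

    def visit(path):
        name = producers.get(path)
        if name is None or name in scheduled:
            return  # a source file with no producing rule, or already planned
        scheduled.add(name)
        for dependency in rules[name]["input"]:
            visit(dependency)  # plan upstream rules first
        order.append(name)

    visit(target)
    return order


if __name__ == "__main__":
    print(plan("run_config.toml", RULES))
    # ['clip_dem', 'compute_slope', 'build_run_config']
```

Requesting the final target walks backwards through the inputs, so each rule appears only after the rules that produce its inputs.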
git clone https://github.com/DaveCasson/gpep_snakemake.git
cd gpep_snakemake

Python 3.8+ is recommended.
python -m venv gpep_snakemake
source gpep_snakemake/bin/activate

Alternatively, with pyenv (installed here via Homebrew):
brew install pyenv
brew install pyenv-virtualenv
pyenv install 3.9.16
pyenv virtualenv 3.9.16 gpep_snakemake
pyenv activate gpep_snakemake

Verify:
which python

pip install -r requirements.txt

ipython kernel install --name "gpep_snakemake" --user

brew install nco
brew install graphviz  # optional, for DAG visualization

Optional testing via Jupyter Notebook:
cd notebooks/
jupyter notebook

Follow the instructions in the included example notebook to verify your environment.
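As a quick complement to the notebook, the setup steps above can be checked from Python. This is a minimal sketch rather than part of the repository: the function name `check_environment` is hypothetical, and it assumes that NCO is detectable through its `ncks` command and Graphviz through its `dot` command.

```python
import shutil
import sys


def check_environment(min_python=(3, 8), tools=("ncks", "dot")):
    """Collect basic facts about the active Python environment.

    Assumptions: "ncks" stands in for the NCO install and "dot" for Graphviz.
    A tool that is not on PATH is reported as None.
    """
    # Inside a venv or virtualenv, sys.prefix differs from sys.base_prefix.
    in_virtualenv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "in_virtualenv": in_virtualenv,
        "tools": {tool: shutil.which(tool) for tool in tools},
    }


if __name__ == "__main__":
    report = check_environment()
    print("Python", report["python_version"], "meets minimum:", report["python_ok"])
    print("Running inside a virtual environment:", report["in_virtualenv"])
    for tool, path in report["tools"].items():
        print(tool, "->", path or "not found on PATH")
```

Graphviz is optional, so a missing `dot` only affects DAG visualization.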
If this repository is relevant or informative to your research, please cite:
- Tang, G., A. W. Wood, A. J. Newman, M. P. Clark, and S. M. Papalexiou, 2023. GPEP v1.0: a Geospatial Probabilistic Estimation Package to support Earth Science applications. Geosci. Model Dev., https://doi.org/10.5194/gmd-2023-172, 2024.
