gpep_snakemake

Snakemake workflow for preparing data and managing runs for the Geospatial Probabilistic Estimation Package (GPEP)



Overview

This repository contains a Snakemake workflow that preprocesses input data and sets up run configurations for GPEP, so that both steps are transparent and reproducible.

The repository design follows a structure similar to other Snakemake-driven hydrometeorological workflows, including gpep_to_summa_snakemake.


Introduction to Snakemake

Snakemake is a workflow management system that pairs an execution environment with a Python-style domain-specific language (DSL) for describing workflows. It is widely used in bioinformatics for building data analysis pipelines, but it is useful wherever a series of steps (a workflow) must turn input data into output data. See the official Snakemake website for documentation.

Snakemake workflows consist of a set of rules, where each rule describes how to create a certain part of the output. The rules describe both what needs to be done (the actions or commands), and under what circumstances (the input files, output files, and conditions).

Example rule:

rule clip_dem:
    input:
        full_dem = full_dem
    output:
        domain_dem = domain_dem
    params:
        dem_bbox = config['dem_bbox']
    run:
        clip_tif_with_bbox(input.full_dem, params.dem_bbox, output.domain_dem)
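The body of clip_tif_with_bbox is not shown in this README. As a rough illustration of what clipping a raster to a bounding box involves, the sketch below converts a bounding box into a row and column window. The function name bbox_to_window, the bbox order (min_x, min_y, max_x, max_y), and the north-up grid layout are all assumptions made for this example, not the repository's actual implementation or the format of config['dem_bbox'].

```python
import math

def bbox_to_window(bbox, origin, pixel_size, shape):
    """Return (row_start, row_stop, col_start, col_stop) covering bbox.

    Hypothetical helper, assuming a north-up raster:
      bbox       = (min_x, min_y, max_x, max_y) in map units
      origin     = (x of the left edge, y of the top edge)
      pixel_size = (width, height), both positive
      shape      = (n_rows, n_cols)
    """
    min_x, min_y, max_x, max_y = bbox
    x0, y0 = origin
    dx, dy = pixel_size
    n_rows, n_cols = shape

    # Columns grow eastward from the left edge.
    col_start = max(0, math.floor((min_x - x0) / dx))
    col_stop = min(n_cols, math.ceil((max_x - x0) / dx))
    # Rows grow southward from the top edge, so max_y gives the first row.
    row_start = max(0, math.floor((y0 - max_y) / dy))
    row_stop = min(n_rows, math.ceil((y0 - min_y) / dy))

    if col_start >= col_stop or row_start >= row_stop:
        raise ValueError("bounding box does not overlap the raster")
    return row_start, row_stop, col_start, col_stop

# A 10 x 10 grid with 1 x 1 pixels whose top-left corner is at (0, 10).
print(bbox_to_window((2.5, 3.5, 6.0, 8.0), (0.0, 10.0), (1.0, 1.0), (10, 10)))
# prints (2, 7, 2, 6)
```

The window is rounded outward so that every pixel touched by the box is kept. A real implementation would more likely rely on a raster library to read the window and write the clipped file.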

Example Workflow DAG

Data preprocessing is a key first step that must be transparent and reproducible. The DAG generated by Snakemake shows the GPEP data preparation steps and how they depend on one another.

The DAG corresponds to the workflow defined in run_gpep_prep.smk.
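To make the DAG idea concrete, here is a minimal sketch of how a set of rules with declared inputs and outputs implies an execution order. It is a toy model, not Snakemake's scheduler: apart from clip_dem, the rule names and all file names are invented for illustration.

```python
from typing import Dict, List

# Hypothetical rules. Only clip_dem appears in this README; the rest are made up.
RULES: Dict[str, Dict[str, List[str]]] = {
    "download_dem": {"input": [], "output": ["full_dem.tif"]},
    "clip_dem": {"input": ["full_dem.tif"], "output": ["domain_dem.tif"]},
    "write_config": {"input": ["domain_dem.tif"], "output": ["run_config.txt"]},
}

def execution_order(rules, target):
    """Return the rule names needed to build target, dependencies first."""
    # Map every output file to the rule that produces it.
    producers = {out: name for name, rule in rules.items() for out in rule["output"]}
    order = []
    visiting = set()

    def visit(path):
        rule = producers.get(path)
        if rule is None or rule in order:
            return  # a source file no rule produces, or a rule already scheduled
        if rule in visiting:
            raise ValueError(f"cyclic dependency at rule {rule!r}")
        visiting.add(rule)
        for dependency in rules[rule]["input"]:
            visit(dependency)
        visiting.discard(rule)
        order.append(rule)

    visit(target)
    return order

print(execution_order(RULES, "run_config.txt"))
# prints ['download_dem', 'clip_dem', 'write_config']
```

Working backwards from the requested target is what keeps the workflow reproducible: a rule is only scheduled because some requested output depends on it.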


Getting Started

1. Clone the Repository

git clone https://github.com/DaveCasson/gpep_snakemake.git
cd gpep_snakemake

2. Set Up a Virtual Environment (Optional)

Python 3.8+ is recommended.

Option 1: Using venv

python -m venv gpep_snakemake
source gpep_snakemake/bin/activate

Option 2: Using pyenv (the commands below install it with Homebrew)

brew install pyenv
brew install pyenv-virtualenv

pyenv install 3.9.16
pyenv virtualenv 3.9.16 gpep_snakemake
pyenv activate gpep_snakemake

Verify:

which python
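As a complement to `which python`, the same check can be made from inside the interpreter. This is a small sketch using only the standard library. The helper names are hypothetical, and the 3.8 minimum is taken from the recommendation above.

```python
import sys

MIN_PYTHON = (3, 8)  # the README recommends Python 3.8+

def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv or virtualenv."""
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix or hasattr(sys, "real_prefix")

def python_ok(version=None, minimum=MIN_PYTHON) -> bool:
    """True when version (default: the running interpreter) meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)

print("interpreter:", sys.executable)
print("version ok:", python_ok())
print("virtual environment active:", in_virtualenv())
```

If the last line reports False, the environment was probably not activated in the current shell.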

3. Install Dependencies

pip install -r requirements.txt

4. Install as Jupyter Kernel

ipython kernel install --name "gpep_snakemake" --user

5. Install Additional Tools

brew install nco
brew install graphviz    # optional, for DAG visualization
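After installing, it can help to confirm that the command-line tools are on the PATH. The sketch below looks for `ncks` (one of the NCO operators) and `dot` (from Graphviz). It also looks for `snakemake`, on the assumption that requirements.txt installs it. The TOOLS table and helper names are illustrative only.

```python
import shutil

# Executable name -> whether this sketch treats it as required.
# Assumption: snakemake comes from requirements.txt; dot is optional (DAG plots).
TOOLS = {"snakemake": True, "ncks": True, "dot": False}

def find_tools(tools, which=shutil.which):
    """Map each executable name to its full path, or None if it is not found."""
    return {name: which(name) for name in tools}

def missing_required(tools, found):
    """Return the sorted names of required executables that were not found."""
    return sorted(name for name, required in tools.items()
                  if required and found[name] is None)

found = find_tools(TOOLS)
for name, path in found.items():
    print(f"{name}: {path or 'not found'}")

missing = missing_required(TOOLS, found)
if missing:
    print("missing required tools:", ", ".join(missing))
```

The `which` parameter is only there so the lookup can be swapped out when testing. Normal use relies on `shutil.which`.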

6. Test a Simple Snakemake Workflow

Optional testing via Jupyter Notebook:

cd notebooks/
jupyter notebook

Follow the instructions in the included example notebook to verify your environment.


Related References

GPEP reference

  • Tang, G, AW Wood, AJ Newman, MP Clark, and SM Papalexiou, 2023. GPEP v1.0: a Geospatial Probabilistic Estimation Package to support Earth Science applications. Geosci. Model Dev. https://doi.org/10.5194/gmd-2023-172, 2024.

Citation

If this repository is relevant or informative to your research, please cite:
