This repo provides pipelines that consume several different data sources and clean, interpolate, and derive quantities of interest, preparing them for downstream consumers.
Pipelines in this repo typically consist of a few steps:
- Data acquisition / sorting: download links, scripts, or other info for getting the raw data from an upstream provider and organizing it in a rational manner (typically one subdirectory per month of raw data).
- Processing pipeline: a series of serially dependent slurm scripts (see the sketch after this list) that manage the main transforms, typically data cleaning and selection -> derived variable construction -> interpolation and integration -> downsampling -> output format construction.
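As a minimal sketch of how such a serially dependent chain can be wired together on slurm (the stage script names below are placeholders, not files in this repo):

```bash
#!/bin/bash
# Illustrative only: each stage waits for the previous one to finish successfully
# via slurm's afterok dependency; the stage script names are placeholders.
set -euo pipefail

jid1=$(sbatch --parsable clean_and_select.slurm)
jid2=$(sbatch --parsable --dependency=afterok:${jid1} derive_variables.slurm)
jid3=$(sbatch --parsable --dependency=afterok:${jid2} interpolate_and_integrate.slurm)
jid4=$(sbatch --parsable --dependency=afterok:${jid3} downsample.slurm)
sbatch --dependency=afterok:${jid4} build_output.slurm
```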
As with essentially all data pipelines, provenance is crucial. Provenance records will look a bit different for each pipeline, but make sure to record, at a minimum:
- how to acquire the original upstream data (DOIs are the gold standard)
- what parameters were used in the pipeline
- the git hash or release of this repo reflecting its exact state when the pipeline was run.
The `provenance/` subdirectory here is an appropriate place for these records.
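As a sketch of one way to capture these records at launch time (the record layout and placeholders below are illustrative, not a required format):

```bash
#!/bin/bash
# Illustrative only: write a minimal provenance record for one pipeline run.
set -euo pipefail

mkdir -p provenance
RUN_ID=$(date -u +%Y%m%dT%H%M%SZ)

{
  echo "upstream data:  <DOI or GDAC rsync source of the snapshot used>"
  echo "run started:    $(date -u)"
  echo "code version:   $(git rev-parse HEAD)"
  echo "parameters:     <copy of the variable block from the pipeline script>"
} > "provenance/run_${RUN_ID}.txt"
```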
Argo's GDACs provide the complete Argo dataset as netCDF files; they also publish a DOI-stamped release regularly. Prep this data for localGP as follows:
After downloading the DOI of interest or rsyncing one of the GDACs, sort the profile netCDF files into folders organized by month:
- The Argo DOI zips core and BGC profiles separately; at the time of writing, localgp-input only considers core profiles and assumes only the core archives are unzipped.
- Sorting is handled by `sort_argonc.py`; see the top of that file for usage instructions.
- Slurm it with `sort_argonc.slurm` if desired; those who feel fancy could write a loop over DACs (aoml bodc csio incois kma meds coriolis csiro jma kordi nmdis), as sketched after this list, or even parallelize at the sub-DAC level (aoml takes forever).
- Note that this copies, and does not move, the profile .nc files, which means you'll need enough disk to accommodate both copies.
- This will create subdirectories `YYYY_MM` for each month of data under your target location.
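A sketch of the per-DAC loop, assuming `sort_argonc.slurm` accepts the DAC name as an argument (this is an assumption; check the usage notes at the top of `sort_argonc.py` and `sort_argonc.slurm` before running):

```bash
#!/bin/bash
# Illustrative only: submit one sorting job per DAC instead of a single monolithic job.
# Assumes sort_argonc.slurm takes the DAC name as its first argument; verify this
# against the script's actual interface.
for dac in aoml bodc csio incois kma meds coriolis csiro jma kordi nmdis; do
  sbatch sort_argonc.slurm "${dac}"
done
```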
Once input netCDF files are sorted by month, `pipeline4localgp.sh` supports preparing these files for consumption by localGP. Start by setting appropriate variables for this run in the block at the top. This script launches an appropriate pipeline of jobs on a slurm-managed cluster.
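As a loose illustration of the pattern (the variable names and launch command here are placeholders, not the script's actual interface; the block at the top of `pipeline4localgp.sh` is authoritative):

```bash
# Placeholder names only -- edit the real variable block at the top of
# pipeline4localgp.sh rather than copying these.
#
#   INPUT_DIR=/scratch/argo/sorted          # YYYY_MM subdirectories from the sort step
#   OUTPUT_DIR=/scratch/argo/localgp_input  # where prepared localGP inputs should land
#
# Then launch the driver from a login node; it submits its own slurm jobs:
bash pipeline4localgp.sh
```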
Preparing Argo data as represented by Argovis for consumption by localGP proceeds as follows:
`argovis-dl.sh` will manage the download of Argo data from Argovis month by month; data will be placed in per-month folders in the same directory as the download script.
`pipeline4localgp.sh` works similarly for Argovis as it does for Argo netCDF files; choose appropriate parameters in the block at the top, and then loop over months to generate viable localGP inputs.
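A sketch of that month loop follows; how the month is passed to `pipeline4localgp.sh` is an assumption for illustration (it may instead be selected in the variable block at the top), and the download path is a placeholder:

```bash
#!/bin/bash
# Illustrative only: run the pipeline once per downloaded month. Passing the month
# as a positional argument is assumed here; check pipeline4localgp.sh for its
# actual convention.
set -euo pipefail
shopt -s nullglob

DOWNLOAD_DIR=/path/to/argovis/downloads   # placeholder: directory holding YYYY_MM folders

for month_dir in "${DOWNLOAD_DIR}"/????_??; do
  month=$(basename "${month_dir}")        # e.g. 2021_06
  bash pipeline4localgp.sh "${month}"
done
```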