Investigating Climate Change Through Buoy Data

A science fair project by Jason Fischer

Copyright 2019

The media discusses global warming often but says much less about how it is computed. That gap interested me, so I designed a science experiment to explore how climate change is calculated. After studying data going back to 1880, NASA’s Earth Observatory concludes that the average global temperature is rising by 0.15 to 0.20 ℃ per decade.

The National Oceanic and Atmospheric Administration (NOAA) operates buoys at sea that monitor air temperature (referred to below as ATMP), water temperature (referred to below as WTMP), and other values; these two are the ones that matter for this project. The buoys are spread across the world, and their data has been published from as far back as 1979. The data is freely available on the web as text files. Using it, I can write code to determine the average, minimum, maximum, and other statistics over time for each buoy. I selected four buoys off the coast of the continental United States to analyze: one each off the coasts of California, Louisiana, Maine, and Virginia.
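
As a rough illustration of that kind of summary, the sketch below parses a few whitespace-separated records and computes the mean, minimum, and maximum of ATMP and WTMP per year. The simplified column layout, the sample values, the helper name `summarize_by_year`, and the use of 999.0 as a missing-value marker are all assumptions made for illustration; they are not taken from this project's preprocessor.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample in a simplified layout: year month day hour ATMP WTMP.
# Real NOAA files have more columns; 999.0 is assumed to mark a missing reading.
SAMPLE = """\
1990 01 01 00 10.0 12.0
1990 07 01 00 20.0 18.0
1991 01 01 00 999.0 13.0
1991 07 01 00 22.0 19.0
"""

MISSING = 999.0


def summarize_by_year(text, column):
    """Return {year: (mean, min, max)} for one column index, skipping missing values."""
    by_year = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        value = float(fields[column])
        if value != MISSING:
            by_year[int(fields[0])].append(value)
    return {y: (mean(v), min(v), max(v)) for y, v in sorted(by_year.items())}


atmp = summarize_by_year(SAMPLE, column=4)
wtmp = summarize_by_year(SAMPLE, column=5)
for year in atmp:
    print(year, "ATMP", atmp[year], "WTMP", wtmp[year])
```

Filtering out the missing-value marker before averaging matters: a single 999.0 left in the data would pull a yearly mean far away from any realistic temperature.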

The project has been modified to use Data Workspaces to track experiments and data lineage.

File Layout

This project has the following subdirectories:

  • data - this directory contains the raw text files for buoys in subdirectories by buoy number.
  • intermediate-data - this directory contains the preprocessed data for each buoy in CSV format, one file per buoy.
  • code - the code for this project, including the preprocessor script and Jupyter notebooks.
  • results - the output charts for each buoy showing temperature change per decade, along with lineage metadata.
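
To make "temperature change per decade" concrete, here is a minimal sketch that fits a straight line to yearly mean temperatures by ordinary least squares and scales the slope to degrees per decade. The function name `trend_per_decade` and the sample values are hypothetical, and the notebooks in this project may compute the trend differently.

```python
def trend_per_decade(years, temps):
    """Least-squares slope of temps against years, in degrees per decade."""
    n = len(years)
    if n < 2 or n != len(temps):
        raise ValueError("need at least two (year, temperature) pairs")
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    return 10.0 * sxy / sxx  # slope per year times 10 years


# Made-up yearly means that rise by exactly 0.02 degrees per year.
years = [2000, 2001, 2002, 2003, 2004]
temps = [15.00, 15.02, 15.04, 15.06, 15.08]
print(round(trend_per_decade(years, temps), 3))
```

A per-decade slope computed this way can be compared directly with the 0.15 to 0.20 ℃ per decade figure quoted in the introduction.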

Installation

To set this up as a data workspace in its own Conda environment, run:

pip install dataworkspaces
dws clone git@github.com:data-workspaces/buoy-data-analysis.git
cd buoy-data-analysis
conda env create -f environment.yml
conda activate buoy-data-analysis

Execution

To run the experiments and take a snapshot of each buoy's results, run the following commands from the workspace root (the buoy-data-analysis directory):

# preprocess the data
for buoy in 42040 44005 44014 46026; do python3 ./code/preprocess.py "$buoy"; done

# run notebooks and take snapshot after each notebook
cd code
for buoy in 42040 44005 44014 46026
do
    jupyter nbconvert --to notebook --execute anomaly-analysis-buoy-$buoy.ipynb --output anomaly-analysis-buoy-$buoy.ipynb
    dws snapshot -m "results from buoy $buoy"
done

Example DWS Commands

Here are some Data Workspaces commands you might try running in this workspace.

To see the snapshot history:

dws report history

To see the lineage for the snapshot buoy-44014-better-slope:

dws report lineage --snapshot buoy-44014-better-slope

After running a notebook, but before taking a snapshot, you can see the current lineage as follows:

dws report lineage
