gitony0101/aquaculture-risk-forecasting

Aquaculture Risk Forecasting

Overview

A time-series forecasting and risk-scoring workflow for dissolved oxygen (DO) monitoring. It uses water-quality sensor data, lag features, rolling statistics, and baseline and tree-based machine learning models to provide 24-hour-ahead risk alerts and aeration decision support aimed at preventing fish mortality.

Project Structure

Aquaculture_RiskForecasting/
├── data/
│   ├── raw/             # Unprocessed sensor and API data
│   └── processed/       # Cleaned features and windowed datasets
├── notebooks/           # EDA, model training, evaluation
├── src/
│   ├── data_ingestion/  # USGS, NEON, on-site sensor loaders
│   ├── features/        # Lag features, rolling stats, weather covariates
│   ├── models/          # Baseline and advanced forecasting models
│   ├── evaluation/      # Risk threshold analysis, alert scoring
│   └── dashboard/       # Flask/Streamlit UI for forecasts and alerts
├── reports/             # Technical reports and planning documents
└── assets/              # Images, diagrams, and external resources

MVP Scope

  • End-to-end pipeline: data ingestion → features → model → forecast
  • 24-hour ahead dissolved oxygen prediction
  • Risk alerts when DO falls below 4 mg/L
  • Simple dashboard for farm managers
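
The feature step of the pipeline can be illustrated with a small sketch. This is not the repository's code: it assumes hourly readings held in a plain list, and the function name `build_features`, the lag set, and the 6-hour rolling window are hypothetical choices made only for illustration.

```python
from statistics import mean


def build_features(do_values, lags=(1, 2, 3), window=6, horizon=24):
    """Build (features, target) pairs from an hourly DO series.

    Hypothetical helper: each row uses only readings at or before index t
    and targets the reading at t + horizon (24 steps = 24 hours if hourly).
    """
    rows = []
    # First index that has enough history for every lag and the rolling window.
    start = max(max(lags), window - 1)
    for t in range(start, len(do_values) - horizon):
        features = {f"do_lag_{k}": do_values[t - k] for k in lags}
        features["do_now"] = do_values[t]
        features[f"do_roll_mean_{window}"] = mean(do_values[t - window + 1 : t + 1])
        rows.append((features, do_values[t + horizon]))
    return rows


if __name__ == "__main__":
    series = [float(i) for i in range(40)]  # synthetic placeholder data
    rows = build_features(series)
    print(len(rows), rows[0])
```

Rows near the start of the series are dropped because they lack enough history, which is one reason early timestamps can have no forecast.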

Key Files

  • data_dictionary.md: Variable definitions and source tracking
  • methodology.md: Modeling approach and evaluation framework
  • reports/mvp_plan.md: Detailed task breakdown and timeline

Data Sources

  • USGS Water Data API: Temperature, flow, dissolved oxygen
  • NEON Water Quality: Dissolved oxygen, turbidity, pH
  • On-site sensors (optional): Timestamped DO, temperature, salinity

Models

  • Persistence (naive forecast baseline)
  • Linear Regression
  • Random Forest
  • Gradient Boosting
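
As a reference point for the other models, the persistence baseline can be sketched in a few lines. This sketch assumes an hourly series in a plain list and a 24-step horizon; `persistence_forecast` and `mean_absolute_error` are hypothetical helper names, not functions from this repository.

```python
def persistence_forecast(do_values, horizon=24):
    """Naive baseline: the forecast for index t + horizon is the reading at t.

    Returns predictions aligned with do_values[horizon:].
    """
    if len(do_values) <= horizon:
        raise ValueError("series must be longer than the forecast horizon")
    return do_values[: len(do_values) - horizon]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Synthetic placeholder data: DO drops from 8.0 to 6.0 mg/L after 24 hours.
    series = [8.0] * 24 + [6.0] * 24
    predicted = persistence_forecast(series)
    actual = series[24:]
    print(mean_absolute_error(actual, predicted))
```

Any learned model should beat this error to justify its added complexity.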

Risk Thresholds

  • Critical: DO < 3 mg/L — Immediate aeration required
  • Warning: DO 3–4 mg/L — Prepare aeration systems
  • Normal: DO > 4 mg/L — No action needed
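
The tiers above map directly onto a small classification function. This sketch assumes that readings of exactly 3 or 4 mg/L fall in the Warning tier, since the list defines Critical as DO < 3 and Normal as DO > 4. The name `classify_do` is hypothetical.

```python
def classify_do(do_mg_l):
    """Map a DO reading (mg/L) to a (tier, action) pair.

    Assumption: boundary values 3.0 and 4.0 are treated as Warning.
    """
    if do_mg_l < 3.0:
        return "Critical", "Immediate aeration required"
    if do_mg_l <= 4.0:
        return "Warning", "Prepare aeration systems"
    return "Normal", "No action needed"


if __name__ == "__main__":
    for reading in (2.4, 3.6, 7.1):
        print(reading, classify_do(reading))
```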

Dashboard MVP (Streamlit)

The Streamlit MVP dashboard plots observed dissolved oxygen alongside 24‑hour forecasts from the GradientBoostingRegressor model and applies a two‑tier risk framework (warning: DO < 6.0 mg/L; critical: DO < 5.0 mg/L).

Prerequisites

  • Python 3.10+
  • Install required packages (already part of the project environment):
    pip install streamlit pandas matplotlib
  • Ensure the forecast CSV files are present (generated by the tree‑forecast pipeline).

Running the dashboard

streamlit run src/dashboard/app.py

The app launches locally at http://localhost:8501.

Expected input files

  • data/processed/tree_forecast_predictions_01638500_2023_oct_2025_mar_6.0.csv (warning threshold)
  • data/processed/tree_forecast_predictions_01638500_2023_oct_2025_mar_5.0.csv (critical threshold)
  • Corresponding metrics CSVs (tree_forecast_metrics_…_6.0.csv and …_5.0.csv).
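
Before launching the app, it can help to check that the prediction files are in place. The sketch below rebuilds the two prediction file names listed above and reports any that are missing. The helpers `predictions_path` and `missing_inputs` are hypothetical, and the metrics files are left out because their full names are not spelled out here.

```python
from pathlib import Path


def predictions_path(site="01638500", threshold=6.0, period="2023_oct_2025_mar"):
    """Build the expected predictions CSV path for a site and threshold.

    Hypothetical helper; the naming pattern is copied from the file list above.
    """
    name = f"tree_forecast_predictions_{site}_{period}_{threshold:.1f}.csv"
    return Path("data") / "processed" / name


def missing_inputs(thresholds=(6.0, 5.0)):
    """Return the expected prediction files that do not exist on disk."""
    paths = [predictions_path(threshold=t) for t in thresholds]
    return [p for p in paths if not p.exists()]


if __name__ == "__main__":
    for path in missing_inputs():
        print(f"missing: {path.as_posix()}")
```

Run it from the repository root so the relative `data/processed/` path resolves.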

Dashboard sections

  1. Site & Threshold selector – Choose the USGS site and risk threshold.
  2. Observed DO plot – Time‑series of measured dissolved oxygen.
  3. Forecast DO plot – GradientBoosting 24‑hour forecast with a horizontal line indicating the selected threshold.
  4. Risk events table – Timestamps where the forecast risk flag (gb_risk) is true.
  5. Model comparison metrics – MAE, RMSE, R², precision, recall, F1 for all models (highlighting GradientBoosting).
  6. False‑positive / False‑negative summary – Counts based on forecast vs. actual risk.
  7. Decision recommendation panel – Concise guidance (e.g., “Prepare aeration”, “Activate aeration”).
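
The false-positive / false-negative summary and the classification metrics in sections 5 and 6 can be derived as sketched below. This is an illustration under stated assumptions, not the dashboard's code: risk is defined as DO below the selected threshold for both observed and forecast values, and the function name `risk_confusion` is hypothetical.

```python
def risk_confusion(actual_do, forecast_do, threshold):
    """Count TP/FP/FN/TN for 'DO below threshold' and derive precision, recall, F1."""
    tp = fp = fn = tn = 0
    for actual, forecast in zip(actual_do, forecast_do):
        actual_risk = actual < threshold
        forecast_risk = forecast < threshold
        if forecast_risk and actual_risk:
            tp += 1
        elif forecast_risk and not actual_risk:
            fp += 1  # alert raised, but DO stayed at or above the threshold
        elif actual_risk:
            fn += 1  # missed event: DO dropped without an alert
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Synthetic placeholder values, not project results.
    observed = [5.5, 6.5, 5.0, 7.0, 5.8]
    forecast = [5.7, 5.9, 6.2, 7.1, 5.5]
    print(risk_confusion(observed, forecast, threshold=6.0))
```

For an aeration alert, a false negative (a missed low-DO event) is usually the costlier error, so recall deserves attention alongside F1.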

Model used

The dashboard uses the GradientBoostingRegressor model (identified as the strongest performer in the tree‑forecast evaluation).

Warning vs Critical interpretation

  • Warning (DO < 6.0 mg/L) – Triggers a precautionary alert; operators should be ready to aerate.
  • Critical (DO < 5.0 mg/L) – Indicates imminent risk; immediate aeration is recommended.

Limitations

  • The dashboard reads static CSV files; it does not pull live USGS data.
  • Early timestamps may lack forecast values due to model warm‑up.
  • Only the James River site (01638500) is currently supported.
  • No deployment infrastructure (cloud, mobile) is provided yet.

Dashboard Screenshots

Dashboard Overview – Full layout showing site/threshold selectors, observed DO time‑series, 24‑hour forecast, and decision panels.

Risk Alert Table – Timestamps where the forecast risk flag (gb_risk) is active for the selected threshold (6.0 mg/L warning / 5.0 mg/L critical).

Model Comparison – Regression and classification metrics for Persistence, Linear Regression, Random Forest, and Gradient Boosting (the MVP model).

Resume Bullet

Developed a time-series forecasting system for dissolved oxygen in aquaculture environments, comparing Persistence, Linear Regression, Random Forest, and Gradient Boosting models on a single USGS site. The best model (Random Forest) achieved F1 = 0.87 for risk classification at the warning threshold (6.0 mg/L). Built a two-tier warning/critical alert dashboard for proactive farm management.


License

MIT License


Disclaimer

This repository is a portfolio and learning prototype. It is a simplified research and engineering demonstration, not a production system, commercial product, or certified decision-support tool. Results are based on synthetic, public, or simplified data and should not be used for real operational decisions without further validation.
