Time-series forecasting and risk-scoring workflow for dissolved oxygen (DO) monitoring using water-quality sensor data, lag features, rolling statistics, and baseline/tree-based machine learning models. Provides 24-hour risk alerts and aeration decision support to prevent fish mortality.
Aquaculture_RiskForecasting/
├── data/
│ ├── raw/ # Unprocessed sensor and API data
│ └── processed/ # Cleaned features and windowed datasets
├── notebooks/ # EDA, model training, evaluation
├── src/
│ ├── data_ingestion/ # USGS, NEON, on-site sensor loaders
│ ├── features/ # Lag features, rolling stats, weather covariates
│ ├── models/ # Baseline and advanced forecasting models
│ ├── evaluation/ # Risk threshold analysis, alert scoring
│ └── dashboard/ # Flask/Streamlit UI for forecasts and alerts
├── reports/ # Technical reports and planning documents
└── assets/ # Images, diagrams, and external resources
- End-to-end pipeline: data ingestion → features → model → forecast
- 24-hour ahead dissolved oxygen prediction
- Risk alerts when DO falls below 4 mg/L
- Simple dashboard for farm managers
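As an illustration of the feature stage in the pipeline above, here is a minimal sketch of lag and rolling-statistic features built with pandas. The function name `build_features`, the column names, the lag set, and the 6-hour window are assumptions made for this example, not the project's actual code. It also assumes a regularly sampled hourly DO series, so a shift of 24 rows corresponds to 24 hours.

```python
import pandas as pd


def build_features(do_series: pd.Series, lags=(1, 2, 3, 24), window=6, horizon=24) -> pd.DataFrame:
    """Build lag features, rolling statistics, and a horizon-ahead target.

    Assumes `do_series` holds hourly dissolved oxygen readings in mg/L
    with no gaps (hypothetical input layout).
    """
    frame = pd.DataFrame({"do": do_series})

    # Lag features: the DO value observed `lag` hours before each row.
    for lag in lags:
        frame[f"do_lag_{lag}h"] = frame["do"].shift(lag)

    # Rolling statistics over the most recent `window` hours (current row included).
    frame[f"do_roll_mean_{window}h"] = frame["do"].rolling(window).mean()
    frame[f"do_roll_std_{window}h"] = frame["do"].rolling(window).std()

    # Target: the DO value `horizon` hours after each row.
    frame["target"] = frame["do"].shift(-horizon)

    # Drop rows where any lag, rolling statistic, or target is undefined.
    return frame.dropna()
```

Because the target is shifted into the future while every feature looks only at the current and past rows, the features do not leak information from the forecast period.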
- data_dictionary.md: Variable definitions and source tracking
- methodology.md: Modeling approach and evaluation framework
- reports/mvp_plan.md: Detailed task breakdown and timeline
- USGS Water Data API: Temperature, flow, dissolved oxygen
- NEON Water Quality: Dissolved oxygen, turbidity, pH
- On-site sensors (optional): Timestamped DO, temperature, salinity
- Persistence (naive forecast baseline)
- Linear Regression
- Random Forest
- Gradient Boosting
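The persistence baseline in the list above can be sketched in a few lines: the forecast for a given hour is simply the value observed 24 hours earlier. The function names `persistence_forecast` and `persistence_mae` are hypothetical, and the sketch assumes a gap-free hourly series so that a row shift equals a time shift.

```python
import pandas as pd


def persistence_forecast(do_series: pd.Series, horizon: int = 24) -> pd.Series:
    """Naive forecast: predict each value with the one observed `horizon` rows earlier."""
    return do_series.shift(horizon)


def persistence_mae(do_series: pd.Series, horizon: int = 24) -> float:
    """Mean absolute error of the persistence forecast, ignoring the warm-up rows."""
    forecast = persistence_forecast(do_series, horizon)
    errors = (do_series - forecast).abs().dropna()
    return float(errors.mean())
```

A learned model is only worth deploying if it beats this number on the same evaluation period, which is why the baseline sits alongside the regression and tree-based models.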
- Critical: DO < 3 mg/L — Immediate aeration required
- Warning: DO 3–4 mg/L — Prepare aeration systems
- Normal: DO > 4 mg/L — No action needed
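The three risk levels above map directly onto a small classification function. The sketch below is illustrative only: the name `classify_do` and the action lookup are hypothetical, and treating readings of exactly 3 or 4 mg/L as Warning is an assumption read from the ranges as written.

```python
# Recommended action per risk level, taken from the list above.
ACTIONS = {
    "Critical": "Immediate aeration required",
    "Warning": "Prepare aeration systems",
    "Normal": "No action needed",
}


def classify_do(do_mg_l: float, critical: float = 3.0, warning: float = 4.0) -> str:
    """Return the risk level for a dissolved oxygen reading in mg/L."""
    if do_mg_l < critical:
        return "Critical"
    # Assumption: the boundary values themselves fall in the Warning band.
    if do_mg_l <= warning:
        return "Warning"
    return "Normal"
```

For example, `ACTIONS[classify_do(3.5)]` yields "Prepare aeration systems".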
The Streamlit MVP dashboard visualizes observed dissolved oxygen and 24‑hour forecasts from the GradientBoostingRegressor model, applying the two‑tier risk framework (warning < 6.0 mg/L, critical < 5.0 mg/L).
- Python 3.10+
- Install required packages (already part of the project environment):
pip install streamlit pandas matplotlib
- Ensure the forecast CSV files are present (generated by the tree‑forecast pipeline).
streamlit run src/dashboard/app.py
The app launches locally at http://localhost:8501.
- data/processed/tree_forecast_predictions_01638500_2023_oct_2025_mar_6.0.csv (warning threshold)
- data/processed/tree_forecast_predictions_01638500_2023_oct_2025_mar_5.0.csv (critical threshold)
- Corresponding metrics CSVs (tree_forecast_metrics_…_6.0.csv and …_5.0.csv).
- Site & Threshold selector – Choose the USGS site and risk threshold.
- Observed DO plot – Time‑series of measured dissolved oxygen.
- Forecast DO plot – GradientBoosting 24‑hour forecast with a horizontal line indicating the selected threshold.
- Risk events table – Timestamps where the forecast risk flag (gb_risk) is true.
- Model comparison metrics – MAE, RMSE, R², precision, recall, F1 for all models (highlighting GradientBoosting).
- False‑positive / False‑negative summary – Counts based on forecast vs. actual risk.
- Decision recommendation panel – Concise guidance (e.g., “Prepare aeration”, “Activate aeration”).
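To make the false-positive / false-negative summary and the classification metrics above concrete, here is a minimal sketch of how they could be computed from paired risk flags. The function name `risk_alert_scores` and the input layout (two equal-length sequences of booleans, forecast flag and actual flag per timestamp) are assumptions for illustration. The dashboard's own computation may differ.

```python
from typing import Dict, Sequence


def risk_alert_scores(forecast_risk: Sequence[bool], actual_risk: Sequence[bool]) -> Dict[str, float]:
    """Count alert outcomes and derive precision, recall, and F1."""
    if len(forecast_risk) != len(actual_risk):
        raise ValueError("forecast_risk and actual_risk must have the same length")

    pairs = [(bool(f), bool(a)) for f, a in zip(forecast_risk, actual_risk)]
    tp = sum(1 for f, a in pairs if f and a)          # alert raised, risk occurred
    fp = sum(1 for f, a in pairs if f and not a)      # alert raised, no risk (false alarm)
    fn = sum(1 for f, a in pairs if not f and a)      # no alert, risk occurred (missed event)
    tn = sum(1 for f, a in pairs if not f and not a)  # no alert, no risk

    # Guard against division by zero when there are no alerts or no risk events.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}
```

In this setting a false negative is a missed low-oxygen event, while a false positive is an unnecessary alert, so the two counts are worth reporting separately rather than only through F1.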
The dashboard uses the GradientBoostingRegressor model (identified as the strongest performer in the tree‑forecast evaluation).
- Warning (DO < 6.0 mg/L) – Triggers a precautionary alert; operators should be ready to aerate.
- Critical (DO < 5.0 mg/L) – Indicates imminent risk; immediate aeration is recommended.
- The dashboard reads static CSV files; it does not pull live USGS data.
- Early timestamps may lack forecast values due to model warm‑up.
- Only the James River site (01638500) is currently supported.
- No deployment infrastructure (cloud, mobile) is provided yet.
Dashboard Overview – Full layout showing site/threshold selectors, observed DO time‑series, 24‑hour forecast, and decision panels.
Risk Alert Table – Timestamps where the forecast risk flag (gb_risk) is active for the selected threshold (6.0 mg/L warning / 5.0 mg/L critical).
Model Comparison – Regression and classification metrics for Persistence, Linear Regression, Random Forest, and Gradient Boosting (the MVP model).
Developed a time-series forecasting system for dissolved oxygen in aquaculture environments, comparing Persistence, Linear Regression, Random Forest, and Gradient Boosting models on a single USGS site. Best model (Random Forest) achieved F1=0.87 for risk classification (Warning threshold, 6.0 mg/L) with a two-tier warning/critical alert dashboard for proactive farm management.
MIT License
Disclaimer: This repository is a portfolio and learning prototype. The projects are simplified research and engineering demonstrations, not production systems, commercial products, or certified decision-support tools. Results are based on synthetic, public, or simplified data and should not be used for real operational decisions without further validation.


