This repository includes the Python code of four models used to predict the water temperature of 83 rivers with limiting forcing data (98% missing data). The results of this study are described in the following manuscript: Almeida, M.C. and Coelho, P.S.: Modeling river water temperature with limiting forcing data: air2stream v1.0.0, machine learning and multiple regression. The four models are:
- Random Forest (see the scikit-learn documentation);
- Artificial Neural Network with the Momentum algorithm (see the NeuPy documentation);
- Support Vector Regression (see the scikit-learn documentation);
- Multiple Regression (see the scikit-learn documentation).
We have also included the hybrid air2stream model (see Toffolon and Piccolroaz, 2015). This benchmark model was used to make the results comparable with those of other studies.
The machine learning models' hyperparameters were optimized with the Tree-structured Parzen Estimator (TPE) algorithm (Bergstra et al., 2011). The Python implementation of TPE with the Hyperopt library (Bergstra et al., 2013) is also available.
The raw training datasets were modified with an under/oversampling technique: 100 different training datasets were derived for each station from the initial dataset by applying the Synthetic Minority Over-sampling Technique for regression with Gaussian Noise (SMOGN) (Branco et al., 2017). The Python implementation of SMOGN is also available; this code applies the TPE algorithm and SMOGN, and runs a random forest regressor.
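The idea behind this resampling step can be illustrated with a small sketch: rare target values are oversampled by duplicating them with Gaussian noise. This is only a conceptual illustration on synthetic data, not the actual SMOGN implementation (which is available from PyPI):

```python
# Conceptual illustration of oversampling rare target values with
# Gaussian noise (the core idea behind SMOGN's oversampling branch).
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({"air_temp": rng.normal(15, 8, 300)})
df["water_temp"] = 0.8 * df["air_temp"] + rng.normal(0, 1, 300)

# Treat samples in the tails of the target distribution as "rare"
lo, hi = df["water_temp"].quantile([0.05, 0.95])
rare = df[(df["water_temp"] < lo) | (df["water_temp"] > hi)]

# Oversample: replicate each rare row 3 times and perturb with small
# Gaussian noise scaled to each column's standard deviation
noise = rng.normal(scale=0.05 * df.std().values,
                   size=(len(rare) * 3, df.shape[1]))
synthetic = pd.DataFrame(np.repeat(rare.values, 3, axis=0) + noise,
                         columns=df.columns)
augmented = pd.concat([df, synthetic], ignore_index=True)
print(len(df), len(rare), len(augmented))
```

Repeating this with different noise seeds and resampling parameters yields many distinct training datasets from one original, which is how the 100 per-station datasets are produced.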
Additionally, we have included the Python code used to quantify feature importance with a random forest regressor (see the scikit-learn documentation). The random forest regressor with the following parameters was the best performing model for stations with 98% missing data (see Almeida and Coelho, 2022): n_estimators = 50, max_depth = 485, min_samples_split = 5, max_features = 'auto', bootstrap = True.
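A sketch of this feature-importance computation, using the parameter set quoted above on synthetic data (note that `max_features='auto'`, the pre-1.3 scikit-learn default, is omitted here for compatibility with newer scikit-learn versions):

```python
# Feature importance from a random forest's impurity decrease
# (feature_importances_), with the best-performing parameter set above.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# The target depends mostly on the first feature
y = 3.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(
    n_estimators=50, max_depth=485, min_samples_split=5,
    bootstrap=True, random_state=0,
)
model.fit(X, y)
importances = model.feature_importances_  # sums to 1
print(importances)
```

The per-feature importances can then be collected across all station files and written to a CSV, as the repository's feature-importance script does.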
In the folder Input data we have included 83 input files. Each file contains the following nine columns:
- Date (e.g. 10/24/1988 12:00:00 AM);
- Observed water temperature (°C);
- Mean daily air temperature (°C);
- Discharge (m³ s⁻¹);
- Mean daily global radiation (J m⁻²);
- Maximum daily air temperature (°C);
- Minimum daily air temperature (°C);
- Month of the year (e.g. 1, 2, 3, ..., 12);
- Day of the year (e.g. 1, 2, 3, ..., 365).
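A small sketch of a table with these nine columns; the column names and values are illustrative (the actual `.xlsx` files may use different headers), and the two calendar predictors are derived from the date:

```python
# Assembling the nine input columns described above (synthetic values).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("1988-10-24", periods=5, freq="D")
df = pd.DataFrame({
    "date": dates,                                   # Date
    "water_temp_C": rng.uniform(5, 20, 5),           # observed water temp.
    "air_temp_mean_C": rng.uniform(0, 25, 5),        # mean daily air temp.
    "discharge_m3s": rng.uniform(1, 100, 5),         # discharge
    "global_radiation_Jm2": rng.uniform(1e6, 3e7, 5),  # global radiation
    "air_temp_max_C": rng.uniform(10, 30, 5),        # max daily air temp.
    "air_temp_min_C": rng.uniform(-5, 10, 5),        # min daily air temp.
})
# Calendar predictors derived from the date column
df["month"] = df["date"].dt.month
df["day_of_year"] = df["date"].dt.dayofyear
print(df.shape)
```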
The model parameters are easy to locate in the code. Nonetheless, the following table lists the model parameters that are optimized with the TPE algorithm.
Model | Prior distribution | Parameter | Optimization range |
---|---|---|---|
Random Forest | uniform | 'n_estimators' | [50, 2000] |
Random Forest | uniform | 'max_depth' | [10, 1000] |
Random Forest | uniform | 'min_samples_split' | [2, 10] |
Random Forest | - | 'max_features' | [auto, sqrt] |
Random Forest | - | 'bootstrap' | [True, False] |
ANN | categorical | 'n_layers' | [1, 2] |
ANN | uniform integer | 'n_units_layer' | [10, 50] |
ANN | categorical | 'act_func_type' | ['Relu', 'PRelu', 'Elu', 'Tanh', 'Sigmoid'] |
ANN | categorical | 'regularization' | [True, False] |
ANN | quantized distribution | 'n_epochs' | With regularization: [500, 1000]; without regularization: [20, 300] |
ANN | uniform | 'dropout' | [0, 1.0] |
ANN | loguniform | 'batch_size' | [5, 20] |
ANN | uniform | 'initial_value' | [0.001, 0.1] |
ANN | uniform | 'reduction_freq' | [10, 200] |
ANN | uniform | 'decay_rate' (regularization) | [0.0001, 0.001] |
SVR | categorical | 'C' | [0.1, 1, 100, 1000] |
SVR | categorical | 'kernel' | ['rbf', 'poly', 'sigmoid', 'linear'] |
SVR | categorical | 'degree' | [1, 2, 3, 4, 5, 6] |
SVR | categorical | 'gamma' | [1, 0.1, 0.01, 0.001, 0.0001] |
SVR | categorical | 'epsilon' | [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10] |
- Install neupy from the neupy webpage;
- Create an empty folder;
- In this folder, place the Python code file (e.g. Hyper_ANN.py) and the input file (e.g. st1.xlsx). In the code file (e.g. Hyper_ANN.py), set the training and validation percentages of the dataset (e.g. train_size=0.7, test_size=0.3);
- Run the code. The output includes: a file with the score for each model run; a file with the parameters for each model run; a file with the Mean Absolute Error (MAE) for the training dataset; and a file with the MAE for the validation dataset.
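The 70/30 split and the MAE reporting in the steps above can be sketched as follows, with synthetic data and a placeholder linear model standing in for the ANN:

```python
# 70/30 train/validation split and MAE computation, as in the steps
# above (placeholder model and synthetic data, for illustration only).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
mae_train = mean_absolute_error(y_train, model.predict(X_train))
mae_test = mean_absolute_error(y_test, model.predict(X_test))
print(mae_train, mae_test)
```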
- Create an empty folder;
- In this folder, place the Python code file (e.g. ANN.py) and the input file or files (e.g. st1.xlsx; st2.xlsx; st3.xlsx; ...; st100.xlsx). In the code file (e.g. ANN.py), set the training and validation percentages of the dataset (e.g. train_size=0.7, test_size=0.3) and replace the model parameters with the values obtained in the hyperparameter optimization step above;
- Run the code. The output includes a file with the predicted values for the training dataset (e.g. 1-st1.xlsxtrain.xlsx) and a file with the predicted values for the testing dataset (e.g. 2-st1.xlsxtest.xlsx).
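The repository's ANN uses NeuPy's Momentum algorithm; as a stand-in sketch of the same fit/predict workflow with fixed (previously optimized) hyperparameters, scikit-learn's MLPRegressor can be used. The layer size, activation, and epoch count below are illustrative, not the study's optimized values:

```python
# Fit/predict workflow with fixed hyperparameters; MLPRegressor is a
# stand-in for NeuPy's Momentum network (illustrative values only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=150)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(30,), activation="relu",
                     max_iter=500, random_state=0)
model.fit(X_train, y_train)
pred_train = model.predict(X_train)  # values written to the *train file
pred_test = model.predict(X_test)    # values written to the *test file
print(pred_train.shape, pred_test.shape)
```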
- Install SMOGN from https://pypi.org/project/smogn/;
- Create an empty folder;
- In this folder, place the Python code file (e.g. Random_forest_Hyperopt_SMOGN.py) and the input file (e.g. st46.xlsx). In the code file (e.g. Random_forest_Hyperopt_SMOGN.py), set the training and validation percentages of the dataset (e.g. train_size=0.7, test_size=0.3);
- Run the code. The output includes: a file with the modified training dataset (e.g. st46.xlsxSMOGN_out0.xlsx); a file with the SMOGN parameters for the 100 modified training datasets (st46.xlsxSMOGN_parameters_out99.xlsx); a file with the parameters for the ML model run (st46.xlsxparameters0.csv); a file with the Mean Absolute Error (MAE) and the Nash–Sutcliffe model efficiency coefficient (NSE) for the 100 modified training datasets (A-st46.xlsxmodel_out99.xlsx); a file with the predicted values for the training dataset (st46.xlsxtrain.xlsx); and a file with the predicted values for the testing dataset (st46.xlsxtest.xlsx).
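The two skill metrics written out by this pipeline can be computed as follows (standard definitions; the example values are arbitrary):

```python
# Mean Absolute Error (MAE) and Nash-Sutcliffe efficiency (NSE):
#   MAE = mean(|obs - pred|)
#   NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
import numpy as np

def mae(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return np.mean(np.abs(obs - pred))

def nse(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

obs = np.array([10.0, 12.0, 14.0, 16.0])
pred = np.array([10.5, 11.5, 14.5, 15.5])
print(mae(obs, pred), nse(obs, pred))  # 0.5 and 0.95
```

NSE equals 1 for a perfect model and drops below 0 when the model predicts worse than the observed mean, which is why it is a common benchmark metric for river temperature models.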
- Create an empty folder;
- In this folder, place the Python code file (Random Forest_Feature_importance.py) and the input files (e.g. st1.xlsx; st2.xlsx; st3.xlsx; ...; st100.xlsx). In the code file (Random Forest_Feature_importance.py), set the training and validation percentages of the dataset (e.g. train_size=0.7, test_size=0.3) and change the path to the output file (importance.csv).
Almeida, M.C. and Coelho P.S.: Modeling river water temperature with limiting forcing data,...
Bergstra, J. S., Bardenet, R., Bengio, Y. and Kegl, B.: Algorithms for hyper-parameter optimization, Advances in Neural Information Processing Systems, 2546–2554, 2011.
Bergstra, J., Yamins, D., Cox, D. D.: Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures, Proc. of the 30th International Conference on Machine Learning (ICML 2013), 115–123, 2013.
Branco, P., Ribeiro, R. P., Torgo, L., Krawczyk, B., Moniz, N.: SMOGN: a pre-processing approach for imbalanced regression, Proceedings of Machine Learning Research 74, 36–50, 2017.
Toffolon, M. and Piccolroaz, S.: A hybrid model for river water temperature as a function of air temperature and discharge, Journal of Hydrology 529, 302–315, https://doi.org/10.1016/j.jhydrol.2015.07.044, 2015.