Python notebooks to accompany the paper A Tutorial on Ensembles in Machine Learning with Python Examples.
The notebooks are as follows:
Ensembles-Preliminaries: Code demonstrating the impact of ensemble size and diversity on accuracy.
Ensembles-Bagging: Code for bagging and random subspace ensembles.
Ensembles-RandomF: Using random forest to generate feature importance scores and OOB (out-of-bag) estimates of generalisation accuracy.
Ensembles-Boosting: A simple AdaBoost example to illustrate the internal workings.
Ensembles-GBoost: Gradient Boosting compared with other ensemble methods.
Ensembles-Hetero: A heterogeneous ensemble with 7 estimators compared with a bagging ensemble.
Ensemble-Stacking: A comparison of a heterogeneous ensemble with some stacking alternatives.
ensemble_functions.py: Some helper functions for ensembles.
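As a rough illustration of the ensemble-size effect that Ensembles-Preliminaries explores, the sketch below computes the accuracy of a majority vote over independent classifiers. It is not taken from the notebooks: the function name `majority_vote_accuracy` is hypothetical, and the code assumes a binary task, an odd number of members, and members whose errors are fully independent (an idealised best case for diversity).

```python
from math import comb


def majority_vote_accuracy(n_estimators: int, p: float) -> float:
    """Probability that a majority of n independent classifiers is correct.

    Assumptions (for illustration only): binary classification, an odd
    ensemble size, and each member correct with probability p independently
    of the others.
    """
    if n_estimators % 2 == 0:
        raise ValueError("use an odd ensemble size to avoid tied votes")
    majority = n_estimators // 2 + 1
    # Sum the binomial probabilities of k correct votes for every k that
    # constitutes a majority.
    return sum(
        comb(n_estimators, k) * p**k * (1 - p) ** (n_estimators - k)
        for k in range(majority, n_estimators + 1)
    )


if __name__ == "__main__":
    # Accuracy of the vote as the ensemble grows, with members at 60% accuracy.
    for n in (1, 5, 11, 21):
        print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the vote improves with ensemble size whenever each member is better than chance. In practice members' errors are correlated, so the gain is smaller; that gap is why diversity matters alongside size.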
The HotelRevHelpfulness and AthleteSelection datasets are made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/
The wine dataset is covered by the copyright provisions of the UCI repository.
wine.csv: The wine dataset from the UCI repository.
HotelRevHelpfulness.csv: The hotel review dataset - see: https://researchrepository.ucd.ie/handle/10197/1894
AthleteSelection.csv: A toy dataset with just two predictive features.