Python implementation of the Gap Statistic
Forked from the original repository.
Dynamically identify the suggested number of clusters in a dataset using the gap statistic.
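The idea behind the gap statistic can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the gap-stat package's API: the helper names `dispersion`, `gap_statistic` and `optimal_k` and the parameters `k_max` and `n_refs` are hypothetical. The reference datasets are drawn uniformly from the bounding box of the data (one common choice, assumed here), and clustering uses scikit-learn's `KMeans`, whose `inertia_` is the pooled within-cluster sum of squares W_k. For each k, the sketch compares log(W_k) of the data with the mean log dispersion of the reference datasets, then picks the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.

```python
import numpy as np
from sklearn.cluster import KMeans


def dispersion(X, k, seed=0):
    """Pooled within-cluster sum of squares W_k (hypothetical helper)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    return km.inertia_


def gap_statistic(X, k_max=6, n_refs=10, seed=0):
    """Return (ks, gaps, s_k) for k = 1..k_max (hypothetical helper, not gap-stat's API)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)  # bounding box used for the reference data
    ks = np.arange(1, k_max + 1)
    gaps = np.empty(k_max)
    s = np.empty(k_max)
    for i, k in enumerate(ks):
        log_w = np.log(dispersion(X, k, seed))  # observed log dispersion
        # Log dispersions of n_refs uniform reference datasets (assumed reference model)
        log_w_ref = np.array([
            np.log(dispersion(rng.uniform(lo, hi, size=X.shape), k, seed))
            for _ in range(n_refs)
        ])
        gaps[i] = log_w_ref.mean() - log_w  # mean of log, not log of mean
        # Simulation error: s_k = sd_k * sqrt(1 + 1/B), with sd_k normalised by 1/B
        s[i] = log_w_ref.std() * np.sqrt(1.0 + 1.0 / n_refs)
    return ks, gaps, s


def optimal_k(ks, gaps, s):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; else the largest k tested."""
    for i in range(len(ks) - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return int(ks[i])
    return int(ks[-1])


# Usage: three well-separated synthetic Gaussian blobs in 2-D
data_rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + data_rng.normal(scale=0.5, size=(100, 2)) for c in centers])
ks, gaps, s = gap_statistic(X, k_max=6)
print("suggested number of clusters:", optimal_k(ks, gaps, s))
```

Averaging the log dispersions (rather than taking the log of the averaged dispersion) is what makes the reference term an estimate of the expected log dispersion, and the same reference values give the spread s_k used in the selection rule.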
Changes in this fork:
- Correct the dispersion formula: average the logs of the reference dispersions (mean of log) instead of taking the log of their average (log of mean)
- Compute the standard deviation of the gap statistic
- Add scikit-learn KMeans and SphericalKMeans
- SciPy's kmeans2 appears very unstable, so it is no longer the default algorithm
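The difference between the two dispersion formulas, and how the standard deviation is derived from the same reference values, can be seen on a toy example. The three reference dispersions below are made-up illustrative numbers, not output of this package. By Jensen's inequality the log of the mean is never smaller than the mean of the logs, so the log-of-mean form inflates the reference term and therefore the gap.

```python
import numpy as np

# Illustrative (made-up) dispersions W*_kb of B = 3 reference datasets for one k
w_ref = np.array([1.0, 10.0, 100.0])
B = len(w_ref)

mean_of_log = np.log(w_ref).mean()  # corrected estimate of E*[log W_k]
log_of_mean = np.log(w_ref.mean())  # previous form: log of the averaged W_k

# Standard deviation of the reference log dispersions (1/B normalisation),
# inflated by sqrt(1 + 1/B) to account for the simulation error
sd_k = np.log(w_ref).std()
s_k = sd_k * np.sqrt(1.0 + 1.0 / B)

print(f"mean of log: {mean_of_log:.4f}")  # mean of log: 2.3026
print(f"log of mean: {log_of_mean:.4f}")  # log of mean: 3.6109
print(f"s_k:         {s_k:.4f}")          # s_k:         2.1709
```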
A full example is available in a notebook HERE.
Bleeding edge (latest version from GitHub):
pip install git+https://github.com/druogury/clustering-gap-statistic.git
PyPI:
pip install --upgrade gap-stat
Uninstall:
pip uninstall gap-stat