This is the notebook and environment for the Workshop on Explorative Data Analysis and Clustering that was run at PyData Cardiff on 13th June 2019.
The topics covered include the following:
- K-Means Clustering
- Gaussian Mixture Models
- Metrics to assess the performance of clustering algorithms
- A brief introduction to Principal Component Analysis (PCA)
- Hierarchical Clustering and Dendrograms with Seaborn and SciPy
If you are familiar with the Anaconda distribution, you can run this notebook locally once all of the dependencies listed in the requirements.txt file are installed. However, this is not necessary if you just wish to work through the notebook. There are two other options:
- Click on the following link to launch the notebook server in a Binder environment. Note that this could take a little time, as a Docker container is built to serve this in the browser.
- Use Google Colab. This option is even quicker:
  - Go to the following link to create a new Colab Notebook
  - Select the GitHub tab
  - Paste in the following URL: https://github.com/pydatacardiff/pydata_cardiff_workshop1
  - Then select: eda_clustering_notebook.ipynb
If you are not familiar with using these notebooks, there are tutorials available on the Colab intro page, and another good introduction to Jupyter notebooks is available at: realpython.com/jupyter-notebook-introduction/
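
For anyone taking the local route mentioned earlier, the setup might look roughly like the sketch below. It assumes that git, pip and Jupyter are already available (as they are in a standard Anaconda install) and that requirements.txt and the notebook sit in the root of the repository. The environment name `pydata-workshop` is hypothetical and can be anything you like.

```shell
# Clone the workshop repository and move into it
git clone https://github.com/pydatacardiff/pydata_cardiff_workshop1
cd pydata_cardiff_workshop1

# Optional: create and activate a separate conda environment
# (the name "pydata-workshop" is only an illustration)
conda create --name pydata-workshop python
conda activate pydata-workshop

# Install the dependencies listed in requirements.txt
# (assumed to be in the repository root)
pip install -r requirements.txt

# Open the workshop notebook in the browser
jupyter notebook eda_clustering_notebook.ipynb
```

If Jupyter is not among the listed dependencies, it would need to be installed into the new environment separately before the last step.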