This is a project deliverable for the Udacity Machine Learning Nanodegree. Please use it in accordance with the Udacity Honor Code.
- Implement unsupervised techniques to see what sort of patterns exist among existing customers, and what exactly makes them different.
- Review unstructured data to understand the patterns and natural categories that the data fits into.
- Use multiple algorithms, then compare and contrast their results both empirically and theoretically.
- Make predictions about the natural categories of multiple types in a dataset, then check these predictions against the result of unsupervised analysis.
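As a rough illustration of the goals listed above, the sketch below projects spending data onto two principal components and then clusters the result with scikit-learn. The choice of PCA and k-means, the log transform, the helper name `reduce_and_cluster`, and the synthetic data are all assumptions made for this example; they are not taken from the project notebook, which remains the reference for the actual analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def reduce_and_cluster(spending, n_components=2, n_clusters=2, seed=0):
    """Log-scale the data, project it with PCA, then cluster with k-means.

    Hypothetical helper for illustration only. `spending` is a
    (customers x categories) array of strictly positive amounts.
    """
    # Assumption: spending is right-skewed, so a log transform is applied first.
    log_spending = np.log(spending)

    # Reduce to a few principal components.
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(log_spending)

    # Group customers in the reduced space.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(reduced)
    return reduced, labels, pca.explained_variance_ratio_


# Synthetic stand-in for the wholesale data: two made-up customer groups
# with different spending profiles over three made-up product categories.
rng = np.random.RandomState(0)
group_a = np.exp(rng.normal(loc=[9.0, 7.0, 7.0], scale=0.3, size=(50, 3)))
group_b = np.exp(rng.normal(loc=[7.0, 9.0, 9.0], scale=0.3, size=(50, 3)))
spending = np.vstack([group_a, group_b])

reduced, labels, variance_ratio = reduce_and_cluster(spending)
print(reduced.shape)
print(variance_ratio)
```

Swapping `KMeans` for another clustering estimator inside the helper is one way to compare algorithms on the same reduced representation.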
The following software was used in the first part of the project:
- Python 2.7
- NumPy
- scikit-learn
- pandas
- matplotlib
- IPython Notebook
In the last part of this project, R was used as an exploratory data analysis (EDA) tool:
- R 3.2.3
- ggplot2
- ggbiplot
The dataset refers to clients of a wholesale distributor. It includes the annual spending in monetary units (m.u.) on diverse product categories. It is part of a larger database published with the following paper:
Abreu, N. (2011). Analise do perfil do cliente Recheio e desenvolvimento de um sistema promocional. Mestrado em Marketing, ISCTE-IUL, Lisbon.
The final report and the IPython notebook are included in this repository. The notebook is straightforward to use; a quick tutorial is available at http://cs231n.github.io/ipython-tutorial/.