varun-suresh/Clustering

Approximate Rank Order Clustering

This repository contains an implementation of the paper "Clustering Millions of Faces by Identity" by Otto, Wang, and Jain (arXiv:1604.00989).

What's in this repository

clustering.py - Contains the implementation of the clustering algorithm.

demo.py - An example to demonstrate usage. To run this, you need to download the LFW data from here. For the face vectors, I used the results from Alfred Xiang Wu's Face Verification Experiment. Also evaluates clustering on the LFW dataset using evaluation.py.

evaluation.py - Script to calculate pairwise precision and recall as explained in the paper. TODO

server.py - Script to visualize the results.

Setup

You will need cmake for this installation.

Step 1:

Create a new virtual environment and clone the repository.

mkvirtualenv (env-name)
workon (env-name)
git clone https://github.com/varun-suresh/Clustering.git

Step 2:

Follow the instructions here to install pyflann.

Step 3:

For the demo, download the LFW data and the face vectors as mentioned above and run

cd Clustering
python demo.py --lfw_path path_to_lfw_dir -v vector_file

Results

Visualization

There is a very basic visualization script in place to examine the clusters. To use the script, download the LFW images and store them in the yourpath/Clustering/ directory.

Before you can run the visualization script, you must run the demo script to save the clusters. I have also uploaded the clusters file. You can download that and visualize the clusters as well.

python visualize.py --lfw_path lfw/

In your browser, open this link and you should see the clusters.

Clusters Page

Single Cluster

F1 score:

We get an F1 score between 0.88 and 0.9 on the LFW dataset.
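For readers who want to see how a pairwise score of this kind can be computed, here is a minimal sketch. It is not the code in evaluation.py; the function name `pairwise_scores` and its arguments are hypothetical. It assumes the usual pairwise definitions: precision is the fraction of same-cluster pairs that share a true identity, recall is the fraction of same-identity pairs that end up in the same cluster, and F1 is their harmonic mean.

```python
from collections import Counter


def pairwise_scores(true_labels, cluster_ids):
    """Return (precision, recall, f1) computed over pairs of samples.

    true_labels[i] is the ground-truth identity of sample i and
    cluster_ids[i] is the cluster assigned to it (hypothetical inputs).
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")

    def count_pairs(counter):
        # A group of n samples contains n * (n - 1) / 2 unordered pairs.
        return sum(n * (n - 1) // 2 for n in counter.values())

    # Pairs placed in the same cluster.
    same_cluster = count_pairs(Counter(cluster_ids))
    # Pairs that share a true identity.
    same_label = count_pairs(Counter(true_labels))
    # Pairs that share both identity and cluster (correctly grouped pairs).
    both = count_pairs(Counter(zip(true_labels, cluster_ids)))

    precision = both / same_cluster if same_cluster else 0.0
    recall = both / same_label if same_label else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Counting pairs per group avoids enumerating every pair of samples, so the cost grows with the number of samples rather than with its square.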

Contributions

Thanks to Mengyue for looking closely at the precision drop and correcting the error.

Timing:

Using Python's multiprocessing module, clustering the LFW faces took about 40 seconds. I did this on an 8-core machine using 4 processes (using all 8 does not improve it by much because some cores are needed for background processes). The same experiment took 7 seconds on a 20-core machine.

Citations

You should cite the following paper if you use the algorithm.

@ARTICLE{2016arXiv160400989O,
   author = {{Otto}, C. and {Wang}, D. and {Jain}, A.~K.},
    title = "{Clustering Millions of Faces by Identity}",
  journal = {ArXiv e-prints},
archivePrefix = "arXiv",
   eprint = {1604.00989},
}

Face verification experiment

@article{wulight,
  title={A Light CNN for Deep Face Representation with Noisy Labels},
  author={Wu, Xiang and He, Ran and Sun, Zhenan and Tan, Tieniu},
  journal={arXiv preprint arXiv:1511.02683},
  year={2015}
}

If you use this implementation, please consider citing this implementation and code repository.