Code to reproduce results from the paper:
CROWDLAB: Supervised learning to infer consensus labels and quality scores for data with multiple annotators
NeurIPS 2022 Human in the Loop Learning Workshop
This repository benchmarks algorithms that estimate:
- A consensus label for each example that aggregates the individual annotations.
- A confidence score for the correctness of each consensus label.
- A rating for each annotator which estimates the overall correctness of their labels.
This repository is intended only for scientific purposes. To apply the CROWDLAB algorithm to your own multi-annotator data, use the implementation from the official cleanlab library instead.
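For reference, here is a minimal sketch of using that official API (assuming cleanlab >= 2.1; the toy data, column names, and the exact keys of the returned dict are illustrative and may differ across versions):

```python
import numpy as np
import pandas as pd
from cleanlab.multiannotator import get_label_quality_multiannotator

# Toy multi-annotator labels: one row per example, one column per annotator,
# NaN where an annotator did not label that example (illustrative data only).
labels_multiannotator = pd.DataFrame({
    "annotator_1": [0, 1, np.nan, 2],
    "annotator_2": [0, np.nan, 1, 2],
    "annotator_3": [1, 1, 1, np.nan],
})

# Out-of-sample predicted class probabilities from any trained classifier,
# with shape (num_examples, num_classes).
pred_probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.1, 0.8],
])

results = get_label_quality_multiannotator(labels_multiannotator, pred_probs)
print(results["label_quality"])    # consensus labels + consensus quality scores
print(results["annotator_stats"])  # overall quality estimate for each annotator
```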
Code to benchmark methods for active learning with multiple data annotators can be found in the active_learning_benchmarks folder.
To run the model training and benchmark, you need to install the following dependencies:
pip install ./cleanlab
pip install ./crowd-kit
pip install -r requirements.txt
Note that our cleanlab/ and crowd-kit/ folders here contain forks of the cleanlab and crowd-kit libraries. These forks differ from the main libraries as follows:
- The cleanlab fork contains various multi-annotator algorithms studied in the benchmark (to obtain consensus labels and to compute consensus and annotator quality scores) that are not present in the main library.
- The crowd-kit fork addresses some numeric underflow issues in the original library (needed to properly rank examples by their quality). Instead of operating directly on probabilities, our fork performs its calculations on log-probabilities using the log-sum-exp trick, as illustrated in the sketch after this list.
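The snippet below is a standalone illustration of this trick (not code from the fork): class scores formed as products of many small per-annotator probabilities underflow to zero in ordinary floating point, but remain distinguishable when normalized in log space.

```python
import numpy as np

def normalize_from_logs(log_scores):
    """Convert unnormalized log-scores into a probability distribution using
    the log-sum-exp trick, avoiding the underflow of exp(log_scores)."""
    log_scores = np.asarray(log_scores, dtype=float)
    shift = log_scores.max()  # subtracting the max keeps the exponentials in range
    log_total = shift + np.log(np.sum(np.exp(log_scores - shift)))
    return np.exp(log_scores - log_total)

# Example: each class score is a product of 400 per-annotator probabilities.
# Exponentiating directly underflows to 0, so the ranking information is lost:
log_scores = 400 * np.log(np.array([0.100, 0.101, 0.099]))
print(np.exp(log_scores))               # [0. 0. 0.]
print(normalize_from_logs(log_scores))  # distinct, correctly ordered probabilities
```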
To benchmark the various multi-annotator algorithms using predictions from already-trained classifier models, run the following notebooks:
- benchmark.ipynb - runs the benchmarks and saves the results to CSV
- benchmark_results_[...].ipynb - visualizes the benchmark results as plots
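If you prefer to run these notebooks non-interactively, they can also be executed from the command line with standard Jupyter tooling, for example:
jupyter nbconvert --to notebook --execute --inplace benchmark.ipynb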
To generate the multi-annotator datasets and train the image classifier considered in our benchmarks, run the following notebooks:
- preprocess_data.ipynb - preprocesses the dataset
- create_labels_df.ipynb - generates correct absolute label paths for the images in the preprocessed data
- xval_model_train.ipynb / xval_model_train_perfect_model.ipynb - trains a model and obtains predicted class probabilities for each image
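For intuition about the cross-validation step, the sketch below shows how out-of-sample predicted class probabilities can be obtained in general (it uses a stand-in scikit-learn model and synthetic data, not the image classifier trained in the notebooks):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Stand-in tabular dataset and model; the notebooks train an image classifier instead.
X, y = make_classification(n_samples=200, n_classes=3, n_informative=5, random_state=0)
model = LogisticRegression(max_iter=1000)

# K-fold cross-validation yields out-of-sample predicted class probabilities,
# so no example is scored by a model that was trained on it.
pred_probs = cross_val_predict(model, X, y, cv=5, method="predict_proba")
print(pred_probs.shape)  # (200, 3)
```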