Code to reproduce results from the paper:
CROWDLAB: Supervised learning to infer consensus labels and quality scores for data with multiple annotators
NeurIPS 2022 Human in the Loop Learning Workshop
This repository benchmarks algorithms that estimate:
- A consensus label for each example that aggregates the individual annotations.
- A confidence score for the correctness of each consensus label.
- A rating for each annotator which estimates the overall correctness of their labels.
This repository is intended only for scientific purposes. To apply the CROWDLAB algorithm to your own multi-annotator data, use the implementation from the official cleanlab library instead.
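For reference, here is a minimal sketch of using that official API (assuming cleanlab >= 2.1; the toy data, column names, and the exact keys of the returned dict are illustrative and may differ across versions):

```python
import numpy as np
import pandas as pd
from cleanlab.multiannotator import get_label_quality_multiannotator

# Toy multi-annotator labels: one row per example, one column per annotator,
# NaN where an annotator did not label that example (illustrative data only).
labels_multiannotator = pd.DataFrame({
    "annotator_1": [0, 1, np.nan, 2],
    "annotator_2": [0, np.nan, 1, 2],
    "annotator_3": [1, 1, 1, np.nan],
})

# Out-of-sample predicted class probabilities from any trained classifier,
# with shape (num_examples, num_classes).
pred_probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.1, 0.8],
])

results = get_label_quality_multiannotator(labels_multiannotator, pred_probs)
print(results["label_quality"])    # consensus labels + consensus quality scores
print(results["annotator_stats"])  # overall quality estimate for each annotator
```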
Code to benchmark methods for active learning with multiple data annotators can be found in the active_learning_benchmarks folder.
To run the model training and benchmark, you need to install the following dependencies:
pip install ./cleanlab
pip install ./crowd-kit
pip install -r requirements.txt
Note that our cleanlab/ and crowd-kit/ folders here contain forks of the cleanlab and crowd-kit libraries. These forks differ from the main libraries as follows:
- The cleanlab fork contains various multi-annotator algorithms studied in the benchmark (to obtain consensus labels and to compute consensus and annotator quality scores) that are not present in the main library.
- The crowd-kit fork addresses some numeric underflow issues in the original library (needed to properly rank examples by their quality). Instead of operating directly on probabilities, our fork performs its calculations on log-probabilities using the log-sum-exp trick, as illustrated in the sketch after this list.
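The snippet below is a standalone illustration of this trick (not code from the fork): class scores formed as products of many small per-annotator probabilities underflow to zero in ordinary floating point, but remain distinguishable when normalized in log space.

```python
import numpy as np

def normalize_from_logs(log_scores):
    """Convert unnormalized log-scores into a probability distribution using
    the log-sum-exp trick, avoiding the underflow of exp(log_scores)."""
    log_scores = np.asarray(log_scores, dtype=float)
    shift = log_scores.max()  # subtracting the max keeps the exponentials in range
    log_total = shift + np.log(np.sum(np.exp(log_scores - shift)))
    return np.exp(log_scores - log_total)

# Example: each class score is a product of 400 per-annotator probabilities.
# Exponentiating directly underflows to 0, so the ranking information is lost:
log_scores = 400 * np.log(np.array([0.100, 0.101, 0.099]))
print(np.exp(log_scores))               # [0. 0. 0.]
print(normalize_from_logs(log_scores))  # distinct, correctly ordered probabilities
```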
To benchmark the various multi-annotator algorithms using predictions from already-trained classifier models, run the following notebooks:
- benchmark.ipynb - runs the benchmarks and saves the results to CSV
- benchmark_results_[...].ipynb - visualizes the benchmark results as plots
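If you prefer to run these notebooks non-interactively, they can also be executed from the command line with standard Jupyter tooling, for example:
jupyter nbconvert --to notebook --execute --inplace benchmark.ipynb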
To generate the multi-annotator datasets and train the image classifier considered in our benchmarks, run the following notebooks:
- preprocess_data.ipynb - preprocesses the dataset
- create_labels_df.ipynb - generates correct absolute label paths for the images in the preprocessed data
- xval_model_train.ipynb / xval_model_train_perfect_model.ipynb - trains a model and obtains predicted class probabilities for each image
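For intuition about the cross-validation step, the sketch below shows how out-of-sample predicted class probabilities can be obtained in general (it uses a stand-in scikit-learn model and synthetic data, not the image classifier trained in the notebooks):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Stand-in tabular dataset and model; the notebooks train an image classifier instead.
X, y = make_classification(n_samples=200, n_classes=3, n_informative=5, random_state=0)
model = LogisticRegression(max_iter=1000)

# K-fold cross-validation yields out-of-sample predicted class probabilities,
# so no example is scored by a model that was trained on it.
pred_probs = cross_val_predict(model, X, y, cv=5, method="predict_proba")
print(pred_probs.shape)  # (200, 3)
```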