Commit 7a512d6

Effectively Annotate Text Data for Transformers via Active Learning using Cleanlab (#63)
# What does this PR do?

Demonstrating how to effectively annotate text data for Transformer models using active learning, specifically leveraging the Cleanlab open-source package.

* Introduction to active learning and its importance in efficiently utilizing labeling efforts under budget constraints.
* Implementation of the ActiveLab algorithm, which assists in prioritizing data for annotation based on the potential impact on model performance. This is particularly beneficial when dealing with noisy annotators, as it helps in deciding whether to seek additional annotations for previously labeled data or new data.
* A detailed walkthrough on iteratively improving a text classification model by selecting the most impactful data points for annotation, retraining the model, and evaluating its performance.

## Who can review?

@MKhalusova appreciate your review.

---------

Co-authored-by: Maria Khalusova <[email protected]>
1 parent fdb688f commit 7a512d6

3 files changed: +2299 -1 lines changed

notebooks/en/_toctree.yml

Lines changed: 2 additions & 1 deletion
@@ -32,6 +32,7 @@
   title: Create a legal preference dataset
 - local: semantic_cache_chroma_vector_database
   title: Implementing semantic cache to improve a RAG system.
+- local: annotate_text_data_transformers_via_active_learning
+  title: Annotate text data using Active Learning with Cleanlab
 - local: llm_judge
   title: Using LLM-as-a-judge for an automated and versatile evaluation
-
