Zero Shot Classification

Machine Learning is commonly used to classify data into pre-defined categories.

---
title: Email classification
---
flowchart LR
    Input[[Email]] --> X(((Classifier)))
    X --> A(Car)
    X --> B(Boat)
    X --> C(Housing)
    X --> D(Health)

Loading

Typically, to reach high classification performance, models need to be trained on context specific labeled data. Zero-shot classification is a type of classification that uses a pre-trained model and does not require further training on context specific data.

Tutorial intro

In this tutorial we want to detect insatisfaction in an email dataset. Let's create a basic dataset:

--8<--
docs/docs_src/BasicClassification/tutorial001.py:create_dataset
--8<--

	header	body
0	Dossier 123456	Merci beaucoup pour votre gentillesse et votre écoute !
1	Réclamation (Dossier 987654)	Bonjour, je ne suis pas satisfait de cette situation, répondez-moi rapidement svp!

Classify with Zero-Shot-Classification

The transformers library makes it really simple to use pre-trained models for zero shot classification.

--8<--
docs/docs_src/BasicClassification/tutorial001.py:transformers
--8<--

The classifier returns a score for the "positif" and "négatif" label for each input text:

[
    {
        'sequence': "Quelle belle journée aujourd'hui",
        'labels': ['positif', 'négatif'],
        'scores': [0.95, 0.05]
    },
    {
        'sequence': 'La marée est haute',
        'labels': ['positif', 'négatif'],
        'scores': [0.76, 0.24]
    },
    {'sequence': 'Ce film est une catastrophe, je suis en colère',
     'labels': ['négatif', 'positif'],
     'scores': [0.97, 0.03]
     }
]

Implement a Dissatisfaction detector

A full email processing pipeline could contain multiple models. Melusine uses the MelusineDetector template class to standardise how models are integrated into a pipeline.

--8<--
docs/docs_src/BasicClassification/tutorial001.py:detector_init
--8<--

The pre_detect method assembles the text that we want to use for classification.

--8<--
docs/docs_src/BasicClassification/tutorial001.py:pre_detect
--8<--

The detect method runs the classification model on the text.

--8<--
docs/docs_src/BasicClassification/tutorial001.py:detect
--8<--

The post_detect method applies a threshold on the prediction score to determine the detection result.

--8<--
docs/docs_src/BasicClassification/tutorial001.py:post_detect
--8<--

On top of that, the detector takes care of building debug data to make the result explicable.

Run detection

Putting it all together, we run the detector on the input dataset.

--8<--
docs/docs_src/BasicClassification/tutorial001.py:run
--8<--

As a result, we get a new column dissatisfaction_result with the detection result. We could have detection details by running the detector in debug mode.

	header	body	dissatisfaction_result
0	Dossier 123456	Merci beaucoup pour votre gentillesse et votre écoute !	False
1	Réclamation (Dossier 987654)	Bonjour, je ne suis pas satisfait de cette situation, répondez-moi rapidement svp!	True

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

07_BasicClassification.md

07_BasicClassification.md

Zero Shot Classification

Tutorial intro

Classify with Zero-Shot-Classification

Implement a Dissatisfaction detector

Run detection

Files

07_BasicClassification.md

Latest commit

History

07_BasicClassification.md

File metadata and controls

Zero Shot Classification

Tutorial intro

Classify with Zero-Shot-Classification

Implement a Dissatisfaction detector

Run detection