Getting started with Melusine

Let's run emergency detection with melusine :

Load a fake email dataset
Load a demonstration pipeline
Run the pipeline
- Apply email cleaning transformations
- Apply emergency detection

Input data

Email datasets typically contain information about:

Email sender
Email recipients
Email subject/header
Email body
Attachments data

The present tutorial only makes use of the body and header data.

	body	header
0	This is an ëmèrgénçy	Help
1	How is life ?	Hey !
2	Urgent update about Mr. Annoying	Latest news
3	Please call me now	URGENT

Code

A typical code for a melusine-based application looks like this :

--8<--
docs/docs_src/GettingStarted/tutorial001.py:simple_pipeline
--8<--

This tutorial uses one of the default pipeline configuration demo_pipeline. Melusine users will typically define their own pipeline configuration. See more in the Configurations tutorial{target=_blank}

Output data

The pipeline created extra columns in the dataset. Some columns are temporary variables required by detectors (ex: normalized_body) and some are detection results with direct business value (ex: emergency_result).

	body	header	normalized_body	emergency_result
0	This is an ëmèrgénçy	Help	This is an emergency	True
1	How is life ?	Hey !	How is life ?	False
2	Urgent update about Mr. Annoying	Latest news	Urgent update about Mr. Annoying	False
3	Please call me now	URGENT	Please call me now	True

Pipeline steps

Illustration of the pipeline used in the present tutorial :

---
title: Demonstration pipeline
---
flowchart LR
    Input[[Email]] --> A(Cleaner)
    A(Cleaner) --> C(Normalizer)
    C --> F(Emergency\nDetector)
    F --> Output[[Qualified Email]]

Loading

Cleaner : Cleaning transformations such as uniformization of line breaks (\r\n -> \n)
Normalizer : Text normalisation to delete/replace non utf8 characters (éöà -> eoa)
EmergencyDetector : Detection of urgent emails

!!! info This demonstration pipeline is kept minimal but typical pipelines include more complex preprocessing and a variety of detectors. For example, pipelines may contain:

- Email Segmentation : Split email conversation into unitary messages
- ContentTagging : Associate tags (SIGNATURE, FOOTER, BODY) to parts of messages
- Appointment detection : For exemple, detect "construction work will take place on 01/01/2024" as an appointment email.
- More on preprocessing in the [MelusineTransformers tutorial](02_MelusineTransformers.md){target=_blank}
- More on detectors in the [MelusineDetector tutorial](05a_MelusineDetectors.md){target=_blank}

Debug mode

End users typically want to know what lead melusine to a specific detection result. The debug mode generates additional explainability info.

--8<--
docs/docs_src/GettingStarted/tutorial002.py:debug_pipeline
--8<--

A new column debug_emergency is created.

	...	emergency_result	debug_emergency
0	...	True	[details_below]
1	...	False	[details_below]
2	...	False	[details_below]
3	...	True	[details_below]

Inspecting the debug data gives a lot of info:

text : Effective text considered for detection.
EmergencyRegex : melusine used an EmergencyRegex object to run detection.
match_result : The EmergencyRegex did not match the text
positive_match_data : The EmergencyRegex matched positively the text pattern "Urgent" (Required condition)
negative_match_data : The EmergencyRegex matched negatively the text pattern "Mr. Annoying" (Forbidden condition)
BLACKLIST : Detection groups can be defined to easily link a matching pattern to the corresponding regex. DEFAULT is used if no detection group is specified.

# print(df.iloc[2]["debug_emergency"])
{
  'text': 'Latest news\nUrgent update about Mr. Annoying'},
  'EmergencyRegex': {
    'match_result': False,
    'negative_match_data': {
      'BLACKLIST': [
        {'match_text': 'Mr. Annoying', 'start': 32, 'stop': 44}
      ]},
    'neutral_match_data': {},
    'positive_match_data': {
      'DEFAULT': [
        {'match_text': 'Urgent', 'start': 12, 'stop': 18}
      ]
    }
  }

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

00_GettingStarted.md

00_GettingStarted.md

Getting started with Melusine

Input data

Code

Output data

Pipeline steps

Debug mode

Files

00_GettingStarted.md

Latest commit

History

00_GettingStarted.md

File metadata and controls

Getting started with Melusine

Input data

Code

Output data

Pipeline steps

Debug mode