Commit 2652f20

labeling_functions
1 parent 65669dc commit 2652f20

23 files changed (+1093, -53 lines changed)

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -73,3 +73,6 @@ docs/reference
 docs/changelog.md
 docs/contributing.md
 .vercel
+
+# Work development
+dev/*

changelog_unreleased.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
### Added
- New `eds.external_information_qualifier` pipeline component to qualify spans in a document based on external information (e.g. dated events attached to the document) and a distance threshold to these external elements, as in distant supervision.
- New `eds.contextual_qualifier` pipeline component to qualify spans based on contextual information.
- Add the fixture `edsnlp_blank_nlp` for the tests.

### Fixed
- Correct the contributing documentation: remove the `$ pre-commit run --all-files` recommendation.
- Fix the `Obj Class` in the doc template `class.html`.
- Fix the `get_pipe_meta` function.

contributing.md

Lines changed: 1 addition & 10 deletions
@@ -43,16 +43,7 @@ $ pre-commit install
 
 The pre-commit hooks defined in the [configuration](https://github.com/aphp/edsnlp/blob/master/.pre-commit-config.yaml) will automatically run when you commit your changes, letting you know if something went wrong.
 
-The hooks only run on staged changes. To force-run it on all files, run:
-
-<div class="termy">
-
-```console
-$ pre-commit run --all-files
----> 100%
-color:green All good !
-```
-
+The hooks only run on staged changes.
 </div>
 
 ## Proposing a merge request
Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
# External Information & Context qualifiers

This tutorial shows how to qualify spans or entities with two pipes: the `ContextualQualifier` and the `ExternalInformationQualifier`.

### Import dependencies
```python
import datetime

import pandas as pd

import edsnlp
from edsnlp.pipes.qualifiers.contextual.contextual import (
    ClassPatternsContext,
    ContextualQualifier,
)
from edsnlp.pipes.qualifiers.external_information.external_information import (
    ExternalInformation,
    ExternalInformationQualifier,
)
from edsnlp.utils.collections import get_deep_attr
```

### Data
Let's start by creating a toy example:
```python
# Create context dates
# Each element must be a dict with the keys "value" and "class";
# this list becomes the `doc._.context_dates` attribute
context_dates = [
    {
        "value": datetime.datetime(2024, 2, 15),
        "class": "Magnetic resonance imaging (procedure)",
    },
    {"value": datetime.datetime(2024, 2, 17), "class": "Biopsy (procedure)"},
    {"value": datetime.datetime(2024, 2, 17), "class": "Colonoscopy (procedure)"},
]

# Text
text = """
RCP du 18/12/2024 : DUPONT Jean

Homme de 68 ans adressé en consultation d’oncologie pour prise en charge d’une tumeur du colon.
Antécédents : HTA, diabète de type 2, dyslipidémie, tabagisme actif (30 PA), alcoolisme chronique (60 g/jour).

Examen clinique : patient en bon état général, poids 80 kg, taille 1m75.


HISTOIRE DE LA MALADIE :
Lors du PET-CT (14/02/2024), des dépôts pathologiques ont été observés qui coïncidaient avec les résultats du scanner.
Le 15/02/2024, une IRM a été réalisée pour évaluer l’extension de la tumeur.
Une colonoscopie a été réalisée le 17/02/2024 avec une biopsie d'adénopathie sous-carinale.
Une deuxième a été biopsié le 18/02/2024. Les résultats de la biopsie ont confirmé un adénocarcinome du colon.
Il a été opéré le 20/02/2024. L’examen anatomopath ologique de la pièce opératoire a confirmé un adénocarcinome du colon stade IV avec métastases hépatiques et pulmonaires.
Trois mois après la fin du traitement de chimiothérapie (abril 2024), le patient a signalé une aggravation progressive des symptômes

CONCLUSION : Adénocarcinome du colon stade IV avec métastases hépatiques et pulmonaires.
"""


# Create a toy dataframe
df = pd.DataFrame.from_records(
    [
        {
            "person_id": 1,
            "note_id": 1,
            "note_text": text,
            "context_dates": context_dates,
        }
    ]
)
df
```
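The comment in the block above says that `context_dates` must be a list of dicts with the keys `"value"` and `"class"`. As a minimal sketch, a check like the following can catch malformed entries before the pipeline runs. `validate_context_dates` is a hypothetical helper, not part of edsnlp, and requiring `"value"` to be a `datetime.datetime` is an assumption based on the dates used in this example.

```python
import datetime
from typing import Any, Dict, List


def validate_context_dates(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: check the shape assumed for `context_dates`.

    Each element must be a dict with a "value" (assumed to be a datetime)
    and a "class" (a str). The list is returned unchanged if it is valid.
    """
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            raise TypeError(f"item {i} must be a dict")
        missing = {"value", "class"} - set(item)
        if missing:
            raise ValueError(f"item {i} is missing keys: {sorted(missing)}")
        if not isinstance(item["value"], datetime.datetime):
            raise TypeError(f"item {i}: 'value' must be a datetime.datetime")
        if not isinstance(item["class"], str):
            raise TypeError(f"item {i}: 'class' must be a str")
    return items


# Usage: a well-formed list passes through unchanged
example = [{"value": datetime.datetime(2024, 2, 17), "class": "Biopsy (procedure)"}]
validate_context_dates(example)
```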

### Define the nlp pipeline
```python
import edsnlp.pipes as eds

nlp = edsnlp.blank("eds")

nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.dates())


# lf1, lf2 and lf3: qualify each date found by `eds.dates` when the terms
# or regex of a class appear in the context around it
nlp.add_pipe(
    ContextualQualifier(
        span_getter="dates",
        patterns={
            "lf1": {
                "Magnetic resonance imaging (procedure)": ClassPatternsContext(
                    terms={"irm": ["IRM", "imagerie par résonance magnétique"]},
                    regex=None,
                    context_words=0,
                    context_sents=1,
                    attr="TEXT",
                )
            },
            "lf2": {
                "Biopsy (procedure)": {
                    "regex": {"biopsy": ["biopsie", "biopsié"]},
                    "context_words": (10, 10),
                    "context_sents": 0,
                    "attr": "TEXT",
                }
            },
            "lf3": {
                "Surgical procedure (procedure)": {
                    "regex": {"chirurgie": ["chirurgie", "exerese", "opere"]},
                    "context_words": 0,
                    "context_sents": (2, 2),
                    "attr": "NORM",
                },
            },
        },
    )
)

# lf4: qualify each date with the classes of the `context_dates` lying
# within `threshold` of it (here, the same day)
nlp.add_pipe(
    ExternalInformationQualifier(
        nlp=nlp,
        span_getter="dates",
        external_information={
            "lf4": ExternalInformation(
                doc_attr="_.context_dates",
                span_attribute="_.date.to_datetime()",
                threshold=datetime.timedelta(days=0),
            )
        },
    )
)
```
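To make the `context_words=(10, 10)` setting of `lf2` more concrete, here is a minimal pure-Python sketch of a word-window check. It assumes that the tuple means 10 tokens before and 10 tokens after the date span, and it tokenizes on whitespace for simplicity. It is not the edsnlp implementation, which works on spaCy tokens and also supports sentence windows (`context_sents`) and the `NORM` attribute; `qualify_by_word_window` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional, Tuple


def qualify_by_word_window(
    tokens: List[str],
    span: Tuple[int, int],
    patterns: Dict[str, str],
    window: Tuple[int, int],
) -> Optional[str]:
    """Return the first class whose regex matches a token near `span`.

    `span` is a (start, end) token range with `end` excluded, and `window`
    gives how many tokens to inspect (before, after) the span.
    """
    start, end = span
    before, after = window
    # Tokens on the left and on the right of the span, the span itself excluded
    context = tokens[max(0, start - before) : start] + tokens[end : end + after]
    for label, pattern in patterns.items():
        if any(re.search(pattern, token) for token in context):
            return label
    return None


tokens = "Une colonoscopie a été réalisée le 17/02/2024 avec une biopsie d'adénopathie".split()
biopsy = {"Biopsy (procedure)": r"biopsi[eé]"}

# The date "17/02/2024" is token 6, so its span is (6, 7)
print(qualify_by_word_window(tokens, (6, 7), biopsy, (10, 10)))  # Biopsy (procedure)
print(qualify_by_word_window(tokens, (6, 7), biopsy, (2, 2)))  # None
```

With a window of two tokens on each side, "biopsie" (three tokens after the date) falls outside the context, so no class is assigned.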

### Apply the pipeline to texts
```python
doc_iterator = edsnlp.data.from_pandas(
    df, converter="omop", doc_attributes=["context_dates"]
)

docs = list(nlp.pipe(doc_iterator))
```

### Let's inspect the results
```python
doc = docs[0]
dates = doc.spans["dates"]

for date in dates:
    for attr in ["lf1", "lf2", "lf3", "lf4"]:
        value = get_deep_attr(date, "_." + attr)

        if value:
            print(date.start, date.end, date, attr, value)
```

```python
# Out : 120 125 15/02/2024 lf1 Magnetic resonance imaging (procedure)
# Out : 120 125 15/02/2024 lf4 ['Magnetic resonance imaging (procedure)']
# Out : 147 152 17/02/2024 lf2 Biopsy (procedure)
# Out : 147 152 17/02/2024 lf4 ['Biopsy (procedure)', 'Colonoscopy (procedure)']
# Out : 168 173 18/02/2024 lf2 Biopsy (procedure)
# Out : 192 197 20/02/2024 lf3 Surgical procedure (procedure)
```
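The `lf4` lines above can be reproduced in plain Python, assuming that `ExternalInformation` keeps the class of every context date whose distance to the span's date is at most `threshold` (here `timedelta(days=0)`, i.e. the same day). This is a sketch of that distant-supervision rule, not the edsnlp implementation; `match_external_classes` is a hypothetical helper.

```python
import datetime
from typing import Dict, List


def match_external_classes(
    span_date: datetime.datetime,
    context_dates: List[Dict],
    threshold: datetime.timedelta,
) -> List[str]:
    """Return the classes of every context date within `threshold` of `span_date`."""
    return [
        item["class"]
        for item in context_dates
        if abs(item["value"] - span_date) <= threshold
    ]


context_dates = [
    {
        "value": datetime.datetime(2024, 2, 15),
        "class": "Magnetic resonance imaging (procedure)",
    },
    {"value": datetime.datetime(2024, 2, 17), "class": "Biopsy (procedure)"},
    {"value": datetime.datetime(2024, 2, 17), "class": "Colonoscopy (procedure)"},
]
same_day = datetime.timedelta(days=0)

# Days of February 2024 mentioned in the note
for day in (14, 15, 17, 18, 20):
    span_date = datetime.datetime(2024, 2, day)
    classes = match_external_classes(span_date, context_dates, same_day)
    if classes:
        print(f"{day:02d}/02/2024", classes)

# prints:
# 15/02/2024 ['Magnetic resonance imaging (procedure)']
# 17/02/2024 ['Biopsy (procedure)', 'Colonoscopy (procedure)']
```

Widening `threshold` (for example to one day) would also attach the biopsy and colonoscopy classes to the 18/02/2024 date, which is the trade-off this parameter controls.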
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
# Contextual {: #edsnlp.pipes.qualifiers.contextual.factory.create_component }

::: edsnlp.pipes.qualifiers.contextual.factory.create_component
    options:
        heading_level: 2
        show_bases: true
        show_source: true
        only_class_level: true
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
# External Information {: #edsnlp.pipes.qualifiers.external_information.factory.create_component }

::: edsnlp.pipes.qualifiers.external_information.factory.create_component
    options:
        heading_level: 2
        show_bases: true
        show_source: true
        only_class_level: true

edsnlp/core/pipeline.py

Lines changed: 1 addition & 1 deletion
@@ -338,7 +338,7 @@ def get_pipe_meta(self, name: str) -> FactoryMeta:
         Dict[str, Any]
         """
         pipe = self.get_pipe(name)
-        return PIPE_META.get(pipe, {})
+        return PIPE_META.get(pipe, FactoryMeta([], [], False, {}))
 
     def make_doc(self, text: str) -> Doc:
         """

edsnlp/pipes/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -74,6 +74,8 @@
 from .qualifiers.negation.factory import create_component as negation
 from .qualifiers.reported_speech.factory import create_component as reported_speech
 from .qualifiers.reported_speech.factory import create_component as rspeech
+from .qualifiers.contextual.factory import create_component as contextual_qualifier
+from .qualifiers.external_information.factory import create_component as external_information_qualifier
 from .trainable.ner_crf.factory import create_component as ner_crf
 from .trainable.biaffine_dep_parser.factory import create_component as biaffine_dep_parser
 from .trainable.extractive_qa.factory import create_component as extractive_qa

edsnlp/pipes/qualifiers/contextual/__init__.py

Whitespace-only changes.
