Adicap : enhancement of regex to match local spelling #158

GuillaumePressiat · 2022-11-03T15:49:43Z

Description

In my hospital (CHU de Brest), ADICAP codes are written like this:


ADICAP :B.H.HP.A7A0

Cotations :
ZZQX217      R-AHC-100-A001 R-AHC-10-A015

In this case the dots spell out the ADICAP structure, i.e. the dictionaries used for each part (d1-d8) of the code.

The regex in your ADICAP NER component does not allow for dots, here

Would it be OK if I proposed this modified regex?

It just adds three optional dots \.{0,1} to d1_4:

d1_4 = r"[A-Z]\.{0,1}[A-Z]\.{0,1}[A-Z]{2}\.{0,1}"
d5_8_v1 = r"\d{4}"
d5_8_v2 = r"\d{4}|[A-Z][0-9A-Z][A-Z][0-9]"
d5_8_v3 = r"[0-9A-Z][0-9][09A-Z][0-9]"
d5_8_v4 = r"0[A-Z][0-9]{2}"


adicap_prefix = r"(?i)(codification|adicap)"
base_code = (
    r"("
    + d1_4
    + r"(?:"
    + d5_8_v1
    + r"|"
    + d5_8_v2
    + r"|"
    + d5_8_v3
    + r"|"
    + d5_8_v4
    + r"))"
)
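As a quick sanity check of the proposal, the patterns above can be run with Python's standard `re` module outside of edsnlp. This is only a sketch: `find_codes` is a hypothetical helper introduced for illustration, and the test runs the regex on raw strings, so it says nothing about how the full pipeline tokenizes or segments the text.

```python
import re

# Patterns copied from the proposal above
d1_4 = r"[A-Z]\.{0,1}[A-Z]\.{0,1}[A-Z]{2}\.{0,1}"
d5_8_v1 = r"\d{4}"
d5_8_v2 = r"\d{4}|[A-Z][0-9A-Z][A-Z][0-9]"
d5_8_v3 = r"[0-9A-Z][0-9][09A-Z][0-9]"
d5_8_v4 = r"0[A-Z][0-9]{2}"

base_code = (
    r"("
    + d1_4
    + r"(?:"
    + "|".join([d5_8_v1, d5_8_v2, d5_8_v3, d5_8_v4])
    + r"))"
)


def find_codes(text):
    """Hypothetical helper: return every substring matched by base_code."""
    return [m.group(1) for m in re.finditer(base_code, text)]


# Both the dotted (local) spelling and the compact spelling are matched
print(find_codes("ADICAP :B.H.HP.A7A0"))
print(find_codes("ADICAP :BHHPA7A0"))
```

Because every dot is optional, the compact spelling that the original pattern already handled still matches.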

test :

Many thanks


percevalw · 2022-11-04T08:38:01Z

Hi @GuillaumePressiat, thanks for this feedback! Of course, feel free to make a PR to improve this pattern!

GuillaumePressiat · 2022-11-07T09:47:31Z

Hi @percevalw, thanks for your answer.
I will do it a bit later.

Nice module, by the way. (I was at "DIM siege AP-HP" when nlp-segmenter and the uima pipelines were developed (@parisni and co.), and I tried to contribute a few improvements to the section definitions (editing the sections.csv file).)

percevalw · 2022-11-07T10:58:49Z

I see! The eds.section extraction module was partly inspired by earlier work at APHP's EDS, maybe you can find some of your contributions there :)

etienneguevel · 2022-11-24T16:30:11Z

Hi @GuillaumePressiat! Thanks for the feedback :)
I think that the regex you mentioned should do the trick, but the newly detected ADICAP codes will not be decoded correctly.
Modifying the decode function in the ADICAP class like this:

def decode(self, code):
    # Strip the optional dots so "B.H.HP.A7A0" is decoded like "BHHPA7A0"
    code = code.replace(".", "")
    exploded = list(code)
    adicap = AdicapCode(
        code=code,
        sampling_mode=self.decode_dict["D1"]["codes"].get(exploded[0]),
        technic=self.decode_dict["D2"]["codes"].get(exploded[1]),
        organ=self.decode_dict["D3"]["codes"].get("".join(exploded[2:4])),
    )

    for d in ["D4", "D5", "D6", "D7"]:
        adicap_short = self.decode_dict[d]["codes"].get("".join(exploded[4:8]))
        adicap_long = self.decode_dict[d]["codes"].get("".join(exploded[2:8]))

        if adicap_short is not None or adicap_long is not None:
            adicap.pathology = self.decode_dict[d]["label"]
            adicap.behaviour_type = self.decode_dict[d]["codes"].get(exploded[5])

            if adicap_short is not None:
                adicap.pathology_type = adicap_short
            else:
                adicap.pathology_type = adicap_long

    return adicap

should solve this issue!
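The dot-stripping step in the decode function above can be tried in isolation, without the edsnlp dictionaries. The sketch below only mirrors the slicing used in `decode`; `split_adicap` and its key names are hypothetical and introduced purely for illustration, not part of edsnlp.

```python
def split_adicap(code):
    """Hypothetical helper: normalise a dotted ADICAP code and slice it
    at the same positions as the decode function above."""
    compact = code.replace(".", "")  # "B.H.HP.A7A0" -> "BHHPA7A0"
    return {
        "code": compact,
        "sampling_mode": compact[0],  # looked up in D1 by decode
        "technic": compact[1],        # looked up in D2 by decode
        "organ": compact[2:4],        # looked up in D3 by decode
        "remainder": compact[4:8],    # looked up in D4 to D7 by decode
    }


print(split_adicap("B.H.HP.A7A0"))
```

Without the `replace` call, the positional slices would pick up the dots themselves and the dictionary lookups would miss.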

GuillaumePressiat · 2023-01-08T17:04:10Z

Hello @etienneguevel,

thanks for the tip!

I've modified the two scripts here (patterns.py and adicap.py),
and then installed edsnlp as described in the docs.
When I try to detect Codification : B.H.HP.A7A0, for instance, it doesn't work yet.

For now I can't see where the problem is.
Can somebody take a look and help me?

Many thanks

etienneguevel · 2023-01-11T14:59:55Z

Hello @GuillaumePressiat,
You're welcome!

I've looked for the reasons your modifications didn't lead to the expected results, and found that there is an issue between the model used for the ADICAP pipeline (eds.contextual-matcher) and the way the edsnlp sentencizer cuts the ADICAP codes like "B.H.HP.A7A0".

The pipeline used looks like this:

import spacy

# base_code is the ADICAP regex built from d1_4 and d5_8_v* in the description above
info = dict(
    source="adicap",
    regex=r"(?i)(codification|adicap)",
    regex_attr="TEXT",
    assign=[
        dict(
            name="code",
            regex=base_code,
            window=(-100,100),
            replace_entity=True,
            reduce_mode=None,
        ),
    ]
)

nlp = spacy.blank("eds")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.sentences")

nlp.add_pipe("eds.contextual-matcher",
    name="adicap",
    config=dict(
        patterns = [info]
    ),
)
print(nlp("Codification : B.H.HP.A7A0").ents)
# Output: ()

print(nlp("ADICAP: B.H.HP.A7A0".replace(".", "")).ents)
# Output: (BHHPA7A0,)

I've opened an issue describing how the sentencizer deals with codes like your ADICAP example: #178

The eds.sentences pipeline is currently being reworked (#177), and that work should include a change that stops ADICAP codes from being split across several sentences.

GuillaumePressiat · 2023-01-11T22:12:39Z

Hello @etienneguevel,

Thank you for the feedback! It's quite logical indeed.

For now I've just removed all dots in my anapath documents and the basic eds.adicap pipeline works just fine.
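A narrower variant of this workaround is to remove dots only inside strings that look like dotted ADICAP codes, leaving the rest of the document untouched. The sketch below assumes the B.H.HP.A7A0 layout shown in the description; `dotted_code` and `strip_code_dots` are hypothetical names, and the pattern is deliberately simpler than the edsnlp one.

```python
import re

# Assumed layout: X.X.XX.XXXX with all three dots present
dotted_code = re.compile(r"\b[A-Z]\.[A-Z]\.[A-Z]{2}\.[0-9A-Z]{4}\b")


def strip_code_dots(text):
    """Hypothetical helper: drop the dots inside dotted ADICAP-like codes only."""
    return dotted_code.sub(lambda m: m.group(0).replace(".", ""), text)


print(strip_code_dots("ADICAP :B.H.HP.A7A0. Fin."))
```

Sentence-ending periods elsewhere in the report are preserved, so sentence segmentation of the surrounding text is not affected by the preprocessing.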

Thanks for the other issue related to this (eds.sentences cutting codes in different sentences). I will follow this!

Guillaume

percevalw · 2023-03-07T16:58:51Z

Hi @GuillaumePressiat, the ADICAP matcher should now work (in the master branch) without having to modify the text upstream. Please let us know if you still have issues with this component! :)

GuillaumePressiat · 2023-03-07T19:41:16Z

Hi @percevalw,

It's ok now!

Thank you very much!

Guillaume

percevalw added the enhancement New feature or request label Nov 4, 2022

percevalw added a commit that referenced this issue Mar 1, 2023

feat: update adicap patterns to match dot formats as in #158

63b835b

percevalw added a commit that referenced this issue Mar 1, 2023

feat: update adicap patterns to match dot formats as in #158

a4bea46

percevalw mentioned this issue Mar 1, 2023

Add tokenization exceptions and detect some false positive EOS #192

Merged

3 tasks

percevalw added a commit that referenced this issue Mar 7, 2023

feat: update adicap patterns to match dot formats as in #158

26b07e0

percevalw closed this as completed Mar 7, 2023
