Where does preprocessing use match.py #9

Open
mingyangligithub opened this issue Apr 19, 2023 · 4 comments

Comments

@mingyangligithub

I studied the code in the preprocess folder and understand how the code descriptions and synonyms are combined, but I can't find where the result is used in the actual training process: the icd_dict{} generated in generate_data_new.ipynb isn't the one in match.py. Where is the output of match.py used, and which file imports match.py? Do I need to run the preprocessing code myself and then generate new data?

Thanks,
Best regards.

@GanjinZero
Owner

I think match.py is only used to generate the synonyms file embedding/icd_mimic3_random_sort.json, which I have already provided.

@GanjinZero
Owner

You do not need to rerun it unless you want to train on another dataset that needs different ICD codes.
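
One quick way to decide whether a rerun is needed is to check whether every ICD code in the new dataset already has an entry in the provided file. The sketch below assumes that embedding/icd_mimic3_random_sort.json maps each ICD code to a list of synonym strings, which is how the file is quoted elsewhere in this thread. `find_missing_codes` is a hypothetical helper, not part of the repository, and the data is a toy stand-in for the real file.

```python
import json


def find_missing_codes(synonym_map, dataset_codes):
    """Return the dataset codes that have no entry in the synonym map, sorted."""
    return sorted(set(dataset_codes) - set(synonym_map))


# Assumed file layout: {"<ICD code>": ["synonym 1", "synonym 2", ...], ...}
# In practice the map would be loaded from the provided file, e.g.:
#   with open("embedding/icd_mimic3_random_sort.json") as f:
#       synonym_map = json.load(f)
synonym_map = json.loads(
    '{"E870.9": ["acc cut in med care"], "401.9": ["placeholder synonym"]}'
)

# Hypothetical code list taken from another dataset.
dataset_codes = ["401.9", "E870.9", "250.00"]

missing = find_missing_codes(synonym_map, dataset_codes)
print(missing)
```

If the returned list is empty, the provided file already covers the dataset; otherwise the listed codes would need synonyms generated for them.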

@lynnolson

There appears to be another source for synonyms besides UMLS's MRCONSO.RRF file (version 2024AA). For example, running preprocess/match.py generates 5 synonyms for E870.9 (Accidental cut, puncture, perforation or hemorrhage during unspecified medical care):

['accidental cut, puncture, perforation or hemorrhage during medical care',
'accidental cut, puncture, perforation, or hemorrhage during medical care',
'accidental cut, puncture, perforation or hemorrhage during unspecified medical care',
'accidental cut, puncture, perforation or hemorrhage during medical care (navigational concept)',
'accidental cut, puncture, perforation or haemorrhage during medical care']

But the embedding/icd_mimic3_random_sort.json file has 26!

['accidental cut, puncture, perforation or haemorrhage during medical care, nos (disorder)',
'accidental cut, puncture, perforation or hemorrhage during medical care (navigational concept)',
'acc cut in med care',
'accidental cut, puncture, perforation, or hemorrhage during medical care',
'surg.accid.-medical care nos',
'accidental cut, puncture, perforation or hemorrhage during medical care, (finding)',
'accidental cut, puncture, perforation or hemorrhage during medical care (finding)',
'accidental cut, puncture, perforation or haemorrhage during medical care',
'accidental cut, puncture, perforation or hemorrhage during medical care, nos (finding)',
'accidental cut, puncture, perforation or hemorrhage during medical care,',
"accidental cut, puncture, perforation ,h'ge medical care",
'surg.accid. medical care',
'accidental cut, puncture, perforation or hemorrhage during medical care, nos (navigational concept)',
'accidental cut, puncture, perforation or hemorrhage during medical care, (navigational concept)',
'accidental cut, puncture, perforation or haemorrhage during medical care,',
'accidental cut, puncture, perforation or hemorrhage during medical care',
'accidental cut, puncture, perforation or hemorrhage during medical care, nos',
'acc cut in med care nos',
"accid.cut/punct/perf/h'ge-med.",
"accid cut,puncture,perf,h'ge medical care",
"accid cut,puncture,perf,h'ge - medical care nos",
"accidental cut, puncture, perforation ,h'ge - medical care",
'accidental cut, puncture, perforation or haemorrhage during medical care, nos',
"accid.cut/punct/perf/h'ge med.",
'accidental cut, puncture, perforation or hemorrhage during unspecified medical care',
'accidental cut, puncture, perforation or haemorrhage during medical care, (disorder)']

One clearly corresponds to the short title, but where do the others come from? For example, "accid.cut/punct/perf/h'ge med."?
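
To see exactly which strings are unaccounted for, the two lists can be compared as sets. This is a minimal sketch with a hypothetical helper, `compare_synonyms`; the inputs are abbreviated subsets of the two lists quoted above, not the full data.

```python
def compare_synonyms(generated, provided):
    """Split two synonym lists into shared strings and strings unique to each."""
    generated_set, provided_set = set(generated), set(provided)
    return {
        "only_in_generated": sorted(generated_set - provided_set),
        "only_in_provided": sorted(provided_set - generated_set),
        "shared": sorted(generated_set & provided_set),
    }


# Subset of the output of preprocess/match.py for E870.9 (quoted above).
generated = [
    "accidental cut, puncture, perforation or hemorrhage during medical care",
    "accidental cut, puncture, perforation or haemorrhage during medical care",
]

# Subset of the E870.9 entry in embedding/icd_mimic3_random_sort.json (quoted above).
provided = [
    "accidental cut, puncture, perforation or hemorrhage during medical care",
    "accidental cut, puncture, perforation or haemorrhage during medical care",
    "acc cut in med care",
    "accid.cut/punct/perf/h'ge med.",
]

diff = compare_synonyms(generated, provided)
print(diff["only_in_provided"])
```

The strings under `only_in_provided` are the ones whose origin is in question.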

@GanjinZero
Owner

We use the UMLS 2020AA release.
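
To compare releases, the same lookup could be run against each release's MRCONSO.RRF. The sketch below is not the logic of preprocess/match.py. It assumes the standard pipe-delimited MRCONSO layout (CUI in field 0, LAT in field 1, SAB in field 11, CODE in field 13, STR in field 14) and a hypothetical two-step strategy: find the CUIs attached to an ICD9CM code, then collect every English string for those CUIs. Lower-casing is also an assumption, chosen to match the lists quoted in this thread. The rows, CUIs, and the `OTHER_SOURCE` vocabulary name are made up for illustration.

```python
def collect_synonyms(mrconso_lines, icd_code, source="ICD9CM"):
    """Collect lower-cased English strings sharing a CUI with the given code."""
    rows = [line.rstrip("\n").split("|") for line in mrconso_lines]
    rows = [r for r in rows if len(r) >= 15]  # skip malformed lines
    # Step 1: CUIs that the source vocabulary links to this code.
    cuis = {r[0] for r in rows if r[11] == source and r[13] == icd_code}
    # Step 2: every English string attached to those CUIs, from any vocabulary.
    return sorted({r[14].lower() for r in rows if r[0] in cuis and r[1] == "ENG"})


def make_row(cui, lat, sab, code, text):
    """Build a toy MRCONSO-style line with only the fields used here filled in."""
    fields = [""] * 18
    fields[0], fields[1], fields[11], fields[13], fields[14] = cui, lat, sab, code, text
    return "|".join(fields) + "|"


# Toy rows with invented CUIs; real input would be the lines of MRCONSO.RRF.
toy_lines = [
    make_row("C0000001", "ENG", "ICD9CM", "E870.9",
             "Accidental cut, puncture, perforation or hemorrhage during unspecified medical care"),
    make_row("C0000001", "ENG", "OTHER_SOURCE", "X1", "Acc cut in med care"),
    make_row("C0000001", "SPA", "OTHER_SOURCE", "X1", "texto de ejemplo"),
    make_row("C0000002", "ENG", "OTHER_SOURCE", "X2", "unrelated concept"),
]

synonyms = collect_synonyms(toy_lines, "E870.9")
print(synonyms)
```

Running this over two releases and comparing the results would show which strings come from which release.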
