Create function to assess valueset accuracy #175
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #175   +/-   ##
=======================================
  Coverage   93.58%   93.58%
=======================================
  Files          17       17
  Lines         561      561
=======================================
  Hits          525      525
  Misses         36       36
[
  {
    "example_idx": 1,
    "k-run": 1,
These sample files are super helpful, thank you for adding them!
Is k-run always expected to be 1? Are you selecting 1 because we want to return the "top" result (which is rank 1 in the list of returned results until we have the re-ranker)?
k is 1 here mostly because I was lazy in generating dummy data 😅
data/accuracy_evaluation/sample_data/evaluation_results_eval_results_snippet_with_loinc_codes.json for example has multiple k-runs (1, 3, 5, 10).
Ah! haha makes sense 😆
Description
Creates a function that takes a small payload containing the returned and expected LOINC codes and evaluates the degree of accuracy of our algorithm. The definitions of accuracy we are currently using are outlined here: https://docs.google.com/document/d/1yA5NJ06mf1EfLZRmNrrNKopWL6ExMj-dPYKy8wlVDGs/edit?tab=t.0#heading=h.b1r0q3mit8hy
This also updates other parts of the code to add the expected LOINC to the example data. HOWEVER, open questions:
I have a dummy notebook with small edits to performance.ipynb (here: https://ml.azure.com/fileexplorerAzNB?wsid=/subscriptions/6848426c-8ca8-4832-b493-fed851be1f95/resourcegroups/dibbs-ttc-training/providers/Microsoft.MachineLearningServices/workspaces/dibbsttc&tid=28cf58df-efe8-4135-b2d1-f697ee74c00c&activeFilePath=Users/robert.a.mitchell/performance-copy.ipynb&notebookPivot=0) that does the following:
{"example_idx": 0, "query_input": "Hester Davis fall risk scale", "expected_label": "Hester Davis fall risk scale", "k": 10, "encoding_time_s": 0.32662534713745117, "search_time_s": 0.0002624988555908203, "expected_match": {"rank": null, "score": null, "is_correct_in_topk": false, "is_correct_top1": false}, "results": [{"rank": 1, "corpus_id": 934, "label": "Coronavirus anxiety scale", "loinc_type": "Order", "score": 0.8334473371505737}, {"rank": 2, "corpus_id": 70, "label": "Abbreviated Injury Scale panel AAAM", "loinc_type": "Order", "score": 0.8256902694702148}, {"rank": 3, "corpus_id": 886, "label": "Goal attainment scale - Reported", "loinc_type": "Order", "score": 0.8089544177055359}, {"rank": 4, "corpus_id": 22, "label": "17-Hydroxyprogesterone [Measurement] in DBS", "loinc_type": "Order", "score": 0.8040225505828857}, {"rank": 5, "corpus_id": 712, "label": "Bacterial susceptibility panel by Disk diffusion (KB)", "loinc_type": "Order", "score": 0.8019613027572632}, {"rank": 6, "corpus_id": 117, "label": "Active range of motion panel Quantitative", "loinc_type": "Order", "score": 0.7989861369132996}, {"rank": 7, "corpus_id": 166, "label": "ADL functional rehabilitation potential Set", "loinc_type": "Order", "score": 0.7988804578781128}, {"rank": 8, "corpus_id": 676, "label": "Cholinesterase activity panel - Serum or Plasma", "loinc_type": "Order", "score": 0.7974450588226318}, {"rank": 9, "corpus_id": 906, "label": "Centers for Environmental Health trace metals screen panel [Mass/volume] - Urine", "loinc_type": "Order", "score": 0.7957779169082642}, {"rank": 10, "corpus_id": 499, "label": "Anemia evaluation panel - Serum or Blood", "loinc_type": "Order", "score": 0.7948352098464966}]}
{"example_idx": 1, "query_input": "F9 gene familial mut Doc analysis molecular genetics (Bld/Tiss)", "expected_label": "F9 gene familial mut analysis Molgen Doc (Bld/Tiss)", "k": 1, "encoding_time_s": 0.5327327251434326, "search_time_s": 0.0006182193756103516, "expected_match": {"rank": null, "score": null, "is_correct_in_topk": false, "is_correct_top1": false}, "results": [{"rank": 1, "corpus_id": 885, "label": "Glycosylation congenital disorders multigene analysis in Blood or Tissue by Molecular genetics method", "loinc_type": "Order", "score": 0.8319992423057556}]}
I've only run it on a small fraction of the files (1 of 283). I think ideally the examples file we load would include the LOINC ID and LOINC type, so that we could do a more comprehensive check of whether [...] Order but there's an 85% that's an Observation but an 83% that's an Order, we would in theory want to use the 83%?)
For the sake of an end-to-end product, I've taken a small snippet of the data and run the text fields against the LOINC API to get a LOINC ID. I've also rewritten the scripts that generate key-pairs for examples to include the LOINC codes, so that future runs can skip the call to the LOINC API.
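To make the Order-versus-Observation question concrete, here is a minimal sketch of picking the best-scoring result of a given LOINC type and finding the rank of the expected label. The helper names `best_result_of_type` and `expected_rank` are hypothetical and not part of this PR; the field names (`rank`, `label`, `loinc_type`, `score`) follow the notebook output above, and the two sample results are made up to mirror the 85%/83% case.

```python
from typing import Optional


def best_result_of_type(results: list[dict], loinc_type: str) -> Optional[dict]:
    """Return the highest-scoring result whose loinc_type matches, or None."""
    matching = [r for r in results if r.get("loinc_type") == loinc_type]
    return max(matching, key=lambda r: r["score"], default=None)


def expected_rank(results: list[dict], expected_label: str) -> Optional[int]:
    """Return the rank of the first result whose label equals expected_label."""
    for r in sorted(results, key=lambda r: r["rank"]):
        if r["label"] == expected_label:
            return r["rank"]
    return None


# Made-up results: the top hit is an Observation at 0.85,
# while the best Order scores 0.83.
results = [
    {"rank": 1, "label": "A", "loinc_type": "Observation", "score": 0.85},
    {"rank": 2, "label": "B", "loinc_type": "Order", "score": 0.83},
]
best_order = best_result_of_type(results, "Order")
```

Under this sketch, a query known to be an Order would use `best_order` rather than the rank-1 Observation. Whether that is the desired behaviour is still the open question.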
Related Issues
Closes #173
Additional Notes
The logic of the third-degree match is still a tad shaky. The sample data shows two different kinds: one where the LOINCs and OIDs differ but connect to the same single condition, and another where the LOINCs and OIDs differ but connect to several conditions that are all the same.
As for the code itself, I think the only other function we may want to add down the line is one that reshapes the data we need from the matching protocol into the right format. That should be relatively straightforward, since this script really only needs two columns' worth of data.
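One way to pin down the two kinds of third-degree match is to compare the sets of conditions each LOINC connects to. The sketch below is an assumption, not the logic in this PR: `third_degree_match` is a hypothetical helper, the condition names are invented, and the `strict` flag only illustrates the choice between requiring identical condition sets and accepting any overlap.

```python
def third_degree_match(returned_conditions, expected_conditions, strict=True):
    """Compare the conditions linked to the returned and expected LOINCs.

    strict=True  -> the two condition sets must be identical.
    strict=False -> sharing at least one condition is enough.
    """
    returned = set(returned_conditions)
    expected = set(expected_conditions)
    if not returned or not expected:
        # Nothing to compare, so treat it as no match.
        return False
    if strict:
        return returned == expected
    return bool(returned & expected)


# Invented condition names covering the two kinds described above.
single = third_degree_match(["condition_a"], ["condition_a"])
several = third_degree_match(["condition_a", "condition_b"],
                             ["condition_b", "condition_a"])
partial_strict = third_degree_match(["condition_a", "condition_b"],
                                    ["condition_a"])
partial_loose = third_degree_match(["condition_a", "condition_b"],
                                   ["condition_a"], strict=False)
```

Both kinds in the sample data pass the strict check. The partial-overlap case is where the two interpretations diverge, so it may be worth adding to the sample files.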
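A minimal sketch of such a reshaping step, assuming each record carries the expected code and each result carries its own code. The function name `to_accuracy_rows` and the keys `expected_loinc_code` and `loinc_code` are hypothetical placeholders for illustration; the real field names may differ.

```python
def to_accuracy_rows(records):
    """Reduce matching output to the two columns the accuracy check needs."""
    rows = []
    for record in records:
        results = record.get("results") or []
        # Take the rank-1 result as the returned value, if there is one.
        top = min(results, key=lambda r: r["rank"], default=None)
        rows.append({
            "expected_loinc": record.get("expected_loinc_code"),
            "returned_loinc": top.get("loinc_code") if top else None,
        })
    return rows


# Placeholder codes, not real LOINC identifiers.
records = [
    {"expected_loinc_code": "code-1",
     "results": [{"rank": 2, "loinc_code": "code-9"},
                 {"rank": 1, "loinc_code": "code-1"}]},
    {"expected_loinc_code": "code-2", "results": []},
]
rows = to_accuracy_rows(records)
```

Records with no results keep their row with `returned_loinc` set to `None`, so they still count against accuracy instead of being silently dropped.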