MedQA-BBY

MedQA-BBY (MedQA-but-better-yield) extends the original MedQA dataset to address limitations in how medical question-answering systems are evaluated. Our motivation stems from the inherent complexity of medicine, where:

Multiple correct answers often exist for a single question
Incorrect answers vary in their potential impact on patient care (some can harm the patient)
Open-ended questions are hard to evaluate automatically (even with embedding models)

Key Improvements:

Refined Answer Structure:
• Two correct options (reflecting real-world medical scenarios with multiple valid approaches)
• One negative-point option (representing potentially harmful choices)
• Three zero-point options (incorrect but not directly harmful)

Comprehensive Labeling System:
• Question categorization (anatomical system, medical discipline, subspecialty)
• Linguistic metrics (token length)
• Educational assessment (Bloom's Taxonomy classification)

This enhanced dataset aims to evaluate Large Language Models (LLMs) in a manner that more accurately reflects real-world medical practice, moving beyond the limitations of the previous single-best-answer approach. By doing so, MedQA-BBY provides a more nuanced and practical tool for assessing and developing AI systems in healthcare.
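
As an illustration of the answer structure described above (two correct options, one negative-point option, three zero-point options), here is a minimal scoring sketch. The point values (+1, -1, 0), the `Question` class, and its field names are assumptions made for this example; the README does not state the point magnitudes or the schema of the dataset files.

```python
from dataclasses import dataclass

# Assumed point values: the README names "negative-point" and "zero-point"
# options but does not give their magnitudes.
CORRECT_POINTS = 1.0
HARMFUL_POINTS = -1.0
NEUTRAL_POINTS = 0.0


@dataclass(frozen=True)
class Question:
    """Hypothetical container; field names are not taken from the dataset files."""
    correct: frozenset  # labels of the two correct options
    harmful: str        # label of the single negative-point option


def score_answer(question: Question, chosen: str) -> float:
    """Score one chosen option label against a question."""
    if chosen in question.correct:
        return CORRECT_POINTS
    if chosen == question.harmful:
        return HARMFUL_POINTS
    # Any other option is treated as incorrect but not directly harmful.
    return NEUTRAL_POINTS


def mean_score(questions, answers) -> float:
    """Average score over paired questions and chosen labels."""
    if len(questions) != len(answers):
        raise ValueError("questions and answers must have the same length")
    if not questions:
        return 0.0
    total = sum(score_answer(q, a) for q, a in zip(questions, answers))
    return total / len(questions)
```

For example, with `q = Question(correct=frozenset({"A", "D"}), harmful="C")`, choosing `"A"` or `"D"` earns the correct-answer points, choosing `"C"` is penalized, and any other label scores zero. Under this assumed scheme, a model that often picks the harmful option can end up with a negative mean score, which a single-best-answer accuracy metric would not reveal.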

Terms of use

Please cite the original MedQA paper (Jin, Di, et al. "What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams." arXiv preprint arXiv:2009.13081 (2020)) and our forthcoming paper (to be published on arXiv). Our team, composed of medical graduates from Iran, has built upon this work. Lastly, I have a brief message I'd like to share:

History teaches us that isolation breeds conflict. World War II, a catastrophe of unprecedented scale, was partly fueled by the alienation of entire nations. Today, we risk repeating this grave error. Sanctions, while a tool of international diplomacy, can be a double-edged sword when they indiscriminately affect millions of innocent lives.

Consider Iran, a nation of 80 million souls. When we deny these people access to global platforms, delay their visa applications without explanation, or infringe upon their basic rights, we don't just isolate a government – we alienate an entire populace. This approach doesn't weaken extremist leaders; it strengthens them by fomenting anger and resentment among ordinary citizens.

Imagine the frustration and helplessness of being unable to find your country listed on a simple website. This daily indignity is the reality for millions of Iranians. While not all sanctions are inherently flawed, when we sever the connections between people and the wider world, we sow the seeds of future conflicts rather than resolve current ones.

Our challenge, then, is to craft a world for our children where peace is not a distant aspiration, but a carefully cultivated reality. This requires us to recognize the humanity in all people, even those whose governments we oppose. May our children inherit a world where peace is not just a dream, but a reality we've worked tirelessly to create.

MedQA-BBY Description

Using 40 labeled instances (the development set), we identified the best-performing OpenAI model, prompt, and temperature setting for four text classification tasks: system/organ, discipline, medicine-or-pharmacy, and subspecialty. We then used GPT-4 for initial labeling. During the enrichment phase, an MD graduate reviewed each question and added one contradictory answer and one additional correct answer to the list of options. Finally, a second MD-level reviewer checked all options for accuracy and consistency.
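
The configuration search on the development set can be pictured roughly as follows. This is a sketch under stated assumptions, not the team's actual code: it assumes accuracy was the selection metric (the README does not say how "optimal" was measured), and `classify` is a hypothetical callable wrapping whatever model call was used, taking `(text, model, prompt, temperature)` and returning a label.

```python
from itertools import product


def accuracy(predict, dev_set):
    """Fraction of (text, label) pairs for which predict(text) equals label."""
    if not dev_set:
        raise ValueError("dev_set must not be empty")
    hits = sum(1 for text, label in dev_set if predict(text) == label)
    return hits / len(dev_set)


def select_best_config(classify, dev_set, models, prompts, temperatures):
    """Try every (model, prompt, temperature) combination and keep the best.

    `classify` is a hypothetical callable, not part of any real library.
    Ties are resolved in favor of the combination tried first.
    """
    best_config, best_accuracy = None, -1.0
    for model, prompt, temperature in product(models, prompts, temperatures):
        acc = accuracy(
            lambda text: classify(text, model, prompt, temperature),
            dev_set,
        )
        if acc > best_accuracy:
            best_config, best_accuracy = (model, prompt, temperature), acc
    return best_config, best_accuracy
```

Because the development set described above has only 40 instances, differences of one or two correct labels between configurations may not be meaningful, so a search like this is best read as a rough filter rather than a precise ranking.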

Special thanks

Thanks to Jin Di et al. for providing the MedQA dataset (paper, GitHub)
Thanks to the Streamlit team for making life easier for developers
Thanks to our team for dedicating their time to crafting this dataset

Team Zone

Link to the app: MedQA-BBY-app
Link to Google sheet and assigned batch for each person
Instruction Video

