update
SuperBruceJia committed Dec 4, 2024
1 parent 5c92108 commit 66e0c12
Showing 170 changed files with 18,503 additions and 4,187 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -629,7 +629,7 @@ to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

-MedPodGPT: A multilingual audio-augmented large language model for medical research and education
+PodGPT: An Audio-augmented Large Language Model for Research and Education
Copyright (C) 2024 Kolachalama Laboratory at Boston University

This program is free software: you can redistribute it and/or modify
191 changes: 72 additions & 119 deletions README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion benchmark/README.md
@@ -1 +1 @@
-All the benchmarks are put here.
+We put all the benchmarks here.
170 changes: 170 additions & 0 deletions benchmark/chinese_cmmlu/biology/high_school_biology.csv

Large diffs are not rendered by default.

133 changes: 133 additions & 0 deletions benchmark/chinese_cmmlu/chemistry/high_school_chemistry.csv

Large diffs are not rendered by default.

205 changes: 205 additions & 0 deletions benchmark/chinese_cmmlu/computer_science/computer_science.csv

Large diffs are not rendered by default.

172 changes: 172 additions & 0 deletions benchmark/chinese_cmmlu/computer_science/computer_security.csv

Large diffs are not rendered by default.

123 changes: 123 additions & 0 deletions benchmark/chinese_cmmlu/computer_science/machine_learning.csv

Large diffs are not rendered by default.

107 changes: 107 additions & 0 deletions benchmark/chinese_cmmlu/engineering/college_engineering_hydrology.csv

Large diffs are not rendered by default.

173 changes: 173 additions & 0 deletions benchmark/chinese_cmmlu/engineering/electrical_engineering.csv

Large diffs are not rendered by default.

107 changes: 107 additions & 0 deletions benchmark/chinese_cmmlu/math/college_actuarial_science.csv

Large diffs are not rendered by default.

106 changes: 106 additions & 0 deletions benchmark/chinese_cmmlu/math/college_mathematics.csv

Large diffs are not rendered by default.

231 changes: 231 additions & 0 deletions benchmark/chinese_cmmlu/math/elementary_mathematics.csv

Large diffs are not rendered by default.

165 changes: 165 additions & 0 deletions benchmark/chinese_cmmlu/math/high_school_mathematics.csv

Large diffs are not rendered by default.

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
166 changes: 166 additions & 0 deletions benchmark/chinese_cmmlu/physics/astronomy.csv

Large diffs are not rendered by default.

148 changes: 148 additions & 0 deletions benchmark/chinese_cmmlu/physics/conceptual_physics.csv

Large diffs are not rendered by default.

111 changes: 111 additions & 0 deletions benchmark/chinese_cmmlu/physics/high_school_physics.csv

Large diffs are not rendered by default.

107 changes: 107 additions & 0 deletions benchmark/chinese_cmmlu/statistics/college_medical_statistics.csv

Large diffs are not rendered by default.

310 changes: 310 additions & 0 deletions benchmark/english_mmlu/biology/high_school_biology_test.csv

Large diffs are not rendered by default.

116 changes: 116 additions & 0 deletions benchmark/english_mmlu/chemistry/college_chemistry_test.csv

Large diffs are not rendered by default.

203 changes: 203 additions & 0 deletions benchmark/english_mmlu/chemistry/high_school_chemistry_test.csv

Large diffs are not rendered by default.

112 changes: 112 additions & 0 deletions benchmark/english_mmlu/computer_science/computer_security_test.csv

Large diffs are not rendered by default.

112 changes: 112 additions & 0 deletions benchmark/english_mmlu/computer_science/machine_learning_test.csv

Large diffs are not rendered by default.

145 changes: 145 additions & 0 deletions benchmark/english_mmlu/engineering/electrical_engineering_test.csv

Large diffs are not rendered by default.

100 changes: 100 additions & 0 deletions benchmark/english_mmlu/math/abstract_algebra_test.csv

Large diffs are not rendered by default.

148 changes: 148 additions & 0 deletions benchmark/english_mmlu/math/college_mathematics_test.csv

Large diffs are not rendered by default.

388 changes: 388 additions & 0 deletions benchmark/english_mmlu/math/elementary_mathematics_test.csv

Large diffs are not rendered by default.

270 changes: 270 additions & 0 deletions benchmark/english_mmlu/math/high_school_mathematics_test.csv

Large diffs are not rendered by default.

216 changes: 216 additions & 0 deletions benchmark/english_mmlu/math/high_school_statistics_test.csv

Large diffs are not rendered by default.

File renamed without changes.
147 changes: 147 additions & 0 deletions benchmark/english_mmlu/medicine/college_biology_test.csv

Large diffs are not rendered by default.

152 changes: 152 additions & 0 deletions benchmark/english_mmlu/physics/astronomy_test.csv

Large diffs are not rendered by default.

102 changes: 102 additions & 0 deletions benchmark/english_mmlu/physics/college_physics_test.csv

Large diffs are not rendered by default.

235 changes: 235 additions & 0 deletions benchmark/english_mmlu/physics/conceptual_physics_test.csv

Large diffs are not rendered by default.

154 changes: 154 additions & 0 deletions benchmark/english_mmlu/physics/high_school_physics_test.csv

Large diffs are not rendered by default.

310 changes: 310 additions & 0 deletions benchmark/french_mmlu/biology/high_school_biology_test.csv

Large diffs are not rendered by default.

116 changes: 116 additions & 0 deletions benchmark/french_mmlu/chemistry/college_chemistry_test.csv

Large diffs are not rendered by default.

203 changes: 203 additions & 0 deletions benchmark/french_mmlu/chemistry/high_school_chemistry_test.csv

Large diffs are not rendered by default.

112 changes: 112 additions & 0 deletions benchmark/french_mmlu/computer_science/computer_security_test.csv

Large diffs are not rendered by default.

112 changes: 112 additions & 0 deletions benchmark/french_mmlu/computer_science/machine_learning_test.csv

Large diffs are not rendered by default.

145 changes: 145 additions & 0 deletions benchmark/french_mmlu/engineering/electrical_engineering_test.csv

Large diffs are not rendered by default.

100 changes: 100 additions & 0 deletions benchmark/french_mmlu/math/abstract_algebra_test.csv

Large diffs are not rendered by default.

143 changes: 143 additions & 0 deletions benchmark/french_mmlu/math/college_mathematics_test.csv

Large diffs are not rendered by default.

388 changes: 388 additions & 0 deletions benchmark/french_mmlu/math/elementary_mathematics_test.csv

Large diffs are not rendered by default.

270 changes: 270 additions & 0 deletions benchmark/french_mmlu/math/high_school_mathematics_test.csv

Large diffs are not rendered by default.

216 changes: 216 additions & 0 deletions benchmark/french_mmlu/math/high_school_statistics_test.csv

Large diffs are not rendered by default.

152 changes: 152 additions & 0 deletions benchmark/french_mmlu/physics/astronomy_test.csv

Large diffs are not rendered by default.

102 changes: 102 additions & 0 deletions benchmark/french_mmlu/physics/college_physics_test.csv

Large diffs are not rendered by default.

235 changes: 235 additions & 0 deletions benchmark/french_mmlu/physics/conceptual_physics_test.csv

Large diffs are not rendered by default.

154 changes: 154 additions & 0 deletions benchmark/french_mmlu/physics/high_school_physics_test.csv

Large diffs are not rendered by default.

310 changes: 310 additions & 0 deletions benchmark/hindi_mmlu/biology/high_school_biology_test.csv

Large diffs are not rendered by default.

116 changes: 116 additions & 0 deletions benchmark/hindi_mmlu/chemistry/college_chemistry_test.csv

Large diffs are not rendered by default.

203 changes: 203 additions & 0 deletions benchmark/hindi_mmlu/chemistry/high_school_chemistry_test.csv

Large diffs are not rendered by default.

112 changes: 112 additions & 0 deletions benchmark/hindi_mmlu/computer_science/computer_security_test.csv

Large diffs are not rendered by default.

112 changes: 112 additions & 0 deletions benchmark/hindi_mmlu/computer_science/machine_learning_test.csv

Large diffs are not rendered by default.

145 changes: 145 additions & 0 deletions benchmark/hindi_mmlu/engineering/electrical_engineering_test.csv

Large diffs are not rendered by default.

100 changes: 100 additions & 0 deletions benchmark/hindi_mmlu/math/abstract_algebra_test.csv

Large diffs are not rendered by default.

140 changes: 140 additions & 0 deletions benchmark/hindi_mmlu/math/college_mathematics_test.csv

Large diffs are not rendered by default.

388 changes: 388 additions & 0 deletions benchmark/hindi_mmlu/math/elementary_mathematics_test.csv

Large diffs are not rendered by default.

270 changes: 270 additions & 0 deletions benchmark/hindi_mmlu/math/high_school_mathematics_test.csv

Large diffs are not rendered by default.

216 changes: 216 additions & 0 deletions benchmark/hindi_mmlu/math/high_school_statistics_test.csv

Large diffs are not rendered by default.

147 changes: 147 additions & 0 deletions benchmark/hindi_mmlu/medicine/mmlu_Hindi_test_college_biology_test.csv

Large diffs are not rendered by default.

152 changes: 152 additions & 0 deletions benchmark/hindi_mmlu/physics/astronomy_test.csv

Large diffs are not rendered by default.

102 changes: 102 additions & 0 deletions benchmark/hindi_mmlu/physics/college_physics_test.csv

Large diffs are not rendered by default.

235 changes: 235 additions & 0 deletions benchmark/hindi_mmlu/physics/conceptual_physics_test.csv

Large diffs are not rendered by default.

154 changes: 154 additions & 0 deletions benchmark/hindi_mmlu/physics/high_school_physics_test.csv

Large diffs are not rendered by default.

310 changes: 310 additions & 0 deletions benchmark/spanish_mmlu/biology/high_school_biology_test.csv

Large diffs are not rendered by default.

113 changes: 113 additions & 0 deletions benchmark/spanish_mmlu/chemistry/college_chemistry_test.csv

Large diffs are not rendered by default.

203 changes: 203 additions & 0 deletions benchmark/spanish_mmlu/chemistry/high_school_chemistry_test.csv

Large diffs are not rendered by default.

112 changes: 112 additions & 0 deletions benchmark/spanish_mmlu/computer_science/computer_security_test.csv

Large diffs are not rendered by default.

112 changes: 112 additions & 0 deletions benchmark/spanish_mmlu/computer_science/machine_learning_test.csv

Large diffs are not rendered by default.

145 changes: 145 additions & 0 deletions benchmark/spanish_mmlu/engineering/electrical_engineering_test.csv

Large diffs are not rendered by default.

100 changes: 100 additions & 0 deletions benchmark/spanish_mmlu/math/abstract_algebra_test.csv

Large diffs are not rendered by default.

143 changes: 143 additions & 0 deletions benchmark/spanish_mmlu/math/college_mathematics_test.csv

Large diffs are not rendered by default.

388 changes: 388 additions & 0 deletions benchmark/spanish_mmlu/math/elementary_mathematics_test.csv

Large diffs are not rendered by default.

270 changes: 270 additions & 0 deletions benchmark/spanish_mmlu/math/high_school_mathematics_test.csv

Large diffs are not rendered by default.

216 changes: 216 additions & 0 deletions benchmark/spanish_mmlu/math/high_school_statistics_test.csv

Large diffs are not rendered by default.

152 changes: 152 additions & 0 deletions benchmark/spanish_mmlu/physics/astronomy_test.csv

Large diffs are not rendered by default.

102 changes: 102 additions & 0 deletions benchmark/spanish_mmlu/physics/college_physics_test.csv

Large diffs are not rendered by default.

235 changes: 235 additions & 0 deletions benchmark/spanish_mmlu/physics/conceptual_physics_test.csv

Large diffs are not rendered by default.

151 changes: 151 additions & 0 deletions benchmark/spanish_mmlu/physics/high_school_physics_test.csv

Large diffs are not rendered by default.
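The benchmark CSVs added above follow a `benchmark/<language>_<suite>/<subject>/<file>.csv` layout, for example `benchmark/chinese_cmmlu/physics/astronomy.csv` or `benchmark/hindi_mmlu/medicine/mmlu_Hindi_test_college_biology_test.csv`. As a minimal sketch, assuming every file sits at exactly that depth (true for the paths listed here, not verified for the rest of the repository), a hypothetical helper can split such a path and group test sets by language and subject:

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, NamedTuple, Tuple


class BenchmarkFile(NamedTuple):
    language: str  # e.g. "english", taken from the "english_mmlu" folder name
    suite: str     # e.g. "mmlu" or "cmmlu"
    subject: str   # e.g. "physics"
    name: str      # file name without extension, e.g. "astronomy_test"


def parse_benchmark_path(path: str) -> BenchmarkFile:
    """Split benchmark/<language>_<suite>/<subject>/<file>.csv into its parts.

    Hypothetical helper: assumes the two-folder layout seen in this commit.
    """
    parts = PurePosixPath(path).parts
    if len(parts) != 4 or parts[0] != "benchmark" or "_" not in parts[1]:
        raise ValueError(f"unexpected benchmark layout: {path}")
    language, _, suite = parts[1].partition("_")
    return BenchmarkFile(language, suite, parts[2], PurePosixPath(parts[3]).stem)


def group_by_language_and_subject(paths: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Collect file names per (language, subject) pair, preserving input order."""
    groups: Dict[Tuple[str, str], List[str]] = {}
    for path in paths:
        f = parse_benchmark_path(path)
        groups.setdefault((f.language, f.subject), []).append(f.name)
    return groups
```

Raising on anything outside the expected depth (such as `benchmark/README.md`) keeps stray files from being silently treated as test sets.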

428 changes: 428 additions & 0 deletions config_benchmark.yml

Large diffs are not rendered by default.

55 changes: 0 additions & 55 deletions config_chatgpt.yml
@@ -18,61 +18,6 @@ model_name : "gpt-4-0613"
openai_api_key : "YOUR_OPENAI_API_KEY"
max_new_tokens : 1024

-# Dataset path
-# English Benchmarks
-english_medqa : "./benchmark/english_medqa/MedQA_USMLE_test.jsonl"
-english_pubmedqa : "./benchmark/english_pubmedqa/PubMedQA_test.json"
-english_medmcqa : "./benchmark/english_medmcqa/MedMCQA_test.json"
-english_usmle_step1 : "./benchmark/english_usmle/USMLE_STEP_1.json"
-english_usmle_step2 : "./benchmark/english_usmle/USMLE_STEP_2.json"
-english_usmle_step3 : "./benchmark/english_usmle/USMLE_STEP_3.json"
-english_usmle_ethics : "./benchmark/english_usmle/USMLE_ethics.json"
-english_mmlu_anatomy : "./benchmark/english_mmlu/anatomy_test.csv"
-english_mmlu_clinical_knowledge : "./benchmark/english_mmlu/clinical_knowledge_test.csv"
-english_mmlu_college_biology : "./benchmark/english_mmlu/college_biology_test.csv"
-english_mmlu_college_medicine : "./benchmark/english_mmlu/college_medicine_test.csv"
-english_mmlu_medical_genetics : "./benchmark/english_mmlu/medical_genetics_test.csv"
-english_mmlu_professional_medicine : "./benchmark/english_mmlu/professional_medicine_test.csv"
-english_medexpqa : "./benchmark/english_medexpqa/test.en.casimedicos.rag.jsonl"
-
-# Chinese Benchmarks
-chinese_mcmle : "./benchmark/chinese_mcmle/MedQA-MCMLE.jsonl"
-chinese_cmmlu_anatomy : "./benchmark/chinese_cmmlu/anatomy.csv"
-chinese_cmmlu_clinical_knowledge : "./benchmark/chinese_cmmlu/clinical_knowledge.csv"
-chinese_cmmlu_college_medicine : "./benchmark/chinese_cmmlu/college_medicine.csv"
-chinese_cmmlu_genetics : "./benchmark/chinese_cmmlu/genetics.csv"
-chinese_cmmlu_nutrition : "./benchmark/chinese_cmmlu/nutrition.csv"
-chinese_cmmlu_tcm: "./benchmark/chinese_cmmlu/traditional_chinese_medicine.csv"
-chinese_cmmlu_virology : "./benchmark/chinese_cmmlu/virology.csv"
-
-# French Benchmarks
-french_medmcqa : "./benchmark/french_medmcqa/FrenchMedMCQA-test.json"
-french_mmlu_anatomy : "./benchmark/french_mmlu/mmlu_French_test_anatomy_test.csv"
-french_mmlu_clinical_knowledge : "./benchmark/french_mmlu/mmlu_French_test_clinical_knowledge_test.csv"
-french_mmlu_college_biology : "./benchmark/french_mmlu/mmlu_French_test_college_biology_test.csv"
-french_mmlu_college_medicine : "./benchmark/french_mmlu/mmlu_French_test_college_medicine_test.csv"
-french_mmlu_medical_genetics : "./benchmark/french_mmlu/mmlu_French_test_medical_genetics_test.csv"
-french_mmlu_professional_medicine : "./benchmark/french_mmlu/mmlu_French_test_professional_medicine_test.csv"
-french_medexpqa : "./benchmark/french_medexpqa/test.fr.casimedicos.rag.jsonl"
-
-# Spanish Benchmarks
-spanish_headqa : "./benchmark/spanish_headqa/HEAD-QA-test.json"
-spanish_mmlu_anatomy : "./benchmark/spanish_mmlu/mmlu_Spanish_test_anatomy_test.csv"
-spanish_mmlu_clinical_knowledge : "./benchmark/spanish_mmlu/mmlu_Spanish_test_clinical_knowledge_test.csv"
-spanish_mmlu_college_biology : "./benchmark/spanish_mmlu/mmlu_Spanish_test_college_biology_test.csv"
-spanish_mmlu_college_medicine : "./benchmark/spanish_mmlu/mmlu_Spanish_test_college_medicine_test.csv"
-spanish_mmlu_medical_genetics : "./benchmark/spanish_mmlu/mmlu_Spanish_test_medical_genetics_test.csv"
-spanish_mmlu_professional_medicine : "./benchmark/spanish_mmlu/mmlu_Spanish_test_professional_medicine_test.csv"
-spanish_medexpqa : "./benchmark/spanish_medexpqa/test.es.casimedicos.rag.jsonl"
-
-# Hindi Benchmarks
-hindi_mmlu_anatomy : "./benchmark/hindi_mmlu/mmlu_Hindi_test_anatomy_test.csv"
-hindi_mmlu_clinical_knowledge : "./benchmark/hindi_mmlu/mmlu_Hindi_test_clinical_knowledge_test.csv"
-hindi_mmlu_college_biology : "./benchmark/hindi_mmlu/mmlu_Hindi_test_college_biology_test.csv"
-hindi_mmlu_college_medicine : "./benchmark/hindi_mmlu/mmlu_Hindi_test_college_medicine_test.csv"
-hindi_mmlu_medical_genetics : "./benchmark/hindi_mmlu/mmlu_Hindi_test_medical_genetics_test.csv"
-hindi_mmlu_professional_medicine : "./benchmark/hindi_mmlu/mmlu_Hindi_test_professional_medicine_test.csv"
-
# Saving path
result_dir : "./results"
response_dir : "./Pretrain-Results"
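This commit deletes the flat `# Dataset path` block from `config_chatgpt.yml` and, identically, from the other three configs. It also adds a 428-line `config_benchmark.yml` whose contents are not rendered here, and adds benchmark CSVs under subject subfolders. Wherever the benchmark paths end up, a move like this makes stale entries easy to introduce. The following is a hypothetical check, not code from the repository. It assumes the YAML has already been loaded into a `dict` (for instance with PyYAML's `yaml.safe_load`), groups keys by the language prefixes used in the deleted block, and lists keys whose file is missing:

```python
from pathlib import Path
from typing import Dict, List

# Language prefixes used by the keys in the deleted "Dataset path" block.
LANGUAGES = ("english", "chinese", "french", "spanish", "hindi")


def benchmark_paths(cfg: Dict[str, object]) -> Dict[str, Dict[str, str]]:
    """Group benchmark path entries by the language prefix of their key."""
    grouped: Dict[str, Dict[str, str]] = {lang: {} for lang in LANGUAGES}
    for key, value in cfg.items():
        lang = key.split("_", 1)[0]
        # Only string values under ./benchmark/ count as benchmark paths;
        # keys such as result_dir or max_new_tokens are ignored.
        if lang in grouped and isinstance(value, str) and value.startswith("./benchmark/"):
            grouped[lang][key] = value
    return grouped


def missing_benchmarks(cfg: Dict[str, object], root: Path = Path(".")) -> List[str]:
    """Return the sorted keys whose benchmark file does not exist under root."""
    missing = []
    for entries in benchmark_paths(cfg).values():
        for key, rel_path in entries.items():
            if not (root / rel_path).is_file():
                missing.append(key)
    return sorted(missing)
```

Running `missing_benchmarks(cfg)` from the repository root before an evaluation run would surface any key that still points at a pre-reorganization location.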
66 changes: 3 additions & 63 deletions config_large.yml
@@ -1,14 +1,8 @@
# Model info
-# Mistral 8x7B MoE 46B: mistralai/Mixtral-8x7B-Instruct-v0.1
-# LLaMA 3 70B: meta-llama/Meta-Llama-3-70B-Instruct
-model_name : "meta-llama/Meta-Llama-3-70B-Instruct"
+model_name : "meta-llama/Meta-Llama-3.1-70B-Instruct"

# Dataset info
-# English podcasts: shuyuej/English-Pretraining-Dataset
-# Spanish podcasts: shuyuej/Spanish-Pretraining-Dataset
-# French podcasts: shuyuej/French-Pretraining-Dataset
-# Multilingual podcasts: shuyuej/Multilingual-Pretraining-Dataset
-dataset_hf : "shuyuej/MedPodGPT-Demo-Data"
+dataset_hf : "shuyuej/PodGPT-Demo-Data"

# This is my Hugging Face `read` and `write` tokens. Please replace it to yours.
# `read` token: for downloading models
@@ -28,61 +22,6 @@ lora_alpha : 32
# Dropout probability for LoRA layers
lora_dropout : 0.1

-# Dataset path
-# English Benchmarks
-english_medqa : "./benchmark/english_medqa/MedQA_USMLE_test.jsonl"
-english_pubmedqa : "./benchmark/english_pubmedqa/PubMedQA_test.json"
-english_medmcqa : "./benchmark/english_medmcqa/MedMCQA_test.json"
-english_usmle_step1 : "./benchmark/english_usmle/USMLE_STEP_1.json"
-english_usmle_step2 : "./benchmark/english_usmle/USMLE_STEP_2.json"
-english_usmle_step3 : "./benchmark/english_usmle/USMLE_STEP_3.json"
-english_usmle_ethics : "./benchmark/english_usmle/USMLE_ethics.json"
-english_mmlu_anatomy : "./benchmark/english_mmlu/anatomy_test.csv"
-english_mmlu_clinical_knowledge : "./benchmark/english_mmlu/clinical_knowledge_test.csv"
-english_mmlu_college_biology : "./benchmark/english_mmlu/college_biology_test.csv"
-english_mmlu_college_medicine : "./benchmark/english_mmlu/college_medicine_test.csv"
-english_mmlu_medical_genetics : "./benchmark/english_mmlu/medical_genetics_test.csv"
-english_mmlu_professional_medicine : "./benchmark/english_mmlu/professional_medicine_test.csv"
-english_medexpqa : "./benchmark/english_medexpqa/test.en.casimedicos.rag.jsonl"
-
-# Chinese Benchmarks
-chinese_mcmle : "./benchmark/chinese_mcmle/MedQA-MCMLE.jsonl"
-chinese_cmmlu_anatomy : "./benchmark/chinese_cmmlu/anatomy.csv"
-chinese_cmmlu_clinical_knowledge : "./benchmark/chinese_cmmlu/clinical_knowledge.csv"
-chinese_cmmlu_college_medicine : "./benchmark/chinese_cmmlu/college_medicine.csv"
-chinese_cmmlu_genetics : "./benchmark/chinese_cmmlu/genetics.csv"
-chinese_cmmlu_nutrition : "./benchmark/chinese_cmmlu/nutrition.csv"
-chinese_cmmlu_tcm: "./benchmark/chinese_cmmlu/traditional_chinese_medicine.csv"
-chinese_cmmlu_virology : "./benchmark/chinese_cmmlu/virology.csv"
-
-# French Benchmarks
-french_medmcqa : "./benchmark/french_medmcqa/FrenchMedMCQA-test.json"
-french_mmlu_anatomy : "./benchmark/french_mmlu/mmlu_French_test_anatomy_test.csv"
-french_mmlu_clinical_knowledge : "./benchmark/french_mmlu/mmlu_French_test_clinical_knowledge_test.csv"
-french_mmlu_college_biology : "./benchmark/french_mmlu/mmlu_French_test_college_biology_test.csv"
-french_mmlu_college_medicine : "./benchmark/french_mmlu/mmlu_French_test_college_medicine_test.csv"
-french_mmlu_medical_genetics : "./benchmark/french_mmlu/mmlu_French_test_medical_genetics_test.csv"
-french_mmlu_professional_medicine : "./benchmark/french_mmlu/mmlu_French_test_professional_medicine_test.csv"
-french_medexpqa : "./benchmark/french_medexpqa/test.fr.casimedicos.rag.jsonl"
-
-# Spanish Benchmarks
-spanish_headqa : "./benchmark/spanish_headqa/HEAD-QA-test.json"
-spanish_mmlu_anatomy : "./benchmark/spanish_mmlu/mmlu_Spanish_test_anatomy_test.csv"
-spanish_mmlu_clinical_knowledge : "./benchmark/spanish_mmlu/mmlu_Spanish_test_clinical_knowledge_test.csv"
-spanish_mmlu_college_biology : "./benchmark/spanish_mmlu/mmlu_Spanish_test_college_biology_test.csv"
-spanish_mmlu_college_medicine : "./benchmark/spanish_mmlu/mmlu_Spanish_test_college_medicine_test.csv"
-spanish_mmlu_medical_genetics : "./benchmark/spanish_mmlu/mmlu_Spanish_test_medical_genetics_test.csv"
-spanish_mmlu_professional_medicine : "./benchmark/spanish_mmlu/mmlu_Spanish_test_professional_medicine_test.csv"
-spanish_medexpqa : "./benchmark/spanish_medexpqa/test.es.casimedicos.rag.jsonl"
-
-# Hindi Benchmarks
-hindi_mmlu_anatomy : "./benchmark/hindi_mmlu/mmlu_Hindi_test_anatomy_test.csv"
-hindi_mmlu_clinical_knowledge : "./benchmark/hindi_mmlu/mmlu_Hindi_test_clinical_knowledge_test.csv"
-hindi_mmlu_college_biology : "./benchmark/hindi_mmlu/mmlu_Hindi_test_college_biology_test.csv"
-hindi_mmlu_college_medicine : "./benchmark/hindi_mmlu/mmlu_Hindi_test_college_medicine_test.csv"
-hindi_mmlu_medical_genetics : "./benchmark/hindi_mmlu/mmlu_Hindi_test_medical_genetics_test.csv"
-hindi_mmlu_professional_medicine : "./benchmark/hindi_mmlu/mmlu_Hindi_test_professional_medicine_test.csv"
-
# Saving path
result_dir : "./results"
save_dir : "./save_folder"
@@ -142,3 +81,4 @@ device_map : "auto"
# https://docs.vllm.ai/en/latest/serving/distributed_serving.html
num_gpus_vllm : 4
gpu_utilization_vllm : 0.95
+max_model_len_vllm : 2048
61 changes: 1 addition & 60 deletions config_quantization.yml
@@ -2,11 +2,7 @@
model_name : "shuyuej/MedLLaMA3-70B-BASE-MODEL-QUANT"

# Dataset info
-# English podcasts: shuyuej/English-Pretraining-Dataset
-# Spanish podcasts: shuyuej/Spanish-Pretraining-Dataset
-# French podcasts: shuyuej/French-Pretraining-Dataset
-# Multilingual podcasts: shuyuej/Multilingual-Pretraining-Dataset
-dataset_hf : "shuyuej/MedPodGPT-Demo-Data"
+dataset_hf : "shuyuej/PodGPT-Demo-Data"

# This is my Hugging Face `read` and `write` tokens. Please replace it to yours.
# `read` token: for downloading models
@@ -26,61 +22,6 @@ lora_alpha : 32
# Dropout probability for LoRA layers
lora_dropout : 0.1

-# Dataset path
-# English Benchmarks
-english_medqa : "./benchmark/english_medqa/MedQA_USMLE_test.jsonl"
-english_pubmedqa : "./benchmark/english_pubmedqa/PubMedQA_test.json"
-english_medmcqa : "./benchmark/english_medmcqa/MedMCQA_test.json"
-english_usmle_step1 : "./benchmark/english_usmle/USMLE_STEP_1.json"
-english_usmle_step2 : "./benchmark/english_usmle/USMLE_STEP_2.json"
-english_usmle_step3 : "./benchmark/english_usmle/USMLE_STEP_3.json"
-english_usmle_ethics : "./benchmark/english_usmle/USMLE_ethics.json"
-english_mmlu_anatomy : "./benchmark/english_mmlu/anatomy_test.csv"
-english_mmlu_clinical_knowledge : "./benchmark/english_mmlu/clinical_knowledge_test.csv"
-english_mmlu_college_biology : "./benchmark/english_mmlu/college_biology_test.csv"
-english_mmlu_college_medicine : "./benchmark/english_mmlu/college_medicine_test.csv"
-english_mmlu_medical_genetics : "./benchmark/english_mmlu/medical_genetics_test.csv"
-english_mmlu_professional_medicine : "./benchmark/english_mmlu/professional_medicine_test.csv"
-english_medexpqa : "./benchmark/english_medexpqa/test.en.casimedicos.rag.jsonl"
-
-# Chinese Benchmarks
-chinese_mcmle : "./benchmark/chinese_mcmle/MedQA-MCMLE.jsonl"
-chinese_cmmlu_anatomy : "./benchmark/chinese_cmmlu/anatomy.csv"
-chinese_cmmlu_clinical_knowledge : "./benchmark/chinese_cmmlu/clinical_knowledge.csv"
-chinese_cmmlu_college_medicine : "./benchmark/chinese_cmmlu/college_medicine.csv"
-chinese_cmmlu_genetics : "./benchmark/chinese_cmmlu/genetics.csv"
-chinese_cmmlu_nutrition : "./benchmark/chinese_cmmlu/nutrition.csv"
-chinese_cmmlu_tcm: "./benchmark/chinese_cmmlu/traditional_chinese_medicine.csv"
-chinese_cmmlu_virology : "./benchmark/chinese_cmmlu/virology.csv"
-
-# French Benchmarks
-french_medmcqa : "./benchmark/french_medmcqa/FrenchMedMCQA-test.json"
-french_mmlu_anatomy : "./benchmark/french_mmlu/mmlu_French_test_anatomy_test.csv"
-french_mmlu_clinical_knowledge : "./benchmark/french_mmlu/mmlu_French_test_clinical_knowledge_test.csv"
-french_mmlu_college_biology : "./benchmark/french_mmlu/mmlu_French_test_college_biology_test.csv"
-french_mmlu_college_medicine : "./benchmark/french_mmlu/mmlu_French_test_college_medicine_test.csv"
-french_mmlu_medical_genetics : "./benchmark/french_mmlu/mmlu_French_test_medical_genetics_test.csv"
-french_mmlu_professional_medicine : "./benchmark/french_mmlu/mmlu_French_test_professional_medicine_test.csv"
-french_medexpqa : "./benchmark/french_medexpqa/test.fr.casimedicos.rag.jsonl"
-
-# Spanish Benchmarks
-spanish_headqa : "./benchmark/spanish_headqa/HEAD-QA-test.json"
-spanish_mmlu_anatomy : "./benchmark/spanish_mmlu/mmlu_Spanish_test_anatomy_test.csv"
-spanish_mmlu_clinical_knowledge : "./benchmark/spanish_mmlu/mmlu_Spanish_test_clinical_knowledge_test.csv"
-spanish_mmlu_college_biology : "./benchmark/spanish_mmlu/mmlu_Spanish_test_college_biology_test.csv"
-spanish_mmlu_college_medicine : "./benchmark/spanish_mmlu/mmlu_Spanish_test_college_medicine_test.csv"
-spanish_mmlu_medical_genetics : "./benchmark/spanish_mmlu/mmlu_Spanish_test_medical_genetics_test.csv"
-spanish_mmlu_professional_medicine : "./benchmark/spanish_mmlu/mmlu_Spanish_test_professional_medicine_test.csv"
-spanish_medexpqa : "./benchmark/spanish_medexpqa/test.es.casimedicos.rag.jsonl"
-
-# Hindi Benchmarks
-hindi_mmlu_anatomy : "./benchmark/hindi_mmlu/mmlu_Hindi_test_anatomy_test.csv"
-hindi_mmlu_clinical_knowledge : "./benchmark/hindi_mmlu/mmlu_Hindi_test_clinical_knowledge_test.csv"
-hindi_mmlu_college_biology : "./benchmark/hindi_mmlu/mmlu_Hindi_test_college_biology_test.csv"
-hindi_mmlu_college_medicine : "./benchmark/hindi_mmlu/mmlu_Hindi_test_college_medicine_test.csv"
-hindi_mmlu_medical_genetics : "./benchmark/hindi_mmlu/mmlu_Hindi_test_medical_genetics_test.csv"
-hindi_mmlu_professional_medicine : "./benchmark/hindi_mmlu/mmlu_Hindi_test_professional_medicine_test.csv"
-
# Saving path
result_dir : "./results"
save_dir : "./save_folder"
73 changes: 7 additions & 66 deletions config_small.yml
@@ -1,17 +1,8 @@
# Model info
-# Gemma 2B Instruction-tuned: google/gemma-2b-it
-# Gemma 7B Instruction-tuned: google/gemma-7b-it
-# Mistral 7B Instruction-tuned: mistralai/Mistral-7B-Instruct-v0.3
-# LLaMA 3 8B Instruction-tuned: meta-llama/Meta-Llama-3-8B-Instruct
-# Medical LLMs 7B: medalpaca/medalpaca-7b
-model_name : "meta-llama/Meta-Llama-3-8B-Instruct"
+model_name : "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Dataset info
-# English podcasts: shuyuej/English-Pretraining-Dataset
-# Spanish podcasts: shuyuej/Spanish-Pretraining-Dataset
-# French podcasts: shuyuej/French-Pretraining-Dataset
-# Multilingual podcasts: shuyuej/Multilingual-Pretraining-Dataset
-dataset_hf : "shuyuej/MedPodGPT-Demo-Data"
+dataset_hf : "shuyuej/PodGPT-Demo-Data"

# This is my Hugging Face `read` and `write` tokens. Please replace it to yours.
# `read` token: for downloading models
@@ -23,61 +14,6 @@ hf_write_token : "YOUR_HUGGING_FACE_WRITE_TOKEN" # Hugging Face `write` Token
# Evaluate the original pre-trained model's performance
eval_pretrain : False

-# Dataset path
-# English Benchmarks
-english_medqa : "./benchmark/english_medqa/MedQA_USMLE_test.jsonl"
-english_pubmedqa : "./benchmark/english_pubmedqa/PubMedQA_test.json"
-english_medmcqa : "./benchmark/english_medmcqa/MedMCQA_test.json"
-english_usmle_step1 : "./benchmark/english_usmle/USMLE_STEP_1.json"
-english_usmle_step2 : "./benchmark/english_usmle/USMLE_STEP_2.json"
-english_usmle_step3 : "./benchmark/english_usmle/USMLE_STEP_3.json"
-english_usmle_ethics : "./benchmark/english_usmle/USMLE_ethics.json"
-english_mmlu_anatomy : "./benchmark/english_mmlu/anatomy_test.csv"
-english_mmlu_clinical_knowledge : "./benchmark/english_mmlu/clinical_knowledge_test.csv"
-english_mmlu_college_biology : "./benchmark/english_mmlu/college_biology_test.csv"
-english_mmlu_college_medicine : "./benchmark/english_mmlu/college_medicine_test.csv"
-english_mmlu_medical_genetics : "./benchmark/english_mmlu/medical_genetics_test.csv"
-english_mmlu_professional_medicine : "./benchmark/english_mmlu/professional_medicine_test.csv"
-english_medexpqa : "./benchmark/english_medexpqa/test.en.casimedicos.rag.jsonl"
-
-# Chinese Benchmarks
-chinese_mcmle : "./benchmark/chinese_mcmle/MedQA-MCMLE.jsonl"
-chinese_cmmlu_anatomy : "./benchmark/chinese_cmmlu/anatomy.csv"
-chinese_cmmlu_clinical_knowledge : "./benchmark/chinese_cmmlu/clinical_knowledge.csv"
-chinese_cmmlu_college_medicine : "./benchmark/chinese_cmmlu/college_medicine.csv"
-chinese_cmmlu_genetics : "./benchmark/chinese_cmmlu/genetics.csv"
-chinese_cmmlu_nutrition : "./benchmark/chinese_cmmlu/nutrition.csv"
-chinese_cmmlu_tcm: "./benchmark/chinese_cmmlu/traditional_chinese_medicine.csv"
-chinese_cmmlu_virology : "./benchmark/chinese_cmmlu/virology.csv"
-
-# French Benchmarks
-french_medmcqa : "./benchmark/french_medmcqa/FrenchMedMCQA-test.json"
-french_mmlu_anatomy : "./benchmark/french_mmlu/mmlu_French_test_anatomy_test.csv"
-french_mmlu_clinical_knowledge : "./benchmark/french_mmlu/mmlu_French_test_clinical_knowledge_test.csv"
-french_mmlu_college_biology : "./benchmark/french_mmlu/mmlu_French_test_college_biology_test.csv"
-french_mmlu_college_medicine : "./benchmark/french_mmlu/mmlu_French_test_college_medicine_test.csv"
-french_mmlu_medical_genetics : "./benchmark/french_mmlu/mmlu_French_test_medical_genetics_test.csv"
-french_mmlu_professional_medicine : "./benchmark/french_mmlu/mmlu_French_test_professional_medicine_test.csv"
-french_medexpqa : "./benchmark/french_medexpqa/test.fr.casimedicos.rag.jsonl"
-
-# Spanish Benchmarks
-spanish_headqa : "./benchmark/spanish_headqa/HEAD-QA-test.json"
-spanish_mmlu_anatomy : "./benchmark/spanish_mmlu/mmlu_Spanish_test_anatomy_test.csv"
-spanish_mmlu_clinical_knowledge : "./benchmark/spanish_mmlu/mmlu_Spanish_test_clinical_knowledge_test.csv"
-spanish_mmlu_college_biology : "./benchmark/spanish_mmlu/mmlu_Spanish_test_college_biology_test.csv"
-spanish_mmlu_college_medicine : "./benchmark/spanish_mmlu/mmlu_Spanish_test_college_medicine_test.csv"
-spanish_mmlu_medical_genetics : "./benchmark/spanish_mmlu/mmlu_Spanish_test_medical_genetics_test.csv"
-spanish_mmlu_professional_medicine : "./benchmark/spanish_mmlu/mmlu_Spanish_test_professional_medicine_test.csv"
-spanish_medexpqa : "./benchmark/spanish_medexpqa/test.es.casimedicos.rag.jsonl"
-
-# Hindi Benchmarks
-hindi_mmlu_anatomy : "./benchmark/hindi_mmlu/mmlu_Hindi_test_anatomy_test.csv"
-hindi_mmlu_clinical_knowledge : "./benchmark/hindi_mmlu/mmlu_Hindi_test_clinical_knowledge_test.csv"
-hindi_mmlu_college_biology : "./benchmark/hindi_mmlu/mmlu_Hindi_test_college_biology_test.csv"
-hindi_mmlu_college_medicine : "./benchmark/hindi_mmlu/mmlu_Hindi_test_college_medicine_test.csv"
-hindi_mmlu_medical_genetics : "./benchmark/hindi_mmlu/mmlu_Hindi_test_medical_genetics_test.csv"
-hindi_mmlu_professional_medicine : "./benchmark/hindi_mmlu/mmlu_Hindi_test_professional_medicine_test.csv"
-
# Saving path
result_dir : "./results"
response_dir : "./Final-Results"
@@ -132,3 +68,8 @@ bf16 : True

# Choose which GPU to use
device_map : "auto"
+
+# The number of GPUs and GPU utilization for the vLLM Engine
+# https://docs.vllm.ai/en/latest/serving/distributed_serving.html
+num_gpus_vllm : 1
+gpu_utilization_vllm : 0.75
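This commit adds vLLM settings to the configs: `num_gpus_vllm : 4`, `gpu_utilization_vllm : 0.95` and `max_model_len_vllm : 2048` in `config_large.yml`, and `num_gpus_vllm : 1` with `gpu_utilization_vllm : 0.75` in `config_small.yml`. How the repository consumes these keys is not shown in the diff. A minimal sketch, assuming they map onto the `vllm.LLM` constructor arguments `tensor_parallel_size`, `gpu_memory_utilization` and `max_model_len` (the linked distributed-serving page describes multi-GPU inference through `tensor_parallel_size`), and assuming the YAML is already loaded into a `dict`:

```python
from typing import Any, Dict


def vllm_engine_kwargs(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Map the *_vllm config keys onto vllm.LLM keyword arguments.

    Hypothetical helper; the key-to-argument mapping is an assumption.
    """
    utilization = float(cfg["gpu_utilization_vllm"])
    if not 0.0 < utilization <= 1.0:
        raise ValueError("gpu_utilization_vllm must be a fraction in (0, 1]")
    kwargs: Dict[str, Any] = {
        "model": cfg["model_name"],
        # Assumption: the GPU count is used as the tensor-parallel degree.
        "tensor_parallel_size": int(cfg["num_gpus_vllm"]),
        "gpu_memory_utilization": utilization,
    }
    # config_small.yml defines no max_model_len_vllm; leaving the argument out
    # lets vLLM take the context length from the model's own config.
    if "max_model_len_vllm" in cfg:
        kwargs["max_model_len"] = int(cfg["max_model_len_vllm"])
    return kwargs


# Usage with vLLM installed (not executed here):
#   from vllm import LLM
#   llm = LLM(**vllm_engine_kwargs(cfg))
```

Treating `max_model_len_vllm` as optional lets one helper read both the small and the large config without adding a placeholder value to `config_small.yml`.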
2 changes: 1 addition & 1 deletion download_files/README.md
@@ -6,7 +6,7 @@ cp -r ./download_files/download_model_to_local.py ./
```

## 💻 Download a repo to a local folder
-We support downloading `model` and `dataset`.
+We support downloading `model` and `dataset` from Hugging Face.

```shell
python download_model_from_hf.py --repo "shuyuej/MedGemma7B-Multilingual" --repo_type "model" --save_dir "./save_folder"
2 changes: 1 addition & 1 deletion download_files/download_model_from_hf.py
@@ -3,7 +3,7 @@
#
# GNU Affero General Public License v3.0 License
#
-# MedPodGPT: A multilingual audio-augmented large language model for medical research and education
+# PodGPT: An Audio-augmented Large Language Model for Research and Education
# Copyright (C) 2024 Kolachalama Laboratory at Boston University

from argparse import ArgumentParser
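The README invokes `download_model_from_hf.py` with `--repo`, `--repo_type` and `--save_dir`, and the script imports `ArgumentParser`, but the rest of its body lies outside this hunk. A minimal sketch of such a script, assuming it wraps `huggingface_hub.snapshot_download` (an assumption; the actual implementation is not shown), could look like this:

```python
from argparse import ArgumentParser, Namespace
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> Namespace:
    """Parse the flags used in download_files/README.md (hypothetical sketch)."""
    parser = ArgumentParser(description="Download a Hugging Face repo to a local folder.")
    parser.add_argument("--repo", required=True, help='e.g. "shuyuej/MedGemma7B-Multilingual"')
    # The README states that both models and datasets are supported.
    parser.add_argument("--repo_type", required=True, choices=["model", "dataset"])
    parser.add_argument("--save_dir", required=True, help='e.g. "./save_folder"')
    return parser.parse_args(argv)


def download(args: Namespace, token: Optional[str] = None) -> str:
    """Fetch every file in the repo into args.save_dir and return that folder's path."""
    # Imported lazily so argument parsing works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=args.repo,
        repo_type=args.repo_type,
        local_dir=args.save_dir,
        token=token,  # a `read` token is only needed for gated or private repos
    )
```

Calling `download(parse_args())` from a `__main__` block would reproduce the README command; gated repositories such as the Llama models additionally need the Hugging Face `read` token mentioned in the configs.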