📄 Paper: [ACL Anthology](https://aclanthology.org/2025.acl-long.1154/)
/dataset
- /public: 52 questions with no licensing issues
  - `base_mcq_dataset_public.csv`: Base MCQ Dataset
  - `student_choice_dataset_public.json`: Student Choice Dataset
  - `pr_sft_train.csv`, `pr_dpo_train.csv`: Training data for the Pairwise Ranker
  - `dg_sft_train.csv`, `dg_dpo_train.csv`: Training data for the Distractor Generator
- /synthetic: New CS questions generated with GPT-4o
  - `student_choice_dataset_synthetic.json`: Student Choice Dataset
> **Important:** In the Student Choice Dataset, a higher `d_scores` value indicates a more plausible distractor (see the loading sketch after this listing).
- /english: High school English exam questions
- These are English-section questions from the 2023, 2024, and 2025 CSAT (College Scholastic Ability Test). For the experiments, the original Korean questions were translated into English. The original questions are available at [link].
- The distractor selection rates for the CSAT English questions were obtained from an online education platform specializing in CSAT preparation; they are available at [link].
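As a quick orientation, the sketch below loads a Student Choice Dataset split and ranks each question's distractors by `d_scores`. Only `d_scores` is documented above, so the `question` and `distractors` field names are assumptions about the JSON layout; adjust them to the actual schema.

```python
import json

# Load the public split; the synthetic split should have the same shape.
with open("dataset/public/student_choice_dataset_public.json") as f:
    data = json.load(f)

# NOTE: "d_scores" comes from the README; "question" and "distractors"
# are assumed field names for illustration.
for item in data[:3]:
    ranked = sorted(
        zip(item["distractors"], item["d_scores"]),
        key=lambda pair: pair[1],
        reverse=True,  # higher d_score = more plausible distractor
    )
    print(item["question"])
    for distractor, score in ranked:
        print(f"  {score:.3f}  {distractor}")
```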
/pairwise_ranker
- `pr_sft.py`: SFT code for the Pairwise Ranker
- `pr_dpo.py`: DPO code for the Pairwise Ranker
- `pr_inference.py`: Inference code for the Pairwise Ranker
/distractor_generator
- `dg_sft.py`: SFT code for the Distractor Generator
- `dg_dpo.py`: DPO code for the Distractor Generator
- `dg_inference.py`: Inference code for the Distractor Generator
- `dg_evaluation_rank.py`: Evaluation code for the Distractor Generator
```bash
pip install -r requirements.txt
```

First, fine-tune the Pairwise Ranker to determine which of two distractors is more challenging to students:
```bash
python pr_sft.py
```

Merge the LoRA adapter into the base model:
```bash
python merge_adapter.py
```
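For reference, here is a minimal sketch of what an adapter merge typically looks like with the `peft` API; the model name and paths are placeholders, and `merge_adapter.py` may differ in its details:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholders: substitute the base model and the adapter saved by pr_sft.py.
base = AutoModelForCausalLM.from_pretrained("BASE_MODEL_NAME")
model = PeftModel.from_pretrained(base, "path/to/lora_adapter")

# Fold the LoRA weights into the base weights so the checkpoint can be
# loaded later without peft, then save the standalone model.
merged = model.merge_and_unload()
merged.save_pretrained("path/to/merged_model")
AutoTokenizer.from_pretrained("BASE_MODEL_NAME").save_pretrained("path/to/merged_model")
```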
If the model's reasoning is not yet fully optimized, apply DPO on the training dataset for further improvement:

```bash
python pr_dpo.py
```

The trained model is now ready for inference:
```bash
python pr_inference.py
```

The output JSON file is structured as follows:
- `question`: question
- `answer`: correct answer
- `A`: distractor A
- `B`: distractor B
- `review_ab`: reasoning results for AB input (list)
- `review_ba`: reasoning results for BA input (list)
- `choice`: the final choice made by the model (the one with the higher score)
- `true`: the distractor with the higher actual selection rate (used to check model accuracy)
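To measure ranking accuracy from this file, compare `choice` against `true` across all records. A minimal sketch, assuming the output is a JSON list of such records (the file name is a placeholder):

```python
import json

# File name is a placeholder for the output of pr_inference.py.
with open("pr_inference_output.json") as f:
    results = json.load(f)

# "choice" is the model's pick; "true" is the distractor with the
# higher actual student selection rate.
correct = sum(1 for r in results if r["choice"] == r["true"])
print(f"ranking accuracy: {correct / len(results):.3f} ({correct}/{len(results)})")
```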
For the Distractor Generator, first run SFT to train the model to generate distractors that follow the output format specified in the instruction:
```bash
python dg_sft.py
```

Merge the LoRA adapter into the base model:
```bash
python merge_adapter.py
```
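For intuition, this DPO stage trains on preference pairs in which the distractor judged more plausible by the pairwise ranker is "chosen" and the other "rejected". The sketch below shows a hypothetical record in the prompt/chosen/rejected layout used by trl's `DPOTrainer`; the field names match trl, but the prompt wording and example values are illustrative, not the actual schema of `dg_dpo_train.csv`:

```python
def to_dpo_record(question: str, answer: str, winner: str, loser: str) -> dict:
    # Prompt wording is illustrative; the repository's actual instruction
    # format may differ.
    prompt = (
        f"Question: {question}\n"
        f"Correct answer: {answer}\n"
        "Write one plausible distractor:"
    )
    return {"prompt": prompt, "chosen": winner, "rejected": loser}

# Example: the pairwise ranker judged "WHERE" more plausible than "ORDER BY".
record = to_dpo_record(
    question="Which SQL clause filters groups after aggregation?",
    answer="HAVING",
    winner="WHERE",
    loser="ORDER BY",
)
print(record)
```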
Following SFT, apply DPO to improve the model's ability to generate distractors that are more challenging for students:

```bash
python dg_dpo.py
```

Now, run inference with the trained model:
```bash
python dg_inference.py
```

The output JSON file is structured as follows:
- `question`: question
- `answer`: correct answer
- `options`: human-authored distractors
- `types`: type of distractor (Correct/Incorrect knowledge)
- `distractors`: model-generated distractors
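To assemble a complete MCQ from one record, combine the correct answer with the model-generated distractors. A sketch assuming the output is a JSON list with the fields above (the file name and option shuffling are illustrative):

```python
import json
import random

# File name is a placeholder for the output of dg_inference.py.
with open("dg_inference_output.json") as f:
    items = json.load(f)

item = items[0]
# Combine the correct answer with the model-generated distractors,
# then shuffle so the answer position is random.
options = [item["answer"]] + item["distractors"]
random.shuffle(options)

print(item["question"])
for label, option in zip("ABCDE", options):
    print(f"  {label}. {option}")
print("Answer:", item["answer"])
```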
Compare the plausibility of distractors generated by the baseline and our model using the previously trained pairwise ranker:

```bash
python dg_evaluation_rank.py
```

The pairwise ranker is loaded for inference and generates the following output:
- `question`: question
- `answer`: correct answer
- `d_list`: list of distractors
- `d_reasoning`: reasoning results from the pairwise ranker
- `d_scores`: scores from each model
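To summarize the comparison, count how often each generator's distractor receives the higher score. This sketch assumes `d_list` and `d_scores` are parallel lists with the baseline's distractor at index 0 and ours at index 1; verify the actual ordering in `dg_evaluation_rank.py` before relying on it:

```python
import json

# File name is a placeholder for the output of dg_evaluation_rank.py.
with open("dg_evaluation_output.json") as f:
    results = json.load(f)

# ASSUMPTION: d_scores is parallel to d_list, with the baseline's
# distractor at index 0 and our model's at index 1.
wins = sum(1 for r in results if r["d_scores"][1] > r["d_scores"][0])
print(f"our model's distractor ranked higher in {wins}/{len(results)} questions")
```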
Yooseop Lee, Suin Kim, and Yohan Jo. 2025. Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23669–23692, Vienna, Austria. Association for Computational Linguistics.
```bibtex
@inproceedings{lee-etal-2025-generating-plausible,
title = "Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction",
author = "Lee, Yooseop and
Kim, Suin and
Jo, Yohan",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.1154/",
pages = "23669--23692",
ISBN = "979-8-89176-251-0",
abstract = "In designing multiple-choice questions (MCQs) in education, creating plausible distractors is crucial for identifying students' misconceptions and gaps in knowledge and accurately assessing their understanding. However, prior studies on distractor generation have not paid sufficient attention to enhancing the difficulty of distractors, resulting in reduced effectiveness of MCQs. This study presents a pipeline for training a model to generate distractors that are more likely to be selected by students. First, we train a pairwise ranker to reason about students' misconceptions and assess the relative plausibility of two distractors. Using this model, we create a dataset of pairwise distractor ranks and then train a distractor generator via Direct Preference Optimization (DPO) to generate more plausible distractors. Experiments on computer science subjects (Python, DB, MLDL) demonstrate that our pairwise ranker effectively identifies students' potential misunderstandings and achieves ranking accuracy comparable to human experts. Furthermore, our distractor generator outperforms several baselines in generating plausible distractors and produces questions with a higher item discrimination index (DI)."
}
```
