Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration
This repository provides the official PyTorch implementation of our paper:
"Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration"
Jucheng Shen, Gaurav Sarkar, Yeonju Ro, Sharath Nittur Sridhar, Zhangyang Wang, Aditya Akella, Souvik Kundu
Findings of ACL 2026 | OpenReview | arXiv
- Apr 2026: Our paper has been accepted to Findings of ACL 2026!
CadLLM is a training‑free, plug‑and‑play controller that improves the inference throughput of masked diffusion language models (dLLMs) by adapting decoding policies based on lightweight confidence signals produced by the model itself. Across GSM8K, MATH, MBPP and HumanEval, CadLLM delivers up to 2.28× throughput over strong Fast‑dLLM baselines while maintaining competitive accuracy.
# Python 3.10+ recommended
pip install -r requirements.txtYou will also need access to LLaDA and DREAM model weights. You should not need to worry about downloading them manually as huggingface will automatically download the model when you run the scripts. However, if any issue arises, you can go to their github repo for more detailed download instructions.
See eval.md in llada/ and dream/ for specific instructions.
If you find this repository useful, please consider citing:
@inproceedings{shen2026improvingthroughputdiffusionbasedlarge,
title={Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration},
author={Jucheng Shen and Gaurav Sarkar and Yeonju Ro and Sharath Nittur Sridhar and Zhangyang Wang and Aditya Akella and Souvik Kundu},
year={2026},
booktitle={Findings of the Association for Computational Linguistics: ACL 2026},
note={Preprint available on arXiv:2512.07173},
url={https://arxiv.org/abs/2512.07173},
}