Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
Jihoon Lee1, Hoyeon Moon1, Kevin Zhai5, Arun Kumar Chithanar, Anit Kumar Sahu2, Soummya Kar3, Chul Lee, Souradip Chakraborty4, Amrit Singh Bedi5
1Yonsei University, 2Oracle, 3CMU, 4UMD, 5UCF
Diffusion-based large language models (dLLMs) are trained flexibly to model extreme dependence in the data distribution; however, how to best utilize this information at inference time remains an open problem. In this work, we uncover an interesting property of these models: dLLMs trained on textual data implicitly learn a mixture of semi-autoregressive experts, where different generation orders reveal different specialized behaviors. We show that committing to any single, fixed inference-time schedule, a common practice, collapses performance by failing to leverage this latent ensemble. To address this, we introduce HEX (Hidden semi-autoregressive EXperts for test-time scaling), a training-free inference method that ensembles across heterogeneous block schedules. By taking a majority vote over generation paths with diverse block sizes, HEX robustly avoids the failure modes associated with any single fixed schedule. On reasoning benchmarks such as GSM8K, it boosts accuracy by up to 3.56× (from 24.72% to 88.10%), outperforming top-K margin inference and specialized fine-tuned methods such as GRPO, without additional training. HEX also yields significant gains on the MATH benchmark (from 16.40% to 40.00%), on scientific reasoning with ARC-C (from 54.18% to 87.80%), and on TruthfulQA (from 28.36% to 57.46%). Our results establish a new paradigm for test-time scaling in dLLMs, revealing that the sequence in which masking is performed plays a critical role in determining performance during inference.
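To make the role of the block schedule concrete, below is a minimal, hypothetical sketch (not the released implementation) of semi-autoregressive block decoding in a masked-diffusion LLM: the block size fixes which positions can be unmasked at each stage, and therefore the generation order. The `fill_masks` callable, the `MASK` placeholder, and the confidence-based reveal rule are illustrative assumptions, not names from this repository.

```python
from typing import Callable, List, Tuple

MASK = -1  # illustrative placeholder id for a masked position

def semi_ar_decode(
    fill_masks: Callable[[List[int]], List[Tuple[int, float]]],
    prompt: List[int],
    gen_len: int,
    block_size: int,
    steps_per_block: int = 4,
) -> List[int]:
    """Decode `gen_len` tokens left-to-right in blocks of `block_size`.

    Within the active block, masked positions are revealed over several
    denoising steps, most confident predictions first; positions outside
    the active block stay masked until their block is reached.
    """
    seq = prompt + [MASK] * gen_len
    for start in range(len(prompt), len(seq), block_size):
        end = min(start + block_size, len(seq))
        for step in range(steps_per_block):
            masked = [i for i in range(start, end) if seq[i] == MASK]
            if not masked:
                break
            # One denoising call: a (token, confidence) proposal per position.
            proposals = fill_masks(seq)
            masked.sort(key=lambda i: proposals[i][1], reverse=True)
            # Ceiling division so every position is revealed by the last step.
            reveal = -(-len(masked) // (steps_per_block - step))
            for i in masked[:reveal]:
                seq[i] = proposals[i][0]
    return seq
```

Under this view, each choice of `block_size` traces a different unmasking order through the same model, which is what the paper interprets as selecting a different hidden semi-autoregressive expert.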
- ✨ Hidden Semi-Autoregressive Experts: Reveals that diffusion LLMs implicitly learn multiple semi-AR experts, each specializing in distinct generation orders.
- 🚀 Training-Free Test-Time Scaling: Ensembles diverse block-sized decoding schedules at inference to unlock latent reasoning capabilities without retraining (a minimal sketch follows below).
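The following is an equally hypothetical sketch of the ensembling step described above: decode once per block schedule and majority-vote the parsed answers. The `decode` and `extract_answer` callables and the default block sizes are assumptions for illustration; see run_eval_HEX.sh for the actual entry point.

```python
from collections import Counter
from typing import Callable, Iterable, List

def hex_majority_vote(
    decode: Callable[[int], List[int]],           # block_size -> decoded token ids
    extract_answer: Callable[[List[int]], str],   # token ids -> parsed final answer
    block_sizes: Iterable[int] = (4, 8, 16, 32, 64),
) -> str:
    """Run one semi-autoregressive decode per block schedule and return the
    answer that the largest number of schedules agrees on."""
    answers = [extract_answer(decode(b)) for b in block_sizes]
    return Counter(answers).most_common(1)[0][0]
```

Because each block size induces a different unmasking order, the vote aggregates over the model's hidden semi-autoregressive experts rather than over repeated samples from a single fixed schedule.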
```bash
# Clone the repository
git clone https://github.com/junos-ai-org/Test-Time-Scaling
cd Test-Time-Scaling/HEX
```
```bash
# Create a virtual environment
conda env create -f env.yml
conda activate dllm_tts
```

Inside the HEX/eval directory, review the arguments described in run_eval_HEX.sh and run the script accordingly.

```bash
cd eval
bash run_eval_HEX.sh
```

Repository structure:

```
Test-Time-Scaling/
└── HEX/   # HEX source code
```
If you find this work useful, please cite our paper:
```bibtex
@article{lee2025hex,
  title={Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts},
  author={Lee, Jihoon and Moon, Hoyeon and Zhai, Kevin and Chithanar, Arun Kumar and Sahu, Anit Kumar and Kar, Soummya and Lee, Chul and Chakraborty, Souradip and Bedi, Amrit Singh},
  journal={Under Submission},
  year={2025}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
Most of the code in /HEX/ is based on d1.
For questions or issues, please open an issue on GitHub.

