We include comprehensive evaluation code for:
- ✅ d3LLM (our method)
- ✅ AR models (e.g., Qwen-2.5-7B-it) - Autoregressive baselines
- ✅ Vanilla LLaDA - Original LLaDA model
- ✅ Vanilla Dream - Original Dream model
- ✅ Fast-dLLM - Training-free acceleration with KV cache
- ✅ D2F - Discrete diffusion forcing
- ✅ dParallel - Distilled dLLMs
- ✅ Fast-dLLM v2 - Block-wise diffusion
# GSM8K
bash dream_gsm8k_cot.sh
bash llada_gsm8k_cot.sh
# MATH
bash dream_math.sh
bash llada_math.sh
# Code Generation (HumanEval & MBPP)
bash dream_humaneval.sh
bash dream_mbpp.sh
bash llada_humaneval.sh
bash llada_mbpp.sh
bash dream-coder.sh
# Long-Context GSM8K
bash dream_long_gsm8k.sh
bash llada_long_gsm8k.sh
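
To run several of these scripts in one go, a small wrapper along the following lines may be convenient. This is only a sketch: `run_evals` and `LOG_DIR` are hypothetical names that are not part of the repository, and it assumes the scripts sit in the current directory and need no arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repository): run a list of
# evaluation scripts one after another and keep a separate log for each.
LOG_DIR="${LOG_DIR:-eval_logs}"   # assumed log location; override as needed

run_evals() {
  local failed=0 script log
  mkdir -p "$LOG_DIR"
  for script in "$@"; do
    if [ ! -f "$script" ]; then
      echo "skip: $script (not found)"
      continue
    fi
    log="$LOG_DIR/$(basename "$script" .sh).log"
    echo "run:  $script"
    # Continue with the remaining benchmarks if one fails,
    # but remember the failure for the final exit status.
    if ! bash "$script" > "$log" 2>&1; then
      echo "fail: $script (see $log)"
      failed=1
    fi
  done
  return "$failed"
}

# Example: the GSM8K and MATH scripts listed above.
# run_evals dream_gsm8k_cot.sh llada_gsm8k_cot.sh dream_math.sh llada_math.sh
```

The function returns a non-zero status if any script failed, so it can be chained with `&&` in a larger pipeline.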