📄 Paper: Dream 7B: Diffusion Large Language Models | 💻 Code: github.com/DreamLM/Dream
Resources and examples for training (finetuning & pretraining) and evaluating the Dream family of diffusion language models.
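Dream generates text by iterative parallel denoising rather than left-to-right decoding: every position starts as a mask token, and each step fills in the positions the model is most confident about (the `alg=entropy` strategy used in the evaluation examples below). The following toy, self-contained sketch illustrates that loop with a random stand-in for the model; `toy_model` and the other names are illustrative only and are not part of the dllm API:

```python
import math
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"

def toy_model(tokens):
    """Stand-in for the denoiser: returns a probability distribution
    over VOCAB for every currently masked position."""
    dists = {}
    for i, t in enumerate(tokens):
        if t == MASK:
            weights = [random.random() for _ in VOCAB]
            total = sum(weights)
            dists[i] = [w / total for w in weights]
    return dists

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def diffusion_decode(length=8, steps=4):
    tokens = [MASK] * length
    per_step = length // steps
    for _ in range(steps):
        dists = toy_model(tokens)
        # Unmask the positions whose predicted distribution has the
        # lowest entropy, i.e. where the "model" is most confident.
        order = sorted(dists, key=lambda i: entropy(dists[i]))
        for i in order[:per_step]:
            p = dists[i]
            tokens[i] = VOCAB[p.index(max(p))]
    return tokens

print(diffusion_decode())
```

The real model ranks positions with learned token distributions, but the control flow — predict all masked positions, commit the most confident ones, repeat — is the same shape.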
```
# Pipeline modules relevant to Dream
dllm/pipelines/dream
├── __init__.py                # Package initialization
├── models/
│   ├── __init__.py
│   ├── configuration_dream.py # Dream model configuration
│   ├── generation_utils.py    # Diffusion-based generation logic
│   ├── modeling_dream.py      # Core Dream model architecture
│   └── tokenization_dream.py  # Tokenizer implementation for Dream
├── eval.py                    # Evaluation module
├── sampler.py                 # Inference module
├── trainer.py                 # Training module (pretraining and SFT)
└── utils.py                   # Auxiliary utilities and helper functions
```
```
# Example entry points for training / inference / evaluation
examples/dream
├── chat.py   # Interactive inference example
├── eval.sh   # Automatic evaluation example
├── sample.py # Inference example
├── pt.py     # Pretraining example
├── README.md # Documentation (you are here)
└── sft.py    # Supervised finetuning example
```
Read *Useful tips for training* and *(optional) Slurm setup* before training.
For example, to SFT Dream-v0-Base-7B on the alpaca dataset for instruction following on 8 GPUs, run:
```shell
accelerate launch \
    --config_file scripts/accelerate_configs/fsdp.yaml \
    examples/dream/sft.py \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "tatsu-lab/alpaca" \
    --max_length 1024 \
    --num_train_epochs 5 \
    --learning_rate 2e-5 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --output_dir ".models/Dream-v0-Base-7B/alpaca"
```

If you are using Slurm and want to train across, for example, 2 nodes (16 GPUs total), run:
```shell
sbatch --nodes=2 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/sft.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "tatsu-lab/alpaca" \
    --max_length 1024 \
    --num_train_epochs 5 \
    --learning_rate 2e-5 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --output_dir ".models/Dream-v0-Base-7B/alpaca"
```

## Reproducing Dream-v0-Instruct-7B with SFT
We attempt to reproduce Dream-v0-Instruct-7B by finetuning Dream-v0-Base-7B with SFT on the allenai/tulu-3-sft-mixture dataset:
```shell
# Preprocess the SFT data (optional, but avoids redundant preprocessing in multi-node training)
python dllm/tools/preprocess_sft_dataset.py \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --sft_map_fn_path "dllm.utils.default_sft_map_fn" \
    --dataset_args "allenai/tulu-3-sft-mixture" \
    --output_dir ".data/sft/dream/tulu-3-sft-mixture" \
    --num_proc 64
```
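The `--sft_map_fn_path` flag points at the function applied to each raw record; its actual behavior is defined in `dllm.utils.default_sft_map_fn`. Conceptually, such a map function reshapes a chat-format record (Tulu-style `{"role", "content"}` messages) into the fields the trainer consumes, roughly like this illustrative sketch (the function below is hypothetical, not the real implementation):

```python
def sft_map_fn(example):
    """Illustrative sketch only: split a list of chat messages into a
    prompt (everything before the assistant turn) and a response.
    The real logic lives in dllm.utils.default_sft_map_fn."""
    messages = example["messages"]
    prompt = [m for m in messages if m["role"] != "assistant"]
    response = [m for m in messages if m["role"] == "assistant"]
    return {"prompt": prompt, "response": response}

example = {"messages": [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]}
print(sft_map_fn(example))
```

Running the preprocessing once and pointing `--dataset_args` at the cached output (as in the next command) keeps every node from redoing this map.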
```shell
# Train on 24*8=192 A100s with FSDP; takes about 8 hours
sbatch --nodes=24 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/sft.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args ".data/sft/dream/tulu-3-sft-mixture" \
    --load_preprocessed_data True \
    --max_length 1024 \
    --num_train_epochs 5 \
    --learning_rate 2e-5 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --output_dir ".models/Dream-v0-Base-7B/tulu-3-sft-mixture"
```

## Pretraining

Pretrain on mlfoundations/dclm-baseline-1.0 from scratch using 192 GPUs (24x8) and FSDP:
```shell
sbatch --nodes=24 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/pt.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "mlfoundations/dclm-baseline-1.0" \
    --max_length 1024 \
    --max_steps 2000 \
    --learning_rate 1e-4 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --output_dir ".models/Dream-v0-Base-7B/dclm-baseline-1.0"
```

## Inference

We support batch inference for standard sampling and infilling:
```shell
python examples/dream/sample.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B"
```

We also support interactive multi-turn dialogue with visualization:
```shell
python examples/dream/chat.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B"
```

## Evaluation

Read *(optional) Evaluation setup* before running evaluation.
For example, to evaluate Dream-v0-Instruct-7B on gsm8k using 4 GPUs, run:
```shell
# Use model_args to adjust the sampler arguments for evaluation.
accelerate launch --num_processes 4 \
    dllm/pipelines/dream/eval.py \
    --tasks "gsm8k_cot" \
    --model "dream" \
    --apply_chat_template \
    --num_fewshot 0 \
    --model_args "pretrained=Dream-org/Dream-v0-Instruct-7B,max_new_tokens=256,steps=256,temperature=0.1,top_p=0.9,alg=entropy,dtype=bfloat16,add_bos_token=False"
```

To automatically evaluate Dream-v0-Base-7B and Dream-v0-Instruct-7B on all benchmarks, run:
```shell
bash examples/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Instruct-7B" --instruct True
bash examples/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Base-7B" --instruct False
```

For Fast-dLLM sampling and evaluation with Dream, see the Fast-dLLM README.
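The `--model_args` string above follows the lm-evaluation-harness convention of comma-separated `key=value` pairs. A minimal sketch of how such a string maps to keyword arguments (the harness's own parser is more thorough, but the idea is the same):

```python
def parse_model_args(s):
    """Parse a comma-separated key=value string into a dict,
    coercing ints, floats, and booleans where possible."""
    def coerce(v):
        if v in ("True", "False"):
            return v == "True"
        for cast in (int, float):
            try:
                return cast(v)
            except ValueError:
                pass
        return v

    out = {}
    for pair in s.split(","):
        key, _, value = pair.partition("=")
        out[key.strip()] = coerce(value.strip())
    return out

args = parse_model_args(
    "pretrained=Dream-org/Dream-v0-Instruct-7B,max_new_tokens=256,"
    "temperature=0.1,alg=entropy,add_bos_token=False"
)
print(args)
```

So `max_new_tokens=256` arrives as an int, `temperature=0.1` as a float, and `add_bos_token=False` as a bool, while `pretrained` and `alg` stay strings.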
## Results

Results marked (Reproduced) were evaluated with our framework, while results marked (Official) come from the original paper. All evaluation settings follow the configurations in the Dream repository, with minor adjustments.
| | MMLU | ARC‑C | ARC‑E | Hellaswag | WinoGrande | PIQA | GSM8K | Math | BBH | GPQA | HumanEval | MBPP | RACE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dream-v0-Base-7B (Official) | 69.5 | 59.9 | 83.9 | 73.3 | 74.8 | 75.8 | 77.2 | 39.6 | 57.9 | 36.6 | 57.9 | 56.2 | 44.7 |
| Dream-v0-Base-7B (Reproduced) | 70.0 | 59.0 | 83.8 | 73.5 | 72.5 | 76.4 | 77.0 | 42.4 | 63.7 | 34.6 | 56.7 | 56.0 | 45.6 |

*Table 1. Evaluation results of Dream-v0-Base-7B.*
| | MMLU | MMLU‑Pro | ARC‑C | Hellaswag | GSM8K | Math | GPQA | HumanEval | MBPP | IFEval |
|---|---|---|---|---|---|---|---|---|---|---|
| Dream-v0-Instruct-7B (Official) | 67.0 | 43.3 | — | — | 81.0 | 39.2 | 33.0 | 55.5 | 58.8 | 62.5 |
| Dream-v0-Instruct-7B (Reproduced) | 69.8 | 45.5 | 61.4 | 71.8 | 82.0 | 48.6 | 31.5 | 57.9 | 58.2 | 59.7 |

*Table 2. Evaluation results of Dream-v0-Instruct-7B.*

— indicates that the metric is not reported in the official paper.