📄 Paper: Dream 7B: Diffusion Large Language Models | 💻 Code: github.com/DreamLM/Dream
Resources and examples for training (finetuning & pretraining) and evaluating the Dream family of diffusion language models.
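Dream generates text by iterative parallel denoising rather than left-to-right decoding: every position starts as a mask token, and each step fills in the positions the model is most confident about (the `alg=entropy` strategy used in the evaluation examples below). The following toy, self-contained sketch illustrates that loop with a random stand-in for the model; `toy_model` and the other names are illustrative only and are not part of the dllm API:

```python
import math
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"

def toy_model(tokens):
    """Stand-in for the denoiser: returns a probability distribution
    over VOCAB for every currently masked position."""
    dists = {}
    for i, t in enumerate(tokens):
        if t == MASK:
            weights = [random.random() for _ in VOCAB]
            total = sum(weights)
            dists[i] = [w / total for w in weights]
    return dists

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def diffusion_decode(length=8, steps=4):
    tokens = [MASK] * length
    per_step = length // steps
    for _ in range(steps):
        dists = toy_model(tokens)
        # Unmask the positions whose predicted distribution has the
        # lowest entropy, i.e. where the "model" is most confident.
        order = sorted(dists, key=lambda i: entropy(dists[i]))
        for i in order[:per_step]:
            p = dists[i]
            tokens[i] = VOCAB[p.index(max(p))]
    return tokens

print(diffusion_decode())
```

The real model ranks positions with learned token distributions, but the control flow — predict all masked positions, commit the most confident ones, repeat — is the same shape.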
```
# Pipeline modules relevant to Dream
dllm/pipelines/dream
├── __init__.py                # Package initialization
├── models/
│   ├── __init__.py
│   ├── configuration_dream.py # Dream model configuration
│   ├── generation_utils.py    # Diffusion-based generation logic
│   ├── modeling_dream.py      # Core Dream model architecture
│   └── tokenization_dream.py  # Tokenizer implementation for Dream
├── eval.py                    # Evaluation module
├── sampler.py                 # Inference module
├── trainer.py                 # Training module (pretraining and SFT)
└── utils.py                   # Auxiliary utilities and helper functions
```
```
# Example entry points for training / inference / evaluation
examples/dream
├── chat.py   # Interactive inference example
├── eval.sh   # Automatic evaluation example
├── sample.py # Inference example
├── pt.py     # Pretraining example
├── README.md # Documentation (you are here)
└── sft.py    # Supervised finetuning example
```
Read *Useful tips for training* and *(optional) Slurm setup* before training.
For example, to SFT Dream-v0-Base-7B on the alpaca dataset for instruction following on 8 GPUs, run:
```shell
accelerate launch \
    --config_file scripts/accelerate_configs/fsdp.yaml \
    examples/dream/sft.py \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "tatsu-lab/alpaca" \
    --max_length 1024 \
    --num_train_epochs 5 \
    --learning_rate 2e-5 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --output_dir ".models/Dream-v0-Base-7B/alpaca"
```

If you are using Slurm and want to train across, for example, 2 nodes (16 GPUs total), run:
```shell
sbatch --nodes=2 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/sft.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "tatsu-lab/alpaca" \
    --max_length 1024 \
    --num_train_epochs 5 \
    --learning_rate 2e-5 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --output_dir ".models/Dream-v0-Base-7B/alpaca"
```

## Reproducing Dream-v0-Instruct-7B with SFT
We attempt to reproduce Dream-v0-Instruct-7B by finetuning Dream-v0-Base-7B with SFT on the allenai/tulu-3-sft-mixture dataset:
```shell
# Preprocess the SFT data (optional, but avoids redundant preprocessing in multi-node training)
python dllm/tools/preprocess_sft_dataset.py \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --sft_map_fn_path "dllm.utils.default_sft_map_fn" \
    --dataset_args "allenai/tulu-3-sft-mixture" \
    --output_dir ".data/sft/dream/tulu-3-sft-mixture" \
    --num_proc 64
```
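The `--sft_map_fn_path` flag points at the function applied to each raw record; its actual behavior is defined in `dllm.utils.default_sft_map_fn`. Conceptually, such a map function reshapes a chat-format record (Tulu-style `{"role", "content"}` messages) into the fields the trainer consumes, roughly like this illustrative sketch (the function below is hypothetical, not the real implementation):

```python
def sft_map_fn(example):
    """Illustrative sketch only: split a list of chat messages into a
    prompt (everything before the assistant turn) and a response.
    The real logic lives in dllm.utils.default_sft_map_fn."""
    messages = example["messages"]
    prompt = [m for m in messages if m["role"] != "assistant"]
    response = [m for m in messages if m["role"] == "assistant"]
    return {"prompt": prompt, "response": response}

example = {"messages": [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]}
print(sft_map_fn(example))
```

Running the preprocessing once and pointing `--dataset_args` at the cached output (as in the next command) keeps every node from redoing this map.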
```shell
# Train on 24*8=192 A100s with FSDP; takes about 8 hours
sbatch --nodes=24 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/sft.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args ".data/sft/dream/tulu-3-sft-mixture" \
    --load_preprocessed_data True \
    --max_length 1024 \
    --num_train_epochs 5 \
    --learning_rate 2e-5 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --output_dir ".models/Dream-v0-Base-7B/tulu-3-sft-mixture"
```

## Pretraining

Pretrain on mlfoundations/dclm-baseline-1.0 from scratch using 192 GPUs (24x8) and FSDP:
```shell
sbatch --nodes=24 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/pt.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "mlfoundations/dclm-baseline-1.0" \
    --max_length 1024 \
    --max_steps 2000 \
    --learning_rate 1e-4 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --output_dir ".models/Dream-v0-Base-7B/dclm-baseline-1.0"
```

## Inference

We support batch inference for standard sampling and infilling:
```shell
python examples/dream/sample.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B"
```

We also support interactive multi-turn dialogue with visualization:
```shell
python examples/dream/chat.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B"
```

## Evaluation

Read *(optional) Evaluation setup* before running evaluation.
For example, to evaluate Dream-v0-Instruct-7B on gsm8k using 4 GPUs, run:
```shell
# Use model_args to adjust the sampler arguments for evaluation.
accelerate launch --num_processes 4 \
    dllm/pipelines/dream/eval.py \
    --tasks "gsm8k_cot" \
    --model "dream" \
    --apply_chat_template \
    --num_fewshot 0 \
    --model_args "pretrained=Dream-org/Dream-v0-Instruct-7B,max_new_tokens=256,steps=256,temperature=0.1,top_p=0.9,alg=entropy,dtype=bfloat16,add_bos_token=False"
```

To automatically evaluate Dream-v0-Base-7B and Dream-v0-Instruct-7B on all benchmarks, run:
```shell
bash examples/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Instruct-7B" --instruct True
bash examples/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Base-7B" --instruct False
```

For Fast-dLLM sampling and evaluation with Dream, see the Fast-dLLM README.
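The `--model_args` string above follows the lm-evaluation-harness convention of comma-separated `key=value` pairs. A minimal sketch of how such a string maps to keyword arguments (the harness's own parser is more thorough, but the idea is the same):

```python
def parse_model_args(s):
    """Parse a comma-separated key=value string into a dict,
    coercing ints, floats, and booleans where possible."""
    def coerce(v):
        if v in ("True", "False"):
            return v == "True"
        for cast in (int, float):
            try:
                return cast(v)
            except ValueError:
                pass
        return v

    out = {}
    for pair in s.split(","):
        key, _, value = pair.partition("=")
        out[key.strip()] = coerce(value.strip())
    return out

args = parse_model_args(
    "pretrained=Dream-org/Dream-v0-Instruct-7B,max_new_tokens=256,"
    "temperature=0.1,alg=entropy,add_bos_token=False"
)
print(args)
```

So `max_new_tokens=256` arrives as an int, `temperature=0.1` as a float, and `add_bos_token=False` as a bool, while `pretrained` and `alg` stay strings.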
## Results

Results marked (Reproduced) were evaluated with our framework, while results marked (Official) come from the original paper. All evaluation settings follow the configurations in the Dream repository, with minor adjustments.
| | MMLU | ARC‑C | ARC‑E | Hellaswag | WinoGrande | PIQA | GSM8K | Math | BBH | GPQA | HumanEval | MBPP | RACE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dream-v0-Base-7B (Official) | 69.5 | 59.9 | 83.9 | 73.3 | 74.8 | 75.8 | 77.2 | 39.6 | 57.9 | 36.6 | 57.9 | 56.2 | 44.7 |
| Dream-v0-Base-7B (Reproduced) | 70.0 | 59.0 | 83.8 | 73.5 | 72.5 | 76.4 | 77.0 | 42.4 | 63.7 | 34.6 | 56.7 | 56.0 | 45.6 |

*Table 1. Evaluation results of Dream-v0-Base-7B.*
| | MMLU | MMLU‑Pro | ARC‑C | Hellaswag | GSM8K | Math | GPQA | HumanEval | MBPP | IFEval |
|---|---|---|---|---|---|---|---|---|---|---|
| Dream-v0-Instruct-7B (Official) | 67.0 | 43.3 | — | — | 81.0 | 39.2 | 33.0 | 55.5 | 58.8 | 62.5 |
| Dream-v0-Instruct-7B (Reproduced) | 69.8 | 45.5 | 61.4 | 71.8 | 82.0 | 48.6 | 31.5 | 57.9 | 58.2 | 59.7 |

*Table 2. Evaluation results of Dream-v0-Instruct-7B.*

— indicates that the metric is not reported in the official paper.