This project provides tools for emulating and visualizing pipeline parallelism strategies used in large language model training.
Try it online! This tool is deployed and accessible on Hugging Face Spaces:
🔗 https://huggingface.co/spaces/Victarry/PP-schedule-visualizer
No installation required - just visit the link and start exploring pipeline parallelism scheduling strategies directly in your browser!
Pipeline parallelism is a technique used to train large models by partitioning the model across multiple devices and processing data in a pipelined fashion. This project allows you to:
- Simulate different pipeline parallelism strategies (1F1B, Interleaved, Zero-Bubble, etc.)
- Visualize the execution schedule on multiple devices
- Compare different strategies for efficiency
Supported Pipeline Strategies:
- 1F1B (One-Forward-One-Backward)
- Interleaved 1F1B
- Zero-Bubble 1F1B (ZB-1P)
- 1F1B with computation-communication overlap
- Interleaved 1F1B with computation-communication overlap
- DualPipe (Bidirectional pipeline parallelism with full forward-backward overlap)
Visualization:
- Interactive visualization dashboard using Plotly/Dash
Configuration:
- Configurable simulation parameters through Hydra
- Customizable stage latency and communication costs
This project uses uv for dependency management.
Set up uv if it is not already installed on your computer:
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
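Once uv is installed, the uv run commands below take care of creating the project environment and installing dependencies on first use. If you prefer to set up the environment explicitly beforehand, you can also run:
uv sync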
To visualize schedules interactively:
uv run app.py
This will start a Dash server (usually on http://127.0.0.1:8050/). Open this URL in your web browser.
You can then adjust parameters like the number of devices, stages, batches, operation times, and select different scheduling strategies to see the resulting pipeline visualization.
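The dashboard visualizes the execution schedule across devices as a timeline of forward/backward blocks. As a rough illustration of that kind of rendering (a standalone sketch with made-up timings, not the project's visualizer.py), a per-device timeline can be drawn with Plotly like so:
# Illustrative sketch: render a toy 2-stage, 1-microbatch schedule as
# per-device blocks on a time axis. Timings are invented for the example.
import plotly.graph_objects as go

blocks = [  # (device, label, start time, duration)
    ("Device 0", "F0", 0.0, 1.0),
    ("Device 1", "F0", 1.0, 1.0),
    ("Device 1", "B0", 2.0, 2.0),
    ("Device 0", "B0", 4.0, 2.0),
]

fig = go.Figure()
for device, label, start, duration in blocks:
    # Horizontal bars with an explicit base behave like Gantt-style blocks.
    fig.add_bar(x=[duration], y=[device], base=[start], orientation="h",
                text=label, showlegend=False)
fig.update_layout(barmode="overlay", xaxis_title="time",
                  title="Toy pipeline timeline (illustrative)")
fig.show()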
To run a specific scheduling strategy from the command line:
1F1B:
uv run python main.py strategy=1f1b num_devices=4 num_stages=4 num_batches=8
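For intuition, here is a minimal, self-contained sketch of the classic 1F1B per-stage ordering (an illustration only, not the project's implementation in src/strategies.py): a warmup of forwards, a steady state alternating one forward with one backward, then a cooldown of the remaining backwards.
def one_f_one_b_order(stage: int, num_stages: int, num_microbatches: int):
    """Classic 1F1B order of (op, microbatch) pairs for a single stage."""
    warmup = min(num_stages - stage - 1, num_microbatches)
    order = [("F", mb) for mb in range(warmup)]  # warmup: forwards only
    fwd, bwd = warmup, 0
    while fwd < num_microbatches:                # steady state: 1F then 1B
        order.append(("F", fwd))
        order.append(("B", bwd))
        fwd += 1
        bwd += 1
    while bwd < num_microbatches:                # cooldown: remaining backwards
        order.append(("B", bwd))
        bwd += 1
    return order

# Stage 0 of a 4-stage pipeline with 8 microbatches:
print(one_f_one_b_order(stage=0, num_stages=4, num_microbatches=8))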
Interleaved 1F1B:
uv run python main.py strategy=interleave num_devices=4 num_stages=8 num_batches=8
For the interleaved strategy, you can optionally set microbatch_group_size_per_vp_stage.
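For example (the value of 4 below is purely illustrative):
uv run python main.py strategy=interleave num_devices=4 num_stages=8 num_batches=8 microbatch_group_size_per_vp_stage=4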
Zero-Bubble 1F1B (ZB-1P):
uv run python main.py strategy=zb1p num_devices=4 num_stages=4 num_batches=8
DualPipe:
uv run python main.py strategy=dualpipe num_devices=8 num_stages=8 num_batches=20
DualPipe-V:
uv run python main.py strategy=dualpipe_v num_devices=4 num_stages=8 num_batches=10
1F1B with computation-communication overlap:
uv run python main.py strategy=1f1b_overlap num_devices=4 num_stages=4 num_batches=8
Interleaved 1F1B with computation-communication overlap:
uv run python main.py strategy=1f1b_interleave_overlap num_devices=4 num_stages=8 num_batches=8
The default configuration is in conf/config.yaml. You can override any parameter on the command line or create configuration groups for different scenarios.
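The exact contents depend on the project version, but judging from the command-line overrides used throughout this README, the structure is roughly as follows (values are placeholders, not the shipped defaults):
strategy: 1f1b
num_devices: 4
num_stages: 4
num_batches: 8
op_times:
  forward: 1.0
  backward: 2.0
  # Strategies such as zb1p and dualpipe also read backward_D, backward_W,
  # and overlapped_forward_backward (see the DualPipe example below).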
You can override specific parameters at runtime:
uv run python main.py op_times.forward=0.5 op_times.backward=1.0 num_batches=6
Using DualPipe as an example, you can manually set different times for forward, backward, backward_D (input-gradient backward), backward_W (weight-gradient backward), and overlapped_forward_backward:
uv run python main.py strategy=dualpipe num_devices=8 num_stages=8 num_batches=32 op_times.forward=1.0 op_times.backward=2.0 op_times.backward_D=1.0 op_times.backward_W=1.0 op_times.overlapped_forward_backward=2.5
You can use different configuration files with Hydra:
- Create multiple configuration files in the conf directory for different use cases:
conf/
├── config.yaml   # Default configuration
└── model_A.yaml  # Your own config with stage-specific latency for performance projection
- Run with your desired configuration using the --config-name flag:
uv run python main.py --config-name=model_A
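For reference, the model_A.yaml above could look roughly like this (a sketch only; mirror the actual keys in conf/config.yaml, and follow its schema for any per-stage latency fields, which are not reproduced here):
strategy: 1f1b
num_devices: 8
num_stages: 8
num_batches: 64
op_times:
  forward: 1.2   # measured forward time for model A (placeholder value)
  backward: 2.4  # measured backward time (placeholder value)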
PP-Emulation/
├── conf/ # Hydra configuration files
│ └── config.yaml # Default configuration
├── src/ # Source code
│ ├── __init__.py # Package initialization
│ ├── execution_model.py # Schedule execution models
│ ├── strategies.py # Pipeline parallelism strategies
│ └── visualizer.py # Visualization utilities
├── app.py # Interactive visualization dashboard (Dash)
├── main.py # Main entry point
├── pyproject.toml # Project metadata and dependencies
└── README.md # This file
- PipeDream: Fast and Efficient Pipeline Parallel DNN Training. arxiv
- Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM. arxiv
- Zero Bubble Pipeline Parallelism. arxiv
- Communication-Computation Overlap in MoE Training with 1F1B Pipeline Parallelism. blog
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.