Merged
Changes from 2 commits
2 changes: 1 addition & 1 deletion docs/performance-summary.md
@@ -53,7 +53,7 @@ The performance data includes:

| Model | #-GPUs | GBS | MBS | Sequence Length | FSDP | TP | SP | PP | CP | VP | EP | Model TFLOP / sec / GPU |
|-------|--------|-----|-----|-----------------|------|----|----|----|----|----|----|-------------------------|
-|Wan 2.1 14B|32|64|1|37440|0|1|0|1|2|0|0|1,022.26|
+|Wan 2.1 14B|32|64|1|37440|0|1|0|1|2|0|0|1,030.67|
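For reference, the figure in the last column can be recomputed from a measured iteration time (a minimal sketch; the `flops_per_step` and timing values below are illustrative assumptions, not measurements behind this table):

```python
def model_tflops_per_sec_per_gpu(flops_per_step: float,
                                 step_time_s: float,
                                 num_gpus: int) -> float:
    """Model TFLOP/sec/GPU = model FLOPs per training step
    divided by (iteration time * number of GPUs), in units of 1e12."""
    return flops_per_step / (step_time_s * num_gpus) / 1e12

# Illustrative only: 1.0e18 model FLOPs per step, 30 s/iteration, 32 GPUs.
print(round(model_tflops_per_sec_per_gpu(1.0e18, 30.0, 32), 2))  # 1041.67
```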

#### System: DGX-H100

16 changes: 11 additions & 5 deletions examples/megatron/recipes/wan/README_perf_test.md
@@ -5,7 +5,7 @@ This guide provides concise steps to set up the environment and run WAN pretrain
## Container Launch

```bash
-CONT="nvcr.io/nvidia/nemo:25.09.00"
+CONT="nvcr.io/nvidia/nemo:25.11"
MOUNT="/lustre/fsw/:/lustre/fsw/"

srun -t 02:00:00 \
@@ -28,18 +28,18 @@ cd /opt/

# DFM (pinned)
git clone --no-checkout https://github.com/NVIDIA-NeMo/DFM.git
-git -C DFM checkout 174bb7b34de002ebbbcae1ba8e2b12363c7dee01
+git -C DFM checkout 9eaace14995a724c982fe53726a909be2edc93cb
export DFM_PATH=/opt/DFM

# Megatron-Bridge (pinned)
rm -rf /opt/Megatron-Bridge
-git clone --no-checkout https://github.com/huvunvidia/Megatron-Bridge.git
-git -C Megatron-Bridge checkout 713ab548e4bfee307eb94a7bb3f57c17dbb31b50
+git clone --no-checkout https://github.com/NVIDIA-NeMo/Megatron-Bridge.git
+git -C Megatron-Bridge checkout 953aabf75c0500180dc14a6a76cf9e7e7c4baec7

# Megatron-LM (pinned)
rm -rf /opt/Megatron-LM
git clone --no-checkout https://github.com/NVIDIA/Megatron-LM.git
-git -C Megatron-LM checkout ce8185cbbe04f38beb74360e878450f2e8525885
+git -C Megatron-LM checkout 2d398b42fd4237fffb553109563d73ac099751c3

# Python path
export PYTHONPATH="${DFM_PATH}/.:/opt/Megatron-Bridge/.:/opt/Megatron-LM"
@@ -143,6 +143,12 @@ NVTE_FUSED_ATTN=1 torchrun --nproc_per_node=8 examples/megatron/recipes/wan/pret
- Use the `--mock` argument.
- Adjust `video_size` (F_latents, H_latents, W_latents) and `number_packed_samples` in `WanMockDataModuleConfig` in `wan.py`. The total sequence length is `seq_len = F * H * W * number_packed_samples`.
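As a sanity check, the packed sequence length can be computed directly (a sketch; the latent dimensions and packing factor below are illustrative, not the recipe's defaults):

```python
def packed_seq_len(f_latents: int, h_latents: int, w_latents: int,
                   number_packed_samples: int) -> int:
    """Total seq_len = F * H * W * number_packed_samples."""
    return f_latents * h_latents * w_latents * number_packed_samples

# Hypothetical video_size of (21, 45, 80) latents, packing 2 samples:
print(packed_seq_len(21, 45, 80, 2))  # 151200
```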

### Reproducing performance recipes

- Use the system config recipe that matches your hardware in `examples/megatron/recipes/wan/conf/<h100/gb200/gb300>_perf_pretrain_mock.yaml`.
- Example usage: `examples/megatron/recipes/wan/pretrain_wan.py --mock --training-mode pretrain --config-file examples/megatron/recipes/wan/conf/gb300_perf_pretrain_mock.yaml`
- Note that the FLOPs calculation for Wan 2.1 is not currently supported in Megatron-Bridge; use a manual calculator until a fix lands.
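Until Megatron-Bridge adds a Wan 2.1 FLOPs formula, a rough first-order estimate for a dense transformer is ~6 FLOPs per parameter per token for a combined forward and backward pass. The sketch below applies that generic rule with illustrative values; it ignores the attention quadratic term and any Wan-specific layers, so treat the result as a ballpark only:

```python
def approx_train_flops_per_step(num_params: float, tokens_per_step: float) -> float:
    """Generic dense-transformer estimate: ~6 * params * tokens per
    training step (forward + backward), ignoring attention's quadratic
    term and model-specific layers."""
    return 6.0 * num_params * tokens_per_step

# Illustrative: ~14e9 params, global batch 64, seq_len 37440 tokens.
tokens_per_step = 64 * 37440
print(f"{approx_train_flops_per_step(14e9, tokens_per_step):.3e}")  # 2.013e+17
```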
Contributor:
Can we share some of our pre-computed FLOPs here, as a reference point for users? It's not trivial calculating the correct FLOPs number.

Contributor:
I think that might help as well.


## Inference

```bash
```