
Huvu/reve optim #102

Closed

huvunvidia wants to merge 141 commits into main from huvu/reve_optim

Conversation

@huvunvidia
Contributor

No description provided.

chtruong814 and others added 30 commits June 13, 2025 14:12
Signed-off-by: Charlie Truong <[email protected]>
Signed-off-by: Ethan He <[email protected]>
Signed-off-by: Ethan He <[email protected]>
The sparse_attention directory was incorrectly configured as a submodule
without a corresponding .gitmodules file, causing CI/CD checkout failures.
This commit converts it to a regular directory with tracked files.

Signed-off-by: Ethan He <[email protected]>
cp
Signed-off-by: Ethan He <[email protected]>
add diffusion; physical AI projects
Signed-off-by: oliver könig <[email protected]>
…efault-templates

Delete .github/ISSUE_TEMPLATE directory
chore: Update cherry-pick workflow to use v0.63.0
Initial commit related to dfm repo structure

Very nice structure; merging this will let the other MR landings build on it.
linnanwang and others added 21 commits November 20, 2025 16:39
* update

Signed-off-by: linnan wang <[email protected]>

* update

Signed-off-by: linnan wang <[email protected]>

* update

Signed-off-by: linnan wang <[email protected]>

* update

Signed-off-by: linnan wang <[email protected]>

* update

Signed-off-by: linnan wang <[email protected]>

---------

Signed-off-by: linnan wang <[email protected]>
* first commit

* workable code

* workable thd

* clean up, remove all CP for sbhd, CP now is only for thd

* run outside of Mbridge

* Update example scripts and add new data module for multimodal datasets

- Added comments to clarify file purposes in example_commands.sh, inference_wan.py, pretrain_wan.py, wan_provider.py, wan_step.py, and wan.py.
- Introduced EnergonMultiModalDataModule for handling multimodal datasets in nemo_vfm.
- Created SequentialMegatronSampler for efficient sequential sampling in large datasets.
- Added new files for DIT attention and base data handling.

This commit enhances documentation and introduces new functionalities for better data management and processing.
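The SequentialMegatronSampler mentioned above draws contiguous index batches in order, which avoids random-access overhead on very large datasets. A minimal standalone sketch of that idea (hypothetical class name and signature; the real sampler in the repo integrates with Megatron's dataloader machinery and may differ):

```python
# Hypothetical sketch of sequential batch sampling; not the repo's implementation.
from typing import Iterator, List


class SequentialBatchSampler:
    """Yield contiguous index batches in order, optionally keeping the remainder."""

    def __init__(self, total_samples: int, batch_size: int, drop_last: bool = True):
        self.total_samples = total_samples
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self) -> Iterator[List[int]]:
        batch: List[int] = []
        for idx in range(self.total_samples):
            batch.append(idx)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch and not self.drop_last:
            yield batch  # emit the final short batch only when requested
```

Sequential sampling trades shuffling for locality, which is why it suits streaming-style multimodal datasets.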

* workable code before refactoring

* refactor attention submodules + reorder files locations

* update refactor

* update refactor

* reorganize files

* reorganize files

* refactoring code

* add README for perf test

* using vae, t5, scheduler from Diffusers

* update repo, remove Wan's GitHub modules

* fix Ruff

* fix ruff + copyright

* fix Ruff + Lint

* fix Ruff + Lint

* fix Ruff + Lint

* fix Ruff + Lint

* fix Ruff + Lint

* fix Ruff + Lint

* fix Ruff + Lint

* fix Ruff + Lint

* merged main + address comments

* remove example_commands.md, Google waits until mid Nov

* refactor inference_configs + mockdatamodule

* add dit_embeddings.py

* fix lint ruff

* add 'average_gradients_across_tp_domain' to torch.nn for when running sequence_parallelism
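The rationale for the commit above: with sequence parallelism, each tensor-parallel rank processes a different sequence shard, so gradients of parameters that are replicated across the TP domain drift apart unless they are averaged (an all-reduce divided by the TP size). A toy illustration of that averaging with plain lists standing in for per-rank gradient tensors (hypothetical function; the repo's version operates on `torch.nn` parameters and a real TP process group):

```python
# Illustrative only: simulates averaging replicated-parameter gradients
# across a tensor-parallel domain, as an all-reduce / tp_size would.
def average_gradients_across_tp_domain(per_rank_grads):
    """per_rank_grads: one equal-length gradient list per TP rank."""
    tp_size = len(per_rank_grads)
    averaged = [sum(g) / tp_size for g in zip(*per_rank_grads)]
    # Every rank receives the same averaged gradient, mirroring an all-reduce.
    return [list(averaged) for _ in range(tp_size)]
```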

* add english negative prompt

* fix ruff lint

* Update uv.lock for deps: diffusers==0.35.1, easydict, imageio

* update dfm/src/megatron/data/dit

* change english negative prompt

* seq_packing now appears to work

* refactor with Sajad's PR - DiT data to common dir

* fix Ruff, lint

* fix Ruff, lint

* fix Ruff, lint

* workable mock datamodule (no path setting needed); updated training algorithm + hyperparameters to align with Linnan; tested training with anime-dataset finetuning

* bring wan_task encoders features to common, sharing with dit

* lint, ruff

* lint, ruff

* lint, ruff

* fix CP error (input of thd_split_inputs_cp to be cu_seqlens_q_padded instead of cu_seqlens_q)
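The CP fix above matters because, in the packed thd layout, `cu_seqlens_q` marks cumulative boundaries of the actual sequences while the padded variant marks boundaries after each sequence is padded so its length divides evenly across context-parallel ranks; splitting on the unpadded offsets misaligns every rank's chunk. A simplified sketch of per-rank splitting on padded boundaries (hypothetical helper; real Megatron CP uses a load-balanced two-chunk split per rank):

```python
# Hedged illustration of splitting a packed (thd) batch for context parallelism
# using *padded* cumulative sequence lengths. Not the repo's thd_split_inputs_cp.
def split_packed_for_cp(tokens, cu_seqlens_padded, cp_size, rank):
    """Return this CP rank's contiguous chunk of every padded sequence."""
    chunks = []
    for start, end in zip(cu_seqlens_padded[:-1], cu_seqlens_padded[1:]):
        seq = tokens[start:end]
        per_rank = len(seq) // cp_size  # padded length divides evenly by cp_size
        chunks.extend(seq[rank * per_rank:(rank + 1) * per_rank])
    return chunks
```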

* update README_perf_test.md

* fix lint, ruff

* update uv.lock, merge main

* uv.lock

* uv.lock

* uv.lock

* update uv.lock [using ci]

* Performance improvements to Wan

* Perf optimizations

* Tiny fix

* Remove CP disable as packed sequences not supported

* Fix comment

* Minor fixes. Revert video_latent comparison

* Fix missed check

* Lint fix

* H100 mock pretraining perf config

* Rename config file

* Lint check

Signed-off-by: Parth Mannan <[email protected]>

* Adding GB200 perf config

Signed-off-by: Parth Mannan <[email protected]>

* GB300 perf config

Signed-off-by: Parth Mannan <[email protected]>

* Refactor Energon data module to return wrapped dataloaders and add EnergonDataloader class for cyclic iteration. Introduce WAN pretrain mock data configuration for testing.

* Enhance DiffusionTaskEncoder to handle None attributes in stacking and concatenation methods. Add WAN pretrain mock data configuration for testing purposes.

* Refactor data processing in dit_data_step to simplify batch retrieval and update WAN pretrain configuration to include train_iters.

* Add op fusions

Signed-off-by: Parth Mannan <[email protected]>

* Update H100 config

Signed-off-by: Parth Mannan <[email protected]>

* Fix lint

Signed-off-by: Parth Mannan <[email protected]>

* Resolve conflict

Signed-off-by: Parth Mannan <[email protected]>

* Fix for mock dataloader test

Signed-off-by: Parth Mannan <[email protected]>

* Fix Dummyiter

Signed-off-by: Parth Mannan <[email protected]>

* Fix test

Signed-off-by: Parth Mannan <[email protected]>

* Make RoPE test only GPU

Signed-off-by: Parth Mannan <[email protected]>

* Rope cuda fix

Signed-off-by: Parth Mannan <[email protected]>

---------

Signed-off-by: Parth Mannan <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Abhinav Garg <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Signed-off-by: linnan wang <[email protected]>
* add docs

* updated README for Wan

* update README wan

* relocate readme

---------

Co-authored-by: Huy Vu2 <[email protected]>
* Add DiT Readme.

Signed-off-by: sajadn <[email protected]>

* Update DiT readme.

Signed-off-by: Sajad Norouzi <[email protected]>

* Minor wording update.

Signed-off-by: Sajad Norouzi <[email protected]>

---------

Signed-off-by: sajadn <[email protected]>
Signed-off-by: Sajad Norouzi <[email protected]>
* initial commit, workable code

* add example

* fix lint

* fix lint

* bring all wan related codes to DFM

* add tests

* lint

---------

Co-authored-by: Huy Vu2 <[email protected]>
* Initial README commit

* Update README and add performance summary documentation

- Corrected the link in the README for the performance summary to point to the correct file.
- Introduced a new `performance-summary.md` document detailing performance benchmarks for large language models using DFM, including nomenclature, performance metrics, and system configurations.

* add DiT megatron links.

Signed-off-by: sajadn <[email protected]>

* Performance Docs update

Signed-off-by: Parth Mannan <[email protected]>

* Performance Docs update fix

Signed-off-by: Parth Mannan <[email protected]>

* Update README to enhance clarity and accuracy

- Removed redundant description of the framework.
- Clarified the relationship between Megatron Bridge and Megatron Core in the Dual-Path Architecture section.

* Enhance README with detailed performance optimizations and parallelism descriptions

- Updated the Megatron Bridge Path section to include 6D parallelism details.
- Added state-of-the-art performance optimizations to the Dual Training Paths section.
- Clarified parallelism terminology in the comparison table for better understanding.

* Update perf doc

Signed-off-by: Parth Mannan <[email protected]>

* update

Signed-off-by: linnan wang <[email protected]>

* Update README with fine-tuning command

Removed TODO comment and added a command for fine-tuning a video diffusion model.

* Apply suggestion from @akoumpa

* Apply suggestion from @akoumpa

* Apply suggestion from @akoumpa

* Update README, Wan-related.

Updated command syntax and improved clarity in README.

* Apply suggestion from @akoumpa

* Fixing typo @akoumpa

* fix automodel section

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* update DFM-specific readme

Signed-off-by: Pablo Garay <[email protected]>

* Update performance-summary.md

Thanks a lot @linnanwang for the bench numbers.

* Update performance-summary.md

* Update performance-summary.md

* Update README.md

Co-authored-by: Wenwen Gao <[email protected]>

* Update README.md

Co-authored-by: Wenwen Gao <[email protected]>

* Update README.md

Co-authored-by: Wenwen Gao <[email protected]>

* Update README.md

Co-authored-by: Wenwen Gao <[email protected]>

* Refactor README.md and performance-summary.md for clarity and conciseness

- Simplified descriptions of Megatron Bridge and AutoModel paths in README.md.
- Removed outdated comparison table to streamline content.
- Updated performance-summary.md to generalize model references and improve clarity.

Co-authored-by: Wenwen Gao <[email protected]>

* Fix typo in README.md: changed "Built" to "Build" in the container section header for consistency.

---------

Signed-off-by: sajadn <[email protected]>
Signed-off-by: Parth Mannan <[email protected]>
Signed-off-by: linnan wang <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Pablo Garay <[email protected]>
Co-authored-by: sajadn <[email protected]>
Co-authored-by: Parth Mannan <[email protected]>
Co-authored-by: linnan wang <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Huy Vu <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Wenwen Gao <[email protected]>
* report for public version

* fix image size

* Update report.md for Wan 2.1 convergence comparison, correcting formatting and ensuring clarity in experiment overview and caveats regarding training loss fluctuations between Diffusers and Megatron-Core implementations.

---------

Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Abhinav Garg <[email protected]>
- Introduced a new document detailing the comparison between Diffusers (Automodel path) and Megatron-Core (Megatron-Bridge path) for Wan 2.1.
- Included experiment overview, dataset specifications, training setup, and results with visual training curves.
- Added two binary images illustrating loss vs. steps for both text-to-image and text-to-video stages.

This documentation aims to provide insights into the model's performance and training dynamics during the partial convergence test.
* edm and data preprocess tests.

Signed-off-by: sajadn <[email protected]>

* Minor cleanings for DiT.

Signed-off-by: Sajad Norouzi <[email protected]>

* add dit unit test.

Signed-off-by: Sajad Norouzi <[email protected]>

* add iter to the DiffusionDataModule.

Signed-off-by: sajadn <[email protected]>

* add missing copyright.

Signed-off-by: sajadn <[email protected]>

* use 'no caption' if caption is not present.

Signed-off-by: sajadn <[email protected]>

* fix dit inference bug. Add wandb to inference code.

Signed-off-by: sajadn <[email protected]>

* update the DiT configs to be aligned with the original paper.

Signed-off-by: sajadn <[email protected]>

* add wandb[video] and mediapy to uv.

Signed-off-by: sajadn <[email protected]>

* adjust pos_ids in mock_dataset to have batch dimension, fuse adaLN layers, use DiTSelfAttention.

Signed-off-by: sajadn <[email protected]>
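For context on the fused adaLN layers mentioned above: a DiT block conditions its normalized activations with an adaptive layer-norm shift and scale, `x * (1 + scale) + shift`. A bare-bones standalone form of that modulation (hypothetical helper; the fused version in the repo folds the conditioning projections together for speed):

```python
# Sketch of adaLN modulation as used in DiT-style blocks; illustrative only.
def ada_ln_modulate(normed_x, shift, scale):
    """Apply adaptive layer-norm conditioning: x * (1 + scale) + shift."""
    return [h * (1.0 + s) + b for h, s, b in zip(normed_x, scale, shift)]
```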

* fix the diffusion sample size bug.

Signed-off-by: sajadn <[email protected]>

* fix broken tests.

Signed-off-by: sajadn <[email protected]>

---------

Signed-off-by: sajadn <[email protected]>
Signed-off-by: Sajad Norouzi <[email protected]>
Co-authored-by: Abhinav Garg <[email protected]>
- Add pre-flight job to detect docs-only changes using FW-CI-templates
- Skip cicd-wait-in-queue, unit tests, and e2e tests when docs_only is true
- Skip copyright-check when docs_only is true
- Skip build-test-publish-wheel when docs_only is true
- Linting and ruff checks remain enabled for all PRs

Signed-off-by: Pablo Garay <[email protected]>
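The gating described above hinges on classifying a change set as documentation-only. A hypothetical helper showing the shape of that check (the actual detection is done by the FW-CI-templates pre-flight job in the workflow, and its doc patterns may differ):

```python
# Hypothetical docs-only classifier; the real check lives in FW-CI-templates.
DOC_SUFFIXES = (".md", ".rst", ".txt")
DOC_DIRS = ("docs/",)


def is_docs_only(changed_files):
    """True if every changed file is documentation, so heavy CI jobs can be skipped."""
    return bool(changed_files) and all(
        f.startswith(DOC_DIRS) or f.endswith(DOC_SUFFIXES) for f in changed_files
    )
```

Linting and copyright-style checks can still run unconditionally, as the commit notes for lint/ruff.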
* initial commit

* update import EnergonMultiModalDataModule

* update submodule Megatron-Bridge

* Update uv.lock [skip ci]

* update uv.lock

* small update

* small update

---------

Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Adding support for hunyuan finetuning

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Ensuring that activation checkpointing is gated with a flag

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Make the flow matching pipeline logic model agnostic

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Adding copyright to dataset processing file

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Linting fixes

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Fix linting

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* lintfix

Signed-off-by: Pablo Garay <[email protected]>

* lintfix

Signed-off-by: Pablo Garay <[email protected]>

* Update automodel dependencies

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Remove unused import

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Setting the minimum diffusers package version

Signed-off-by: Pranav Prashant Thombre <[email protected]>

---------

Signed-off-by: Pranav Prashant Thombre <[email protected]>
Signed-off-by: Pablo Garay <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
* workable prepare_dataset_wan.py: tested to match automodel's preprocess_resize.py; encode-decode verified

* fix lint

* change location of prepare_dataset_wan.py

---------

Co-authored-by: Huy Vu2 <[email protected]>
* Add more comprehensive testing for the automodel path

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Adding unit tests for the flow matching pipeline

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Adding functional test for Wan

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Fixing linting errors

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Linting fixes

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Increase test timeout

Signed-off-by: Pranav Prashant Thombre <[email protected]>

* Remove flash attention3 as the default attention backend during training

Signed-off-by: Pranav Prashant Thombre <[email protected]>

---------

Signed-off-by: Pranav Prashant Thombre <[email protected]>
Added instructions for converting HuggingFace checkpoints to Megatron format and vice versa, including necessary commands and notes on exported checkpoints.
@copy-pr-bot

copy-pr-bot bot commented Feb 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


Labels: none yet

Projects: none yet

10 participants