feat: add DualPipe bidirectional pipeline schedule #1157
Open
lishuangyuly wants to merge 1 commit into flagos-ai:main from
FlagScale lacked support for the DualPipe bidirectional pipeline schedule introduced in DeepSeek-V3, which overlaps forward and backward computation across both pipeline directions to reduce the pipeline bubble ratio relative to standard 1F1B.
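For a rough sense of the difference, the bubble formulas reported in the DeepSeek-V3 technical report can be evaluated for sample timings. The sketch below is illustrative only and is not part of this PR: the function names and the timing values are hypothetical. `f`, `b` and `w` denote the time of a forward chunk, a full backward chunk and a weight-gradient backward chunk, and `f_and_b` the time of a mutually overlapped forward and backward chunk.

```python
def bubble_1f1b(pp: int, f: float, b: float) -> float:
    """Pipeline bubble time of standard 1F1B: (PP - 1) * (F + B)."""
    return (pp - 1) * (f + b)


def bubble_dualpipe(pp: int, f_and_b: float, b: float, w: float) -> float:
    """Pipeline bubble time of DualPipe: (PP/2 - 1) * (F&B + B - 3W)."""
    if pp % 2 != 0:
        # The formula divides the stage count by two, so an even PP is assumed.
        raise ValueError("DualPipe bubble formula assumes an even number of stages")
    return (pp // 2 - 1) * (f_and_b + b - 3 * w)


if __name__ == "__main__":
    # Hypothetical timings in arbitrary units, chosen only for illustration.
    pp, f, b, w, f_and_b = 8, 1.0, 2.0, 1.0, 3.0
    print("1F1B bubble:    ", bubble_1f1b(pp, f, b))
    print("DualPipe bubble:", bubble_dualpipe(pp, f_and_b, b, w))
```

The actual saving depends on measured values of these timings, which this sketch does not attempt to model.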
New file: `dualpipe_schedule.py`

Implements the 8-step DualPipe algorithm as a drop-in replacement for Megatron's `get_forward_backward_func()` output:

- `WeightGradStore` – defers weight-gradient computation to dedicated "W" steps, enabling zero-bubble scheduling
- `_split_data_iterator` – pre-buffers and splits a single data iterator into two halves (one per pipeline direction) without upstream data-pipeline changes
- `forward_backward_dualpipe()` – the schedule itself; same keyword interface as Megatron's built-in schedules

Model building (`training.py`)

When `--use-dualpipe` is active, each rank is allocated two model chunks:

- `chunk[0]` at pipeline position `pp_rank` (forward direction, rank 0 → N-1)
- `chunk[1]` at pipeline position `N-1-pp_rank` (mirror direction, rank N-1 → 0)

A new `_fs_get_forward_backward_func()` helper selects the DualPipe schedule or falls back to Megatron's standard selector; behaviour is unchanged when the flag is absent.

Configuration flag (`arguments_fs.py`)

`--use-dualpipe` with validation:

- `pipeline_model_parallel_size` > 1
- `--num-layers-per-virtual-pipeline-stage`) and `--use-dualpipev`
- `--untie-embeddings-and-output-weights`
- `num_microbatches` ≥ `pipeline_model_parallel_size` × 2

Usage
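Before the usage itself, the chunk placement and the two numeric validation rules described above can be sketched as follows. This is a minimal illustration, not the code in `training.py` or `arguments_fs.py`: the function names `dualpipe_chunk_positions` and `check_dualpipe_sizes` are hypothetical, and the flag-related validation items are left out.

```python
from typing import Tuple


def dualpipe_chunk_positions(pp_rank: int, pp_size: int) -> Tuple[int, int]:
    """Return the pipeline positions of (chunk[0], chunk[1]) held by one rank.

    chunk[0] sits at position pp_rank (forward direction, rank 0 -> N-1);
    chunk[1] sits at the mirrored position N-1-pp_rank (rank N-1 -> 0).
    """
    if not 0 <= pp_rank < pp_size:
        raise ValueError(f"pp_rank {pp_rank} is outside [0, {pp_size})")
    return pp_rank, pp_size - 1 - pp_rank


def check_dualpipe_sizes(pp_size: int, num_microbatches: int) -> None:
    """Hypothetical check covering only the two numeric rules from the list."""
    if pp_size <= 1:
        raise ValueError("DualPipe needs pipeline_model_parallel_size > 1")
    if num_microbatches < pp_size * 2:
        raise ValueError(
            "DualPipe needs num_microbatches >= pipeline_model_parallel_size * 2"
        )


if __name__ == "__main__":
    pp_size = 4
    check_dualpipe_sizes(pp_size, num_microbatches=8)
    for rank in range(pp_size):
        # Each rank holds one chunk from each end of the pipeline.
        print(rank, dualpipe_chunk_positions(rank, pp_size))
```

With four pipeline ranks, rank 0 holds positions 0 and 3 and rank 3 holds positions 3 and 0, so the first and last pipeline stages share a rank.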