Refactor communication logic of DeepSeek for extensibility and understandability #6321
Conversation
This reverts commit 6c5c726.
context=self._compute_context(forward_batch),
)

def forward_layer_end(
Suggestion: rename this to forward_post_ffn.
wondering why that is better than layer_end
Indeed, it does the job of handling layer-end transformations (e.g., making the output shape correct for the last layer). For non-last layers it does not cooperate with pre_mlp, but it does cooperate with the next layer's pre_attn. So a verbose name (don't use it; it's only for illustration) would be forward_layer_end_transformation_or_cooperate_with_next_pre_attn.
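To make the naming discussion concrete, here is a minimal, self-contained sketch (hypothetical class and field names, not the actual sglang implementation) of the asymmetry described above: for the last layer, forward_layer_end performs the output-shape transformation; for intermediate layers it simply leaves the tensors in the layout that the next layer's pre_attn step consumes.

from dataclasses import dataclass

@dataclass
class _SketchStates:
    hidden_states: list
    residual: list

class _SketchLayerCommunicator:
    # Hypothetical stand-in used only to illustrate the naming discussion.
    def __init__(self, is_last_layer: bool):
        self.is_last_layer = is_last_layer

    def forward_layer_end(self, states: _SketchStates) -> _SketchStates:
        if self.is_last_layer:
            # Layer-end transformation: produce the final output shape
            # (merging the residual here is just a toy stand-in).
            merged = [h + r for h, r in zip(states.hidden_states, states.residual)]
            return _SketchStates(hidden_states=merged, residual=[0.0] * len(merged))
        # Intermediate layers: nothing to transform here; the next layer's
        # pre_attn is the step that consumes and redistributes these tensors.
        return states

if __name__ == "__main__":
    s = _SketchStates(hidden_states=[1.0, 2.0], residual=[0.5, 0.5])
    print(_SketchLayerCommunicator(is_last_layer=False).forward_layer_end(s))
    print(_SketchLayerCommunicator(is_last_layer=True).forward_layer_end(s))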
- self._enable_moe_dense_fully_dp()
- and (not self.info.is_sparse)
+ enable_moe_dense_fully_dp()
+ and (not self.is_layer_sparse)
The check enable_moe_dense_fully_dp() and (not self.is_layer_sparse) should be handled by the communicator. We can simplify this statement to:
if hidden_states.shape[0] > 0 or self.layer_communicator.require_ffn_sync():
I am a bit confused about this, why is it related to communicators?
I originally wanted to move it inside DeepseekMLP with logic like "if tp_size == 1 and the hidden states are empty, then skip the computation". -> It has now been moved here to make the code clearer (though it does not fit the PR title, and I originally wanted to handle it separately).
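For illustration only, here is a hedged sketch of the reviewer's suggestion above. require_ffn_sync is the name proposed in this thread, not an existing sglang API, and the helper's internals are assumptions:

import torch

class _SketchCommunicator:
    # Hypothetical helper: hides the "dense layer running fully data-parallel"
    # detail behind a single question the model code can ask.
    def __init__(self, moe_dense_fully_dp: bool, is_layer_sparse: bool):
        self._moe_dense_fully_dp = moe_dense_fully_dp
        self._is_layer_sparse = is_layer_sparse

    def require_ffn_sync(self) -> bool:
        # A dense (non-sparse) layer in fully-DP mode can skip the FFN when
        # its local batch is empty; every other configuration must run it.
        return not (self._moe_dense_fully_dp and not self._is_layer_sparse)

def forward_ffn_maybe_skip(hidden_states: torch.Tensor, comm: _SketchCommunicator) -> torch.Tensor:
    if hidden_states.shape[0] > 0 or comm.require_ffn_sync():
        return hidden_states * 2  # stand-in for the real MLP forward
    return hidden_states  # empty local batch and no sync required: skip

if __name__ == "__main__":
    comm = _SketchCommunicator(moe_dense_fully_dp=True, is_layer_sparse=False)
    print(forward_ffn_maybe_skip(torch.empty(0, 8), comm).shape)  # skipped
    print(forward_ffn_maybe_skip(torch.ones(2, 8), comm).shape)   # computed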
self.is_layer_sparse = self._is_layer_sparse(layer_id, is_nextn=is_nextn)
is_previous_layer_sparse = self._is_layer_sparse(layer_id - 1, is_nextn=False)

self.layer_scatter_modes = LayerScatterModes.init_new(
It can be handled internally in LayerCommunicator. With this change, LayerScatterModes becomes a private class of communicator.py.
I keep it exposed to allow LayerScatterModes to be used in the future by modules other than LayerCommunicator, since LayerScatterModes tells us facts about what the layer input / attn / mlp / layer output shapes look like.
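To illustrate the trade-off (all names below are illustrative, not the actual sglang classes): keeping a public description of the per-stage scatter modes lets other modules consult those shape facts without going through LayerCommunicator.

from dataclasses import dataclass
from enum import Enum, auto

class ScatterMode(Enum):
    SCATTERED = auto()
    FULL = auto()

@dataclass(frozen=True)
class SketchLayerScatterModes:
    # Facts about the tensor layout at each stage of the layer.
    layer_input: ScatterMode
    attn_output: ScatterMode
    mlp_output: ScatterMode
    layer_output: ScatterMode

class SketchLayerCommunicator:
    # One consumer: decides which collectives to issue based on the modes.
    def __init__(self, modes: SketchLayerScatterModes):
        self.modes = modes

class SketchShapeLogger:
    # Another (hypothetical) consumer that only needs the shape facts,
    # which is only possible if the modes class stays public.
    def describe(self, modes: SketchLayerScatterModes) -> str:
        return f"layer input={modes.layer_input.name}, layer output={modes.layer_output.name}"

if __name__ == "__main__":
    modes = SketchLayerScatterModes(
        layer_input=ScatterMode.SCATTERED,
        attn_output=ScatterMode.FULL,
        mlp_output=ScatterMode.FULL,
        layer_output=ScatterMode.SCATTERED,
    )
    SketchLayerCommunicator(modes)
    print(SketchShapeLogger().describe(modes))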
# Conflicts: # python/sglang/srt/models/deepseek_v2.py
Motivation
Make the code clean, less error-prone, and extensible.
Modifications
Checklist