Commit 750ef28

no message (#146)
1 parent 01f4540 commit 750ef28

1 file changed, +2 -2 lines changed

blog/2025-05-05-large-scale-ep.md

Lines changed: 2 additions & 2 deletions
@@ -243,7 +243,7 @@ We evaluated the end-to-end performance of different configurations of SGLang us
 
 - **SGLang with TP16 x 6**: Every two nodes are paired with an independent group, running DeepSeek-V3 inference with a TP size of 16 and DP attention.
 - **SGLang with PD Disaggregation**: This version incorporates PD disaggregation and full EP optimization. For the EPLB, we adopt a distribution matching the input/output data, as real-time serving statistics are unavailable.
-- **SGLang with PD Disaggregation and simulated MTP**: To simulate MTP’s effects, we firstly double the batch size and halve the Key-Value KV cache length to maintain the same workload for GroupedGeMM computation and memory access. Moreover, we insert dummy kernels after the real attention computation to ensure the attention phase takes the same time as in DeepSeek’s profile, accurately reflecting the slowdown caused by MTP’s attention mechanism. We conservatively assume a 60% acceptance rate under MTP.
+- **SGLang with PD Disaggregation and simulated MTP**: To simulate MTP’s effects, we firstly double the batch size and halve the Key-Value KV cache length to maintain the same workload for GroupedGeMM computation and memory access. Moreover, we insert dummy kernels after the real attention computation to ensure the attention phase takes the same time as in DeepSeek’s profile, accurately reflecting the slowdown caused by MTP’s attention mechanism. We conservatively assume a 70% acceptance rate under MTP.
 - **DeepSeek Profile Results**: Throughput estimates are derived from [DeepSeek’s official profiling data](https://github.com/deepseek-ai/profile-data).
 
 ##### Performance Analysis of Prefill and Decode Phases
@@ -277,7 +277,7 @@ For decode, the results are shown below:
 
 | | DeepSeek Blog | DeepSeek Profile | SGLang (Default) | SGLang + Simulated MTP (Slow Attention) |
 | --------------------- | ------------- | ---------------- | ---------------- | --------------------------------------- |
-| Batch Size | 128 | 128 | 256 | 128 |
+| Batch Size | N/A | 128 | 256 | 128 |
 | KV Cache Length | 4,989 | 4,096 | 2,000 | 4,000 |
 | Number of Nodes | 18 | 16 | 9 | 9 |
 | Throughput (per node) | 14,800 | 18,598 | 22,282 | 17,373 |
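
Reviewer note: the first hunk changes the assumed MTP acceptance rate from 60% to 70%, while the simulated-MTP throughput in the table is left unchanged. The diff does not show how the acceptance rate enters the throughput estimate. As a rough sanity check only, the sketch below uses a common speculative-decoding model that is an assumption here and not taken from the post: each decode step emits one guaranteed token plus draft tokens, each accepted independently with the given probability, stopping at the first rejection. The function name and the default of one draft token are hypothetical.

```python
def expected_tokens_per_step(acceptance_rate: float, draft_tokens: int = 1) -> float:
    """Expected tokens emitted per decode step under a simple model.

    Assumption (not from the post): one token is always emitted, and draft
    token k is kept only if all k drafts so far were accepted, each with
    independent probability `acceptance_rate`.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if draft_tokens < 0:
        raise ValueError("draft_tokens must be non-negative")
    # 1 guaranteed token + sum of p^k for k = 1..draft_tokens
    return 1.0 + sum(acceptance_rate ** k for k in range(1, draft_tokens + 1))


old_rate = expected_tokens_per_step(0.60)  # value before this commit
new_rate = expected_tokens_per_step(0.70)  # value after this commit
relative_gain = new_rate / old_rate - 1.0

print(f"tokens/step at 60%: {old_rate:.2f}")
print(f"tokens/step at 70%: {new_rate:.2f}")
print(f"relative change:    {relative_gain:.2%}")
```

Under this assumed model, the change moves the expected tokens per step from about 1.6 to about 1.7. Whether the reported 17,373 figure should move with it depends on how the post actually derives that number, which this diff does not show.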
