[Performance, Hardware] MoE weights padding to AMD MI300x GPUs #1836
Conversation
Please fix the CI.

@merrymercy Fixed the CI just now. Thanks!
@@ -572,6 +588,18 @@ def process_weights_after_loading(self, layer: Module) -> None:
                start += shard_size

            layer.w13_scale = torch.nn.Parameter(max_w13_scales, requires_grad=False)
            # If ROCm, apply weight padding (min. Mem channel contention) only if set
            if is_hip() and bool(int(os.getenv("MOE_PADDING", "0"))):
Move all `is_hip` checks under a single branch, e.g., L555.
@merrymercy Understood. The order of data crunching makes me prefer to keep the dummy padding as the very last step, to avoid error-prone interleaving with `normalize_`, `_dequantize`, `_fp8_quant`, etc., and to keep the code easier to read.
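For context, a minimal sketch of what such a last-step padding could look like; the pad width, the helper name, and the `F.pad`-then-slice pattern are illustrative assumptions, not necessarily the exact code in this PR:

```python
import os

import torch
import torch.nn.functional as F

# Hypothetical pad width; the value used by the PR may differ.
MOE_PADDING_SIZE = 256


def maybe_pad_moe_weight(weight: torch.Tensor) -> torch.nn.Parameter:
    """Zero-pad the last dim of an MoE weight to reduce memory channel
    contention on AMD Instinct GPUs; a no-op unless MOE_PADDING=1 is set.
    (In the PR this is additionally gated on is_hip().)"""
    if bool(int(os.getenv("MOE_PADDING", "0"))):
        # Pad the storage along the last dimension, then slice the view back
        # to the original logical shape, so downstream kernels see the same
        # sizes but a larger row stride over the padded storage.
        weight = F.pad(weight, (0, MOE_PADDING_SIZE), "constant", 0)[
            ..., :-MOE_PADDING_SIZE
        ]
    return torch.nn.Parameter(weight, requires_grad=False)
```

Keeping this as the final step in process_weights_after_loading, after the scale and quantization handling above, keeps the padded stride from interacting with `normalize_`, `_dequantize`, or `_fp8_quant`.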
Motivation
Pad the MoE weights along the last dimension to minimize memory channel contention (AMD Instinct GPUs only).
Tests show an approximate performance boost of +2.2% on prefill and +3.0% on decode for Grok-1 at the setting b32/i1024/o512.
Modifications
As mentioned above, the changes are in fused_moe.py and layer.py.
To enable this feature, set the binary flag `MOE_PADDING=1` on the command line, or run `export MOE_PADDING=1` in the console.
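For example, a hypothetical Python launch snippet (the flag name comes from this PR; everything else here is illustrative and not a specific SGLang API):

```python
import os

# Enable the ROCm MoE weight padding before model weights are loaded.
# Equivalent to running `export MOE_PADDING=1` in the shell before launch.
os.environ.setdefault("MOE_PADDING", "1")

# The loader reads the flag like the guarded branch in the diff above:
enabled = bool(int(os.getenv("MOE_PADDING", "0")))
print(f"MoE weight padding requested: {enabled}")
```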
Checklist