
Conversation

@dongmin-ra dongmin-ra commented Oct 2, 2025

Motivation

Fixed an invalid warpSize in host (.cpp) code.

Technical Details

Problem

  • Right after this PR, a GPU memory access fault occurred during internode EP execution.

Cause

  • The cause is that this PR changed the logic so that warpSize is set to 64 only when __GFX8__ or __GFX9__ is defined, and to 32 otherwise.
    • These macros are defined automatically by the compiler (amd clang++) when compiling device code.
    • However, when compiling host code (.cpp files), they are not defined.
  • As a result, warpSize is 64 inside kernel code but 32 in host code (e.g., dispatch_combine.cpp).
  • When launching the dispatch and combine kernels, the block dimension is set to warpSize * actualWarpNumPerBlock.
    • Since warpSize was incorrectly set to 32 in the host code, the block dimension ended up being half of the intended value, which caused the fault (see the sketch below).
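
A minimal sketch of the divergence, assuming the warp size is selected by a preprocessor check like the one described above; kWarpSize and the constant 16 are illustrative names and values, not necessarily mori's exact identifiers:

// Compile once as device code (macro defined) and once as a plain host .cpp
// (macro absent) to see the two different values.
#include <cstdio>

#if defined(__GFX8__) || defined(__GFX9__)
constexpr int kWarpSize = 64;   // device pass: macro defined by amd clang++
#else
constexpr int kWarpSize = 32;   // host-only .cpp pass: macro missing, wrong value
#endif

int main() {
    constexpr int actualWarpNumPerBlock = 16;
    // The dispatch/combine launch sizes its blocks as warpSize * actualWarpNumPerBlock.
    // Compiled as host code this gives 32 * 16 = 512 threads instead of the intended
    // 64 * 16 = 1024, so the kernel's per-warp indexing runs past its buffers.
    std::printf("host-computed block dim = %d\n", kWarpSize * actualWarpNumPerBlock);
    return 0;
}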

Fix

  • Explicitly define __GFX8__ or __GFX9__ in the CMake configuration based on the detected GPU architecture.
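
A rough sketch of that kind of change, assuming the target architecture is exposed through CMAKE_HIP_ARCHITECTURES; the actual variable names and CMakeLists.txt layout in this repository may differ:

# Pass the same family macro to host (.cpp) compilation that the device
# compiler defines implicitly, so host and device agree on warpSize.
if(CMAKE_HIP_ARCHITECTURES MATCHES "gfx9")
  add_compile_definitions(__GFX9__)
elseif(CMAKE_HIP_ARCHITECTURES MATCHES "gfx8")
  add_compile_definitions(__GFX8__)
endif()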

Test Plan

  1. Apply the following changes to the examples/ops/dispatch_combine/test_dispatch_combine_internode.py file
diff --git a/examples/ops/dispatch_combine/test_dispatch_combine_internode.py b/examples/ops/dispatch_combine/test_dispatch_combine_internode.py
index 55d4ef2..bcf87e8 100644
--- a/examples/ops/dispatch_combine/test_dispatch_combine_internode.py
+++ b/examples/ops/dispatch_combine/test_dispatch_combine_internode.py
@@ -45,7 +45,7 @@ class EpDispatchCombineTestCase:
             num_experts_per_rank=16,
             # num_experts_per_rank=256 // world_size,
             num_experts_per_token=8,
-            warp_num_per_block=16,
+            warp_num_per_block=1,
             block_num=64,
             max_token_type_size=2,
             kernel_type=mori.ops.EpDispatchCombineKernelType.InterNode,
  2. Execute the example script:
export MORI_DISABLE_P2P=1
torchrun --local-ranks-filter 0 \
                --role rank \
                --nnodes=1 \
                --node_rank=0 \
                --nproc_per_node=1 \
                --master_addr=127.0.0.1 \
                --master_port=1234 \
                examples/ops/dispatch_combine/test_dispatch_combine_internode.py --max-tokens 16

Test Result

  • Before modification: a memory access fault occurs.
Memory access fault by GPU node-3 (Agent handle: 0x820f5b0) on address 0x7efde4c00000. Reason: Unknown.
Memory access fault by GPU node-7 (Agent handle: 0xa625520) on address 0x7f6177600000. Reason: Unknown.
Memory access fault by GPU node-4 (Agent handle: 0x9df4740) on address 0x7f8066e00000. Reason: Unknown.
Memory access fault by GPU node-2 (Agent handle: 0x86d2ed0) on address 0x7ee328c00000. Reason: Unknown.
Memory access fault by GPU node-6 (Agent handle: 0x9f13da0) on address 0x7fae86a00000. Reason: Unknown.
Memory access fault by GPU node-8 (Agent handle: 0x9aa4c00) on address 0x7ef5c8c00000. Reason: Unknown.
Memory access fault by GPU node-9 (Agent handle: 0x976c110) on address 0x7f0b94c00000. Reason: Unknown.
Memory access fault by GPU node-5 (Agent handle: 0x8cb5700) on address 0x7fb4e7600000. Reason: Unknown.
  • After modification: no error occurs.

@hnts03-moreh hnts03-moreh marked this pull request as ready for review October 2, 2025 07:29

@hnts03-moreh hnts03-moreh left a comment

LGTM

@kyuhyeon-an kyuhyeon-an left a comment

👍
