[CK][MoE] Add swiglu_oai (OAI SwiGLU) activation to XDL 2-stage MoE epilogue #8749

Open
LJ-underdog wants to merge 1 commit into develop from dev/junlin12/swiglu-oai-moe

LJ-underdog commented Jun 24, 2026

Motivation

Enable the OAI-form SwiGLU activation (swiglu_oai, gate * sigmoid(1.702 * gate) * (up + 1), gpt-oss style) in the Composable Kernel XDL 2-stage MoE path. The MoE gridwise kernel epilogue currently supports only silu/gelu; this adds swiglu_oai so OAI-style MoE models can use this path.

Technical Details

  • gridwise_gemm_xdl_cshuffle_common.hpp: add Activation::swiglu_oai_and_mul = 3.
  • gridwise_moe_gemm.hpp: add the apply_swiglu_oai_activation helper (gate * sigmoid(1.702 * gate) * (up + 1), clamp gate <= 7 and up in [-7, 7], OAI/gpt-oss form) and wire it into all 4 epilogue paths (quant + non-quant x Run / Run_2Lds).
  • The activation is applied in fp32 in the epilogue and is orthogonal to the GEMM compute (MFMA/tile/pipeline untouched) and to quantization (existing per-token dequant reused). Only the non-blockscale gridwise kernel is changed.
  • Consumed by aiter via aiter#3886, "[MoE] Add swiglu_oai (OAI SwiGLU) for per-token fp8 CK XDL 2-stage MoE" (dispatch + codegen); review and merge the two PRs together.
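
The activation described in the bullets above can be sketched on the host as follows. This is a minimal reference under stated assumptions, not the CK helper itself: the names `swiglu_oai_ref`, `kRefAlpha`, and `kRefLimit` are hypothetical, and only the formula and clamp bounds are taken from the description.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical host-side reference; the names are illustrative and are not
// the identifiers used in gridwise_moe_gemm.hpp.
constexpr float kRefAlpha = 1.702f; // sigmoid scale from the PR description
constexpr float kRefLimit = 7.0f;   // clamp bound from the PR description

inline float swiglu_oai_ref(float gate, float up)
{
    // gate is clamped from above only; up is clamped on both sides.
    gate = std::min(gate, kRefLimit);
    up   = std::clamp(up, -kRefLimit, kRefLimit);

    // sigmoid(alpha * gate) = 1 / (1 + exp(-alpha * gate)), evaluated in fp32.
    const float sig = 1.0f / (1.0f + std::exp(-kRefAlpha * gate));

    // OAI form: gate * sigmoid(alpha * gate) * (up + 1).
    return gate * sig * (up + 1.0f);
}
```

For example, `swiglu_oai_ref(100.0f, 0.0f)` gives the same value as `swiglu_oai_ref(7.0f, 0.0f)` because the gate is clamped before the sigmoid is evaluated. The block needs C++17 for `std::clamp`.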

Test Plan

Validate the new epilogue branch against a torch fp32 OAI-SwiGLU reference through the aiter per-token fp8 MoE path (op-isolate on gfx942 / MI308X).

Test Result

cos_sim = 0.999993 vs the torch fp32 OAI-SwiGLU reference; no NaN. Confirmed the per-token fp8 path dispatches to this GridwiseMoeGemm kernel (rocprofv3) and runs the swiglu_oai epilogue branch.
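
The cos_sim and NaN checks reported above could be reproduced with a comparison along these lines. This is a sketch only: the actual validation ran through aiter against a torch reference, and the helper names `cosine_similarity` and `has_nan` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: cosine similarity of two equally sized buffers,
// accumulated in double to limit rounding error in the reduction.
// Returns NaN if either buffer has zero norm.
inline double cosine_similarity(const std::vector<float>& a, const std::vector<float>& b)
{
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for(std::size_t i = 0; i < a.size() && i < b.size(); ++i)
    {
        dot += static_cast<double>(a[i]) * b[i];
        norm_a += static_cast<double>(a[i]) * a[i];
        norm_b += static_cast<double>(b[i]) * b[i];
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Hypothetical helper: true if any element of the buffer is NaN.
inline bool has_nan(const std::vector<float>& v)
{
    for(float x : v)
    {
        if(std::isnan(x))
            return true;
    }
    return false;
}
```

A kernel output would pass this kind of check when `has_nan(out)` is false and `cosine_similarity(out, ref)` is close to 1; the acceptance threshold is not stated in the PR and is left to the caller.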

Submission Checklist

LJ-underdog force-pushed the dev/junlin12/swiglu-oai-moe branch from 03150bb to 7994abc on June 24, 2026 02:09
LJ-underdog changed the title from "swiglu_oai activation in XDL 2-stage MoE epilogue (+blockscale)" to "[CK][MoE] Add swiglu_oai (OAI SwiGLU) activation to XDL 2-stage MoE epilogue" on Jun 24, 2026
LJ-underdog marked this pull request as ready for review on June 24, 2026 05:50
LJ-underdog requested a review from a team as a code owner on June 24, 2026 05:50
LJ-underdog requested review from Copilot and poyenc on June 24, 2026 05:50

Copilot AI left a comment

Pull request overview

Adds support for the OAI-form SwiGLU activation (swiglu_oai_and_mul) to the Composable Kernel XDL 2-stage MoE epilogue so OAI/gpt-oss style MoE models can use this path.

Changes:

  • Extend the shared Activation enum with swiglu_oai_and_mul = 3.
  • Introduce apply_swiglu_oai_activation(gate, up) implementing gate * sigmoid(1.702 * gate) * (up + 1) with the specified clamps.
  • Wire the new activation into the four MoE epilogue paths (quant/non-quant × Run/Run_2Lds).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| projects/composablekernel/include/ck/tensor_operation/gpu/grid/gridwise_moe_gemm.hpp | Adds the OAI SwiGLU helper and integrates the new activation into all relevant epilogue branches. |
| projects/composablekernel/include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_common.hpp | Extends the shared activation enum to include the new OAI SwiGLU option. |

Comment on lines +314 to +317
// sigmoid(alpha*gate) = 1 / (1 + exp(-alpha*gate)); to bit-match flatmm float path use
// __builtin_amdgcn_rcpf(1.0f + math::exp(kSwiGluOaiAlpha * -gate)) instead of 1.0f/(...).
const float sig = 1.0f / (1.0f + math::exp(kSwiGluOaiAlpha * -gate));
return gate * sig * (up + 1.0f); // OAI form
Comment on lines 31 to 35
  gelu_and_mul = 0,
  silu_and_mul = 1,
- swiglustep_and_mul = 2
+ swiglustep_and_mul = 2,
+ swiglu_oai_and_mul = 3
  };
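
One way an epilogue could branch on this enum at compile time is sketched below. Only the enumerator names and values come from the diff; the `moe_sketch` namespace, the `enum class` form, and the `apply_gate_up` template are assumptions for illustration, and only two branches are filled in (the conventional silu_and_mul definition and the OAI form from the PR description).

```cpp
#include <algorithm>
#include <cmath>

namespace moe_sketch {

// Values mirror the enum in the diff; the enum-class form is an assumption.
enum class Activation
{
    gelu_and_mul       = 0,
    silu_and_mul       = 1,
    swiglustep_and_mul = 2,
    swiglu_oai_and_mul = 3
};

inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Hypothetical epilogue hook: picks the gate/up combination at compile time,
// so the unused branch is discarded from the instantiated function.
template <Activation Act>
float apply_gate_up(float gate, float up)
{
    static_assert(Act == Activation::silu_and_mul || Act == Activation::swiglu_oai_and_mul,
                  "only two branches are sketched here");
    if constexpr(Act == Activation::silu_and_mul)
    {
        // Conventional SiLU-and-multiply: silu(gate) * up.
        return gate * sigmoid(gate) * up;
    }
    else
    {
        // OAI form with the clamps from the PR description.
        gate = std::min(gate, 7.0f);
        up   = std::clamp(up, -7.0f, 7.0f);
        return gate * sigmoid(1.702f * gate) * (up + 1.0f);
    }
}

} // namespace moe_sketch
```

A call such as `moe_sketch::apply_gate_up<moe_sketch::Activation::swiglu_oai_and_mul>(g, u)` resolves its branch at compile time, which is the usual reason for passing the activation as a template parameter rather than a runtime argument.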
LJ-underdog force-pushed the dev/junlin12/swiglu-oai-moe branch from 7994abc to 57a4302 on June 24, 2026 06:30