[feature] support mctlassEx bf16 moe by metax-yi1zhang · Pull Request #266 · MetaX-MACA/vLLM-metax

metax-yi1zhang · 2026-06-02T08:39:28Z

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Test Plan

Test Result

(Optional) Documentation Update

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Yi Zhang (yi1zhang) - Linux Multimedia <yi1zhang@metax-tech.com>

gemini-code-assist

Code Review

This pull request introduces support for bf16 MoE (Mixture of Experts) utilizing the mctlass library, including custom operator registration and routing logic in the fused MoE execution path. The review feedback highlights critical issues where the code assumes mctlass is successfully imported based solely on environment variables, which could lead to runtime crashes if the library is missing; it is recommended to explicitly verify that the mctlass operators are not None. Additionally, several type annotations for scale and bias parameters should be updated to allow None values to prevent static type-checking errors.
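To illustrate the type-annotation point, here is a minimal sketch of a signature whose scale and bias parameters accept None. The `Tensor` class and `is_unquantized_call` function are hypothetical stand-ins so the sketch runs without torch; the real code presumably uses `Optional[torch.Tensor]` (or `torch.Tensor | None`). It shows only the annotation pattern and a None check, not the actual mctlass call.

```python
from typing import Optional


class Tensor:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without torch."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype


def is_unquantized_call(
    a: Tensor,
    b: Tensor,
    a_scale: Optional[Tensor] = None,  # None for unquantized bf16 inputs
    b_scale: Optional[Tensor] = None,  # None for unquantized bf16 inputs
    bias: Optional[Tensor] = None,     # None when the layer has no bias
) -> bool:
    # With Optional[...] annotations, a None default type-checks cleanly;
    # a bare `a_scale: Tensor = None` is flagged by mypy and pyright.
    return a_scale is None and b_scale is None
```

A bf16 call with no scales, such as `is_unquantized_call(Tensor("bfloat16"), Tensor("bfloat16"))`, returns `True`.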


gemini-code-assist · 2026-06-02T08:40:45Z

+            if (
+                hidden_states.dtype == torch.bfloat16
+                and not self.quant_config.use_int4_w4a8
+                and not self.quant_config.use_int4_w4a16
+                and not self.quant_config.use_int8_w8a8
+                and not self.quant_config.use_int8_w8a16
+                and mx_envs.MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE
+                and mx_envs.MACA_VLLM_ENABLE_MCTLASS_PYTHON_API
+            ):


The bf16 MoE block checks mx_envs.MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE and mx_envs.MACA_VLLM_ENABLE_MCTLASS_PYTHON_API, but it does not verify that the optional mctlassEx library was imported successfully (i.e., that mctlass_ops.mctlass_moe_gemm is not None). If these environment variables are enabled but mctlassEx is not installed, execution fails with an AssertionError instead of falling back to the default Triton implementation.

Please add a check to ensure mctlass_ops.mctlass_moe_gemm is not None before entering this block.

if (
    hidden_states.dtype == torch.bfloat16
    and not self.quant_config.use_int4_w4a8
    and not self.quant_config.use_int4_w4a16
    and not self.quant_config.use_int8_w8a8
    and not self.quant_config.use_int8_w8a16
    and mx_envs.MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE
    and mx_envs.MACA_VLLM_ENABLE_MCTLASS_PYTHON_API
    and getattr(mctlass_ops, "mctlass_moe_gemm", None) is not None
):
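The suggested guard can be sketched in isolation. This is a minimal, torch-free sketch, assuming the optional extension is imported defensively so that its entry point ends up as None when the import fails. `load_optional_op`, `select_moe_backend`, and the single `quantized` flag (standing in for the four `use_int*` checks) are hypothetical names, not part of vLLM-metax.

```python
import importlib
from typing import Any, Callable, Optional


def load_optional_op(module_name: str, attr: str) -> Optional[Callable[..., Any]]:
    """Return module.attr if the optional module imports cleanly, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also catches ModuleNotFoundError
        return None
    return getattr(module, attr, None)


def select_moe_backend(
    dtype: str,
    quantized: bool,           # hypothetical: collapses the use_int* flags
    enable_fused_moe: bool,    # mirrors MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE
    enable_python_api: bool,   # mirrors MACA_VLLM_ENABLE_MCTLASS_PYTHON_API
    mctlass_ops: Any,
) -> str:
    use_mctlass = (
        dtype == "bfloat16"
        and not quantized
        and enable_fused_moe
        and enable_python_api
        # Env flags alone do not prove the optional library imported.
        and getattr(mctlass_ops, "mctlass_moe_gemm", None) is not None
    )
    return "mctlass" if use_mctlass else "triton"
```

With an ops object whose `mctlass_moe_gemm` is None (or missing), `select_moe_backend("bfloat16", False, True, True, ops)` returns `"triton"`, so enabling both environment variables without mctlassEx installed degrades to Triton instead of crashing.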

gemini-code-assist · 2026-06-02T08:40:45Z

+    elif (
+        A.dtype == torch.bfloat16 and B.dtype == torch.bfloat16 and C.dtype == torch.bfloat16
+        and A_scale is None and B_scale is None
+        and mx_envs.MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE
+        and mx_envs.MACA_VLLM_ENABLE_MCTLASS_PYTHON_API
+    ):


Similar to the issue in triton_moe.py, this elif block calls mctlass_ops.cutlass_moe_mm_bf16, which asserts that mctlass_moe_gemm is not None. If mctlassEx is not installed, this will crash instead of falling back to the else block (Triton kernel).

Please add a check to ensure mctlass_ops.mctlass_moe_gemm is not None in the condition.

elif (
    A.dtype == torch.bfloat16 and B.dtype == torch.bfloat16 and C.dtype == torch.bfloat16
    and A_scale is None and B_scale is None
    and mx_envs.MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE
    and mx_envs.MACA_VLLM_ENABLE_MCTLASS_PYTHON_API
    and getattr(mctlass_ops, "mctlass_moe_gemm", None) is not None
):
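This second comment hinges on the callee asserting that the kernel handle exists. The sketch below contrasts the guarded and unguarded behavior, assuming a hypothetical `mm_bf16` stub that mirrors the assertion the review attributes to `mctlass_ops.cutlass_moe_mm_bf16`, and a hypothetical `dispatch_moe_mm` that applies the suggested condition before calling it. Dtypes are plain strings so it runs without torch.

```python
from typing import Any, Callable, Optional


def mm_bf16(gemm: Optional[Callable[..., Any]]) -> str:
    # Hypothetical stub: mirrors the assertion described in the review.
    assert gemm is not None, "mctlassEx is not installed"
    return "mctlass"


def dispatch_moe_mm(
    a_dtype: str,
    b_dtype: str,
    c_dtype: str,
    a_scale: Optional[float],
    b_scale: Optional[float],
    enable_fused_moe: bool,
    enable_python_api: bool,
    gemm: Optional[Callable[..., Any]],
) -> str:
    if (
        a_dtype == b_dtype == c_dtype == "bfloat16"
        and a_scale is None
        and b_scale is None
        and enable_fused_moe
        and enable_python_api
        and gemm is not None  # the check the review asks to add
    ):
        return mm_bf16(gemm)
    # Otherwise fall through to the Triton kernel (the original else branch).
    return "triton"
```

Dropping the `gemm is not None` term means a bf16, scale-free call with mctlassEx missing reaches `mm_bf16(None)` and raises `AssertionError`, which is the crash the review describes.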

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

[feature] support mctlassEx bf16 moe

e102fbc

Signed-off-by: Yi Zhang (yi1zhang) - Linux Multimedia <yi1zhang@metax-tech.com>

gemini-code-assist Bot reviewed Jun 2, 2026


metax-yi1zhang and others added 3 commits June 2, 2026 16:42

Update vllm_metax/model_executor/layers/quantization/_python_api_ops.py

9f74a01

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update vllm_metax/model_executor/layers/quantization/_python_api_ops.py

eb62e4c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update vllm_metax/model_executor/layers/quantization/_python_api_ops.py

c9a77b0

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

metax-yi1zhang wants to merge 4 commits into MetaX-MACA:v0.21.0-dev from metax-yi1zhang:v0.21.0-dev
