
[feature] support mctlassEx bf16 moe#266

Open
metax-yi1zhang wants to merge 4 commits into MetaX-MACA:v0.21.0-dev from metax-yi1zhang:v0.21.0-dev

Conversation

@metax-yi1zhang

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Test Plan

Test Result

(Optional) Documentation Update


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Yi Zhang (yi1zhang) - Linux Multimedia <yi1zhang@metax-tech.com>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for bf16 MoE (Mixture of Experts) utilizing the mctlass library, including custom operator registration and routing logic in the fused MoE execution path. The review feedback highlights critical issues where the code assumes mctlass is successfully imported based solely on environment variables, which could lead to runtime crashes if the library is missing; it is recommended to explicitly verify that the mctlass operators are not None. Additionally, several type annotations for scale and bias parameters should be updated to allow None values to prevent static type-checking errors.
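The type-annotation point can be illustrated with a minimal sketch. The function name `scale_rows` and its parameters are hypothetical and do not come from the PR; the only point is that a parameter which may be `None` should be annotated `Optional[...]` so that static type checkers accept callers that omit it.

```python
from typing import List, Optional, Sequence


def scale_rows(
    values: Sequence[float],
    scale: Optional[Sequence[float]] = None,  # None means "no scaling"
    bias: Optional[Sequence[float]] = None,   # None means "no bias"
) -> List[float]:
    """Apply an optional per-element scale and bias (hypothetical helper)."""
    out = list(values)
    if scale is not None:
        out = [v * s for v, s in zip(out, scale)]
    if bias is not None:
        out = [v + b for v, b in zip(out, bias)]
    return out
```

Annotating `scale` as plain `Sequence[float]` while defaulting it to `None` would run, but a type checker would flag both the default value and every caller that passes `None`.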


Comment on lines +311 to +319
if (
    hidden_states.dtype == torch.bfloat16
    and not self.quant_config.use_int4_w4a8
    and not self.quant_config.use_int4_w4a16
    and not self.quant_config.use_int8_w8a8
    and not self.quant_config.use_int8_w8a16
    and mx_envs.MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE
    and mx_envs.MACA_VLLM_ENABLE_MCTLASS_PYTHON_API
):

high

The bf16 MoE block checks mx_envs.MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE and mx_envs.MACA_VLLM_ENABLE_MCTLASS_PYTHON_API, but it does not verify that the optional mctlassEx library was imported successfully (i.e., that mctlass_ops.mctlass_moe_gemm is not None). If both environment variables are enabled but mctlassEx is not installed, execution fails with an AssertionError instead of falling back to the default Triton implementation.

Please add a check to ensure mctlass_ops.mctlass_moe_gemm is not None before entering this block.

            if (
                hidden_states.dtype == torch.bfloat16
                and not self.quant_config.use_int4_w4a8
                and not self.quant_config.use_int4_w4a16
                and not self.quant_config.use_int8_w8a8
                and not self.quant_config.use_int8_w8a16
                and mx_envs.MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE
                and mx_envs.MACA_VLLM_ENABLE_MCTLASS_PYTHON_API
                and getattr(mctlass_ops, "mctlass_moe_gemm", None) is not None
            ):
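The guard suggested above can be sketched in isolation. This is a simplified illustration under stated assumptions, not the PR's code: `choose_backend` and its boolean parameters are hypothetical, and a `SimpleNamespace` stands in for the real `mctlass_ops` module, whose `mctlass_moe_gemm` attribute the review describes as `None` when mctlassEx is not installed.

```python
from types import SimpleNamespace


def choose_backend(is_bf16: bool, is_quantized: bool, env_enabled: bool, ops) -> str:
    """Pick the MoE backend name (hypothetical helper).

    The mctlass path is taken only when the operator is actually available,
    not merely when the environment flags request it.
    """
    if (
        is_bf16
        and not is_quantized
        and env_enabled
        # Covers both "attribute missing" and "attribute set to None".
        and getattr(ops, "mctlass_moe_gemm", None) is not None
    ):
        return "mctlass"
    return "triton"


# Stand-ins for the module with and without the optional library installed.
ops_missing = SimpleNamespace(mctlass_moe_gemm=None)
ops_present = SimpleNamespace(mctlass_moe_gemm=lambda *args: None)
```

With `ops_missing`, the selection falls back to `"triton"` even when the flags are enabled, which is the behavior the review asks for in place of an assertion failure.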

Comment on lines +955 to +960
elif (
    A.dtype == torch.bfloat16 and B.dtype == torch.bfloat16 and C.dtype == torch.bfloat16
    and A_scale is None and B_scale is None
    and mx_envs.MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE
    and mx_envs.MACA_VLLM_ENABLE_MCTLASS_PYTHON_API
):

high

As in triton_moe.py, this elif block calls mctlass_ops.cutlass_moe_mm_bf16, which asserts that mctlass_moe_gemm is not None. If mctlassEx is not installed, the call crashes instead of falling through to the else block (the Triton kernel).

Please add a check to ensure mctlass_ops.mctlass_moe_gemm is not None in the condition.

    elif (
        A.dtype == torch.bfloat16 and B.dtype == torch.bfloat16 and C.dtype == torch.bfloat16
        and A_scale is None and B_scale is None
        and mx_envs.MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE
        and mx_envs.MACA_VLLM_ENABLE_MCTLASS_PYTHON_API
        and getattr(mctlass_ops, "mctlass_moe_gemm", None) is not None
    ):

Comment thread vllm_metax/model_executor/layers/quantization/_python_api_ops.py
Comment thread vllm_metax/model_executor/layers/quantization/_python_api_ops.py
Comment thread vllm_metax/model_executor/layers/quantization/_python_api_ops.py
metax-yi1zhang and others added 3 commits June 2, 2026 16:42
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
