[Issue]: Running MiniMax 2.5/2.7 using tip of tree (ATOM/AITER/vLLM) or the latest nightly version with TP=8 results in error: Invalid qk hidden dim layout for qknorm_allreduce_fusion_kernel_2stage kernel #1223

Description

@rasmith

Problem Description

I am using vLLM + AITER + ATOM tip of tree. I tried with:

  1. rocm/atom-dev:vllm-latest (works)
  2. rocm/atom-dev:vllm-v0.22.0-nightly_20260616 (doesn't work, exhibits this problem)

I ran with the following configuration:

export HF_TOKEN=<hf token>
TP=8
vllm serve MiniMaxAI/MiniMax-M2.5 \
    --trust-remote-code \
    --tensor-parallel-size ${TP} \
    --enable-auto-tool-choice \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --enable-expert-parallel \
    --enforce-eager \
    --gpu-memory-utilization 0.55

I got the following error:

(EngineCore pid=373365) ERROR 06-15 17:28:30 [multiproc_executor.py:284] Worker proc VllmWorker-6 died unexpectedly, shutting down executor.
(EngineCore pid=373365) INFO 06-15 17:28:30 [multiproc_executor.py:428] [shutdown] Executor: waiting for worker exit count=8
(EngineCore pid=373365) Process EngineCore:
(EngineCore pid=373365) Traceback (most recent call last):
(EngineCore pid=373365)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=373365)     self.run()
(EngineCore pid=373365)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=373365)     self._target(*self._args, **self._kwargs)
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/v1/engine/core.py", line 1199, in run_engine_core
(EngineCore pid=373365)     raise e
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/v1/engine/core.py", line 1164, in run_engine_core
(EngineCore pid=373365)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=373365)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=373365)     return func(*args, **kwargs)
(EngineCore pid=373365)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/v1/engine/core.py", line 930, in __init__
(EngineCore pid=373365)     super().__init__(
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/v1/engine/core.py", line 132, in __init__
(EngineCore pid=373365)     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=373365)                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=373365)     return func(*args, **kwargs)
(EngineCore pid=373365)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/v1/engine/core.py", line 257, in _initialize_kv_caches
(EngineCore pid=373365)     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=373365)                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=373365)     return self.collective_rpc("determine_available_memory")
(EngineCore pid=373365)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/v1/executor/multiproc_executor.py", line 404, in collective_rpc
(EngineCore pid=373365)     return future if non_block else future.result()
(EngineCore pid=373365)                                     ^^^^^^^^^^^^^^^
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/v1/executor/multiproc_executor.py", line 91, in result
(EngineCore pid=373365)     return super().result()
(EngineCore pid=373365)            ^^^^^^^^^^^^^^^^
(EngineCore pid=373365)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=373365)     return self.__get_result()
(EngineCore pid=373365)            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=373365)     raise self._exception
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/v1/executor/multiproc_executor.py", line 95, in _wait_for_response
(EngineCore pid=373365)     response = self.aggregate(self.get_response())
(EngineCore pid=373365)                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365)   File "/vllm-upstream-atom/vllm/v1/executor/multiproc_executor.py", line 391, in get_response
(EngineCore pid=373365)     raise RuntimeError(
(EngineCore pid=373365) RuntimeError: Worker failed with error 'Invalid qk hidden dim layout for qknorm_allreduce_fusion_kernel_2stage kernel', please check the stack trace above for the root cause

This happens because hidden_dim_q = 768, hidden_dim_k = 128, hidden_dim_v = 128 when TP=8.
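
To make the per-rank sizes concrete, here is a minimal sketch of how tensor parallelism could produce them. The head counts and head dimension below are assumptions, chosen only because they reproduce the values reported above. They were not read from the model config, and `per_rank_qkv_dims` is a hypothetical helper, not a vLLM or AITER function.

```python
# Assumed attention shape (hypothetical values, not taken from the model config).
NUM_Q_HEADS = 48
NUM_KV_HEADS = 8
HEAD_DIM = 128


def per_rank_qkv_dims(tp_size, num_q_heads=NUM_Q_HEADS,
                      num_kv_heads=NUM_KV_HEADS, head_dim=HEAD_DIM):
    """Return (hidden_dim_q, hidden_dim_k, hidden_dim_v) seen by one TP rank."""
    if num_q_heads % tp_size != 0:
        raise ValueError("query heads must divide evenly across TP ranks")
    q_heads = num_q_heads // tp_size
    # Each rank holds at least one KV head, even when tp_size > num_kv_heads.
    kv_heads = max(1, num_kv_heads // tp_size)
    return q_heads * head_dim, kv_heads * head_dim, kv_heads * head_dim


print(per_rank_qkv_dims(8))  # (768, 128, 128)
```

Under these assumptions, TP=8 leaves a single KV head per rank, which is why the K and V widths drop to 128.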

The kernel launcher then expects each of these dimensions to be a multiple of:

constexpr int WARP_WORK_SIZE = WARP_SIZE * PACK_SIZE;

with
constexpr int PACK_SIZE = 16 / sizeof(T);
Since the inputs are of type torch.bfloat16 (2 bytes per element), PACK_SIZE will be 16 / 2 = 8, so WARP_WORK_SIZE will be WARP_SIZE (32) * PACK_SIZE (8) = 256.
hidden_dim_k and hidden_dim_v are both 128, which is not a multiple of 256, and this causes the error to be raised.
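
The alignment arithmetic can be checked with a short sketch. It mirrors the constants quoted in this issue (a 16-byte pack and WARP_SIZE = 32) and assumes the launcher requires every dimension to be a multiple of WARP_WORK_SIZE. The real check in the kernel may be stricter or shaped differently, and the function names here are hypothetical.

```python
BYTES_PER_PACK = 16   # from PACK_SIZE = 16 / sizeof(T)
WARP_SIZE = 32        # value quoted in this issue


def warp_work_size(elem_size_bytes, warp_size=WARP_SIZE):
    """Elements processed by one warp per step: WARP_SIZE * PACK_SIZE."""
    pack_size = BYTES_PER_PACK // elem_size_bytes
    return warp_size * pack_size


def misaligned_dims(dims, elem_size_bytes=2):
    """Return the dims that are not a multiple of WARP_WORK_SIZE (assumed check)."""
    work = warp_work_size(elem_size_bytes)
    return {name: size for name, size in dims.items() if size % work != 0}


# Per-rank sizes reported in this issue for TP=8; bfloat16 is 2 bytes per element.
dims_tp8 = {"hidden_dim_q": 768, "hidden_dim_k": 128, "hidden_dim_v": 128}

print(warp_work_size(2))          # 256
print(misaligned_dims(dims_tp8))  # {'hidden_dim_k': 128, 'hidden_dim_v': 128}
```

With these numbers, hidden_dim_q passes (768 = 3 * 256) while the K and V dimensions do not.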

Operating System

22.04.5 LTS (Jammy Jellyfish)

CPU

AMD EPYC 9575F 64-Core Processor

GPU

AMD EPYC 9575F 64-Core Processor

ROCm Version

ROCm 7.2.2

ROCm Component

No response

Steps to Reproduce

export HF_TOKEN=<hf token>
TP=8
vllm serve MiniMaxAI/MiniMax-M2.5 \
    --trust-remote-code \
    --tensor-parallel-size ${TP} \
    --enable-auto-tool-choice \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --enable-expert-parallel \
    --enforce-eager \
    --gpu-memory-utilization 0.55

Additional Information

No response
