Problem Description
I am using vLLM + AITER + ATOM tip of tree. I tried with:
rocm/atom-dev:vllm-latest (works)
rocm/atom-dev:vllm-v0.22.0-nightly_20260616 (doesn't work, exhibits this problem)
I ran with the following configuration:
export HF_TOKEN=<hf token>
TP=8
vllm serve MiniMaxAI/MiniMax-M2.5 \
--trust-remote-code \
--tensor-parallel-size ${TP} \
--enable-auto-tool-choice \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think \
--enable-expert-parallel \
--enforce-eager \
--gpu-memory-utilization 0.55
I got the following error:
(EngineCore pid=373365) ERROR 06-15 17:28:30 [multiproc_executor.py:284] Worker proc VllmWorker-6 died unexpectedly, shutting down executor.
(EngineCore pid=373365) INFO 06-15 17:28:30 [multiproc_executor.py:428] [shutdown] Executor: waiting for worker exit count=8
(EngineCore pid=373365) Process EngineCore:
(EngineCore pid=373365) Traceback (most recent call last):
(EngineCore pid=373365) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=373365) self.run()
(EngineCore pid=373365) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=373365) self._target(*self._args, **self._kwargs)
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/v1/engine/core.py", line 1199, in run_engine_core
(EngineCore pid=373365) raise e
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/v1/engine/core.py", line 1164, in run_engine_core
(EngineCore pid=373365) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=373365) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=373365) return func(*args, **kwargs)
(EngineCore pid=373365) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/v1/engine/core.py", line 930, in __init__
(EngineCore pid=373365) super().__init__(
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/v1/engine/core.py", line 132, in __init__
(EngineCore pid=373365) kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=373365) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=373365) return func(*args, **kwargs)
(EngineCore pid=373365) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/v1/engine/core.py", line 257, in _initialize_kv_caches
(EngineCore pid=373365) available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=373365) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=373365) return self.collective_rpc("determine_available_memory")
(EngineCore pid=373365) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/v1/executor/multiproc_executor.py", line 404, in collective_rpc
(EngineCore pid=373365) return future if non_block else future.result()
(EngineCore pid=373365) ^^^^^^^^^^^^^^^
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/v1/executor/multiproc_executor.py", line 91, in result
(EngineCore pid=373365) return super().result()
(EngineCore pid=373365) ^^^^^^^^^^^^^^^^
(EngineCore pid=373365) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=373365) return self.__get_result()
(EngineCore pid=373365) ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=373365) raise self._exception
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/v1/executor/multiproc_executor.py", line 95, in _wait_for_response
(EngineCore pid=373365) response = self.aggregate(self.get_response())
(EngineCore pid=373365) ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=373365) File "/vllm-upstream-atom/vllm/v1/executor/multiproc_executor.py", line 391, in get_response
(EngineCore pid=373365) raise RuntimeError(
(EngineCore pid=373365) RuntimeError: Worker failed with error 'Invalid qk hidden dim layout for qknorm_allreduce_fusion_kernel_2stage kernel', please check the stack trace above for the root cause
This happens because hidden_dim_q = 768, hidden_dim_k = 128, and hidden_dim_v = 128 when TP=8.
The kernel launcher expects each of these dimensions to be a multiple of:
constexpr int WARP_WORK_SIZE = WARP_SIZE * PACK_SIZE;
with
constexpr int PACK_SIZE = 16 / sizeof(T);
Since the inputs are of type torch.bfloat16, sizeof(T) is 2, so PACK_SIZE is 16 / 2 = 8 and WARP_WORK_SIZE is WARP_SIZE (32) * PACK_SIZE (8) = 256.
hidden_dim_q = 768 is a multiple of 256, but hidden_dim_k and hidden_dim_v (128) are not, which causes the error to be raised.
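To make the arithmetic easy to re-check, here is a minimal Python sketch of the divisibility condition described above. It is not the kernel's actual code: the helper names `warp_work_size` and `misaligned_dims` are hypothetical, the exact form of the launcher's check is an assumption, and the value WARP_SIZE = 32 is simply taken from this report.

```python
# Sketch of the alignment check described in this report.
# Assumption: the launcher rejects any hidden dim that is not a multiple of
# WARP_SIZE * PACK_SIZE, where PACK_SIZE = 16 / sizeof(T).
WARP_SIZE = 32      # value quoted in this report
BF16_ITEMSIZE = 2   # sizeof(bfloat16) in bytes


def warp_work_size(itemsize: int, warp_size: int = WARP_SIZE) -> int:
    """Number of elements one warp handles per step: WARP_SIZE * (16 / sizeof(T))."""
    pack_size = 16 // itemsize
    return warp_size * pack_size


def misaligned_dims(dims: dict, itemsize: int = BF16_ITEMSIZE) -> dict:
    """Return {name: remainder} for every dim that is not a multiple of the warp work size."""
    wws = warp_work_size(itemsize)
    return {name: size % wws for name, size in dims.items() if size % wws != 0}


# Per-rank sizes reported for TP=8.
reported = {"hidden_dim_q": 768, "hidden_dim_k": 128, "hidden_dim_v": 128}

print(warp_work_size(BF16_ITEMSIZE))  # 16 // 2 = 8 elements per pack, 32 * 8 = 256
print(misaligned_dims(reported))      # hidden_dim_k and hidden_dim_v are flagged
```

Under these assumptions only hidden_dim_q passes, since 768 = 3 * 256, while 128 is smaller than the 256-element warp work size.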
Operating System
Ubuntu 22.04.5 LTS (Jammy Jellyfish)
CPU
AMD EPYC 9575F 64-Core Processor
GPU
AMD EPYC 9575F 64-Core Processor
ROCm Version
ROCm 7.2.2
ROCm Component
No response
Steps to Reproduce
export HF_TOKEN=<hf token>
TP=8
vllm serve MiniMaxAI/MiniMax-M2.5 \
--trust-remote-code \
--tensor-parallel-size ${TP} \
--enable-auto-tool-choice \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think \
--enable-expert-parallel \
--enforce-eager \
--gpu-memory-utilization 0.55
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
rocminfo --support output
Additional Information
No response