Summary
rocm/atom-dev:latest (built 2026-06-24 15:36 UTC) fails to start when serving amd/MiniMax-M3-MXFP4. The AIter library in the image was updated to split kv_cache into separate kv_cache_k/kv_cache_v arguments in fused_qknorm_idxrqknorm, but atom/models/minimax_m3.py (at ATOM HEAD ab9eb781) still uses the old single-tensor API. This causes a type error at warmup.
Repro
docker pull rocm/atom-dev:latest
docker run --rm \
--device=/dev/kfd --device=/dev/dri \
--group-add video --ipc=host \
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
--network=host \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env HF_TOKEN=<token> \
--env AITER_QUICK_REDUCE_QUANTIZATION=INT4 \
--env ATOM_FORCE_ATTN_TRITON=1 \
--env TORCHDYNAMO_DISABLE=1 \
rocm/atom-dev:latest \
python -m atom.entrypoints.openai_server \
--model amd/MiniMax-M3-MXFP4 \
--tensor-parallel-size 4 \
--server-port 8000 \
--trust-remote-code \
--gpu-memory-utilization 0.8 \
--block-size 128 \
--max-model-len 32768 \
--max-num-seqs 128 \
--max-num-batched-tokens 32768 \
--no-enable_prefix_caching \
--enforce-eager
Error
RuntimeError: aiter::_fused_qknorm_idxrqknorm_hip() Expected a value of type 'Optional[Tensor]' for argument 'index_cache' but instead found type 'int'.
Cast error details: Unable to cast 0 to Tensor
Traceback (abridged):
File "/app/ATOM/atom/model_engine/model_runner.py", line 1158, in warmup_model
output = self.compiled_callable(*args, **kwargs)
File "/app/ATOM/atom/models/minimax_m3.py", line 418, in forward
...
RuntimeError: aiter::_fused_qknorm_idxrqknorm_hip() Expected a value of type 'Optional[Tensor]' for argument 'index_cache' but instead found type 'int'.
Root Cause
atom/models/minimax_m3.py (around line 717) calls aiter.fused_qknorm_idxrqknorm with the old positional API:
aiter.fused_qknorm_idxrqknorm(
...
sparse_metadata.slot_mapping, # slot_mapping
self.kv_cache, # OLD arg 14: kv_cache (full [blocks,2,...] tensor)
self.index_cache, # OLD arg 15: index_cache
self.kv_cache.shape[2], # OLD arg 16: block_size (int)
q, # OLD arg 17: q_out
...
)
But aiter/ops/fused_qknorm_idxrqknorm.py in the same image now has the new signature with a split KV cache:
def fused_qknorm_idxrqknorm(
...
kv_cache_k: Optional[Tensor], # NEW arg 14
kv_cache_v: Optional[Tensor], # NEW arg 15 — inserted here
index_cache: Optional[Tensor], # NEW arg 16
block_size: int, # NEW arg 17
q_out: Optional[Tensor], # NEW arg 18
...
)
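Inserting kv_cache_v at position 15 shifts every subsequent argument by one. With the old positional call, self.index_cache lands on kv_cache_v (both are Optional[Tensor], so it passes silently), and self.kv_cache.shape[2] (an int) lands on index_cache: Optional[Tensor], which raises the type error shown above.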
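The shift can be reproduced without AIter or PyTorch. The sketch below is only an illustration: `Tensor` is a stand-in class, `new_signature_stub` is a hypothetical stub containing just the parameters quoted in this issue (the real op has more, elided here as in the snippets above), and `mismatched_params` is a helper invented for this example. It binds the old-style positional arguments against the new parameter order and reports which parameters receive a value of the wrong type.

```python
import inspect
from typing import Optional, Union, get_args, get_origin


class Tensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""


def new_signature_stub(
    slot_mapping: Tensor,
    kv_cache_k: Optional[Tensor],
    kv_cache_v: Optional[Tensor],   # newly inserted parameter
    index_cache: Optional[Tensor],
    block_size: int,
    q_out: Optional[Tensor],
):
    """Hypothetical stub: only the parameters quoted in this issue."""


def mismatched_params(func, positional_args):
    """Return names of parameters whose bound value violates its annotation."""
    sig = inspect.signature(func)
    bound = sig.bind_partial(*positional_args)
    bad = []
    for name, value in bound.arguments.items():
        ann = sig.parameters[name].annotation
        # Optional[X] is Union[X, None]; unpack it into the allowed types.
        allowed = get_args(ann) if get_origin(ann) is Union else (ann,)
        if not isinstance(value, allowed):
            bad.append(name)
    return bad


# Old call order: slot_mapping, kv_cache, index_cache, block_size (int), q
old_style_args = [Tensor(), Tensor(), Tensor(), 128, Tensor()]
print(mismatched_params(new_signature_stub, old_style_args))
# -> ['index_cache', 'block_size']
```

The first mismatch reported is `index_cache`, matching the RuntimeError above. Passing these arguments by keyword at the call site would make this kind of drift fail loudly on the argument name rather than on a shifted type, assuming the real op accepts keyword arguments; how `self.kv_cache` should be split into the K and V tensors is not covered by this sketch.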
Environment
- Image: rocm/atom-dev:latest (built 2026-06-24 15:36 UTC)
- ATOM HEAD in image: ab9eb781
- GPU: MI355 (4x)
- ROCm: 7.2.4
- Model: amd/MiniMax-M3-MXFP4
Expected
Server starts and serves requests as documented in recipes/MiniMax-M3.md (added in PR #1305).