rocm/atom-dev:latest crashes on MiniMax-M3 MXFP4: AIter API mismatch in fused_qknorm_idxrqknorm #1347

@Princejain1101

Summary

rocm/atom-dev:latest (built 2026-06-24 15:36 UTC) fails to start when serving amd/MiniMax-M3-MXFP4. The AIter library in the image was updated to split kv_cache into separate kv_cache_k/kv_cache_v arguments in fused_qknorm_idxrqknorm, but atom/models/minimax_m3.py (at ATOM HEAD ab9eb781) still uses the old single-tensor API. This causes a type error at warmup.

Repro

docker pull rocm/atom-dev:latest

docker run --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host \
  --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
  --network=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env HF_TOKEN=<token> \
  --env AITER_QUICK_REDUCE_QUANTIZATION=INT4 \
  --env ATOM_FORCE_ATTN_TRITON=1 \
  --env TORCHDYNAMO_DISABLE=1 \
  rocm/atom-dev:latest \
  python -m atom.entrypoints.openai_server \
    --model amd/MiniMax-M3-MXFP4 \
    --tensor-parallel-size 4 \
    --server-port 8000 \
    --trust-remote-code \
    --gpu-memory-utilization 0.8 \
    --block-size 128 \
    --max-model-len 32768 \
    --max-num-seqs 128 \
    --max-num-batched-tokens 32768 \
    --no-enable_prefix_caching \
    --enforce-eager

Error

RuntimeError: aiter::_fused_qknorm_idxrqknorm_hip() Expected a value of type 'Optional[Tensor]' for argument 'index_cache' but instead found type 'int'.
Cast error details: Unable to cast 0 to Tensor

Full traceback:

File "/app/ATOM/atom/model_engine/model_runner.py", line 1158, in warmup_model
    output = self.compiled_callable(*args, **kwargs)
File "/app/ATOM/atom/models/minimax_m3.py", line 418, in forward
...
RuntimeError: aiter::_fused_qknorm_idxrqknorm_hip() Expected a value of type 'Optional[Tensor]' for argument 'index_cache' but instead found type 'int'.

Root Cause

atom/models/minimax_m3.py (around line 717) calls aiter.fused_qknorm_idxrqknorm with the old positional API:

aiter.fused_qknorm_idxrqknorm(
    ...
    sparse_metadata.slot_mapping,  # slot_mapping
    self.kv_cache,                 # OLD arg 14: kv_cache (full [blocks,2,...] tensor)
    self.index_cache,              # OLD arg 15: index_cache
    self.kv_cache.shape[2],        # OLD arg 16: block_size (int)
    q,                             # OLD arg 17: q_out
    ...
)

But aiter/ops/fused_qknorm_idxrqknorm.py in the same image now has the new signature with a split KV cache:

def fused_qknorm_idxrqknorm(
    ...
    kv_cache_k: Optional[Tensor],  # NEW arg 14
    kv_cache_v: Optional[Tensor],  # NEW arg 15 — inserted here
    index_cache: Optional[Tensor], # NEW arg 16
    block_size: int,               # NEW arg 17
    q_out: Optional[Tensor],       # NEW arg 18
    ...
)

The insertion of kv_cache_v at position 15 shifts every subsequent positional argument by one. As a result, self.index_cache is bound to kv_cache_v, and self.kv_cache.shape[2] (an int) lands on index_cache: Optional[Tensor], which produces the type error shown above.
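The shift can be reproduced without AIter or PyTorch. The sketch below is a minimal illustration, not the real op: `fused_stub` is a hypothetical, trimmed stand-in that keeps only the parameters named in this issue, `Tensor` is a placeholder class, and the type check only imitates the dispatcher's error. It also shows that binding by keyword would not be affected by an inserted parameter. Whether the real `aiter.fused_qknorm_idxrqknorm` wrapper accepts keyword arguments, and how the existing `[blocks,2,...]` cache should be split into K and V, are open assumptions here.

```python
from typing import Optional


class Tensor:
    """Stand-in for torch.Tensor so this sketch runs without PyTorch."""

    def __init__(self, name: str) -> None:
        self.name = name


def fused_stub(
    slot_mapping: Optional[Tensor],
    kv_cache_k: Optional[Tensor],
    kv_cache_v: Optional[Tensor],
    index_cache: Optional[Tensor],
    block_size: int,
    q_out: Optional[Tensor] = None,
) -> str:
    """Hypothetical, trimmed stand-in for the new split-KV signature."""
    # Imitate the op's argument type check on index_cache.
    if index_cache is not None and not isinstance(index_cache, Tensor):
        raise TypeError(
            "Expected Optional[Tensor] for 'index_cache' but found "
            f"{type(index_cache).__name__}"
        )
    return "ok"


# Placeholder inputs; the names mirror the arguments quoted in this issue.
slot = Tensor("slot_mapping")
kv = Tensor("kv_cache")
k, v = Tensor("kv_cache_k"), Tensor("kv_cache_v")
idx = Tensor("index_cache")
q = Tensor("q")
BLOCK_SIZE = 128


def call_old_positional() -> str:
    # Old call order: kv_cache, index_cache, block_size, q_out.
    # Against the new signature, BLOCK_SIZE is bound to index_cache.
    return fused_stub(slot, kv, idx, BLOCK_SIZE, q)


def call_with_keywords() -> str:
    # Keyword arguments bind by name, so an inserted parameter cannot shift them.
    return fused_stub(
        slot_mapping=slot,
        kv_cache_k=k,
        kv_cache_v=v,
        index_cache=idx,
        block_size=BLOCK_SIZE,
        q_out=q,
    )


if __name__ == "__main__":
    try:
        call_old_positional()
    except TypeError as exc:
        print("old positional call failed:", exc)
    print("keyword call:", call_with_keywords())
```

If the real wrapper does accept keywords, calling it that way in atom/models/minimax_m3.py would turn a future signature change into an explicit "unexpected keyword" or "missing argument" error instead of a silent positional shift.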

Environment

  • Image: rocm/atom-dev:latest (built 2026-06-24 15:36 UTC)
  • ATOM HEAD in image: ab9eb781
  • GPU: MI355 (4x)
  • ROCm: 7.2.4
  • Model: amd/MiniMax-M3-MXFP4

Expected

Server starts and serves requests as documented in recipes/MiniMax-M3.md (added in PR #1305).
