Enable MI300X ROCm support by ehartford · Pull Request #484 · antirez/ds4

ehartford · 2026-07-01T18:18:24Z

This PR enables DeepSeek V4 Pro to run on AMD Instinct MI300X by adding CDNA-oriented ROCm kernels and sharding model layers across local ROCm GPUs.

Summary

- Add CDNA3/CDNA4 direct MFMA wrapper kernels for f16 MFMA:
  - gfx942 uses mfma_f32_16x16x16_f16
  - gfx950 uses mfma_f32_16x16x32_f16
- Add a CDNA Q8 batch matmul/MFMA prefill path.
- Add ROCm MoE kernel fixes for CDNA correctness, including disabling the broken IQ2/Q2 float-down WMMA overlay.
- Add ROCm attention/activation fixes to avoid fp16 overflow and repeated BOS failures.
- Add MI300X/CDNA build targets, with CDNA4 gfx950 compile plumbing.
- Add local --gpus launcher using the existing distributed runtime to shard layers across local GPUs.
- Support repeated -m model shards independent of argument order.
- Allocate graph/KV/cache state only for the layer slice owned by each worker.
- Add model-cache preflight checks for early actionable OOM errors.
- Use BF16 for 16-bit distributed activation transport.
- Add MI300X/ROCm smoke scripts and a synthetic Q8 MFMA correctness test.
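
The summary lists both fp16 overflow fixes and BF16 for 16-bit activation transport, without stating why BF16 was chosen. One plausible reason is dynamic range: fp16 tops out at 65504, while BF16 keeps the 8-bit exponent of fp32 at the cost of mantissa precision. The Python sketch below illustrates that trade-off only. It is not code from this PR, and the helper names are hypothetical.

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Convert a float to a 16-bit bfloat16 pattern (round to nearest even).

    Assumption: NaN is not handled specially; this is an illustration only.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias so that ties round toward an even low bit of the kept half.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a float by zero-filling the mantissa."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

def fits_in_fp16(x: float) -> bool:
    """Return True if x can be packed as IEEE 754 binary16 without overflow."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False

if __name__ == "__main__":
    activation = 70000.0  # hypothetical activation value above the fp16 maximum
    print(fits_in_fp16(activation))                        # False
    print(bf16_bits_to_f32(f32_to_bf16_bits(activation)))  # 70144.0
```

The value survives the BF16 round trip with roughly 0.2% error, whereas it cannot be represented in fp16 at all.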

Validation

Validated on MI300X / CDNA3:

make mi300x
git diff --check

Also validated a local sharded Pro Q4 run across MI300X GPUs, including reversed -m shard order.
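
The summary says each worker allocates graph/KV/cache state only for the layer slice it owns, but the PR text does not describe how layers are divided. As a minimal sketch, assuming an even contiguous split (the function name, layer count, and GPU count are hypothetical, and this is Python for illustration rather than the runtime's code):

```python
def layer_slices(n_layers: int, n_gpus: int) -> list[range]:
    """Split layer indices 0..n_layers-1 into contiguous slices, one per GPU.

    Assumption: an even split where the first (n_layers % n_gpus) workers
    take one extra layer. The actual policy in the PR may differ.
    """
    if n_gpus < 1 or n_layers < n_gpus:
        raise ValueError("need at least one layer per GPU")
    base, extra = divmod(n_layers, n_gpus)
    slices = []
    start = 0
    for rank in range(n_gpus):
        size = base + (1 if rank < extra else 0)
        slices.append(range(start, start + size))
        start += size
    return slices

if __name__ == "__main__":
    # Hypothetical sizes: 10 layers across 4 GPUs.
    for rank, owned in enumerate(layer_slices(10, 4)):
        print(f"worker {rank}: layers {owned.start}..{owned.stop - 1}")
```

Under this scheme a worker would size its KV cache by `len(owned)` rather than by the full layer count.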

Notes

CDNA4 / gfx950 kernel selection and build plumbing are included, but runtime validation has not been performed because I do not have CDNA4 hardware.
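
For reference, the kernel selection named in the summary amounts to a per-architecture lookup. The sketch below restates that mapping in Python and parses the tile shape out of the instruction name. The table and function names are hypothetical; the actual selection lives in the ROCm build and kernel code.

```python
# Mapping taken from the PR summary: target architecture to f16 MFMA variant.
MFMA_F16_BY_ARCH = {
    "gfx942": "mfma_f32_16x16x16_f16",  # CDNA3 (MI300X), validated in this PR
    "gfx950": "mfma_f32_16x16x32_f16",  # CDNA4, compiled but not runtime-validated
}

def mfma_shape(arch: str) -> tuple[int, int, int]:
    """Return the (M, N, K) tile shape of the f16 MFMA variant for arch."""
    name = MFMA_F16_BY_ARCH[arch]   # raises KeyError for unsupported targets
    shape = name.split("_")[2]      # e.g. "16x16x16"
    m, n, k = (int(dim) for dim in shape.split("x"))
    return (m, n, k)

if __name__ == "__main__":
    print(mfma_shape("gfx942"))  # (16, 16, 16)
    print(mfma_shape("gfx950"))  # (16, 16, 32)
```

The two variants differ only in the K dimension, so the gfx950 path consumes twice as many f16 elements along K per instruction.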

beverm2391 · 2026-07-05T03:01:17Z

+1 waiting on this one!

Eric Hartford added 2 commits June 21, 2026 14:11

Add CDNA ROCm MFMA build path

50778f0

enable mi300x

725661b

OPS-NeoRetro reviewed Jul 2, 2026

Comment thread README.md Outdated

fix README.md mention of Strix Halo

ed63605

ehartford wants to merge 3 commits into antirez:main from QuixiAI:main

3 participants
