Optimize DeepSeek V4 qkv_proj_rope decode (S=2): partial-sum reduces, amax fold, K-tile/stage tuning #339

Merged
zhangqi-chen merged 2 commits into hw-native-sys:main from wangqin1723-max:perf/dsv4-qkv-proj-rope-grouped-chunking
May 21, 2026

Commits

Commits on May 20, 2026

Commits on May 21, 2026