
Commit 83f3c9b

youkaichao authored and simon-mo committed
[bugfix][deepseek] fix flashmla kernel selection (#25956)
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: simon-mo <[email protected]>
1 parent: d0b178c

File tree

1 file changed: +1 −1 lines


vllm/attention/ops/flashmla.py

Lines changed: 1 addition & 1 deletion

@@ -136,7 +136,7 @@ def flash_mla_with_kvcache(
         descale_k is None
     ), "descale_q and descale_k should be both None or both not None"

-    if (descale_q is not None) and (descale_k is not None):
+    if indices is None and q.element_size() == 1:
         out, softmax_lse = torch.ops._flashmla_extension_C.fwd_kvcache_mla_fp8(
             q, k_cache, head_dim_v, cache_seqlens, block_table, softmax_scale,
             causal, tile_scheduler_metadata, num_splits, descale_q, descale_k)
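The change swaps the dispatch predicate for the FP8 FlashMLA kernel: instead of keying off the presence of the descale tensors, it keys off the query dtype (FP8 dtypes occupy 1 byte per element, so `q.element_size() == 1`) and the absence of sparse `indices`. A minimal sketch of that selection logic, with a plain integer standing in for `torch.Tensor.element_size()` and a hypothetical `fwd_kvcache_mla` name for the non-FP8 path (the real op name may differ):

```python
def select_kernel(q_element_size: int, indices=None) -> str:
    """Hypothetical sketch of the kernel-selection predicate in this commit.

    q_element_size mirrors torch.Tensor.element_size(): bytes per element
    (1 for FP8 dtypes such as float8_e4m3fn, 2 for fp16/bf16).
    """
    # FP8 query with no sparse indices -> dedicated FP8 kernel.
    if indices is None and q_element_size == 1:
        return "fwd_kvcache_mla_fp8"
    # Everything else falls through to the default kernel.
    return "fwd_kvcache_mla"


print(select_kernel(1))               # FP8 query, dense path -> FP8 kernel
print(select_kernel(2))               # bf16/fp16 query -> default kernel
print(select_kernel(1, indices=[0]))  # FP8 but sparse -> default kernel
```

Dispatching on dtype rather than on the descale arguments avoids picking the FP8 kernel merely because descale tensors were passed alongside a non-FP8 query.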

0 commit comments
