[bugfix][deepseek] fix flashmla kernel selection #25956
Conversation
Code Review
This pull request addresses a bug in the FlashMLA kernel selection by using `q.element_size() == 1` to more reliably detect FP8 tensors, which is a good improvement for correctness. I have added one critical comment regarding a potential edge case where the FP8 kernel might be called without the necessary scaling factors, which could lead to a crash. Adding an assertion will make the implementation more robust.
if indices is None and q.element_size() == 1:
    out, softmax_lse = torch.ops._flashmla_extension_C.fwd_kvcache_mla_fp8(
        q, k_cache, head_dim_v, cache_seqlens, block_table, softmax_scale,
        causal, tile_scheduler_metadata, num_splits, descale_q, descale_k)
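For context (not part of the review itself), here is a minimal sketch of why the `element_size()` check works, assuming a PyTorch build that ships the `float8_e4m3fn` dtype: FP8 tensors store one byte per element, while BF16/FP16 tensors store two, so the comparison identifies FP8 inputs without enumerating dtypes.

```python
import torch

# Bytes per element reported by PyTorch for the dtypes involved:
print(torch.empty(0, dtype=torch.float8_e4m3fn).element_size())  # 1 -> FP8 path
print(torch.empty(0, dtype=torch.bfloat16).element_size())       # 2 -> non-FP8 path
print(torch.empty(0, dtype=torch.float16).element_size())        # 2 -> non-FP8 path
```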
This change correctly identifies FP8 tensors using `q.element_size() == 1`. However, it introduces a potential issue where the FP8 kernel could be called without the necessary scaling factors. If `q` is an FP8 tensor but `descale_q` is `None`, `fwd_kvcache_mla_fp8` would be called with `None` for `descale_q` and `descale_k`. This would likely cause a crash or incorrect computation within the underlying C++ kernel, as FP8 operations require these scaling factors.

To make this function more robust, I suggest adding an assertion to ensure that `descale_q` and `descale_k` are provided when `q` is an FP8 tensor.
if indices is None and q.element_size() == 1:
assert descale_q is not None, (
"descale_q and descale_k must be provided for fp8 attention")
out, softmax_lse = torch.ops._flashmla_extension_C.fwd_kvcache_mla_fp8(
q, k_cache, head_dim_v, cache_seqlens, block_table, softmax_scale,
causal, tile_scheduler_metadata, num_splits, descale_q, descale_k)
Makes sense! thank you for the fix
LGTM, thanks for the work!
okay, verified locally that it works.
Purpose
It seems we always pass in `descale_q` as tensors in the flashmla backend, so it selects the wrong kernel implementation.

Fixes #25896 (comment) and potentially #25896 (comment)
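To illustrate the failure mode, a hypothetical sketch (not the actual vLLM code; it assumes the previous selection keyed off whether `descale_q` was provided, and the kernel labels are placeholders):

```python
import torch

def select_kernel_old(q: torch.Tensor, descale_q) -> str:
    # Assumed previous behaviour: any call that supplies descale tensors is
    # routed to the FP8 kernel, even when q itself is BF16.
    return "fp8" if descale_q is not None else "dense"

def select_kernel_new(q: torch.Tensor, descale_q) -> str:
    # Fixed behaviour: inspect the query dtype itself; FP8 dtypes occupy one
    # byte per element, so element_size() == 1 identifies them reliably.
    return "fp8" if q.element_size() == 1 else "dense"

q_bf16 = torch.empty(1, dtype=torch.bfloat16)
descale = torch.ones(1, dtype=torch.float32)  # backend always passes a tensor
print(select_kernel_old(q_bf16, descale))  # "fp8"   -> wrong kernel selected
print(select_kernel_new(q_bf16, descale))  # "dense" -> correct kernel
```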
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.