Conversation

@youkaichao (Member) commented on Sep 30, 2025

Purpose

It seems we always pass descale_q in as a tensor in the flashmla backend, so the wrong kernel implementation gets selected.

Fixes #25896 (comment) and potentially #25896 (comment)
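
A minimal sketch of the dispatch problem, assuming the kernel choice previously keyed off whether descale_q was passed; the exact original condition is not quoted here, and the helper names below are illustrative rather than the actual vLLM code:

    # Illustrative sketch only, not the actual vLLM dispatch code.
    def select_kernel_old(q, descale_q):
        # If callers always pass descale_q as a tensor, this branch is always
        # taken, so bf16/fp16 queries get routed to the FP8 kernel as well.
        if descale_q is not None:
            return "fwd_kvcache_mla_fp8"
        return "non-fp8 kernel"

    def select_kernel_new(q):
        # Keying off the query dtype itself: FP8 dtypes use one byte per element.
        if q.element_size() == 1:
            return "fwd_kvcache_mla_fp8"
        return "non-fp8 kernel"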

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

mergify bot added the deepseek (Related to DeepSeek models) label on Sep 30, 2025

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses a bug in the FlashMLA kernel selection by using q.element_size() == 1 to more reliably detect FP8 tensors, which is a good improvement for correctness. I have added one critical comment regarding a potential edge case where the FP8 kernel might be called without the necessary scaling factors, which could lead to a crash. Adding an assertion will make the implementation more robust.
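
For reference, a quick illustration of the dtype check (assuming a PyTorch build where the FP8 dtypes are available):

    import torch

    # FP8 dtypes store one byte per element, so element_size() identifies them
    # regardless of whether descale tensors happen to be passed in.
    print(torch.empty(2, dtype=torch.float8_e4m3fn).element_size())  # 1
    print(torch.empty(2, dtype=torch.bfloat16).element_size())       # 2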

Comment on lines +139 to 142
    if indices is None and q.element_size() == 1:
        out, softmax_lse = torch.ops._flashmla_extension_C.fwd_kvcache_mla_fp8(
            q, k_cache, head_dim_v, cache_seqlens, block_table, softmax_scale,
            causal, tile_scheduler_metadata, num_splits, descale_q, descale_k)

critical

This change correctly identifies FP8 tensors using q.element_size() == 1. However, it introduces a potential issue where the FP8 kernel could be called without the necessary scaling factors. If q is an FP8 tensor but descale_q is None, fwd_kvcache_mla_fp8 would be called with None for descale_q and descale_k. This would likely cause a crash or incorrect computation within the underlying C++ kernel, as FP8 operations require these scaling factors.

To make this function more robust, I suggest adding an assertion to ensure that descale_q and descale_k are provided when q is an FP8 tensor.

    if indices is None and q.element_size() == 1:
        assert descale_q is not None and descale_k is not None, (
            "descale_q and descale_k must be provided for fp8 attention")
        out, softmax_lse = torch.ops._flashmla_extension_C.fwd_kvcache_mla_fp8(
            q, k_cache, head_dim_v, cache_seqlens, block_table, softmax_scale,
            causal, tile_scheduler_metadata, num_splits, descale_q, descale_k)
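
For illustration, a hypothetical caller-side sketch of what this guard protects against; the tensor shapes and the per-tensor float32 descale format below are assumptions, not taken from this thread:

    import torch

    # Hypothetical FP8 query plus the descale factors the FP8 kernel needs.
    q_fp8 = torch.empty(1, 128, 576, dtype=torch.float8_e4m3fn, device="cuda")
    descale_q = torch.ones(1, dtype=torch.float32, device="cuda")  # assumed format
    descale_k = torch.ones(1, dtype=torch.float32, device="cuda")  # assumed format

    # The suggested assertion fires before the C++ kernel ever sees a None descale.
    assert q_fp8.element_size() == 1
    assert descale_q is not None and descale_k is not None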

@LucasWilkinson (Collaborator) left a comment

Makes sense! Thank you for the fix.

@LucasWilkinson enabled auto-merge (squash) on September 30, 2025 14:50
github-actions bot added the ready (ONLY add when PR is ready to merge / full CI is needed) label on Sep 30, 2025
@youkaichao added this to the v0.11.0 Cherry Picks milestone on Sep 30, 2025

@yewentao256 (Member) left a comment

LGTM, thanks for the work!

@youkaichao (Member, Author) commented

Okay, verified locally that it works.

@youkaichao disabled auto-merge on September 30, 2025 16:30
@youkaichao merged commit a2e6fa7 into vllm-project:main on Sep 30, 2025
46 of 51 checks passed
@youkaichao deleted the fix_dsv3 branch on September 30, 2025 16:30
simon-mo pushed a commit that referenced this pull request Oct 1, 2025
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025