
Conversation

@mgoin (Member) commented Oct 1, 2025

Purpose

Address #26042 for now to unblock the release. We will take a performance regression on B200 due to falling back to FA2 prefill, but we need correctness first.

Test Plan

Test Result

Base commands:
vllm serve deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
python tests/evals/gsm8k/gsm8k_eval.py

# Hopper

vllm serve deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
Accuracy: 0.794

# Blackwell

## Default backends (FlashInfer prefill and CUTLASS MLA decode)
vllm serve deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
Accuracy: 0.216

## Using FA2 prefill and CUTLASS MLA decode
VLLM_DISABLE_FLASHINFER_PREFILL=1 vllm serve deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
Accuracy: 0.785
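
The numbers above show that setting VLLM_DISABLE_FLASHINFER_PREFILL=1 (i.e. falling back to FA2 prefill) restores accuracy on Blackwell; this PR simply makes that the default. As a rough sketch of what the diff amounts to, assuming vLLM's usual envs.py pattern of a type-hinted default plus a parsing lambda in `environment_variables` (the exact parsing expression in the real file may differ):

```python
# Hypothetical sketch only, not the literal diff; the real vllm/envs.py entry
# may parse the variable differently.
import os
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Type-hinted default flipped from False to True:
    # FlashInfer MLA prefill is disabled unless the user opts back in.
    VLLM_DISABLE_FLASHINFER_PREFILL: bool = True

environment_variables = {
    # Parsing kept consistent with the new default: unset (or "1") -> True.
    "VLLM_DISABLE_FLASHINFER_PREFILL":
        lambda: os.getenv("VLLM_DISABLE_FLASHINFER_PREFILL", "1") == "1",
}
```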


@gemini-code-assist (bot) left a comment
Code Review

This pull request updates the default value for VLLM_DISABLE_FLASHINFER_PREFILL to True, effectively disabling FlashInfer MLA prefill by default to address issues with chunked prefill. The change is applied consistently to both the type-hinted default and the environment variable parsing logic. While the change itself is correct, I've identified a related potential issue: this environment variable, which likely influences the computation graph, is not included in the cache key computation. This could lead to incorrect cache hits if the flag is toggled.
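
The cache-key concern can be illustrated generically: if a flag decides which prefill path gets compiled, it should be folded into whatever hash keys the compilation cache, otherwise toggling the flag can hit a stale artifact. The sketch below is a generic illustration under that assumption, not vLLM's actual cache-key code; the function and factor names are hypothetical.

```python
import hashlib
import os


def compute_cache_key(base_factors: list[str]) -> str:
    """Hypothetical illustration: fold graph-affecting env flags into the
    compilation cache key so FlashInfer and FA2 prefill graphs never share
    a cache entry. Not vLLM's actual implementation."""
    factors = list(base_factors)
    factors.append(
        "VLLM_DISABLE_FLASHINFER_PREFILL="
        + os.getenv("VLLM_DISABLE_FLASHINFER_PREFILL", "1"))
    return hashlib.sha256("|".join(factors).encode()).hexdigest()


# Example: different flag values produce different cache keys.
print(compute_cache_key(["model=deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"]))
```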


@simon-mo added this to the v0.11.0 Cherry Picks milestone Oct 1, 2025
@mgoin added the bug, ready, and deepseek labels Oct 1, 2025
@mgoin (Member, Author) commented Oct 1, 2025

Unfortunately, the FA2 backend in FlashInfer still has the same issue, so it is likely that vLLM is preparing inputs improperly for FlashInfer.

@LucasWilkinson (Collaborator) left a comment
use: #26063 instead

@LucasWilkinson removed this from the v0.11.0 Cherry Picks milestone Oct 2, 2025
@mgoin closed this Oct 2, 2025