
Conversation


@zyongye zyongye commented Sep 29, 2025

Rebased dsv32, based on #25869

Run command

vllm serve deepseek-ai/DeepSeek-V3.2-Exp  --max_model_len=20000 --gpu_memory_utilization=0.9 -tp 8 --max_num_seqs=256

gsm8k

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9568|±  |0.0056|
|     |       |strict-match    |     5|exact_match|↑  |0.9575|±  |0.0056|

gsm8k, 20-shot

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|    20|exact_match|↑  |0.9507|±  | 0.006|
|     |       |strict-match    |    20|exact_match|↑  |0.9507|±  | 0.006|

heheda12345 and others added 30 commits September 20, 2025 18:24
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>

fix smoke tests

Signed-off-by: Lucas Wilkinson <[email protected]>

moved to FlashMLA repo

Signed-off-by: Lucas Wilkinson <[email protected]>

removed pytorch shim

Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
…ild-sparse-flash-mla

Build and bind sparse-FlashMLA kernels
…integration

[Feature] DeepGEMM integration
* and env and MQA path for both prefill and decode

Signed-off-by: Lucas Wilkinson <[email protected]>

* fix shapes

Signed-off-by: Lucas Wilkinson <[email protected]>

---------

Signed-off-by: Lucas Wilkinson <[email protected]>
* code from ds

Signed-off-by: youkaichao <[email protected]>

* doc from ds

Signed-off-by: youkaichao <[email protected]>

* Fixes for support_materials/2-tilelang/

Signed-off-by: mgoin <[email protected]>

* Fix example 1

Signed-off-by: mgoin <[email protected]>

* Fix Einsum in deepgemm

* Fix `libc10.so` unimported error

* fix reference code

Signed-off-by: youkaichao <[email protected]>

* adding missing indexer args

* passing index args into the module

* init

Signed-off-by: Chen Zhang <[email protected]>

* build indexer k cache metadata

* prefill indexer, but weight_proj will output -inf

* unquantized paged indexer, still has -inf issue

* remove support material

* adding topk_indices mask

* add weight scale

* unittest infrastructure and fix weight_proj, numeric error due to quantization

* varlen prefill passed

* paged prefill

* add indices mask

---------

Signed-off-by: youkaichao <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Wentao Ye <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>
* prefill mla

Signed-off-by: Chen Zhang <[email protected]>

* can run now

Signed-off-by: Chen Zhang <[email protected]>

* tmp

Signed-off-by: Chen Zhang <[email protected]>

* can output the first token

Signed-off-by: Chen Zhang <[email protected]>

* fix bug

Signed-off-by: Chen Zhang <[email protected]>

* remove some debug

Signed-off-by: Chen Zhang <[email protected]>

* update

Signed-off-by: Chen Zhang <[email protected]>

* hack through cu_seqlen_ks exploding issue

* update basic.py

Signed-off-by: Chen Zhang <[email protected]>

* remove some unnecessary changes

Signed-off-by: Chen Zhang <[email protected]>

* clean up

Signed-off-by: Chen Zhang <[email protected]>

---------

Signed-off-by: Chen Zhang <[email protected]>
Co-authored-by: Yongye Zhu <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: youkaichao <[email protected]>
@youkaichao

locally verified this PR has correct results:

local-completions (model=deepseek-ai/DeepSeek-V3.2-Exp,base_url=http://127.0.0.1:8000/v1/completions,num_concurrent=100,max_retries=3,tokenized_requests=False), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9613|±  |0.0053|
|     |       |strict-match    |     5|exact_match|↑  |0.9613|±  |0.0053|

Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>

heheda12345 commented Sep 30, 2025

@youkaichao Can you help to try DeepSeek-R1? I got the following errors:

VLLM_USE_DEEP_GEMM=0 vllm serve deepseek-ai/DeepSeek-R1 -tp 8 --max-num-seqs 256

(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 411, in __call__
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "<eval_with_key>.124", line 696, in forward
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     submod_1 = self.submod_1(getitem, s72, getitem_1, getitem_2, getitem_3);  getitem = getitem_1 = getitem_2 = submod_1 = None
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 848, in call_wrapped
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 424, in __call__
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     raise e
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 411, in __call__
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "<eval_with_key>.2", line 5, in forward
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     unified_attention_with_output = torch.ops.vllm.unified_attention_with_output(q, x_11, key_rot_1, output_1, 'model.layers.0.self_attn.attn');  q = x_11 = key_rot_1 = output_1 = unified_attention_with_output = None
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     return self._op(*args, **kwargs)
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/vllm/attention/layer.py", line 614, in unified_attention_with_output
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     self.impl.forward(self,
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/vllm/v1/attention/backends/mla/common.py", line 1767, in forward
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     attn_out, lse = self._forward_decode(decode_q, kv_cache,
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/vllm/v1/attention/backends/mla/flashmla.py", line 188, in _forward_decode
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     o, lse = flash_mla_with_kvcache(
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]              ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/vllm/attention/ops/flashmla.py", line 140, in flash_mla_with_kvcache
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     out, softmax_lse = torch.ops._flashmla_extension_C.fwd_kvcache_mla_fp8(
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]   File "/data/zhang-chen/vllm/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]     return self._op(*args, **kwargs)
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671] RuntimeError: Expected q.dtype() == torch::kFloat8_e4m3fn to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
(Worker_TP0 pid=3647913) ERROR 09-30 00:32:14 [multiproc_executor.py:671] 

Review comment context (the assert preceding the dispatch branch):

    descale_k is None
    ), "descale_q and descale_k should be both None or both not None"

    if (descale_q is not None) and (descale_k is not None):

Suggested change:

    - if (descale_q is not None) and (descale_k is not None):
    + if indices is not None:


@LucasWilkinson does this make sense? @heheda12345's error seems to indicate that DeepSeek-R1 goes into this branch and calls torch.ops._flashmla_extension_C.fwd_kvcache_mla_fp8
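A minimal, self-contained sketch of the dispatch problem discussed in this thread. All names here (`select_decode_kernel`, the string return values) are illustrative stand-ins for the real vLLM code paths, not its actual API:

```python
def select_decode_kernel(descale_q, descale_k, indices):
    """Return the kernel each guard would dispatch to for one decode call."""
    # Current guard: any caller passing both descale tensors is routed to
    # the fp8 kernel -- including dense models like DeepSeek-R1 whose q
    # tensor is still bf16, which then trips the kernel's dtype assert.
    buggy = "fp8" if (descale_q is not None and descale_k is not None) else "dense"
    # Suggested guard: only the sparse-indexer path (DeepSeek-V3.2) should
    # reach the fp8 kernel, so key the dispatch off `indices` instead.
    suggested = "fp8" if indices is not None else "dense"
    return buggy, suggested

# DeepSeek-R1-style call: descale tensors present, no sparse indices.
buggy, suggested = select_decode_kernel(descale_q=1.0, descale_k=1.0, indices=None)
print(buggy, suggested)  # fp8 dense
```

Under this reading, the two guards only disagree for dense models that pass descale tensors, which matches the R1 failure and the reviewer's suggestion.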

Comment on lines -109 to -110
# Note(hc): need revisit when we support DCP with decode query_len > 1.
return out.squeeze(1), softmax_lse.squeeze(-1)

@LucasWilkinson do we need this as well for DCP?

@youkaichao
Copy link
Member

the issue reported by @heheda12345 seems to be a kv cache dtype issue.

merging first to unblock further optimizations. @LucasWilkinson might help investigate further.
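A hedged sketch of how the kv-cache dtype mismatch could be surfaced as a clear Python-level error before reaching the CUDA extension, rather than as the opaque `Expected q.dtype() == torch::kFloat8_e4m3fn` RuntimeError above. The function name and dtype strings are illustrative, not vLLM's actual API:

```python
def validate_fp8_mla_inputs(q_dtype: str, kv_cache_dtype: str) -> None:
    """Mirror the fp8 FlashMLA kernel's requirement that q is float8_e4m3fn."""
    if q_dtype != "float8_e4m3fn":
        raise TypeError(
            "fwd_kvcache_mla_fp8 requires q dtype float8_e4m3fn, "
            f"got {q_dtype} (kv cache dtype: {kv_cache_dtype})"
        )

# A DeepSeek-R1 run with a bf16 query would fail here, at the Python layer,
# instead of deep inside torch.ops._flashmla_extension_C:
try:
    validate_fp8_mla_inputs("bfloat16", "fp8_e4m3")
except TypeError as e:
    print(e)
```

Such a pre-check would have pointed straight at the misrouted dispatch rather than at the extension's internal assert.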

@youkaichao youkaichao merged commit fa7e254 into vllm-project:main Sep 30, 2025
63 of 80 checks passed

cjackal commented Sep 30, 2025

Seems like DSV3 AWQ-quantized checkpoints are broken after this PR; the error message is as follows. Let me write an issue for it:

RuntimeError: Expected q.dtype() == torch::kFloat8_e4m3fn to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)


njhill commented Sep 30, 2025

A CI test was reportedly broken by this (now failing on main):

[2025-09-30T15:41:37Z] FAILED v1/spec_decode/test_eagle.py::test_load_model[True-1-FLASH_ATTN-eagle] - RuntimeError: generator raised StopIteration

https://buildkite.com/vllm/ci/builds/32959#01999b1b-bec0-44a2-bca0-2523a6209558

Edit: I have opened a fix here: #25978

@youkaichao

@heheda12345 @cjackal can you help check if #25956 solves the problem?


cjackal commented Oct 1, 2025

@heheda12345 @cjackal can you help check if #25956 solves the problem?

Confirmed that it works normally after your PR, thank you for the prompt bugfix!

simon-mo pushed a commit that referenced this pull request Oct 1, 2025
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: Yongye Zhu <[email protected]>
Signed-off-by: Barry Kang <[email protected]>
Signed-off-by: Lucia Fang <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>
Co-authored-by: yewentao256 <[email protected]>
Co-authored-by: Wentao Ye <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Lucia Fang <[email protected]>
Co-authored-by: Lucia Fang <[email protected]>
Co-authored-by: NickLucche <[email protected]>
Co-authored-by: Siyuan Fu <[email protected]>
Co-authored-by: Matthew Bonanni <[email protected]>
Co-authored-by: Xiaozhu Meng <[email protected]>
Co-authored-by: Barry Kang <[email protected]>
Signed-off-by: simon-mo <[email protected]>
iboiko-habana pushed a commit to iboiko-habana/vllm-gaudi that referenced this pull request Oct 2, 2025
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: Yongye Zhu <[email protected]>
Signed-off-by: Barry Kang <[email protected]>
Signed-off-by: Lucia Fang <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>
Co-authored-by: yewentao256 <[email protected]>
Co-authored-by: Wentao Ye <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Lucia Fang <[email protected]>
Co-authored-by: Lucia Fang <[email protected]>
Co-authored-by: NickLucche <[email protected]>
Co-authored-by: Siyuan Fu <[email protected]>
Co-authored-by: Matthew Bonanni <[email protected]>
Co-authored-by: Xiaozhu Meng <[email protected]>
Co-authored-by: Barry Kang <[email protected]>
Labels
ci/build, deepseek (Related to DeepSeek models), documentation (Improvements or additions to documentation), new-model (Requests to new models), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm), speculative-decoding, tpu (Related to Google TPUs), v1