Pull requests: vllm-project/vllm
Add tree attention backend for v1 (part 1)
llama
Related to Llama models
speculative-decoding
v1
#20401
opened Jul 2, 2025 by
TheEpicDolphin
[Misc] Fix `Unable to detect current VLLM config. Defaulting to NHD kv cache layout` warning
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#20400
opened Jul 2, 2025 by
NickLucche
[Kernel] SM90 CUTLASS FP8 GEMM: add support for swap AB + kernel tuning
#20396
opened Jul 2, 2025 by
LyrisZhong
4 tasks
[Misc] Small: Remove global media connector. Each test should have its own test connector object.
documentation
Improvements or additions to documentation
multi-modality
Related to multi-modality (#4194)
tpu
Related to Google TPUs
v1
#20395
opened Jul 2, 2025 by
huachenheli
Resolve the torch nightly sync issue
ci/build
documentation
Improvements or additions to documentation
ready
ONLY add when PR is ready to merge/full CI is needed
#20393
opened Jul 2, 2025 by
yangw-dev
[Misc] Fused MoE Modular Kernel : Refactor Chunking loop
#20392
opened Jul 2, 2025 by
varun-sundar-rabindranath
[CI] Trimming some failing test groups from AMDPRODUCTION.
ci/build
rocm
Related to AMD ROCm
#20390
opened Jul 2, 2025 by
Alexei-V-Ivanov-AMD
[Misc] Small: Fix video loader return type annotations.
multi-modality
Related to multi-modality (#4194)
ready
ONLY add when PR is ready to merge/full CI is needed
#20389
opened Jul 2, 2025 by
huachenheli
FIX: Add libnuma-dev to Dockerfile for dev stage
ci/build
#20388
opened Jul 2, 2025 by
dongbo910220
[Hardware][POWER] Add Power (ppc64le)–specific CPU binding for VLLM_CPU_OMP_THREADS_BIND=auto
#20387
opened Jul 2, 2025 by
Akashcodes732
2 of 4 tasks
[Misc] Rename `DecodingConfig` to `StructuredOutputConfig`
documentation
Improvements or additions to documentation
structured-output
v1
#20386
opened Jul 2, 2025 by
njhill
[TPU] Add a case to cover RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8
ci/build
llama
Related to Llama models
ready
ONLY add when PR is ready to merge/full CI is needed
tpu
Related to Google TPUs
#20385
opened Jul 2, 2025 by
QiliangCui
3 tasks done
[Bugfix] Fix import of CutlassExpertsFp8 in compressed_tensors_moe.py
ready
ONLY add when PR is ready to merge/full CI is needed
#20381
opened Jul 2, 2025 by
bnellnm
[CI/Build] Fix torch nightly CI dependencies part 3
ci/build
ready
ONLY add when PR is ready to merge/full CI is needed
#20378
opened Jul 2, 2025 by
zou3519
3 of 4 tasks
[Bugfix] Remove executable flag on a few files related to flash_attn and flashinfer
v1
#20377
opened Jul 2, 2025 by
tlrmchlsmth
[Docs] Update EAGLE example
documentation
Improvements or additions to documentation
ready
ONLY add when PR is ready to merge/full CI is needed
#20375
opened Jul 2, 2025 by
NickLucche
[Structured Outputs][V1] Skipping with models doesn't contain tokenizers
ready
ONLY add when PR is ready to merge/full CI is needed
structured-output
v1
#20365
opened Jul 2, 2025 by
aarnphm
[Bugfix] Fix flaky `test_streaming_response` test
ready
ONLY add when PR is ready to merge/full CI is needed
#20363
opened Jul 2, 2025 by
NickLucche
[PP][V1]: Integrate Token Throttling into vLLM
v1
#20359
opened Jul 2, 2025 by
gty111
4 tasks done