Pull requests: vllm-project/vllm
Pull requests list
[v1][bugfix] fix cudagraph with inplace buffer assignment
ready
ONLY add when PR is ready to merge/full CI is needed
#11596
opened Dec 29, 2024 by
youkaichao
[V1] 7/N API Server: Update LM-Eval To Use Streaming
ci/build
#11590
opened Dec 28, 2024 by
robertgshaw2-neuralmagic
[Kernel] Triton Configs for Fp8 Block Quantization
#11589
opened Dec 28, 2024 by
robertgshaw2-neuralmagic
[V1] [6/N] API Server: Better Shutdown
frontend
#11586
opened Dec 28, 2024 by
robertgshaw2-neuralmagic
[Bugfix] Reduce prefix prefill block size for Pascal
#11584
opened Dec 28, 2024 by
sasha0552
[misc] Add LoRA kernel micro benchmarks
#11579
opened Dec 28, 2024 by
varun-sundar-rabindranath
[benchmark] Remove dependency for H100 benchmark step
ci/build
#11572
opened Dec 27, 2024 by
khluu
[Frontend] Improve Error Handling
documentation
Improvements or additions to documentation
frontend
needs-rebase
#11570
opened Dec 27, 2024 by
robertgshaw2-neuralmagic
[Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks.
#11565
opened Dec 27, 2024 by
sakunkun
[Model] LoRA with lm_head and embed_tokens fully trained - 3
#11558
opened Dec 27, 2024 by
sergeykochetkov
[Frontend] [Bugfix] Refactor tool parsers and simplify the tool parsing interface.
ci/build
frontend
#11554
opened Dec 27, 2024 by
elementary-particle
Draft
[Misc] Speculative Decoding: Adding Mean Accept Length Metric
#11552
opened Dec 27, 2024 by
MMuzzammil1
Bounded peak memory in Top-P-Top-K with chunked sorting
ci/build
#11544
opened Dec 27, 2024 by
yangalan123
[Benchmark] Add benchmark script for CPU offloading
ready
#11533
opened Dec 26, 2024 by
ApostaC
[Core] Block Allocator to support KV cache CPU offloading
frontend
ready
#11532
opened Dec 26, 2024 by
ApostaC
[Core] Performance optimization for swap_blocks by cuda kernels
ready
#11531
opened Dec 26, 2024 by
ApostaC
[BugFix] Fix parameter names and `process_after_weight_loading` for W4A16 MoE Group Act Order
#11528
opened Dec 26, 2024 by
dsikka
[Platform] Move get_punica_wrapper() function to Platform
#11516
opened Dec 26, 2024 by
shen-shanshan
Draft