Pull requests: vllm-project/vllm-gaudi
- #787: Enable HPU Fused SDPA for Qwen3-VL vision attention using attention masks (opened Jan 7, 2026 by slokesha)
- #785: Draft: Add FlashAttention online merge in Unified Attention (draft; opened Jan 7, 2026 by kzawora-intel)
- #784: [WIP] Add Chunked Shared Attention with Dense Biases (draft; opened Jan 7, 2026 by kzawora-intel)
Revert "[GAUDISW-244336] Add missing long ctx prompt buckets (#739)"
#783
opened Jan 7, 2026 by
wpyszka
Loading…
- #774: [FIX_FOR_VLLM_LATEST] Fix embedding models, after bug found in #27614 (opened Jan 2, 2026 by iboiko-habana)
- #772: Documentation: Fix the mobile navigation issue in v0.13.0 (opened Jan 2, 2026 by mhelf-intel; labels: documentation, skip-gaudi-tests)
- #762: Introduce absolute and relative padding limits to the linear bucketing (opened Dec 26, 2025 by yangulei)
- #760: [GAUDISW-243560] Monkey-patching _get_attn_scale for the Llama4Attention layer (opened Dec 24, 2025 by rsmyrek)
- #753: Prefill batching logic to handle chunked prefill/prefix caching for HPU (opened Dec 23, 2025 by hlin99)
- #750: Release Notes for v0.13.0 (opened Dec 22, 2025 by mhelf-intel; labels: documentation, skip-gaudi-tests)
- #749: [GAUDISW-244752] Add dynamic scale for V-Cache on Hidden dim (opened Dec 21, 2025 by dudilester)
- #723: Dryrun implementation for generating command line file (opened Dec 16, 2025 by rajanintel24)