
Conversation

@skaulintel (Contributor)

port #759 to v0.14.0 release

libinta and others added 30 commits on December 23, 2025 at 14:11
Also overwrite the qwen3_vl function to use _merge_multimodal_embeddings
with index copy (sketched below).
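For context, a minimal sketch of what an index-copy-based embedding merge looks like. The function name, tensor shapes, and arguments here are illustrative assumptions, not the actual `_merge_multimodal_embeddings` in vllm_gaudi:

```python
import torch

def merge_multimodal_embeddings_sketch(inputs_embeds: torch.Tensor,
                                       mm_embeds: torch.Tensor,
                                       is_mm_token: torch.Tensor) -> torch.Tensor:
    # Overwrite the text-embedding rows at multimodal token positions.
    # Using index_copy_ with precomputed positions avoids the boolean
    # masked_scatter path, which tends to be slow on HPU.
    positions = is_mm_token.nonzero(as_tuple=True)[0]
    inputs_embeds.index_copy_(0, positions, mm_embeds)
    return inputs_embeds
```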
Format update
update for pre-commit error:
vllm_gaudi/models/qwen3_vl.py:62:25: F821 Undefined name `_require_is_multimodal`
Signed-off-by: Libin Tang <litang@habana.ai>
Following the reasoning stated in PR:
vllm-project#616

Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
…ect#837)

Signed-off-by: linoy buchnik <lbuchnik@habana.ai>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Adds support for cross-layer KV cache sharing on HPU, enabling models
like Gemma-3n that share KV cache between layers to run on Gaudi.

**Changes**
- hpu_attn.py: Store kv_sharing_target_layer_name and skip KV cache
writes for sharing layers (see the sketch after the Testing section)
- hpu_model_runner.py: Track shared layers, validate config, and set up
tensor sharing during initialization
- test_hpu_model_runner.py: Enable KV sharing unit tests

**Expected Benefits**
- Reduced KV cache memory usage for models with layer sharing
- Lower TTFT for long-context scenarios in supported models (e.g.,
Gemma-3n)

**Testing**
- Unit tests pass
- E2E validation with a KV-sharing model (e.g., Gemma-3n) pending
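As a rough illustration of the skip-write behavior described under Changes, here is a minimal sketch. The class name, method, and cache layout are assumptions for illustration and do not mirror the actual hpu_attn.py code:

```python
import torch

class SharedKVAttentionSketch:
    """Illustrative attention layer with optional cross-layer KV sharing."""

    def __init__(self, kv_sharing_target_layer_name: str | None = None):
        # When set, this layer reuses the KV cache written by the target
        # layer and must never write its own entries.
        self.kv_sharing_target_layer_name = kv_sharing_target_layer_name

    def write_kv(self, kv_cache: tuple[torch.Tensor, torch.Tensor],
                 key: torch.Tensor, value: torch.Tensor,
                 slot_mapping: torch.Tensor) -> None:
        if self.kv_sharing_target_layer_name is not None:
            # Sharing layer: skip the write. The model runner has already
            # aliased this layer's kv_cache to the target layer's tensors.
            return
        # Normal layer: scatter fresh K/V into the paged cache by slot.
        kv_cache[0].index_copy_(0, slot_mapping, key)
        kv_cache[1].index_copy_(0, slot_mapping, value)
```

Because a sharing layer's kv_cache tensors alias the target layer's at initialization, reads hit the shared cache without duplicating memory, which is where the expected savings come from.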

---------

Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
Co-authored-by: jakub-sochacki <jakub.sochacki@intel.com>
Signed-off-by: Shiv Kaul <shiv.kaul@intel.com>
from vllm.v1.attention.backends.mla.common import MLACommonImpl
from vllm_gaudi.attention.ops.hpu_paged_attn import (HPUPagedAttention, HPUPagedAttentionMetadata,
HPUPagedAttentionMetadataBuilder)

Contributor
didn't change this file in the original PR

# See https://help.github.com/articles/about-codeowners/
# for more info about CODEOWNERS file

* @kzawora-intel @xuechendi @adobrzyn @mgawarkiewicz-intel @afierka-intel @michalkuligowski @iboiko-habana @kamil-kaczor @ksmusz
Contributor
didn't change this file in the original PR

query, key, value, attn_bias, 0.0, is_causal, scale, softmax_mode, recompute_mode, valid_seq_lengths,
padding_side
]

Contributor
didn't change this file in the original PR

@skaulintel changed the base branch from main to releases/v0.14.0 on January 21, 2026 at 19:13
@skaulintel (Contributor, Author)

moved to #858

@skaulintel closed this on Jan 21, 2026