Remove gather scatter v0.14.0 #857
Also overwrite qwen3_vl function to use _merge_multimodal_embeddings with index copy.
Format update
format
update for pre-commit error: vllm_gaudi/models/qwen3_vl.py:62:25: F821 Undefined name `_require_is_multimodal`
Signed-off-by: Libin Tang <litang@habana.ai>
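The index-copy merge mentioned above can be pictured with a small, framework-free sketch. It is not the code from this PR: `merge_by_index_copy` and its parameters are hypothetical names chosen for illustration, and plain Python lists stand in for tensors. The idea it assumes is that the placeholder positions are computed once as explicit indices and the multimodal rows are copied to those indices, rather than being placed with a mask-driven gather/scatter.

```python
from typing import List, Sequence

Vector = List[float]


def merge_by_index_copy(
    inputs_embeds: Sequence[Sequence[float]],
    multimodal_embeds: Sequence[Sequence[float]],
    is_multimodal: Sequence[bool],
) -> List[Vector]:
    """Hypothetical helper: copy multimodal rows into placeholder positions.

    inputs_embeds     one embedding row per token
    multimodal_embeds one row per placeholder token, in order
    is_multimodal     True where a token is a multimodal placeholder
    """
    if len(is_multimodal) != len(inputs_embeds):
        raise ValueError("is_multimodal must have one flag per token")

    # Turn the boolean mask into explicit row indices once.
    positions = [i for i, flag in enumerate(is_multimodal) if flag]
    if len(positions) != len(multimodal_embeds):
        raise ValueError("placeholder count does not match multimodal rows")

    # Copy the text embeddings so the caller's data is left untouched.
    merged = [list(row) for row in inputs_embeds]

    # Index copy: row k of multimodal_embeds goes to positions[k].
    for pos, emb in zip(positions, multimodal_embeds):
        merged[pos] = list(emb)
    return merged


if __name__ == "__main__":
    text = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    image = [[9.0, 9.0], [8.0, 8.0]]
    print(merge_by_index_copy(text, image, [False, True, False, True]))
```

With tensors, the loop would typically become a single indexed copy along the token dimension (for example `torch.Tensor.index_copy_`); whether the PR uses exactly that call is not stated here.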
Following reasoning stated in PR: vllm-project#616
Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
…ect#837)
Signed-off-by: linoy buchnik <lbuchnik@habana.ai>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Adds support for cross-layer KV cache sharing on HPU, enabling models like Gemma-3n that share KV cache between layers to run on Gaudi.

**Changes**
- hpu_attn.py: Store kv_sharing_target_layer_name and skip KV cache writes for sharing layers
- hpu_model_runner.py: Track shared layers, validate config, and set up tensor sharing during initialization
- test_hpu_model_runner.py: Enable KV sharing unit tests

**Expected Benefits**
- Reduced KV cache memory usage for models with layer sharing
- Lower TTFT for long-context scenarios in supported models (e.g., Gemma-3n)

**Testing**
- Unit tests pass
- E2E validation with a KV-sharing model (e.g., Gemma-3n) pending

---------

Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
Co-authored-by: jakub-sochacki <jakub.sochacki@intel.com>
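The cross-layer sharing described above can be illustrated with a minimal sketch. Apart from the attribute name `kv_sharing_target_layer_name`, which the description mentions, everything here is a hypothetical stand-in: `LayerSpec`, `build_kv_caches`, and `write_kv` are illustrative names, and a dict stands in for a KV cache tensor. The sketch assumes that a sharing layer simply aliases its target layer's cache and skips its own writes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class LayerSpec:
    """Hypothetical description of one attention layer."""
    name: str
    kv_sharing_target_layer_name: Optional[str] = None


def build_kv_caches(layers: List[LayerSpec],
                    alloc: Callable[[], dict]) -> Dict[str, dict]:
    """Allocate caches for owning layers, then alias caches for sharing layers."""
    caches: Dict[str, dict] = {}

    # Pass 1: only layers without a sharing target get their own cache.
    for layer in layers:
        if layer.kv_sharing_target_layer_name is None:
            caches[layer.name] = alloc()

    # Pass 2: sharing layers reuse the target's cache object (no new memory).
    for layer in layers:
        target = layer.kv_sharing_target_layer_name
        if target is None:
            continue
        if target not in caches or target == layer.name:
            raise ValueError(f"{layer.name}: invalid sharing target {target!r}")
        caches[layer.name] = caches[target]
    return caches


def write_kv(layer: LayerSpec, caches: Dict[str, dict], slot: int, kv) -> bool:
    """Write KV for a slot; sharing layers skip the write and return False."""
    if layer.kv_sharing_target_layer_name is not None:
        return False
    caches[layer.name][slot] = kv
    return True


if __name__ == "__main__":
    specs = [LayerSpec("layer.0"),
             LayerSpec("layer.1", kv_sharing_target_layer_name="layer.0")]
    kv_caches = build_kv_caches(specs, dict)
    write_kv(specs[0], kv_caches, 0, "kv-from-layer.0")
    write_kv(specs[1], kv_caches, 0, "ignored")
    print(kv_caches["layer.1"] is kv_caches["layer.0"])
```

Because the sharing layer holds a reference to the same cache object, it reads whatever the target layer wrote, which is where the memory saving in this sketch comes from.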
Signed-off-by: Shiv Kaul <shiv.kaul@intel.com>
from vllm.v1.attention.backends.mla.common import MLACommonImpl
from vllm_gaudi.attention.ops.hpu_paged_attn import (HPUPagedAttention, HPUPagedAttentionMetadata,
                                                     HPUPagedAttentionMetadataBuilder)
didn't change this file in original PR
.github/CODEOWNERS
# See https://help.github.com/articles/about-codeowners/
# for more info about CODEOWNERS file

* @kzawora-intel @xuechendi @adobrzyn @mgawarkiewicz-intel @afierka-intel @michalkuligowski @iboiko-habana @kamil-kaczor @ksmusz
didn't change this file in original PR
query, key, value, attn_bias, 0.0, is_causal, scale, softmax_mode, recompute_mode, valid_seq_lengths,
padding_side
]
didn't change this file in original PR
moved to #858
port #759 to v0.14.0 release