AICSDEV-216: gaudi oss enablement #192
Conversation
block_list = attn_metadata.block_list
block_groups = attn_metadata.block_groups
block_mapping = attn_metadata.block_mapping
attn_bias = attn_metadata.attn_bias
I don't think we should rename the attributes here. You can do:
if not self.sliding_window or attn_metadata.window_block_list is None:
    block_list = attn_metadata.block_list
    block_groups = attn_metadata.block_groups
    block_mapping = attn_metadata.block_mapping
    attn_bias = attn_metadata.attn_bias
else:
    block_list = attn_metadata.window_block_list
    block_groups = attn_metadata.window_block_groups
    block_mapping = attn_metadata.window_block_mapping
    attn_bias = attn_metadata.window_attn_bias
Yeah, the problem was this:
(Worker_TP7 pid=102175) ERROR 09-17 19:58:46 [multiproc_executor.py:671] AttributeError: 'TrimmedAttentionMetadata' object has no attribute 'window_block_list'
OK nvm, I see the code in vllm-fork does have this missing attribute. Let me fix it here.
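For reference, a minimal self-contained sketch of the failure mode and the fix (names besides TrimmedAttentionMetadata are illustrative, not the actual vllm-fork helper): the trimmed metadata type has to declare the window_* fields, otherwise reading them raises exactly the AttributeError in the log above.

from dataclasses import dataclass, fields
from typing import Any, Optional

# Illustrative sketch only; the real trimming helper lives in vllm-fork.
# The point: TrimmedAttentionMetadata must declare the window_* fields,
# or accessing them in the attention impl raises AttributeError.
@dataclass
class TrimmedAttentionMetadata:
    block_list: Any = None
    block_groups: Any = None
    block_mapping: Any = None
    attn_bias: Any = None
    # Sliding-window variants carried through from the full metadata:
    window_block_list: Optional[Any] = None
    window_block_groups: Optional[Any] = None
    window_block_mapping: Optional[Any] = None
    window_attn_bias: Optional[Any] = None

def trim_attn_metadata(full: Any) -> TrimmedAttentionMetadata:
    # Copy only the declared fields off the full metadata object.
    names = [f.name for f in fields(TrimmedAttentionMetadata)]
    return TrimmedAttentionMetadata(**{n: getattr(full, n, None) for n in names})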
@xuechendi I tried PR #150, which has sliding window support, with the following change on top of it, but unfortunately the accuracy still looks wrong:
diff --git a/vllm_gaudi/attention/backends/hpu_attn.py b/vllm_gaudi/attention/backends/hpu_attn.py
index a558079..1206cbe 100644
--- a/vllm_gaudi/attention/backends/hpu_attn.py
+++ b/vllm_gaudi/attention/backends/hpu_attn.py
@@ -351,8 +351,10 @@ class HPUAttentionImpl(AttentionImpl, torch.nn.Module):
attn_type: str = AttentionType.DECODER,
kv_sharing_target_layer_name: Optional[str] = None,
use_irope: bool = False,
+ sinks: Optional[int] = None,
) -> None:
super(AttentionImpl, self).__init__()
+ self._sinks = sinks
if kv_sharing_target_layer_name is not None:
raise NotImplementedError("KV sharing is not currently supported on HPU.")
if use_irope:
diff --git a/vllm_gaudi/extension/ops.py b/vllm_gaudi/extension/ops.py
index 4e01ec8..905d93d 100644
--- a/vllm_gaudi/extension/ops.py
+++ b/vllm_gaudi/extension/ops.py
@@ -484,7 +484,7 @@ class VllmMixtureOfExpertsOp(torch.nn.Module):
w12=w1_list,
w3=w2_list,
permuted_weights=permuted_weights,
- activation=activation,
+ activation="silu",
experts_min=self.experts_min,
experts_max=self.experts_max)
for i in range(self.moe_n_slice):
w3=w2_list,
permuted_weights=permuted_weights,
- activation=activation,
+ activation="silu",
If silu is necessary, pass it through the config instead of hard-coding it.
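A hypothetical sketch of what that could look like (constructor signature is an assumption, not the repo's actual API): accept the activation once at construction time and forward it at the kernel call site.

import torch

# Hypothetical sketch, not the actual vllm_gaudi API: take the activation
# from the model config once, in the constructor, instead of hard-coding
# the string at the fused-kernel call site.
class VllmMixtureOfExpertsOp(torch.nn.Module):
    def __init__(self, experts_min: int, experts_max: int, activation: str = "silu"):
        super().__init__()
        self.experts_min = experts_min
        self.experts_max = experts_max
        self.activation = activation  # plumbed through, e.g. "silu" or "swigluoai"

    # forward() would then pass activation=self.activation to the fused kernel.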
Looks like upstream vLLM has this hardcoded to swigluoai:
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/gpt_oss.py#L158
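For context, a standalone sketch of that activation: gpt-oss uses a clamped SwiGLU variant ("swigluoai") with alpha=1.702, limit=7.0, and a +1 bias on the linear half. The interleaved gate/linear layout below is assumed from the gpt-oss reference code; this is illustrative, not the HPU kernel.

import torch

# Sketch of the "swigluoai" activation as in the gpt-oss reference code:
# clamped SwiGLU with alpha=1.702, limit=7.0, and (x_linear + 1).
# Interleaved gate/linear layout assumed; illustrative only.
def swiglu_oai(x: torch.Tensor, alpha: float = 1.702, limit: float = 7.0) -> torch.Tensor:
    x_glu, x_linear = x[..., ::2], x[..., 1::2]
    x_glu = x_glu.clamp(max=limit)
    x_linear = x_linear.clamp(min=-limit, max=limit)
    return x_glu * torch.sigmoid(alpha * x_glu) * (x_linear + 1)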