GPT OSS Integration Code #771
base: main
Conversation
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Pull request overview
This PR adds support for the GPT OSS model type, including model-specific expert routing logic, bias support in MoE layers, and an attention sink mechanism for improved inference.
- Adds GPT OSS-specific expert routing and softmax handling in the MoE forward pass
- Implements bias support throughout the MoE pipeline
- Introduces attention sink functionality across attention backends and operations
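As a rough illustration of the attention sink idea mentioned above: a learned per-head "sink" logit is appended to the attention scores before the softmax, so some probability mass can drain into the sink instead of being forced onto real keys. This is a minimal sketch under that assumption; the function name and list-based shapes are illustrative, not the PR's actual code.

```python
import math

def softmax_with_sink(scores, sink_logit):
    """Softmax over one query's attention logits plus a learned sink logit.

    The sink column absorbs probability mass, so the returned weights over
    the real keys sum to less than 1 when the sink logit is competitive.
    """
    logits = scores + [sink_logit]            # append the sink logit
    m = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs[:-1]                         # drop the sink column after softmax
```

Raising the sink logit shrinks the total weight placed on real keys, which is the intended effect.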
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Summary per file:
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_model_runner.py | Increases sliding window block size calculation by 1 |
| vllm_gaudi/ops/hpu_fused_moe.py | Adds GPT OSS model type detection, bias handling in MoE operations, and model-specific expert routing |
| vllm_gaudi/extension/utils.py | Adds sinks parameter support to forward pass |
| vllm_gaudi/extension/ops.py | Implements sink attention mechanism in pipelined and naive attention functions, adds bias support to MoE operations |
| vllm_gaudi/attention/backends/hpu_attn.py | Adds sinks parameter and dtype consistency checks in attention implementation |
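To make the MoE changes in `vllm_gaudi/ops/hpu_fused_moe.py` concrete, here is a hedged sketch of biased top-k expert routing: a per-expert bias is added to the router logits, the top-k experts are selected, and the softmax is taken over only the selected experts' logits. Function names and the plain-list representation are hypothetical, chosen for readability rather than taken from the PR.

```python
import math

def route_topk(router_logits, router_bias, k=2):
    """Select top-k experts from biased router logits.

    Returns (expert_index, weight) pairs; weights are a softmax over
    only the k selected experts' biased logits.
    """
    biased = [l + b for l, b in zip(router_logits, router_bias)]  # bias added to logits
    top = sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)[:k]
    m = max(biased[i] for i in top)                               # stability shift
    exps = [math.exp(biased[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]
```

Normalizing over only the selected experts (rather than all experts) keeps the combined expert weights summing to 1 regardless of how many experts exist.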
🚧 CI Blocked: The main CI workflow was not started for the following reason:
/run-gaudi-tests
Only codeowners and testowners can request to run Gaudi tests. Contact list: kzawora-intel, xuechendi, adobrzyn, mgawarkiewicz-intel, afierka-intel, michalkuligowski, iboiko-habana, kamil-kaczor, ksmusz, PatrykWo, kfojcik-intel, wuxun-zhang, attafosu, ulivne, Kacper-Pietkun, jkaniecki, jbyczkow, wpyszka
No description provided.