Migrate INCConfig for HPU
#779
base: main
Conversation
Signed-off-by: yiliu30 <[email protected]>
Pull request overview
This PR migrates the HPU-specific INCConfig implementation so the plugin stays compatible as the main vLLM repository repurposes its INC configuration for GPU/CPU use. A placeholder `_FakeINCConfig` is introduced to handle the "inc" quantization method in the HPU plugin while delegating actual quantization to unquantized methods.
Key Changes:
- Introduces a new `_FakeINCConfig` class that acts as a stub for Intel Neural Compressor quantization (see the sketch after this list)
- Monkey-patches vLLM's `get_quantization_config` function to intercept "inc" requests and return the fake config
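For reference, here is a minimal sketch of what such a stub might look like, assuming vLLM's `QuantizationConfig` interface (method names and signatures vary across vLLM versions, and the actual class in `vllm_gaudi/extension/quant.py` may differ):

```python
# Illustrative sketch only; the real _FakeINCConfig in
# vllm_gaudi/extension/quant.py may differ in detail.
from typing import Any, Optional

import torch
from vllm.model_executor.layers.linear import (LinearBase,
                                               UnquantizedLinearMethod)
from vllm.model_executor.layers.quantization.base_config import (
    QuantizationConfig, QuantizeMethodBase)


class _FakeINCConfig(QuantizationConfig):
    """Stub standing in for Intel Neural Compressor quantization on HPU.

    Satisfies vLLM's QuantizationConfig interface but hands every layer
    back to the unquantized code path; actual INC quantization happens
    elsewhere in the HPU stack.
    """

    @classmethod
    def get_name(cls) -> str:
        return "inc"

    @classmethod
    def get_supported_act_dtypes(cls) -> list[torch.dtype]:
        return [torch.bfloat16]

    @classmethod
    def get_min_capability(cls) -> int:
        # Capability checks target CUDA GPUs and are irrelevant on HPU.
        return -1

    @classmethod
    def get_config_filenames(cls) -> list[str]:
        return []

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> "_FakeINCConfig":
        return cls()

    def get_quant_method(self, layer: torch.nn.Module,
                         prefix: str) -> Optional[QuantizeMethodBase]:
        # Delegate linear layers to the unquantized method. Per the PR
        # description, the real stub also covers MoE layers.
        if isinstance(layer, LinearBase):
            return UnquantizedLinearMethod()
        return None
```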
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `vllm_gaudi/extension/quant.py` | New file defining the `_FakeINCConfig` stub that returns unquantized methods for linear and MoE layers |
| `vllm_gaudi/extension/ops.py` | Adds monkey-patch function `oot_get_quantization_config` to override vLLM's quantization config retrieval for the "inc" method |
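A hedged sketch of the monkey-patch side, assuming vLLM exposes `get_quantization_config` in `vllm.model_executor.layers.quantization` (the actual `oot_get_quantization_config` in `vllm_gaudi/extension/ops.py` may wire things differently):

```python
# Illustrative sketch only; the real oot_get_quantization_config in
# vllm_gaudi/extension/ops.py may differ in naming and wiring.
import vllm.model_executor.layers.quantization as vllm_quant

from vllm_gaudi.extension.quant import _FakeINCConfig

# Keep a handle to the original lookup so non-"inc" methods still work.
_original_get_quantization_config = vllm_quant.get_quantization_config


def oot_get_quantization_config(quantization: str):
    # Intercept the HPU "inc" method and return the stub config class;
    # every other method falls through to vLLM's own registry.
    if quantization == "inc":
        return _FakeINCConfig
    return _original_get_quantization_config(quantization)


# Install the override so vLLM resolves "inc" to the stub.
vllm_quant.get_quantization_config = oot_get_quantization_config
```

One caveat with this style of patch: modules that imported `get_quantization_config` by name before the override is installed would keep the original reference, so the patch has to run early in plugin initialization.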
✅ CI Passed: all checks passed successfully against the following vllm commit:
As part of vllm-project/vllm#31716, the `INCConfig` in vllm-project/vllm will be used for GPU/CPU. However, we still need a placeholder for the INC path in the plugin.

cc @hshen14 @thuang6 @kzawora-intel @xuechendi