-
Notifications
You must be signed in to change notification settings - Fork 91
[GAUDISW-245117] add b2b matmul #770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🚧 CI BlockedThe main CI workflow was not started for the following reason:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a B2BMatmul class for batch-to-block matrix multiplication operations. The change replaces the generic Matmul class with the new B2BMatmul class for batch2block_matmul and block2batch_matmul operations in the attention backend.
- Added new
B2BMatmulclass that inherits fromMatmul - Updated attention backend to use
B2BMatmulfor batch-to-block operations - Modified import statement to include the new class
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| vllm_gaudi/extension/utils.py | Defines the new B2BMatmul class inheriting from Matmul |
| vllm_gaudi/attention/backends/hpu_attn.py | Updates batch2block_matmul and block2batch_matmul to use B2BMatmul instead of Matmul |
| vllm_gaudi/extension/ops.py | Contains an apparent typo in variable name |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
🚧 CI BlockedThe main CI workflow was not started for the following reason:
|
Signed-off-by: linoy buchnik <[email protected]>
7cd4a46 to
bbebcc1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@copilot open a new pull request to apply changes based on the comments in this thread |
Co-authored-by: Copilot <[email protected]> Signed-off-by: Linoy Buchnik <[email protected]>
|
@copilot open a new pull request to apply changes based on the comments in this thread |
| This class is intentionally kept functionally identical to ``Matmul``. | ||
| It exists to provide semantic distinction in the codebase (e.g., for | ||
| patterns that specifically require back-to-back matmul) and to allow | ||
| future customization without changing call sites. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe edit the comment to be more specific, change back-to-back to batch2block/block2batch and explain the reasoning for it, that it is used by the INC to adjust the scale to the needed values of the input tensor as some of them are discarded by the 2nd input which is kind of a mask mapping
Signed-off-by: Linoy Buchnik <[email protected]>
Signed-off-by: Linoy Buchnik <[email protected]>
Signed-off-by: Linoy Buchnik <[email protected]>
No description provided.