Conversation

@linoybu

@linoybu linoybu commented Jan 1, 2026

No description provided.

@github-actions

github-actions bot commented Jan 1, 2026

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a B2BMatmul class for batch-to-block matrix multiplication operations. The change replaces the generic Matmul class with the new B2BMatmul class for batch2block_matmul and block2batch_matmul operations in the attention backend.

  • Added new B2BMatmul class that inherits from Matmul
  • Updated attention backend to use B2BMatmul for batch-to-block operations
  • Modified import statement to include the new class
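The change summarized above can be sketched roughly as follows. This is a minimal illustration only: the `Matmul` class here is a plain-Python stand-in (the real one in `vllm_gaudi/extension/utils.py` is not shown in this thread), and the assumption is that it simply wraps a matrix product.

```python
class Matmul:
    """Hypothetical stand-in for the existing Matmul wrapper.

    Assumption: the real class just wraps a matrix product.
    """

    def __call__(self, a, b):
        # Plain-Python product of two 2-D lists.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]


class B2BMatmul(Matmul):
    """Behaves exactly like Matmul, but is a distinct type that tooling
    can recognise at batch2block/block2batch call sites."""


# The attention backend would instantiate the subclass for both operations.
batch2block_matmul = B2BMatmul()
block2batch_matmul = B2BMatmul()

a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
result = batch2block_matmul(a, b)
```

Because `B2BMatmul` overrides nothing, call sites get identical results; the only observable difference is the type of the operator object.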

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| vllm_gaudi/extension/utils.py | Defines the new B2BMatmul class inheriting from Matmul |
| vllm_gaudi/attention/backends/hpu_attn.py | Updates batch2block_matmul and block2batch_matmul to use B2BMatmul instead of Matmul |
| vllm_gaudi/extension/ops.py | Contains an apparent typo in variable name |

@github-actions

github-actions bot commented Jan 5, 2026

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

Signed-off-by: linoy buchnik <[email protected]>
@linoybu linoybu requested a review from Copilot January 6, 2026 07:25
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


@linoybu
Author

linoybu commented Jan 6, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Linoy Buchnik <[email protected]>
@linoybu
Author

linoybu commented Jan 6, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

@linoybu linoybu closed this Jan 6, 2026
@linoybu linoybu reopened this Jan 6, 2026
This class is intentionally kept functionally identical to ``Matmul``.
It exists to provide semantic distinction in the codebase (e.g., for
patterns that specifically require back-to-back matmul) and to allow
future customization without changing call sites.
Contributor

@dudilester dudilester Jan 6, 2026

Maybe edit the comment to be more specific: change "back-to-back" to batch2block/block2batch and explain the reasoning for the class. It is used by INC to adjust the scale to the values of the input tensor that are actually needed, since some of them are discarded by the 2nd input, which is a kind of mask mapping.
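To make the point about scale concrete, here is a small sketch of one way to read it. It is not INC's implementation: the helper names, the 0/1 mask layout, and the use of a max-absolute-value statistic are all assumptions chosen for illustration.

```python
def matmul(a, b):
    """Plain-Python product of two 2-D lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def amax(x):
    """Largest absolute value over the whole first input."""
    return max(abs(v) for row in x for v in row)


def used_amax(x, mask):
    """Largest absolute value over the columns of x that the mask keeps.

    Column j of x reaches the output only if row j of the mask has a
    nonzero entry; all other columns are multiplied by zero.
    """
    used = [j for j, mask_row in enumerate(mask) if any(mask_row)]
    return max(abs(row[j]) for row in x for j in used)


# Hypothetical data: the middle column of x is never selected by the mask.
x = [[1.0, 100.0, 0.5],
     [2.0, -50.0, -3.0]]
mask = [[1, 0],
        [0, 0],
        [0, 1]]

out = matmul(x, mask)        # the large values 100.0 and -50.0 never appear
naive = amax(x)              # statistic over the full input
needed = used_amax(x, mask)  # statistic over the values that survive
```

In this toy case the statistic over the full input is 100.0, while the values that actually contribute peak at 3.0. A distinct `B2BMatmul` type would give a quantization tool a way to tell these masked products apart from ordinary matmuls and treat their scale accordingly.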

Signed-off-by: Linoy Buchnik <[email protected]>
Signed-off-by: Linoy Buchnik <[email protected]>
Signed-off-by: Linoy Buchnik <[email protected]>

2 participants