x64: updates for brgemm kernel and avx2 brgemm convolutions #2420
make test
555c553 to ccce0f8
ccce0f8 to e359903
e359903 to c00d9d9
@Radu2k could you have a look at the aarch64 changes? Thanks
LGTM, also no regressions in terms of performance. Thanks!
69b80e9 to 9946fdc
9946fdc to 84db225
84db225 to 772528b
I removed the commit "brgemm conv: update loop nest - move loop by ic_chunks to the top: we usually don't use blocking by input channels. If we do it makes more sense to have a corresponding block loop at the top level" from this PR, as it is not directly related to the purpose of the avx2 convolution optimization. It should be updated and included in another PR.
This request contains a few general brgemm code changes and some performance updates for avx2:
Performance was tested with OpenVINO on MTL. Here is the ratio of the brgemm to the jit implementation for rls-v3.5 and for this brgemm update over v3.5: