
brgemm: support arbitrary K on AMX #2319

Merged · 5 commits · Jan 8, 2025

Conversation

ankalinin
Contributor

@ankalinin ankalinin commented Dec 27, 2024

There are two problems with the brgemm K value on AMX:

1. brgemm doesn't support K values that are not divisible by the VNNI granularity;
2. for K not divisible by the tile width (32 or 64), the blocking along the K dimension may not be optimal.

To work around these limitations, brgemm-based primitives are forced to either transform matrix A or call many small brgemm kernels with different tile configurations. Both approaches lead to performance loss.
This PR implements support for arbitrary K in brgemm on AMX and makes the corresponding updates in 1x1 convolutions and in matmul to use this new capability.

Performance update for convolutions:
(image: convolution performance chart)

Performance update for matmul:
(image: matmul performance chart)

@ankalinin ankalinin requested review from a team as code owners December 27, 2024 00:25
@github-actions github-actions bot added labels: platform:cpu-x64 (Intel64/AMD64 processors; Codeowner: @oneapi-src/onednn-cpu-x64), component:tests (Codeowner: @oneapi-src/onednn-arch) Dec 27, 2024
@ankalinin ankalinin removed the component:tests (Codeowner: @oneapi-src/onednn-arch) label Dec 27, 2024
@ankalinin
Contributor Author

make test
disable device_gpu

@ankalinin ankalinin force-pushed the akalinin/brgemm_k_tail_pr branch from 527c2a0 to 4bd35fa on January 2, 2025 20:55
@ankalinin ankalinin requested a review from a team as a code owner January 2, 2025 20:55
@github-actions github-actions bot added labels: platform:cpu-aarch64 (Codeowner: @oneapi-src/onednn-cpu-aarch64), component:tests (Codeowner: @oneapi-src/onednn-arch) Jan 2, 2025
@ankalinin ankalinin force-pushed the akalinin/brgemm_k_tail_pr branch from 4bd35fa to 280499f on January 2, 2025 22:15
@theComputeKid
Member

I restarted the AArch64 CI because the encountered failure is a known sporadic bug on the c7g.

@theComputeKid
Member

cc: @Radu2k

@ankalinin ankalinin force-pushed the akalinin/brgemm_k_tail_pr branch from 280499f to b2df32f on January 3, 2025 19:00
@ankalinin ankalinin force-pushed the akalinin/brgemm_k_tail_pr branch from b2df32f to b1dbf4f on January 3, 2025 22:45
@ankalinin
Contributor Author

> I restarted the AArch64 CI because the encountered failure is a known sporadic bug on the c7g.

Thanks!

@Radu2k
Contributor

Radu2k commented Jan 6, 2025

Hi @ankalinin, the code looks good to me, but even though this is a minimally invasive AArch64 change, we need to run a performance analysis to be on the safe side. Could you please let me know which benchdnn test(s) you ran to get the benchmark numbers? We will run them for AArch64, and then this should be good to go if no major regressions show up.

@ankalinin
Contributor Author

> Hi @ankalinin, the code looks good to me, but even though this is a minimally invasive AArch64 change, we need to run a performance analysis to be on the safe side. Could you please let me know which benchdnn test(s) you ran to get the benchmark numbers? We will run them for AArch64, and then this should be good to go if no major regressions show up.

Hi, @Radu2k. The changes were made in the AArch64 code only to avoid compile errors; I don't expect any performance changes in the AArch64 part.
What benchdnn testing do you usually use for performance testing?

In any case, for CPU the commands could look like this:
benchdnn --mode=P --matmul --batch=tests/benchdnn/inputs/matmul/harness_matmul_runtime_f32
benchdnn --mode=P --conv --batch=tests/benchdnn/inputs/conv/harness_conv_f32

@vpirogov
Member

vpirogov commented Jan 7, 2025

> We will run them for AArch64 and then should be good to go in if no major regressions show up.

@Radu2k, if you still plan to run performance validation despite @ankalinin's explanation, please let us know when you plan to do that. I would really like to promote these changes before the v3.7 code freeze, which is this Friday.

@tprimak
Contributor

tprimak commented Jan 7, 2025

@Radu2k, the change renames a variable and adds initialization for two other variables. This is not exactly the kind of change that requires a performance study.

@Radu2k
Contributor

Radu2k commented Jan 8, 2025

@ankalinin @vpirogov I have just finished running the performance checks and it looks fine; no regressions showed up.

@tprimak As mentioned above, even if it is a minimally invasive change, we run these lightweight performance checks for all PRs before approving.

@vpirogov
Member

vpirogov commented Jan 8, 2025

Thanks, @Radu2k!

@ankalinin ankalinin merged commit bc0ce23 into main Jan 8, 2025
17 checks passed
@ankalinin ankalinin deleted the akalinin/brgemm_k_tail_pr branch January 8, 2025 18:02