brgemm: support arbitrary K on AMX #2319

ankalinin · 2024-12-27T00:25:55Z

There are two problems with the brgemm K value on AMX:

1. brgemm doesn't support K that is not divisible by the VNNI granularity.
2. For K not divisible by the tile width (32 or 64), the blocking along the K dimension may be suboptimal.

To work around these limitations, brgemm primitives are forced either to transform matrix A or to call many small brgemm kernels with different tile configurations. Both approaches lead to performance loss.

This PR implements support for arbitrary K in brgemm on AMX and updates 1x1 convolutions and matmul to use this new ability.
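To make the two divisibility constraints concrete, here is a minimal sketch of how K splits into full tile-width blocks plus a tail. It is an illustration only, not oneDNN code: `split_k`, `KSplit`, and the two constants are hypothetical names, and the sketch assumes a 64-byte AMX tile row (which gives the tile widths of 32 and 64 mentioned above for 2-byte and 1-byte elements) and a 4-byte VNNI group.

```python
from dataclasses import dataclass

# Assumptions for illustration (not taken from the PR):
AMX_TILE_ROW_BYTES = 64  # bytes in one AMX tile row
VNNI_GROUP_BYTES = 4     # bytes of K elements packed together in VNNI layout


@dataclass
class KSplit:
    full_tiles: int   # number of full tile-width blocks along K
    tail: int         # leftover K elements after the full blocks
    padded_tail: int  # tail rounded up to the VNNI granularity


def split_k(k: int, elem_bytes: int) -> KSplit:
    """Split K into full tile-width blocks and a VNNI-padded tail."""
    tile_k = AMX_TILE_ROW_BYTES // elem_bytes  # 32 for bf16, 64 for int8
    vnni = VNNI_GROUP_BYTES // elem_bytes      # 2 for bf16, 4 for int8
    full_tiles, tail = divmod(k, tile_k)
    padded_tail = -(-tail // vnni) * vnni      # ceiling to a multiple of vnni
    return KSplit(full_tiles, tail, padded_tail)


if __name__ == "__main__":
    print(split_k(100, 2))  # bf16: tail already a multiple of 2
    print(split_k(45, 2))   # bf16: odd tail needs one element of padding
    print(split_k(77, 1))   # int8: tail of 13 is padded up to 16
```

A K whose tail is nonzero, or whose tail is not a multiple of the VNNI granularity, is the case that previously required transforming A or dispatching extra kernels with different tile configurations.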

Performance update for convolutions:

Performance update for matmul

ankalinin · 2024-12-27T00:29:58Z

make test
disable device_gpu

theComputeKid · 2025-01-03T15:41:36Z

I restarted the AArch64 CI because the encountered failure is a known sporadic bug on the c7g.

theComputeKid · 2025-01-03T15:44:08Z

cc: @Radu2k

ankalinin · 2025-01-04T00:21:01Z

> I restarted the AArch64 CI because the encountered failure is a known sporadic bug on the c7g.

Thanks!

Radu2k · 2025-01-06T16:03:00Z

Hi @ankalinin, the code looks good to me, but even though this is a minimally invasive AArch64 change, we need to run a performance analysis to be on the safe side. Could you please let me know which benchdnn tests you ran to get the benchmark numbers? We will run them for AArch64, and then it should be good to go in if no major regressions show up.

ankalinin · 2025-01-06T18:19:46Z

> Hi @ankalinin, the code looks good to me, but even though this is a minimally invasive AArch64 change, we need to run a performance analysis to be on the safe side. Could you please let me know which benchdnn tests you ran to get the benchmark numbers? We will run them for AArch64, and then it should be good to go in if no major regressions show up.

Hi, @Radu2k. The changes in the AArch64 code were made only to avoid compile errors. I don't expect any performance changes in the AArch64 part.
Which benchdnn tests do you usually use for performance testing?

Anyway, for CPU the commands could look like this:
benchdnn --mode=P --matmul --batch=tests/benchdnn/inputs/matmul/harness_matmul_runtime_f32
benchdnn --mode=P --conv --batch=tests/benchdnn/inputs/conv/harness_conv_f32

vpirogov · 2025-01-07T21:26:27Z

> We will run them for AArch64 and then should be good to go in if no major regressions show up.

@Radu2k, if you still plan to run performance validation, considering @ankalinin's explanation, please let us know when you plan to do that. I would really like to promote these changes by the v3.7 code freeze, which is this Friday.

tprimak · 2025-01-07T21:38:33Z

@Radu2k the change renames a variable and adds initialization for another 2 variables. This is not exactly the kind of change that requires a performance study.

Radu2k · 2025-01-08T12:25:47Z

@ankalinin @vpirogov I have just finished running the performance checks and everything looks fine; no regressions showed up.

@tprimak As mentioned above, even if it is a minimally invasive change, we run these lightweight performance checks for all PRs before approving.

vpirogov · 2025-01-08T17:27:54Z

Thanks, @Radu2k!

ankalinin requested review from a team as code owners December 27, 2024 00:25

github-actions bot added platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64 component:tests Codeowner: @oneapi-src/onednn-arch labels Dec 27, 2024

ankalinin removed the component:tests Codeowner: @oneapi-src/onednn-arch label Dec 27, 2024

densamoilov approved these changes Dec 27, 2024

dzarukin approved these changes Dec 27, 2024

ankalinin force-pushed the akalinin/brgemm_k_tail_pr branch from 527c2a0 to 4bd35fa Compare January 2, 2025 20:55

ankalinin requested a review from a team as a code owner January 2, 2025 20:55

github-actions bot added platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 component:tests Codeowner: @oneapi-src/onednn-arch labels Jan 2, 2025

ankalinin force-pushed the akalinin/brgemm_k_tail_pr branch from 4bd35fa to 280499f Compare January 2, 2025 22:15

x64: brgemm: common base class for all brgemm jit kernels

29ebf27

ankalinin force-pushed the akalinin/brgemm_k_tail_pr branch from 280499f to b2df32f Compare January 3, 2025 19:00

ankalinin added 4 commits January 3, 2025 14:45

x64: brgemm: support arbitrary K for AMX

ea5d427

x64: brgemm 1x1 conv: support arbitrary ic without rtus

cf63e71

x64: brgemm_matmul_copy_utils: support arbitrary padding

7460c02

x64: brgemm matmul: support arbitrary K on AMX

b1dbf4f

ankalinin force-pushed the akalinin/brgemm_k_tail_pr branch from b2df32f to b1dbf4f Compare January 3, 2025 22:45

Radu2k approved these changes Jan 8, 2025

ankalinin merged commit bc0ce23 into main Jan 8, 2025
17 checks passed

ankalinin deleted the akalinin/brgemm_k_tail_pr branch January 8, 2025 18:02
