brgemm: support arbitrary K on AMX #2319
make test |
527c2a0 to 4bd35fa
4bd35fa to 280499f
I restarted the AArch64 CI because the encountered failure is a known sporadic bug on the c7g. |
cc: @Radu2k |
280499f to b2df32f
b2df32f to b1dbf4f
Thanks! |
Hi @ankalinin, the code looks good to me, but even though this is a minimally invasive AArch64 change, we need to run a performance analysis to be on the safe side. Could you please let me know which benchdnn tests you ran to get the benchmark numbers? We will run them on AArch64, and the change should be good to go in if no major regressions show up. |
Hi @Radu2k. The changes in the AArch64 code were made only to avoid compile errors. I don't expect any performance changes in the AArch64 part. Anyway, for CPU the commands may look like this: |
@Radu2k, if you still plan to run performance validation in light of @ankalinin's explanation, please let us know when you plan to do that. I would really like to promote these changes by the v3.7 code freeze, which is this Friday. |
@Radu2k the change renames a variable and adds initialization for another 2 variables. This is not exactly the kind of change that requires a performance study. |
@ankalinin @vpirogov I have just finished running the performance checks and everything looks fine; no regressions showed up. @tprimak As mentioned above, even for a minimally invasive change, we run these lightweight performance checks on all PRs before approving. |
Thanks, @Radu2k! |
There are two problems regarding the brgemm K value on AMX:
- brgemm doesn't support K that is not divisible by the vnni granularity;
- for K not divisible by the tile width (32 or 64), the blocking along the K dimension may not be optimal.
To get around these limitations, brgemm primitives are forced either to transform matrix A or to call many small brgemm kernels with different tile configurations. Both approaches lead to performance loss.
This PR implements support for arbitrary K in brgemm on AMX and updates 1x1 convolutions and matmul to use this new ability.
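To make the two divisibility constraints concrete, here is a minimal sketch of the arithmetic involved. It is not the code from this PR and not oneDNN API: `KSplit`, `round_up`, and `split_k` are hypothetical names introduced for illustration only. It assumes the tile width is 32 elements with a vnni granularity of 2 for bf16, and 64 elements with a vnni granularity of 4 for int8.

```cpp
#include <cassert>

// Hypothetical helper type (not part of oneDNN): how a K dimension splits
// into full tile-width blocks plus a tail.
struct KSplit {
    int full_blocks; // number of K blocks of exactly tile_width elements
    int tail;        // leftover K elements (0 when K divides evenly)
    int tail_padded; // tail rounded up to the vnni granularity
};

// Round x up to the nearest multiple of m (m > 0).
inline int round_up(int x, int m) {
    return (x + m - 1) / m * m;
}

// Assumed parameters: tile_width is 32 (bf16) or 64 (int8),
// vnni is 2 (bf16) or 4 (int8).
inline KSplit split_k(int K, int tile_width, int vnni) {
    assert(K >= 0 && tile_width > 0 && vnni > 0);
    assert(tile_width % vnni == 0);
    const int tail = K % tile_width;
    return KSplit{K / tile_width, tail, round_up(tail, vnni)};
}
```

Under these assumptions, `split_k(100, 32, 2)` yields 3 full blocks and a tail of 4, which is already a multiple of the vnni granularity. `split_k(77, 64, 4)` yields 1 full block and a tail of 13, which would have to be treated as 16 at vnni granularity. A nonzero tail, or a tail that needs padding, is the case that previously required transforming matrix A or dispatching extra small kernels.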
Performance update for convolutions:
![image](https://private-user-images.githubusercontent.com/19217783/399784985-53ca1011-d9da-4be2-b76a-066100c44084.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2MDg4NjksIm5iZiI6MTczOTYwODU2OSwicGF0aCI6Ii8xOTIxNzc4My8zOTk3ODQ5ODUtNTNjYTEwMTEtZDlkYS00YmUyLWI3NmEtMDY2MTAwYzQ0MDg0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE1VDA4MzYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTUxYzI2MDg1OTI1ZDdmZmFhNmNhY2U4Y2Y1MmNiYjIzZWEwYWJjMGUxZTgwMDYzN2VkOTZiNGY5OGJiMmYxY2EmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.k7yBpYNgrRp5Qw3nnCQsul9ZUs10JJqzfs0ULC1yWBU)
Performance update for matmul:
![image](https://private-user-images.githubusercontent.com/19217783/399785013-3c8a4ae6-7bc2-4390-9db5-7753639605e4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2MDg4NjksIm5iZiI6MTczOTYwODU2OSwicGF0aCI6Ii8xOTIxNzc4My8zOTk3ODUwMTMtM2M4YTRhZTYtN2JjMi00MzkwLTlkYjUtNzc1MzYzOTYwNWU0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE1VDA4MzYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWIxNzFjZjA1ZGQ2ZjBmMDBlYTNiYmNjMjIzYzljYjg5ZGEyNThjN2IyZjkyMjIwMmU5NDg5NmEyZjFjMTQ3NGMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.wIvofMOVoWdJsmrj3R4oXLB7336Yn03S04P8akFOp6s)