x64: updates for brgemm kernel and avx2 brgemm convolutions #2420

Open
wants to merge 10 commits into
base: main


ankalinin
Contributor

@ankalinin ankalinin commented Jan 15, 2025

This request contains a few general brgemm code changes and some performance updates for avx2:

  • brgemm: introduce brgemm_desc_finalize function: this function must be called after all parameters of the brgemm descriptor are set. As a result, brgemm blocking is computed only once, with all the needed information available
  • brgemm conv: update to reduce the number of generated kernels: small update for brgemm non-amx convolutions
  • brgemm: update non-amx blocking: distribute the available vmm registers more carefully, which is especially important for avx2. The blocking also determines which data-loading order the microkernel uses: n_bcast_1_load or 1_bcast_n_load.
  • brgemm kernel: update gemm microkernel: just update the kernel code
  • brgemm kernel: reduce number of vpad variants: this reduces the size of the generated kernel
  • brgemm conv: knob for brgemm microkernel decomposition: we may want to tune microkernel decomposition
  • brgemm conv: update loop nest - move loop by ic_chunks to the top: we usually don't block by input channels. If we do, it makes more sense to have the corresponding block loop at the top level
  • brgemm conv: introduce loop order gcndhw: for some shapes such loop order makes sense in terms of performance
  • brgemm conv: update cache constants and ur estimation: update platform values used in convolution blocking selection
  • brgemm conv: heuristic for big convolution on avx2
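
To illustrate why the register distribution and the choice between n_bcast_1_load and 1_bcast_n_load matter on avx2, here is a minimal sketch of a register-budget model. It is not the code from this PR: the names `LoadOrder`, `vmms_needed`, `cheaper_order`, `bd_block` and `ld_block2` as used here are hypothetical, and the model assumes that only accumulators, broadcast registers and load registers compete for the 16 ymm registers. A real kernel may reserve additional registers.

```cpp
#include <cassert>

// Hypothetical, simplified register-budget model (an assumption for
// illustration only, not oneDNN code).
enum class LoadOrder {
    n_bcast_1_load,  // keep all broadcasts live, reuse one load register
    one_bcast_n_load // keep all loads live, reuse one broadcast register
};

// avx2 exposes 16 vector (ymm) registers.
constexpr int avx2_max_vmms = 16;

// Registers needed by a tile of bd_block rows by ld_block2 vector-wide
// column blocks under the given load order.
inline int vmms_needed(int bd_block, int ld_block2, LoadOrder order) {
    assert(bd_block > 0 && ld_block2 > 0);
    const int accumulators = bd_block * ld_block2;
    const int aux = (order == LoadOrder::n_bcast_1_load)
            ? bd_block + 1   // bd_block broadcasts + 1 load
            : ld_block2 + 1; // ld_block2 loads + 1 broadcast
    return accumulators + aux;
}

// Pick the order that needs fewer auxiliary registers for this tile.
// Ties go to one_bcast_n_load (an arbitrary choice in this sketch).
inline LoadOrder cheaper_order(int bd_block, int ld_block2) {
    return bd_block < ld_block2 ? LoadOrder::n_bcast_1_load
                                : LoadOrder::one_bcast_n_load;
}

// True if the tile fits into the avx2 register file under this model.
inline bool fits_avx2(int bd_block, int ld_block2, LoadOrder order) {
    return vmms_needed(bd_block, ld_block2, order) <= avx2_max_vmms;
}
```

Under this model a 3 x 4 tile needs 12 accumulators in either case. With n_bcast_1_load it needs 4 more registers and fits exactly into 16. With 1_bcast_n_load it needs 5 more and does not fit. This is why the load order has to be chosen together with the blocking rather than independently of it.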

Performance was tested with OpenVINO on MTL. The chart below shows the ratio of brgemm to jit implementation performance for rls-v3.5 and for this brgemm update on top of v3.5:

image

@ankalinin ankalinin requested a review from a team as a code owner January 15, 2025 21:17
@github-actions github-actions bot added the platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64 label Jan 15, 2025
@ankalinin
Contributor Author

make test
disable test_device_gpu
disable build_gpu_runtime_ocl
disable build_gpu_runtime_sycl

@ankalinin ankalinin force-pushed the akalinin/avx2_brgemm_conv_pr branch 4 times, most recently from 555c553 to ccce0f8 Compare January 16, 2025 18:42
@ankalinin ankalinin requested a review from a team as a code owner January 16, 2025 18:42
@github-actions github-actions bot added the component:tests Codeowner: @oneapi-src/onednn-arch label Jan 16, 2025
@ankalinin ankalinin changed the title [WIP] x64: updates for brgemm kernel and avx2 brgemm convolutions x64: updates for brgemm kernel and avx2 brgemm convolutions Jan 16, 2025
@ankalinin ankalinin force-pushed the akalinin/avx2_brgemm_conv_pr branch from ccce0f8 to e359903 Compare January 16, 2025 19:25
@ankalinin ankalinin requested a review from a team as a code owner January 16, 2025 19:25
@github-actions github-actions bot added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Jan 16, 2025
@ankalinin ankalinin force-pushed the akalinin/avx2_brgemm_conv_pr branch from e359903 to c00d9d9 Compare January 16, 2025 19:29
@Sqvid
Contributor

Sqvid commented Jan 17, 2025

@Radu2k could you have a look at the aarch64 changes? Thanks

Contributor

@Radu2k Radu2k left a comment

LGTM, also no regressions in terms of performance. Thanks!

@ankalinin ankalinin force-pushed the akalinin/avx2_brgemm_conv_pr branch from c00d9d9 to 69b80e9 Compare January 17, 2025 19:28
Labels
component:tests Codeowner: @oneapi-src/onednn-arch platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64
3 participants