xe: jit: gemm: TLB warmup #2631

Open
wants to merge 2 commits into
base: main


petercad
Contributor

@petercad petercad commented Feb 7, 2025

New solution to the problem in #2607 / MFDNN-12523. This PR improves cold-TLB performance of specific TN GEMV compressed-weights kernels on MTL/ARL by adding an extra warmup workgroup whose only job is to probe every 64k page in the A/B matrices (along with the scales/zero points, if present). On a TLB miss, each probe initiates a page walk that fills the STLB, so the other workgroups (doing the real work) do not incur page-walk latency later.

Shows 15-20% speedup on many cases of interest.
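
As a rough illustration of the probing idea described above, here is a minimal host-side C++ sketch. It is not the JIT kernel code from this PR: `kPageSize`, `probe_addresses`, and `warm_up` are hypothetical names, and a CPU read only stands in for the GPU loads issued by the warmup workgroup. The sketch computes one address per 64k page overlapped by a buffer and then reads a single byte from each.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed page granularity, taken from the "64k page" wording above.
constexpr std::uintptr_t kPageSize = 64 * 1024;

// Hypothetical helper: returns one probe address for every 64k page that
// overlaps the byte range [base, base + bytes).
std::vector<std::uintptr_t> probe_addresses(
        std::uintptr_t base, std::size_t bytes) {
    std::vector<std::uintptr_t> probes;
    if (bytes == 0) return probes;

    const std::uintptr_t first_page = base & ~(kPageSize - 1);
    const std::uintptr_t last_page = (base + bytes - 1) & ~(kPageSize - 1);

    for (std::uintptr_t page = first_page;; page += kPageSize) {
        // The first page may start before the buffer; probe the buffer's
        // first byte in that case so the read stays in bounds.
        probes.push_back(page < base ? base : page);
        if (page == last_page) break;
    }
    return probes;
}

// Hypothetical helper: reads one byte per page so that any translation miss
// is resolved before the real work touches the buffer.
std::size_t warm_up(const std::uint8_t *buf, std::size_t bytes) {
    std::size_t touched = 0;
    const auto base = reinterpret_cast<std::uintptr_t>(buf);
    for (std::uintptr_t addr : probe_addresses(base, bytes)) {
        // volatile keeps the compiler from discarding the otherwise unused load.
        volatile std::uint8_t sink
                = *reinterpret_cast<const volatile std::uint8_t *>(addr);
        (void)sink;
        ++touched;
    }
    return touched;
}
```

Under these assumptions, calling `warm_up` once on each of the A, B, scales, and zero-point buffers would touch every page exactly once. The number of probes grows with buffer size divided by 64k, which is small compared with the data the GEMV itself reads.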

@petercad petercad requested a review from a team as a code owner February 7, 2025 22:21
@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Feb 7, 2025
@petercad
Contributor Author

petercad commented Feb 7, 2025

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
disable arch_gpu_xe-hpc
disable arch_gpu_xe-lp
disable arch_gpu_xe2-hpg-bmg
disable benchdnn_all
enable benchdnn_matmul

@petercad
Contributor Author

petercad commented Feb 7, 2025

make test perf-gpu
set primitive=matmul
disable arch_gpu_xe-hpc
disable arch_gpu_xe-lp
disable arch_gpu_xe2-hpg-bmg
disable arch_gpu_xe2-lpg
disable arch_gpu_xe3-lpg

@petercad petercad force-pushed the petercad/tlb_warmup branch from 090c941 to f700f63 Compare February 8, 2025 00:01
@petercad
Contributor Author

petercad commented Feb 8, 2025

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
disable arch_gpu_xe-hpc
disable arch_gpu_xe-lp
disable arch_gpu_xe2-hpg-bmg
disable benchdnn_all
enable benchdnn_matmul

@petercad
Contributor Author

petercad commented Feb 8, 2025

make test perf-gpu
set primitive=matmul
disable arch_gpu_xe-hpc
disable arch_gpu_xe-lp
disable arch_gpu_xe2-hpg-bmg
disable arch_gpu_xe2-lpg
disable arch_gpu_xe3-lpg
