
Conversation

@chaxu01 (Contributor) commented on Jul 14, 2025

This patch enables KleidiAI to accelerate the Q4_0 matmul operation when the matmul shares its weight data with the get_rows operator.
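For context, here is a minimal sketch of the idea in plain C. The names are illustrative only (not the actual ggml/whisper.cpp API): a weight tensor can only live in the KleidiAI-repacked CPU extra buffer if every operator that consumes it is supported on that layout, so adding get_rows support is what lets weights shared between matmul and get_rows stay on the accelerated path instead of falling back.

```c
// Hypothetical sketch, not the real implementation: illustrates the
// "validate all consumers before repacking" idea behind this patch.
#include <stdbool.h>
#include <stdio.h>

enum op_kind { OP_MUL_MAT, OP_GET_ROWS, OP_OTHER };

struct tensor_use {
    enum op_kind op;  // operator that reads the weight tensor
};

// Assumed capability query: does the accelerated (repacked Q4_0) path
// implement this operator?
static bool extra_buffer_supports(enum op_kind op) {
    switch (op) {
        case OP_MUL_MAT:  return true;  // KleidiAI Q4_0 matmul kernel
        case OP_GET_ROWS: return true;  // enabled by this change (via llama.cpp #14676)
        default:          return false;
    }
}

// A weight may be repacked only if all of its consumers are supported;
// otherwise it stays in the regular CPU buffer, which was the pre-patch
// behaviour whenever GET_ROWS shared the weight.
static bool can_use_extra_buffer(const struct tensor_use * uses, int n_uses) {
    for (int i = 0; i < n_uses; ++i) {
        if (!extra_buffer_supports(uses[i].op)) {
            return false;
        }
    }
    return true;
}

int main(void) {
    // A Q4_0 weight consumed by both the matmul and a get_rows lookup
    // (e.g. tied token-embedding / output weights).
    struct tensor_use uses[] = { { OP_MUL_MAT }, { OP_GET_ROWS } };
    printf("repack for KleidiAI: %s\n", can_use_extra_buffer(uses, 2) ? "yes" : "no");
    return 0;
}
```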

@ggerganov merged commit 032697b into ggml-org:master on Jul 14, 2025 (53 checks passed)
@ggerganov (Member) commented

Could you share some sample numbers of the performance before and after this change?

@chaxu01 (Contributor, Author) commented on Jul 14, 2025

Here's a benchmark comparison (GET_ROWS vs. baseline) on a Pixel 8; a quick check of the % Difference column follows the table:

| Metric | GET_ROWS | Baseline | % Difference |
| --- | --- | --- | --- |
| Load time (ms) | 82.16 | 61.14 | 25.58% |
| Encode time (ms) | 856.86 | 841.23 | 1.82% |
| Decode time (ms) | 401.13 | 423.74 | -5.64% |
| Batch decode time (ms) | 303.76 | 370.42 | -21.94% |
| Prompt time (ms) | 2838.28 | 3711.67 | -30.77% |
| Total time (ms) | 4401.69 | 5350.61 | -21.56% |
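A small sanity check of the % Difference column. My reading of the numbers (an assumption, not stated in the comment) is that the percentages are computed relative to the GET_ROWS figures, i.e. (GET_ROWS - Baseline) / GET_ROWS, so negative values mean the GET_ROWS-enabled build is faster:

```c
// Recomputes the % Difference column under the assumed convention
// pct = (GET_ROWS - Baseline) / GET_ROWS * 100.
#include <stdio.h>

int main(void) {
    struct { const char * metric; double get_rows; double baseline; } rows[] = {
        { "Load time (ms)",         82.16,   61.14   },
        { "Encode time (ms)",       856.86,  841.23  },
        { "Decode time (ms)",       401.13,  423.74  },
        { "Batch decode time (ms)", 303.76,  370.42  },
        { "Prompt time (ms)",       2838.28, 3711.67 },
        { "Total time (ms)",        4401.69, 5350.61 },
    };
    for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); ++i) {
        double pct = (rows[i].get_rows - rows[i].baseline) / rows[i].get_rows * 100.0;
        printf("%-24s %8.2f vs %8.2f -> %+.2f%%\n",
               rows[i].metric, rows[i].get_rows, rows[i].baseline, pct);
    }
    return 0;
}
```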

@chaxu01 (Contributor, Author) commented on Jul 14, 2025

This PR has a dependency on llama.cpp PR #14676, which introduces KleidiAI support for the get_rows operator.

bygreencn added a commit to bygreencn/whisper.cpp that referenced this pull request on Sep 24, 2025:
* ggerganov/master: (89 commits)
  whisper: validate get_rows support for cpu extra buffer (ggml-org#3323)
  examples : update links in  wasm examples (ggml-org#3318)
  sync : resolve conflicts (#0)
  talk-llama : sync llama.cpp
  sync : ggml
  sync : resolve conflicts (ggml/0)
  vulkan: support SET_ROWS (llama/14587)
  vulkan: optimizations for deepseek prompt processing (llama/14555)
  model : support LiquidAI LFM2 hybrid family (llama/14620)
  HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (llama/14634)
  opencl: add tiled mul_mat_f16_f32 (llama/14535)
  opencl: add `set_rows` for `f16` and `f32` (llama/14547)
  SYCL: Initial set_rows kernel implementation (llama/14562)
  cuda : support Falcon-H1 state size for SSM_SCAN (llama/14602)
  ggml : add ggml_scale_bias (llama/14417)
  ggml : prevent integer overflow in gguf tensor size calculation (llama/14595)
  vulkan: optimize flash attention split_k_reduce (llama/14554)
  vulkan : fix rope with partial rotation and non-cont src (llama/14582)
  cuda : fix rope with partial rotation and non-cont src (llama/14580)
  CUDA: add bilinear interpolation for upscale (llama/14563)
  ...