Implications of recently announced SOCs for llama.cpp performance #16250
Dampfinchen started this conversation in General
Hello,
Qualcomm and MediaTek have announced new SoCs, and I think MediaTek's is the most interesting one. Qualcomm also made some changes, although they are more conservative:
-> They support SME1 but not SME2 (see the detection sketch after this list).
-> Instead of integrating ML accelerators into the GPU, they added a direct GPU-to-NPU path that bypasses a round trip through memory.
-> The NPU supports FP8 as well as the other standard formats like INT2, INT4, INT8, FP16 and FP32. So in theory it no longer needs exotic formats and should be easier to develop for, no? The older NPUs also supported many of these formats but never received support in llama.cpp. Is it more of a documentation and SDK issue, then?
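For context on the SME1/SME2 point: whether llama.cpp's ARM kernels can use SME has to be decided at runtime. Here is a minimal detection sketch for Linux/Android aarch64; the HWCAP2 bit values mirror the kernel's asm/hwcap.h, and the fallback defines are assumptions for older sysroots that lack them:

```cpp
// Minimal sketch: runtime detection of SME1/SME2 on Linux/Android aarch64.
#include <cstdio>
#include <sys/auxv.h>

// Fallbacks for older sysroots; bit positions per the kernel's <asm/hwcap.h>.
#ifndef HWCAP2_SME
#define HWCAP2_SME  (1UL << 23)
#endif
#ifndef HWCAP2_SME2
#define HWCAP2_SME2 (1UL << 37)
#endif

int main() {
    unsigned long hwcap2 = getauxval(AT_HWCAP2);
    std::printf("SME1: %s\n", (hwcap2 & HWCAP2_SME)  ? "yes" : "no");
    std::printf("SME2: %s\n", (hwcap2 & HWCAP2_SME2) ? "yes" : "no");
    return 0;
}
```

On a chip like the one described above this would report SME1 yes, SME2 no, which is what a backend would key its kernel selection on.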
Apple has also made significant changes with its latest architectures, implementing ML accelerators in the GPU cores, similar to tensor cores, which should provide a huge and much-needed boost in prompt processing. I'm assuming we will see SME2 and ML accelerators inside the GPU with upcoming M5 products as well.
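To make the prompt-processing point concrete: prompt processing is compute-bound (roughly 2 × params FLOPs per token, which matrix units accelerate directly), while token generation is bandwidth-bound (roughly one full pass over the weights per token). A rough back-of-envelope model; all numbers are illustrative assumptions, not measurements of any of these chips:

```cpp
// Back-of-envelope roofline model of prompt processing vs. token generation.
#include <cstdio>

int main() {
    // Illustrative assumptions, not measured figures for any announced SoC.
    const double params      = 8e9;    // 8B-parameter model
    const double bytes_per_w = 0.5;    // ~4-bit quantized weights
    const double flops       = 4e12;   // 4 TFLOPS sustained matmul throughput
    const double bandwidth   = 100e9;  // 100 GB/s memory bandwidth

    // Prompt processing: ~2 * params FLOPs per token, limited by compute.
    const double pp_tok_s = flops / (2.0 * params);
    // Token generation: ~one full read of the weights per token, limited by bandwidth.
    const double tg_tok_s = bandwidth / (params * bytes_per_w);

    std::printf("prompt processing: ~%.0f tok/s (compute-bound)\n", pp_tok_s);
    std::printf("token generation:  ~%.0f tok/s (bandwidth-bound)\n", tg_tok_s);
    return 0;
}
```

Doubling matmul throughput (e.g. via matrix units in the GPU cores) roughly doubles the first number but leaves the second untouched, which is why these accelerators matter so much for prompt processing specifically.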
Let's discuss which approaches make the most sense and how performance could change in the upcoming months. What do you guys think?