Implications of recently announced SOCs for llama.cpp performance #16250
Dampfinchen started this conversation in General
Hello,
Qualcomm and MediaTek have announced new SoCs, and I think MediaTek's is the most interesting one. Qualcomm also made some changes, although they are more conservative:
-> They support SME1 but not SME2 (see the detection sketch after this list).
-> Instead of integrating ML accelerators into the GPU, they added a direct GPU-to-NPU path that bypasses a round trip through memory.
-> The NPU supports FP8 as well as the other standard formats like INT2, INT4, INT8, FP16 and FP32. So in theory it no longer needs exotic formats and should be easier to develop for, no? The older NPUs also supported many of these formats but never received support in llama.cpp. Is it more of a documentation and SDK issue, then?
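For context on the SME1/SME2 point: whether llama.cpp's ARM kernels can use SME has to be decided at runtime. Here is a minimal detection sketch for Linux/Android aarch64; the HWCAP2 bit values mirror the kernel's asm/hwcap.h, and the fallback defines are assumptions for older sysroots that lack them:

```cpp
// Minimal sketch: runtime detection of SME1/SME2 on Linux/Android aarch64.
#include <cstdio>
#include <sys/auxv.h>

// Fallbacks for older sysroots; bit positions per the kernel's <asm/hwcap.h>.
#ifndef HWCAP2_SME
#define HWCAP2_SME  (1UL << 23)
#endif
#ifndef HWCAP2_SME2
#define HWCAP2_SME2 (1UL << 37)
#endif

int main() {
    unsigned long hwcap2 = getauxval(AT_HWCAP2);
    std::printf("SME1: %s\n", (hwcap2 & HWCAP2_SME)  ? "yes" : "no");
    std::printf("SME2: %s\n", (hwcap2 & HWCAP2_SME2) ? "yes" : "no");
    return 0;
}
```

On a chip like the one described above this would report SME1 yes, SME2 no, which is what a backend would key its kernel selection on.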
Apple has also made significant changes with its latest architectures, implementing ML accelerators in the GPU cores, similar to tensor cores, which should provide a huge and much-needed boost in prompt processing. I'm assuming we will see SME2 and ML accelerators inside the GPU with upcoming M5 products as well.
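To make the prompt-processing point concrete: prompt processing is compute-bound (roughly 2 × params FLOPs per token, which matrix units accelerate directly), while token generation is bandwidth-bound (roughly one full pass over the weights per token). A rough back-of-envelope model; all numbers are illustrative assumptions, not measurements of any of these chips:

```cpp
// Back-of-envelope roofline model of prompt processing vs. token generation.
#include <cstdio>

int main() {
    // Illustrative assumptions, not measured figures for any announced SoC.
    const double params      = 8e9;    // 8B-parameter model
    const double bytes_per_w = 0.5;    // ~4-bit quantized weights
    const double flops       = 4e12;   // 4 TFLOPS sustained matmul throughput
    const double bandwidth   = 100e9;  // 100 GB/s memory bandwidth

    // Prompt processing: ~2 * params FLOPs per token, limited by compute.
    const double pp_tok_s = flops / (2.0 * params);
    // Token generation: ~one full read of the weights per token, limited by bandwidth.
    const double tg_tok_s = bandwidth / (params * bytes_per_w);

    std::printf("prompt processing: ~%.0f tok/s (compute-bound)\n", pp_tok_s);
    std::printf("token generation:  ~%.0f tok/s (bandwidth-bound)\n", tg_tok_s);
    return 0;
}
```

Doubling matmul throughput (e.g. via matrix units in the GPU cores) roughly doubles the first number but leaves the second untouched, which is why these accelerators matter so much for prompt processing specifically.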
Let's discuss which approaches make the most sense and how performance could change in the upcoming months. What do you guys think?