How can I leverage additional VRAM headroom? #1443
Unanswered
JamesDConley asked this question in Q&A
I've been testing on an RTX Pro 6000 Blackwell and see that with the standard Deepseek-V3-Chat.yaml only 16372MiB / 97887MiB is being used. Can I offload more layers/compute to CUDA to speed things up further, and if so, what would you recommend offloading first?
Thanks!
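For reference, the relevant knobs live in the optimize-rule file itself: each rule matches modules by name and sets the devices they run on, so using more VRAM mostly means routing more modules (typically the MoE experts) to "cuda" instead of "cpu". Below is a minimal sketch of that kind of override, assuming the KTransformers-style rule format used by Deepseek-V3-Chat.yaml; the regexes, class path, and kwargs are illustrative guesses, not a verified config.

```yaml
# Hypothetical excerpt of a custom rule file (KTransformers-style optimize rules).
# Rules are applied top-down, so a narrower rule placed before the generic one wins.

# Assumed change: keep the experts of layers 0-9 resident in VRAM on cuda:0.
- match:
    name: "^model\\.layers\\.[0-9]\\.mlp\\.experts$"   # single-digit layers only (illustrative regex)
  replace:
    class: ktransformers.operators.experts.KTransformersExperts  # assumed class path
    kwargs:
      prefill_device: "cuda:0"
      generate_device: "cuda:0"    # experts for these layers also generate on GPU
      out_device: "cuda:0"
  recursive: False

# Remaining layers: experts stay on CPU, as in the stock rule file.
- match:
    name: "^model\\.layers\\..*\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      prefill_device: "cuda:0"
      generate_device: "cpu"
      out_device: "cuda:0"
  recursive: False
```

Whether this actually speeds things up depends on where generation is bottlenecked, but with MoE models the expert weights dominate memory, so they are usually the first candidates to promote when there is VRAM to spare.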
Replies: 1 comment

Same question: I have 7 GPUs and am trying to use all available VRAM with Kimi K2, but the optimized .yaml file doesn't seem to use more than the first GPU.
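For what it's worth, the stock rule files pin everything to a single device, so they will not spread a model across seven GPUs on their own; one common approach is to duplicate the per-layer rules with a different layer-range regex for each device. A rough sketch of that pattern follows, with the split points, regexes, and the "default" class usage being assumptions rather than a checked config.

```yaml
# Hypothetical split of transformer layers across two GPUs; extend with more
# ranges (cuda:2 ... cuda:6) for additional cards. Split points are illustrative.
- match:
    name: "^model\\.layers\\.([0-9]|[12][0-9])\\."   # layers 0-29
  replace:
    class: "default"          # assumed: keep the stock operator, only override devices
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

- match:
    name: "^model\\.layers\\.([3-9][0-9])\\."        # layers 30-99
  replace:
    class: "default"
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"
```

In practice the embedding, norm, lm_head, and expert rules likely also need devices that agree with this split, otherwise activations keep bouncing back to the first GPU.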