How can I leverage additional VRAM headroom? #1443
Unanswered
JamesDConley asked this question in Q&A
I've been testing on an RTX Pro 6000 Blackwell and see that with the standard Deepseek-V3-Chat.yaml only 16372MiB / 97887MiB is being used. Can I offload more layers/compute to CUDA to speed things up further, and if so, what would you recommend offloading first?
Thanks!
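For reference, the relevant knobs live in the optimize-rule file itself: each rule matches modules by name and sets the devices they run on, so using more VRAM mostly means routing more modules (typically the MoE experts) to "cuda" instead of "cpu". Below is a minimal sketch of that kind of override, assuming the KTransformers-style rule format used by Deepseek-V3-Chat.yaml; the regexes, class path, and kwargs are illustrative guesses, not a verified config.

```yaml
# Hypothetical excerpt of a custom rule file (KTransformers-style optimize rules).
# Rules are applied top-down, so a narrower rule placed before the generic one wins.

# Assumed change: keep the experts of layers 0-9 resident in VRAM on cuda:0.
- match:
    name: "^model\\.layers\\.[0-9]\\.mlp\\.experts$"   # single-digit layers only (illustrative regex)
  replace:
    class: ktransformers.operators.experts.KTransformersExperts  # assumed class path
    kwargs:
      prefill_device: "cuda:0"
      generate_device: "cuda:0"    # experts for these layers also generate on GPU
      out_device: "cuda:0"
  recursive: False

# Remaining layers: experts stay on CPU, as in the stock rule file.
- match:
    name: "^model\\.layers\\..*\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      prefill_device: "cuda:0"
      generate_device: "cpu"
      out_device: "cuda:0"
  recursive: False
```

Whether this actually speeds things up depends on where generation is bottlenecked, but with MoE models the expert weights dominate memory, so they are usually the first candidates to promote when there is VRAM to spare.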
Replies: 1 comment

Same question: I have 7 GPUs and am trying to use all available VRAM with Kimi K2, but the optimized .yaml file doesn't seem to use more than the first GPU.
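For what it's worth, the stock rule files pin everything to a single device, so they will not spread a model across seven GPUs on their own; one common approach is to duplicate the per-layer rules with a different layer-range regex for each device. A rough sketch of that pattern follows, with the split points, regexes, and the "default" class usage being assumptions rather than a checked config.

```yaml
# Hypothetical split of transformer layers across two GPUs; extend with more
# ranges (cuda:2 ... cuda:6) for additional cards. Split points are illustrative.
- match:
    name: "^model\\.layers\\.([0-9]|[12][0-9])\\."   # layers 0-29
  replace:
    class: "default"          # assumed: keep the stock operator, only override devices
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

- match:
    name: "^model\\.layers\\.([3-9][0-9])\\."        # layers 30-99
  replace:
    class: "default"
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"
```

In practice the embedding, norm, lm_head, and expert rules likely also need devices that agree with this split, otherwise activations keep bouncing back to the first GPU.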