Hi!

I'm experimenting with ROCm for LLM inference (llama.cpp HIP backend, running inside an unprivileged LXC container on Linux). Everything works well overall, and my Radeon AI Pro R9700 can enter proper runtime suspend (`runtime_status = suspended`) when no ROCm backend is active.
However, I noticed that when a ROCm compute context is alive and VRAM allocations exist (for example, when an LLM model is loaded into VRAM), the GPU does not enter deep idle:

- Memory controller stays active
- PCIe link remains in a higher power state
- Idle power is significantly higher (≈60–100 W)
- Deep runtime suspend only occurs if the ROCm backend is fully unloaded and VRAM freed
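For reference, here's roughly how I observe this from sysfs (a minimal sketch; the `card0` index and the hwmon instance are assumptions for my setup and vary between systems):

```python
# Minimal sketch: poll the runtime-PM state and power draw of an amdgpu card.
# card0 and the hwmon instance are assumptions for my setup.
import glob
import time

DEV = "/sys/class/drm/card0/device"

def runtime_status() -> str:
    # "active" while a ROCm context holds the GPU, "suspended" in deep idle
    with open(f"{DEV}/power/runtime_status") as f:
        return f.read().strip()

def power_watts() -> float | None:
    # amdgpu reports average power in microwatts via hwmon; not every
    # kernel/firmware combination exposes power1_average
    for path in glob.glob(f"{DEV}/hwmon/hwmon*/power1_average"):
        with open(path) as f:
            return int(f.read()) / 1_000_000
    return None

while True:
    print(f"{runtime_status():>10}  {power_watts()} W")
    time.sleep(5)
```

(`rocm-smi --showpower` shows the same average power reading.)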
This leads to a trade-off:

- Keep the model in VRAM → fast inference start, but high idle power
- Unload the model → low idle power, but slow first-token latency due to reloading multi-GB weights
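Right now the only workaround I see is managing this from userspace: drop the model after an idle timeout and eat the reload cost on the next request. A rough sketch of that idea, assuming the llama-cpp-python bindings (the model path and timeout are placeholders):

```python
# Sketch of an idle-unload wrapper: free VRAM after an idle timeout so the
# GPU can runtime-suspend, then reload on the next request.
# Assumes llama-cpp-python; MODEL_PATH and IDLE_TIMEOUT_S are placeholders.
import time
from llama_cpp import Llama

MODEL_PATH = "/models/model.gguf"
IDLE_TIMEOUT_S = 300  # unload after 5 minutes of idle

llm = None
last_used = 0.0

def generate(prompt: str) -> str:
    global llm, last_used
    if llm is None:
        # reload cost: multi-GB weights back into VRAM -> slow first token
        llm = Llama(model_path=MODEL_PATH, n_gpu_layers=-1, verbose=False)
    last_used = time.monotonic()
    out = llm(prompt, max_tokens=256)
    return out["choices"][0]["text"]

def maybe_unload() -> None:
    # call periodically; freeing the context lets runtime suspend kick in
    global llm
    if llm is not None and time.monotonic() - last_used > IDLE_TIMEOUT_S:
        llm.close()  # releases VRAM and the HIP context
        llm = None
```

This gets runtime suspend back, but first-token latency after every idle period is dominated by reloading the weights, which is exactly the trade-off above.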
This raises a few questions:
1. Is it currently possible in ROCm/AMDGPU for the GPU to enter deep idle while VRAM allocations remain resident?
In other words: can ROCm park/suspend compute queues while keeping model weights in VRAM, allowing the GPU to drop into a low-power state without unloading everything?
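To isolate this from llama.cpp: on my system a bare VRAM allocation, with no kernels running, is already enough to keep the device out of runtime suspend. A minimal repro sketch, assuming a ROCm build of PyTorch (which uses the `cuda` device name on HIP; the `card0` path is again an assumption for my setup):

```python
# Minimal repro sketch: a single idle VRAM allocation keeps runtime_status
# at "active" on my setup. Assumes a ROCm build of PyTorch.
import time
import torch

buf = torch.empty(1024 ** 3, dtype=torch.uint8, device="cuda")  # 1 GiB of VRAM
for _ in range(12):
    with open("/sys/class/drm/card0/device/power/runtime_status") as f:
        print(f.read().strip())  # stays "active" while the allocation lives
    time.sleep(10)
```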
2. If not, is support for VRAM-resident idle states planned for future ROCm or AMDGPU driver releases?
This would be very useful for LLM inference workloads, which are typically “bursty”:

- short compute burst
- several minutes of idle
- another short burst
- repeat
NVIDIA GPUs handle a similar pattern with context parking + memory self-refresh, allowing VRAM to stay allocated while the GPU enters a deep idle state.
3. Are there architectural or driver limitations on RDNA/ROCm that prevent this today?
Understanding whether this is a hardware, firmware, or runtime limitation would help clarify expectations for future optimizations.
Thanks for your time — I’m happy to share logs or additional information from my setup if useful.