I am using the qwen_image_edit_plus_2511 workflow in ComfyUI with ai-toolkit-inference.
When running Qwen-Image-Edit-2511 with a local LoRA, VRAM usage becomes extremely high and nearly fills my 32GB GPU.
In my case, usage reaches around 30GB+ out of 32GB, and generation becomes very slow or fails with CUDA out-of-memory errors.
Environment
- OS: Windows
- App: ComfyUI Desktop App
- Custom node: ai-toolkit-inference
- Pipeline: qwen_image_edit_plus_2511
- Model: Qwen/Qwen-Image-Edit-2511
- GPU: NVIDIA RTX 5090, 32GB VRAM
- LoRA: local .safetensors LoRA
- Offload mode: model
Steps to reproduce
- Install ai-toolkit-inference in ComfyUI.
- Load the Qwen Image Edit Plus 2511 workflow.
- Load a local LoRA.
- Run image generation with Qwen-Image-Edit-2511.
- Observe GPU VRAM usage during model loading and generation.
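For the last step, peak VRAM can be recorded from a separate terminal instead of being read by eye. Below is a minimal sketch. It assumes `nvidia-smi` is on PATH, which is usually the case once the NVIDIA driver is installed. `parse_memory` and `watch_peak` are hypothetical helper names, not part of ComfyUI or ai-toolkit-inference. With `nounits`, `nvidia-smi` reports memory in MiB.

```python
import subprocess
import time

# Standard nvidia-smi query: used and total memory per GPU, as bare numbers in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(output: str) -> list[tuple[int, int]]:
    """Parse nvidia-smi CSV output into one (used_mib, total_mib) tuple per GPU."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def watch_peak(seconds: float, interval: float = 0.5) -> int:
    """Poll nvidia-smi for `seconds` and return the highest used value (MiB) seen on any GPU."""
    peak = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for used, _total in parse_memory(out):
            peak = max(peak, used)
        time.sleep(interval)
    return peak


if __name__ == "__main__":
    # Start this, then queue the workflow in ComfyUI within the polling window.
    print(f"peak VRAM used: {watch_peak(seconds=120)} MiB")
```

`nvidia-smi` counts all memory held by the process, including blocks cached by PyTorch's allocator. Its reading can therefore be higher than the memory that live tensors actually occupy.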
Expected behavior
The workflow should be able to run more reliably on a 32GB VRAM GPU, ideally with lower peak VRAM usage or a recommended low-VRAM configuration.
Actual behavior
The workflow consumes almost all available VRAM on a 32GB GPU.
VRAM usage can reach around 30GB+ / 32GB, leaving very little margin. The process may become extremely slow or fail with CUDA out-of-memory.
Additional notes
I would like to know whether this VRAM usage is expected for Qwen-Image-Edit-2511, or if there are recommended settings to reduce memory usage, such as:
- lower-memory loading mode
- stronger CPU offload
- sequential CPU offload
- quantized loading
- lower precision loading
- lower resolution / steps
- LoRA loading method changes
- any recommended configuration for 24GB / 32GB GPUs
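To judge whether this usage is "expected", a back-of-envelope estimate of weight memory helps. The sketch below assumes a diffusion transformer of roughly 20B parameters, which is the size reported for the Qwen-Image base model. The figure for the 2511 edit checkpoint is an assumption, not something I have verified. The estimate covers raw transformer weights only and excludes the text encoder, VAE, LoRA weights, activations, and CUDA context. `weight_gib` is a hypothetical helper.

```python
# Approximate storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "fp16": 2.0,
    "fp8": 1.0,
    "int4": 0.5,
}


def weight_gib(num_params: float, dtype: str) -> float:
    """Raw weight size in GiB. This is a lower bound on real VRAM use."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # ASSUMPTION: ~20B parameters in the diffusion transformer.
    transformer_params = 20e9
    for dtype in ("bf16", "fp8", "int4"):
        print(f"{dtype:>5}: {weight_gib(transformer_params, dtype):5.1f} GiB")
```

Under this assumption, bf16 weights alone come to about 37 GiB, which is more than a 32GB card holds. That would make precision and quantization settings the first thing to check, ahead of resolution or step count.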