vLLM startup hangs after PageAllocator init on single-GPU world_size=1 with kvcached enabled
Summary
When kvcached is enabled on the latest main, a single-GPU vLLM startup can hang after KV cache / PageAllocator initialization.
The process does not exit, but the API server port never starts listening, so /health and /v1/models remain unreachable.
This does not reproduce when running the same vLLM startup configuration without kvcached.
Environment
- kvcached: latest main
- commit: 0d4d581dc5b17ea8f07a83a9f9cf5345ba24478b
- vLLM: 0.18.1
- GPU: NVIDIA GeForce RTX 4090
- OS: Linux
- Model: Qwen3-8B-FP8
- tensor_parallel_size: 1
- world_size observed in logs: 1
Reproduction Setup
I can reproduce this with a direct single-GPU startup, without relying on any controller readiness logic.
Environment variables:
ENABLE_KVCACHED=true
KVCACHED_AUTOPATCH=1
KVCACHED_IPC_NAME=kvcached_vllm_GPU2
VLLM_SERVER_DEV_MODE=1
CUDA_VISIBLE_DEVICES=2
vLLM startup configuration:
model: /mnt/test/models/Qwen3-8B-FP8
port: 12346
gpu-memory-utilization: 0.49
kv-cache-memory-bytes: 536870912
max-model-len: 6144
kv-cache-dtype: fp8_e4m3
enable-sleep-mode: true
enable-prefix-caching: false
enforce-eager: true
disable-log-stats: true
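For anyone trying to reproduce this, the setup above can be sketched as a small Python helper that assembles the environment and the command line. This is only a sketch under assumptions: the `vllm serve` entry point and the one-to-one mapping from the config keys above to `--key value` flags (with `--no-<key>` for a false boolean) are my assumptions, and the function names are hypothetical. The values themselves are copied from this report.

```python
import os
import shlex

# Environment variables exactly as listed in the report.
REPORT_ENV = {
    "ENABLE_KVCACHED": "true",
    "KVCACHED_AUTOPATCH": "1",
    "KVCACHED_IPC_NAME": "kvcached_vllm_GPU2",
    "VLLM_SERVER_DEV_MODE": "1",
    "CUDA_VISIBLE_DEVICES": "2",
}

# Startup configuration exactly as listed in the report.
REPORT_CONFIG = {
    "model": "/mnt/test/models/Qwen3-8B-FP8",
    "port": 12346,
    "gpu-memory-utilization": 0.49,
    "kv-cache-memory-bytes": 536870912,
    "max-model-len": 6144,
    "kv-cache-dtype": "fp8_e4m3",
    "enable-sleep-mode": True,
    "enable-prefix-caching": False,
    "enforce-eager": True,
    "disable-log-stats": True,
}


def build_argv(config):
    """Turn the config mapping into an argv list.

    Assumption: each key maps to a CLI flag of the same name, and a
    false boolean is spelled `--no-<key>`.
    """
    argv = ["vllm", "serve", str(config["model"])]
    for key, value in config.items():
        if key == "model":
            continue
        if value is True:
            argv.append(f"--{key}")
        elif value is False:
            argv.append(f"--no-{key}")
        else:
            argv += [f"--{key}", str(value)]
    return argv


def build_env(base=None):
    """Return a copy of the base environment with the report's variables set."""
    env = dict(os.environ if base is None else base)
    env.update(REPORT_ENV)
    return env


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_argv(REPORT_CONFIG)))
```

To actually launch it, the result can be passed to `subprocess.Popen(build_argv(REPORT_CONFIG), env=build_env())`.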
Actual Behavior
The process stays alive and GPU memory usage rises to about 10.8 GiB, but the API server never starts listening.
Observed symptoms:
- http://127.0.0.1:12346/health stays connection refused
- http://127.0.0.1:12346/v1/models stays connection refused
- the process does not crash
- there is no explicit Python exception in the startup log
- the log consistently stops after PageAllocator initialization
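The symptoms above can be checked with a short probe like the following. It is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the host, port and paths are the ones from this report. It separates "nothing is listening on the port" (the case seen here) from "the server answers with an HTTP status".

```python
import socket
import urllib.error
import urllib.request


def port_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def probe(url, timeout=2.0):
    """Return a short description of what an HTTP GET on url yields."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server is up but answered with an error status.
        return f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # No server reachable, e.g. connection refused.
        return f"unreachable: {exc}"


if __name__ == "__main__":
    host, port = "127.0.0.1", 12346
    print("listening:", port_is_listening(host, port))
    for path in ("/health", "/v1/models"):
        print(path, "->", probe(f"http://{host}:{port}{path}"))
```

While the hang is in effect, I would expect `listening: False` and both paths to report `unreachable`.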
Relevant log tail:
Loading weights took 120.46 seconds
Model loading took 8.8 GiB memory and 121.099412 seconds
GPU KV cache size: 7,280 tokens
Maximum concurrency for 6,144 tokens per request: 1.18x
init engine (profile, create kv cache, warmup model) took 1.01 seconds
Init C++ PageAllocator: num_layers=36, mem_size_per_layer=7MB, total_mem_size=511MB, page_size=2MB, world_size=1, pp_rank=0, async_sched=1, contiguous_layout=1, enable_prealloc=1, num_kv_buffers=2, group_id=0, min_reserved_pages=3, max_reserved_pages=3
After that point, there are no further startup logs and the API server never becomes ready.
Expected Behavior
The server should complete startup and begin serving /health and /v1/models.
Important Comparison
Running the same startup configuration without kvcached succeeds.
If I remove:
ENABLE_KVCACHED=true
KVCACHED_AUTOPATCH=1
KVCACHED_IPC_NAME=kvcached_vllm_GPU2
then the server starts successfully and /v1/models becomes reachable.
So this does not look like:
- insufficient GPU memory
- a generic vLLM startup failure
- controller-specific readiness logic
It appears to be a kvcached startup-path issue on single-GPU world_size=1.
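The control run in this comparison amounts to launching the same command with the three kvcached variables removed and everything else unchanged. A minimal sketch of that filtering step is below. The helper name is hypothetical, and the variable names are the ones listed in this report.

```python
import os

# The three variables removed for the control run, as listed in the report.
KVCACHED_VARS = ("ENABLE_KVCACHED", "KVCACHED_AUTOPATCH", "KVCACHED_IPC_NAME")


def without_kvcached(env):
    """Return a copy of env with the kvcached variables removed."""
    return {k: v for k, v in env.items() if k not in KVCACHED_VARS}


if __name__ == "__main__":
    control_env = without_kvcached(os.environ)
    # Confirm that none of the kvcached variables remain set.
    print([name for name in KVCACHED_VARS if name in control_env])
```

Variables such as CUDA_VISIBLE_DEVICES and VLLM_SERVER_DEV_MODE are left untouched, so the two runs differ only in whether kvcached is enabled.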
Additional Notes
- The reproduction does not depend on local benchmark scripts; it is observed on a direct vLLM startup path with kvcached autopatch enabled.
- The problem remains on the latest kvcached main, so this is not fixed by pulling current upstream.
- I also tested with kvcached enabled but without enable-sleep-mode, and the same hang still reproduces. So sleep mode itself may not be the root cause.
- This may be related to recent allocator-path changes, but I have not yet reduced it to a specific commit.
Minimal Symptom Classification
- single GPU
- world_size=1
- latest kvcached main
- vLLM 0.18.1
- direct startup, no controller dependency
- hangs after PageAllocator init
- no port listen
- no crash, no explicit exception
- disappears when kvcached is disabled
Possibly Related Issues
- #284 seems related in that it also involves a kvcached + vLLM + tensor_parallel_size=1 path, but the symptom is different.
- #217 is a sleep-path failure after startup, not a startup hang.
- #316 is about env/autopatch/IPC naming behavior, not this startup stall.