
vLLM startup hangs after PageAllocator init on single-GPU world_size=1 with kvcached enabled #334

@shipiyouniao


Summary

With kvcached enabled (latest kvcached main), a single-GPU vLLM startup can hang right after KV cache / PageAllocator initialization.

The process does not exit, but the API server port never starts listening, so /health and /v1/models remain unreachable.

This does not reproduce when running the same vLLM startup configuration without kvcached.

Environment

  • kvcached: latest main
    • commit: 0d4d581dc5b17ea8f07a83a9f9cf5345ba24478b
  • vLLM: 0.18.1
  • GPU: NVIDIA GeForce RTX 4090
  • OS: Linux
  • Model: Qwen3-8B-FP8
  • tensor_parallel_size: 1
  • world_size observed in logs: 1

Reproduction Setup

I can reproduce this with a direct single-GPU startup, without relying on any controller readiness logic.

Environment variables:

ENABLE_KVCACHED=true
KVCACHED_AUTOPATCH=1
KVCACHED_IPC_NAME=kvcached_vllm_GPU2
VLLM_SERVER_DEV_MODE=1
CUDA_VISIBLE_DEVICES=2

vLLM startup configuration:

model: /mnt/test/models/Qwen3-8B-FP8
port: 12346
gpu-memory-utilization: 0.49
kv-cache-memory-bytes: 536870912
max-model-len: 6144
kv-cache-dtype: fp8_e4m3
enable-sleep-mode: true
enable-prefix-caching: false
enforce-eager: true
disable-log-stats: true
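The environment variables and configuration above can be combined into concrete launch commands for the runs with and without kvcached. The sketch below is a minimal, hypothetical helper (`build_command` and `build_env` are illustrative names, not part of vLLM or kvcached). It assumes each config key maps to a `vllm serve` flag of the same name and that a `false` boolean maps to the `--no-<key>` form. It only prints the commands and does not launch anything.

```python
import shlex

# kvcached-specific variables: the three removed in the comparison run.
KVCACHED_ENV = {
    "ENABLE_KVCACHED": "true",
    "KVCACHED_AUTOPATCH": "1",
    "KVCACHED_IPC_NAME": "kvcached_vllm_GPU2",
}
# Variables kept in both runs.
BASE_ENV = {
    "VLLM_SERVER_DEV_MODE": "1",
    "CUDA_VISIBLE_DEVICES": "2",
}

MODEL = "/mnt/test/models/Qwen3-8B-FP8"
CONFIG = {
    "port": 12346,
    "gpu-memory-utilization": 0.49,
    "kv-cache-memory-bytes": 536870912,
    "max-model-len": 6144,
    "kv-cache-dtype": "fp8_e4m3",
    "enable-sleep-mode": True,
    "enable-prefix-caching": False,
    "enforce-eager": True,
    "disable-log-stats": True,
}


def build_command(model, config):
    """Turn the config mapping into a `vllm serve` argument list.

    ASSUMPTION: every key maps to a `--<key>` flag of the same name,
    and a False boolean maps to the `--no-<key>` form.
    """
    argv = ["vllm", "serve", model]
    for key, value in config.items():
        if value is True:
            argv.append(f"--{key}")
        elif value is False:
            argv.append(f"--no-{key}")
        else:
            argv.extend([f"--{key}", str(value)])
    return argv


def build_env(with_kvcached):
    """Return only the variables to override; merge with os.environ
    before actually launching a process."""
    env = dict(BASE_ENV)
    if with_kvcached:
        env.update(KVCACHED_ENV)
    return env


if __name__ == "__main__":
    for with_kvcached in (True, False):
        env = build_env(with_kvcached)
        prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
        print(prefix, shlex.join(build_command(MODEL, CONFIG)))
```

The first printed command corresponds to the hanging run, and the second drops the three kvcached variables while keeping everything else identical.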

Actual Behavior

The process stays alive and GPU memory usage rises to about 10.8 GiB, but the API server never starts listening.

Observed symptoms:

  • http://127.0.0.1:12346/health stays connection refused
  • http://127.0.0.1:12346/v1/models stays connection refused
  • the process does not crash
  • there is no explicit Python exception in the startup log
  • the log consistently stops after PageAllocator initialization
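A small probe can separate "nothing is listening" (connection refused, as observed here) from "listening but not ready" (an HTTP error status). This is a sketch using only the Python standard library; `probe`, `check_server` and `wait_until_ready` are hypothetical helper names, and the base URL assumes port 12346 from the configuration in this report.

```python
import time
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:12346"  # port taken from the reported config
ENDPOINTS = ("/health", "/v1/models")

# Ignore any HTTP proxy settings so the probe really hits the local port.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=2.0):
    """Return a short status string for a single GET request."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # Something is listening and answered with an error status.
        return f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Nothing is listening on the port: the symptom in this report.
        if isinstance(exc.reason, ConnectionRefusedError):
            return "refused"
        return f"error: {exc.reason}"
    except OSError as exc:  # e.g. a timeout while reading the response
        return f"error: {exc}"


def check_server(base_url=BASE_URL):
    """Probe every endpoint once and map path -> status string."""
    return {path: probe(base_url + path) for path in ENDPOINTS}


def wait_until_ready(base_url=BASE_URL, deadline_s=600.0, interval_s=5.0):
    """Poll /health until it returns HTTP 200 or the deadline passes."""
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        if probe(base_url + "/health") == "HTTP 200":
            return True
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    for path, status in check_server().items():
        print(f"{path}: {status}")
```

In the hanging case, both endpoints would keep reporting `refused` and `wait_until_ready` would return `False` once its deadline expires.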

Relevant log tail:

Loading weights took 120.46 seconds
Model loading took 8.8 GiB memory and 121.099412 seconds
GPU KV cache size: 7,280 tokens
Maximum concurrency for 6,144 tokens per request: 1.18x
init engine (profile, create kv cache, warmup model) took 1.01 seconds
Init C++ PageAllocator: num_layers=36, mem_size_per_layer=7MB, total_mem_size=511MB, page_size=2MB, world_size=1, pp_rank=0, async_sched=1, contiguous_layout=1, enable_prealloc=1, num_kv_buffers=2, group_id=0, min_reserved_pages=3, max_reserved_pages=3
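To check mechanically that a captured log stops at the same point, the PageAllocator init line can be parsed into its key=value fields. This sketch assumes the line format shown above stays stable and that other prefixes (process names, log levels) may precede it; `parse_page_allocator_line` and `last_line_is_allocator_init` are hypothetical names.

```python
import re

LOG_LINE = (
    "Init C++ PageAllocator: num_layers=36, mem_size_per_layer=7MB, "
    "total_mem_size=511MB, page_size=2MB, world_size=1, pp_rank=0, "
    "async_sched=1, contiguous_layout=1, enable_prealloc=1, num_kv_buffers=2, "
    "group_id=0, min_reserved_pages=3, max_reserved_pages=3"
)

_PREFIX = "Init C++ PageAllocator:"
_FIELD = re.compile(r"(\w+)=([^,\s]+)")


def parse_page_allocator_line(line):
    """Return the key=value fields of a PageAllocator init line, or None."""
    idx = line.find(_PREFIX)
    if idx < 0:
        return None
    fields = {}
    for key, raw in _FIELD.findall(line[idx + len(_PREFIX):]):
        # Plain integers become int; unit-suffixed values like "7MB" stay str.
        fields[key] = int(raw) if raw.isdigit() else raw
    return fields


def last_line_is_allocator_init(log_text):
    """True if the final non-empty log line is the PageAllocator init
    line, i.e. the point where the reported startup stalls."""
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    return bool(lines) and parse_page_allocator_line(lines[-1]) is not None
```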

After that point, there are no further startup logs and the API server never becomes ready.

Expected Behavior

The server should complete startup and begin serving /health and /v1/models.

Important Comparison

Running the same startup configuration without kvcached succeeds.

If I remove:

ENABLE_KVCACHED=true
KVCACHED_AUTOPATCH=1
KVCACHED_IPC_NAME=kvcached_vllm_GPU2

then the server starts successfully and /v1/models becomes reachable.

So the cause does not appear to be:

  • insufficient GPU memory
  • a generic vLLM startup failure
  • controller-specific readiness logic

It appears to be a kvcached startup-path issue on single-GPU world_size=1.

Additional Notes

  • The reproduction does not depend on local benchmark scripts; it is observed on a direct vLLM startup path with kvcached autopatch enabled.
  • The problem persists on the latest kvcached main, so pulling current upstream does not fix it.
  • I also tested with kvcached enabled but without enable-sleep-mode, and the same hang still reproduces, so sleep mode itself does not appear to be the root cause.
  • This may be related to recent allocator-path changes, but I have not yet reduced it to a specific commit.

Minimal Symptom Classification

  • single GPU
  • world_size=1
  • latest kvcached main
  • vLLM 0.18.1
  • direct startup, no controller dependency
  • hangs after PageAllocator init
  • no port listen
  • no crash, no explicit exception
  • disappears when kvcached is disabled

Possibly Related Issues

  • #284 seems related in that it also involves a kvcached + vLLM + tensor_parallel_size=1 path, but the symptom is different.
  • #217 is a sleep-path failure after startup, not a startup hang.
  • #316 is about env/autopatch/IPC naming behavior, not this startup stall.
