# vLLM startup hangs after PageAllocator init on single-GPU world_size=1 with kvcached enabled

## Summary

When kvcached is enabled on the latest `main`, a single-GPU vLLM startup can hang after KV cache / PageAllocator initialization.

The process does not exit, but the API server port never starts listening, so `/health` and `/v1/models` remain unreachable.

This does **not** reproduce when running the same vLLM startup configuration without kvcached.

## Environment

- kvcached: latest `main`
  - commit: `0d4d581dc5b17ea8f07a83a9f9cf5345ba24478b`
- vLLM: `0.18.1`
- GPU: `NVIDIA GeForce RTX 4090`
- OS: `Linux`
- Model: `Qwen3-8B-FP8`
- tensor_parallel_size: `1`
- world_size observed in logs: `1`

## Reproduction Setup

I can reproduce this with a direct single-GPU startup, without relying on any controller readiness logic.

Environment variables:

```bash
ENABLE_KVCACHED=true
KVCACHED_AUTOPATCH=1
KVCACHED_IPC_NAME=kvcached_vllm_GPU2
VLLM_SERVER_DEV_MODE=1
CUDA_VISIBLE_DEVICES=2
```

vLLM startup configuration:

```text
model: /mnt/test/models/Qwen3-8B-FP8
port: 12346
gpu-memory-utilization: 0.49
kv-cache-memory-bytes: 536870912
max-model-len: 6144
kv-cache-dtype: fp8_e4m3
enable-sleep-mode: true
enable-prefix-caching: false
enforce-eager: true
disable-log-stats: true
```
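For reference, the settings above can be turned into a launch command roughly as follows. This is a sketch, not the exact command used for this report: it assumes the `vllm serve` entry point and that each configuration key maps to a CLI flag of the same name (with `--no-enable-prefix-caching` for the `false` value). `build_env`, `build_argv`, and `launch` are hypothetical helper names; flag spellings should be checked against `vllm serve --help` for vLLM 0.18.1.

```python
import os
import subprocess

# Values copied from the report.
KVCACHED_ENV = {
    "ENABLE_KVCACHED": "true",
    "KVCACHED_AUTOPATCH": "1",
    "KVCACHED_IPC_NAME": "kvcached_vllm_GPU2",
}
BASE_ENV = {
    "VLLM_SERVER_DEV_MODE": "1",
    "CUDA_VISIBLE_DEVICES": "2",
}


def build_env(with_kvcached: bool) -> dict:
    """Return the child environment, with or without the kvcached variables."""
    env = dict(os.environ)
    # Drop any inherited kvcached variables so the comparison run is clean.
    for key in KVCACHED_ENV:
        env.pop(key, None)
    env.update(BASE_ENV)
    if with_kvcached:
        env.update(KVCACHED_ENV)
    return env


def build_argv() -> list:
    """Assumed CLI equivalent of the startup configuration listed above."""
    return [
        "vllm", "serve", "/mnt/test/models/Qwen3-8B-FP8",
        "--port", "12346",
        "--gpu-memory-utilization", "0.49",
        "--kv-cache-memory-bytes", "536870912",
        "--max-model-len", "6144",
        "--kv-cache-dtype", "fp8_e4m3",
        "--enable-sleep-mode",
        "--no-enable-prefix-caching",  # assumed spelling for "false"
        "--enforce-eager",
        "--disable-log-stats",
    ]


def launch(with_kvcached: bool) -> subprocess.Popen:
    """Start the server as a child process and return its handle."""
    return subprocess.Popen(build_argv(), env=build_env(with_kvcached))
```

Under these assumptions, `launch(True)` corresponds to the hanging case and `launch(False)` to the same startup without kvcached.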

## Actual Behavior

The process stays alive and GPU memory usage rises to about `10.8 GiB`, but the API server never starts listening.

Observed symptoms:

- `http://127.0.0.1:12346/health` keeps returning connection refused
- `http://127.0.0.1:12346/v1/models` keeps returning connection refused
- the process does not crash
- there is no explicit Python exception in the startup log
- the log consistently stops after PageAllocator initialization
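The "connection refused" symptom can be checked at the TCP level, independent of any HTTP client. A minimal sketch using only the Python standard library; `port_state` and `wait_for_listen` are hypothetical helper names, and the host and port are the ones from the configuration above:

```python
import socket
import time


def port_state(host: str, port: int, timeout: float = 1.0) -> str:
    """Classify one TCP connect attempt as 'listening', 'refused', or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        # Nothing is bound to the port: the state seen during the hang.
        return "refused"
    except (socket.timeout, TimeoutError):
        return "timeout"


def wait_for_listen(host: str, port: int, deadline_s: float,
                    interval_s: float = 2.0) -> bool:
    """Poll until the port accepts connections or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        if port_state(host, port) == "listening":
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_listen("127.0.0.1", 12346, deadline_s=600)` returning `False` while the process is still alive matches the behavior described here.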

Relevant log tail:

```text
Loading weights took 120.46 seconds
Model loading took 8.8 GiB memory and 121.099412 seconds
GPU KV cache size: 7,280 tokens
Maximum concurrency for 6,144 tokens per request: 1.18x
init engine (profile, create kv cache, warmup model) took 1.01 seconds
Init C++ PageAllocator: num_layers=36, mem_size_per_layer=7MB, total_mem_size=511MB, page_size=2MB, world_size=1, pp_rank=0, async_sched=1, contiguous_layout=1, enable_prealloc=1, num_kv_buffers=2, group_id=0, min_reserved_pages=3, max_reserved_pages=3
```

After that point, there are no further startup logs and the API server never becomes ready.
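The sizes in the log tail are consistent with the configured `kv-cache-memory-bytes` budget, which suggests that KV cache sizing itself completed normally before the stall. The arithmetic below is a sketch under assumptions that do not appear in the log: 8 KV heads and a head dimension of 128 for Qwen3-8B, 1 byte per element for `fp8_e4m3`, and a KV block size of 16 tokens. Only `num_layers=36`, the byte budget, and `max-model-len` come from this report.

```python
# From the report.
KV_CACHE_BYTES = 536_870_912   # kv-cache-memory-bytes (512 MiB)
NUM_LAYERS = 36                # num_layers in the PageAllocator log line
MAX_MODEL_LEN = 6144           # max-model-len

# Assumptions (not stated in the log).
NUM_KV_HEADS = 8               # assumed for Qwen3-8B
HEAD_DIM = 128                 # assumed for Qwen3-8B
DTYPE_BYTES = 1                # fp8_e4m3 stores one byte per element
BLOCK_SIZE = 16                # assumed KV block size in tokens

# K and V each store NUM_KV_HEADS * HEAD_DIM elements per token per layer.
bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * DTYPE_BYTES
bytes_per_block = bytes_per_token * BLOCK_SIZE

# Only whole blocks fit into the budget.
num_blocks = KV_CACHE_BYTES // bytes_per_block
kv_cache_tokens = num_blocks * BLOCK_SIZE
max_concurrency = kv_cache_tokens / MAX_MODEL_LEN
usable_mib = num_blocks * bytes_per_block / 2**20

print(f"GPU KV cache size: {kv_cache_tokens:,} tokens")
print(f"Maximum concurrency: {max_concurrency:.2f}x")
print(f"Usable KV memory: {usable_mib:.3f} MiB")
```

Under these assumptions the result is 7,280 tokens and 1.18x concurrency, matching the log, and about 511.9 MiB of usable KV memory, which agrees with `total_mem_size=511MB` if that value is truncated to whole megabytes.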

## Expected Behavior

The server should complete startup and begin serving `/health` and `/v1/models`.

## Important Comparison

Running the same startup configuration without kvcached succeeds.

If I remove:

```bash
ENABLE_KVCACHED=true
KVCACHED_AUTOPATCH=1
KVCACHED_IPC_NAME=kvcached_vllm_GPU2
```

then the server starts successfully and `/v1/models` becomes reachable.

So this does not look like:

- insufficient GPU memory
- a generic vLLM startup failure
- controller-specific readiness logic

It appears to be a kvcached startup-path issue on single-GPU `world_size=1`.

## Additional Notes

- The reproduction does not depend on local benchmark scripts; it is observed on a direct vLLM startup path with kvcached autopatch enabled.
- The problem remains on the latest kvcached `main`, so this is not fixed by pulling current upstream.
- I also tested with kvcached enabled but without `enable-sleep-mode`, and the hang still reproduces, so sleep mode is unlikely to be the root cause.
- This may be related to recent allocator-path changes, but I have not yet reduced it to a specific commit.
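Since the log shows no exception, a stack dump from the stalled process would help narrow down where startup is blocked. A minimal sketch using the standard-library `faulthandler` module: it registers a `SIGUSR1` handler that writes every thread's Python traceback to a file. How to load this into the vLLM processes is left open; one possibility (an assumption, not something tested for this report) is a `sitecustomize.py` on `PYTHONPATH`. `install_stack_dump_handler` and the output path are hypothetical.

```python
import faulthandler
import os
import signal


def install_stack_dump_handler(path: str):
    """Dump all Python thread stacks to `path` whenever SIGUSR1 arrives."""
    # faulthandler writes to the raw file descriptor, so the file object
    # must stay open for as long as the handler is registered.
    dump_file = open(path, "a")
    faulthandler.register(signal.SIGUSR1, file=dump_file, all_threads=True)
    return dump_file


# Example call (commented out): one file per process, because vLLM may
# start more than one. The /tmp location is an arbitrary choice.
# install_stack_dump_handler(f"/tmp/vllm_stacks_{os.getpid()}.txt")
```

With the handler installed, `kill -USR1 <pid>` on the hung process appends the tracebacks to the file without terminating it. If the stall is inside native code, the dump shows only the Python frame that called into it.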

## Minimal Symptom Classification

- single GPU
- world_size=1
- latest kvcached main
- vLLM 0.18.1
- direct startup, no controller dependency
- hangs after PageAllocator init
- no port listen
- no crash, no explicit exception
- disappears when kvcached is disabled

## Possibly Related Issues

- [#284](https://github.com/ovg-project/kvcached/issues/284) seems related in that it also involves a kvcached + vLLM + tensor_parallel_size=1 path, but the symptom is different.
- [#217](https://github.com/ovg-project/kvcached/issues/217) is a sleep-path failure after startup, not a startup hang.
- [#316](https://github.com/ovg-project/kvcached/issues/316) is about env/autopatch/IPC naming behavior, not this startup stall.
