TP2 vLLM startup requires two kvcached fixes before the server becomes ready #339

@shipiyouniao

Summary

When kvcached is enabled on latest main, vLLM startup with tensor_parallel_size=2 (TP2) can hang before the API server becomes ready.

The process stays alive and model loading proceeds, but the server port never starts listening until two separate TP startup defects are addressed.

This does not reproduce when running the same TP2 startup configuration without kvcached.

Environment

  • kvcached: latest main
  • vLLM: 0.18.1
  • GPU: NVIDIA GeForce RTX 4090
  • OS: Linux
  • Model: Qwen3-8B-FP8
  • tensor_parallel_size: 2
  • kv cache mode: kvcached autopatch enabled

Reproduction Setup

This was reproduced on a direct TP2 startup path without relying on controller readiness logic.

Environment variables:

ENABLE_KVCACHED=true
KVCACHED_AUTOPATCH=1
KVCACHED_IPC_NAME=kvcached_hotfix_probe3
NCCL_CUMEM_HOST_ENABLE=0
VLLM_SERVER_DEV_MODE=1
CUDA_VISIBLE_DEVICES=0,1

vLLM startup configuration:

model: /root/offload-lab/local-models/Qwen3-8B-FP8
port: 19113
gpu-memory-utilization: 0.35
kv-cache-memory-bytes: 536870912
max-model-len: 4096
kv-cache-dtype: fp8_e4m3
enable-sleep-mode: true
enable-prefix-caching: false
disable-log-stats: true
tensor-parallel-size: 2
max-num-seqs: 64
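
For anyone who wants to script the reproduction, the settings above can be turned into a launch command along these lines. This is only a sketch: it assumes each configuration key maps to a `vllm serve` flag of the same name, and that a `false` boolean is written in the negated `--no-<key>` form. The helper names `build_argv` and `build_env` are hypothetical and are not part of vLLM or kvcached.

```python
import os

# Environment variables exactly as listed in this report.
KVCACHED_ENV = {
    "ENABLE_KVCACHED": "true",
    "KVCACHED_AUTOPATCH": "1",
    "KVCACHED_IPC_NAME": "kvcached_hotfix_probe3",
    "NCCL_CUMEM_HOST_ENABLE": "0",
    "VLLM_SERVER_DEV_MODE": "1",
    "CUDA_VISIBLE_DEVICES": "0,1",
}

# Startup configuration exactly as listed in this report.
VLLM_CONFIG = {
    "model": "/root/offload-lab/local-models/Qwen3-8B-FP8",
    "port": 19113,
    "gpu-memory-utilization": 0.35,
    "kv-cache-memory-bytes": 536870912,
    "max-model-len": 4096,
    "kv-cache-dtype": "fp8_e4m3",
    "enable-sleep-mode": True,
    "enable-prefix-caching": False,
    "disable-log-stats": True,
    "tensor-parallel-size": 2,
    "max-num-seqs": 64,
}


def build_argv(config):
    """Build a command line (assumed key-to-flag mapping, see lead-in)."""
    argv = ["vllm", "serve", str(config["model"])]
    for key, value in config.items():
        if key == "model":
            continue  # already passed as the positional argument
        if value is True:
            argv.append(f"--{key}")
        elif value is False:
            argv.append(f"--no-{key}")  # assumption: negated boolean form
        else:
            argv += [f"--{key}", str(value)]
    return argv


def build_env(base=None):
    """Return a copy of the base environment with the report's variables set."""
    env = dict(os.environ if base is None else base)
    env.update(KVCACHED_ENV)
    return env
```

The result can be passed to `subprocess.Popen(build_argv(VLLM_CONFIG), env=build_env())`. Keeping the process handle makes it easy to confirm that the process is still alive while the port stays closed.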

Actual Behavior

With kvcached enabled, the TP2 process stays alive but /health remains unreachable until both of the following defects are fixed:

  1. The coordinator path can initialize kvcached with world_size=1 even though EngineCore already knows tensor_parallel_size=2.
  2. After correcting that world size, TP startup can still hang when the background prealloc thread starts too early and races the first null-block allocation on the multi-process map path.

Observed progression during debugging:

  • bare TP2 without kvcached starts and /health returns 200
  • TP2 + kvcached on latest main hangs before port listen
  • fixing the coordinator world size lets startup proceed past the allocator setup that previously ran with world_size=1, but the server still does not become ready
  • fixing the prealloc timing on top of that makes /health return 200 and the server starts normally
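
Each step of the progression above was judged by whether /health answers with 200. A minimal readiness probe for that check could look like the sketch below. The function name `wait_for_health`, the timeout values, and the injectable `opener` parameter are illustrative choices, not something taken from vLLM or kvcached.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url, timeout_s=300.0, interval_s=2.0,
                    opener=urllib.request.urlopen):
    """Poll `url` until it returns HTTP 200 or `timeout_s` elapses.

    Returns True once the server is ready and False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Port not listening yet, or a non-2xx response: keep waiting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For the reproduction in this report the call would be `wait_for_health("http://127.0.0.1:19113/health")`. A False result while the vLLM process is still alive matches the hang described here.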

Root Cause Breakdown

Root cause 1: coordinator reads TP world size too early

The EngineCore patch records the correct TP size, but the KVCacheCoordinator path can still query vLLM parallel state at a point where it observes 1.

That causes kvcached to initialize its coordinator-side KVCacheManager with the wrong world size even though the run is actually TP2.
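
One possible shape for avoiding the early read is to prefer the TP size that was already recorded during EngineCore setup, and to fall back to a live parallel-state query only when nothing was recorded. The sketch below illustrates that idea only. `resolve_world_size` and its parameters are hypothetical names, and the actual fix may be structured differently.

```python
def resolve_world_size(recorded_tp_size, query_parallel_state):
    """Pick the world size for the coordinator-side manager.

    recorded_tp_size: TP size captured at EngineCore setup, or None.
    query_parallel_state: callable returning the currently visible TP size,
        which may still report 1 if it is called too early.
    """
    if recorded_tp_size is not None and recorded_tp_size >= 1:
        return recorded_tp_size  # trust the value the engine already knows
    return query_parallel_state()  # fallback: live query


# Stand-in for a parallel-state query that runs too early and sees 1.
def early_query():
    return 1


world_size = resolve_world_size(2, early_query)
```

With a recorded size of 2, the early query result of 1 is ignored, so the coordinator-side manager would be set up for TP2.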

Root cause 2: prealloc startup races the first TP null-block allocation

Once the coordinator world size is corrected, startup can still stall in the first real KVCacheManager.alloc(1) call, which vLLM issues for its null block.

The verified workaround is to defer starting the background prealloc thread until after that first alloc completes in the TP multi-process path. With that change in place, the same TP2 startup reaches:

  • Starting vLLM server on http://0.0.0.0:19113
  • Application startup complete
  • GET /health -> 200
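
The ordering change behind this workaround can be illustrated with a toy model. This is not kvcached's code: `ToyKVCacheManager` and all of its members are hypothetical, and the toy does not reproduce the hang. It only shows the background thread being started after the first alloc has returned instead of in the constructor.

```python
import threading


class ToyKVCacheManager:
    """Toy allocator with an optional deferred background prealloc thread."""

    def __init__(self, defer_prealloc):
        self._lock = threading.Lock()
        self._defer = defer_prealloc
        self._prealloc_thread = None
        self.events = []  # order in which work actually ran
        if not defer_prealloc:
            # Eager start: the thread may race the first alloc().
            self._start_prealloc()

    def _start_prealloc(self):
        # Start the background thread at most once.
        if self._prealloc_thread is None:
            self._prealloc_thread = threading.Thread(
                target=self._prealloc_once, daemon=True)
            self._prealloc_thread.start()

    def _prealloc_once(self):
        with self._lock:
            self.events.append("prealloc")

    def alloc(self, n):
        with self._lock:
            self.events.append(f"alloc({n})")
        if self._defer:
            # Deferred start: only after the first alloc has completed.
            self._start_prealloc()
        return list(range(n))

    def join(self):
        if self._prealloc_thread is not None:
            self._prealloc_thread.join()


manager = ToyKVCacheManager(defer_prealloc=True)
manager.alloc(1)  # stands in for vLLM's null-block allocation
manager.join()
```

In deferred mode the recorded order is always the first alloc followed by the prealloc work, because the thread does not exist until `alloc` has finished its critical section.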

Expected Behavior

TP2 startup with kvcached enabled should complete normally and begin serving /health and /v1/models without requiring any local workaround.

Important Comparison

Running the same TP2 startup configuration without kvcached succeeds.

So this does not look like:

  • a generic TP2 vLLM startup failure
  • insufficient GPU memory
  • controller-specific readiness logic
  • unsupported TP usage being forced from outside

It is a kvcached TP startup-path issue.

Validation Status

The problem has been reduced to two focused fixes:

  • PR 1: coordinator world size fix
  • PR 2: deferred prealloc startup for the TP multi-process null-block path

Applying both fixes together was validated on the direct TP2 reproduction above and resulted in /health = 200.
