Skip to content

fix: defer tp prealloc until first alloc#341

Open
shipiyouniao wants to merge 2 commits into
ovg-project:mainfrom
shipiyouniao:fix/tp2-kvcached-startup
Open

fix: defer tp prealloc until first alloc#341
shipiyouniao wants to merge 2 commits into
ovg-project:mainfrom
shipiyouniao:fix/tp2-kvcached-startup

Conversation

@shipiyouniao
Copy link
Copy Markdown

Summary

Defer the background prealloc thread until after the first null-block allocation completes in the TP multi-process startup path.

Related issue: #339

Root cause

After the TP coordinator world size is corrected, vLLM still performs one early KVCacheManager.alloc(1) to reserve its real null block.

In the TP multi-process path, starting the background prealloc thread before that first alloc finishes can send both flows through the same multi-process page-map path during startup.

That race can leave the server alive but never ready, even though worker IPC sockets and direct map/unmap operations are otherwise healthy.

Changes

  • Detect the multi-process TP/worker-IPC startup path where vLLM will reserve the null block from its first normal alloc
  • Defer start_prealloc_thread() until after that first alloc completes
  • Keep eager prealloc startup for the existing single-process / reserve-null-block flows
  • Add a regression test that verifies multi-process startup defers prealloc until the first alloc, while single-process startup keeps eager prealloc

Why this approach

This is narrower than globally disabling preallocation and matches the verified behavior from the TP2 reproduction.

The goal is not to remove preallocation, but to prevent it from racing the first null-block allocation in the one startup path where that ordering matters.

Validation

  • Added unit test: python -m pytest tests/test_tp_prealloc_startup.py -q
  • Remote TP2 reproduction with both fixes applied reached:
    • Starting vLLM server on http://0.0.0.0:19113
    • Application startup complete
    • GET /health -> 200

Notes

This PR is intentionally split from the coordinator world_size fix. The TP2 startup issue turned out to be a two-defect chain, and this PR addresses the second one.

Copilot AI review requested due to automatic review settings May 21, 2026 10:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Defers starting the background page preallocation thread in multi-process (TP/IPC) setups until the first alloc() call to avoid a startup race that can stall TP initialization.

Changes:

  • Add logic in KVCacheManager to optionally defer prealloc-thread startup and introduce _start_prealloc_thread() guard.
  • Start the prealloc thread on first allocation when deferral is enabled, while keeping eager behavior for single-process.
  • Add unit tests covering deferred vs eager prealloc startup behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
kvcached/kv_cache_manager.py Adds deferred prealloc startup behavior and a guarded helper to start the prealloc thread.
tests/test_tp_prealloc_startup.py Introduces tests verifying deferred prealloc in multi-process and eager prealloc in single-process.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread kvcached/kv_cache_manager.py
Comment thread tests/test_tp_prealloc_startup.py Outdated
@shipiyouniao shipiyouniao force-pushed the fix/tp2-kvcached-startup branch from 2c73eba to 476b679 Compare May 21, 2026 19:05
@shipiyouniao shipiyouniao force-pushed the fix/tp2-kvcached-startup branch from 476b679 to fabc1ce Compare May 22, 2026 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants