Skip to content

fix: cache worker IPC state for prealloc thread#336

Open
shipiyouniao wants to merge 1 commit into
ovg-project:mainfrom
shipiyouniao:fix/single-gpu-startup-hang
Open

fix: cache worker IPC state for prealloc thread#336
shipiyouniao wants to merge 1 commit into
ovg-project:mainfrom
shipiyouniao:fix/single-gpu-startup-hang

Conversation

@shipiyouniao
Copy link
Copy Markdown

@shipiyouniao shipiyouniao commented May 19, 2026

Summary

Fix single-GPU startup hangs by caching the worker-IPC decision for the C++ background prealloc thread instead of calling back into Python from that thread.

Related issue: #334

Root cause

On world_size=1, the background prealloc thread enters PageAllocator::map_pages() during startup and evaluates should_use_worker_ipc().

That check was implemented as a Python callback. The prealloc thread therefore tried to re-enter Python while the main Python thread was still inside extension code and holding the GIL during startup. The background thread stalled before the first KV page mapping call, preallocation never completed, and the server never became ready.

The previous workaround avoided the hang by skipping the background prealloc thread for world_size=1, but it did not fix the underlying cause.

Changes

  • Cache the should_use_worker_ipc() result when the callback is installed from the Python-owned thread.
  • Make the C++ prealloc thread read that cached value instead of calling back into Python.
  • Keep the existing callback path for non-prealloc-thread callers.
  • Remove the Python-side single-GPU skip workaround so world_size=1 follows the normal prealloc flow again.

Why this approach

This keeps the behavior change narrowly scoped to the background prealloc thread, which is the only place that needs to avoid Python callback re-entry during startup. It restores the intended preallocation behavior on single-GPU startup instead of bypassing it.

Validation

Validated against the single-GPU reproduction described in issue #334:

  • vLLM 0.18.1
  • world_size=1
  • kvcached enabled
  • async scheduling enabled
  • eager mode enabled

Observed after the fix:

  • Background prealloc runs on single-GPU startup
  • Log shows Mapped 3 pages to KV tensors
  • Log shows Preallocated 3 pages, reserved=3
  • GET /health returns 200
  • GET /v1/models returns 200
  • Real POST /v1/chat/completions returns 200

Notes

This replaces the earlier workaround commit that skipped background preallocation for world_size=1. The new change fixes the verified deadlock mechanism directly.

Copilot AI review requested due to automatic review settings May 19, 2026 13:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Prevents a known deadlock during single-GPU startup by conditionally disabling the C++ background preallocation thread when world_size == 1.

Changes:

  • Gate start_prealloc_thread() behind world_size > 1
  • Add inline explanation describing the deadlock scenario
  • Add an info log when preallocation is skipped for single-GPU runs

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread kvcached/kv_cache_manager.py Outdated
Comment thread kvcached/kv_cache_manager.py Outdated
Comment thread kvcached/kv_cache_manager.py Outdated
@shipiyouniao shipiyouniao force-pushed the fix/single-gpu-startup-hang branch from bcf4d13 to 10682ec Compare May 19, 2026 13:19
@shipiyouniao shipiyouniao requested a review from Copilot May 19, 2026 13:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread kvcached/kv_cache_manager.py Outdated
Comment thread kvcached/kv_cache_manager.py Outdated
@shipiyouniao shipiyouniao force-pushed the fix/single-gpu-startup-hang branch from 10682ec to a48f046 Compare May 20, 2026 03:42
@shipiyouniao shipiyouniao changed the title fix: skip background prealloc for world_size=1 fix: cache worker IPC state for prealloc thread May 20, 2026
@shipiyouniao shipiyouniao force-pushed the fix/single-gpu-startup-hang branch 2 times, most recently from a481bb1 to 7de21c7 Compare May 20, 2026 03:55
@shipiyouniao shipiyouniao requested a review from Copilot May 20, 2026 03:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread csrc/page_allocator.cpp Outdated
Comment thread csrc/page_allocator.cpp Outdated
Comment thread csrc/page_allocator.cpp Outdated
@shipiyouniao shipiyouniao force-pushed the fix/single-gpu-startup-hang branch from 7de21c7 to f51e3ab Compare May 20, 2026 05:29
@shipiyouniao shipiyouniao requested a review from Copilot May 20, 2026 05:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Comment thread csrc/page_allocator.cpp
Comment thread csrc/page_allocator.cpp Outdated
Comment thread csrc/page_allocator.cpp Outdated
Comment thread csrc/page_allocator.cpp
Avoid calling the Python worker IPC callback from the C++ prealloc thread. On single-GPU startup that callback can block on the GIL before the first page mapping call, which hangs background preallocation and keeps the server from becoming ready.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants