fix: reuse recorded tp world size in coordinator#340
Open
shipiyouniao wants to merge 2 commits into
Open
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR updates the vLLM integration patch so KVCacheCoordinator uses the tensor-parallel (TP) world size captured during EngineCore initialization, and adds a regression test to verify the behavior.
Changes:
- Switch TP world size detection from
vllm.distributed.parallel_statetokvcached.integration.vllm.interfaces._world_size. - Add a new unit test that verifies
init_kvcached(world_size=...)uses the EngineCore-recorded world size. - Add supporting module stubs/mocks for
torch,vllm, andkvcachedintegration imports in the new test.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_tp_world_size_patch.py | Adds a regression test ensuring coordinator init uses the EngineCore-recorded TP world size. |
| kvcached/integration/vllm/patches.py | Uses interfaces._world_size for TP size during coordinator setup to avoid early-startup world_size=1 observations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix the TP2 coordinator startup path by reusing the TP world size already recorded during EngineCore initialization instead of querying vLLM parallel state too early.
Related issue: #339
Root cause
In TP startup,
EngineCorealready knows the correcttensor_parallel_size, and kvcached records that value ininterfaces._world_size.But
KVCacheCoordinatorwas still callingget_tensor_model_parallel_world_size()at a point where vLLM could still report1.That caused the coordinator-side
KVCacheManagerto be initialized withworld_size=1during a TP2 run, which is incorrect for the worker IPC / allocator setup that follows.Changes
KVCacheCoordinatorPatchkvcached.integration.vllm.interfaces._world_size, which was already recorded during EngineCore init1but the recorded EngineCore TP size is2Why this approach
The coordinator should not rediscover TP size from a startup-time API that is not yet stable for this path.
The EngineCore patch already computes and records the correct value earlier in the same startup sequence, so reusing that value is both narrower and more reliable.
Validation
python -m pytest tests/test_tp_world_size_patch.py -qworld_size=1toworld_size=2after this fixNotes
This is intentionally split from the prealloc timing fix so the two TP startup defects can be reviewed independently.