
Conversation

@baxingpiaochong (Contributor) commented Dec 6, 2025

What this PR does / why we need it?

Support pipeline parallelism (PP) for the KV pool.

Signed-off-by: baxingpiaochong <[email protected]>
@github-actions bot commented Dec 6, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds support for pipeline parallelism (PP) to the KV pool. The changes include updating data structures to be aware of the pipeline rank and modifying the cache lookup logic. However, the implementation of the cache lookup across different pipeline stages in lookup_scheduler is flawed. It does not correctly generate keys for all combinations of tensor and pipeline parallel ranks, and the subsequent result processing is broken. This critical issue will lead to incorrect cache hit detection and potential failures. I have provided a detailed comment with a suggested fix for this logic.
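
For reference, the per-rank cache keys discussed below embed the pipeline and tensor ranks as marker substrings such as "@pp_rank:0" and "@head_or_tp_rank:0" (both visible in the quoted snippet and the suggestion). A minimal sketch of rewriting one such key for a different rank pair; everything in the example key outside those two markers is an assumption, not taken from the patch:

    # Hypothetical key layout: only the "@head_or_tp_rank:" and "@pp_rank:"
    # markers come from the patch; the rest of the string is illustrative.
    key = "blockhash-abc123@head_or_tp_rank:0@pp_rank:0"

    def rekey(key: str, tp_rank: int, pp_rank: int) -> str:
        """Rewrite the rank markers in a cache key for a given (TP, PP) pair."""
        key = key.replace("@head_or_tp_rank:0", f"@head_or_tp_rank:{tp_rank}", 1)
        return key.replace("@pp_rank:0", f"@pp_rank:{pp_rank}", 1)

    print(rekey(key, tp_rank=2, pp_rank=1))
    # -> blockhash-abc123@head_or_tp_rank:2@pp_rank:1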

Comment on lines +561 to +565
for i in range(1, self.pp_size):
    for item in keys:
        new_str = item.replace(  # type: ignore[attr-defined]
            "@pp_rank:0", f"@pp_rank:{i}", 1)
        multi_tp_keys.append(new_str)

critical

The current logic for checking key existence across both tensor (TP) and pipeline parallel (PP) ranks is flawed.

  1. Incomplete key generation: It fails to generate keys for all (TP, PP) rank combinations, only checking for (TP=i, PP=0) and (TP=0, PP=j). This will result in missed cache hits when both TP and PP are greater than 1.
  2. Incorrect result processing: The subsequent result processing logic (lines 573-576) is not updated for pipeline parallelism and will likely fail with an IndexError or produce incorrect results.

The entire key generation and result processing block needs to be refactored. Additionally, the implementation relies on hardcoded rank:0 strings, which is brittle. A more robust solution would replace the current worker's rank in the key string.

Here is a corrected implementation for lines 554-579 that addresses the combination issue, assuming this lookup is always performed from a worker with tp_rank=0 and pp_rank=0:

            multi_tp_keys = []
            for pp_i in range(self.pp_size):
                for tp_i in range(min(self.tp_size, self.num_kv_head)):
                    for item in keys:
                        item_with_pp = item.replace("@pp_rank:0", f"@pp_rank:{pp_i}", 1)
                        new_str = item_with_pp.replace("@head_or_tp_rank:0", f"@head_or_tp_rank:{tp_i}", 1)
                        multi_tp_keys.append(new_str)

            res = self.m_store.exists(
                multi_tp_keys)  # type: ignore[assignment]
            num_block = len(keys)
            if use_layerwise:
                res = self.check_all_layers_exists(res, self.num_layers)
                num_block = len(keys) // self.num_layers

            num_ranks = self.pp_size * min(self.tp_size, self.num_kv_head)
            multi_rank_values = [
                res[i * num_block:(i + 1) * num_block]
                for i in range(num_ranks)
            ]
            index = self.find_min_first_non_one_index(multi_rank_values)
            if index != -1:
                return starts[index]
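
Following up on the note about the hardcoded rank:0 strings: below is a minimal sketch of deriving the substitution from the current worker's own ranks instead, so the lookup also works when it is not issued from tp_rank=0 / pp_rank=0. The tp_rank and pp_rank parameters are assumptions for illustration, not attributes confirmed by the patch:

    def expand_keys(keys, tp_rank, pp_rank, tp_size, pp_size, num_kv_head):
        """Generate lookup keys for every (TP, PP) rank pair by substituting
        the current worker's own rank markers rather than assuming rank 0."""
        multi_rank_keys = []
        for pp_i in range(pp_size):
            for tp_i in range(min(tp_size, num_kv_head)):
                for item in keys:
                    new_key = item.replace(
                        f"@pp_rank:{pp_rank}", f"@pp_rank:{pp_i}", 1)
                    new_key = new_key.replace(
                        f"@head_or_tp_rank:{tp_rank}",
                        f"@head_or_tp_rank:{tp_i}", 1)
                    multi_rank_keys.append(new_key)
        return multi_rank_keys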

@LCAIZJ (Contributor) commented Dec 8, 2025

When pipeline parallelism (PP) is enabled, self.maybe_wait_for_kv_save() cannot remain in its current position and must be moved back to its previous location. Additionally, we need to synchronize with the ADXL version that includes the fix for the hang issue.

Signed-off-by: baxingpiaochong <[email protected]>
@LCAIZJ (Contributor) commented Dec 9, 2025

The ADXL-related fixes will be included in 8.5.RC1.

@wangxiyuan wangxiyuan merged commit dda027e into vllm-project:main Dec 9, 2025
27 checks passed
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 10, 2025
### What this PR does / why we need it?
Support pp for kv pool

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: baxingpiaochong <[email protected]>
Mercykid-bash pushed a commit to Mercykid-bash/vllm-ascend that referenced this pull request Dec 10, 2025
