[BugFix] [PD Disaggregation] fix v1 scheduler prefill node profile run & ipc transfer protocol #5132
Thanks for your contribution!
Pull Request Overview
This PR fixes two critical bugs in PD (Prefill-Decode) disaggregation when using the V1 KV cache scheduler with IPC protocol:
Key Changes:
- Fixed block_tables mishandling: Decode instances now replace (not extend) block_tables from prefill instances
- Fixed profile run hang: Added is_profiling flag to skip IPC message queue initialization during memory profiling
- Removed obsolete NotImplementedError checks that blocked V1 scheduler usage with IPC protocol
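The difference between extending and replacing block_tables can be illustrated with a minimal sketch. The `Request` class and the `adopt_prefill_request` helper are hypothetical and are not FastDeploy's actual code; they only show why a decode instance that appends its own blocks to a table inherited from prefill ends up with more entries than it allocated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    # Hypothetical stand-in for a scheduler request; not FastDeploy's class.
    request_id: str
    block_tables: List[int] = field(default_factory=list)


def adopt_prefill_request(req: Request, decode_blocks: List[int], replace: bool) -> Request:
    """Attach blocks allocated on the decode instance to a request from prefill."""
    if replace:
        # Behavior described by the fix: drop the prefill-side block IDs.
        req.block_tables = list(decode_blocks)
    else:
        # Behavior described as the bug: prefill-side IDs remain in the table.
        req.block_tables.extend(decode_blocks)
    return req


prefill_blocks = [7, 8, 9]  # IDs that are only meaningful on the prefill instance
decode_blocks = [0, 1, 2]   # IDs allocated locally on the decode instance

buggy = adopt_prefill_request(Request("r1", list(prefill_blocks)), decode_blocks, replace=False)
fixed = adopt_prefill_request(Request("r1", list(prefill_blocks)), decode_blocks, replace=True)

print(len(buggy.block_tables))  # 6
print(fixed.block_tables)       # [0, 1, 2]
```

With `replace=True` the table length matches what the decode side allocated, which is the invariant the fix restores.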
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| tests/e2e/test_ernie_03b_pd_router_v1.py | Added E2E test for V1 scheduler with PD disaggregation and IPC protocol |
| fastdeploy/worker/gpu_model_runner.py | Added is_profiling parameter to _prepare_inputs and initialize_forward_meta methods |
| fastdeploy/model_executor/forward_meta.py | Added is_profiling boolean flag to ForwardMeta dataclass |
| fastdeploy/model_executor/layers/attention/mla_attention_backend.py | Skip init_kv_signal_per_query during profiling to prevent message queue hang |
| fastdeploy/model_executor/layers/attention/flash_attn_backend.py | Skip init_kv_signal_per_query during profiling to prevent message queue hang |
| fastdeploy/model_executor/layers/attention/append_attn_backend.py | Skip init_kv_signal_per_query during profiling to prevent message queue hang |
| fastdeploy/engine/sched/resource_manager_v1.py | Changed decode instance to replace block_tables instead of extending them |
| fastdeploy/engine/args_utils.py | Removed NotImplementedError for V1 scheduler with IPC protocol and missing num_gpu_blocks_override |
| fastdeploy/cache_manager/cache_messager.py | Fixed IPC target_id to use device_ids instead of rdma_ports |
| custom_ops/xpu_ops/src/ops/remote_cache_kv_ipc.h | Wrapped send_signal in inited check to prevent sending to uninitialized message queue |
| custom_ops/gpu_ops/remote_cache_kv_ipc.h | Wrapped send_signal in inited check and applied code formatting improvements |
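The two guards listed in the table (skipping signal initialization while profiling, and checking an inited flag before sending) can be sketched together in Python. `ForwardMetaSketch`, `SignalSenderSketch`, and `run_layers` are hypothetical names for illustration only; the real guards live in the attention backends and in the C++ headers named above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForwardMetaSketch:
    # Hypothetical, reduced stand-in for ForwardMeta; only the flag matters here.
    is_profiling: bool = False


class SignalSenderSketch:
    """Hypothetical stand-in for the KV signal sender backed by a message queue."""

    def __init__(self) -> None:
        self.inited = False
        self.sent: List[int] = []

    def init(self) -> None:
        self.inited = True

    def send_signal(self, layer_id: int) -> None:
        # Mirrors the "inited check": do nothing if the queue was never set up.
        if not self.inited:
            return
        self.sent.append(layer_id)


def run_layers(meta: ForwardMetaSketch, sender: SignalSenderSketch, num_layers: int) -> List[int]:
    # Initialization is skipped during a profile run, so later sends become no-ops.
    if not meta.is_profiling:
        sender.init()
    for layer_id in range(num_layers):
        sender.send_signal(layer_id)
    return sender.sent


print(run_layers(ForwardMetaSketch(is_profiling=True), SignalSenderSketch(), 3))   # []
print(run_layers(ForwardMetaSketch(is_profiling=False), SignalSenderSketch(), 3))  # [0, 1, 2]
```

Keeping the check inside `send_signal` means callers do not need to know whether the run is a profile run, at the cost of silently dropping signals if initialization is skipped by mistake.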
gongshaotian left a comment
LGTM
  metadata.kv_signal_data_list = [None] * self.num_layers
  if self.pd_disaggregation_mode == "per_chunk":
-     if not self.keep_pd_step_flag:
+     if not self.keep_pd_step_flag and not forward_meta.is_profiling:
Please rename is_profiling. "Profiling" usually means "measuring and analyzing a program's performance". Use is_dummy_or_profile_run instead.
ok
09bc7ef
Motivation
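Fixes the following issues:
Issue 1: With ENABLE_V1_KVCACHE_SCHEDULER=1, running single-machine PD disaggregation with the IPC protocol for data transfer prevents the service from starting.
Issue 2: With ENABLE_V1_KVCACHE_SCHEDULER=1, the service fails to start unless --num-gpu-blocks-override is specified manually.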
Modifications
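Issue 1: The block_tables of the request on the P (prefill) instance were mistakenly passed to the D (decode) node. After the D node allocated resources for the request, its block_tables no longer matched the P node's in length. The fix is to have the D instance ignore the block_tables of requests sent from P.
Issue 2: Under V1 scheduling, KV signal communication runs in per_chunk mode. During the profile run, data is written to the message queue but no receiver reads it, so the process hangs. The fix is to disable the message queue init and send_signal operations in profile mode.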
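The hang described in Issue 2 can be reproduced in miniature with a bounded queue that has no consumer. This is only a sketch: Python's `queue.Queue` stands in for the IPC message queue, the queue capacity is an arbitrary assumption, and `write_signal` and `profile_run` are hypothetical helpers. A timeout replaces the indefinite block so the example terminates.

```python
import queue


def write_signal(q: queue.Queue, value: int, timeout_s: float = 0.05) -> bool:
    """Try to enqueue a signal; return False if the queue stays full for timeout_s."""
    try:
        q.put(value, timeout=timeout_s)
        return True
    except queue.Full:
        return False


def profile_run(q: queue.Queue, num_layers: int, skip_signals: bool) -> int:
    """Return how many layers completed before a write stalled."""
    completed = 0
    for layer_id in range(num_layers):
        if not skip_signals and not write_signal(q, layer_id):
            break  # a write without a timeout would block here indefinitely
        completed += 1
    return completed


# No thread ever reads from these queues, as in a profile run without a receiver.
stalled = profile_run(queue.Queue(maxsize=2), num_layers=4, skip_signals=False)
skipped = profile_run(queue.Queue(maxsize=2), num_layers=4, skip_signals=True)
print(stalled, skipped)  # 2 4
```

Once the queue fills, the writer cannot make progress; skipping the writes during the profile run lets all layers complete.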
Usage or Command
bash examples/splitwise/start_v1_tp1.sh
Accuracy Tests
Checklist
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- pre-commit before commit.
- release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.