feat(deepseek-v4): device-side implement prefix caching for v4 hybrid cache.#146
Open
SimonCqk wants to merge 1 commit into
… cache on radix-tree node.

Extend HybridPrefixCache to coordinate the KV radix tree with DeepSeek V4's paged cache groups, enabling history-aligned prefix reuse across SWA / compressed KV / indexer / compressor-state buffers.

Group page lifecycle uses chunk-boundary deferred ownership transfer: pages move uniquely between a request's owned suffix and a TreeNode snapshot's committed segment without page-level refcounting.

Required groups split into two families: history (full-chain, e.g. compressed KV / indexer KV) and state (trailing window, e.g. SWA / compressor tail). Match validates a history-aligned contiguous chain, then a state-window check at the deepest aligned boundary; state-only eviction preserves the history chain, and state snapshots are trimmed to their live window at commit time.

Python runtime stays vendor-neutral: pools opt into the adjunct by exposing prefix_cache_required_group_ids; scheduler public API surface is unchanged.

Add C++ scheduler tests covering prefix match, passive eviction, family split fallback, state trim, and state-only prune; add Python test for V4 prefix cache metadata.

Signed-off-by: SimonCqk <[email protected]>
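The two-family match order described in the commit message can be illustrated with a small Python sketch. This is not the PR's C++ implementation: `Node`, `match_prefix`, and the group ids are hypothetical names chosen for illustration, and falling back to a shallower boundary when the deepest one lacks a state snapshot is an assumption about what "family split fallback" means.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical radix-tree node covering one aligned chunk of tokens."""
    end: int                                    # token offset where this chunk ends
    history: set = field(default_factory=set)   # history group ids committed on this node
    state: set = field(default_factory=set)     # state group ids with a live window snapshot


def match_prefix(chain, history_groups, state_groups):
    """Return the reusable prefix length in tokens, or 0 if nothing is reusable.

    `chain` lists the nodes on the matched token path in root-to-leaf order.
    """
    # Step 1: history groups need a contiguous chain from the root, so stop at
    # the first node that is missing any required history group.
    valid = []
    for node in chain:
        if not history_groups <= node.history:
            break
        valid.append(node)

    # Step 2: state groups only need their trailing window at the boundary.
    # Try the deepest history-valid boundary first, then shallower ones
    # (assumed fallback behaviour, not confirmed by the PR text).
    for node in reversed(valid):
        if state_groups <= node.state:
            return node.end
    return 0


# Example: the node ending at 128 had its state snapshot evicted (state-only
# eviction), and the node ending at 192 lacks history group 2.
chain = [
    Node(end=64, history={1, 2}, state={3}),
    Node(end=128, history={1, 2}, state=set()),
    Node(end=192, history={1}, state={3}),
]
reused = match_prefix(chain, history_groups={1, 2}, state_groups={3})
```

In this example the history chain is valid through offset 128, but only the boundary at 64 still holds a state snapshot, so the sketch reuses 64 tokens. Keeping the history pages on the node ending at 128 is what lets a later commit restore a deeper reusable boundary without recomputing the full chain.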
Summary
Adds device-side prefix cache support for DeepSeek V4 paged-cache groups through HybridPrefixCache.
Core changes:
HybridPrefixCache, avoiding V4-specific table management in scheduler hot paths.
Test Plan
Evalscope perf multi-turn run:
Request Metrics
Validation run:
uvx pre-commit run --all-files
git diff --check upstream/main...HEAD
-fsyntax-only: hybrid_prefix_cache.cpp, forward.cpp, scheduler.cpp
py_compile for touched runtime/scheduler files
0.99, no regression observed