feat(deepseek-v4): device-side implement prefix caching for v4 hybrid cache.#146

Open
SimonCqk wants to merge 1 commit into
lightseekorg:mainfrom
SimonCqk:feat/v4-prefix-cache

Contributor

@SimonCqk SimonCqk commented May 14, 2026

Summary

Adds device-side prefix cache support for DeepSeek V4 paged-cache groups through HybridPrefixCache.

Core changes:

  • Attach V4 paged-cache snapshots to KV radix-tree nodes for repeated-prefix reuse.
  • Keep paged-cache ownership and lifecycle inside HybridPrefixCache, avoiding V4-specific table management in scheduler hot paths.
  • Split paged-cache request tables into borrowed prefix page ids and owned suffix pages.
  • Match only contiguous LCM-aligned snapshot chains and cap KV reuse to the deepest complete V4 paged-cache prefix.
  • Preserve correctness for both full-history and sliding-window groups, including sliding base offsets.
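
The matching and table-split rules in the bullets above can be illustrated with a minimal Python sketch. It is not the PR's implementation. It assumes that "LCM-aligned" means snapshot boundaries fall on multiples of the least common multiple of the per-group page sizes, and the names `lcm_alignment`, `deepest_complete_prefix`, and `split_page_table` are hypothetical.

```python
from math import lcm


def lcm_alignment(page_sizes):
    """Token alignment shared by all groups (assumed: LCM of page sizes)."""
    return lcm(*page_sizes)


def deepest_complete_prefix(kv_match_len, snapshot_boundaries, alignment):
    """Return how many tokens of the KV match may be reused.

    Walks aligned boundaries from the root and stops at the first boundary
    that has no snapshot, so the accepted chain is always contiguous.
    """
    have = set(snapshot_boundaries)
    reuse = 0
    boundary = alignment
    while boundary <= kv_match_len and boundary in have:
        reuse = boundary
        boundary += alignment
    return reuse


def split_page_table(prefix_pages, reuse_tokens, total_tokens, page_size, alloc):
    """Build one group's request table: borrowed prefix ids + owned suffix pages."""
    n_borrowed = reuse_tokens // page_size
    n_total = -(-total_tokens // page_size)  # ceiling division
    borrowed = list(prefix_pages[:n_borrowed])
    owned = [alloc() for _ in range(n_total - n_borrowed)]
    return borrowed, owned
```

In this sketch a KV match of 50 tokens with snapshots at 12, 24, and 48 (alignment 12) is capped at 24 tokens, because the missing snapshot at 36 breaks the chain even though a deeper one exists.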

Test Plan

Evalscope perf multi-turn run:

  • dataset: share_gpt_en_multi_turn
  • model: DeepSeek-V4-Flash
  • result: about 95% cache hit rate overall

Request Metrics

| Conc. | Num | Avg In Toks | P99 In Toks | Avg Out Toks | P99 Out Toks | Avg Turns/Req | Approx Cache Hit | Decode toks/s |
|---|---|---|---|---|---|---|---|---|
| 1 | 21 | 888.0 | 4149.0 | 935.9 | 2048.0 | 1.86 | 95.0% | 38.83 |

Validation run:

  • uvx pre-commit run --all-files
  • git diff --check upstream/main...HEAD
  • C++ -fsyntax-only: hybrid_prefix_cache.cpp, forward.cpp, scheduler.cpp
  • Python py_compile for touched runtime/scheduler files
  • GSM8K limit 200: score 0.99, no regression observed

@SimonCqk SimonCqk requested a review from a team as a code owner May 14, 2026 08:58
@SimonCqk SimonCqk removed the request for review from a team May 14, 2026 08:58
@SimonCqk SimonCqk force-pushed the feat/v4-prefix-cache branch 6 times, most recently from 1ab91ee to 42c223b Compare May 18, 2026 06:35
@SimonCqk SimonCqk changed the title from "[WIP] feat(deepseek-v4): support prefix cache snapshots" to "[WIP] feat(deepseek-v4): implement prefix caching for v4 hybrid cache on radix-tree node." May 18, 2026
@SimonCqk SimonCqk force-pushed the feat/v4-prefix-cache branch 2 times, most recently from 7425f99 to d590af4 Compare May 18, 2026 07:45
@SimonCqk SimonCqk changed the title from "[WIP] feat(deepseek-v4): implement prefix caching for v4 hybrid cache on radix-tree node." to "feat(deepseek-v4): implement prefix caching for v4 hybrid cache on radix-tree node." May 18, 2026
@SimonCqk SimonCqk requested review from a team, XucSh, dongjiyingdjy, tuanzhangCS and zhyncs May 18, 2026 07:46
@SimonCqk SimonCqk changed the title from "feat(deepseek-v4): implement prefix caching for v4 hybrid cache on radix-tree node." to "feat(deepseek-v4): device-side implement prefix caching for v4 hybrid cache on radix-tree node." May 18, 2026
… cache on radix-tree node.

Extend HybridPrefixCache to coordinate the KV radix tree with DeepSeek V4's
paged cache groups, enabling history-aligned prefix reuse across SWA /
compressed KV / indexer / compressor-state buffers.

Group page lifecycle uses chunk-boundary deferred ownership transfer: pages
move uniquely between a request's owned suffix and a TreeNode snapshot's
committed segment without page-level refcounting. Required groups split into
two families: history (full-chain, e.g. compressed KV / indexer KV) and
state (trailing window, e.g. SWA / compressor tail). Match first validates a
history-aligned contiguous chain, then runs a state-window check at the
deepest aligned boundary. State-only eviction preserves the history chain,
and state snapshots are trimmed to their live window at commit time.

Python runtime stays vendor-neutral: pools opt into the adjunct by exposing
prefix_cache_required_group_ids; scheduler public API surface is unchanged.
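
A minimal sketch of what this opt-in could look like on the Python side. Only the name `prefix_cache_required_group_ids` comes from this commit; whether it is an attribute or a method, the pool classes, the helper `required_group_ids`, and the group id strings are assumptions made for illustration.

```python
class PlainKVPool:
    """A pool with no paged-cache groups; it does not opt in."""


class V4LikePool:
    """Hypothetical pool that opts in by exposing the attribute."""

    def __init__(self, group_ids):
        self.prefix_cache_required_group_ids = tuple(group_ids)


def required_group_ids(pool):
    """Return the groups the prefix cache must snapshot, or () if none."""
    return tuple(getattr(pool, "prefix_cache_required_group_ids", ()))
```

Probing with `getattr` keeps the runtime free of model-specific type checks: a pool that does not define the attribute is simply treated as having no required groups.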

Add C++ scheduler tests covering prefix match, passive eviction, family
split fallback, state trim, and state-only prune; add Python test for V4
prefix cache metadata.

Signed-off-by: SimonCqk <[email protected]>
@SimonCqk SimonCqk force-pushed the feat/v4-prefix-cache branch from d590af4 to 6ff4d74 Compare May 18, 2026 08:15
@SimonCqk SimonCqk changed the title from "feat(deepseek-v4): device-side implement prefix caching for v4 hybrid cache on radix-tree node." to "feat(deepseek-v4): device-side implement prefix caching for v4 hybrid cache." May 18, 2026