indexCache#2178

Open
faresobeid wants to merge 4 commits into main from indexCache
@faresobeid faresobeid commented Apr 1, 2026

Adds support for Index Cache in both inference and training.
To enable it, set:

[model]
index_topk_freq = 4

Note

Medium Risk
Touches model config propagation and monkey-patches vLLM/DeepSeek attention internals to reuse sparse top-k indices across layers, which could impact correctness/perf if assumptions change upstream. Also changes GLM MoE DSA attention forward signatures/returns to thread cached indices through the model.

Overview
Adds configurable Index Cache support via a new index_topk_freq model setting, allowing runtime overrides from config without editing HF model directories.

Plumbs index_topk_freq through shared RL config, trainer model loading (overriding AutoConfig), and inference startup (exporting PRIME_RL_INDEX_TOPK_FREQ). In inference, introduces a vLLM monkeypatch that conditionally skips top-k recomputation on non-frequency layers by caching and reusing indices across DeepSeek/MLA layers.
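
The inference side of this plumbing can be illustrated with a minimal sketch of reading the override back from the environment. Only the variable name `PRIME_RL_INDEX_TOPK_FREQ` comes from the summary above; the helper `read_index_topk_freq`, its validation, and its default handling are hypothetical and may differ from what the PR does.

```python
import os
from typing import Mapping, Optional

# Variable name taken from the PR summary; everything else is illustrative.
ENV_VAR = "PRIME_RL_INDEX_TOPK_FREQ"


def read_index_topk_freq(
    environ: Optional[Mapping[str, str]] = None,
    default: Optional[int] = None,
) -> Optional[int]:
    """Return the override as a positive int, or `default` when it is unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)
    if value < 1:
        # Assumption: a frequency below 1 is meaningless, so reject it early.
        raise ValueError(f"{ENV_VAR} must be >= 1, got {value}")
    return value
```

Passing the environment as an argument keeps the helper testable without mutating `os.environ`.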

Extends the custom glm_moe_dsa model to support the same cross-layer reuse: adds index_topk_freq to its config, marks layers to skip top-k, and threads cached_topk_indices through attention/layer/model forwards while returning the computed indices for reuse.
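
The cross-layer reuse described here can be sketched in plain Python without any model code. This is a toy under stated assumptions, not the PR's implementation: `ToyLayer`, `run_model`, and `should_skip_topk` are hypothetical names, and the rule that a layer recomputes top-k only when its index is a multiple of `index_topk_freq` is an assumption, since the summary does not spell out the exact rule. Only `index_topk_freq`, `cached_topk_indices`, and the error message (which appears in the diff preview) come from the PR.

```python
from typing import List, Optional, Sequence


def should_skip_topk(layer_idx: int, index_topk_freq: int) -> bool:
    # Assumed rule: layers at multiples of the frequency recompute top-k,
    # all other layers reuse the most recently computed indices.
    return index_topk_freq > 1 and layer_idx % index_topk_freq != 0


class ToyLayer:
    def __init__(self, layer_idx: int, index_topk_freq: int, topk: int) -> None:
        self.skip_topk = should_skip_topk(layer_idx, index_topk_freq)
        self.topk = topk

    def compute_topk(self, scores: Sequence[float]) -> List[int]:
        # Positions of the `topk` largest scores, returned in ascending order.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return sorted(order[: self.topk])

    def forward(
        self,
        scores: Sequence[float],
        cached_topk_indices: Optional[List[int]] = None,
    ) -> List[int]:
        if self.skip_topk:
            if cached_topk_indices is None:
                raise ValueError("IndexCache shared layers require cached top-k indices.")
            return cached_topk_indices
        return self.compute_topk(scores)


def run_model(layers: Sequence[ToyLayer], per_layer_scores) -> List[List[int]]:
    cached: Optional[List[int]] = None
    used = []
    for layer, scores in zip(layers, per_layer_scores):
        # Each layer returns the indices it used so the next layer can reuse them.
        cached = layer.forward(scores, cached)
        used.append(cached)
    return used
```

With `index_topk_freq = 4` and five layers, this toy recomputes indices at layers 0 and 4 and reuses layer 0's indices at layers 1 through 3.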

Written by Cursor Bugbot for commit 3191c5b. This will update automatically on new commits.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Unguarded list access crashes on empty regex match
    • Added a guard to return early when the regex match list is empty before indexing.
  • ✅ Fixed: Indexer receives None for q_c in else branch
    • When q_lora_rank is None, pass hidden_states as a fallback q_c to the indexer.

To apply these changes, comment:

@cursor push 108af11ad1
Preview (108af11ad1)
diff --git a/src/prime_rl/inference/patches.py b/src/prime_rl/inference/patches.py
--- a/src/prime_rl/inference/patches.py
+++ b/src/prime_rl/inference/patches.py
@@ -28,7 +28,10 @@
     if num_hidden_layers is None:
         return False
 
-    layer_idx = int(_LAYER_INDEX_RE.findall(prefix)[-1])
+    matches = _LAYER_INDEX_RE.findall(prefix)
+    if not matches:
+        return False
+    layer_idx = int(matches[-1])
     if layer_idx >= num_hidden_layers:
         return False
 
@@ -97,7 +100,7 @@
                 raise ValueError("IndexCache shared layers require cached top-k indices.")
             topk_indices = prev_topk_indices
         else:
-            topk_indices = self.indexer(hidden_states, q_c, positions, self.indexer_rope_emb)
+            topk_indices = self.indexer(hidden_states, q_c if q_c is not None else hidden_states, positions, self.indexer_rope_emb)
 
         if llama_4_scaling is not None:
             q *= llama_4_scaling
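
The first fix in the preview can be exercised in isolation. In this minimal sketch the regex pattern and the helper name `parse_layer_index` are assumptions for illustration (the real `_LAYER_INDEX_RE` is not shown in the diff); only the guard-before-indexing shape mirrors the change.

```python
import re
from typing import Optional

# Assumed pattern: the actual _LAYER_INDEX_RE in patches.py is not shown in the diff.
_LAYER_INDEX_RE = re.compile(r"layers\.(\d+)")


def parse_layer_index(prefix: str) -> Optional[int]:
    """Return the last layer index found in `prefix`, or None when there is none."""
    matches = _LAYER_INDEX_RE.findall(prefix)
    if not matches:
        # Without this guard, matches[-1] raises IndexError on an empty list.
        return None
    return int(matches[-1])
```

A prefix such as `model.layers.12.self_attn` yields 12 under this assumed pattern, while a prefix with no layer component yields `None` instead of raising.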

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Field(
description="Override the loaded Hugging Face config's `index_topk_freq` for trainer and inference without editing the model directory.",
),
] = None
Missing CHANGELOG entry for new config field

Low Severity

This PR adds a new index_topk_freq config field to both SharedModelConfig in configs/rl.py and BaseModelConfig in configs/shared.py, but CHANGELOG.md has no corresponding entry. Per project rules, any PR that modifies configuration structures (added, removed, renamed, moved, or default value changes) must update the changelog.

Triggered by project rule: BugBot Instructions
