indexCache by faresobeid · Pull Request #2178 · PrimeIntellect-ai/prime-rl

faresobeid · 2026-04-01T22:33:28Z

Adds Index Cache support for both inference and training.
Enable it in the model config with:

[model]
index_topk_freq = 4

Note

Medium Risk
Touches model config propagation and monkey-patches vLLM/DeepSeek attention internals to reuse sparse top-k indices across layers, which could impact correctness/perf if assumptions change upstream. Also changes GLM MoE DSA attention forward signatures/returns to thread cached indices through the model.

Overview
Adds configurable Index Cache support via a new index_topk_freq model setting, allowing runtime overrides from config without editing HF model directories.
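A minimal sketch of what such a runtime override could look like, assuming the loaded config behaves like a plain attribute object. `apply_index_topk_freq` is a hypothetical helper and not a function from this PR; treating `None` as "keep whatever the model directory specifies" is also an assumption, based on the field defaulting to `None`.

```python
from types import SimpleNamespace


def apply_index_topk_freq(model_config, index_topk_freq=None):
    """Hypothetical helper: override `index_topk_freq` on an already-loaded config.

    ASSUMPTION: `None` means "no override", so the value from the model
    directory (if any) is left untouched.
    """
    if index_topk_freq is None:
        return model_config
    if index_topk_freq < 1:
        raise ValueError("index_topk_freq must be a positive integer")
    model_config.index_topk_freq = index_topk_freq
    return model_config


# Stand-in for a config loaded from a Hugging Face model directory.
loaded = SimpleNamespace(num_hidden_layers=8)
apply_index_topk_freq(loaded, 4)

# Without an override the config is returned unchanged.
untouched = apply_index_topk_freq(SimpleNamespace(num_hidden_layers=8), None)
```

The point of the pattern is that the files on disk are never edited; only the in-memory config object changes.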

Plumbs index_topk_freq through shared RL config, trainer model loading (overriding AutoConfig), and inference startup (exporting PRIME_RL_INDEX_TOPK_FREQ). In inference, introduces a vLLM monkeypatch that conditionally skips top-k recomputation on non-frequency layers by caching and reusing indices across DeepSeek/MLA layers.
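The plumbing described above can be sketched roughly as follows. Only the environment variable name `PRIME_RL_INDEX_TOPK_FREQ` comes from the PR description; the helper names are hypothetical, and the rule that a layer recomputes top-k when its index is a multiple of the frequency is an assumption made for illustration, since the actual layer-selection rule is not shown here.

```python
import os

ENV_VAR = "PRIME_RL_INDEX_TOPK_FREQ"  # name taken from the PR description


def read_index_topk_freq(environ=os.environ):
    """Hypothetical reader: return the frequency as an int, or None if unset."""
    raw = environ.get(ENV_VAR)
    return int(raw) if raw else None


def recomputes_topk(layer_idx, freq):
    """ASSUMPTION: layers whose index is a multiple of `freq` recompute top-k.

    All other layers would reuse the indices cached by the most recent
    recomputing layer. With no frequency set, every layer recomputes.
    """
    if freq is None or freq <= 1:
        return True
    return layer_idx % freq == 0


freq = read_index_topk_freq({ENV_VAR: "4"})
schedule = [recomputes_topk(i, freq) for i in range(8)]
```

Under this assumed rule, `index_topk_freq = 4` on an 8-layer model would run the indexer on layers 0 and 4 only.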

Extends the custom glm_moe_dsa model to support the same cross-layer reuse: adds index_topk_freq to its config, marks layers to skip top-k, and threads cached_topk_indices through attention/layer/model forwards while returning the computed indices for reuse.
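To illustrate the threading pattern, here is a toy, framework-free sketch. `ToyLayer`, `run_model`, and `topk_indices` are hypothetical stand-ins and not the `glm_moe_dsa` classes; plain Python lists replace tensors, and the skip rule (`layer_idx % 4 != 0`) is an assumption. Only the shape of the data flow mirrors the description: each layer accepts `cached_topk_indices` and returns the indices it used.

```python
def topk_indices(scores, k):
    """Indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


class ToyLayer:
    """Hypothetical layer that either computes top-k or reuses cached indices."""

    def __init__(self, layer_idx, skip_topk):
        self.layer_idx = layer_idx
        self.skip_topk = skip_topk
        self.indexer_calls = 0

    def forward(self, scores, cached_topk_indices=None, k=2):
        if self.skip_topk:
            if cached_topk_indices is None:
                raise ValueError("shared layers require cached top-k indices")
            indices = cached_topk_indices  # reuse the previous layer's result
        else:
            self.indexer_calls += 1
            indices = topk_indices(scores, k)  # stand-in for the real indexer
        # Return the indices so the caller can pass them to the next layer.
        return scores, indices


def run_model(layers, scores):
    cached = None
    for layer in layers:
        scores, cached = layer.forward(scores, cached_topk_indices=cached)
    return cached


# ASSUMPTION: with a frequency of 4, layers 0 and 4 compute; the rest reuse.
layers = [ToyLayer(i, skip_topk=(i % 4 != 0)) for i in range(8)]
final_indices = run_model(layers, [0.1, 0.9, 0.3, 0.7])
total_indexer_calls = sum(layer.indexer_calls for layer in layers)
```

In this toy run the indexer executes twice for eight layers, which is the saving the feature is after; the trade-off is that reusing layers see indices computed from an earlier layer's state.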

Written by Cursor Bugbot for commit 3191c5b.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

✅ Fixed: Unguarded list access crashes on empty regex match
- Added a guard to return early when the regex match list is empty before indexing.
✅ Fixed: Indexer receives None for q_c in else branch
- When q_lora_rank is None, pass hidden_states as a fallback q_c to the indexer.

Or push these changes by commenting:

@cursor push 108af11ad1

Preview (108af11ad1)

diff --git a/src/prime_rl/inference/patches.py b/src/prime_rl/inference/patches.py
--- a/src/prime_rl/inference/patches.py
+++ b/src/prime_rl/inference/patches.py
@@ -28,7 +28,10 @@
     if num_hidden_layers is None:
         return False
 
-    layer_idx = int(_LAYER_INDEX_RE.findall(prefix)[-1])
+    matches = _LAYER_INDEX_RE.findall(prefix)
+    if not matches:
+        return False
+    layer_idx = int(matches[-1])
     if layer_idx >= num_hidden_layers:
         return False
 
@@ -97,7 +100,7 @@
                 raise ValueError("IndexCache shared layers require cached top-k indices.")
             topk_indices = prev_topk_indices
         else:
-            topk_indices = self.indexer(hidden_states, q_c, positions, self.indexer_rope_emb)
+            topk_indices = self.indexer(hidden_states, q_c if q_c is not None else hidden_states, positions, self.indexer_rope_emb)
 
         if llama_4_scaling is not None:
             q *= llama_4_scaling
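
The first hunk's guard can be exercised in isolation. The sketch below is a standalone approximation: `LAYER_INDEX_RE` is a hypothetical stand-in pattern (the real `_LAYER_INDEX_RE` is not shown in this diff), and `parse_layer_idx` is an illustrative helper that returns `None` where the patched function returns `False`.

```python
import re

# ASSUMPTION: stand-in pattern; the actual _LAYER_INDEX_RE is not shown in the diff.
LAYER_INDEX_RE = re.compile(r"\.(\d+)(?:\.|$)")


def parse_layer_idx(prefix, num_hidden_layers):
    """Return the last layer index found in `prefix`, or None if unusable."""
    if num_hidden_layers is None:
        return None
    matches = LAYER_INDEX_RE.findall(prefix)
    if not matches:
        # Without this guard, matches[-1] raises IndexError on an empty list.
        return None
    layer_idx = int(matches[-1])
    if layer_idx >= num_hidden_layers:
        return None
    return layer_idx


in_range = parse_layer_idx("model.layers.3.self_attn", 8)
no_index = parse_layer_idx("model.embed_tokens", 8)
out_of_range = parse_layer_idx("model.layers.12.self_attn", 8)
```

Prefixes without a numeric component, such as an embedding module, are the case that would have crashed before the fix.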

src/prime_rl/inference/patches.py

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-04-01T23:13:57Z

src/prime_rl/configs/rl.py

+        Field(
+            description="Override the loaded Hugging Face config's `index_topk_freq` for trainer and inference without editing the model directory.",
+        ),
+    ] = None


Missing CHANGELOG entry for new config field

Low Severity

This PR adds a new index_topk_freq config field to both SharedModelConfig in configs/rl.py and BaseModelConfig in configs/shared.py, but CHANGELOG.md has no corresponding entry. Per project rules, any PR that modifies configuration structures (added, removed, renamed, moved, or default value changes) must update the changelog.

Additional Locations (1)

src/prime_rl/configs/shared.py#L118-L124

Triggered by project rule: BugBot Instructions

faresobeid added 2 commits April 2, 2026 04:03

indexCache

1064209

ruff

fdf6e51

cursor bot reviewed Apr 1, 2026

faresobeid added 2 commits April 2, 2026 04:28

fix

b64198c

ruff

3191c5b

cursor bot reviewed Apr 1, 2026

faresobeid wants to merge 4 commits into main from indexCache
