
[fix] dont use ep enabled variant of clip gradnorm #2144

Open
Jackmin801 wants to merge 2 commits into main from fix-ep-sft

Conversation

@Jackmin801 (Member) commented Mar 31, 2026

The EP-enabled variant does not seem to actually work, but training is fine without it.


Note (Low Risk): a small change limited to the SFT training loop's gradient clipping call; the main risk is behavior differences in gradient-norm computation/clipping under expert-parallel configurations.

Overview
Removes use of the ep_enabled argument when calling clip_grad_norm_ in the SFT training loop, always using the default gradient-norm clipping path.

Written by Cursor Bugbot for commit 501b237.
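For context on what the removed flag is for: with expert parallelism, expert weights are sharded across ranks, so each rank only sees a partial gradient norm, and the partial squared norms have to be reduced across the EP group before clipping. A minimal sketch of such an EP-aware clip, assuming a hypothetical `ep_group` argument (this is not the repo's actual implementation):

```python
import torch

def clip_grad_norm_(parameters, max_norm, ep_group=None):
    """Clip gradients to a global L2 norm, optionally reducing over an EP group.

    ep_group is an illustrative stand-in for an expert-parallel process group;
    when given, partial squared norms are summed across ranks before clipping.
    """
    grads = [p.grad for p in parameters if p.grad is not None]
    # Local (possibly partial, under EP sharding) sum of squared gradient norms
    total_sq = torch.stack([g.detach().float().pow(2).sum() for g in grads]).sum()
    if ep_group is not None:
        # Combine partial squared norms from all expert-parallel ranks
        torch.distributed.all_reduce(total_sq, group=ep_group)
    total_norm = total_sq.sqrt()
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for g in grads:
            g.mul_(clip_coef)
    return total_norm
```

On a single process (`ep_group=None`) this reduces to ordinary global-norm clipping, which matches the default path the PR falls back to.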


@cursor bot left a comment:

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Incomplete fix: RL trainer still uses broken ep_enabled
    • Removed ep_enabled parameter from clip_grad_norm_ call in RL trainer to match the fix already applied to SFT trainer.


Or push these changes by commenting:

@cursor push 7e3946985b
Preview (7e3946985b)
```diff
diff --git a/src/prime_rl/trainer/rl/train.py b/src/prime_rl/trainer/rl/train.py
--- a/src/prime_rl/trainer/rl/train.py
+++ b/src/prime_rl/trainer/rl/train.py
@@ -462,9 +462,7 @@

         # Optionally, clip the gradients

-        grad_norm = clip_grad_norm_(
-            model.parameters(), max_norm=config.optim.max_norm, ep_enabled=parallel_dims.ep_enabled
-        )
+        grad_norm = clip_grad_norm_(model.parameters(), max_norm=config.optim.max_norm)
         if grad_norm.device.type == "cpu":
             grad_norm = grad_norm.to(torch.device("cuda"))
         zero_grad_ratio = get_zero_gradient_ratio(model.parameters(), parallel_dims.dp_replicate)
```



Incomplete fix: RL trainer still uses broken ep_enabled

Medium Severity

The ep_enabled=parallel_dims.ep_enabled parameter was removed from clip_grad_norm_ in the SFT trainer because it "doesn't actually work," but the identical call in src/prime_rl/trainer/rl/train.py still passes ep_enabled=parallel_dims.ep_enabled. If the EP-enabled variant of clip_grad_norm_ is broken, the RL trainer has the same problem and likely needs the same fix.


```diff
-grad_norm = clip_grad_norm_(
-    model.parameters(), max_norm=config.optim.max_norm, ep_enabled=parallel_dims.ep_enabled
-)
+grad_norm = clip_grad_norm_(model.parameters(), max_norm=config.optim.max_norm)
```
Collaborator commented:

IMO this will be broken with standard EP, so it should just be `ep_enabled=use_deepep`, and the same should be added to RL? Though I'd swear it worked before with DeepEP.
