[fix] dont use ep enabled variant of clip gradnorm #2144
Jackmin801 wants to merge 2 commits into main from
Conversation
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Incomplete fix: RL trainer still uses broken ep_enabled
- Removed ep_enabled parameter from clip_grad_norm_ call in RL trainer to match the fix already applied to SFT trainer.
Or push these changes by commenting:
@cursor push 7e3946985b
Preview (7e3946985b)
diff --git a/src/prime_rl/trainer/rl/train.py b/src/prime_rl/trainer/rl/train.py
--- a/src/prime_rl/trainer/rl/train.py
+++ b/src/prime_rl/trainer/rl/train.py
@@ -462,9 +462,7 @@
# Optionally, clip the gradients
- grad_norm = clip_grad_norm_(
- model.parameters(), max_norm=config.optim.max_norm, ep_enabled=parallel_dims.ep_enabled
- )
+ grad_norm = clip_grad_norm_(model.parameters(), max_norm=config.optim.max_norm)
if grad_norm.device.type == "cpu":
grad_norm = grad_norm.to(torch.device("cuda"))
     zero_grad_ratio = get_zero_gradient_ratio(model.parameters(), parallel_dims.dp_replicate)
- grad_norm = clip_grad_norm_(
-     model.parameters(), max_norm=config.optim.max_norm, ep_enabled=parallel_dims.ep_enabled
- )
+ grad_norm = clip_grad_norm_(model.parameters(), max_norm=config.optim.max_norm)
Incomplete fix: RL trainer still uses broken ep_enabled
Medium Severity
The ep_enabled=parallel_dims.ep_enabled argument was removed from the clip_grad_norm_ call in the SFT trainer because it "doesn't actually work," but the identical call in src/prime_rl/trainer/rl/train.py still passes ep_enabled=parallel_dims.ep_enabled. If the EP-enabled variant of clip_grad_norm_ is broken, the RL trainer has the same problem and likely needs the same fix.
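For background (not from this PR): under expert parallelism the expert weights are sharded across ranks, so a correct global gradient norm has to combine each rank's local squared norm over the expert-parallel process group (an all-reduce in a real trainer) before taking the square root. The helper below is a hypothetical single-process sketch of that arithmetic, not the project's actual implementation:

```python
import math

def global_grad_norm(local_shared_sq, expert_sq_per_rank):
    """Sketch of an EP-aware global gradient 2-norm.

    local_shared_sq: sum of squared grads of replicated (non-expert) params;
    identical on every EP rank, so it is counted once.
    expert_sq_per_rank: per-rank sums of squared grads of the expert shards.
    In a real trainer these would be combined with an all-reduce(SUM) over
    the expert-parallel process group; here a plain sum simulates that.
    """
    expert_sq = sum(expert_sq_per_rank)  # simulated all-reduce(SUM)
    return math.sqrt(local_shared_sq + expert_sq)
```

If this cross-rank reduction is skipped, each rank clips against a norm that only reflects its own expert shard, which is one plausible way an "EP-enabled" clipping path could silently misbehave.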
- grad_norm = clip_grad_norm_(
-     model.parameters(), max_norm=config.optim.max_norm, ep_enabled=parallel_dims.ep_enabled
- )
+ grad_norm = clip_grad_norm_(model.parameters(), max_norm=config.optim.max_norm)
Imo this will be broken with standard EP, so it should be just ep_enabled=use_deepep, and the same should be added to the RL trainer? Though I'd swear it worked before with DeepEP.



This seems to not actually work, but without it it's fine.
Note
Low Risk
A small change limited to the SFT training loop's gradient clipping call; the main risk is behavior differences in gradient-norm computation/clipping under expert-parallel configurations.
Overview
Removes use of the ep_enabled argument when calling clip_grad_norm_ in the SFT training loop, always using the default gradient-norm clipping path.
Written by Cursor Bugbot for commit 501b237.
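For context, the default clipping path referred to above behaves like torch.nn.utils.clip_grad_norm_ with the default 2-norm: compute one total norm over all parameter gradients, and scale every gradient down only when that norm exceeds max_norm. The helper below is a hypothetical pure-Python sketch of that behavior, not the trainer's actual code:

```python
import math

def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Sketch of the default 2-norm gradient clipping path.

    grads: list of flat gradient lists (one list per parameter tensor).
    Returns the pre-clip total norm and the (possibly rescaled) grads.
    """
    # Total norm = sqrt of the sum of squared elements across all tensors.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm > max_norm:
        # Scale all gradients so the combined norm is (just under) max_norm.
        scale = max_norm / (total_norm + eps)
        grads = [[g * scale for g in grad] for grad in grads]
    return total_norm, grads
```

Note that the returned norm is the pre-clip value, which matches what the trainer logs as grad_norm after the call.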