feat: update default loss to dppo+kl by mikasenghaas · Pull Request #2187 · PrimeIntellect-ai/prime-rl

mikasenghaas · 2026-04-02T14:02:36Z

Summary

- Condition the trust region mask on advantage sign (DPPO-style) instead of masking both directions unconditionally
- Rename ipo_mask_{low,high} → dppo_mask_{low,high} in config and internals (this is a BREAKING change)
- Based on hendrycks-sanity ablation results showing DPPO+KL outperforms IPO
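
To make the first point concrete, here is a minimal sketch of the two masking rules in plain Python. It is not the code from this PR: the function names, the use of a per-token probability change (`prob_delta`, trainer minus inference probability) as the thresholded quantity, and the handling of zero advantages are all assumptions made for illustration. Only the 0.2 defaults are taken from the config diff.

```python
def ipo_style_mask(prob_delta, low=0.2, high=0.2):
    """Old behavior (sketch): mask a token whenever its probability moved
    too far in either direction, regardless of the advantage."""
    return [d > high or d < -low for d in prob_delta]


def dppo_style_mask(prob_delta, advantages, low=0.2, high=0.2):
    """New behavior (sketch): mask only violations in the direction the
    advantage pushes the token.

    - advantage > 0: the update raises the probability, so only an
      excessive increase (d > high) is masked.
    - otherwise: the update lowers the probability, so only an excessive
      decrease (d < -low) is masked. Treating a zero advantage like a
      negative one is an assumption of this sketch.
    """
    masked = []
    for d, a in zip(prob_delta, advantages):
        if a > 0:
            masked.append(d > high)
        else:
            masked.append(d < -low)
    return masked


# Hypothetical per-token values, chosen to cover all four sign combinations.
prob_delta = [0.3, 0.3, -0.3, -0.3, 0.1]
advantages = [1.0, -1.0, 1.0, -1.0, 1.0]

old_mask = ipo_style_mask(prob_delta)
new_mask = dppo_style_mask(prob_delta, advantages)
```

In this example the unconditional rule masks the first four tokens, while the sign-conditioned rule masks only the first and fourth: tokens whose probability already moved against the advantage direction keep contributing to the loss.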

🤖 Generated with Claude Code

Note

Medium Risk
Changes the core RL objective by altering the default trust-region masking behavior and renaming config fields, which can materially impact training dynamics and is a breaking config change.

Overview
Updates the default RL loss to a DPPO+KL-style objective by conditioning the trust-region token mask on the sign of the advantage instead of masking both probability-increase and probability-decrease violations unconditionally.

Renames loss config thresholds from ipo_mask_{low,high} to dppo_mask_{low,high} (breaking for existing configs) and updates unit tests to use the new fields.
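
For existing configs, the rename amounts to a key substitution. The helper below is a hypothetical sketch, not something shipped in this PR: `migrate_loss_config` and `RENAMED_KEYS` are illustrative names, and it assumes the loss section of a config has already been loaded into a plain dict.

```python
# Old field name -> new field name, as described in this PR.
RENAMED_KEYS = {
    "ipo_mask_low": "dppo_mask_low",
    "ipo_mask_high": "dppo_mask_high",
}


def migrate_loss_config(loss_cfg: dict) -> dict:
    """Return a copy of loss_cfg with the old ipo_mask_* keys renamed.

    Raises ValueError if a config sets both the old and the new name for
    the same threshold, since it is unclear which value should win.
    """
    migrated = {}
    for key, value in loss_cfg.items():
        new_key = RENAMED_KEYS.get(key, key)
        if new_key != key and new_key in loss_cfg:
            raise ValueError(f"config sets both {key!r} and {new_key!r}")
        migrated[new_key] = value
    return migrated


# Hypothetical old-style loss section.
old_cfg = {"ipo_mask_low": 0.1, "ipo_mask_high": 0.3}
new_cfg = migrate_loss_config(old_cfg)
```

Keys that are not part of the rename pass through unchanged, so the helper can be applied to a loss section that contains other settings as well.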

Written by Cursor Bugbot for commit 8595ae4.

Condition the trust region mask on advantage sign instead of masking both directions unconditionally. Rename ipo_mask -> dppo_mask in config and internal variables. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-04-02T14:59:23Z

src/prime_rl/configs/trainer.py

-    ipo_mask_low: Annotated[float, Field(ge=0, description="The low threshold for masking tokens.")] = 0.2
-    ipo_mask_high: Annotated[float, Field(ge=0, description="The high threshold for masking tokens.")] = 0.2
+    dppo_mask_low: Annotated[float, Field(ge=0, description="The low threshold for masking tokens.")] = 0.2
+    dppo_mask_high: Annotated[float, Field(ge=0, description="The high threshold for masking tokens.")] = 0.2


CHANGELOG not updated for breaking config rename

Low Severity

The ipo_mask_low and ipo_mask_high fields in DefaultLossConfig were renamed to dppo_mask_low and dppo_mask_high — a breaking config change — but CHANGELOG.md was not updated. The existing entry at line 127 still references the old ipo_mask_* names. This violates the rule requiring any PR that modifies configuration structures in src/prime_rl/*/config.py to update the changelog.

^{Triggered by project rule: BugBot Instructions}

feat: switch default loss from IPO to DPPO+KL

43529ab

mikasenghaas changed the title ~~switch default loss to DPPO+KL~~ feat: update default loss to dppo+kl Apr 2, 2026

ruff

8595ae4

mikasenghaas requested review from faresobeid and samsja April 2, 2026 14:04

samsja approved these changes Apr 2, 2026

mikasenghaas marked this pull request as ready for review April 2, 2026 14:56

cursor bot reviewed Apr 2, 2026

mikasenghaas wants to merge 2 commits into main from feat/dppo-kl-default-loss