feat: reward-gated length shaping for correct rollouts by hallerite · Pull Request #2169 · PrimeIntellect-ai/prime-rl

hallerite · 2026-04-01T09:59:34Z

Summary

- Replace GR³ length scaling with a simpler correctness-gated approach: correct rollout rewards are attenuated by L_min / L_i, where L_min is the length of the shortest correct completion per problem and L_i is the length of rollout i.
- The shortest correct rollout keeps its full reward (1.0), longer correct ones get proportionally less, and incorrect rollouts are untouched.
- Remove `length_shaping_alpha: float` in favor of `length_shaping: bool` (no hyperparameter needed) and drop the online difficulty filtering requirement.
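
The attenuation rule above can be sketched for a single problem's group of rollouts. This is a minimal illustration, not the code from this PR: the function name `brevity_shaped_rewards`, the explicit `is_correct` flags, and the assumption that completion lengths are positive token counts are all hypothetical.

```python
def brevity_shaped_rewards(rewards, lengths, is_correct):
    """Attenuate correct rollouts' rewards by L_min / L_i within one problem.

    rewards:    per-rollout scalar rewards
    lengths:    per-rollout completion lengths (assumed positive)
    is_correct: per-rollout correctness flags (hypothetical input; how
                correctness is decided is not specified here)
    """
    correct_lengths = [n for n, ok in zip(lengths, is_correct) if ok]
    if not correct_lengths:
        # No correct rollout for this problem: nothing to shape.
        return list(rewards)
    l_min = min(correct_lengths)  # shortest correct completion
    return [
        r * l_min / n if ok else r  # incorrect rollouts are left untouched
        for r, n, ok in zip(rewards, lengths, is_correct)
    ]


shaped = brevity_shaped_rewards(
    rewards=[1.0, 1.0, 0.0, 1.0],
    lengths=[100, 200, 50, 400],
    is_correct=[True, True, False, True],
)
print(shaped)  # [1.0, 0.5, 0.0, 0.25]
```

Note that the incorrect rollout is the shortest overall (50) but does not set L_min, since only correct completions are considered.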

Breaking changes

- `orchestrator.advantage.length_shaping_alpha` → `orchestrator.advantage.length_shaping` (`float|None` → `bool`)
- `length_shaping` no longer requires `online_difficulty_filtering = true`

🤖 Generated with Claude Code

Replace GR³ multiplicative length scaling (applied to all rollouts, gated on online difficulty filtering) with a simpler approach: attenuate correct rollout rewards by L_min/L_i, where L_min is the shortest correct completion per problem. Incorrect rollouts are untouched.

- No hyperparameter needed (was `length_shaping_alpha`, now `length_shaping: bool`)
- No dependency on online difficulty filtering
- Shortest correct rollout keeps reward=1, longer ones get proportionally less

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…", "gr3")

Both length shaping strategies are now available via a single config field:

- length_shaping = "off" (default): no length shaping
- length_shaping = "brevity_bonus": reward-gated L_min/L_i (correct rollouts only)
- length_shaping = "gr3": GR³ multiplicative (1 + alpha * L_i/L_mean)^-1 (all rollouts)

length_shaping_alpha controls the GR³ coefficient (default 0.33, only used with "gr3"). Replaces the old length_shaping_alpha-only API and removes the ODF coupling.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
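
The three `length_shaping` modes described in this commit message can be compared side by side in a rough sketch. The mode strings, the default alpha of 0.33, and the two formulas come from the message. The function names, the `is_correct` flags, and the choice to compute L_mean over one problem's group of rollouts are assumptions for illustration, not the PR's implementation.

```python
def brevity_bonus(rewards, lengths, is_correct):
    """Scale correct rollouts by L_min / L_i; leave incorrect ones untouched."""
    correct_lengths = [n for n, ok in zip(lengths, is_correct) if ok]
    if not correct_lengths:
        return list(rewards)
    l_min = min(correct_lengths)
    return [r * l_min / n if ok else r
            for r, n, ok in zip(rewards, lengths, is_correct)]


def gr3(rewards, lengths, alpha=0.33):
    """Scale every rollout by (1 + alpha * L_i / L_mean)^-1.

    Assumption: L_mean is the mean length over the rollouts passed in.
    """
    l_mean = sum(lengths) / len(lengths)
    return [r / (1.0 + alpha * n / l_mean) for r, n in zip(rewards, lengths)]


def apply_length_shaping(mode, rewards, lengths, is_correct, alpha=0.33):
    """Dispatch on the config value; alpha is only used by "gr3"."""
    if mode == "off":
        return list(rewards)
    if mode == "brevity_bonus":
        return brevity_bonus(rewards, lengths, is_correct)
    if mode == "gr3":
        return gr3(rewards, lengths, alpha)
    raise ValueError(f"unknown length_shaping mode: {mode!r}")


rewards, lengths, ok = [1.0, 1.0, 0.0], [100, 300, 200], [True, True, False]
for mode in ("off", "brevity_bonus", "gr3"):
    print(mode, apply_length_shaping(mode, rewards, lengths, ok))
```

One difference this makes visible: under "gr3" even the shortest correct rollout is scaled below its raw reward, whereas "brevity_bonus" leaves it at full reward and only penalizes longer correct rollouts relative to it.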

hallerite force-pushed the feat/reward-gated-length-shaping branch 2 times, most recently from ba1aef7 to fac1a2c, on April 1, 2026 17:41

hallerite force-pushed the feat/reward-gated-length-shaping branch from fac1a2c to c8ebf0c on April 1, 2026 17:41

hallerite wants to merge 2 commits into main from feat/reward-gated-length-shaping