feat: add option to filter zero advantage samples by samsja · Pull Request #2192 · PrimeIntellect-ai/prime-rl

samsja · 2026-04-03T13:43:35Z

Summary

Add filter_zero_advantages config option to filter out training samples with zero advantage
Simple postprocessing step in the orchestrator that filters samples before sending them to the trainer
Useful for dropping samples that carry no learning signal (a zero advantage yields no policy-gradient update)
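Why a zero-advantage sample contributes nothing: under a plain policy-gradient loss, each token's loss is `-advantage * logprob`, so the gradient with respect to that token's log-probability is just `-advantage`, which vanishes when the advantage is 0.0. The sketch below assumes exactly that loss, with no KL penalty or entropy bonus (either would still produce gradients). `pg_loss_grads` is a hypothetical helper, not part of prime-rl.

```python
def pg_loss_grads(advantage: float, num_tokens: int) -> list[float]:
    """Per-token gradient of the loss sum_t(-advantage * logprob_t).

    Hypothetical helper: assumes a plain policy-gradient loss with no KL
    penalty or entropy bonus (those terms would still produce gradients).
    """
    # d/d(logprob_t) of -advantage * logprob_t is -advantage for every token.
    return [-advantage] * num_tokens


print(pg_loss_grads(0.5, 3))  # [-0.5, -0.5, -0.5]: descent pushes log-probs up
print(all(g == 0.0 for g in pg_loss_grads(0.0, 3)))  # True: no update at all
```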

Note

Medium Risk
Changes the training data pipeline by filtering rollouts with advantage == 0.0 by default. This can alter the effective batch size and learning dynamics if advantages are frequently zero, or if floating-point rounding leaves advantages that should be zero slightly nonzero (or vice versa).
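To make the floating-point concern concrete: the filter uses an exact `advantage == 0.0` comparison. Assuming advantages are computed as reward minus the group mean (a GRPO-style baseline; the exact formula prime-rl uses is not shown in this PR), a group in which every rollout received the same reward can still produce a tiny nonzero advantage through rounding, so the exact check keeps it. A tolerance-based check is one possible alternative. `group_advantages`, `is_zero_exact`, `is_zero_tol`, and the `abs_tol` value are all hypothetical.

```python
import math


def group_advantages(rewards: list[float]) -> list[float]:
    """Hypothetical GRPO-style advantages: reward minus the group mean."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def is_zero_exact(advantage: float) -> bool:
    # Mirrors the PR's semantics: only an exact 0.0 is filtered.
    return advantage == 0.0


def is_zero_tol(advantage: float, abs_tol: float = 1e-9) -> bool:
    # Treats anything within abs_tol of zero as zero (abs_tol is an assumption).
    return math.isclose(advantage, 0.0, abs_tol=abs_tol)


# Every rollout in this group got the same reward, so ideally all advantages are 0.
advs = group_advantages([0.1, 0.1, 0.1])
print(advs)  # tiny nonzero values caused by rounding in the mean
print(sum(is_zero_exact(a) for a in advs))  # 0: the exact check catches none
print(sum(is_zero_tol(a) for a in advs))  # 3: the tolerance check catches all
```

With rewards such as `[1.0, 1.0]` the mean is exact and both checks agree; the discrepancy only appears when the reward values are not exactly representable.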

Overview
Adds a new filter_zero_advantages orchestrator config (default enabled) to drop rollouts that have advantage == 0.0 before they are converted into TrainingSamples.

Implements a ZeroAdvantageFilter and updates the orchestrator loop to persist computed advantages onto each rollout, then run the filter via the existing apply_filters mechanism so filtered rollouts have their completion masks cleared and are tracked in the filter metrics.
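A minimal sketch of the filter pattern described above. The real `ZeroAdvantageFilter` and `apply_filters` live in `src/prime_rl/orchestrator/filters.py` and may differ; here the `Rollout` type, its fields, the `detect` method, and the per-filter metric key are assumptions. Only the `name`/`enforce` arguments and the two aggregate keys come from the PR. Flagged rollouts keep their place in the batch, but their completion mask is zeroed so they contribute no loss, and detection rates are returned as metrics.

```python
from dataclasses import dataclass, field


@dataclass
class Rollout:
    # Hypothetical stand-in for prime-rl's rollout type.
    advantage: float
    completion_mask: list[int] = field(default_factory=list)


@dataclass
class ZeroAdvantageFilter:
    name: str = "zero_advantage"
    enforce: bool = True  # if False, rollouts are only detected and reported

    def detect(self, rollout: Rollout) -> bool:
        return rollout.advantage == 0.0


def apply_filters(filters: list[ZeroAdvantageFilter], rollouts: list[Rollout]) -> dict[str, float]:
    """Sketch: mask enforced detections in place and return rate metrics."""
    n = max(len(rollouts), 1)
    metrics: dict[str, float] = {}
    any_detected = [False] * len(rollouts)
    any_enforced = [False] * len(rollouts)
    for f in filters:
        detected = 0
        for i, r in enumerate(rollouts):
            if f.detect(r):
                detected += 1
                any_detected[i] = True
                if f.enforce:
                    # Zeroing the mask keeps the rollout but removes it from the loss.
                    r.completion_mask = [0] * len(r.completion_mask)
                    any_enforced[i] = True
        metrics[f"filter/{f.name}/detected_rate"] = detected / n  # hypothetical key
    metrics["filter/total_detected_rate"] = sum(any_detected) / n
    metrics["filter/total_enforced_rate"] = sum(any_enforced) / n
    return metrics


rollouts = [Rollout(0.5, [1, 1]), Rollout(0.0, [1, 1, 1]), Rollout(-0.2, [1])]
metrics = apply_filters([ZeroAdvantageFilter()], rollouts)
print(rollouts[1].completion_mask)  # [0, 0, 0]
print(metrics["filter/total_detected_rate"])
```

Masking rather than deleting keeps batch bookkeeping unchanged, at the cost of padded slots that carry no gradient.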

Written by Cursor Bugbot for commit 64148b0. This will update automatically on new commits.

Add config option to filter out training samples with zero advantage. This is a simple postprocessing step in the orchestrator that filters samples before sending them to the trainer.

src/prime_rl/configs/orchestrator.py

Keep simple filter_zero_advantages config option but use the filter abstraction internally. ZeroAdvantageFilter is created and applied after advantages are computed, masking zero-advantage rollouts.

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.


cursor · 2026-04-03T15:06:53Z

src/prime_rl/orchestrator/orchestrator.py

+
+            zero_advantage_filter = ZeroAdvantageFilter(name="zero_advantage", enforce=True)
+            zero_advantage_metrics = apply_filters([zero_advantage_filter], train_rollouts)
+            filter_metrics.update(zero_advantage_metrics)


Aggregate filter metrics overwritten by second apply_filters call

Medium Severity

The filter_metrics.update(zero_advantage_metrics) call overwrites the filter/total_detected_rate and filter/total_enforced_rate keys that were set by the first apply_filters call (for gibberish/repetition filters). Since apply_filters always emits these two aggregate keys, the second invocation's values silently replace the originals, making the logged totals reflect only zero-advantage detections rather than the combined total across all filters.

Additional Locations (1)

src/prime_rl/orchestrator/filters.py#L188-L190
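The overwrite follows from ordinary `dict.update` semantics: when both metric dicts contain the same aggregate key, the later value wins. The two rates also cannot simply be added, because a single rollout may be flagged by more than one filter. The combined rate has to be computed over the per-rollout union of flags. If `apply_filters` computes its totals over every filter in the list it receives (as the aggregate keys suggest), passing all filters in one call would do this, though that may not be possible here because the zero-advantage filter can only run after advantages are computed. The flags below are illustrative, not real data, and `rate`/`union_rate` are hypothetical helpers.

```python
def rate(flags: list[bool]) -> float:
    """Fraction of rollouts flagged by one filter."""
    return sum(flags) / len(flags)


def union_rate(*flag_lists: list[bool]) -> float:
    """Fraction of rollouts flagged by at least one filter."""
    return sum(any(col) for col in zip(*flag_lists)) / len(flag_lists[0])


# Illustrative per-rollout flags for five rollouts (not real data).
gibberish = [True, True, True, False, False]
zero_adv = [False, False, True, True, False]  # rollout 2 is flagged by both

# Two separate calls, each emitting its own aggregate under the same key.
filter_metrics = {"filter/total_detected_rate": rate(gibberish)}
filter_metrics.update({"filter/total_detected_rate": rate(zero_adv)})
print(filter_metrics["filter/total_detected_rate"])  # 0.4: the gibberish total is lost

# Adding the rates would double-count rollout 2 (0.6 + 0.4);
# the combined rate must come from the per-rollout union instead.
print(union_rate(gibberish, zero_adv))  # 0.8
```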

cursor · 2026-04-03T15:06:53Z

src/prime_rl/configs/orchestrator.py

+        Field(
+            description="Whether to filter out training samples with zero advantage. If True, samples with advantage == 0.0 are not sent to the trainer.",
+        ),
+    ] = True


New config field defaults to True, changing existing behavior

Medium Severity

filter_zero_advantages defaults to True, which silently changes training behavior for all existing deployments that don't set this field. Every existing config will now have zero-advantage samples masked out, altering training dynamics. Since the PR title describes this as an "option," defaulting to False would be the safer, non-breaking choice.

cursor · 2026-04-03T15:06:53Z

src/prime_rl/configs/orchestrator.py

+        Field(
+            description="Whether to filter out training samples with zero advantage. If True, samples with advantage == 0.0 are not sent to the trainer.",
+        ),
+    ] = True


Missing CHANGELOG entry for new config field

Low Severity

A new config field filter_zero_advantages was added to src/prime_rl/configs/orchestrator.py but CHANGELOG.md was not updated. Per the project rules, any PR that modifies configuration structures in config files must include a corresponding changelog entry.

(Triggered by project rule: BugBot Instructions)

feat: add option to filter zero advantage samples

9e255e2


samsja commented Apr 3, 2026


src/prime_rl/configs/orchestrator.py (outdated, resolved)

feat: use filter abstraction for zero advantage filtering

fd012a6


samsja force-pushed the feat/filter-zero-advantages branch from bd0a0ae to fd012a6 on April 3, 2026 14:03

samsja marked this pull request as ready for review April 3, 2026 15:01

feat: enable zero advantage filtering by default

64148b0

cursor bot reviewed Apr 3, 2026

samsja wants to merge 3 commits into main from feat/filter-zero-advantages