Skip to content

feat: add option to filter zero advantage samples#2192

Open
samsja wants to merge 3 commits intomainfrom
feat/filter-zero-advantages
Open

feat: add option to filter zero advantage samples#2192
samsja wants to merge 3 commits intomainfrom
feat/filter-zero-advantages

Conversation

@samsja
Copy link
Copy Markdown
Member

@samsja samsja commented Apr 3, 2026

Summary

  • Add filter_zero_advantages config option to filter out training samples with zero advantage
  • Simple postprocessing step in the orchestrator that filters samples before sending to the trainer
  • Useful for filtering out samples that don't contribute to learning

Note

Medium Risk
Changes the training data pipeline by default-filtering rollouts with advantage == 0.0, which can alter effective batch size and learning dynamics if advantages are frequently zero or computed with floating-point edge cases.

Overview
Adds a new filter_zero_advantages orchestrator config (default enabled) to drop rollouts that have advantage == 0.0 before they are converted into TrainingSamples.

Implements a ZeroAdvantageFilter and updates the orchestrator loop to persist computed advantages onto each rollout, then run the filter via the existing apply_filters mechanism so filtered rollouts have their completion masks cleared and are tracked in the filter metrics.

Written by Cursor Bugbot for commit 64148b0. This will update automatically on new commits. Configure here.

Add config option to filter out training samples with zero advantage.
This is a simple postprocessing step in the orchestrator that filters
samples before sending them to the trainer.
Keep simple filter_zero_advantages config option but use the filter
abstraction internally. ZeroAdvantageFilter is created and applied after
advantages are computed, masking zero-advantage rollouts.
@samsja samsja force-pushed the feat/filter-zero-advantages branch from bd0a0ae to fd012a6 Compare April 3, 2026 14:03
@samsja samsja marked this pull request as ready for review April 3, 2026 15:01
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.


zero_advantage_filter = ZeroAdvantageFilter(name="zero_advantage", enforce=True)
zero_advantage_metrics = apply_filters([zero_advantage_filter], train_rollouts)
filter_metrics.update(zero_advantage_metrics)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aggregate filter metrics overwritten by second apply_filters call

Medium Severity

The filter_metrics.update(zero_advantage_metrics) call overwrites the filter/total_detected_rate and filter/total_enforced_rate keys that were set by the first apply_filters call (for gibberish/repetition filters). Since apply_filters always emits these two aggregate keys, the second invocation's values silently replace the originals, making the logged totals reflect only zero-advantage detections rather than the combined total across all filters.

Additional Locations (1)
Fix in Cursor Fix in Web

Field(
description="Whether to filter out training samples with zero advantage. If True, samples with advantage == 0.0 are not sent to the trainer.",
),
] = True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New config field defaults to True, changing existing behavior

Medium Severity

filter_zero_advantages defaults to True, which silently changes training behavior for all existing deployments that don't set this field. Every existing config will now have zero-advantage samples masked out, altering training dynamics. Since the PR title describes this as an "option," defaulting to False would be the safer, non-breaking choice.

Fix in Cursor Fix in Web

Field(
description="Whether to filter out training samples with zero advantage. If True, samples with advantage == 0.0 are not sent to the trainer.",
),
] = True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing CHANGELOG entry for new config field

Low Severity

A new config field filter_zero_advantages was added to src/prime_rl/configs/orchestrator.py but CHANGELOG.md was not updated. Per the project rules, any PR that modifies configuration structures in config files must include a corresponding changelog entry.

Fix in Cursor Fix in Web

Triggered by project rule: BugBot Instructions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant