* update loss interface
* refactor: rename grpo_loss to prime_rl_loss and consolidate loss interface
- Rename grpo_loss to prime_rl_loss (uses LossConfig directly)
- Move LossInputs, LossOutputs, LossFn from loss_interface.py into loss.py
- Remove loss_interface.py
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* feat: add bring-your-own-loss support for custom loss functions
- Add CustomLossConfig with path and kwargs fields
- Add LossConfigType union (LossConfig | CustomLossConfig)
- Update setup_loss_fn to handle custom loss imports
- Add _import_object helper for dynamic imports
- Add test for custom loss configuration
- Add docs/bring-your-own-loss.md documentation
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* feat: add bring-your-own advantage function support
- Add AdvantageInputs/AdvantageOutputs dataclasses
- Add CustomAdvantageConfig with path and kwargs
- Add setup_advantage_fn for custom advantage imports
- Refactor compute_advantages to use the new interface
- Add tests for custom advantage configuration
- Rename docs to bring-your-own-algorithms.md with both loss and advantage sections
Co-Authored-By: Claude Opus 4.5 <[email protected]>
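The commit above introduces `AdvantageInputs`/`AdvantageOutputs` dataclasses for custom advantage functions. As a rough illustration of the shape such a plug-in might take, here is a toy sketch; the field names and the group-mean baseline below are assumptions for illustration, not the repo's actual interface or default:

```python
from dataclasses import dataclass

# Stand-in dataclasses: AdvantageInputs/AdvantageOutputs are named in the
# commit message, but these fields are assumed for illustration only.
@dataclass
class AdvantageInputs:
    rewards: list[float]   # one scalar reward per completion in a group

@dataclass
class AdvantageOutputs:
    advantages: list[float]  # one advantage per completion

def my_custom_advantage(inputs: AdvantageInputs) -> AdvantageOutputs:
    """Toy group-mean baseline: advantage = reward - mean(group rewards)."""
    mean = sum(inputs.rewards) / len(inputs.rewards)
    return AdvantageOutputs(advantages=[r - mean for r in inputs.rewards])
```

A function of this shape could then be referenced from a `CustomAdvantageConfig` via its import path and keyword arguments.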
* refactor: rename path to byo_function in custom configs
More descriptive name that ties into the "bring your own" concept.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* refactor: address PR review feedback
- Extract import_object to shared utils (deduplicate from loss.py and advantage.py)
- Add pydantic discriminator types for LossConfigType and AdvantageConfigType
- Rename byo_function to import_path (standard Python terminology)
- Rename grpo_advantage to default_advantage
- Fix train.py AttributeError when using CustomLossConfig
- Fix docs: per-example terminology, loss/advantage descriptions, config examples
Co-Authored-By: Claude Opus 4.6 <[email protected]>
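The shared `import_object` helper and the `import_path` field described in this commit suggest a standard dotted-path dynamic import. A minimal sketch of how such a helper might work (the repo's actual implementation may differ):

```python
import importlib

def import_object(import_path: str):
    """Resolve a dotted path like 'my_pkg.losses.my_loss' to a Python object.

    Everything before the last dot is imported as a module; the final
    component is looked up as an attribute on that module.
    """
    module_path, _, attr_name = import_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path, got {import_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)
```

For example, `import_object("math.sqrt")` returns the `math.sqrt` function; a custom loss or advantage function would be resolved the same way from its config's `import_path`.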
* docs: update prime_rl_loss description
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* fix: handle custom loss metrics in training loop and restore extra=forbid on AdvantageConfig
- Guard mismatch_kl access in micro-step and step logging (custom loss may not emit it)
- Change AdvantageConfig base from BaseModel to BaseConfig to restore extra="forbid" validation
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* refactor: rename prime_rl_loss to default_loss_fn
Consistent with default_advantage naming.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* refactor: simplify loss setup logging
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* refactor: rename default_advantage to default_advantage_fn
Consistent with default_loss_fn naming.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
---------
Co-authored-by: Claude Opus 4.5 <[email protected]>
Prime-RL supports custom implementations for key algorithmic components, allowing you to experiment with different RL objectives and techniques.

## 1. Custom Loss Functions

The loss is computed **per-sequence** (per-sample). You provide a function that computes the loss for a single sequence, and the framework handles iteration and aggregation.

### Interface

```python
from prime_rl.trainer.rl.loss import LossInputs, LossOutputs
```
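To make the interface concrete, here is a self-contained toy sketch of a custom per-sequence loss. The real `LossInputs`/`LossOutputs` live in `prime_rl.trainer.rl.loss`; the stand-in dataclasses, their fields, and the clipped policy-gradient objective below are all assumptions for illustration, not the actual implementation:

```python
import math
from dataclasses import dataclass, field

# Stand-in definitions mirroring the interface described above;
# field names are assumed for illustration only.
@dataclass
class LossInputs:
    logprobs: list[float]       # per-token logprobs under the current policy
    old_logprobs: list[float]   # per-token logprobs at sampling time
    advantages: list[float]     # per-token advantages

@dataclass
class LossOutputs:
    loss: float                               # scalar loss for this sequence
    metrics: dict = field(default_factory=dict)  # optional extra metrics

def my_custom_loss(inputs: LossInputs, *, clip_eps: float = 0.2) -> LossOutputs:
    """Toy clipped policy-gradient loss over one sequence."""
    terms, ratios = [], []
    for lp, old_lp, adv in zip(inputs.logprobs, inputs.old_logprobs, inputs.advantages):
        ratio = math.exp(lp - old_lp)
        ratios.append(ratio)
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        terms.append(-min(ratio * adv, clipped * adv))
    loss = sum(terms) / len(terms)
    return LossOutputs(loss=loss, metrics={"mean_ratio": sum(ratios) / len(ratios)})
```

Note that the framework only guarantees the metrics the default loss emits (e.g. `mismatch_kl`) when the default loss is used; per the commits above, the training loop guards metric access so a custom loss may emit a different set.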