@willccbb
Contributor

Reorganizes the data flow of the verifiers_rl recipe to use the new internal structure in verifiers>=0.1.8. Rollout data is now encapsulated in a State object, which translates more directly to the tinker TrajectoryGroup format.

This makes it possible to maintain a single generic OpenAI client (rather than one per rollout with a hook for data tracking), and adds support for groupwise rewards from verifiers Rubric classes.
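
For readers unfamiliar with the idea, here is a minimal sketch of what a groupwise reward means: the reward function sees every rollout in a group at once instead of scoring each one in isolation. It is illustrative only and is not the verifiers or tinker API. `RolloutState`, `score_group`, and `diversity_reward` are hypothetical names invented for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RolloutState:
    """Hypothetical stand-in for per-rollout data (not the verifiers State class)."""
    prompt: str
    completion: str
    reward: float = 0.0


# A groupwise reward function maps a whole group of rollouts to one reward each.
GroupRewardFn = Callable[[List[RolloutState]], List[float]]


def diversity_reward(states: List[RolloutState]) -> List[float]:
    """Toy groupwise reward: split credit among identical completions.

    This cannot be computed per rollout, because each reward depends on
    what the other rollouts in the same group produced.
    """
    counts = Counter(s.completion for s in states)
    return [1.0 / counts[s.completion] for s in states]


def score_group(states: List[RolloutState], reward_fn: GroupRewardFn) -> List[RolloutState]:
    """Apply a groupwise reward function and store each reward on its rollout."""
    rewards = reward_fn(states)
    if len(rewards) != len(states):
        raise ValueError("reward function must return one reward per rollout")
    for state, reward in zip(states, rewards):
        state.reward = reward
    return states


if __name__ == "__main__":
    group = [RolloutState("abc", c) for c in ["cba", "cba", "abc"]]
    for s in score_group(group, diversity_reward):
        print(s.completion, s.reward)
```

Keeping the reward on the same record as the rollout data is the point of the sketch: once each rollout carries its own reward, converting a group into a training batch is a simple per-record mapping.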

Tested on the reverse-text training example.

@willccbb willccbb marked this pull request as draft November 26, 2025 06:43
@willccbb willccbb marked this pull request as ready for review November 29, 2025 10:24
@willccbb willccbb changed the title from "[DRAFT] verifiers_rl updates for verifiers v0.1.8 release" to "verifiers_rl recipe updates for verifiers v0.1.8 release" Nov 29, 2025
Collaborator

@Tiiiger Tiiiger left a comment

Thanks!

@Tiiiger Tiiiger merged commit d177465 into thinking-machines-lab:main Dec 2, 2025
3 of 4 checks passed

2 participants