
feat: Test Time RL - Auto Research#94

Open
Aum08Desai wants to merge 11 commits into karpathy:master from Aum08Desai:codex/test-time-rl-auto-research



Aum08Desai commented Mar 9, 2026

Summary

  • adds a thin TTT-Discover integration on top of autoresearch
  • keeps the original autoresearch workflow intact
  • uses the outer RL loop to propose train.py edits and uses inner-loop val_bpb as reward
  • in principle (based on the literature, not on results measured in this PR), this should substantially outperform a frontier model used on its own, even when TTT-Discover runs on a small open-source model
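
As a rough illustration of the reward described above, the sketch below turns an inner-loop `val_bpb` into a scalar reward for the outer RL loop. It is not taken from this PR: the log format (`val_bpb: <number>`), the helper names `parse_val_bpb` and `reward_from_val_bpb`, the baseline-minus-value shaping, and the failure penalty are all assumptions made for illustration. Since lower `val_bpb` is better, the reward is positive when a proposed train.py edit beats the baseline.

```python
import re
from typing import Optional

# Assumed log format (hypothetical): the training run prints lines like
# "val_bpb: 0.9979". The real format in autoresearch may differ.
VAL_BPB_RE = re.compile(r"val_bpb:\s*([0-9]*\.?[0-9]+)")


def parse_val_bpb(log_text: str) -> Optional[float]:
    """Return the last val_bpb reported in the log, or None if absent."""
    matches = VAL_BPB_RE.findall(log_text)
    return float(matches[-1]) if matches else None


def reward_from_val_bpb(
    val_bpb: Optional[float],
    baseline_bpb: float,
    failure_reward: float = -1.0,  # hypothetical penalty for crashed runs
) -> float:
    """Map val_bpb to a reward: improvement over the baseline is positive."""
    if val_bpb is None:
        # The edited train.py crashed or never reported a metric.
        return failure_reward
    return baseline_bpb - val_bpb
```

With this shaping, an edit that crashes the run is penalized rather than ignored, which gives the outer loop a signal to avoid invalid edits.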

Notes

  • this has been tested for the most part and is confirmed to work largely end to end
  • I am still debugging it locally, but nearly all of it works
  • the main remaining rough edge is that the agent's code-edit replacement behavior can still be unreliable in some batches
  • the most recent pushes are specifically aimed at improving that edit reliability
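
One common way to make search-and-replace style code edits more predictable is to reject any edit whose search text does not match exactly once. The sketch below shows that idea only; it is not the mechanism used in this PR, and the function name `apply_search_replace` is hypothetical.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply one edit, refusing ambiguous or missing matches.

    Hypothetical helper: raises ValueError instead of guessing, so the
    caller can count the edit as failed rather than train on a corrupted file.
    """
    if not search:
        raise ValueError("search text must be non-empty")
    count = source.count(search)
    if count != 1:
        raise ValueError(f"expected exactly 1 match, found {count}")
    return source.replace(search, replace, 1)
```

Failing loudly on zero or multiple matches keeps a bad edit from silently producing a train.py that differs from what the agent intended.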

Safety

  • this PR does not overwrite upstream master
  • it is proposed from a separate fork branch for review

Aum08Desai changed the title from "Test Time RL - Auto Research" to "feat: Test Time RL - Auto Research" Mar 9, 2026

2 participants