Enhance experiment runner with deterministic controls #148
Closed
buzypi wants to merge 6 commits into karpathy:master from
svlandeg (Collaborator) reviewed on Mar 10, 2026 and left a comment:
Did you mean to submit this to your fork instead? (The README, for example, says "In this fork (buzypi/autoresearch)".)
buzypi (Author):
No, I wanted you to review it, and if it looks feasible, I would open a separate pull request with only the workflow runner. I have created a separate PR with only the code to merge: #193
Summary
This PR adds a fork-specific execution workflow for autonomous experiments, replacing session-by-session
program.md interpretation with a deterministic runner plus an agent runbook.
What's included
- workflows/run_experiment.py as the single experiment orchestrator:
  - start, resume, status commands
  - phases: setup, baseline, loop
  - loop steps: propose, apply, commit, train, triage, record, decide
  - workflows/runs/<run_id>/<branch-slug>-rNNN
- AGENTS.md runbook with an explicit natural-language-to-command mapping for agent sessions.
- Auto-runs uv run prepare.py when the cache/tokenizer is missing (default on, opt-out via --no-auto-prepare)
- Background training (--background-train)
- resume polls/continues in-flight baseline/train jobs
- … (run_experiment.py + AGENTS.md) is better than a program.md-only execution style
Why
In long-running autonomous sessions, prose-only execution is fragile and inconsistent.
This PR makes runs repeatable, resumable, and inspectable while preserving
program.md as the policy/objective layer.
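A minimal sketch of how a start/resume/status runner could sequence the setup and baseline phases and the propose, apply, commit, train, triage, record, decide loop listed in the summary. This is not the code in workflows/run_experiment.py. The state.json file, its fields, the run_step stub, and the --max-iterations flag are hypothetical, introduced only for illustration. The idea being illustrated is that the cursor (completed phases, iteration, next step) is written to disk after every step, so a later resume picks up where an earlier session stopped.

```python
import argparse
import json
from pathlib import Path

# Phase and step names come from the PR summary; everything else here is hypothetical.
PHASES = ["setup", "baseline"]  # one-shot phases that run before the loop
LOOP_STEPS = ["propose", "apply", "commit", "train", "triage", "record", "decide"]
RUNS_DIR = Path("workflows/runs")


def state_path(run_id: str) -> Path:
    # Hypothetical layout: a single JSON state file inside each run directory.
    return RUNS_DIR / run_id / "state.json"


def load_state(run_id: str) -> dict:
    return json.loads(state_path(run_id).read_text())


def save_state(run_id: str, state: dict) -> None:
    path = state_path(run_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2))


def run_step(name: str, state: dict) -> None:
    # Placeholder: a real runner would invoke the agent, git, or training here.
    state["log"].append(name)


def start(run_id: str) -> dict:
    state = {"done_phases": [], "iteration": 0, "step_index": 0, "log": []}
    save_state(run_id, state)
    return state


def resume(run_id: str, max_iterations: int) -> dict:
    """Continue from the persisted cursor; steps already recorded as done are skipped."""
    state = load_state(run_id)
    for phase in PHASES:
        if phase not in state["done_phases"]:
            run_step(phase, state)
            state["done_phases"].append(phase)
            save_state(run_id, state)
    while state["iteration"] < max_iterations:
        run_step(LOOP_STEPS[state["step_index"]], state)
        # Advance and persist after every step, so a crash re-runs at most the step in progress.
        state["step_index"] += 1
        if state["step_index"] == len(LOOP_STEPS):
            state["step_index"] = 0
            state["iteration"] += 1
        save_state(run_id, state)
    return state


def status(run_id: str) -> str:
    state = load_state(run_id)
    if len(state["done_phases"]) < len(PHASES):
        return f"phase={PHASES[len(state['done_phases'])]}"
    return f"iteration={state['iteration']} next={LOOP_STEPS[state['step_index']]}"


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Sketch of a resumable experiment runner")
    parser.add_argument("command", choices=["start", "resume", "status"])
    parser.add_argument("run_id")
    parser.add_argument("--max-iterations", type=int, default=1)
    args = parser.parse_args()
    if args.command == "start":
        start(args.run_id)
    elif args.command == "resume":
        resume(args.run_id, args.max_iterations)
    else:
        print(status(args.run_id))
```

Saved as a hypothetical runner_sketch.py, it would be used as python runner_sketch.py start exp-r001, then python runner_sketch.py resume exp-r001 --max-iterations 3. If that session is interrupted, running resume again continues from the last saved step instead of starting over.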
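The <branch-slug>-rNNN pattern reads like a slugified branch name followed by a zero-padded counter. The helper below is a hypothetical sketch of allocating the next such name by scanning existing entries in a directory. The slug rules, the three-digit padding, and scanning a directory (rather than, say, listing git branches) are all assumptions, not behavior confirmed by the PR.

```python
import re
from pathlib import Path


def branch_slug(branch: str) -> str:
    # Lowercase, then collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")


def next_run_name(parent: Path, branch: str) -> str:
    """Return '<branch-slug>-rNNN', one past the highest number already used under parent."""
    slug = branch_slug(branch)
    pattern = re.compile(rf"^{re.escape(slug)}-r(\d{{3,}})$")
    used = []
    if parent.is_dir():
        for entry in parent.iterdir():
            match = pattern.match(entry.name)
            if match:
                used.append(int(match.group(1)))
    # Zero-pad to three digits; numbers past 999 simply grow wider.
    return f"{slug}-r{max(used, default=0) + 1:03d}"
```

Deriving the counter from what already exists on disk keeps allocation stateless, so a fresh agent session computes the same next name without any extra bookkeeping.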
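The auto-prepare check and the resumable background training described in the summary could look roughly like this. Assumptions: the caller passes the cache/tokenizer paths that prepare.py is expected to create (their real locations are not stated here), each background job writes hypothetical train.log and exit_code files into its run directory, and a POSIX sh is available. Recording the exit status in a file, instead of tracking a PID, lets a later resume tell whether an in-flight job has finished even after the process that launched it has exited.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Optional


def maybe_prepare(required: list[Path], auto_prepare: bool = True) -> bool:
    """Run `uv run prepare.py` if any required path is missing; return True if it ran."""
    if all(p.exists() for p in required):
        return False
    if not auto_prepare:
        # Corresponds to the --no-auto-prepare opt-out: fail fast instead of preparing.
        raise SystemExit("cache/tokenizer missing; run `uv run prepare.py` first")
    subprocess.run(["uv", "run", "prepare.py"], check=True)
    return True


def launch_background(cmd: list[str], run_dir: Path) -> None:
    """Start cmd detached; when it ends, the shell records its exit status in run_dir/exit_code."""
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "exit_code").unlink(missing_ok=True)  # clear any status from a previous job
    log = shlex.quote(str(run_dir / "train.log"))
    tmp = shlex.quote(str(run_dir / "exit_code.tmp"))
    final = shlex.quote(str(run_dir / "exit_code"))
    # Write to a temp file and rename it so poll() never reads a half-written status.
    script = f"{shlex.join(cmd)} > {log} 2>&1; echo $? > {tmp} && mv {tmp} {final}"
    subprocess.Popen(["sh", "-c", script], start_new_session=True)


def poll(run_dir: Path) -> Optional[int]:
    """Return None while the job is still running, else its exit status."""
    exit_file = run_dir / "exit_code"
    if not exit_file.exists():
        return None
    return int(exit_file.read_text())
```

A resume step could call launch_background(["uv", "run", "train.py"], run_dir) once and then call poll(run_dir) on each later invocation until it returns a status. train.py is assumed here as the training entry point.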