Enhance experiment runner with deterministic controls by buzypi · Pull Request #148 · karpathy/autoresearch

buzypi · 2026-03-10T18:46:20Z

Summary

This PR adds a fork-specific execution workflow for autonomous experiments, replacing session-by-session interpretation of program.md with a deterministic runner plus an agent runbook.

What’s included

Add workflows/run_experiment.py as the single experiment orchestrator:
- start, resume, status commands
- top-level stage controls: setup, baseline, loop
- loop sub-stage controls: propose, apply, commit, train, triage, record, decide
- resumable checkpointing under workflows/runs/<run_id>/
- run-id policy: <branch-slug>-rNNN
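For reviewers who want a concrete picture of the run-id policy and the resumable checkpointing, here is a minimal sketch. It is not the code in workflows/run_experiment.py. The slug rule, the state.json file name, numbering from r001, and the helper names (branch_slug, next_run_id, mark_done, next_stage) are all assumptions made for illustration.

```python
import json
import re
from pathlib import Path

# Loop sub-stages in the order listed in the PR description.
LOOP_STAGES = ["propose", "apply", "commit", "train", "triage", "record", "decide"]
STATE_FILE = "state.json"  # hypothetical checkpoint file name


def branch_slug(branch: str) -> str:
    """Lowercase the branch name and collapse other characters to '-' (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")


def next_run_id(runs_dir: Path, branch: str) -> str:
    """Return '<branch-slug>-rNNN', one past the highest run already on disk."""
    slug = branch_slug(branch)
    pattern = re.compile(re.escape(slug) + r"-r(\d{3})")
    used = []
    if runs_dir.is_dir():
        for entry in runs_dir.iterdir():
            match = pattern.fullmatch(entry.name)
            if match:
                used.append(int(match.group(1)))
    return f"{slug}-r{max(used, default=0) + 1:03d}"


def load_state(run_dir: Path) -> dict:
    """Read the checkpoint, or return an empty one for a fresh run."""
    path = run_dir / STATE_FILE
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": []}


def mark_done(run_dir: Path, stage: str) -> None:
    """Record a finished sub-stage; write to a temp file, then rename over the old one."""
    state = load_state(run_dir)
    if stage not in state["completed"]:
        state["completed"].append(stage)
    run_dir.mkdir(parents=True, exist_ok=True)
    tmp = run_dir / (STATE_FILE + ".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(run_dir / STATE_FILE)


def next_stage(run_dir: Path):
    """Return the first sub-stage not yet completed, or None when the iteration is done."""
    done = set(load_state(run_dir)["completed"])
    for stage in LOOP_STAGES:
        if stage not in done:
            return stage
    return None
```

In this sketch, a resume would call next_stage on the run directory and continue from whatever it returns, so an interrupted session picks up at the first unfinished sub-stage rather than starting over.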
Add AGENTS.md runbook with explicit natural-language to command mapping for agent sessions.
Setup robustness:
- auto-run uv run prepare.py when cache/tokenizer is missing (default on, opt-out via --no-auto-prepare)
- explicit setup precondition checks before baseline/loop
Background training support:
- training stages start in background by default (--background-train)
- resume polls/continues in-flight baseline/train jobs
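One way background training with resume-based polling could work is sketched below. This is an illustration under stated assumptions, not the PR's implementation: the train.log and train.exit file names, the wrapper script, and the start_background and poll helpers are hypothetical. The idea is that the job records its own exit code on disk, so a later resume in a different process can tell whether it is still in flight.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

# Small wrapper run in the child: execute the real command, then publish its
# exit code by writing a temp file and renaming it into place.
WRAPPER = (
    "import os, subprocess, sys\n"
    "rc = subprocess.call(sys.argv[2:])\n"
    "tmp = sys.argv[1] + '.tmp'\n"
    "with open(tmp, 'w') as f:\n"
    "    f.write(str(rc))\n"
    "os.replace(tmp, sys.argv[1])\n"
)


def start_background(cmd: List[str], run_dir: Path) -> None:
    """Launch cmd detached from this session, logging to run_dir/train.log."""
    run_dir.mkdir(parents=True, exist_ok=True)
    exit_file = run_dir / "train.exit"  # hypothetical marker file
    exit_file.unlink(missing_ok=True)   # clear any marker from an earlier attempt
    with open(run_dir / "train.log", "w") as log:
        subprocess.Popen(
            [sys.executable, "-c", WRAPPER, str(exit_file), *cmd],
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # keep running after the launcher exits (POSIX)
        )


def poll(run_dir: Path) -> Optional[int]:
    """Return None while the job is in flight, else its exit code."""
    exit_file = run_dir / "train.exit"
    if not exit_file.exists():
        return None
    return int(exit_file.read_text())
```

A resume command built on this would call poll on the run directory: None means the job is still running and the session can check back later, while an integer means the stage finished and the runner can move on or report the failure. A wrapper that is killed before writing the marker would look like a job that never finishes, so a real runner would likely also want a liveness check or timeout.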
Update README:
- OpenCode quickstart instructions
- fork-specific explanation of why this workflow layer (run_experiment.py + AGENTS.md) is better than a program.md-only execution style

Why

In long-running autonomous sessions, prose-only execution is fragile and inconsistent.
This PR makes runs repeatable, resumable, and inspectable while preserving program.md as the policy/objective layer.

svlandeg

Did you mean to submit this to your fork instead? (The README, for example, says "In this fork (buzypi/autoresearch)".)

buzypi · 2026-03-11T20:24:20Z

No, I wanted you to review it and if it looks feasible, I will generate a separate pull request with only the workflow runner.

I have created a separate PR with only the code to merge: #193

buzypi added 5 commits March 10, 2026 22:00

add standalone experiment runner with resumable stage controls

ae36be0

improve experiment runner observability and setup robustness

1f86517

add background training with resume-based polling

2aeb0a1

add opencode quickstart instructions

767ef72

clarify fork workflow vs program.md execution

b13714c

svlandeg reviewed Mar 10, 2026

add proposal override schema and document human-in-the-loop flow

c7637c7

svlandeg closed this Mar 11, 2026

buzypi wants to merge 6 commits into karpathy:master from buzypi:master
