AGENTS.md (new file, +69 lines)
# Agent Runbook: Autoresearch Execution

Use `workflows/run_experiment.py` for all autoresearch execution. Do not use `workflow/dag.json` or task-graph-planner for running experiments.

## Core Rules

1. Keep all run artifacts under `workflows/runs/`.
2. Modify only `train.py` during experiments.
3. Never modify `prepare.py`.
4. Always launch training as `uv run train.py`, invoked through the runner script.

## Natural-language to Command Mapping

- User says: "Start running the experiment, run 5 loops"
- Run: `python workflows/run_experiment.py start --loops 5`

- User says: "Run another 5 iterations"
- Run: `python workflows/run_experiment.py resume --loops 5`

- User says: "Resume run <run_id> and run 5 loops"
- Run: `python workflows/run_experiment.py resume --run-id <run_id> --loops 5`

- User says: "Only run setup and baseline"
- Run: `python workflows/run_experiment.py start --only setup,baseline`

- User says: "Only run training and decision parts in loops for 3 iterations"
- Run: `python workflows/run_experiment.py resume --loops 3 --only loop --loop-only train,record,decide`

- User says: "Show run status"
- Run: `python workflows/run_experiment.py status`
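The mapping above implies a small CLI surface. The following is a hypothetical `argparse` sketch of that surface for illustration only; the actual `workflows/run_experiment.py` may name or structure its options differently:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the subcommands and flags implied by the mapping above.
    parser = argparse.ArgumentParser(prog="run_experiment.py")
    sub = parser.add_subparsers(dest="command", required=True)

    start = sub.add_parser("start", help="begin a new run")
    resume = sub.add_parser("resume", help="continue an existing run")
    sub.add_parser("status", help="show run status")

    for p in (start, resume):
        p.add_argument("--loops", type=int, default=0)
        p.add_argument("--only", help="comma-separated top-level stages")
        p.add_argument("--loop-only", help="comma-separated loop sub-stages")
    resume.add_argument("--run-id")
    return parser

# "Resume run abc-r001 and run 5 loops" maps onto:
args = build_parser().parse_args(["resume", "--run-id", "abc-r001", "--loops", "5"])
print(args.command, args.run_id, args.loops)
```

A skeleton like this is what lets the natural-language phrases resolve to one deterministic command each.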

## Stage Controls

- Top-level stages: `setup`, `baseline`, `loop`
- Loop stages: `propose`, `apply`, `commit`, `train`, `triage`, `record`, `decide`

Supported control flags:

- `--only <comma-list>`: run only selected stages
- `--from-stage <setup|baseline|loop>` + `--to-stage <...>`: run a top-level stage range
- `--loop-only <comma-list>`: limit loop internals to selected stages
- `--loops N`: run `N` loop iterations
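One way the stage-selection flags could compose — `--only` as an explicit allowlist, `--from-stage`/`--to-stage` as a range over the canonical order — is sketched below. The stage names come from the runbook; the helper itself is an assumption, not the script's actual code:

```python
# Canonical top-level stage order from the runbook.
STAGES = ["setup", "baseline", "loop"]

def select_stages(only=None, from_stage=None, to_stage=None):
    if only:
        # --only wins: keep only the requested stages, in canonical order.
        requested = [s.strip() for s in only.split(",")]
        return [s for s in STAGES if s in requested]
    # Otherwise interpret --from-stage/--to-stage as an inclusive range.
    lo = STAGES.index(from_stage) if from_stage else 0
    hi = STAGES.index(to_stage) if to_stage else len(STAGES) - 1
    return STAGES[lo:hi + 1]

print(select_stages(only="setup,baseline"))   # ['setup', 'baseline']
print(select_stages(from_stage="baseline"))   # ['baseline', 'loop']
```

Keeping canonical order even for `--only` means users can list stages in any order without changing execution order.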

## Resume Behavior

- The script checkpoints state at `workflows/runs/<run_id>/state.json`.
- If a loop iteration is partially complete, `resume` continues that iteration from the next pending stage.
- "Run another N iterations" means execute N more loop iterations from current state.
- Training runs are started in the background by default (`--background-train`), and `resume` polls/continues in-flight baseline/train jobs.
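"Continues that iteration from the next pending stage" reduces to a simple scan over the loop-stage order. The sketch below assumes a `state.json` schema with a `completed_stages` list, which is an illustration rather than the script's real schema:

```python
import json
import pathlib
import tempfile

# Loop-stage order from the runbook.
LOOP_STAGES = ["propose", "apply", "commit", "train", "triage", "record", "decide"]

def next_pending(state):
    # A partially complete iteration resumes at the first stage not yet done.
    done = set(state.get("completed_stages", []))
    for stage in LOOP_STAGES:
        if stage not in done:
            return stage
    return None  # iteration fully complete

# Simulate a checkpoint like workflows/runs/<run_id>/state.json
# (field names here are assumptions for illustration).
state_path = pathlib.Path(tempfile.mkdtemp()) / "state.json"
state_path.write_text(json.dumps(
    {"iteration": 3, "completed_stages": ["propose", "apply", "commit"]}
))
state = json.loads(state_path.read_text())
print(next_pending(state))  # train
```

Because the checkpoint is re-read on every `resume`, a crash mid-iteration costs at most one stage of rework.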

## Logging and Observability

- Human-readable execution log: `workflows/runs/<run_id>/runner.log`
- Structured event log: `workflows/runs/<run_id>/history.jsonl`
- Checkpoint state: `workflows/runs/<run_id>/state.json`
- Per-iteration details (including opencode raw outputs): `workflows/runs/<run_id>/iterations/<NNNN>/`
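The structured event log is a standard append-only JSONL pattern: one self-describing record per line. A minimal sketch, with field names that are illustrative rather than the script's actual schema:

```python
import json
import pathlib
import tempfile
from datetime import datetime, timezone

def log_event(history_path, event, **fields):
    # Append one structured record per line (JSONL): readable with any
    # line-oriented tool, and safe to append to mid-run.
    record = {"ts": datetime.now(timezone.utc).isoformat(), "event": event, **fields}
    with history_path.open("a") as fh:
        fh.write(json.dumps(record) + "\n")

# Simulate workflows/runs/<run_id>/history.jsonl in a temp dir.
history = pathlib.Path(tempfile.mkdtemp()) / "history.jsonl"
log_event(history, "stage_start", stage="train", iteration=2)
log_event(history, "stage_end", stage="train", iteration=2, status="ok")

lines = history.read_text().splitlines()
print(len(lines))  # 2
```

Pairing this with the human-readable `runner.log` gives both a grep-friendly narrative and a machine-parseable audit trail.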

## Run ID Policy

- Default run id: `<branch-slug>-rNNN`
- Example: branch `autoresearch/mar10` -> `autoresearch-mar10-r001`
- On `resume` without `--run-id`, the script picks the latest run for the current branch.
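The `<branch-slug>-rNNN` policy can be sketched as slugifying the branch name and incrementing a zero-padded counter over existing run ids. The helper below is an assumed implementation, not the script's actual one:

```python
import re

def default_run_id(branch, existing):
    # Slugify the branch: "autoresearch/mar10" -> "autoresearch-mar10".
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    # Find existing counters for this slug and take the next one.
    nums = [int(m.group(1)) for rid in existing
            if (m := re.fullmatch(re.escape(slug) + r"-r(\d{3})", rid))]
    return f"{slug}-r{max(nums, default=0) + 1:03d}"

print(default_run_id("autoresearch/mar10", []))
# autoresearch-mar10-r001
print(default_run_id("autoresearch/mar10", ["autoresearch-mar10-r001"]))
# autoresearch-mar10-r002
```

Deriving the id from the branch keeps runs from different experiment branches from colliding under `workflows/runs/`.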

## Notes

- Use `--no-stochastic` only when opencode stochastic execution is unavailable.
- Setup auto-runs `uv run prepare.py` if cache/tokenizer are missing (disable via `--no-auto-prepare`).
- Background training is enabled by default; disable with `--no-background-train` for fully foreground execution.
- `results.tsv` is maintained in repo root and should remain untracked.
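The background-train pattern above amounts to launching the training process detached and letting a later `resume` poll it. A minimal standalone sketch (the command is a stand-in, not the real training invocation):

```python
import subprocess
import sys
import time

# Start a "training job" in the background; a short sleep stands in for
# the real `uv run train.py` invocation.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])
print("started pid", proc.pid)

# A later `resume` would poll the recorded job until it finishes.
while proc.poll() is None:
    time.sleep(0.05)
print("exit code", proc.returncode)
```

In the real runner the PID would be recorded in `state.json` so a separate `resume` invocation can pick the job back up.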
README.md (+29 lines, diff excerpt begins mid-file at line 49)

The `program.md` file is essentially a super lightweight "skill".

### OpenCode quickstart

If you want to run this with OpenCode, use this flow:

1. Clone the repo.
2. Install OpenCode.
3. `cd` into the repo.
4. Run `opencode`.
5. Type: `Run the experiment for 1 loop`.

The agent will use `workflows/run_experiment.py` and kick off the run.

### Why this workflow is better

In **this fork** (`buzypi/autoresearch`), autonomous runs are executed through a dedicated workflow script (`workflows/run_experiment.py`) and an explicit agent runbook (`AGENTS.md`).

That is different from the older "just follow `program.md` directly" execution style, where each agent session had to repeatedly infer process details from prose instructions.

Benefits:

- **Operational consistency vs prose-only execution.** Instead of relying on session-by-session interpretation of `program.md`, one script now encodes setup, baseline, loop control, and resume behavior.
- **Natural-language intent still works.** `AGENTS.md` maps prompts like "run another 5 iterations" to deterministic commands, so agents stay aligned.
- **Reliable continuation.** Runs persist state under `workflows/runs/<run_id>/`, so partial iterations can resume from the next pending stage.
- **Controlled execution surface.** You can target only selected top-level stages or loop sub-stages while keeping the same run state.
- **Long-run friendly.** Training can run in background, and later `resume` invocations poll/continue in-flight work.
- **Clear audit trail.** `runner.log`, `history.jsonl`, `state.json`, and per-iteration artifacts make each decision traceable.

`program.md` is still the research policy and objective layer (what to optimize, constraints, judgment criteria). In this fork, `workflows/run_experiment.py` + `AGENTS.md` provide the execution layer (how runs are actually carried out repeatably).

## Project structure
