Skip to content

Commit bcd406c

Browse files
authored
Merge pull request #8 from AytuncYildizli/docs/mahmory-autoresearch-usage
Good reference doc for the autoresearch pattern. Pairs with the config-driven policy tuning we added to main.
2 parents d40fe5f + 32a3b95 commit bcd406c

File tree

2 files changed

+192
-0
lines changed

2 files changed

+192
-0
lines changed

README.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,8 @@ ZeroAPI is an OpenClaw plugin that intercepts eligible messages at the gateway l
1010

1111
> **For AI agents**: Start with `SKILL.md` — it contains the complete setup wizard. Read `benchmarks.json` for model data. The `plugin/` directory contains the router source code. Config examples are in `examples/`. Provider setup details are in `references/`.
1212
13+
For a real production example of offline policy tuning around OpenClaw routing, see [`references/mahmory-autoresearch-usage.md`](references/mahmory-autoresearch-usage.md).
14+
1315
## Provider Exclusions
1416

1517
**Anthropic (Claude):** Subscriptions no longer cover OpenClaw as of April 4, 2026. ([source](https://x.com/bcherny/status/2040206440556826908))
@@ -113,6 +115,7 @@ ZeroAPI/
113115
├── cost-summary.md
114116
├── provider-config.md
115117
├── oauth-setup.md
118+
├── mahmory-autoresearch-usage.md
116119
├── subscription-catalog.md
117120
└── troubleshooting.md
118121
```
Lines changed: 189 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,189 @@
1+
# Routing Autoresearch Pattern
2+
3+
This document describes how the broader OpenClaw stack uses autoresearch in a real production repository, beyond ZeroAPI's own routing policy layer.
4+
5+
The concrete reference implementation lives in `mahobrain/scripts/autoresearch/` and currently runs multiple optimization lanes:
6+
7+
1. `skill-routing` — keyword + semantic threshold tuning for skill dispatch
8+
2. other lanes for unrelated product surfaces
9+
10+
ZeroAPI does not run this framework itself today, but the pattern is directly relevant because it shows how policy-heavy model routing systems can be improved with offline experiment loops instead of intuition-driven config edits.
11+
12+
## Why this matters for ZeroAPI
13+
14+
ZeroAPI already does all of the hard runtime work:
15+
16+
- classify tasks conservatively
17+
- filter ineligible providers
18+
- preserve benchmark order
19+
- apply subscription-aware bias
20+
- write routing decisions as a per-turn override
21+
22+
What autoresearch adds is a disciplined way to tune the constants around that logic.
23+
24+
Instead of "threshold feels too strict" or "this provider bias seems right", the Mahobrain pattern asks:
25+
26+
- What do we optimize?
27+
- What eval set proves it?
28+
- What guardrails stop regressions?
29+
- What candidate should be promoted into live policy?
30+
31+
That same workflow can be applied to ZeroAPI category thresholds, provider weighting, or fallback policy later.
32+
33+
## Framework shape
34+
35+
Mahobrain uses a generic experiment framework plus target-specific tuners:
36+
37+
```text
38+
scripts/autoresearch/
39+
├── framework.py # generic experiment loop
40+
├── autoresearch_loop.py # target entrypoint
41+
├── rollout_state.py # progressive promotion state
42+
├── skill_router_tuner.py # skill routing lane
43+
├── run-overnight.sh # scheduled multi-phase runner
44+
└── results/ # latest_run, leaderboards, rollout state
45+
```
46+
47+
The key design choice is separation:
48+
49+
- runtime behavior stays fast and deterministic
50+
- autoresearch stays offline, file-backed, and repeatable
51+
- only winners are promoted into live config or rollout state
52+
53+
## The relevant target
54+
55+
### Skill routing
56+
57+
Goal: improve the accuracy of selecting the right skill for a user message.
58+
59+
Tuned parameters include:
60+
61+
- match threshold
62+
- recency bonus
63+
- specificity weight
64+
- multi-match penalty
65+
- negative keyword weight
66+
- keyword / semantic blending
67+
68+
Artifacts:
69+
70+
- `results/skill-routing/latest_run.json`
71+
- `results/skill-routing/leaderboard_phase1.json`
72+
73+
This lane is effectively a routing-policy tuner. Conceptually it is the closest sibling to ZeroAPI.
74+
75+
In production this lane is optimized against a fixed eval corpus and guardrailed before promotion. A recent real run looked like this in practice:
76+
77+
- baseline score around `0.892`
78+
- best score around `0.908`
79+
- false-positive guardrail enforced
80+
- p95 kept around `90ms`
81+
82+
That is exactly the kind of loop ZeroAPI would benefit from if routing thresholds, provider bias, or override confidence start drifting away from real user outcomes.
83+
84+
## Execution model
85+
86+
Mahobrain runs two modes:
87+
88+
### Direct/manual run
89+
90+
Run the routing target explicitly:
91+
92+
```bash
93+
cd scripts/autoresearch
94+
python3 autoresearch_loop.py 24 --target skill-routing --phase 1
95+
```
96+
97+
### Scheduled run
98+
99+
Use `run-scheduled.sh` / `run-overnight.sh` for dwell-aware rollout cadence:
100+
101+
```bash
102+
./scripts/autoresearch/run-scheduled.sh
103+
./scripts/autoresearch/run-overnight.sh 24
104+
```
105+
106+
The runner:
107+
108+
1. loads current leaderboard state
109+
2. computes a baseline
110+
3. explores N candidates
111+
4. applies guardrails
112+
5. writes winner artifacts
113+
6. updates rollout state when the target supports live promotion
114+
115+
## Guardrails
116+
117+
The framework is not pure hill-climbing. It rejects candidates that win the objective while harming operational behavior.
118+
119+
Examples from the production lanes:
120+
121+
- `skill-routing`
122+
- reject if accuracy drops below floor
123+
- reject if false-positive rate rises above cap
124+
125+
This is the operationally useful part. The loop is not "search until score goes up"; it is "search inside a safety box."
126+
127+
## Result files and promotion discipline
128+
129+
Each target keeps a narrow file contract:
130+
131+
- `latest_run.json` — last completed experiment batch
132+
- `leaderboard_phase1.json` — best candidate for the first parameter family
133+
- `leaderboard_phase2.json` — best candidate for the second parameter family
134+
- `rollout_state.json` — only for targets with live promotion semantics
135+
- `runs/YYYY-MM-DD/history_*.jsonl` — experiment-by-experiment history
136+
137+
This makes it easy for dashboards and ops panels to read status without understanding the full framework internals.
138+
139+
For ZeroAPI, this pattern is preferable to writing ad hoc notes into config comments or manually editing benchmark weights with no evidence trail.
140+
141+
## What ZeroAPI can borrow
142+
143+
If ZeroAPI later adds its own autoresearch lane, the Mahobrain pattern suggests:
144+
145+
1. Keep runtime routing cheap and synchronous.
146+
2. Keep tuning offline and file-backed.
147+
3. Tune one policy layer at a time.
148+
4. Define explicit guardrails before running search.
149+
5. Promote only winners, never raw experiment output.
150+
6. Preserve auditability via `latest_run`, leaderboards, and rollout state.
151+
152+
Concrete candidate targets for ZeroAPI:
153+
154+
- category keyword thresholds
155+
- provider bias weights
156+
- fallback ordering under subscription constraints
157+
- "stay on default model" vs "override model" confidence thresholds
158+
- per-category fast-lane eligibility
159+
160+
## What not to copy blindly
161+
162+
Mahobrain's framework is broad because it serves memory, skills, and content quality.
163+
ZeroAPI should stay narrower.
164+
165+
Good fit:
166+
167+
- offline routing-policy experiments
168+
- subscription-weight tuning
169+
- fallback and threshold calibration
170+
171+
Bad fit:
172+
173+
- unrelated non-routing eval targets
174+
- any runtime dependence on the autoresearch loop
175+
176+
ZeroAPI should use autoresearch to refine policy, not to make routing depend on a background optimizer.
177+
178+
## Bottom line
179+
180+
Mahobrain proves that autoresearch is useful when:
181+
182+
- the runtime policy is deterministic
183+
- the optimization target is explicit
184+
- guardrails are strong
185+
- file outputs are simple enough for dashboards and operators
186+
187+
That is the practical takeaway for ZeroAPI.
188+
189+
ZeroAPI already has the right architectural boundary for this style of optimization. If and when policy tuning becomes noisy enough to justify automation, this routing-oriented workflow is a solid template.

0 commit comments

Comments
 (0)