Critic context overload at high branch count — pairwise/chunked scoring

The expensive moment in the loop is when the critic has to hold all N branches simultaneously to score and rank them. Observed degradation: at ~60–80K tokens of combined branch output feeding into one critic call, scoring becomes inconsistent. The critic's input grows with `branch_count × branch_length`, so the limit depends on how long each branch is, not on branch count alone.
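As a rough illustration of that scaling, here is a minimal sketch that estimates the critic's input size and how many branches fit in one call. The function names, the `prompt_overhead` parameter, and the 60,000-token default (taken from the low end of the observed range above) are assumptions for illustration, not part of the existing harness.

```python
from typing import Sequence


def critic_input_tokens(branch_lengths: Sequence[int], prompt_overhead: int = 0) -> int:
    """Total tokens one critic call must hold: every branch plus fixed prompt overhead."""
    return sum(branch_lengths) + prompt_overhead


def max_branches_per_call(branch_length: int, budget: int = 60_000,
                          prompt_overhead: int = 0) -> int:
    """How many equal-length branches fit under the token budget (assumed threshold)."""
    if branch_length <= 0:
        raise ValueError("branch_length must be positive")
    return max(0, (budget - prompt_overhead) // branch_length)


if __name__ == "__main__":
    # Hypothetical example: 20 branches of 4,000 tokens each.
    print(critic_input_tokens([4_000] * 20))
    print(max_branches_per_call(4_000))
```

Under these assumptions, the same branch count can sit comfortably under the budget or well over it depending on branch length, which is why the fixes below bound what a single critic call has to see.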

**Three possible fixes to investigate:**
- **Pairwise tournament scoring** (Elo-style): the critic sees only two branches per call, so each comparison stays under the context limit; ratings are aggregated over O(N log N) comparisons.
- **Chunked scoring** with normalization across chunks: the critic scores groups of 4–5 ideas at a time, then a second pass puts the per-chunk scores on a common scale.
- **Hierarchical scoring**: score at the cluster level first (which angles look promising), then at the idea level within the winning clusters only. This reduces the total number of comparisons.
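
To make the first option concrete, here is a minimal sketch of Elo-style pairwise ranking. It is one possible shape, not a settled design: `elo_rank`, the `compare` callback (a stand-in for a single two-branch critic call), the K-factor of 32, the starting rating of 1000, and the Swiss-style pairing over `ceil(log2 N)` rounds are all assumptions chosen to keep the comparison count near (N/2)·log2 N.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def elo_rank(branches: Sequence[T],
             compare: Callable[[T, T], bool],
             k: float = 32.0,
             rounds: Optional[int] = None,
             seed: int = 0) -> List[int]:
    """Return branch indices ordered best-first by Elo rating.

    compare(a, b) is a hypothetical wrapper around one critic call that
    sees only two branches and returns True when `a` is the better one.
    """
    n = len(branches)
    if n < 2:
        return list(range(n))
    if rounds is None:
        rounds = math.ceil(math.log2(n))  # about (n / 2) * log2(n) comparisons in total

    ratings = [1000.0] * n
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random pairing for the first round

    for _ in range(rounds):
        # Pair neighbours in the current order; with odd n the last one sits out.
        for i in range(0, n - 1, 2):
            a, b = order[i], order[i + 1]
            expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
            score_a = 1.0 if compare(branches[a], branches[b]) else 0.0
            delta = k * (score_a - expected_a)
            ratings[a] += delta
            ratings[b] -= delta
        # Swiss-style: the next round pairs branches with similar ratings.
        order.sort(key=lambda idx: ratings[idx], reverse=True)

    return sorted(range(n), key=lambda idx: ratings[idx], reverse=True)


if __name__ == "__main__":
    # Toy stand-in for the critic: the longer string "wins".
    ideas = ["a", "bbbb", "cc", "ddddd", "eee"]
    print(elo_rank(ideas, lambda x, y: len(x) > len(y)))
```

With so few rounds the top of the ranking is better resolved than the middle, so a real run might add rounds or repeat comparisons if the critic's pairwise judgments are noisy.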

**Tradeoff to measure:** the extra critic calls each variant needs versus the scoring consistency it gains. The eval harness can test each variant on the existing problem set.
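
One way to put a number on "scoring consistency" is to run the same variant several times on the same problem and measure how well the resulting rankings agree. The sketch below uses the mean pairwise Kendall tau for that; this choice of metric and the names `kendall_tau` and `ranking_consistency` are assumptions, not something the eval harness is known to provide.

```python
from itertools import combinations
from typing import Hashable, List, Sequence


def kendall_tau(rank_a: Sequence[Hashable], rank_b: Sequence[Hashable]) -> float:
    """Kendall tau between two best-first rankings of the same items (no ties).

    Returns 1.0 for identical orderings and -1.0 for exactly reversed ones.
    """
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # Positive product: both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 1.0


def ranking_consistency(rankings: List[Sequence[Hashable]]) -> float:
    """Mean Kendall tau over every pair of repeated runs; 1.0 means fully stable."""
    pairs = list(combinations(rankings, 2))
    if not pairs:
        return 1.0
    return sum(kendall_tau(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical rankings of four branches from three repeated runs of one variant.
    runs = [[0, 1, 2, 3], [0, 2, 1, 3], [0, 1, 2, 3]]
    print(ranking_consistency(runs))
```

Logging this value next to the number of critic calls per variant would give the two axes of the tradeoff directly.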

---

*Raised by u/Unlikely_Ad_8060 with concrete numbers from their own multi-agent harness (3–5 concurrent agents, raised after Anthropic doubled rate limits in May).*
