Critic context overload at high branch count — pairwise/chunked scoring #7

@UditAkhourii

The expensive moment in the loop is when the critic has to hold all N branches simultaneously to score and rank them. Observed degradation: at ~60–80K tokens of combined branch output feeding into one critic call, scoring becomes inconsistent. Critic input size scales with branch_count × branch_length, so it grows with branch length as well as branch count.
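A rough way to see when a single critic call enters the problem range is to estimate its input size up front. This is only a sketch: the function names, the `overhead_tokens` parameter, and the 60,000-token default (the lower end of the range observed above, not a hard limit) are assumptions for illustration.

```python
def critic_input_tokens(branch_count: int, avg_branch_tokens: int,
                        overhead_tokens: int = 0) -> int:
    """Estimate tokens fed to one critic call that sees every branch.

    overhead_tokens is a hypothetical allowance for the scoring prompt itself.
    """
    return branch_count * avg_branch_tokens + overhead_tokens


def exceeds_budget(branch_count: int, avg_branch_tokens: int,
                   budget_tokens: int = 60_000) -> bool:
    """True when the combined branch output reaches the assumed budget."""
    return critic_input_tokens(branch_count, avg_branch_tokens) >= budget_tokens
```

For example, 20 branches of about 3,500 tokens each gives 70,000 tokens, inside the range where scoring was seen to degrade, while 10 such branches stays at 35,000.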

Three possible fixes to investigate:

  • Pairwise tournament scoring (Elo-style): each comparison holds only two branches, so it stays under the context limit; rankings are aggregated over O(N log N) comparisons.
  • Chunked scoring with normalization across chunks: the critic scores groups of 4–5 ideas at a time, then a second pass normalizes scores across chunks so they are comparable.
  • Hierarchical scoring: score at cluster level first (which angles look promising), then at idea level within the winning clusters only. This reduces total comparisons.
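The pairwise option could be sketched roughly as below. This is a minimal illustration, not the project's implementation: `compare` is a hypothetical callable standing in for one critic call on two branches (returning 1.0 if the first wins, 0.0 if the second wins, 0.5 for a tie), and the starting rating of 1000, `k=32`, and the random-pairing schedule are assumed defaults.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple


def elo_rank(branches: Sequence[str],
             compare: Callable[[str, str], float],
             rounds: Optional[int] = None,
             k: float = 32.0,
             seed: int = 0) -> List[Tuple[int, float]]:
    """Rank branches with Elo updates over randomly paired rounds.

    Returns (branch_index, rating) pairs sorted by rating, best first.
    Each round makes n // 2 comparisons; with the default of
    ceil(log2(n)) rounds the total is on the order of N log N.
    """
    n = len(branches)
    ratings = [1000.0] * n
    if n < 2:
        return list(enumerate(ratings))
    if rounds is None:
        rounds = max(1, math.ceil(math.log2(n)))
    rng = random.Random(seed)
    for _ in range(rounds):
        order = list(range(n))
        rng.shuffle(order)
        # Pair adjacent entries; with odd n the last branch sits this round out.
        for a, b in zip(order[::2], order[1::2]):
            outcome = compare(branches[a], branches[b])
            expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
            delta = k * (outcome - expected_a)
            ratings[a] += delta
            ratings[b] -= delta
    return sorted(enumerate(ratings), key=lambda pair: pair[1], reverse=True)
```

Each `compare` call only needs two branches in context, which is the point of the approach. With few rounds the ratings are coarse and ties are possible, so the number of rounds is one of the knobs the eval harness would need to vary.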

Tradeoff to measure: extra critic calls vs improved scoring consistency. The eval harness can test each variant on the existing problem set.
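To put rough numbers on the "extra critic calls" side of the tradeoff, the call count of each variant can be estimated before running anything. The formulas below are assumptions for illustration: a pairing schedule of ceil(log2(N)) rounds for the tournament, a single normalization call for chunked scoring, and one cluster-level call plus chunked idea-level calls for hierarchical scoring. All function and parameter names are hypothetical.

```python
import math


def pairwise_calls(n: int) -> int:
    """Comparisons for ceil(log2(n)) rounds of n // 2 pairings each."""
    if n < 2:
        return 0
    return math.ceil(math.log2(n)) * (n // 2)


def chunked_calls(n: int, chunk_size: int = 5) -> int:
    """One call per chunk plus one assumed normalization pass."""
    return math.ceil(n / chunk_size) + 1


def hierarchical_calls(kept_clusters: int, ideas_per_cluster: int,
                       chunk_size: int = 5) -> int:
    """One cluster-level call, then chunked idea-level calls in kept clusters."""
    return 1 + kept_clusters * math.ceil(ideas_per_cluster / chunk_size)
```

Under these assumptions, 16 branches cost 32 pairwise calls or 5 chunked calls, against a single call for the current all-at-once critic. Whether the consistency gain justifies that is what the eval harness would have to show.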


Raised by u/Unlikely_Ad_8060 with concrete numbers from their own multi-agent harness (3–5 concurrent agents, raised after Anthropic doubled rate limits in May).

Labels: architecture (load-bearing design changes), enhancement (new feature or request)