red team: _MAX_BACKTRACKS = 10 is hardcoded; should scale with total_turns #331

@Aryansharma28

Problem

`python/scenario/red_team_agent.py:209`:

```python
self._MAX_BACKTRACKS = 10
```

This is unconditional. Edge cases:

  • `total_turns=5` → up to 10 backtracks are allowed but only 5 turns exist → the cap can never bind
  • `total_turns=100` → only 10 backtracks are available across 100 turns; against a highly defensive target the cap is exhausted early and the remaining turns (up to 90% of the budget) can be spent on dead ends

Each backtrack also consumes a turn from the budget (documented in the `marathon_script` docstring), so backtracks have a direct cost.

Why it matters

Real targets vary widely in defensiveness. A well-aligned model may force 20+ backtracks in a 50-turn run; a weakly aligned model may need none. A hardcoded 10 fits neither extreme.

Proposed fix

Either:

  • `max(1, total_turns // 3)` as the implicit default, OR
  • Expose as a constructor parameter `max_backtracks: Optional[int] = None` defaulting to the formula above.

Apply the same change to the TS `backtracksRemaining` initialization.
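A minimal sketch of how the two options could combine, assuming the cap is resolved once at construction time. The helper name `resolve_max_backtracks` is hypothetical and not part of the current code; only the `max(1, total_turns // 3)` formula and the `max_backtracks: Optional[int] = None` parameter come from the proposal above.

```python
from typing import Optional


def resolve_max_backtracks(
    total_turns: int, max_backtracks: Optional[int] = None
) -> int:
    """Hypothetical helper: choose the backtrack cap for a run.

    An explicit max_backtracks wins; otherwise the cap scales with
    total_turns using the formula proposed in this issue.
    """
    if max_backtracks is not None:
        if max_backtracks < 0:
            raise ValueError("max_backtracks must be non-negative")
        return max_backtracks
    # Proposed default: roughly a third of the turn budget, never below 1.
    return max(1, total_turns // 3)


print(resolve_max_backtracks(5))       # default for a short run
print(resolve_max_backtracks(100))     # default for a long run
print(resolve_max_backtracks(50, 20))  # explicit override
```

For the 50-turn example above, the formula yields 16, so a run that expects 20+ backtracks would still pass an explicit value. Whether an explicit override should also be clamped to `total_turns` is left open here.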

Acceptance

  • Default scales with `total_turns`
  • Optional explicit override
  • Test: `total_turns=3` doesn't allow more backtracks than turns
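The last acceptance item could be checked along these lines. This is a sketch only: `default_backtrack_cap` is a hypothetical stand-in for wherever the default ends up being computed, and it assumes `total_turns >= 1`.

```python
def default_backtrack_cap(total_turns: int) -> int:
    # Stand-in for the proposed default; not the actual agent code.
    return max(1, total_turns // 3)


def test_cap_never_exceeds_turns() -> None:
    # The case named in the acceptance criteria.
    assert default_backtrack_cap(3) <= 3
    # The same invariant across a range of budgets (assumes total_turns >= 1).
    for total_turns in range(1, 201):
        cap = default_backtrack_cap(total_turns)
        assert 1 <= cap <= total_turns


test_cap_never_exceeds_turns()
```

In the real test the stand-in would be replaced by constructing the agent with `total_turns=3` and inspecting its backtrack cap.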

Discovered during PR #306 review.
