docs(red-team): use gpt-5.4 and 50 turns in code examples (#323)
- Swap attacker/planner model from gpt-4o to gpt-5.4 in every code example
(kept gpt-4o-mini where the example is intentionally the cheap path)
- Bump total_turns / totalTurns from 30 to 50 in every code example so
the docs match the recommended configuration
- Update attack_plan phase boundaries to split 50 turns along the Crescendo
20/45/75/100% breakpoints
- Rework surrounding prose + CI section so recommendations and example
code agree (50 turns everywhere; use score_responses=False for per-PR
speed instead of dropping the turn count)
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
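As a rough illustration of the phase split named in the commit message, the sketch below divides a turn budget at the 20/45/75/100% breakpoints. The function name `phase_boundaries` and the round-up rule are assumptions made for this example; the actual `attack_plan` boundaries in the docs may round differently.

```python
def phase_boundaries(total_turns, percents=(20, 45, 75, 100)):
    """Return inclusive (first_turn, last_turn) ranges, one per phase.

    Hypothetical helper: rounding each breakpoint up is an assumption,
    not confirmed behaviour of the library.
    """
    ranges = []
    start = 1
    for pct in percents:
        # Integer ceiling of total_turns * pct / 100, avoiding float rounding.
        end = (total_turns * pct + 99) // 100
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    for first, last in phase_boundaries(50):
        print(f"turns {first}-{last}")
```

With this rounding rule, 50 turns split into phases ending at turns 10, 23, 38 and 50.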
`total_turns` is the **only** parameter that controls how long a red team test runs — it is a hard cap. `attacker.marathon_script()` reads it from the agent automatically. No need to set `max_turns` on `scenario.run()`. Backtracked turns count toward the budget, and early exit can end the test sooner if the objective is achieved.
</Callout>
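The budget rule described in the callout can be sketched as a small simulation. This is a toy model and not the library's implementation: `simulate_run`, `RunResult`, and the outcome labels `"refuse"`, `"hold"` and `"break"` are hypothetical names chosen for illustration. It only shows that backtracked turns still spend budget, that the cap is hard, and that achieving the objective ends the run early.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    turns_used: int
    backtracks: int
    achieved: bool


def simulate_run(total_turns, outcomes):
    """Toy model of a capped run; `outcomes` holds one label per turn."""
    turns_used = 0
    backtracks = 0
    for outcome in outcomes:
        if turns_used >= total_turns:
            break  # hard cap reached, no further turns are played
        turns_used += 1  # every turn spends budget, even if it is backtracked
        if outcome == "refuse":
            backtracks += 1  # exchange is dropped from history, budget is not refunded
        elif outcome == "break":
            return RunResult(turns_used, backtracks, True)  # early exit
    return RunResult(turns_used, backtracks, False)
```

In this model, three refusals in a row exhaust a three-turn budget before any further attempt is made, which is why the turn count is worth keeping generous.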
-
We recommend **50 turns** (`total_turns=50`) for thorough nightly adversarial coverage. Agents that hold at turn 1 often break by turn 20. The default of 30 turns is a good balance for per-PR checks — fewer turns miss vulnerabilities that only surface under sustained escalation pressure.
+
We recommend **50 turns** (`total_turns=50`) for thorough adversarial coverage. Agents that hold at turn 1 often break by turn 20 — fewer turns miss vulnerabilities that only surface under sustained escalation pressure. For faster per-PR runs, keep 50 turns and disable per-turn scoring (`score_responses=False`) instead of lowering the turn count.
---
@@ -169,9 +169,9 @@ On hard refusals, the attacker removes the refused exchange from conversation hi
attacker = scenario.RedTeamAgent.crescendo(
target="get the agent to reveal its system prompt", # required
| Attack objective |`target`|`target`|*required*| What the attacker tries to achieve. |
| Attacker model |`model`|`model`| global default | Generates attack messages every turn. |
| Planner/scorer model |`metaprompt_model`|`metapromptModel`| same as `model`| Plans attack once, scores responses per turn. |
-
| Total turns |`total_turns`|`totalTurns`|`30`| Number of attack turns. This is the single control for test duration — `max_turns` is ignored for scripted red team tests. 50 recommended for nightly. |
+
| Total turns |`total_turns`|`totalTurns`|`30`| Number of attack turns. This is the single control for test duration — `max_turns` is ignored for scripted red team tests. **50 recommended** for thorough coverage. |
| Per-turn scoring |`score_responses`|`scoreResponses`|`True`| Score responses 0–10 and adapt. |
target: "get the agent to ignore its instructions",
-
model: openai("gpt-4o"),
-
totalTurns: 30,
+
model: openai("gpt-5.4"),
+
totalTurns: 50,
});
```
@@ -556,7 +556,7 @@ Write `target` from the attacker's perspective — what does success look like?
## CI integration
-
Run red team tests alongside your functional test suite. We recommend 50 turns for nightly runs and 30 turns (the default) for per-PR checks.
+
Run red team tests alongside your functional test suite. We recommend **50 turns** for both per-PR and nightly runs — for faster per-PR runs, disable per-turn scoring instead of lowering the turn count.
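One way to keep both CI profiles on the same turn budget is to derive the settings from a single helper. This is a minimal sketch: `total_turns` and `score_responses` are the parameter names from the table in this diff, while `red_team_settings` and the `RED_TEAM_PROFILE` environment variable are hypothetical names introduced for this example.

```python
import os


def red_team_settings(profile):
    """Return keyword settings for a CI profile ("pr" or "nightly")."""
    # Both profiles keep the recommended 50 turns.
    settings = {"total_turns": 50, "score_responses": True}
    if profile == "pr":
        # Per-PR runs trade per-turn scoring for speed, not turn count.
        settings["score_responses"] = False
    elif profile != "nightly":
        raise ValueError(f"unknown profile: {profile!r}")
    return settings


if __name__ == "__main__":
    # RED_TEAM_PROFILE is an assumed variable name, not part of the library.
    profile = os.environ.get("RED_TEAM_PROFILE", "pr")
    print(red_team_settings(profile))
```

The resulting dictionary could then be unpacked into the attacker constructor alongside `target`, assuming the constructor accepts these names as keyword arguments, as the parameter table suggests.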
```python
# pyproject.toml
@@ -569,11 +569,11 @@ markers = [
:::code-group
```python [python]
-
# Per-PR: scoring off for speed (default 30 turns)