Three small edge-case issues, batched
1. `techniques=[]` silently disables injection
`red_team_agent.py:875–877`:
```python
if (
self._injection_probability > 0
and self._techniques
and random.random() < self._injection_probability
):
```
`and self._techniques` is a truthiness check. User passing `injection_probability=0.5, techniques=[]` gets no injection at all, no warning. They will think their custom-techniques setup is working when it's not.
Fix: validate at `init`:
```python
if injection_probability > 0 and not techniques and DEFAULT_TECHNIQUES:
pass # fine — falls back to DEFAULT_TECHNIQUES
elif injection_probability > 0 and techniques is not None and len(techniques) == 0:
raise ValueError("techniques cannot be empty when injection_probability > 0")
```
2. Empty `metaprompt_plan` produces a malformed system prompt
`goat.py:199–200`:
```python
ATTACK PLAN:
{metaprompt_plan}
```
If `_generate_attack_plan` returns `""` (LLM returned whitespace, was stripped), the attacker LLM sees a literal empty section labeled "ATTACK PLAN:". A labeled empty section is worse than no section — it tells the attacker its plan is "nothing".
Fix: in `build_system_prompt`, omit the section if `metaprompt_plan.strip() == ""`. Same for Crescendo.
3. Scorer failures cascade silently
`red_team_agent.py:643–645`:
```python
except Exception as exc:
logger.debug("Scorer failed (turn %d): %s", current_turn, exc)
score, adaptation = 0, "continue current approach"
```
If the scorer model is rate-limited or down, every turn scores 0, early-exit never fires, the marathon runs the full `total_turns` budget. Logged at DEBUG only — invisible by default.
Fix: track consecutive failures; bump to WARN level after N (e.g. 3) and add a span attribute `red_team.scorer_failures: N` so dashboards can alert.
Acceptance
Discovered during PR #306 review.
Three small edge-case issues, batched
1. `techniques=[]` silently disables injection
`red_team_agent.py:875–877`:
```python
if (
self._injection_probability > 0
and self._techniques
and random.random() < self._injection_probability
):
```
`and self._techniques` is a truthiness check. User passing `injection_probability=0.5, techniques=[]` gets no injection at all, no warning. They will think their custom-techniques setup is working when it's not.
Fix: validate at `init`:
```python
if injection_probability > 0 and not techniques and DEFAULT_TECHNIQUES:
pass # fine — falls back to DEFAULT_TECHNIQUES
elif injection_probability > 0 and techniques is not None and len(techniques) == 0:
raise ValueError("techniques cannot be empty when injection_probability > 0")
```
2. Empty `metaprompt_plan` produces a malformed system prompt
`goat.py:199–200`:
```python
ATTACK PLAN:
{metaprompt_plan}
```
If `_generate_attack_plan` returns `""` (LLM returned whitespace, was stripped), the attacker LLM sees a literal empty section labeled "ATTACK PLAN:". A labeled empty section is worse than no section — it tells the attacker its plan is "nothing".
Fix: in `build_system_prompt`, omit the section if `metaprompt_plan.strip() == ""`. Same for Crescendo.
3. Scorer failures cascade silently
`red_team_agent.py:643–645`:
```python
except Exception as exc:
logger.debug("Scorer failed (turn %d): %s", current_turn, exc)
score, adaptation = 0, "continue current approach"
```
If the scorer model is rate-limited or down, every turn scores 0, early-exit never fires, the marathon runs the full `total_turns` budget. Logged at DEBUG only — invisible by default.
Fix: track consecutive failures; bump to WARN level after N (e.g. 3) and add a span attribute `red_team.scorer_failures: N` so dashboards can alert.
Acceptance
Discovered during PR #306 review.