red team: edge cases — empty techniques list, empty metaprompt_plan, scorer failure cascade #333

@Aryansharma28

Three small edge-case issues, batched

1. `techniques=[]` silently disables injection

`red_team_agent.py:875–877`:

```python
if (
    self._injection_probability > 0
    and self._techniques
    and random.random() < self._injection_probability
):
    ...  # injection branch, elided in this excerpt
```

`and self._techniques` is a truthiness check, so an empty list evaluates to false. A user who passes `injection_probability=0.5, techniques=[]` gets no injection and no warning, and may believe their custom-techniques setup is working when it is not.

Fix: validate in `__init__`:

```python
# techniques=None is fine: it falls back to DEFAULT_TECHNIQUES.
# Only an explicitly empty list is rejected. The check must not use
# `not techniques`, which would also match [] and skip the error.
if injection_probability > 0 and techniques is not None and len(techniques) == 0:
    raise ValueError("techniques cannot be empty when injection_probability > 0")
```
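A minimal sketch of how this check could sit in a constructor. The class name `RedTeamAgentSketch`, the placeholder contents of `DEFAULT_TECHNIQUES`, and the `None` fallback are assumptions for illustration, not the actual code in `red_team_agent.py`.

```python
from typing import Optional, Sequence

# Placeholder values (assumption); the real DEFAULT_TECHNIQUES lives in the library.
DEFAULT_TECHNIQUES = ("technique_a", "technique_b")


class RedTeamAgentSketch:
    """Hypothetical stand-in for the agent, showing only the validation."""

    def __init__(
        self,
        injection_probability: float = 0.0,
        techniques: Optional[Sequence[str]] = None,
    ) -> None:
        # An explicitly empty list with injection enabled is rejected up front.
        if (
            injection_probability > 0
            and techniques is not None
            and len(techniques) == 0
        ):
            raise ValueError(
                "techniques cannot be empty when injection_probability > 0"
            )
        self._injection_probability = injection_probability
        # None means "use the defaults" (assumed behavior).
        self._techniques = list(
            DEFAULT_TECHNIQUES if techniques is None else techniques
        )
```

With this shape, `techniques=[]` is still accepted when `injection_probability` is 0, since nothing would be injected anyway; whether that case should also be rejected is a design choice left open here.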

2. Empty `metaprompt_plan` produces a malformed system prompt

`goat.py:199–200`:

```python
ATTACK PLAN:
{metaprompt_plan}
```

If `_generate_attack_plan` returns `""` (for example, the LLM returned only whitespace and it was stripped), the attacker LLM sees an empty section labeled "ATTACK PLAN:". A labeled empty section is worse than no section: it tells the attacker its plan is "nothing".

Fix: in `build_system_prompt`, omit the section if `metaprompt_plan.strip() == ""`. Same for Crescendo.
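One way the omission could look, as a sketch only. The function name `build_system_prompt_sketch` and its two-argument signature are hypothetical; the real `build_system_prompt` in `goat.py` likely takes different parameters and builds a longer template.

```python
def build_system_prompt_sketch(base_prompt: str, metaprompt_plan: str) -> str:
    """Append the ATTACK PLAN section only when the plan has content."""
    plan = (metaprompt_plan or "").strip()
    if not plan:
        # No plan: return the prompt without an empty labeled section.
        return base_prompt
    return f"{base_prompt}\n\nATTACK PLAN:\n{plan}"
```

Building the section conditionally, rather than formatting `{metaprompt_plan}` into a fixed template, keeps the label and its content together so one cannot appear without the other.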

3. Scorer failures cascade silently

`red_team_agent.py:643–645`:

```python
except Exception as exc:
    logger.debug("Scorer failed (turn %d): %s", current_turn, exc)
    score, adaptation = 0, "continue current approach"
```

If the scorer model is rate-limited or down, every turn scores 0, early exit never fires, and the marathon runs the full `total_turns` budget. The failure is logged at DEBUG only, so it is invisible under a default logging configuration.

Fix: track consecutive failures; bump to WARN level after N (e.g. 3) and add a span attribute `red_team.scorer_failures: N` so dashboards can alert.
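A rough sketch of the failure tracking, under stated assumptions: `ScorerFailureTracker` is a hypothetical helper, the `span` argument is any object with a `set_attribute(key, value)` method (OpenTelemetry spans expose one), and the attribute is assumed to carry the running total of failures rather than the consecutive count.

```python
import logging

logger = logging.getLogger("red_team_sketch")  # hypothetical logger name


class ScorerFailureTracker:
    """Hypothetical helper: counts scorer failures and escalates the log level."""

    def __init__(self, warn_after: int = 3) -> None:
        self.warn_after = warn_after  # the "N" from the proposed fix
        self.consecutive = 0
        self.total = 0

    def record_success(self) -> None:
        # A successful score resets the consecutive streak, not the total.
        self.consecutive = 0

    def record_failure(self, turn: int, exc: Exception, span=None) -> int:
        """Log the failure, update the span if given, and return the level used."""
        self.consecutive += 1
        self.total += 1
        level = (
            logging.WARNING
            if self.consecutive >= self.warn_after
            else logging.DEBUG
        )
        logger.log(
            level,
            "Scorer failed (turn %d, %d in a row): %s",
            turn,
            self.consecutive,
            exc,
        )
        if span is not None:
            span.set_attribute("red_team.scorer_failures", self.total)
        return level
```

In the `except` branch shown above, the handler would call `record_failure(current_turn, exc, span)` before assigning the fallback score, and the success path would call `record_success()`.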

Acceptance

  • Each of the three has a regression test
  • No production-path `except Exception` handler logs at DEBUG only

Discovered during PR #306 review.
