Commit 93ba535
feat(red-team): structured attacker output — observation/strategy/reply JSON (#341)
* feat(red-team): structured attacker output — observation / strategy / reply JSON
Meta's GOAT paper (ICML 2025) has the attacker emit a structured chain of
thought at every turn: observation of the target's last response, strategy
(which technique it will use and why), then the actual reply. We were
letting the attacker emit free text — losing the reasoning signal, making
per-turn technique selection invisible to telemetry, and leaving no gate
against the attacker skipping the "Thought" step entirely.
This commit implements the paper's contract:
- New `JSON_OUTPUT_CONTRACT` module constant (Python `_red_team/base.py`,
TypeScript `red-team-strategy.ts`) appended to every attacker system
prompt. Instructs the attacker to emit
{"observation": "...", "strategy": "...", "reply": "..."}
and nothing else.
- Added to both `CrescendoStrategy` and `GoatStrategy` system prompts so
both strategies benefit.
- `RedTeamAgent._parse_attacker_output` (Python) / `parseAttackerOutput`
(TS) parses the attacker's raw output into `(reply, observation, strategy)`.
Strips markdown fences, handles malformed JSON, coerces non-string
fields. Fallback on parse failure: the whole raw response becomes the
`reply` and a WARN-level log fires — the scenario keeps running.
- Only `reply` reaches the target. `observation` and `strategy` are emitted
as OpenTelemetry span attributes (`red_team.reasoning.observation`,
`red_team.reasoning.strategy`, `red_team.reasoning.parse_failed`) so
dashboards can answer "which technique works against which target?" —
the paper's core selling point.
- The raw JSON output is kept in H_attacker so the attacker sees its own
format on subsequent turns (keeps the output shape consistent with the
system prompt's directive).
Paper fidelity: moves GOAT from ~85% to ~95% faithful.
Tests: 163 Python (+11) and 115 JS (+12) passing.
- Parser: well-formed JSON, code-fence stripping (```json and ```),
malformed JSON fallback, missing/empty reply fallback, non-object JSON
fallback, non-string field coercion, whitespace trimming.
- Contract: OUTPUT FORMAT section present in both GOAT and Crescendo
system prompts.
- Existing tests (which mock attacker with plain strings) continue to
pass via the graceful fallback path.
Closes langwatch/scenario#2142 (structured attacker output).
Closes #330 (GOAT technique telemetry) once consumers
wire the span attributes to dashboards.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
* refactor(red-team): scope JSON output contract to GOAT only
Crescendo does not emit structured output in Microsoft's Crescendo paper —
applying the JSON contract to both strategies was scope creep for a
GOAT-focused PR stack. Roll it back on Crescendo behind a new per-strategy
flag so the parser only runs when the attacker is actually instructed to
emit JSON.
Changes:
- Add `emits_structured_output` (Python) / `emitsStructuredOutput` (TS)
property on RedTeamStrategy. Default False/falsy. GoatStrategy overrides
to True.
- Remove `JSON_OUTPUT_CONTRACT` import and interpolation from
CrescendoStrategy in both Python and TS.
- Gate the parser in `call()`: run it only when the strategy's
`emits_structured_output` flag is set. Otherwise use raw attacker
output as the reply with no parsing and no telemetry spam.
- Update tests: GoatStrategy asserts OUTPUT FORMAT present + flag true;
CrescendoStrategy asserts OUTPUT FORMAT absent + flag falsy.
Tests: 164 Python (+1) / 116 JS (+1) passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
* fix(red-team): type-check Crescendo's optional emitsStructuredOutput via interface
CI typecheck in vitest-examples failed on:
expect(new CrescendoStrategy().emitsStructuredOutput).toBeUndefined();
because `emitsStructuredOutput` is declared on the RedTeamStrategy interface
as optional but never added to the Crescendo class — so direct access on the
concrete type is a TS2339 error under strict mode.
Access via the interface type so the optional property is visible.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>1 parent ee8231c commit 93ba535
File tree
8 files changed
+504
-28
lines changed- javascript/src/agents
- __tests__
- red-team
- python
- scenario
- _red_team
- tests
8 files changed
+504
-28
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | 3 | | |
| 4 | + | |
4 | 5 | | |
5 | | - | |
| 6 | + | |
6 | 7 | | |
7 | 8 | | |
8 | 9 | | |
| |||
863 | 864 | | |
864 | 865 | | |
865 | 866 | | |
| 867 | + | |
| 868 | + | |
| 869 | + | |
| 870 | + | |
| 871 | + | |
| 872 | + | |
| 873 | + | |
| 874 | + | |
| 875 | + | |
| 876 | + | |
| 877 | + | |
| 878 | + | |
| 879 | + | |
| 880 | + | |
| 881 | + | |
| 882 | + | |
| 883 | + | |
| 884 | + | |
| 885 | + | |
| 886 | + | |
| 887 | + | |
| 888 | + | |
| 889 | + | |
| 890 | + | |
| 891 | + | |
| 892 | + | |
| 893 | + | |
| 894 | + | |
| 895 | + | |
| 896 | + | |
| 897 | + | |
| 898 | + | |
| 899 | + | |
| 900 | + | |
| 901 | + | |
| 902 | + | |
| 903 | + | |
| 904 | + | |
| 905 | + | |
| 906 | + | |
| 907 | + | |
| 908 | + | |
| 909 | + | |
| 910 | + | |
| 911 | + | |
| 912 | + | |
| 913 | + | |
| 914 | + | |
| 915 | + | |
| 916 | + | |
| 917 | + | |
| 918 | + | |
| 919 | + | |
| 920 | + | |
| 921 | + | |
| 922 | + | |
| 923 | + | |
| 924 | + | |
| 925 | + | |
| 926 | + | |
| 927 | + | |
| 928 | + | |
| 929 | + | |
| 930 | + | |
| 931 | + | |
| 932 | + | |
| 933 | + | |
| 934 | + | |
| 935 | + | |
| 936 | + | |
| 937 | + | |
| 938 | + | |
| 939 | + | |
| 940 | + | |
| 941 | + | |
| 942 | + | |
| 943 | + | |
| 944 | + | |
| 945 | + | |
| 946 | + | |
| 947 | + | |
| 948 | + | |
| 949 | + | |
| 950 | + | |
| 951 | + | |
| 952 | + | |
| 953 | + | |
| 954 | + | |
| 955 | + | |
| 956 | + | |
| 957 | + | |
| 958 | + | |
| 959 | + | |
| 960 | + | |
| 961 | + | |
| 962 | + | |
| 963 | + | |
| 964 | + | |
| 965 | + | |
| 966 | + | |
| 967 | + | |
| 968 | + | |
| 969 | + | |
| 970 | + | |
| 971 | + | |
| 972 | + | |
| 973 | + | |
| 974 | + | |
| 975 | + | |
| 976 | + | |
| 977 | + | |
| 978 | + | |
| 979 | + | |
| 980 | + | |
| 981 | + | |
| 982 | + | |
| 983 | + | |
| 984 | + | |
| 985 | + | |
| 986 | + | |
| 987 | + | |
| 988 | + | |
| 989 | + | |
| 990 | + | |
| 991 | + | |
866 | 992 | | |
867 | 993 | | |
868 | 994 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
13 | | - | |
| 13 | + | |
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
| |||
43 | 43 | | |
44 | 44 | | |
45 | 45 | | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
46 | 49 | | |
47 | 50 | | |
48 | 51 | | |
| |||
81 | 84 | | |
82 | 85 | | |
83 | 86 | | |
84 | | - | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
85 | 90 | | |
86 | 91 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
511 | 511 | | |
512 | 512 | | |
513 | 513 | | |
514 | | - | |
515 | | - | |
| 514 | + | |
| 515 | + | |
| 516 | + | |
| 517 | + | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
| 531 | + | |
| 532 | + | |
| 533 | + | |
| 534 | + | |
| 535 | + | |
| 536 | + | |
| 537 | + | |
| 538 | + | |
| 539 | + | |
| 540 | + | |
516 | 541 | | |
517 | | - | |
518 | | - | |
519 | | - | |
520 | | - | |
521 | | - | |
522 | | - | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
523 | 546 | | |
524 | 547 | | |
525 | 548 | | |
526 | 549 | | |
527 | | - | |
| 550 | + | |
528 | 551 | | |
529 | 552 | | |
530 | 553 | | |
531 | 554 | | |
532 | 555 | | |
533 | 556 | | |
534 | 557 | | |
535 | | - | |
| 558 | + | |
536 | 559 | | |
537 | 560 | | |
538 | 561 | | |
539 | | - | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
| 565 | + | |
| 566 | + | |
| 567 | + | |
540 | 568 | | |
541 | 569 | | |
542 | 570 | | |
543 | 571 | | |
| 572 | + | |
| 573 | + | |
| 574 | + | |
| 575 | + | |
| 576 | + | |
| 577 | + | |
| 578 | + | |
| 579 | + | |
| 580 | + | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
| 591 | + | |
| 592 | + | |
| 593 | + | |
| 594 | + | |
| 595 | + | |
| 596 | + | |
| 597 | + | |
| 598 | + | |
| 599 | + | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
| 603 | + | |
| 604 | + | |
| 605 | + | |
| 606 | + | |
| 607 | + | |
| 608 | + | |
| 609 | + | |
| 610 | + | |
| 611 | + | |
| 612 | + | |
| 613 | + | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
| 619 | + | |
| 620 | + | |
| 621 | + | |
| 622 | + | |
| 623 | + | |
| 624 | + | |
| 625 | + | |
| 626 | + | |
544 | 627 | | |
545 | 628 | | |
546 | 629 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
1 | 25 | | |
2 | 26 | | |
3 | 27 | | |
| |||
43 | 67 | | |
44 | 68 | | |
45 | 69 | | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
46 | 83 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
6 | 28 | | |
7 | 29 | | |
8 | 30 | | |
| |||
63 | 85 | | |
64 | 86 | | |
65 | 87 | | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
66 | 102 | | |
67 | 103 | | |
68 | 104 | | |
| |||
0 commit comments