Awesome-Red-Teaming

Publish Date	Title	Authors	PDF	Code
2024-12-08	Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models	Ma Teng et al.	2412.05934	null
2024-02-29 [ICLR 2024]	Curiosity-driven Red-teaming for Large Language Models	Zhang-Wei Hong et al.	2402.19464	link
2024-06-07 [ICML 2024]	COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability	Xingang Guo et al.	2402.08679	link
2024-02-05 [ICLR ur 66663]	Weak-to-Strong Jailbreaking on Large Language Models	Xuandong Zhao et al.	2401.17256	link

Transferability/Diversity/Dynamism

Benchmark

Multi-Turn Attack

Mechanism and vulnerability

Safety Alignment

Multi-Target

Unlearning

Bias

Self-Evolution

New Jailbreak Method

Table of Contents

Survey

LLM as Attacker

Transferability/Diversity/Dynamism

Benchmark

Multi-Turn Attack

Mechanism and vulnerability

Safety Alignment

Multi-Target

Unlearning

Bias

Self-Evolution

New Jailbreak Method
