Awesome-Red-Teaming

Table of Contents

Survey

LLM as Attacker

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-12-08 | Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models | Ma Teng et al. | 2412.05934 | null |
| 2024-02-29 [ICLR 2024] | Curiosity-driven Red-teaming for Large Language Models | Zhang-Wei Hong et al. | 2402.19464 | link |
| 2024-06-07 [ICML 2024] | COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability | Xingang Guo et al. | 2402.08679 | link |
| 2024-02-05 [ICLR ur 66663] | Weak-to-Strong Jailbreaking on Large Language Models | Xuandong Zhao et al. | 2401.17256 | link |

Transferability/Diversity/Dynamism

Benchmark

Multi-Turn Attack

Mechanism and vulnerability

Safety Alignment

Multi-Target

Unlearning

Bias

Self-Evolution

New Jailbreak Method