- Survey
- LLM as Attacker
- Transferability/Diversity/Dynamism
- Benchmark
- Multi-Turn Attack
- Mechanism and Vulnerability
- Safety Alignment
- Multi-Target
- Unlearning
- Bias
- Self-Evolution
- New Jailbreak Method
Publish Date | Title | Authors | Paper | Code
---|---|---|---|---
2024-12-08 | Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models | Ma Teng et al. | 2412.05934 | null
2024-02-29 [ICLR 2024] | Curiosity-driven Red-teaming for Large Language Models | Zhang-Wei Hong et al. | 2402.19464 | link
2024-06-07 [ICML 2024] | COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability | Xingang Guo et al. | 2402.08679 | link
2024-02-05 | Weak-to-Strong Jailbreaking on Large Language Models | Xuandong Zhao et al. | 2401.17256 | link