
chen37058/Awesome_Red_Teaming


Awesome-Red-Teaming

Table of Contents

Survey

LLM as Attacker

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-12-08 | Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models | Ma Teng et al. | 2412.05934 | N/A |
| 2024-02-29 [ICLR 2024] | Curiosity-driven Red-teaming for Large Language Models | Zhang-Wei Hong et al. | 2402.19464 | link |
| 2024-06-07 [ICML 2024] | COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability | Xingang Guo et al. | 2402.08679 | link |
| 2024-02-05 [ICLR ur 66663] | Weak-to-Strong Jailbreaking on Large Language Models | Xuandong Zhao et al. | 2401.17256 | link |

Transferability/Diversity/Dynamism

Benchmark

Multi-Turn Attack

Mechanism and Vulnerability

Safety Alignment

Multi-Target

Unlearning

Bias

Self-Evolution

New Jailbreak Method
