
RedViz: A Red-Teaming Visualization Framework


Large Language Models (LLMs) like GPT, Claude, and LLaMA power real-world applications yet remain vulnerable to adversarial prompts, known as jailbreaks, that bypass safety alignment to elicit harmful or policy-violating outputs. Existing red-teaming datasets, such as CohereAI's AYA Red Teaming and JailbreakBench, offer rich multilingual prompt collections, but research remains predominantly static, limited to categorical or linguistic aggregation. This leaves a critical gap: the absence of an interactive, unified environment for dynamic LLM safety evaluation.

We introduce RedViz, an interactive Streamlit dashboard that transforms static red-teaming datasets into a live platform for exploratory safety analysis. By integrating AYA Red Teaming, JailbreakBench, and open-source models (TinyLLaMA, LLaMA, etc.), the system enables practitioners to explore harm distributions, conduct live jailbreak stress-tests, quantify attack success rates, and interpret failure modes via attention maps and token-level entropy.

The modular pipeline, spanning data exploration, prompt testing, model inference with safety classification, and interpretability, empowers users to compare attack styles across languages, pinpoint high-risk vectors, and correlate internal uncertainty with unsafe generations in real time. Built with libraries like Hugging Face, pandas, Streamlit, and Plotly, the dashboard bridges theoretical red-teaming and practical safety engineering while maintaining simplicity. It equips researchers and developers with a visual, structured workflow for deriving actionable insights, ultimately fostering more robust, transparent, and safer LLM deployments.
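The token-level entropy signal mentioned above can be sketched as follows. This is a minimal, self-contained example (the function name and use of plain Python lists are illustrative assumptions, not the dashboard's actual code): a peaked next-token distribution means the model is confident, while a flat one signals uncertainty that may correlate with unsafe generations.

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution
    implied by a vector of raw logits."""
    # Softmax with the max-subtraction trick for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

# A peaked distribution (confident model) has low entropy;
# a flat one (uncertain model) approaches log(vocab_size).
confident = token_entropy([10.0, 0.0, 0.0, 0.0])
uncertain = token_entropy([1.0, 1.0, 1.0, 1.0])
```

In practice the logits would come from the model's output head at each generation step; averaging or plotting per-token entropy over a response gives the uncertainty trace the dashboard visualizes.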

System Design

Environment Setup

To set up the environment for this project, follow these steps:

conda create -n redteam python=3.10
conda activate redteam
pip install -r requirements.txt

To use Gemma from Hugging Face, you need an authentication token set up as an environment variable. You can do this by running the following command in your terminal:

huggingface-cli login

Running the Project

To run the project, execute the following command in your terminal:

streamlit run RedTeaming_Dashboard.py

About the Dataset

The Aya Red-teaming dataset is a human-annotated multilingual red-teaming dataset consisting of harmful prompts in 8 languages across 9 different categories of harm with explicit labels for "global" and "local" harm.

Languages: Arabic, English, Filipino, French, Hindi, Russian, Serbian, and Spanish

License: Apache 2.0

Loading dataset


Dataset Structure:

{
  'prompt': '...',
  'language': 'French',
  'harm_category': '["Discrimination & Injustice", "Bullying & Harassment"]',
  'global_or_local': 'local',
  'literal_translation': '...',
  'semantic_translation': None,
  'explanation': '...'
}

where:

- prompt: Original human curated prompt
- language: Language of the prompt
- harm_category: Category of harm which the prompt falls under
- global_or_local: Specifies whether the prompt is globally or locally harmful
- literal_translation: Word-for-word translation of the original prompt (None for English)
- semantic_translation: Translation by meaning of the original prompt (None for English)
- explanation: Explanation for why the prompt is harmful and/or globally/locally harmful
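Note that in the sample above, harm_category is stored as a JSON-encoded string rather than a native list, since a prompt can fall under several categories at once. A minimal sketch of handling this with pandas (the rows below are synthetic, mimicking the schema; this is not the dashboard's actual loading code):

```python
import json
import pandas as pd

# Synthetic rows mimicking the dataset schema (values are illustrative).
df = pd.DataFrame([
    {"prompt": "...", "language": "French",
     "harm_category": '["Discrimination & Injustice", "Bullying & Harassment"]',
     "global_or_local": "local"},
    {"prompt": "...", "language": "English",
     "harm_category": '["Profanity"]',
     "global_or_local": "global"},
])

# harm_category is a JSON-encoded string, so decode it into a real list,
# then explode so each (prompt, category) pair gets its own row.
df["harm_category"] = df["harm_category"].apply(json.loads)
counts = df.explode("harm_category")["harm_category"].value_counts()
```

After exploding, `counts` gives the per-category prompt counts that drive the harm-distribution charts.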

Harm Categories:
- Bullying & Harassment
- Discrimination & Injustice
- Graphic Material
- Harms of Representation Allocation & Quality of Service
- Hate Speech
- Non-consensual sexual content
- Profanity
- Self-harm
- Violence, threats & incitement
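The attack-success-rate metric mentioned earlier can be sketched as follows. The function name and the 'safe'/'unsafe' label format are illustrative assumptions; the actual safety classifier's output format may differ.

```python
def attack_success_rate(labels):
    """Fraction of model responses that a safety classifier flagged as
    unsafe. `labels` is a list of 'safe'/'unsafe' strings (an assumed,
    illustrative format)."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == "unsafe") / len(labels)

# e.g. 2 successful jailbreaks out of 5 attempts
asr = attack_success_rate(["safe", "unsafe", "safe", "unsafe", "safe"])
```

Computed per language or per harm category, this ratio is what lets the dashboard surface high-risk attack vectors.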
