This repository demonstrates the use of a prompt jailbreak to expose information within a system prompt. Specifically, we target any LLM hosted on HuggingFace Inference Endpoints. The standard example shows jailbreaks upon google/gemma-7b-it
.
- Execute
pip install -r requirements.txt
to install the necessary dependencies. - Set your
HF_TOKEN
environment variable in thejailbreak.py
file - Line 28. - Run the jailbreak with
python jailbreak.py
.
The expected output should be two arrays - one containing the original responses from the LLM, and another with the jailbreak applied.
This repo supports any LLM hosted on HuggingFace Inference. If you wish to change the target LLM, simply modify API_URL
on line 27 in jailbreak.py
. Furthermore, if you wish to send different user prompts or alter/change the jailbreak or system prompt, you can also find this at the top of the script.