Sizhe Chen, Julien Piet, Chawin Sitawarin, David Wagner
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications, which perform text-based tasks by utilizing their advanced language capabilities. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat to LLM-integrated applications, where an LLM input contains a trusted prompt (instruction) and an untrusted data (user documents, web retrieval, results from API calls, etc) with potentially injected instructions (Ignore previous instructions and …) to arbitrarily manipulate the LLM.
We introduce structured queries, a general approach to tackle this problem. Structured queries separate prompts and data into two channels. We implement a system that supports structured queries. This system is made of (1) a secure front-end that formats a prompt and user data into a special format, and (2) a specially trained LLM that can produce highquality outputs from these inputs. The LLM is trained using a novel fine-tuning strategy: we convert a base (non-instructiontuned) LLM to a structured instruction-tuned model that will only follow instructions in the prompt portion of a query. To do so, we augment standard instruction tuning datasets with examples that also include instructions in the data portion of the query, and fine-tune the model to ignore these. Our system significantly improves resistance to prompt injection attacks, with little or no impact on utility.
- A more flexible and powerful implementation that is actively maintained is available here. Part of this repo comes from Alpaca.
- The training requires 4 GPUs, and the testing requires 1 GPU. The code has been tested on 80GB A100s on a slurm cluster.
- Install environment dependencies
git clone https://github.com/Sizhe-Chen/StruQ
cd StruQ
conda create -n struq python==3.10
- Install package dependencies
pip install -r requirements.txt
- Download data dependencies
python setup.py
- Configure openai dependencies for utility evaluation: create
data/openai_configs.yaml
followingdata/openai_configs_examle.yaml
- [optional] Download trained models to play. This command downloads 4 Undefended / StruQ models (llama-7b, Mistral-7B-v0.1).
python setup.py --model
- [optional] Automatic and efficient testing by specifying your training/testing slurm configurations in the
slurm_prefix
variables inrun.py
, which generates slurm scripts, run them, and delete them. It supports an additional thread fromnohup
to moniter the training, and automatically tests after the training finishes if--do_test
is specified
- The
run.py
script automatically train multiple models and test them by generating slurm scripts, run them, and delete them. nohup python -u run.py -m huggyllama/llama-7b -train SpclSpclSpcl_NaiveCompletion -test none naive ignore completion_real gcg > struq.log 2>&1 &
stands for training the model with three special delimiters ([MARK] [INST] [COLN]) and Naive+Completion attacks (StruQ-defended model), and test utility and naive, ignore, completion_real, gcg attacks. You may replace NaiveCompletion with None to train an undefended model.- Training data size is always 52K, including 26K data that is guaranteed to be unchanged. The data without an input in the remaining 26K samples is also unchanged. Those with an input is prompt-injected by another random sample, with injection method Naive:Completion=1:1
- Running
run.py
should trigger the testing (on utility and security) at the end when the model is saved. Logs are saved to the model path. - Run only testing by
python test.py -m huggyllama/llama-7b_SpclSpclSpcl_NaiveCompletion_2024-02-02-00-00-00 -a none naive ignore completion_real gcg
. Log GCG bypython log.py -m