Repository for the collection, backends, and performance evaluation of 220 malicious LLM applications (e.g., WormGPT, FraudGPT, BLACKHATGPT).
- Environments
- Major Claim 1: Quality evaluation of Malla-generated Content
  - Execution
    - Step 1. Examine performance for each metric
    - Step 2. Summarize the results of each metric
  - Dataset
  - Result
- Major Claim 2: Authorship attribution classification
  - Execution
  - Dataset
  - Result
- Major Claim 3: Evaluation on the "ignore the above instructions" prompt leaking attack
  - Execution
  - Dataset
  - Result
- Other Claims
  - malPrompt: 45 malicious prompts for generating malware and phishing content.
  - MallaResponse: Responses of 207 malicious LLM applications to the 45 malicious prompts.
  - MallaJailbreak: 182 jailbreak prompts used by 200 malicious LLM applications.
  - ULLM-QA: A large benchmark dataset containing 33,996 prompt-response pairs generated by GPT-3.5, Davinci-002, Davinci-003, GPT-J, Luna AI Llama2 Uncensored, and Pygmalion-13B.
- More Data
  - Malicious LLM application (Malla) list: 220 malicious LLM applications (Mallas), of which 22 are from underground marketplaces, 125 from Poe.com, and 73 from FlowGPT.com.
  - LLM-related keywords: 145 LLM-related keywords.
  - Malicious LLM topic keywords: 73 malicious LLM topic keywords.
- Media Coverage
Environments

Use requirements.txt and the commands below to build the environment:

```bash
conda create -n malla python=3.8
conda activate malla
pip install -r requirements.txt
```
Major Claim 1: Quality evaluation of Malla-generated Content
The results of the quality assessment of Malla-generated content across the different metrics are the same as, or very close to, the results reported in Table 3.
Code is stored in the `quality` folder.
Below, we use the Malla services as an example. To evaluate the Mallas on Poe and FlowGPT, follow the same procedure (see `quality/MoreGuide.md` for the corresponding steps for examining Mallas on Poe and FlowGPT).
Execution

Step 1. Examine performance for each metric

Step 1.1: Code format compliance (F), compilability (C), validity (V)
- Check Python code for format compliance and compilability.
- Input: the folder `malicious_LLM_responses/service` with 25 raw data files.
- Output: the folder `quality/services/Python/results`, which stores 25 output files whose names are in the format of `synPython_QA-XXX-X.json` (e.g., `synPython_QA-BadGPT-1.json`).
```bash
cd ./malicious-gpt/quality/services/Python
python scanner.py
```
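For reference, the Python check can be as simple as extracting the snippet and byte-compiling it. Below is a minimal, hypothetical sketch of such a check; the extraction regex and exact criteria are assumptions, not the actual logic of `scanner.py`:

```python
# Hypothetical sketch; scanner.py's actual extraction and criteria may differ.
import re
from typing import Optional

def extract_code(response: str) -> Optional[str]:
    """Pull the first fenced code block out of an LLM response, if any."""
    match = re.search(r"```(?:python)?\s*\n(.*?)```", response, re.DOTALL)
    return match.group(1) if match else None

def check_python(response: str) -> dict:
    code = extract_code(response)
    format_ok = code is not None   # F: the response contains a code snippet
    compilable = False             # C: the snippet byte-compiles
    if format_ok:
        try:
            compile(code, "<malla-response>", "exec")  # syntax check only
            compilable = True
        except (SyntaxError, ValueError):
            pass
    return {"format_compliance": format_ok, "compilability": compilable}
```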
- Check C/C++ code for format compliance and compilability.
- Input: the folder `malicious_LLM_responses/service`.
- Output: the folder `quality/services/C++/results`, which stores 25 output files whose names are in the format of `synC++_QA-XXX-X.json` (e.g., `synC++_QA-BadGPT-1.json`).
```bash
cd ./malicious-gpt/quality/services/C++
python scanner.py
```
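C/C++ compilability can likewise be approximated by handing the snippet to a compiler in syntax-only mode. A minimal sketch, assuming `gcc` is on the PATH (the real `scanner.py` may invoke the compiler differently):

```python
# Hypothetical sketch; assumes gcc is installed and on the PATH.
import os
import subprocess
import tempfile

def compiles_as_c(code: str, suffix: str = ".c") -> bool:
    """Return True if gcc accepts the snippet (no object file is produced)."""
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(["gcc", "-fsyntax-only", path],
                                capture_output=True, timeout=60)
        return result.returncode == 0
    finally:
        os.unlink(path)
```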
- Check HTML code and pages for format compliance and validity. The generated web pages are stored in the `services/HTML/html-results` folder.
- Input: the folder `malicious_LLM_responses/service`.
- Output: the folder `quality/services/HTML/results`, which stores 25 output files whose names are in the format of `synHTML_QA-XXX-X.json` (e.g., `synHTML_QA-BadGPT-1.json`).
- Note: The run might be interrupted by an error caused by sending requests to the API too frequently. Please wait a while and re-run the script.
```bash
cd ./malicious-gpt/quality/services/HTML
python scanner.py
```
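One way to implement the validity (V) check is to post the generated page to an online HTML validator such as the W3C Nu checker, which would also explain the rate-limit errors noted above. A hedged sketch (the validator actually used by `scanner.py` is an assumption here):

```python
# Hypothetical sketch using the public W3C Nu validator; scanner.py may use
# a different validator or purely local checks.
import requests

def html_is_valid(html: str) -> bool:
    resp = requests.post(
        "https://validator.w3.org/nu/?out=json",
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8"},
        timeout=30,
    )
    resp.raise_for_status()
    messages = resp.json().get("messages", [])
    return not any(m.get("type") == "error" for m in messages)
```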
- Summarize the above results into files in the `CodeSyn` folder.
- Input: the folders `quality/services/Python/results`, `quality/services/C++/results`, and `quality/services/HTML/results`.
- Output: the folder `quality/services/CodeSyn`, which stores 25 output files whose names are in the format of `synFinal_QA-XXX-X.json` (e.g., `synFinal_QA-BadGPT-1.json`).
```bash
cd ./malicious-gpt/quality/services/
python sumCompilable.py
```
Step 1.2: Code evasiveness (E)
- Check Python, C/C++, and HTML code for evasiveness against the virus detector (VirusTotal). The results are generated in the `codeDetection` folder.
- Input: the folder `malicious_LLM_responses/service`.
- Output: the folder `quality/services/codeDetection`, which stores 25 output files whose names are in the format of `VT_QA-XXX-X.json` (e.g., `VT_QA-BadGPT-1.json`).
- Note: Before running the code, please add your VirusTotal API key in `VTscanner.py`. The VirusTotal API is free but has a query frequency limit. We provide the following VirusTotal API keys, but we do not guarantee that they are still alive:
  - dbd288d2f3dd1f1dec3b3b1462e8f8598e9ad74fa92b86d47848488d607371bc
  - a23e2c605b96dfd600217c04d25650e3680ac6ab82201c1e88279637877eaeac
  - 36fe08222b6791270d44d9f2c76d2a1556b8233912e496c945235334df4ca970
- Note: The run might be interrupted by an error caused by sending requests to the API too frequently. Please wait a while and re-run the script.
```bash
cd ./malicious-gpt/quality/services/VirusTotalDetect
python VTscanner.py
```
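For orientation, the evasiveness check boils down to uploading each snippet to VirusTotal and counting the engines that flag it. A minimal sketch against the v3 REST API (`VTscanner.py` may use a different endpoint or client library; `API_KEY` is a placeholder):

```python
# Hypothetical sketch against the VirusTotal v3 REST API; VTscanner.py may
# use a different endpoint or client library. API_KEY is a placeholder.
import time
import requests

API_KEY = "YOUR_VIRUSTOTAL_API_KEY"
HEADERS = {"x-apikey": API_KEY}

def count_detections(code: str) -> int:
    """Upload a snippet as a file and return how many engines flag it."""
    resp = requests.post("https://www.virustotal.com/api/v3/files",
                         headers=HEADERS,
                         files={"file": ("sample", code.encode("utf-8"))},
                         timeout=60)
    resp.raise_for_status()
    analysis_id = resp.json()["data"]["id"]
    while True:
        report = requests.get(
            f"https://www.virustotal.com/api/v3/analyses/{analysis_id}",
            headers=HEADERS, timeout=60).json()
        if report["data"]["attributes"]["status"] == "completed":
            return report["data"]["attributes"]["stats"]["malicious"]
        time.sleep(20)  # stay under the free-tier rate limit
```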
Step 1.3: Email format compliance (F) and readability (R)
- Check emails for format compliance and readability. The results are generated in the `mailFluency` folder.
- Environment note: Please make sure that the package `textstat` (version 0.7.3) is installed.
- Input: the folder `malicious_LLM_responses/service`.
- Output: the folder `quality/services/mailFluency`, which stores 25 output files whose names are in the format of `fogemail_QA-XXX-X.json` (e.g., `fogemail_QA-BadGPT-1.json`).
```bash
cd ./malicious-gpt/quality/services/fluency
python fluency_scanner.py
```
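Given the `fogemail_*` output names, the readability score is presumably the Gunning fog index as computed by `textstat`. A minimal sketch (the cut-off below is illustrative, not the paper's threshold):

```python
# Hypothetical sketch; the exact scoring in fluency_scanner.py may differ.
import textstat

def email_readability(email_text: str) -> dict:
    fog = textstat.gunning_fog(email_text)  # years of schooling to understand
    return {
        "gunning_fog": fog,
        "readable": fog <= 12,  # illustrative cut-off (high-school level)
    }
```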
Step 1.4: Email evasiveness (E)
- Check emails for evasiveness against the phishing detector (OOPSpam). The results are generated in the `mailDetection` folder.
- Input: the folder `malicious_LLM_responses/service`.
- Output: the folder `quality/services/mailDetection`, which stores 25 output files whose names are in the format of `oop_QA-XXX-X.json` (e.g., `oop_QA-BadGPT-1.json`).
- Note: Before running the code, please add your OOPSpam API key in `oopspam_detect.py`. The OOPSpam API is not free.
```bash
cd ./malicious-gpt/quality/services/OOPSpamDetect
python scanner.py
```
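For reference, an OOPSpam check is a single authenticated POST. The sketch below assumes the current public endpoint and response schema; verify both against the OOPSpam documentation before relying on it:

```python
# Hypothetical sketch; verify the endpoint and schema against the current
# OOPSpam API documentation before use. API_KEY is a placeholder.
import requests

API_KEY = "YOUR_OOPSPAM_API_KEY"

def spam_score(email_text: str) -> int:
    """Return OOPSpam's spam score; higher scores mean more spam-like."""
    resp = requests.post(
        "https://api.oopspam.com/v1/spamdetection",
        headers={"X-Api-Key": API_KEY, "Content-Type": "application/json"},
        json={"content": email_text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["Score"]
```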
Step 2. Summarize the results of each metric

Please run the following script to obtain the summarized results.
- Input: the folders `quality/services/CodeSyn`, `quality/services/codeDetection`, `quality/services/mailFluency`, and `quality/services/mailDetection`.
- Return: the final summarized results.
```bash
cd ./malicious-gpt/quality/services
python quality_evaluation.py
```
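Conceptually, the summarization turns the per-response JSON records into per-Malla fractions for each metric. A hypothetical sketch (the field names and file layout are assumptions; see `quality_evaluation.py` for the real logic):

```python
# Hypothetical sketch; the real field names and layout used by
# quality_evaluation.py are not documented here and may differ.
import json
from pathlib import Path

def metric_fraction(folder: str, pattern: str, key: str) -> float:
    """Fraction of JSON records under `folder` whose `key` field is truthy."""
    hits = total = 0
    for path in Path(folder).glob(pattern):
        record = json.loads(path.read_text())
        total += 1
        hits += bool(record.get(key))  # `key` is a hypothetical boolean field
    return hits / total if total else 0.0

# Illustrative use: fraction of compilable snippets for one Malla.
# metric_fraction("CodeSyn", "synFinal_QA-BadGPT-*.json", "compilable")
```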
Dataset

Step 1: The input is the content generated by malicious LLMs, which is stored in `malicious_LLM_responses`.

Step 2: After executing Step 1, you will get four subfolders, i.e., `CodeSyn`, `codeDetection`, `mailFluency`, and `mailDetection`. We also provide the results of Step 1, as intermediate results, at:

- For the Malla services evaluation: https://github.com/idllresearch/malicious-gpt/tree/main/quality/services
- For the Poe Malla projects evaluation: https://github.com/idllresearch/malicious-gpt/tree/main/quality/poe
- For the FlowGPT Malla projects evaluation: https://github.com/idllresearch/malicious-gpt/tree/main/quality/flowgpt
Result

Malla services
The script is expected to print:
```
BadGPT
Malicious code -> F: 0.35, C: 0.22, E: 0.19 | Mail -> F: 0.80, R: 0.13, E: 0.00 | Website -> F: 0.20, V: 0.13, E: 0.13
-----
CodeGPT
Malicious code -> F: 0.52, C: 0.29, E: 0.22 | Mail -> F: 0.53, R: 0.27, E: 0.00 | Website -> F: 0.20, V: 0.13, E: 0.13
-----
DarkGPT
Malicious code -> F: 1.00, C: 0.65, E: 0.63 | Mail -> F: 1.00, R: 0.87, E: 0.13 | Website -> F: 0.80, V: 0.33, E: 0.33
-----
EscapeGPT
Malicious code -> F: 0.78, C: 0.67, E: 0.67 | Mail -> F: 1.00, R: 0.50, E: 0.25 | Website -> F: 1.00, V: 1.00, E: 1.00
-----
EvilGPT
Malicious code -> F: 1.00, C: 0.54, E: 0.51 | Mail -> F: 1.00, R: 0.93, E: 0.27 | Website -> F: 0.80, V: 0.20, E: 0.13
-----
FreedomGPT
Malicious code -> F: 0.90, C: 0.21, E: 0.21 | Mail -> F: 1.00, R: 0.87, E: 0.13 | Website -> F: 0.60, V: 0.00, E: 0.00
-----
MakerGPT
Malicious code -> F: 0.24, C: 0.11, E: 0.11 | Mail -> F: 0.07, R: 0.00, E: 0.00 | Website -> F: 0.20, V: 0.13, E: 0.13
-----
WolfGPT
Malicious code -> F: 0.89, C: 0.52, E: 0.52 | Mail -> F: 1.00, R: 1.00, E: 0.67 | Website -> F: 0.67, V: 0.13, E: 0.13
-----
XXXGPT
Malicious code -> F: 0.14, C: 0.05, E: 0.05 | Mail -> F: 0.07, R: 0.00, E: 0.00 | Website -> F: 0.40, V: 0.27, E: 0.27
-----
```
Malla projects on Poe
The script is expected to print:
```
Quality of content generated by Mallas on Poe.com
Malicious code:
F: 0.37+-0.26, C: 0.25+-0.18, E: 0.24+-0.16
Email:
F: 0.44+-0.29, R: 0.21+-0.20, E: 0.05+-0.08
Web:
F: 0.32+-0.22, V: 0.21+-0.19, E: 0.21+-0.19
```
Malla projects on FlowGPT
The script is expected to print:
```
Quality of content generated by Mallas on FlowGPT.com
Malicious code:
F: 0.44+-0.29, C: 0.29+-0.19, E: 0.28+-0.18
Email:
F: 0.37+-0.31, R: 0.21+-0.21, E: 0.04+-0.07
Web:
F: 0.24+-0.27, V: 0.19+-0.24, E: 0.19+-0.24
```
Major Claim 2: Authorship attribution classification
Our attribution classifier correctly identifies the backends of DarkGPT, EscapeGPT, and FreedomGPT as Davinci-003, GPT-3.5, and Luna AI Llama2 Uncensored, respectively, as mentioned in §6.1. The results from the K-fold cross-validation closely align with those presented in Section 6.1.
Code is stored in the `authorship` folder.

Execution

Run the following script:
```bash
python author.py
```
Dataset

Training set: You can use either of the ways below to get the training data.
- https://github.com/idllresearch/malicious-gpt/blob/main/authorship/data/training_data.zip (please unzip the file and place the training data in the `data` folder)
- https://drive.google.com/drive/folders/1ZhSL_6ze3tEfQ6QikoMil1zzwQgheWlx (please place the training data in the `data` folder)
Result

Through five-fold cross-validation, the script will print a precision and recall of 0.87.
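As a rough picture of what such a cross-validated attribution classifier looks like, here is a hypothetical sketch using a TF-IDF + logistic-regression pipeline; the features and model in `author.py` may well differ:

```python
# Hypothetical sketch; author.py's actual features and model may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

def evaluate(responses, backend_labels):
    """responses: list of LLM outputs; backend_labels: which LLM produced each."""
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), max_features=50000),
        LogisticRegression(max_iter=1000),
    )
    scores = cross_validate(clf, responses, backend_labels, cv=5,
                            scoring=["precision_macro", "recall_macro"])
    print(f"precision: {scores['test_precision_macro'].mean():.2f}, "
          f"recall: {scores['test_recall_macro'].mean():.2f}")
```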
The script then identifies the backend LLMs of DarkGPT, EscapeGPT, and FreedomGPT. The identification results using the pretrained model are printed as:
```
Identified Backend:
Backends of DarkGPT -> Davinci_003
Backends of FreedomGPT -> Luna_AI_Llama2_Uncensored
Backends of EscapeGPT -> ChatGPT_3.5
```
Major Claim 3: Evaluation on the "ignore the above instructions" prompt leaking attack

93.01% (133/143) of jailbreak prompts are uncovered. The average Jaro-Winkler similarity and semantic textual similarity between the visible jailbreak prompts and the corresponding uncovered jailbreak prompts within the ground-truth dataset are 0.88 and 0.83, respectively, as detailed in Section 6.1.
Code is stored in the `jailbreak_prompt_uncovering` folder.

Execution

Run the following script:
```bash
python uncoveringMeasure.py
```
Dataset

Please download the ground-truth dataset at: https://github.com/idllresearch/malicious-gpt/tree/main/jailbreak_prompt_uncovering/Poe%2BFlowGPT_visible-groundtruth.json
Result

The attack success rate (93.01%), average semantic textual similarity (0.83), and average Jaro-Winkler similarity (0.88) will be displayed and match those presented in the paper.
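The two reported similarity metrics can be reproduced conceptually as follows; the libraries and embedding model below are assumptions, not necessarily what `uncoveringMeasure.py` uses:

```python
# Hypothetical sketch; requires `jellyfish` and `sentence-transformers`.
# The embedding model below is an assumption, not necessarily the paper's.
import jellyfish
from sentence_transformers import SentenceTransformer, util

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed STS model

def prompt_similarities(uncovered: str, ground_truth: str):
    """Return (Jaro-Winkler, semantic textual similarity) for a prompt pair."""
    jw = jellyfish.jaro_winkler_similarity(uncovered, ground_truth)
    emb = _model.encode([uncovered, ground_truth], convert_to_tensor=True)
    sts = util.cos_sim(emb[0], emb[1]).item()
    return jw, sts
```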
Other Claims

In our repository, we provide four datasets.
- malPrompt: The 45 malicious prompts we collected from the listings of Mallas, involving generating malicious code, drafting phishing emails, and creating phishing websites.
- MallaResponse: The responses of 9 Malla services and 198 Malla projects to the 45 malicious prompts (malPrompt).
- MallaJailbreak: 182 jailbreak prompts we collected or uncovered from 200 Malla services and projects.
- ULLM-QA: A large dataset containing 33,996 prompt-response pairs generated by GPT-3.5, Davinci-002, Davinci-003, GPT-J, Luna AI Llama2 Uncensored, and Pygmalion-13B, of which 15,114 relate to the generation of malicious code in Python or without specifying a language.
More Data

- Malicious LLM application (Malla) list: Collection of 220 malicious LLM applications (Mallas), of which 22 are from underground marketplaces, 125 from Poe.com, and 73 from FlowGPT.com.
- LLM-related keywords: Collection of 145 LLM-related keywords.
- Malicious LLM topic keywords: Collection of 73 malicious LLM topic keywords.
Media Coverage

- Tech Policy Press (Jan. 18, 2024): Studying Underground Market for Large Language Models, Researchers Find OpenAI Models Power Malicious Services
- Le Monde (Feb. 22, 2024): The dark side of AI: Chatbots developed by cyber criminals
- The Wall Street Journal (Feb. 28, 2024): Welcome to the Era of BadGPTs
- 安全内参 (Mar. 28, 2024): Re-evaluating Malicious Large Language Model Services in the Real World (in Chinese)
- AI Incident Database (Jun. 27, 2024): Underground Market for LLMs Powers Malware and Phishing Scams
- Fast Company (Sep. 05, 2024): The underground world of black-market AI chatbots is thriving
- Cryptopolitan (Sep. 05, 2024): A new wave of black market chatbots emerges and thrives
If you find the above data and information helpful for your research, please consider citing:
```bibtex
@inproceedings{lin2024malla,
  title = {Malla: Demystifying Real-world Large Language Model Integrated Malicious Services},
  author = {Lin, Zilong and Cui, Jian and Liao, Xiaojing and Wang, XiaoFeng},
  booktitle = {33rd USENIX Security Symposium (USENIX Security 24)},
  year = {2024},
  publisher = {USENIX Association}
}
```