diff --git a/doc/_toc.yml b/doc/_toc.yml
index db4a48e78..8eff1fcdb 100644
--- a/doc/_toc.yml
+++ b/doc/_toc.yml
@@ -112,6 +112,7 @@ chapters:
   - file: api.rst
   - file: blog/README
     sections:
+      - file: blog/2025_03_02
       - file: blog/2025_02_11
       - file: blog/2025_01_27
       - file: blog/2025_01_14
diff --git a/doc/blog/2025_03_02.md b/doc/blog/2025_03_02.md
new file mode 100644
index 000000000..b1cc4b451
--- /dev/null
+++ b/doc/blog/2025_03_02.md
@@ -0,0 +1,92 @@
+# When External Data Becomes a Trojan Horse
+Published on March 02, 2025 - Volkan Kutal
+
+In an era where AI systems drive recruitment, recommendation engines, and critical decision-making processes, even the smallest piece of external data can be weaponized. In our latest project, we combined the AI Recruiter with PyRIT’s XPIA Orchestrator to demonstrate a concerning vulnerability: Indirect Prompt Injection (IPI), also known as Cross-domain Prompt Injection Attack (XPIA). This post explains how an attacker might exploit such a system, from crafting a manipulated résumé to triggering automated actions that could lead to phishing, code execution, or, as in our case, a selection process manipulated to favor a specific candidate.
+
+## Understanding the AI Recruiter Project
+
+The [AI Recruiter](https://github.com/KutalVolkan/ai_recruiter) matches résumés to job descriptions using GPT-4o and doubles as a testing ground for Retrieval-Augmented Generation (RAG) vulnerabilities exploited through XPIA. It uses ChromaDB for semantic search and PyRIT’s XPIA Orchestrator to automate attacks, supporting AI red-teaming workflows for security research on AI pipelines.
+
+### Key Features
+
+- Résumé Processing & Semantic Matching: Text is extracted from PDF résumés, and embeddings are generated with models such as text-embedding-ada-002. These embeddings drive semantic matching against job descriptions, after which GPT-4o assigns a match score based on relevance and the extracted content.
+
+- Automated RAG Vulnerability Testing: Attackers can manipulate résumé content by injecting hidden text (via a [PDF converter](https://github.com/Azure/PyRIT/blob/main/doc/code/converters/pdf_converter.ipynb)) crafted to inflate its score and thereby influence the AI Recruiter’s ranking.
+
+- [XPIA Orchestrator](https://github.com/Azure/PyRIT/blob/main/doc/code/orchestrators/3_xpia_orchestrator.ipynb) Integration: PyRIT fully automates the prompt injections, making AI vulnerability research efficient and reproducible.
+---
+
+## The Exploit in Detail: Step-by-Step
+
+### 1. *Choosing the Target Job*
+An attacker identifies a job posting they wish to apply for. They carefully review the required skills and qualifications, noting the key soft and hard skills mentioned in the job description.
+
+### 2. *Crafting the Manipulated Résumé*
+Instead of submitting a standard résumé, the attacker uses the *PDF converter* integrated in PyRIT to inject hidden content into their résumé. The same PDF converter is also an integral part of the XPIA Orchestrator's workflow, ensuring that the manipulation is both stealthy and effective. Here’s how it works:
+- *Injection Technique:*
+  The attacker writes key phrases derived directly from the job description, covering both soft skills (e.g., "team leadership", "communication") and hard skills (e.g., "Python", "machine learning"). Full sentences work better than bare keyword and buzzword stuffing, because they mislead not only the similarity search but also the GPT-4o scoring step used later in the selection process.
+- *Stealth Injection:*
+  The text is injected in a tiny, nearly invisible font size (say, 0.1px) and in the same color as the résumé background. This makes it virtually undetectable to human reviewers, while the AI Recruiter's text extraction still reads it in full.
+
+### 3. *Uploading via the XPIA Orchestrator*
+Once the PDF converter has manipulated the résumé, the résumé is uploaded through the XPIA Orchestrator:
+- *Automated Flow:*
+  The orchestrator takes over and passes the résumé to the AI Recruiter, which extracts its text, generates embeddings, and performs a semantic search against the job description.
+- *Semantic Search Trigger:*
+  In the first phase, the semantic search retrieves the top k candidates closest to the job description. Because of the injected keywords and sentences, the attacker's résumé lies very close to the job description in embedding space, which makes it more likely to rank near the top.
+
+### 4. *AI Evaluation with GPT-4o*
+The next stage is an in-depth evaluation:
+- *Detailed Scoring:*
+  GPT-4o reviews the résumé and assigns a match score. Thanks to the strategically injected text, the manipulated résumé receives a very high score.
+- *Potential Extended Capabilities:*
+  The AI Recruiter currently performs no actions beyond ranking, but if GPT-4o were connected to additional tools, it could:
+  - Follow embedded links within the résumé.
+  - Retrieve external information about the candidate (e.g., from GitHub or personal websites).
+  - Analyze or execute code from linked repositories.
+  - Auto-generate summaries that are forwarded directly to HR.
+  *Risk:* These capabilities may improve candidate evaluation, but each one also gives an attacker another path to exploit.
+
+### 5. *The Phishing Risk*
+The consequences of the manipulation do not stop at résumé scoring:
+- *Misleading Summaries & Links:*
+  AI-generated summaries may carry over links embedded in résumés, which HR staff might follow without recognizing the risk. An attacker could exploit this by pointing HR to a GitHub repository under their control that contains malicious code or fabricated credentials meant to win trust.
+- *Human Trust in AI:*
+  Because HR teams and decision-makers place high confidence in AI-generated recommendations, they may not question how a ranking was determined. If an AI-recommended link leads to harmful content, such as unauthorized scripts, phishing pages, or hidden downloads, the result could be a data breach or other security compromise that puts both the company and its hiring process at risk.
+
+---
+
+## Beyond Recruitment: The Broader Threat Landscape
+
+While our demonstration focused on AI-driven recruitment, the underlying vulnerability extends to any AI system that processes external data for decision-making. Microsoft's AI Red Team, in its [AI Red Team Lessons eBook](https://airedteamwhitepapers.blob.core.windows.net/lessonswhitepaper/MS_AIRT_Lessons_eBook.pdf), identifies similar risks, including prompt injection, manipulated feedback loops, and unauthorized automation. These vulnerabilities affect not only hiring systems but also AI-driven processes in finance, legal compliance, and healthcare.
+
+### Manipulated AI-Generated Summaries in Decision Systems
+- AI-generated summaries are widely used in hiring, legal case analysis, financial risk assessments, and medical diagnostics.
+- Case Study #4 (text-to-image bias) from the AI Red Team Lessons eBook demonstrates how AI can reinforce hidden biases.
+- In fraud detection systems, an attacker could embed stealthy, prompt-like instructions in transaction descriptions or metadata, influencing the AI's risk analysis without raising human suspicion.
+- Similarly, in healthcare AI, manipulated patient records could include hidden prompt injections that alter AI-generated diagnoses or insurance claim recommendations.
+
+### Large-Scale Automated Manipulation in AI Systems
+- Attackers can mass-generate structured documents, such as résumés, loan applications, insurance claims, or academic records, embedding stealth-injected content that AI systems then prioritize.
+- This aligns with Case Study #2 (LLM-assisted scams) from the AI Red Team Lessons eBook, which shows how LLMs can automate deception at scale.
+- In automated legal review systems, attackers could embed invisible modifications in contract clauses, tricking the AI into interpreting loopholes favorably while the contract still looks legitimate to human reviewers.
+
+---
+
+## Conclusion
+
+Returning to our demonstration with the XPIA Orchestrator and the AI Recruiter, we’ve exposed a fundamental weakness: external data can be weaponized to manipulate AI-driven decision-making. By injecting hidden content into a résumé, an attacker can bypass traditional safeguards and secure a top semantic-search ranking and a high GPT-4o score; in systems wired to additional tools, the same technique could trigger automated actions ranging from phishing attempts to unauthorized code execution.
+
+As we integrate AI into more facets of our lives, it’s imperative to build systems that are not only intelligent but also secure. The time has come to treat security as an integral part of AI design, ensuring that the data feeding these systems is both trustworthy and safe.
+
+---
+
+*Explore More:*
+
+- [XPIA Orchestrator Notebook](https://github.com/Azure/PyRIT/blob/main/doc/code/orchestrators/3_xpia_orchestrator.ipynb)
+
+- [View AI Recruiter Integration Test](../../tests/integration/ai_recruiter/test_ai_recruiter.py)
+
+- [View AI Recruiter Code](https://github.com/KutalVolkan/ai_recruiter/blob/main/ai_recruiter.py)
+
+- [Mapping to OWASP TOP 10 LLM](https://github.com/KutalVolkan/ai_recruiter/tree/main/owasp_top_ten)
\ No newline at end of file
diff --git a/doc/code/converters/pdf_converter.ipynb b/doc/code/converters/pdf_converter.ipynb
index 516eeaf70..87e914f1d 100644
--- a/doc/code/converters/pdf_converter.ipynb
+++ b/doc/code/converters/pdf_converter.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "0",
+   "id": "dd6d3076",
    "metadata": {},
    "source": [
     "# PDF Converter with Multiple Modes:\n",
@@ -28,7 +28,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "1",
+   "id": "a7fa483b",
    "metadata": {},
    "outputs": [
     {
@@ -99,7 +99,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2",
+   "id": "a725bedf",
    "metadata": {},
    "source": [
     "# Direct Prompt PDF Generation (No Template)"
@@ -108,24 +108,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "3",
+   "id": "28a7ffa2",
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "{'__type__': 'TextTarget', '__module__': 'pyrit.prompt_target.text_target'}: user: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738380368779812.pdf\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\u001b[1m\u001b[34muser: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738380368779812.pdf\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "# Define a simple string prompt (no templates)\n",
     "prompt = \"This is a simple test string for PDF generation. No templates here!\"\n",
@@ -158,7 +143,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4",
+   "id": "22569015",
    "metadata": {
     "lines_to_next_cell": 0
    },
@@ -169,7 +154,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5",
+   "id": "894e0176",
    "metadata": {},
    "outputs": [
     {
@@ -208,8 +193,24 @@
     "\n",
     "# Define injection items\n",
     "injection_items = [\n",
-    "    {\"page\": 0, \"x\": 50, \"y\": 700, \"text\": \"Injected Text\", \"font_size\": 12, \"font\": \"Helvetica\", \"font_color\": (255, 0, 0)},  # Red text\n",
-    "    {\"page\": 1, \"x\": 100, \"y\": 600, \"text\": \"Confidential\", \"font_size\": 10, \"font\": \"Helvetica\", \"font_color\": (0, 0, 255)}  # Blue text\n",
+    "    {\n",
+    "        \"page\": 0,\n",
+    "        \"x\": 50,\n",
+    "        \"y\": 700,\n",
+    "        \"text\": \"Injected Text\",\n",
+    "        \"font_size\": 12,\n",
+    "        \"font\": \"Helvetica\",\n",
+    "        \"font_color\": (255, 0, 0),\n",
+    "    },  # Red text\n",
+    "    {\n",
+    "        \"page\": 1,\n",
+    "        \"x\": 100,\n",
+    "        \"y\": 600,\n",
+    "        \"text\": \"Confidential\",\n",
+    "        \"font_size\": 10,\n",
+    "        \"font\": \"Helvetica\",\n",
+    "        \"font_color\": (0, 0, 255),\n",
+    "    },  # Blue text\n",
     "]\n",
     "\n",
     "# Define a simple string prompt (no templates)\n",
@@ -247,7 +248,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "6",
+   "id": "f7271ba8",
    "metadata": {},
    "outputs": [],
    "source": [
diff --git a/doc/code/orchestrators/1_prompt_sending_orchestrator.ipynb b/doc/code/orchestrators/1_prompt_sending_orchestrator.ipynb
index 22b09f73b..f20b4cc18 100644
--- a/doc/code/orchestrators/1_prompt_sending_orchestrator.ipynb
+++ b/doc/code/orchestrators/1_prompt_sending_orchestrator.ipynb
@@ -400,7 +400,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.11"
+   "version": "3.11.9"
   }
  },
  "nbformat": 4,
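The "Semantic Search Trigger" step in the blog post above can be illustrated with a toy sketch. It is not the AI Recruiter's implementation. To keep the example self-contained and runnable, it makes one assumption: it replaces text-embedding-ada-002 embeddings and ChromaDB with a bag-of-words cosine similarity. The job description, résumés, and helpers (`bag_of_words`, `cosine_similarity`, `top_k`) are hypothetical. Plain PDF text extraction typically ignores font size and color, so the hidden sentences are modeled as ordinary text appended to the visible résumé.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a toy stand-in (assumption) for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(job_description: str, resumes: dict[str, str], k: int) -> list[tuple[str, float]]:
    """Rank résumés by similarity to the job description and keep the k best."""
    query = bag_of_words(job_description)
    scored = [(name, cosine_similarity(query, bag_of_words(text))) for name, text in resumes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical inputs for illustration only.
job = (
    "We are hiring a machine learning engineer with Python experience, "
    "team leadership and strong communication skills."
)
visible_attacker_text = "Backend developer. Built REST services in Java and Go."
# Hidden sentences mirroring the job description (rendered at 0.1px in the PDF).
hidden_text = (
    "I am a machine learning engineer with Python experience, "
    "team leadership and strong communication skills."
)

resumes = {
    "honest": "Backend developer. Built REST services in Java and Go. Mentored two junior developers.",
    "strong": "Machine learning engineer. Python, PyTorch, model deployment. Led a small team.",
    # Text extraction returns visible and hidden text alike.
    "attacker": visible_attacker_text + " " + hidden_text,
}

for name, score in top_k(job, resumes, k=2):
    print(f"{name}: {score:.3f}")
```

With these inputs, the attacker's backend-developer résumé shares almost no vocabulary with the job description on its own. Once the hidden sentences are appended, it outranks the genuine machine-learning résumé. Real embedding models compare meaning rather than exact word overlap, so the numbers would differ. The sketch only shows the direction of the effect.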