Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: add blog post for XPIAOrchestrator with AI Recruiter #716

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/_toc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,7 @@ chapters:
- file: api.rst
- file: blog/README
sections:
- file: blog/2025_03_02
- file: blog/2025_02_11
- file: blog/2025_01_27
- file: blog/2025_01_14
Expand Down
92 changes: 92 additions & 0 deletions doc/blog/2025_03_02.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
# When External Data Becomes a Trojan Horse:
Published on March 02, 2025 - Volkan Kutal

In an era where AI systems drive recruitment, recommendation engines, and critical decision-making processes, even the smallest piece of external data can be weaponized. In our latest project, we combined the AI Recruiter with the XPIA Orchestrator. We want to demonstrate a concerning vulnerability: Indirect Prompt Injection (IPI), also known as Cross-domain Prompt Injection Attacks (XPIA). This blog post explains how an attacker might exploit such a system—from crafting a manipulated résumé to triggering automated actions that could lead to phishing, code execution, or, as in our case, manipulating the selection process to favor a specific candidate in our company.

### Understanding the AI Recruiter Project

The [AI Recruiter](https://github.com/KutalVolkan/ai_recruiter) is designed to match résumés with job descriptions using GPT-4o, while also serving as a testing ground for Retrieval-Augmented Generation (RAG) vulnerabilities through Cross-domain Prompt Injection Attacks (XPIA). It integrates ChromaDB for semantic search, while also leveraging PyRIT’s XPIA Orchestrator to automate attacks, allowing AI red-teaming workflows for security research in AI pipelines.

#### Key Features

- Résumé Processing & Semantic Matching: Résumés are extracted from PDFs, with embeddings generated using models like text-embedding-ada-002. These embeddings enable precise semantic matching, while GPT-4o is later used to assign a match score based on relevance and extracted content.

- Automated RAG Vulnerability Testing: Attackers can manipulate résumé content by injecting hidden text (via a [PDF converter](https://github.com/Azure/PyRIT/blob/main/doc/code/converters/pdf_converter.ipynb)) that optimizes scoring, influencing the AI Recruiter’s ranking system.

- [XPIA Orchestrator](https://github.com/Azure/PyRIT/blob/main/doc/code/orchestrators/3_xpia_orchestrator.ipynb) Integration: PyRIT enables full automation of prompt injections, making AI vulnerability research efficient and reproducible.
---

## The Exploit in Detail: Step-by-Step

### 1. *Choosing the Target Job*
An attacker identifies a job posting they wish to apply for. They carefully review the required skills and qualifications, noting the key soft and hard skills mentioned in the job description.

### 2. *Crafting the Manipulated Résumé*
Instead of submitting a standard résumé, the attacker uses the *PDF converter* integreated in PyRIT to inject hidden content into their résumé. The same PDF converter is also an integral part of the XPIA Orchestrator's workflow, ensuring that the manipulation is both stealthy and effective. Here’s how it works:
- *Injection Technique:*
The attacker writes key phrases—both soft skills (e.g., "team leadership", "communication") and hard skills (e.g., "Python", "machine learning")—directly derived from the job description. Instead of merely stuffing keywords and buzzwords, it is more effective to write full sentences. This approach tricks not only similarity searches but also the GPT-4o scoring mechanism used later in the selection process.
- *Stealth Injection:*
The text is injected using a very small, nearly invisible font size—let’s say 0.1px—and the same font color as the résumé background. This makes the injected text virtually undetectable to human reviewers while remaining fully readable by the AI Recruiter.

### 3. *Uploading via XPIA Orchestrator*
Once the résumé has been manipulated by the PDF converter, it is uploaded through the XPIA Orchestrator:
- *Automated Flow:*
The orchestrator takes over, automatically passing the résumé into the AI Recruiter. The AI Recruiter is designed to extract text, generate embeddings, and perform semantic search against the job description.
- *Semantic Search Trigger:*
In the first phase, the AI Recruiter performs a semantic search among the top k candidates. Due to the injected keywords and sentences, the attacker's résumé achieves a very close semantic distance to the job description, increasing its likelihood of ranking higher in the selection process.

### 4. *AI Evaluation with GPT-4o*
The next stage involves in-depth evaluation:
- *Detailed Scoring:*
GPT-4o reviews the résumé and assigns a match score. Thanks to the strategically injected text, the manipulated résumé achieves a really high score.
- *Potential Extended Capabilities:*
While the AI Recruiter does not currently perform actions beyond ranking, if GPT-4o were integrated with additional tools, it could potentially:
- Follow embedded links within the résumé.
- Retrieve external information about the candidate (e.g., from GitHub or personal websites).
- Analyze or execute code from linked repositories.
- Auto-generate summaries that are forwarded directly to HR.
Risk: While these capabilities may enhance candidate evaluation, they also increase the risk of manipulation if exploited by an attacker.

### 5. *The Phishing Risk*
The consequences of manipulation do not stop at résumé scoring:
- *Misleading Summaries & Links:*
AI-generated summaries may include links embedded in résumés, which HR might follow without realizing the potential risks. An attacker could exploit this by directing HR to a GitHub repository under their control, containing malicious code or misleading credentials designed to manipulate trust.
- *Human Trust in AI:*
Since HR teams and decision-makers place high confidence in AI-generated recommendations, they may not question how rankings were determined. If an AI-recommended link leads to harmful content—such as unauthorized scripts, phishing pages, or hidden downloads—it could result in data breaches or security compromises, putting both the company and its hiring process at risk.

---

## Beyond Recruitment: The Broader Threat Landscape

While our demonstration focused on AI-driven recruitment, the underlying vulnerability extends to any AI system that processes external data for decision-making. Microsoft's AI Red Teaming efforts, as outlined in the [AI Red Team Lessons eBook](https://airedteamwhitepapers.blob.core.windows.net/lessonswhitepaper/MS_AIRT_Lessons_eBook.pdf), have identified similar risks, including prompt injection, manipulated feedback loops, and unauthorized automation. These vulnerabilities impact not only hiring systems but also AI-driven processes in finance, legal compliance, and healthcare.

### Manipulated AI-Generated Summaries in Decision Systems
- AI-generated summaries are widely used in hiring, legal case analysis, financial risk assessments, and medical diagnostics.
- Case Study #4 (Text-to-image bias) from the AI Red Team Lessons eBook demonstrates how AI can reinforce hidden biases from manipulated inputs.
- In fraud detection systems, an attacker could embed stealth prompt-like instructions in transaction descriptions or metadata, influencing AI risk analysis without raising human suspicion.
- Similarly, in healthcare AI, manipulated patient records could include hidden prompt injections, altering AI-generated diagnoses or insurance claim recommendations.

### Large-Scale Automated Manipulation in AI Systems
- Attackers can mass-generate structured documents—such as résumés, loan applications, insurance claims, or academic records—embedding stealth-injected content that AI systems prioritize.
- This aligns with Case Study #2 (LLM-assisted scams) from the AI Red Team Lessons eBook, where AI-generated deception was used to bypass validation mechanisms and manipulate outputs at scale.
- In automated legal review systems, attackers could embed invisible modifications in contract clauses, tricking AI into interpreting loopholes favorably while maintaining a legitimate appearance to human reviewers.

---

## Conclusion

Returning to our demonstration with the XPIA Orchestrator and AI Recruiter, we’ve exposed a fundamental weakness: external data can be weaponized to manipulate AI-driven decision-making. By injecting hidden content into a résumé, an attacker can bypass traditional safeguards, securing a top semantic search and triggering automated actions—from phishing attempts to unauthorized code execution.

As we integrate AI into more facets of our lives, it’s imperative to build systems that are not only intelligent but also secure. The time has come to view security as an integral part of AI design—ensuring that the data feeding into these systems is both trustworthy and safe.

---

*Explore More:*

- [XPIA Orchestrator Notebook](https://github.com/Azure/PyRIT/blob/main/doc/code/orchestrators/3_xpia_orchestrator.ipynb)

- [View AI Recruiter Integration Test](../../tests/integration/ai_recruiter/test_ai_recruiter.py)

- [View AI Recruiter Code](https://github.com/KutalVolkan/ai_recruiter/blob/main/ai_recruiter.py)

- [Mapping to OWASP TOP 10 LLM](https://github.com/KutalVolkan/ai_recruiter/tree/main/owasp_top_ten)
51 changes: 26 additions & 25 deletions doc/code/converters/pdf_converter.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "0",
"id": "dd6d3076",
"metadata": {},
"source": [
"# PDF Converter with Multiple Modes:\n",
Expand All @@ -28,7 +28,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1",
"id": "a7fa483b",
"metadata": {},
"outputs": [
{
Expand Down Expand Up @@ -99,7 +99,7 @@
},
{
"cell_type": "markdown",
"id": "2",
"id": "a725bedf",
"metadata": {},
"source": [
"# Direct Prompt PDF Generation (No Template)"
Expand All @@ -108,24 +108,9 @@
{
"cell_type": "code",
"execution_count": null,
"id": "3",
"id": "28a7ffa2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'__type__': 'TextTarget', '__module__': 'pyrit.prompt_target.text_target'}: user: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738380368779812.pdf\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[1m\u001b[34muser: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738380368779812.pdf\n"
]
}
],
"outputs": [],
"source": [
"# Define a simple string prompt (no templates)\n",
"prompt = \"This is a simple test string for PDF generation. No templates here!\"\n",
Expand Down Expand Up @@ -158,7 +143,7 @@
},
{
"cell_type": "markdown",
"id": "4",
"id": "22569015",
"metadata": {
"lines_to_next_cell": 0
},
Expand All @@ -169,7 +154,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "5",
"id": "894e0176",
"metadata": {},
"outputs": [
{
Expand Down Expand Up @@ -208,8 +193,24 @@
"\n",
"# Define injection items\n",
"injection_items = [\n",
" {\"page\": 0, \"x\": 50, \"y\": 700, \"text\": \"Injected Text\", \"font_size\": 12, \"font\": \"Helvetica\", \"font_color\": (255, 0, 0)}, # Red text\n",
" {\"page\": 1, \"x\": 100, \"y\": 600, \"text\": \"Confidential\", \"font_size\": 10, \"font\": \"Helvetica\", \"font_color\": (0, 0, 255)} # Blue text\n",
" {\n",
" \"page\": 0,\n",
" \"x\": 50,\n",
" \"y\": 700,\n",
" \"text\": \"Injected Text\",\n",
" \"font_size\": 12,\n",
" \"font\": \"Helvetica\",\n",
" \"font_color\": (255, 0, 0),\n",
" }, # Red text\n",
" {\n",
" \"page\": 1,\n",
" \"x\": 100,\n",
" \"y\": 600,\n",
" \"text\": \"Confidential\",\n",
" \"font_size\": 10,\n",
" \"font\": \"Helvetica\",\n",
" \"font_color\": (0, 0, 255),\n",
" }, # Blue text\n",
"]\n",
"\n",
"# Define a simple string prompt (no templates)\n",
Expand Down Expand Up @@ -247,7 +248,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "6",
"id": "f7271ba8",
"metadata": {},
"outputs": [],
"source": [
Expand Down
2 changes: 1 addition & 1 deletion doc/code/orchestrators/1_prompt_sending_orchestrator.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -400,7 +400,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
"version": "3.11.9"
}
},
"nbformat": 4,
Expand Down