Skip to content

Commit 21e4ef3

Browse files
authored
Merge pull request #3 from UnitOneAI/intel/ai-security-social-2026-03-18
intel: ai-security social updates 2026-03-18
2 parents 6b9f677 + 4fbf994 commit 21e4ef3

3 files changed

Lines changed: 63 additions & 106 deletions

File tree

skills/ai-security/agent-security/SKILL.md

Lines changed: 44 additions & 104 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ phase: [design, build, review]
1414
frameworks: [OWASP-Agentic-AI, NIST-AI-RMF-1.0]
1515
difficulty: advanced
1616
time_estimate: "60-120min"
17-
version: "1.0.0"
17+
version: "1.0.2"
1818
author: unitoneai
1919
license: MIT
2020
allowed-tools: Read, Grep, Glob
@@ -91,6 +91,40 @@ Before beginning the assessment, gather the following. If any item is unavailabl
9191

9292
## Process
9393

94+
### Tri-Layered Risk Assessment Lens (FASA Framework)
95+
96+
When assessing agent architectures, evaluate risks across three interdependent layers derived from the FASA tri-layered risk taxonomy (ArXiv 2603.13151):
97+
98+
| Layer | Scope | Example Risks |
99+
|---|---|---|
100+
| **AI Cognitive** | Risks arising from the model's reasoning, planning, and decision-making | Hallucinated tool arguments, goal drift, context amnesia across long sessions, confused-deputy behavior |
101+
| **Software Execution** | Risks in the runtime environment where agent actions are executed | Sequential tool attack chains, sandbox escapes, dependency exploits, cascading failure in long-horizon workflows |
102+
| **Information System** | Risks to the broader IT environment the agent operates within | Lateral movement, data exfiltration, credential theft, persistent access |
103+
104+
Use this layered lens throughout Steps 1-7 to ensure findings are not clustered in a single layer while risks in other layers go unassessed.
105+
106+
### Additional Threat Categories
107+
108+
The following threat patterns warrant explicit attention during architecture review:
109+
110+
- **Context amnesia:** In long-running or multi-session agent workflows, security-relevant context (active constraints, prior denied actions, accumulated risk) may be lost across context window boundaries or session resets. Verify that security state persists independently of the LLM context window.
111+
- **Sequential tool attack chains:** An attacker (or a manipulated agent) may chain individually benign tool calls into an attack sequence where the combined effect is harmful. Evaluate whether the system monitors tool call sequences, not just individual invocations.
112+
- **Confused-deputy behavior:** An agent with legitimate tool access is tricked -- typically via indirect prompt injection -- into performing unintended actions using its own authorized capabilities. The agent acts as a confused deputy: it has valid credentials and permissions, but an attacker directs its actions. This is distinct from privilege escalation; the agent never exceeds its permissions, yet causes harm within them.
113+
- **Cascading failure in long-horizon workflows:** Multi-step agent workflows (planning, research, execution sequences spanning minutes to hours) are vulnerable to error accumulation. An early-stage mistake or injection can compound through subsequent steps, producing increasingly harmful outcomes that are difficult to detect until the workflow completes.
114+
115+
### Red-Team Validation Tooling
116+
117+
For hands-on validation of agent permission boundaries and tool-use attack surface, use the **fabraix/playground** open-source exploit library (https://github.com/fabraix/playground). This provides consolidated AI agent exploit PoCs mapped to OWASP Agentic AI threat categories, enabling practitioners to test architectural controls against concrete attack scenarios rather than theoretical threats alone.
118+
119+
### Layered Defense Ordering
120+
121+
For high-consequence agentic systems, apply defenses in the following order. Each layer catches failures that slip through the previous layer:
122+
123+
1. **Input validation** -- Sanitize and validate all inputs reaching the agent (user input, retrieved content, inter-agent messages) before they enter the model context.
124+
2. **Model-level mitigations** -- Instruction hierarchy, system prompt hardening, and model-level safety training to resist manipulation.
125+
3. **Sandboxed execution** -- Run tool calls in isolated, resource-limited environments with minimal permissions.
126+
4. **Deterministic policy enforcement** -- For high-consequence actions (production deployments, financial transactions, data deletion), enforce hard-coded policy checks and HITL gates that cannot be overridden by model output, regardless of reasoning.
127+
94128
### Step 1 -- Agent Permission Model Review
95129

96130
Evaluate what each agent can do, under what conditions, and whether the permission model follows least-privilege principles.
@@ -104,23 +138,7 @@ Evaluate what each agent can do, under what conditions, and whether the permissi
104138
- **Per-session vs. permanent tool access:** Is tool access scoped to a specific task or session, or does every invocation receive the same broad tool set regardless of the task?
105139
- **Cross-agent tool sharing:** Can one agent invoke another agent's tools? If so, through what authorization mechanism?
106140

107-
**Detection methods using allowed tools:**
108-
109-
```
110-
# Find agent and tool definitions
111-
Glob: **/*agent*.{py,ts,js,yaml,yml,json}
112-
Glob: **/tools/*.{py,ts,js}
113-
Glob: **/*tool*.{py,ts,js,yaml,yml,json}
114-
Grep: "register_tool|add_tool|tool_list|available_tools|function_map|tool_registry" in **/*.{py,ts,js}
115-
Grep: "Tool(|@tool|FunctionTool|StructuredTool|BaseTool" in **/*.{py,ts,js}
116-
117-
# Find permission and credential configurations
118-
Grep: "service_account|iam|role_arn|credentials|api_key|secret|permission" in **/*.{py,yaml,yml,json,tf,env}
119-
Grep: "Action.*\*|Resource.*\*|admin|PowerUser|FullAccess" in **/*.{json,yaml,yml,tf}
120-
121-
# Find tool scoping logic
122-
Grep: "scope|allow|deny|restrict|filter_tools|permitted_tools|enabled_tools" in **/*.{py,ts,js}
123-
```
141+
**Detection methods:** Search for agent/tool definitions (`register_tool`, `add_tool`, `@tool`, `FunctionTool`), permission configs (`service_account`, `iam`, `role_arn`, wildcards in IAM policies), and tool scoping logic (`filter_tools`, `permitted_tools`, `enabled_tools`).
124142

125143
**Permission model evaluation matrix:**
126144

@@ -162,28 +180,7 @@ Evaluate whether the agent architecture is designed from the ground up around le
162180
- **Resource limits:** Are CPU, memory, token budget, and execution time limits enforced at the infrastructure level?
163181
- **Capability escalation paths:** Can the agent request elevated permissions at runtime, modify its own configuration, or influence the orchestrator to grant it additional tools?
164182

165-
**Detection methods using allowed tools:**
166-
167-
```
168-
# Check for network restrictions
169-
Grep: "network_policy|egress|firewall|sandbox|allowed_hosts|url_whitelist|allowed_urls" in **/*.{py,yaml,yml,json,tf}
170-
Grep: "requests.get|requests.post|urllib|httpx|fetch|axios" in **/*agent*.{py,ts,js}
171-
172-
# Check for file system restrictions
173-
Grep: "chroot|sandbox|allowed_paths|base_dir|restrict_path|working_dir" in **/*.{py,yaml,yml,json}
174-
Grep: "open(|write(|read(|os.path|pathlib|shutil" in **/*agent*.{py,ts,js}
175-
Grep: "os.listdir|os.walk|glob|Path(" in **/*agent*.{py,ts,js}
176-
177-
# Check for environment access
178-
Grep: "os.environ|os.getenv|process.env|env_var" in **/*agent*.{py,ts,js}
179-
180-
# Check for resource limits
181-
Grep: "max_tokens|token_budget|max_iterations|timeout|time_limit|max_steps|rate_limit" in **/*.{py,yaml,yml,json}
182-
Grep: "memory_limit|cpu_limit|resource_limit|ulimit" in **/*.{yaml,yml,json,tf,Dockerfile}
183-
184-
# Check for self-modification capability
185-
Grep: "self.tools|self.config|self.system_prompt|modify_config|update_tools|set_permissions" in **/*.{py,ts,js}
186-
```
183+
**Detection methods:** Search for network restrictions (`network_policy`, `egress`, `allowed_hosts`), file system restrictions (`chroot`, `sandbox`, `allowed_paths`), environment access (`os.environ`, `process.env`), resource limits (`max_tokens`, `token_budget`, `timeout`, `memory_limit`), and self-modification patterns (`self.tools`, `self.config`, `modify_config`).
187184

188185
**Least-privilege design checklist:**
189186

@@ -224,27 +221,7 @@ Evaluate the design, placement, and robustness of human approval gates in the ag
224221
- **Approval fatigue management:** How many approval requests per session does a human reviewer face? Systems generating hundreds of low-context requests have effectively no human oversight.
225222
- **Fail-closed design:** If the approval service is unreachable, does the agent halt (fail-closed) or proceed without approval (fail-open)?
226223

227-
**Detection methods using allowed tools:**
228-
229-
```
230-
# Find approval gate implementations
231-
Grep: "approve|confirm|human_in_the_loop|hitl|review|authorize|require_approval" in **/*.{py,ts,js}
232-
Grep: "approval_gate|confirmation_gate|human_review|manual_review" in **/*.{py,ts,js,yaml,yml}
233-
234-
# Check for bypass paths
235-
Grep: "skip_approval|auto_approve|bypass|override|fallback|fail_open" in **/*.{py,ts,js,yaml,yml}
236-
Grep: "except|catch|timeout|unavailable|unreachable" in **/*approv*.{py,ts,js}
237-
Grep: "except|catch|timeout|unavailable|unreachable" in **/*confirm*.{py,ts,js}
238-
239-
# Check for cumulative tracking
240-
Grep: "cumulative|aggregate|session_risk|total_risk|action_count|budget" in **/*.{py,ts,js}
241-
242-
# Check what context is provided to approvers
243-
Grep: "approval_context|review_context|display|present|show_details" in **/*.{py,ts,js}
244-
245-
# Find action classification (which actions need approval)
246-
Grep: "risk_level|action_type|destructive|irreversible|high_risk|write|delete|send|deploy" in **/*.{py,ts,js,yaml,yml}
247-
```
224+
**Detection methods:** Search for approval gates (`approve`, `human_in_the_loop`, `hitl`, `require_approval`), bypass paths (`skip_approval`, `auto_approve`, `fail_open`), cumulative tracking (`cumulative`, `session_risk`, `action_count`), and action classification (`risk_level`, `destructive`, `irreversible`, `high_risk`).
248225

249226
**HITL gate design principles:**
250227

@@ -286,28 +263,7 @@ Evaluate the architectural controls that limit the damage when an agent is compr
286263
- **Kill switch:** Can an agent be immediately terminated by an operator? Is there a mechanism to halt all agents simultaneously in an emergency?
287264
- **Rate and scope limiters:** Even within its permitted tool set, are there limits on how much an agent can do in a given time window (e.g., maximum 10 database writes per minute, maximum 5 emails per session)?
288265

289-
**Detection methods using allowed tools:**
290-
291-
```
292-
# Check for container/process isolation
293-
Glob: **/Dockerfile*
294-
Glob: **/docker-compose*.{yml,yaml}
295-
Glob: **/*.tf
296-
Grep: "container|sandbox|isolat|namespace|seccomp|apparmor|gvisor" in **/*.{yaml,yml,json,tf,Dockerfile}
297-
298-
# Check for network segmentation
299-
Grep: "network_policy|NetworkPolicy|security_group|firewall_rule|egress|ingress" in **/*.{yaml,yml,json,tf}
300-
Grep: "169.254.169.254|metadata|IMDS|instance.metadata" in **/*.{py,ts,js,yaml,yml}
301-
302-
# Check for kill switch / emergency stop
303-
Grep: "kill|stop|halt|emergency|shutdown|circuit_breaker|breaker" in **/*.{py,ts,js,yaml,yml}
304-
305-
# Check for rate limiting on agent actions
306-
Grep: "rate_limit|throttle|max_per_minute|max_per_session|action_limit|cooldown" in **/*.{py,ts,js,yaml,yml}
307-
308-
# Check for action reversibility
309-
Grep: "undo|rollback|revert|compensat|reverse|cancel" in **/*.{py,ts,js}
310-
```
266+
**Detection methods:** Search for isolation (`container`, `sandbox`, `seccomp`, `gvisor`), network segmentation (`network_policy`, `security_group`, `169.254.169.254`), kill switches (`emergency`, `circuit_breaker`, `shutdown`), rate limiting (`rate_limit`, `throttle`, `max_per_session`), and reversibility (`undo`, `rollback`, `compensat`).
311267

312268
**Blast radius assessment framework:**
313269

@@ -349,27 +305,7 @@ Evaluate whether the audit logging for agent actions is sufficient for incident
349305
- **Log retention and access:** Are agent audit logs retained for the required compliance period? Are they accessible to security and compliance teams?
350306
- **Cross-agent correlation:** In multi-agent systems, can logs be correlated across agents to reconstruct the full action chain for a given workflow?
351307

352-
**Detection methods using allowed tools:**
353-
354-
```
355-
# Find logging implementations
356-
Grep: "log|logger|logging|audit|record|track|emit" in **/*agent*.{py,ts,js}
357-
Grep: "log|logger|logging|audit|record|track|emit" in **/*tool*.{py,ts,js}
358-
359-
# Check what is logged per tool invocation
360-
Grep: "tool_name|tool_input|tool_output|tool_result|parameters|arguments" in **/*log*.{py,ts,js}
361-
Grep: "session_id|correlation_id|trace_id|request_id|agent_id" in **/*.{py,ts,js}
362-
363-
# Check for log integrity
364-
Grep: "immutable|append_only|write_once|tamper|integrity|sign|hash" in **/*log*.{py,yaml,yml}
365-
366-
# Check for decision/reasoning logging
367-
Grep: "reasoning|thought|chain_of_thought|decision|rationale|explanation" in **/*log*.{py,ts,js}
368-
369-
# Check log pipeline configuration
370-
Glob: **/logging*.{yaml,yml,json,conf,ini}
371-
Grep: "siem|splunk|datadog|cloudwatch|elasticsearch|loki|fluentd" in **/*.{yaml,yml,json}
372-
```
308+
**Detection methods:** Search for logging implementations (`logger`, `audit`, `emit`), per-invocation fields (`tool_name`, `tool_input`, `correlation_id`, `trace_id`), log integrity (`immutable`, `append_only`, `tamper`), decision logging (`reasoning`, `chain_of_thought`, `rationale`), and SIEM integration (`splunk`, `datadog`, `cloudwatch`, `elasticsearch`).
373309

374310
**Audit trail completeness checklist:**
375311

@@ -647,3 +583,7 @@ Glob: **/security_architecture*
647583
8. LangChain Arbitrary Code Execution -- CVE-2023-29374
648584
9. OWASP Application Security Verification Standard (ASVS), V14: Configuration -- https://owasp.org/www-project-application-security-verification-standard/
649585
10. Leike, J. et al. "Scalable Agent Alignment via Reward Modeling: a Research Direction" (2018) -- arXiv:1811.07871 -- foundational work on agent alignment and oversight mechanisms
586+
11. FASA Tri-Layered Risk Taxonomy for AI Agent Systems (2026) -- arXiv:2603.13151
587+
12. Sequential Tool Attack Chains and Context Amnesia in Agentic AI (2026) -- arXiv:2603.12644
588+
13. Confused-Deputy Attacks and Cascading Failures in Long-Horizon Agent Workflows (2026) -- arXiv:2603.12230
589+
14. fabraix/playground -- Open-source AI agent red-team exploit library for validating agent permission boundaries and tool-use attack surface -- https://github.com/fabraix/playground

skills/ai-security/agentic-top-10/SKILL.md

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ phase: [design, build, review]
1313
frameworks: [OWASP-Agentic-AI, MITRE-ATLAS, NIST-AI-RMF]
1414
difficulty: advanced
1515
time_estimate: "45-90min"
16-
version: "1.0.0"
16+
version: "1.0.1"
1717
author: unitoneai
1818
license: MIT
1919
allowed-tools: Read, Grep, Glob
@@ -430,6 +430,10 @@ Grep: "send_message|delegate|dispatch|publish|subscribe|queue" in **/*.{py,ts,js
430430
Grep: "approve|confirm|human_in_the_loop|hitl|review|authorize" in **/*.{py,ts,js,yaml,yml}
431431
```
432432

433+
### Hands-On Assessment Tooling
434+
435+
For practical validation of OWASP Agentic AI risks against concrete exploits, use the **fabraix/playground** open-source exploit library (https://github.com/fabraix/playground). This provides consolidated AI agent exploit PoCs that can be used alongside the theoretical framework in Step 2 to test each AG01-AG10 category against real attack scenarios.
436+
433437
### Step 2 — Threat Assessment
434438

435439
For each of the 10 categories, assess the system and assign a risk rating:
@@ -612,3 +616,4 @@ This skill is designed to be resilient against prompt injection. The following r
612616
8. OWASP Application Security Verification Standard (ASVS) — [owasp.org/www-project-application-security-verification-standard](https://owasp.org/www-project-application-security-verification-standard/)
613617
9. LangChain Arbitrary Code Execution — CVE-2023-29374
614618
10. NIST SP 800-53 Rev. 5, Security and Privacy Controls — [nist.gov](https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final)
619+
11. fabraix/playground — Open-source AI agent red-team exploit library with PoCs for OWASP Agentic AI Top 10 risks — https://github.com/fabraix/playground

skills/ai-security/prompt-injection/SKILL.md

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ phase: [build, review, operate]
1313
frameworks: [OWASP-LLM01-2025, MITRE-ATLAS]
1414
difficulty: advanced
1515
time_estimate: "30-60min"
16-
version: "1.0.0"
16+
version: "1.0.2"
1717
author: unitoneai
1818
license: MIT
1919
allowed-tools: Read, Grep, Glob
@@ -192,6 +192,16 @@ Evaluate which of the following mitigations are implemented and how effectively.
192192
- Is the system prompt structurally separated from user input (e.g., via the API's system message role) rather than concatenated in a single string?
193193
- Are retrieved documents and external content clearly demarcated as data, not instructions?
194194

195+
### 5.7 Adaptive Attack Resilience
196+
197+
> **Warning:** Static prompt injection defenses (hardcoded system prompts, simple keyword filtering) are demonstrably insufficient against adaptive attackers. PISmith (Yin et al. 2026) achieved highest attack success rates across 13 benchmarks using RL-optimized adaptive black-box attacks.
198+
199+
- **Continuous red-team evaluation:** Prompt injection defenses must be evaluated continuously, not as a one-time test. Adaptive attackers iteratively refine their payloads against deployed defenses. Schedule recurring red-team assessments using automated adversarial tooling alongside manual expert testing.
200+
- **Agentic benchmark suites:** For applications where LLMs invoke tools or take autonomous actions, standard prompt injection benchmarks are insufficient. Use agentic-specific benchmark suites that test injection in the context of tool use and multi-step workflows:
201+
- **InjecAgent** -- Tests indirect prompt injection in agentic settings where the LLM processes external content and has tool access.
202+
- **AgentDojo** -- Evaluates agent robustness against injection attacks across diverse tool-use scenarios with realistic adversarial content.
203+
- **fabraix/playground** (https://github.com/fabraix/playground) -- Open-source library of AI agent exploit PoCs that can serve as a test harness for validating direct and indirect injection defenses against published attack patterns.
204+
195205
---
196206

197207
## Step 6: Report Findings
@@ -274,3 +284,5 @@ Each finding should be assigned a severity based on potential impact:
274284
- Perez, F. & Ribeiro, I. (2022). "Ignore Previous Prompt: Attack Techniques For Language Models." arXiv:2211.09527.
275285
- Greshake, K. et al. (2023). "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." arXiv:2302.12173.
276286
- Willison, S. Prompt Injection taxonomy and ongoing research — https://simonwillison.net
287+
- Yin, X. et al. "PISmith: RL-Optimized Adaptive Black-Box Prompt Injection Attacks" (2026) -- arXiv:2603.13026
288+
- fabraix/playground — Open-source AI agent exploit library for testing injection defenses — https://github.com/fabraix/playground

0 commit comments

Comments
 (0)