Improve prompt-injection multimodal and gateway gates by ASUKAAAAA1204 · Pull Request #1616 · UnitOneAI/SecuritySkills

ASUKAAAAA1204 · 2026-06-07T17:36:41Z

Skill Improvement ($50-150 Bounty)

Skill Modified

Skill name: prompt-injection
Skill path: skills/ai-security/prompt-injection/

What Was Wrong

The skill covered direct and indirect text prompt injection well, but it did not require reviewers to inventory multimodal inputs, OCR/transcript-derived instructions, cross-agent handoffs, or LLM gateway/firewall evidence. That left two gaps from #1437:

Vision/audio/document inputs could carry instructions that bypass text-only sanitizers.
Reviewers had no explicit way to distinguish missing heavyweight gateway evidence in a trusted internal read-only workflow from a high-risk external agentic workflow.

What This PR Fixes

Refs #1437.

This PR updates SKILL.md to add:

Source modality and workflow-risk-tier fields to the interaction map.
Cross-agent context handoff checks in indirect injection review.
A new 4.6 Multimodal Injection category with evidence requirements for image/audio/video/OCR/transcript paths.
5.8 LLM Gateway / AI Firewall Evidence and 5.9 Cross-Agent and Tool-Output Taint Controls defense gates.
Output-report fields for extended category, source modality, and workflow risk tier.
Common pitfalls for text-only testing, guardrails-name-only evidence, and over-severity on trusted internal read-only workflows.

Evidence

Before (skill misses this / false positive on this):

A user uploads an image, screenshot, or audio clip containing hidden instructions.
The app passes OCR/transcript/vision output into the model and then into tools.
The existing skill focuses on text direct/indirect injection and does not force
reviewers to mark multimodal-derived content as untrusted instruction-bearing data.

A trusted internal read-only summarizer has no LLM firewall. The prior skill had
no explicit risk-tier calibration, so this could be over-severitized compared
with an external tool-using agent.

After (now correctly handled):

The interaction map records source modality and workflow risk tier.
The review asks for OCR/transcription/captioning pipeline details, benign and
adversarial multimodal fixtures, gateway/firewall coverage, fallback/retry
coverage, and cross-agent/tool-output taint controls.

Missing gateway evidence is calibrated: internal read-only trusted workflows are
control-gap/informational candidates, while external agentic workflows with
untrusted files, RAG, tools, memory, or sensitive data receive stronger severity.

Test Cases Added/Updated

Added inline vulnerable/benign evidence requirements for multimodal fixtures in SKILL.md.
Existing documentation checks still pass.
Did not add a separate tests/ directory because this existing skill currently consists of a single SKILL.md file.

Validation

git diff --check -> no whitespace errors (Windows Git emitted only the existing LF/CRLF warning).
Marker grep confirmed Multimodal Injection, LLM Gateway / AI Firewall Evidence, Cross-Agent and Tool-Output Taint Controls, Source modality, and Workflow risk tier are present.
Markdown fence balance check -> fence_count=2 and even.
Frontmatter smoke check confirmed name, version, allowed-tools, and injection-hardened remain present.
git diff --stat -> one file changed, 57 insertions.

Bounty Tier

Minor ($50) - Doc update, small logic tweak, typo fix
Moderate ($100) - New edge case coverage, FP reduction with evidence
Substantial ($150) - Rewritten detection logic, major coverage expansion

Bounty Info

I have read and agree to the CONTRIBUTING.md bounty terms
Preferred payment method: Crypto; details can be provided privately after maintainer acceptance.

Improve prompt injection multimodal gates

459ae3b

ASUKAAAAA1204 mentioned this pull request Jun 7, 2026

[REVIEW] prompt-injection: add multimodal injection (vision/audio) and LLM Gateway/Firewall evidence gates #1437

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve prompt-injection multimodal and gateway gates#1616

Improve prompt-injection multimodal and gateway gates#1616
ASUKAAAAA1204 wants to merge 1 commit into
UnitOneAI:mainfrom
ASUKAAAAA1204:improve/prompt-injection-multimodal-gateway

ASUKAAAAA1204 commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ASUKAAAAA1204 commented Jun 7, 2026

Skill Improvement ($50-150 Bounty)

Skill Modified

What Was Wrong

What This PR Fixes

Evidence

Test Cases Added/Updated

Validation

Bounty Tier

Bounty Info

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant