feat(dlp): add visible-text techniques (incl. same-color) to dlp-gen by cdot65 · Pull Request #77 · cdot65/prisma-airs-cli

cdot65 · 2026-05-21T19:26:55Z

Summary

Adds visible-text embedding techniques to airs runtime dlp-gen.

Every format gets a visible technique — the synthetic payload is rendered as on-page / on-canvas text with foreground ≠ background (genuinely visible, OCR-able):
- PDF: dark text on a light band (page content)
- DOCX: visible body run
- SVG: on-canvas <text> painted on top
- PNG / JPEG: text composited onto the image pixels
PDF and DOCX additionally get visible-samecolor — body text drawn in the same color as its background (fg == bg): present and extractable, but camouflaged from the eye.
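The difference between the two techniques comes down to the color pair used when drawing the payload. The following is a minimal sketch of that idea, not the code in this PR: `colorsFor`, `renderSvgText`, and the hex values are hypothetical names and choices made for illustration. SVG is used only because it is easy to show as text; the PR itself limits `visible-samecolor` to PDF and DOCX.

```typescript
// Illustrative sketch only: these helpers are hypothetical and are not taken
// from the prisma-airs-cli source.
type Technique = "visible" | "visible-samecolor";

interface TextColors {
  foreground: string;
  background: string;
}

// "visible" uses distinct colors (fg != bg); "visible-samecolor" reuses the
// background color for the text (fg == bg).
function colorsFor(technique: Technique): TextColors {
  const background = "#ffffff"; // assumed light background
  const foreground = technique === "visible" ? "#111111" : background;
  return { foreground, background };
}

// Escape characters that are special inside an XML text node.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render the payload as an on-canvas <text> node painted over a background rect.
function renderSvgText(payload: string, colors: TextColors): string {
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="400" height="60">`,
    `<rect width="100%" height="100%" fill="${colors.background}"/>`,
    `<text x="10" y="35" font-size="16" fill="${colors.foreground}">${escapeXml(payload)}</text>`,
    `</svg>`,
  ].join("\n");
}
```

In both cases the payload string is present in the file; only the rendered contrast changes, which is why the same-color variant stays extractable while being hard to spot by eye.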

Corpus is now 21 dirty files per --types all --count 1 (was 15): pdf 5, png/jpeg/svg/docx 4 each.
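As a quick check of the arithmetic, the per-format counts stated above can be summed. This is only a sketch; the `dirtyFilesPerFormat` name is made up here, and the numbers are copied from the summary rather than read from the generator.

```typescript
// Per-format dirty-file counts as stated in the PR summary
// for `--types all --count 1`.
const dirtyFilesPerFormat: Record<string, number> = {
  pdf: 5,
  png: 4,
  jpeg: 4,
  svg: 4,
  docx: 4,
};

// Sum the counts across all formats.
const totalDirtyFiles = Object.values(dirtyFilesPerFormat).reduce(
  (sum, count) => sum + count,
  0,
);

console.log(totalDirtyFiles); // 21
```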

Tests / gates

New cases for pdf visible/visible-samecolor (text recoverable from content stream), png/jpeg visible (valid image, overlay applied), svg/docx auto-covered by their loops. Orchestrator counts updated 15→21.
Full suite: 568 tests pass; coverage 93.34% lines / 88.14% branches / 99.63% functions (above thresholds). tsc --noEmit + mkdocs build clean.
Smoke: airs runtime dlp-gen --types all → 21 dirty files incl. all visible* variants; verified pdf visible text extracts via pdftotext, svg renders on-canvas, png/jpeg are valid images.

Docs

Updated technique tables + sample outputs in docs/runtime/dlp-gen.md, docs/reference/cli-commands.md, docs/development/full-cli-sweep.md, AGENTS.md, and the dlp-test-files skill. Changeset (minor).

Test plan

Generate a corpus and scan the new visible / visible-samecolor files to compare DLP detection (esp. same-color vs hidden-run/vanish).

Every format gains a `visible` technique (rendered text, foreground != background, OCR-able). PDF and DOCX additionally get `visible-samecolor` — body text drawn in the same color as its background (extractable but camouflaged). 21 dirty files per `--types all --count 1` run.

cdot65 · 2026-05-28T11:11:09Z

Superseded by #233 — retargeted onto the v2.11.0 airs runtime dlp generate command path (was runtime dlp-gen) and bundled with #50 + #112.

#112 redteam report DYNAMIC fix (#233)

* test(redteam): RED for getDynamicReport service + renderer (#112)

* fix(redteam): route DYNAMIC jobs to getDynamicReport (#112)

  Previously redteam report fell through to getStaticReport for any non-CUSTOM jobType, including DYNAMIC, which 500s on the static endpoint. Add a RedTeamDynamicReport type, getDynamicReport service method, renderDynamicReport renderer, and the DYNAMIC routing branch.

* ci(redteam): rebase scan workflow with CUSTOM prompt sets + ASR gate (#50)

  Adds scan_config to the litellm target, switches the redteam-scan workflow to CUSTOM scans with prompt sets, skips targets without a scan_config, and adds an ASR-threshold gate plus a step-summary block of scan results. Resolves an env-block conflict with the Node 24 bump (#76) by merging both env keys.

* feat(dlp): add visible-text embedding techniques for PDF/PNG/JPEG/SVG/DOCX

  Adds 6 new dirty-file generators (5 formats × 1-2 techniques each):
  - PDF: visible, visible-samecolor
  - PNG: visible (text overlay)
  - JPEG: visible (text overlay)
  - SVG: visible (rendered text node)
  - DOCX: visible, visible-samecolor

  visible-samecolor renders body text in the same color as background: present and OCR-extractable but camouflaged from the eye. Useful for testing scanner robustness vs. simple visual review. Corpus jumps from 15 → 21 dirty files per full run.

* test(dlp): cover visible + visible-samecolor embedders

  Per-format embed specs add visible-text assertions; orchestrate spec dirty count 15 → 21.

* docs(dlp): document visible + visible-samecolor techniques on runtime dlp generate

  Retargets onto the post-v2.11.0 command (was runtime dlp-gen). Updates AGENTS, SKILL.md, generate.md, and full-cli-sweep corpus counts (15 → 21 dirty).

* chore: changesets for bundled dlp visible-text + redteam CI + report dynamic

* style: biome single-line formatting fix

* docs: regenerate typedoc api ref for new RedTeamDynamicReport types
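The DYNAMIC routing fix described in the commit message above can be pictured roughly as follows. This is a hedged sketch, not the repository's code: `getStaticReport` and `getDynamicReport` are named in the commit message, while `ReportService`, `getCustomReport`, `reportKindFor`, and `fetchReport` are hypothetical names introduced here, and the real signatures may differ.

```typescript
// Hypothetical service shape; only getStaticReport and getDynamicReport
// are named in the commit message, and their real signatures are unknown.
interface ReportService {
  getStaticReport(jobId: string): Promise<unknown>;
  getDynamicReport(jobId: string): Promise<unknown>;
  getCustomReport(jobId: string): Promise<unknown>; // assumed name
}

type ReportKind = "custom" | "dynamic" | "static";

// Decide which report endpoint a job type maps to. Before the fix, every
// non-CUSTOM job type (including DYNAMIC) took the static fallback.
function reportKindFor(jobType: string): ReportKind {
  switch (jobType) {
    case "CUSTOM":
      return "custom";
    case "DYNAMIC":
      return "dynamic"; // the added routing branch
    default:
      return "static";
  }
}

// Dispatch to the matching service method.
async function fetchReport(
  service: ReportService,
  jobType: string,
  jobId: string,
): Promise<unknown> {
  switch (reportKindFor(jobType)) {
    case "custom":
      return service.getCustomReport(jobId);
    case "dynamic":
      return service.getDynamicReport(jobId);
    default:
      return service.getStaticReport(jobId);
  }
}
```

Keeping the job-type decision in a small pure function makes the new branch easy to unit test without stubbing the HTTP layer.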

cdot65 mentioned this pull request May 28, 2026

chore: bundle #77 dlp-gen visible-text retarget + #50 redteam CI yaml + #112 redteam report DYNAMIC fix #233

Merged

6 tasks

cdot65 closed this May 28, 2026

cdot65 mentioned this pull request May 28, 2026

ci: add scan config to redteam targets #50

Closed

2 tasks

cdot65 wants to merge 1 commit into main from cdot65/dlp-gen-visible-text
