feat(dlp): add visible-text techniques (incl. same-color) to dlp-gen #77
Closed
cdot65 wants to merge 1 commit into
Every format gains a `visible` technique (rendered text, foreground != background, OCR-able). PDF and DOCX additionally get `visible-samecolor` — body text drawn in the same color as its background (extractable but camouflaged). 21 dirty files per `--types all --count 1` run.
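The two techniques can be illustrated with a minimal sketch. The helper names (`escapeXml`, `visibleSvg`, `sameColorDocxRun`), the default colors, and the white page background are assumptions made for illustration; they are not taken from this PR's generators, which may build these files quite differently.

```typescript
// Hypothetical helpers for illustration only; not the PR's actual generators.

/** Escape characters that are special in XML text nodes and attributes. */
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

/** `visible`: the payload is a rendered SVG <text> node with fg != bg. */
function visibleSvg(payload: string, fg = "#000000", bg = "#ffffff"): string {
  if (fg.toLowerCase() === bg.toLowerCase()) {
    throw new Error("visible requires foreground != background");
  }
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="600" height="80">`,
    `  <rect width="100%" height="100%" fill="${bg}"/>`,
    `  <text x="10" y="45" font-size="16" fill="${fg}">${escapeXml(payload)}</text>`,
    `</svg>`,
  ].join("\n");
}

/**
 * `visible-samecolor`: a WordprocessingML run whose text color equals the
 * page background (assumed white here), so the text stays extractable
 * but blends into the page.
 */
function sameColorDocxRun(payload: string, pageBg = "FFFFFF"): string {
  return (
    `<w:r><w:rPr><w:color w:val="${pageBg}"/></w:rPr>` +
    `<w:t xml:space="preserve">${escapeXml(payload)}</w:t></w:r>`
  );
}

const svg = visibleSvg("SYNTHETIC-PAYLOAD <123>");
const run = sameColorDocxRun("SYNTHETIC-PAYLOAD <123>");
console.log(svg);
console.log(run);
```

The only difference between the two variants is the relationship between text color and background: the payload bytes are present in both, which is what makes the same-color case a useful comparison point for a scanner.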
cdot65 added a commit that referenced this pull request on May 28, 2026:

#112 redteam report DYNAMIC fix (#233)

* test(redteam): RED for getDynamicReport service + renderer (#112)
* fix(redteam): route DYNAMIC jobs to getDynamicReport (#112)
  Previously, redteam report fell through to getStaticReport for any non-CUSTOM jobType, including DYNAMIC, which 500s on the static endpoint. Add a RedTeamDynamicReport type, getDynamicReport service method, renderDynamicReport renderer, and the DYNAMIC routing branch.
* ci(redteam): rebase scan workflow with CUSTOM prompt sets + ASR gate (#50)
  Adds scan_config to the litellm target, switches the redteam-scan workflow to CUSTOM scans with prompt sets, skips targets without a scan_config, and adds an ASR-threshold gate plus a step-summary block of scan results. Resolves an env-block conflict with the Node 24 bump (#76) by merging both env keys.
* feat(dlp): add visible-text embedding techniques for PDF/PNG/JPEG/SVG/DOCX
  Adds 6 new dirty-file generators (5 formats × 1-2 techniques each):
  - PDF: visible, visible-samecolor
  - PNG: visible (text overlay)
  - JPEG: visible (text overlay)
  - SVG: visible (rendered text node)
  - DOCX: visible, visible-samecolor
  visible-samecolor renders body text in the same color as background: present and OCR-extractable but camouflaged from the eye. Useful for testing scanner robustness vs. simple visual review. Corpus jumps from 15 → 21 dirty files per full run.
* test(dlp): cover visible + visible-samecolor embedders
  Per-format embed specs add visible-text assertions; orchestrate spec dirty count 15 → 21.
* docs(dlp): document visible + visible-samecolor techniques on runtime dlp generate
  Retargets onto the post-v2.11.0 command (was runtime dlp-gen). Updates AGENTS, SKILL.md, generate.md, and full-cli-sweep corpus counts (15 → 21 dirty).
* chore: changesets for bundled dlp visible-text + redteam CI + report dynamic
* style: biome single-line formatting fix
* docs: regenerate typedoc api ref for new RedTeamDynamicReport types
Summary
Adds visible-text embedding techniques to `airs runtime dlp-gen`.

- `visible` technique: the synthetic payload is rendered as on-page / on-canvas text with foreground ≠ background (genuinely visible, OCR-able): `<text>` painted on top
- `visible-samecolor`: body text drawn in the same color as its background (fg == bg): present and extractable, but camouflaged from the eye.

Corpus is now 21 dirty files per `--types all --count 1` (was 15): pdf 5, png/jpeg/svg/docx 4 each.

Tests / gates

- `visible`/`visible-samecolor` (text recoverable from content stream), png/jpeg `visible` (valid image, overlay applied), svg/docx auto-covered by their loops. Orchestrator counts updated 15→21.
- `tsc --noEmit` + `mkdocs build` clean.
- `airs runtime dlp-gen --types all` → 21 dirty files incl. all `visible*` variants; verified pdf `visible` text extracts via `pdftotext`, svg renders on-canvas, png/jpeg are valid images.

Docs

`docs/runtime/dlp-gen.md`, `docs/reference/cli-commands.md`, `docs/development/full-cli-sweep.md`, `AGENTS.md`, and the `dlp-test-files` skill. Changeset (minor).

Test plan

- `visible`/`visible-samecolor` files to compare DLP detection (esp. same-color vs hidden-run/vanish).
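
The corpus total quoted in the summary can be checked with a short tally. This is only a sketch: the per-format numbers are the ones stated in this PR, while the `dirtyPerFormat` and `total` names are made up for the example and do not come from the orchestrator.

```typescript
// Per-format dirty-file counts as stated in the PR summary
// (pdf 5; png, jpeg, svg, docx 4 each). Variable names are hypothetical.
const dirtyPerFormat: Record<string, number> = {
  pdf: 5,
  png: 4,
  jpeg: 4,
  svg: 4,
  docx: 4,
};

// Sum the per-format counts to get the size of one full run.
const total = Object.keys(dirtyPerFormat)
  .map((format) => dirtyPerFormat[format])
  .reduce((sum, n) => sum + n, 0);

console.log(total); // 21
```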