feat(dlp): add visible-text techniques (incl. same-color) to dlp-gen #77

Closed
cdot65 wants to merge 1 commit into main from cdot65/dlp-gen-visible-text

cdot65 commented May 21, 2026

Owner

Summary

Adds visible-text embedding techniques to airs runtime dlp-gen.

  • Every format gets a visible technique — the synthetic payload is rendered as on-page / on-canvas text with foreground ≠ background (genuinely visible, OCR-able):
    • PDF: dark text on a light band (page content)
    • DOCX: visible body run
    • SVG: on-canvas <text> painted on top
    • PNG / JPEG: text composited onto the image pixels
  • PDF and DOCX additionally get visible-samecolor — body text drawn in the same color as its background (fg == bg): present and extractable, but camouflaged from the eye.
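
To make the distinction between the two techniques concrete, here is a minimal TypeScript sketch, not the actual dlp-gen code. It assumes colors are plain RGB triples; the names `classify`, `toHex`, `escapeXml`, and `renderVisibleSvg` are hypothetical and chosen only for illustration. It classifies a foreground/background pair and renders the SVG `visible` case with the `<text>` node painted last so it sits on top.

```typescript
// Hypothetical sketch: none of these names come from the dlp-gen source.
type Rgb = readonly [number, number, number];
type VisibleTechnique = "visible" | "visible-samecolor";

// fg == bg is the camouflaged case; any channel difference counts as visible.
// (A real generator would likely also want a minimum contrast; not modeled here.)
function classify(fg: Rgb, bg: Rgb): VisibleTechnique {
  const same = fg.every((channel, i) => channel === bg[i]);
  return same ? "visible-samecolor" : "visible";
}

function toHex([r, g, b]: Rgb): string {
  return "#" + [r, g, b].map((c) => c.toString(16).padStart(2, "0")).join("");
}

function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// The <text> node is emitted after the background <rect>, so it renders on top.
function renderVisibleSvg(payload: string, fg: Rgb, bg: Rgb): string {
  if (classify(fg, bg) !== "visible") {
    throw new Error("visible technique requires foreground != background");
  }
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="400" height="60">`,
    `  <rect width="100%" height="100%" fill="${toHex(bg)}"/>`,
    `  <text x="10" y="35" font-size="16" fill="${toHex(fg)}">${escapeXml(payload)}</text>`,
    `</svg>`,
  ].join("\n");
}
```

Passing identical colors to `renderVisibleSvg` throws, which mirrors the rule that SVG only gets the `visible` variant in this PR.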

Corpus is now 21 dirty files per --types all --count 1 (was 15): pdf 5, png/jpeg/svg/docx 4 each.
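
As a quick check of that total, a small sketch that sums the per-format counts quoted above (the `totalDirtyFiles` helper is hypothetical, not part of the CLI):

```typescript
// Per-format dirty-file counts as stated in this PR description.
const dirtyPerFormat: Record<string, number> = {
  pdf: 5,
  png: 4,
  jpeg: 4,
  svg: 4,
  docx: 4,
};

// Hypothetical helper: sums the counts across formats.
function totalDirtyFiles(perFormat: Record<string, number>): number {
  return Object.values(perFormat).reduce((sum, n) => sum + n, 0);
}

console.log(totalDirtyFiles(dirtyPerFormat)); // 21
```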

Tests / gates

  • New cases for pdf visible/visible-samecolor (text recoverable from content stream), png/jpeg visible (valid image, overlay applied), svg/docx auto-covered by their loops. Orchestrator counts updated 15→21.
  • Full suite: 568 tests pass; coverage 93.34% lines / 88.14% branches / 99.63% functions (above thresholds). tsc --noEmit + mkdocs build clean.
  • Smoke: airs runtime dlp-gen --types all → 21 dirty files incl. all visible* variants; verified pdf visible text extracts via pdftotext, svg renders on-canvas, png/jpeg are valid images.

Docs

  • Updated technique tables + sample outputs in docs/runtime/dlp-gen.md, docs/reference/cli-commands.md, docs/development/full-cli-sweep.md, AGENTS.md, and the dlp-test-files skill. Changeset (minor).

Test plan

  • Generate a corpus and scan the new visible / visible-samecolor files to compare DLP detection (esp. same-color vs hidden-run/vanish).

Every format gains a `visible` technique (rendered text, foreground != background,
OCR-able). PDF and DOCX additionally get `visible-samecolor` — body text drawn in
the same color as its background (extractable but camouflaged). 21 dirty files per
`--types all --count 1` run.

cdot65 commented May 28, 2026

Owner Author

Superseded by #233 — retargeted onto the v2.11.0 airs runtime dlp generate command path (was runtime dlp-gen) and bundled with #50 + #112.

cdot65 closed this May 28, 2026
cdot65 mentioned this pull request May 28, 2026
2 tasks
cdot65 added a commit that referenced this pull request May 28, 2026
#112 redteam report DYNAMIC fix (#233)

* test(redteam): RED for getDynamicReport service + renderer (#112)

* fix(redteam): route DYNAMIC jobs to getDynamicReport (#112)

Previously redteam report fell through to getStaticReport for any
non-CUSTOM jobType, including DYNAMIC, which 500s on the static
endpoint. Add a RedTeamDynamicReport type, getDynamicReport service
method, renderDynamicReport renderer, and the DYNAMIC routing branch.
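
The routing change described above could look roughly like the following sketch. It is an illustration only: the `JobType` union (including `"STATIC"`), the `ReportKind` labels, and `reportKindFor` are assumptions, and the real code dispatches to service methods such as getDynamicReport rather than returning a label.

```typescript
// Hypothetical sketch of the dispatch; type and function names are assumptions.
type JobType = "STATIC" | "DYNAMIC" | "CUSTOM";
type ReportKind = "custom" | "dynamic" | "static";

function reportKindFor(jobType: JobType): ReportKind {
  switch (jobType) {
    case "CUSTOM":
      return "custom";
    case "DYNAMIC":
      // Before the fix, this case fell through to the static branch.
      return "dynamic";
    default:
      return "static";
  }
}
```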

* ci(redteam): rebase scan workflow with CUSTOM prompt sets + ASR gate (#50)

Adds scan_config to the litellm target, switches the redteam-scan
workflow to CUSTOM scans with prompt sets, skips targets without a
scan_config, and adds an ASR-threshold gate plus a step-summary block
of scan results.

Resolves an env-block conflict with the Node 24 bump (#76) by merging
both env keys.
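
A minimal sketch of what an ASR-threshold gate might compute, assuming ASR means attack success rate (successful attacks divided by total attacks) and that the gate fails when any target exceeds the threshold. The `ScanResult` shape and both function names are hypothetical, not taken from the workflow.

```typescript
// Hypothetical result shape; field names are assumptions for illustration.
interface ScanResult {
  target: string;
  attacks: number;
  successful: number;
}

// Assumed definition: successful attacks / total attacks (0 when no attacks ran).
function attackSuccessRate(r: ScanResult): number {
  return r.attacks === 0 ? 0 : r.successful / r.attacks;
}

// Returns the targets whose ASR is strictly above the allowed maximum.
function failingTargets(results: ScanResult[], maxAsr: number): string[] {
  return results
    .filter((r) => attackSuccessRate(r) > maxAsr)
    .map((r) => r.target);
}
```

A CI step built on this would exit non-zero whenever `failingTargets` returns a non-empty list.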

* feat(dlp): add visible-text embedding techniques for PDF/PNG/JPEG/SVG/DOCX

Adds 6 new dirty-file generators (5 formats × 1-2 techniques each):
- PDF: visible, visible-samecolor
- PNG: visible (text overlay)
- JPEG: visible (text overlay)
- SVG: visible (rendered text node)
- DOCX: visible, visible-samecolor

visible-samecolor renders body text in the same color as its background: present
and extractable but camouflaged from the eye. Useful for testing scanner
robustness vs. simple visual review.

Corpus jumps from 15 → 21 dirty files per full run.

* test(dlp): cover visible + visible-samecolor embedders

Per-format embed specs add visible-text assertions; orchestrate spec dirty
count 15 → 21.

* docs(dlp): document visible + visible-samecolor techniques on runtime dlp generate

Retargets onto the post-v2.11.0 command (was runtime dlp-gen). Updates AGENTS,
SKILL.md, generate.md, and full-cli-sweep corpus counts (15 → 21 dirty).

* chore: changesets for bundled dlp visible-text + redteam CI + report dynamic

* style: biome single-line formatting fix

* docs: regenerate typedoc api ref for new RedTeamDynamicReport types