feat(runtime): dlp-gen — generate DLP test corpora + dlp-test-files skill #75

Merged. cdot65 merged 13 commits into main from cdot65/runtime-dlp-gen on May 21, 2026.

cdot65 (Owner) commented on May 21, 2026

Summary

Adds airs runtime dlp-gen, a local generator for DLP test corpora, plus a companion dlp-test-files skill (the repo's first skill).

  • Generates clean carrier files and dirty copies with synthetic sensitive data embedded via multiple techniques per format.
  • Formats: PDF, PNG, JPEG, SVG, DOCX. 15 techniques total (3 each):
    • PDF: meta, hidden-text, trailer
    • PNG: text-chunks, trailer, stego-lsb
    • JPEG: exif, com, trailer
    • SVG: meta, hidden-text, comment
    • DOCX: core-props, hidden-run, visible
  • Writes <out>/clean/, <out>/dirty/, and manifest.json (dirty file → technique + embedded values) for scanner scoring.
  • Randomized synthetic payloads (reserved-range SSNs, Luhn-valid test PANs, example.com emails, AWS …EXAMPLE keys, …); --seed for reproducibility. No real PII.
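
As a rough illustration of the seeded synthetic payloads described above, here is a minimal sketch. It is not the code from src/dlp/: the function names, the mulberry32-style PRNG, the "411111" PAN prefix, and the choice of the 900-999 SSN area range are all assumptions made for the example.

```typescript
// Hypothetical sketch: function names, the PAN prefix, and the SSN area range
// are illustrative assumptions, not the actual src/dlp/ code.

/** Seedable PRNG (mulberry32-style); returns floats in [0, 1). */
export function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/** Uniform integer in [lo, hi], inclusive on both ends. */
export function randInt(rng: () => number, lo: number, hi: number): number {
  return lo + Math.floor(rng() * (hi - lo + 1));
}

/** Left-pad a non-negative integer with zeros to a fixed width. */
function zeroPad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) text = "0" + text;
  return text;
}

/** Luhn check digit for a digit string that does not yet include one. */
export function luhnCheckDigit(partial: string): number {
  let sum = 0;
  for (let i = 0; i < partial.length; i++) {
    let d = partial.charCodeAt(partial.length - 1 - i) - 48;
    // The rightmost digit of the partial number sits next to the check digit,
    // so it (and every second digit after it) is doubled.
    if (i % 2 === 0) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return (10 - (sum % 10)) % 10;
}

/** Validate a full digit string, check digit included. */
export function isLuhnValid(digits: string): boolean {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

/** 16-digit Luhn-valid test PAN; the "411111" default prefix is an assumption. */
export function testPan(rng: () => number, prefix: string = "411111"): string {
  let body = prefix;
  while (body.length < 15) body += String(randInt(rng, 0, 9));
  return body + String(luhnCheckDigit(body));
}

/** SSN-shaped value in the 900-999 area range, which is not assigned as SSNs. */
export function reservedSsn(rng: () => number): string {
  const area = randInt(rng, 900, 999);
  const group = randInt(rng, 1, 99);
  const serial = randInt(rng, 1, 9999);
  return `${area}-${zeroPad(group, 2)}-${zeroPad(serial, 4)}`;
}
```

Because every value is drawn from the one seeded generator, calling testPan(makeRng(1)) twice yields the same PAN, which is the property a --seed option relies on.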

Architecture

  • Framework-free, unit-tested core under src/dlp/ (rng, payload, lorem, manifest, generators, embedders, orchestrator).
  • Thin Commander wiring in src/cli/commands/dlp-gen.ts, registered under the runtime group.
  • New deps: pdf-lib, docx, sharp, piexifjs.
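
To make the "framework-free core, thin CLI wiring" split concrete, here is one way the core could be shaped so that it is testable without Commander. The Embedder interface, the trailerEmbedder factory, generateDirty, and the output file naming are hypothetical; the PR does not show the real src/dlp/ API, and the trailer technique is simplified here to a plain append after the carrier's last byte.

```typescript
// Hypothetical sketch of a framework-free core; every name here is an
// illustrative assumption, not the actual src/dlp/ API.

export type Format = "pdf" | "png" | "jpeg" | "svg" | "docx";

export interface Embedder {
  format: Format;
  technique: string;
  /** Pure function: returns a new buffer and never mutates the clean carrier. */
  embed(clean: Uint8Array, payload: string): Uint8Array;
}

export interface DirtyFile {
  name: string;
  technique: string;
  bytes: Uint8Array;
  values: string[];
}

/** Encode an ASCII string as bytes (non-ASCII input is out of scope here). */
export function asciiBytes(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) out[i] = text.charCodeAt(i) & 0x7f;
  return out;
}

/** Simplified "trailer" technique: append the payload after the final byte. */
export function trailerEmbedder(format: Format): Embedder {
  return {
    format,
    technique: "trailer",
    embed(clean: Uint8Array, payload: string): Uint8Array {
      const tail = asciiBytes(payload);
      const dirty = new Uint8Array(clean.length + tail.length);
      dirty.set(clean, 0);
      dirty.set(tail, clean.length);
      return dirty;
    },
  };
}

/** Orchestrator step: apply every embedder registered for the carrier's format. */
export function generateDirty(
  format: Format,
  clean: Uint8Array,
  embedders: Embedder[],
  payload: string,
): DirtyFile[] {
  return embedders
    .filter((e) => e.format === format)
    .map((e) => ({
      name: `${format}-${e.technique}.${format}`, // naming scheme is assumed
      technique: e.technique,
      bytes: e.embed(clean, payload),
      values: [payload],
    }));
}
```

With this shape, a CLI command only has to parse options, call the orchestrator, and write the returned buffers to disk, so the unit tests never need to touch the command layer.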

Tests / gates

  • 50+ new unit tests under tests/unit/dlp/ (each embedder verified via a real extractor: PNG chunk/LSB decode, EXIF read, PDF stream inflate + hex decode, docx unzip, etc.).
  • Full suite: 562 tests pass; coverage 93.13% lines / 88.11% branches / 99.61% functions (above thresholds). tsc --noEmit clean; mkdocs build clean.
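
The "verified via a real extractor" approach can be illustrated with an LSB round trip: one function hides the bytes, and an independent decoder recovers them. This is only a sketch on a raw channel buffer. The 32-bit big-endian length header and both function names are assumptions, and the actual stego-lsb embedder presumably works on decoded PNG pixels rather than a bare array.

```typescript
// Hypothetical sketch: the framing (32-bit big-endian length header) and the
// function names are assumptions, not the actual stego-lsb implementation.

/** Hide `message` in the least significant bit of successive channel bytes. */
export function embedLsb(pixels: Uint8Array, message: Uint8Array): Uint8Array {
  const bitsNeeded = (4 + message.length) * 8;
  if (bitsNeeded > pixels.length) throw new Error("carrier too small");

  // Frame the message with its length so the decoder knows where to stop.
  const framed = new Uint8Array(4 + message.length);
  framed[0] = (message.length >>> 24) & 0xff;
  framed[1] = (message.length >>> 16) & 0xff;
  framed[2] = (message.length >>> 8) & 0xff;
  framed[3] = message.length & 0xff;
  framed.set(message, 4);

  const out = pixels.slice(); // copy, so the clean carrier stays untouched
  for (let bit = 0; bit < bitsNeeded; bit++) {
    const value = (framed[bit >> 3] >> (7 - (bit & 7))) & 1;
    out[bit] = (out[bit] & 0xfe) | value;
  }
  return out;
}

/** Independent decoder: read the length header, then that many bytes. */
export function extractLsb(pixels: Uint8Array): Uint8Array {
  if (pixels.length < 32) throw new Error("carrier too small for a header");
  const readByte = (index: number): number => {
    let value = 0;
    for (let b = 0; b < 8; b++) value = (value << 1) | (pixels[index * 8 + b] & 1);
    return value;
  };
  const length =
    ((readByte(0) << 24) | (readByte(1) << 16) | (readByte(2) << 8) | readByte(3)) >>> 0;
  if ((4 + length) * 8 > pixels.length) throw new Error("length header exceeds carrier");
  const out = new Uint8Array(length);
  for (let i = 0; i < length; i++) out[i] = readByte(4 + i);
  return out;
}
```

A test in this style asserts that extractLsb(embedLsb(pixels, message)) returns the original bytes and that no channel value moved by more than 1, so a bug in the embedder cannot be masked by a matching bug in shared helper code.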

Docs

  • docs/runtime/dlp-gen.md (+ nav), AGENTS.md entry, changeset (minor).

Test plan

  • pnpm test:coverage, pnpm tsc --noEmit, and mkdocs build are all green locally
  • CLI smoke: airs runtime dlp-gen --types all --seed 1 → 5 clean + 15 dirty + manifest
  • Run dirty files through a scanner and reconcile against manifest.json
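
The reconciliation step in the test plan could be scripted roughly as follows. The PR only says the manifest maps each dirty file to its technique and embedded values, so the ManifestEntry shape, the ScanFinding type, and scoreByTechnique are assumptions for illustration; a real scanner's output would need to be mapped into this form first.

```typescript
// Hypothetical sketch: the manifest schema and the scanner finding format are
// assumed for illustration and may differ from the real manifest.json.

export interface ManifestEntry {
  file: string;
  technique: string;
  values: string[];
}

export interface ScanFinding {
  file: string;
  value: string;
}

export interface TechniqueScore {
  technique: string;
  expected: number;
  detected: number;
}

/** Count, per technique, how many embedded values the scanner reported. */
export function scoreByTechnique(
  manifest: ManifestEntry[],
  findings: ScanFinding[],
): TechniqueScore[] {
  // Key findings by file + value so lookups are O(1).
  const found = new Set(findings.map((f) => `${f.file}\u0000${f.value}`));
  const scores = new Map<string, TechniqueScore>();

  for (const entry of manifest) {
    const score: TechniqueScore = scores.get(entry.technique) ?? {
      technique: entry.technique,
      expected: 0,
      detected: 0,
    };
    for (const value of entry.values) {
      score.expected += 1;
      if (found.has(`${entry.file}\u0000${value}`)) score.detected += 1;
    }
    scores.set(entry.technique, score);
  }
  return Array.from(scores.values());
}
```

Grouping by technique rather than by file makes it easy to see which embedding methods a scanner misses across all five formats.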

cdot65 merged commit 760b706 into main on May 21, 2026
4 checks passed
cdot65 deleted the cdot65/runtime-dlp-gen branch on May 21, 2026 18:00
cdot65 added a commit that referenced this pull request May 21, 2026
…-test-files skill

- feat(runtime): airs runtime dlp-gen generates clean + dirty DLP test corpora
  across PDF/PNG/JPEG/SVG/DOCX with synthetic data + manifest (#75)
- feat(skill): dlp-test-files skill driving the command (#75)

Changesets: 0018.