feat(scan): opt-in multi-modal scan orchestration by scotty595 · Pull Request #10 · lua-ai-global/governance

scotty595 · 2026-04-30T13:16:04Z

Summary

- New export governance-sdk/scan/multi-modal with a pluggable extractor registry and the scanMultiModal() orchestrator. Closes the bypass where image / PDF / audio blocks pass through enforce() unscanned.
- Ships orchestration only: actual OCR / PDF parsers / ASR are caller-supplied, preserving the zero-runtime-dep promise.
- Mirrors the existing InjectionClassifier pattern: async scanner interface + global registry + pre-enforce() invocation.

API surface

```ts
import {
  registerModalityScanner,
  scanMultiModal,
  isFailClosed,
} from 'governance-sdk/scan/multi-modal';

// ocrEngine stands in for a caller-supplied OCR client; the SDK ships no extractor.
registerModalityScanner('image', {
  extractText: async (block) => ocrEngine.recognize(block),
});

const scan = await scanMultiModal(blocks, {
  enabled: ['text', 'image'],
  onMissingScanner: 'block',
  onExtractError: 'block',
  timeoutMs: 5_000,
});

if (isFailClosed(scan)) { /* block before enforce() */ }
// otherwise feed scan.text into the existing detectInjection / hybridDetect
```

Defaults (deliberately conservative)

- enabled: ['text'] (every other modality off until explicitly opted in)
- onMissingScanner: 'skip', onExtractError: 'skip'
- timeoutMs: 30_000 per block
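
For reviewers who want the defaults in one place, here is a minimal sketch of how caller options could be merged over them. The field names and default values come from this description; the `Modality`, `FailurePolicy`, and `ScanOptions` type shapes and the `resolveScanOptions` helper are assumptions for illustration, not the SDK's actual code.

```typescript
// Assumed type shapes; only the field names and default values come from the PR text.
type Modality = 'text' | 'image' | 'pdf' | 'audio';
type FailurePolicy = 'skip' | 'block';

interface ScanOptions {
  enabled?: Modality[];
  onMissingScanner?: FailurePolicy;
  onExtractError?: FailurePolicy;
  timeoutMs?: number;
}

// Defaults as listed in the PR description.
const DEFAULT_SCAN_OPTIONS: Required<ScanOptions> = {
  enabled: ['text'],
  onMissingScanner: 'skip',
  onExtractError: 'skip',
  timeoutMs: 30_000,
};

// Hypothetical helper: any field the caller leaves undefined falls back to its default.
function resolveScanOptions(options: ScanOptions = {}): Required<ScanOptions> {
  return {
    enabled: options.enabled ?? [...DEFAULT_SCAN_OPTIONS.enabled],
    onMissingScanner: options.onMissingScanner ?? DEFAULT_SCAN_OPTIONS.onMissingScanner,
    onExtractError: options.onExtractError ?? DEFAULT_SCAN_OPTIONS.onExtractError,
    timeoutMs: options.timeoutMs ?? DEFAULT_SCAN_OPTIONS.timeoutMs,
  };
}
```

Under this sketch, a caller passing only `enabled: ['text', 'image']` still gets 'skip' for both policies, so fail-closed behavior has to be opted into separately.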

Failure modes (all surface in `result.blocked[]`)

- no_scanner: enabled but no extractor registered
- extract_error: scanner threw (sync or async) or returned non-string
- extract_timeout: exceeded timeoutMs
- extract_empty: scanner returned null / undefined
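
The per-block classification can be sketched roughly as below, using the Promise.race plus setTimeout approach named under Tests. This is an illustrative sketch, not the code in src/scan/multi-modal.ts: `extractWithTimeout` and `ExtractOutcome` are hypothetical names. A null / undefined return is reported here as its own `empty` outcome rather than as a failure reason, which matches how the follow-up commit in this PR treats it.

```typescript
// Hypothetical outcome type for one block; the reason strings come from the PR text.
type ExtractOutcome =
  | { kind: 'text'; text: string }
  | { kind: 'empty' }
  | { kind: 'failed'; reason: 'extract_error' | 'extract_timeout' };

// Sentinel used to tell a timeout apart from any value a scanner could return.
const TIMED_OUT = Symbol('timed_out');

// Hypothetical helper: run one extractor call with a deadline and classify the result.
async function extractWithTimeout<B>(
  extractText: (block: B) => unknown,
  block: B,
  timeoutMs: number,
): Promise<ExtractOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    const timeout = new Promise<typeof TIMED_OUT>((resolve) => {
      timer = setTimeout(() => resolve(TIMED_OUT), timeoutMs);
    });
    // Calling inside an async function turns a synchronous throw into a rejection,
    // so sync-throw and async-rejection paths are handled by the same catch.
    const extraction = (async () => extractText(block))();
    const value = await Promise.race([extraction, timeout]);
    if (value === TIMED_OUT) return { kind: 'failed', reason: 'extract_timeout' };
    if (value === null || value === undefined) return { kind: 'empty' };
    if (typeof value !== 'string') return { kind: 'failed', reason: 'extract_error' };
    return { kind: 'text', text: value };
  } catch {
    return { kind: 'failed', reason: 'extract_error' };
  } finally {
    // Always clear the single timer so no handle outlives the call.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that a timeout only stops waiting; it cannot cancel the scanner's underlying work, so a slow extractor may keep running after the block is recorded as timed out.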

Tests

20 new tests in src/scan/multi-modal.test.ts covering: defaults, opt-in modes, sync-throw and async-rejection paths, timeouts via Promise.race + setTimeout cleanup, registry CRUD, isFailClosed with mixed reasons. Full suite 1,392 / 0.

Merge order note

Modifies the README "Limitations" multi-modal bullet. If the README honesty pass PR merges first, this branch will have a trivial conflict on that one bullet — both edits are mine, easy resolve. Either way works; suggest merging the docs PR first.

Out of scope (follow-ups)

- Framework adapters (Anthropic / Vercel AI / Genkit / LlamaIndex / Bedrock) auto-calling scanMultiModal on detection of non-text blocks. Today the caller wires it in their host code.
- Optional reference packages (e.g. governance-sdk-scan-tesseract) shipping default extractors as separate peer-deps. Keeps the core SDK zero-dep.

Test plan

- npm run build clean
- npm test: 1,392 / 0
- Reviewer skims src/scan/multi-modal.ts for the timeout / sync-throw paths
- Reviewer confirms package.json exports field is correct

🤖 Generated with Claude Code

Note

Medium Risk
Adds new security-adjacent scanning behavior and a new public export surface; incorrect host wiring or misunderstandings of failClosed/default opt-in semantics could lead to gaps or unintended blocking, though the change is additive and well-tested.

Overview
Adds a new governance-sdk/scan/multi-modal export that orchestrates opt-in text extraction for image/pdf/audio content blocks via a global registry (registerModalityScanner, unregisterModalityScanner, clearModalityScanners) and returns concatenated text plus structured scan outcomes and a pre-evaluated failClosed flag (isFailClosed helper).

Introduces timeout handling and explicit failure classification (no_scanner, extract_error, extract_timeout) while treating null/undefined extractor returns as benign “no text” results, backed by a comprehensive new test suite. Updates both READMEs to document that multi-modal scanning is now available but remains opt-in with caller-supplied OCR/PDF/ASR to preserve zero runtime deps.

Reviewed by Cursor Bugbot for commit f89155f. Bugbot is set up for automated code reviews on this repo.

Closes the bypass where image, PDF, and audio content blocks pass through enforce() unscanned. Ships the orchestration only: actual OCR / PDF parsers / ASR are caller-supplied via a registry pattern, keeping the zero-runtime-dep promise.

API:
- ModalityScanner interface (extractText: async (block) => string | null)
- Global registry mirroring the InjectionClassifier pattern: registerModalityScanner, getModalityScanner, unregisterModalityScanner, hasModalityScanner, clearModalityScanners.
- scanMultiModal(blocks, options): walks ContentBlock[], invokes the registered scanner for each enabled modality, concatenates extracted text, returns structured result with modalitiesScanned, modalitiesSkipped, blocked[].
- isFailClosed(result, options): small helper that maps blocked[] rows to a fail-closed boolean given the caller's onMissingScanner/onExtractError policy.

Defaults are deliberately conservative:
- enabled: ['text'] (image/pdf/audio off until explicitly opted in)
- onMissingScanner: 'skip'
- onExtractError: 'skip'
- timeoutMs: 30_000

Failure modes recorded in blocked[]:
- no_scanner: enabled but no extractor registered.
- extract_error: scanner threw or returned non-string.
- extract_timeout: extraction exceeded timeoutMs.
- extract_empty: scanner returned null/undefined.

Sync-throw vs async-rejection paths are both handled and surface as extract_error. Timeouts use Promise.race with a single setTimeout, cleared in finally so test runs don't leak handles.

Wiring:
- New export path 'governance-sdk/scan/multi-modal' added to package.json.
- README "Limitations" bullet updated to reflect opt-in availability with concrete pointer to the new export.

Tests: 20 new in src/scan/multi-modal.test.ts. Full suite 1,392/0.

Follow-ups (not in this commit):
- Framework adapters (Anthropic, Vercel AI, Genkit, LlamaIndex, Bedrock) could auto-call scanMultiModal when they detect non-text blocks. Today the caller wires it in their host code.
- A reference 'governance-sdk-scan-tesseract' optional peer-dep package could ship a default OCR scanner for image, but stays out of the core SDK to preserve zero-dep.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
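
To make the registry part of the commit message concrete, here is a minimal sketch of a Map-backed global registry exposing the five function names listed above. The function names and the `extractText` signature are taken from the commit message; the `ContentBlock` shape, the `string` modality key, and the return types are assumptions, and the real module may differ.

```typescript
// Assumed minimal block shape; the SDK's real ContentBlock type is not shown in this PR page.
interface ContentBlock {
  type: string;
}

// Signature as described in the commit message: resolves to text, or null for "no text".
interface ModalityScanner {
  extractText(block: ContentBlock): Promise<string | null>;
}

// Module-level state: one scanner per modality, last registration wins (assumed behavior).
const scanners = new Map<string, ModalityScanner>();

function registerModalityScanner(modality: string, scanner: ModalityScanner): void {
  scanners.set(modality, scanner);
}

function getModalityScanner(modality: string): ModalityScanner | undefined {
  return scanners.get(modality);
}

function hasModalityScanner(modality: string): boolean {
  return scanners.has(modality);
}

// Returns true if a scanner was registered and has now been removed (assumed return type).
function unregisterModalityScanner(modality: string): boolean {
  return scanners.delete(modality);
}

function clearModalityScanners(): void {
  scanners.clear();
}
```

Because the state is global, tests in a sketch like this would typically call clearModalityScanners() in setup or teardown so registrations do not leak between cases.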

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit df978db.

Two issues found by Cursor Bugbot on the multi-modal PR.

**HIGH: scanMultiModal silently ignored onMissingScanner / onExtractError**

ScanOptions declared the two policy fields but the orchestrator never read them; they were only consumed by the separate `isFailClosed` helper. A caller following the README's own example (pass `onMissingScanner: 'block'` to scanMultiModal, then check `isFailClosed(scan)` without options) would silently default to skip and never fail-closed. The fail-closed mechanism was effectively inert in the natural API call shape.

Fix: scanMultiModal now reads both policy fields, computes `failClosed` itself, and records both the policy and the boolean on the result. Callers check `result.failClosed` directly. The `isFailClosed(result, override?)` helper is retained for the cases where you want to reapply a different policy after the fact; override now defaults to result.policy rather than 'skip'.

**MEDIUM: extract_empty was misclassified as a fail-closed-eligible failure**

The ModalityScanner contract says returning `null` means "this block has no extractable text (e.g. an image that's purely visual)", explicitly a benign signal. But the orchestrator was pushing those into `blocked[]` with reason `extract_empty`, and `isFailClosed` was catch-all-gating any non-no_scanner blocked row under onExtractError. Net effect: a caller setting `onExtractError: 'block'` to defend against scanner failures would have been blocking on every purely-visual image too, producing false positives on benign content.

Fix: null/undefined returns now go to a new `modalitiesEmpty[]` array on the result, count toward `modalitiesScanned` (the scan succeeded; the text contribution is just empty), and never trigger fail-closed under any policy. The `extract_empty` value is removed from `ScanBlockReason`; `blocked[]` now only contains `no_scanner` / `extract_error` / `extract_timeout`, the failures that genuinely warrant fail-closed treatment.

Tests: 7 new regression tests covering both issues plus the override helper's defaulting behaviour. Full suite 1,399 / 0 (was 1,392 / 0 before; +7 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
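
The post-fix fail-closed logic described in this commit can be sketched as follows. The reason strings, the two policy fields, `result.policy`, `result.failClosed`, and the override defaulting come from the commit message; the interface names `ScanPolicy`, `BlockedRow`, and `ScanResultLike`, the row fields, and the `computeFailClosed` helper are assumptions for illustration.

```typescript
type FailurePolicy = 'skip' | 'block';
// After the fix, blocked[] only carries these three reasons.
type ScanBlockReason = 'no_scanner' | 'extract_error' | 'extract_timeout';

// Assumed shapes; only the field names policy, blocked, and failClosed come from the commit text.
interface ScanPolicy {
  onMissingScanner: FailurePolicy;
  onExtractError: FailurePolicy;
}
interface BlockedRow {
  modality: string;
  reason: ScanBlockReason;
}
interface ScanResultLike {
  blocked: BlockedRow[];
  policy: ScanPolicy;
  failClosed: boolean;
}

// Hypothetical helper: no_scanner rows are gated by onMissingScanner,
// every other blocked row by onExtractError.
function computeFailClosed(blocked: BlockedRow[], policy: ScanPolicy): boolean {
  return blocked.some((row) =>
    row.reason === 'no_scanner'
      ? policy.onMissingScanner === 'block'
      : policy.onExtractError === 'block',
  );
}

// Re-evaluates a result under a different policy; with no override it uses the
// policy recorded on the result, as the commit message describes.
function isFailClosed(result: ScanResultLike, override: ScanPolicy = result.policy): boolean {
  return computeFailClosed(result.blocked, override);
}
```

With this shape, `isFailClosed(scan)` and `scan.failClosed` agree whenever no override is passed, which is the property the HIGH fix is meant to restore.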

…y pass

The auto-generated release notes only covered #9 (tool-result adapters). Code for #10 (multi-modal scan) and #11 (README honesty pass) shipped in 0.15.0 but neither got a CHANGELOG entry. The auto-release pulled from CHANGELOG.md, so the GitHub Release body and the npm-displayed changelog were both incomplete. This commit extends the 0.15.0 entry with both missing sections. GitHub Release body has been updated to match. No code change; documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

scotty595 changed the title ~~docs(sdk): README honesty pass — counts, audit chain, limitations~~ feat(scan): opt-in multi-modal scan orchestration Apr 30, 2026

scotty595 mentioned this pull request Apr 30, 2026

docs(sdk): README honesty pass — counts, audit chain, limitations #11

Merged

3 tasks

cursor Bot reviewed Apr 30, 2026

Comment thread packages/governance/src/scan/multi-modal.ts Outdated

Comment thread packages/governance/src/scan/multi-modal.ts

scotty595 force-pushed the feat/multi-modal-scanning branch from 326f2ec to df978db Compare April 30, 2026 13:29

cursor Bot reviewed Apr 30, 2026

Comment thread packages/governance/src/scan/multi-modal.ts

scotty595 merged commit 988db46 into main Apr 30, 2026
4 checks passed

scotty595 merged 2 commits into main from feat/multi-modal-scanning
