feat(scan): opt-in multi-modal scan orchestration#10
Merged
Closes the bypass where image, PDF, and audio content blocks pass through enforce() unscanned. Ships the orchestration only: actual OCR / PDF parsers / ASR are caller-supplied via a registry pattern, keeping the zero-runtime-dep promise.

API:
- ModalityScanner interface (extractText: async (block) => string | null)
- Global registry mirroring the InjectionClassifier pattern: registerModalityScanner, getModalityScanner, unregisterModalityScanner, hasModalityScanner, clearModalityScanners.
- scanMultiModal(blocks, options): walks ContentBlock[], invokes the registered scanner for each enabled modality, concatenates extracted text, returns structured result with modalitiesScanned, modalitiesSkipped, blocked[].
- isFailClosed(result, options): small helper that maps blocked[] rows to a fail-closed boolean given the caller's onMissingScanner/onExtractError policy.

Defaults are deliberately conservative:
- enabled: ['text'] (image/pdf/audio off until explicitly opted in)
- onMissingScanner: 'skip'
- onExtractError: 'skip'
- timeoutMs: 30_000

Failure modes recorded in blocked[]:
- no_scanner: enabled but no extractor registered.
- extract_error: scanner threw or returned non-string.
- extract_timeout: extraction exceeded timeoutMs.
- extract_empty: scanner returned null/undefined.

Sync-throw vs async-rejection paths are both handled and surface as extract_error. Timeouts use Promise.race with a single setTimeout, cleared in finally so test runs don't leak handles.

Wiring:
- New export path 'governance-sdk/scan/multi-modal' added to package.json.
- README "Limitations" bullet updated to reflect opt-in availability with concrete pointer to the new export.

Tests: 20 new in src/scan/multi-modal.test.ts. Full suite 1,392/0.

Follow-ups (not in this commit):
- Framework adapters (Anthropic, Vercel AI, Genkit, LlamaIndex, Bedrock) could auto-call scanMultiModal when they detect non-text blocks. Today the caller wires it in their host code.
- A reference 'governance-sdk-scan-tesseract' optional peer-dep package could ship a default OCR scanner for image, but stays out of the core SDK to preserve zero-dep.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
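The timeout and error handling described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the SDK's source: `extractWithTimeout`, `ExtractOutcome`, and `TIMED_OUT` are hypothetical names, and a `null` return is reported as its own outcome so the caller decides how to record it.

```ts
// Hypothetical helper; names and shapes are assumptions for illustration only.
type ExtractOutcome =
  | { kind: 'text'; text: string }
  | { kind: 'empty' }
  | { kind: 'blocked'; reason: 'extract_error' | 'extract_timeout' };

const TIMED_OUT = Symbol('timed_out');

async function extractWithTimeout(
  extract: (block: unknown) => Promise<string | null> | string | null,
  block: unknown,
  timeoutMs: number,
): Promise<ExtractOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    // A single timer that resolves (never rejects) with a sentinel value.
    const timeout = new Promise<typeof TIMED_OUT>((resolve) => {
      timer = setTimeout(() => resolve(TIMED_OUT), timeoutMs);
    });
    // Calling the extractor inside an async function converts a synchronous
    // throw into a rejection, so both failure paths reach the same catch.
    const attempt = (async () => extract(block))();
    const value = await Promise.race([attempt, timeout]);
    if (value === TIMED_OUT) return { kind: 'blocked', reason: 'extract_timeout' };
    if (value === null || value === undefined) return { kind: 'empty' };
    // Runtime guard: an untyped caller may still return a non-string.
    if (typeof value !== 'string') return { kind: 'blocked', reason: 'extract_error' };
    return { kind: 'text', text: value };
  } catch {
    return { kind: 'blocked', reason: 'extract_error' };
  } finally {
    // Cleared on every path so no timer handle outlives the call.
    clearTimeout(timer);
  }
}
```

Resolving the timer with a sentinel rather than rejecting keeps timeouts distinguishable from scanner errors without inspecting error types.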
326f2ec to
df978db
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit df978db.
Two issues found by Cursor Bugbot on the multi-modal PR.

**HIGH: scanMultiModal silently ignored onMissingScanner / onExtractError**

ScanOptions declared the two policy fields but the orchestrator never read them; they were only consumed by the separate `isFailClosed` helper. A caller following the README's own example (pass `onMissingScanner: 'block'` to scanMultiModal, then check `isFailClosed(scan)` without options) would silently default to skip and never fail-closed. The fail-closed mechanism was effectively inert in the natural API call shape.

Fix: scanMultiModal now reads both policy fields, computes `failClosed` itself, and records both the policy and the boolean on the result. Callers check `result.failClosed` directly. The `isFailClosed(result, override?)` helper is retained for the cases where you want to reapply a different policy after the fact; override now defaults to result.policy rather than 'skip'.

**MEDIUM: extract_empty was misclassified as a fail-closed-eligible failure**

The ModalityScanner contract says returning `null` means "this block has no extractable text (e.g. an image that's purely visual)", which is explicitly a benign signal. But the orchestrator was pushing those into `blocked[]` with reason `extract_empty`, and `isFailClosed` was catch-all-gating any non-no_scanner blocked row under onExtractError. Net effect: a caller setting `onExtractError: 'block'` to defend against scanner failures would have been blocking on every purely-visual image too, producing false positives on benign content.

Fix: null/undefined returns now go to a new `modalitiesEmpty[]` array on the result, count toward `modalitiesScanned` (the scan succeeded; the text contribution is just empty), and never trigger fail-closed under any policy. The `extract_empty` value is removed from `ScanBlockReason`; `blocked[]` now only contains `no_scanner` / `extract_error` / `extract_timeout`, the failures that genuinely warrant fail-closed treatment.

Tests: 7 new regression tests covering both issues plus the override helper's defaulting behaviour. Full suite 1,399 / 0 (was 1,392 / 0 before; +7 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
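The policy mapping this fix describes can be sketched as below. It is an illustration under stated assumptions, not the shipped code: `ScanResultSketch`, `ScanPolicy`, and `isFailClosedSketch` are hypothetical names, and the `modality` field on blocked rows is a guess at the row shape. Only the reason values, the two policy fields, and the "override defaults to `result.policy`" behaviour come from the commit message.

```ts
// Hypothetical types; the field names on blocked rows are assumptions.
type Policy = 'skip' | 'block';
type ScanBlockReason = 'no_scanner' | 'extract_error' | 'extract_timeout';

interface ScanPolicy {
  onMissingScanner: Policy;
  onExtractError: Policy;
}

interface ScanResultSketch {
  blocked: { modality: string; reason: ScanBlockReason }[];
  modalitiesEmpty: string[]; // benign "no text" results; never consulted below
  policy: ScanPolicy;        // the policy the scan itself ran with
  failClosed: boolean;       // pre-computed by the orchestrator
}

// Re-evaluates fail-closed, defaulting to the policy recorded on the result
// rather than to 'skip'.
function isFailClosedSketch(
  result: ScanResultSketch,
  override: ScanPolicy = result.policy,
): boolean {
  return result.blocked.some((row) =>
    row.reason === 'no_scanner'
      ? override.onMissingScanner === 'block'
      : override.onExtractError === 'block',
  );
}
```

Because `modalitiesEmpty` is never read, a purely visual image cannot trip fail-closed under any policy, which is the false positive the second fix removes.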
scotty595 added a commit that referenced this pull request, Apr 30, 2026
…y pass

The auto-generated release notes only covered #9 (tool-result adapters). Code for #10 (multi-modal scan) and #11 (README honesty pass) shipped in 0.15.0 but neither got a CHANGELOG entry. The auto-release pulled from CHANGELOG.md, so the GitHub Release body and the npm-displayed changelog were both incomplete.

This commit extends the 0.15.0 entry with both missing sections. GitHub Release body has been updated to match. No code change; documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Summary
- Adds `governance-sdk/scan/multi-modal` with a pluggable extractor registry and the `scanMultiModal()` orchestrator. Closes the bypass where image / PDF / audio blocks pass through `enforce()` unscanned.
- Mirrors the `InjectionClassifier` pattern: async scanner interface + global registry + pre-`enforce()` invocation.

API surface
```ts
import {
  registerModalityScanner,
  scanMultiModal,
  isFailClosed,
} from 'governance-sdk/scan/multi-modal';

// ocrEngine stands in for a caller-supplied OCR library; the SDK ships none.
registerModalityScanner('image', {
  extractText: async (block) => ocrEngine.recognize(block),
});

// blocks is the ContentBlock[] the host is about to pass to enforce().
const scan = await scanMultiModal(blocks, {
  enabled: ['text', 'image'],
  onMissingScanner: 'block',
  onExtractError: 'block',
  timeoutMs: 5_000,
});

if (isFailClosed(scan)) { /* block before enforce() */ }
// otherwise feed scan.text into the existing detectInjection / hybridDetect
```
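For readers who want a mental model of what the call above does, the walk over content blocks can be approximated as follows. This is a simplified stand-in, not the SDK implementation: `walkBlocks`, the `ContentBlock` shape, the newline separator, and the array form of `modalitiesScanned` / `modalitiesSkipped` are all assumptions, and the per-block timeout is left out for brevity.

```ts
// Illustrative stand-in only; shapes marked "assumed" are guesses.
type Modality = 'text' | 'image' | 'pdf' | 'audio';

interface ContentBlock { type: Modality; text?: string } // assumed shape
interface Scanner { extractText: (block: ContentBlock) => Promise<string | null> }

interface WalkResult {
  text: string;
  modalitiesScanned: Modality[];
  modalitiesSkipped: Modality[];
  blocked: { modality: Modality; reason: 'no_scanner' | 'extract_error' }[];
}

async function walkBlocks(
  blocks: ContentBlock[],
  registry: Map<Modality, Scanner>,
  enabled: Modality[] = ['text'], // mirrors the documented default
): Promise<WalkResult> {
  const parts: string[] = [];
  const result: WalkResult = { text: '', modalitiesScanned: [], modalitiesSkipped: [], blocked: [] };
  for (const block of blocks) {
    if (!enabled.includes(block.type)) {
      result.modalitiesSkipped.push(block.type); // not opted in
      continue;
    }
    if (block.type === 'text') {
      parts.push(block.text ?? ''); // assumed: text blocks need no extractor
      result.modalitiesScanned.push('text');
      continue;
    }
    const scanner = registry.get(block.type);
    if (!scanner) {
      result.blocked.push({ modality: block.type, reason: 'no_scanner' });
      continue;
    }
    try {
      const extracted = await scanner.extractText(block);
      if (extracted !== null) parts.push(extracted);
      result.modalitiesScanned.push(block.type); // null still counts as scanned
    } catch {
      result.blocked.push({ modality: block.type, reason: 'extract_error' });
    }
  }
  result.text = parts.join('\n'); // assumed separator
  return result;
}
```

Keeping the registry as an explicit `Map` argument makes the sketch easy to test in isolation; the SDK itself is described as using a global registry.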
Defaults (deliberately conservative)
- `enabled: ['text']`: every other modality off until explicitly opted in
- `onMissingScanner: 'skip'`, `onExtractError: 'skip'`
- `timeoutMs: 30_000` per block

Failure modes (all surface in `result.blocked[]`)
- `no_scanner`: enabled but no extractor registered
- `extract_error`: scanner threw (sync or async) or returned non-string
- `extract_timeout`: exceeded `timeoutMs`
- `extract_empty`: scanner returned `null` / `undefined`

Tests
20 new tests in `src/scan/multi-modal.test.ts` covering: defaults, opt-in modes, sync-throw and async-rejection paths, timeouts via Promise.race + setTimeout cleanup, registry CRUD, `isFailClosed` with mixed reasons. Full suite 1,392 / 0.

Merge order note
Modifies the README "Limitations" multi-modal bullet. If the README honesty pass PR merges first, this branch will have a trivial conflict on that one bullet — both edits are mine, easy resolve. Either way works; suggest merging the docs PR first.
Out of scope (follow-ups)
- Framework adapters auto-calling `scanMultiModal` on detection of non-text blocks. Today the caller wires it in their host code.
- Reference packages (e.g. `governance-sdk-scan-tesseract`) shipping default extractors as separate peer-deps. Keeps the core SDK zero-dep.

Test plan
- `npm run build` clean
- `npm test`: 1,392 / 0
- Review `src/scan/multi-modal.ts` for the timeout / sync-throw paths
- Confirm `package.json` exports field is correct

🤖 Generated with Claude Code
Note
Medium Risk
Adds new security-adjacent scanning behavior and a new public export surface; incorrect host wiring or misunderstandings of `failClosed`/default opt-in semantics could lead to gaps or unintended blocking, though the change is additive and well-tested.

Overview
Adds a new `governance-sdk/scan/multi-modal` export that orchestrates opt-in text extraction for `image`/`pdf`/`audio` content blocks via a global registry (`registerModalityScanner`, `unregisterModalityScanner`, `clearModalityScanners`) and returns concatenated text plus structured scan outcomes and a pre-evaluated `failClosed` flag (`isFailClosed` helper).

Introduces timeout handling and explicit failure classification (`no_scanner`, `extract_error`, `extract_timeout`) while treating `null`/`undefined` extractor returns as benign “no text” results, backed by a comprehensive new test suite. Updates both READMEs to document that multi-modal scanning is now available but remains opt-in with caller-supplied OCR/PDF/ASR to preserve zero runtime deps.

Reviewed by Cursor Bugbot for commit f89155f.