feat(scan): opt-in multi-modal scan orchestration#10

Merged
scotty595 merged 2 commits into main from feat/multi-modal-scanning
Apr 30, 2026

@scotty595 scotty595 commented Apr 30, 2026

Summary

  • New export governance-sdk/scan/multi-modal with a pluggable extractor registry and the scanMultiModal() orchestrator. Closes the bypass where image / PDF / audio blocks pass through enforce() unscanned.
  • Ships orchestration only — actual OCR / PDF parsers / ASR are caller-supplied, preserving the zero-runtime-dep promise.
  • Mirrors the existing InjectionClassifier pattern: async scanner interface + global registry + pre-enforce() invocation.

API surface

```ts
import {
  registerModalityScanner,
  scanMultiModal,
  isFailClosed,
} from 'governance-sdk/scan/multi-modal';

// `ocrEngine` and `blocks` are supplied by the caller; the SDK ships no OCR engine.
registerModalityScanner('image', {
  extractText: async (block) => ocrEngine.recognize(block),
});

const scan = await scanMultiModal(blocks, {
  enabled: ['text', 'image'],
  onMissingScanner: 'block',
  onExtractError: 'block',
  timeoutMs: 5_000,
});

if (isFailClosed(scan)) { /* block before enforce() */ }
// otherwise feed scan.text into the existing detectInjection / hybridDetect
```

Defaults (deliberately conservative)

  • enabled: ['text'] — every other modality off until explicitly opted in
  • onMissingScanner: 'skip', onExtractError: 'skip'
  • timeoutMs: 30_000 per block
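
A minimal sketch of how these defaults could be merged with caller-supplied options. `ScanOptions` and the field names come from this PR; `Modality`, `Policy`, `DEFAULT_SCAN_OPTIONS`, and `resolveScanOptions` are hypothetical names used for illustration only, and the real module may resolve options differently.

```typescript
// Hypothetical type names; only the option fields themselves come from the PR.
type Modality = 'text' | 'image' | 'pdf' | 'audio';
type Policy = 'skip' | 'block';

interface ScanOptions {
  enabled?: Modality[];
  onMissingScanner?: Policy;
  onExtractError?: Policy;
  timeoutMs?: number;
}

// The conservative defaults listed above.
const DEFAULT_SCAN_OPTIONS: Required<ScanOptions> = {
  enabled: ['text'],
  onMissingScanner: 'skip',
  onExtractError: 'skip',
  timeoutMs: 30_000,
};

// Caller-supplied fields win; omitted (or undefined) fields fall back to the default.
function resolveScanOptions(options: ScanOptions = {}): Required<ScanOptions> {
  return {
    // Copy the array so callers cannot mutate the shared default.
    enabled: options.enabled ?? [...DEFAULT_SCAN_OPTIONS.enabled],
    onMissingScanner: options.onMissingScanner ?? DEFAULT_SCAN_OPTIONS.onMissingScanner,
    onExtractError: options.onExtractError ?? DEFAULT_SCAN_OPTIONS.onExtractError,
    timeoutMs: options.timeoutMs ?? DEFAULT_SCAN_OPTIONS.timeoutMs,
  };
}
```

Using `??` rather than object spread means an explicit `undefined` in the caller's options cannot silently replace a default.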

Failure modes (all surface in result.blocked[])

  • no_scanner — enabled but no extractor registered
  • extract_error — scanner threw (sync or async) or returned non-string
  • extract_timeout — exceeded timeoutMs
  • extract_empty — scanner returned null / undefined

Tests

20 new tests in src/scan/multi-modal.test.ts covering defaults, opt-in modes, sync-throw and async-rejection paths, timeouts via Promise.race with setTimeout cleanup, registry CRUD, and isFailClosed with mixed reasons. Full suite: 1,392 passed / 0 failed.

Merge order note

Modifies the README "Limitations" multi-modal bullet. If the README honesty pass PR merges first, this branch will have a trivial conflict on that one bullet — both edits are mine, easy resolve. Either way works; suggest merging the docs PR first.

Out of scope (follow-ups)

  • Framework adapters (Anthropic / Vercel AI / Genkit / LlamaIndex / Bedrock) auto-calling scanMultiModal on detection of non-text blocks. Today the caller wires it in their host code.
  • Optional reference packages (e.g. governance-sdk-scan-tesseract) shipping default extractors as separate peer-deps. Keeps the core SDK zero-dep.

Test plan

  • npm run build clean
  • npm test — 1,392 / 0
  • Reviewer skims src/scan/multi-modal.ts for the timeout / sync-throw paths
  • Reviewer confirms package.json exports field is correct

🤖 Generated with Claude Code


Note

Medium Risk
Adds new security-adjacent scanning behavior and a new public export surface; incorrect host wiring or misunderstandings of failClosed/default opt-in semantics could lead to gaps or unintended blocking, though the change is additive and well-tested.

Overview
Adds a new governance-sdk/scan/multi-modal export that orchestrates opt-in text extraction for image/pdf/audio content blocks via a global registry (registerModalityScanner, unregisterModalityScanner, clearModalityScanners) and returns concatenated text plus structured scan outcomes and a pre-evaluated failClosed flag (isFailClosed helper).

Introduces timeout handling and explicit failure classification (no_scanner, extract_error, extract_timeout) while treating null/undefined extractor returns as benign “no text” results, backed by a comprehensive new test suite. Updates both READMEs to document that multi-modal scanning is now available but remains opt-in with caller-supplied OCR/PDF/ASR to preserve zero runtime deps.

Reviewed by Cursor Bugbot for commit f89155f.

@scotty595 scotty595 changed the title docs(sdk): README honesty pass — counts, audit chain, limitations feat(scan): opt-in multi-modal scan orchestration Apr 30, 2026
Comment thread packages/governance/src/scan/multi-modal.ts Outdated
Comment thread packages/governance/src/scan/multi-modal.ts
Closes the bypass where image, PDF, and audio content blocks pass
through enforce() unscanned. Ships the orchestration only — actual
OCR / PDF parsers / ASR are caller-supplied via a registry pattern,
keeping the zero-runtime-dep promise.

API:
- ModalityScanner interface (extractText: async (block) => string | null)
- Global registry mirroring the InjectionClassifier pattern:
  registerModalityScanner, getModalityScanner, unregisterModalityScanner,
  hasModalityScanner, clearModalityScanners.
- scanMultiModal(blocks, options) — walks ContentBlock[], invokes the
  registered scanner for each enabled modality, concatenates extracted
  text, returns structured result with modalitiesScanned, modalitiesSkipped,
  blocked[].
- isFailClosed(result, options) — small helper that maps blocked[] rows
  to a fail-closed boolean given the caller's onMissingScanner/onExtractError
  policy.
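
The registry functions listed above could be backed by a single module-level Map. This is a sketch under assumptions, not the SDK's source: the function names come from this commit message, while the `ContentBlock` shape, the `Modality` union, and the return types (for example `unregisterModalityScanner` returning a boolean) are guesses for illustration.

```typescript
type Modality = 'text' | 'image' | 'pdf' | 'audio';

// Hypothetical minimal block shape; the SDK's ContentBlock is likely richer.
interface ContentBlock {
  type: Modality;
  data?: unknown;
}

interface ModalityScanner {
  // Returns extracted text, or null when the block has no extractable text.
  extractText: (block: ContentBlock) => Promise<string | null>;
}

// One scanner per modality; registering again replaces the previous scanner.
const scanners = new Map<Modality, ModalityScanner>();

function registerModalityScanner(modality: Modality, scanner: ModalityScanner): void {
  scanners.set(modality, scanner);
}

function getModalityScanner(modality: Modality): ModalityScanner | undefined {
  return scanners.get(modality);
}

function hasModalityScanner(modality: Modality): boolean {
  return scanners.has(modality);
}

// Assumed to report whether a scanner was actually removed.
function unregisterModalityScanner(modality: Modality): boolean {
  return scanners.delete(modality);
}

function clearModalityScanners(): void {
  scanners.clear();
}
```

A global registry keeps host wiring short, but it is shared state, so tests that register scanners would typically call `clearModalityScanners()` between cases.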

Defaults are deliberately conservative:
- enabled: ['text'] (image/pdf/audio off until explicitly opted in)
- onMissingScanner: 'skip'
- onExtractError: 'skip'
- timeoutMs: 30_000

Failure modes recorded in blocked[]:
- no_scanner: enabled but no extractor registered.
- extract_error: scanner threw or returned non-string.
- extract_timeout: extraction exceeded timeoutMs.
- extract_empty: scanner returned null/undefined.

Sync-throw vs async-rejection paths are both handled and surface as
extract_error. Timeouts use Promise.race with a single setTimeout,
cleared in finally so test runs don't leak handles.
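
The timeout and sync-throw handling described above can be sketched as follows. `extractWithTimeout`, `ExtractOutcome`, and `TIMED_OUT` are hypothetical names; only the technique (Promise.race against a single setTimeout, cleared in finally) and the reason strings come from this commit message.

```typescript
type ExtractOutcome =
  | { ok: true; text: string | null }
  | { ok: false; reason: 'extract_error' | 'extract_timeout' };

// Sentinel so a timeout can be told apart from any value a scanner returns.
const TIMED_OUT = Symbol('timed_out');

async function extractWithTimeout(
  extract: () => unknown, // may return a value, return a promise, or throw synchronously
  timeoutMs: number,
): Promise<ExtractOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    const timeout = new Promise<typeof TIMED_OUT>((resolve) => {
      timer = setTimeout(() => resolve(TIMED_OUT), timeoutMs);
    });
    // Calling extract() inside an async function converts a synchronous
    // throw into a rejection, so both failure paths reach the catch below.
    const work: Promise<unknown> = (async () => extract())();
    const result = await Promise.race([work, timeout]);
    if (result === TIMED_OUT) return { ok: false, reason: 'extract_timeout' };
    if (result === null || result === undefined) return { ok: true, text: null };
    if (typeof result !== 'string') return { ok: false, reason: 'extract_error' };
    return { ok: true, text: result };
  } catch {
    return { ok: false, reason: 'extract_error' };
  } finally {
    // Always clear the timer so a fast extraction leaves no pending handle.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that Promise.race does not cancel the losing extraction; a scanner that ignores the timeout keeps running in the background, and only its result is discarded.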

Wiring:
- New export path 'governance-sdk/scan/multi-modal' added to package.json.
- README "Limitations" bullet updated to reflect opt-in availability
  with concrete pointer to the new export.

Tests: 20 new in src/scan/multi-modal.test.ts. Full suite 1,392/0.

Follow-ups (not in this commit):
- Framework adapters (Anthropic, Vercel AI, Genkit, LlamaIndex, Bedrock)
  could auto-call scanMultiModal when they detect non-text blocks. Today
  the caller wires it in their host code.
- A reference 'governance-sdk-scan-tesseract' optional peer-dep package
  could ship a default OCR scanner for image, but stays out of the core
  SDK to preserve zero-dep.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@scotty595 scotty595 force-pushed the feat/multi-modal-scanning branch from 326f2ec to df978db Compare April 30, 2026 13:29

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit df978db.

Comment thread packages/governance/src/scan/multi-modal.ts
Two issues found by Cursor Bugbot on the multi-modal PR.

**HIGH: scanMultiModal silently ignored onMissingScanner / onExtractError**

ScanOptions declared the two policy fields but the orchestrator never
read them — they were only consumed by the separate `isFailClosed`
helper. A caller following the README's own example (pass
`onMissingScanner: 'block'` to scanMultiModal, then check
`isFailClosed(scan)` without options) would silently default to skip
and never fail-closed. The fail-closed mechanism was effectively
inert in the natural API call shape.

Fix: scanMultiModal now reads both policy fields, computes
`failClosed` itself, and records both the policy and the boolean on
the result. Callers check `result.failClosed` directly. The
`isFailClosed(result, override?)` helper is retained for the cases
where you want to reapply a different policy after the fact —
override now defaults to result.policy rather than 'skip'.
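
A rough sketch of the fixed behaviour, assuming the result carries `blocked`, `policy`, and `failClosed` as described. The type names `ScanPolicy`, `BlockedRow`, and `ScanResult`, and the helper `computeFailClosed`, are hypothetical; gating `extract_timeout` under `onExtractError` is inferred from the description of the earlier catch-all and may not match the real code.

```typescript
type Policy = 'skip' | 'block';
type ScanBlockReason = 'no_scanner' | 'extract_error' | 'extract_timeout';

interface ScanPolicy {
  onMissingScanner: Policy;
  onExtractError: Policy;
}

interface BlockedRow {
  modality: string;
  reason: ScanBlockReason;
}

interface ScanResult {
  blocked: BlockedRow[];
  policy: ScanPolicy;   // the policy the scan actually ran with
  failClosed: boolean;  // pre-evaluated by the orchestrator
}

// no_scanner rows are gated by onMissingScanner; every other reason
// (assumed here to include timeouts) is gated by onExtractError.
function computeFailClosed(blocked: BlockedRow[], policy: ScanPolicy): boolean {
  return blocked.some((row) =>
    row.reason === 'no_scanner'
      ? policy.onMissingScanner === 'block'
      : policy.onExtractError === 'block',
  );
}

// Re-evaluates under an optional override; anything not overridden falls
// back to the policy recorded on the result rather than to 'skip'.
function isFailClosed(result: ScanResult, override?: Partial<ScanPolicy>): boolean {
  return computeFailClosed(result.blocked, {
    onMissingScanner: override?.onMissingScanner ?? result.policy.onMissingScanner,
    onExtractError: override?.onExtractError ?? result.policy.onExtractError,
  });
}
```

With this shape, `isFailClosed(scan)` with no second argument agrees with `scan.failClosed`, which is the call shape the README example uses.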

**MEDIUM: extract_empty was misclassified as a fail-closed-eligible failure**

The ModalityScanner contract says returning `null` means "this block
has no extractable text (e.g. an image that's purely visual)" —
explicitly a benign signal. But the orchestrator was pushing those
into `blocked[]` with reason `extract_empty`, and `isFailClosed` was
catch-all-gating any non-no_scanner blocked row under
onExtractError. Net effect: a caller setting `onExtractError: 'block'`
to defend against scanner failures would have been blocking on every
purely-visual image too — false positives on benign content.

Fix: null/undefined returns now go to a new `modalitiesEmpty[]` array
on the result, count toward `modalitiesScanned` (the scan succeeded —
text contribution is just empty), and never trigger fail-closed under
any policy. The `extract_empty` value is removed from `ScanBlockReason`;
`blocked[]` now only contains `no_scanner` / `extract_error` /
`extract_timeout` — failures that genuinely warrant fail-closed
treatment.
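
To illustrate the routing after this fix, here is a simplified orchestration loop. It is a sketch under assumptions: `scanSketch`, `Extractor`, the `ContentBlock` shape, the newline join, and recording one array entry per block are all hypothetical, and timeouts are omitted for brevity. The result field names and reason strings are the ones named in this commit message.

```typescript
type Modality = 'text' | 'image' | 'pdf' | 'audio';
type ScanBlockReason = 'no_scanner' | 'extract_error' | 'extract_timeout';

// Hypothetical block shape for the sketch.
interface ContentBlock {
  type: Modality;
  text?: string;
  data?: unknown;
}

type Extractor = (block: ContentBlock) => unknown;

interface ScanSketchResult {
  text: string;
  modalitiesScanned: Modality[];
  modalitiesSkipped: Modality[];
  modalitiesEmpty: Modality[];
  blocked: { modality: Modality; reason: ScanBlockReason }[];
}

async function scanSketch(
  blocks: ContentBlock[],
  enabled: Modality[],
  extractors: Map<Modality, Extractor>,
): Promise<ScanSketchResult> {
  const result: ScanSketchResult = {
    text: '', modalitiesScanned: [], modalitiesSkipped: [], modalitiesEmpty: [], blocked: [],
  };
  const parts: string[] = [];
  for (const block of blocks) {
    if (!enabled.includes(block.type)) { result.modalitiesSkipped.push(block.type); continue; }
    if (block.type === 'text') {
      parts.push(block.text ?? '');
      result.modalitiesScanned.push('text');
      continue;
    }
    const extract = extractors.get(block.type);
    if (!extract) { result.blocked.push({ modality: block.type, reason: 'no_scanner' }); continue; }
    try {
      const out = await extract(block);
      if (out === null || out === undefined) {
        // Benign: the scan succeeded but the block contributed no text.
        result.modalitiesScanned.push(block.type);
        result.modalitiesEmpty.push(block.type);
      } else if (typeof out === 'string') {
        parts.push(out);
        result.modalitiesScanned.push(block.type);
      } else {
        result.blocked.push({ modality: block.type, reason: 'extract_error' });
      }
    } catch {
      result.blocked.push({ modality: block.type, reason: 'extract_error' });
    }
  }
  result.text = parts.join('\n');
  return result;
}
```

Because empty results never enter `blocked[]`, a fail-closed check that only inspects `blocked[]` cannot trip on a purely visual image.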

Tests: 7 new regression tests covering both issues plus the override
helper's defaulting behaviour. Full suite 1,399 / 0 (was 1,392 / 0
before; +7 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@scotty595 scotty595 merged commit 988db46 into main Apr 30, 2026
4 checks passed
scotty595 added a commit that referenced this pull request Apr 30, 2026
…y pass

The auto-generated release notes only covered #9 (tool-result adapters).
Code for #10 (multi-modal scan) and #11 (README honesty pass) shipped
in 0.15.0 but neither got a CHANGELOG entry — the auto-release pulled
from CHANGELOG.md so the GitHub Release body and the npm-displayed
changelog were both incomplete.

This commit extends the 0.15.0 entry with both missing sections.
GitHub Release body has been updated to match.

No code change; documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>