
fix(security): run unicode normalization before secret redaction#1178

Merged
waynesun09 merged 5 commits into
fullsend-ai:mainfrom
ifireball:cursor/edb3260c
May 26, 2026

Conversation

@ifireball
Contributor

Summary

  • Reorder sandbox post-tool hooks so unicode_posttool.py runs before secret_redact_posttool.py, preventing zero-width characters from evading secret regexes and reconstructing tokens in LLM context.
  • Add UnicodeNormalizer to OutputPipeline() and use it in host output scanning (scan output, scanOutputFiles).
  • Add regression tests for hook ordering, Go pipeline behavior, and Python hook chaining.
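
The ordering problem in the first bullet can be reproduced in a few lines. The sketch below is illustrative only: `strip_invisible`, `redact`, the zero-width class, and the `ghp_` pattern are hypothetical stand-ins, not the code in unicode_posttool.py or secret_redact_posttool.py.

```python
import re

# Hypothetical stand-ins for the two hooks; the real patterns live in the PR.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
TOKEN = re.compile(r"ghp_[A-Za-z0-9]{36}")


def strip_invisible(text: str) -> str:
    """Remove a small set of zero-width characters."""
    return ZERO_WIDTH.sub("", text)


def redact(text: str) -> str:
    """Replace token-shaped strings with a placeholder."""
    return TOKEN.sub("[REDACTED]", text)


token = "ghp_" + "a" * 36
# Interleave a zero-width space so the token regex no longer matches.
obfuscated = token[:10] + "\u200b" + token[10:]

# Wrong order: redaction misses the obfuscated form, then stripping rebuilds the token.
wrong = strip_invisible(redact(obfuscated))
# Fixed order: stripping first lets the token regex match.
right = redact(strip_invisible(obfuscated))
```

With redaction first, `wrong` ends up holding the reconstructed token; with normalization first, `right` holds only the placeholder.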

Fixes #444

Test plan

  • go test ./internal/security/... ./internal/cli/...
  • python3 -m unittest internal/security/hooks/posttool_chain_test.py
  • python3 -m unittest internal/security/hooks/unicode_posttool_test.py

Made with Cursor

@ifireball ifireball requested review from maruiz93, ralphbean, rh-hemartin and waynesun09 and removed request for waynesun09 May 19, 2026 12:57
@ifireball ifireball self-assigned this May 19, 2026
@ifireball ifireball marked this pull request as ready for review May 19, 2026 12:58
@github-actions

github-actions Bot commented May 19, 2026

Site preview

Preview: https://3c1015ff-site.fullsend-ai.workers.dev

Commit: a4f164e8741012461019224ff7c41c261a8eac4e

@fullsend-ai-review

fullsend-ai-review Bot commented May 19, 2026

Review

Findings

Low

  • [stale-terminology] docs/problems/security-threat-model.md:73 — References the "SecretRedactor pipeline" as the output scanning mechanism. PR #1178 replaced the standalone SecretRedactor with OutputPipeline() which chains UnicodeNormalizer before SecretRedactor. The doc understates the pipeline's scope by omitting unicode normalization.
    Remediation: Update the reference to mention the full output pipeline (unicode normalization + secret redaction).

  • [stale-terminology] docs/guides/admin/private-repositories.md:170 — States "Agent output goes through the harness-level SecretRedactor pipeline". The output pipeline now chains NewUnicodeNormalizer() followed by NewSecretRedactor(). The doc only mentions secret redaction, omitting the unicode normalization stage.
    Remediation: Update the note to reference the output security pipeline rather than just the SecretRedactor.

  • [stale-description] docs/guides/dev/cli-internals.md:20 — Describes scan output as scanning "for secrets" only. PR #1178 expanded scan output to also perform unicode normalization (invisible character stripping, NFKC normalization) before secret scanning.
    Remediation: Update the inline comment to reflect the added unicode normalization step.

Previous run

Review

Findings

Medium

  • [stale-doc] docs/guides/admin/private-repositories.md:170 — References the "SecretRedactor pipeline" as the output scanning mechanism. After this PR, output scanning uses OutputPipeline() which composes UnicodeNormalizer before SecretRedactor. The doc is incomplete — readers would not learn that unicode normalization is a prerequisite stage for correct secret redaction.
    Remediation: Replace "SecretRedactor pipeline" with "output sanitization pipeline" or "the OutputPipeline (unicode normalization followed by secret redaction)".

  • [stale-doc] docs/problems/security-threat-model.md:73 — Same issue: refers to "the existing SecretRedactor pipeline" as if secret redaction is the entire output scanning mechanism. After this PR, the output pipeline is a composed multi-stage pipeline where unicode normalization is a security-critical prerequisite.
    Remediation: Update to reference the multi-stage output pipeline rather than SecretRedactor alone.

Low

  • [correctness] internal/cli/run.go and internal/cli/scan.go — The defensive fallback if out == "" && len(result.Findings) == 0 { out = text } is nested inside if len(result.Findings) > 0, making the inner condition unreachable (dead code). The fallback was added to address a prior review finding, but its placement means it can never execute. This doesn't cause incorrect behavior — when Sanitized is empty with findings present, it correctly means the content was fully stripped (e.g., all invisible characters) — but the dead code gives a false impression of defensive coverage.
    Remediation: Either remove the unreachable fallback (since the empty-Sanitized-with-findings case is intentionally correct) or move the fallback outside the if len(result.Findings) > 0 block to cover the theoretical case where Pipeline.Scan returns empty Sanitized without findings.
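
The unreachable branch is easier to see in isolation. This is a Python paraphrase of the Go condition quoted above, using a hypothetical `choose_output` helper; it is not the code from run.go or scan.go.

```python
from typing import List


def choose_output(text: str, sanitized: str, findings: List[str]) -> str:
    """Paraphrase of the reviewed control flow (hypothetical helper)."""
    out = text
    if len(findings) > 0:
        out = sanitized
        # Unreachable: this branch needs zero findings, but the enclosing
        # block only runs when there is at least one finding.
        if out == "" and len(findings) == 0:
            out = text
    return out


# Input that was entirely invisible characters: sanitized to empty, with a finding.
fully_stripped = choose_output("\u200b", "", ["zero-width"])
# No findings: the original text passes through untouched.
untouched = choose_output("hello", "", [])
```

In both calls the inner fallback never fires, which matches the reviewer's point that it offers no actual protection.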
Previous run (2)

Review

Findings

No findings.

The core fix correctly swaps hook ordering in hooks.go so unicode_posttool.py runs before secret_redact_posttool.py, preventing zero-width characters from evading prefix regexes. The Go-side OutputPipeline() is updated to include UnicodeNormalizer before SecretRedactor, aligning host-side output scanning with the sandbox hook ordering invariant.

Additional hardening — expanded zero-width regex (aligned between Go and Python), ECMA-48 compliant ANSI CSI regex, OSC escape handling, supplementary variation selector stripping, and post-NFKC escape re-scanning — is well-scoped and strengthens the normalizer against adjacent evasion vectors.

Tests are thorough: TestGenerateClaudeSettings_PostToolSanitizeHookOrder asserts the ordering invariant structurally, TestPipeline covers zero-width and LTR-mark obfuscated PATs through the Go pipeline, and posttool_chain_test.py validates the Python hook chain end-to-end including a negative test proving redaction alone misses obfuscated tokens.

Previous run

Review

Findings

Low

  • [correctness] internal/cli/run.go:1382: scanOutputFiles writes result.Sanitized to disk without the defensive empty-string fallback that was added to scan.go. Both call sites now use OutputPipeline(), but only scan.go guards against the theoretical edge case where result.Sanitized is empty despite findings existing (which would truncate the file to zero bytes). Current scanners always populate Sanitized when they report findings, so this is not exploitable today, but the inconsistency is worth aligning for robustness.
    Remediation: Add the same fallback pattern used in scan.go — if result.Sanitized is empty, fall back to the original content before writing.

Info

  • [style] internal/cli/scan.go:163-167 — The Long description for scan output ("Reads text from stdin and scans for API keys, tokens, credentials, and sensitive patterns") no longer reflects the full pipeline behavior, which now includes unicode normalization before secret redaction.

Contributor

@ralphbean ralphbean left a comment

LGTM. One minor note inline.

Comment thread internal/cli/run.go Outdated
Contributor

@waynesun09 waynesun09 left a comment

10-Agent Review Squad — #1178

Agents dispatched: 10 (3x claude-coder, 3x claude-researcher, 2x gemini-code-review, 2x cursor-code-review)
Verified findings: 8 MEDIUM+ (1 CRITICAL, 3 HIGH, 4 MEDIUM) — 4 false positives removed after code verification

Summary

The core reordering fix is correct and well-tested across all three execution layers (Go pipeline, sandbox hooks, CLI scanning). The hook ordering invariant is well-documented and the assert.Less index-comparison test pattern is robust. All direct NewSecretRedactor() callers have been migrated to OutputPipeline().

However, the zero-width character coverage has gaps (U+200E/F, U+180E, U+206A-206F) that allow the same class of attack to succeed with different invisible characters — this should be addressed before merge.
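
The gap is the same evasion with a different invisible character. A minimal sketch, assuming a narrow "before" character class, a wider "after" class, and a hypothetical `ghp_` token pattern (none of these are copied from the PR):

```python
import re

# Assumed "before" class: only the classic zero-width characters.
NARROW = re.compile("[\u200b-\u200d\u2060\ufeff]")
# Assumed "after" class: adds the ranges named in the review
# (U+200E/F, U+180E, U+206A-206F).
WIDE = re.compile("[\u200b-\u200f\u180e\u2060\u206a-\u206f\ufeff]")
TOKEN = re.compile(r"ghp_[A-Za-z0-9]{36}")

token = "ghp_" + "b" * 36
# U+200E (left-to-right mark) is invisible but outside the narrow class.
obfuscated = token[:8] + "\u200e" + token[8:]

# The narrow class leaves the mark in place, so the token regex finds nothing.
missed = TOKEN.search(NARROW.sub("", obfuscated)) is None
# The wide class removes it, and the token is detected.
caught = TOKEN.search(WIDE.sub("", obfuscated)) is not None
```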

Additional finding not in diff

MEDIUM — post-comment and post-review commands skip output sanitization entirely

internal/cli/postcomment.go and internal/cli/postreview.go read agent output and post directly to the GitHub API without calling OutputPipeline(). While output files should have been scanned during fullsend run, if these commands are invoked standalone (as documented in their --help), they could post unsanitized content containing leaked secrets or zero-width obfuscated tokens. Pre-existing gap, but issue #444's scope explicitly calls for auditing all output paths.

Suggestion: File a follow-up issue to add OutputPipeline().Scan() to these commands before they post to the forge API.

Findings by severity

| Severity | Count | Key Theme |
|----------|-------|-----------|
| CRITICAL | 1 | Incomplete invisible char coverage (U+200E/F, U+180E, U+206A-206F bypass) |
| HIGH | 3 | Missing empty-Sanitized guard in run.go, Go/Python ANSI regex parity, missing post-NFKC rescan |
| MEDIUM | 4 | Stale doc/log in run.go, unscanned post-comment/post-review paths, test stderr discarded, supplementary variation selectors |

See inline comments for details and suggested fixes.

Comment thread internal/security/scanner.go
Comment thread internal/security/scanner.go
Comment thread internal/cli/run.go Outdated
Comment thread internal/cli/run.go
Comment thread internal/security/hooks/posttool_chain_test.py Outdated
@ifireball
Contributor Author

Addressed review feedback in 61b7351 + 674340d:

Hook ordering (original #444 fix) — unchanged; unicode_posttool still runs before secret_redact_posttool.

Review follow-ups:

  • Expanded invisible-character coverage in Go UnicodeNormalizer and Python unicode_posttool.py (U+200E/F, U+180E, U+034F, U+206A–206F, U+FFF9–FFFB, supplementary VS U+E0100–E01EF)
  • Aligned Go ANSI/OSC regex with Python; added post-NFKC escape rescan in Go
  • Added empty-Sanitized guard in scanOutputFiles; aligned doc comments and log messages with scan output
  • Chain tests: LTR-mark obfuscation case, stderr in hook test errors
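
The post-NFKC rescan matters because compatibility normalization can turn harmless-looking characters into a live escape sequence. A sketch using only the standard library; the `CSI` pattern and the `strip_escapes` and `normalize` helpers are simplified assumptions, not the PR's regexes or functions:

```python
import re
import unicodedata

# Simplified ECMA-48 CSI pattern: ESC [ parameters, intermediates, final byte.
CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_escapes(text: str) -> str:
    """Remove CSI escape sequences."""
    return CSI.sub("", text)


def normalize(text: str) -> str:
    """Strip escapes, apply NFKC, then strip again."""
    first_pass = strip_escapes(text)
    folded = unicodedata.normalize("NFKC", first_pass)
    return strip_escapes(folded)


# ESC followed by FULLWIDTH '[', '3', '1', 'm': not a CSI sequence yet.
disguised = "\x1b\uff3b\uff13\uff11\uff4dred"
# Without a second pass, NFKC folds the fullwidth characters into a real escape.
without_rescan = unicodedata.normalize("NFKC", strip_escapes(disguised))
with_rescan = normalize(disguised)
```

The first strip sees nothing to remove; only the second pass, after folding, catches the sequence.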

Deferred (follow-up): post-comment / post-review standalone paths still skip OutputPipeline() — pre-existing gap, suggest separate issue.

CI green on latest push. Ready for re-review.

@ifireball
Contributor Author

Replied to and resolved all inline threads addressed in 61b7351 / 674340d.

Deferred to follow-up issues (from 10-agent review — out of scope for this PR):

The post-comment and post-review commands can post unsanitized content when invoked standalone outside fullsend run.

@ralphbean ralphbean requested a review from waynesun09 May 21, 2026 19:36
Contributor

@waynesun09 waynesun09 left a comment

8-Agent Review Squad — 3 verified MEDIUM findings

Agents: 2x claude-coder, 2x claude-researcher, 2x gemini-code-review, 2x cursor-code-review

The core security fix is correct — hook reordering closes the zero-width character bypass (#444), expanded Unicode coverage is comprehensive, and Go/Python regex parity is aligned. CI is green. No merge-blocking issues.

After verification, 7 false positives were removed (pre-existing issues outside PR scope, misidentified Python as Go, etc.). 3 MEDIUM findings remain as inline comments:

  1. Empty-string Sanitized bypass: Pipeline.Scan conflates "" (sanitized-to-empty) with "" (no-changes), and CLI fallback writes original text back. Low practical risk but a semantic bug. (6/8 agents agreed)
  2. stripTerminalEscapes double-scan: FindAllString + ReplaceAllString runs each regex twice; single-pass with ReplaceAllStringFunc halves the work on the hot path. (2/8 agents agreed)
  3. Missing wrong-order negative test: No Go test proves NewPipeline(SecretRedactor, UnicodeNormalizer) (wrong order) fails to catch obfuscated tokens. (2/8 agents agreed)
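
Finding 2 has a direct Python analogue: `re.sub` with a callable replacement plays the role of Go's `ReplaceAllStringFunc`, so matches can be recorded and removed in one pass. The function name, return shape, and pattern below are assumptions for illustration, not the code in unicode.go.

```python
import re
from typing import List, Tuple

# Simplified CSI pattern (assumed; the PR's regex may differ).
CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_terminal_escapes(text: str) -> Tuple[str, List[str]]:
    """Remove CSI sequences and collect them as findings in one pass."""
    found: List[str] = []

    def record(match: "re.Match[str]") -> str:
        found.append(match.group(0))
        return ""  # drop the escape sequence

    return CSI.sub(record, text), found


clean, findings = strip_terminal_escapes("\x1b[31mred\x1b[0m")
```

The callback sees each match exactly once, so there is no separate find step before the replace step.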

Comment thread internal/cli/run.go Outdated
Comment thread internal/security/unicode.go Outdated
Comment thread internal/security/scanner_test.go
Contributor

@ralphbean ralphbean left a comment

LGTM.

@ifireball ifireball requested a review from waynesun09 May 26, 2026 08:45
@ifireball
Contributor Author

Rebased onto latest main and pushed 4eefa31 addressing the 3 open MEDIUM threads from the 8-agent review. CI pending on force-push.

@fullsend-ai-review fullsend-ai-review Bot added requires-manual-review Review requires human judgment and removed ready-for-merge All reviewers approved — ready to merge labels May 26, 2026
Contributor

@waynesun09 waynesun09 left a comment

Multi-agent review (4 agents: claude-coder, claude-researcher, gemini-code-review, cursor-code-review). 8 verified findings after dedup and false-positive removal. See inline comments for MEDIUM+ issues.

Comment thread internal/cli/run.go Outdated
Comment thread internal/security/unicode.go Outdated
Comment thread internal/security/unicode.go
Comment thread internal/security/hooks/posttool_chain_test.py
ifireball and others added 5 commits May 26, 2026 21:08

Zero-width characters interleaved in token-shaped strings bypassed
secret regexes when redaction ran before unicode stripping in sandbox
post-tool hooks. Reorder hooks to normalize first, extend OutputPipeline
and output scanning paths, and add regression tests.

Fixes fullsend-ai#444

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Barak Korren <bkorren@redhat.com>

Expand invisible-character stripping (U+200E/F, U+180E, etc.), align Go
ANSI/OSC handling with Python including post-NFKC rescan, add empty
Sanitized guards and consistent scan output logging.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Barak Korren <bkorren@redhat.com>

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Barak Korren <bkorren@redhat.com>

Only fall back to original text when the pipeline reports no findings,
use single-pass terminal escape stripping, and add wrong-order pipeline test.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Barak Korren <bkorren@redhat.com>

Remove dead Sanitized fallback guards, expand invisible-character coverage
(U+061C, line/paragraph separators, Cf sweep), generalize ST-terminated
escape stripping, and add wrong-order/fullwidth chain tests.

Signed-off-by: Barak Korren <bkorren@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

@ifireball ifireball requested a review from waynesun09 May 26, 2026 18:10
@ifireball
Contributor Author

Rebased onto latest main, pushed a4f164e addressing the 4 open review threads. CI pending.

@fullsend-ai-review fullsend-ai-review Bot added ready-for-merge All reviewers approved — ready to merge and removed requires-manual-review Review requires human judgment labels May 26, 2026
Contributor

@waynesun09 waynesun09 left a comment

All 13 review items verified as substantively addressed across 3 fix rounds. Line-by-line review of final diff confirms:

  • Hook ordering invariant enforced and tested (unicode before secret_redact)
  • Go/Python regex parity aligned (zero_width, ANSI, ST-terminated)
  • Empty-Sanitized edge case handled correctly (dead-code guard removed)
  • Post-NFKC escape rescan added to Go (parity with Python)
  • Cf category sweep and supplementary VS stripping in place
  • 6 new regression tests covering ZWNJ, LTR mark, ALM, fullwidth, and wrong-order scenarios
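
The Cf category sweep mentioned above can be approximated with `unicodedata.category`. This is a sketch under the assumption that every format character should be dropped; the `strip_format_chars` helper is hypothetical, and the PR's normalizer may keep or special-case some code points.

```python
import unicodedata


def strip_format_chars(text: str) -> str:
    """Drop every code point whose general category is Cf (format)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# ZWNJ (U+200C), LTR mark (U+200E), and Arabic letter mark (U+061C) are all Cf.
sample = "gh\u200cp_\u200eto\u061cken"
cleaned = strip_format_chars(sample)
```

A category sweep catches format characters that an explicit list would miss, at the cost of also removing ones with legitimate uses, such as the zero-width joiner in emoji sequences.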

CI green across all checks.

@waynesun09 waynesun09 added this pull request to the merge queue May 26, 2026
Merged via the queue into fullsend-ai:main with commit 735648b May 26, 2026
19 of 20 checks passed

Labels

ready-for-merge All reviewers approved — ready to merge

Development

Successfully merging this pull request may close these issues.

security: zero-width characters evade secret redaction due to hook ordering

4 participants