fix(security): run unicode normalization before secret redaction #1178

Open
ifireball wants to merge 3 commits into fullsend-ai:main from ifireball:cursor/edb3260c

Conversation

@ifireball
Contributor

Summary

  • Reorder sandbox post-tool hooks so unicode_posttool.py runs before secret_redact_posttool.py, preventing zero-width characters from evading secret regexes and reconstructing tokens in LLM context.
  • Add UnicodeNormalizer to OutputPipeline() and use it in host output scanning (scan output, scanOutputFiles).
  • Add regression tests for hook ordering, Go pipeline behavior, and Python hook chaining.

Fixes #444

Test plan

  • go test ./internal/security/... ./internal/cli/...
  • python3 -m unittest internal/security/hooks/posttool_chain_test.py
  • python3 -m unittest internal/security/hooks/unicode_posttool_test.py

Made with Cursor

Zero-width characters interleaved in token-shaped strings bypassed
secret regexes when redaction ran before unicode stripping in sandbox
post-tool hooks. Reorder hooks to normalize first, extend OutputPipeline
and output scanning paths, and add regression tests.

Fixes fullsend-ai#444

Co-authored-by: Cursor <[email protected]>
@ifireball ifireball requested review from maruiz93, ralphbean, rh-hemartin and waynesun09 and removed request for waynesun09 May 19, 2026 12:57
@ifireball ifireball self-assigned this May 19, 2026
@ifireball ifireball marked this pull request as ready for review May 19, 2026 12:58
@github-actions

github-actions Bot commented May 19, 2026

Site preview

Preview: https://696ce045-site.fullsend-ai.workers.dev

Commit: 674340d95392b8adec233f8f8102c3eab33dcc34

@fullsend-ai-review

fullsend-ai-review Bot commented May 19, 2026

Review

Findings

No findings.

The core fix correctly swaps hook ordering in hooks.go so unicode_posttool.py runs before secret_redact_posttool.py, preventing zero-width characters from evading prefix regexes. The Go-side OutputPipeline() is updated to include UnicodeNormalizer before SecretRedactor, aligning host-side output scanning with the sandbox hook ordering invariant.

Additional hardening — expanded zero-width regex (aligned between Go and Python), ECMA-48 compliant ANSI CSI regex, OSC escape handling, supplementary variation selector stripping, and post-NFKC escape re-scanning — is well-scoped and strengthens the normalizer against adjacent evasion vectors.

Tests are thorough: TestGenerateClaudeSettings_PostToolSanitizeHookOrder asserts the ordering invariant structurally, TestPipeline covers zero-width and LTR-mark obfuscated PATs through the Go pipeline, and posttool_chain_test.py validates the Python hook chain end-to-end including a negative test proving redaction alone misses obfuscated tokens.

Previous run

Review

Findings

Low

  • [correctness] internal/cli/run.go:1382 - scanOutputFiles writes result.Sanitized to disk without the defensive empty-string fallback that was added to scan.go. Both call sites now use OutputPipeline(), but only scan.go guards against the theoretical edge case where result.Sanitized is empty despite findings existing (which would truncate the file to zero bytes). Current scanners always populate Sanitized when they report findings, so this is not exploitable today, but the inconsistency is worth aligning for robustness.
    Remediation: Add the same fallback pattern used in scan.go — if result.Sanitized is empty, fall back to the original content before writing.

Info

  • [style] internal/cli/scan.go:163-167 — The Long description for scan output ("Reads text from stdin and scans for API keys, tokens, credentials, and sensitive patterns") no longer reflects the full pipeline behavior, which now includes unicode normalization before secret redaction.

Contributor

@ralphbean ralphbean left a comment

LGTM. One minor note inline.

Comment thread internal/cli/run.go Outdated
Contributor

@waynesun09 waynesun09 left a comment

10-Agent Review Squad — #1178

Agents dispatched: 10 (3x claude-coder, 3x claude-researcher, 2x gemini-code-review, 2x cursor-code-review)
Verified findings: 8 MEDIUM+ (1 CRITICAL, 3 HIGH, 4 MEDIUM) — 4 false positives removed after code verification

Summary

The core reordering fix is correct and well-tested across all three execution layers (Go pipeline, sandbox hooks, CLI scanning). The hook ordering invariant is well-documented and the assert.Less index-comparison test pattern is robust. All direct NewSecretRedactor() callers have been migrated to OutputPipeline().
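The index-comparison idea behind the ordering test can be sketched as follows. This is only an illustration under assumptions: `hookIndex` and `normalizesBeforeRedacting` are hypothetical helpers, and the hook command strings are made up, since the generated settings format is not shown in this conversation.

```go
package main

import (
	"fmt"
	"strings"
)

// hookIndex returns the position of the first hook command containing name,
// or -1 if no command mentions it.
func hookIndex(hooks []string, name string) int {
	for i, h := range hooks {
		if strings.Contains(h, name) {
			return i
		}
	}
	return -1
}

// normalizesBeforeRedacting reports whether both hooks are present and the
// unicode hook comes before the secret-redaction hook.
func normalizesBeforeRedacting(hooks []string) bool {
	u := hookIndex(hooks, "unicode_posttool.py")
	s := hookIndex(hooks, "secret_redact_posttool.py")
	return u >= 0 && s >= 0 && u < s
}

func main() {
	// Hypothetical command lines; the real paths are an assumption.
	good := []string{
		"python3 hooks/unicode_posttool.py",
		"python3 hooks/secret_redact_posttool.py",
	}
	bad := []string{good[1], good[0]}

	fmt.Println(normalizesBeforeRedacting(good)) // true
	fmt.Println(normalizesBeforeRedacting(bad))  // false
}
```

Comparing positions rather than matching an exact list keeps such a check stable when unrelated hooks are added before, between, or after the two that matter.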

However, the zero-width character coverage has gaps (U+200E/F, U+180E, U+206A-206F) that allow the same class of attack to succeed with different invisible characters — this should be addressed before merge.

Additional finding not in diff

MEDIUM — post-comment and post-review commands skip output sanitization entirely

internal/cli/postcomment.go and internal/cli/postreview.go read agent output and post directly to the GitHub API without calling OutputPipeline(). While output files should have been scanned during fullsend run, if these commands are invoked standalone (as documented in their --help), they could post unsanitized content containing leaked secrets or zero-width obfuscated tokens. Pre-existing gap, but issue #444's scope explicitly calls for auditing all output paths.

Suggestion: File a follow-up issue to add OutputPipeline().Scan() to these commands before they post to the forge API.

Findings by severity

| Severity | Count | Key Theme |
|----------|-------|-----------|
| CRITICAL | 1 | Incomplete invisible char coverage (U+200E/F, U+180E, U+206A-206F bypass) |
| HIGH | 3 | Missing empty-Sanitized guard in run.go, Go/Python ANSI regex parity, missing post-NFKC rescan |
| MEDIUM | 4 | Stale doc/log in run.go, unscanned post-comment/post-review paths, test stderr discarded, supplementary variation selectors |

See inline comments for details and suggested fixes.

Comment thread internal/security/scanner.go
Comment thread internal/security/scanner.go
Comment thread internal/cli/run.go Outdated
Comment thread internal/cli/run.go
Comment thread internal/security/hooks/posttool_chain_test.py Outdated
Expand invisible-character stripping (U+200E/F, U+180E, etc.), align Go
ANSI/OSC handling with Python including post-NFKC rescan, add empty
Sanitized guards and consistent scan output logging.

Co-authored-by: Cursor <[email protected]>
@ifireball
Contributor Author

Addressed review feedback in 61b7351 + 674340d:

Hook ordering (original #444 fix) — unchanged; unicode_posttool still runs before secret_redact_posttool.

Review follow-ups:

  • Expanded invisible-character coverage in Go UnicodeNormalizer and Python unicode_posttool.py (U+200E/F, U+180E, U+034F, U+206A–206F, U+FFF9–FFFB, supplementary VS U+E0100–E01EF)
  • Aligned Go ANSI/OSC regex with Python; added post-NFKC escape rescan in Go
  • Added empty-Sanitized guard in scanOutputFiles; aligned doc comments and log messages with scan output
  • Chain tests: LTR-mark obfuscation case, stderr in hook test errors
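As a rough picture of the expanded coverage listed above, the code points can be expressed as a table of rune ranges. This is a sketch, not the project's `UnicodeNormalizer`: `runeRange`, `invisible`, `isInvisible`, and `stripInvisible` are hypothetical names, and the entries marked "assumed baseline" are a guess at what was covered before this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// runeRange is an inclusive range of code points.
type runeRange struct{ lo, hi rune }

// invisible lists the ranges named in the review follow-up, plus an assumed
// pre-existing baseline of zero-width characters.
var invisible = []runeRange{
	{0x034F, 0x034F},   // combining grapheme joiner
	{0x180E, 0x180E},   // Mongolian vowel separator
	{0x200B, 0x200D},   // zero-width space and joiners (assumed baseline)
	{0x200E, 0x200F},   // left-to-right and right-to-left marks
	{0x2060, 0x2060},   // word joiner (assumed baseline)
	{0x206A, 0x206F},   // deprecated formatting characters
	{0xFEFF, 0xFEFF},   // zero-width no-break space (assumed baseline)
	{0xFFF9, 0xFFFB},   // interlinear annotation characters
	{0xE0100, 0xE01EF}, // supplementary variation selectors
}

// isInvisible reports whether r falls in any listed range.
func isInvisible(r rune) bool {
	for _, rg := range invisible {
		if r >= rg.lo && r <= rg.hi {
			return true
		}
	}
	return false
}

// stripInvisible removes listed runes and returns how many were removed.
func stripInvisible(s string) (string, int) {
	var b strings.Builder
	removed := 0
	for _, r := range s {
		if isInvisible(r) {
			removed++
			continue
		}
		b.WriteRune(r)
	}
	return b.String(), removed
}

func main() {
	in := "gh\u200ep_\u180eabc\U000E0100"
	out, n := stripInvisible(in)
	fmt.Printf("%q %d\n", out, n) // "ghp_abc" 3
}
```

Keeping the ranges in one table makes it easier to compare the Go and Python lists side by side, which is the parity concern raised in the review.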

Deferred (follow-up): post-comment / post-review standalone paths still skip OutputPipeline() — pre-existing gap, suggest separate issue.

CI green on latest push. Ready for re-review.

@ifireball
Contributor Author

Replied to and resolved all inline threads addressed in 61b7351 / 674340d.

Deferred to follow-up issues (from 10-agent review — out of scope for this PR):

The post-comment and post-review commands can post unsanitized content when invoked standalone outside fullsend run.

@ralphbean ralphbean requested a review from waynesun09 May 21, 2026 19:36
Contributor

@waynesun09 waynesun09 left a comment

8-Agent Review Squad — 3 verified MEDIUM findings

Agents: 2x claude-coder, 2x claude-researcher, 2x gemini-code-review, 2x cursor-code-review

The core security fix is correct — hook reordering closes the zero-width character bypass (#444), expanded Unicode coverage is comprehensive, and Go/Python regex parity is aligned. CI is green. No merge-blocking issues.

After verification, 7 false positives were removed (pre-existing issues outside PR scope, misidentified Python as Go, etc.). 3 MEDIUM findings remain as inline comments:

  1. Empty-string Sanitized bypass: Pipeline.Scan conflates "" (sanitized-to-empty) with "" (no-changes), and CLI fallback writes original text back. Low practical risk but a semantic bug. (6/8 agents agreed)
  2. stripTerminalEscapes double-scan: FindAllString + ReplaceAllString runs each regex twice; single-pass with ReplaceAllStringFunc halves the work on the hot path. (2/8 agents agreed)
  3. Missing wrong-order negative test — No Go test proves NewPipeline(SecretRedactor, UnicodeNormalizer) (wrong order) fails to catch obfuscated tokens. (2/8 agents agreed)

Comment thread internal/cli/run.go
Comment on lines 1383 to +1386
}
if writeErr := os.WriteFile(path, []byte(result.Sanitized), 0o644); writeErr != nil {
printer.StepWarn(fmt.Sprintf("Could not write redacted %s: %v", relPath, writeErr))
out := result.Sanitized
if out == "" {
out = text
Contributor

[MEDIUM] Empty-string fallback can bypass sanitization (6/8 review agents flagged this)

When UnicodeNormalizer strips ALL characters from input (e.g., text consisting entirely of zero-width chars), it sets Sanitized = "". Pipeline.Scan interprets Sanitized == "" as "no changes" (scanner.go:69) and skips updating current, so the original text passes through unsanitized. This fallback then writes the original unsanitized text back.

The practical security impact is low (requires input with ONLY invisible characters — no secrets to leak), but it's a semantic bug in the Pipeline abstraction that this PR makes newly reachable through OutputPipeline.

Suggestion: Guard on findings count:

out := result.Sanitized
if out == "" && len(result.Findings) > 0 {
    // Pipeline stripped everything — don't revert to original
    printer.StepWarn(fmt.Sprintf("Sanitized output empty despite %d finding(s) in %s", len(result.Findings), relPath))
}
if out == "" && len(result.Findings) == 0 {
    out = text
}

Or better, add a Modified bool field to ScanResult so Pipeline.Scan can distinguish "no changes" from "sanitized to empty string".
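The Modified-flag idea could look roughly like the sketch below. The types here are hypothetical and trimmed down for illustration; `ScanResult`, `Scanner`, `zeroWidthStripper`, and `runPipeline` are assumptions and do not reproduce the project's actual definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// ScanResult is a hypothetical, reduced result shape with an explicit flag.
type ScanResult struct {
	Sanitized string
	Modified  bool // true when the scanner changed the text, even to ""
	Findings  []string
}

// Scanner is a hypothetical single-stage interface.
type Scanner interface {
	Scan(text string) ScanResult
}

// zeroWidthStripper removes zero-width spaces only, to keep the demo small.
type zeroWidthStripper struct{}

func (zeroWidthStripper) Scan(text string) ScanResult {
	out := strings.ReplaceAll(text, "\u200b", "")
	res := ScanResult{Sanitized: out, Modified: out != text}
	if res.Modified {
		res.Findings = append(res.Findings, "zero_width")
	}
	return res
}

// runPipeline chains scanners and advances the text based on Modified,
// never on whether Sanitized happens to be the empty string.
func runPipeline(text string, scanners ...Scanner) ScanResult {
	final := ScanResult{Sanitized: text}
	for _, s := range scanners {
		r := s.Scan(final.Sanitized)
		final.Findings = append(final.Findings, r.Findings...)
		if r.Modified {
			final.Sanitized = r.Sanitized
			final.Modified = true
		}
	}
	return final
}

func main() {
	r := runPipeline("\u200b\u200b", zeroWidthStripper{})
	fmt.Printf("%q %v %v\n", r.Sanitized, r.Modified, r.Findings) // "" true [zero_width]
}
```

With an explicit flag, a caller writing files can trust an empty Sanitized value when Modified is true and needs no fallback to the original text.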

Comment on lines +54 to +61
if matches := reANSI.FindAllString(current, -1); len(matches) > 0 {
	ansiCount = len(matches)
	current = reANSI.ReplaceAllString(current, "")
}
if matches := reOSC.FindAllString(current, -1); len(matches) > 0 {
	oscCount = len(matches)
	current = reOSC.ReplaceAllString(current, "")
}
Contributor

[MEDIUM] Double regex scan — FindAllString then ReplaceAllString (2/8 agents flagged)

Each regex is compiled and executed twice over the text: once to count matches, once to remove them. Since stripTerminalEscapes is called up to twice per Scan() invocation (pre- and post-NFKC), and runs on every Bash/Read/WebFetch result in the sandbox, this is 4-8 regex passes where 2-4 would suffice.

Suggestion: Single-pass with ReplaceAllStringFunc:

func stripTerminalEscapes(text string) (string, int, int) {
	ansiCount := 0
	current := reANSI.ReplaceAllStringFunc(text, func(string) string { ansiCount++; return "" })
	oscCount := 0
	current = reOSC.ReplaceAllStringFunc(current, func(string) string { oscCount++; return "" })
	return current, ansiCount, oscCount
}


t.Run("clean text passes both", func(t *testing.T) {
	p := InputPipeline()
	r := p.Scan("Normal commit message fixing a null pointer bug.")
Contributor

[MEDIUM] Missing wrong-order negative test (2/8 agents flagged)

These tests prove the correct pipeline order catches obfuscated tokens, but there's no test proving the wrong order (NewPipeline(NewSecretRedactor(), NewUnicodeNormalizer())) fails to catch them. A wrong-order test directly validates the ordering invariant documented in hooks.go:125-127 and would catch regressions if someone accidentally swaps the pipeline composition in OutputPipeline().

Suggestion:

t.Run("wrong order leaks zero-width obfuscated PAT", func(t *testing.T) {
    p := NewPipeline(NewSecretRedactor(), NewUnicodeNormalizer())
    plain := "ghp_FAKEtesttoken000000000000000000000000"
    var obfuscated strings.Builder
    for _, r := range plain {
        obfuscated.WriteRune(r)
        obfuscated.WriteRune('\u200c') // zero-width character, written as an escape so it is visible
    }
    r := p.Scan(obfuscated.String())
    // Redactor runs first, sees obfuscated token, misses it
    assert.True(t, hasFinding(r, "zero_width"))
    assert.False(t, hasFinding(r, "github_pat"), "wrong order must NOT catch the obfuscated token")
})

Contributor

@ralphbean ralphbean left a comment

LGTM.

Labels

ready-for-merge All reviewers approved — ready to merge

Development

Successfully merging this pull request may close these issues.

security: zero-width characters evade secret redaction due to hook ordering

4 participants