fix(security): run unicode normalization before secret redaction#1178
fix(security): run unicode normalization before secret redaction#1178ifireball wants to merge 3 commits into
Conversation
Zero-width characters interleaved in token-shaped strings bypassed secret regexes when redaction ran before unicode stripping in sandbox post-tool hooks. Reorder hooks to normalize first, extend OutputPipeline and output scanning paths, and add regression tests. Fixes fullsend-ai#444 Co-authored-by: Cursor <[email protected]>
Site previewPreview: https://696ce045-site.fullsend-ai.workers.dev Commit: |
ReviewFindingsNo findings. The core fix correctly swaps hook ordering in Additional hardening — expanded zero-width regex (aligned between Go and Python), ECMA-48 compliant ANSI CSI regex, OSC escape handling, supplementary variation selector stripping, and post-NFKC escape re-scanning — is well-scoped and strengthens the normalizer against adjacent evasion vectors. Tests are thorough: Previous runReviewFindingsLow
Info
|
ralphbean
left a comment
There was a problem hiding this comment.
LGTM. One minor note inline.
waynesun09
left a comment
There was a problem hiding this comment.
10-Agent Review Squad — #1178
Agents dispatched: 10 (3x claude-coder, 3x claude-researcher, 2x gemini-code-review, 2x cursor-code-review)
Verified findings: 8 MEDIUM+ (1 CRITICAL, 3 HIGH, 4 MEDIUM) — 4 false positives removed after code verification
Summary
The core reordering fix is correct and well-tested across all three execution layers (Go pipeline, sandbox hooks, CLI scanning). The hook ordering invariant is well-documented and the assert.Less index-comparison test pattern is robust. All direct NewSecretRedactor() callers have been migrated to OutputPipeline().
However, the zero-width character coverage has gaps (U+200E/F, U+180E, U+206A-206F) that allow the same class of attack to succeed with different invisible characters — this should be addressed before merge.
Additional finding not in diff
MEDIUM — post-comment and post-review commands skip output sanitization entirely
internal/cli/postcomment.go and internal/cli/postreview.go read agent output and post directly to the GitHub API without calling OutputPipeline(). While output files should have been scanned during fullsend run, if these commands are invoked standalone (as documented in their --help), they could post unsanitized content containing leaked secrets or zero-width obfuscated tokens. Pre-existing gap, but issue #444's scope explicitly calls for auditing all output paths.
Suggestion: File a follow-up issue to add OutputPipeline().Scan() to these commands before they post to the forge API.
Findings by severity
| Severity | Count | Key Theme |
|---|---|---|
| CRITICAL | 1 | Incomplete invisible char coverage (U+200E/F, U+180E, U+206A-206F bypass) |
| HIGH | 3 | Missing empty-Sanitized guard in run.go, Go/Python ANSI regex parity, missing post-NFKC rescan |
| MEDIUM | 4 | Stale doc/log in run.go, unscanned post-comment/post-review paths, test stderr discarded, supplementary variation selectors |
See inline comments for details and suggested fixes.
Expand invisible-character stripping (U+200E/F, U+180E, etc.), align Go ANSI/OSC handling with Python including post-NFKC rescan, add empty Sanitized guards and consistent scan output logging. Co-authored-by: Cursor <[email protected]>
Co-authored-by: Cursor <[email protected]>
|
Addressed review feedback in 61b7351 + 674340d: Hook ordering (original #444 fix) — unchanged; Review follow-ups:
Deferred (follow-up): CI green on latest push. Ready for re-review. |
|
Replied to and resolved all inline threads addressed in 61b7351 / 674340d. Deferred to follow-up issues (from 10-agent review — out of scope for this PR):
These commands can post unsanitized content when invoked standalone outside |
waynesun09
left a comment
There was a problem hiding this comment.
8-Agent Review Squad — 3 verified MEDIUM findings
Agents: 2x claude-coder, 2x claude-researcher, 2x gemini-code-review, 2x cursor-code-review
The core security fix is correct — hook reordering closes the zero-width character bypass (#444), expanded Unicode coverage is comprehensive, and Go/Python regex parity is aligned. CI is green. No merge-blocking issues.
After verification, 7 false positives were removed (pre-existing issues outside PR scope, misidentified Python as Go, etc.). 3 MEDIUM findings remain as inline comments:
- Empty-string
Sanitizedbypass —Pipeline.Scanconflates""(sanitized-to-empty) with""(no-changes), and CLI fallback writes original text back. Low practical risk but a semantic bug. (6/8 agents agreed) stripTerminalEscapesdouble-scan —FindAllString+ReplaceAllStringruns each regex twice; single-pass withReplaceAllStringFunchalves the work on the hot path. (2/8 agents agreed)- Missing wrong-order negative test — No Go test proves
NewPipeline(SecretRedactor, UnicodeNormalizer)(wrong order) fails to catch obfuscated tokens. (2/8 agents agreed)
| } | ||
| if writeErr := os.WriteFile(path, []byte(result.Sanitized), 0o644); writeErr != nil { | ||
| printer.StepWarn(fmt.Sprintf("Could not write redacted %s: %v", relPath, writeErr)) | ||
| out := result.Sanitized | ||
| if out == "" { | ||
| out = text |
There was a problem hiding this comment.
[MEDIUM] Empty-string fallback can bypass sanitization (6/8 review agents flagged this)
When UnicodeNormalizer strips ALL characters from input (e.g., text consisting entirely of zero-width chars), it sets Sanitized = "". Pipeline.Scan interprets Sanitized == "" as "no changes" (scanner.go:69) and skips updating current, so the original text passes through unsanitized. This fallback then writes the original unsanitized text back.
The practical security impact is low (requires input with ONLY invisible characters — no secrets to leak), but it's a semantic bug in the Pipeline abstraction that this PR makes newly reachable through OutputPipeline.
Suggestion: Guard on findings count:
out := result.Sanitized
if out == "" && len(result.Findings) > 0 {
// Pipeline stripped everything — don't revert to original
printer.StepWarn(fmt.Sprintf("Sanitized output empty despite %d finding(s) in %s", len(result.Findings), relPath))
}
if out == "" && len(result.Findings) == 0 {
out = text
}Or better, add a Modified bool field to ScanResult so Pipeline.Scan can distinguish "no changes" from "sanitized to empty string".
| if matches := reANSI.FindAllString(current, -1); len(matches) > 0 { | ||
| ansiCount = len(matches) | ||
| current = reANSI.ReplaceAllString(current, "") | ||
| } | ||
| if matches := reOSC.FindAllString(current, -1); len(matches) > 0 { | ||
| oscCount = len(matches) | ||
| current = reOSC.ReplaceAllString(current, "") | ||
| } |
There was a problem hiding this comment.
[MEDIUM] Double regex scan — FindAllString then ReplaceAllString (2/8 agents flagged)
Each regex is compiled and executed twice over the text: once to count matches, once to remove them. Since stripTerminalEscapes is called up to twice per Scan() invocation (pre- and post-NFKC), and runs on every Bash/Read/WebFetch result in the sandbox, this is 4-8 regex passes where 2-4 would suffice.
Suggestion: Single-pass with ReplaceAllStringFunc:
func stripTerminalEscapes(text string) (string, int, int) {
ansiCount := 0
current := reANSI.ReplaceAllStringFunc(text, func(string) string { ansiCount++; return "" })
oscCount := 0
current = reOSC.ReplaceAllStringFunc(current, func(string) string { oscCount++; return "" })
return current, ansiCount, oscCount
}|
|
||
| t.Run("clean text passes both", func(t *testing.T) { | ||
| p := InputPipeline() | ||
| r := p.Scan("Normal commit message fixing a null pointer bug.") |
There was a problem hiding this comment.
[MEDIUM] Missing wrong-order negative test (2/8 agents flagged)
These tests prove the correct pipeline order catches obfuscated tokens, but there's no test proving the wrong order (NewPipeline(NewSecretRedactor(), NewUnicodeNormalizer())) fails to catch them. A wrong-order test directly validates the ordering invariant documented in hooks.go:125-127 and would catch regressions if someone accidentally swaps the pipeline composition in OutputPipeline().
Suggestion:
t.Run("wrong order leaks zero-width obfuscated PAT", func(t *testing.T) {
p := NewPipeline(NewSecretRedactor(), NewUnicodeNormalizer())
plain := "ghp_FAKEtesttoken000000000000000000000000"
var obfuscated strings.Builder
for _, r := range plain {
obfuscated.WriteRune(r)
obfuscated.WriteRune('')
}
r := p.Scan(obfuscated.String())
// Redactor runs first, sees obfuscated token, misses it
assert.True(t, hasFinding(r, "zero_width"))
assert.False(t, hasFinding(r, "github_pat"), "wrong order must NOT catch the obfuscated token")
})
Summary
unicode_posttool.pyruns beforesecret_redact_posttool.py, preventing zero-width characters from evading secret regexes and reconstructing tokens in LLM context.UnicodeNormalizertoOutputPipeline()and use it in host output scanning (scan output,scanOutputFiles).Fixes #444
Test plan
go test ./internal/security/... ./internal/cli/...python3 -m unittest internal/security/hooks/posttool_chain_test.pypython3 -m unittest internal/security/hooks/unicode_posttool_test.pyMade with Cursor