This note captures the current false-positive, miss, and confidence-scoring state of AgentShield after live validation on March 14, 2026.
Audit method:
- real scans were run with
npx tsx src/index.ts scan --format json - repos were chosen to cover template-heavy, hook-heavy, and cleaner baseline configs
- targeted synthetic scans were used to validate source-kind behavior that the live repos no longer exercise directly
- findings below distinguish three states: raw false positive, real miss, and true signal with the wrong score weight
Validation repos used:
/Users/affoon/Documents/GitHub/ECC/everything-claude-code/Users/affoon/Documents/GitHub/PMX-backend/Users/affoon/Documents/GitHub/basket-trader
Current scan snapshots:
everything-claude-code:103files,88findings, gradeB (75), now with only7high findings after specialist agent-capability severity downgrades,51findings withruntimeConfidence: template-example, and3info-levelhook-codefindingsPMX-backend:17files,27findings, gradeC (72), now with only1high finding after structured agent-capability downgrades and repo-scoped filesystem MCP gradingbasket-trader:3files,2findings, gradeA (99), now split into1medium and1low after the project-local exact-allowlist downgrade forhooks-no-pretooluse
Recent alerts reviewed on the current scanner:
everything-claude-code/mcp-configs/mcp-servers.jsonremains the largest alert cluster at51findings, but those all carryruntimeConfidence: template-example; the score model now caps that one file at10MCP deduction points, so the remaining issue is report interpretation and count volume, not raw grade distortionPMX-backend/.claude/settings.jsonremains the hottest active-runtime file at11findings, but its repo-scoped filesystem MCP is now gradedmediuminstead ofhighbasket-trader/launch-video/.claude/settings.local.jsonnow emits only2findings, bothproject-local-optional; those are still worth surfacing, but they now read as scope-limited exposure rather than repo-wide runtime risk- conclusion from this pass: the strongest remaining score inflation was template-catalog MCP debt, and that is now reduced; the main remaining noise is template count/interpretation and active-runtime remote MCP URLs
Targeted source-kind confirmation scans:
- a synthetic
docs/guide/settings.jsonexample now emitsruntimeConfidence: docs-example, rewrites titles asExample config: ..., and downgrades structural severities one level (Bash(*)moved fromcriticaltohigh,permissions-no-deny-listfromhightomedium,hooks-no-pretoolusefrommediumtolow) - a docs-only
docs/guide/CLAUDE.mdwith a realghp_...token now scans as a standalonedocs-examplefile and still emits acriticalsecret finding - a synthetic
examples/demo/CLAUDE.mdplusexamples/demo/settings.jsonexample bundle now emitsruntimeConfidence: docs-exampleinstead of looking like live runtime config - a synthetic
tutorials/demo-app/CLAUDE.mdplustutorials/demo-app/settings.jsontutorial bundle now emitsruntimeConfidence: docs-exampleinstead of looking like live runtime config - a synthetic
hooks/hooks.jsonmanifest now emitsruntimeConfidence: plugin-manifestand manifest-aware titles such asPlugin hook manifest: Hook sends data to external service - manifest-resolved non-shell implementations continue to emit
runtimeConfidence: hook-codewith narrow language-aware findings, such asHook code injects content into Claude context, and now also flag explicit remote shell payloads executed via child-process wrappers
High-signal evidence from the refresh scan:
everything-claude-codestill has51/88findings concentrated inmcp-configs/mcp-servers.json, but the MCP category now lands at90and the overall grade improves toB (75)because non-secrettemplate-examplefindings are both score-weighted at0.25xand capped at10deduction points per file and score categorycommands/kotlin-test.mdno longer emits any secret findings after markdown example-password suppression landed- docs-only nested trees under
docs/no longer emit agent noise, but standalone exampleCLAUDE.mdfiles are now still inventoried;everything-claude-codegained4extra scanned files (docs/ja-JP/examples/CLAUDE.md,docs/ko-KR/examples/CLAUDE.md,docs/zh-CN/CLAUDE.md, anddocs/zh-CN/examples/CLAUDE.md) without adding findings hooks/hooks.jsonnow emits0findings; direct manifest false positives forpermissions-no-block,hooks-silent-fail-*, andhooks-chained-commands-*are fixed.claude/settings.local.jsonno longer emitshooks-no-pretoolusewhen companionhooks/hooks.jsondefines real PreToolUse hooks, so the remaining hook gap is language-aware review ofhook-code, not manifest awareness.claude/settings.local.jsonineverything-claude-codeis now down to a singlepermissions-no-deny-listfinding; the prior fourpermissions-permissive-Bash(curl ...)findings for exact npmjs URLs are gone- exact interpreter wrapper permissions such as
Bash(node scripts/build.js --check)andBash(python3 ./tools/audit.py --format json)now stay quiet in synthetic audits; inline eval forms such asnode -eandpython -cstill triggerpermissions-permissive-* - exact read-only Docker inventory permissions such as
Bash(docker ps --format '{{.Names}}')andBash(docker image ls)now stay quiet in synthetic audits;docker runanddocker execstill triggerpermissions-permissive-* everything-claude-codenow resolveshooks/hooks.jsoninto20discoveredhook-codefiles underscripts/hooks/plus2shell hook implementations (scripts/hooks/run-with-flags-shell.shandskills/continuous-learning-v2/hooks/observe.sh)- those non-shell implementations now emit
3narrow info findings:session-start.jsfor explicitoutput(...)context injection, andsession-end.jsplusevaluate-session.jsfor direct transcript input access - comment-only shell-hook lines such as
# curl https://...,# tsc 2>/dev/null, or# avoid ~/.ssh/id_rsanow stay quiet in synthetic audits; the hook rules only flag those patterns when they appear in executable hook code agents/security-reviewer.mdno longer emitsagents-injection-surface; the old hit was caused by defensive example text such asfetch(userProvidedUrl)inside a security review checklistagents-explorer-write-*no longer fires on non-explorer workflows such aschief-of-staff.md,database-reviewer.md,e2e-runner.md,security-reviewer.md,.claude/subagents/e2e-tester.json, or.claude/slash-commands/test-coverage.json; the rule now uses role metadata and lead intro text instead of matching any later occurrence ofsearchagents-oversized-prompt-*no longer fires on example-heavy agents such aschief-of-staff.mdandplanner.md; the rule now measures effective prompt size instead of raw file size, discounting fenced code blocks and markdown tables- narrow specialist agent, subagent, and slash-command Bash/escalation findings now downgrade from
hightomedium;everything-claude-codedropped from29to7high findings andPMX-backenddropped from18to1 settings.local.jsonfindings now emitruntimeConfidence: project-local-optionaloutside MCP too, and non-secret project-local findings now score at0.75xinstead of full runtime weightPMX-backendnow emits9findings from structured slash-command JSON, which confirms that part of the previous miss is fixedPMX-backend/.claude/settings.jsonnow grades the repo-scopedfilesystemMCP atmediuminstead ofhigh; the only remaininghighactive-runtime MCP finding there is the remote URL transport serverbasket-traderremains a useful baseline sanity check because its2findings are plausible and not obviously inflatedbasket-tradernow gradesA (99)because its narrowsettings.local.jsonallowlist downgradespermissions-no-deny-listfrom high to medium and its project-local findings now score at0.75xinstead of full runtime weightbasket-trader/launch-video/.claude/settings.local.jsonnow downgradeshooks-no-pretoolusefrom medium to low because the config is project-local and narrowly scoped to exact local commandsPMX-backendstill emits onehooks-no-pretoolusefrom.claude/settings.json, which also looks legitimate because there is no companion manifest to suppress it
Behavior validation edge cases confirmed in targeted scans:
- a docs-only example tree like
docs/guide/CLAUDE.mdnow scans as a standalone example file, so a real GitHub PAT there emits acriticaldocs-examplesecret finding instead of disappearing - a docs/example tree with a runtime companion does the right thing: real secrets still stay
criticaland structural policy findings downgrade one severity level underruntimeConfidence: docs-example - an example bundle under
examples/, such asexamples/demo/CLAUDE.mdplusexamples/demo/settings.json, now emitsruntimeConfidence: docs-exampleand downgraded structural severities instead of looking like live config - a tutorial bundle under
tutorials/, such astutorials/demo-app/CLAUDE.mdplustutorials/demo-app/settings.json, now also emitsruntimeConfidence: docs-exampleinstead of looking like live config - a manifest-resolved non-shell hook implementation that shells out via
spawnSync('bash', ['-lc', 'curl ... | bash'])now emits ahighhook-codefinding for a remote shell payload executed through a child process
| Pattern | Class | Evidence | Current Status | Next Step |
|---|---|---|---|---|
| MCP template catalogs | Confidence / scoring gap | mcp-configs/mcp-servers.json in everything-claude-code |
Fixed for MCP scoring: template findings are relabeled, severity-adjusted, emit runtimeConfidence: template-example, structural non-secret findings score at 0.25x, and one template file is capped at 10 deduction points per score category |
Extend the same high-confidence language-aware treatment to non-MCP executable sources |
| Docs/example roots treated as live config | False positive | docs/zh-CN/CLAUDE.md in everything-claude-code |
Fixed: docs-only example CLAUDE.md files are inventoried as standalone examples while noisy nested subtrees stay suppressed |
Extend source-aware handling if other tutorial/example bundle names show up beyond the current path set |
| Example passwords in command docs | False positive | commands/kotlin-test.md in everything-claude-code |
Fixed: example/test markdown context now suppresses the hardcoded-password rule in docs/command paths | Extend fixtures as new doc/example patterns appear |
| Exact network allow rules mislabeled as overly permissive | False positive | .claude/settings.local.json in everything-claude-code |
Fixed: exact curl/wget commands with pinned URLs no longer trigger permissions-permissive-* |
Keep the exact-vs-broad allowlist distinction covered in tests |
| Exact interpreter wrapper permissions graded like arbitrary interpreter access | False positive | synthetic Bash(node scripts/build.js --check) and Bash(python3 ./tools/audit.py --format json) |
Fixed: exact script-wrapper commands no longer trigger permissions-permissive-*; inline eval/REPL forms such as node -e and python -c still do |
Keep the distinction narrow so exact script paths stay quiet without muting real interpreter abuse |
| Read-only Docker inventory commands graded like container execution | False positive | synthetic Bash(docker ps --format '{{.Names}}') and Bash(docker image ls) |
Fixed: exact read-only inventory commands no longer trigger permissions-permissive-*; docker run and docker exec still do |
Extend only if future live repos show other truly read-only forms worth allowlisting |
| Project-local no-deny-list severity too high for exact allowlists | Severity inflation | launch-video/.claude/settings.local.json in basket-trader |
Fixed: exact settings.local.json allowlists now downgrade permissions-no-deny-list from high to medium, emit runtimeConfidence: project-local-optional, and score at 0.75x for non-secret findings |
Keep project-local weighting aligned with future source kinds |
| Project-local missing PreToolUse overstated for exact local-only allowlists | Severity inflation | launch-video/.claude/settings.local.json in basket-trader |
Fixed: hooks-no-pretooluse now downgrades from medium to low for exact local-only settings.local.json allowlists |
Keep broader or network-capable project-local configs at medium |
| Specialist agent/subagent/slash-command capability findings overstated as high | Severity inflation | agents/security-reviewer.md in everything-claude-code; .claude/subagents/e2e-tester.json in PMX-backend |
Fixed: narrow specialist configs now downgrade Bash-access and escalation-chain findings from high to medium | Keep broader/generalist configs such as chief-of-staff.md at high |
| Repo-scoped filesystem MCP graded like unrestricted filesystem access | Severity inflation | .claude/settings.json in PMX-backend with filesystem server rooted at ./ |
Fixed: repo-scoped relative filesystem MCP now grades as medium while root/home path access stays high | Revisit HTTPS vendor MCP URL grading separately if live evidence justifies it |
| Defensive security-review prompts mislabeled as injection surface | False positive | agents/security-reviewer.md in everything-claude-code |
Fixed: defensive examples like fetch(userProvidedUrl) no longer trigger agents-injection-surface |
Add source/context metadata if future agent-review prompts need different confidence handling |
| Explorer/search write rule matched generic workflow text | False positive | agents/chief-of-staff.md, agents/security-reviewer.md in everything-claude-code; .claude/subagents/e2e-tester.json in PMX-backend |
Fixed: agents-explorer-write now uses path/name/description plus lead intro text instead of any later search mention in examples or workflow steps |
Keep the heuristic scoped to explicit role signals so procedural search for ... text does not regress |
| Oversized prompt rule counted examples and templates as live prompt size | False positive | agents/chief-of-staff.md, agents/planner.md in everything-claude-code |
Fixed: agents-oversized-prompt now uses effective prompt size, discounting fenced code blocks and markdown tables |
Keep the heuristic focused on actual prompt prose, not embedded examples |
| Hook manifest direct structural findings | False positive | hooks/hooks.json in everything-claude-code |
Fixed: settings-only and wrapper-noise findings no longer fire directly on the manifest | Keep regression coverage and avoid reintroducing settings-only checks on plugin manifests |
| Missing-hook logic ignores plugin manifests | False positive + miss | .claude/settings.local.json plus hooks/hooks.json in everything-claude-code |
Fixed for companion manifests: settings-only hooks-no-pretooluse is now suppressed when plugin manifests define PreToolUse hooks, and manifest findings emit runtimeConfidence: plugin-manifest |
Extend language-aware manifest handling beyond missing-hook suppression |
| Plugin hook implementations behind manifests | Miss reduced to partial coverage | scripts/hooks/session-start.js, session-end.js, evaluate-session.js in everything-claude-code |
Fixed for discovery and initial analysis: manifest references now resolve into repo-local hook-code and shell hook-script files, and non-shell hook code now emits low-noise info findings for explicit context injection and transcript access with runtimeConfidence: hook-code |
Extend language-aware coverage carefully for non-shell execution and external I/O |
| Docs/example configs still looked like live critical findings when intentionally scanned | False positive / presentation inflation | synthetic docs/guide/settings.json fixture |
Fixed: docs/example findings now emit runtimeConfidence: docs-example, use example-aware titles, and downgrade structural severities one level while preserving full weight for real secrets |
Extend source-path coverage if other tutorial/example roots appear outside docs/ and commands/ |
| Plugin hook manifests lacked declarative-context wording | Confidence / interpretation gap | synthetic hooks/hooks.json fixture |
Fixed: manifest findings now emit runtimeConfidence: plugin-manifest and render with manifest-aware titles/descriptions |
Tune severity only if future live repos show persistent manifest-only noise |
| Docs-only example trees could hide real secrets | Miss introduced by false-positive suppression | synthetic docs/guide/CLAUDE.md with a real ghp_... token and no runtime companion |
Fixed: docs-only example CLAUDE.md files now scan as standalone examples, and real secrets still stay critical |
Keep the subtree suppression narrow so only the standalone example file is added back, not the whole translated/example tree |
Tutorial/example bundles outside docs/ and commands/ still looked like live config |
False positive boundary | synthetic tutorials/demo-app/CLAUDE.md + tutorials/demo-app/settings.json |
Fixed for the expanded current path set: examples/, example/, samples/, sample/, demo/, demos/, tutorial/, tutorials/, guide/, guides/, cookbook/, and playground/ now classify as docs-example |
Extend source-kind classification to additional example-root names only when evidence justifies it |
| Non-shell hook code missed child-process execution chains | Miss | synthetic scripts/hooks/post-edit-format.js calling `spawnSync('bash', ['-lc', 'curl ... |
bash'])` | Fixed for explicit remote shell payloads: hook-code now flags child-process downloads piped into shell interpreters |
Structured slash-command JSON with allowedTools |
Miss | .claude/slash-commands/*.json in PMX-backend |
Fixed | Keep regression coverage and monitor other structured config types |
Generated .dmux worktree mirrors |
False positive | .dmux/worktrees/... in everything-claude-code |
Fixed | Keep generated-workspace exclusions narrow and evidence-based |
| Placeholder connection-string docs | False positive | commands/update-docs.md in everything-claude-code |
Fixed | Extend example-vs-real secret fixtures as new markdown cases appear |
| Benign hook probe suppression | Severity inflation | .claude/hooks/onEdit.sh, .claude/hooks/onStop.sh in PMX-backend |
Fixed | Keep /dev/null suppression high-signal for real concealment, not harmless probes |
| Comment-only shell-hook examples matched as live behavior | False positive | synthetic hook scripts with # curl https://..., # tsc 2>/dev/null, and # avoid ~/.ssh/id_rsa |
Fixed: comment-only lines are now ignored by hook exfiltration, silent-fail, and sensitive-path regex rules | Keep shell comment suppression line-scoped so inline executable behavior still flags |
These are the current high-frequency patterns from the latest live scans, ordered by how much they distort first-pass report reading.
Current evidence:
everything-claude-codestill carries51/88findings frommcp-configs/mcp-servers.json- those findings are mostly legitimate template inventory findings, not active runtime exposure
- this is why the scanner now emits
runtimeConfidence: template-example, discounts non-secret template findings to0.25x, and caps one template file at10deduction points per score category
Interpretation:
- the main source of scan noise is still "real but lower-confidence" inventory, not obviously wrong matching logic
- the right fix is source-aware labeling, wording, and weighting before suppression
Current evidence:
- docs/example
CLAUDE.mdfiles are now inventoried as standalone example files, not full live roots examples/demo/CLAUDE.mdplusexamples/demo/settings.jsonnow lands asdocs-examplerather than full-severity runtime config- example/test passwords in
commands/kotlin-test.mdno longer create critical secret findings
Interpretation:
- examples are a real source of shipped risk, but they are not the same thing as active runtime enablement
- structural findings should usually be downgraded and relabeled; committed real secrets should still stay critical
Current evidence:
- direct
hooks/hooks.jsonfalse positives are fixed - manifest findings now emit
runtimeConfidence: plugin-manifest hook-codefindings are still intentionally narrow: context injection, transcript access, and explicit remote shell payloads through child-process wrappers
Interpretation:
- declarative manifests and executable implementations are different source kinds and should not be scored or worded the same way
- broad shell-style pattern matching on non-shell hook code is still the main thing to avoid
Current evidence:
- after the recent fixes,
everything-claude-codeagent findings are mostly medium-severity capability findings plus a small set of broader/higher-risk generalist agents - the previous high-confidence false positives were heuristic bugs:
agents-explorer-write, defensive injection examples, and raw-file-size prompt inflation - those are now fixed, and narrow specialist capability findings are now severity-adjusted rather than reported as uniformly high
Interpretation:
- many remaining agent findings are policy findings, not false positives
- the right question is often "is this agent intentionally privileged?" rather than "is the scanner wrong?"
These cases still need analyst attention, but they usually do not justify weakening a rule.
template-example,docs-example, andplugin-manifestfindings that accurately describe shipped risk with lower-confidence wording are not false positives. They are visibility findings with adjusted confidence.project-local-optionalfindings in committedsettings.local.jsonare not false positives. They are narrower-scope exposure and should stay visible unless the underlying matcher is wrong.- agent findings on intentionally privileged workflows are not false positives just because the repo author meant to grant those capabilities. That is a product and policy decision, not a scanner bug.
- real secrets found in docs, examples, manifests, or local settings are not false positives. Those should continue to stay critical.
Current evidence:
- the latest live scan did not produce a new repeated bad matcher pattern
- the biggest remaining noisy cluster is still template MCP inventory with
runtimeConfidence: template-example, but it no longer dominates the score the way it used to - the smallest remaining scope-noise cluster is project-local config in
settings.local.json
Interpretation:
- current audit work should stay focused on wording, confidence, and score modeling before inventing new suppressions
- if a future scan produces a new high-count cluster outside those source kinds, that is the point to revisit rule logic first
These signatures come from the latest real-repo scans and are a faster way to recognize likely false-positive patterns before changing rules.
| Pattern signature | Seen in current scans | Likely meaning | First action |
|---|---|---|---|
one file dominates the report and almost all findings share runtimeConfidence: template-example |
everything-claude-code/mcp-configs/mcp-servers.json with 51 findings |
shipped template or catalog inventory is being read as live runtime by the operator, not by the scanner | keep the findings visible, but review wording, weighting, and runtime confirmation first |
many agents-* findings spread across agent, subagent, and slash-command files with explicit tool metadata |
everything-claude-code and PMX-backend agent clusters |
this is usually a policy cluster about intentionally privileged workflows, not a false-positive cluster | verify the role metadata and tool list before weakening the rule |
a very small cluster in settings.local.json with runtimeConfidence: project-local-optional |
basket-trader/launch-video/.claude/settings.local.json |
scope is already modeled; the remaining question is severity, not whether the finding should exist | tune severity or score only if the allowlist is exact and local-only |
| info-only findings in manifest-resolved non-shell hook files | scripts/hooks/session-start.js, session-end.js, evaluate-session.js |
the scanner has reached real implementation code but is intentionally using narrow rules | add behavior-specific language-aware rules only when a new explicit risky action is confirmed |
These file types recur in false-positive investigations. Use the file archetype before the finding count as the first hint for how to interpret alerts.
| File archetype | Typical runtimeConfidence |
Common noise pattern | Recommended review approach |
|---|---|---|---|
mcp-configs/*.json, config/mcp/*.json |
template-example |
many MCP findings from one inventory file | confirm whether the same servers are actually enabled in runtime config |
settings.local.json |
project-local-optional |
small clusters of permission or hook findings that look harsher than their real scope | verify whether the allowlist is exact, local-only, and committed for team use |
hooks/hooks.json |
plugin-manifest |
declarative metadata looks like direct execution | follow the referenced implementation before changing the rule |
scripts/hooks/*.js, scripts/hooks/*.ts reached through manifests |
hook-code |
real implementation code with intentionally narrow findings | only add behavior-specific rules when a concrete risky action is present |
.claude/subagents/*.json, .claude/slash-commands/*.json |
usually none | broad agents-* clusters on structured tool definitions |
treat as policy review first, then check for actual heuristic bugs |
These recommendations are based on the current live scan profile, not hypothetical scanner behavior.
- Group findings by
runtimeConfidencefirst. - Review
template-example,docs-example, andplugin-manifestfindings separately fromactive-runtime. - Only change matching logic when the finding is wrong for its own source kind, not merely lower confidence.
- If a finding is real but lower confidence, relabel it and adjust score weight instead of hiding it.
- Examples:
template-example,docs-example, andplugin-manifestare the current successful model. - This preserves operator visibility while reducing grade distortion.
- Do not blanket-suppress secrets in docs, examples, or manifests.
- Structural policy findings can be downgraded in examples; real committed credentials should stay critical.
- The recent docs-only
CLAUDE.mdfix exists specifically to avoid losing real secrets while suppressing example subtree noise.
- Companion manifests, referenced hook implementations, and project-local settings context matter more than isolated string matches.
- The
hooks-no-pretoolusefix worked because it looked across settings and manifest files, not because the core rule got weaker.
- Match explicit risky behavior such as context injection, transcript access, or child-process remote shell payloads.
- Do not treat every
spawnSync,execFileSync, or wrapper helper as suspicious by default. - This is the clearest path to reducing noise without reopening the older hook false positives.
- Prefer path, agent name, description, and lead instructions over arbitrary later prose.
- This is what fixed
agents-explorer-writeand reduced defensive-review false positives. - Treat remaining Bash-access findings as policy questions unless a matcher bug is obvious.
These conventions reduce false-positive interpretation risk without weakening the scanner.
- keep reusable MCP inventories under
mcp-configs/,config/mcp/, or similar template directories instead of mixing them into live runtime config - keep project-local overrides in
settings.local.json, and prefer exact local-only allow entries when possible - keep sample/tutorial bundles under example-like paths such as
docs/,examples/,tutorials/,demos/,guides/, orcookbook/ - keep declarative hook manifests in
hooks/hooks.jsonand the actual implementation code in separatescripts/hooks/orskills/**/hooks/files - keep large agent examples inside fenced code blocks and keep the lead role metadata concise, so the scanner can distinguish live instructions from examples
- clearly mark sample credentials as examples or fixtures and keep them in example-like paths rather than operational config directories
Use this before changing rule code. Most current scan noise is better solved by confidence modeling than by weakening detection.
| Scan symptom | Most likely cause | Preferred fix | Avoid |
|---|---|---|---|
one file dominates the report and most findings share runtimeConfidence: template-example |
template inventory is being read like active runtime config | relabel, reweight, and keep findings visible | suppressing the file entirely |
| docs or tutorial paths emit structural config findings | example content is being treated like live runtime config | classify as docs-example, downgrade structural severity, keep secrets strict |
blanket ignore for docs/ or markdown |
hooks/hooks.json emits shell-style findings |
declarative manifest is being evaluated like executable shell | apply plugin-manifest wording and cross-file context |
reusing settings.json hook rules directly on manifests |
settings.local.json looks overly severe |
project-local scope is not being modeled | mark as project-local-optional and adjust score weight or severity |
hiding local config findings entirely |
| agent rule triggers from examples, code fences, or later workflow text | heuristic is reading too much body text and too little role metadata | anchor on path, name, description, and lead intro | broad regexes over the full prompt body |
| non-shell hook implementation looks suspicious but shell rules do not apply | hook-code needs language-aware analysis |
add one narrow behavior rule with regression tests | flagging every spawnSync or wrapper helper |
Treat a rule change as justified only when most of these checks pass.
- Confirm the same pattern in at least one real repo scan, not just a synthetic fixture.
- Confirm the finding is wrong for its own source kind, not merely lower-confidence than
active-runtime. - Verify whether cross-file context would solve it before changing the matcher.
- Check whether the fix should be reclassification, severity adjustment, score weighting, or true suppression.
- Add one synthetic fixture that proves the false positive and one fixture that preserves nearby true positives.
- Re-run at least the targeted tests for that rule family and confirm the grade delta on the real repo that exposed the issue.
Use these rules to decide whether the scanner is noisy or the repo is simply risky.
- If the finding title is directionally correct and
runtimeConfidencealready saystemplate-example,docs-example, orplugin-manifest, prefer documentation and score changes over rule suppression. - If a finding disappears when the source file is moved from
settings.jsontosettings.local.json, that is probably a scope issue, not a matcher bug. - If a finding depends on whether another file exists, such as
hooks/hooks.jsonor a referenced hook implementation, fix it with cross-file context first. - If the finding is driven by examples, tables, or fenced code blocks inside an agent prompt, narrow the heuristic to effective prompt prose instead of lowering severity globally.
- If the finding is a real secret, do not treat it as false-positive work even when it lives in docs or examples.
Use this order when deciding how to reduce scanner noise. It keeps accuracy work focused on the right intervention.
| Finding kind | First response | Acceptable change | Avoid |
|---|---|---|---|
active-runtime |
assume the finding may be real | matcher narrowing only if the behavior itself is misidentified | downgrading just because the repo author intended the risk |
project-local-optional |
check whether scope, exactness, or local-only behavior is overstated | severity or score adjustment tied to project-local scope | suppressing the finding entirely |
template-example / docs-example |
treat as shipped-risk inventory, not live runtime | relabel, reweight, and downgrade structural findings | hiding real secrets or collapsing the finding into nothing |
plugin-manifest |
verify whether the manifest is declarative or the implementation is the real risk | cross-file context, manifest-aware wording, implementation follow-through | applying shell-execution rules directly to manifest JSON |
hook-code |
ask whether the code performs one explicit risky behavior | add one narrow language-aware rule with tests | generic child-process or wrapper matching |
| agent capability findings | decide whether this is policy or a scanner bug | role-metadata heuristics, prompt-size normalization, or better wording | treating all privileged agents as false positives |
Use these rules when reading current AgentShield output before more source-aware scoring lands.
| If the finding comes from | Treat it as | What to verify manually |
|---|---|---|
mcp-configs/, config/mcp/, configs/mcp/ with runtimeConfidence: template-example |
template debt, not confirmed runtime exposure | whether the same MCP server is actually enabled in mcp.json, .claude/mcp.json, .claude.json, or active settings.json |
settings.local.json with runtimeConfidence: project-local-optional |
real but project-local exposure | whether the file is checked in, distributed to teammates, or only used locally |
findings with runtimeConfidence: docs-example under docs/, commands/, translated bundles, or tutorial trees |
risky example or instructional config, not confirmed active runtime exposure | whether the path is operational config or just reference content |
findings with runtimeConfidence: plugin-manifest in hooks/hooks.json |
active declarative manifest with lower confidence than executable hook code | whether the risk comes from manifest metadata or from the referenced script implementation |
referenced shell hook files under scripts/hooks/ or skills/**/hooks/ |
high-confidence executable logic | whether the scanner found risky shell behavior in the implementation itself |
findings with runtimeConfidence: hook-code under scripts/hooks/*.js or similar |
real implementation logic with narrow language-aware review only | whether the code path injects data back into Claude context, consumes transcript input, executes remote shell payloads through child processes, or does additional non-shell execution that still needs deeper language-aware rules |
Observed in:
/Users/affoon/Documents/GitHub/ECC/everything-claude-code/mcp-configs/mcp-servers.json
Current behavior:
- template findings are rewritten with template-aware titles
- severities are lowered relative to active runtime config
- findings now include
runtimeConfidence: "template-example"in JSON, markdown, terminal, and HTML output - non-secret template findings now score at
0.25xrelative to active runtime findings
Example findings:
Template defines risky MCP server: filesystemTemplate MCP server "vercel" connects to external URLTemplate MCP server "github" uses npx -y (auto-install)
Why this still matters:
- the wording is now correct and the score impact is no longer equivalent to active runtime exposure
- in the refreshed
everything-claude-codescan,mcp-configs/mcp-servers.jsonis still the single largest finding source, but the MCP category now lands at68instead of being pinned to0 - this is no longer a scoring bug, but it is still an interpretation problem because template catalogs remain visible and need manual runtime confirmation
Observed in:
/Users/affoon/Documents/GitHub/ECC/everything-claude-code/docs/zh-CN/CLAUDE.md
Previous impact included translated docs under:
/Users/affoon/Documents/GitHub/ECC/everything-claude-code/docs/zh-CN/agents/build-error-resolver.md/Users/affoon/Documents/GitHub/ECC/everything-claude-code/docs/zh-CN/agents/security-reviewer.md
Why this was a false positive:
- the localized docs bundle is not the repo's active runtime config
- once a nested
CLAUDE.mdis found, the scanner treats that subtree like a real Claude root and emits agent findings such as Bash access and escalation chain
Current status:
- fixed for the current example path set: docs/example trees that only carry
CLAUDE.mdguidance no longer turn the whole subtree into a live Claude root, but the standalone exampleCLAUDE.mdfile is still scanned - the refreshed
everything-claude-codescan now emits0findings underdocs/zh-CN/agents/*, while still inventoryingdocs/zh-CN/CLAUDE.mdas an example file
Remaining follow-up:
- broaden source-aware handling beyond the current
docs/,commands/,examples/,samples/,demo/,tutorial/,guide/,cookbook/, andplayground/path set if similar tutorial/example bundles appear under other top-level names - keep docs-only example discovery narrow so only the standalone example file is added back, not the whole translated/example subtree
Observed in:
/Users/affoon/Documents/GitHub/ECC/everything-claude-code/commands/kotlin-test.md
Previous impact:
- the file triggered multiple critical
secrets-hardcoded-password-*findings on example values likeSecureP@ss1andshort - before the fix,
commands/kotlin-test.mdwas one of the noisiest files in the scan with5critical findings by itself
Why it was a false positive:
- these are sample values in documentation/test instructions, not committed operational credentials
- unlike the placeholder DB URL fix, password-like sample literals in markdown code examples are not yet source-aware
Current status:
- fixed for markdown files in example-like paths such as
docs/,commands/,examples/,tutorials/, anddemos/when surrounding context clearly indicates examples or tests - the refreshed
everything-claude-codescan now emits0findings fromcommands/kotlin-test.md
Guardrails on the fix:
- non-doc markdown such as agent prompts still triggers the hardcoded-password rule
- doc files without example/test context still trigger the rule
Observed in:
/Users/affoon/Documents/GitHub/ECC/everything-claude-code/hooks/hooks.json
Previous impact:
hooks/hooks.jsonused to trigger findings likepermissions-no-block, even though it is a plugin hook manifest rather than a normal permissions/settings file- session-start wrapper logic in that manifest triggered
hooks-silent-fail-*on benign path-resolution logic such asfind ... 2>/dev/null | head -n 1 - long wrapper commands in the manifest triggered
hooks-chained-commands-*even though the complexity belonged to dispatch logic rather than the actual hook implementation
Why that was a false positive:
- plugin manifests are active in a different way than
settings.json - rule logic written for classic hook/settings layouts can overstate risk when applied directly to manifest JSON
Current status:
- fixed for direct manifest findings
- the refreshed scan now emits
0findings fromhooks/hooks.json
What remains open:
- non-shell hook implementations are now discovered, but they intentionally do not run through the shell-pattern hook rules
- the next step is language-aware review for
hook-codefiles rather than more manifest resolution work
Observed in:
/Users/affoon/Documents/GitHub/ECC/everything-claude-code/.claude/settings.local.json
Previous impact:
- the file triggered four
permissions-permissive-Bash(curl ...)findings for exact npmjs download-stat commands - those entries were pinned to literal
https://api.npmjs.org/...URLs and did not contain wildcards, shell expansion, or arbitrary host selection
Why it was a false positive:
- the rule text said "agent can make arbitrary HTTP requests," but the matched permission entries were fixed commands, not general curl access
- a pinned
curlorwgetcommand is still network exposure, but it is not the same thing asBash(curl *)or a wildcard URL permission
Current status:
- fixed for exact
curl/wgetallow entries with literal URLs and no shell expansion - the refreshed
everything-claude-codescan now emits only one finding from.claude/settings.local.json:permissions-no-deny-list - repo impact:
everything-claude-codedropped from96findings /C (65)to92findings /C (69)
Guardrails on the fix:
- wildcard URLs like
Bash(curl https://api.example.com/*)still flag as overly permissive - dynamic network permissions using shell expansion or command substitution still flag
- unrestricted entries such as
Bash(curl *),Bash(wget),Bash(ssh *), andBash(scp *)still flag through the network-permissions rules
Observed in:
/Users/affoon/Documents/GitHub/basket-trader/launch-video/.claude/settings.local.json
Previous impact:
- the file triggered
permissions-no-deny-listat high severity - its allowlist contained exact
ffprobe ... | python3 -c ...commands pinned to local files, not wildcards or arbitrary shell access
Why it was severity inflation:
settings.local.jsonis already project-local rather than repo-wide runtime config- the allowlist was narrow and literal, so the old severity overstated the exposure compared with a broad allowlist or wildcard Bash permissions
Current status:
- fixed by downgrading
permissions-no-deny-listto medium forsettings.local.jsonfiles whose allowlists are entirely exact entries basket-tradernow gradesA (98)instead ofA (96)- broader project-local cases like
everything-claude-code/.claude/settings.local.jsonstill stay high when wildcard access remains present
Guardrails on the fix:
- wildcard entries such as
Bash(gh api:*)still keep the finding at high severity - dynamic entries using shell expansion or command substitution still keep the finding at high severity
- normal
settings.jsonfiles still keep the original high-severity behavior
Observed in:
/Users/affoon/Documents/GitHub/ECC/everything-claude-code/agents/security-reviewer.md
Previous impact:
- the file triggered
agents-injection-surface-agents/security-reviewer.md - the match came from defensive checklist/examples, not an actual instruction for the agent to fetch or trust external content
Why it was a false positive:
- the old matcher used very broad patterns like
fetch.*url, which also matched example code such asfetch(userProvidedUrl) - the agent is a security review workflow describing suspicious patterns to flag, not a prompt telling the agent to process hostile web content
Current status:
- fixed by tightening the matcher toward imperative/natural-language external-content instructions
- the refreshed
everything-claude-codescan now emits0agents-injection-surfacefindings - repo impact:
everything-claude-codedropped from92findings /C (69)to91findings /C (69)
Guardrails on the fix:
- direct instructions like
Fetch URLs and parse HTML from web pagesstill trigger - explicit agent workflows that say to read user-provided input or process external content still trigger
- defensive examples, checklists, and code-pattern references no longer trigger by themselves
Observed in:
/Users/affoon/Documents/GitHub/ECC/everything-claude-code/agents/chief-of-staff.md/Users/affoon/Documents/GitHub/ECC/everything-claude-code/agents/security-reviewer.md/Users/affoon/Documents/GitHub/ECC/everything-claude-code/agents/database-reviewer.md/Users/affoon/Documents/GitHub/ECC/everything-claude-code/agents/e2e-runner.md/Users/affoon/Documents/GitHub/PMX-backend/.claude/subagents/e2e-tester.json/Users/affoon/Documents/GitHub/PMX-backend/.claude/slash-commands/test-coverage.json
Previous impact:
- the rule emitted
agents-explorer-write-*for files that were not explorer/search agents at all - the false positives were caused by generic workflow text such as
gog gmail search "is:unread"orSearch for hardcoded secrets everything-claude-codecarried4spurious medium findings from this rule alonePMX-backendcarried2more spurious medium findings on structured JSON configs
Why it was a false positive:
- the old heuristic scanned the entire file body for terms like
search,read-only, orreadonly - many legitimate non-explorer agents mention search operations inside examples, commands, or checklists
- that is not the same thing as the agent's actual role being an explorer-style, read-mostly workflow
Current status:
- fixed by narrowing the heuristic to stronger role signals only: file path, structured
name, structureddescription, and the lead intro text - the scanner no longer treats later workflow examples or command snippets as explorer-role evidence
- repo impact:
everything-claude-code:94findings ->90PMX-backend:29findings ->27basket-trader: unchanged
Concrete before/after examples:
| File | Old trigger text | New interpretation |
|---|---|---|
agents/chief-of-staff.md |
gog gmail search "is:unread" --json |
command example only, not explorer role metadata |
agents/security-reviewer.md |
Search for hardcoded secrets |
review workflow step, not an explorer/read-only agent |
.claude/slash-commands/test-coverage.json |
Search coverage gaps and write tests |
procedural task description, not search-only role metadata |
Guardrails on the fix:
- explicit explorer roles such as
codebase explorer,read-only explorer, or configs whose name/description/path saysexplorerstill trigger - the positive test coverage still includes body-intro cases like
Fast codebase explorer for searching files - Bash access and escalation-chain findings are unaffected; only the explorer-role inference was narrowed
Observed in:
/Users/affoon/Documents/GitHub/ECC/everything-claude-code/agents/chief-of-staff.md/Users/affoon/Documents/GitHub/ECC/everything-claude-code/agents/planner.md
Previous impact:
- the rule emitted
agents-oversized-prompt-*based on raw file size alone - large fenced examples, output templates, and markdown tables pushed otherwise reasonable agent prompts over the
5000character threshold
Why it was a false positive:
- embedded examples help users understand how to invoke the agent, but they are not the same as additional live prompt instructions
chief-of-staff.mdandplanner.mdwere long mainly because of example-heavy markdown, not because the actual prose instructions were unusually large
Current status:
- fixed by switching the heuristic from raw file length to effective prompt size
- fenced code blocks and markdown tables are now discounted before applying the threshold
- repo impact:
everything-claude-code:90findings ->88agents/chief-of-staff.mdno longer emitsagents-oversized-promptagents/planner.mdno longer emitsagents-oversized-prompt
What still flags, correctly:
agents/architect.mdstill flags at5514 effective characters (6291 raw)agents/code-reviewer.mdstill flags at6296 effective characters (8747 raw)agents/kotlin-reviewer.mdstill flags at5223 effective characters (6612 raw)
Guardrails on the fix:
- genuinely large prose-heavy prompts still trigger even if they contain examples
- the rule still applies only to
agent-mdfiles, notCLAUDE.md - evidence now shows both effective and raw size so users can see why a prompt still flags
Observed in:
/Users/affoon/Documents/GitHub/PMX-backend/.claude/slash-commands/build-and-fix.json/Users/affoon/Documents/GitHub/PMX-backend/.claude/slash-commands/refactor-clean.json/Users/affoon/Documents/GitHub/PMX-backend/.claude/slash-commands/test-coverage.json
Previous behavior:
- these files were discovered as
skill-md - they produced
0findings despite explicitallowedToolsentries containingBash,Write,Edit,Glob, andGrep
Current status:
- fixed in this audit pass
- the scanner now treats structured JSON under
.claude/slash-commands/as agent-like tool config for Bash, missing-tools, and escalation-style checks PMX-backendnow emits9slash-command findings instead of0
Example new findings:
Slash command has Bash access: .claude/slash-commands/build-and-fix.jsonSlash command has full escalation chain: .claude/slash-commands/refactor-clean.json
Observed in:
/Users/affoon/Documents/GitHub/ECC/everything-claude-code/scripts/hooks/session-start.js/Users/affoon/Documents/GitHub/ECC/everything-claude-code/scripts/hooks/session-end.js/Users/affoon/Documents/GitHub/ECC/everything-claude-code/scripts/hooks/evaluate-session.js/Users/affoon/Documents/GitHub/ECC/everything-claude-code/skills/continuous-learning-v2/hooks/observe.sh
Current state:
discoverConfigFiles()now resolves repo-local manifest targets into discovered files- in
everything-claude-code, that currently surfaces20hook-codefiles underscripts/hooks/plus2shell hook implementations - shell implementations continue through existing hook rules
- non-shell
hook-codenow emits narrow language-aware findings for three explicit behaviors:output(...)calls that inject additional content back into Claude context- direct transcript input access via
input.transcript_path,process.env.CLAUDE_TRANSCRIPT_PATH, and equivalent language-specific accessors - remote downloads piped into
bash/shthrough child-process wrappers such asspawnSync("bash", ["-lc", "curl ... | bash"])
Current live findings from everything-claude-code:
scripts/hooks/session-start.jsnow emitshooks-code-context-outputscripts/hooks/session-end.jsnow emitshooks-code-transcript-accessscripts/hooks/evaluate-session.jsnow emitshooks-code-transcript-access
Security-relevant behaviors still outside static coverage:
- non-shell wrappers that use
spawnSync,execFileSync, or similar child-process execution are still quiet unless they also match the explicit context/transcript/remote-shell rules observe.shparses stdin JSON, inspectsagent_id, sources shared scripts, and writes observations under~/.claude/homunculus
These are the highest-value follow-ups after the current false-positive fixes.
| Remaining work | Concrete example | Desired behavior |
|---|---|---|
Broader hook-code language-aware coverage |
scripts/hooks/post-edit-format.js, scripts/hooks/post-edit-typecheck.js, scripts/hooks/quality-gate.js in everything-claude-code |
keep generic spawnSync / execFileSync wrappers quiet unless they also do risky external I/O, context injection, or unsafe shell composition |
| Broader example-root classification beyond the current path set | synthetic tutorial/example bundles outside docs/, commands/, examples/, samples/, demo/, tutorial/, guide/, cookbook/, and playground/ |
keep obvious examples downgraded and labeled without misclassifying real runtime config |
| Skill prompt coverage | freeform skill-md prompts that are not structured slash-command JSON |
analyze more of the real agent-like skill prompts without treating ordinary reference markdown as executable config |
Current output now includes runtimeConfidence on MCP findings:
active-runtimeproject-local-optionaltemplate-exampledocs-exampleplugin-manifesthook-code
Current mapping:
mcp.json,.claude/mcp.json,.claude.json, and activesettings.jsonstyle MCP findings map toactive-runtimesettings.local.jsonMCP findings map toproject-local-optional- template catalogs such as
mcp-configs/map totemplate-example - docs and tutorial config such as
docs/guide/settings.jsonmap todocs-example - declarative hook manifests such as
hooks/hooks.jsonmap toplugin-manifest - manifest-resolved non-shell implementations such as
scripts/hooks/session-start.jsmap tohook-code
This is the correct direction and should remain the baseline for future source-aware scoring.
Current behavior:
- non-secret
template-examplefindings now score at0.25x - non-secret
template-examplefindings are also capped at10deduction points per file and score category - non-secret
docs-examplefindings now score at0.25x - non-secret
project-local-optionalfindings now score at0.75x - non-secret
plugin-manifestfindings now score at0.5x hook-codefindings currently stay at full weight, but the active rules there are intentionally narrow and language-aware- committed real secrets still stay at full score weight even if they appear in a template file
everything-claude-codestill carries51template-example findings in the report, but now lands atB (75)withmcp: 90andpermissions: 83basket-tradernow lands atA (99)instead ofA (98)because itssettings.local.jsonfindings are clearly marked project-local and slightly discounted in score weight- the synthetic docs-example fixture now lands at
A (99)instead of looking like a live broken runtime config - the synthetic plugin-manifest fixture lands at
A (98)with visible, but correctly contextualized, manifest findings
Current weighting:
| Source kind | Example | Current/default score weight |
|---|---|---|
active-runtime |
mcp.json, .claude/mcp.json, active settings.json |
1.0x |
project-local-optional |
settings.local.json |
0.75x for structural non-secret findings, 1.0x for committed real secrets |
template-example |
mcp-configs/mcp-servers.json |
0.25x for structural findings, capped at 10 deduction points per file and score category, 1.0x for committed real secrets |
docs-example |
docs/guide/settings.json, commands/kotlin-test.md |
0.25x for structural findings, 1.0x for committed real secrets |
plugin-manifest |
hooks/hooks.json |
0.5x for structural findings, 1.0x for committed real secrets |
hook-code |
scripts/hooks/session-start.js |
1.0x, but current findings are narrow language-aware signals only |
Remaining follow-up:
- decide whether any future high-confidence
hook-coderules should carry their own weighting, or whether rule severity alone is sufficient
Current behavior:
- docs/example findings now emit
runtimeConfidence: docs-example - structural docs/example findings downgrade one severity level and render with
Example config: ...titles - plugin manifest findings now emit
runtimeConfidence: plugin-manifest - plugin manifest findings render with
Plugin hook manifest: ...titles and manifest-aware descriptions - non-shell hook implementations continue to emit
runtimeConfidence: hook-code
Confirmed examples:
- synthetic
docs/guide/settings.jsonnow emits:Example config: Overly permissive allow rule: Bash(*)with severityhighExample config: No deny list configuredwith severitymediumExample config: No PreToolUse security hooks configuredwith severitylow
- synthetic
hooks/hooks.jsonnow emits:Plugin hook manifest: Hook sends data to external servicePlugin hook manifest: No Stop hooks for session-end verification
Interpretation rule:
- docs/example findings are now visible but clearly presented as risky shipped guidance, not confirmed live runtime exposure
- plugin-manifest findings stay visible because the manifest is operational config, but they are now clearly distinguished from the referenced executable hook implementation
Remaining follow-up:
- expand example-path detection if tutorial/example bundles appear outside the current
docs/,commands/,examples/,samples/,demo/,tutorial/,guide/,cookbook/, andplayground/path set - keep real committed secrets at full weight even when they appear in examples or manifests
- continue to prefer rule-specific suppression over blanket source-based hiding
Current behavior:
- declarative manifests emit
runtimeConfidence: plugin-manifest - shell implementations emit findings directly from executable hook logic
- non-shell implementations emit
runtimeConfidence: hook-code - non-shell
hook-coderemains outside shell-pattern scoring unless a language-aware rule matches
This is now the right baseline:
- manifest entries can stay visible without looking like direct shell execution
- executable shell hooks remain the highest-confidence hook findings
- non-shell hook implementations are visible as implementation logic, but only through narrow language-aware rules today: explicit context injection, transcript access, and remote shell payloads executed via child-process wrappers
Current output model:
| Hook source | Example | Confidence |
|---|---|---|
| declarative manifest | hooks/hooks.json |
medium, emitted as plugin-manifest |
| shell implementation | skills/**/hooks/observe.sh, scripts/hooks/run-with-flags-shell.sh |
high |
| non-shell implementation | scripts/hooks/session-start.js, scripts/hooks/session-end.js |
medium-high, emitted as hook-code and reviewed with narrow language-aware rules |
| missing-control inference | hooks-no-pretooluse emitted from a settings-only view with no companion manifest evidence |
low until manifest references are resolved |
These issues were present earlier in the audit cycle and are now fixed:
- MCP template findings now emit a first-class
runtimeConfidencefield and render it in JSON, markdown, terminal, and HTML output - score weighting now discounts non-secret
template-examplefindings while preserving full weight for committed secrets settings.local.jsonfindings now emitruntimeConfidence: project-local-optional, and non-secret project-local findings score at0.75x- structured
.claude/slash-commands/*.jsonfiles withallowedToolsare now analyzed for Bash access and escalation chains - docs-only nested
CLAUDE.mdroots underdocs/are no longer treated as live Claude roots unless runtime config companions exist - generated
.dmux/worktrees/...mirrors are excluded from discovery - hook manifests such as
hooks/hooks.jsonno longer trigger thepermissions-no-blocksettings-only finding - hook manifests such as
hooks/hooks.jsonno longer trigger wrapper-stylehooks-silent-fail-*andhooks-chained-commands-*findings - missing-hook inference now consults companion plugin manifests before reporting
hooks-no-pretooluse - hook manifests now resolve repo-local executable targets into discovered shell
hook-scriptand non-shellhook-codefiles - docs/example findings now emit
runtimeConfidence: docs-example, downgrade structural severity one level, and render with example-aware titles/descriptions - plugin manifest findings now emit
runtimeConfidence: plugin-manifestand render with manifest-aware titles/descriptions - manifest-resolved
hook-codenow emits targeted findings for explicitoutput(...)context injection, transcript input access, and remote shell payloads executed through child-process wrappers instead of staying inventory-only - markdown example/test passwords in
docs/andcommands/no longer trigger hardcoded-password findings when the surrounding context is clearly instructional - placeholder DB connection strings like
postgres://user:pass@host:5432/dbno longer raise critical secret findings - benign
/dev/nullprobe patterns in hooks are now suppressed and same-line duplicates are deduplicated
- Expand language-aware analysis for manifest-resolved
hook-codebeyond explicit context injection, transcript access, and remote shell payloads without reintroducing wrapper noise - Broaden example-path detection if risky tutorial bundles appear outside the current
docs/,commands/,examples/,samples/,demo/,tutorial/,guide/,cookbook/, andplayground/path set - Revisit
plugin-manifestseverity only if future live repos show persistent manifest-only noise that survives the current wording and score weighting
A false-positive fix is ready to land when it meets all of the following:
- It reduces noise in a real repo scan that previously demonstrated the problem.
- It does not hide committed secrets or obvious execution behavior.
- It preserves or improves
runtimeConfidenceclarity instead of collapsing multiple source kinds together. - It comes with targeted regression tests for both the false-positive case and the closest true-positive neighbor.
- It improves the operator reading experience in the report, not just the raw finding count.
Use this taxonomy when classifying a noisy finding. It keeps audit notes consistent and makes it easier to choose the right fix.
| Type | Meaning | Typical fix | Example from this audit |
|---|---|---|---|
| matcher bug | the rule matched the wrong behavior entirely | narrow the matcher and add true-positive guard tests | agents-explorer-write matching generic workflow text |
| source-confidence inflation | the finding is directionally right but presented like active runtime | add or refine runtimeConfidence, wording, and score weight |
MCP template catalogs under mcp-configs/ |
| severity inflation | the finding is real but the severity overstates likely impact | lower severity only for the narrower source or narrower pattern | permissions-no-deny-list on exact settings.local.json allowlists |
| missing context | the rule lacks cross-file or structural evidence | use manifest, companion config, or referenced implementation context | hooks-no-pretooluse before manifest awareness |
| presentation noise | the finding is technically correct but explained poorly for operators | rewrite title/description before touching detection | Plugin hook manifest: ... wording |
| coverage miss | risky behavior is absent from static detection | add a narrow new rule and prove it does not reopen old noise | hook-code remote shell payloads via child-process wrappers |
Use this worksheet when reviewing a new scan. Copy it into an issue or PR description if the scan is noisy.
| Question | Record |
|---|---|
| repo scanned | |
| date | |
| files scanned | |
| total findings / grade | |
| top 5 files by finding count | |
findings by runtimeConfidence |
|
| top source kind causing noise | |
| likely false positives | |
| likely real misses | |
| suspected severity inflation | |
| proposed fix type | matcher / confidence / severity / context / docs only |
| real-repo proof path | |
| synthetic fixture added | |
| score delta after fix |
Recommended interpretation:
- If the top noisy file is template- or docs-heavy, start with confidence modeling.
- If the top noisy file is active runtime config, assume the findings are more likely to be real until proven otherwise.
- If the issue disappears only when a companion file is present, classify it as missing context, not a bad matcher.
Use this template when opening a scanner-accuracy issue. It keeps future audit work reproducible.
## False-Positive Report
- Repo:
- Scanner commit/version:
- File:
- Finding ID(s):
- Severity:
- runtimeConfidence:
### Why this looks wrong
- [fill in]
### Evidence
- real repo reproduction:
- minimal synthetic reproduction:
- nearby true-positive that must keep working:
### Preferred fix shape
- matcher narrowing / confidence / severity / score / docs only
### Notes
- [fill in]Required evidence before treating it as a rule bug:
- one real repo reproduction
- one minimal fixture
- one nearby true-positive guard case
- explanation for why
runtimeConfidenceand current wording are still insufficient
Before including false-positive work in a release, verify all of the following:
- The affected rule family has targeted regression coverage.
- At least one real repo used in this audit shows a measurable improvement.
- No committed-secret detection regressed in docs, examples, manifests, or local settings.
- The fix improves report readability through title, severity, or
runtimeConfidence, not only raw count reduction. - The audit note and README guidance are updated so operators understand the new behavior.
To keep user expectations aligned with current scanner behavior, the main docs should continue to say:
runtimeConfidenceis now emitted for MCP findings,settings.local.json, docs/examples, plugin manifests, and manifest-resolved non-shell hook code- template findings are useful evidence of shipped risk, not proof of active enablement
- docs/example findings are intentionally visible as risky shipped guidance, but they are downgraded and labeled as examples rather than active runtime exposure
- template MCP catalogs and plugin hook manifests still need manual interpretation
- manifest-resolved
hook-codefiles are now discovered and emit targeted findings for explicit context injection, transcript handling, and remote shell payloads, but broader language-aware implementation coverage is still pending - docs-only example trees are now inventoried as standalone example
CLAUDE.mdfiles, while noisy nested subtrees remain suppressed unless runtime companions exist - example bundles outside the current
docs/,commands/,examples/,samples/,demo/,tutorial/,guide/,cookbook/, andplayground/heuristics may still need manual interpretation until broader example-root classification is added - the audit document is the source of truth for the remaining accuracy caveats, which are now mainly about broader
hook-codeanalysis rather than missing source metadata