[REVIEW] scanner-tuning: add production safety and abort-condition gates #1645

@yanziwei

Skill Being Reviewed

Skill name: scanner-tuning
Skill path: skills/vuln-management/scanner-tuning/

False Positive Analysis

Benign-looking scan policy that can be incorrectly scored as safe/tuned:

Scanner policy:
  - authenticated Windows and Linux checks enabled
  - dangerous/DoS plugins disabled
  - max simultaneous hosts: 20
  - max checks per host: 8
  - web application checks enabled for all HTTP services
  - scans scheduled weekly during a generic off-peak window

Production target context:
  - legacy login portal locks accounts after 5 bad attempts
  - password-spray detection blocks the scanner NAT after 100 failed logins
  - unauthenticated web checks submit forms with state-changing actions
  - fragile batch server spikes CPU above 95% during local package enumeration
  - no abort threshold is defined for error rate, lockout events, CPU load, or application 5xx rate

Why this is a false positive:

The current skill can classify this as Tuned because it uses credentialed scanning, disables explicitly dangerous plugins, documents scheduling, and has reasonable concurrency settings. That misses an operational safety gap: production scans can still create business impact through account lockouts, IDS/IPS blocking, API rate-limit exhaustion, state-changing web requests, database connection pressure, fragile endpoint CPU spikes, or batch-window contention. A scan policy is not safely tuned unless it has target-specific safety controls, monitored abort conditions, and an owner-approved recovery path.

Coverage Gaps

Missed variant 1: authenticated scan causes account lockouts or identity-control blocking

Credentialed scan uses DOMAIN\svc-scan against 2,000 Windows hosts.
Several hosts reject the credential because they moved to a different OU.
The scanner retries across SMB/WinRM/RDP.
AD lockout policy triggers and the service account is locked.
SIEM password-spray logic also blocks the scanner source.

Why it should be caught:

The skill recommends credential verification before a full scan, but it does not require lockout-safe retry limits, denylisted authentication protocols, pre-flight sample size, AD/IdP lockout threshold review, SIEM allowlisting with compensating monitoring, or an abort rule when authentication failures exceed a threshold. This is a real scanner-safety failure rather than a normal false positive or coverage issue.
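
A lockout-safe retry limit could be sketched as a failure budget that stops the scan before the credential reaches the directory's lockout threshold. This is a minimal illustration, not a scanner feature: `AuthBudget`, `safety_margin`, and the sample threshold of 5 are assumptions, and the sketch ignores the lockout observation window that a real AD/IdP policy also applies.

```python
from dataclasses import dataclass


@dataclass
class AuthBudget:
    """Counts failed logins for one scan credential (hypothetical helper)."""

    lockout_threshold: int   # bad attempts before the directory locks the account
    safety_margin: int = 2   # assumed buffer: stop this many failures early
    failures: int = 0

    @property
    def max_failures(self) -> int:
        # Never allow the scan to reach the real lockout threshold.
        return max(self.lockout_threshold - self.safety_margin, 0)

    def record_failure(self) -> None:
        self.failures += 1

    def should_abort(self) -> bool:
        return self.failures >= self.max_failures


# Usage: with a lockout threshold of 5, the scan stops after 3 failures.
budget = AuthBudget(lockout_threshold=5)
for _ in range(3):
    budget.record_failure()
print(budget.should_abort())
```

A shared counter like this only helps if all authentication protocols (SMB, WinRM, RDP) report into the same budget, since the directory counts failures per account rather than per protocol.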

Missed variant 2: web scanning mutates production state even without exploit plugins

DAST checks are enabled against an authenticated admin portal.
The scanner crawls "disable user", "send invoice", and "rotate API key" forms.
No test account boundaries, CSRF/action allowlist, safe-method restriction, or transaction rollback exists.

Why it should be caught:

The skill says web application checks should be scoped, but it does not distinguish read-only discovery from state-changing crawls or require production-safe DAST controls. For production web apps, tuning should require safe credentials, test tenants, blocked destructive routes, method/action allowlists, rate limits, and post-scan reconciliation for any created or changed records.
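
A method/action allowlist for a production crawl might look like the following sketch. The blocked path fragments are hypothetical names derived from the example forms above, and the function is not part of any DAST tool; in practice the same rule would be expressed in the scanner's own scope or exclusion settings.

```python
from urllib.parse import urlsplit

# HTTP methods defined as safe (read-only) by the HTTP specification.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

# Hypothetical route fragments for the destructive actions named above.
BLOCKED_PATH_FRAGMENTS = ("disable-user", "send-invoice", "rotate-api-key")


def is_request_allowed(method: str, url: str) -> bool:
    """Return True only for safe methods on routes that are not denylisted."""
    path = urlsplit(url).path.lower()
    if any(fragment in path for fragment in BLOCKED_PATH_FRAGMENTS):
        return False
    return method.upper() in SAFE_METHODS


print(is_request_allowed("GET", "https://portal.example/users"))
print(is_request_allowed("POST", "https://portal.example/users"))
print(is_request_allowed("GET", "https://portal.example/admin/rotate-api-key"))
```

Restricting methods is not sufficient on its own, because some applications change state on GET; that is why the route denylist is checked first and why post-scan reconciliation is still needed.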

Missed variant 3: scan impact is not monitored with stop conditions

Weekly scan runs during off-peak hours.
During the scan, target 5xx rate rises from 0.1% to 8%, endpoint CPU stays above 95%,
and EDR reports service restarts on multiple hosts.
The scanner keeps running because no abort condition is configured.

Why it should be caught:

Scheduling alone does not make a scan safe. A tuned program should define safety telemetry and stop conditions: scanner error rate, target CPU/memory/load, application 5xx rate, queue depth, database connection exhaustion, account lockouts, IDS/IPS blocks, EDR crash/restart signals, and owner escalation contacts.
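
The stop conditions above can be sketched as a small evaluator that compares one telemetry sample against configured limits. The threshold values, field names, and the `abort_reasons` function are illustrative assumptions; real limits would come from the target owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbortThresholds:
    """Assumed example limits; agree real values with the target owner."""

    max_5xx_rate: float = 0.02       # fraction of responses, 0.02 = 2%
    max_cpu_percent: float = 90.0
    max_lockouts: int = 0
    max_service_restarts: int = 0


def abort_reasons(sample: dict, limits: AbortThresholds) -> list:
    """Return every breached stop condition for one telemetry sample."""
    reasons = []
    if sample.get("http_5xx_rate", 0.0) > limits.max_5xx_rate:
        reasons.append("5xx rate above threshold")
    if sample.get("cpu_percent", 0.0) > limits.max_cpu_percent:
        reasons.append("target CPU above threshold")
    if sample.get("account_lockouts", 0) > limits.max_lockouts:
        reasons.append("account lockouts observed")
    if sample.get("service_restarts", 0) > limits.max_service_restarts:
        reasons.append("EDR reported service restarts")
    return reasons


# The incident described above: 8% 5xx, CPU above 95%, restarts on hosts.
incident = {"http_5xx_rate": 0.08, "cpu_percent": 96.0, "service_restarts": 3}
print(abort_reasons(incident, AbortThresholds()))
```

Any non-empty result would pause or stop the scan and page the escalation contact; an empty list means the sample is within limits.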

Edge Cases

  • A scan source may be allowlisted in a WAF or IDS, but that allowlist can hide real scanner-caused impact unless the exception is time-bound and monitored.
  • Some systems tolerate unauthenticated network probes but fail under authenticated local checks because package enumeration or registry access is expensive.
  • Industrial, medical, lab, and network appliance targets may need passive assessment or vendor-approved scan profiles even when they are not formally excluded.
  • Cloud and SaaS API scans can exhaust provider API quotas or trigger fraud/abuse throttles; these need quota-aware rate limits and retry budgets.
  • Scanner credentials with MFA bypass or conditional-access exceptions need compensating logging and periodic review; otherwise "safe scanning" becomes a privileged access backdoor.
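
For the cloud/SaaS quota case, a quota-aware limit with a retry budget could be sketched as below. `QuotaBudget`, the share percentage, and the retry cap are hypothetical; the sketch only counts calls and does not model quota reset windows or provider-specific throttling responses.

```python
class QuotaBudget:
    """Caps scanner API calls at a share of the provider quota (illustrative)."""

    def __init__(self, provider_quota: int, scanner_share_percent: int,
                 max_retries: int) -> None:
        # The scanner may consume only part of the quota so that
        # production workloads keep the remainder.
        self.call_limit = provider_quota * scanner_share_percent // 100
        self.max_retries = max_retries
        self.calls = 0
        self.retries = 0

    def allow_call(self, is_retry: bool = False) -> bool:
        """Return True and count the call if both budgets still permit it."""
        if self.calls >= self.call_limit:
            return False
        if is_retry and self.retries >= self.max_retries:
            return False
        self.calls += 1
        if is_retry:
            self.retries += 1
        return True


# Usage: a quota of 10 calls with a 50% scanner share allows 5 calls total.
budget = QuotaBudget(provider_quota=10, scanner_share_percent=50, max_retries=1)
print([budget.allow_call() for _ in range(6)])
```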

Remediation Quality

  • Fix resolves the vulnerability
  • Fix doesn't introduce new security issues
  • Fix doesn't break functionality
  • Issues found: Add production scan safety gates covering lockout-safe authentication, state-changing web checks, target health monitoring, abort conditions, owner approval, rollback/recovery steps, exception expiry, and post-scan impact review.

Recommended additions:

  1. Add a Production Scan Safety Gate before Step 2 policy recommendations:

    • target owner approval and maintenance/change ticket;
    • fragile-system classification;
    • identity lockout and retry threshold review;
    • state-changing web/API action restrictions;
    • safe scan source allowlisting with compensating monitoring;
    • pre-flight canary scan on representative targets;
    • defined abort thresholds and escalation contact.
  2. Add findings/checks such as:

SCAN-SAFE-01: No lockout-safe retry limits for authenticated scan credentials.
SCAN-SAFE-02: Production DAST crawl can submit state-changing actions.
SCAN-SAFE-03: Scan policy lacks target-health monitoring and abort thresholds.
SCAN-SAFE-04: Scanner source allowlist has no time bound, owner, or compensating monitoring.
SCAN-SAFE-05: Fragile/regulated systems lack passive or vendor-approved scan profile.
SCAN-SAFE-06: Cloud/API scans have no quota-aware rate limit or retry budget.
  3. Update the output report with:
### Production Scan Safety
| Control | Current State | Risk | Required Evidence | Owner |
|---|---|---|---|---|
| Auth retry / lockout limits | | | AD/IdP policy, canary result | |
| State-changing web/API actions | | | route allowlist, test tenant, rollback plan | |
| Health and abort thresholds | | | CPU/5xx/error/lockout thresholds, escalation path | |
| Scanner allowlists | | | time-bound exception, SIEM monitoring, expiry | |
| Fragile system handling | | | vendor guidance or passive scan alternative | |
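
As a rough illustration of how the proposed findings could be derived, the sketch below maps each SCAN-SAFE ID to one evidence flag in a policy record and lists the IDs whose evidence is missing. The flag names and the `missing_controls` function are assumptions made for this example, not part of the skill.

```python
# Finding IDs are from the list above; the evidence keys are hypothetical.
CHECKS = {
    "SCAN-SAFE-01": "auth_retry_limit",
    "SCAN-SAFE-02": "state_change_blocked",
    "SCAN-SAFE-03": "abort_thresholds",
    "SCAN-SAFE-04": "allowlist_expiry",
    "SCAN-SAFE-05": "fragile_profile",
    "SCAN-SAFE-06": "api_retry_budget",
}


def missing_controls(policy: dict) -> list:
    """Return the finding IDs whose evidence flag is absent or falsy."""
    return [finding for finding, key in CHECKS.items() if not policy.get(key)]


# A policy that documents only concurrency and scheduling raises all six.
example_policy = {"max_hosts": 20, "max_checks_per_host": 8}
print(missing_controls(example_policy))
```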

Comparison to Other Tools

| Tool | Catches this? | Notes |
|---|---|---|
| Qualys / Tenable / Rapid7 policy settings | Partial | They provide concurrency, safe checks, credential tests, and sometimes performance settings, but the analyst still needs explicit production safety criteria and abort thresholds. |
| DAST tools such as ZAP/Burp | Partial | They can restrict scope and methods, but state-changing route governance and business rollback evidence are outside default scan tuning. |
| Semgrep / CodeQL | No | Static analyzers do not evaluate scanner operational safety or production impact controls. |
| Change-management systems | Partial | They can track approvals and windows, but do not enforce scanner-specific lockout, quota, route, and health-stop criteria. |

Overall Assessment

Strengths:

The skill has strong guidance for false-positive patterns, credentialed scanning, severity overrides, cross-scanner correlation, and scan scheduling. It correctly warns against intrusive checks in production and against suppressing findings without evidence.

Needs improvement:

The operational safety model is too thin. It treats "dangerous plugins disabled" and "scheduled off-peak" as enough for production safety, but many real scan incidents come from authentication retries, state-changing web crawls, target resource exhaustion, API quotas, and scanner allowlists that remove normal guardrails.

Priority recommendations:

  1. Add a production scan safety gate with lockout, state-change, target-health, and abort-threshold evidence.
  2. Add report fields for canary scan results, scanner allowlist expiry, owner approval, and rollback/recovery contacts.
  3. Add a common pitfall warning that credentialed/non-DoS scans can still cause outages and account lockouts.

Bounty Info

  • I have read and agree to the CONTRIBUTING.md bounty terms
  • Preferred payment method: Crypto; payment details can be provided privately after maintainer acceptance.
