From 5314ecacb381ff5c4dd57b02c28c0b2d19054025 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=87=95=E8=B5=84=E4=BC=9F?= <>
Date: Mon, 8 Jun 2026 06:43:33 +0800
Subject: [PATCH] Add scanner production safety gates

---
 .../vuln-management/scanner-tuning/SKILL.md   | 45 ++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/skills/vuln-management/scanner-tuning/SKILL.md b/skills/vuln-management/scanner-tuning/SKILL.md
index 21f8ca12..b6446998 100644
--- a/skills/vuln-management/scanner-tuning/SKILL.md
+++ b/skills/vuln-management/scanner-tuning/SKILL.md
@@ -13,7 +13,7 @@ phase: [operate]
 frameworks: [CVSS-4.0, CWE]
 difficulty: intermediate
 time_estimate: "30-60min"
-version: "1.0.0"
+version: "1.1.0"
 author: unitoneai
 license: MIT
 allowed-tools: Read, Grep, Glob
@@ -50,6 +50,7 @@ Before starting, collect or confirm:
 - [ ] **Authentication status:** Are scans currently authenticated (credentialed) or unauthenticated?
 - [ ] **False positive examples:** Specific findings suspected or confirmed as false positives, with evidence
 - [ ] **Scan frequency:** Current scan schedule and any performance constraints
+- [ ] **Production safety constraints:** Change window, target owner, fragile systems, account lockout thresholds, API quotas, and abort contacts
 - [ ] **Result volume:** Approximate number of findings per scan cycle and false positive rate if known
 - [ ] **Compliance requirements:** Whether scans must meet specific compliance mandates (PCI ASV, DISA STIG, CIS Benchmark)
 - [ ] **Multi-scanner context:** If using multiple scanners, which ones and how results are currently correlated
@@ -138,6 +139,34 @@ Configure or optimize scan policies to balance detection coverage, accuracy, and
 | **Time-based exclusions** | Systems that cannot be scanned during business hours | Scan scheduling adjustment (see Step 6) |
 | **Credential exclusions** | Systems where credentialed scanning is not permitted by policy | Documented reason; accept reduced detection accuracy |
 
+##### 2d. Production Scan Safety Gate
+
+Before recommending any production scan policy, verify that the scan can run without creating avoidable operational impact. Disabling explicitly dangerous plugins is necessary but not sufficient: credentialed checks, web crawling, API enumeration, and high-concurrency probes can still cause account lockouts, service degradation, quota exhaustion, or state changes.
+
+| Safety Control | Required Evidence | Failure Mode Prevented |
+|---|---|---|
+| **Target owner approval** | Change ticket, maintenance window, escalation contact | Uncoordinated scan during business-critical activity |
+| **Canary scan** | Successful pre-flight against representative hosts/apps | Fleet-wide failure from untested credentials or plugin set |
+| **Lockout-safe authentication** | AD/IdP lockout thresholds, retry limits, credential success-rate threshold | Service account lockout, password-spray alerting, scanner source block |
+| **State-changing web/API controls** | Test tenant, safe account, route/method allowlist, destructive action blocklist | DAST crawl disabling users, sending transactions, rotating keys, mutating records |
+| **Target health monitoring** | CPU/load, application 5xx rate, queue depth, DB connections, EDR crash/restart signals | Scanner-induced outage or degraded service |
+| **Abort thresholds** | Stop conditions for auth failures, error rates, latency, health checks, IDS/IPS blocks | Scan continues after it is clearly harming targets |
+| **Allowlist governance** | Time-bound scanner exception, owner, expiry, compensating SIEM monitoring | Scanner allowlist hides real impact or becomes a permanent bypass |
+| **Fragile system handling** | Passive assessment, vendor-approved profile, or explicit risk acceptance | Legacy SCADA, medical, lab, appliance, or OT systems crash under active probes |
+| **Cloud/API quota budget** | Rate limit, retry budget, provider quota headroom, backoff settings | API throttling, cost spikes, or control-plane denial of service |
+
+**Production scan safety findings:**
+
+```
+SCAN-SAFE-01: No lockout-safe retry limits for authenticated scan credentials
+SCAN-SAFE-02: Production DAST crawl can submit state-changing actions
+SCAN-SAFE-03: Scan policy lacks target-health monitoring and abort thresholds
+SCAN-SAFE-04: Scanner source allowlist has no expiry, owner, or compensating monitoring
+SCAN-SAFE-05: Fragile or regulated systems lack passive or vendor-approved scan profile
+SCAN-SAFE-06: Cloud/API scans have no quota-aware rate limit or retry budget
+SCAN-SAFE-07: No canary scan before broad production rollout
+```
+
 ### Step 3: Authenticated vs. Unauthenticated Scanning
 
 Evaluate and configure credential-based (authenticated) scanning for improved accuracy.
@@ -175,6 +204,7 @@ Authentication Configuration:
 - Cloud/API Auth:      [API key with read-only role | N/A]
 - Credential Rotation: [Every N days]
 - Last Verification:   [YYYY-MM-DD, success rate: [N]%]
+- Lockout Guardrails:  [Max retries, failure threshold, abort action]
 ```
 
 ### Step 4: Severity Override Criteria
@@ -322,6 +352,17 @@ Highlight the most impactful tuning recommendations.]
 | Scan Frequency | [Current schedule] | [Recommended schedule] | [Priority] |
 | Port Range | [Current range] | [Recommended range] | [Priority] |
 
+### Production Scan Safety
+
+| Control | Current State | Risk | Required Evidence | Owner |
+|---|---|---|---|---|
+| Auth retry / lockout limits | [Configured / Missing] | [Risk] | [AD/IdP policy, canary result] | [Owner] |
+| State-changing web/API actions | [Controlled / Uncontrolled] | [Risk] | [route allowlist, test tenant, rollback plan] | [Owner] |
+| Health and abort thresholds | [Configured / Missing] | [Risk] | [CPU/5xx/error/lockout thresholds, escalation path] | [Owner] |
+| Scanner allowlists | [Time-bound / Permanent / None] | [Risk] | [exception ticket, SIEM monitoring, expiry] | [Owner] |
+| Fragile system handling | [Passive / Vendor profile / Active scan / Excluded] | [Risk] | [vendor guidance or risk acceptance] | [Owner] |
+| Cloud/API quota controls | [Configured / Missing / N/A] | [Risk] | [rate limits, retry budget, quota headroom] | [Owner] |
+
 ### False Positive Analysis
 
 | Plugin/Check ID | CVE ID | FP Pattern | Affected Assets | Evidence | Recommendation |
@@ -399,6 +440,8 @@ Common Weakness Enumeration. A community-developed list of software and hardware
 
 5. **Not correlating results across scanners.** Organizations running multiple scanners often treat each scanner's output independently, leading to duplicate remediation efforts for the same vulnerability and missed findings that only one scanner detects. Establish a correlation process using CVE ID as the primary key and CWE as a fallback for non-CVE findings.
 
+6. **Assuming credentialed scans with DoS plugins disabled are production-safe.** Account lockouts, scanner allowlist side effects, state-changing web crawls, API quota exhaustion, and load on fragile endpoints can still cause outages. Require canary scans, target-health monitoring, and abort thresholds before broad production runs.
+
 ---
 
 ## Prompt Injection Safety Notice