UnitOneAI · shensz2017 · Jun 8, 2026
diff --git a/skills/ai-security/ai-data-privacy/SKILL.md b/skills/ai-security/ai-data-privacy/SKILL.md
@@ -13,7 +13,7 @@ phase: [design, build, review, operate]
 frameworks: [NIST-AI-RMF-1.0, OWASP-LLM02-2025]
 difficulty: intermediate
 time_estimate: "30-60min"
-version: "1.0.0"
+version: "1.0.1"
 author: unitoneai
 license: MIT
 allowed-tools: Read, Grep, Glob
@@ -56,6 +56,7 @@ Invoke this skill when any of the following conditions are true:
 - Data retention or deletion policies need to be assessed for AI-specific components (vector stores, conversation logs, training datasets, embeddings).
 - The system uses a third-party LLM API where user data is transmitted to the provider.
 - Consent management for AI training data usage is under review.
+- Data subject erasure requests or consent withdrawals must be propagated through AI-derived stores and model lineage.
 
 Do NOT invoke this skill for:
 
@@ -79,6 +80,7 @@ Before beginning the assessment, gather the following. If any item is unavailabl
 | Logging configuration | Application code, infrastructure configs | Reveals what prompt/completion data is captured |
 | Training/fine-tuning data documentation | Data pipeline docs, dataset cards | Identifies personal data in training corpus |
 | Consent management implementation | Frontend code, API code, database schemas | Shows how user consent is captured and enforced |
+| Erasure propagation evidence | DSAR tooling, deletion jobs, vector DB metadata, training manifests, provider receipts | Shows whether deletion and consent withdrawal reach derived AI data stores |
 | Data classification scheme | Governance documentation | Defines sensitivity levels applied to AI data flows |
 | Regulatory requirements | Compliance documentation, legal counsel input | Identifies applicable data protection obligations |
 
@@ -379,12 +381,62 @@ Grep: "consent_check|is_consented|has_consent|filter_consented|exclude_opted_out
 
 ---
 
+### Step 7 -- Erasure and Consent Withdrawal Propagation
+
+Assess whether data subject erasure requests, consent withdrawals, and training opt-outs reach every AI-derived store, not only the primary source database.
+
+**What to look for in code and configuration:**
+
+- DSAR/deletion jobs that delete only the source document but leave embeddings, vector metadata, RAG chunks, prompt logs, analytics events, cached retrieval results, evaluation datasets, fine-tuning snapshots, or feature stores intact.
+- Consent withdrawal recorded in a user profile but not enforced in dataset export, fine-tuning, evaluation, or provider upload jobs.
+- Training dataset snapshots without subject-level membership indexes, tombstones, or exclusion manifests.
+- Vector stores that cannot identify all chunks/embeddings derived from a deleted source record or subject identifier.
+- Model checkpoints or fine-tuned model bundles with no documented decision on retraining, unlearning, suppression, or risk acceptance after erasure requests.
+- Third-party LLM or vector provider calls without deletion receipts, retention confirmation, or subprocessor propagation evidence.
+- Backups and disaster recovery stores that retain AI-derived data beyond the documented erasure/retention exception period.
+
+**Detection methods using allowed tools:**
+
+```
+# Find erasure and DSAR orchestration
+Grep: "dsar|subject_request|erasure|right_to_delete|delete_request|forget|consent_withdraw" in **/*.{py,ts,js,yaml,yml,json,md}
+
+# Find derived AI stores that need reconciliation
+Grep: "embedding|vector|chunk|rag|retrieval|faiss|pinecone|weaviate|qdrant|milvus|chroma" in **/*.{py,ts,js,yaml,yml,json}
+Grep: "fine_tune|finetune|training_snapshot|dataset_export|eval_dataset|feature_store|checkpoint" in **/*.{py,yaml,yml,json}
+
+# Check for reconciliation evidence
+Grep: "tombstone|deletion_receipt|delete_receipt|reconcile|lineage|source_doc_id|subject_id|manifest|exclusion" in **/*.{py,ts,js,yaml,yml,json,md}
+```
+
+**Evidence to require:**
+
+- Request ID, subject identifier, legal basis, request date, decision authority, and completion deadline.
+- Source record IDs and every derived artifact ID: vector IDs, chunk IDs, prompt/conversation log IDs, analytics event IDs, training snapshot IDs, evaluation dataset IDs, provider object IDs, and model bundle IDs.
+- Deletion receipts or tombstones from each store, including third-party providers and subprocessors where applicable.
+- Reconciliation checks proving derived embeddings/chunks are no longer retrievable through source ID, subject ID, or tenant ID metadata filters, or through nearest-neighbor queries.
+- Training exclusion manifests for future dataset exports, plus a documented retraining/unlearning/suppression decision for models already trained on the data.
+- Backup retention exception with expiry date, access controls, and restore-time deletion replay procedure.
+
+**What constitutes a finding:**
+
+| Condition | Severity |
+|---|---|
+| Erasure request deletes source data but leaves derived embeddings, RAG chunks, or prompt logs retrievable | High |
+| Consent withdrawal is not enforced in training/evaluation export jobs | High |
+| No subject-to-artifact lineage exists for vector stores or training snapshots containing personal data | High |
+| Third-party provider deletion has no receipt or retention confirmation | Medium |
+| Backup restore procedure does not replay AI erasure tombstones | Medium |
+| Retraining/unlearning decision for affected model artifacts is undocumented | Medium |
+
+---
+
 ## Findings Classification
 
 | Severity | Criteria | Response SLA |
 |---|---|---|
 | **Critical** | Personal data processed without legal basis, PHI exposed without HIPAA controls, or regulatory non-compliance with immediate enforcement risk. | Immediate -- halt processing |
-| **High** | Significant privacy risk with clear exposure path: PII in prompts without redaction, missing retention policies on PII-containing stores, or no consent mechanism for training data. | 7 days -- remediate before next release |
+| **High** | Significant privacy risk with clear exposure path: PII in prompts without redaction, missing retention policies on PII-containing stores, no consent mechanism for training data, or erasure requests that do not propagate to AI-derived stores. | 7 days -- remediate before next release |
 | **Medium** | Moderate privacy gap requiring specific conditions: incomplete documentation, missing memorization testing, or partial consent implementation. | 30 days -- schedule remediation |
 | **Low** | Minor gap with limited direct privacy risk: defense-in-depth recommendations, documentation improvements, or best practice deviations. | 90 days -- track in backlog |
 | **Informational** | Recommendations for improvement with no current privacy risk. | No SLA -- advisory |
@@ -408,6 +460,12 @@ Grep: "consent_check|is_consented|has_consent|filter_consented|exclude_opted_out
 [Description or reference to diagram showing personal data flows through AI components:
 user input -> prompt assembly -> LLM API -> completion -> output -> logging/storage]
 
+## Erasure Propagation Evidence
+
+| Request ID | Source Records | Derived Stores Checked | Deletion Receipts | Reconciliation Result | Model/Training Decision | Status |
+|---|---|---|---|---|---|---|
+| [request] | [ids] | [vector/logs/datasets/providers/backups] | [yes/no] | [pass/fail] | [retrain/unlearn/suppress/risk accept] | [open/closed] |
+
 ## Findings
 
 ### Finding [N]: [Title]
@@ -430,6 +488,7 @@ user input -> prompt assembly -> LLM API -> completion -> output -> logging/stor
 | Training data privacy | [Yes/Partial/No] | [description] | [severity] |
 | PII in prompts/completions | [Yes/Partial/No] | [description] | [severity] |
 | Data retention | [Yes/Partial/No] | [description] | [severity] |
+| Erasure propagation | [Yes/Partial/No] | [description] | [severity] |
 | Memorization risk | [Yes/Partial/No] | [description] | [severity] |
 | EU AI Act compliance | [Yes/Partial/No/N/A] | [description] | [severity] |
 | Consent management | [Yes/Partial/No] | [description] | [severity] |
@@ -472,6 +531,8 @@ user input -> prompt assembly -> LLM API -> completion -> output -> logging/stor
 
 5. **Ignoring model memorization as a privacy risk.** Organizations that use pre-trained or fine-tuned models often do not test for memorization of personal data. A model that has memorized PII from its training corpus is effectively a data store containing personal data -- it can reproduce that data on specific prompts. This has regulatory implications: if the model contains memorized PII of EU residents, GDPR obligations apply to the model weights themselves, not just the training dataset.
 
+6. **Treating source deletion as AI erasure.** Deleting a row from the application database does not remove embeddings, vector chunks, prompt logs, training snapshots, evaluation datasets, provider-retained objects, backups, or model artifacts that were derived from that row. Maintain subject-to-artifact lineage and prove deletion with receipts plus reconciliation checks.
+
 ---
 
 ## References
@@ -487,3 +548,7 @@ user input -> prompt assembly -> LLM API -> completion -> output -> logging/stor
 - Microsoft Presidio (PII detection and anonymization) -- https://github.com/microsoft/presidio
 - NIST SP 800-188, De-Identifying Government Datasets -- https://csrc.nist.gov/publications/detail/sp/800-188/final
 - Article 29 Working Party, Guidelines on Data Protection Impact Assessment (WP 248) -- https://ec.europa.eu/newsroom/article29/items/611236
+
+## Changelog
+
+- **1.0.1** -- Add erasure and consent-withdrawal propagation gates for AI-derived stores, vector indexes, training snapshots, providers, backups, and model lineage.
diff --git a/tests/benign/ai-data-privacy-erasure-propagated-with-receipts.yaml b/tests/benign/ai-data-privacy-erasure-propagated-with-receipts.yaml
@@ -0,0 +1,53 @@
+scenario: erasure_request_propagated_to_ai_derived_stores
+skill: ai-data-privacy
+expected_result: do_not_flag_erasure_propagation_gap
+data_subject_request:
+  request_id: DSR-2026-1842
+  request_type: erasure
+  subject_id: user-4821
+  legal_basis: gdpr_article_17
+  decision_authority: privacy_counsel
+  deadline: "2026-07-08"
+source_system:
+  crm_record_deleted: true
+  support_ticket_deleted: true
+derived_ai_stores:
+  vector_store:
+    provider: internal_qdrant
+    source_doc_ids:
+      - ticket-931
+    vector_ids:
+      - vec-1001
+      - vec-1002
+    delete_receipt: qdrant-delete-8841
+    metadata_tombstone: tombstone-DSR-2026-1842
+    nearest_neighbor_reconciliation: passed
+  prompt_logs:
+    delete_receipt: logs-delete-8841
+    pii_redacted_in_analytics: true
+  training_snapshot:
+    id: sft-export-2026-05
+    exclusion_manifest_updated: true
+    subject_membership_index_checked: true
+  evaluation_dataset:
+    membership_checked: true
+    subject_records_present: false
+  third_party_llm_provider:
+    deletion_receipt: provider-delete-8841
+    retention_confirmation: zero_retention_endpoint
+  backups:
+    retention_exception_expires: "2026-08-08"
+    restore_replay_tombstones: documented
+model_lineage:
+  fine_tuned_model: support-sft-v14
+  decision: suppress_until_next_retrain
+  decision_authority: model_governance_board
+  next_retrain_excludes_subject: true
+audit:
+  closed_at: "2026-06-20"
+  evidence_package: privacy-evidence-DSR-2026-1842
+why_this_should_pass: >
+  The erasure request is closed only after source systems, vector IDs, prompt
+  logs, training snapshots, evaluation datasets, provider objects, backups, and
+  model lineage have receipts or documented decisions plus reconciliation
+  evidence.
diff --git a/tests/vulnerable/ai-data-privacy-erasure-missing-derived-store-reconciliation.yaml b/tests/vulnerable/ai-data-privacy-erasure-missing-derived-store-reconciliation.yaml
@@ -0,0 +1,42 @@
+scenario: erasure_request_missing_derived_ai_store_reconciliation
+skill: ai-data-privacy
+expected_result: flag_erasure_propagation_gap
+data_subject_request:
+  request_id: DSR-2026-1842
+  request_type: erasure
+  subject_id: user-4821
+  legal_basis: gdpr_article_17
+  deadline: "2026-07-08"
+source_system:
+  crm_record_deleted: true
+  support_ticket_deleted: true
+derived_ai_stores:
+  vector_store:
+    provider: pinecone
+    source_doc_ids:
+      - ticket-931
+    vector_ids: unknown
+    delete_by_source_doc_id: not_supported
+    metadata_tombstone: missing
+    nearest_neighbor_reconciliation: missing
+  prompt_logs:
+    contains_subject_pii: true
+    delete_receipt: missing
+  training_snapshot:
+    id: sft-export-2026-05
+    contains_subject_records: true
+    exclusion_manifest_updated: false
+  evaluation_dataset:
+    contains_subject_records: unknown
+  third_party_llm_provider:
+    deletion_receipt: missing
+  backups:
+    restore_replay_tombstones: missing
+model_lineage:
+  fine_tuned_model: support-sft-v14
+  retraining_or_unlearning_decision: missing
+why_this_should_fail: >
+  The primary CRM and support records were deleted, but the erasure request has
+  no reconciled vector IDs, no prompt-log/provider receipts, no training
+  exclusion manifest, no backup restore replay, and no documented decision for
+  the fine-tuned model trained on the affected snapshot.
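
The two fixtures above encode erasure evidence as plain YAML values (`missing`, `unknown`, `not_supported`, and boolean flags). As a rough illustration of how such a record could be triaged, here is a minimal sketch that assumes a fixture has already been loaded into a Python dict. The function `find_erasure_gaps`, the `GAP_STRINGS` set, and the key-naming heuristic (`contains_*` / `*_present` marking presence flags) are hypothetical and not part of the skill or this PR; the sample dicts are trimmed copies of the fixtures.

```python
# Hypothetical triage helper; names and heuristics are assumptions for illustration.
GAP_STRINGS = {"missing", "unknown", "not_supported"}


def _walk(node: dict, path: str, gaps: list) -> None:
    """Recursively collect dotted paths whose values indicate missing evidence."""
    for key, value in node.items():
        here = f"{path}.{key}"
        if isinstance(value, dict):
            _walk(value, here, gaps)
        elif isinstance(value, str) and value in GAP_STRINGS:
            gaps.append(here)
        elif isinstance(value, bool):
            # Presence flags (contains_*, *_present) are a gap when True;
            # every other boolean is treated as an evidence flag, a gap when False.
            is_presence = key.startswith("contains_") or key.endswith("_present")
            if value == is_presence:
                gaps.append(here)


def find_erasure_gaps(record: dict) -> list:
    """Return dotted paths of erasure-propagation gaps in a fixture-shaped dict."""
    gaps = []
    for section in ("derived_ai_stores", "model_lineage"):
        _walk(record.get(section, {}), section, gaps)
    return gaps


# Trimmed copy of the vulnerable fixture.
vulnerable = {
    "derived_ai_stores": {
        "vector_store": {"vector_ids": "unknown", "metadata_tombstone": "missing"},
        "prompt_logs": {"contains_subject_pii": True, "delete_receipt": "missing"},
        "training_snapshot": {
            "contains_subject_records": True,
            "exclusion_manifest_updated": False,
        },
    },
    "model_lineage": {"retraining_or_unlearning_decision": "missing"},
}

# Trimmed copy of the benign fixture.
benign = {
    "derived_ai_stores": {
        "vector_store": {
            "vector_ids": ["vec-1001", "vec-1002"],
            "delete_receipt": "qdrant-delete-8841",
        },
        "evaluation_dataset": {
            "membership_checked": True,
            "subject_records_present": False,
        },
    },
    "model_lineage": {
        "decision": "suppress_until_next_retrain",
        "next_retrain_excludes_subject": True,
    },
}

if __name__ == "__main__":
    for name, record in (("vulnerable", vulnerable), ("benign", benign)):
        print(name, find_erasure_gaps(record))
```

A check like this only detects values that are explicitly marked as gaps; it would not notice a store that is absent from the record altogether, so a real review would still need the subject-to-artifact lineage described in Step 7.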