Detection Engineering Intelligence is an advanced system that learns from 10,235+ existing detection patterns across multiple security platforms to automatically generate high-quality, convention-matching detections.
The system extracts and analyzes patterns from:
- SPL (Search Processing Language) queries from Splunk ESCU
- Sigma rules from the Sigma project
- KQL (Kusto Query Language) queries from Microsoft Sentinel
- Elastic EQL queries from Elastic Security
These patterns are stored in a pattern database and used to generate detection templates that follow established conventions and best practices.
Traditional detection engineering requires deep knowledge of:
- Platform-specific query languages
- Data model structures and field mappings
- Common detection patterns and anti-patterns
- Naming conventions and style guidelines
- Risk-based alerting (RBA) structures
Detection Engineering Intelligence automates this knowledge transfer, enabling:
- Faster detection development - Generate templates in seconds instead of hours
- Consistent quality - All detections follow learned best practices
- Reduced errors - Avoid common mistakes by learning from existing detections
- Continuous improvement - System learns from user feedback and corrections
The system operates through three main phases:

1. Pattern Extraction - Analyzes indexed detections to extract:
   - Query structures and data model usage
   - Field usage patterns by data model
   - Macro usage and conventions
   - Naming patterns and style conventions

2. Pattern Learning - Stores extracted patterns in a searchable database:
   - Patterns indexed by MITRE technique
   - Field references organized by data model
   - Style conventions for naming, query structure, and RBA

3. Template Generation - Uses learned patterns to generate detection templates:
   - Retrieves patterns for a given technique
   - Selects an appropriate data model and fields
   - Generates an SPL query following conventions
   - Creates an RBA structure with appropriate scores
   - Applies learned naming and style conventions
The system extracts patterns from four major detection rule formats, each with unique characteristics.
Search Processing Language (SPL) patterns are extracted from Splunk ESCU detections.
The system identifies and tracks usage of Splunk CIM data models:
- `Endpoint.Processes` - Process execution events
- `Endpoint.Filesystem` - File system operations
- `Endpoint.Registry` - Registry modifications
- `Network_Traffic.All_Traffic` - Network connections
- `Authentication.Authentication` - Authentication events
- `Email.All_Email` - Email activity
- `Web.Web` - Web application activity
- `Risk.All_Risk` - Risk events
- `Alerts.Alerts` - Security alerts
- `Updates.Published_Updates` - Software updates
- `Certificates.All_Certificates` - Certificate operations
- `Intrusion_Detection.IDS_Attacks` - IDS alerts
- `Change.All_Changes` - Configuration changes
- `Network_Resolution.DNS` - DNS queries
Common Splunk macros extracted from queries:
- `security_content_summariesonly` - Used with `tstats` for accelerated data model queries
- `drop_dm_object_name(Processes)` - Removes the `Processes.` prefix from field names
- `drop_dm_object_name(Filesystem)` - Removes the `Filesystem.` prefix from field names
- `security_content_ctime(firstTime)` - Formats the `firstTime` field for display
- `security_content_ctime(lastTime)` - Formats the `lastTime` field for display
- `detection_name_filter` - Custom filter macro for tuning (replaced with the actual detection name)
- `sysmon` - Filters Sysmon events
- `o365_management_activity` - Filters Office 365 audit logs
- `esxi_syslog` - Filters ESXi syslog events
Fields are extracted from:
- Data model references: `Processes.process_name`, `Filesystem.file_path`
- `by` clause groupings: `by Processes.dest Processes.user`
- WHERE clause filters: `where Processes.process_name="*"`
Common field patterns:
- Process fields: `process_name`, `process_path`, `process_id`, `process_guid`, `parent_process_name`, `parent_process_path`, `parent_process_id`, `parent_process_guid`
- File fields: `file_name`, `file_path`, `file_hash`, `file_size`
- Registry fields: `registry_path`, `registry_key_name`, `registry_value_name`, `registry_value_data`
- Network fields: `src`, `src_ip`, `src_port`, `dest`, `dest_ip`, `dest_port`, `transport`, `protocol`
- Authentication fields: `user`, `src`, `action`, `app`, `authentication_method`, `signature`, `signature_id`
Common aggregation functions tracked:
- `count` - Count events
- `min(_time) as firstTime` - First occurrence timestamp
- `max(_time) as lastTime` - Last occurrence timestamp
- `values(field)` - Distinct values
- `dc(field)` - Distinct count
- `sum(field)` - Sum of values
- `avg(field)` - Average value
- `stats` - Statistical aggregations
- `eventstats` - Event-level statistics
Common WHERE clause patterns extracted:
- `IN_LIST` - Field IN ("value1", "value2")
- `EQUALS` - Field = "value"
- `NOT_EQUALS` - Field != "value" or Field <> "value"
- `WILDCARD` - Field = "*pattern*"
- `AND` - Multiple conditions with AND
- `OR` - Multiple conditions with OR
- `NOT` - Negation conditions
- `LIKE` - Pattern matching
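These WHERE-clause categories can be recognized with simple pattern matching. The sketch below is illustrative only (the helper name `classify_where` is ours, not the system's actual extractor), but it shows how a clause maps onto the categories above:

```python
import re

# Illustrative sketch: map a SPL WHERE clause onto the pattern
# categories listed above. Not the system's actual code.
def classify_where(clause: str) -> list[str]:
    found = []
    if re.search(r'\bIN\s*\(', clause, re.IGNORECASE):
        found.append("IN_LIST")
    if re.search(r'(!=|<>)', clause):
        found.append("NOT_EQUALS")
    elif re.search(r'=\s*"[^"]*\*[^"]*"', clause):
        # A quoted value containing * is treated as a wildcard match.
        found.append("WILDCARD")
    elif re.search(r'=\s*"', clause):
        found.append("EQUALS")
    if " AND " in clause:
        found.append("AND")
    if " OR " in clause:
        found.append("OR")
    if re.search(r'\bNOT\b', clause):
        found.append("NOT")
    if re.search(r'\bLIKE\b', clause, re.IGNORECASE):
        found.append("LIKE")
    return found
```

For example, `classify_where('Processes.process_name IN ("cmd.exe", "powershell.exe")')` yields `["IN_LIST"]`.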
Example SPL Pattern Extraction:

```
| tstats `security_content_summariesonly` count min(_time) as firstTime max(_time) as lastTime
from datamodel=Endpoint.Processes
where Processes.process_name IN ("cmd.exe", "powershell.exe")
by Processes.dest Processes.user Processes.process Processes.parent_process
| `drop_dm_object_name(Processes)`
| `security_content_ctime(firstTime)`
| `security_content_ctime(lastTime)`
| `detection_name_filter`
```
Extracted Pattern:
- Uses `tstats` with `security_content_summariesonly`
- Data model: `Endpoint.Processes`
- WHERE pattern: `IN_LIST`
- Fields: `process_name`, `dest`, `user`, `process`, `parent_process`
- Aggregations: `count`, `min`, `max`
- Macros: `security_content_summariesonly`, `drop_dm_object_name`, `security_content_ctime`, `detection_name_filter`
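Conceptually, each extracted pattern is stored as a structured record keyed by MITRE technique. A minimal sketch of such a record for the example above (the field names are illustrative, not the actual database schema):

```python
# Illustrative pattern record; the real storage schema may differ.
pattern = {
    "technique_id": "T1059.001",   # assumed technique for this example
    "source_type": "splunk_escu",
    "uses_tstats": True,
    "data_model": "Endpoint.Processes",
    "where_patterns": ["IN_LIST"],
    "fields": ["process_name", "dest", "user", "process", "parent_process"],
    "aggregations": ["count", "min", "max"],
    "macros": [
        "security_content_summariesonly",
        "drop_dm_object_name",
        "security_content_ctime",
        "detection_name_filter",
    ],
}
```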
Sigma rules use a YAML-based format with condition-based logic.
Fields extracted from Sigma rule conditions:
- `CommandLine` - Process command line
- `Image` - Process image path
- `ParentImage` - Parent process image
- `ParentCommandLine` - Parent process command line
- `User` - User account
- `TargetFilename` - Target file name
- `TargetObject` - Target registry object
- `SourceImage` - Source process image
- `TargetImage` - Target process image
- `ProcessId` - Process ID
- `ParentProcessId` - Parent process ID
- `OriginalFileName` - Original file name
- `CurrentDirectory` - Current working directory
- `IntegrityLevel` - Process integrity level
- `Hashes` - File hashes
- `Company` - File company name
- `Product` - File product name
- `Description` - File description
- `FileVersion` - File version
Sigma condition operators extracted:
- `AND` - Logical AND (`and`)
- `OR` - Logical OR (`or`)
- `NOT` - Logical NOT (`not`)
- `CONTAINS` - Contains substring (`|contains`)
- `STARTSWITH` - Starts with (`|startswith`)
- `ENDSWITH` - Ends with (`|endswith`)
- `REGEX` - Regular expression (`|re`)
- `ALL` - All conditions must match (`|all`)
Example Sigma Pattern:

```yaml
detection:
  selection:
    Image|endswith:
      - '\cmd.exe'
      - '\powershell.exe'
    CommandLine|contains:
      - '-nop -w hidden'
      - 'encodedcommand'
  condition: selection
```

Extracted Pattern:
- Fields: `Image`, `CommandLine`
- Condition patterns: `ENDSWITH`, `CONTAINS`
- Logic: `AND` (implicit in selection)
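Because Sigma rules are YAML, field names and modifiers fall out of the parsed structure directly. A sketch of the extraction step (the function name is ours; it assumes the rule's `detection` block has already been loaded into a dict with a YAML parser):

```python
# Illustrative sketch: pull field names and modifier-based condition
# patterns out of a parsed Sigma detection block. Compound modifiers
# (e.g. contains|all) would need extra handling in real code.
def extract_sigma_pattern(detection: dict) -> dict:
    fields, conditions = set(), set()
    for name, selection in detection.items():
        if name == "condition" or not isinstance(selection, dict):
            continue  # skip the condition expression itself
        for key in selection:
            field, _, modifier = key.partition("|")
            fields.add(field)
            if modifier:
                conditions.add(modifier.upper())
    return {"fields": sorted(fields), "condition_patterns": sorted(conditions)}
```

Applied to the example above, this yields fields `CommandLine` and `Image` with condition patterns `CONTAINS` and `ENDSWITH`.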
Kusto Query Language (KQL) patterns from Microsoft Sentinel detections.
Common functions extracted:
- `where` - Filter events
- `project` - Select columns
- `extend` - Add calculated columns
- `summarize` - Aggregate data
- `join` - Join tables
- `union` - Combine results
- `parse` - Parse text fields
- `mv-expand` - Expand multi-value fields
- `make-set` - Create set of values
- `make-list` - Create list of values
- `arg_max` - Get row with maximum value
- `arg_min` - Get row with minimum value
- `count` - Count rows
- `dcount` - Distinct count
- `sum`, `avg`, `min`, `max` - Aggregations
- `strcat`, `split`, `extract`, `replace` - String functions
- `tolower`, `toupper` - Case conversion
- `datetime`, `ago`, `now` - Time functions
- `between`, `in`, `has`, `contains` - Comparison operators
- `startswith`, `endswith`, `matches regex` - Pattern matching
- `isnotempty`, `isempty` - Null checks
Common Microsoft Sentinel fields:
- `TimeGenerated` - Event timestamp
- `Computer` - Computer name
- `Account` - Account name
- `SourceIP` - Source IP address
- `DestinationIP` - Destination IP address
- `ProcessName` - Process name
- `CommandLine` - Command line
- `ParentProcessName` - Parent process name
- `FileName` - File name
- `FilePath` - File path
- `EventID` - Windows Event ID
- `EventType` - Event type
- `LogonType` - Logon type
- `TargetUserName` - Target user name
- `SubjectUserName` - Subject user name
- `InitiatingProcessFileName` - Initiating process file name
- `InitiatingProcessCommandLine` - Initiating process command line
- `TargetFileName` - Target file name
- `DeviceName` - Device name
- `RemoteIP`, `RemotePort` - Remote connection details
- `LocalIP`, `LocalPort` - Local connection details
- `ActionType` - Action type
- `RegistryKey`, `RegistryValueName`, `RegistryValueData` - Registry fields
Aggregations:
- `summarize` - Group by and aggregate
- `count()` - Count rows
- `dcount()` - Distinct count
- `sum()`, `avg()`, `min()`, `max()` - Statistical aggregations
- `make-set()` - Create set
- `make-list()` - Create list
Operators:
- `AND`/`&&` - Logical AND
- `OR`/`||` - Logical OR
- `NOT`/`!` - Logical NOT
- `has` - Matches a whole term
- `contains` - Contains substring
- `startswith`, `endswith` - String matching
- `matches regex` - Regular expression
- `in`/`in~` - In list (case-sensitive/insensitive)
- `between` - Range check
Example KQL Pattern:

```
SecurityEvent
| where EventID == 4688
| where ProcessName in~ ("cmd.exe", "powershell.exe")
| where CommandLine contains "-nop" or CommandLine contains "encodedcommand"
| summarize count(), min(TimeGenerated), max(TimeGenerated) by Computer, Account, ProcessName, CommandLine
| where count_ > 5
```

Extracted Pattern:
- Functions: `where`, `summarize`
- Fields: `EventID`, `ProcessName`, `CommandLine`, `Computer`, `Account`, `TimeGenerated`
- Aggregations: `count()`, `min()`, `max()`
- Operators: `IN`, `CONTAINS`, `OR`
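The tabular functions in a KQL query can be collected by scanning its pipe-delimited stages. The sketch below is deliberately naive (it is our illustration, not the system's parser, and would misfire on `||` or pipes inside string literals), but it shows the idea:

```python
# Illustrative sketch: collect the operator that starts each
# pipe-delimited stage of a KQL query.
def kql_operators(query: str) -> list[str]:
    ops = []
    for stage in query.split("|")[1:]:
        # The first token of each stage names the tabular operator.
        ops.append(stage.strip().split()[0])
    return sorted(set(ops))
```

Run against the example above, this reports `summarize` and `where` as the functions used.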
Elastic Security uses EQL (Event Query Language) and ECS (Elastic Common Schema).
Common Elastic Common Schema fields:
- `process.name` - Process name
- `process.command_line` - Process command line
- `process.parent.name` - Parent process name
- `process.executable` - Process executable path
- `file.name` - File name
- `file.path` - File path
- `file.hash.sha256` - File SHA-256 hash
- `user.name` - User name
- `user.domain` - User domain
- `host.name` - Host name
- `host.os.name` - Operating system name
- `source.ip` - Source IP address
- `destination.ip` - Destination IP address
- `destination.port` - Destination port
- `event.action` - Event action
- `event.category` - Event category
- `event.type` - Event type
- `event.outcome` - Event outcome
- `registry.path` - Registry path
- `registry.key` - Registry key
- `registry.value` - Registry value
- `network.protocol` - Network protocol
Elastic EQL supports sequence detection:
- `sequence` - Detect event sequences
- `by` - Group sequences by field
- `maxspan` - Maximum time span for sequence
- `until` - Sequence termination condition
Operators:
- `AND`/`and` - Logical AND
- `OR`/`or` - Logical OR
- `NOT`/`not` - Logical NOT
- `WILDCARD` - Wildcard matching (`:` with `*`)
- `FUZZY` - Fuzzy matching (`~`)
- `RANGE` - Range matching (`..`)
- `WHERE` - Filter condition
Example Elastic Pattern:

```
process where process.name : "cmd.exe" or process.name : "powershell.exe"
  and process.command_line : ("-nop", "encodedcommand")
```

Extracted Pattern:
- Fields: `process.name`, `process.command_line`
- Operators: `OR`, `AND`, `WILDCARD`
- Pattern: Simple condition matching
The system maintains a comprehensive field reference database organized by Splunk CIM data model.
Use the `get_field_reference` tool to retrieve available fields for a data model:

```javascript
get_field_reference({
  data_model: "Endpoint.Processes"
})
```

Response:
```javascript
{
  "data_model": "Endpoint.Processes",
  "found": true,
  "field_count": 45,
  "fields": [
    {
      "name": "process_name",
      "type": "string",
      "usage_count": 2847,
      "examples": [
        "Windows PowerShell Execution",
        "Suspicious Process Execution"
      ]
    },
    {
      "name": "dest",
      "type": "string",
      "usage_count": 2156,
      "examples": [
        "Windows Process Execution",
        "Endpoint Process Activity"
      ]
    }
    // ... more fields
  ],
  "most_used": [
    "process_name",
    "dest",
    "user",
    "process_path",
    "parent_process_name",
    "process_id",
    "process_guid",
    "parent_process_path",
    "parent_process_id",
    "parent_process_guid"
  ]
}
```

Each field includes:
- Usage count - How many detections use this field
- Examples - Detection names that use this field
- Field type - Data type (string, number, timestamp)
Fields are sorted by usage count, with the most commonly used fields appearing first.
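The ranking behind `most_used` can be sketched in a few lines (helper name and record shape are ours, for illustration):

```python
# Sketch of the ranking described above: sort field records by
# usage_count, highest first, and return the top-N names.
def most_used(fields: list[dict], n: int = 10) -> list[str]:
    ranked = sorted(fields, key=lambda f: f["usage_count"], reverse=True)
    return [f["name"] for f in ranked[:n]]
```

With the usage counts shown above, `process_name` (2,847) ranks ahead of `dest` (2,156).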
The system tracks fields for all major Splunk CIM data models:
- Endpoint.Processes - Process execution events
- Endpoint.Filesystem - File system operations
- Endpoint.Registry - Registry modifications
- Network_Traffic.All_Traffic - Network connections
- Authentication.Authentication - Authentication events
- Email.All_Email - Email activity
- Web.Web - Web application activity
- Risk.All_Risk - Risk events
- Alerts.Alerts - Security alerts
- Updates.Published_Updates - Software updates
- Certificates.All_Certificates - Certificate operations
- Intrusion_Detection.IDS_Attacks - IDS alerts
- Change.All_Changes - Configuration changes
- Network_Resolution.DNS - DNS queries
Splunk macros are essential building blocks for detection queries. The system tracks macro usage patterns across all detections.
`security_content_summariesonly`
Purpose: Use with `tstats` for accelerated data model queries
Usage: Always include when using `tstats` with data models
Frequency: Used in 95%+ of data model queries

```
| tstats `security_content_summariesonly` count
from datamodel=Endpoint.Processes
```
`drop_dm_object_name(Processes)`
Purpose: Remove the `Processes.` prefix from field names after a data model query
Usage: Required after `tstats` queries to normalize field names
Frequency: Used in 90%+ of data model queries

```
| `drop_dm_object_name(Processes)`
```
`drop_dm_object_name(Filesystem)`
Purpose: Remove the `Filesystem.` prefix from field names
Usage: Required after Filesystem data model queries
Frequency: Used in 85%+ of Filesystem queries
`security_content_ctime(firstTime)`
Purpose: Format the `firstTime` field for display
Usage: Always include after aggregations with `min(_time) as firstTime`
Frequency: Used in 98%+ of detections with time aggregations

```
| `security_content_ctime(firstTime)`
```
`security_content_ctime(lastTime)`
Purpose: Format the `lastTime` field for display
Usage: Always include after aggregations with `max(_time) as lastTime`
Frequency: Used in 98%+ of detections with time aggregations

```
| `security_content_ctime(lastTime)`
```
`detection_name_filter`
Purpose: Custom filter macro for tuning false positives
Usage: Replace `detection_name` with the actual detection name (lowercase, underscores)
Frequency: Used in 100% of production detections

```
| `windows_powershell_execution_filter`
```
Use the `get_macro_reference` tool:

```javascript
get_macro_reference({
  filter: "security_content" // Optional: filter by name
})
```

Response:
```javascript
{
  "total_macros": 127,
  "essential_macros": [
    {
      "name": "security_content_summariesonly",
      "purpose": "Use with tstats for accelerated data model queries"
    },
    {
      "name": "drop_dm_object_name(Processes)",
      "purpose": "Remove Processes. prefix from field names"
    }
    // ... more macros
  ],
  "top_used": [
    { "name": "security_content_summariesonly", "usage_count": 2847 },
    { "name": "drop_dm_object_name(Processes)", "usage_count": 2156 },
    { "name": "security_content_ctime(firstTime)", "usage_count": 1983 },
    { "name": "security_content_ctime(lastTime)", "usage_count": 1983 }
    // ... more macros
  ],
  "usage_tip": "Always include `security_content_summariesonly` with tstats and end with `detection_name_filter`"
}
```

Best practices:
- Data Model Queries - Always use `security_content_summariesonly` with `tstats`
- Field Normalization - Always use `drop_dm_object_name` after data model queries
- Time Formatting - Always use `security_content_ctime` for time fields
- False Positive Tuning - Always end with `detection_name_filter`
The template generation workflow follows an 8-step process that leverages learned patterns at each stage.
Tool: get_query_patterns
Retrieve existing patterns for the MITRE technique:
```javascript
get_query_patterns({
  technique_id: "T1059.001",
  source_type: "splunk_escu" // Optional
})
```

Returns:
- Pattern count (e.g., 69 patterns)
- Data models used (e.g., `Endpoint.Processes`, `Endpoint.Filesystem`)
- Common macros (e.g., `security_content_summariesonly`, `drop_dm_object_name`)
- Common fields (e.g., `process_name`, `dest`, `user`, `command_line`)
- Most common data model (e.g., `Endpoint.Processes`)
- Query structure examples
Decision: Choose the most common data model for the technique.

If `get_query_patterns` returns:

```javascript
most_common_data_model: "Endpoint.Processes"
```

then use `Endpoint.Processes` for the template.

Fallback: If no patterns exist, default to `Endpoint.Processes` for endpoint techniques.
Tool: get_field_reference

Retrieve available fields for the selected data model:

```javascript
get_field_reference({
  data_model: "Endpoint.Processes"
})
```

Use: Select the top 8-10 most-used fields for the query structure.
Pattern: `{Platform} {Action} {Description}`

Examples:
- `Windows PowerShell Execution`
- `Linux Suspicious Process Activity`
- `AWS S3 Bucket Policy Modification`

Convention:
- Platform prefix (Windows, Linux, AWS, Azure, GCP)
- Action verb (Execution, Modification, Access)
- Description (PowerShell, Suspicious Process, Bucket Policy)
Template Structure:

```
| tstats `security_content_summariesonly` count min(_time) as firstTime max(_time) as lastTime
from datamodel={DATA_MODEL}
where {OBJECT}.{FIELD}="{PATTERN}"
by {OBJECT}.{FIELD1} {OBJECT}.{FIELD2} {OBJECT}.{FIELD3}
| `drop_dm_object_name({OBJECT})`
| `security_content_ctime(firstTime)`
| `security_content_ctime(lastTime)`
| `{DETECTION_NAME}_filter`
```

Pattern Application:
- Use `tstats` with `security_content_summariesonly` (learned from 95%+ of queries)
- Include `min(_time) as firstTime` and `max(_time) as lastTime` (learned pattern)
- Use a `by` clause with top fields from the field reference
- Include the `drop_dm_object_name` macro (learned convention)
- Include the `security_content_ctime` macros (learned convention)
- End with `{detection_name}_filter` (learned convention)
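Filling those placeholders is plain string assembly. A hypothetical sketch (the function name, signature, and hardcoded aggregation line are ours, chosen to mirror the template above):

```python
# Hypothetical sketch of filling the SPL template with learned values.
def build_spl(data_model: str, obj: str, where_field: str, pattern: str,
              by_fields: list[str], detection_name: str) -> str:
    # Detection name becomes the filter macro: lowercase, underscores.
    filter_macro = detection_name.lower().replace(" ", "_") + "_filter"
    by_clause = " ".join(f"{obj}.{f}" for f in by_fields)
    return "\n".join([
        "| tstats `security_content_summariesonly` count "
        "min(_time) as firstTime max(_time) as lastTime",
        f"from datamodel={data_model}",
        f'where {obj}.{where_field}="{pattern}"',
        f"by {by_clause}",
        f"| `drop_dm_object_name({obj})`",
        "| `security_content_ctime(firstTime)`",
        "| `security_content_ctime(lastTime)`",
        f"| `{filter_macro}`",
    ])
```

For example, `build_spl("Endpoint.Processes", "Processes", "process_name", "*", ["dest", "user"], "Windows PowerShell Execution")` ends with the `windows_powershell_execution_filter` macro.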
Tool: generate_rba_structure

Generate Risk-Based Alerting configuration:

```javascript
generate_rba_structure({
  detection_type: "TTP",
  severity: "high",
  description: "PowerShell executing encoded commands",
  fields_available: ["dest", "user", "process_name", "command_line"]
})
```

Scoring Logic (Learned):
- TTP detections: Low 40, Medium 56, High 72, Critical 90
- Anomaly detections: Low 16, Medium 32, High 48, Critical 64
- Hunting detections: Low 8, Medium 16, High 24, Critical 32
- Correlation detections: Low 48, Medium 64, High 80, Critical 96
Risk Object Distribution:
- `dest` (system): 60% of score
- `user` (user): 40% of score

Threat Objects: `process_name`, `parent_process_name`, `file_name`, `file_path`, `registry_path`
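The scoring tables and the 60/40 split combine as follows. This sketch is illustrative, not the tool's source; rounding to the nearest integer is an assumption, but it reproduces the dest 43 / user 29 scores shown in the generated examples for a high-severity TTP (base 72):

```python
# Illustrative sketch of the learned scoring rules described above.
BASE_SCORES = {
    "TTP":         {"low": 40, "medium": 56, "high": 72, "critical": 90},
    "Anomaly":     {"low": 16, "medium": 32, "high": 48, "critical": 64},
    "Hunting":     {"low": 8,  "medium": 16, "high": 24, "critical": 32},
    "Correlation": {"low": 48, "medium": 64, "high": 80, "critical": 96},
}

def rba_scores(detection_type: str, severity: str) -> dict:
    # 60% of the base score goes to the system (dest), 40% to the user.
    base = BASE_SCORES[detection_type][severity]
    return {"dest": round(base * 0.6), "user": round(base * 0.4)}
```

`rba_scores("TTP", "high")` gives `{"dest": 43, "user": 29}`, matching the generated template output.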
Data Sources: Inferred from data model:
- `Endpoint.Processes` → `["Sysmon EventID 1", "Windows Event Log Security 4688"]`
- `Endpoint.Filesystem` → `["Sysmon EventID 11", "Sysmon EventID 23"]`
- `Endpoint.Registry` → `["Sysmon EventID 12", "Sysmon EventID 13", "Sysmon EventID 14"]`

Security Domain: Inferred from data model:
- `Endpoint.*` → `endpoint`
- `Network_*` → `network`
- `Authentication.*` → `access`

Asset Type: Inferred from platform:
- `Windows`, `Linux`, `macOS` → `Endpoint`
- `AWS`, `Azure`, `GCP` → `Cloud Instance`
Test Template:

```yaml
tests:
  - name: True Positive Test
    attack_data:
      - data: https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/...
        sourcetype: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
        source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
```

Sourcetype Inference:
- `Endpoint.Processes` → `XmlWinEventLog:Microsoft-Windows-Sysmon/Operational`
- `Endpoint.Filesystem` → `XmlWinEventLog:Microsoft-Windows-Sysmon/Operational`
- `Endpoint.Registry` → `XmlWinEventLog:Microsoft-Windows-Sysmon/Operational`
- Default → `WinEventLog:Security`
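The inference above is a lookup with a default. A hypothetical sketch (the table name and helper are ours, mirroring the rules listed):

```python
# Hypothetical lookup mirroring the sourcetype inference rules above.
SYSMON = "XmlWinEventLog:Microsoft-Windows-Sysmon/Operational"

SOURCETYPE_BY_MODEL = {
    "Endpoint.Processes": SYSMON,
    "Endpoint.Filesystem": SYSMON,
    "Endpoint.Registry": SYSMON,
}

def infer_sourcetype(data_model: str) -> str:
    # Anything outside the endpoint models falls back to the Security log.
    return SOURCETYPE_BY_MODEL.get(data_model, "WinEventLog:Security")
```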
The Detection Engineering Intelligence system provides 8 specialized tools organized into three categories.
Retrieve common query patterns for a MITRE technique based on existing detections.
Input:

```javascript
{
  technique_id: "T1059.001", // Required
  source_type: "splunk_escu" // Optional: "sigma", "splunk_escu", "elastic", "kql"
}
```

Output:
```javascript
{
  "technique": "T1059.001",
  "found": true,
  "patterns": {
    "count": 69,
    "data_models_used": ["Endpoint.Processes", "Endpoint.Filesystem"],
    "common_macros": ["security_content_summariesonly", "drop_dm_object_name", ...],
    "common_fields": ["process_name", "dest", "user", "command_line", ...],
    "most_common_data_model": "Endpoint.Processes",
    "query_structures": [
      {
        "uses_tstats": true,
        "data_model": "Endpoint.Processes",
        "aggregations": ["count", "min", "max"],
        "where_patterns": ["IN_LIST", "EQUALS"]
      }
    ]
  },
  "examples": [
    {
      "name": "Windows PowerShell Execution",
      "id": "abc-123",
      "severity": "high",
      "query_preview": "| tstats `security_content_summariesonly`...",
      "data_sources": ["Sysmon EventID 1"]
    }
  ],
  "recommendation": "Based on 69 existing detections. Most use Endpoint.Processes."
}
```

Use Case: Call before writing a detection to learn conventions.
Get available fields for a Splunk data model with usage examples.
Input:

```javascript
{
  data_model: "Endpoint.Processes" // Required
}
```

Output:
```javascript
{
  "data_model": "Endpoint.Processes",
  "found": true,
  "field_count": 45,
  "fields": [
    {
      "name": "process_name",
      "type": "string",
      "usage_count": 2847,
      "examples": ["Windows PowerShell Execution", "Suspicious Process Execution"]
    }
  ],
  "most_used": ["process_name", "dest", "user", "process_path", ...]
}
```

Use Case: Understand what fields are available when writing a detection query.
Get common Splunk macros and their usage patterns.
Input:

```javascript
{
  filter: "security_content" // Optional: filter macros by name
}
```

Output:
```javascript
{
  "total_macros": 127,
  "essential_macros": [
    {
      "name": "security_content_summariesonly",
      "purpose": "Use with tstats for accelerated data model queries"
    }
  ],
  "top_used": [
    { "name": "security_content_summariesonly", "usage_count": 2847 }
  ],
  "usage_tip": "Always include `security_content_summariesonly` with tstats and end with `detection_name_filter`"
}
```

Use Case: Learn which macros to use and when.
Find existing detections similar to what you want to create.
Input:

```javascript
{
  description: "PowerShell downloading files", // Required
  technique_id: "T1059.001", // Optional
  source_type: "splunk_escu", // Optional
  limit: 5 // Optional, default 5
}
```

Output:
```javascript
{
  "found": true,
  "count": 5,
  "similar_detections": [
    {
      "name": "Windows PowerShell Execution",
      "id": "abc-123",
      "description": "Detects PowerShell execution with suspicious parameters...",
      "mitre_ids": ["T1059.001"],
      "severity": "high",
      "detection_type": "TTP",
      "data_sources": ["Sysmon EventID 1"],
      "query_structure": {
        "uses_tstats": true,
        "data_model": "Endpoint.Processes",
        "length": 450
      }
    }
  ],
  "recommendation": "Use get_raw_yaml(id) to see the full detection YAML for any of these."
}
```

Use Case: Learn from existing detection logic and structure.
Generate a detection template based on technique, learned patterns, and conventions.
Input:

```javascript
{
  technique_id: "T1059.001", // Required
  description: "PowerShell executing encoded commands", // Required
  data_model: "Endpoint.Processes", // Optional: will use most common if not specified
  detection_type: "TTP", // Optional: "TTP", "Anomaly", "Hunting", default "TTP"
  platform: "Windows" // Optional: "Windows", "Linux", "macOS", "AWS", "Azure", "GCP", default "Windows"
}
```

Output:
```javascript
{
  "template": {
    "name": "Windows PowerShell Execution",
    "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "version": 1,
    "date": "2026-01-29",
    "author": "Your Name, Splunk",
    "status": "production",
    "type": "TTP",
    "description": "PowerShell executing encoded commands",
    "data_source": ["Sysmon EventID 1", "Windows Event Log Security 4688"],
    "search": "| tstats `security_content_summariesonly` count min(_time) as firstTime max(_time) as lastTime\nfrom datamodel=Endpoint.Processes\nwhere Processes.process_name=\"*\"\nby Processes.dest Processes.user Processes.process Processes.parent_process\n| `drop_dm_object_name(Processes)`\n| `security_content_ctime(firstTime)`\n| `security_content_ctime(lastTime)`\n| `windows_powershell_execution_filter`",
    "how_to_implement": "Requires data from Endpoint data model. Ensure proper CIM mapping.",
    "known_false_positives": "Legitimate administrative activity may trigger this detection. Tune using the filter macro.",
    "references": ["https://attack.mitre.org/techniques/T1059/001/"],
    "drilldown_searches": [...],
    "rba": {
      "message": "Suspicious activity detected on $dest$ by $user$. Review process $process_name$ for potential malicious behavior.",
      "risk_objects": [
        { "field": "dest", "type": "system", "score": 43 },
        { "field": "user", "type": "user", "score": 29 }
      ],
      "threat_objects": [
        { "field": "process_name", "type": "process_name" }
      ]
    },
    "tags": {
      "analytic_story": ["Your Story Name"],
      "asset_type": "Endpoint",
      "mitre_attack_id": ["T1059.001"],
      "product": ["Splunk Enterprise", "Splunk Enterprise Security", "Splunk Cloud"],
      "security_domain": "endpoint",
      "cve": []
    },
    "tests": [...]
  },
  "yaml_preview": "name: Windows PowerShell Execution\n...",
  "based_on": {
    "similar_detections": 69,
    "data_model": "Endpoint.Processes",
    "macros_included": ["security_content_summariesonly", "drop_dm_object_name", "security_content_ctime", "windows_powershell_execution_filter"],
    "fields_available": ["process_name", "dest", "user", "process_path", ...]
  },
  "notes": [
    "Replace the placeholder ID with a real UUID",
    "Customize the WHERE clause for specific behavior patterns",
    "Adjust RBA scores based on severity and confidence",
    "Add specific false positive tuning based on your environment",
    "Update the test data URL with actual attack simulation data"
  ]
}
```

Use Case: Generate a complete detection template ready for customization.
Generate RBA (Risk-Based Alerting) structure for a detection based on learned patterns and best practices.
Input:

```javascript
{
  detection_type: "TTP", // Required: "TTP", "Anomaly", "Hunting", "Correlation"
  severity: "high", // Required: "low", "medium", "high", "critical"
  description: "PowerShell executing encoded commands", // Required
  fields_available: ["dest", "user", "process_name", "command_line"] // Optional
}
```

Output:
```javascript
{
  "rba": {
    "message": "PowerShell executing encoded commands detected on $dest$ by user $user$. Review $dest$, $user$, $process_name$ for investigation.",
    "risk_objects": [
      { "field": "dest", "type": "system", "score": 43 },
      { "field": "user", "type": "user", "score": 29 }
    ],
    "threat_objects": [
      { "field": "process_name", "type": "process_name" },
      { "field": "command_line", "type": "command_line" }
    ]
  },
  "explanation": {
    "base_score": 72,
    "score_rationale": "TTP detection with high severity. Score distributed across 2 risk object(s).",
    "message_variables": ["dest", "user", "process_name"]
  }
}
```

Use Case: Generate RBA configuration with appropriate scores based on detection type and severity.
Extract and store patterns from all indexed detections. Run this to populate the pattern database for template generation.
Input:

```javascript
{
  force: false // Optional: force re-extraction even if patterns exist
}
```

Output:
```javascript
{
  "success": true,
  "extraction_result": {
    "spl_patterns": { "extracted": 2533, "techniques": 428 },
    "sigma_patterns": { "extracted": 3884, "techniques": 512 },
    "kql_patterns": { "extracted": 493, "techniques": 89 },
    "elastic_patterns": { "extracted": 3325, "techniques": 445 },
    "field_usage": { "fields": 249, "dataModels": 14 },
    "macro_usage": { "macros": 127 },
    "naming_conventions": { "conventions": 15 },
    "total_patterns": 10235
  },
  "message": "Extracted 10235 patterns from indexed detections"
}
```

Use Case: Populate the pattern database before using generation tools.
Store a user preference or correction to improve future suggestions. Call this when a user modifies generated content to build tribal knowledge.
Input:

```javascript
{
  feedback_type: "rba_score", // Required: "naming", "query_structure", "rba_score", "field_usage", "style", "macro_usage"
  original: "72", // Required: what was originally suggested
  corrected: "85", // Required: what the user changed it to
  context: "T1059.001 PowerShell detection in high-security environment" // Optional
}
```

Output:
```javascript
{
  "learned": true,
  "learning_id": "learn_abc123",
  "feedback_type": "rba_score",
  "message": "Stored preference for rba_score. Will apply to future suggestions."
}
```

Feedback Types:
- `naming` - Detection naming conventions
- `query_structure` - SPL query structure preferences
- `rba_score` - RBA score adjustments
- `field_usage` - Field selection preferences
- `style` - General style preferences
- `macro_usage` - Macro usage preferences
Use Case: Improve future template generation by learning from user corrections.
The system continuously improves by learning from user feedback and corrections.
When a user modifies a generated template, the system can learn:
- Naming Preferences - If user changes detection name format
- Query Structure - If user modifies query structure
- RBA Scores - If user adjusts risk scores
- Field Usage - If user adds/removes fields
- Style Conventions - If user changes formatting or style
- Macro Usage - If user adds/removes macros
User corrects detection naming convention:
```javascript
learn_from_feedback({
  feedback_type: "naming",
  original: "Windows PowerShell Execution",
  corrected: "Windows PowerShell Encoded Command Execution",
  context: "T1059.001 - More specific naming preferred"
})
```

Impact: Future templates for T1059.001 will use more specific naming.
User modifies query structure:
```javascript
learn_from_feedback({
  feedback_type: "query_structure",
  original: "where Processes.process_name=\"*\"",
  corrected: "where Processes.process_name IN (\"powershell.exe\", \"pwsh.exe\")",
  context: "T1059.001 - Prefer explicit process list over wildcard"
})
```

Impact: Future templates will prefer explicit process lists.
User adjusts RBA scores:
```javascript
learn_from_feedback({
  feedback_type: "rba_score",
  original: "72",
  corrected: "85",
  context: "T1059.001 PowerShell detection - Higher risk in our environment"
})
```

Impact: Future TTP detections for PowerShell will use higher base scores.
User adds/removes fields:
```javascript
learn_from_feedback({
  feedback_type: "field_usage",
  original: "by Processes.dest Processes.user Processes.process",
  corrected: "by Processes.dest Processes.user Processes.process Processes.command_line",
  context: "T1059.001 - Include command_line for better context"
})
```

Impact: Future templates will include the `command_line` field.
User changes formatting or style:
```javascript
learn_from_feedback({
  feedback_type: "style",
  original: "description: PowerShell execution",
  corrected: "description: |\n  Detects PowerShell execution with encoded commands...",
  context: "Prefer multi-line descriptions"
})
```

Impact: Future templates will use multi-line descriptions.
User adds/removes macros:
```javascript
learn_from_feedback({
  feedback_type: "macro_usage",
  original: "| `detection_name_filter`",
  corrected: "| `detection_name_filter` | `additional_context_macro`",
  context: "Always include additional context macro"
})
```

Impact: Future templates will include the additional macro.
Feedback is stored in three locations:
- `style_conventions` table - Quick lookup for style preferences
- `kg_learnings` table - Knowledge graph learnings for context-aware retrieval
- `kg_decisions` table - Decision log for audit trail
Scenario:
- System generates: `Windows PowerShell Execution`
- User changes it to: `Windows PowerShell Encoded Command Execution`
- User calls: `learn_from_feedback({ feedback_type: "naming", original: "...", corrected: "..." })`
- System stores the preference
- Future templates for T1059.001 use: `Windows PowerShell Encoded Command Execution`
Result: System learns organization-specific naming preferences and applies them automatically.
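At generation time, a stored correction might be consulted like this. The sketch is hypothetical: the function and record shape are ours for illustration, not the actual `kg_learnings` schema:

```python
# Hypothetical sketch: the most recent rba_score learning whose context
# mentions the technique overrides the default base score.
def apply_learned_score(default: int, learnings: list[dict],
                        technique_id: str) -> int:
    for entry in reversed(learnings):  # newest entries win
        if (entry["feedback_type"] == "rba_score"
                and technique_id in entry.get("context", "")):
            return int(entry["corrected"])
    return default
```

With the rba_score example above stored, a default of 72 for T1059.001 would be replaced by the learned 85, while unrelated techniques keep the default.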
The Detection Engineering Intelligence system maintains comprehensive statistics on extracted patterns.
The system has extracted patterns from thousands of detections across multiple platforms.
Patterns organized by detection source:
- SPL (Splunk ESCU): 2,533 patterns
- Sigma: 3,884 patterns
- KQL (Microsoft Sentinel): 493 patterns
- Elastic: 3,325 patterns
Total: 10,235 patterns
- Unique MITRE techniques: 528 techniques
- Patterns per technique: Average 19.4 patterns per technique
- Most covered technique: T1059.001 (PowerShell) with 69 patterns
- Splunk CIM data models tracked: 14 data models
- Most used data model: `Endpoint.Processes` (used in 1,847 detections)
- Field references indexed: 249 unique fields
- Total field references: 249 fields
- Fields per data model: Average 17.8 fields per data model
- Most referenced field: `process_name` (used in 2,847 detections)
- Total macros: 127 unique macros
- Most used macro: `security_content_summariesonly` (used in 2,847 detections)
- Essential macros: 6 core macros used in 90%+ of detections
- Naming conventions: 15 patterns
- Style conventions: Variable (grows with user feedback)
- Query structure patterns: 4,026 unique structures
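The headline numbers above are internally consistent, which is easy to check:

```python
# Cross-check of the summary statistics quoted above.
total = 2533 + 3884 + 493 + 3325       # SPL + Sigma + KQL + Elastic patterns
per_technique = round(total / 528, 1)  # 528 unique MITRE techniques
# total is 10235 and per_technique is 19.4, matching the figures above.
```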
This section demonstrates a complete workflow for creating a PowerShell detection using Detection Engineering Intelligence.
Goal: Create a detection for T1059.001 (PowerShell) that identifies encoded command execution.
```javascript
get_query_patterns({
  technique_id: "T1059.001"
})
```

Response:
{
"technique": "T1059.001",
"found": true,
"patterns": {
"count": 69,
"data_models_used": ["Endpoint.Processes"],
"common_macros": [
"security_content_summariesonly",
"drop_dm_object_name",
"security_content_ctime",
"sysmon"
],
"common_fields": [
"process_name",
"dest",
"user",
"command_line",
"parent_process_name",
"process_path"
],
"most_common_data_model": "Endpoint.Processes"
}
}

Insight: 69 existing detections use `Endpoint.Processes` with the `process_name`, `command_line`, and `dest` fields.
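In client code, this response can seed the next steps directly. A minimal sketch (the response shape is copied from the example above; the selection logic is an assumption, not the system's actual behavior):

```python
# Pick the data model and fields for template generation from a
# get_query_patterns response (shape copied from the example above).
response = {
    "technique": "T1059.001",
    "found": True,
    "patterns": {
        "count": 69,
        "most_common_data_model": "Endpoint.Processes",
        "common_fields": ["process_name", "dest", "user", "command_line"],
    },
}

if response["found"]:
    data_model = response["patterns"]["most_common_data_model"]
    fields = response["patterns"]["common_fields"][:4]   # keep the top few
else:
    data_model, fields = "Endpoint.Processes", []        # assumed fallback

print(data_model, fields)
```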
get_field_reference({
data_model: "Endpoint.Processes"
})

Response:
{
"data_model": "Endpoint.Processes",
"found": true,
"field_count": 45,
"most_used": [
"process_name",
"dest",
"user",
"process_path",
"command_line",
"parent_process_name",
"process_id",
"process_guid"
]
}

Insight: The top fields are `process_name`, `dest`, `user`, and `command_line`.
suggest_detection_template({
technique_id: "T1059.001",
description: "PowerShell executing encoded commands",
data_model: "Endpoint.Processes",
detection_type: "TTP",
platform: "Windows"
})

Response: A complete YAML template with:
- Name: `Windows PowerShell Execution`
- SPL query using the `Endpoint.Processes` data model
- RBA structure with scores 43 (dest) and 29 (user)
- Metadata (data sources, security domain, tags)
- Test structure
User modifies the generated template:
- Changes the WHERE clause to: `where Processes.process_name IN ("powershell.exe", "pwsh.exe") AND Processes.command_line LIKE "*encodedcommand*"`
- Adjusts the RBA score from 72 to 85 (higher risk in their environment)
- Adds `command_line` to the threat objects
learn_from_feedback({
feedback_type: "rba_score",
original: "72",
corrected: "85",
context: "T1059.001 PowerShell detection - Higher risk in our environment"
})

Result: The system learns that this organization prefers higher RBA scores for PowerShell detections.
When generating future T1059.001 detections for this organization:
- System uses learned RBA score of 85 (instead of default 72)
- System includes `command_line` in threat objects (learned preference)
- System uses an explicit process list in the WHERE clause (learned preference)
Outcome: Detection generated in seconds, customized to organization preferences, and ready for deployment.
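The preference learning described above can be pictured as a lookup that prefers stored feedback over defaults. A minimal sketch (assumed logic, not the system's actual implementation):

```python
# Learned feedback overrides defaults at generation time (assumed logic,
# not the system's actual implementation).
DEFAULT_RBA_SCORE = {"T1059.001": 72}
learned_feedback = {("rba_score", "T1059.001"): 85}  # stored via learn_from_feedback

def rba_score(technique_id: str) -> int:
    # Prefer organization-specific feedback; fall back to the default score.
    return learned_feedback.get(("rba_score", technique_id),
                                DEFAULT_RBA_SCORE.get(technique_id, 50))

print(rba_score("T1059.001"))  # 85 (learned), not 72 (default)
```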
Follow these best practices to maximize the effectiveness of Detection Engineering Intelligence.
Before using generation tools, populate the pattern database:
extract_patterns({
force: false // Set to true to re-extract all patterns
})

Why: Generation tools rely on extracted patterns. Without patterns, templates will use defaults instead of learned conventions.
When:
- First time using the system
- After indexing new detections
- When patterns seem outdated
Before creating a new detection, learn from existing ones:
find_similar_detections({
description: "What you want to detect",
technique_id: "T1059.001",
limit: 5
})

Why: Understanding existing approaches helps you:
- Avoid duplicating existing detections
- Learn from proven patterns
- Identify gaps in coverage
When:
- Starting a new detection
- Uncertain about approach
- Want to see examples
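How the tool ranks matches is not documented here, but a naive version of "similar" could combine an exact technique match with keyword overlap between descriptions. A hypothetical sketch:

```python
# Hypothetical similarity scoring: exact technique match plus Jaccard
# overlap of description keywords (not the tool's actual ranking).
def similarity(request: dict, candidate: dict) -> float:
    score = 2.0 if request["technique_id"] == candidate["technique_id"] else 0.0
    a = set(request["description"].lower().split())
    b = set(candidate["description"].lower().split())
    score += len(a & b) / max(len(a | b), 1)
    return score

req = {"technique_id": "T1059.001", "description": "powershell encoded command execution"}
cand = {"technique_id": "T1059.001", "description": "detect powershell encoded command"}
print(round(similarity(req, cand), 2))
```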
Store your preferences for future use:
learn_from_feedback({
feedback_type: "rba_score",
original: "72",
corrected: "85",
context: "Higher risk in our environment"
})

Why: Your corrections improve future template generation:
- System learns your organization's preferences
- Future templates match your style
- Reduces manual customization over time
When:
- After modifying generated templates
- When you have organization-specific preferences
- When you want to standardize across team
When documenting detections, reference the patterns used:
description: |
This detection uses patterns learned from 69 existing T1059.001 detections.
Most common data model: Endpoint.Processes
Common fields: process_name, dest, user, command_line

Why:
- Provides context for future maintainers
- Documents decision rationale
- Helps with troubleshooting
Before writing queries, check available fields:
get_field_reference({
data_model: "Endpoint.Processes"
})

Why:
- Ensures you use correct field names
- Identifies most commonly used fields
- Provides usage examples
When:
- Writing new queries
- Uncertain about field names
- Want to see field usage patterns
Always use standard macros:
- `security_content_summariesonly` with `tstats`
- `drop_dm_object_name` after data model queries
- `security_content_ctime` for time fields
- `detection_name_filter` at the end
Why:
- Ensures consistency
- Follows repository conventions
- Improves query performance
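The macro ordering above can be made concrete with a small builder. This is a sketch, not the project's actual generator; the macro names and ordering come from the list above, while the builder function and the filter macro name passed in are hypothetical:

```python
# Sketch of a query builder that applies the standard macros in the
# conventional order (builder and filter macro name are hypothetical).
def build_spl(data_model: str, where_clause: str, by_fields: list, filter_macro: str) -> str:
    obj = data_model.split(".", 1)[1]                     # "Endpoint.Processes" -> "Processes"
    by = " ".join(f"{obj}.{f}" for f in by_fields)
    return "\n".join([
        "| tstats `security_content_summariesonly` count "
        "min(_time) as firstTime max(_time) as lastTime",
        f"  from datamodel={data_model} where {where_clause} by {by}",
        f"| `drop_dm_object_name({obj})`",                # strip the object prefix
        "| `security_content_ctime(firstTime)`",          # readable timestamps
        "| `security_content_ctime(lastTime)`",
        f"| `{filter_macro}`",                            # detection-specific filter last
    ])

query = build_spl(
    "Endpoint.Processes",
    'Processes.process_name IN ("powershell.exe", "pwsh.exe")',
    ["dest", "user", "process_name", "command_line"],
    "windows_powershell_execution_filter",                # placeholder macro name
)
print(query)
```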
Before deploying, validate:
- Replace placeholder UUID
- Customize WHERE clause for specific behavior
- Adjust RBA scores based on environment
- Add false positive tuning
- Update test data URLs
Why:
- Templates are starting points, not final detections
- Customization ensures accuracy
- Validation prevents errors
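Parts of this checklist can be automated. A sketch under assumed field names (the template dict shape here is hypothetical, not the system's actual schema):

```python
import re

# Pre-deployment checks for the checklist above (template field names are
# assumptions, not the system's actual schema).
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

def validate_template(template: dict) -> list:
    issues = []
    if not UUID_RE.fullmatch(template.get("id", "")):
        issues.append("replace the placeholder UUID")
    if "where" not in template.get("search", "").lower():
        issues.append("customize the WHERE clause for the specific behavior")
    if not template.get("rba", {}).get("risk_objects"):
        issues.append("add RBA risk objects and environment-appropriate scores")
    return issues

draft = {"id": "PLACEHOLDER", "search": "| tstats ...", "rba": {}}
print(validate_template(draft))
```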
Build tribal knowledge:
- Log feedback for all customizations
- Share learnings with team
- Document organization-specific preferences
Why:
- Improves system for everyone
- Builds organizational knowledge
- Reduces future manual work
The following diagram illustrates how patterns flow through the Detection Engineering Intelligence system:
graph TD
A[Indexed Detections] --> B[Pattern Extraction]
B --> C[SPL Parser]
B --> D[Sigma Parser]
B --> E[KQL Parser]
B --> F[Elastic Parser]
C --> G[Pattern Database]
D --> G
E --> G
F --> G
G --> H[Field Reference]
G --> I[Macro Reference]
G --> J[Style Conventions]
K[User Request] --> L[get_query_patterns]
K --> M[get_field_reference]
K --> N[get_macro_reference]
K --> O[find_similar_detections]
L --> P[Pattern Retrieval]
M --> P
N --> P
O --> P
P --> Q[suggest_detection_template]
Q --> R[Template Generation]
R --> S[SPL Query Builder]
R --> T[RBA Structure Generator]
R --> U[Metadata Generator]
S --> V[Generated Template]
T --> V
U --> V
V --> W[User Customization]
W --> X[learn_from_feedback]
X --> J
style A fill:#e1f5ff
style G fill:#fff4e1
style V fill:#e8f5e9
style J fill:#fce4ec
Flow Explanation:
- Pattern Extraction (Top): Detections are parsed by source-specific parsers (SPL, Sigma, KQL, Elastic)
- Pattern Storage (Middle): Extracted patterns stored in database, organized into field reference, macro reference, and style conventions
- Pattern Retrieval (Left): User requests patterns via retrieval tools
- Template Generation (Right): System uses patterns to generate detection templates
- Learning Loop (Bottom): User feedback improves future generations by updating style conventions
Detection Engineering Intelligence transforms detection engineering from a manual, knowledge-intensive process into an automated, pattern-driven workflow. By learning from 10,235+ existing detection patterns, the system generates high-quality templates that follow established conventions and best practices.
Key Benefits:
- Faster development - Generate templates in seconds
- Consistent quality - All detections follow learned best practices
- Continuous improvement - System learns from user feedback
- Reduced errors - Avoid common mistakes by learning from existing detections
Next Steps:
- Run `extract_patterns` to populate the pattern database
- Use `get_query_patterns` to explore existing patterns for your techniques
- Generate templates with `suggest_detection_template`
- Customize templates and provide feedback with `learn_from_feedback`
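The four steps above chain naturally in client code. A sketch driven by canned responses (the `call_tool` stub and its return values are hypothetical; only the tool names and arguments come from this document):

```python
# End-to-end sketch of the four steps above, driven by canned responses
# (the call_tool stub and its return values are hypothetical).
CANNED = {
    "extract_patterns": {"status": "ok"},
    "get_query_patterns": {"found": True,
                           "patterns": {"most_common_data_model": "Endpoint.Processes"}},
    "suggest_detection_template": {"name": "Windows PowerShell Execution",
                                   "rba_score": 72},
    "learn_from_feedback": {"stored": True},
}

def call_tool(name, args):
    return CANNED[name]                                    # stand-in for a real client

call_tool("extract_patterns", {"force": False})            # 1. populate the database
patterns = call_tool("get_query_patterns", {"technique_id": "T1059.001"})
template = call_tool("suggest_detection_template", {       # 2./3. explore and generate
    "technique_id": "T1059.001",
    "data_model": patterns["patterns"]["most_common_data_model"],
})
feedback = call_tool("learn_from_feedback", {              # 4. feed corrections back
    "feedback_type": "rba_score", "original": "72", "corrected": "85",
})
print(template["name"], feedback["stored"])
```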
For more information, see the Tool Reference section for detailed tool documentation.