Description
Describe the bug
The domain rule in the YARA ruleset matches unintended strings that are not actual domains. This leads to false positives when scanning files that contain generic words, filenames, or localhost-like addresses.
To Reproduce
Steps to reproduce the behavior:
Run YARA scan with the domain rule enabled.
Scan a file that contains common words, filenames, or IP addresses.
Observe that many non-domain strings are detected.
Example false positives:
test-123
file.txt
localhost
random_text
All these strings are incorrectly flagged as domains.
Expected behavior
The domain rule should only match valid domains, such as example.com, sub.example.net, or test-site.org. It should not match:
Plain text words
Filenames like file.txt
Localhost or internal references
Additional context
The issue is caused by the overly broad regex pattern:
$domain_regex = /([\w.-]+)/ wide ascii
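This matches any run of one or more word characters (letters, digits, underscore), dots, or hyphens. Nothing in the pattern requires a dot or a TLD to be present, so almost any token in a file is reported as a domain.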
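To make the breadth of this pattern concrete, here is a minimal Python sketch that applies the same expression to the false positives listed above. It assumes Python's `re` module behaves like YARA's regex engine for this simple pattern, and it leaves out the `wide ascii` modifiers; the names `BROAD`, `samples`, and `broad_hits` are only for illustration.

```python
import re

# Same expression as the YARA string, without the wide/ascii modifiers.
# Assumption: Python's `re` behaves like YARA's engine for this pattern.
BROAD = re.compile(r"[\w.-]+")

# False positives from this report, plus one real domain for comparison.
samples = ["test-123", "file.txt", "localhost", "random_text", "example.com"]

# fullmatch() checks whether the pattern consumes the entire sample.
broad_hits = [s for s in samples if BROAD.fullmatch(s)]

print(broad_hits)
```

Every sample, domain or not, is consumed in full by the pattern, which is why the rule cannot tell them apart.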
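Suggested Fix: Update the regex to a stricter pattern that requires a literal dot followed by an alphabetic TLD. Note that the dot must be escaped, otherwise it matches any character:
$domain_regex = /([a-zA-Z0-9-]+\.[a-zA-Z]{2,6})/ wide ascii
This no longer matches test-123, localhost, or random_text. It would still match file.txt, because txt is syntactically indistinguishable from a TLD; excluding filenames would need an additional check, for example against a list of known TLDs.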
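As a rough check of the stricter pattern, the following Python sketch runs it over the strings mentioned in this report. It assumes the dot is escaped as `\.`, that Python's `re` matches YARA's behaviour for this pattern, and that the `wide ascii` modifiers can be ignored; `STRICT`, `samples`, and `strict_hits` are illustrative names. It uses `search()` because YARA reports a match anywhere in the scanned data.

```python
import re

# Stricter pattern with the dot escaped; wide/ascii modifiers omitted.
# Assumption: Python's `re` behaves like YARA's engine for this pattern.
STRICT = re.compile(r"[a-zA-Z0-9-]+\.[a-zA-Z]{2,6}")

samples = [
    "test-123", "file.txt", "localhost", "random_text",  # reported false positives
    "example.com", "sub.example.net", "test-site.org",   # expected matches
]

# search() finds the pattern anywhere in the sample, like a scan would.
strict_hits = [s for s in samples if STRICT.search(s)]

print(strict_hits)
```

In this sketch the plain words and `localhost` drop out, but `file.txt` is still reported, so the pattern alone does not cover the filename case from the expected behavior.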