bigsnarfdude/Malware-Probabilistic-Data-Structres

Malware-Probabilistic-Data-Structures

NSRL Bloom filter, Mandiant Bloom filter, and HyperLogLog malware data structures

Bloom filters save space: millions of NSRL MD5 hashes fit in about 18 MB (17 MiB) instead of the 2.6 GB source file.

In [44]: ls -la NSRLGood.bloomfilter
-rwxr-xr-x  1 antigen  staff  17973266 12 Mar 13:15 NSRLGood.bloomfilter

In [45]: ls -la NSRLFile.txt
-r--r--r--@ 1 antigen  staff  2611139266 30 Jan 14:00 NSRLFile.txt
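
As a rough check on the space saving, the standard sizing formula for a Bloom filter can be sketched as below. The function names are hypothetical and are not part of this repository; the formula covers only the bit array and ignores any file header a particular library may add.

```python
import math


def bloom_size_bytes(capacity, error_rate):
    """Bytes needed for the bit array of an optimally sized Bloom filter.

    Uses m = -n * ln(p) / (ln 2)^2 bits for n items at false-positive rate p.
    """
    bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
    return math.ceil(bits / 8)


def optimal_hash_count(error_rate):
    """Number of hash functions that minimises false positives: k = log2(1/p)."""
    return max(1, round(-math.log2(error_rate)))


# Illustrative capacities only; the capacity behind the listed file is an assumption.
for capacity in (10_000_000, 100_000_000):
    size_mb = bloom_size_bytes(capacity, 0.001) / 1e6
    print(f"{capacity:>11,} items at 0.1% error: ~{size_mb:.0f} MB, "
          f"{optimal_hash_count(0.001)} hash functions")
```

At an error rate of 0.001 the formula gives about 14.4 bits per item, so 10 million entries need roughly 18 MB and 100 million roughly 180 MB. The capacity actually used to build the NSRLGood.bloomfilter file listed above is not stated here.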

## Class ingests MD5 data from NSRLFile and updates the Bloom filter for fast in-memory queries

Example:

>>> from NSRLmd5BloomFilter import MD5BloomFilter
>>> filename = '~/Downloads/unique/NSRLFile.txt'
>>> size = 100000000
>>> error = 0.001
>>> bloom_filename = 'NSRLgood.bloomfilter'
>>> hash_bloom = MD5BloomFilter(size, error, bloom_filename)
>>> hash_bloom.process()
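>>> # NSRL_good is not defined above; it presumably refers to the populated Bloom filter
>>> print "F16FF81271ADA49847E6EB6BB9CB8A90" in NSRL_good  # True: hash is in the filter
>>> print 'testTESTtest' in NSRL_good  # False: not in the filter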
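
To illustrate what "ingest, then query" means for a Bloom filter, here is a minimal pure-Python sketch (Python 3). It is not the `MD5BloomFilter` class from this repository: `ToyBloomFilter`, its double-hashing scheme, and the sample hash list are all assumptions made for illustration.

```python
import hashlib
import math


class ToyBloomFilter:
    """Minimal in-memory Bloom filter (illustrative, not the repository's class)."""

    def __init__(self, capacity, error_rate):
        # Standard sizing: m bits and k hash functions for n items at rate p.
        self.num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Sample MD5 strings standing in for lines read from a hash list.
sample_hashes = [
    "F16FF81271ADA49847E6EB6BB9CB8A90",
    "D41D8CD98F00B204E9800998ECF8427E",
]

known_good = ToyBloomFilter(capacity=1000, error_rate=0.001)
for md5 in sample_hashes:
    known_good.add(md5.upper())  # normalise case so lookups match

print("F16FF81271ADA49847E6EB6BB9CB8A90" in known_good)  # True: added items are always found
print("testTESTtest" in known_good)  # a false positive is possible but very unlikely
```

A Bloom filter never reports a false negative, so a "not in filter" answer is definitive, while an "in filter" answer is correct only up to the configured error rate.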

## TODO

Figure out how to load an existing, already-populated .bloomfilter file and query it without re-ingesting the source data.
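
One possible approach to the reload problem, sketched in pure Python 3 under stated assumptions: the filter's parameters are written in a small header in front of the bit array, so the file can be reopened later. The `PersistentBloom` class and its file layout are hypothetical; this is not the on-disk format of NSRLGood.bloomfilter, and the Bloom filter library used by this repository may offer its own way to reopen a file.

```python
import hashlib
import os
import struct
import tempfile

# Hypothetical header: number of bits (uint64) and number of hashes (uint32).
HEADER = struct.Struct("<QI")


class PersistentBloom:
    """Bloom filter that can be saved to and reloaded from a file (illustrative)."""

    def __init__(self, num_bits, num_hashes, bits=None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits if bits is not None else bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

    def save(self, path):
        with open(path, "wb") as fh:
            fh.write(HEADER.pack(self.num_bits, self.num_hashes))
            fh.write(self.bits)

    @classmethod
    def load(cls, path):
        # Read the parameters back first, then the raw bit array.
        with open(path, "rb") as fh:
            num_bits, num_hashes = HEADER.unpack(fh.read(HEADER.size))
            return cls(num_bits, num_hashes, bytearray(fh.read()))


path = os.path.join(tempfile.mkdtemp(), "demo.bloomfilter")
original = PersistentBloom(num_bits=14378, num_hashes=10)
original.add("F16FF81271ADA49847E6EB6BB9CB8A90")
original.save(path)

reloaded = PersistentBloom.load(path)
print("F16FF81271ADA49847E6EB6BB9CB8A90" in reloaded)  # True
```

The key point is that the hash count and bit-array length must be stored alongside the bits; reloading with different parameters would map items to different positions and silently break lookups.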
