NSRL BloomFilter, Mandiant BloomFilter, HyperLogLog Malware Data Structure
Bloom filters save space: millions of NSRL MD5s fit in about 17 MB instead of 2.6 GB, at the cost of a small, configurable false-positive rate.
In [44]: ls -la NSRLGood.bloomfilter
-rwxr-xr-x 1 antigen staff 17973266 12 Mar 13:15 NSRLGood.bloomfilter
In [45]: ls -la NSRLFile.txt
-r--r--r--@ 1 antigen staff 2611139266 30 Jan 14:00 NSRLFile.txt
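Where the space saving comes from can be estimated with the textbook Bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits for n entries at false-positive rate p. The sketch below is only that formula; `bloom_size` is a hypothetical helper, and the library behind `MD5BloomFilter` may round or pad differently.

```python
import math

def bloom_size(capacity, error_rate):
    """Return (bits, num_hashes) for a classic Bloom filter.

    Uses the standard formulas m = -n*ln(p)/(ln 2)^2 and k = (m/n)*ln 2.
    """
    bits = int(math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2)))
    num_hashes = int(round(float(bits) / capacity * math.log(2)))
    return bits, num_hashes

# 10 million entries at a 0.1% false-positive rate
bits, num_hashes = bloom_size(10000000, 0.001)
megabytes = bits / 8.0 / 1e6
print("%.1f MB, %d hash functions" % (megabytes, num_hashes))
```

By this formula, 10 million entries at an error rate of 0.001 need roughly 18 MB, and 100 million entries need roughly 180 MB, so the size argument directly drives the size of the file on disk.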
##Class ingests MD5 data from NSRLFile and updates bloomfilter for fast in-memory queries
Example:
>>> from NSRLmd5BloomFilter import MD5BloomFilter
>>> filename = '~/Downloads/unique/NSRLFile.txt'
>>> size = 100000000
>>> error = 0.001
>>> bloom_filename = 'NSRLGood.bloomfilter'
>>> hash_bloom = MD5BloomFilter(size, error, bloom_filename)
>>> hash_bloom.process()
>>> print("F16FF81271ADA49847E6EB6BB9CB8A90" in hash_bloom)  # True: known NSRL hash
>>> print('testTESTtest' in hash_bloom)  # False, barring a rare false positive
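The ingest step could look roughly like the sketch below. It is not the code inside `MD5BloomFilter`; it assumes NSRLFile.txt is CSV with a header row that contains a column named `MD5`. `iter_md5s` is a hypothetical helper, and the sample row is made up apart from the MD5 used in the example above.

```python
import csv

def iter_md5s(lines):
    """Yield upper-cased MD5 strings from NSRL-style CSV lines.

    Assumes the first line is a header that names an "MD5" column.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return
    md5_col = header.index("MD5")
    for row in reader:
        if len(row) > md5_col:
            yield row[md5_col].upper()

# Illustrative data only; the non-MD5 fields are placeholders.
sample = [
    '"SHA-1","MD5","CRC32","FileName","FileSize","ProductCode","OpSystemCode","SpecialCode"',
    '"0000000000000000000000000000000000000000","F16FF81271ADA49847E6EB6BB9CB8A90","00000000","example.bin",1,1,"358",""',
]
known = set(iter_md5s(sample))
print("F16FF81271ADA49847E6EB6BB9CB8A90" in known)
```

For the real file, pass an open file object instead of the list, so the 2.6 GB input is streamed line by line rather than read into memory.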
##TODO
Figure out how to load an existing .bloomfilter file that is already populated with data and query it directly, without re-ingesting NSRLFile.txt.
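One way to think about this TODO is as a save/load round trip: write the bit array and its parameters to disk once, then read them back and query. The sketch below shows the idea with a toy pure-Python filter. `TinyBloom`, its on-disk layout, and the file name `demo.bloomfilter` are all hypothetical and are not compatible with the NSRLGood.bloomfilter file above.

```python
import hashlib
import os
import struct
import tempfile

class TinyBloom(object):
    """Toy Bloom filter for illustration; not the MD5BloomFilter class."""

    def __init__(self, num_bits, num_hashes, bits=None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits if bits is not None else bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1, h2 = struct.unpack(">QQ", digest[:16])
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

    def save(self, path):
        # 12-byte header (num_bits, num_hashes) followed by the raw bit array.
        with open(path, "wb") as fh:
            fh.write(struct.pack(">QI", self.num_bits, self.num_hashes))
            fh.write(self.bits)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as fh:
            num_bits, num_hashes = struct.unpack(">QI", fh.read(12))
            return cls(num_bits, num_hashes, bytearray(fh.read()))

path = os.path.join(tempfile.mkdtemp(), "demo.bloomfilter")
bloom = TinyBloom(num_bits=10000, num_hashes=7)
bloom.add("F16FF81271ADA49847E6EB6BB9CB8A90")
bloom.save(path)

reloaded = TinyBloom.load(path)
found = "F16FF81271ADA49847E6EB6BB9CB8A90" in reloaded
print(found)
```

If `MD5BloomFilter` wraps a library such as pybloomfiltermmap (an assumption; nothing above confirms it), that library's `BloomFilter.open(filename)` may be the call to look at for reopening an existing file.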