This one is just for fun. The simple conclusion is that there are many false positives, because Zeus and Crypto behave similarly.
Best parameters for classifying this data using Random Forest:
Grid score: 0.5354242509101093
Grid parameters:
'clf-rf__random_state': 42,
'tfidf__use_idf': False,
'vect__ngram_range': (1, 1)
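The parameter keys above suggest a scikit-learn Pipeline with steps named vect, tfidf and clf-rf, tuned with a grid search. A minimal sketch of such a setup follows, assuming CountVectorizer, TfidfTransformer and RandomForestClassifier behind those step names. The candidate values in the grid (other than the reported best ones), n_estimators, the cv setting and the toy documents are all hypothetical and only there to make the example run.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


def build_search(cv=2):
    """Build a grid search over a text pipeline (step names taken from the notes)."""
    pipeline = Pipeline([
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        # n_estimators is an assumption; the notes do not state it.
        ("clf-rf", RandomForestClassifier(n_estimators=10)),
    ])
    # Candidate values are assumptions; only the best ones are reported.
    param_grid = {
        "vect__ngram_range": [(1, 1), (1, 2)],
        "tfidf__use_idf": [True, False],
        "clf-rf__random_state": [42],
    }
    return GridSearchCV(pipeline, param_grid, cv=cv)


# Hypothetical toy documents standing in for the real feature strings.
docs = [
    "encrypt file delete shadow copy",
    "encrypt file ransom note",
    "encrypt disk ransom note",
    "delete shadow copy encrypt disk",
    "inject browser steal credential",
    "steal credential post http",
    "inject browser post http",
    "hook browser steal credential",
]
labels = ["Crypto"] * 4 + ["Zeus"] * 4

search = build_search()
search.fit(docs, labels)
print(search.best_score_)
print(search.best_params_)
```

best_score_ and best_params_ correspond to the "Grid score" and "Grid parameters" lines above.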
Accuracy of RF classifier on training set: 0.68
Accuracy of RF classifier on test set: 0.62
Classification report:
              precision    recall  f1-score   support
        APT1       0.96      0.89      0.92        75
      Crypto       0.57      0.75      0.65       513
      Locker       0.97      0.54      0.69       114
        Zeus       0.59      0.46      0.52       489
 avg / total       0.64      0.62      0.62      1191
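The "avg / total" row appears to be the support-weighted average of the per-class rows. The short check below recomputes it from the numbers in the report; the variable and function names are just for illustration.

```python
# Per-class rows copied from the report: (precision, recall, f1-score, support).
report = {
    "APT1": (0.96, 0.89, 0.92, 75),
    "Crypto": (0.57, 0.75, 0.65, 513),
    "Locker": (0.97, 0.54, 0.69, 114),
    "Zeus": (0.59, 0.46, 0.52, 489),
}


def weighted_average(rows, column):
    """Average one metric column, weighting each class by its support."""
    total_support = sum(row[3] for row in rows.values())
    weighted_sum = sum(row[column] * row[3] for row in rows.values())
    return weighted_sum / total_support


total_support = sum(row[3] for row in report.values())
averages = [round(weighted_average(report, col), 2) for col in range(3)]
print(total_support, averages)
```

The weighted recall is the same quantity as overall accuracy, which is why it matches the 0.62 test accuracy.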
Confusion Matrix for training using Random Forest:
Confusion Matrix for testing using Random Forest:
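Without the matrix itself, the number of false positives per class on the test set can be roughly estimated from the classification report: true positives are about recall times support, and predicted positives are about true positives divided by precision. This is only an approximation, since the report values are rounded to two decimals.

```python
# Per-class rows copied from the report: (precision, recall, support).
report = {
    "APT1": (0.96, 0.89, 75),
    "Crypto": (0.57, 0.75, 513),
    "Locker": (0.97, 0.54, 114),
    "Zeus": (0.59, 0.46, 489),
}


def estimate_false_positives(precision, recall, support):
    """Estimate false positives from rounded precision, recall and support."""
    true_positives = recall * support
    predicted_positives = true_positives / precision
    return predicted_positives - true_positives


false_positives = {
    name: estimate_false_positives(*row) for name, row in report.items()
}
for name, count in false_positives.items():
    print(f"{name}: about {count:.0f} false positives")
```

By this estimate nearly all false positives fall on Crypto and Zeus, which fits the conclusion that the two are confused with each other.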
Using dataset from https://github.com/marcoramilli/MalwareTrainingSets
