A tool that parses logs generated by Fortify products and classifies them by log type using machine learning algorithms.
The tool performs four main tasks to take in a log file and output its log type:
- Parse log files and store them in a database
- Convert the parsed log files into a list of features, then store the features in the database
- Train machine learning models on the features in the database
- Predict the type of a new log file based on the trained models
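
The four tasks above can be sketched end to end as follows. This is only a minimal illustration, not the tool's actual code: the function names, the word-count features, and the hand-written multinomial naive Bayes classifier are all assumptions, the database step is replaced by an in-memory list, and the sample strings are placeholders rather than real Fortify log output. It sticks to the standard library and avoids f-strings so it runs on Python 3.5.

```python
import math
import re
from collections import Counter, defaultdict


def parse_log(text):
    """Task 1 (hypothetical): split raw log text into non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def extract_features(lines):
    """Task 2 (hypothetical): count lowercase word tokens across all lines."""
    return Counter(re.findall(r"[a-z]+", " ".join(lines).lower()))


def train(labeled_features):
    """Task 3: fit a multinomial naive Bayes model from (label, features) pairs."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for label, features in labeled_features:
        label_counts[label] += 1
        token_counts[label].update(features)
        vocabulary.update(features)
    return {"labels": label_counts, "tokens": token_counts, "vocab": vocabulary}


def predict(model, features):
    """Task 4: return the label with the highest log-probability."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["labels"].items():
        counts = model["tokens"][label]
        total_tokens = sum(counts.values())
        score = math.log(doc_count / total_docs)
        for token, n in features.items():
            if token not in model["vocab"]:
                continue  # ignore tokens never seen during training
            # Add-one smoothing keeps unseen (label, token) pairs above zero.
            score += n * math.log((counts[token] + 1) / (total_tokens + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Placeholder training data; "type_a" and "type_b" are made-up labels.
training = [
    ("type_a", extract_features(parse_log("user login ok\nsession opened\n"))),
    ("type_b", extract_features(parse_log("scan started\nscan finished\n"))),
]
model = train(training)
print(predict(model, extract_features(parse_log("session login"))))  # prints: type_a
```

In the real tool the parsed logs and features are stored in the database between steps, so each task can be run separately.
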
It may be easiest to start from this Python 3.5 install: https://www.continuum.io/downloads. It has a lot of the libraries pre-loaded, though there are still a few you will need to download.

A SQL Server database must already exist to store the parsed logs and their features. Once you have one, replace the servername and databasename variables in /storage/storage.py with your server's hostname and your database's name. The default database name I’ve gone with is ‘parsedlogs’, but you can change that.
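
As a rough illustration of how those two variables might be used, the sketch below builds an ODBC connection string from them. The helper name, the driver name, and the use of Windows authentication are assumptions; /storage/storage.py may connect in a different way.

```python
# Hypothetical values; the real variables live in /storage/storage.py.
servername = "localhost"
databasename = "parsedlogs"


def build_connection_string(server, database):
    """Build an ODBC connection string that uses Windows authentication."""
    return (
        "DRIVER={{ODBC Driver 17 for SQL Server}};"
        "SERVER={server};DATABASE={database};Trusted_Connection=yes;"
    ).format(server=server, database=database)


connection_string = build_connection_string(servername, databasename)
# With the pyodbc package installed, this string could be passed to
# pyodbc.connect(connection_string) to open the connection.
```
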
Also, a quick breakdown of the important files:
- /classes folder is where the log parsing happens
- /storage/storage.py is where the connection from the log parser to the database is managed. This is where most of the database queries are (along with dbtrainer.py)
- /training_logs is where the logs to be trained on are stored. For SSC, they’re split up by category so that we can detect that eventually. The SCA and DSCA folders haven’t been split up yet.
- dbtrainer.py is the machine learning front end to the database. It manages the storage and retrieval of log files. It also loads the models and trains them on the parsed log data.
- controlla.py is the interface between the GUI front end and dbtrainer.py. It also manages the threading.
- /gui/main.py is the GUI for the application.
- /nb_models is where the trained classifiers are saved.
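
To illustrate the kind of threading controlla.py is described as managing, here is a minimal sketch of running a long job on a worker thread so that a GUI stays responsive. The names run_in_background and fake_training are hypothetical and do not come from the repository; the actual implementation may be structured differently.

```python
import queue
import threading


def run_in_background(task, *args):
    """Run task(*args) on a worker thread; return the thread and a result queue."""
    results = queue.Queue()

    def worker():
        try:
            results.put(("ok", task(*args)))
        except Exception as error:  # report failures instead of losing them
            results.put(("error", error))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, results


def fake_training(n_logs):
    """Hypothetical stand-in for a long-running training call in dbtrainer.py."""
    return "trained on {} logs".format(n_logs)


thread, results = run_in_background(fake_training, 3)
status, value = results.get(timeout=5)  # blocks here only for demonstration
```

A GUI would normally poll the queue with `results.get_nowait()` from its event loop instead of blocking on `get()`.
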