The trained models and the dataset are too large to upload to GitHub.
To replicate the setup, create the following folders:
- data
- model_checkpoints
- test
- train
Note that data holds a smaller subset of the dataset. I used it for quicker analysis while the entire dataset was unpacking, which took hours.
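If you prefer to script the folder setup, here is a minimal sketch. It assumes the folders belong in the repository root, which this README does not state. The function name `create_layout` and the constant `REQUIRED_DIRS` are hypothetical, chosen only for illustration. The script creates empty folders and nothing more. You still need to obtain the models and the dataset separately and place them inside these folders.

```python
from pathlib import Path

# Folder names from the list in this README. Creating them under the
# repository root is an assumption.
REQUIRED_DIRS = ["data", "model_checkpoints", "test", "train"]


def create_layout(root: str = ".") -> list[Path]:
    """Create the expected folders under `root`, skipping any that already exist."""
    base = Path(root)
    created = []
    for name in REQUIRED_DIRS:
        path = base / name
        # exist_ok=True makes the call safe to run more than once.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_layout():
        print(f"ready: {path}")
```

Because `exist_ok=True` is set, rerunning the script after you have copied data in leaves existing folders and their contents untouched.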