- Multiclass-Classification.ipynb, code to preprocess datasets and train multiclass classifiers. This notebook contains all the classification models I built based on my factuality factors.
To run Multiclass-Classification.ipynb, first download the following NLTK resources:
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
- Liar Classification.ipynb, code to preprocess datasets and train a multiclass classifier. This notebook contains the Liar Hackathon.
- politifact_plus_data.csv, data newly scraped from Politifact
- train2.tsv, test2.tsv, val2.tsv, data originally from the Liar-Plus dataset.
The politifact_plus_data dataset contains about 890 newly scraped records from Politifact.com. Each record has a label, a statement, and a justification taken from the TRUTH-O-METER on Politifact. The justification is scraped from the summary of the linked fact-check article.
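A minimal sketch of reading the scraped file, assuming politifact_plus_data.csv has a header row whose columns are named `label`, `statement`, and `justification`. Those column names are an assumption (only the three fields are described above), and skipping rows with an empty field is an assumed cleaning rule, not necessarily what the notebooks do.

```python
import csv
from typing import Iterable


def load_politifact_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Read label/statement/justification records from CSV lines.

    Assumes a header row with hypothetical column names
    'label', 'statement', and 'justification'.
    """
    rows = []
    for record in csv.DictReader(lines):
        label = (record.get("label") or "").strip()
        statement = (record.get("statement") or "").strip()
        justification = (record.get("justification") or "").strip()
        # Assumed cleaning rule: drop records where scraping left a field empty.
        if not (label and statement and justification):
            continue
        rows.append(
            {"label": label, "statement": statement, "justification": justification}
        )
    return rows
```

Because `csv.DictReader` accepts any iterable of strings, the function works on an open file handle (for example `open("politifact_plus_data.csv", newline="", encoding="utf-8")`) as well as on a list of lines in a quick test.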
The files test2.tsv, train2.tsv, and val2.tsv come from the Liar-Plus dataset.
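The sketch below pulls the same three fields out of the Liar-Plus TSV files. The column positions are an assumption based on the public Liar-Plus layout (a leading row index, the original LIAR columns, and the justification appended as the last column), so check them against your copy of the files before relying on them.

```python
import csv
from typing import Iterable

# Assumed 0-based Liar-Plus column positions; verify against the actual files.
LABEL_COL = 2
STATEMENT_COL = 3
JUSTIFICATION_COL = 15


def load_liar_plus_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Extract label/statement/justification from Liar-Plus TSV lines."""
    rows = []
    # QUOTE_NONE stops stray quote characters inside statements
    # from being parsed as field quoting.
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) <= JUSTIFICATION_COL:
            continue  # row too short: justification column is missing
        label = fields[LABEL_COL].strip()
        statement = fields[STATEMENT_COL].strip()
        justification = fields[JUSTIFICATION_COL].strip()
        if label and statement and justification:
            rows.append(
                {"label": label, "statement": statement, "justification": justification}
            )
    return rows
```

To cover all three splits, call it on each of train2.tsv, val2.tsv, and test2.tsv (opened with `newline=""`) and concatenate the returned lists.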
A total of 11129 records were used to train the 6-label classification model.
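The six classes are the LIAR truthfulness labels. A sketch of mapping them to integer ids for training follows; the normalization of scraped Politifact spellings such as "Pants on Fire!" or "Mostly True" to the Liar-Plus label names is an assumption about how the two sources are merged.

```python
# The six LIAR / Liar-Plus labels, ordered from least to most truthful.
LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}


def normalize_label(raw: str) -> str:
    """Map a scraped label like 'Pants on Fire!' to a Liar-Plus name (assumed rule)."""
    text = raw.strip().lower().rstrip("!").replace(" ", "-")
    if text == "pants-on-fire":
        return "pants-fire"
    return text


def encode_labels(raw_labels: list[str]) -> list[int]:
    """Convert raw label strings to class ids, rejecting anything unexpected."""
    ids = []
    for raw in raw_labels:
        name = normalize_label(raw)
        if name not in LABEL_TO_ID:
            # e.g. Politifact's flip-flop ratings are not one of the six classes
            raise ValueError(f"unknown label: {raw!r}")
        ids.append(LABEL_TO_ID[name])
    return ids
```

Raising on unknown labels, rather than silently dropping them, makes it easy to see which scraped ratings fall outside the six classes; `collections.Counter` over the returned ids is a quick check of class balance.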
The model design references the Triple Branch BERT Siamese Network.
Instead of using three BERT branches, I connected only two to speed up processing. The two branches tokenize the statement and justification columns, respectively, and together they predict the labels.
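A minimal PyTorch sketch of a two-branch BERT classifier, assuming Hugging Face `transformers` is installed. The class name, the use of two separate (unshared) encoders, and fusion by concatenating the pooled `[CLS]` outputs are assumptions for illustration, not the notebook's actual code. The demo uses a tiny randomly initialized config so it runs without downloading weights.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class TwoBranchBertClassifier(nn.Module):
    """Hypothetical two-branch model: statement and justification encoders
    whose pooled outputs are concatenated and fed to one linear head."""

    def __init__(self, config: BertConfig, num_labels: int = 6, dropout: float = 0.1):
        super().__init__()
        # Assumption: each branch has its own BERT weights.
        self.statement_encoder = BertModel(config)
        self.justification_encoder = BertModel(config)
        self.dropout = nn.Dropout(dropout)
        # Both pooled vectors side by side -> one logit per label.
        self.classifier = nn.Linear(2 * config.hidden_size, num_labels)

    def forward(self, statement_ids, statement_mask, justification_ids, justification_mask):
        s = self.statement_encoder(
            input_ids=statement_ids, attention_mask=statement_mask
        ).pooler_output
        j = self.justification_encoder(
            input_ids=justification_ids, attention_mask=justification_mask
        ).pooler_output
        features = torch.cat([s, j], dim=-1)
        return self.classifier(self.dropout(features))


if __name__ == "__main__":
    # Tiny random config for a smoke test only; not a realistic model size.
    tiny = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                      num_attention_heads=2, intermediate_size=64)
    model = TwoBranchBertClassifier(tiny)
    ids = torch.randint(0, 100, (2, 8))
    mask = torch.ones_like(ids)
    print(model(ids, mask, ids, mask).shape)  # torch.Size([2, 6])
```

For real training, each `BertModel(config)` would typically be replaced with `BertModel.from_pretrained(...)` on a pretrained checkpoint, and the statement and justification columns tokenized separately with the matching tokenizer. Dropping the third branch removes one full encoder forward and backward pass per example, which is where the speedup comes from.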