# Fine-tuning DistilBERT on senator tweets

A guide to fine-tuning DistilBERT on the tweets of American Senators with snscrape, SQLite, and Transformers (PyTorch) on Google Colab.

Built in Python 🐍 using 🤗 Transformers and deployed on Streamlit 🎈 (coming soon!).

Read the Medium article here.

## Code

- Part 1: Creating the dataset - `get_tweets.ipynb`
- Part 2: Fine-tuning DistilBERT - `finetune_distilbert_senator_tweets_pt.ipynb`
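In Part 1, the scraped tweets are stored in SQLite before fine-tuning. A minimal sketch of that step using Python's built-in `sqlite3` module, assuming a simple `tweets` table (the actual schema in `get_tweets.ipynb` may differ):

```python
import sqlite3

def save_tweets(db_path, rows):
    """Insert (handle, date, text) rows into a local SQLite database.

    Hypothetical schema for illustration; the notebook's table layout may differ.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS tweets (handle TEXT, date TEXT, text TEXT)"
    )
    con.executemany("INSERT INTO tweets VALUES (?, ?, ?)", rows)
    con.commit()
    return con

# Usage: an in-memory database with one example row
con = save_tweets(":memory:", [("SenExample", "2021-06-01", "Hello from the floor.")])
print(con.execute("SELECT COUNT(*) FROM tweets").fetchone()[0])  # 1
```

Storing the raw tweets in a database like this makes it easy to re-query and re-label the dataset without re-scraping.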

## Sample

All ~100,000 tweets posted in 2021 by the 100 United States Senators, scraped by me.
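The tweets were collected with snscrape. A rough sketch of how a per-senator 2021 search query can be built and scraped (the handle, the helper names, and the per-account limit are illustrative; the real logic lives in `get_tweets.ipynb`, and snscrape's tweet attribute names vary across versions):

```python
def build_query(handle: str, year: int = 2021) -> str:
    """Build an snscrape Twitter search query for one account and one year."""
    return f"from:{handle} since:{year}-01-01 until:{year + 1}-01-01"

def scrape(handle: str, limit: int = 100):
    """Yield (date, text) pairs for one account. Requires snscrape + network."""
    # Import kept local so build_query works without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    scraper = sntwitter.TwitterSearchScraper(build_query(handle))
    for i, tweet in enumerate(scraper.get_items()):
        if i >= limit:
            break
        yield tweet.date, tweet.content  # attribute names vary by snscrape version

print(build_query("SenSanders"))  # from:SenSanders since:2021-01-01 until:2022-01-01
```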

## Model

DistilBERT base model (uncased) for sequence classification.

## Evaluation

The model was evaluated on a held-out test split (20% of the data):

```
{'accuracy': 0.908, 'f1': 0.912}
```
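For reference, accuracy and (binary) F1 reduce to simple counts over the test predictions. A pure-Python sketch of the two metrics, equivalent to what a metrics library would report (the example labels below are made up):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    fp = sum(p == positive and t != positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy labels, for illustration only
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))  # 0.8
print(f1(y_true, y_pred))        # 0.8
```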