COVID-19 Tweets Country of Origin Classification Kaggle Competition

Dataset

This dataset consists of COVID-19 related tweets posted by users from six English-speaking countries: Australia, Canada, Ireland, New Zealand, the United Kingdom, and the United States. Six columns were provided, but only the tweet text and country were used to train the models.

The dataset was extended to improve results by replacing emojis with their corresponding words and by expanding shortened URLs to their original links so that relevant words could be extracted from them.
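
As a rough illustration of the preprocessing described above, the sketch below replaces emoji with words and resolves a shortened link. It is not the notebook's code: it assumes the Unicode character name is an acceptable stand-in for the emoji's "word", and the helper names `replace_emojis` and `expand_url` are hypothetical. `expand_url` needs network access and is not called here.

```python
import unicodedata
import urllib.request


def replace_emojis(text: str) -> str:
    """Swap each symbol character for its Unicode name, lowercased with underscores."""
    pieces = []
    for ch in text:
        # Assumption: category "So" (Symbol, other) approximates "is an emoji".
        # It also matches a few non-emoji symbols such as the copyright sign.
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            pieces.append(f" {name.lower().replace(' ', '_')} " if name else " ")
        else:
            pieces.append(ch)
    # Collapse the extra spaces introduced around the replaced symbols.
    return " ".join("".join(pieces).split())


def expand_url(short_url: str, timeout: float = 5.0) -> str:
    """Follow redirects and return the final URL (requires network access)."""
    with urllib.request.urlopen(short_url, timeout=timeout) as response:
        return response.geturl()


cleaned = replace_emojis("Stay safe 😷 #COVID19")
print(cleaned)
```

A dedicated emoji library would likely give friendlier names than the raw Unicode ones; the standard library is used here only to keep the sketch dependency-free.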

text	country
Remember the #WuhanCoronaVirus? The pandemic w...	us
While we hit 150,000 in #COVID19 deaths, the P...	new_zealand
🇺🇸 Pandémie de #coronavirus: 30 pasteurs améri...	us

Analysis

The notebook contains an extensive descriptive analysis that uses several NLP techniques to extract useful information from the dataset:

Finding top ten hashtags
Calculating statistics based on words, characters, and hashtags within the tweets
Utilizing LDA to extract topics from the dataset
Performing Non-negative Matrix Factorization for topic analysis
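
The first two steps in the list above can be sketched with the standard library alone. The tweets below are made-up toy examples, and the function names `top_hashtags` and `tweet_stats` are hypothetical; the notebook may compute these figures differently (for example, without lowercasing hashtags).

```python
import re
from collections import Counter
from statistics import mean

# Toy tweets for illustration only; not taken from the competition data.
tweets = [
    "Stay home and stay safe #COVID19 #StayHome",
    "Wash your hands #covid19 #StaySafe",
    "New cases reported today #COVID19",
]

HASHTAG = re.compile(r"#\w+")


def top_hashtags(texts, n=10):
    """Return the n most frequent hashtags, counted case-insensitively."""
    counts = Counter(tag.lower() for text in texts for tag in HASHTAG.findall(text))
    return counts.most_common(n)


def tweet_stats(texts):
    """Average number of words, characters, and hashtags per tweet."""
    return {
        "avg_words": mean(len(t.split()) for t in texts),
        "avg_chars": mean(len(t) for t in texts),
        "avg_hashtags": mean(len(HASHTAG.findall(t)) for t in texts),
    }


print(top_hashtags(tweets))
print(tweet_stats(tweets))
```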

Modeling

model	accuracy
Ensemble Model	51.3%
Logistic Regression	45.2%
Linear SVC	48.7%
Multinomial Naive Bayes	49.4%
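
For orientation, the three scikit-learn baselines in the table could be trained along the following lines. This is a minimal sketch, not the notebook's pipeline: the TF-IDF features, the hyperparameters, and the toy training tweets are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data for illustration only; the real models were trained on the competition tweets.
train_texts = [
    "lockdown extended in auckland today",
    "new cases reported in wellington",
    "cdc updates guidance for all states",
    "congress debates another relief bill",
]
train_labels = ["new_zealand", "new_zealand", "us", "us"]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVC": LinearSVC(),
    "Multinomial Naive Bayes": MultinomialNB(),
}

predictions = {}
for name, clf in models.items():
    # Assumption: TF-IDF bag-of-words features; the source does not state the features used.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(train_texts, train_labels)
    predictions[name] = pipeline.predict(["cases rising across several states"])[0]

print(predictions)
```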

The ensemble model combined results from a CNN built with Keras and a Multinomial Naive Bayes model built with scikit-learn.
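
The README does not say how the two models' outputs were combined. One common option is soft voting, which averages the class probabilities from each model and picks the highest. The sketch below assumes that approach; the probability vectors are invented, the function name `soft_vote` is hypothetical, and only the `us` and `new_zealand` label spellings appear in the sample data (the others are guesses).

```python
# Label order is an assumption; only "us" and "new_zealand" are confirmed by the sample rows.
LABELS = ["australia", "canada", "ireland", "new_zealand", "uk", "us"]


def soft_vote(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two probability vectors; returns (label, combined)."""
    combined = [weight_a * a + (1 - weight_a) * b for a, b in zip(probs_a, probs_b)]
    best = max(range(len(combined)), key=combined.__getitem__)
    return LABELS[best], combined


# Invented per-class probabilities for a single tweet.
cnn_probs = [0.10, 0.15, 0.05, 0.30, 0.15, 0.25]  # e.g. softmax output of the CNN
nb_probs = [0.05, 0.10, 0.05, 0.20, 0.10, 0.50]   # e.g. predict_proba of the Naive Bayes model

label, combined = soft_vote(cnn_probs, nb_probs)
print(label, combined)
```

Here the CNN alone would pick `new_zealand`, but the averaged probabilities favor `us`, which shows how an ensemble can override a single model's prediction.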


thomasdurkin/Capstone-Kaggle-Competition
