This repository contains the code and resources for the Classification Sprint challenge, where the goal is to create a machine learning model that can classify whether a person believes in climate change based on their novel tweet data. The challenge aims to provide companies focused on environmental impact and sustainability with valuable insights into public perception of climate change. The challenge can be found here, and the accompanying notebook serves as our entry.
Many companies are built around lessening one’s environmental impact or carbon footprint. They offer products and services that are environmentally friendly and sustainable, in line with their values and ideals. They would like to determine how people perceive climate change and whether or not they believe it is a real threat. This would add to their market research efforts in gauging how their product/service may be received.
The task is to build a classification model that accurately predicts whether a person believes in climate change or not, based on their tweets. The dataset provided contains a collection of tweets from different users, which have been labeled as pro climate change, anti climate change, neutral, or factual news tweets.
Providing an accurate and robust solution to this task gives companies access to a broad base of consumer sentiment, spanning multiple demographic and geographic categories - thus increasing their insights and informing future marketing strategies.
The repository contains the following resources:
-
profanity_en.csv: A csv file containing common english curse words used in the data preprocessing -
train.csv: The data used to train our models. It consists of the tweet sentiment, message, and tweetid. -
train.csv: The data used to test our models by means of prediction. It consists of the tweet message, and tweetid. -
FinalNotebook.ipynb: The notebook in which the preprocessing and modelling takes place. This is also used to produce files for submission to the competition
The following libraries need to be installed, but can be done so by following the steps in the notebook:
-
wordcloud
-
contractions
-
catboost
-
xgboost
-
comet_ml
-
emoji
To reproduce the results or use the provided code, follow these steps:
- Clone the repository:
git clone https://github.com/Kabous0017/Advanced_Classification_-_Team_GM3_2301FTDS.git - Explore the Jupyter notebooks
FinalNotebook.ipynbto understand the data preprocessing, feature engineering, and model development steps.
The notebook sometimes fails to render on GitHub, sue to the size and images contained within. If this is the case, please find a fully rendered version [here] (https://nbviewer.org/github/Kabous0017/Advanced_Classification_-_Team_GM3_2301FTDS/blob/e2210041ba9a2083b44f795fc00feba5a68fa145/FinalNotebook.ipynb)