Sentiment Analysis on the comments of Pewdiepie videos

Introduction

Pewdiepie has become the biggest English-speaking channel on Youtube, currently with more than 95 million subscribers. With an audience this large, it is interesting to see how viewers react to his videos.

In our paper we analyse the comments of 50 recent Pewdiepie videos and evaluate their polarity and toxicity, using libraries such as TextBlob and the pre-trained model BERT.

Datasets

200,000 top-level comments scraped from Pewdiepie videos with the Youtube API (4,000 per video, 50 videos).
The Jigsaw Unintended Bias in Toxicity Classification (JUBTC) dataset, which contains sentences with a toxicity score and multiple labels ("toxic", "severe toxic", "obscene", "threat", "insult", "identity hate") indicating the type of toxicity. From this dataset we used the first 90,000 entries of the training set to fine-tune BERT.

Experiment

To analyse the comments, we used the TextBlob and Pattern libraries to score the sentiment polarity of each comment, then averaged the scores per video to obtain a general polarity score for that video. We then fine-tuned BERT on the JUBTC dataset using PyTorch in Google Colab to score each video's toxicity (our tuned model reached an accuracy of 0.908 on the validation set).
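
The per-video averaging described above could be sketched roughly as follows. This is a minimal sketch, not the code used in the paper: the names `textblob_polarity`, `video_polarity` and `comments_by_video` are hypothetical, and the scorer is passed in as a function so that TextBlob, Pattern or any other library can be swapped in. The only library call assumed is TextBlob's `TextBlob(text).sentiment.polarity`.

```python
from statistics import mean
from typing import Callable, Dict, List


def textblob_polarity(text: str) -> float:
    """Polarity in [-1, 1] from TextBlob (needs `pip install textblob`)."""
    # Imported lazily so the averaging helper below works without TextBlob.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def video_polarity(
    comments_by_video: Dict[str, List[str]],
    score: Callable[[str], float] = textblob_polarity,
) -> Dict[str, float]:
    """Average the per-comment polarity for each video ID.

    Videos without any comments are skipped (an assumption of this sketch).
    """
    return {
        video_id: mean(score(comment) for comment in comments)
        for video_id, comments in comments_by_video.items()
        if comments
    }
```

Calling `video_polarity({"some_id": ["great video", "meh"]})` would return one averaged score per video ID, which can then be sorted to obtain a ranking.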

Results

Polarity results

Videos with the 5 highest and 5 lowest polarity scores, by category (higher is more positive):

Rank	Video ID	Category	Pattern Polarity	TextBlob Polarity
1	qPnTTA8BC8A	Book review	0.4933	0.4956
2	C2fRC55rA8w	Travel vlog	0.3267	0.3277
3	PGbAWTqUuxQ	Gameplay	0.3185	0.3218
4	QNLARCvIATo	Travel vlog	0.2999	0.3009
5	OEUsKLW1th4	Gameplay	0.2640	0.2656
46	WOSC6uGtBFw	Meme review	0.0935	0.0964
47	rdaQsl9jqmw	Gameplay	0.0901	0.0899
48	wFxCAWqvmBE	Meme review	0.0628	0.0635
49	zYZ1Fd7iH90	Cringe Tue.	0.0581	0.0587
50	DCkydkdhL8M	Meme review	0.0422	0.0448

Toxicity results

The 5 most toxic and 5 least toxic videos, by category (higher is more toxic):

Rank	Video ID	Category	Toxicity
1	JLREgYXXdB8	Cringe Tue.	0.2964
2	eHYkTUmsJlY	Pew news	0.1592
3	JxAUHg8AguA	Cringe Tue.	0.1536
4	4QnLRnKwFM0	Pew news	0.1501
5	3m4mF9-7L-Y	Pew news	0.1368
46	rc1VR54nHV0	Collab.	0.0612
47	OEUsKLW1th4	Gameplay	0.0604
48	wFxCAWqvmBE	Meme review	0.0522
49	C2fRC55rA8w	Travel vlog	0.0498
50	qPnTTA8BC8A	Book review	0.0482

Analysis

The comments are biased. For example, the gameplay of "Happy Wheel" is the 3rd most positive video, while the gameplay of "The Walking Dead" is in 49th place. The word 'happy' occurs far more often because it is part of the game's name, which increases the polarity score, while the opposite happens with the word 'dead'.

Rank	'Happy' frequency	'Dead' frequency	Polarity (Pattern)
3	438	130	0.3185
49	63	173	0.0581

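
Keyword frequencies like the ones in the table above could be obtained with a simple token count. The sketch below is an assumption about how such a count might be done, not the method used in the paper: the function name `keyword_counts` and the tokenisation rule (lowercase runs of letters and apostrophes) are illustrative choices.

```python
import re
from typing import Dict, Iterable


def keyword_counts(comments: Iterable[str], keywords: Iterable[str]) -> Dict[str, int]:
    """Count case-insensitive whole-word occurrences of each keyword."""
    counts = {keyword.lower(): 0 for keyword in keywords}
    for comment in comments:
        # Split into lowercase word tokens; punctuation and digits are dropped.
        for token in re.findall(r"[a-z']+", comment.lower()):
            if token in counts:
                counts[token] += 1
    return counts
```

Note that a whole-word count treats "happy" and "happiness" as different tokens; a substring count would give higher numbers.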
From the full results, we found that book review videos are more positive than other categories. Travel vlogs also tend to have higher polarity, while meme reviews tend to have lower polarity. The polarity of a gameplay video can differ drastically depending on the game.

Pew news and Cringe Tuesday videos dominate the most toxic videos. There could be multiple explanations for this; one that we found in our results is that the model is biased and misclassifies certain sentences. For example, the most toxic video, "I broke my ass", contains misclassified comments such as "I love you and your broken ass", which received a very high toxicity score of 0.9687.

While toxicity and polarity are two different attributes, we found that 5 of the top 10 most positive videos are also among the 10 least toxic videos. Furthermore, 4 of the most negative videos are among the 10 most toxic videos. The differences between the top 10 lists can mainly be explained by the bias and the different focus of the algorithms: the polarity of a comment can be low if it is sad, while it can still be non-toxic.
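
The kind of overlap check described above can be illustrated with the rows that are listed in this README. The full top 10 lists are not reproduced here, so the sketch below only uses the 5 highest and 5 lowest entries from the polarity and toxicity tables; the helper name `overlap` is hypothetical.

```python
# Video IDs copied from the polarity and toxicity tables in this README.
most_positive = ["qPnTTA8BC8A", "C2fRC55rA8w", "PGbAWTqUuxQ", "QNLARCvIATo", "OEUsKLW1th4"]
least_positive = ["WOSC6uGtBFw", "rdaQsl9jqmw", "wFxCAWqvmBE", "zYZ1Fd7iH90", "DCkydkdhL8M"]
most_toxic = ["JLREgYXXdB8", "eHYkTUmsJlY", "JxAUHg8AguA", "4QnLRnKwFM0", "3m4mF9-7L-Y"]
least_toxic = ["rc1VR54nHV0", "OEUsKLW1th4", "wFxCAWqvmBE", "C2fRC55rA8w", "qPnTTA8BC8A"]


def overlap(first, second):
    """Return the video IDs present in both lists, sorted for stable output."""
    return sorted(set(first) & set(second))


print(overlap(most_positive, least_toxic))   # 3 videos are both very positive and barely toxic
print(overlap(least_positive, most_toxic))   # no video is in both of these short lists
print(overlap(least_positive, least_toxic))  # 1 video is low in polarity yet also low in toxicity
```

The last case (video `wFxCAWqvmBE`) is an instance of the point made above: a low polarity score does not imply a high toxicity score.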

Conclusion

Based on the results, we can conclude that the comments on Pewdiepie's videos are generally more positive than negative, and that in 80% of the sample videos less than 10% of the comments are toxic (Table 7). We also found that sentiment polarity and toxicity correlate somewhat in the top 10 percent. Finally, after analysing the results, we discovered that the models are not unbiased, and further research is recommended.
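
A statistic of the form "in X% of the videos, less than 10% of the comments are toxic" could be computed along the following lines. This is only a sketch under stated assumptions: the README does not say which score cut-off makes a comment count as toxic, so the `comment_threshold` of 0.5 is an assumed value, and the names `share_of_low_toxicity_videos` and `scores_by_video` are hypothetical.

```python
from typing import Dict, List


def share_of_low_toxicity_videos(
    scores_by_video: Dict[str, List[float]],
    comment_threshold: float = 0.5,  # assumed cut-off, not taken from the paper
    video_limit: float = 0.10,       # "less than 10% of the comments are toxic"
) -> float:
    """Fraction of videos whose share of toxic comments is below `video_limit`."""
    videos = {vid: scores for vid, scores in scores_by_video.items() if scores}
    if not videos:
        return 0.0
    low_toxicity = 0
    for scores in videos.values():
        toxic_share = sum(s >= comment_threshold for s in scores) / len(scores)
        if toxic_share < video_limit:
            low_toxicity += 1
    return low_toxicity / len(videos)
```

With per-comment toxicity scores from the fine-tuned model as input, a return value of 0.8 would correspond to the 80% figure quoted above.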

Resources

Fine-tuned BERT weights (>400 MB)

