It's important to know that you will be hiding all of the code chunks for the final project, so anything you print should be formatted as a tidy table (knitr's kable() is a good option for formatting).
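As a rough sketch of what I mean, assuming you have a small summary data frame (the names `summary_tbl`, `year`, and `mean_value` and the numbers below are placeholders, not your actual data), something like this would hide the code and print a formatted table:

```r
library(knitr)

# Hide the code in every chunk of the knitted document (put this in a setup chunk).
opts_chunk$set(echo = FALSE)

# Hypothetical summary table; substitute your own summary data frame.
summary_tbl <- data.frame(
  year = c(2020, 2021, 2022),
  mean_value = c(10.256, 11.871, 12.034)
)

# Render a formatted table instead of a raw console printout.
tbl <- kable(
  summary_tbl,
  digits = 2,
  col.names = c("Year", "Mean value"),
  caption = "Mean value by year (placeholder data)"
)
tbl
```

The `digits`, `col.names`, and `caption` arguments are optional, but they go a long way toward making the output readable without the code next to it.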
There is no actual narrative in here to help me understand what you are doing, why you are doing it, or how it will help you answer your original research question. Without the code chunks, this would be just a series of printouts with no context explaining what they mean or why they are there.
For the year-based visualizations, you should relabel the x-axis so that it shows the actual calendar years. For the fitted lines, discuss how the linear modeling (geom_smooth) measures years since 2020 because your data only starts there, and justify why it's okay to fit the lines on years since 2020.
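One way to handle the axis labels, sketched here under the assumption that your x variable is stored as years since 2020 (the column names `years_since_2020` and `value` and the data are made up for illustration), is to keep the fit as-is and only change what the axis displays:

```r
library(ggplot2)

# Hypothetical data; `years_since_2020` and `value` are placeholder names.
df <- data.frame(
  years_since_2020 = c(0, 1, 2, 3, 0, 1, 2, 3),
  value = c(5.1, 5.9, 7.2, 7.8, 4.8, 6.3, 6.9, 8.1)
)

p <- ggplot(df, aes(x = years_since_2020, y = value)) +
  geom_point() +
  # The line is still fit on years since 2020.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Only the axis labels are shifted to show calendar years.
  scale_x_continuous(
    breaks = 0:3,
    labels = function(x) as.character(x + 2020)
  ) +
  labs(x = "Year", y = "Value")
p
```

For the justification, it may help to note that shifting the predictor by a constant leaves the slope unchanged. Only the intercept's interpretation changes: it becomes the fitted value in 2020 rather than in year 0.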
The logistic regression looks good, but I'm not sure exactly how it fits into your analysis. If you are going to use it, you should split the data into training and test sets and then assess how well the model performs on the test set.
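A minimal sketch of that workflow in base R follows. The simulated data, the 80/20 split, and the 0.5 classification cutoff are all assumptions for illustration, so swap in your own data frame, outcome, and predictors:

```r
set.seed(42)  # make the random split reproducible

# Hypothetical data with a binary outcome `y` and one predictor `x`.
n <- 200
x <- rnorm(n)
dat <- data.frame(x = x, y = rbinom(n, 1, plogis(0.5 + 1.2 * x)))

# 80/20 train/test split.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- dat[train_idx, ]
test <- dat[-train_idx, ]

# Fit the logistic regression on the training set only.
fit <- glm(y ~ x, data = train, family = binomial)

# Predict probabilities on the held-out test set and classify at 0.5.
test_prob <- predict(fit, newdata = test, type = "response")
test_pred <- as.integer(test_prob > 0.5)

# Test-set accuracy and confusion matrix.
accuracy <- mean(test_pred == test$y)
table(predicted = test_pred, actual = test$y)
accuracy
```

Accuracy is only one possible metric. If your classes are imbalanced, the confusion matrix will tell you more than the single number.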
Finally, looking at all of these means, it seems like you're going to be comparing them. In that case, consider using a hypothesis test so that you can provide more concrete statistical evidence for any claims you make about the relative levels of the means.
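For example, if you end up comparing the means of two groups, a two-sample t-test is one reasonable option. The sketch below uses simulated placeholder data (`group` and `value` are hypothetical names); which test is appropriate depends on how many groups you have and how your data are structured:

```r
set.seed(1)

# Hypothetical data: `value` measured in two groups, "A" and "B".
dat <- data.frame(
  group = rep(c("A", "B"), each = 30),
  value = c(rnorm(30, mean = 10, sd = 2), rnorm(30, mean = 11, sd = 2))
)

# Welch two-sample t-test (R's default; does not assume equal variances).
res <- t.test(value ~ group, data = dat)
res$p.value   # p-value for the difference in means
res$conf.int  # 95% confidence interval for the difference

# With more than two groups, a one-way ANOVA is a common starting point:
# summary(aov(value ~ group, data = dat))
```

Reporting the confidence interval alongside the p-value gives the reader a sense of how large the difference is, not just whether it is detectable.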