With that, here are some of the questions we seek to answer:
- Which variables affect a song’s popularity on Spotify?
- Can we accurately predict whether a song will be popular given numerical statistics of the song’s traits?
Using decision tree and random forest classifiers on a small dataset of 40,000 songs, we found that instrumentalness is the most important feature for predicting a song’s popularity. More specifically, songs with lower instrumentalness (more vocals) tend to be more popular with listeners.
However, our model is not perfect, and many factors were not accounted for, such as song themes, the popularity of the artist, and shifts in musical trends over the decades. Despite these limitations, our model provides a good starting point for exploring how different song traits affect popularity.
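
As a rough illustration of how a feature ranking like the one above can be read off a random forest, here is a minimal sketch using scikit-learn's `feature_importances_`. It does not use the project's dataset: the data is synthetic, the column names other than instrumentalness are hypothetical stand-ins, and the label is deliberately constructed so that low instrumentalness means "popular". It shows the mechanics only and does not reproduce our results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for the song features (assumption: not the real dataset).
features = pd.DataFrame({
    "instrumentalness": rng.uniform(0, 1, n),
    "danceability": rng.uniform(0, 1, n),
    "energy": rng.uniform(0, 1, n),
    "acousticness": rng.uniform(0, 1, n),
})

# Synthetic label: a song is "popular" when its (noisy) instrumentalness is low.
noise = rng.normal(0, 0.15, n)
popular = ((features["instrumentalness"] + noise) < 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    features, popular, test_size=0.25, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Impurity-based importances sum to 1; higher means the feature was more useful for splits.
importances = pd.Series(
    forest.feature_importances_, index=features.columns
).sort_values(ascending=False)
test_accuracy = forest.score(X_test, y_test)

print(importances)
print(f"test accuracy: {test_accuracy:.3f}")
```

Impurity-based importances can favor features with many distinct values, so a ranking like this is best treated as a starting point rather than proof of causation.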
- @agentzhao - Data Cleaning, Analysis and Visualization
- @bohyanggg - Machine Learning and AI
- Python 3
- Jupyter Notebook
- Pandas
- NumPy
- Matplotlib
- Seaborn
- scikit-learn
- Wordcloud
- Yellowbrick
- Decision Tree (univariate and multivariate) - best univariate decision tree reached 68.9% accuracy
- Random Forest Classifier - best model, with 76.2% accuracy
- Support Vector Classifier (SVC) - 71.2% accuracy
- KNeighborsClassifier - 70.6% accuracy
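
The models listed above can be compared on a shared train/test split along the following lines. This is a sketch under stated assumptions, not the notebook's actual code: the data comes from `make_classification` instead of the Spotify dataset, the hyperparameters are placeholders, and scaling the inputs for SVC and k-nearest neighbors is our assumption here. The accuracies it prints will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption: stands in for the song features).
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Placeholder hyperparameters; distance- and margin-based models get scaled inputs.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svc": make_pipeline(StandardScaler(), SVC()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)  # fraction of correct predictions

# Univariate tree: same classifier, trained on a single feature column.
uni_tree = DecisionTreeClassifier(max_depth=4, random_state=0)
uni_tree.fit(X_train[:, [0]], y_train)
accuracies["decision_tree_univariate"] = uni_tree.score(X_test[:, [0]], y_test)

# Print models from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:26s} {acc:.3f}")
```

Holding the split fixed across models keeps the comparison fair; cross-validation (for example `cross_val_score`) would give a more stable estimate than a single split.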