This project pulls statistics from your Spotify streaming history. Given a CSV file containing your Spotify history, it determines information such as your favorite song or artist, using different data structures.
The members of our group listen to a vast amount of music on Spotify, but unless you use a paid third-party service there is no easy way to look in depth at your listening history year-round. Spotify does offer “Spotify Wrapped”, which displays your top artists, songs, and other intriguing data for that year, but it is only available for a short time at the end of the year and does not include data from November or December. Spotify does, however, let users request their listening history, which it sends after about a month as a series of JSON files in a zip file. This JSON data is difficult to use directly, so we decided to compare efficient ways to store and recall it in order to see our top songs and artists.
When the user runs the code, they are presented with seven options: Create Unordered Map, Create Ordered Map, Output Unordered Map, Output Ordered Map, Search Unordered Map, Search Ordered Map, and End Program. When you select one of the first two options, you are asked whether you want the map keyed by artists or by songs. The program then creates a map or unordered_map that stores the artist or song as the key and the number of streams of that artist or song as the value. When you select an output option, you are given four more options: Display Top Song Titles, Display Top Artist Names, Display All Song Titles, and Display All Artist Names. When you select one of the Display Top options, you are prompted for the number of songs or artists you wish to display (n), and the program displays your top n songs or artists by total number of streams. When you select a search option, you are given two more options, Search by Artist or Search by Song, and the program outputs the number of streams of the corresponding artist or song. Most of these functions are self-evident and can be seen in the video.
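The create and search options described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names countStreams and streamsFor are hypothetical, and the input is assumed to be a list of artist or song names with one entry per stream.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: count how many times each name (artist or song) occurs.
// MapType is assumed to be std::map<std::string, int> or
// std::unordered_map<std::string, int>.
template <typename MapType>
MapType countStreams(const std::vector<std::string>& names) {
    MapType counts;
    for (const std::string& name : names) {
        ++counts[name];  // operator[] value-initializes a missing entry to 0
    }
    return counts;
}

// Hypothetical helper: look up the stream count for one name.
// Returns 0 when the name is not in the map.
template <typename MapType>
int streamsFor(const MapType& counts, const std::string& name) {
    typename MapType::const_iterator it = counts.find(name);
    return it == counts.end() ? 0 : it->second;
}
```

Because both containers expose the same operator[] and find interface, the same template serves the ordered and unordered variants. Only the template argument changes between the two menu options.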
Our data came in the form of a series of .json files that each contain various data points for a series of songs. There are 21 data points per song: ts, username, platform, ms_played, conn_country, ip_addr_decrypted, user_agent_decrypted, master_metadata_track_name, master_metadata_album_artist_name, master_metadata_album_album_name, spotify_track_uri, episode_name, episode_show_name, spotify_episode_uri, reason_start, reason_end, shuffle, skipped, offline, offline_timestamp, and incognito_mode. Of these, there are really only two that we care about: master_metadata_track_name and master_metadata_album_artist_name, which contain the song’s name and the artist’s name respectively. We converted these .json files to tab-separated .csv files using an online converter, as delimited text files are far easier to work with in C++ than JSON files, which would likely require a separate, external library. Between the three of us, we listened to 117,815 songs, each with its own set of data.
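Reading one column out of a tab-separated file like this needs only the standard library. The sketch below is an illustration under assumptions, not the project's code: the helper names splitTabs and readColumn are hypothetical, the first line of the file is assumed to be a header row containing the field names listed above, and fields are assumed not to contain embedded tabs or newlines.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line on tab characters. Empty fields in the
// middle of a line are kept; a single trailing empty field is dropped,
// because std::getline returns nothing once the stream is exhausted.
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) {
        fields.push_back(field);
    }
    return fields;
}

// Hypothetical helper: read a tab-separated stream whose first line is a
// header and return every non-empty value in the column named columnName.
// Returns an empty vector if the header has no such column.
std::vector<std::string> readColumn(std::istream& in, const std::string& columnName) {
    std::vector<std::string> values;
    std::string line;
    if (!std::getline(in, line)) {
        return values;  // no header line at all
    }

    const std::vector<std::string> header = splitTabs(line);
    std::size_t index = header.size();  // sentinel meaning "not found"
    for (std::size_t i = 0; i < header.size(); ++i) {
        if (header[i] == columnName) {
            index = i;
            break;
        }
    }
    if (index == header.size()) {
        return values;
    }

    while (std::getline(in, line)) {
        const std::vector<std::string> fields = splitTabs(line);
        // Skip short rows and rows with an empty value in this column.
        if (index < fields.size() && !fields[index].empty()) {
            values.push_back(fields[index]);
        }
    }
    return values;
}
```

In use, a std::ifstream opened on the converted file would be passed as the stream, with master_metadata_album_artist_name or master_metadata_track_name as the column name. Rows with an empty value in that column are skipped rather than counted.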
In our project, we used Replit as an IDE to share code in real time, as well as our own IDEs, Visual Studio and CLion, for developing new portions of the code. We used C++ as our programming language and used the following headers from the C++ standard library: algorithm, fstream, iostream, sstream, string, unordered_map, map, vector, and chrono.
The primary data structures we used were maps (std::map), which are typically implemented as red-black trees, and unordered maps (std::unordered_map), which are backed by hash tables. We also used a vector of pairs in order to get our top-played songs and artists. The primary comparison of our project was between the unordered map and the ordered map. In our tests of creating and printing the maps, the unordered_map was faster, but the ordered map has the advantage of printing in alphabetical order when all results are printed. We also used the built-in C++ sort function, std::sort, which common standard library implementations build on Introsort, a hybrid of Quicksort, Heapsort, and Insertion Sort.