diff --git a/.gitignore b/.gitignore
index a81c8ee1..868fe908 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,8 @@
__pycache__/
*.py[cod]
*$py.class
+# testing notebooks
+*.ipynb
# C extensions
*.so
diff --git a/.streamlit/config.toml b/.streamlit/config.toml
new file mode 100644
index 00000000..11b2a98b
--- /dev/null
+++ b/.streamlit/config.toml
@@ -0,0 +1,5 @@
+[theme]
+base="dark"
+primaryColor="#844bff"
+backgroundColor="#454b64"
+secondaryBackgroundColor="#656fdc"
diff --git a/README.md b/README.md
index 516993b9..ece338c1 100644
--- a/README.md
+++ b/README.md
@@ -1,24 +1,10 @@
-# Streamlit-based Recommender System
-#### EXPLORE Data Science Academy Unsupervised Predict
-
-## 1) Overview
-
-
-
-This repository forms the basis of *Task 2* for the **Unsupervised Predict** within EDSA's Data Science course. It hosts template code which will enable students to deploy a basic recommender engine based upon the [Streamlit](https://www.streamlit.io/) web application framework.
-
-As part of the predict, students are expected to expand on this base template; improving (and fixing) the given base recommender algorithms, as well as providing greater context to the problem and attempted solutions through additional application pages/functionality.
-
-#### 1.1) What is a Recommender System?
+# Streamlit-based Movie Recommender System
+At a fundamental level, recommender systems operate using similarity, where we try to match people (users) to things (items). Two primary approaches are content-based and collaborative filtering: content-based filtering measures similarity between items based on their properties, while collaborative filtering uses similarities amongst users to drive recommendations.
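Both notions of similarity can be sketched with cosine similarity. The following is an illustrative toy example (hypothetical movies, users, and genre flags), not the app's actual data:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy item-feature matrix for CONTENT-BASED filtering:
# rows = movies, columns = binary genre flags (Action, Comedy, Drama)
item_features = np.array([
    [1, 0, 1],   # Movie A: Action + Drama
    [1, 0, 0],   # Movie B: Action
    [0, 1, 0],   # Movie C: Comedy
])
item_sim = cosine_similarity(item_features)

# Toy user-rating matrix for COLLABORATIVE filtering:
# rows = users, columns = the same three movies
user_ratings = np.array([
    [5, 4, 1],
    [4, 5, 2],
    [1, 2, 5],
])
user_sim = cosine_similarity(user_ratings)

# Movie A is more similar to B (shared Action genre) than to C
print(item_sim[0, 1] > item_sim[0, 2])   # True
# Users 1 and 2 rate alike, so they are more similar than users 1 and 3
print(user_sim[0, 1] > user_sim[0, 2])   # True
```

The same `cosine_similarity` call drives both approaches; only the meaning of the rows (items vs. users) changes.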
+
+### 1) Screen recording
[](https://youtu.be/Eeg1DEeWUjA)
-Recommender systems are the unsung heroes of our modern technological world. Search engines, online shopping, streaming multimedia platforms, news-feeds - all of these services depend on recommendation algorithms in order to provide users the content they want to interact with.
-
-At a fundamental level, these systems operate using similarity, where we try to match people (users) to things (items). Two primary approaches are used in recommender systems are content-based and collaborative-based filtering. In content-based filtering this similarity is measured between items based on their properties, while collaborative filtering uses similarities amongst users to drive recommendations.
-
-Throughout the course of this Sprint, you'll work on defining this brief explanation further as you come to understand the theoretical and practical aspects of recommendation algorithms.
-
#### 1.2) Description of contents
Below is a high-level description of the contents within this repo:
@@ -26,91 +12,34 @@ Below is a high-level description of the contents within this repo:
| File Name | Description |
| :--------------------- | :-------------------- |
| `edsa_recommender.py` | Base Streamlit application definition. |
-| `recommenders/collaborative_based.py` | Simple implementation of collaborative filtering. |
-| `recommenders/content_based.py` | Simple implementation of content-based filtering. |
+| `recommenders/collaborative_based.py` | Implementation of collaborative filtering. |
+| `recommenders/content_based.py` | Implementation of content-based filtering. |
| `resources/data/` | Sample movie and rating data used to demonstrate app functioning. |
| `resources/models/` | Folder to store model and data binaries if produced. |
| `utils/` | Folder to store additional helper functions for the Streamlit app |
## 2) Usage Instructions
-
-#### 2.1) Improving your recommender system
-The primary goal of this task within the Unsupervised Predict is to make students aware of (and ultimately competent in handling) the complexities associated with deploying recommender algorithms in a live environment. These algorithms are resource heavy - requiring high amounts of memory and processing power when associated with larger data sources. As such, you'll need to research and determine the modifications required to deploy this app so that it produces appropriate recommendations with as little latency as possible. This will not be a trivial task, but we know you'll give your best shot :star:!
-
-In order to make your improvements, we have a few instructions to guide you:
- - **Only modify the sections of the base `edsa_recommender.py` file which have been indicated**. The code which has been designated to be left unaltered is used to provide a standard interface during our automated testing of your app. Changing this code may result in our system assigning you a mark of 0 :(
-
- - **Do not modify the function name and signature for the `*_model` functions in `collaborative_based.py` and `content_based.py`**. As stated above, these functions are used during automated testing. You are, however, supposed to modify/improve the content of these functions with your algorithms developed within Task 1 of the Unsupervised Predict.
-
- - **Add additional data where needed**. The data files which we've provided you within this repo template serve only as examples. For correct/improved functioning, you may need to add additional data files from sources such as the Kaggle challenge in Task 1, or the S3 bucket provided to you during this sprint. (**NB:** Github doesn't accept large file uploads during a commit. As such, you may need to keep only local copies of your data files. Have a look at how to exclude files from your git commits using a `.gitignore` file [here](https://docs.github.com/en/github/using-git/ignoring-files))
-
- - **Focus on both algorithmic approaches**. There will be trade-offs for using either collaborative or content based filtering. Try to discover these by attempting to use both approaches in your app.
-
- - **Use computing power if necessary**. As mentioned before, the compute resources required for this task are heavy. As such, when the need arises, switch to an AWS instance with greater computing power. (**NB:** We'll require that you restrict this to one large AWS instance (t2.2xlarge/t2.xlarge) per team).
-
-
-#### 2.2) Creating a copy of this repo
-
-| :zap: WARNING :zap: |
-| :-------------------- |
-| Do **NOT** *clone* this repository. Instead follow the instructions in this section to *fork* the repo. |
-
-As described within the Predict instructions for the Unsupervised Sprint, this code represents a *template* from which to extend your own work. As such, in order to modify the template, you will need to **[fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo)** this repository. Failing to do this will lead to complications when trying to work on the web application remotely.
-
-To fork the repo, simply ensure that you are logged into your GitHub account, and then click on the 'fork' button at the top of this page.
-
-#### 2.3) Running the recommender system locally
-
-As a first step to becoming familiar with our web app's functioning, we recommend setting up a running instance on your own local machine.
-
-To do this, follow the steps below by running the given commands within a Git bash (Windows), or terminal (Mac/Linux):
-
- 1. Ensure that you have the prerequisite Python libraries installed on your local machine:
-
- ```bash
- pip install -U streamlit numpy pandas scikit-learn
- conda install -c conda-forge scikit-surprise
- ```
-
- 2. Clone the *forked* repo to your local machine.
-
- ```bash
- git clone https://github.com/{your-account-name}/unsupervised-predict-streamlit-template.git
- ```
-
- 3. Navigate to the base of the cloned repo, and start the Streamlit app.
-
+### 2.1) Running the recommender locally
+It is recommended that the following requirements be made available in a virtual environment:
+ - Python 3.6+
+ - pip3
+ - streamlit
+ - numpy
+ - pandas
+ - scikit-learn
+ - scikit-surprise
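One way to prepare such an environment (a sketch, assuming `python3` and `pip` are available on your PATH):

```shell
# Create and activate an isolated virtual environment
python3 -m venv venv
source venv/bin/activate   # on Windows Git bash: source venv/Scripts/activate

# Install the app's dependencies into the environment
pip install -U streamlit numpy pandas scikit-learn

# scikit-surprise builds from source on some platforms; conda-forge is often easier:
# conda install -c conda-forge scikit-surprise
```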
+To run the app on a local machine, navigate to the base of the cloned repo and run the following command in the created virtual environment (Git bash on Windows, or a terminal on Mac/Linux):
```bash
- cd unsupervised-predict-streamlit-template/
streamlit run edsa_recommender.py
```
-
- If the web server was able to initialise successfully, the following message should be displayed within your bash/terminal session:
-
-```
- You can now view your Streamlit app in your browser.
-
- Local URL: http://localhost:8501
- Network URL: http://192.168.43.41:8501
-```
-
-You should also be automatically directed to the base page of your web app. This should look something like:
-
-
-
-Congratulations! You've now officially deployed your web-based recommender engine!
-
-While we leave the modification of your recommender system up to you, the latter process of cloud deployment is outlined within the next section.
+ If the web server initialised successfully, a Local URL and a Network URL should be displayed within your bash/terminal session. If you are not automatically directed to the app, copy and paste either of these URLs into any browser to access it.
+
+| :zap: WARNING! :zap: |
+| :-------------------- |
+| This application uses extensive memory to generate results in recommender-system mode. It is therefore not recommended to click the recommender button when running the app locally. Instead, we recommend deploying this app on a larger AWS instance with sufficient memory (t2.2xlarge/t2.xlarge) and storage (>= 30 GiB). |
#### 2.4) Running the recommender system on a remote AWS EC2 instance
-| :zap: WARNING :zap: |
-| :-------------------- |
-| As outlined in the previous section, we recommend deploying this app on a larger AWS instance with sufficient memory (t2.2xlarge/t2.xlarge). Note that a restriction of one large compute instance per team will be applied. |
-
-The following steps will enable you to run your recommender system on a remote EC2 instance, allowing it to the accessed by any device/application which has internet access.
-
-Within these setup steps, we will be using a remote EC2 instance, which we will refer to as the ***Host***, in addition to our local machine, which we will call the ***Client***. We use these designations for convenience, and to align our terminology with that of common web server practices. In cases where commands are provided, use Git bash (Windows) or Terminal (Mac/Linux) to enter these.
+The following steps will enable you to run your recommender system on a remote EC2 instance, allowing it to generate recommendation results.
1. Ensure that you have access to a running AWS EC2 instance with an assigned public IP address.
@@ -129,11 +58,6 @@ conda install -c conda-forge scikit-surprise
git clone https://github.com/{your-account-name}/unsupervised-predict-streamlit-template.git
cd unsupervised-predict-streamlit-template/
```
-
-| :information_source: NOTE :information_source: |
-| :-------------------- |
-| In the following steps we make use of the `tmux` command. This programme has many powerful functions, but for our purposes, we use it to gracefully keep our web app running in the background - even when we end our `ssh` session. |
-
4. Enter into a Tmux window within the current directory. To do this, simply type `tmux`.
5. Start the Streamlit web app on port `5000` of the host
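The start-up command for this step can be sketched as follows (`--server.port` is Streamlit's standard flag for binding to a specific port):

```shell
# Inside the tmux window, serve the app on port 5000 of the host
streamlit run edsa_recommender.py --server.port 5000
```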
@@ -162,22 +86,8 @@ Where the specific `Network` and `External` URLs correspond to those assigned to
Where the above public IP address corresponds to the one given to your AWS EC2 instance.
- If successful, you should see the landing page of your recommender system app (image identical to that for the local setup instructions).
-
-**[On the Host]:**
-
-7. To keep your app running continuously in the background, detach from the Tmux window by pressing `ctrl + b` and then `d`. This should return you to the view of your terminal before you opened the Tmux window.
-
- To go back to your Tmux window at any time (even if you've left your `ssh` session and then return), simply type `tmux attach-session`.
-
- To see more functionality of the Tmux command, type `man tmux`.
-
-Having run your web app within Tmux, you should be now free to end your ssh session while your webserver carries on purring along. Well done :zap:!
-
-## 3) FAQ
-
-This section of the repo will be periodically updated to represent common questions which may arise around its use. If you detect any problems/bugs, please [create an issue](https://help.github.com/en/github/managing-your-work-on-github/creating-an-issue) and we will do our best to resolve it as quickly as possible.
+ If successful, you should see the landing page of the recommender system app.
+ #### NB: The app may not work exactly as shown in the screen recording because the code implementing the Explorer functionality has been excluded. Users are encouraged to write their own code to implement the Explorer page and insert relevant links in the buttons.
-We wish you all the best in your learning experience :rocket:

diff --git a/edsa_recommender.py b/edsa_recommender.py
index f1192112..f337ecfd 100644
--- a/edsa_recommender.py
+++ b/edsa_recommender.py
@@ -36,28 +36,27 @@
from utils.data_loader import load_movie_titles
from recommenders.collaborative_based import collab_model
from recommenders.content_based import content_model
+import scipy as sp
# Data Loading
title_list = load_movie_titles('resources/data/movies.csv')
# App declaration
def main():
-
# DO NOT REMOVE the 'Recommender System' option below, however,
# you are welcome to add more options to enrich your app.
- page_options = ["Recommender System","Solution Overview"]
-
+ page_options = ["Recommender System","Explorer","Solution Overview","Contact Us"]
+ st.sidebar.image('resources/imgs/flix.png',use_column_width=True)
# -------------------------------------------------------------------
# ----------- !! THIS CODE MUST NOT BE ALTERED !! -------------------
# -------------------------------------------------------------------
- page_selection = st.sidebar.selectbox("Choose Option", page_options)
+ page_selection = st.sidebar.selectbox("Choose App Mode", page_options)
if page_selection == "Recommender System":
# Header contents
- st.write('# Movie Recommender Engine')
- st.write('### EXPLORE Data Science Academy Unsupervised Predict')
st.image('resources/imgs/Image_header.png',use_column_width=True)
+ st.write('# Movie Recommender Engine')
# Recommender System algorithm selection
- sys = st.radio("Select an algorithm",
+ sys = st.radio("### Select an algorithm",
('Content Based Filtering',
'Collaborative Based Filtering'))
@@ -100,9 +99,200 @@ def main():
# -------------------------------------------------------------------
# ------------- SAFE FOR ALTERING/EXTENSION -------------------
+    # Placeholder for recommender-page background styling (currently empty)
+    st.markdown(
+        f"""
+
+        """,
+        unsafe_allow_html=True
+    )
+
if page_selection == "Solution Overview":
+
st.title("Solution Overview")
- st.write("Describe your winning approach on this page")
+ st.write("### Explore the business and technical aspects of our solution")
+ tab1,tab2 = st.tabs(["Business","Technical"])
+ with tab1:
+
+ st.write("""To build an effective recommendation algorithm, we need access to a vast amount
+ of user data, including historical movie preferences, ratings, and interactions.
+ The business aspect involves setting up mechanisms to collect, store, and manage
+ this data securely and ethically. The success of the app relies on providing a
+ seamless and engaging user experience. The algorithm should integrate seamlessly
+ with the app's interface, ensuring that movie recommendations are prominently displayed
+ and easily accessible to users. As the user base grows, the algorithm should be scalable
+ to handle increasing data and user interactions efficiently. The business needs to plan
+ for server infrastructure and data processing capabilities that can handle this growth.
+ Given the sensitivity of user data, ensuring strict data privacy and security measures
+ will be a top priority.""")
+
+ #st.image ("resources/imgs/flix.png")
+
+ with tab2:
+
+ st.write("""The technical solution involves integrating data from multiple sources,
+ including the MovieLens dataset and movie content data from IMDb. This may require data
+ preprocessing and alignment to create a unified dataset.
+ """)
+ st.write(" ")
+ st.write("""
+ The data must undergo thorough cleaning to handle missing values, outliers, and
+ inconsistencies that could adversely affect the accuracy of the recommendation
+ algorithm. Extracting relevant features from the data is crucial for content-based
+ filtering. Features such as movie genre, cast, director, and release year will be used
+ to create user profiles and movie representations.
+ """)
+ st.write(" ")
+ st.write("""Implementing collaborative filtering
+ techniques, such as user-based or item-based filtering, will help identify similar users
+ and movies to make accurate predictions. Utilizing content-based filtering, the algorithm
+ will match user preferences with movie attributes to recommend similar movies that the
+ user has not yet viewed.
+ """)
+ st.write(" ")
+ st.write("""The selected models will be trained using historical user ratings
+ and movie attributes. The technical team will fine-tune the models to achieve the highest
+ predictive accuracy. To support scalability and performance, the technical solution may
+ involve cloud-based infrastructure, such as AWS or Google Cloud, to handle data storage,
+ processing, and user interactions.""")
+
+    # Placeholder for Explorer-page background styling (currently empty)
+    st.markdown(
+        f"""
+
+        """,
+        unsafe_allow_html=True
+    )
+ if page_selection == "Explorer":
+ # Header contents
+ #st.image('resources/imgs/Image_header.png',use_column_width=True)
+ st.title('Explorer')
+ st.write('### Explore movies in associate streaming platforms')
+ tab1,tab3 = st.tabs(["Custom Pix","Coming Soon"])
+
+ with tab1:
+ st.write('### Pick Filters')
+
+ sys2 = st.radio("Filters",
+ ("Genre",
+ "Cast Member","Release Date"))
+ if sys2 == "Genre":
+ movie_1 = st.selectbox('Genre',('All','Action','Comedy','Drama','Romance','Thriller'))
+            if movie_1 != 'Action':
+                # Show output based on the selected genre
+                st.write(f"Movies in the genre: {movie_1}")
+ # Create a button to redirect user to another website
+ button_clicked = st.button("Visit Associate Streaming Platform")
+
+                if button_clicked:
+                    # Look up the URL for the selected option (hypothetical helper)
+                    imdb_url = get_imdb_url(movie_1)
+
+                    if imdb_url:
+                        # Link the user to the IMDb page
+                        st.markdown(f'<a href="{imdb_url}" target="_blank">Go to IMDb</a>', unsafe_allow_html=True)
+                    else:
+                        st.write("IMDb URL not available")
+ if sys2 == "Cast Member":
+ movie_2 = st.selectbox('Cast Member',('Denzel Washington','Tom Hardy','Will Smith','Jackie Chan','Bruce Lee','Luhle Shumi','Martin Lawrence','Martin Briestol'))
+ st.write(f"Movies by: {movie_2}")
+ # Create a button to redirect user to another website
+ button_clicked = st.button("Visit Associate Streaming Platform")
+
+                if button_clicked:
+                    # Look up the URL for the selected cast member
+                    imdb_url = get_imdb_url(movie_2)
+
+                    if imdb_url:
+                        # Link the user to the IMDb page
+                        st.markdown(f'<a href="{imdb_url}" target="_blank">Go to IMDb</a>', unsafe_allow_html=True)
+                    else:
+                        st.write("IMDb URL not available")
+ if sys2 == "Release Date":
+ movie_3 = st.selectbox('Release Date',('2019','2018','2016-2017','2010-2015','2001-2009','1994-2000'))
+                st.write(f"Movies released during: {movie_3}")
+ # Create a button to redirect user to another website
+ button_clicked = st.button("Visit Associate Streaming Platform")
+
+                if button_clicked:
+                    # Look up the URL for the selected option (hypothetical helper)
+                    imdb_url = get_imdb_url(movie_3)
+
+                    if imdb_url:
+                        # Link the user to the IMDb page
+                        st.markdown(f'<a href="{imdb_url}" target="_blank">Go to IMDb</a>', unsafe_allow_html=True)
+                    else:
+                        st.write("IMDb URL not available")
+ with tab3:
+ # Update the URL params to trigger a redirect
+ st.experimental_set_query_params(redirect=True)
+ st.write("Redirecting...")
+
+
+ if page_selection == "Contact Us":
+ st.title('Contact Us')
+ st.write("Please fill out the form below to get in touch with us.")
+
+        # Display input fields
+ name_placeholder = st.empty()
+ name = name_placeholder.text_input("Your Name")
+
+ email_placeholder = st.empty()
+ email = email_placeholder.text_input("Your Email")
+
+ message_placeholder = st.empty()
+ message = message_placeholder.text_area("Message")
+
+ # Submit button
+ if st.button("Submit"):
+ # Process the form data here (e.g., send an email, store in a database)
+
+ # Display a success message
+ st.success("Thank you for reaching out! We will get back to you soon.")
+
+ # Reset input fields
+ name_placeholder.text_input("Your Name", value="", key="reset_name")
+ email_placeholder.text_input("Your Email", value="", key="reset_email")
+ message_placeholder.text_area("Message", value="", key="reset_message")
+
+def get_imdb_url(cast_member):
+ # Placeholder function to get IMDb URL based on the cast member
+ # Replace this with your own logic to generate the correct IMDb URL
+
+ imdb_urls = {
+ 'Denzel Washington': 'https://www.imdb.com/name/nm0000243/',
+ 'Tom Hardy': 'https://www.imdb.com/name/nm0362766/',
+ 'Will Smith': 'https://www.imdb.com/name/nm0000226/',
+ 'Jackie Chan': 'https://www.imdb.com/name/nm0000329/'
+    }
+
+    # Return the matching IMDb URL, or None if the cast member is unknown
+    return imdb_urls.get(cast_member)
+
# You may want to add more sections here for aspects such as an EDA,
# or to provide your business pitch.
diff --git a/recommenders/collaborative_based.py b/recommenders/collaborative_based.py
index 861b5d8f..dd0e97aa 100644
--- a/recommenders/collaborative_based.py
+++ b/recommenders/collaborative_based.py
@@ -1,5 +1,4 @@
"""
-
Collaborative-based filtering for item recommendation.
Author: Explore Data Science Academy.
@@ -24,125 +23,93 @@
Description: Provided within this file is a baseline collaborative
filtering algorithm for rating predictions on Movie data.
-
"""
-
-# Script dependencies
+# Script dependencies
import pandas as pd
import numpy as np
-import pickle
-import copy
-from surprise import Reader, Dataset
-from surprise import SVD, NormalPredictor, BaselineOnly, KNNBasic, NMF
-from sklearn.metrics.pairwise import cosine_similarity
-from sklearn.feature_extraction.text import CountVectorizer
+import streamlit as st
+from sklearn.neighbors import NearestNeighbors
+from scipy.sparse import csr_matrix
+
+# Suppress warnings for cleaner app output
+import warnings
+warnings.filterwarnings('ignore')
# Importing data
movies_df = pd.read_csv('resources/data/movies.csv',sep = ',')
ratings_df = pd.read_csv('resources/data/ratings.csv')
ratings_df.drop(['timestamp'], axis=1,inplace=True)
-# We make use of an SVD model trained on a subset of the MovieLens 10k dataset.
-model=pickle.load(open('resources/models/SVD.pkl', 'rb'))
-
-def prediction_item(item_id):
- """Map a given favourite movie to users within the
- MovieLens dataset with the same preference.
-
- Parameters
- ----------
- item_id : int
- A MovieLens Movie ID.
-
- Returns
- -------
- list
- User IDs of users with similar high ratings for the given movie.
-
- """
- # Data preprosessing
- reader = Reader(rating_scale=(0, 5))
- load_df = Dataset.load_from_df(ratings_df,reader)
- a_train = load_df.build_full_trainset()
+# Helper: build a movie x user pivot table from the ratings data
- predictions = []
- for ui in a_train.all_users():
- predictions.append(model.predict(iid=item_id,uid=ui, verbose = False))
- return predictions
+def movie_data(movie):
+ # New pivot where each column would represent each unique userId and each row represents each unique movieId
+ movie_pivot = movie.pivot(index = 'movieId', columns = 'userId', values = 'rating')
+ # Convert NAN to zero value
+ movie_pivot.fillna(0, inplace = True)
-def pred_movies(movie_list):
- """Maps the given favourite movies selected within the app to corresponding
- users within the MovieLens dataset.
-
- Parameters
- ----------
- movie_list : list
- Three favourite movies selected by the app user.
-
- Returns
- -------
- list
- User-ID's of users with similar high ratings for each movie.
-
- """
- # Store the id of users
- id_store=[]
- # For each movie selected by a user of the app,
- # predict a corresponding user within the dataset with the highest rating
- for i in movie_list:
- predictions = prediction_item(item_id = i)
- predictions.sort(key=lambda x: x.est, reverse=True)
- # Take the top 10 user id's from each movie with highest rankings
- for pred in predictions[:10]:
- id_store.append(pred.uid)
- # Return a list of user id's
- return id_store
+ return movie_pivot
+# Find nearest neighbours and return a recommended movie list using cosine similarity between movies
+# @st.cache(show_spinner=False, suppress_st_warning=True)
+@st.cache_data(show_spinner=False)
# !! DO NOT CHANGE THIS FUNCTION SIGNATURE !!
-# You are, however, encouraged to change its content.
+# You are, however, encouraged to change its content.
def collab_model(movie_list,top_n=10):
- """Performs Collaborative filtering based upon a list of movies supplied
- by the app user.
-
- Parameters
- ----------
- movie_list : list (str)
- Favorite movies chosen by the app user.
- top_n : type
- Number of top recommendations to return to the user.
-
- Returns
- -------
- list (str)
- Titles of the top-n movie recommendations to the user.
-
- """
-
- indices = pd.Series(movies_df['title'])
- movie_ids = pred_movies(movie_list)
- df_init_users = ratings_df[ratings_df['userId']==movie_ids[0]]
- for i in movie_ids :
- df_init_users=df_init_users.append(ratings_df[ratings_df['userId']==i])
- # Getting the cosine similarity matrix
- cosine_sim = cosine_similarity(np.array(df_init_users), np.array(df_init_users))
- idx_1 = indices[indices == movie_list[0]].index[0]
- idx_2 = indices[indices == movie_list[1]].index[0]
- idx_3 = indices[indices == movie_list[2]].index[0]
- # Creating a Series with the similarity scores in descending order
- rank_1 = cosine_sim[idx_1]
- rank_2 = cosine_sim[idx_2]
- rank_3 = cosine_sim[idx_3]
- # Calculating the scores
- score_series_1 = pd.Series(rank_1).sort_values(ascending = False)
- score_series_2 = pd.Series(rank_2).sort_values(ascending = False)
- score_series_3 = pd.Series(rank_3).sort_values(ascending = False)
- # Appending the names of movies
- listings = score_series_1.append(score_series_1).append(score_series_3).sort_values(ascending = False)
- recommended_movies = []
- # Choose top 50
- top_50_indexes = list(listings.iloc[1:50].index)
- # Removing chosen movies
- top_indexes = np.setdiff1d(top_50_indexes,[idx_1,idx_2,idx_3])
- for i in top_indexes[:top_n]:
- recommended_movies.append(list(movies_df['title'])[i])
- return recommended_movies
+    # Merge the movies and ratings dataframes on movieId
+ movie = movies_df.merge(ratings_df, how = 'left', on='movieId')
+ # Convert df to a pivot table and replace NAN value with zero
+ movie_pivot = movie_data(movie)
+ # Reduce sparsity to assist with computation time on large dataset
+ csr_item = csr_matrix(movie_pivot.values)
+ movie_pivot.reset_index(inplace=True)
+ # Initiate KNN model using NearestNeighbors and Cosine similarity
+ knn_item = NearestNeighbors(metric = 'cosine', algorithm = 'brute', n_neighbors = 20, n_jobs = -1)
+ knn_item.fit(csr_item)
+ #movie_list2 = [x[:-7] for x in movie_list]
+    # Empty lists to store recommended movie indices (in case a selected title is not found)
+    recommend_movie_indices_1, recommend_movie_indices_2, recommend_movie_indices_3 = [], [], []
+    full_list = []
+    # Check if each selected movie is in the movies dataframe
+ movie_list_1 = movies_df.loc[movies_df['title'] ==movie_list[0]]
+ movie_list_2 = movies_df.loc[movies_df['title'] ==movie_list[1]]
+ movie_list_3 = movies_df.loc[movies_df['title'] ==movie_list[2]]
+
+ if len(movie_list_1):
+ movie_index_1a = movie_list_1.iloc[0]['movieId'] # finds movieId of selected movie
+ movie_index_1 = movie_pivot[movie_pivot['movieId'] == movie_index_1a].index[0] # finds movie index in pivot table
+ distances_1 , indices_1 = knn_item.kneighbors(csr_item[movie_index_1],n_neighbors=top_n+1) # find 10 most similar movies with KNN model (index of movie and distance)
+        # Sort neighbours by distance, most similar first, excluding the selected movie itself
+        recommend_movie_indices_1 = sorted(list(zip(indices_1.squeeze().tolist(),distances_1.squeeze().tolist())),key=lambda x: x[1])[1:]
+        recommend_movie_indices_1 = recommend_movie_indices_1[0:4]  # keep the 4 closest
+
+    # Repeat the same procedure for the second and third selected movies:
+
+ if len(movie_list_2):
+ movie_index_2a = movie_list_2.iloc[0]['movieId']
+ movie_index_2 = movie_pivot[movie_pivot['movieId'] == movie_index_2a].index[0]
+ distances_2 , indices_2 = knn_item.kneighbors(csr_item[movie_index_2],n_neighbors=top_n+1)
+        recommend_movie_indices_2 = sorted(list(zip(indices_2.squeeze().tolist(),distances_2.squeeze().tolist())),key=lambda x: x[1])[1:]
+        recommend_movie_indices_2 = recommend_movie_indices_2[0:3]  # keep the 3 closest
+
+ if len(movie_list_3):
+ movie_index_3a = movie_list_3.iloc[0]['movieId']
+ movie_index_3 = movie_pivot[movie_pivot['movieId'] == movie_index_3a].index[0]
+ distances_3 , indices_3 = knn_item.kneighbors(csr_item[movie_index_3],n_neighbors=top_n+1)
+        recommend_movie_indices_3 = sorted(list(zip(indices_3.squeeze().tolist(),distances_3.squeeze().tolist())),key=lambda x: x[1])[1:]
+        recommend_movie_indices_3 = recommend_movie_indices_3[0:3]  # keep the 3 closest
+
+
+    # Combine the three lists and sort from smallest to largest distance (most similar first)
+ full_list = recommend_movie_indices_1 + recommend_movie_indices_2 + recommend_movie_indices_3
+ full_list = sorted(full_list, key = lambda x:x[1], reverse = False)
+
+ recommend_list = [] # list for recommended movies
+ for item in full_list: # loop through recommended movies to find title of movies
+ movie_index = movie_pivot.iloc[item[0]]['movieId']
+ idx = movies_df[movies_df['movieId'] == movie_index].index
+ recommend_list.append({'Title':movies_df['title'].iloc[idx].values[0],'Distance':item[1]}) # extract title of movie
+ df_recommend = pd.DataFrame(recommend_list) # convert to dataframe
+
+    top_recommendations = df_recommend['Title'][:top_n].tolist()
+
+ return top_recommendations
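The item-based KNN logic used in `collab_model` can be exercised in isolation. Below is a minimal sketch with a toy ratings pivot and hypothetical titles (not the real MovieLens data):

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy ratings pivot: rows = movies, columns = users (NaNs already filled with 0)
pivot = pd.DataFrame(
    [[5, 4, 0],
     [4, 5, 1],
     [0, 1, 5]],
    index=['Movie A', 'Movie B', 'Movie C'],   # hypothetical titles
)

# A sparse matrix keeps memory manageable on the real (very sparse) dataset
csr_item = csr_matrix(pivot.values)

# Brute-force KNN with cosine distance, as in collab_model above
knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=2)
knn.fit(csr_item)

# Nearest neighbour of 'Movie A' (index 0), excluding the movie itself
distances, indices = knn.kneighbors(csr_item[0], n_neighbors=2)
nearest = pivot.index[indices.squeeze()[1]]
print(nearest)   # Movie B shares a similar rating pattern with Movie A
```

On the full dataset, `kneighbors` is called with `n_neighbors=top_n+1` so that the selected movie itself (distance 0) can be dropped from the results.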
diff --git a/recommenders/content_based.py b/recommenders/content_based.py
index ed7df363..d4e36b0c 100644
--- a/recommenders/content_based.py
+++ b/recommenders/content_based.py
@@ -31,8 +31,8 @@
import os
import pandas as pd
import numpy as np
-from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
+from sklearn.metrics.pairwise import cosine_similarity
# Importing data
movies = pd.read_csv('resources/data/movies.csv', sep = ',')
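`content_based.py` builds its similarity from movie metadata using the imports above. A minimal sketch of that idea, with hypothetical titles and MovieLens-style pipe-separated genre strings:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy movie data in the MovieLens style: genres are pipe-separated
movies = pd.DataFrame({
    'title':  ['Movie A', 'Movie B', 'Movie C'],                    # hypothetical titles
    'genres': ['Action|Drama', 'Action|Thriller', 'Comedy|Romance'],
})

# Bag-of-words over genre tokens -> one column per genre
vec = CountVectorizer(token_pattern=r'[^|]+')
genre_matrix = vec.fit_transform(movies['genres'])

# Pairwise cosine similarity between movies based on shared genres
sim = cosine_similarity(genre_matrix)

# Movie A overlaps with B on 'Action' but shares no genres with C
print(sim[0, 1] > sim[0, 2])   # True
```

The real implementation vectorizes the actual `movies.csv` metadata, but the similarity computation follows this same pattern.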
diff --git a/resources/data/zipped.zip b/resources/data/zipped.zip
new file mode 100644
index 00000000..6a4ced48
Binary files /dev/null and b/resources/data/zipped.zip differ
diff --git a/resources/imgs/flix.png b/resources/imgs/flix.png
new file mode 100644
index 00000000..4dbed12b
Binary files /dev/null and b/resources/imgs/flix.png differ