Surprise is a Python library created by Nicolas Hug for building and analyzing recommender systems. Thanks to the improvements the Surprise community has made to its recommender algorithms, and to avoid falling into the "reinventing the wheel" trap, we will use some of its prediction algorithms in the context of retail.
More about this library can be found in link or link2.
- Overall mathematical description : The NormalPredictor algorithm is the simplest of the predictors: it predicts ratings at random, assuming that the rating distribution is normal []. The parameters are obtained through maximum likelihood estimation, i.e., they are the values under which, given the assumed statistical model, the observed data is most probable (here, simply the sample mean and standard deviation of the training ratings). A usage sketch follows this list.
- Assumptions : Assumes that the distribution of the ratings is normal.
- Pros and Cons
- Pros: Simplest predictor; intuitive and fast.
- Cons: Worst predictions; performance strongly tied to the variability of the ratings.
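To make this concrete, here is a minimal sketch of how NormalPredictor could be evaluated with Surprise's cross-validation utilities; the built-in MovieLens 100k dataset is used here only as a stand-in for the retail ratings:

```python
from surprise import Dataset, NormalPredictor
from surprise.model_selection import cross_validate

# MovieLens 100k as a stand-in dataset (downloaded on first use).
data = Dataset.load_builtin('ml-100k')

# NormalPredictor fits a normal distribution to the training ratings
# by maximum likelihood and samples each prediction from it.
algo = NormalPredictor()
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
```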
The next three algorithms are essentially the same: in each case, the procedure used to predict the ratings follows the "k-nearest neighbors" approach. Only two things change: how we measure proximity to identify the neighborhoods, a parameter named the similarity measure, and whether user information (such as the user's mean rating) is taken into account.
The basic k-nearest neighbors (KNN) approach works in the following way:
1. Define the similarity measure and the user and item of interest.
2. Compute the similarity measure between that user and all other users that have rated the item (or, in the item-based variant, between that item and all other items rated by the user).
3. Sort the similarities in descending order and keep the first k most similar users (or items).
4. Predict the rating from the resulting list of similar users (or items).
Knowing this, the KNN algorithms below differ only in step 4; a toy sketch of the procedure follows.
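To make steps 1 to 4 concrete, below is a minimal NumPy sketch of user-based prediction on a toy ratings matrix (the matrix and the function are invented for illustration; Surprise's own implementations operate on an internal trainset instead). It uses the Mean Squared Difference similarity, sim = 1 / (msd + 1), which is Surprise's default:

```python
import numpy as np

def predict_rating(ratings, u, i, k=2):
    """Toy user-based KNN prediction; np.nan marks an unrated item."""
    # Step 2: similarity between u and every other user that rated item i.
    sims = []
    for v in range(ratings.shape[0]):
        if v == u or np.isnan(ratings[v, i]):
            continue
        common = ~np.isnan(ratings[u]) & ~np.isnan(ratings[v])
        if not common.any():
            continue
        msd = np.mean((ratings[u, common] - ratings[v, common]) ** 2)
        sims.append((1.0 / (msd + 1.0), v))  # MSD similarity, as in Surprise
    # Step 3: sort in descending order and keep the k most similar users.
    sims.sort(reverse=True)
    top_k = sims[:k]
    # Step 4 (KNNBasic flavour): similarity-weighted average of the
    # neighbors' ratings; assumes at least one neighbor exists.
    return (sum(s * ratings[v, i] for s, v in top_k)
            / sum(s for s, _ in top_k))

ratings = np.array([[5, 3, np.nan],
                    [4, 3, 4],
                    [1, 1, 2],
                    [5, 4, 5]], dtype=float)
print(predict_rating(ratings, u=0, i=2, k=2))  # -> 4.5 (neighbors 1 and 3)
```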
- Overall mathematical description : As stated in the previous section, KNNBasic estimates the rating based entirely on the k nearest neighbors, which depend on the similarity measure (Mean Squared Difference is used here). Once the k neighbors are determined, the rating $\hat{r}_{ui}$ is predicted as follows:

$$\hat{r}_{ui} = \frac{\sum_{v \in N^k_i(u)} \text{sim}(u, v) \cdot r_{vi}}{\sum_{v \in N^k_i(u)} \text{sim}(u, v)}$$

Where $N^k_i(u)$ is the set of the k neighbors of u that have rated the item i, $\text{sim}(u, v)$ is the chosen similarity measure between users u and v, and $r_{vi}$ is the rating of user v for item i (a Surprise usage sketch follows the pros and cons below).
- Assumptions : This algorithm assumes that similar things exist in close proximity, i.e., users who have rated items similarly are similar users, or share the same "taste" in items.
- Pros and Cons
- Pros: Better predictions than "NormalPredictor"; non-parametric.
- Cons: Performance tied to the variability of the ratings, and slow compared with "NormalPredictor" and Co-clustering.
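In Surprise, KNNBasic could be used roughly as follows; the similarity measure is selected through the sim_options dictionary (MSD here, as discussed above):

```python
from surprise import Dataset, KNNBasic, accuracy
from surprise.model_selection import train_test_split

data = Dataset.load_builtin('ml-100k')
trainset, testset = train_test_split(data, test_size=0.25, random_state=42)

# User-based KNN with Mean Squared Difference similarity.
algo = KNNBasic(k=40, sim_options={'name': 'msd', 'user_based': True})
algo.fit(trainset)

predictions = algo.test(testset)
accuracy.rmse(predictions)
```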
- Overall mathematical description : This case is similar to the previous algorithm, but in addition the mean rating of each user is incorporated into step 4 (i.e., into how we predict the rating from the k nearest neighbors), in the following way:

$$\hat{r}_{ui} = \mu_u + \frac{\sum_{v \in N^k_i(u)} \text{sim}(u, v) \cdot (r_{vi} - \mu_v)}{\sum_{v \in N^k_i(u)} \text{sim}(u, v)}$$

Where $\mu_x$ is the mean of all ratings given by user x; the remaining terminology is analogous to KNNBasic (a usage sketch follows the pros and cons below).
- Assumptions : This algorithm assumes that similar things exist in close proximity, i.e., users who have rated items similarly are similar users, or share the same "taste" in items. In addition, it assumes that the users' mean ratings are relevant enough to be taken into account.
- Pros and Cons
- Pros: Better predictions than "NormalPredictor"; non-parametric; uses more information than KNNBasic.
- Cons: Performance tied to the variability of the ratings, and slow compared with "NormalPredictor" and Co-clustering.
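A sketch of the same workflow with KNNWithMeans, this time predicting a single user/item pair; the raw ids '196' and '302' are sample ids from the MovieLens stand-in data:

```python
from surprise import Dataset, KNNWithMeans

data = Dataset.load_builtin('ml-100k')
trainset = data.build_full_trainset()

# Same neighborhood logic as KNNBasic, but the neighbors' ratings are
# mean-centered per user before the weighted average (see the formula above).
algo = KNNWithMeans(k=40, sim_options={'name': 'msd', 'user_based': True})
algo.fit(trainset)

pred = algo.predict(uid='196', iid='302')
print(pred.est)
```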
- Overall mathematical description : Finally, this variant uses the z-score of each user, so it is the most complete of the KNN algorithms; the rating is predicted in the following manner:

$$\hat{r}_{ui} = \mu_u + \sigma_u \frac{\sum_{v \in N^k_i(u)} \text{sim}(u, v) \cdot (r_{vi} - \mu_v) / \sigma_v}{\sum_{v \in N^k_i(u)} \text{sim}(u, v)}$$

Where $\sigma_x$ is the standard deviation of all ratings given by user x; the remaining terminology is analogous to KNNWithMeans (a comparison sketch of the three KNN variants follows the pros and cons below).
- Assumptions : This algorithm assumes that similar things exist in close proximity, i.e., users who have rated items similarly are similar users, or share the same "taste" in items. In addition, it assumes that the z-score normalization of each user's ratings is relevant enough to be taken into account.
- Pros and Cons
- Pros: Better predictions than "NormalPredictor"; non-parametric; the most complete KNN algorithm; can be updated in real time.
- Cons: Performance tied to the variability of the ratings, and slow compared with "NormalPredictor" and Co-clustering.
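To compare the three KNN variants empirically, a small cross-validation loop such as this sketch could be used (same neighborhood machinery in all three; only step 4, the aggregation, differs):

```python
from surprise import Dataset, KNNBasic, KNNWithMeans, KNNWithZScore
from surprise.model_selection import cross_validate

data = Dataset.load_builtin('ml-100k')
sim_options = {'name': 'msd', 'user_based': True}

for Algo in (KNNBasic, KNNWithMeans, KNNWithZScore):
    results = cross_validate(Algo(k=40, sim_options=sim_options),
                             data, measures=['RMSE'], cv=3, verbose=False)
    print(Algo.__name__, results['test_rmse'].mean())
```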
- Overall mathematical description : The Co-clustering (or simultaneous clustering) prediction algorithm makes use of the co-clustering technique over users and items, i.e., all rating predictions are made over a cluster of similar users AND similar items. A co-cluster can be viewed as users with similar item tastes, or as items liked by similar users (in other words, it can be read in both directions). The ratings are predicted as follows:

$$\hat{r}_{ui} = \overline{C_{ui}} + (\mu_u - \overline{C_u}) + (\mu_i - \overline{C_i})$$

Where $\overline{C_{ui}}$ is the average rating of the co-cluster $C_{ui}$, $\overline{C_u}$ is the average rating of user u's cluster, and $\overline{C_i}$ is the average rating of item i's cluster. If the user is new (hasn't rated anything), the prediction is $\hat{r}_{ui} = \mu_i$. If the item is new, the prediction is $\hat{r}_{ui} = \mu_u$. If both the user and the item are new, the prediction is $\hat{r}_{ui} = \mu$. Here $\mu_u$, $\mu_i$ and $\mu$ are the mean of all ratings given by user u, the mean of all ratings given to item i, and the mean of all ratings, respectively.
The clusters are chosen to be the solution of the following least-squares optimization problem:

$$\min \sum_{u, i} W_{ui} \, (r_{ui} - \hat{r}_{ui})^2$$

Where $W_{ui}$ is an indicator that equals 1 when the rating $r_{ui}$ exists and 0 otherwise. In summary, the approximation matrix is the least-squares solution that preserves the user, item and co-cluster averages (a usage sketch follows the pros and cons below).
- Assumptions : Assumes that similar users and items are in close proximity (like KNN).
- Pros and Cons
- Pros: Scalable, computationally efficient and accurate.
- Cons: The most complex predictor; suffers from the cold-start problem.
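A minimal sketch of CoClustering in Surprise; the numbers of user and item clusters are exposed as n_cltr_u and n_cltr_i (3 of each here, an arbitrary choice):

```python
from surprise import CoClustering, Dataset
from surprise.model_selection import cross_validate

data = Dataset.load_builtin('ml-100k')

# 3 user clusters and 3 item clusters, fitted over 20 epochs; the
# co-cluster, user and item averages then drive the predictions.
algo = CoClustering(n_cltr_u=3, n_cltr_i=3, n_epochs=20)
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
```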
- Overall mathematical description : As the name indicates, the Slope One prediction algorithm is a predictor of the form f(x) = x + b, whose principle is based on the "popularity difference", a term that refers to one item being better liked than another. This can be measured, for example, with the average difference in rating between the two items, and the same difference can be used to predict ratings, assuming it is the expected difference between the ratings the user of interest gives to those two items, which results in a linear predictor. The predicted rating is calculated as follows:

$$\hat{r}_{ui} = \mu_u + \frac{1}{|R_i(u)|} \sum_{j \in R_i(u)} \text{dev}(i, j)$$

Where $\hat{r}_{ui}$ is the predicted rating, $\mu_u$ is the mean of all ratings given by user u, $R_i(u)$ is the set of relevant items, i.e., the set of items j rated by u that also have at least one user in common with i, and $\text{dev}(i, j)$ is defined as the average difference between the ratings of i and those of j:

$$\text{dev}(i, j) = \frac{1}{|U_{ij}|} \sum_{v \in U_{ij}} (r_{vi} - r_{vj})$$

where $U_{ij}$ is the set of users who have rated both i and j (a usage sketch follows the pros and cons below).
- Assumptions : This algorithm assumes that the predictor (or approximating function) best suited to the rating behavior is linear.
- Pros and Cons
- Pros: Very decent predictions; non-parametric; easy to implement; fast.
- Cons: The number of ratings behind each average difference is not taken into account, which could induce bad predictions.
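SlopeOne takes no hyperparameters in Surprise, so a usage sketch is especially short:

```python
from surprise import Dataset, SlopeOne, accuracy
from surprise.model_selection import train_test_split

data = Dataset.load_builtin('ml-100k')
trainset, testset = train_test_split(data, test_size=0.25, random_state=42)

# The item-item average differences dev(i, j) are computed at fit time.
algo = SlopeOne()
algo.fit(trainset)
accuracy.rmse(algo.test(testset))
```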