1. Reliability estimation
Item-item Collaborative Filtering (CF) is commonly regarded as one of the most effective approaches to building Recommender Systems. Given a set of users, a set of items and a set of ratings, item-based CF techniques use the available ratings to build relationships between items in the form of an item similarity matrix.
Although research on item-based algorithms is not new, the evaluation of their predictive capabilities is purely empirical and still an open issue. In most works, predictions are expected to be more accurate for items and users associated with many ratings and low rating variance. Adomavicius et al. (Adomavicius et al., 2007) observe that recommendations tend to be more accurate for users and items exhibiting lower rating variance. Cremonesi et al. (Cremonesi et al., 2012) empirically show that there is some correlation between the recall of recommendations and the number of ratings in the user profile. A disadvantage of these approaches is that confidence estimation is based only on user and item ratings and does not take into account the properties of the prediction model, so it may overlook valuable information.
We show that, under the hypothesis of a perfect model (i.e., an oracle able to make perfect recommendations), item-based methods are analogous to an eigenvalue problem, where each user profile (i.e., the vector of ratings from a user) is a left eigenvector of the similarity matrix. Moreover, each user is associated with an eigenvalue, and the magnitude of the eigenvalue is linearly correlated with the accuracy of recommendations for that user. We call this analogy the eigenvector analogy.
2. Eigenvector Analogy
A generic item-based model (either CF, content-based filtering (CBF) or hybrid) predicts the ratings of a user for all items as:
$$\hat{r}_u = r_u S \tag{1}$$
where $\hat{r}_u$ is the vector of predicted ratings, $r_u$ is the profile of the user and $S$ is a similarity matrix. We now assume that we have an ideal item-based recommender, able to predict all the user ratings (both known and unknown). More formally, this translates into two assumptions:
- Assumption 1: Perfect Predictions.
Estimated ratings $\hat{r}_u$ are identical to the ratings in the user profile $r_u$, up to a user-specific scaling factor. Because of this assumption, we can write $\hat{r}_u = \lambda_u r_u$ for some scalar $\lambda_u$.
- Assumption 2: Perfect Knowledge.
We have complete knowledge of all the ratings in the user-rating matrix.
Under these assumptions, the item-based model described by (1) is analogous to a left eigenvector problem
$$\lambda_u r_u = r_u S \tag{2}$$
where $r_u$ is a left eigenvector of matrix $S$ and $\lambda_u$ is the corresponding eigenvalue. In the eigenvector formulation (2), predicted ratings are equivalent to the ratings in the user profile multiplied by the eigenvalue associated with the user. For each user, the corresponding eigenvalue can either flatten out or amplify the differences between predicted ratings. The closer an eigenvalue is to zero, the more difficult it is for the item-based method to correctly rank items and to distinguish between relevant and non-relevant items.
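The flattening and amplifying effect can be checked numerically. The sketch below is only an illustration under stated assumptions: the similarity matrix `S` is a made-up symmetric 3-item example, the helper `left_eigenpairs` is hypothetical, and the resulting eigenvectors are not realistic rating profiles (they may contain negative entries). It shows that for any left eigenvector `r_u` of `S`, the spread between predicted ratings equals the spread in the profile multiplied by the eigenvalue.

```python
import numpy as np

def left_eigenpairs(S):
    """Return eigenvalues and the matching left eigenvectors (as rows) of S."""
    # A left eigenvector v of S satisfies v S = lambda v,
    # i.e. v is a right eigenvector of S transposed.
    eigenvalues, vectors = np.linalg.eig(S.T)
    return eigenvalues, vectors.T

# Hypothetical symmetric item similarity matrix (3 items), for illustration only.
S = np.array([[0.9, 0.2, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.1, 0.8]])

eigenvalues, profiles = left_eigenpairs(S)
for lam, r_u in zip(eigenvalues, profiles):
    predicted = r_u @ S                      # item-based prediction for this profile
    spread_in = r_u.max() - r_u.min()        # rating differences in the profile
    spread_out = predicted.max() - predicted.min()
    # For a positive eigenvalue, spread_out equals lam * spread_in.
    print(f"lambda={lam:.3f}  profile spread={spread_in:.3f}  predicted spread={spread_out:.3f}")
```

A profile whose eigenvalue is close to zero ends up with nearly identical predicted ratings for all items, which is why ranking becomes difficult for that user.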
2.1. Learning the eigenvalues
In order to estimate the eigenvalue of each user, we rewrite (2) in matrix form
$$\Lambda R = R S \tag{3}$$
where $R$ is the user-rating matrix, $S$ is the similarity matrix and $\Lambda$ is a diagonal matrix with the user eigenvalues on its diagonal. In order to satisfy the model described by (3), we need to find a similarity matrix $S$, other than the identity matrix, such that all the user profiles in $R$ are left eigenvectors of $S$. This problem is called the inverse eigenvalue problem which, if we also know all the eigenvalues, has the following exact solution: $S = R^{+} \Lambda R$, where $R^{+}$ is the Moore-Penrose pseudoinverse of $R$, which can be computed via singular-value decomposition. We call this model EigenSim. The eigenvalues are then estimated via stochastic gradient descent (SGD), optimizing the Bayesian Personalized Ranking (BPR) loss.
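The closed-form solution of the inverse eigenvalue problem can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `eigensim_similarity`, the toy rating matrix `R` (3 users, 5 items) and the eigenvalues are all assumptions, and the eigenvalues are taken as given rather than learned with SGD and BPR. The sketch builds the similarity matrix as the pseudoinverse of `R`, times the diagonal eigenvalue matrix, times `R`, and checks that every user profile is a left eigenvector of the result.

```python
import numpy as np

def eigensim_similarity(R, eigenvalues):
    """Item similarity S = pinv(R) @ diag(eigenvalues) @ R (items x items)."""
    R_pinv = np.linalg.pinv(R)               # Moore-Penrose pseudoinverse, computed via SVD
    return R_pinv @ np.diag(eigenvalues) @ R

# Toy user-rating matrix (rows: users, columns: items); values are made up.
R = np.array([[5.0, 3.0, 0.0, 1.0, 0.0],
              [0.0, 4.0, 4.0, 0.0, 2.0],
              [1.0, 0.0, 5.0, 4.0, 0.0]])
eigenvalues = np.array([1.2, 0.8, 0.3])      # one assumed eigenvalue per user

S = eigensim_similarity(R, eigenvalues)
predictions = R @ S                          # item-based predictions for all users

# Each row of the predictions equals the user's profile scaled by their eigenvalue.
print(np.allclose(predictions, eigenvalues[:, None] * R))
```

The identity holds exactly here because the rows of the toy `R` are linearly independent, so `R @ pinv(R)` is the identity. When `R` does not have full row rank, the same formula only gives a least-squares fit. Note also that `S` is a dense items-by-items matrix, so this direct computation is practical only for small catalogues.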
We evaluated the performance of EigenSim on two different datasets, namely Movielens 10M (70K users, 10M interactions) and a subsample of Netflix (40K users, 1.25M interactions), the dataset used for the Netflix Prize. Each dataset is split by randomly assigning ratings to the training (60%), validation (20%) and test (20%) sets.
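A random 60/20/20 split over individual ratings could look like the sketch below. The function name `split_ratings`, the seed and the rounding of the split sizes are assumptions made for illustration; the source does not describe how the split was implemented.

```python
import numpy as np

def split_ratings(n_ratings, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition rating indices into train, validation and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_ratings)            # shuffled indices of all ratings
    n_train = int(round(fractions[0] * n_ratings))
    n_valid = int(round(fractions[1] * n_ratings))
    train = order[:n_train]
    valid = order[n_train:n_train + n_valid]
    test = order[n_train + n_valid:]              # remainder goes to the test set
    return train, valid, test

train_idx, valid_idx, test_idx = split_ratings(1000)
print(len(train_idx), len(valid_idx), len(test_idx))
```

The returned index arrays can be used to select rows from a list of (user, item, rating) triples before building the three sparse rating matrices.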
4. Results discussion
The main focus of our experiment is to study whether there is a correlation between the eigenvalue associated with a user and the quality of the recommendations that user receives, measured in terms of mean average precision (MAP). We also investigate whether this correlation depends on the specific algorithm used for recommendations. Moreover, as previous work suggested, we study whether there is a correlation between the number of ratings of a user and their eigenvalue.
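For reference, MAP can be computed per user and then averaged, as in the sketch below. This uses one common definition of average precision (normalised by the smaller of the number of relevant items and the list length); the source does not state which variant or cutoff was used, so the normalisation and the function names are assumptions.

```python
def average_precision(ranked_items, relevant_items):
    """Average precision of one ranked recommendation list."""
    relevant = set(relevant_items)
    if not relevant or not ranked_items:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank                  # precision at this hit position
    return score / min(len(relevant), len(ranked_items))

def mean_average_precision(rankings, relevant):
    """Mean of the per-user average precision; both arguments are dicts keyed by user."""
    users = list(rankings)
    return sum(average_precision(rankings[u], relevant.get(u, [])) for u in users) / len(users)

# Toy example with two hypothetical users.
rankings = {"u1": [1, 2, 3], "u2": [4, 5, 6]}
relevant = {"u1": [1, 3], "u2": [6]}
print(mean_average_precision(rankings, relevant))
```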
Figure 1 presents the performance of EigenSim and Slim on the Movielens 10M dataset. The 70 thousand users of the dataset have been ranked by their eigenvalue and partitioned into 10 groups of 7K users each. Each point in the figure represents the MAP of one group, plotted against the median of the eigenvalues of the users in that group. The figure clearly shows a linear correlation between the quality of recommendations and the eigenvalue of the users. For users with very low eigenvalues the quality of recommendations drops to zero, regardless of the algorithm, while for users with large eigenvalues the quality of recommendations increases for both algorithms. In terms of MAP, the quality of ItemKNN recommendations for users with large eigenvalues () is almost ten times the quality for users with low eigenvalues () and twice the average quality across all the users. It is interesting to observe that eigenvalues also affect the quality of recommendations for an item-based algorithm (Slim) that is not based on the eigenvalue assumption.
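The grouping procedure described above can be sketched as follows. The function name `group_by_eigenvalue` and the synthetic inputs are assumptions for illustration; in the experiment the inputs would be the learned eigenvalue and the average precision of each user.

```python
import numpy as np

def group_by_eigenvalue(eigenvalues, ap_per_user, n_groups=10):
    """Rank users by eigenvalue, split them into equal-sized groups and
    return the median eigenvalue and the MAP of each group."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    ap_per_user = np.asarray(ap_per_user, dtype=float)
    order = np.argsort(eigenvalues)               # users sorted by increasing eigenvalue
    groups = np.array_split(order, n_groups)      # near-equal groups of user indices
    medians = np.array([np.median(eigenvalues[g]) for g in groups])
    group_map = np.array([ap_per_user[g].mean() for g in groups])
    return medians, group_map

# Synthetic data: 20 users whose average precision grows with the eigenvalue.
toy_eigenvalues = np.arange(20) / 10.0
toy_ap = 0.05 * toy_eigenvalues
medians, group_map = group_by_eigenvalue(toy_eigenvalues, toy_ap)
print(medians)
print(group_map)
```

Each (median eigenvalue, group MAP) pair corresponds to one point of a plot like the one described for Figure 1.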
The Pearson correlation coefficient between MAP and the eigenvalue $\lambda_u$ is 0.99 for both Slim and EigenSim. This confirms that eigenvalues are strong predictors of the quality of recommendations for any item-based algorithm. For comparison, on the same dataset, the correlation between MAP and profile length is only 0.78 and 0.80 for Slim and EigenSim, respectively. A similar behaviour holds for Netflix.
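The Pearson coefficient used here is the covariance of the two variables divided by the product of their standard deviations. A minimal sketch follows; the function name `pearson` and the input values are placeholders and are not the figures reported in the experiments.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()                             # centre both variables
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Placeholder per-group values (made up): median eigenvalue and MAP.
group_eigenvalue = [0.2, 0.5, 0.8, 1.1, 1.4]
group_quality = [0.01, 0.03, 0.04, 0.07, 0.08]
print(pearson(group_eigenvalue, group_quality))
```

The same value can be obtained with `np.corrcoef(x, y)[0, 1]`.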
Figure 2 plots, for each user in the Movielens 10M dataset, the eigenvalue associated with the user, as computed with EigenSim, against the number of ratings in their profile. The figure shows that there is not a strong correlation between $\lambda_u$ and the number of ratings in the user profile. For instance, users with a profile length of 50 ratings have eigenvalues ranging from 0.3 (very low quality of recommendations) up to 1.0 (average quality). The Pearson correlation coefficient between $\lambda_u$ and profile length is 0.76. The correlation between quality of recommendations and eigenvalues is much stronger than the correlation between quality of recommendations and the number of ratings in the user profile.
These results are not limited to item-based methods but also apply to some model-based matrix factorization methods, such as AsySVD and PureSVD: thanks to folding-in, these matrix factorization models are equivalent to the FISM item-based method (Cremonesi et al., 2010).
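For PureSVD, the link with item-based models can be made concrete: folding in a user profile amounts to multiplying it by the product of the top item singular vectors with their own transpose, which acts as an item similarity matrix. The sketch below illustrates only this PureSVD case on a made-up rating matrix; it does not implement AsySVD or FISM, and the function name `puresvd_item_similarity` is an assumption.

```python
import numpy as np

def puresvd_item_similarity(R, n_factors):
    """Return the item similarity implied by a rank-k SVD of R and the
    rank-k reconstruction of R itself."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    Vt_k = Vt[:n_factors]                         # top-k item singular vectors (k x items)
    S = Vt_k.T @ Vt_k                             # items x items similarity matrix
    R_hat = (U[:, :n_factors] * s[:n_factors]) @ Vt_k
    return S, R_hat

# Toy user-rating matrix (values are made up).
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])

S, R_hat = puresvd_item_similarity(R, n_factors=2)
folded_in = R @ S                                 # item-based form: profile times similarity

# Folding in the training profiles reproduces the rank-k SVD predictions.
print(np.allclose(folded_in, R_hat))
```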
We have shown that an ideal item-based method can be formulated as an eigenvalue problem. We have also shown that the magnitude of the eigenvalue associated with a user is strongly correlated with the accuracy of recommendations for that user and can therefore provide a reliable measure of confidence. Ongoing work is focused on providing a more in-depth theoretical analysis of the eigenvector analogy and its extension to user-based methods, as well as on validating these results on more datasets and comparing them against other proposed confidence measures.
- Adomavicius et al. (2007) Gediminas Adomavicius, Sreeharsha Kamireddy, and YoungOk Kwon. 2007. Towards more confident recommendations: Improving recommender systems using filtering approach based on rating variance. In 17th Workshop on Information Technologies and Systems, WITS 2007.
- Cremonesi et al. (2012) Paolo Cremonesi, Franca Garzotto, and Roberto Turrin. 2012. User Effort vs. Accuracy in Rating-based Elicitation. In Proceedings of the Sixth ACM Conference on Recommender Systems (RecSys ’12). 27–34.
- Cremonesi et al. (2010) Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems. ACM, 39–46.