1. Reliability estimation
Item-item Collaborative Filtering (CF) is commonly regarded as one of the most effective approaches to building Recommender Systems. Given a set of users, a set of items and a set of ratings, item-based CF techniques use the available ratings to build relationships between items in the form of an item similarity matrix.
Although research on item-based algorithms is not new, the evaluation of their predictive capabilities is purely empirical and still an open issue. In most works, predictions are expected to be more accurate for items and users associated with many ratings and low rating variance. Adomavicius et al. (Adomavicius et al., 2007) observe that recommendations tend to be more accurate for users and items exhibiting lower rating variance. Cremonesi et al. (Cremonesi et al., 2012) empirically show that there is some correlation between the recall of recommendations and the number of ratings in the user profile. A disadvantage of these approaches is that confidence estimation is based only on user and item ratings and does not take into account the properties of the prediction model, so it may overlook valuable information.
We show that, under the hypothesis of a perfect model (i.e., an oracle able to make perfect recommendations), item-based methods are analogous to an eigenvalue problem, where each user profile (i.e., the vector of ratings from a user) is a left eigenvector of the similarity matrix. Moreover, each user is associated with an eigenvalue, and the magnitude of the eigenvalue is linearly correlated with the accuracy of recommendations for that user. We call this analogy the eigenvector analogy.
2. Eigenvector Analogy
A generic item-based model (either CF, CBF or hybrid) predicts the ratings of a user for all items as:
(1) $\tilde{r}_u = r_u S$
where $r_u$ is the profile of the user and $S$ is a similarity matrix. We now assume that we have an ideal item-based recommender, able to predict all the user ratings (both known and unknown). More formally, this translates into two assumptions:
 Assumption 1: Perfect Predictions.

Estimated ratings $\tilde{r}_u$ are identical, up to a user-specific scale factor $\lambda_u$, to the ratings in the user profile $r_u$. Because of this assumption, we can write $\tilde{r}_u = \lambda_u r_u$.
 Assumption 2: Perfect Knowledge.

We have complete knowledge of all the ratings in the user-rating matrix.
Under these assumptions, the item-based model described by (1) is analogous to a left eigenvector problem
(2) $\lambda_u r_u = r_u S$
where $r_u$ is a left eigenvector of matrix $S$ and $\lambda_u$ is the corresponding eigenvalue. In the eigenvector formulation (2), predicted ratings are equivalent to the ratings in the user profile multiplied by the eigenvalue associated with the user. For each user, the corresponding eigenvalue can either flatten out or amplify the differences between predicted ratings. The closer an eigenvalue is to zero, the more difficult it is for the item-based method to rank items correctly and to distinguish between relevant and non-relevant items.
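The analogy can be checked numerically on a toy example. The sketch below is not the authors' code: the rating matrix and the eigenvalues are hypothetical, and the similarity matrix is built as $S = R^{+} \Lambda R$ under the assumption that the user-rating matrix $R$ has full row rank (fewer users than items, linearly independent profiles), so that $R R^{+} = I$. It illustrates that an item-based prediction $r_u S$ then returns each profile scaled by its eigenvalue, and that a small eigenvalue compresses the gap between the highest and lowest predicted scores.

```python
import numpy as np

# Hypothetical user-rating matrix R: 3 users x 5 items, 0 = unrated.
# The rows are linearly independent, so R @ pinv(R) is the identity.
R = np.array([
    [5.0, 3.0, 0.0, 1.0, 4.0],
    [4.0, 0.0, 0.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0, 0.0],
])

# Hypothetical per-user eigenvalues (large, medium, close to zero).
lam = np.array([1.5, 0.8, 0.05])

# Similarity matrix for which every row of R is a left eigenvector.
S = np.linalg.pinv(R) @ np.diag(lam) @ R

# Item-based prediction for all users at once: one row r_u @ S per user.
R_pred = R @ S

# Each predicted profile equals the original profile scaled by lambda_u.
assert np.allclose(R_pred, lam[:, None] * R)

# Gap between the highest and lowest predicted score for each user.
gaps = R_pred.max(axis=1) - R_pred.min(axis=1)
print(gaps)
```

In this toy setting the third user, whose eigenvalue is close to zero, has predicted scores that are almost indistinguishable from one another, which is the flattening effect described above.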
2.1. Learning the eigenvalues
In order to estimate each user’s eigenvalue, we rewrite (2) in matrix form
(3) $\Lambda R = R S$
where $R$ is the user-rating matrix, $S$ is the similarity matrix and $\Lambda$ is a diagonal matrix with the user eigenvalues on its diagonal. In order to satisfy the model described by (3), we need to find a similarity matrix $S$, other than the identity matrix, such that all the user profiles $r_u$ in $R$ are left eigenvectors of $S$. This problem is called the inverse eigenvalue problem which, if we also know all the eigenvalues, has the following exact solution: $S = R^{+} \Lambda R$, where $R^{+}$ is the Moore-Penrose pseudoinverse of $R$, computed via singular-value decomposition. We call this model EigenSim. The eigenvalues are then estimated via SGD, optimizing BPR.
3. Datasets
We evaluated the performance of EigenSim on two different datasets, namely Movielens 10M (70K users, 10M interactions) and a subsample of Netflix (40K users, 1.25M interactions), the dataset used for the Netflix Prize. Each dataset is split by randomly assigning ratings to the training (60%), validation (20%) and test (20%) sets.
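A random 60/20/20 split of this kind can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_ratings`, the seed and the rounding of set sizes are assumptions, and the source does not say whether the split is stratified by user.

```python
import numpy as np

def split_ratings(n_ratings, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition rating indices into train, validation and test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_ratings)          # shuffled indices 0..n_ratings-1
    n_train = int(round(fractions[0] * n_ratings))
    n_valid = int(round(fractions[1] * n_ratings))
    train = perm[:n_train]
    valid = perm[n_train:n_train + n_valid]
    test = perm[n_train + n_valid:]            # whatever remains goes to test
    return train, valid, test

# Example on 1000 hypothetical interactions.
train_idx, valid_idx, test_idx = split_ratings(1000)
print(len(train_idx), len(valid_idx), len(test_idx))
```

The returned index arrays can be used to select rows from a list of (user, item, rating) triples before building the sparse training matrix.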
4. Results discussion
The main focus of our experiment is to study whether there is a correlation between the eigenvalue associated with a user and the quality of the recommendations that user receives, measured in terms of MAP. We also investigate whether this correlation depends on the specific algorithm used for recommendations. Moreover, as previous work suggested, we study whether there is a correlation between the user’s number of ratings and his/her eigenvalue.
Figure 1 presents the performance of EigenSim and Slim on the Movielens 10M dataset. The 70 thousand users of the dataset have been ranked by their eigenvalue and partitioned into 10 groups of 7K users each. Each point in the figure represents the MAP of one group; the corresponding eigenvalue is the median of the eigenvalues of the users in the group. The figure clearly shows a linear correlation between the quality of recommendations and the eigenvalue of the users. For users with very low eigenvalues the quality of recommendations drops to zero, regardless of the algorithm, while for users with large eigenvalues the quality of recommendations increases for both algorithms. In terms of MAP, the quality of ItemKNN recommendations for users with large eigenvalues is almost ten times the quality for users with low eigenvalues and twice the average quality across all users. It is interesting to observe that eigenvalues also affect the quality of recommendations for an item-based algorithm (Slim) that is not based on the eigenvalue assumption.
The Pearson correlation coefficient between MAP and the eigenvalue $\lambda$ is 0.99 for both Slim and EigenSim. This confirms that eigenvalues are strong predictors of the quality of recommendations for any item-based algorithm. For comparison, on the same dataset, the correlation between MAP and profile length is only 0.78 and 0.80 for Slim and EigenSim, respectively. A similar behaviour holds for Netflix.
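The grouping procedure described above (rank users by eigenvalue, split them into equal-sized groups, take the median eigenvalue and the MAP of each group, then correlate the two) can be sketched as follows. The function `grouped_correlation` is a hypothetical helper, and the eigenvalues and per-user average precision values are synthetic stand-ins generated only to make the example runnable; they are not the paper's data and do not reproduce its results.

```python
import numpy as np

def grouped_correlation(eigenvalues, avg_precision, n_groups=10):
    """Group users by eigenvalue and correlate group median eigenvalue with group MAP."""
    order = np.argsort(eigenvalues)              # users sorted by eigenvalue
    groups = np.array_split(order, n_groups)     # near-equal-sized groups
    median_lam = np.array([np.median(eigenvalues[g]) for g in groups])
    group_map = np.array([avg_precision[g].mean() for g in groups])
    pearson = np.corrcoef(median_lam, group_map)[0, 1]
    return median_lam, group_map, pearson

# Synthetic stand-in data: 7000 users whose average precision grows
# roughly linearly with their eigenvalue, plus noise.
rng = np.random.default_rng(0)
lam = rng.uniform(0.0, 2.0, size=7000)
ap = np.clip(0.1 * lam + rng.normal(0.0, 0.05, size=7000), 0.0, 1.0)

median_lam, group_map, pearson = grouped_correlation(lam, ap)
print(median_lam, group_map, pearson)
```

Because each point averages over many users, per-user noise is largely smoothed out, so a group-level correlation of this kind is expected to be higher than the correlation computed over individual users.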
Figure 2 plots, for each user in the Movielens 10M dataset, the eigenvalue associated with the user, as computed with EigenSim, against the number of ratings in his/her profile. The figure shows that there is not a strong correlation between $\lambda$ and the number of ratings in the user profile. For instance, users with a profile length of 50 ratings have eigenvalues ranging from 0.3 (very low quality of recommendations) up to 1.0 (average quality). The Pearson correlation coefficient between $\lambda$ and profile length is 0.76. The correlation between quality of recommendations and eigenvalues is much stronger than the correlation between quality of recommendations and number of ratings in the user’s profile.
These results are not limited to item-based methods but also apply to some model-based matrix factorization methods, such as AsySVD and PureSVD, because, thanks to folding-in, matrix factorization models are equivalent to the FISM item-based method (Cremonesi et al., 2010).
5. Conclusion
We have shown that an ideal item-based method can be formulated as an eigenvalue problem. We have also shown that the magnitude of a user's eigenvalue is strongly correlated with the accuracy of recommendations for that user and can therefore provide a reliable measure of confidence. Ongoing work is focused on providing a more in-depth theoretical analysis of the eigenvalue analogy and its extension to user-based methods, as well as on validating these results with more datasets and comparing against other proposed confidence measures.
References
 Adomavicius et al. (2007) Gediminas Adomavicius, Sreeharsha Kamireddy, and YoungOk Kwon. 2007. Towards more confident recommendations: Improving recommender systems using filtering approach based on rating variance. In 17th Workshop on Information Technologies and Systems, WITS 2007.
 Cremonesi et al. (2012) Paolo Cremonesi, Franca Garzotto, and Roberto Turrin. 2012. User Effort vs. Accuracy in Rating-based Elicitation. In Proceedings of the Sixth ACM Conference on Recommender Systems (RecSys ’12). 27–34.
 Cremonesi et al. (2010) Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-N recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, 39–46.