An Adaptive Matrix Factorization Approach for Personalized Recommender Systems

07/26/2016, by Gianna M. Del Corso et al.

Given a set U of users and a set I of items, a dataset of recommendations can be viewed as a sparse rectangular matrix A of size |U| × |I| such that a_u,i contains the rating that user u assigns to item i, with a_u,i = ? if user u has not rated item i. The goal of a recommender system is to predict replacements for the missing observations ? in A in order to make personalized recommendations that meet the user's tastes. A promising approach is the one based on a low-rank nonnegative matrix factorization of A, where items and users are represented in terms of a few vectors. These vectors can be used to estimate the missing evaluations and to produce new recommendations. In this paper we propose an algorithm based on the Nonnegative Matrix Factorization approach for predicting the missing entries. Numerical tests have been performed to estimate the accuracy in predicting the missing entries and in the recommendations provided, and we have compared our technique with others in the literature. We have tested the algorithm on the MovieLens databases containing ratings of users on movies.


1 Introduction

Consumers are overwhelmed by large selections of products and choices. An important challenge is to help users find the most appropriate products that meet their needs and tastes. Most retailers are interested in reliable recommendation systems that, by analyzing user behavior and interests, produce personalized recommendations. Content filtering and collaborative filtering are two alternative approaches to this interesting problem. Content-based filtering approaches try to recommend items that are similar to those that a user liked in the past [16], whereas systems designed according to the collaborative filtering paradigm identify users whose preferences are similar to those of the given user and recommend items they have liked [22].

Content-based filtering algorithms analyze the items previously rated by a user and profile the user's interests based on the features of those items. When compared to collaborative filtering recommendation systems, this approach apparently has several interesting advantages. For example, it is based only on the previous ratings of that user and is independent of the ratings of the other users. Also, the mechanism is transparent to the user, and it does not suffer from the first-rater problem, that is, an item can be recommended even if it has not yet been rated by any user. Nonetheless, content-based filtering has drawbacks not shared by collaborative filtering. For example, it can use only a limited number of characteristics associated with items and often needs some domain knowledge to discriminate items the user may like or dislike. Another shortcoming is the lack of novelty in the recommended items, since the system searches only among items which share common features with those already liked by the user. Moreover, content-based recommender systems require collecting enough ratings before starting to suggest items to a new user, while in collaborative filtering users can immediately start receiving recommendations.

In collaborative filtering the major task is to establish reliable similarity metrics for finding similar users based on their ratings. Classic methods include neighborhood methods [2], which predict ratings by referring to users whose ratings are similar to those of the queried user. The underlying assumption is that if two users agree in the rating of some items, they will probably also agree in the rating of the remaining items [23]. Many approaches have been proposed to find the neighborhood of a user, some exploiting simple similarity or correlation metrics, others employing more effective techniques such as those based on clustering [21, 26] or Bayesian classifiers [17].

A class of successful collaborative filtering models is based on the so-called Latent Factor Models. These models view the expressed ratings as characterized by a low number of factors inferred from the rating patterns, in order to reduce the dimension of the spaces of users and items. Most realizations of latent factor models are based on a low-rank matrix factorization of the rating matrix, where items and users are represented in terms of a few vectors. The model is trained using the available data and later used to predict ratings for new items.

In this paper we propose and analyze a new method based on the Nonnegative Matrix Factorization of the rating matrix. The algorithm employs an alternating descent iteration scheme with thresholding to minimize a non-convex cost function and compute a low-rank approximation of the rating matrix.

We performed numerical tests to estimate the accuracy in predicting the missing entries, and we compared our approach with other techniques in the literature. For the experiments we used the 100K, 1M, and 10M MovieLens databases containing ratings on movies on a scale from 1 to 5. We divided the ratings into a training set (80%) and a test set (20%), and we computed standard error measures such as Precision, Recall, Mean Absolute Error, and 0-1 Loss.

The experiments show that our algorithm improves the accuracy in terms of 0-1 Loss and obtains values of Precision and Recall up to 79%, showing that our approach is effective in predicting recommendable items.

The paper is organized as follows: in Section 2 we give some preliminary concepts and introduce the Alternating Nonnegative Least Squares problem, whose scheme will be used for designing our recommender system, presented in Section 3. Section 4 describes the experimental results, and Section 5 contains some conclusions.

2 Preliminaries

Consider a set U of Users and a set I of Items. Let V be the set of possible votes that a user can assign to an item. Define V? = V ∪ {?}, the set of possible votes plus the value ?, which corresponds to the undefined or missing evaluation of an item by a user. Let A ∈ V?^{|U|×|I|} be the Utility Matrix, and assume that we only observe a subset of entries Ω ⊆ U × I, where each entry a_u,i with (u,i) ∈ Ω represents the vote (or rating) that user u assigns to item i. The goal of a recommender system is to predict replacements for the missing values in the utility matrix in order to make personalized recommendations to a particular user, that is, to estimate the entries a_u,i for (u,i) ∉ Ω.

A promising approach is the one based on the low-rank nonnegative matrix factorization of A. As we will see in the next section, our problem can be viewed as a modification of the classical Nonnegative Matrix Factorization (NMF) problem, which can be formulated as follows: given a natural number k, find W ∈ R_+^{|U|×k} and H ∈ R_+^{k×|I|} that satisfy

min_{W≥0, H≥0} ||A − W H||_F^2,   (1)

where the norm involved is the Frobenius norm, defined as ||M||_F = (Σ_{i,j} m_{i,j}^2)^{1/2}.

Problem (1) is non-convex, hence it might have many local minima; however, it is convex in either one of the two matrices, and can be solved with the common techniques employed for convex optimization. Many numerical algorithms have been proposed after the seminal paper by Lee and Seung [14]; the vast majority of them employ an iterative scheme whose convergence to local minima can be proven. Alternating Nonnegative Least Squares (ANLS) [15, 8, 10] is a very successful class of algorithms that has also been employed for the clustering problem. It has the advantage of being rather fast and of requiring significantly less effort than other NMF techniques. The ANLS algorithm can be described as follows.

Procedure ANLS
Input: A, k; initial W ≥ 0
repeat
    H ← argmin_{H≥0} ||A − W H||_F^2
    W ← argmin_{W≥0} ||A − W H||_F^2
until stopping condition
Output: W, H

It can be proved that every limit point generated by the ANLS framework is a stationary point of the non-convex original problem [11]. To solve the least squares problems in Procedure ANLS, one can use one of the many methods developed for this purpose, such as the active-set method [13, 9, 1], the projected gradient method [15], the projected quasi-Newton method [8], or the greedy coordinate descent method [7, 3].
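As an illustration, the following is a minimal NumPy/SciPy sketch of the ANLS scheme for the dense problem (1); it solves each nonnegative least squares subproblem with SciPy's active-set solver, and the stopping condition (stagnation of the Frobenius error) is our own choice, not one prescribed by the text.

```python
import numpy as np
from scipy.optimize import nnls

def anls(A, k, max_iter=50, tol=1e-4, seed=0):
    """Alternating Nonnegative Least Squares for min ||A - W H||_F, W, H >= 0."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, k))
    H = np.zeros((k, n))
    prev_err = np.inf
    for _ in range(max_iter):
        # Fix W and solve min_{H >= 0} ||A - W H||_F one column at a time.
        for j in range(n):
            H[:, j], _ = nnls(W, A[:, j])
        # Fix H and solve min_{W >= 0} ||A - W H||_F one row at a time.
        for i in range(m):
            W[i, :], _ = nnls(H.T, A[i, :])
        err = np.linalg.norm(A - W @ H, 'fro')
        if prev_err - err < tol:   # assumed stopping condition: stagnation
            break
        prev_err = err
    return W, H
```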

3 Recommendation systems: a Nonnegative Matrix Factorization formulation

In the context of recommendation systems, the nonnegative matrix formulation (1) should be slightly modified to take into account the missing votes in the utility matrix and the additional constraint that the entries of W H should be integers in the range of admissible votes V. In fact, we have to predict the unknown values of A outside Ω, and for this reason we seek W and H such that W H approximates the utility matrix on the entries with indices in Ω. We can reformulate our problem as follows. Denote by P_Ω the projection operator that only retains the entries of a matrix that lie in the set Ω, i.e., (P_Ω(M))_u,i = m_u,i if (u,i) ∈ Ω, and 0 otherwise.

Given an integer k, we look for W ∈ R_+^{|U|×k} and H ∈ R_+^{k×|I|} such that

min_{W≥0, H≥0} ||P_Ω(A − W H)||_F^2.   (2)

Projecting the residual with the operator P_Ω allows us to construct two matrices W and H such that W H is close to the values of the utility matrix on the entries containing a vote, while leaving the method free to assign the missing entries. The underlying assumption of this approach is that there exists a latent factor space of size k such that user-item interactions are modeled as inner products in that reduced space. User u is represented by the vector w_u, i.e., the u-th row of matrix W, while h_i, the i-th column of H, is associated with item i, so that a_u,i is approximated by the inner product w_u h_i.

Many attempts [12, 19, 24] have been made in this direction; most of them add regularization parameters to avoid overfitting the data. Our approach is different: we employ an adaptive scheme where we update the values of the missing entries with the values of the current product W H, and moreover we realize the regularization by cutting the values of the reconstructed rating matrix to integers in V. Although we are not able to prove the convergence of our method, we observe that if a sufficiently large value of k is chosen, the output of our algorithm reproduces the expressed ratings on Ω, while the entries which replace the wildcards ? are perfectly compatible with the latent factor model.

Denote by [x] the integer closest to the scalar x, and extend the definition to matrices in such a way that [M] denotes the matrix with entries [m_u,i]. Given a matrix M and the set V of admissible votes, define cut(M) as the matrix whose entries are the values [m_u,i] clamped to the range of V. To solve problem (2) we can proceed as in the ANLS algorithm, alternating a minimization step with respect to W and one with respect to H in an iterative process. When looking for a solution of (2) we have some degrees of freedom, since the factors are not determined in all their components. In particular, since our minimization problem takes into consideration only the nonzero structure of A, we can replace the wildcard characters in A with values in the vote range and proceed as described in algorithm CutNMF. At the beginning we assume that the matrix A^(0) is the utility matrix where the character ? has been replaced with zero, i.e., A^(0) = P_Ω(A).

We generate an instance of the problem, that is, initial matrices W^(0) and H^(0), as well as the matrix A^(0). At each step s a new pair of matrices W^(s), H^(s) is computed by solving two least squares problems, and the product W^(s) H^(s) is computed and used to construct a new A^(s). As local stopping conditions we use a control over two error estimates: the mean Frobenius error (mFE), i.e., the Frobenius norm of the residual on Ω normalized by the number of observed entries, and the maximum integer error (MIE), i.e., the maximum absolute deviation on Ω between the observed votes and the cut reconstruction.

The procedure CutNMF takes as input the matrix A^(0) and the set Ω of the pairs of nonzeros in A, and performs iterative steps updating W and H. After a comparative study, we chose to solve the two minimization problems using the greedy coordinate descent method described in [3]. At each step we update the error estimates MIE and mFE. The cycle is repeated at most maxiter times, unless either the matrix has been perfectly reconstructed by the approximation (MIE = 0) or the mFE is no longer decreasing.

Procedure CutNMF
Input: A^(0), Ω, k, maxiter; initial W^(0) ≥ 0
flag = TRUE; s = 0
while (flag and s < maxiter)
    H^(s+1) ← argmin_{H≥0} ||A^(s) − W^(s) H||_F^2
    W^(s+1) ← argmin_{W≥0} ||A^(s) − W H^(s+1)||_F^2
    R^(s+1) = W^(s+1) H^(s+1)
    set A^(s+1) such that P_Ω(A^(s+1)) = P_Ω(A^(0)), with the remaining entries taken from cut(R^(s+1))
    update mFE and MIE; set flag = FALSE if MIE = 0 or mFE stagnates
    s = s + 1
endwhile
Output: W^(s), H^(s)

From the output matrices we can obtain the matrix of the possible recommendations as R = cut(W H). The value of mFE measures the adherence of the data to the latent factor model, and for a sufficiently large k we should observe mFE going to zero.
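To make the scheme concrete, here is a hedged NumPy sketch of CutNMF. The inner solver is a single Lee–Seung multiplicative update per factor, used here only as a stand-in for the greedy coordinate descent method of [3], and the mFE/MIE formulas reflect our reading of the definitions above; all names are illustrative.

```python
import numpy as np

def cut(M, vmax=5):
    """Round to the nearest integer and clamp into the vote range {1,...,vmax}."""
    return np.clip(np.rint(M), 1, vmax)

def cut_nmf(A, mask, k, vmax=5, maxiter=100, tol=1e-4, seed=0):
    """A is numeric, with arbitrary values where mask is False (missing votes)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W, H = rng.random((m, k)), rng.random((k, n))
    A_cur = np.where(mask, A, 0.0)                 # A^(0): wildcards set to 0
    prev_mfe = np.inf
    for _ in range(maxiter):
        # Inner solver: one multiplicative update per factor (stand-in for [3]).
        H *= (W.T @ A_cur) / (W.T @ W @ H + 1e-12)
        W *= (A_cur @ H.T) / (W @ H @ H.T + 1e-12)
        R = W @ H
        # Adaptive step: keep the observed votes, refill the missing entries.
        A_cur = np.where(mask, A, cut(R, vmax))
        mfe = np.linalg.norm((A - R)[mask]) / np.sqrt(mask.sum())
        mie = np.abs(A[mask] - cut(R, vmax)[mask]).max()
        if mie == 0 or prev_mfe - mfe < tol:       # local stopping conditions
            break
        prev_mfe = mfe
    return W, H
```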

4 Experimental Results

We performed a number of experiments aimed at analyzing the convergence of our method and studying the accuracy of the predictions of our recommendation method. For our experiments we used the MovieLens 100K, MovieLens 1M, and MovieLens 10M databases containing ratings from users on movies on a scale from 1 to 5. We also used a synthetic matrix generated starting from two random rank-20 matrices W and H.

We performed two classes of experiments, one for analyzing the convergence of our iterative scheme and another to study the accuracy of the predictions and of the recommendations provided. Let us denote by R the matrix returned by procedure CutNMF. In addition to the error measures MIE and mFE introduced in Section 3, we evaluated the algorithm also on other measures [6]. Let S be a subset of the observed entries of the utility matrix, i.e., S ⊆ Ω; we define

  • the Mean Absolute Error (MAE),

    MAE(S) = (1/|S|) Σ_{(u,i)∈S} |a_u,i − r_u,i|,   (3)

  • and the 0-1 Loss,

    L_01(S) = (1/|S|) Σ_{(u,i)∈S} ℓ(a_u,i, r_u,i),   (4)

    where ℓ(a, r) = 1 if exactly one of the conditions a ≥ 4 and r ≥ 4 holds, and ℓ(a, r) = 0 otherwise.

Taking S = Ω we get the mean error of the reconstructed values. Taking S = {(u,i) ∈ Ω : a_u,i ≥ 4}, we measure the error on the set of high ratings, i.e., on the set of suitable recommendations. This error is referred to in the literature [6] as the Constrained Mean Absolute Error and denoted by CMAE.

The 0-1 Loss measure on S counts the number of mismatches between the rating matrix and the matrix R on the recommendable items, i.e., those with a rating of 4 or 5: the number of recommended items (those with r_u,i ≥ 4) that should not be recommended, plus the items not recommended (r_u,i < 4) despite a_u,i ≥ 4. A low value of the 0-1 Loss indicates that the algorithm almost always returns correct recommendations.

To measure the quality of the overall recommender system we also evaluate Precision and Recall. Precision is usually defined as the fraction of items correctly recommended over the number of recommended items, while Recall is the fraction of items correctly recommended over the number of items that should be recommended. Different authors use slightly different definitions for these two metrics [5, 20]; in our case, since the interest is in a targeted recommender system, we consider the set of relevant recommendations T = {(u,i) ∈ S : a_u,i ≥ 4} of positive ratings given by users to items, and the set P = {(u,i) ∈ S : r_u,i ≥ 4} of positive predicted ratings. We have

Precision = |T ∩ P| / |P|,   Recall = |T ∩ P| / |T|.
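For concreteness, the sketch below computes MAE, 0-1 Loss, Precision, and Recall from paired vectors of observed and reconstructed votes on a set S, following the definitions above; the relevance threshold of 4 comes from the text, while the helper name is ours.

```python
import numpy as np

def metrics(a, r, threshold=4):
    """a: observed votes on S; r: reconstructed votes cut(W H) on the same S."""
    mae = np.abs(a - r).mean()             # Mean Absolute Error (3)
    rel = a >= threshold                   # T: items that should be recommended
    rec = r >= threshold                   # P: items the system recommends
    loss01 = (rel ^ rec).mean()            # 0-1 Loss (4): fraction of mismatches
    hits = (rel & rec).sum()
    precision = hits / max(rec.sum(), 1)   # |T ∩ P| / |P|
    recall = hits / max(rel.sum(), 1)      # |T ∩ P| / |T|
    return mae, loss01, precision, recall
```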

4.1 Convergence

We studied the convergence of our method by choosing sufficiently large values of k – the rank of the two factors W and H – and by analyzing how the error measures decrease. Let Ω be the set of observed ratings in one of the MovieLens datasets. For a sufficiently large value of k the measures mFE and MIE, as well as MAE, CMAE, and 0-1 Loss, converge to zero, meaning that, applying CutNMF, all the nonzeros of the rating matrix are reconstructed. Regarding our experimentation we should note that the cost of the iterations grows linearly with k, and that for fairly small values of k the error mFE tends to stabilize after a rather limited number of iterations.

The purpose of these experiments is twofold: firstly, we want to test the correctness of our iterative procedure; secondly, we use them to confirm that the latent factor model is adequate for designing recommender systems.

Figure 1: Behavior of the mFE and MIE error measures on the MovieLens 100K dataset.
Figure 2: Trend of Precision (dashed lines) and Recall (solid) for different values of k on the MovieLens 1M dataset. Since the cardinality of T does not change during the iterations, the value of Recall increases in a more regular way, while Precision is affected by the growth of the set of recommendations P.

Figure 1 shows the convergence of the mFE and MIE measures on the MovieLens 100K dataset for different values of k. We use fixed tolerances for the stopping conditions. We observe that for larger values of k we have convergence to zero, while for moderately small values of k the two metrics stagnate and there is no further gain in performing more iterations. In Figure 2, Precision and Recall are plotted for different values of k for the MovieLens 1M dataset. We see that for larger k the values of Recall and Precision reach 100%.

To study the convergence and verify that the hidden structure of the data is captured by the latent factor model, we also analyzed the behavior of the proposed method on a synthetic utility matrix built as the product of two rank-20 matrices W and H. The matrix obtained is perfectly adherent to the latent factor model, in which we have a set of 20 features describing the preferences of users on items. The set Ω is composed of 500K values uniformly sampled among the 5M entries of the matrix, meaning that 90% of the entries are set equal to the wildcard character ?. We verified that our method is able to capture the rank structure of the synthetic rank-20 matrix and that the error measures have a trend similar to that observed on the real datasets.
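One possible construction of such a synthetic matrix is sketched below. The paper does not specify the matrix size or how the factors are scaled to the vote range, so the 5000 × 1000 shape, the affine rescaling, and the rounding (which slightly perturbs the exact rank-20 structure) are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 5000, 1000, 20                        # assumed shape: 5M entries total
R = rng.random((m, k)) @ rng.random((k, n))     # exact rank-20 product
A = np.rint(1 + 4 * (R - R.min()) / np.ptp(R))  # map to votes in {1,...,5}
mask = rng.random((m, n)) < 0.1                 # observe 10% (about 500K entries)
```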

Matrix       k    iter     MAE    0-1 Loss  Precision  Recall
             15   84,430   0.265  0.113     88.26      88.68
Synthetic    20   65,590   0.214  0.028     96.62      97.60
             25   51,480   0.199  0.022     97.41      98.09
Table 1: For different values of k we obtain very good values of 0-1 Loss, Precision, and Recall on the synthetic matrix.

In Table 1 we report the errors obtained applying procedure cutNMF to the synthetic matrix. We tested the accuracy obtained with the “special” value k = 20 together with k = 15 and k = 25. Since the problem is underdetermined, we expect that the larger the k, the better the fit of the values in Ω. Table 1 confirms that this is indeed the case: even if the matrix can be totally reconstructed using rank-20 factors, our algorithm misses the global minimum.

Matrix           k     mFE    MAE     0-1 Loss  Precision  Recall
                 6     0.594  0.602   0.220     79.79      80.78
                 10    0.500  0.551   0.1925    82.26      83.18
MovieLens 100K   50    0.133  0.269   0.055     94.93      95.10
                 100   0.023  0.103   0.002     99.78      99.83
                 150   0.004  0.040   0         100        100
                 15    0.540  0.575   0.203     81.80      83.27
                 25    0.470  0.536   0.181     83.81      84.86
MovieLens 1M     100   0.226  0.357   0.093     91.89      91.93
                 200   0.109  0.234   0.041     96.46      96.33
                 300   0.059  0.163   0.018     98.56      98.37
Table 2: The different measures of the error in approximating the solution of the minimization problem. By increasing the value of k we get a better approximation of the data, and all the error measures tend to zero. The values of Precision and Recall are very close to 100. For k = 150 we achieved the value 0 in 0-1 Loss and 100% Precision and Recall, meaning that all the “significant” values of the matrix have been reconstructed by the algorithm.

In Table 2 we report the values of the errors obtained for different values of the parameter k on the MovieLens datasets. We note that all the error measures decrease as k increases. We also note that Precision and Recall are very high, meaning that the more significant ratings, i.e., those with a vote greater than or equal to 4, are well captured by our model. On the 100K dataset we even reach a 0-1 Loss equal to zero, meaning that we never have a mismatch in the reconstruction of the important ratings.

4.2 Accuracy of the recommender system

This section is devoted to the study of the accuracy of our method in predicting recommendations. To evaluate the ability of our approach to recommend items and to test the quality of the recommendations, we split our data into training and test sets. In particular, the set Ω is partitioned into two disjoint sets: the Training Set Ω_train, containing 80% of the pairs uniformly sampled in Ω, and the remaining pairs forming the Test Set Ω_test. In the following, the reported error values are based on runs of the cutNMF procedure on the training set.
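A simple way to realize such a split is sketched below, assuming the observed entries are given as a boolean mask; the 80/20 proportion follows the text, while the function name and seeding are illustrative.

```python
import numpy as np

def split_mask(mask, test_frac=0.2, seed=0):
    """Partition the observed (user, item) pairs into training and test masks."""
    rng = np.random.default_rng(seed)
    idx = np.argwhere(mask)                 # all observed pairs in Omega
    rng.shuffle(idx)
    n_test = int(test_frac * len(idx))
    test = np.zeros_like(mask)
    test[tuple(idx[:n_test].T)] = True      # 20% of the pairs -> Test Set
    train = mask & ~test                    # remaining 80% -> Training Set
    return train, test
```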

Matrix          k    MAE     0-1 Loss  Precision  Recall
                6    0.6253  0.2550    77.04      79.27
MovieLens 1M    10   0.5940  0.2555    77.21      78.82
                15   0.5642  0.2605    77.05      77.89
                6    0.6093  0.2630    74.56      72.69
MovieLens 10M   10   0.5937  0.2612    74.95      72.48
                15   0.5864  0.2633    75.20      71.70
Table 3: Predictions of recommendations. We obtain good values of 0-1 Loss, Precision, and Recall, meaning that the method is good at predicting appreciated recommendations.

To evaluate the performance of our algorithm in reconstructing the entries of A in the test set, we computed our metrics over the set S = Ω_test, to see whether our algorithm is able to predict the ratings in the test set.

                              MovieLens 1M             MovieLens 10M
Method                        MAE    CMAE   0-1 Loss   MAE    CMAE   0-1 Loss
KNN Pearson correlation       0.823  0.721  0.423      0.842  0.743  0.434
NMF [14]                      1.243  1.106  0.463      1.356  1.234  0.487
Regularized NMF (rNMF) [12]   0.684  0.574  0.384      0.698  0.586  0.396
Probabilistic NMF (pNMF) [6]  0.664  0.526  0.270      0.676  0.542  0.284
cutNMF                        0.675  0.659  0.255      0.639  0.654  0.263
Table 4: Comparison between different NMF methods. The values reported in the first four rows are taken from [6]. We see that our method performs well on the 0-1 Loss and MAE measures. The table reports the values obtained with a small number of latent factors.

As seen in Section 4.1, for a sufficiently large k the matrix can be completely reconstructed on the observed values. In the following set of experiments, however, we keep a fairly small value of k and perform only a small number of iterations. In fact, the algorithm learns from the data in Ω_train, but its performance is evaluated on the test set Ω_test. We have found that taking a larger value of k is not a good option, since the model tends to overfit the data, losing quality in the recommendations. As observed by Gillis [4], the choice of an adequate k is rather tricky. Among the most popular approaches we find trial and error, estimation using for example the SVD, and other techniques [18, 25]. We tried several values for k and, as we can see in Table 3, the best values are achieved assuming only a few latent factors are present. Working with small values of k is convenient also from a computational point of view, the time required by the algorithm being linear in k. We compared our method with the standard NMF proposed in [14], with a regularized version [12], and with the Probabilistic NMF in [6]. In order to compare our results with those obtained with completely different approaches, we also report the errors obtained by the KNN algorithm [2] using the Pearson correlation as similarity measure between users. From Table 4 we see that our method achieves the lowest value of 0-1 Loss, which in our opinion is the most significant among the metrics we monitored. In Figure 3 the values of MAE versus 0-1 Loss are plotted; we see that cutNMF is the one with the smallest distance from the origin.

Figure 3: Plot of MAE vs 0-1 Loss comparing the different algorithms. We see that our algorithm achieves a good tradeoff between these two measures.

5 Conclusions

In this paper we proposed an effective algorithm for personalized recommendations. In particular, the algorithm tries to find latent factors in the data, factorizing the rating matrix as the product of two nonnegative low-rank matrices. The latent factors are found by minimizing a non-convex function involving the known ratings, and personalized recommendations are provided for the missing votes as inner products of rows and columns of the matrices of the factorization. A regularizing strategy has been introduced into the method in order to avoid overfitting the data. The experimentation on the MovieLens datasets shows the good performance of our method both in its convergence properties and in the accuracy of the predictions. Latent factor approaches such as the one suggested in this paper try to recommend to users only items close to their usual taste; other techniques, such as neighborhood approaches, are on the contrary able to capture local associations in the data. We plan to investigate how the two approaches can be integrated to obtain better and more targeted recommendations for the users.

References

  • [1] R. Bro and S. De Jong. A fast non-negative-constrained least square algorithm. Journal of Chemometrics, 11:393–401, 1997.
  • [2] C. Desrosiers and G. Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, Eds., pages 107–144. Springer US, 2011.
  • [3] P. Favati, G. Lotti, O. Menchi, and F. Romani. Adaptive symmetric NMF for graph clustering. Technical report, Consiglio Nazionale delle Ricerche, IIT, 2016.
  • [4] N. Gillis. The Why and How of Nonnegative Matrix Factorization. In J. A. K. Suykens, M. Signoretto, and A. Argyriou, editors, Regularization, Optimization, Kernels, and Support Vector Machines, pages 257–291. Chapman & Hall/CRC, 2014.
  • [5] J. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst., 22(1):5–53, 2004.
  • [6] A. Hernando, J. Bobadilla, and F. Ortega. A non-negative matrix factorization for collaborative filtering recommender systems based on a bayesian probabilistic model. Knowledge-Based Systems, 97:188–202, 2016.
  • [7] C. J. Hsieh and I. S. Dhillon. Fast coordinate descent methods with variable selection for non-negative matrix factorization. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1064–1072, 2011.
  • [8] D. Kim, S. Sra, and I.S. Dhillon. Fast Newton-type methods for the least squares nonnegative matrix approximation problem. In SIAM International Conference in Data Mining, 2007.
  • [9] H. Kim and H. Park. Sparse non-negative matrix factorizations via alternating non-negativity- constrained least squares for microarray data analysis. Bioinformatics, 23:1495–1502, 2007.
  • [10] H. Kim and H. Park. Nonnegative matrix factorization based on alternating non-negativity-constrained least squares and the active set method. SIAM J. Matrix Anal and Appl., 30(2):713–730, 2008.
  • [11] H. Kim and H. Park. Fast Non-negative Matrix Factorization: an active-set-like method and comparisons. SIAM J. Sci. Comp., 33:3261–3281, 2011.
  • [12] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
  • [13] C. L. Lawson and R.J. Hanson. Solving Least Squares Problems. SIAM, 1995.
  • [14] D. Lee and H. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791, 1999.
  • [15] C.-J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19:2756–2779, 2007.
  • [16] P. Lops, M. Gemmis, and G. Semeraro. Content-based recommender systems: State of the art and trends. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, Eds., pages 73–105. Springer US, 2011.
  • [17] K. Miyahara and M. J. Pazzani. Collaborative filtering with the simple Bayesian classifier. In Proceedings of the 6th Pacific Rim International Conference on Artificial Intelligence, PRICAI’00, pages 679–689, 2000.
  • [18] J. M. P Nascimento and J. M. Bioucas-Dias. Hyperspectral signal subspace estimation. In IEEE International Geoscience and Remote Sensing Symposium, pages 3225–3228, 2007.
  • [19] R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, volume 20, 2008.
  • [20] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Analysis of recommendation algorithms for e-commerce. In Proceedings of the 2nd ACM Conference on Electronic Commerce, pages 158–167, 2000.
  • [21] B. M. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Recommender systems for large-scale E-Commerce: Scalable neighborhood formation using clustering. In 5th International Conference on Computer Information Technology (ICCIT), 2002.
  • [22] M. Balabanović and Y. Shoham. Fab: Content-based, collaborative recommendation. Commun. ACM, 40(3):66–72, 1997.
  • [23] M. Sun, G. Lebanon, and P. Kidwell. Estimating probabilities in recommendation systems. Journal of the Royal Statistical Society: Series C (Applied Statistics), 61:471–492, 2012.
  • [24] G. Takács, I. Pilászy, B. Németh, and D. Tikk. Major components of the gravity recommendation system. SIGKDD Explor. Newsl., 9(2):80–83, 2007.
  • [25] V. Y. F. Tan and C. Fevotte. Automatic relevance determination in nonnegative matrix factorization with the β-divergence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(7):1592–1605, 2013.
  • [26] G-R. Xue, C. Lin, Q. Yang, W. Xi, H-L. Zeng, Y. Yu, and Z. Chen. Scalable collaborative filtering using cluster-based smoothing. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’05, pages 114–121, 2005.