Optimization Matrix Factorization Recommendation Algorithm Based on Rating Centrality

06/20/2018 ∙ by Zhipeng Wu, et al.

Matrix factorization (MF) is extensively used to mine user preferences from explicit ratings in recommender systems. However, explicit ratings are not always equally reliable, because many factors, including commercial advertising and a friend's recommendation, may affect a user's final evaluation of an item. Therefore, identifying reliable user ratings is critical to further improving the performance of recommender systems. In this work, we analyze how far each rating deviates from the overall rating distributions of its user and its item, and propose the notions of user-based rating centrality and item-based rating centrality, respectively. Based on the rating centrality, we then measure the reliability of each user rating and provide an optimized matrix factorization recommendation algorithm. Experimental results on two popular recommendation datasets show that our method performs better than other matrix factorization recommendation algorithms, especially on sparse datasets.




1 Introduction

In recent years, recommender systems have greatly promoted the development of e-commerce. A high-quality recommender system can help users quickly find what they like among massive amounts of goods [23], mitigate the problem of information overload [20], and bring more economic benefits to sellers [4]. Consequently, many researchers have proposed various recommendation algorithms to obtain more accurate recommendation results.

Among all recommendation algorithms, collaborative filtering (CF) is a relatively simple and effective method for generating a list of recommendations for a target user [1]. One of the most popular CF methods is user-based CF, which finds users whose behavior records (such as product comments and browsing history) are similar to those of the target user, and then recommends to the target user the items that these similar users have selected [3]. Researchers have therefore proposed a number of similarity measures for finding similar users [8, 15, 19], but their performance is barely satisfactory when the data are highly sparse [10].

Having achieved significant performance in the Netflix Prize competition, model-based CF approaches have developed remarkably in recommender systems owing to their high accuracy and scalability [12, 16], and the matrix factorization model is the most representative of them. It factorizes the user-item rating matrix into two low-rank matrices, the user-factor matrix and the item-factor matrix, so that the missing entries of the original sparse rating matrix can be filled in by multiplying the two factor matrices. Inspired by the matrix factorization model used in the competition, researchers have successively developed many improved algorithms: Funk [5] presented regularized matrix factorization for the Netflix challenge and achieved good results; Sarwar [22] proposed an incremental matrix factorization algorithm to make recommender systems highly scalable; Paterek [18] added user and item biases to matrix factorization to capture the interaction between users and items more accurately; Koren [11] integrated additional information sources into the matrix factorization model, at the cost of very high time complexity; Hu [9] proposed the concept of confidence levels to measure user preferences in matrix factorization, which applies only to implicit feedback; He [7] pointed out that missing data should be weighted by item popularity and provided a fast matrix factorization model; and Meng [17] proposed weight-based matrix factorization that employs term frequency and inverse document frequency to find users' interests, but the method is only suitable for text data.

However, the above methods do not consider the reliability of each explicit user rating. In general, users have their own tastes and opinions about items. Although every explicit rating is given by the user, not all ratings should be given the same weight [12]. For example, some users prefer to give items high scores, so their average scores are much higher than the overall mean. In contrast, other users reserve high scores for their favorite items and tend to give lower scores to other items [2]. The preferences of these two types of users are clearly different, so if they give the same item the same score, the reliability of the two scores should be evaluated carefully.

In our work, instead of using explicit ratings directly, we explore the reliability of each observed rating under limited user information. First, we analyze the degree of deviation between each rating and the user's average score, and propose the notion of user-based rating centrality. Similarly, according to the degree of deviation between each rating and the item's average score, we define the item-based rating centrality. We then combine the two kinds of rating centrality to infer the reliability of a rating, and on this basis provide an optimized matrix factorization algorithm. Finally, we use stochastic gradient descent (SGD) to minimize the objective function. Experiments on two classic recommendation datasets show that our method performs better than other popular matrix factorization recommendation algorithms, indicating that it is feasible to mine the reliability of explicit ratings based on rating centrality.

The rest of the paper is organized as follows. Section 2 briefly describes the matrix factorization recommendation algorithm. Section 3 introduces our proposed approach, including the definition of rating centrality, in detail. In Section 4, we present the datasets and evaluation metrics used in our experiments and analyze the experimental results. Finally, we draw conclusions in Section 5.

2 Preliminaries

In this section, we first describe the problem addressed in this paper and then give a brief introduction to traditional matrix factorization recommendation algorithms.

2.1 Problem Definition

In a general recommender system, we have $m$ users, $n$ items and a sparse user-item rating matrix $R \in \mathbb{R}^{m \times n}$. Each observed value $r_{ui}$ of $R$ denotes user $u$'s rating of item $i$, and $\hat{r}_{ui}$ denotes the predicted rating of user $u$ for item $i$. Given the user-item interaction matrix $R$, the goal of the recommender system is to predict the ratings of the items that the target user might be interested in.

2.2 Matrix Factorization

Matrix factorization algorithms have been extensively used to mine the interaction between users and items [12]. Funk [5] points out that the user-item rating matrix $R$ can be decomposed into two low-rank matrices, the user-factor matrix and the item-factor matrix:

$$R \approx P^{\top} Q, \tag{1}$$

where $P \in \mathbb{R}^{k \times m}$ denotes the user latent factor matrix, $Q \in \mathbb{R}^{k \times n}$ denotes the item latent factor matrix, and the parameter $k$ is the number of latent factors; in general, $k \ll \min(m, n)$. The predicted value $\hat{r}_{ui}$ can therefore be calculated as:

$$\hat{r}_{ui} = p_u^{\top} q_i, \tag{2}$$

where $p_u$ is the $u$-th column of $P$ and $q_i$ is the $i$-th column of $Q$. We obtain the latent factor matrices by minimizing the regularized squared error loss function:

$$\min_{P, Q} \sum_{(u,i) \in \mathcal{K}} \left( r_{ui} - p_u^{\top} q_i \right)^2 + \lambda \left( \lVert P \rVert_F^2 + \lVert Q \rVert_F^2 \right), \tag{3}$$

where $\lambda$ is the regularization parameter used to avoid overfitting, $\lVert \cdot \rVert_F$ denotes the Frobenius norm and $\mathcal{K}$ is the training set of (user, item) pairs. On the basis of this model, many improved matrix factorization algorithms have been proposed, for example, biased probabilistic matrix factorization [18], weighted regularization matrix factorization [25] and coupled item-based matrix factorization [14].
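As a concrete illustration of the prediction rule (2) and the loss (3), the NumPy sketch below evaluates both on a toy example. It assumes the common layout in which $P$ is $k \times m$ and $Q$ is $k \times n$, so users and items are columns; the function names `predict` and `regularized_loss` are ours, and the toy matrices are made up for illustration.

```python
import numpy as np

def predict(P: np.ndarray, Q: np.ndarray, u: int, i: int) -> float:
    """Equation (2): inner product of column u of P and column i of Q."""
    return float(P[:, u] @ Q[:, i])

def regularized_loss(P, Q, ratings, lam):
    """Equation (3) for a list of observed (user, item, rating) triples."""
    squared_error = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    penalty = lam * (np.sum(P ** 2) + np.sum(Q ** 2))  # squared Frobenius norms
    return float(squared_error + penalty)

# Toy example: k = 2 latent factors, m = 3 users, n = 2 items.
P = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0]])
Q = np.array([[2.0, 1.0],
              [1.0, 0.0]])
ratings = [(0, 0, 2.0), (1, 1, 1.0), (2, 0, 3.0)]
print(predict(P, Q, 1, 0))                              # 2.0
print(round(regularized_loss(P, Q, ratings, 0.1), 4))   # 2.475
```

Only observed triples enter the squared-error sum, which is what lets the factor matrices fill in the unobserved cells of $R$.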

3 Proposed Method

In this section, we introduce the notion of rating centrality from the user and item perspectives; it can be obtained easily even for sparse data. Based on the rating centrality, we present a strategy to compute the reliability of each rating and propose an optimized matrix factorization recommendation algorithm to further improve the accuracy of recommendation results.

3.1 Notion

The user-based rating centrality refers to the degree of deviation between a user's rating and that user's average score. Even if two users give the same score to the same item, their user-based rating centralities may be totally different. For example, suppose user $u$ only rates items he likes, so he has a high average score, while another user $v$ tends to give items negative scores, so his average score is relatively low. The preferences of the two users are clearly different. In this case, if user $u$ gives item $i$ a high score $r_{ui}$ and user $v$ gives the same item the same score $r_{vi} = r_{ui}$, we cannot regard the two ratings as equally reliable, because user $u$ tends to give positive ratings and his average score is higher than that of user $v$. Therefore, we define the user-based rating centrality $c^{U}_{ui}$ to measure reliability from the user perspective:


where $\bar{r}_u$ is the average score of user $u$ and $r_{\max}$ is the maximum value of the rating scale. Because $r_{ui}$ and $\bar{r}_u$ may be very close, we impose an upper limit on $c^{U}_{ui}$ to keep its value from becoming too large.

Moreover, the user-based rating centrality is calculated only from the user perspective. If an item is of really good quality and a user whose average score is low gives it a high rating, we should also regard the rating as highly reliable, because it is consistent with the item's popularity. More precisely, if most users like an item and give it high scores, the item will have a high average score. Conversely, if an item is of poor quality, it will receive plenty of negative feedback from the majority of users. The characteristics of these two items are clearly different. In this case, if user $u$ gives items $i$ and $j$ the same high score or the same low score, we should also consider rating reliability from the item perspective. Consequently, we define the item-based rating centrality $c^{I}_{ui}$ to measure the degree of deviation between a user's rating and the item's average score:


where $\bar{r}_i$ is the average rating of item $i$. Similarly, we impose an upper limit on $c^{I}_{ui}$. In practice, a small constant $\epsilon$ can be added to the denominators in (4) and (5) to avoid division by zero.
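The exact forms of (4) and (5) are not given here, so the sketch below only illustrates the idea under an assumed form: each centrality is taken as the inverse of the absolute deviation $|r_{ui} - \bar{r}_u|$ (or $|r_{ui} - \bar{r}_i|$), with $\epsilon$ added to the denominator and the result clipped at an upper limit. The cap value `cap`, its default of 5.0, and the function names are hypothetical choices for illustration; the authors' formula may differ, for instance in how $r_{\max}$ enters.

```python
import numpy as np

def group_mean(index: np.ndarray, values: np.ndarray, size: int) -> np.ndarray:
    """Average of `values` grouped by `index` (0 for groups with no entries)."""
    counts = np.bincount(index, minlength=size)
    sums = np.bincount(index, weights=values, minlength=size)
    return sums / np.maximum(counts, 1)

def rating_centrality(ratings, n_users, n_items, cap=5.0, eps=1e-6):
    """ASSUMED form of (4)/(5): c = min(cap, 1 / (|r_ui - mean| + eps)).

    `ratings` is a list of (user, item, rating) triples with 0-based ids.
    Returns the user-based and item-based centrality of every triple.
    """
    users = np.array([u for u, _, _ in ratings])
    items = np.array([i for _, i, _ in ratings])
    r = np.array([x for _, _, x in ratings], dtype=float)
    user_mean = group_mean(users, r, n_users)  # average score of each user
    item_mean = group_mean(items, r, n_items)  # average score of each item
    c_user = np.minimum(cap, 1.0 / (np.abs(r - user_mean[users]) + eps))
    c_item = np.minimum(cap, 1.0 / (np.abs(r - item_mean[items]) + eps))
    return c_user, c_item

ratings = [(0, 0, 5), (0, 1, 5), (0, 2, 2), (1, 0, 4)]
c_user, c_item = rating_centrality(ratings, n_users=2, n_items=3)
```

In this toy data, user 0 averages 4, so the rating of 2 (two points below the mean) receives the lowest user-based centrality, while user 1's only rating equals his mean and hits the cap.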

After obtaining the user-based and item-based rating centralities, we present a strategy to measure the reliability of a rating. If both $c^{U}_{ui}$ and $c^{I}_{ui}$ are small, the rating $r_{ui}$ deviates from the overall rating distributions of user $u$ and item $i$, so we regard it as relatively unreliable. Conversely, if both $c^{U}_{ui}$ and $c^{I}_{ui}$ are large, we consider that the rating reflects user $u$'s genuine evaluation of item $i$ and give it a high weight. Hence, the reliability $w_{ui}$ of a rating is obtained from the following formula:


where $f$ is a monotonically increasing function that normalizes the reliability, and the bias $\beta$ keeps $w_{ui}$ from being zero, so that no observed rating is discarded and data integrity is maintained. We consider three kinds of $f$ and compare their performance experimentally in Section 4.
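Equation (6) is likewise not reproduced here. Assuming the two centralities are summed and passed through a monotonically increasing $f$ before adding the bias $\beta$, the reliabilities and the per-user and per-item averages $\bar{w}_u$ and $\bar{w}_i$ used later could be computed as below. The summation, `np.log1p` as $f$, and `beta=1.0` are illustrative assumptions, not the three functions the authors compared.

```python
import numpy as np

def reliability(c_user, c_item, f=np.log1p, beta=1.0):
    """ASSUMED form of (6): w_ui = f(c_user + c_item) + beta.

    f must be monotonically increasing; beta > 0 keeps every weight positive.
    """
    return f(np.asarray(c_user, dtype=float) + np.asarray(c_item, dtype=float)) + beta

def average_reliability(index, w, size):
    """Mean reliability per user (or per item); 0 for ids with no ratings."""
    index = np.asarray(index)
    counts = np.bincount(index, minlength=size)
    return np.bincount(index, weights=w, minlength=size) / np.maximum(counts, 1)

# Three ratings: the first two by user 0, the third by user 1.
w = reliability([1.0, 0.5, 5.0], [2.0, 0.5, 5.0])
wbar_u = average_reliability([0, 0, 1], w, size=2)
```

Passing `f=np.sqrt` or `f=lambda x: x` swaps in other monotone choices; a slowly growing $f$ keeps the weights in a narrow range, which matters for the comparison in Section 4.4.1.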

3.2 Prediction

Following [11], the prediction formula for $\hat{r}_{ui}$ in our method is defined as:

$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^{\top} q_i, \tag{7}$$

where $\mu$ is the global average rating, $b_u$ is the bias of user $u$ and $b_i$ is the bias of item $i$. We consider that if a rating's reliability is low, its influence should be reduced during training; in other words, we pay more attention to fitting highly reliable ratings when optimizing the objective function. On this basis, we propose an adjusted regularized squared error loss function whose regularization is also weighted, to avoid overfitting during model training:


where $\bar{w}_u$ and $\bar{w}_i$ denote the average reliability of user $u$'s ratings and item $i$'s ratings, respectively.
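To make the weighting concrete, the sketch below evaluates one plausible reading of loss (8), assuming the biased prediction rule of [11], $\hat{r}_{ui} = \mu + b_u + b_i + p_u^{\top} q_i$: each squared error is multiplied by $w_{ui}$, and the regularization of $b_u, p_u$ (resp. $b_i, q_i$) is scaled by $\bar{w}_u$ (resp. $\bar{w}_i$). The placement of these weights and the function names are our assumptions, since the equation itself is not given here.

```python
import numpy as np

def predict_biased(mu, b_u, b_i, p_u, q_i):
    """Biased prediction in the style of [11]: mu + b_u + b_i + p_u . q_i."""
    return mu + b_u + b_i + float(p_u @ q_i)

def weighted_loss(ratings, w, mu, bu, bi, P, Q, wbar_u, wbar_i, lam):
    """ASSUMED reading of loss (8) over (user, item, rating) triples."""
    total = 0.0
    for (u, i, r), w_ui in zip(ratings, w):
        e = r - predict_biased(mu, bu[u], bi[i], P[:, u], Q[:, i])
        total += w_ui * e ** 2  # reliability-weighted squared error
        # Regularization scaled by the average reliability of user u and item i.
        total += lam * (wbar_u[u] * (bu[u] ** 2 + P[:, u] @ P[:, u])
                        + wbar_i[i] * (bi[i] ** 2 + Q[:, i] @ Q[:, i]))
    return float(total)

# One rating r = 5 predicted as 3 + 0.5 - 0.5 + 1 = 4, with reliability 2.
loss = weighted_loss([(0, 0, 5.0)], [2.0], 3.0, np.array([0.5]), np.array([-0.5]),
                     np.array([[1.0]]), np.array([[1.0]]), [1.0], [1.0], lam=0.1)
```

Doubling $w_{ui}$ doubles that rating's error term, which is how low-reliability ratings end up contributing less to the fit.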

3.3 Optimization

To minimize the loss function (8), we use SGD to learn the model parameters because of its high efficiency. First, for each observed rating $r_{ui}$, we compute the prediction error $e_{ui}$:

$$e_{ui} = r_{ui} - \hat{r}_{ui}. \tag{9}$$

Then we compute the partial derivative with respect to each parameter to obtain the gradient direction, and update the parameters until convergence:


where $\gamma$ is the learning rate. Since our method improves matrix factorization with rating centrality, we call it MFRC; the specific procedure is shown in Algorithm 1.

Input: rating matrix R; reliability $w_{ui}$ of each rating; number of latent factors $k$; number of iterations $t_{\max}$;
Output: latent factor matrices P and Q; biases $b_u$ and $b_i$;

1:Randomly initialize P and Q; $b_u$ and $b_i$ are set to 0; $t = 0$;
2:while $t < t_{\max}$ do
3:     for each observed rating $r_{ui}$ in R do
4:         update by equation (10);
5:         update by equation (11);
6:         update by equation (12);
7:         update by equation (13);
8:     end for
9:     $t = t + 1$;
10:end while
Algorithm 1 MFRC algorithm
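Under the assumed loss sketched for (8), differentiating with respect to $b_u$, $b_i$, $p_u$ and $q_i$ (folding the factor 2 into $\gamma$) gives per-rating updates of the form $b_u \leftarrow b_u + \gamma(w_{ui} e_{ui} - \lambda \bar{w}_u b_u)$, and analogously for the other parameters. The sketch below follows the loop structure of Algorithm 1 with those updates; since equations (10) to (13) are not given here, they are our derivation, not the authors' exact rules. Shuffling the ratings each epoch and initializing P and Q from a normal distribution with standard deviation 0.1 are also our choices, and the defaults mirror Section 4.4.2 assuming 0.05 is $\lambda$ and 0.005 is $\gamma$.

```python
import numpy as np

def _group_mean(index, values, size):
    counts = np.bincount(index, minlength=size)
    return np.bincount(index, weights=values, minlength=size) / np.maximum(counts, 1)

def train_mfrc(ratings, w, n_users, n_items, k=50, lam=0.05, gamma=0.005,
               n_iters=100, seed=0):
    """Sketch of Algorithm 1 under the ASSUMED weighted loss.

    ratings: list of (user, item, rating) triples; w: reliability of each triple.
    Returns (mu, bu, bi, P, Q) with P of shape (k, n_users), Q of shape (k, n_items).
    """
    rng = np.random.default_rng(seed)
    users = np.array([u for u, _, _ in ratings])
    items = np.array([i for _, i, _ in ratings])
    r = np.array([x for _, _, x in ratings], dtype=float)
    w = np.asarray(w, dtype=float)
    mu = r.mean()                                   # global mean (assumed, as in [11])
    wbar_u = _group_mean(users, w, n_users)         # average reliability per user
    wbar_i = _group_mean(items, w, n_items)         # average reliability per item
    P = rng.normal(0.0, 0.1, size=(k, n_users))     # step 1: random initialization
    Q = rng.normal(0.0, 0.1, size=(k, n_items))
    bu, bi = np.zeros(n_users), np.zeros(n_items)   # biases start at 0
    for _ in range(n_iters):                        # step 2: while t < t_max
        for idx in rng.permutation(len(r)):         # step 3: each observed rating
            u, i = users[idx], items[idx]
            e = r[idx] - (mu + bu[u] + bi[i] + P[:, u] @ Q[:, i])  # error, eq (9)
            g = w[idx] * e                          # error scaled by reliability
            bu[u] += gamma * (g - lam * wbar_u[u] * bu[u])
            bi[i] += gamma * (g - lam * wbar_i[i] * bi[i])
            p_old = P[:, u].copy()                  # q_i update must use the old p_u
            P[:, u] += gamma * (g * Q[:, i] - lam * wbar_u[u] * P[:, u])
            Q[:, i] += gamma * (g * p_old - lam * wbar_i[i] * Q[:, i])
    return mu, bu, bi, P, Q

ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 1, 1), (2, 0, 2), (2, 1, 5)]
mu, bu, bi, P, Q = train_mfrc(ratings, [1.0] * 6, n_users=3, n_items=2,
                              k=2, lam=0.01, gamma=0.05, n_iters=500)
```

With all reliabilities equal to 1, the updates reduce to ordinary biased SGD matrix factorization, which is a convenient sanity check.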

4 Experiments

In this section, we introduce the datasets and evaluation metrics used in our experiments, and then analyze the experimental results in detail.

4.1 Datasets

MovieLens (https://grouplens.org/datasets/movielens/) is one of the most widely used datasets in recommender systems research. In our experiments, we use two MovieLens datasets: MovieLens 100K and MovieLens 1M. Ratings range from 1 to 5, and each user has rated at least 20 items. Table 1 shows the basic statistics of the two datasets.

Dataset Users Items Ratings Sparsity
MovieLens 100K 943 1,682 100,000 93.70%
MovieLens 1M 6,040 3,952 1,000,209 95.80%
Table 1: Statistics of Datasets
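The sparsity column of Table 1 is the fraction of empty cells in the $m \times n$ rating matrix, that is, one minus the number of ratings divided by $m \cdot n$. The short check below recomputes it from the other columns; the helper name `sparsity` is ours.

```python
def sparsity(n_users: int, n_items: int, n_ratings: int) -> float:
    """Fraction of user-item cells without an observed rating."""
    return 1.0 - n_ratings / (n_users * n_items)

for name, m, n, count in [("MovieLens 100K", 943, 1682, 100_000),
                          ("MovieLens 1M", 6040, 3952, 1_000_209)]:
    print(f"{name}: {sparsity(m, n, count):.2%}")
```

This prints 93.70% and 95.81%; the 95.80% listed for MovieLens 1M in Table 1 corresponds to truncating rather than rounding to two decimals.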

4.2 Benchmark Algorithms

In our experiments, the benchmark algorithms are probabilistic matrix factorization (PMF) [21], biased PMF (BPMF) [11], alternating least squares with weighted regularization (ALSWR) [25] and AutoSVD [24]. They are closely related to our work and achieve good results in recommender systems.

4.3 Evaluation Metrics

In order to evaluate the performance of the proposed method, we use root mean squared error (RMSE) and fraction of concordant pairs (FCP) to measure the accuracy of rating prediction.

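RMSE is widely used to measure prediction accuracy and is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|\mathcal{T}|} \sum_{(u,i) \in \mathcal{T}} \left( r_{ui} - \hat{r}_{ui} \right)^2},$$

where $\mathcal{T}$ is the test set of (user, item) pairs and $|\mathcal{T}|$ is its size.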
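A direct transcription of the RMSE definition, taking aligned sequences of true and predicted test ratings (the function name `rmse` is ours):

```python
import math

def rmse(truth, pred):
    """Root mean squared error over aligned true and predicted test ratings."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(truth, pred)) / len(truth))

score = rmse([4, 2, 5], [3.5, 2.5, 5.0])
```

Here two predictions are off by 0.5 and one is exact, so the result is the square root of 0.5 / 3.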

Another metric is FCP. Koren [13] argued that the correct ranking of items in recommender systems should also be considered: if $r_{ui} > r_{uj}$ in the test set, then the same order should be preserved in the prediction results, i.e., $\hat{r}_{ui} > \hat{r}_{uj}$. Hence, FCP is defined as:

$$\mathrm{FCP} = \frac{\sum_{u} n^{u}_{c}}{\sum_{u} n^{u}_{c} + \sum_{u} n^{u}_{d}},$$

where $n^{u}_{c}$ and $n^{u}_{d}$ denote the numbers of concordant and discordant pairs of user $u$, respectively. A higher FCP means more concordant pairs in the test results, so we expect a good recommendation algorithm to achieve a high FCP together with a low RMSE.
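The FCP computation can be sketched as follows. For each user, every pair of test items with different true ratings is counted as concordant if the predictions order them the same way and as discordant otherwise; following our reading of [13], tied predictions count as discordant and pairs with equal true ratings are skipped. The function name and the tuple layout are our assumptions.

```python
from collections import defaultdict
from itertools import combinations

def fcp(test):
    """Fraction of concordant pairs over (user, item, true, predicted) tuples."""
    by_user = defaultdict(list)
    for user, _, true, pred in test:
        by_user[user].append((true, pred))
    n_c = n_d = 0
    for pairs in by_user.values():
        for (r1, p1), (r2, p2) in combinations(pairs, 2):
            if r1 == r2:
                continue  # equal true ratings carry no ordering information
            # Prediction for the item with the higher true rating, then the lower one.
            hi, lo = (p1, p2) if r1 > r2 else (p2, p1)
            if hi > lo:
                n_c += 1
            else:
                n_d += 1  # wrong order or tied prediction
    return n_c / (n_c + n_d) if n_c + n_d else float("nan")

test = [("a", 1, 5, 4.5), ("a", 2, 3, 3.0), ("a", 3, 1, 3.5),
        ("b", 1, 4, 4.0), ("b", 2, 4, 2.0), ("b", 3, 2, 2.0)]
score = fcp(test)
```

Here user "a" contributes two concordant pairs and one discordant pair, and user "b" one of each, so the score is 3 / 5.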

4.4 Results and Discussion

4.4.1 Impact of Normalization Function

In this section, we compare the performance of the three normalization functions $f$ in our model. We randomly choose 80% of the original data as the training set and use the rest as the test set, with the number of latent factors ranging from 20 to 100. Fig. 1 shows that one choice of $f$ achieves the best RMSE, a second performs slightly worse than it, and the third performs the worst on both datasets. This indicates that the reliability of each rating should be normalized to a relatively small range: if we overemphasize highly reliable ratings, we may lose much of the information in the remaining ratings, which amounts to greater data sparsity. Based on prediction performance, we use the best-performing $f$ in the following experiments.

(a) RMSE on MovieLens 100K (b) RMSE on MovieLens 1M
Figure 1: Performance on two datasets with different $f$

4.4.2 Impact of Number of Latent Factors

To examine our method in depth, we compare it with the benchmark algorithms under different numbers of latent factors ranging from 20 to 100. As in the previous section, we randomly choose 80% of the original data as the training set, conduct each experiment five times and report the average RMSE and FCP. In addition, the number of iterations is set to 100, the regularization parameter $\lambda$ is set to 0.05 and the learning rate $\gamma$ is set to 0.005 on both datasets.

Fig. 2 shows that as $k$ increases, the performance of all methods keeps improving and eventually stabilizes. On the MovieLens 100K dataset, Fig. 2(a) shows that MFRC outperforms the other benchmark algorithms in RMSE, with an RMSE at least 0.005 lower than those of BPMF and PMF, whereas ALSWR is unstable as $k$ changes. In addition, Fig. 2(b) shows that the FCP of all algorithms reaches 71%, while MFRC reaches 74.5%, about 1% higher than the second-best algorithm. This indicates that our method not only has a lower prediction error but also ranks more item pairs correctly. Similarly, on the MovieLens 1M dataset, which is sparser than MovieLens 100K, our method still performs best. Fig. 2(c) shows that the RMSE of MFRC declines steadily as $k$ increases and is at least 0.003 lower than that of BPMF. Fig. 2(d) shows that the FCP of MFRC is significantly higher than those of the other algorithms and stays above 77.5% for larger values of $k$, while the others remain below 77.3%. From this analysis, we conclude that MFRC improves and gradually stabilizes as $k$ increases; however, the computational complexity of matrix factorization is proportional to $k$, so the balance between accuracy and efficiency should be chosen according to the actual situation.

(a) RMSE on MovieLens 100K (b) FCP on MovieLens 100K (c) RMSE on MovieLens 1M (d) FCP on MovieLens 1M
Figure 2: Performance on two datasets under different numbers of latent factors

4.4.3 Impact of Sparsity

Sparsity is one of the most important factors affecting the performance of recommender systems [6]. To further evaluate our method, we vary the proportion of the training set, setting the training ratio to 50%, 60%, 70% and 80%. The number of latent factors is set to 50.
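The evaluation protocol used here and in Section 4.4.2, a random split of the observed ratings at a given training ratio repeated five times, can be sketched as follows. The function name and the use of Python's `random.Random` with per-run seeds are our assumptions; a purely random split may also leave some users or items without training ratings, and how such cases were handled is not stated.

```python
import random

def split_ratings(ratings, train_ratio, seed):
    """Randomly assign `train_ratio` of the observed ratings to training."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_ratio * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Toy data: 10 users x 10 items, ratings in 1..5.
ratings = [(u, i, 1 + (u + i) % 5) for u in range(10) for i in range(10)]
splits = {ratio: [split_ratings(ratings, ratio, seed) for seed in range(5)]
          for ratio in (0.5, 0.6, 0.7, 0.8)}
```

Each model would then be trained on every training part and scored on the matching test part, with RMSE and FCP averaged over the five seeds.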

Table 2 and Table 3 show the experimental results on the two datasets. As expected, the sparsity of the dataset greatly affects the performance of the recommendation algorithms. Table 2 shows the results on the MovieLens 100K dataset: as the training ratio increases from 50% to 80%, the RMSE of MFRC stays at a relatively low level and its FCP increases steadily. Although the improvement becomes smaller and smaller, MFRC performs substantially better than all other benchmark algorithms. The comparison between BPMF and MFRC shows that mining highly reliable ratings is effective. Table 3 shows clearly that on the MovieLens 1M dataset, our method significantly outperforms all the methods discussed here at every level of data sparsity, and the traditional methods still lag behind MFRC. When the training ratio is 50%, the RMSE of MFRC is 0.008 lower than that of BPMF and 0.0221 lower than that of PMF. Similarly, the FCP of MFRC remains high as the training ratio increases. In conclusion, by incorporating rating centrality, our method achieves significantly lower prediction error and more concordant pairs on extremely sparse datasets.

Metric Ratio PMF BPMF ALSWR AutoSVD MFRC
RMSE 50% 0.9751 0.9358 1.0112 0.9417 0.9270
RMSE 60% 0.9557 0.9196 1.0063 0.9317 0.9163
RMSE 70% 0.9412 0.9116 0.9891 0.9245 0.9067
RMSE 80% 0.9304 0.9026 0.9730 0.9164 0.8983
FCP 50% 71.35% 71.61% 70.11% 70.51% 72.53%
FCP 60% 72.03% 72.37% 70.23% 71.24% 73.20%
FCP 70% 72.60% 72.90% 71.06% 71.77% 73.74%
FCP 80% 73.29% 73.74% 71.60% 72.71% 74.57%
Table 2: Performance on MovieLens 100K under different training ratios

Metric Ratio PMF BPMF ALSWR AutoSVD MFRC
RMSE 50% 0.8830 0.8689 0.8802 0.8753 0.8609
RMSE 60% 0.8698 0.8597 0.8681 0.8653 0.8531
RMSE 70% 0.8584 0.8514 0.8593 0.8560 0.8465
RMSE 80% 0.8511 0.8472 0.8567 0.8484 0.8432
FCP 50% 75.33% 75.90% 75.75% 75.38% 76.67%
FCP 60% 76.05% 76.41% 76.47% 75.99% 77.08%
FCP 70% 76.66% 76.86% 76.97% 76.53% 77.44%
FCP 80% 77.01% 77.07% 77.20% 76.89% 77.56%
Table 3: Performance on MovieLens 1M under different training ratios

5 Conclusion

In this work, to obtain more accurate recommendation results, we mine reliable user ratings from limited data and propose an optimized matrix factorization recommendation algorithm based on the rating centrality of users and items. Unlike traditional matrix factorization recommendation algorithms, which do not consider the reliability of each user rating, our method defines user-based and item-based rating centrality and combines them to measure the reliability of each rating. On this basis, we introduce the reliability into the traditional matrix factorization objective function and adjust it accordingly. Our extensive experimental results demonstrate that MFRC achieves lower prediction error and more concordant pairs than other popular matrix factorization recommendation algorithms, especially on highly sparse datasets. We conclude that our rating-centrality-based method can identify reliable ratings among users' explicit ratings and significantly improve performance in recommender systems.


This work was supported by the National Natural Science Foundation of China (No. 61602048) and the Fundamental Research Funds for the Central Universities (No. NST20170206).


  • [1] Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17(6), 734–749 (June 2005)
  • [2] Cacheda, F., Formoso, V.: Comparison of collaborative filtering algorithms: limitations of current techniques and proposals for scalable, high-performance recommender systems. ACM Transactions on the Web 5(1), 1–33 (2011)
  • [3] Cai, Y., Leung, H.f., Li, Q., Min, H., Tang, J., Li, J.: Typicality-based collaborative filtering recommendation. IEEE Transactions on Knowledge and Data Engineering 26(3), 766–779 (March 2014)
  • [4] Chen, L., Gemmis, M.D., Felfernig, A., Lops, P., Ricci, F., Semeraro, G.: Human decision making and recommender systems. ACM Transactions on Interactive Intelligent Systems 3(3), 1–7 (2013)
  • [5] Funk, S.: Netflix update: Try this at home. http://sifter.org/~simon/journal/20061211.html (2006)
  • [6] Grčar, M., Mladenič, D., Fortuna, B., Grobelnik, M.: Data Sparsity Issues in the Collaborative Filtering Framework. Springer Berlin Heidelberg (2006)
  • [7] He, X., Zhang, H., Kan, M.Y., Chua, T.S.: Fast matrix factorization for online recommendation with implicit feedback. In: International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 549–558 (2016)
  • [8] Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22(1), 5–53 (2004)
  • [9] Hu, Y., Koren, Y., Volinsky, C.: Collaborative filtering for implicit feedback datasets. In: Eighth IEEE International Conference on Data Mining. pp. 263–272 (2009)
  • [10] Huang, Z., Chen, H., Zeng, D.: Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM (2004)
  • [11] Koren, Y.: Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 426–434 (2008)
  • [12] Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)
  • [13] Koren, Y., Sill, J.: Collaborative filtering on ordinal user feedback. In: International Joint Conference on Artificial Intelligence. pp. 3022–3026 (2013)
  • [14] Li, F., Xu, G., Cao, L.: Coupled Item-based Matrix Factorization. Springer International Publishing (2014)
  • [15] Li, N., Li, C.: Zero-sum reward and punishment collaborative filtering recommendation algorithm. In: 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology. vol. 1, pp. 548–551 (Sept 2009)
  • [16] Mehta, R., Rana, K.: A review on matrix factorization techniques in recommender systems. In: 2017 2nd International Conference on Communication Systems, Computing and IT Applications (CSCITA). pp. 269–274 (April 2017)
  • [17] Meng, J., Zheng, Z., Tao, G., Liu, X.: User-specific rating prediction for mobile applications via weight-based matrix factorization. In: IEEE International Conference on Web Services. pp. 728–731 (2016)
  • [18] Paterek, A.: Improving regularized singular value decomposition for collaborative filtering. Proceedings of KDD Cup and Workshop (2007)
  • [19] Patra, B.K., Launonen, R., Ollikainen, V., Nandi, S.: A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data. Knowledge-Based Systems 82(C), 163–177 (2015)
  • [20] Ricci, F., Rokach, L., Shapira, B.: Recommender Systems: Introduction and Challenges, pp. 1–34. Springer US, Boston, MA (2015)
  • [21] Salakhutdinov, R., Mnih, A.: Probabilistic matrix factorization. In: International Conference on Neural Information Processing Systems. pp. 1257–1264 (2007)
  • [22] Sarwar, B., Konstan, J., Riedl, J.: Incremental singular value decomposition algorithms for highly. In: International Conference on Computer and Information Science. pp. 27–28 (2002)
  • [23] Xue, W., Xiao, B., Mu, L.: Intelligent mining on purchase information and recommendation system for e-commerce. In: IEEE International Conference on Industrial Engineering and Engineering Management. pp. 611–615 (2016)
  • [24] Zhang, S., Yao, L., Xu, X.: AutoSVD++: An efficient hybrid collaborative filtering model via contractive auto-encoders. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 957–960. SIGIR ’17, ACM, New York, NY, USA (2017)
  • [25] Zhou, Y., Wilkinson, D., Schreiber, R., Pan, R.: Large-scale parallel collaborative filtering for the Netflix Prize. In: Proc. Int’l Conf. Algorithmic Aspects in Information and Management, LNCS. pp. 337–348 (2008)