1 Introduction
In recent years, recommender systems have greatly promoted the development of e-commerce. A high-quality recommender system helps users quickly find what they like when facing a massive number of goods [23], mitigates the problem of information overload [20], and brings more economic benefits to sellers [4]. Consequently, many researchers have proposed recommendation algorithms aimed at more accurate recommendation results.
Among all the recommendation algorithms, collaborative filtering (CF) is a relatively simple and effective method for generating a list of recommendations for a target user [1]. One of the most popular CF methods is user-based CF, which finds users whose behavior records (such as item comments and browsing history) are similar to those of the target user and then recommends the items that those similar users selected [3]. Researchers have therefore proposed a number of similarity measures for finding similar users [8, 15, 19], but their performance is barely satisfactory on highly sparse data [10].
Following their strong performance in the Netflix Prize competition, model-based CF approaches have developed rapidly in recommender systems owing to their high accuracy and scalability [12, 16], and the matrix factorization model is the most representative of them. It factorizes the user-item rating matrix into two low-rank matrices, the user-factor matrix and the item-factor matrix, so that the missing entries of the original sparse rating matrix can be filled in by multiplying the two factor matrices. Inspired by the matrix factorization model used in the competition, researchers have successively developed many improved algorithms: Funk [5] presented regularized matrix factorization to solve the Netflix challenge and achieved good results; Sarwar [22] proposed an incremental matrix factorization algorithm to make recommender systems highly scalable; Paterek [18] added user and item biases to matrix factorization to model the interaction between users and items more accurately; Koren [11] integrated additional information sources into the matrix factorization model, but at a very high time complexity; Hu [9] proposed the concept of confidence level to measure user preferences in matrix factorization, which is applicable only to implicit feedback; He [7] argued that missing data should be weighted by item popularity and provided a fast matrix factorization model; Meng [17] proposed weight-based matrix factorization and employed term frequency and inverse document frequency to find users' interests, but the method is suitable only for text data.
However, the above methods fail to consider the reliability of each explicit rating. In general, users have their own tastes and opinions about an item. Although every explicit rating is made by a user, not all ratings should be given the same weight [12]. For example, some users tend to give items high scores, so their average scores are much higher than the overall mean. In contrast, other users rate only their favorite items and tend to give lower scores to other items [2]. The preferences of these two types of users are distinctly different, so if they give the same item the same score, the reliability of the two scores should be carefully evaluated.
In our work, instead of using explicit ratings directly, we explore the reliability of each observed rating under limited user information. First, we analyze the degree of deviation between each rating and the user's average score and propose the notion of user-based rating centrality. Similarly, from the degree of deviation between each rating and the item's average score, we define the item-based rating centrality. We then combine the two kinds of rating centrality to infer the reliability of a rating. Furthermore, we provide an optimized matrix factorization algorithm based on this analysis, and we use the stochastic gradient descent (SGD) learning algorithm to minimize the objective function. Experiments on two classic recommendation datasets show that our method performs better than other popular matrix factorization recommendation algorithms, indicating that it is feasible to mine the reliability of explicit ratings based on rating centrality.
The rest of the paper is organized as follows. Section 2 briefly describes the matrix factorization recommendation algorithm. Section 3 introduces in detail our proposed approach, which defines the rating centrality. In Section 4, we present the datasets and evaluation metrics used in our experiments and then analyze the experimental results. Finally, we draw conclusions in Section 5.
2 Preliminaries
In this section, we first state the problem discussed in this paper and then give a brief introduction to traditional matrix factorization recommendation algorithms.
2.1 Problem Definition
In a general recommender system, we usually have $m$ users, $n$ items and a sparse user-item rating matrix $R \in \mathbb{R}^{m \times n}$. Each observed value $r_{ui}$ of $R$ denotes user $u$'s rating on item $i$, and $\hat{r}_{ui}$ represents the predicted rating of user $u$ on item $i$. Given the user-item interaction matrix $R$, the goal of the recommender system is to predict the ratings of items that the target user might be interested in.
2.2 Matrix Factorization
Matrix factorization algorithms have been extensively used to mine the interaction between users and items [12]. Funk [5] points out that the user-item rating matrix $R$ can be decomposed into two low-rank matrices, the user-factor matrix and the item-factor matrix:
$$R \approx P^{T} Q \tag{1}$$
where $P \in \mathbb{R}^{k \times m}$ denotes the latent factor matrix of the $m$ users, $Q \in \mathbb{R}^{k \times n}$ denotes the latent factor matrix of the $n$ items and the parameter $k$ is the number of latent factors; in general, $k \ll \min(m, n)$. Therefore, the predicted value $\hat{r}_{ui}$ can be calculated as:
$$\hat{r}_{ui} = p_u^{T} q_i \tag{2}$$
where $p_u$ is the $u$th column of $P$ and $q_i$ is the $i$th column of $Q$. We can minimize the regularized squared error loss function to get the latent factor matrices:
$$\min_{P, Q} \sum_{(u,i) \in \mathcal{K}} \left( r_{ui} - p_u^{T} q_i \right)^2 + \lambda \left( \lVert P \rVert_F^2 + \lVert Q \rVert_F^2 \right) \tag{3}$$
where $\lambda$ is the regularization parameter that avoids overfitting, $\lVert \cdot \rVert_F$ denotes the Frobenius norm and $\mathcal{K}$ is the training set of (user, item) pairs. On the basis of this model, many improved matrix factorization algorithms have been proposed, for example, biased probabilistic matrix factorization [18], weighted regularization matrix factorization [25] and coupled item-based matrix factorization [14].
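As an illustration of the regularized matrix factorization model in this subsection, the following is a minimal NumPy sketch that fits the two factor matrices on a handful of ratings. Plain SGD is used here as one common way to minimize the regularized squared error. The names `train_mf` and `training_rmse`, the toy ratings and every hyperparameter value are assumptions made for this example, not settings taken from the paper.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=2, lam=0.05, lr=0.01, epochs=200, seed=0):
    """Fit P (k x n_users) and Q (k x n_items) by SGD on the regularized squared error.

    ratings is a list of (user_index, item_index, rating) triples.
    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(k, n_users))
    Q = rng.normal(scale=0.1, size=(k, n_items))
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            p_u = P[:, u].copy()            # keep the old p_u for the q_i update
            err = r - p_u @ Q[:, i]         # residual of the current prediction
            P[:, u] += lr * (err * Q[:, i] - lam * p_u)
            Q[:, i] += lr * (err * p_u - lam * Q[:, i])
    return P, Q

def training_rmse(ratings, P, Q):
    """Root mean squared error of p_u^T q_i over the given ratings."""
    sq = [(r - P[:, u] @ Q[:, i]) ** 2 for u, i, r in ratings]
    return float(np.sqrt(np.mean(sq)))

# Hypothetical toy data: 4 users, 3 items, 8 observed ratings.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
       (2, 1, 2.0), (2, 2, 5.0), (3, 0, 3.0), (3, 2, 4.0)]
P, Q = train_mf(toy, n_users=4, n_items=3)
print("training RMSE:", training_rmse(toy, P, Q))
```

The unobserved entries of the rating matrix can then be estimated as `P[:, u] @ Q[:, i]` for any user index `u` and item index `i`.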
3 Proposed Method
In this section, we introduce the notion of rating centrality from the perspective of user and item respectively, which can be obtained easily even for sparse data. Based on the rating centrality, we present a strategy to compute the reliability of each rating and propose the optimized matrix factorization recommendation algorithm for further improving the accuracy of recommendation results.
3.1 Notion
The user-based rating centrality refers to the degree of deviation between a user's rating and that user's average score. Even if two users give the same score to the same item, their user-based rating centralities may be totally different. For example, one user rates only what he likes and therefore has a high average score, whereas another user tends to give items negative scores and therefore has a relatively low average score. The preferences of the two users are clearly different. In this case, if the first user gives an item a high score and the second user gives the same item the same score, we cannot regard the two ratings as equally reliable, because the first user tends to give positive ratings and has a higher average score than the second. Therefore, we define the user-based rating centrality to measure the reliability from the user perspective:
(4) 
where is the average score of user and is the maximum value of the rating scale. Because and may be very close, to avoid the value of too large, we limit the max value of to .
Moreover, the user-based rating centrality is calculated only from the user perspective. If an item is really good and a user whose average score is low gives it a high rating, we should also regard the rating as highly reliable because it is consistent with the item's popularity. More precisely, if most users like an item and give it high scores, the item will have a high average score. Conversely, if the quality of an item is poor, it will receive plenty of negative feedback from the majority of users. The characteristics of the two items are clearly different. In this case, if a user gives both items the same high score or the same low score, we should also consider the rating reliability from the item perspective. Consequently, we define the item-based rating centrality to measure the degree of deviation between a user's rating and the item's average score:
(5) 
where is the average value of the item . Similarly, we limit the max value of to . In practical calculation, we can add a minimum value on the denominator in (4) and (5) to avoid denominator equals to zero, .
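Both centralities are built from the deviation of a rating from an average score, so the raw ingredients can be sketched in a few lines of Python. The sketch below computes only the absolute deviation of each rating from its user's average and from its item's average. It does not reproduce formulas (4) and (5) themselves, including the cap on the maximum value and the small constant added to the denominator. The function name `mean_deviations` and the sample ratings are assumptions made for this example.

```python
from collections import defaultdict

def mean_deviations(ratings):
    """Return {(user, item): (|r - user average|, |r - item average|)}.

    ratings is a list of (user, item, rating) triples.
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append(r)
        by_item[i].append(r)
    user_mean = {u: sum(v) / len(v) for u, v in by_user.items()}
    item_mean = {i: sum(v) / len(v) for i, v in by_item.items()}
    return {(u, i): (abs(r - user_mean[u]), abs(r - item_mean[i]))
            for u, i, r in ratings}

# Hypothetical sample: user "a" rated two items, user "b" rated one.
sample = [("a", "x", 5), ("a", "y", 3), ("b", "x", 1)]
deviations = mean_deviations(sample)
print(deviations[("a", "x")])
```

A small deviation means the rating sits close to the corresponding average, which is the situation in which the text assigns a high centrality.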
After obtaining the user-based and item-based rating centrality, we present a strategy to measure the reliability of a rating. If both the user-based and the item-based rating centrality are small, the rating deviates from the overall rating distribution of both the user and the item, so we regard it as having relatively low reliability. However, if both centralities are high, we consider that the rating reflects the real evaluation of the user and the item, and give it a high weight. Hence, we obtain the reliability of a rating from the following formula:
(6) 
where is a monotone increasing function that normalize the reliability. The bias is to avoid and maintain the data integrity. We will use three kinds of : , , , and conduct an experiment to compare the performance of them in Section 4.
3.2 Prediction
According to [11], the prediction formula for the calculation of in our method is defined as:
(7) 
where $b_u$ is the bias of user $u$ and $b_i$ is the bias of item $i$. If a rating's reliability is low, its influence on the training process should be reduced. In other words, we pay more attention to fitting highly reliable ratings when optimizing the objective function. On this basis, we propose an optimized loss function with weighted regularization, which avoids overfitting during model training. The adjusted regularized squared error loss function is as follows:
(8) 
where and denote the average reliability of user ’s ratings and item ’s ratings, respectively.
3.3 Optimization
To minimize the loss function (8), we use SGD to learn the model parameters because of its high efficiency. First, for each observed rating $r_{ui}$, we compute the prediction error $e_{ui}$:
$$e_{ui} = r_{ui} - \hat{r}_{ui} \tag{9}$$
Then we compute the partial derivative with respect to each parameter to obtain the gradient direction, and update the parameters until convergence:
(10) 
(11) 
(12) 
(13) 
where is the learning rate. Our method is an improved matrix factorization algorithm based on rating centrality, so we call our method MFRC, and the specific algorithm flow is shown in Algorithm 1.
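The idea of reducing the influence of low-reliability ratings during SGD can be sketched as follows. This is a simplified illustration under stated assumptions, not the loss function (8) or the update rules (10) to (13): each squared error is multiplied by a given per-rating weight, the regularization term is left unweighted, and the prediction is assumed to be the global mean plus user bias, item bias and the factor inner product. The names `train_weighted_mf`, `predict` and `weighted_sse`, the toy data, the weights and the hyperparameter values are all hypothetical.

```python
import numpy as np

def train_weighted_mf(ratings, weights, n_users, n_items,
                      k=2, lam=0.05, lr=0.005, epochs=200, seed=0):
    """SGD for biased matrix factorization with one weight per rating.

    Assumed objective (a simplification, not the paper's loss (8)):
    sum of w * (r - prediction)^2 plus an unweighted L2 penalty.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(k, n_users))
    Q = rng.normal(scale=0.1, size=(k, n_items))
    b_u, b_i = np.zeros(n_users), np.zeros(n_items)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean (assumption)
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            w = weights[idx]
            p_u = P[:, u].copy()
            e = r - (mu + b_u[u] + b_i[i] + p_u @ Q[:, i])
            # A low weight shrinks every error-driven step for this rating.
            b_u[u] += lr * (w * e - lam * b_u[u])
            b_i[i] += lr * (w * e - lam * b_i[i])
            P[:, u] += lr * (w * e * Q[:, i] - lam * p_u)
            Q[:, i] += lr * (w * e * p_u - lam * Q[:, i])
    return mu, b_u, b_i, P, Q

def predict(model, u, i):
    mu, b_u, b_i, P, Q = model
    return float(mu + b_u[u] + b_i[i] + P[:, u] @ Q[:, i])

def weighted_sse(model, ratings, weights):
    """Weighted sum of squared errors of the model on the given ratings."""
    return sum(w * (r - predict(model, u, i)) ** 2
               for (u, i, r), w in zip(ratings, weights))

# Hypothetical toy data with one reliability weight per rating.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
       (2, 1, 2.0), (2, 2, 5.0), (3, 0, 3.0), (3, 2, 4.0)]
toy_w = [1.0, 0.6, 0.9, 0.5, 0.7, 1.0, 0.8, 0.9]
model = train_weighted_mf(toy, toy_w, n_users=4, n_items=3)
print("weighted SSE:", weighted_sse(model, toy, toy_w))
```

With all weights equal to 1 the sketch reduces to ordinary biased matrix factorization, which makes it easy to compare the weighted and unweighted fits on the same data.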
4 Experiments
In this section, we introduce the datasets and evaluation metrics used in our experiments, and then analyze the experimental results in detail.
4.1 Datasets
The MovieLens dataset (https://grouplens.org/datasets/movielens/) is one of the most widely used datasets in recommender systems. In our experiments, we use two MovieLens datasets: MovieLens 100K and MovieLens 1M. Ratings range from 1 to 5, and each user has rated at least 20 items. Table 1 shows the basic statistics of the two datasets.
Table 1: Basic statistics of the two datasets.
Dataset         Users  Items  Ratings    Sparsity
MovieLens 100K  943    1,682  100,000    93.70%
MovieLens 1M    6,040  3,952  1,000,209  95.80%
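The sparsity column can be checked from the other columns, assuming the usual definition of sparsity as the fraction of user-item pairs that have no rating. The short Python sketch below applies that definition to the counts in Table 1; the function name `sparsity` is chosen for this example.

```python
def sparsity(n_users, n_items, n_ratings):
    """Fraction of the user-item matrix that has no observed rating."""
    return 1.0 - n_ratings / (n_users * n_items)

datasets = [("MovieLens 100K", 943, 1682, 100_000),
            ("MovieLens 1M", 6040, 3952, 1_000_209)]
for name, users, items, ratings in datasets:
    print(f"{name}: {sparsity(users, items, ratings):.2%}")
```

Under this definition the two values come out at roughly 93.7% and 95.8%, in line with the table up to rounding in the last digit.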
4.2 Benchmark Algorithms
4.3 Evaluation Metrics
In order to evaluate the performance of the proposed method, we use root mean squared error (RMSE) and fraction of concordant pairs (FCP) to measure the accuracy of rating prediction.
RMSE is widely used to measure prediction accuracy. It is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left( r_{ui} - \hat{r}_{ui} \right)^2} \tag{14}$$
where $T$ is the test set of (user, item) pairs and $|T|$ is the set size.
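For concreteness, RMSE over a test set can be computed with a few lines of Python. The function name `rmse` and the sample numbers are assumptions made for this example.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equally long rating sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((r - p) ** 2 for r, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))

# Hypothetical test ratings and predictions.
print(rmse([4, 2, 5], [3.5, 2.5, 4.0]))
```

Because the errors are squared before averaging, a few large mistakes raise RMSE more than many small ones.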
Another metric is FCP. Koren [13] argued that the correct ranking of items in recommender systems should also be considered. In other words, if $r_{ui} > r_{uj}$ in the test set, then this ordering should be preserved in the prediction results, that is, $\hat{r}_{ui} > \hat{r}_{uj}$. Hence, FCP is defined as:
$$\mathrm{FCP} = \frac{\sum_u n_c^u}{\sum_u n_c^u + \sum_u n_d^u} \tag{15}$$
where $n_c^u$ and $n_d^u$ denote the numbers of concordant and discordant pairs of user $u$, respectively. A higher FCP means more concordant pairs in the test results. Therefore, we expect a recommendation algorithm to have a high FCP as well as a low RMSE.
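FCP can be computed by enumerating, for each user, every pair of test items with different true ratings. The Python sketch below does this directly. Its tie handling is an assumption: pairs with equal true ratings are skipped, and pairs with equal predictions count as discordant. The function name `fcp` and the sample data are likewise chosen for this example.

```python
from collections import defaultdict
from itertools import combinations

def fcp(test_ratings, predictions):
    """Fraction of concordant pairs.

    test_ratings: list of (user, item, rating) triples.
    predictions:  dict mapping (user, item) to the predicted rating.
    """
    by_user = defaultdict(list)
    for u, i, r in test_ratings:
        by_user[u].append((r, predictions[(u, i)]))
    n_c = n_d = 0
    for pairs in by_user.values():
        for (r1, p1), (r2, p2) in combinations(pairs, 2):
            if r1 == r2:
                continue                      # no true ordering to preserve
            if (r1 - r2) * (p1 - p2) > 0:
                n_c += 1                      # predicted order matches true order
            else:
                n_d += 1                      # reversed or tied prediction
    if n_c + n_d == 0:
        raise ValueError("no comparable pairs in the test set")
    return n_c / (n_c + n_d)

# Hypothetical test set: user "u" rated three items, user "v" rated two.
test_set = [("u", 1, 5), ("u", 2, 3), ("u", 3, 1), ("v", 1, 4), ("v", 2, 2)]
preds = {("u", 1): 4.5, ("u", 2): 3.0, ("u", 3): 3.5,
         ("v", 1): 2.0, ("v", 2): 3.0}
print(fcp(test_set, preds))
```

In this sample, user "u" contributes two concordant pairs and one discordant pair, and user "v" contributes one discordant pair, so half of the four comparable pairs are concordant.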
4.4 Results and Discussion
4.4.1 Impact of Normalization Function
In this section, we compare the performance of three functions in our model. We randomly choose 80% of the original data as the training set and use the remainder as the test set. The number of latent factors ranges from 20 to 100. From Fig. 1, we can clearly see that “” gets the best value of RMSE, and “” performs slightly worse than “”, while “” performs the worst on both datasets. This shows that the reliability of each rating should be normalized to a relatively small range. If we overemphasize highly reliable ratings, we may lose much information from the remaining ratings, which results in greater data sparsity. In terms of prediction performance, we will use “” in the following experiments.
4.4.2 Impact of Number of Latent Factors
To examine our method in depth, we compare it with the other benchmark algorithms under different numbers of latent factors, ranging from 20 to 100. As in the previous section, we randomly choose 80% of the original data as the training set, repeat each experiment five times, and report the average RMSE and FCP. In addition, the number of iterations is set to 100, is set to 0.05 and is set to 0.005 on both datasets.
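The evaluation protocol described here, a random 80/20 split repeated five times with the scores averaged, can be sketched in Python as follows. The function names `split_ratings` and `average_over_runs`, the use of the run index as the random seed, and the placeholder `evaluate` callback are assumptions made for this example.

```python
import random

def split_ratings(ratings, train_ratio=0.8, seed=0):
    """Shuffle the ratings and split them into a training and a test list."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def average_over_runs(ratings, evaluate, runs=5, train_ratio=0.8):
    """Average evaluate(train, test) over several random splits.

    evaluate is any function that trains on `train` and returns a score
    (for example RMSE or FCP) measured on `test`.
    """
    scores = []
    for seed in range(runs):
        train, test = split_ratings(ratings, train_ratio, seed)
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)

# Placeholder data and metric, used only to show the call pattern.
data = list(range(100))
train, test = split_ratings(data, 0.8, seed=1)
print(len(train), len(test))
```

Changing `train_ratio` to 0.5, 0.6 or 0.7 gives the sparser training sets used when studying the impact of sparsity.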
Figure 2 shows that as $k$ increases, the performance of all methods keeps improving and eventually stabilizes. On the MovieLens 100K dataset, Fig. 2(a) shows that MFRC outperforms the other benchmark algorithms on RMSE, with a value at least 0.005 lower than those of BPMF and PMF. ALSWR is unstable as $k$ changes. In addition, Fig. 2(b) shows that the FCP of all algorithms reaches 71%, while MFRC reaches 74.5%, about 1% higher than the second-best algorithm. This indicates that our method not only has lower prediction error but also ranks more item pairs correctly. Similarly, on the MovieLens 1M dataset, which is sparser than MovieLens 100K, our method still performs best. Fig. 2(c) shows that the RMSE of MFRC declines gradually as $k$ increases and is at least 0.003 lower than that of BPMF. Fig. 2(d) shows that the FCP of MFRC is significantly higher than that of the other algorithms and always stays above 77.5% when , while the others stay below 77.3%. From the above analysis, we conclude that the performance of MFRC improves and gradually stabilizes as $k$ increases, but the computational complexity of matrix factorization is proportional to $k$. Therefore, the balance between accuracy and efficiency should be chosen according to the actual situation.
4.4.3 Impact of Sparsity
Sparsity is one of the most important factors that affect the performance of the recommender system [6]. To further evaluate our method, we change the proportion of training set. The training ratio is set to 50%, 60%, 70% and 80%. The number of latent factors is set to 50.
Table 2 and Table 3 show the experimental results on the two datasets. As expected, the sparsity of the dataset greatly affects the performance of the recommendation algorithms. Table 2 shows the results on the MovieLens 100K dataset: as the training ratio increases from 50% to 80%, the RMSE of MFRC stays at a relatively low level and the FCP of MFRC increases steadily. Even though the improvement becomes smaller and smaller, MFRC outperforms all other benchmark algorithms. The comparison between BPMF and MFRC shows that mining highly reliable ratings is effective. Table 3 shows that on the MovieLens 1M dataset, our method outperforms all the other methods discussed here at every training ratio. When the training ratio is 50%, the RMSE of MFRC is 0.008 lower than that of BPMF and 0.0221 lower than that of PMF. Similarly, the FCP of MFRC remains high as the training ratio increases. In conclusion, by incorporating rating centrality, our method makes less prediction error and obtains more concordant pairs on extremely sparse datasets.
Table 2: RMSE and FCP on MovieLens 100K under different training ratios.
Metric  Training ratio  PMF     BPMF    ALSWR   AutoSVD  MFRC
RMSE    50%             0.9751  0.9358  1.0112  0.9417   0.9270
RMSE    60%             0.9557  0.9196  1.0063  0.9317   0.9163
RMSE    70%             0.9412  0.9116  0.9891  0.9245   0.9067
RMSE    80%             0.9304  0.9026  0.9730  0.9164   0.8983
FCP     50%             71.35%  71.61%  70.11%  70.51%   72.53%
FCP     60%             72.03%  72.37%  70.23%  71.24%   73.20%
FCP     70%             72.60%  72.90%  71.06%  71.77%   73.74%
FCP     80%             73.29%  73.74%  71.60%  72.71%   74.57%

Table 3: RMSE and FCP on MovieLens 1M under different training ratios.
Metric  Training ratio  PMF     BPMF    ALSWR   AutoSVD  MFRC
RMSE    50%             0.8830  0.8689  0.8802  0.8753   0.8609
RMSE    60%             0.8698  0.8597  0.8681  0.8653   0.8531
RMSE    70%             0.8584  0.8514  0.8593  0.8560   0.8465
RMSE    80%             0.8511  0.8472  0.8567  0.8484   0.8432
FCP     50%             75.33%  75.90%  75.75%  75.38%   76.67%
FCP     60%             76.05%  76.41%  76.47%  75.99%   77.08%
FCP     70%             76.66%  76.86%  76.97%  76.53%   77.44%
FCP     80%             77.01%  77.07%  77.20%  76.89%   77.56%

5 Conclusion
In this work, to obtain more accurate recommendation results, we mine reliable user ratings from limited data and propose an optimized matrix factorization recommendation algorithm based on the rating centrality of users and items. Unlike traditional matrix factorization recommendation algorithms, which fail to consider the reliability of each user rating, our method defines user-based and item-based rating centrality and combines them to measure the reliability of each rating. On this basis, we introduce the reliability into the traditional matrix factorization objective function and adjust it accordingly. Our experimental results demonstrate that MFRC obtains lower prediction error and more concordant pairs than other popular matrix factorization recommendation algorithms, especially on highly sparse datasets. We conclude that our method based on rating centrality can identify reliable ratings among users' explicit ratings and improve the performance of recommender systems.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (No. 61602048) and the Fundamental Research Funds for the Central Universities (No. NST20170206).
References
 [1] Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17(6), 734–749 (June 2005)
 [2] Cacheda, F., Formoso, V.: Comparison of collaborative filtering algorithms: limitations of current techniques and proposals for scalable, high-performance recommender systems. ACM Transactions on the Web 5(1), 1–33 (2011)
 [3] Cai, Y., f. Leung, H., Li, Q., Min, H., Tang, J., Li, J.: Typicality-based collaborative filtering recommendation. IEEE Transactions on Knowledge and Data Engineering 26(3), 766–779 (March 2014)
 [4] Chen, L., Gemmis, M.D., Felfernig, A., Lops, P., Ricci, F., Semeraro, G.: Human decision making and recommender systems. ACM Transactions on Interactive Intelligent Systems 3(3), 1–7 (2013)
 [5] Funk, S.: Netflix update: Try this at home. http://sifter.org/~simon/journal/20061211.html (2006)
 [6] Grčar, M., Mladenič, D., Fortuna, B., Grobelnik, M.: Data Sparsity Issues in the Collaborative Filtering Framework. Springer Berlin Heidelberg (2006)
 [7] He, X., Zhang, H., Kan, M.Y., Chua, T.S.: Fast matrix factorization for online recommendation with implicit feedback. In: International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 549–558 (2016)
 [8] Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22(1), 5–53 (2004)
 [9] Hu, Y., Koren, Y., Volinsky, C.: Collaborative filtering for implicit feedback datasets. In: Eighth IEEE International Conference on Data Mining. pp. 263–272 (2009)
 [10] Huang, Z., Chen, H., Zeng, D.: Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM (2004)

 [11] Koren, Y.: Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 426–434 (2008)
 [12] Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)

 [13] Koren, Y., Sill, J.: Collaborative filtering on ordinal user feedback. In: International Joint Conference on Artificial Intelligence. pp. 3022–3026 (2013)
 [14] Li, F., Xu, G., Cao, L.: Coupled Item-based Matrix Factorization. Springer International Publishing (2014)
 [15] Li, N., Li, C.: Zero-sum reward and punishment collaborative filtering recommendation algorithm. In: 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology. vol. 1, pp. 548–551 (Sept 2009)
 [16] Mehta, R., Rana, K.: A review on matrix factorization techniques in recommender systems. In: 2017 2nd International Conference on Communication Systems, Computing and IT Applications (CSCITA). pp. 269–274 (April 2017)
 [17] Meng, J., Zheng, Z., Tao, G., Liu, X.: User-specific rating prediction for mobile applications via weight-based matrix factorization. In: IEEE International Conference on Web Services. pp. 728–731 (2016)

 [18] Paterek, A.: Improving regularized singular value decomposition for collaborative filtering. Proceedings of KDD Cup and Workshop (2007)
 [19] Patra, B.K., Launonen, R., Ollikainen, V., Nandi, S.: A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data. Knowledge-Based Systems 82(C), 163–177 (2015)
 [20] Ricci, F., Rokach, L., Shapira, B.: Recommender Systems: Introduction and Challenges, pp. 1–34. Springer US, Boston, MA (2015)
 [21] Salakhutdinov, R., Mnih, A.: Probabilistic matrix factorization. In: International Conference on Neural Information Processing Systems. pp. 1257–1264 (2007)
 [22] Sarwar, B., Konstan, J., Riedl, J.: Incremental singular value decomposition algorithms for highly. In: International Conference on Computer and Information Science. pp. 27–28 (2002)
 [23] Xue, W., Xiao, B., Mu, L.: Intelligent mining on purchase information and recommendation system for ecommerce. In: IEEE International Conference on Industrial Engineering and Engineering Management. pp. 611–615 (2016)
 [24] Zhang, S., Yao, L., Xu, X.: Autosvd++: An efficient hybrid collaborative filtering model via contractive autoencoders. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 957–960. SIGIR ’17, ACM, New York, NY, USA (2017)
 [25] Zhou, Y., Wilkinson, D., Schreiber, R., Pan, R.: Large-scale parallel collaborative filtering for the Netflix prize. In: Proc. Int’l Conf. Algorithmic Aspects in Information and Management, LNCS. pp. 337–348 (2008)