I Introduction
Rapid and accurate prediction of users’ preferences is the ultimate goal of today’s recommender systems [1]. Accurate personalized recommender systems benefit both the demand side and the supply side, including content publishers and platforms [2, 3]. For this reason, recommender systems not only attract great interest in academia [4, 5, 6, 7], but are also widely deployed in industry [8, 9, 10].
Matrix factorization (MF) [8] models are one of the remarkable solutions in recommender systems. MF models, as a branch of collaborative filtering (CF) techniques, characterize both items and users by vectors in the same space, inferred from the observed entries of the user-item rating matrix; predicting an unknown rating of a user-item pair then relies on the item and user vectors (the predicted score is usually the inner product of the corresponding item and user vectors). Singular value decomposition (SVD), as a member of the MF family, imposes baseline predictors. SVD++ [11] extends SVD by including users’ implicit feedback, such as users’ historically consumed items, as auxiliary information; it has been found to be one of the most effective models in recommender systems. Besides MF, another successful family of collaborative filtering techniques is the nearest neighbor methods [12, 13], which estimate the unknown rating of a user-item pair by considering similar users’ ratings of the item.
Both SVD++ and nearest neighbor models utilize the structural information of the user-item bipartite graph. The user-item bipartite graph treats each user and item as a vertex; every rating between a user and an item is represented as an undirected weighted edge connecting the corresponding user and item vertices. The number of “steps” between two vertices on the bipartite graph is defined as their shortest unweighted path length. Therefore SVD++, which applies user implicit feedback, utilizes “step-one” structural information, whereas nearest neighbor models, which rely on similar users’ ratings to predict, make use of “step-two” structural information.
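The step notion above can be made concrete on a toy bipartite graph. A minimal sketch (the edge list and helper names are illustrative, not from the paper):

```python
# A toy user-item bipartite graph as (user, item) rating edges.
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u3", "i3")]

def step_one(user, edges):
    """Step-one neighborhood of a user: the items it has rated."""
    return {i for (u, i) in edges if u == user}

def step_two(user, edges):
    """Step-two neighborhood: other users sharing at least one rated item."""
    items = step_one(user, edges)
    return {u for (u, i) in edges if i in items and u != user}

print(step_one("u1", edges))  # items directly connected to u1
print(step_two("u1", edges))  # users reachable from u1 in two steps
```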
Viewed through the structure of the user-item bipartite graph, SVD++ exposes two limitations: (i) SVD++ only utilizes user-side implicit feedback, whereas item-side implicit feedback is not leveraged; (ii) SVD++ treats all interacted items equally as implicit feedback, while in the real world a user usually has preferences over viewed items. For example, suppose in movie recommendation a user has rated two movies: Titanic and Stargate: Continuum. In this case, Stargate: Continuum should be much more important: Titanic is watched by a very broad audience, while Stargate: Continuum mainly attracts science fiction fans and can better reflect the user’s personal interest.
In this paper, we propose three novel models to resolve the above two limitations of SVD++:

The Graph-based collaborative filtering (GCF) model generalizes implicit feedback from a graph perspective and introduces item implicit feedback into the SVD++ model. As bipartite graph structural information has shown its effectiveness in the aforementioned two branches of CF models, it is natural to introduce implicit feedback on items, which better describes the relationships between items and thus improves model performance.

On the basis of the GCF model, the Weighted Graph-based CF (WGCF) model applies a matrix form, modeling users’ personal tastes and items’ special popularity, to implement a weighting mechanism for implicit feedback. According to the various user tastes and item popularities, different weights are calculated to describe how well a piece of implicit feedback profiles the user or the item.

The Attentive Graph-based CF (AGCF) model also implements a weighting mechanism for implicit feedback. Rather than decomposing weights between the user/item and the implicit feedback, AGCF adopts an attention network over the implicit feedback to distinguish its importance. Furthermore, the implicit feedback of a user/item is essentially defined by its “neighborhood” on the bipartite graph, which can consist of multi-step vertices rather than only the step-one vertices used in SVD++. In this paper we conduct experiments with step-two implicit feedback on the AGCF model and demonstrate its effectiveness, especially on sparse data.
In the experiments, we also evaluate and compare the influence of different implicit feedback sampling methods on model performance. We find that for the WGCF and AGCF models, random sampling achieves performance close to relevance sampling while greatly saving data pretreatment time.
The rest of this paper is organized as follows. We discuss related work in Section II and present two preliminary models in Section III. We then present our models in Section IV. Experimental settings and results are discussed in Section V. We finally conclude this paper and discuss future work in Section VI.
II Related Work
Collaborative filtering (CF) has been well studied for personalized recommender systems over the last decade [14, 8, 5]. The basic assumption of CF is that users with similar behavior tend to like the same items, and items with similar audiences tend to receive similar ratings [15, 13].
From the model’s perspective, CF mainly consists of memory-based and model-based methods [16]. Memory-based methods directly define similarities between pairs of items or pairs of users, based on which preference scores of user-item pairs can be calculated [13, 12]. Despite their simplicity and interpretability, memory-based methods usually take too much time or memory to calculate item-item or user-user similarities, making them incapable of handling large-scale recommendation scenarios. Model-based CF methods, on the other hand, normally learn the similarities by fitting a model to the user-item interaction data and make predictions based on the model. Latent factor models are the major implementation of model-based methods, such as probabilistic latent semantic analysis (pLSA) [17] and the most widely used matrix factorization [8].
Koren proposed SVD++ [11], a classic model that combines a user’s “neighborhood”, i.e. previously rated items, with a matrix factorization model for prediction, and which can be viewed as a mixture of memory-based and model-based methods.
From the data perspective, classic CF tasks focus on rating prediction, i.e. making the predicted rating scores as accurate as possible, which is a regression problem [18]. Since 2009, research has emerged on implicit data, where only the observation of a user consuming an item is available, without any explicit rating [6]. Such a data format can be regarded as a one-class classification problem [19, 20].
Furthermore, for such implicit feedback data, recommendation performance is typically evaluated as a top-N recommendation task, i.e. selecting and ranking items for the target user, so learning-to-rank evaluation metrics can be adopted, such as mean average precision (MAP), normalized discounted cumulative gain (NDCG), and mean reciprocal rank (MRR) [21]. We only study the more fundamental regression problem, so the ranking metrics will not be evaluated on our models, and these ranking approaches will not be compared with our proposal. As for side information, such as user demographics, item attributes and the recommendation context, many matrix factorization variants have been proposed [8, 22], among which the most successful model is factorization machines (FM) [5]. In FM, the side information and user/item identifiers are regarded as one-hot features of different fields, with a low-dimensional latent vector assigned to each feature. The interaction of user, item and side information is formulated by vector inner products.
Recently, with the great success of attention networks applied to various applications, such as image recognition [23] and sequence-to-sequence learning [24], it is not surprising that attention models for CF have emerged. An attentional factorization machine was proposed in [25], where an attention network is applied to the cross features of factorization machines and assigns different weights to different feature combinations. The authors of [26] proposed an attention-based encoder-decoder architecture for modeling user sessions and predicting the next item from past activities. Compared with the abundant previous work on CF, this work is positioned on the most classic prediction problem with no side information. Our work differs from memory-based and model-based CF methods by considering a graph-based user/item representation. Our model is expected to extract more useful information from the simple bipartite graph with the help of attention networks, and can thus be easily integrated with other feature-based frameworks.
TABLE I: Notations used in this paper.

$U$ — user set
$I$ — item set
$u$ — a user
$i$ — an item
$k$ — dimension of the latent factors
$s$ — dimension of the embeddings for implicit feedback
$p_u$ — latent factor of user $u$
$q_i$ — latent factor of item $i$
$b_u$ — score bias of user $u$
$b_i$ — score bias of item $i$
$\mu$ — global score bias
$y_j$ — embedding of item $j$ as user implicit feedback
$x_v$ — embedding of user $v$ as item implicit feedback
$R(u)$ — set of items that have interacted with user $u$
$R(i)$ — set of users that have interacted with item $i$
$M$ — number of users
$N$ — number of items
$w_{uj}$ — weight of implicit feedback item $j$ for user $u$
$w_{vi}$ — weight of implicit feedback user $v$ for item $i$
$g_u$ — user embedding for the implicit feedback weight
$h_j$ — item embedding for the implicit feedback weight
$R^2(u)$ — set of step-two implicit feedback for user $u$
$R^2(i)$ — set of step-two implicit feedback for item $i$
$a_{uj}$ — attention score of implicit feedback item $j$ for user $u$
$a_{iv}$ — attention score of implicit feedback user $v$ for item $i$
III Preliminaries
In this section, we provide detailed preliminaries of basic matrix factorization and SVD++. In order to study the fundamental CF problem, we choose to focus on (user, item, rating) tuples and the structural information within the user-item bipartite graph. Side information other than user IDs, item IDs, and ratings is not considered in this paper; such additional features can be seamlessly introduced into our model by leveraging models like factorization machines. All notations that appear later in this paper are listed in Table I.
III-A Matrix Factorization
In the basic matrix factorization model, a user $u$ and an item $i$ are represented by latent vectors $p_u$ and $q_i$, respectively, where $p_u, q_i \in \mathbb{R}^k$. Formally, the rating prediction in the basic MF model is formulated as:

$\hat{r}_{ui} = f(b_u + b_i + \mu + p_u^\top q_i)$   (1)

where $b_u, b_i \in \mathbb{R}$ are the user and item score biases, $\mu$ is the global score bias, $f(\cdot)$ is a non-decreasing scaling function that bounds the predicted ratings to a certain range, and $p_u^\top q_i$ is the inner product of the user and item latent vectors.
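As a concrete illustration, the MF prediction of (1) can be sketched in a few lines of NumPy; the sigmoid here is just one possible choice for the scaling function $f$:

```python
import numpy as np

def mf_predict(p_u, q_i, b_u, b_i, mu):
    """Basic MF rating prediction: biases plus the user-item inner
    product, passed through a non-decreasing scaling function f."""
    score = b_u + b_i + mu + float(np.dot(p_u, q_i))
    return 1.0 / (1.0 + np.exp(-score))  # sigmoid as an example f

rng = np.random.default_rng(0)
p_u, q_i = rng.normal(size=8), rng.normal(size=8)
r_hat = mf_predict(p_u, q_i, b_u=0.1, b_i=-0.2, mu=0.5)
```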
III-B SVD++
Beyond a user’s identifier, his neighborhood (denoted as $R(u)$; in this case, his rated items), regarded as implicit feedback, also characterizes the user’s preference. Based on this intuition, the SVD++ model characterizes a user by the latent vector $p_u$ as well as by the items rated by the user. More formally, the rating prediction of the SVD++ model is formulated as:

$\hat{r}_{ui} = f(b_u + b_i + \mu + \hat{p}_u^\top q_i)$   (2)

In the above formulation, user $u$ is modeled by the latent vector $p_u$ and its implicit feedback $R(u)$:

$\hat{p}_u = p_u + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j$   (3)

where $y_j \in \mathbb{R}^s$ is independent of the item embedding $q_j$, and the rated items are equally weighted by the constant $|R(u)|^{-1/2}$. The set of items rated by a user can be recognized as a feature of this user; SVD++ employs this feature in the user representation by taking the normalized sum over the rated item set, as shown in (3).
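The user representation of (3) can be sketched directly (a toy $y_j$ table, names illustrative):

```python
import numpy as np

def svdpp_user_vector(p_u, Y, R_u):
    """SVD++ user representation, eq. (3): p_u plus the sum of the
    implicit-feedback embeddings y_j, scaled by |R(u)|^(-1/2)."""
    R_u = list(R_u)
    return p_u + Y[R_u].sum(axis=0) / np.sqrt(len(R_u))

Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy y_j embeddings
p_hat = svdpp_user_vector(np.zeros(2), Y, R_u=[0, 1])
```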
Despite its excellent performance in practice, SVD++ has the two limitations mentioned before. First, SVD++ does not utilize item-side implicit feedback; second, SVD++ treats interacted items equally, which may not hold true in practice. We therefore propose the Weighted Graph-based CF model and the Attentive Graph-based CF model.
IV Methodology
In this section, we propose our Weighted Graph-based CF model and Attentive Graph-based CF model.
IV-A Weighted Graph-based CF Model
Before introducing the weighted version, we first introduce the Graph-based CF (GCF) model. To fully utilize item-side graph information, we extend SVD++ by incorporating implicit feedback on items, i.e. the users who have interacted with the given item. For instance, in the Netflix dataset, the implicit feedback on a movie is the set of users who have watched and rated it before. In our GCF model, the rating prediction is defined as follows:

$\hat{r}_{ui} = f(b_u + b_i + \mu + \hat{p}_u^\top \hat{q}_i)$   (4)
$\hat{p}_u = p_u + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j$   (5)
$\hat{q}_i = q_i + |R(i)|^{-1/2} \sum_{v \in R(i)} x_v$   (6)

where $x_v \in \mathbb{R}^s$ is the embedding of user $v$ serving as item implicit feedback, and the other parameters follow the same definitions as in SVD++.
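A minimal sketch of the GCF prediction of (4)-(6), with the scaling function $f$ omitted for brevity (all names are illustrative):

```python
import numpy as np

def gcf_predict(p_u, q_i, Y, X, R_u, R_i, b_u=0.0, b_i=0.0, mu=0.0):
    """GCF prediction: SVD++ extended symmetrically with item-side
    implicit feedback x_v over the users R(i) who rated item i."""
    p_hat = p_u + Y[list(R_u)].sum(axis=0) / np.sqrt(len(R_u))
    q_hat = q_i + X[list(R_i)].sum(axis=0) / np.sqrt(len(R_i))
    return b_u + b_i + mu + float(p_hat @ q_hat)

Y = np.array([[1.0]])   # toy user-side implicit embeddings y_j
X = np.array([[1.0]])   # toy item-side implicit embeddings x_v
score = gcf_predict(np.array([1.0]), np.array([1.0]), Y, X, [0], [0])
```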
We further propose the Weighted Graph-based CF (WGCF) model to tackle the second limitation. Instead of being equally treated as shown in (5) and (6), the individuals in both users’ and items’ implicit feedback are weighted independently. In our WGCF model, (5) and (6) are redefined as follows:

$\hat{p}_u = p_u + \sum_{j \in R(u)} w_{uj}\, y_j$   (7)
$\hat{q}_i = q_i + \sum_{v \in R(i)} w_{vi}\, x_v$   (8)

In (7) and (8), $W \in \mathbb{R}^{M \times N}$ is the weight matrix over every latent factor in the implicit feedback, for both users and items. Its entry $w_{uj}$ describes the profiling capability of implicit feedback item $j$ for user $u$ (similar semantics applies to $w_{vi}$). However, training $W$ directly faces two main obstacles. The first is the size of $W$, which is infeasible in practice. The second is the serious sparsity of $W$, as users usually interact with only a small subset of items. It is therefore natural to use matrix factorization to learn a low-rank representation of the weight matrix. We decompose $W$ into two smaller matrices, $W = GH^\top$, where $G \in \mathbb{R}^{M \times l}$, $H \in \mathbb{R}^{N \times l}$, so that $w_{uj} = g_u^\top h_j$. This decomposition can also be interpreted as the user’s personal taste and the item’s special popularity, respectively. When an item’s special popularity matches a user’s personal taste, that item, serving as implicit feedback of the user, ought to be assigned a high weight; when the item’s special popularity is not consistent with the user’s personal taste, the item has little influence on the user and should receive a low weight. Therefore, in the WGCF model, (7) and (8) are replaced by:

$\hat{p}_u = p_u + \sum_{j \in R(u)} (g_u^\top h_j)\, y_j$   (9)
$\hat{q}_i = q_i + \sum_{v \in R(i)} (g_v^\top h_i)\, x_v$   (10)
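The low-rank weighting of (9) can be sketched as follows, where `g_u` and the rows of `H` play the roles of the taste and popularity embeddings in the factorization $W = GH^\top$ (a sketch, names illustrative):

```python
import numpy as np

def wgcf_user_vector(p_u, g_u, H, Y, R_u):
    """WGCF user representation, eq. (9): each implicit item j is
    weighted by w_uj = g_u . h_j instead of the constant |R(u)|^(-1/2)."""
    idx = list(R_u)
    w = H[idx] @ g_u                        # low-rank weights w_uj
    return p_u + (w[:, None] * Y[idx]).sum(axis=0)

H = np.array([[2.0], [3.0]])                # item weight embeddings h_j
Y = np.array([[1.0, 0.0], [0.0, 1.0]])      # implicit feedback embeddings
p_hat = wgcf_user_vector(np.zeros(2), np.array([1.0]), H, Y, [0, 1])
```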
IV-B Attentive Graph-based CF Model
Besides the matrix representation of user taste and item popularity, another solution is to learn a functional representation: given the user and item hidden vectors, a neural network can be employed to predict the importance of each piece of implicit feedback. With this intuition, we introduce an attention mechanism into our model to automatically learn the importance of each neighbor in the graph. In the Attentive Graph-based CF (AGCF) model, an attention network substitutes the $G$ and $H$ matrices of the WGCF model to evaluate the importance between users/items and their implicit feedback. The input of the attention network for user implicit feedback is the concatenation of the user embedding and the corresponding implicit feedback embedding. The concatenated embedding then goes through a multi-layer perceptron (MLP) with ReLU non-linearity. Finally, a softmax is performed over the MLP outputs to ensure that all attention scores are normalized between 0 and 1 with a sum of 1. Formally, the attention network is defined as follows:

$e_{uj} = \mathrm{MLP}([p_u \,;\, y_j])$   (11)
$a_{uj} = \dfrac{\exp(e_{uj}/T)}{\sum_{j' \in R(u)} \exp(e_{uj'}/T)}$   (12)

where the parameter $T$ denotes the temperature of the softmax, used to properly adjust the variance of the attention scores. The same procedure is carried out for item implicit feedback. The attention scores are then multiplied with the implicit feedback embeddings as weights before the embeddings are summed and added to the user and item embeddings. Equations (9) and (10) are now rewritten as:

$\hat{p}_u = p_u + \sum_{j \in R(u)} a_{uj}\, y_j$   (13)
$\hat{q}_i = q_i + \sum_{v \in R(i)} a_{iv}\, x_v$   (14)

where $a_{uj}$ and $a_{iv}$ are the attention scores for user and item implicit feedback respectively, playing the same roles as $g_u^\top h_j$ and $g_v^\top h_i$ in the WGCF model. Fig. 1 shows an example of how the user implicit feedback representation is generated.
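The attention computation of (11)-(12) can be sketched as below; the one-line `mlp` stands in for the paper’s multi-layer perceptron and is purely illustrative:

```python
import numpy as np

def attention_weights(p_u, Y_ru, mlp, T=1.0):
    """AGCF attention, eqs. (11)-(12): score each [p_u ; y_j]
    concatenation with an MLP, then apply a temperature-T softmax."""
    logits = np.array([mlp(np.concatenate([p_u, y])) for y in Y_ru]) / T
    z = np.exp(logits - logits.max())       # numerically stable softmax
    return z / z.sum()

mlp = lambda v: float(v.sum())              # toy stand-in for the MLP
a = attention_weights(np.ones(2), np.array([[1.0, 0.0], [3.0, 0.0]]), mlp)
```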
IV-C Feedback Process
In practice, the number of implicit feedbacks on popular items, i.e. the number of interacted users, is usually large. We depict the statistics of the Netflix dataset in Fig. 2, whose x-axis is the number of implicit feedbacks (interacted items for users, interacted users for items) and whose y-axis is the number of users or items having that many implicit feedbacks. Every point on the figure represents one class of users or items with the same number of feedbacks. As can be observed, the numbers of implicit feedbacks of both users and items follow long-tail distributions.
Fig. 2 shows that most users and most items have relatively few feedbacks, yet some popular items accumulate far more feedbacks because they interact with many users. Obviously, it is not practical to include all the feedback data in a model, especially for the WGCF and AGCF models, where every implicit feedback must first be multiplied by its weight or attention score. Therefore, sampling is necessary when including implicit feedback in our models.
In the experiments we study two sampling policies, “relevance” and “random”. For the “relevance” sampling method, we first run a basic matrix factorization to generate embeddings for users and items. Second, we select the relevant items for each user, where the “relevance” of a user and an item is measured by the inner product of the corresponding embedding vectors; the top-20 most relevant items are selected for each user as its implicit feedback. Third, the same procedure is performed to generate implicit feedback for each item. For users who rate fewer than 20 items, we take all their items as implicit feedback and repeatedly pad with a unified fake item until the size of the user’s implicit feedback reaches 20. Exactly the same trick is applied to items rated by fewer than 20 users. For the “random” sampling method, user and item implicit feedbacks are drawn randomly from all candidates with replacement, so for users who have rated fewer than 20 items, or items with fewer than 20 ratings, duplicates will appear in their implicit feedback.
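The “relevance” policy with padding might look like this sketch (`pad_id` is a hypothetical token for the unified fake item):

```python
import numpy as np

def relevance_sample(p_u, Q, rated, n=20, pad_id=0):
    """'Relevance' sampling: keep the n rated items whose MF embeddings
    have the largest inner product with p_u; pad short lists with a
    unified fake item (pad_id is hypothetical)."""
    rated = list(rated)
    if len(rated) < n:
        return rated + [pad_id] * (n - len(rated))
    scores = Q[rated] @ p_u
    top = np.argsort(-scores)[:n]
    return [rated[j] for j in top]

Q = np.array([[1.0], [2.0], [3.0]])         # toy item embeddings
top2 = relevance_sample(np.array([1.0]), Q, rated=[0, 1, 2], n=2)
padded = relevance_sample(np.array([1.0]), Q, rated=[0], n=3, pad_id=-1)
```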
Besides step-one implicit feedback, we also sample step-two implicit feedbacks for each user and item, to further investigate how graph structure helps predict the scores of unknown user-item links on the bipartite graph. For the step-two implicit feedback of user $u$, we first generate the candidate set, which is the union of the item-side implicit feedback lists, obtained by either the “relevance” or the “random” method, of all items in $u$’s implicit feedback list. Then 20 step-two implicit feedbacks, denoted as $R^2(u)$, are randomly sampled from the candidate set. There are always more than 20 candidates, so no padding is needed here. The same trick is applied to sample the item step-two implicit feedback $R^2(i)$.
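Step-two sampling for a user follows directly from the description above; a sketch (all names illustrative):

```python
import random

def step_two_sample(R_u, R_item, n=20, seed=0):
    """Step-two implicit feedback for a user: pool the item-side lists
    R(i) of every step-one item, then sample n with replacement."""
    pool = sorted(set().union(*(set(R_item[j]) for j in R_u)))
    rng = random.Random(seed)
    return [rng.choice(pool) for _ in range(n)]

R_item = {0: ["u1", "u2"], 1: ["u2", "u3"]}   # item -> interacting users
sample = step_two_sample(R_u=[0, 1], R_item=R_item, n=5)
```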
V Experiments
V-A Dataset
Our experiments are conducted on the original Netflix Prize dataset, which collects about 100 million rating records from 480,189 users over 17,770 movies. Each record contains a rating score from 1 to 5, where 5 indicates that the user prefers the item most. In our experiments, all scores are normalized. We use 80% of the dataset as the training set and the remaining 20% as the test set. Our dataset splitting strategy ensures that all users and movies in the test set appear at least once in the training set.
V-B Compared Models
During experiments, we compare the following models:
TABLE II: RMSE of the compared models under the two sampling policies (“—” means not applicable).

Model    random    relevance  |  Model   random    relevance
MF       0.173405  —          |
NCF      0.173266  —          |
NFM      0.169594  —          |
SVD++    0.172743  0.171890   |  GCF     0.17101   0.17062
WSVD++   0.169570  0.169293   |  WGCF    0.168350  0.168639
ASVD++   0.168505  0.168425   |  AGCF    0.168426  0.168814
Matrix Factorization (MF) model. Both users and items are characterized by vectors in the same space, inferred from the historical user-item interactions. The predicted score of a user-item pair is generated by the inner product of the corresponding user and item vectors.

SVD++ model. This model extends MF and further leverages users’ historically interacted items as user implicit feedback.

Graph-based collaborative filtering (GCF) model. The GCF model extends SVD++ by introducing item implicit feedback (the list of users who have historically interacted with the item) into the model.

Weighted Graph-based CF (WGCF) model. Instead of setting an equal and fixed weight for individuals in the implicit feedback as in SVD++ and GCF, WGCF further learns the weights of the individuals in a matrix form.

Attentive Graph-based CF (AGCF) model. Compared with the WGCF model, the AGCF model applies an attention network to distinguish the importance of the implicit feedback.

Weighted SVD++ (WSVD++) model. This model introduces dynamic weights for the user implicit feedback in the SVD++ model.

Attentive SVD++ (ASVD++) model. This model applies an attention network to the user implicit feedback in the SVD++ model.
To compare our models with the most recent techniques, we also conduct experiments on the following models:

Neural Collaborative Filtering (NCF) model. This model combines matrix factorization and a multi-layer perceptron (MLP) in one neural network. In NCF, the embeddings of the MF and MLP parts are independent. The MF part outputs the element-wise product of the embeddings to keep the same vector form as the MLP. The outputs of MF and MLP are then combined, with a hyperparameter determining the trade-off between them.

Neural Factorization Machine (NFM) model. This model applies a neural network to the output of a traditional factorization machine to introduce a non-linear component. In NFM, the embeddings of the second-order feature interactions of the factorization machine first undergo an element-wise product and then go through a multi-layer perceptron. Finally, the output of the MLP produces the prediction together with the global bias and the first-order linear regression of the traditional factorization machine.
All the models are implemented with TensorFlow. We re-implement the NCF and NFM models according to [27] and [28], respectively; for the NCF model we add an extra bias to the MF part to fit the task. We share the code for repeatable experiments¹.

¹The experiment code: https://goo.gl/kcRH3D. We will publish the code on GitHub upon paper acceptance.

V-C Evaluation Metrics
In our experiments, we adopt the evaluation metric of root mean square error (RMSE). This metric is widely used for score-based recommender systems and was also used in the Netflix Prize contest.
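For reference, the metric is simply:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted ratings."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```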
V-D Performance
In the experiments, we compare our GCF, WGCF, and AGCF models with the MF, SVD++, NCF, and NFM models with respect to RMSE. The dimensions of the latent vectors and of the implicit feedback embeddings are fixed across the experiments. The comparison results are presented in Table II. Note that MF, NFM, and NCF use no implicit feedback, so the sampling policy and item implicit feedback are not applicable to them.
V-D1 Overall Performance
From Table II, we can observe that all models outperform MF, whose RMSE is 0.173405. The best performance among all models is achieved by the WGCF model with random sampling (0.168350), about a 2.5% improvement over SVD++ with random sampling. Compared with NFM and NCF, WGCF also gains about a 0.7% improvement over NFM. The NCF model does not perform well on the RMSE task, perhaps because it does not adopt regularization.
V-D2 Effectiveness of Item Implicit Feedback
Table II is divided into two halves. Models on the left, such as WSVD++, utilize only user-side implicit feedback, whereas models on the right, such as WGCF, apply both user and item implicit feedback. Although MF, NFM, and NCF utilize no implicit feedback, we still place them on the left. Comparing corresponding models across the two halves, we find that all three models on the right outperform their counterparts on the left, except for the AGCF model with relevance sampling. In addition, GCF and WGCF gain performance improvements of roughly 0.4–1.0% over SVD++ and WSVD++ respectively, whereas item implicit feedback shows a negligible effect on the AGCF model, which may be caused by a potential overfitting problem.
V-D3 Sampling Policy
In our experiments, for all models that use implicit feedback, users’ and items’ implicit feedbacks are restricted to a fixed number (namely, 20) by sampling or padding. We use the two sampling methods, “random” and “relevance”, discussed in Section IV-C.
For the SVD++ model, relevance sampling achieves an RMSE of 0.171890 while random sampling only reaches 0.172743. However, things change for WGCF and AGCF: comparing the two sampling methods in the same rows of Table II, the gap between random and relevance sampling becomes negative for the AGCF and WGCF models, i.e. random sampling is even slightly better. This is probably due to the ability of the attention network or the weight matrix to distinguish important implicit feedbacks from irrelevant ones. Especially for the AGCF model, there is no significant need to sample implicit feedback according to its relevance to users or items, which saves time in the training process. Another potential benefit is that random sampling also allows the WGCF and AGCF models to adapt to streaming data.
V-E Effectiveness of Step-two Implicit Feedback
Besides step-one implicit feedback, we also run experiments with step-two implicit feedback on AGCF, which we call the AGCF2 model. We treat step-two implicit feedbacks the same way as step-one implicit feedbacks except that they do not share embeddings. The detailed results are shown in Fig. 3. The learning curves in Fig. 3 demonstrate that adding step-two implicit feedback of users and items not only slightly improves the overall RMSE on the test data, but also speeds up model convergence: with the same learning parameters, the AGCF2 model takes less than half the number of rounds to converge compared with using only step-one implicit feedback.
To further explore the effect of step-two implicit feedback, we also check the performance of the AGCF and AGCF2 models on users who have few step-one implicit feedbacks. In the experiment, we filter the records whose users have fewer step-one implicit feedbacks than two chosen thresholds, respectively, and then calculate the RMSE of these records under AGCF and AGCF2. The results are shown in Fig. 4: the AGCF2 model produces even more accurate predictions on sparse implicit feedback, while the AGCF model, on the contrary, performs worse than it does on the full dataset. Although the two models perform closely on the full dataset, AGCF2 outperforms AGCF noticeably below both feedback thresholds. Evidently, step-two implicit feedback provides useful additional information about users and items for the AGCF2 model.
V-F Parameters
Compared with traditional SVD++, the WGCF and AGCF models both introduce a weighting mechanism as well as item-side implicit feedback; AGCF applies an attention network while WGCF adopts a matrix form. The following two subsections give a detailed discussion of parameter tuning for the attention network in the AGCF model and the weight matrix in the WGCF model.
Temperature in the attention network. In this section, we compare the performance of the AGCF model under different softmax temperatures. All experiments are conducted with the same parameters and training rounds except for the softmax temperature; the results are shown in Fig. 5. To show the ability of the attention network to distinguish important implicit feedback, for each pair of a user/item and its implicit feedback, we look up the original rating (before normalization) in the training data and categorize all pairs by rating into groups. We then calculate the mean attention score of each group, shown in Fig. 5, and compare the mean attention scores with the original ratings, which serve as the ground truth here. Since a high-rating item should be more important to the user than a low-rating item, the attention scores should reflect this by taking high values on high-rating items. The same conclusion holds for item implicit feedback and its attention scores.
From Fig. 5 we can see that the AGCF model with a large temperature yields a relatively high (i.e. worse) RMSE on the test set, which is expected since a large temperature restricts the final attention scores to lie near the average value. As the temperature decreases, the RMSE on the test set after 30 rounds of training keeps dropping until an optimal temperature is reached, after which it increases again. The reason is that as the temperature decreases, the attention scores start to act as weights that emphasize the “important” implicit feedback embeddings and therefore generate more accurate predictions; however, a very small temperature also enlarges the perturbation of random initialization, which accounts for the poor performance at the smallest temperatures. After this series of experiments, we adopt a moderate temperature value, chosen from Fig. 5, for all attentive models.
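The temperature effect described above is easy to verify numerically: a large $T$ flattens the scores toward the average, while a small $T$ concentrates the mass on the largest logit (toy logits, illustrative only):

```python
import numpy as np

def softmax_T(logits, T):
    """Temperature-T softmax over a vector of attention logits."""
    z = np.exp((logits - logits.max()) / T)
    return z / z.sum()

logits = np.array([2.0, 1.0, 0.0])
flat = softmax_T(logits, T=100.0)   # near-uniform scores
sharp = softmax_T(logits, T=0.1)    # almost all mass on the first entry
```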
Regularization. To prevent overfitting, during training we apply regularization to the embeddings of all models except NFM and NCF (for MF, regularization is applied only to $p_u$ and $q_i$). For the GCF, AGCF, and WGCF models, regularization is also applied to the implicit feedback embeddings. In addition, we apply a separate regularization, with coefficient $\lambda$, to the weight embeddings $g_u$ and $h_j$ of the WGCF model. Fig. 6 shows the learning curves of the WGCF model under different weight regularization coefficients $\lambda$. All experiments are conducted under the same parameters for the same number of rounds. Through this series of experiments, we found that when $\lambda$ is too small, the curve overfits quickly, while when $\lambda$ is too large, the penalty term accounts for most of the loss function and training goes wrong, returning NaN after a number of rounds. Only with an intermediate $\lambda$ does the WGCF model balance optimizing the model against penalizing model complexity and achieve the best performance.

VI Conclusion and Future Work
In this paper, we study the task of leveraging implicit feedback in graph-based collaborative filtering for recommender systems. Unlike existing works, we generalize the implicit feedback used in collaborative filtering models to incorporate, for both users and items, their particular neighborhood information in the user-item bipartite graph. We also extend the model by employing adaptive weighting over the implicit feedback, implementing weight matrices and attention networks built on the learned user/item representations. Extended neighborhoods in the bipartite graph, such as step-two information, can also be utilized in such an attentive model to improve performance on sparse data. Experiments on a well-known collaborative filtering benchmark dataset demonstrate the superiority of our proposed models over state-of-the-art ones on the classic rating prediction task, especially in sparse implicit feedback scenarios.
For future work, we will extend our model to other real-world application scenarios, such as top-N item recommendation, where ranking objectives and negative item sampling strategies will be adopted. Besides, we may also explore how to extend the implicit feedback to leverage other structures beyond step-two information.
References
 [1] F. Ricci, L. Rokach, and B. Shapira, Introduction to recommender systems handbook. Springer, 2011.
 [2] P. Lamere and S. Green., “Project aura: recommendation for the rest of us,” Presentation at Sun JavaOne Conference, 2008.
 [3] A. S. Das, M. Datar, A. Garg, and S. Rajaram, “Google news personalization: scalable online collaborative filtering,” in WWW, 2007.
 [4] L. Lü, M. Medo, C. H. Yeung, Y.-C. Zhang, Z.-K. Zhang, and T. Zhou, “Recommender systems,” Physics Reports, vol. 519, no. 1, pp. 1–49, 2012.
 [5] S. Rendle, “Factorization machines,” in Data Mining (ICDM), 2010 IEEE 10th International Conference on. IEEE, 2010, pp. 995–1000.
 [6] Y. Hu, Y. Koren, and C. Volinsky, “Collaborative filtering for implicit feedback datasets,” in Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on. Ieee, 2008, pp. 263–272.
 [7] Y. Juan, Y. Zhuang, W.-S. Chin, and C.-J. Lin, “Field-aware factorization machines for CTR prediction,” in Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016, pp. 43–50.
 [8] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, 2009.
 [9] J. Davidson, B. Liebald, J. Liu, P. Nandy, T. Van Vleet, U. Gargi, S. Gupta, Y. He, M. Lambert, B. Livingston et al., “The youtube video recommendation system,” in Proceedings of the fourth ACM conference on Recommender systems. ACM, 2010, pp. 293–296.
 [10] L. Li, W. Chu, J. Langford, and R. E. Schapire, “A contextual-bandit approach to personalized news article recommendation,” in Proceedings of the 19th International Conference on World Wide Web. ACM, 2010, pp. 661–670.
 [11] Y. Koren, “Factorization meets the neighborhood: a multifaceted collaborative filtering model,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008, pp. 426–434.
 [12] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation algorithms,” in Proceedings of the 10th International Conference on World Wide Web. ACM, 2001, pp. 285–295.
 [13] J. Wang, A. P. de Vries, and M. J. Reinders, “Unifying user-based and item-based collaborative filtering approaches by similarity fusion,” in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2006, pp. 501–508.
 [14] J. Schafer, D. Frankowski, J. Herlocker, and S. Sen, “Collaborative filtering recommender systems,” The adaptive web, pp. 291–324, 2007.
 [15] G.-R. Xue, C. Lin, Q. Yang, W. Xi, H.-J. Zeng, Y. Yu, and Z. Chen, “Scalable collaborative filtering using cluster-based smoothing,” in Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2005, pp. 114–121.

 [16] X. Su and T. M. Khoshgoftaar, “A survey of collaborative filtering techniques,” Advances in Artificial Intelligence, vol. 2009, p. 4, 2009.
 [17] T. Hofmann, “Latent semantic models for collaborative filtering,” ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 89–115, 2004.
 [18] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, “Grouplens: an open architecture for collaborative filtering of netnews,” in Proceedings of the 1994 ACM conference on Computer supported cooperative work. ACM, 1994, pp. 175–186.

 [19] L. M. Manevitz and M. Yousef, “One-class SVMs for document classification,” Journal of Machine Learning Research, vol. 2, no. Dec, pp. 139–154, 2001.
 [20] R. Pan, Y. Zhou, B. Cao, N. N. Liu, R. Lukose, M. Scholz, and Q. Yang, “One-class collaborative filtering,” in Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on. IEEE, 2008, pp. 502–511.
 [21] R. Baeza-Yates, B. Ribeiro-Neto et al., Modern Information Retrieval. ACM Press New York, 1999, vol. 463.
 [22] Y. Koren, “Collaborative filtering with temporal dynamics,” Communications of the ACM, vol. 53, no. 4, pp. 89–97, 2010.
 [23] V. Mnih, N. Heess, A. Graves et al., “Recurrent models of visual attention,” in Advances in neural information processing systems, 2014, pp. 2204–2212.
 [24] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104–3112.
 [25] J. Xiao, H. Ye, X. He, H. Zhang, F. Wu, and T.-S. Chua, “Attentional factorization machines: Learning the weight of feature interactions via attention networks,” arXiv preprint arXiv:1708.04617, 2017.
 [26] P. Loyola, C. Liu, and Y. Hirate, “Modeling user session and intent with an attention-based encoder-decoder architecture,” in Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 2017, pp. 147–151.
 [27] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” in Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017, pp. 173–182.
 [28] X. He and T.-S. Chua, “Neural factorization machines for sparse predictive analytics,” SIGIR, 2017.