1 Introduction
Recommender systems play a critical role in the retail, social networking, and entertainment industries. Providing personalized recommendations is an important commercial strategy for online websites and mobile applications. There are two major recommendation tasks: rating prediction and personalized ranking. The former usually requires explicit ratings (e.g., 1-5 stars), while the latter aims to generate a ranked list of items in descending order of the estimated preferences for each user. In many real-world scenarios where only implicit feedback is available, personalized ranking is the more appropriate and popular choice [Rendle et al.2009]. Collaborative filtering (CF) is a de facto approach which has been widely used in many real-world recommender systems [Ricci et al.2015]. CF assumes that user-item interactions can be modelled by the inner product of user and item latent factors in a low-dimensional space. An effective and widely adopted ranking model based on CF is Bayesian Personalized Ranking (BPR) [Rendle et al.2009], which optimizes ranking lists with a personalized pairwise loss. Another state-of-the-art model is the sparse linear method (SLIM) [Ning and Karypis2011], which recommends top-n items via sparse linear regression. While BPR and SLIM have been shown to perform well on the ranking task, we argue that they are hindered by a critical limitation: both are built on the assumption that there exists a linear relationship between users and items, whereas the relationship is usually far more complex in real-life scenarios.
In recent years, researchers have demonstrated the efficacy of deep neural models for recommendation problems [Zhang et al.2017a, Karatzoglou and Hidasi2017]. Deep neural networks can be integrated into classic recommendation models such as collaborative filtering [He et al.2017, Tay et al.2018a] and content-based approaches [Cheng et al.2016, Tay et al.2018b] to enhance their performance. Many deep neural techniques such as the multi-layered perceptron (MLP), autoencoder (AE), recurrent neural network (RNN) and convolutional neural network (CNN) can be applied to recommendation models. AE is usually used to incorporate side information of users/items. For example, [Wang et al.2015] and [Zhang et al.2017b] proposed integrated models by combining the latent factor model (LFM) with different variants of autoencoders; AE can also be adopted to reconstruct the rating matrix directly [Sedhain et al.2015]. CNN is mainly used to extract features from textual [Kim et al.2016, Zheng et al.2017], audio [Van den Oord et al.2013] or visual [He and McAuley2016] content. RNN can be used to model the sequential patterns of rating data or session-based recommendation [Hidasi et al.2015]. For example, [Wu et al.2017] designed a recurrent-neural-network-based rating prediction model to capture the temporal dynamics of rating data; [Hidasi et al.2015] proposed using RNN to capture the interconnections between sessions. Some works attempted to generalize traditional recommendation models into neural versions. For example, [He et al.2017, He and Chua2017] designed neural translations of LFM and the factorization machine to model user-item interactions; [Xue et al.2017] proposed a deep matrix factorization model to anticipate users' preferences from historical explicit feedback.

Most previous works focused on either explicit feedback (the rating prediction task) or representation learning from abundant auxiliary information, instead of interpreting user-item relationships in depth. In this work, we aim to model the intricate user-item relationships from implicit feedback, instead of explicit ratings, by applying multi-layered nonlinear transformations. The main contributions are as follows:

We propose two recommendation models with deep neural networks, user-based NeuRec (U-NeuRec) and item-based NeuRec (I-NeuRec), for the personalized ranking task. We present an elegant integration of LFM and neural networks which can capture both the linearity and nonlinearity present in real-life datasets.

With deep neural networks, we manage to reduce the number of parameters of existing advanced models while achieving superior performance.
2 Preliminaries
To make this paper self-contained, we first define the research problem and introduce two highly relevant previous works.
2.1 Problem Statement
Let $M$ and $N$ denote the total numbers of users and items in a recommender system, so we have an $M \times N$ interaction matrix $X$. We use the lowercase letters $u$ and $i$ to denote a user and an item respectively, and $x_{ui}$ represents the preference of user $u$ for item $i$. In our work, we will use two important vectors: $X_{u*}$ and $X_{*i}$. $X_{u*}$ denotes user $u$'s preferences toward all items; $X_{*i}$ denotes the preferences for item $i$ received from all users in the system. We focus on recommendation with implicit feedback here. Implicit feedback, such as clicks, browsing and purchases, is widely accessible and easy to collect. We set $x_{ui}$ to $1$ if an interaction between user $u$ and item $i$ exists, and to $0$ otherwise. Here, $x_{ui} = 0$ does not necessarily mean that user $u$ dislikes item $i$; it may also mean that the user is unaware of the existence of item $i$.

2.2 Latent Factor Model
Latent factor model (LFM) is an effective methodology for model-based collaborative filtering. It assumes that the user-item affinity can be derived from low-dimensional representations of users and items. The latent factor method has been widely studied and many variants have been developed [Koren et al.2009, Koren2008, Zhang et al.2017b, Salakhutdinov and Mnih2007]. One of the most successful realizations of LFM is matrix factorization. It factorizes the interaction matrix into two low-rank matrices sharing the same latent space of dimensionality $k$ ($k$ is much smaller than $M$ and $N$), such that user-item interactions are approximated as inner products in that space
$\hat{x}_{ui} = U_u \cdot V_i$ (1)

where $U_u \in \mathbb{R}^k$ is the user latent factor and $V_i \in \mathbb{R}^k$ is the item latent factor. With this low-rank approximation, the original $M \times N$ matrix is compressed into two much smaller matrices.
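To make Eq. (1) concrete, here is a minimal numpy sketch of the factorized scoring; the symbols $M$, $N$, $k$, $U$ and $V$ follow the notation above, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, k = 6, 8, 3            # users, items, latent dimension (k << M, N)
U = rng.normal(size=(M, k))  # user latent factors, one row per user
V = rng.normal(size=(N, k))  # item latent factors, one row per item

# Eq. (1): every user-item score is an inner product in the k-dim space
X_hat = U @ V.T

# the factorized form stores (M + N) * k parameters instead of M * N
print("MF parameters:", (M + N) * k, "vs full matrix:", M * N)
```

Note that the full score matrix never needs to be materialized at serving time; a single user's scores are just `U[u] @ V.T`.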
2.3 Sparse Linear Method
SLIM [Ning and Karypis2011] is a sparse linear model for top-n recommendation. It aims to learn a sparse aggregation coefficient matrix $W \in \mathbb{R}^{N \times N}$. $W$ is reminiscent of the similarity matrix in item-based neighbourhood CF (itemCF) [Linden et al.2003], but SLIM learns the similarity matrix by solving a least squares problem rather than determining it with predefined similarity metrics (e.g., cosine, Jaccard, etc.). It finds the optimal coefficient matrix $W$ by solving the following optimization problem

$\min_{W} \; \frac{1}{2}\|X - XW\|_F^2 + \frac{\beta}{2}\|W\|_F^2 + \lambda\|W\|_1$, subject to $W \ge 0$, $\mathrm{diag}(W) = 0$.

The constraints are intended to avoid trivial solutions and ensure positive similarities. The $\ell_1$ norm is adopted to introduce sparsity into $W$. SLIM can be considered a special case of LFM with $U = X$ and $V^{\top} = W$. SLIM has been demonstrated to outperform numerous models in terms of top-n recommendation. Nevertheless, we argue that it has two main drawbacks: (1) by definition, the size of $W$ is far larger than that of the two latent factor matrices, that is, $N^2 \gg (M + N)k$, which also results in higher model complexity. Even though this can be improved via feature selection by first learning an itemCF model, doing so sacrifices model generalization as it heavily relies on other pre-trained recommendation models; (2) SLIM assumes that there exists a strong linear relationship between the interaction matrix $X$ and $XW$. However, this assumption does not necessarily hold. Intuitively, the relationship is likely far more complex in real-world applications due to the dynamicity of user preferences and item changes. In this work, we aim to address these two problems. Inspired by LFM and recent advances of deep neural networks on recommendation tasks, we propose employing a deep neural network to tackle the above disadvantages by introducing nonlinearity into top-n recommendation.

3 Proposed Methodology
In this section, we present a novel nonlinear model based on neural networks for top-n recommendation and denote it by NeuRec. Unlike SLIM, which directly applies a linear mapping to the interaction matrix $X$, NeuRec first maps $X$ into a low-dimensional space with multi-layered neural networks. This transformation not only reduces the parameter size but also incorporates nonlinearity into the recommendation model. The user-item interaction is then modelled by an inner product in the low-dimensional space. Based on this approach, we further devise two variants, namely U-NeuRec and I-NeuRec.
3.1 User-based NeuRec
For user-based NeuRec, we first obtain high-level dense representations from the rows of $X$ with feedforward neural networks. Note that $X$ is constructed from training data, so there is no leakage of test data in this model. Let $W_l$ and $b_l$, $l \in \{1, \dots, L\}$ ($L$ is the number of layers), denote the weights and biases of layer $l$. For each user, we have

$h_1 = f(W_1 X_{u*} + b_1)$, $\quad h_l = f(W_l h_{l-1} + b_l)$, $l = 2, \dots, L$,

where $f(\cdot)$ is a nonlinear activation function such as sigmoid, tanh or relu. The dimension of the output $h_L$ is usually much smaller than that of the original input $X_{u*}$. Supposing the output dimension is $k$ (we reuse the latent factor size $k$ here), we have an output $h_L(X_{u*}) \in \mathbb{R}^k$ for each user. As in latent factor models, we define an item latent factor $V_i \in \mathbb{R}^k$ for each item, and consider $h_L(X_{u*})$ as the user latent factor. The recommendation score is computed by the inner product of these two latent factors

$\hat{x}_{ui} = h_L(X_{u*}) \cdot V_i$ (2)
To train this model, we minimize the regularized squared error in the following form

$\min_{W_*, b_*, V} \sum_{u,i} (x_{ui} - \hat{x}_{ui})^2 + \lambda \Big( \sum_l \|W_l\|_F^2 + \|V\|_F^2 \Big)$ (3)
Here, $\lambda$ is the regularization rate. We adopt the Frobenius norm to regularize the weights $W_l$ and the item latent factors $V$. Since the parameter $V$ is no longer a similarity matrix but a set of latent factors in a low-dimensional space, the constraints in SLIM and the $\ell_1$ norm can be relaxed. For optimization, we apply the Adam algorithm [Kingma and Ba2014] to this objective function. Figure 1 (left) illustrates the architecture of U-NeuRec.
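As an illustration of the U-NeuRec forward pass and loss, the following is a minimal numpy sketch rather than the actual TensorFlow implementation; the layer sizes and initialization scales are our own illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
M, N, k = 5, 7, 3                  # users, items, latent dimension
X = (rng.random((M, N)) > 0.5).astype(float)   # toy implicit feedback matrix

# hypothetical layer sizes; the paper's actual sizes come from grid search
sizes = [N, 8, 8, k]
Ws = [rng.normal(scale=0.1, size=(sizes[l], sizes[l + 1]))
      for l in range(len(sizes) - 1)]
bs = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]
V = rng.normal(scale=0.1, size=(N, k))         # item latent factors

def user_embedding(x_row):
    """Map a user's interaction row X_{u*} to a k-dim representation h_L."""
    h = x_row
    for W, b in zip(Ws, bs):
        h = sigmoid(h @ W + b)
    return h

H = np.stack([user_embedding(X[u]) for u in range(M)])
X_hat = H @ V.T                                # Eq. (2): inner-product scores

# Eq. (3): squared error plus Frobenius regularization of weights and V
lam = 0.01
loss = np.sum((X - X_hat) ** 2) + lam * (sum(np.sum(W ** 2) for W in Ws)
                                         + np.sum(V ** 2))
```

In practice the loss would be minimized with mini-batch Adam rather than evaluated once as here.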
3.2 Item-based NeuRec
Likewise, we use the columns of $X$ as input and learn a dense representation for each item with a multi-layered neural network:

$z_1 = f(W_1 X_{*i} + b_1)$ (4)

$z_l = f(W_l z_{l-1} + b_l)$, $l = 2, \dots, L-1$ (5)

$z_L = f(W_L z_{L-1} + b_L)$ (6)
Let $U_u \in \mathbb{R}^k$ denote the user latent factor for user $u$; then the preference score of user $u$ for item $i$ is computed by

$\hat{x}_{ui} = U_u \cdot z_L(X_{*i})$ (7)
We also employ a regularized squared error as the training loss. Thus, the objective function of item-based NeuRec is formulated as

$\min_{W_*, b_*, U} \sum_{u,i} (x_{ui} - \hat{x}_{ui})^2 + \lambda \Big( \sum_l \|W_l\|_F^2 + \|U\|_F^2 \Big)$ (8)
The optimal parameters can likewise be learned with the Adam optimizer. The architecture of I-NeuRec is illustrated in Figure 1 (right).
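Symmetrically to the user-based variant, I-NeuRec's scoring of Eqs. (4)-(7) can be sketched as follows; the two-layer network and all sizes below are illustrative assumptions, not the experimental settings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(5)
M, N, k = 5, 7, 3                      # users, items, latent dimension
X = (rng.random((M, N)) > 0.5).astype(float)

# hypothetical two-layer network mapping a column X_{*i} (length M) to R^k
W1, b1 = rng.normal(scale=0.1, size=(M, 6)), np.zeros(6)
W2, b2 = rng.normal(scale=0.1, size=(6, k)), np.zeros(k)
U = rng.normal(scale=0.1, size=(M, k))  # user latent factors

# Eqs. (4)-(6): dense item representation z_i computed from the item's column
Z = np.stack([sigmoid(sigmoid(X[:, i] @ W1 + b1) @ W2 + b2) for i in range(N)])

# Eq. (7): preference score of user u for item i is U_u . z_i
X_hat = U @ Z.T
```

The only structural change from U-NeuRec is which side of the interaction matrix feeds the network and which side keeps free latent factors.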
3.3 Dropout Regularization
Dropout [Srivastava et al.2014]
is an effective regularization technique for neural networks. It can reduce the co-adaptation between neurons by randomly dropping some of them during training. Unlike traditional dropout, which is usually applied to hidden layers, we propose applying the dropout operation to the input layer $X_{u*}$ or $X_{*i}$ (we found that the improvement from applying dropout to hidden layers is subtle in our case). By randomly dropping some historical interactions, we can prevent the model from learning the identity function and increase the robustness of NeuRec.

3.4 Relation to LFM and SLIM
In this section, we shed some light on the relationships between NeuRec and LFM/SLIM. NeuRec can be regarded as a neural integration of LFM and the sparse linear model. NeuRec utilizes the concept of latent factors from LFM; the major difference is that the item (or user) latent factors of NeuRec are learned from the rating matrix with a deep neural network. In addition, NeuRec manages to capture both negative and positive feedback in an integrated manner by taking rows or columns of $X$ as inputs. To be more precise, U-NeuRec is a neural extension of SLIM: if we set $f(\cdot)$ to the identity function, let the network weights act as an identity mapping and omit the biases, we have $\hat{x}_{ui} = X_{u*} \cdot V_i$. Hence, U-NeuRec degrades to SLIM with coefficient matrix $W = V^{\top}$ (with the sparsity and non-negativity constraints dropped). I-NeuRec has no direct relationship with SLIM; nonetheless, it can be viewed as a symmetric version of U-NeuRec. Since the objective functions of NeuRec and SLIM are similar, the complexities of the two models are linear in the size of the interaction matrix. Yet NeuRec has fewer model parameters.
3.5 Pairwise Learning Approach
NeuRec can also be trained with a pairwise scheme based on the Bayesian log loss:

$\min_{\Theta} -\sum_{(u, i^+, i^-)} \ln \sigma(\hat{x}_{ui^+} - \hat{x}_{ui^-}) + \lambda \|\Theta\|_F^2$ (9)

where $\Theta$ denotes the model parameters ($\{W_*, b_*, V\}$ for U-NeuRec and $\{W_*, b_*, U\}$ for I-NeuRec), $\|\Theta\|_F^2$ is the Frobenius regularization, and $i^+$ and $i^-$ represent observed and unobserved items respectively. The above pairwise method is intended to maximize the difference between positive and negative items. However, previous studies have shown that optimizing such a pairwise loss does not necessarily lead to the best ranking performance [Zhang et al.2013]. To overcome this issue, we adopt a non-uniform sampling strategy: in each epoch, we randomly sample several items from the negative samples for each user, calculate their ranking scores, and then treat the item with the highest score as the negative sample. The intuition behind this algorithm is that all positive samples shall be ranked higher than negative samples.

4 Experiments
In this section, we conduct experiments on four real-world datasets and analyze the impact of hyperparameters.
4.1 Experimental Setup
4.1.1 Datasets Description
We conduct experiments on four real-world datasets: Movielens HetRec, Movielens 1M, FilmTrust and Frappe. The two Movielens datasets (https://grouplens.org/datasets/movielens/) were collected by GroupLens Research [Harper and Konstan2015]. Movielens HetRec was released at HetRec 2011 (http://recsys.acm.org/2011) and consists of interactions between users and movies. These datasets are widely used as benchmarks for evaluating the performance of recommender algorithms. FilmTrust was crawled from a movie sharing and rating website by Guo et al. [Guo et al.2013]. Frappe [Baltrunas et al.2015] is an Android application recommendation dataset which contains around a hundred thousand records of users' interactions with over four thousand mobile applications. The interactions of these four datasets are binarized with the approach introduced in Section 2.1.
4.1.2 Evaluation Metrics
To appropriately evaluate the overall performance on the ranking task, the evaluation metrics include Precision and Recall with different cutoff values (e.g., P@5, P@10, R@5 and R@10), Mean Average Precision (MAP), Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG). These metrics evaluate the quality of recommendation lists in different respects [Liu and others2009, Shani and Gunawardana2011]: Precision, Recall and MAP measure recommendation accuracy, as they only consider the hit counts and ignore rank positions; MRR and NDCG are two rank-aware measures under which higher-ranked positive items are prioritized, so they are more suitable for assessing the quality of ranked lists. We omit the details for brevity.

4.2 Implementation Details
We implemented our proposed models with TensorFlow (https://www.tensorflow.org/) and tested them on an NVIDIA TITAN X Pascal GPU. All models are learned with mini-batch Adam, and we use grid search to determine the hyperparameters. For all the datasets, the neural network part of NeuRec has five hidden layers with a constant structure, and sigmoid is used as the activation function. The neuron number per layer, the latent factor dimension $k$, the dropout rate, the learning rate and the regularization rate $\lambda$ are tuned per dataset (no dropout is used for FilmTrust). For simplicity, we adopt the same parameter settings for the pairwise training method. We use 80% of the user-item pairs as training data and hold out 20% as the test set, and estimate the performance based on five random train-test splits.

4.3 Results and Discussions
Since NeuRec is designed to overcome the drawbacks of LFM and SLIM, these two models serve as strong baselines for demonstrating whether our methods overcome their disadvantages. Specifically, we choose BPRMF [Rendle et al.2009], a personalized ranking algorithm based on matrix factorization, as the representative of latent factor models. Similar to [Ning and Karypis2011], we adopt a neighbourhood approach to accelerate the training process of SLIM. For fair comparison, we also report the results of mostPOP and of two neural-network-based models, GMF and NeuMF [He et al.2017], following the configuration proposed in [He et al.2017]. The recent work DMF [Xue et al.2017] is tailored to explicit datasets and is not suitable for recommendation with implicit feedback, so it would be unfair to compare our methods with it.
4.3.1 Parameter Size
The parameter size of SLIM is $N^2$, while I-NeuRec has $Mk + |nn|$ parameters and U-NeuRec has $Nk + |nn|$, where $|nn|$ denotes the size of the neural network. Usually, our models reduce the number of parameters substantially (by up to 10 times).
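A quick back-of-the-envelope check of these counts; all sizes below are illustrative choices of ours, not the experimental settings:

```python
# Illustrative sizes only: M users, N items, latent dimension k,
# and five hidden layers of constant width, as in Section 4.2.
M, N, k = 2113, 10109, 50
hidden = [300] * 5

slim_params = N * N            # dense N x N coefficient matrix W

# |nn|: weights and biases of a network mapping a row of X (length N) to R^k
sizes = [N] + hidden + [k]
nn = sum(sizes[l] * sizes[l + 1] + sizes[l + 1] for l in range(len(sizes) - 1))

uneurec_params = nn + N * k    # network plus item latent factors V
print("SLIM:", slim_params, " U-NeuRec:", uneurec_params,
      " ratio:", round(slim_params / uneurec_params, 1))
```

With these (assumed) sizes the dominant network cost is the first layer, $N \times 300$, which is still far smaller than SLIM's $N^2$.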
4.3.2 Overall Comparisons
[Table 1: Precision@5, Precision@10, Recall@5, Recall@10, MAP and MRR of mostPOP, BPRMF, GMF, SLIM, NeuMF, I-NeuRec and U-NeuRec on the Movielens HetRec, Movielens 1M, FilmTrust and Frappe datasets.]
Table 1 and Figure 2 summarize the overall performance of the baselines and NeuRec. From the comparison, we observe that our methods consistently achieve the best performance on these four datasets, in terms of both prediction accuracy and ranking quality. Higher MRR and NDCG mean that our models effectively rank the items users prefer in top positions. NeuRec achieves clear performance gains over the best baseline on all four datasets. The results of I-NeuRec and U-NeuRec are very close and better than the competing baselines. The subtle difference between U-NeuRec and I-NeuRec might be due to the distributional differences between user historical interactions and item historical interactions (or the numbers of users and items). We found that the improvement of NeuMF over GMF is not significant, which might be due to overfitting caused by the use of dual embedding spaces [Tay et al.2018a]. Although the improvements of the pairwise-based U-NeuRec and I-NeuRec are subtle (see Tables 2 and 3), they are still worth investigating. From the results, we observe that U-NeuRec is more suitable for pairwise training. In U-NeuRec, the positive item $i^+$ and negative item $i^-$ are represented by two independent latent vectors $V_{i^+}$ and $V_{i^-}$, while in I-NeuRec they must share the same network with input $X_{*i^+}$ or $X_{*i^-}$; therefore, the negative and positive samples will undesirably influence each other.
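For clarity, the non-uniform negative sampling strategy of Section 3.5 used by the pairwise variants can be sketched as follows; the function and variable names are ours:

```python
import numpy as np

def hardest_negative(scores, positives, n_candidates, rng):
    """Sample n_candidates items from the user's unobserved items, then
    return the one the current model ranks highest (the 'hardest' negative)."""
    all_items = np.arange(len(scores))
    negatives = np.setdiff1d(all_items, positives)
    candidates = rng.choice(negatives, size=n_candidates, replace=False)
    return candidates[np.argmax(scores[candidates])]

rng = np.random.default_rng(4)
scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])   # current model scores
positives = np.array([0])                       # observed items for this user
j = hardest_negative(scores, positives, n_candidates=4, rng=rng)
```

Here the negative with the highest current score is chosen, pushing the model to rank all positives above even its most confident negatives.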
[Table 2]
ML HetRec  ML 1M  FilmTrust  FRAPPE
P@5  0.521  0.347  0.418  0.038 
P@10  0.473  0.303  0.349  0.032 
R@5  0.047  0.077  0.402  0.054 
R@10  0.082  0.128  0.630  0.086 
MAP  0.227  0.194  0.492  0.076 
MRR  0.702  0.564  0.625  0.115 
NDCG  0.636  0.560  0.656  0.137 
[Table 3]
ML HetRec  ML 1M  FilmTrust  FRAPPE
P@5  0.415  0.345  0.413  0.039 
P@10  0.394  0.304  0.346  0.036 
R@5  0.036  0.075  0.397  0.037 
R@10  0.066  0.127  0.618  0.063 
MAP  0.210  0.193  0.483  0.063 
MRR  0.579  0.554  0.610  0.108 
NDCG  0.615  0.556  0.644  0.129 
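For reference, the accuracy and rank-aware metrics reported in these tables can be computed as in the following minimal sketch of their standard definitions (function names are ours):

```python
import numpy as np

def precision_recall_at_k(ranked, relevant, k):
    """Precision@k and Recall@k for one user's ranked list."""
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / k, hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: discounted gain over the ideal ordering."""
    dcg = sum(1.0 / np.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / np.log2(pos + 2)
                for pos in range(min(len(relevant), k)))
    return dcg / ideal

def mrr(ranked, relevant):
    """Reciprocal rank of the first relevant item."""
    for pos, item in enumerate(ranked):
        if item in relevant:
            return 1.0 / (pos + 1)
    return 0.0

ranked = [3, 1, 4, 2, 5]   # model's ranked list for one user
relevant = {1, 5}          # held-out positive items
```

In evaluation, each metric is computed per user on the held-out positives and then averaged across users.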
4.4 Sensitivity to Neural Network Parameters
In the following, we systematically investigate the impact of the neural hyperparameters on U-NeuRec using the FilmTrust dataset (I-NeuRec exhibits similar patterns). In each comparison, we keep the other settings unchanged and adjust the corresponding parameter values.
4.4.1 Latent Factor Size
As in latent factor models [Koren and Bell2015], the latent factor dimension $k$ has a strong influence on ranking performance. A larger latent factor size does not necessarily increase performance and may even result in overfitting. In our case, setting $k$ to a moderate value is a reasonable choice.
4.4.2 Number of Neurons
We set the neuron count of each layer to 50, 150, 250, 350 and 450 with a constant structure. As shown in Figure 3(b), both too simple and too complex a model will decrease performance: a simple model suffers from underfitting, while a complex model does not generalize well on test data.
4.4.3 Activation Function
We mainly investigate four activation functions: sigmoid, tanh, relu and the identity function, applied to all hidden layers. Empirical study shows that the identity function performs poorly with NeuRec, which also demonstrates the effectiveness of introducing nonlinearity. sigmoid outperforms the other three activation functions. One possible reason is that sigmoid restricts the predicted values to the range $(0, 1)$, making it more suitable for binary implicit feedback.
4.4.4 Depth of Neural Network
Another key factor is the depth of the neural network. From Figure 3(d), we observe that our model achieves comparable performance with the number of hidden layers set between 3 and 7. However, when we continue to increase the depth, the performance drops significantly. Thus, we avoid over-complex models by setting the depth to an appropriately small number.
5 Conclusion and Future Work
In this paper, we propose NeuRec along with its two variants, which provide a better understanding of the complex and nonlinear relationships between items and users. Experiments show that NeuRec outperforms the competing methods by a large margin while reducing the number of parameters substantially. In the future, we would like to investigate methods to balance the performance of I-NeuRec and U-NeuRec, and to incorporate item/user side information and context information to further enhance recommendation quality. In addition, more advanced regularization techniques such as batch normalization could also be explored.
References
[Baltrunas et al.2015] Linas Baltrunas, Karen Church, et al. Frappe: Understanding the usage and perception of mobile app recommendations in-the-wild. arXiv preprint arXiv:1505.03014, 2015.

[Cheng et al.2016] Heng-Tze Cheng, Levent Koc, et al. Wide & deep learning for recommender systems. In DLRS, pages 7–10. ACM, 2016.

[Guo et al.2013] G. Guo, J. Zhang, and N. Yorke-Smith. A novel Bayesian similarity measure for recommender systems. In IJCAI, pages 2619–2625, 2013.

[Harper and Konstan2015] F. Maxwell Harper and Joseph A. Konstan. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4):19:1–19:19, December 2015.

[He and Chua2017] Xiangnan He and Tat-Seng Chua. Neural factorization machines for sparse predictive analytics. In SIGIR, pages 355–364, NY, USA, 2017. ACM.

[He and McAuley2016] Ruining He and Julian McAuley. VBPR: Visual Bayesian personalized ranking from implicit feedback. In AAAI, pages 144–150, 2016.

[He et al.2017] Xiangnan He, Lizi Liao, et al. Neural collaborative filtering. In WWW, pages 173–182, 2017.

[Hidasi et al.2015] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939, 2015.

[Karatzoglou and Hidasi2017] Alexandros Karatzoglou and Balázs Hidasi. Deep learning for recommender systems. In RecSys '17, pages 396–397, New York, NY, USA, 2017. ACM.

[Kim et al.2016] Donghyun Kim, Chanyoung Park, et al. Convolutional matrix factorization for document context-aware recommendation. In RecSys, pages 233–240. ACM, 2016.

[Kingma and Ba2014] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[Koren and Bell2015] Yehuda Koren and Robert Bell. Advances in collaborative filtering. In Recommender Systems Handbook, pages 77–118. Springer, 2015.

[Koren et al.2009] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, August 2009.

[Koren2008] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In SIGKDD, pages 426–434. ACM, 2008.

[Linden et al.2003] G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, Jan 2003.

[Liu and others2009] Tie-Yan Liu et al. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, 2009.

[Ning and Karypis2011] X. Ning and G. Karypis. SLIM: Sparse linear methods for top-n recommender systems. In ICDM, pages 497–506, Dec 2011.

[Rendle et al.2009] Steffen Rendle, Christoph Freudenthaler, et al. BPR: Bayesian personalized ranking from implicit feedback. In UAI, pages 452–461, 2009.

[Ricci et al.2015] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor. Recommender Systems Handbook. Springer, 2015.

[Salakhutdinov and Mnih2007] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In NIPS, pages 1257–1264, USA, 2007. Curran Associates Inc.

[Sedhain et al.2015] Suvash Sedhain, Aditya Krishna Menon, et al. AutoRec: Autoencoders meet collaborative filtering. In WWW, pages 111–112. ACM, 2015.

[Shani and Gunawardana2011] Guy Shani and Asela Gunawardana. Evaluating recommendation systems. In Recommender Systems Handbook, pages 257–297, 2011.

[Srivastava et al.2014] Nitish Srivastava, Geoffrey Hinton, et al. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1):1929–1958, 2014.

[Tay et al.2018a] Yi Tay, Luu Anh Tuan, et al. Latent relational metric learning via memory-based attention for collaborative ranking. In WWW, pages 729–739, 2018.

[Tay et al.2018b] Yi Tay, Luu Anh Tuan, et al. Multi-pointer co-attention networks for recommendation. CoRR, abs/1801.09251, 2018.

[Van den Oord et al.2013] Aaron Van den Oord, Sander Dieleman, and Benjamin Schrauwen. Deep content-based music recommendation. In NIPS, pages 2643–2651, 2013.

[Wang et al.2015] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. Collaborative deep learning for recommender systems. In SIGKDD, pages 1235–1244. ACM, 2015.

[Wu et al.2017] Chao-Yuan Wu, Amr Ahmed, et al. Recurrent recommender networks. In WSDM, pages 495–503, NY, USA, 2017. ACM.

[Xue et al.2017] Hong-Jian Xue, Xinyu Dai, et al. Deep matrix factorization models for recommender systems. In IJCAI, pages 3203–3209, 2017.

[Zhang et al.2013] Weinan Zhang, Tianqi Chen, Jun Wang, and Yong Yu. Optimizing top-n collaborative filtering via dynamic negative item sampling. In SIGIR, pages 785–788, New York, NY, USA, 2013. ACM.

[Zhang et al.2017a] Shuai Zhang, Lina Yao, and Aixin Sun. Deep learning based recommender system: A survey and new perspectives. arXiv preprint arXiv:1707.07435, 2017.

[Zhang et al.2017b] Shuai Zhang, Lina Yao, and Xiwei Xu. AutoSVD++: An efficient hybrid collaborative filtering model via contractive auto-encoders. In SIGIR, pages 957–960, New York, NY, USA, 2017. ACM.

[Zheng et al.2017] Lei Zheng, Vahid Noroozi, et al. Joint deep modeling of users and items using reviews for recommendation. In WSDM, pages 425–434. ACM, 2017.