1 Introduction
To facilitate the information seeking process for users in the age of data deluge, various information retrieval (IR) technologies have been widely deployed [Garcia-Molina et al., 2011]. As a typical paradigm of information push, recommender systems have become a core service and a major monetization method for many customer-oriented systems [Wang et al., 2018b]. Collaborative filtering (CF) is a key technique for building a personalized recommender system, which infers a user's preference not only from her own behavior data but also from the behavior data of other users. Among the various CF methods, model-based CF, and more specifically matrix factorization based methods [Rendle et al., 2009; He et al., 2016b; Zhang et al., 2016], are known to provide superior performance and have become the mainstream of recommendation research.
The key to designing a CF model lies in 1) how to represent a user and an item, and 2) how to model their interaction based on these representations. As the dominant model in CF, matrix factorization (MF) represents a user (or an item) as a vector of latent factors (also termed an embedding), and models an interaction as the inner product between the user embedding and the item embedding. Many extensions have been developed for MF from both the modeling perspective [Wang et al., 2015; Yu et al., 2018; Wang et al., 2018a] and the learning perspective [Rendle et al., 2009; Bayer et al., 2017; He et al., 2018]. For example, DeepMF [Xue et al., 2017] extends MF by learning embeddings with deep neural networks, BPR [Rendle et al., 2009] learns MF from implicit feedback with a pairwise ranking objective, and the recently proposed adversarial personalized ranking (APR) [He et al., 2018] employs an adversarial training procedure to learn MF.

Despite its effectiveness and many subsequent developments, we point out that MF has an inherent limitation in its model design. Specifically, it uses a fixed, data-independent function, namely the inner product, as the interaction function [He et al., 2017]. As a result, it essentially assumes that the embedding dimensions (i.e., the dimensions of the embedding space) are independent of each other and contribute equally to the prediction of all data points. This assumption is impractical, since the embedding dimensions can be interpreted as certain properties of items [Zhang et al., 2014], which are not necessarily independent. Moreover, this assumption has been shown to be suboptimal for learning from real-world feedback data, which has rich yet complicated patterns: several recent efforts on neural recommender models [Tay et al., 2018; Bai et al., 2017] have demonstrated that better recommendation performance can be obtained by learning the interaction function from data.
Among the neural network models for CF, neural matrix factorization (NeuMF) [He et al., 2017] provides state-of-the-art performance by complementing the inner product with an adaptable multi-layer perceptron (MLP) in learning the interaction function. Since then, placing multiple nonlinear layers above the embedding layer has become a prevalent choice for learning the interaction function. Two common designs are placing an MLP above the concatenation [He et al., 2017; Bai et al., 2017] or the element-wise product [Zhang et al., 2017; Wang et al., 2017] of the user embedding and item embedding. We argue that a potential limitation of both designs is that few correlations between embedding dimensions are modeled. Although the subsequent MLP is theoretically capable of approximating any continuous function according to the universal approximation theorem [Hornik, 1991], there is no practical guarantee that the dimension correlations can be effectively captured with current optimization techniques.

In this work, we propose a new architecture for neural collaborative filtering (NCF) that integrates the correlations between embedding dimensions into modeling. Specifically, we propose to use an outer product operation above the embedding layer, explicitly capturing the pairwise correlations between embedding dimensions. We term the correlation matrix obtained by the outer product the interaction map, which is a $K \times K$ matrix where $K$ denotes the embedding size. The interaction map is well suited to the CF task, since it not only subsumes the interaction signal used in MF (its diagonal elements correspond to the intermediate results of the inner product), but also includes all other pairwise correlations. Such rich semantics in the interaction map facilitate the subsequent nonlinear layers in learning possible high-order dimension correlations. Moreover, the matrix form of the interaction map makes it feasible to learn the interaction function with a convolutional neural network (CNN), which is known to generalize better and be easier to make deep than the fully connected MLP.
The contributions of this paper are as follows.


- We propose a new neural network framework ONCF, which supercharges NCF modeling with an outer product operation to model pairwise correlations between embedding dimensions.

- We propose a novel model named ConvNCF under the ONCF framework, which leverages CNN to learn high-order correlations among embedding dimensions, from local to global, in a hierarchical way.

- We conduct extensive experiments on two public implicit feedback datasets, which demonstrate the effectiveness and rationality of ONCF methods.

- To our knowledge, this is the first work that uses CNN to learn the interaction function between user embeddings and item embeddings. It opens up new avenues for exploring advanced and rapidly evolving CNN methods for recommendation research.
2 Proposed Methods
We first present the Outer product based Neural Collaborative Filtering (ONCF) framework. We then elaborate on our proposed Convolutional NCF (ConvNCF) model, an instantiation of ONCF that uses CNN to learn the interaction function based on the interaction map. Before delving into the technical details, we first introduce some basic notation.
Throughout the paper, we use a bold uppercase letter (e.g., $\mathbf{P}$) to denote a matrix, a bold lowercase letter (e.g., $\mathbf{p}$) to denote a vector, and a calligraphic uppercase letter (e.g., $\mathcal{T}$) to denote a tensor. Moreover, scalar $p_{u,k}$ denotes the $(u,k)$-th element of matrix $\mathbf{P}$, and vector $\mathbf{p}_u$ denotes its $u$-th row. Let $\mathcal{T}$ be a 3D tensor; then scalar $t_{a,b,c}$ denotes the $(a,b,c)$-th element of $\mathcal{T}$, and vector $\mathbf{t}_{a,b}$ denotes the slice of $\mathcal{T}$ at element $(a,b)$.

2.1 ONCF framework
Figure 1 illustrates the ONCF framework. The target is to estimate the matching score between user $u$ and item $i$, denoted $\hat{y}_{ui}$; we can then generate a personalized recommendation list of items for a user based on these scores.

Input and Embedding Layer.
Given a user $u$ and an item $i$ and their features (e.g., ID, user gender, item category, etc.), we first apply one-hot encoding to the features. Let $\mathbf{v}_u^U$ and $\mathbf{v}_i^I$ be the resulting feature vectors for user $u$ and item $i$, respectively; we then obtain their embeddings $\mathbf{p}_u$ and $\mathbf{q}_i$ via

(1)   $\mathbf{p}_u = \mathbf{P}^T \mathbf{v}_u^U, \qquad \mathbf{q}_i = \mathbf{Q}^T \mathbf{v}_i^I$

where $\mathbf{P} \in \mathbb{R}^{M \times K}$ and $\mathbf{Q} \in \mathbb{R}^{N \times K}$ are the embedding matrices for user features and item features, respectively; $K$, $M$, and $N$ denote the embedding size, the number of user features, and the number of item features, respectively. Note that in the pure CF case, only the ID feature is used to describe a user and an item [He et al., 2017], in which case $M$ and $N$ are the number of users and the number of items, respectively.
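To make Eq. (1) concrete, here is a minimal NumPy sketch (with illustrative dimensions; the variable names are ours, not the authors') showing that multiplying the one-hot feature vector by the embedding matrix reduces to a row lookup:

```python
import numpy as np

K, M = 8, 5                    # embedding size, number of user features
P = np.random.randn(M, K)      # user embedding matrix P (M x K)

v_u = np.zeros(M)              # one-hot feature vector of user u
v_u[2] = 1.0                   # pure CF case: only the ID feature is active

p_u = P.T @ v_u                # Eq. (1): p_u = P^T v_u
assert np.allclose(p_u, P[2])  # the product is just a row lookup
```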
Interaction Map.
Above the embedding layer, we propose to apply an outer product on $\mathbf{p}_u$ and $\mathbf{q}_i$ to obtain the interaction map:

(2)   $\mathbf{E} = \mathbf{p}_u \otimes \mathbf{q}_i = \mathbf{p}_u \mathbf{q}_i^T$

where $\mathbf{E}$ is a $K \times K$ matrix, in which each element is evaluated as $e_{k_1,k_2} = p_{u,k_1} \, q_{i,k_2}$.
This is the core design of our ONCF framework for ensuring its effectiveness on the recommendation task. Compared to existing recommender systems [He et al., 2017; Zhang et al., 2017], we argue that using the outer product is advantageous in three respects: 1) it subsumes matrix factorization (MF), the dominant method for CF, which considers only the diagonal elements of our interaction map; 2) it encodes more signal than MF by accounting for the correlations between different embedding dimensions; and 3) it is more meaningful than the simple concatenation operation, which only retains the original information in the embeddings without modeling any correlation. Moreover, it has recently been shown that explicitly modeling the interaction of feature embeddings is particularly useful for a deep learning model to generalize well on sparse data, whereas using concatenation is suboptimal [He and Chua, 2017; Beutel et al., 2018].

Lastly, another benefit of the interaction map lies in its 2D matrix format, which is the same as an image. In this respect, the pairwise correlations encoded in the interaction map can be seen as the local features of an "image". Deep learning methods have achieved their greatest success in the computer vision domain, and many powerful deep models, especially CNN-based ones (e.g., ResNet [He et al., 2016a] and DenseNet [Huang et al., 2017]), have been developed for learning from 2D image data. Building a 2D interaction map allows these powerful CNN models to also be applied to learn the interaction function for the recommendation task.
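As a quick illustration of Eq. (2) and the first advantage above, the following NumPy sketch (ours, for exposition only) checks that the diagonal of the interaction map contains exactly the element-wise products whose sum is MF's inner product:

```python
import numpy as np

K = 4
p_u, q_i = np.random.randn(K), np.random.randn(K)

E = np.outer(p_u, q_i)         # Eq. (2): the K x K interaction map

# Diagonal entries are the intermediate results of the inner product;
# off-diagonal entries are the cross-dimension correlations MF discards.
assert np.isclose(np.trace(E), np.dot(p_u, q_i))
```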
Hidden Layers.
Above the interaction map is a stack of hidden layers that extracts useful signal from the interaction map. Its design is open, and it can be abstracted as $\mathbf{g} = f_\Theta(\mathbf{E})$, where $f_\Theta$ denotes the model of the hidden layers with parameters $\Theta$, and $\mathbf{g}$ is the output vector used for the final prediction. Technically, $f_\Theta$ can be any function that takes a matrix as input and outputs a vector. In Section 2.2, we elaborate on how CNN can be employed to extract signal from the interaction map.
Prediction Layer.
The prediction layer takes in vector $\mathbf{g}$ and outputs the prediction score as $\hat{y}_{ui} = \mathbf{w}^T \mathbf{g}$, where vector $\mathbf{w}$ re-weights the interaction signal in $\mathbf{g}$. To summarize, the model parameters of our ONCF framework are $\Delta = \{\mathbf{P}, \mathbf{Q}, \Theta, \mathbf{w}\}$.
2.1.1 Learning ONCF for Personalized Ranking
Recommendation is a personalized ranking task. To this end, we consider learning the parameters of ONCF with a ranking-aware objective. In the NCF paper [He et al., 2017], the authors advocate a pointwise classification loss for learning models from implicit feedback. However, a more reasonable assumption is that observed interactions should be ranked higher than unobserved ones. To implement this idea, [Rendle et al., 2009] proposed the Bayesian Personalized Ranking (BPR) objective function:

(3)   $L(\Delta) = \sum_{(u,i,j) \in \mathcal{D}} -\ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda_\Delta \|\Delta\|^2$

where $\sigma(\cdot)$ is the sigmoid function, $\lambda_\Delta$ are parameter-specific regularization hyperparameters to prevent overfitting, and $\mathcal{D}$ denotes the set of training instances: $\mathcal{D} = \{(u,i,j) \mid i \in \mathcal{Y}_u^+, j \notin \mathcal{Y}_u^+\}$, where $\mathcal{Y}_u^+$ denotes the set of items that have been consumed by user $u$. By minimizing the BPR loss, we tailor the ONCF framework to correctly predicting the relative order between interactions, rather than their absolute scores as optimized by a pointwise loss [He et al., 2017; He et al., 2016b]. This is more beneficial for the personalized ranking task.

It is worth pointing out that in our ONCF framework, the weight vector $\mathbf{w}$ controls the magnitude of $\hat{y}_{ui}$ for all predictions. As a result, scaling up $\mathbf{w}$ can increase the margin for all training instances and thus decrease the training loss. To avoid such a trivial solution when optimizing ONCF, it is crucial to enforce $L_2$ regularization or a max-norm constraint on $\mathbf{w}$. Moreover, we are aware that other pairwise objectives have also been widely used for personalized ranking, such as the $L_2$ square loss [Wang et al., 2017]. We leave this exploration for ONCF as future work, since our initial experiments show that optimizing ONCF with the BPR objective leads to good top-$k$ recommendation performance.
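A minimal sketch of Eq. (3) for one batch of triplets (our own code, assuming the score arrays have already been computed; the log-sigmoid is written via logaddexp for numerical stability):

```python
import numpy as np

def bpr_loss(y_ui, y_uj, param_groups, lambdas):
    """BPR objective (Eq. 3) on a batch of (u, i, j) triplets.

    y_ui, y_uj: predicted scores for the observed item i and the
    sampled negative item j; param_groups/lambdas: parameter arrays
    and their per-group regularization strengths.
    """
    # -log sigmoid(x) == log(1 + exp(-x)) == logaddexp(0, -x)
    ranking = np.logaddexp(0.0, -(y_ui - y_uj)).sum()
    reg = sum(lam * np.sum(p ** 2) for p, lam in zip(param_groups, lambdas))
    return ranking + reg
```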
2.2 Convolutional NCF
Motivation: Drawback of MLP.
In ONCF, the choice of hidden layers has a large impact on performance. A straightforward solution is to use the MLP network proposed in NCF [He et al., 2017]; note that to apply an MLP to the 2D interaction map $\mathbf{E}$, we can flatten $\mathbf{E}$ into a vector of size $K^2$. Although MLP is theoretically guaranteed to have strong representation ability [Hornik, 1991], its main drawback of having a large number of parameters cannot be ignored. As an example, assume we set the embedding size of an ONCF model to 64 (i.e., $K = 64$) and follow the common practice of the half-size tower structure. In this case, even a 1-layer MLP has about 8.4 million (i.e., $4096 \times 2048$) parameters, not to mention the use of more layers. We argue that such a large number of parameters makes MLP prohibitive for ONCF for three reasons: 1) it requires powerful machines with large memory to store the model; 2) it needs a large amount of training data to be learned well; and 3) the regularization of each layer needs to be carefully tuned to ensure good generalization. (In fact, an empirical observation is that most papers use MLPs with at most 3 hidden layers, and the performance improves only slightly, or even degrades, with more layers [He et al., 2017; Covington et al., 2016; He and Chua, 2017].)
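The quoted parameter count can be verified with a few lines of arithmetic (a sanity check of ours, not part of the original text):

```python
K = 64                # embedding size
flat = K * K          # flattened interaction map: 4096 inputs
hidden = flat // 2    # half-size tower: 2048 hidden units
print(flat * hidden)  # 8,388,608 weights (~8.4M) for a single layer
```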
The ConvNCF Model.
To address the drawback of MLP, we propose to employ a CNN above the interaction map to extract signal. As CNN stacks layers in a locally connected manner, it uses far fewer parameters than MLP. This allows us to build deeper models than MLP easily, which benefits the learning of high-order correlations among embedding dimensions. Figure 2 shows an illustrative example of our ConvNCF model. Note that given the many design concepts behind CNN (e.g., stride, padding, etc.), we do not attempt a fully general formulation of ConvNCF here. Instead, without loss of generality, we explain ConvNCF under one specific setting that has empirically shown good performance in our experiments; technically, any CNN structure and parameter setting can be employed in ConvNCF. In Figure 2, the input interaction map is of size $64 \times 64$, and the model has 6 hidden layers, each with 32 feature maps. A feature map $c$ in hidden layer $l$ is represented as a 2D matrix $\mathbf{E}^{lc}$; since we set the stride to 2, the size of $\mathbf{E}^{lc}$ is half that of its previous layer, e.g., $\mathbf{E}^{1c} \in \mathbb{R}^{32 \times 32}$ and $\mathbf{E}^{2c} \in \mathbb{R}^{16 \times 16}$. All feature maps of layer $l$ can be represented as a 3D tensor $\mathcal{E}^l$.

Given the input interaction map $\mathbf{E}$, the feature maps of Layer 1 are obtained as follows:
(4)   $e^{1c}_{x,y} = \text{ReLU}\Big(b^{1} + \sum_{a=0}^{1}\sum_{b=0}^{1} e_{2x+a,\,2y+b} \cdot t^{1}_{a,b,c}\Big)$

where $b^1$ denotes the bias term for Layer 1, and $\mathcal{T}^1 \in \mathbb{R}^{2 \times 2 \times 32}$ is a 3D tensor denoting the convolution filters that generate the feature maps of Layer 1. We use the rectifier unit (ReLU) as the activation function, a common choice in CNN for building deep models. Applying a similar convolution operation, we obtain the feature maps of the following layers; the only difference is that from Layer 1 onward, the input to the next layer becomes a 3D tensor $\mathcal{E}^{l-1}$:

(5)   $e^{lc}_{x,y} = \text{ReLU}\Big(b^{l} + \sum_{a=0}^{1}\sum_{b=0}^{1}\sum_{c'=0}^{31} e^{(l-1)c'}_{2x+a,\,2y+b} \cdot t^{l}_{a,b,c',c}\Big)$

where $b^l$ denotes the bias term for Layer $l$, and $\mathcal{T}^l \in \mathbb{R}^{2 \times 2 \times 32 \times 32}$ denotes the 4D convolution filter for Layer $l$. The output of the last layer is a tensor of dimension $1 \times 1 \times 32$, which can be seen as a vector of size 32 and is projected to the final prediction score with the weight vector $\mathbf{w}$.
Note that a convolution filter can be seen as the "locally connected weight matrix" of a layer, since it is shared in generating all entries of the layer's feature maps. This significantly reduces the number of parameters of a convolutional layer compared to a fully connected layer. Specifically, in contrast to the 1-layer MLP above, which has over 8 million parameters, the above 6-layer CNN has only about 20 thousand parameters, several orders of magnitude fewer. This makes our ConvNCF more stable and generalizable than MLP.
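For concreteness, below is a minimal TensorFlow sketch of the ConvNCF tower under the setting of Figure 2 (our approximation from the description above; the authors' own implementation is linked in Section 3.1 and may differ in details):

```python
import tensorflow as tf

class ConvNCF(tf.keras.Model):
    """6 conv layers of 32 feature maps each, 2x2 filters with stride 2."""

    def __init__(self, channels=32, num_layers=6):
        super().__init__()
        # Each layer halves the map size: 64 -> 32 -> ... -> 1 (Eqs. 4-5).
        self.convs = [tf.keras.layers.Conv2D(channels, 2, strides=2,
                                             activation='relu')
                      for _ in range(num_layers)]
        self.w = tf.keras.layers.Dense(1, use_bias=False)  # weight vector w

    def call(self, p_u, q_i):
        e = tf.einsum('bi,bj->bij', p_u, q_i)    # Eq. (2): outer product
        x = e[..., tf.newaxis]                   # [batch, 64, 64, 1]
        for conv in self.convs:
            x = conv(x)
        g = tf.reshape(x, [tf.shape(x)[0], -1])  # last layer: 1x1x32 -> g
        return self.w(g)                         # prediction: w^T g

model = ConvNCF()
scores = model(tf.random.normal([4, 64]), tf.random.normal([4, 64]))
print(model.count_params())  # ~21K parameters, vs ~8.4M for the 1-layer MLP
```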
Rationality of ConvNCF.
Here we give some intuition on how ConvNCF captures high-order correlations among embedding dimensions. In the interaction map $\mathbf{E}$, each entry $e_{k_1,k_2}$ encodes the second-order correlation between dimensions $k_1$ and $k_2$. Next, each hidden layer $l$ captures the correlations within a local area (whose size is determined by the filter size and thus varies with the setting) of its previous layer $l-1$. As an example, an entry in Layer 1 depends on four elements of $\mathbf{E}$, which means that it captures 4-order correlations among four embedding dimensions. Following the same reasoning, each entry in hidden layer $l$ can be seen as capturing the correlations within a local area of size $2^l \times 2^l$ in the interaction map $\mathbf{E}$. As such, an entry in the last hidden layer encodes correlations among all dimensions. By stacking multiple convolutional layers in this way, ConvNCF learns high-order correlations among embedding dimensions from local to global, based on the 2D interaction map.
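The receptive-field argument above can be spelled out in a few lines (our illustration, assuming the 2x2-filter, stride-2 setting):

```python
def covered_dims(l, x, y):
    """Rows/columns of the interaction map E (i.e., the embedding
    dimensions of p_u and q_i) that entry (x, y) of hidden layer l
    depends on, under 2x2 filters with stride 2."""
    size = 2 ** l                              # local area is 2^l x 2^l
    return (range(size * x, size * (x + 1)),   # rows: dimensions of p_u
            range(size * y, size * (y + 1)))   # cols: dimensions of q_i

rows, cols = covered_dims(6, 0, 0)             # the single last-layer entry
assert len(rows) == len(cols) == 64            # covers all K dimensions
```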
2.2.1 Training Details
We optimize ConvNCF with the BPR objective using mini-batch Adagrad [Duchi et al., 2011]. Specifically, in each epoch we first shuffle all observed interactions and then draw mini-batches sequentially; given a mini-batch of observed interactions, we generate negative examples on the fly to form the training triplets. The negative examples are sampled from a uniform distribution; while recent work shows that a better negative sampler can further improve performance [Ding et al., 2018], we leave this exploration as future work. We pre-train the embedding layer with MF. After pre-training, considering that the other parameters of ConvNCF are randomly initialized and the overall model is underfitting, we first train ConvNCF for one epoch without any regularization. For the remaining epochs, we enforce $L_2$ regularization on ConvNCF, with separate regularization on the embedding layer, the convolution layers, and the output layer. Note that the regularization coefficients (especially for the output layer) have a very large impact on model performance.
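A sketch of the per-epoch sampling procedure just described (our own helper with hypothetical names; the uniform sampler could be swapped for a better one as noted above):

```python
import numpy as np

def bpr_batches(interactions, observed, num_items, batch_size, rng):
    """Shuffle observed (u, i) pairs, then yield mini-batches of
    (u, i, j) triplets with negatives j drawn on the fly."""
    order = rng.permutation(len(interactions))
    for start in range(0, len(order), batch_size):
        triplets = []
        for k in order[start:start + batch_size]:
            u, i = interactions[k]
            j = rng.integers(num_items)     # uniform negative sampling
            while (u, j) in observed:       # resample observed pairs
                j = rng.integers(num_items)
            triplets.append((u, i, j))
        yield triplets

# usage: for batch in bpr_batches(pairs, set(pairs), n_items, 512,
#                                 np.random.default_rng(42)): ...
```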

Table 1: Top-k recommendation performance on Gowalla and Yelp (HR@k and NDCG@k at k = 5, 10, 20). RI denotes the average relative improvement of ConvNCF over the corresponding baseline.

          |                  Gowalla                      |                    Yelp                       |
Method    | HR@5   HR@10  HR@20  NDCG@5 NDCG@10 NDCG@20   | HR@5   HR@10  HR@20  NDCG@5 NDCG@10 NDCG@20   | RI
ItemPop   | 0.2003 0.2785 0.3739 0.1099 0.1350  0.1591    | 0.0710 0.1147 0.1732 0.0365 0.0505  0.0652    | +227.6%
MF-BPR    | 0.6284 0.7480 0.8422 0.4825 0.5214  0.5454    | 0.1752 0.2817 0.4203 0.1104 0.1447  0.1796    | +9.5%
MLP       | 0.6359 0.7590 0.8535 0.4802 0.5202  0.5443    | 0.1766 0.2831 0.4203 0.1103 0.1446  0.1792    | +9.2%
JRL       | 0.6685 0.7747 0.8561 0.5270 0.5615  0.5821    | 0.1858 0.2922 0.4343 0.1177 0.1519  0.1877    | +3.9%
NeuMF     | 0.6744 0.7793 0.8602 0.5319 0.5660  0.5865    | 0.1881 0.2958 0.4385 0.1189 0.1536  0.1895    | +3.0%
ConvNCF   | 0.6914 0.7936 0.8695 0.5494 0.5826  0.6019    | 0.1978 0.3086 0.4430 0.1243 0.1600  0.1939    |
3 Experiments
To comprehensively evaluate our proposed method, we conduct experiments to answer the following research questions:
RQ1: Can our proposed ConvNCF outperform state-of-the-art recommendation methods?

RQ2: Are the proposed outer product operation and the CNN layers helpful for learning from user-item interaction data and improving recommendation performance?

RQ3: How does the key hyperparameter of the CNN (i.e., the number of feature maps) affect ConvNCF's performance?
3.1 Experimental Settings
Data Descriptions.
We conduct experiments on two publicly accessible datasets: Yelp (https://github.com/hexiangnan/sigir16eals) and Gowalla (http://dawenl.github.io/data/gowalla_pro.zip).
Yelp. This is the Yelp Challenge data for user ratings on businesses. We filter the dataset following [He et al., 2016b]. Moreover, we merge repeated ratings at different timestamps into the earliest one, so as to study the performance of recommending novel items to a user. The final dataset contains 25,815 users, 25,677 items, and 730,791 ratings.
Gowalla. This is the check-in dataset from Gowalla, a location-based social network, constructed by [Liang et al., 2016] for item recommendation. To ensure data quality, we perform a modest filtering, retaining users with at least two interactions and items with at least ten interactions. The final dataset contains 54,156 users, 52,400 items, and 1,249,703 interactions.
Evaluation Protocols.
For each user in the dataset, we hold out the latest interaction as the positive test sample, and pair it with sampled items that the user did not rate before as negative samples. Each method then generates predictions for these user-item interactions. To evaluate the results, we adopt two metrics, Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG), following [He et al., 2017]. HR@k is a recall-based metric that measures whether the test item is in the top-k positions of the ranking list (1 for yes, 0 otherwise). NDCG@k assigns higher scores to hits at higher positions in the top-k list. To eliminate the effect of random oscillation, we report the average scores over the last ten epochs after convergence.
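For reference, the two metrics for a single test user can be computed as follows (a sketch of the standard definitions, with one relevant item per user as in our protocol):

```python
import numpy as np

def hr_ndcg_at_k(ranked_items, test_item, k):
    """HR@k is 1 if the held-out item appears in the top-k list;
    NDCG@k discounts the single hit by its position (IDCG = 1)."""
    topk = list(ranked_items)[:k]
    if test_item not in topk:
        return 0.0, 0.0
    pos = topk.index(test_item)            # 0-based rank of the hit
    return 1.0, 1.0 / np.log2(pos + 2)     # DCG with one relevant item
```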
Baselines.
To justify the effectiveness of our proposed ConvNCF, we study the performance of the following methods:
1. ItemPop ranks items by their popularity, measured by the number of interactions. It is a common benchmark for recommender algorithms.
2. MF-BPR [Rendle et al., 2009] optimizes the standard MF model with the pairwise BPR ranking loss.
3. MLP [He et al., 2017] is an NCF method that concatenates the user embedding and item embedding and feeds them to a standard MLP to learn the interaction function.
4. JRL [Zhang et al., 2017] is an NCF method that places an MLP above the element-wise product of the user embedding and item embedding. Its difference from GMF [He et al., 2017] is that JRL uses multiple hidden layers above the element-wise product, whereas GMF directly outputs the prediction score.
5. NeuMF [He et al., 2017] is the state-of-the-art method for item recommendation, which combines the hidden layers of GMF and MLP to learn the user-item interaction function.
Parameter Settings.
We implement our methods with TensorFlow; the code is available at: https://github.com/duxyme/ConvNCF. We randomly hold out one training interaction per user as the validation set for tuning hyperparameters. We evaluate ConvNCF under the specific setting illustrated in Figure 2. The regularization coefficients are tuned separately for the embedding layer, the convolution layers, and the output layer. For a fair comparison, we set the embedding size to 64 for all models and optimize them with the same BPR loss using mini-batch Adagrad (learning rate 0.05). For MLP, JRL, and NeuMF, which have multiple fully connected layers, we tuned the number of layers from 1 to 3 following the tower structure of [He et al., 2017]. For all models except MF-BPR, we pre-train the embedding layer using MF-BPR, and the regularization of each method has been fairly tuned.

3.2 Performance Comparison (RQ1)
Table 1 shows the top-k recommendation performance on both datasets, where k is set to 5, 10, and 20. We have the following key observations:


- ConvNCF achieves the best performance overall, with substantial improvements over the state-of-the-art methods. This justifies the utility of the ONCF framework, which uses the outer product to obtain the 2D interaction map, and the efficacy of CNN in learning high-order correlations among embedding dimensions.

- JRL consistently outperforms MLP by a large margin on both datasets. This indicates that explicitly modeling the correlations of embedding dimensions is rather helpful for the learning of the subsequent hidden layers, even for simple correlations that assume dimensions are independent of each other. Meanwhile, it reveals the practical difficulty of training MLP well, despite its strong representation ability in principle [Hornik, 1991].
3.3 Efficacy of Outer Product and CNN (RQ2)
Due to space limitations, for the following two studies we only show the NDCG results; the HR results exhibit the same trend and are thus omitted.
Efficacy of Outer Product.
To show the effect of the outer product, we replace it with the two common choices in existing solutions: concatenation (i.e., MLP) and element-wise product (i.e., GMF and JRL). We compare their performance with ConvNCF per epoch in Figure 3. We observe that ConvNCF outperforms the other methods by a large margin on both datasets, verifying the positive effect of using the outer product above the embedding layer. Specifically, the improvements over GMF and JRL demonstrate that explicitly modeling the correlations between different embedding dimensions is useful. Lastly, the rather weak and unstable performance of MLP implies the difficulty of training MLP well, especially when the low-level input carries fewer semantics about the feature interactions. This is consistent with the recent finding of [He and Chua, 2017] on using MLP for sparse data prediction.
Efficacy of CNN.
To make a fair comparison between CNN and MLP under our ONCF framework, we use an MLP to learn from the same interaction map generated by the outer product. Specifically, we first flatten the interaction map into a $K^2$-dimensional vector, and then place a 3-layer MLP above it. We term this method ONCF-mlp. Figure 4 compares its performance with ConvNCF per epoch. We can see that ONCF-mlp performs much worse than ConvNCF, despite using far more parameters (by 3 orders of magnitude). Another drawback of so many parameters is that ONCF-mlp becomes rather unstable, as evidenced by its large variance across epochs. In contrast, our ConvNCF achieves much better and more stable performance with the locally connected CNN. This empirical evidence supports our motivation for designing ConvNCF and our discussion of MLP's drawbacks in Section 2.2.

3.4 Hyperparameter Study (RQ3)
Impact of Feature Map Number.
The number of feature maps in each CNN layer affects the representation ability of ConvNCF. Figure 5 shows the performance of ConvNCF with respect to different numbers of feature maps. All the curves increase steadily and finally achieve similar performance, with only slight differences in the convergence curves. This reflects the strong expressiveness and generalization of using CNN under the ONCF framework, since dramatically increasing the number of parameters of the network does not lead to overfitting. This property makes the model well suited for practical use.
4 Conclusion
We presented ONCF, a new neural network framework for collaborative filtering. The special design of ONCF is the use of an outer product operation above the embedding layer, which results in a semantically rich interaction map that encodes the pairwise correlations between embedding dimensions. This facilitates the subsequent deep layers in learning high-order correlations among embedding dimensions. To demonstrate this utility, we proposed a new model under the ONCF framework, named ConvNCF, which uses multiple convolution layers above the interaction map. Extensive experiments on two real-world datasets show that ConvNCF outperforms state-of-the-art methods in top-k recommendation. In the future, we will explore advanced CNN models such as ResNet [He et al., 2016a] and DenseNet [Huang et al., 2017] to further exploit the potential of the ONCF framework. Moreover, we will extend ONCF to content-based recommendation scenarios [Chen et al., 2017; Yu et al., 2018], where item features have richer semantics than just an ID. In particular, we are interested in building recommender systems for multimedia items such as images and videos, and textual items such as news.
5 Acknowledgments
This work is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its IRC@SG Funding Initiative, by the 973 Program of China under Project No.: 2014CB347600, by the Natural Science Foundation of China under Grant No.: 61732007, 61702300, 61501063, 61502094, and 61501064, by the Scientific Research Foundation of Science and Technology Department of Sichuan Province under Grant No. 2016JY0240, and by the Natural Science Foundation of Heilongjiang Province of China (No.F2016002). Jinhui Tang is the corresponding author.
References
[Bai et al., 2017] Ting Bai, Ji-Rong Wen, Jun Zhang, and Wayne Xin Zhao. A neural collaborative filtering model with interaction-based neighborhood. In CIKM, pages 1979–1982, 2017.
[Bayer et al., 2017] Immanuel Bayer, Xiangnan He, Bhargav Kanagal, and Steffen Rendle. A generic coordinate descent framework for learning from implicit feedback. In WWW, pages 1341–1350, 2017.
[Beutel et al., 2018] Alex Beutel, Paul Covington, Sagar Jain, Can Xu, Jia Li, Vince Gatto, and Ed H. Chi. Latent cross: Making use of context in recurrent recommender systems. In WSDM, pages 46–54, 2018.
[Chen et al., 2017] Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. Attentive collaborative filtering: Multimedia recommendation with item- and component-level attention. In SIGIR, pages 335–344, 2017.
[Covington et al., 2016] Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for YouTube recommendations. In RecSys, pages 191–198, 2016.
[Ding et al., 2018] Jingtao Ding, Fuli Feng, Xiangnan He, Guanghui Yu, Yong Li, and Depeng Jin. An improved sampler for Bayesian personalized ranking by leveraging view data. In WWW, pages 13–14, 2018.

[Duchi et al., 2011] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
[Garcia-Molina et al., 2011] Hector Garcia-Molina, Georgia Koutrika, and Aditya Parameswaran. Information seeking: Convergence of search, recommendations, and advertising. Communications of the ACM, 54(11):121–130, 2011.
[He and Chua, 2017] Xiangnan He and Tat-Seng Chua. Neural factorization machines for sparse predictive analytics. In SIGIR, pages 355–364, 2017.
[He et al., 2016a] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
[He et al., 2016b] Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. Fast matrix factorization for online recommendation with implicit feedback. In SIGIR, pages 549–558, 2016.
[He et al., 2017] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. Neural collaborative filtering. In WWW, pages 173–182, 2017.
[He et al., 2018] Xiangnan He, Zhankui He, Xiaoyu Du, and Tat-Seng Chua. Adversarial personalized ranking for item recommendation. In SIGIR, 2018.
[Hornik, 1991] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.
[Huang et al., 2017] Gao Huang, Zhuang Liu, Kilian Q. Weinberger, and Laurens van der Maaten. Densely connected convolutional networks. In CVPR, pages 4700–4708, 2017.
[Liang et al., 2016] Dawen Liang, Laurent Charlin, James McInerney, and David M. Blei. Modeling user exposure in recommendation. In WWW, pages 951–961, 2016.
[Rendle et al., 2009] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In UAI, pages 452–461, 2009.
[Tay et al., 2018] Yi Tay, Luu Anh Tuan, and Siu Cheung Hui. Latent relational metric learning via memory-based attention for collaborative ranking. In WWW, pages 729–739, 2018.
[Wang et al., 2015] Suhang Wang, Jiliang Tang, Yilin Wang, and Huan Liu. Exploring implicit hierarchical structures for recommender systems. In IJCAI, pages 1813–1819, 2015.
[Wang et al., 2017] Xiang Wang, Xiangnan He, Liqiang Nie, and Tat-Seng Chua. Item silk road: Recommending items from information domains to social users. In SIGIR, pages 185–194, 2017.
[Wang et al., 2018a] Xiang Wang, Xiangnan He, Fuli Feng, Liqiang Nie, and Tat-Seng Chua. TEM: Tree-enhanced embedding model for explainable recommendation. In WWW, pages 1543–1552, 2018.
[Wang et al., 2018b] Zihan Wang, Ziheng Jiang, Zhaochun Ren, Jiliang Tang, and Dawei Yin. A path-constrained framework for discriminating substitutable and complementary products in e-commerce. In WSDM, pages 619–627, 2018.
[Xue et al., 2017] Hong-Jian Xue, Xinyu Dai, Jianbing Zhang, Shujian Huang, and Jiajun Chen. Deep matrix factorization models for recommender systems. In IJCAI, pages 3203–3209, 2017.
[Yu et al., 2018] Wenhui Yu, Huidi Zhang, Xiangnan He, Xu Chen, Li Xiong, and Zheng Qin. Aesthetic-based clothing recommendation. In WWW, pages 649–658, 2018.

[Zhang et al., 2014] Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and Shaoping Ma. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In SIGIR, pages 83–92, 2014.
[Zhang et al., 2016] Hanwang Zhang, Fumin Shen, Wei Liu, Xiangnan He, Huanbo Luan, and Tat-Seng Chua. Discrete collaborative filtering. In SIGIR, pages 325–334, 2016.
[Zhang et al., 2017] Yongfeng Zhang, Qingyao Ai, Xu Chen, and W. Bruce Croft. Joint representation learning for top-N recommendation with heterogeneous information sources. In CIKM, pages 1449–1458, 2017.