1. Introduction
Recommender systems have become increasingly indispensable. A widely adopted application is top-N recommendation, which presents users with ranked lists of items. In e-commerce, typically only a few recommendations are shown to the user at a time, so recommender systems are often evaluated on the performance of their top recommendations.
Collaborative Filtering (CF) based methods are a fundamental building block of many recommender systems. CF-based recommender systems predict which items a user will prefer by discovering and exploiting similarity patterns across users and items. The performance of CF-based methods often drops significantly when ratings are very sparse. With the increased availability of so-called side information, that is, additional information associated with items such as product reviews, movie plots, etc., there is great interest in taking advantage of such information to compensate for the sparsity of ratings.
Existing methods utilizing side information are linear models (Ning and Karypis, 2012), which have a restricted model capacity. A growing body of work generalizes linear models with deep learning to exploit nonlinearities for large-scale recommendation (He et al., 2017; Sedhain et al., 2015; Wu et al., 2016; Zheng et al., 2016). State-of-the-art performance is achieved by applying Variational Autoencoders (VAEs) (Kingma and Welling, 2013) to CF (Li and She, 2017; Liang et al., 2018; Lee et al., 2017). These deep models learn item representations from side information. Thus, the dimension of the side information determines the input dimension of the network, which dominates the overall size of the model. This is problematic since side information is generally high-dimensional (Chen et al., 2017). As shown in our experiments, existing deep models fail to beat linear models due to the high dimensionality of side information and an insufficient number of samples.
To avoid the impact of high dimensionality while retaining the effectiveness of VAEs, we propose to learn feature representations from side information. In this way, the dimensions of the side information correspond to the number of samples rather than to the input dimension of the deep network. To instantiate this idea, we propose the collective Variational Autoencoder (cVAE), which learns to recover user ratings and side information simultaneously through a VAE. While user ratings and side information are different sources of information, both are associated with items. Thus, we take the ratings of each user and each dimension of side information over all items as input for the VAE, so that samples from both sources of information have the same dimensionality (the number of items). We can then feed ratings and side information into the same inference network and generation network. cVAE complements the sparse ratings with side information, as feeding side information into the same VAE increases the number of samples for training. The high dimensionality of side information is not a problem for cVAE, as it increases the sample size rather than the network scale. To account for the heterogeneity of user ratings and side information, the final layer of the generation network follows different distributions depending on the type of information. Training a VAE by feeding it side information as input acts like a pre-training step, which is crucial for developing a robust deep network. Our experiments show that the proposed model, cVAE, achieves state-of-the-art performance for top-N recommendation with side information.
2. Preliminaries
2.1. Notation
We introduce relevant notation in this section. We use $m$, $n$ and $d$ to denote the number of users, items and the dimension of side information, respectively. We study the problem of top-N recommendation with high-dimensional side information, where $d \gg n$. We write $X \in \mathbb{R}^{d \times n}$ for the matrix of side information and $R \in \{0,1\}^{m \times n}$ for the matrix of user ratings. We summarize our notation in Table 1.
Notation  Description

$m$  number of users
$n$  number of items
$d$  dimension of side information
$k$  dimension of latent item representation
$N$  number of recommended items
$X$  matrix of side information
$R$  matrix of user ratings
$U$  matrix of latent user representations
$V$  matrix of latent item representations
$F$  matrix of latent feature representations
$h^e$  hidden layer of inference network
$h^d$  hidden layer of generation network
$\mu$  the mean of the latent input representation
$\sigma^2$  the variance of the latent user or feature representation
$h_\phi$  nonlinear transformation of inference network
$g_\theta$  nonlinear transformation of generation network
$\mu_\phi(\cdot)$  the activation function to get $\mu$
$\sigma_\phi(\cdot)$  the activation function to get $\sigma$
$\sigma(\cdot)$  the sigmoid function
2.2. Linear models for top-N recommendation
Sparse LInear Method (SLIM) (Ning and Karypis, 2011) achieves state-of-the-art performance for top-N recommendation. SLIM learns to reproduce the user rating matrix $R$ through:

$\hat{R} = RW,$

where $W \in \mathbb{R}^{n \times n}$ is the coefficient matrix, which is analogous to an item similarity matrix. The performance of SLIM is heavily affected by rating sparsity (Kabbur et al., 2013). Side information has been utilized to overcome this issue (Ning and Karypis, 2012; Zhao et al., 2016; Chen et al., 2017). As a typical example of a method that uses side information, collective SLIM (cSLIM) learns from both user ratings and side information. Specifically, $R$ and $X$ are both reproduced through:

$\hat{R} = RW, \quad \hat{X} = XW.$
cSLIM learns the coefficient matrix $W$ collectively from both the side information $X$ and the user ratings $R$, a strategy that helps to overcome rating sparsity through side information. However, cSLIM is restricted by the fact that it is a linear model, which has limited model capacity.
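To make the cSLIM idea concrete, here is a hedged numpy sketch that learns a single item-item coefficient matrix from the stacked ratings and side information via ridge regression; it drops SLIM's non-negativity constraint and L1 penalty and only zeroes the diagonal after solving, so all names and simplifications are illustrative rather than the authors' implementation.

```python
import numpy as np

def cslim_ridge(R, X, lam=1.0):
    """Simplified cSLIM-style solver: fit one W for both stacked inputs."""
    A = np.vstack([R, X])                      # (m + d) x n stacked input
    n = A.shape[1]
    G = A.T @ A                                # n x n Gram matrix
    # Ridge-regression solution to min ||A - A W||^2 + lam ||W||^2
    W = np.linalg.solve(G + lam * np.eye(n), G)
    np.fill_diagonal(W, 0.0)                   # crude surrogate for diag(W) = 0
    return W

rng = np.random.default_rng(0)
R = (rng.random((20, 8)) < 0.3).astype(float)  # toy binarized ratings
X = rng.random((5, 8))                          # toy side information
W = cslim_ridge(R, X)
scores = R @ W                                  # predicted item scores per user
```

A real SLIM/cSLIM implementation would solve a constrained elastic-net problem per item column; the closed-form ridge step above only conveys the "one coefficient matrix, two stacked data sources" structure.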
2.3. Autoencoders for collaborative filtering
Recently, autoencoders have been used to address CF problems (Sedhain et al., 2015; Strub et al., 2016; Wu et al., 2016; Zhuang et al., 2017). Autoencoders are neural networks popularized by Kramer (1991). They are unsupervised networks where the output of the network aims to reconstruct the input. In the context of CF, the autoencoder is fed with incomplete rows (resp. columns) of the user rating matrix $R$. It then outputs a vector that predicts the missing entries. These approaches perform a nonlinear low-rank approximation of $R$ in two different ways, using a User-side Autoencoder (UAE) (Figure 2(a)) or an Item-side Autoencoder (IAE) (Figure 2(b)), which recover $R$ respectively through:

$\hat{R} = g(f(R)) \quad \text{and} \quad \hat{R}^\top = g(f(R^\top)),$

where $U = f(R)$ is the user representation and $V = f(R^\top)$ is the item representation. Moreover, $f$ and $g$ are the encoder network and decoder network, respectively. UAEs encode $R$ to learn a latent user representation $U$ and then recover $R$ from $U$. In contrast, IAEs encode the transpose of $R$ to learn a latent item representation $V$ and then recover the transpose of $R$ from $V$. Note that UAEs work in a similar way to SLIM, as both can be viewed as reproducing $R$ from $R$ itself, which also captures item similarities.
When side information associated with items is available, a Feature-side Autoencoder (FAE) can be utilized to learn item representations:

$\hat{X}^\top = g(f(X^\top)),$

where $V = f(X^\top)$ is the item representation. Existing hybrid methods combine an FAE with an IAE, as both learn item representations. However, this way of incorporating side information needs to estimate two separate VAEs, which is not an effective way to address rating sparsity. These methods are also vulnerable to the high dimensionality of side information.

3. Method
In this section, we propose a new way to incorporate side information with user ratings by combining the strengths of both cSLIM and autoencoders. We propose to reproduce $X$ with an FAE and $R$ with a UAE, feeding both row-wise. In this way, the inputs to the autoencoders for both $X$ and $R$ are of the same dimension, i.e., the number of items $n$. Thus, we can feed $X$ and $R$ into the same autoencoder rather than into two different autoencoders, which helps to overcome rating sparsity.
3.1. Collective variational autoencoder
We propose a collective Variational Autoencoder (cVAE) to generalize linear models for top-N recommendation with side information to nonlinear models, by taking advantage of Variational Autoencoders (VAEs). Specifically, we propose to recover $R$ and $X$ through:

$\hat{R} = g_\theta(h_\phi(R)), \quad \hat{X} = g_\theta(h_\phi(X)),$

where $h_\phi$ and $g_\theta$ correspond to the inference network and generation network, parameterized by $\phi$ and $\theta$, respectively. An overview of cVAE is depicted in Figure 1. Unlike previous work utilizing VAEs, the proposed model encodes and decodes user ratings and side information through the same inference and generation networks. Our model can be viewed as a nonlinear generalization of cSLIM, learning item similarities collectively from user ratings and side information. While user ratings and side information are two different types of information, cSLIM fails to distinguish them. In contrast, cVAE assumes the output of the generation network to follow different distributions according to the type of input it has been fed.
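The shared-network idea can be sketched in a few lines of numpy: one inference network maps any $n$-dimensional row (a user's binarized ratings or one feature dimension over all items) to a latent mean and log-variance, and one generation network maps a latent code back to $n$ outputs, with a sigmoid applied only for rating inputs. Layer sizes, initialization, and all names here are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

def init(n_items, n_hidden, n_latent, seed=0):
    """Random weights for a one-hidden-layer encoder and decoder."""
    rng = np.random.default_rng(seed)
    s = lambda a, b: rng.normal(0, 0.1, (a, b))
    return {"W1": s(n_items, n_hidden), "W_mu": s(n_hidden, n_latent),
            "W_lv": s(n_hidden, n_latent), "W2": s(n_latent, n_hidden),
            "W3": s(n_hidden, n_items)}

def encode(p, x):
    """Shared inference network: input row -> (mu, log variance)."""
    h = np.tanh(x @ p["W1"])
    return h @ p["W_mu"], h @ p["W_lv"]

def decode(p, z, bernoulli=True):
    """Shared generation network; sigmoid head only for rating inputs."""
    out = np.tanh(z @ p["W2"]) @ p["W3"]
    return 1 / (1 + np.exp(-out)) if bernoulli else out

p = init(n_items=8, n_hidden=6, n_latent=3)
r_user = np.array([[1., 0, 1, 0, 0, 1, 0, 0]])   # one user's binarized ratings
x_dim = np.random.default_rng(1).random((1, 8))  # one feature dim over items
mu_u, _ = encode(p, r_user)                      # same network for both inputs
mu_f, _ = encode(p, x_dim)
```

The point of the sketch is that both kinds of rows pass through identical parameters; only the output distribution (Bernoulli vs. Gaussian head) differs.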
Next, we describe the cVAE model in detail. Following common practice for VAEs, we first assume the latent variables $u_i$ (for user $i$) and $f_j$ (for dimension $j$ of the side information) to follow a Gaussian prior:

$u_i \sim \mathcal{N}(0, I_k), \quad f_j \sim \mathcal{N}(0, I_k),$

where $I_k$ is an identity matrix. While $R$ and $X$ are fed into the same network, we would like to distinguish them via different distributions. In this paper, we assume that $R$ is binarized to capture implicit feedback, which is a common setting for top-N recommendation (Ning and Karypis, 2011). Thus we follow Lee et al. (2017) and assume that the ratings of user $i$ over all items follow a Bernoulli distribution:

$r_i \sim \mathrm{Bernoulli}\bigl(\sigma(g_\theta(u_i))\bigr),$

where $\sigma(\cdot)$ is the sigmoid function. This defines the loss function when feeding user ratings as input, i.e., the logistic log-likelihood for user $i$:

$\log p(r_i \mid u_i) = \sum_{j=1}^{n} r_{ij} \log \sigma\bigl(g_\theta(u_i)_j\bigr) + (1 - r_{ij}) \log \bigl(1 - \sigma(g_\theta(u_i)_j)\bigr), \qquad (1)$

where $r_{ij}$ is the $j$th element of the vector $r_i$ and $g_\theta(u_i)_j$ is normalized through the sigmoid function so that it lies within $(0, 1)$.
For side information, we study numerical features, so we assume the $j$th dimension of the side information over all items to follow a Gaussian distribution:

$x_j \sim \mathcal{N}\bigl(g_\theta(f_j), \sigma^2 I_n\bigr).$

This defines the loss function when feeding side information as input, i.e., the Gaussian log-likelihood for dimension $j$:

$\log p(x_j \mid f_j) = -\sum_{i=1}^{n} \frac{\bigl(x_{ji} - g_\theta(f_j)_i\bigr)^2}{2\sigma^2} + \text{const}, \qquad (2)$

where $x_{ji}$ is the $i$th element of the vector $x_j$. Note that although we assume $r_i$ and $x_j$ to be generated from $u_i$ and $f_j$, respectively, the generation network shares its parameters $\theta$ across the two.
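The two log-likelihoods in Equations (1) and (2) can be written down directly. This sketch assumes unit variance for the Gaussian case and drops additive constants; variable names are illustrative.

```python
import numpy as np

def logistic_ll(r, logits):
    """Equation (1): Bernoulli log-likelihood of a binarized rating row."""
    p = 1 / (1 + np.exp(-logits))              # sigmoid-normalized outputs
    return float(np.sum(r * np.log(p) + (1 - r) * np.log(1 - p)))

def gaussian_ll(x, mean, var=1.0):
    """Equation (2): Gaussian log-likelihood of one feature dimension,
    up to an additive constant."""
    return float(-np.sum((x - mean) ** 2) / (2 * var))

r = np.array([1., 0., 1.])
logits = np.array([2., -2., 2.])
x = np.array([0.5, 0.1])
```

Both functions return larger (less negative) values for better reconstructions, so maximizing them plays the role of the reconstruction term in the ELBO.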
The generation procedure is summarized as follows:

for each user $i$:
  draw $u_i \sim \mathcal{N}(0, I_k)$;
  draw $r_i \sim \mathrm{Bernoulli}\bigl(\sigma(g_\theta(u_i))\bigr)$.

for each dimension of side information $j$:
  draw $f_j \sim \mathcal{N}(0, I_k)$;
  draw $x_j \sim \mathcal{N}\bigl(g_\theta(f_j), \sigma^2 I_n\bigr)$.
Once the cVAE is trained, we can generate recommendations for each user $i$ by ranking items in descending order of $\hat{r}_i$. Here, $\hat{r}_i$ is calculated as $\hat{r}_i = \sigma\bigl(g_\theta(\mu_i)\bigr)$, that is, we take the mean $\mu_i$ of the latent representation $u_i$ for prediction.
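Prediction then reduces to scoring items and ranking the unrated ones. A minimal sketch, assuming the per-item scores have already been produced by the trained generation network:

```python
import numpy as np

def recommend(scores, rated_mask, top_n):
    """Rank unrated items by descending score; rated items are excluded."""
    scores = np.where(rated_mask, -np.inf, scores)
    return np.argsort(-scores)[:top_n]

scores = np.array([0.9, 0.1, 0.8, 0.4, 0.7])     # toy decoded scores
rated = np.array([True, False, False, False, False])
top3 = recommend(scores, rated, 3)
```

Masking already-rated items with minus infinity before sorting is one simple way to implement the "unrated items only" convention used in the evaluation.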
Next, we discuss how to perform inference for cVAE.
3.2. Variational inference
The log-likelihood of cVAE is intractable due to the nonlinear transformations of the generation network. Thus, we resort to variational inference to approximate the distribution. Variational inference approximates the true, intractable posterior with a simpler variational distribution $q$. We follow the mean-field assumption (Xing et al., 2002) by setting $q$ to be a fully factorized Gaussian distribution:

$q(u_i) = \mathcal{N}\bigl(\mu_i, \operatorname{diag}(\sigma_i^2)\bigr), \quad q(f_j) = \mathcal{N}\bigl(\mu_j, \operatorname{diag}(\sigma_j^2)\bigr).$

While we can optimize $q$ by minimizing the Kullback–Leibler divergence $\mathrm{KL}(q \,\|\, p)$, the number of parameters to learn grows with the number of users and the dimensions of side information. This can become a bottleneck for real-world recommender systems with millions of users and high-dimensional side information. The VAE replaces the individual variational parameters with a data-dependent function through an inference network parameterized by $\phi$, i.e.,

$q_\phi(u_i \mid r_i) = \mathcal{N}\bigl(\mu_\phi(r_i), \operatorname{diag}(\sigma_\phi^2(r_i))\bigr),$

where $\mu_\phi$ and $\sigma_\phi^2$ are generated as outputs of the inference network $h_\phi$ (and analogously $q_\phi(f_j \mid x_j)$ for side information).
Putting together the inference network $h_\phi$ and the generation network $g_\theta$, with $R$ and $X$ as inputs, forms the proposed cVAE (Figure 1).
We follow Kingma and Welling (2013) to derive the Evidence Lower Bound (ELBO); for a generic input row $y$ (a rating row $r_i$ or a feature row $x_j$) with latent variable $z$:

$\mathcal{L}(y) = \mathbb{E}_{q_\phi(z \mid y)}\bigl[\log p_\theta(y \mid z)\bigr] - \mathrm{KL}\bigl(q_\phi(z \mid y) \,\|\, p(z)\bigr). \qquad (3)$
We use a Monte Carlo gradient estimator (Paisley et al., 2012) to approximate the expectation in Equation (3). We draw samples of $u_i$ and $f_j$ from $q_\phi$ and perform stochastic gradient ascent to optimize the ELBO. In order to take gradients with respect to $\phi$ through the sampling step, we follow the reparameterization trick (Kingma and Welling, 2013) to sample $u_i$ and $f_j$ as:

$u_i = \mu_\phi(r_i) + \sigma_\phi(r_i) \odot \epsilon, \quad f_j = \mu_\phi(x_j) + \sigma_\phi(x_j) \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I_k).$
As the KL divergence can be derived analytically (Kingma and Welling, 2013), we can then rewrite the ELBO as:

$\mathcal{L}(y) = \mathbb{E}_{q_\phi(z \mid y)}\bigl[\log p_\theta(y \mid z)\bigr] - \frac{1}{2}\sum_{l=1}^{k}\bigl(\sigma_l^2 + \mu_l^2 - 1 - \log \sigma_l^2\bigr). \qquad (4)$
We then maximize the ELBO given in Equation (4) to learn $\phi$ and $\theta$.
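The reparameterization trick and the analytic Gaussian KL term can be sketched as follows; shapes and names are illustrative.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps, with eps ~ N(0, I); keeps sampling differentiable
    with respect to mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def gaussian_kl(mu, log_var):
    """Analytic KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims."""
    return float(0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1 - log_var))

rng = np.random.default_rng(0)
mu = np.zeros(3)
log_var = np.zeros(3)
z = reparameterize(mu, log_var, rng)
```

When the approximate posterior equals the prior (zero mean, unit variance), the KL term vanishes, which is a handy sanity check for an implementation.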
3.3. Implementation details
We discuss the implementation of cVAE in detail. As we feed the user rating matrix and the item side information through the same input layer with neurons, we need to ensure that the input from both types of information are of the same format. In this paper, we assume that user ratings are binarized to capture implicit feedback and that side information is represented as a bagofwords. We propose to train cVAEs through a twophase algorithm. We first feed it side information to train, which works as pretraining. We then refine the VAE by feeding user ratings. We follow the typical setting by taking as a
MultiLayer Perceptron
(MLP); is also taken to be a MLP of the identical network structure with . We also introduce two parameters, i.e., and , to extend the model and make it more suitable for the recommendation task.3.3.1. Parameter
3.3.2. Parameter
We can adopt different perspectives on the ELBO derived in Equation (3): the first term can be interpreted as the reconstruction error, while the second term can be viewed as regularization. The ELBO is often over-regularized for recommendation tasks (Liang et al., 2018). Therefore, a parameter $\beta$ is introduced to control the strength of regularization, so that the ELBO becomes:

$\mathcal{L}_\beta(y) = \mathbb{E}_{q_\phi(z \mid y)}\bigl[\log p_\theta(y \mid z)\bigr] - \beta \cdot \mathrm{KL}\bigl(q_\phi(z \mid y) \,\|\, p(z)\bigr).$
We propose to train the cVAE in two phases. We first pre-train the cVAE by feeding it side information only. We then refine the model by feeding it user ratings. While Liang et al. (2018) suggest setting $\beta$ to a small value to avoid over-regularization, we opt for a larger value of $\beta$ during refinement, for two reasons: (1) the model is effectively pre-trained with side information, so it is reasonable to require the posterior to comply more with this prior; and (2) refinement with user ratings can easily overfit due to the sparsity of ratings, so it is reasonable to regularize more heavily to avoid overfitting.
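The two-phase schedule for the regularization weight can be summarized in a few lines: a smaller weight during side-information pre-training and a larger one during refinement with ratings, as argued above. The objective follows the beta-weighted ELBO form; the concrete beta values below are assumed examples, not the authors' tuned settings.

```python
def beta_elbo(recon_ll, kl, beta):
    """Beta-weighted ELBO: reconstruction log-likelihood minus beta * KL."""
    return recon_ll - beta * kl

beta_pretrain, beta_refine = 0.2, 1.0   # assumed example values
# Same reconstruction quality and KL, evaluated under the two phases:
pre = beta_elbo(recon_ll=-10.0, kl=2.0, beta=beta_pretrain)
ref = beta_elbo(recon_ll=-10.0, kl=2.0, beta=beta_refine)
```

With identical reconstruction and KL terms, the refinement-phase objective is lower, i.e., the same posterior deviation from the prior is penalized more heavily during refinement.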
4. Experiments
4.1. Experimental setup
4.1.1. Dataset
We conduct experiments on two datasets, Games and Sports, constructed from different categories of Amazon products (McAuley and Leskovec, 2013). For each category, the original dataset contains transactions between users and items, indicating implicit user feedback. The statistics of the datasets are presented in Table 2. We use the product reviews as item features. We extract unigram features from the review texts and remove stop words. We represent each product item as a bag-of-words feature vector.
Dataset  #User  #Item  #Rating  #Dimension  #Feature 

Games  5,195  7,163  96,316  20,609  5,151,174 
Sports  5,653  11,944  86,149  31,282  3,631,243 
4.1.2. Methods for comparison
We contrast the performance of cVAE with that of existing VAE-based methods for CF: cfVAE (Li and She, 2017) and rVAE (Liang et al., 2018). Note that the performance of cfVAE is greatly affected by the high dimensionality of side information. Besides, as cfVAE was originally designed for the rating prediction task, the recommendations provided by cfVAE are less effective. While rVAE is effective for top-N recommendation, it suffers from rating sparsity as side information is not utilized.
We also compare with the state-of-the-art linear model for top-N recommendation with side information, i.e., cSLIM (Ning and Karypis, 2012). By comparing with cSLIM, we can evaluate the capacity of cVAE, as it can be regarded as a deep extension of cSLIM. We also compare with fVAE, the model pre-trained with side information only; cVAE is the refinement of fVAE using user ratings.
For all the VAE-based methods, we follow Kingma and Welling (2013) in setting the batch size to 100. We choose a two-layer network architecture for the inference network and generation network. For cfVAE and rVAE, the layer sizes are 200–100 for the inference network and 100–200 for the generation network. For fVAE and cVAE, they are 1000–100 and 100–1000, respectively. The reason that the network scale for cfVAE and rVAE is relatively smaller is that (1) the input for cfVAE is high-dimensional with relatively few samples; and (2) the input for rVAE is sparse, which easily overfits with larger networks. In comparison, we can select more hidden neurons for fVAE as it takes each dimension of the features over all items as input, so that the input for the network has relatively fewer dimensions and the number of samples is sufficient. This is similar to cVAE, which uses side information to overcome rating sparsity.
4.1.3. Evaluation method
To evaluate the performance of top-N recommendation, we split the user rating matrix into three parts, used respectively for training the model, selecting parameters, and testing recommendation accuracy. Specifically, for each user, we randomly hold out 10% of the ratings for the validation set and 10% for the test set, and put the remaining ratings in the training set. For each user, the unrated items are sorted in decreasing order of predicted score, and the first $N$ items are returned as the top-N recommendations for that user.
Given the list of top-N recommended items for user $i$, precision at $N$ (Pre@N) and recall at $N$ (Rec@N) are defined as:

$\mathrm{Pre@}N = \frac{|\text{relevant items} \cap \text{recommended items}|}{N}, \quad \mathrm{Rec@}N = \frac{|\text{relevant items} \cap \text{recommended items}|}{|\text{relevant items}|}.$

Average precision at $N$ (AP@N) is a ranked precision metric that gives larger credit to correctly recommended items in the top ranks. AP@N is defined as the average of the precisions computed at all positions with an adopted item, namely:

$\mathrm{AP@}N = \frac{\sum_{k=1}^{N} \mathrm{Pre@}k \cdot \mathrm{rel}(k)}{\min(N, |\text{relevant items}|)},$

where Pre@k is the precision at cutoff $k$ in the top-N recommended list, and $\mathrm{rel}(k)$ is an indicator function equal to 1 if the item at rank $k$ is adopted and 0 otherwise.
Mean average precision at $N$ (MAP@N) is defined as the mean of the AP@N scores for all users. Following Wu et al. (2016), the lists of recommended items are evaluated using Rec@N and MAP@N.
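The metrics above can be implemented directly from their definitions. This hedged sketch normalizes AP@N by the minimum of $N$ and the number of relevant items, one common convention; variable names are illustrative.

```python
def recall_at_n(recommended, relevant):
    """Rec@N: fraction of relevant items that appear in the top-N list."""
    hits = len(set(recommended) & set(relevant))
    return hits / len(relevant)

def ap_at_n(recommended, relevant):
    """AP@N: average of Pre@k over the ranks k holding an adopted item."""
    hits, total = 0, 0.0
    for k, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / k          # Pre@k at each adopted position
    return total / min(len(recommended), len(relevant))

rec = [3, 1, 4]   # ranked top-3 recommendation list
rel = {1, 4}      # held-out (adopted) items
```

MAP@N then follows by averaging `ap_at_n` over all users.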
4.2. Experimental results
4.2.1. Parameter selection
To compare the performance of the alternative top-N recommendation methods, we first select parameters for all the methods through validation. Specifically, for cSLIM, we select the regularization parameters from a grid of candidate values; for cfVAE, rVAE, fVAE and cVAE, we likewise select the latent dimension and the regularization strength from candidate grids. Note that for cVAE we tune $\beta$ with larger candidate values, so as to possibly regularize more heavily during refinement.
The results of parameter selection are shown in Table 3.
Method  Parameters  

Games  Sports  
cSLIM  
cfVAE  
rVAE  
fVAE  
cVAE 
4.2.2. Performance comparison
We present the results in terms of Rec@N and MAP@N in Table 4, where $N$ is set to 5, 10, 15 and 20, respectively. We show the best score in boldface. We attach asterisks to the best score if the improvement over the second best score is statistically significant; to this end, we conducted two-sided tests for the null hypothesis that cVAE and the second best method have identical average values; we use one asterisk if $p < 0.05$ and two asterisks if $p < 0.01$.

Rec@5  Rec@10  Rec@15  Rec@20  MAP@5  MAP@10  MAP@15  MAP@20

Method  Games  
cSLIM  0.0761  0.1162  0.1474  0.1734  0.0590  0.0337  0.0240  0.0188 
cfVAE  0.0685  0.1065  0.1359  0.1608  0.0519  0.0298  0.0212  0.0165 
rVAE  0.0137  0.0206  0.0270  0.0375  0.0106  0.0060  0.0043  0.0034 
fVAE  0.0495  0.0796  0.1072  0.1276  0.0390  0.0230  0.0167  0.0131 
cVAE  0.0858*  0.1376**  0.1731**  0.2081**  0.0668*  0.0394**  0.0279**  0.0218** 
Sports  
cSLIM  0.0419  0.0622  0.0776  0.0911  0.0263  0.0148  0.0104  0.0080 
cfVAE  0.0315  0.0512  0.0639  0.0768  0.0206  0.0119  0.0084  0.0065 
rVAE  0.0171  0.0249  0.0328  0.0390  0.0109  0.0062  0.0044  0.0034 
fVAE  0.0284  0.0437  0.0602  0.0732  0.0190  0.0109  0.0078  0.0061 
cVAE  0.0441  0.0655  0.0857*  0.1035*  0.0268  0.0152  0.0107  0.0084 
As shown in Table 4, cVAE outperforms the other methods according to all metrics and on both datasets. The improvement is also significant in many settings. A general trend is that the significance of the improvements becomes more evident as $N$ gets larger. Note that the other three methods utilizing VAEs are less effective with high-dimensional side information; they even fail to beat the linear model. In contrast, cVAE improves over cSLIM by using a VAE for nonlinear low-rank approximation. This demonstrates the effectiveness of our proposed cVAE model.
Specifically, on the Games dataset, cVAE shows significant improvements over the state-of-the-art methods. Apart from cVAE, cfVAE provides the best recommendations among all VAE-based CF methods, although it fails to beat cSLIM. It is followed by fVAE, which utilizes side information only. rVAE performs the worst, due to rating sparsity.
On the Sports dataset, significant improvements can only be observed for Rec@15 and Rec@20. The results yield an interesting insight. If we look at the parameter selection for cSLIM, we can see that the weight on side information is set to 0, which means that cSLIM performs best when no side information is utilized. This does not necessarily mean that the side information of Sports is useless for recommendation; in fact, fVAE provides acceptable recommendations by utilizing side information only. Therefore, the way cSLIM incorporates side information is not effective. In comparison, cVAE improves over cSLIM by utilizing side information.
4.2.3. Effect of the number of recommended items
Table 4 reveals a possible trend that the improvement in recommendation becomes more evident when more items are recommended. We illustrate this in Figure 3, where $N$ is increased from 5 to 1000.
As depicted in Figure 3(a), the gaps between cVAE and the other methods grow larger as $N$ increases. It is interesting to note that fVAE surpasses cfVAE for larger values of $N$. This further demonstrates the effectiveness of the pre-training phase with side information proposed in this model.
In Figure 3(c), both fVAE and cfVAE outperform cSLIM for large $N$, and fVAE also outperforms cfVAE in that regime. This shows that deep models are superior to linear models when more items are recommended. In comparison, the improvement achieved by cVAE is evident over the whole range, and the gap between cVAE and the second best method is always substantial.
On the other hand, the performance w.r.t. MAP@N does not show big differences as $N$ grows. Note that on the Games dataset (Figure 3(b)), cVAE performs much better than cSLIM when $N$ is small. The improvement becomes less evident as $N$ grows.
5. Related Work
We review related work on linear models for top-N recommendation with side information and on deep models for collaborative filtering.
5.1. Top-N recommendation with side information
Various methods have been developed to incorporate side information into recommender systems. Most of these methods have been developed in the context of the rating prediction problem, whereas the top-N recommendation problem has received less attention. In the rest of this section we only review methods addressing top-N recommendation.
Ning and Karypis (2012) propose several methods to incorporate side information into SLIM (Ning and Karypis, 2011). Among these methods, cSLIM generally achieves the best performance, as it can compensate for sparse ratings with side information. Zhao et al. (2016) and Zhao and Guo (2017) propose a joint model combining self-recovery for user ratings and prediction from side information. Side information has also been utilized to address cold-start top-N recommendation. Elbadrawy and Karypis (2015) learn feature weights for calculating item similarities. Sharma et al. (2015) further improve over Elbadrawy and Karypis (2015) by studying feature interactions. While these methods deliver state-of-the-art performance for top-N recommendation, they are all linear models, which have restricted model capacity.
5.2. Deep learning for hybrid recommendation
Several authors have attempted to combine deep learning with collaborative filtering. Wu et al. (2016) utilize a denoising autoencoder to encode ratings and recover score predictions. Zhuang et al. (2017) propose a dual-autoencoder to learn representations for both users and items. He et al. (2017) generalize matrix factorization for collaborative filtering with a neural network. These methods utilize user ratings only, that is, side information is not used. Wang et al. (2015) propose stacked denoising autoencoders to learn item representations from side information, forming a collaborative deep learning method. Later, Li et al. (2015) reduce the computational cost of training by replacing the stacked denoising autoencoders with a marginalized denoising autoencoder. Rather than manually corrupting the input, variational autoencoders were later utilized for representation learning (Li and She, 2017). These models achieve state-of-the-art performance among hybrid recommender systems, but they are less effective when side information is high-dimensional. For more discussion of deep learning based recommender systems, we refer to a recent survey (Zhang et al., 2017).

6. Conclusion
In this paper, we have proposed an alternative way to feed side information to a neural network so as to overcome its high dimensionality. We propose the collective Variational Autoencoder (cVAE), which can be regarded as a nonlinear generalization of cSLIM. cVAE overcomes rating sparsity by feeding both ratings and side information into the same inference network and generation network. To cater for the heterogeneity of the information (ratings and side information), we assume the different sources of information to follow different distributions, which is reflected in the use of different loss functions. As for the implementation, we introduce a parameter $\alpha$ to balance the positive and negative samples. We also introduce $\beta$ as the parameter for regularization, which controls how much the latent variables should comply with the prior distribution. We conduct experiments on Amazon datasets. The results show the superiority of cVAE over other methods in the scenario with high-dimensional side information.
In conclusion, deep models are effective as long as the number of input samples is sufficient. Thus, using side information to pre-train cVAE helps to overcome the high dimensionality. A general rule of thumb is to regularize cVAE lightly during pre-training and heavily during refinement.
References
 Chen et al. (2017) Yifan Chen, Xiang Zhao, and Maarten de Rijke. 2017. Top-N Recommendation with High-Dimensional Side Information via Locality Preserving Projection. In SIGIR. ACM, 985–988.
 Elbadrawy and Karypis (2015) Asmaa Elbadrawy and George Karypis. 2015. User-Specific Feature-Based Similarity Models for Top-n Recommendation of New Items. TIST 6, 3 (2015), 33:1–33:20.
 He et al. (2017) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In WWW. 173–182.
 Kabbur et al. (2013) Santosh Kabbur, Xia Ning, and George Karypis. 2013. FISM: Factored Item Similarity Models for Top-N Recommender Systems. In SIGKDD. ACM, 659–667.
 Kingma and Welling (2013) Diederik P. Kingma and Max Welling. 2013. Auto-Encoding Variational Bayes. CoRR abs/1312.6114 (2013).
 Kramer (1991) Mark A Kramer. 1991. Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal 37, 2 (1991), 233–243.
 Lee et al. (2017) Wonsung Lee, Kyungwoo Song, and Il-Chul Moon. 2017. Augmented Variational Autoencoders for Collaborative Filtering with Auxiliary Information. In CIKM. ACM, 1139–1148.
 Li et al. (2015) Sheng Li, Jaya Kawale, and Yun Fu. 2015. Deep Collaborative Filtering via Marginalized Denoising Autoencoder. In CIKM. ACM, 811–820.
 Li and She (2017) Xiaopeng Li and James She. 2017. Collaborative Variational Autoencoder for Recommender Systems. In SIGKDD. ACM, 305–314.
 Liang et al. (2018) Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. 2018. Variational Autoencoders for Collaborative Filtering. In WWW. ACM, 689–698.
 McAuley and Leskovec (2013) Julian J. McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In RecSys. ACM, 165–172.
 Ning and Karypis (2011) Xia Ning and George Karypis. 2011. SLIM: Sparse Linear Methods for Top-N Recommender Systems. In ICDM. IEEE, 497–506.
 Ning and Karypis (2012) Xia Ning and George Karypis. 2012. Sparse linear methods with side information for top-n recommendations. In RecSys. ACM, 155–162.
 Paisley et al. (2012) John William Paisley, David M. Blei, and Michael I. Jordan. 2012. Variational Bayesian Inference with Stochastic Search. In ICML. JMLR.
 Sedhain et al. (2015) Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. AutoRec: Autoencoders Meet Collaborative Filtering. In WWW. ACM, 111–112.
 Sharma et al. (2015) Mohit Sharma, Jiayu Zhou, Junling Hu, and George Karypis. 2015. Feature-based factorized Bilinear Similarity Model for Cold-Start Top-n Item Recommendation. In SDM. SIAM, 190–198.
 Strub et al. (2016) Florian Strub, Romaric Gaudel, and Jérémie Mary. 2016. Hybrid Recommender System based on Autoencoders. In DLRS. ACM, 11–16.
 Wang et al. (2015) Hao Wang, Naiyan Wang, and DitYan Yeung. 2015. Collaborative Deep Learning for Recommender Systems. In SIGKDD. ACM, 1235–1244.
 Wu et al. (2016) Yao Wu, Christopher DuBois, Alice X. Zheng, and Martin Ester. 2016. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In WSDM. ACM, 153–162.
 Xing et al. (2002) Eric P Xing, Michael I Jordan, and Stuart Russell. 2002. A generalized mean field algorithm for variational inference in exponential families. In UAI. Morgan Kaufmann Publishers Inc., 583–591.
 Zhang et al. (2017) Shuai Zhang, Lina Yao, and Aixin Sun. 2017. Deep Learning based Recommender System: A Survey and New Perspectives. CoRR abs/1707.07435 (2017).
 Zhao and Guo (2017) Feipeng Zhao and Yuhong Guo. 2017. Learning Discriminative Recommendation Systems with Side Information. In IJCAI. 3469–3475.
 Zhao et al. (2016) Feipeng Zhao, Min Xiao, and Yuhong Guo. 2016. Predictive Collaborative Filtering with Side Information. In IJCAI. 2385–2391.
 Zheng et al. (2016) Yin Zheng, Bangsheng Tang, Wenkui Ding, and Hanning Zhou. 2016. A Neural Autoregressive Approach to Collaborative Filtering. In ICML, Vol. 48. JMLR, 764–773.
 Zhuang et al. (2017) Fuzhen Zhuang, Zhiqiang Zhang, Mingda Qian, Chuan Shi, Xing Xie, and Qing He. 2017. Representation learning via Dual-Autoencoder for recommendation. Neural Networks 90 (2017), 83–89.