1. Introduction
With the advance of Internet technology, people can access a vast amount of online content, such as news (Zheng et al., 2018), movies (Diao et al., 2014), and commodities (Zhou et al., 2018). A notorious problem with online platforms is that the volume of items can be overwhelming to users. To alleviate the impact of information overload, recommender systems (RS) have been proposed to search for and recommend a small set of items that meet users' personalized interests.
A traditional recommendation technique is collaborative filtering (CF), which assigns users and items ID-based representation vectors, then models their interactions by a specific operation such as inner product (Wang et al., 2017b; He et al., 2017). However, CF-based methods usually suffer from the sparsity of user-item interactions and the cold-start problem. To address these limitations, researchers usually turn to feature-rich scenarios, where attributes of users and items are used to compensate for the sparsity and improve the performance of recommendation (Cheng et al., 2016; Wang et al., 2018a).

A few recent studies (Yu et al., 2014; Zhang et al., 2016; Zhao et al., 2017; Wang et al., 2018c; Huang et al., 2018; Wang et al., 2018b) have gone a step further than simply using attributes: they point out that attributes are not isolated but linked up with each other, which forms a knowledge graph (KG). Typically, a KG is a directed heterogeneous graph in which nodes correspond to entities (items or item attributes) and edges correspond to relations. Compared with KG-free methods, incorporating a KG into recommendation benefits the results in three ways (Wang et al., 2018b): (1) the rich semantic relatedness among items in a KG can help explore their latent connections and improve the precision of results; (2) the various types of relations in a KG are helpful for extending a user's interests reasonably and increasing the diversity of recommended items; (3) the KG connects a user's historically-liked and recommended items, thereby bringing explainability to recommender systems.
Despite the above benefits, utilizing a KG in RS is rather challenging due to its high dimensionality and heterogeneity. One feasible way is to preprocess the KG by knowledge graph embedding (KGE) methods (Wang et al., 2017a), which map entities and relations to low-dimensional representation vectors (Zhang et al., 2016; Wang et al., 2018c; Huang et al., 2018). However, commonly-used KGE methods focus on modeling rigorous semantic relatedness (e.g., TransE (Bordes et al., 2013) and TransR (Lin et al., 2015) assume $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ for a triple $(h, r, t)$), which makes them more suitable for in-graph applications such as KG completion and link prediction than for recommendation. A more natural and intuitive way is to design a graph algorithm directly to exploit the KG structure (Yu et al., 2014; Zhao et al., 2017; Wang et al., 2018b). For example, PER (Yu et al., 2014) and FMG (Zhao et al., 2017) treat the KG as a heterogeneous information network, and extract meta-path/meta-graph based latent features to represent the connectivity between users and items along different types of relation paths/graphs. However, PER and FMG rely heavily on manually designed meta-paths or meta-graphs, which are hardly optimal in reality. RippleNet (Wang et al., 2018b) is a memory-network-like model that propagates users' potential preferences in the KG and explores their hierarchical interests. But note that the importance of relations is weakly characterized in RippleNet, because the embedding matrix $\mathbf{R}$ of a relation can hardly be trained to capture the sense of importance in the quadratic form $\mathbf{v}^\top \mathbf{R} \mathbf{h}$ ($\mathbf{v}$ and $\mathbf{h}$ are embedding vectors of two entities). In addition, the size of the ripple set may grow unpredictably with the size of the KG, which incurs heavy computation and storage overhead.
In this paper, we investigate the problem of KG-aware recommendation. Our design objective is to automatically capture both high-order structure and semantic information in the KG. Inspired by graph convolutional networks (GCN), which try to generalize convolution to the graph domain (we revisit GCN in related work), we propose Knowledge Graph Convolutional Networks (KGCN) for recommender systems. The key idea of KGCN is to aggregate and incorporate neighborhood information with bias when calculating the representation of a given entity in the KG. Such a design has two advantages: (1) through the operation of neighborhood aggregation, the local proximity structure is successfully captured and stored in each entity; (2) neighbors are weighted by scores dependent on the connecting relation and the specific user, which characterizes both the semantic information of the KG and users' personalized interests in relations. Note that the size of an entity's neighborhood varies and may be prohibitively large in the worst case. Therefore, we sample a fixed-size neighborhood of each node as its receptive field, which makes the cost of KGCN predictable. The definition of neighborhood for a given entity can also be extended hierarchically to multiple hops away to model high-order entity dependencies and capture users' potential long-distance interests.
Empirically, we apply KGCN to three datasets: MovieLens-20M (movie), Book-Crossing (book), and Last.FM (music). The experiment results show that KGCN achieves substantial average AUC gains in movie, book, and music recommendation compared with state-of-the-art baselines.
Our contributions in this paper are summarized as follows:

We propose knowledge graph convolutional networks, an end-to-end framework that explores users' preferences on the knowledge graph for recommender systems. By extending the receptive field of each entity in the KG, KGCN is able to capture users' high-order personalized interests.

We conduct experiments on three real-world recommendation scenarios. The results demonstrate the efficacy of KGCN over state-of-the-art baselines.

We release the code of KGCN and the datasets (knowledge graphs) to researchers for validating the reported results and conducting further research. The code and data are available at https://github.com/hwwang55/KGCN.
2. Related Work
Our method is conceptually inspired by GCN. In general, GCN approaches can be categorized as spectral methods and non-spectral methods. Spectral methods represent graphs and perform convolution in the spectral space. For example, Bruna et al. (Bruna et al., 2014) define the convolution in the Fourier domain and calculate the eigendecomposition of the graph Laplacian, Defferrard et al. (Defferrard et al., 2016) approximate the convolutional filters by a Chebyshev expansion of the graph Laplacian, and Kipf et al. (Kipf and Welling, 2017) propose a convolutional architecture via a localized first-order approximation of spectral graph convolutions. In contrast, non-spectral methods operate on the original graph directly and define convolution for groups of nodes. To handle neighborhoods of varying size and maintain the weight-sharing property of CNN, researchers propose learning a weight matrix for each node degree (Duvenaud et al., 2015), extracting locally connected regions from graphs (Niepert et al., 2016), or sampling a fixed-size set of neighbors as the support size (Hamilton et al., 2017). Our work can be seen as a non-spectral method for a special type of graph (i.e., the knowledge graph).
Our method also connects to PinSage (Ying et al., 2018) and GAT (Velickovic et al., 2018). But note that both PinSage and GAT are designed for homogeneous graphs. The major difference between our work and the literature is that we offer a new perspective for recommender systems with the assistance of a heterogeneous knowledge graph.
3. Knowledge Graph Convolutional Networks
In this section, we introduce the proposed KGCN model. We first formulate the knowledge-graph-aware recommendation problem. Then we present the design of a single layer of KGCN. At last, we introduce the complete learning algorithm for KGCN, as well as its mini-batch implementation.
3.1. Problem Formulation
We formulate the knowledge-graph-aware recommendation problem as follows. In a typical recommendation scenario, we have a set of users $\mathcal{U} = \{u_1, u_2, \dots\}$ and a set of items $\mathcal{V} = \{v_1, v_2, \dots\}$. The user-item interaction matrix $\mathbf{Y}$ is defined according to users' implicit feedback, where $y_{uv} = 1$ indicates that user $u$ engages with item $v$, such as clicking, browsing, or purchasing; otherwise $y_{uv} = 0$. Additionally, we also have a knowledge graph $\mathcal{G}$, which is comprised of entity-relation-entity triples $(h, r, t)$. Here $h \in \mathcal{E}$, $r \in \mathcal{R}$, and $t \in \mathcal{E}$ denote the head, relation, and tail of a knowledge triple, and $\mathcal{E}$ and $\mathcal{R}$ are the set of entities and relations in the knowledge graph, respectively. For example, the triple (A Song of Ice and Fire, book.book.author, George Martin) states the fact that George Martin writes the book "A Song of Ice and Fire". In many recommendation scenarios, an item $v \in \mathcal{V}$ corresponds to an entity $e \in \mathcal{E}$. For example, in book recommendation, the item "A Song of Ice and Fire" also appears in the knowledge graph as an entity with the same name.
Given the user-item interaction matrix $\mathbf{Y}$ as well as the knowledge graph $\mathcal{G}$, we aim to predict whether user $u$ has potential interest in item $v$ with which he has had no interaction before. Our goal is to learn a prediction function $\hat{y}_{uv} = \mathcal{F}(u, v; \Theta)$, where $\hat{y}_{uv}$ denotes the probability that user $u$ will engage with item $v$, and $\Theta$ denotes the model parameters of function $\mathcal{F}$.

3.2. KGCN Layer
KGCN is proposed to capture high-order structural proximity among entities in a knowledge graph. We start by describing a single KGCN layer in this subsection. Consider a candidate pair of user $u$ and item (entity) $v$. We use $\mathcal{N}(v)$ to denote the set of entities directly connected to $v$ (the knowledge graph is treated as undirected), and $r_{e_i, e_j}$ to denote the relation between entities $e_i$ and $e_j$. We also use a function $g: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ (e.g., inner product) to compute the score between a user and a relation:
$\pi_r^u = g(\mathbf{u}, \mathbf{r}),$   (1)
where $\mathbf{u} \in \mathbb{R}^d$ and $\mathbf{r} \in \mathbb{R}^d$ are the representations of user $u$ and relation $r$, respectively, and $d$ is the dimension of representations. In general, $\pi_r^u$ characterizes the importance of relation $r$ to user $u$. For example, a user may have more potential interest in movies that share the same "star" with his historically-liked ones, while another user may be more concerned about the "genre" of movies.
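As a concrete toy example, the score of Eq. (1) with $g$ as inner product can be computed as follows (the embedding values below are made up for illustration, not learned ones):

```python
import numpy as np

def user_relation_score(u, r):
    """pi_r^u = g(u, r) with g as inner product."""
    return float(np.dot(u, r))

u = np.array([1.0, 0.0, 2.0, 1.0])        # user representation (made up)
r_star = np.array([0.5, 1.0, 0.5, 0.0])   # "star" relation (made up)
r_genre = np.array([0.0, 1.0, 0.0, 2.0])  # "genre" relation (made up)

print(user_relation_score(u, r_star))   # 1.5
print(user_relation_score(u, r_genre))  # 2.0 -- this user favors "genre"
```

Here the second score is larger, so for this user the "genre" relation would receive more weight during neighborhood aggregation.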
To characterize the topological proximity structure of item $v$, we compute the linear combination of $v$'s neighborhood:
$\mathbf{v}_{\mathcal{N}(v)}^u = \sum_{e \in \mathcal{N}(v)} \tilde{\pi}_{r_{v,e}}^u \, \mathbf{e},$   (2)
where $\tilde{\pi}_{r_{v,e}}^u$ is the normalized user-relation score
$\tilde{\pi}_{r_{v,e}}^u = \dfrac{\exp(\pi_{r_{v,e}}^u)}{\sum_{e' \in \mathcal{N}(v)} \exp(\pi_{r_{v,e'}}^u)},$   (3)
and $\mathbf{e}$ is the representation of entity $e$. User-relation scores act as personalized filters when computing an entity's neighborhood representation, since we aggregate the neighbors with bias with respect to these user-specific scores.
In a real-world knowledge graph, the size of $\mathcal{N}(v)$ may vary significantly over all entities. To keep the computational pattern of each batch fixed and more efficient, we uniformly sample a fixed-size set of neighbors for each entity instead of using its full neighborhood. Specifically, we compute the neighborhood representation of entity $v$ as $\mathbf{v}_{\mathcal{S}(v)}^u$, where $\mathcal{S}(v) \triangleq \{e \mid e \sim \mathcal{N}(v)\}$ and $|\mathcal{S}(v)| = K$ is a configurable constant. (Technically, $\mathcal{S}(v)$ may contain duplicates if $|\mathcal{N}(v)| < K$.) In KGCN, $\mathcal{S}(v)$ is also called the (single-layer) receptive field of entity $v$, as the final representation of $v$ is sensitive to these locations. Figure 1(a) gives an illustrative example of a two-layer receptive field for a given entity, where $K$ is set as 2.
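The fixed-size sampling step can be sketched as follows (the adjacency list and entity names are hypothetical; the actual implementation samples indices from an adjacency matrix):

```python
import random

def sample_receptive_field(neighbors, K, rng=None):
    """Uniformly sample a fixed-size neighbor set S(v) with |S(v)| = K.
    Sampling is with replacement, so S(v) may contain duplicates when
    the entity has fewer than K neighbors."""
    rng = rng or random.Random(0)
    return [rng.choice(neighbors) for _ in range(K)]

# Hypothetical adjacency for one item entity with only 3 true neighbors:
neighbors_of_v = ["e1", "e2", "e3"]
field = sample_receptive_field(neighbors_of_v, K=4)
print(len(field))  # 4 -- the computational pattern stays fixed
```

Because $|\mathcal{S}(v)|$ is always $K$, every batch has an identical tensor shape, which is what makes the cost of KGCN predictable.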
The final step in a KGCN layer is to aggregate the entity representation $\mathbf{v}$ and its neighborhood representation $\mathbf{v}_{\mathcal{S}(v)}^u$ into a single vector. We implement three types of aggregators $agg: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ in KGCN:

Sum aggregator takes the summation of the two representation vectors, followed by a nonlinear transformation:

$agg_{sum} = \sigma\big(\mathbf{W} \cdot (\mathbf{v} + \mathbf{v}_{\mathcal{S}(v)}^u) + \mathbf{b}\big),$   (4)

where $\mathbf{W}$ and $\mathbf{b}$ are transformation weight and bias, respectively, and $\sigma$ is a nonlinear function such as ReLU.

Concat aggregator (Hamilton et al., 2017) concatenates the two representation vectors before applying the nonlinear transformation:

$agg_{concat} = \sigma\big(\mathbf{W} \cdot \mathrm{concat}(\mathbf{v}, \mathbf{v}_{\mathcal{S}(v)}^u) + \mathbf{b}\big).$   (5)
Neighbor aggregator (Velickovic et al., 2018) directly takes the neighborhood representation of entity $v$ as the output representation:

$agg_{neighbor} = \sigma\big(\mathbf{W} \cdot \mathbf{v}_{\mathcal{S}(v)}^u + \mathbf{b}\big).$   (6)
Aggregation is a key step in KGCN, because the representation of an item is bound up with its neighbors by aggregation. We will evaluate the three aggregators in experiments.
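The three aggregators can be sketched in NumPy as follows (a toy, non-trainable illustration: `W`, `b`, and the vectors are made-up values, and $\sigma$ is ReLU as in Eq. (4)):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def aggregate(v, v_neigh, W, b, kind="sum"):
    """Three KGCN aggregators; W and b are trainable in the real model."""
    if kind == "sum":       # sigma(W (v + v_N) + b)
        return relu(W @ (v + v_neigh) + b)
    if kind == "concat":    # sigma(W [v ; v_N] + b); W has twice the columns
        return relu(W @ np.concatenate([v, v_neigh]) + b)
    if kind == "neighbor":  # sigma(W v_N + b); drops the entity itself
        return relu(W @ v_neigh + b)
    raise ValueError(kind)

d = 3
v, v_neigh = np.ones(d), np.full(d, 0.5)
W_sum = np.eye(d)
W_cat = np.hstack([np.eye(d), np.eye(d)])  # maps R^{2d} -> R^d
b = np.zeros(d)

print(aggregate(v, v_neigh, W_sum, b, "sum"))       # [1.5 1.5 1.5]
print(aggregate(v, v_neigh, W_cat, b, "concat"))    # [1.5 1.5 1.5]
print(aggregate(v, v_neigh, W_sum, b, "neighbor"))  # [0.5 0.5 0.5]
```

Note how the neighbor aggregator discards `v` entirely, which matches the ablation result discussed later where it underperforms on sparse datasets.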
3.3. Learning Algorithm
Through a single KGCN layer, the final representation of an entity is dependent on itself as well as its immediate neighbors, which we name the 1-order entity representation. It is natural to extend KGCN from one layer to multiple layers to explore users' potential interests in a broader and deeper way. The technique is intuitive: propagating the initial representation of each entity (its 0-order representation) to its neighbors leads to 1-order entity representations; we can then repeat this procedure, i.e., further propagate and aggregate 1-order representations to obtain 2-order ones. Generally speaking, the $h$-order representation of an entity is a mixture of the initial representations of itself and its neighbors up to $h$ hops away. This is an important property for KGCN, which we will discuss in the next subsection.
The formal description of the above steps is presented in Algorithm 1. $H$ denotes the maximum depth of the receptive field (or equivalently, the number of aggregation iterations), and a suffix $[h]$ attached to a representation vector denotes $h$-order. For a given user-item pair $(u, v)$ (line 2), we first calculate the receptive field of $v$ in an iterative, layer-by-layer manner (lines 3, 13-19). Then the aggregation is repeated $H$ times (line 5): in iteration $h$, we calculate the neighborhood representation of each entity (line 7), then aggregate it with the entity's own representation to obtain the one to be used at the next iteration (line 8). The final $H$-order entity representation is denoted as $\mathbf{v}^u$ (line 9), which is fed into a function $f: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ together with the user representation $\mathbf{u}$ for predicting the probability:
$\hat{y}_{uv} = f(\mathbf{u}, \mathbf{v}^u).$   (7)
Figure 1(b) illustrates the KGCN algorithm in one iteration, in which the entity representation and the neighborhood representations (green nodes) of a given node are mixed to form its representation for the next iteration (blue node).
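A simplified, single-pair reading of this iterative aggregation might look like the sketch below (NumPy only; the toy KG, the embeddings, and the sum aggregator with identity weights are all assumptions, and unlike Algorithm 1 it recomputes every sampled entity at each hop rather than restricting computation to the receptive field):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kgcn_forward(u, item, ent, rel, nbrs, H, W, b):
    """Return an H-order representation of `item` for user `u`.
    ent/rel map names to embedding vectors; nbrs[e] is the sampled
    receptive field of e as a list of (relation, entity) pairs."""
    h = {e: vec.copy() for e, vec in ent.items()}  # 0-order representations
    for _ in range(H):
        h_next = {}
        for e, edges in nbrs.items():
            # user-relation scores, normalized as in Eq. (3)
            scores = softmax(np.array([u @ rel[r] for r, _ in edges]))
            v_neigh = sum(s * h[t] for s, (_, t) in zip(scores, edges))
            # sum aggregator with ReLU, as in Eq. (4)
            h_next[e] = np.maximum(W @ (h[e] + v_neigh) + b, 0.0)
        h.update(h_next)
    return h[item]

rng = np.random.default_rng(0)
d = 4
ent = {e: rng.normal(size=d) for e in ["v", "e1", "e2"]}
rel = {r: rng.normal(size=d) for r in ["star", "genre"]}
nbrs = {"v": [("star", "e1"), ("genre", "e2")],
        "e1": [("star", "v")], "e2": [("genre", "v")]}
u, W, b = rng.normal(size=d), np.eye(d), np.zeros(d)

v_u = kgcn_forward(u, "v", ent, rel, nbrs, H=2, W=W, b=b)
y_hat = 1.0 / (1.0 + np.exp(-(u @ v_u)))  # f as inner product + sigmoid
```

With $H = 2$, `v_u` mixes the initial representations of `v`, `e1`, and `e2`, and `y_hat` plays the role of $\hat{y}_{uv}$ in Eq. (7).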
Note that Algorithm 1 traverses all possible user-item pairs (line 2). To make computation more efficient, we use a negative sampling strategy during training. The complete loss function is as follows:
$\mathcal{L} = \sum_{u \in \mathcal{U}} \Big( \sum_{v: y_{uv} = 1} \mathcal{J}(y_{uv}, \hat{y}_{uv}) - \sum_{i=1}^{T^u} \mathbb{E}_{v_i \sim P(v_i)} \mathcal{J}(y_{uv_i}, \hat{y}_{uv_i}) \Big) + \lambda \|\mathcal{F}\|_2^2,$   (8)

where $\mathcal{J}$ is cross-entropy loss, $P$ is a negative sampling distribution, and $T^u$ is the number of negative samples for user $u$. In this paper, $T^u = |\{v : y_{uv} = 1\}|$ and $P$ follows a uniform distribution. The last term is the L2-regularizer.
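Sign conventions of Eq. (8) aside, a practical reading of the per-user objective simply accumulates cross-entropy over the observed positives (label 1) and the uniformly sampled negatives (label 0). The helper below is a sketch under that reading, with a made-up scoring function standing in for the model:

```python
import math
import random

def cross_entropy(y, y_hat, eps=1e-12):
    """J(y, y_hat): cross-entropy loss for a single (user, item) pair."""
    return -(y * math.log(y_hat + eps) + (1 - y) * math.log(1 - y_hat + eps))

def user_loss(pos_items, all_items, predict, rng=None):
    """Loss for one user: observed positives plus an equal number of
    uniformly sampled negatives, i.e. T^u = |{v : y_uv = 1}|."""
    rng = rng or random.Random(0)
    loss = sum(cross_entropy(1.0, predict(v)) for v in pos_items)
    unobserved = [v for v in all_items if v not in pos_items]
    negatives = [rng.choice(unobserved) for _ in pos_items]
    return loss + sum(cross_entropy(0.0, predict(v)) for v in negatives)

pos = {"v1", "v2"}
items = ["v1", "v2", "v3", "v4"]
predict = lambda v: 0.9 if v in pos else 0.1  # hypothetical model scores
print(round(user_loss(pos, items, predict), 3))  # 0.421
```

The L2 term of Eq. (8) is omitted here; in the real implementation it is applied to all trainable parameters by the optimizer.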
4. Experiments
In this section, we evaluate KGCN on three real-world scenarios: movie, book, and music recommendation.
4.1. Datasets
We utilize the following three datasets in our experiments for movie, book, and music recommendation, respectively:

MovieLens-20M (https://grouplens.org/datasets/movielens/) is a widely used benchmark dataset in movie recommendation, which consists of approximately 20 million explicit ratings (ranging from 1 to 5) from the MovieLens website.

Book-Crossing (http://www2.informatik.uni-freiburg.de/~cziegler/BX/) contains 1 million ratings (ranging from 0 to 10) of books in the Book-Crossing community.

Last.FM (https://grouplens.org/datasets/hetrec2011/) contains musician listening information from a set of 2 thousand users of the Last.fm online music system.
Since the three datasets contain explicit feedback, we transform them into implicit feedback, where each entry is marked with 1 if the user has rated the item positively, and an unwatched set marked with 0 is sampled for each user. The threshold of positive rating is 4 for MovieLens-20M, while no threshold is set for Book-Crossing and Last.FM due to their sparsity.
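The explicit-to-implicit transformation can be sketched as follows (toy ratings; `threshold` would be 4 for MovieLens-20M and `None` for the other two datasets, and the per-user negative sampling here is a simplified version):

```python
import random

def to_implicit(ratings, all_items, threshold=None, rng=None):
    """Turn explicit ratings {(user, item): score} into implicit feedback.
    Ratings at or above `threshold` become positives (label 1); with
    threshold=None every rated item is positive. For each user, an equally
    sized set of unrated items is sampled as negatives (label 0)."""
    rng = rng or random.Random(0)
    examples, rated = [], {}
    for (u, v), score in ratings.items():
        rated.setdefault(u, set()).add(v)
        if threshold is None or score >= threshold:
            examples.append((u, v, 1))
    for u, seen in rated.items():
        n_pos = sum(1 for uu, _, y in examples if uu == u and y == 1)
        unrated = [v for v in all_items if v not in seen]
        for v in rng.sample(unrated, min(n_pos, len(unrated))):
            examples.append((u, v, 0))
    return examples

data = {("u1", "a"): 5, ("u1", "b"): 2, ("u1", "c"): 4}
examples = to_implicit(data, ["a", "b", "c", "d", "e"], threshold=4)
```

For the toy input, "a" and "c" meet the threshold and become positives, while the negatives are drawn only from items the user never rated.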
We use Microsoft Satori (https://searchengineland.com/library/bing/bing-satori) to construct the knowledge graph for each dataset. We first select a subset of triples from the whole KG with a confidence level greater than 0.9. Given the sub-KG, we collect Satori IDs of all valid movies/books/musicians by matching their names with the tails of triples (head, film.film.name, tail), (head, book.book.title, tail), or (head, type.object.name, tail). Items with multiple matched entities or no matched entity are excluded for simplicity. We then match the item IDs with the heads of all triples and select all well-matched triples from the sub-KG. The basic statistics of the three datasets are presented in Table 1.
Table 1. Basic statistics and hyperparameter settings of the three datasets.

                            MovieLens-20M   Book-Crossing   Last.FM
# users                     138,159         19,676          1,872
# items                     16,954          20,003          3,846
# interactions              13,501,622      172,576         42,346
# entities                  102,569         25,787          9,366
# relations                 32              18              60
# KG triples                499,474         60,787          15,518
K (neighbor sampling size)  4               8               8
d (embedding dimension)     32              64              16
H (receptive field depth)   2               1               1
batch size                  65,536          256             128
Table 2. The results of AUC and F1 in CTR prediction.

Model           MovieLens-20M                Book-Crossing                Last.FM
                AUC            F1            AUC            F1            AUC            F1
SVD             0.963 (1.5%)   0.919 (1.4%)  0.672 (8.9%)   0.635 (7.7%)  0.769 (3.4%)   0.696 (3.5%)
LibFM           0.959 (1.9%)   0.906 (2.8%)  0.691 (6.4%)   0.618 (10.2%) 0.778 (2.3%)   0.710 (1.5%)
LibFM + TransE  0.966 (1.2%)   0.917 (1.6%)  0.698 (5.4%)   0.622 (9.6%)  0.777 (2.4%)   0.709 (1.7%)
PER             0.832 (14.9%)  0.788 (15.5%) 0.617 (16.4%)  0.562 (18.3%) 0.633 (20.5%)  0.596 (17.3%)
CKE             0.924 (5.5%)   0.871 (6.5%)  0.677 (8.3%)   0.611 (11.2%) 0.744 (6.5%)   0.673 (6.7%)
RippleNet       0.968 (1.0%)   0.912 (2.1%)  0.715 (3.1%)   0.650 (5.5%)  0.780 (2.0%)   0.702 (2.6%)
KGCN-sum        0.978          0.932*        0.738          0.688*        0.794 (0.3%)   0.719 (0.3%)
KGCN-concat     0.977 (0.1%)   0.931 (0.1%)  0.734 (0.5%)   0.681 (1.0%)  0.796*         0.721*
KGCN-neighbor   0.977 (0.1%)   0.932*        0.728 (1.4%)   0.679 (1.3%)  0.781 (1.9%)   0.699 (3.1%)
KGCN-avg        0.975 (0.3%)   0.929 (0.3%)  0.722 (2.2%)   0.682 (0.9%)  0.774 (2.8%)   0.692 (4.0%)
4.2. Baselines
We compare the proposed KGCN with the following baselines, in which the first two baselines are KG-free while the rest are all KG-aware methods. Hyperparameter settings for the baselines are introduced in the next subsection.

SVD (Koren, 2008) is a classic CF-based model that uses the inner product of user and item latent vectors to model their interactions.

LibFM (Rendle, 2012) is a feature-based factorization model for CTR scenarios. We concatenate user ID and item ID as the input for LibFM.

LibFM + TransE extends LibFM by attaching an entity representation learned by TransE (Bordes et al., 2013) to each user-item pair.

PER (Yu et al., 2014) treats the KG as a heterogeneous information network and extracts meta-path based features to represent the connectivity between users and items.

CKE (Zhang et al., 2016) combines CF with structural, textual, and visual knowledge in a unified framework for recommendation. We implement CKE as CF plus a structural knowledge module in this paper.

RippleNet (Wang et al., 2018b) is a memory-network-like approach that propagates users' preferences on the KG for recommendation.
4.3. Experiment Setup
In KGCN, we set the functions $g$ and $f$ as inner product, and $\sigma$ as ReLU for non-last-layer aggregators and tanh for the last-layer aggregator. Other hyperparameter settings are provided in Table 1. The hyperparameters are determined by optimizing on a validation set. Each dataset is split into training, evaluation, and test sets; each experiment is repeated several times, and the average performance is reported. We evaluate our method in two experiment scenarios: (1) in click-through rate (CTR) prediction, we apply the trained model to predict each interaction in the test set, and use AUC and F1 to evaluate the performance; (2) in top-$K$ recommendation, we use the trained model to select the $K$ items with the highest predicted click probability for each user in the test set, and use $Recall@K$ to evaluate the recommended sets. All trainable parameters are optimized by the Adam algorithm. The code of KGCN is implemented under Python 3.6, TensorFlow 1.12.0, and NumPy 1.14.3.
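For reference, the AUC used in the CTR evaluation can be computed directly as a rank statistic, i.e., the probability that a randomly drawn positive is scored above a randomly drawn negative, with ties counting one half (this standalone helper is a sketch, not the project's evaluation code):

```python
def auc(labels, scores):
    """AUC as the Wilcoxon-Mann-Whitney rank statistic over
    positive/negative score pairs (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 0, 1, 0], [0.9, 0.3, 0.6, 0.6]))  # 0.875
```

This pairwise form is quadratic in the number of test interactions; production code typically uses a sort-based equivalent, but the value is the same.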
The hyperparameter settings for the baselines are as follows. For SVD, we use the unbiased version (i.e., the predicted rating is modeled as $\mathbf{u}^\top \mathbf{v}$); the dimension and learning rate are tuned per dataset. For LibFM, the dimension and the number of training epochs are likewise tuned, as is the dimension of TransE. For PER, we use manually designed user-item-attribute-item paths as features (i.e., "user-movie-director-movie", "user-movie-genre-movie", and "user-movie-star-movie" for MovieLens-20M; "user-book-author-book" and "user-book-genre-book" for Book-Crossing; "user-musician-date_of_birth-musician" (date of birth is discretized), "user-musician-country-musician", and "user-musician-genre-musician" for Last.FM). For CKE, the dimension and the training weight for the KG part are tuned per dataset, and the learning rates are the same as in SVD. For RippleNet, the hyperparameters are tuned per dataset as well. Other hyperparameters are the same as reported in the original papers or as default in their codes.

4.4. Results
The results of CTR prediction and top-$K$ recommendation are presented in Table 2 and Figure 2, respectively (SVD, LibFM, and other variants of KGCN are not plotted in Figure 2 for clarity). We have the following observations:

In general, we find that the improvements of KGCN on book and music are higher than on movie. This demonstrates that KGCN can well address sparse scenarios, since Book-Crossing and Last.FM are much sparser than MovieLens-20M.

The performance of the KG-free baselines, SVD and LibFM, is actually better than that of the two KG-aware baselines PER and CKE, which indicates that PER and CKE cannot make full use of the KG with manually designed meta-paths and TransR-like regularization.

LibFM + TransE is better than LibFM in most cases, which demonstrates that the introduction of KG is helpful for recommendation in general.

PER performs worst among all baselines, since it is hard to define optimal metapaths in reality.

RippleNet shows strong performance compared with other baselines. Note that RippleNet also uses a multi-hop neighborhood structure, which suggests that capturing proximity information in the KG is essential for recommendation.
The last four rows in Table 2 summarize the performance of the KGCN variants. The first three (sum, concat, neighbor) correspond to the different aggregators introduced in the preceding section, while the last variant, KGCN-avg, is a reduced case of KGCN-sum in which neighborhood representations are directly averaged without user-relation scores (i.e., $\mathbf{v}_{\mathcal{S}(v)}^u = \frac{1}{K} \sum_{e \in \mathcal{S}(v)} \mathbf{e}$ instead of Eq. (2)). Therefore, KGCN-avg is used to examine the efficacy of the "attention mechanism". From the results we find that:

KGCN outperforms all baselines by a significant margin, while the performances of its variants are slightly distinct: KGCN-sum performs best in general, while the performance of KGCN-neighbor shows a clear gap on Book-Crossing and Last.FM. This may be because the neighbor aggregator uses the neighborhood representation only, thus losing useful information from the entity itself.

KGCN-avg performs worse than KGCN-sum, especially on Book-Crossing and Last.FM, where interactions are sparse. This demonstrates that capturing users' personalized preferences and the semantic information of the KG does benefit recommendation.
4.4.1. Impact of neighbor sampling size.
Table 3. AUC of KGCN with different neighbor sampling sizes.

K               2      4      8      16     32     64
MovieLens-20M   0.978  0.979  0.978  0.978  0.977  0.978
Book-Crossing   0.680  0.727  0.736  0.725  0.711  0.723
Last.FM         0.791  0.794  0.795  0.793  0.794  0.792
We vary the neighbor sampling size $K$ to investigate the efficacy of the usage of the KG. From Table 3 we observe that KGCN achieves the best performance when $K = 4$ or $8$. This is because a too small $K$ does not have enough capacity to incorporate neighborhood information, while a too large $K$ is prone to be misled by noise.
4.4.2. Impact of depth of receptive field.
Table 4. AUC of KGCN with different depths of receptive field.

H               1      2      3      4
MovieLens-20M   0.972  0.976  0.974  0.514
Book-Crossing   0.738  0.731  0.684  0.547
Last.FM         0.794  0.723  0.545  0.534
We investigate the influence of the depth of the receptive field in KGCN by varying $H$ from 1 to 4. The results are shown in Table 4 and demonstrate that KGCN is more sensitive to $H$ than to $K$. We observe serious model collapse when $H = 3$ or $4$, as a larger $H$ brings massive noise to the model. This is in accordance with our intuition, since a too-long relation chain makes little sense when inferring inter-item similarities. An $H$ of 1 or 2 is enough for real cases according to the experiment results.
4.4.3. Impact of dimension of embedding.
Table 5. AUC of KGCN with different dimensions of embedding.

d               4      8      16     32     64     128
MovieLens-20M   0.968  0.970  0.975  0.977  0.973  0.972
Book-Crossing   0.709  0.732  0.733  0.735  0.739  0.736
Last.FM         0.789  0.793  0.797  0.793  0.790  0.789
Lastly, we examine the influence of the dimension of embedding $d$ on the performance of KGCN. The results in Table 5 are rather intuitive: increasing $d$ initially boosts the performance, since a larger $d$ can encode more information about users and entities, while a too large $d$ suffers from overfitting.
5. Conclusions and Future Work
This paper proposes knowledge graph convolutional networks for recommender systems. KGCN extends non-spectral GCN approaches to the knowledge graph by aggregating neighborhood information selectively and with bias, which enables it to learn both the structural and semantic information of the KG as well as users' personalized and potential interests. We also implement the proposed method in a mini-batch fashion, which makes it able to operate on large datasets and knowledge graphs. Through extensive experiments on real-world datasets, KGCN is shown to consistently outperform state-of-the-art baselines in movie, book, and music recommendation.
We point out three avenues for future work. (1) In this work, we uniformly sample from the neighbors of an entity to construct its receptive field. Exploring a non-uniform sampler (e.g., importance sampling) is an important direction of future work. (2) This paper (and all the literature discussed) focuses on modeling item-end KGs. An interesting direction of future work is to investigate whether leveraging user-end KGs is useful for improving recommendation performance. (3) Designing an algorithm to well combine the KGs at the two ends is also a promising direction.
References
 Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems. 2787–2795.
 Bruna et al. (2014) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann Lecun. 2014. Spectral networks and locally connected networks on graphs. In the 2nd International Conference on Learning Representations.

 Cheng et al. (2016) Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7–10.
 Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems. 3844–3852.
 Diao et al. (2014) Qiming Diao, Minghui Qiu, Chao-Yuan Wu, Alexander J Smola, Jing Jiang, and Chong Wang. 2014. Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS). In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 193–202.
 Duvenaud et al. (2015) David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán AspuruGuzik, and Ryan P Adams. 2015. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems. 2224–2232.
 Hamilton et al. (2017) Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024–1034.
 He et al. (2017) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web. 173–182.
 Huang et al. (2018) Jin Huang, Wayne Xin Zhao, Hongjian Dou, Ji-Rong Wen, and Edward Y Chang. 2018. Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks. In the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 505–514.
 Kipf and Welling (2017) Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In the 5th International Conference on Learning Representations.
 Koren (2008) Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 426–434.
 Lin et al. (2015) Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion.. In AAAI, Vol. 15. 2181–2187.

 Niepert et al. (2016) Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In International Conference on Machine Learning. 2014–2023.
 Rendle (2012) Steffen Rendle. 2012. Factorization machines with libFM. ACM Transactions on Intelligent Systems and Technology (TIST) 3, 3 (2012), 57.
 Velickovic et al. (2018) Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In Proceedings of the 6th International Conferences on Learning Representations.
 Wang et al. (2017b) Hongwei Wang, Jia Wang, Miao Zhao, Jiannong Cao, and Minyi Guo. 2017b. Joint Topic-Semantic-aware Social Recommendation for Online Voting. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 347–356.
 Wang et al. (2018a) Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, and Qi Liu. 2018a. SHINE: signed heterogeneous information network embedding for sentiment link prediction. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining. ACM, 592–600.
 Wang et al. (2018b) Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2018b. RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM.
 Wang et al. (2018c) Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018c. DKN: Deep Knowledge-Aware Network for News Recommendation. In Proceedings of the 2018 World Wide Web Conference on World Wide Web. 1835–1844.
 Wang et al. (2017a) Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017a. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29, 12 (2017), 2724–2743.
 Ying et al. (2018) Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for WebScale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
 Yu et al. (2014) Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. 2014. Personalized entity recommendation: A heterogeneous information network approach. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. ACM, 283–292.
 Zhang et al. (2016) Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 353–362.
 Zhao et al. (2017) Huan Zhao, Quanming Yao, Jianda Li, Yangqiu Song, and Dik Lun Lee. 2017. Meta-graph based recommendation fusion over heterogeneous information networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 635–644.

 Zheng et al. (2018) Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas Jing Yuan, Xing Xie, and Zhenhui Li. 2018. DRN: A Deep Reinforcement Learning Framework for News Recommendation. In Proceedings of the 2018 World Wide Web Conference on World Wide Web. 167–176.
 Zhou et al. (2018) Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1059–1068.