Ripple Network: Propagating User Preferences on the Knowledge Graph for Recommender Systems

To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the surface of water, Ripple Network stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multiple "ripples" activated by a user's historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments on real-world datasets, we demonstrate that Ripple Network achieves substantial gains in a variety of scenarios, including movie, book and news recommendation, over several state-of-the-art baselines.

1. Introduction

The explosive growth of online content and services has provided overwhelming choices for users, such as news, movies, music, restaurants, and books. Recommender systems (RS) intend to address the information explosion by finding a small set of items for users to meet their personalized interests. Among recommendation strategies, collaborative filtering (CF), which considers users’ historical interactions and makes recommendations based on their potential common preferences, has achieved great success (Koren et al., 2009). However, CF-based methods usually suffer from the sparsity of user-item interactions and the cold start problem. To address these limitations, researchers have proposed incorporating side information into CF, such as social networks (Jamali and Ester, 2010), user/item attributes (Wang et al., 2018b), images (Zhang et al., 2016) and contexts (Sun et al., 2017).

Among various types of side information, a knowledge graph (KG) usually contains much richer facts and connections about items. A KG is a type of directed heterogeneous graph in which nodes correspond to entities and edges correspond to relations. Recently, researchers have proposed several academic KGs, such as NELL (http://rtw.ml.cmu.edu/rtw/) and DBpedia (http://wiki.dbpedia.org/), and commercial KGs, such as Google Knowledge Graph (https://www.google.com/intl/bn/insidesearch/features/search/knowledge.html) and Microsoft Satori (https://searchengineland.com/library/bing/bing-satori). These knowledge graphs have been successfully applied in many tasks such as KG completion (Lin et al., 2015), question answering (Dong et al., 2015), word embedding (Xu et al., 2014), and text classification (Wang et al., 2017b).

Figure 1. Illustration of knowledge-graph-enhanced movie recommender systems. The knowledge graph provides rich facts and connections among items, which are useful for improving the precision, diversity, and explainability of recommended results.

Inspired by the success of applying KGs in a wide variety of tasks, researchers have also tried to utilize KGs to improve the performance of recommender systems. As shown in Figure 1, a KG can benefit recommendation in three ways: (1) a KG introduces semantic relatedness among items, which can help find their latent connections and improve the precision of recommended items; (2) a KG contains relations of various types, which helps extend a user's interests reasonably and increases the diversity of recommended items; (3) a KG connects a user's historical records with the recommended ones, thereby bringing explainability to recommender systems. In general, existing KG-aware recommendation methods can be classified into two categories:

The first category is embedding-based methods (Wang et al., 2018c; Zhang et al., 2016; Wang et al., 2018b), which pre-process a KG with knowledge graph embedding (KGE) (Wang et al., 2017a) algorithms and incorporate the learned entity embeddings into a recommendation framework. For example, Deep Knowledge-aware Network (DKN) (Wang et al., 2018c) treats entity embeddings and word embeddings as different channels, then designs a CNN framework to combine them for news recommendation. Collaborative Knowledge base Embedding (CKE) (Zhang et al., 2016) combines a CF module with the knowledge embedding, text embedding, and image embedding of items in a unified Bayesian framework. Signed Heterogeneous Information Network Embedding (SHINE) (Wang et al., 2018b) designs deep autoencoders to embed sentiment networks, social networks, and profile (knowledge) networks for celebrity recommendation. Embedding-based methods show high flexibility in utilizing KGs to assist recommender systems, but the KGE algorithms they adopt are usually better suited to in-graph applications such as link prediction than to recommendation (Wang et al., 2017a); as a result, the learned entity embeddings are less intuitive and less effective at characterizing inter-item relations.

The second category is path-based methods (Yu et al., 2014; Zhao et al., 2017), which explore the various patterns of connections among items in the KG to provide additional guidance for recommendation. For example, Personalized Entity Recommendation (PER) (Yu et al., 2014) and Meta-Graph Based Recommendation (Zhao et al., 2017) treat the KG as a heterogeneous information network (HIN), and extract meta-path/meta-graph based latent features to represent the connectivity between users and items along different types of relation paths/graphs. Path-based methods make use of the KG in a more natural and intuitive way, but they rely heavily on manually designed meta-paths, which are hard to optimize in practice. Another concern is that it is impossible to design hand-crafted meta-paths in certain scenarios (e.g., news recommendation) where entities and relations are not within one domain.

To address the limitations of existing methods, we propose RippleNet, an end-to-end framework for knowledge-graph-aware recommendation. RippleNet is designed for click-through rate (CTR) prediction: it takes a user-item pair as input and outputs the probability of the user engaging with (e.g., clicking, browsing) the item. The key idea behind RippleNet is preference propagation: for each user, RippleNet treats his historical interests as a seed set in the KG, then extends the user's interests iteratively along KG links to discover his hierarchical potential interests with respect to a candidate item. We analogize preference propagation to actual ripples created by raindrops propagating on the water, in which multiple "ripples" superpose to form a resultant preference distribution of the user over the knowledge graph. The major difference between RippleNet and the existing literature is that RippleNet combines the advantages of the two types of methods mentioned above: (1) RippleNet naturally incorporates KGE methods into recommendation through preference propagation; (2) RippleNet can automatically discover possible paths from an item in a user's history to a candidate item, without any hand-crafted design.

Empirically, we apply RippleNet to three real-world scenarios of movie, book, and news recommendation. The experiment results show that RippleNet achieves AUC gains of 2.0% to 40.6%, 2.5% to 17.4%, and 2.6% to 22.4% in movie, book, and news recommendation, respectively, compared with state-of-the-art baselines for recommendation. We also find that RippleNet provides a new perspective on the explainability of recommended results in terms of the knowledge graph.

In summary, our contributions in this paper are as follows:

  • To the best of our knowledge, this is the first work to combine embedding-based and path-based methods in KG-aware recommendation.

  • We propose RippleNet, an end-to-end framework utilizing KG to assist recommender systems. RippleNet automatically discovers users’ hierarchical potential interests by iteratively propagating users’ preferences in the KG.

  • We conduct experiments on three real-world recommendation scenarios, and the results prove the efficacy of RippleNet over several state-of-the-art baselines.

2. Problem Formulation

The knowledge-graph-aware recommendation problem is formulated as follows. In a typical recommender system, let $\mathcal{U} = \{u_1, u_2, \dots\}$ and $\mathcal{V} = \{v_1, v_2, \dots\}$ denote the sets of users and items, respectively. The user-item interaction matrix $\mathbf{Y} = \{y_{uv} \mid u \in \mathcal{U}, v \in \mathcal{V}\}$ is defined according to users' implicit feedback, where

$y_{uv} = \begin{cases} 1, & \text{if interaction } (u, v) \text{ is observed;} \\ 0, & \text{otherwise.} \end{cases} \qquad (1)$

A value of 1 for $y_{uv}$ indicates there is an implicit interaction between user $u$ and item $v$, such as clicking, watching, or browsing behavior. In addition to the interaction matrix $\mathbf{Y}$, we also have a knowledge graph $\mathcal{G}$ available, which consists of massive entity-relation-entity triples $(h, r, t)$. Here $h \in \mathcal{E}$, $r \in \mathcal{R}$, and $t \in \mathcal{E}$ denote the head, relation, and tail of a knowledge triple, and $\mathcal{E}$ and $\mathcal{R}$ denote the set of entities and relations in the KG, respectively. For example, the triple (Jurassic Park, film.film.director, Steven Spielberg) states the fact that Steven Spielberg is the director of the film "Jurassic Park". In many recommendation scenarios, an item $v \in \mathcal{V}$ may associate with one or more entities in $\mathcal{E}$. For example, the movie "Jurassic Park" is linked with its namesake entity in the KG, while news with the title "France's Baby Panda Makes Public Debut" is linked with the entities "France" and "panda".

Given the interaction matrix $\mathbf{Y}$ as well as the knowledge graph $\mathcal{G}$, we aim to predict whether user $u$ has potential interest in item $v$ with which he has had no interaction before. Our goal is to learn a prediction function $\hat{y}_{uv} = \mathcal{F}(u, v; \Theta)$, where $\hat{y}_{uv}$ denotes the probability that user $u$ will click item $v$, and $\Theta$ denotes the model parameters of function $\mathcal{F}$.
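
For concreteness, the inputs above can be sketched with ordinary Python containers. Everything below (names, toy triples) is our illustration, not the paper's released code:

```python
# Observed (user, item) pairs form the implicit-feedback matrix Y; a sparse
# set-of-pairs representation is usually preferable to a dense matrix.
interactions = {(0, 12), (0, 45), (1, 12), (2, 7)}

def y(u, v):
    """Entry y_uv of the interaction matrix (Eq. 1)."""
    return 1 if (u, v) in interactions else 0

# The knowledge graph G is a collection of (head, relation, tail) triples.
kg = [
    ("Jurassic Park", "film.film.director", "Steven Spielberg"),
    ("Jurassic Park", "film.film.genre", "Adventure"),
]

assert y(0, 12) == 1 and y(0, 7) == 0
```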

3. RippleNet

In this section, we present the proposed RippleNet in detail. We also discuss the model and its connections to related work.

3.1. Framework

Figure 2. The overall framework of RippleNet. It takes a user and an item as input, and outputs the predicted probability that the user will click the item. The KGs in the upper part illustrate the corresponding ripple sets activated by the user's click history.

The framework of RippleNet is illustrated in Figure 2. RippleNet takes a user $u$ and an item $v$ as input, and outputs the predicted probability that user $u$ will click item $v$. For the input user $u$, his historical set of interests $\mathcal{V}_u$ is treated as seeds in the KG, then extended along KG links to form multiple ripple sets $\mathcal{S}_u^k$ ($k = 1, 2, \dots, H$). A ripple set $\mathcal{S}_u^k$ is the set of knowledge triples that are $k$ hop(s) away from the seed set $\mathcal{V}_u$. These ripple sets are used to interact with the item embedding (the yellow block) iteratively to obtain the responses of user $u$ with respect to item $v$ (the green blocks), which are then combined to form the final user embedding (the grey block). Lastly, we use the embeddings of user $u$ and item $v$ together to compute the predicted probability $\hat{y}_{uv}$.

3.2. Ripple Set

A knowledge graph usually contains rich facts and connections among entities. For example, as illustrated in Figure 3, the film "Forrest Gump" is linked with "Robert Zemeckis" (director), "Tom Hanks" (star), "U.S." (country), and "Drama" (genre), while "Tom Hanks" is further linked with the films "The Terminal" and "Cast Away", in which he also starred. These complicated connections in the KG provide a deep and latent perspective for exploring user preferences. For example, if a user has ever watched "Forrest Gump", he may possibly become a fan of Tom Hanks and be interested in "The Terminal" or "Cast Away". To characterize users' hierarchically extended preferences in terms of the KG, in RippleNet we recursively define the set of $k$-hop relevant entities for user $u$ as follows:

Definition 1 (relevant entity).

Given interaction matrix $\mathbf{Y}$ and knowledge graph $\mathcal{G}$, the set of $k$-hop relevant entities for user $u$ is defined as

$\mathcal{E}_u^k = \{t \mid (h, r, t) \in \mathcal{G} \text{ and } h \in \mathcal{E}_u^{k-1}\}, \quad k = 1, 2, \dots, H, \qquad (2)$

where $\mathcal{E}_u^0 = \mathcal{V}_u = \{v \mid y_{uv} = 1\}$ is the set of the user's clicked items in the past, which can be seen as the seed set of user $u$ in the KG.

Relevant entities can be regarded as natural extensions of a user’s historical interests with respect to the KG. Given the definition of relevant entities, we then define the -hop ripple set of user as follows:

Definition 2 (ripple set).

The $k$-hop ripple set of user $u$ is defined as the set of knowledge triples starting from $\mathcal{E}_u^{k-1}$:

$\mathcal{S}_u^k = \{(h, r, t) \mid (h, r, t) \in \mathcal{G} \text{ and } h \in \mathcal{E}_u^{k-1}\}, \quad k = 1, 2, \dots, H. \qquad (3)$

The word "ripple" has two meanings: (1) Analogous to real ripples created by multiple raindrops, a user's potential interest in entities is activated by his historical preferences, then propagates along the links in the KG layer by layer, from near to distant. We visualize the analogy with the concentric circles illustrated in Figure 3. (2) The strength of a user's potential preferences in ripple sets weakens with increasing hop number $k$, which is similar to the gradually attenuated amplitude of real ripples. The fading blue in Figure 3 shows the decreasing relatedness between the center and the surrounding entities.

One concern about ripple sets is that their size may grow too large as the hop number $k$ increases. To address this concern, note that: (1) A large number of entities in a real KG are sink entities, meaning they have only incoming links and no outgoing links, such as "2004" and "PG-13" in Figure 3. (2) In specific recommendation scenarios such as movie or book recommendation, relations can be limited to scenario-related categories to reduce the size of ripple sets and improve the relevance among entities. For example, in Figure 3, all relations are movie-related and contain the word "film" in their names. (3) The maximal hop number $H$ is usually not too large in practice, since entities that are too distant from a user's history may bring more noise than positive signal. We will discuss the choice of $H$ in the experiments section. (4) In RippleNet, we can sample a fixed-size set of neighbors instead of using the full ripple set to further reduce the computation overhead, as in the sketch below. Designing such samplers is an important direction for future work, especially non-uniform samplers that better capture a user's hierarchical potential interests.
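
A minimal sketch of ripple-set construction following Definitions 1 and 2, including the fixed-size sampling from point (4); the function and variable names are ours, and the uniform sampler is only one possible choice:

```python
import random
from collections import defaultdict

def build_ripple_sets(kg_triples, seed_items, n_hops, max_size, seed=0):
    """Return the ripple sets S_u^1 .. S_u^H for one user (Definition 2).

    kg_triples: iterable of (head, relation, tail) triples
    seed_items: the user's clicked items V_u, i.e. E_u^0
    max_size:   cap on each hop's ripple set (uniform sampling)
    """
    rng = random.Random(seed)
    by_head = defaultdict(list)          # index triples by head entity
    for h, r, t in kg_triples:
        by_head[h].append((h, r, t))

    ripple_sets = []
    frontier = set(seed_items)           # E_u^{k-1}
    for _ in range(n_hops):
        triples = [tr for h in frontier for tr in by_head[h]]
        if len(triples) > max_size:      # fixed-size sampling bounds the cost
            triples = rng.sample(triples, max_size)
        ripple_sets.append(triples)
        frontier = {t for _, _, t in triples}   # tails become E_u^k
    return ripple_sets
```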

Figure 3. Illustration of the ripple sets of "Forrest Gump" in a KG of movies. The concentric circles denote the ripple sets with different hops. The fading blue indicates decreasing relatedness between the center and surrounding entities. Note that ripple sets of different hops are not necessarily disjoint in practice.

3.3. Preference Propagation

Traditional CF-based methods and their variants (Koren, 2008; Wang et al., 2017c) learn latent representations of users and items, then predict unknown ratings by applying a specific function, such as the inner product, to their representations. In RippleNet, to model the interactions between users and items in a finer-grained way, we propose a preference propagation technique to explore a user's potential interests in his ripple sets.

As shown in Figure 2, each item $v$ is associated with an item embedding $\mathbf{v} \in \mathbb{R}^d$, where $d$ is the dimension of embeddings. The item embedding can incorporate the one-hot ID (Koren, 2008), attributes (Wang et al., 2018b), bag-of-words (Wang et al., 2018c), or context information (Sun et al., 2017) of an item, depending on the application scenario. Given the item embedding $\mathbf{v}$ and the 1-hop ripple set $\mathcal{S}_u^1$ of user $u$, each triple $(h_i, r_i, t_i)$ in $\mathcal{S}_u^1$ is assigned a relevance probability by comparing item $v$ to the head $h_i$ and the relation $r_i$ in this triple:

$p_i = \operatorname{softmax}\left(\mathbf{v}^\top \mathbf{R}_i \mathbf{h}_i\right) = \frac{\exp\left(\mathbf{v}^\top \mathbf{R}_i \mathbf{h}_i\right)}{\sum_{(h, r, t) \in \mathcal{S}_u^1} \exp\left(\mathbf{v}^\top \mathbf{R} \mathbf{h}\right)}, \qquad (4)$

where $\mathbf{R}_i \in \mathbb{R}^{d \times d}$ and $\mathbf{h}_i \in \mathbb{R}^d$ are the embeddings of relation $r_i$ and head $h_i$, respectively. The relevance probability $p_i$ can be regarded as the similarity of item $v$ and entity $h_i$ measured in the space of relation $r_i$. Note that it is necessary to take the relation embedding matrix $\mathbf{R}_i$ into consideration when calculating the relevance of item $v$ and entity $h_i$, since an item-entity pair may have different similarities when measured by different relations. For example, "Forrest Gump" and "Cast Away" are highly similar when considering their directors or stars, but have less in common if measured by genre or writer.

After obtaining the relevance probabilities, we take the sum of the tails in $\mathcal{S}_u^1$ weighted by the corresponding relevance probabilities, and the vector $\mathbf{o}_u^1$ is returned:

$\mathbf{o}_u^1 = \sum_{(h_i, r_i, t_i) \in \mathcal{S}_u^1} p_i \mathbf{t}_i, \qquad (5)$

where $\mathbf{t}_i \in \mathbb{R}^d$ is the embedding of tail $t_i$. Vector $\mathbf{o}_u^1$ can be seen as the 1-order response of user $u$'s click history with respect to item $v$. This is similar to item-based CF methods (Koren, 2008; Wang et al., 2018c), in which a user is represented by his related items rather than an independent feature vector to reduce the number of parameters. Through the operations in Eq. (4) and Eq. (5), a user's interests are transferred from his history set $\mathcal{V}_u$ to the set of his 1-hop relevant entities $\mathcal{E}_u^1$ along the links in $\mathcal{S}_u^1$, which is called preference propagation in RippleNet.

Note that by replacing $\mathbf{v}$ with $\mathbf{o}_u^1$ in Eq. (4), we can repeat the procedure of preference propagation to obtain user $u$'s 2-order response $\mathbf{o}_u^2$, and the procedure can be performed iteratively on user $u$'s ripple sets $\mathcal{S}_u^k$ for $k = 1, \dots, H$. Therefore, a user's preference is propagated up to $H$ hops away from his click history, and we observe multiple responses of user $u$ with different orders: $\mathbf{o}_u^1, \mathbf{o}_u^2, \dots, \mathbf{o}_u^H$. The embedding of user $u$ with respect to item $v$ is calculated by combining the responses of all orders:

$\mathbf{u} = \mathbf{o}_u^1 + \mathbf{o}_u^2 + \dots + \mathbf{o}_u^H. \qquad (6)$

Note that although the last-hop response $\mathbf{o}_u^H$ theoretically contains all the information from previous hops, it is still necessary to incorporate the responses $\mathbf{o}_u^k$ of small hops $k$ when calculating the user embedding, since they may be diluted in $\mathbf{o}_u^H$. Finally, the user embedding and item embedding are combined to output the predicted clicking probability:

$\hat{y}_{uv} = \sigma(\mathbf{u}^\top \mathbf{v}), \qquad (7)$

where $\sigma(x) = \frac{1}{1 + \exp(-x)}$ is the sigmoid function.
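
The whole forward pass of Eqs. (4)-(7) reduces to a few dense operations. Below is a numpy sketch, assuming each hop's triples have already been looked up as embedding arrays; all names are illustrative and details such as batching are omitted:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict(item_emb, hop_embs):
    """Predicted click probability of Eqs. (4)-(7).

    item_emb: item embedding v, shape (d,)
    hop_embs: one (H, R, T) tuple per hop, where for the m triples of the hop
              H has shape (m, d), R has shape (m, d, d), T has shape (m, d)
    """
    v = item_emb
    responses = []
    for H, R, T in hop_embs:
        scores = np.einsum('d,mde,me->m', v, R, H)  # v^T R_i h_i per triple
        p = softmax(scores)                 # relevance probabilities, Eq. (4)
        o = p @ T                           # weighted sum of tails, Eq. (5)
        responses.append(o)
        v = o                # replace v with o^k for the next hop's Eq. (4)
    u = np.sum(responses, axis=0)           # user embedding, Eq. (6)
    return 1.0 / (1.0 + np.exp(-u @ item_emb))   # sigma(u^T v), Eq. (7)
```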

3.4. Learning Algorithm

In RippleNet, we intend to maximize the following posterior probability of model parameters $\Theta$ after observing the knowledge graph $\mathcal{G}$ and the matrix of implicit feedback $\mathbf{Y}$:

$\max_{\Theta} p(\Theta \mid \mathcal{G}, \mathbf{Y}), \qquad (8)$

where $\Theta$ includes the embeddings of all entities, relations, and items. This is equivalent to maximizing

$p(\Theta \mid \mathcal{G}, \mathbf{Y}) = \frac{p(\Theta, \mathcal{G}, \mathbf{Y})}{p(\mathcal{G}, \mathbf{Y})} \propto p(\Theta) \cdot p(\mathcal{G} \mid \Theta) \cdot p(\mathbf{Y} \mid \Theta, \mathcal{G}) \qquad (9)$

according to Bayes' theorem. In Eq. (9), the first term $p(\Theta)$ measures the prior probability of model parameters. Following (Zhang et al., 2016), we set $p(\Theta)$ as a Gaussian distribution with zero mean and a diagonal covariance matrix:

$p(\Theta) = \mathcal{N}\left(\mathbf{0}, \lambda_1^{-1} \mathbf{I}\right). \qquad (10)$

The second term in Eq. (9) is the likelihood function of the observed knowledge graph $\mathcal{G}$ given $\Theta$. Recently, researchers have proposed a great many knowledge graph embedding methods, including translational distance models (Bordes et al., 2013; Lin et al., 2015) and semantic matching models (Nickel et al., 2016; Liu et al., 2017); we will continue the discussion of KGE methods in Section 3.6.3. In RippleNet, we use a three-way tensor factorization method to define the likelihood function for KGE:

$p(\mathcal{G} \mid \Theta) = \prod_{(h, r, t) \in \mathcal{E} \times \mathcal{R} \times \mathcal{E}} \mathcal{N}\left(I_{h,r,t} - \mathbf{h}^\top \mathbf{R} \mathbf{t},\ \lambda_2^{-1}\right), \qquad (11)$

where the indicator $I_{h,r,t}$ equals $1$ if $(h, r, t) \in \mathcal{G}$ and $0$ otherwise. Based on the definition in Eq. (11), the scoring functions of entity-entity pairs in KGE and item-entity pairs in preference propagation can thus be unified under the same calculation model. The last term in Eq. (9) is the likelihood function of the observed implicit feedback given $\Theta$ and the KG, which is defined as a product of Bernoulli distributions:

$p(\mathbf{Y} \mid \Theta, \mathcal{G}) = \prod_{(u, v) \in \mathbf{Y}} \sigma(\mathbf{u}^\top \mathbf{v})^{y_{uv}} \cdot \left(1 - \sigma(\mathbf{u}^\top \mathbf{v})\right)^{1 - y_{uv}}, \qquad (12)$

based on Eqs. (2)-(7).

Taking the negative logarithm of Eq. (9), we have the following loss function for RippleNet:

$\min_{\Theta} \mathcal{L} = \sum_{(u,v) \in \mathbf{Y}} -\left( y_{uv} \log \sigma(\mathbf{u}^\top \mathbf{v}) + (1 - y_{uv}) \log \left(1 - \sigma(\mathbf{u}^\top \mathbf{v})\right) \right) + \frac{\lambda_2}{2} \sum_{r \in \mathcal{R}} \left\| \mathbf{I}_r - \mathbf{E}^\top \mathbf{R} \mathbf{E} \right\|_2^2 + \frac{\lambda_1}{2} \left( \|\mathbf{V}\|_2^2 + \|\mathbf{E}\|_2^2 + \sum_{r \in \mathcal{R}} \|\mathbf{R}\|_2^2 \right), \qquad (13)$

where $\mathbf{V}$ and $\mathbf{E}$ are the embedding matrices for all items and entities, respectively, $\mathbf{I}_r$ is the slice of the indicator tensor $\mathbf{I}$ in the KG for relation $r$, and $\mathbf{R}$ is the embedding matrix of relation $r$. In Eq. (13), the first term measures the cross-entropy loss between the ground truth of interactions $\mathbf{Y}$ and the values predicted by RippleNet, the second term measures the squared error between the ground truth of the KG $\mathbf{I}_r$ and the reconstructed indicator matrix $\mathbf{E}^\top \mathbf{R} \mathbf{E}$, and the third term is the regularizer for preventing over-fitting.
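
A direct transcription of the loss in Eq. (13) for one relation slice might look as follows (a numpy sketch with illustrative names; we store entities as rows of E, so the reconstruction reads E R Eᵀ rather than Eᵀ R E):

```python
import numpy as np

def ripplenet_loss(y_true, y_pred, I_r, E, R, V, lam1, lam2):
    """Negative log-posterior of Eq. (13), restricted to one relation r.

    y_true, y_pred: labels and predicted probabilities over a minibatch
    I_r: 0/1 indicator matrix of relation r over entities, shape (n, n)
    E:   entity embeddings (one row per entity), shape (n, d)
    R:   embedding matrix of relation r, shape (d, d)
    V:   item embeddings
    """
    eps = 1e-12
    # Cross-entropy between observed interactions and predictions.
    ce = -np.sum(y_true * np.log(y_pred + eps)
                 + (1 - y_true) * np.log(1 - y_pred + eps))
    # Squared reconstruction error of the KG indicator (rows are entities,
    # hence E R E^T here).
    kge = lam2 / 2 * np.sum((I_r - E @ R @ E.T) ** 2)
    # L2 regularization on all embeddings.
    reg = lam1 / 2 * (np.sum(V ** 2) + np.sum(E ** 2) + np.sum(R ** 2))
    return ce + kge + reg
```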

0:  Input: Interaction matrix $\mathbf{Y}$, knowledge graph $\mathcal{G}$
0:  Output: Prediction function $\mathcal{F}(u, v; \Theta)$
1:  Initialize all parameters;
2:  Calculate ripple sets $\{\mathcal{S}_u^k\}_{k=1}^H$ for each user $u$;
3:  for number of training iterations do
4:     Sample a minibatch of positive and negative interactions from $\mathbf{Y}$;
5:     Sample a minibatch of true and false triples from $\mathcal{G}$;
6:     Calculate the gradients of the loss $\mathcal{L}$ with respect to $\mathbf{V}$, $\mathbf{E}$, and $\{\mathbf{R}\}_{r \in \mathcal{R}}$ on the minibatch by back-propagation according to Eqs. (4)-(13);
7:     Update $\mathbf{V}$, $\mathbf{E}$, and $\{\mathbf{R}\}_{r \in \mathcal{R}}$ by gradient descent with learning rate $\eta$;
8:  end for
9:  return $\mathcal{F}(u, v; \Theta)$
Algorithm 1 Learning algorithm for RippleNet

It is intractable to solve the above objective directly; therefore, we employ a stochastic gradient descent (SGD) algorithm to iteratively optimize the loss function. The learning algorithm of RippleNet is presented in Algorithm 1. In each training iteration, to make the computation more efficient, we randomly sample a minibatch of positive/negative interactions from $\mathbf{Y}$ and true/false triples from $\mathcal{G}$ following the negative sampling strategy in (Mikolov et al., 2013). Then we calculate the gradients of the loss $\mathcal{L}$ with respect to the model parameters $\Theta$, and update all parameters by back-propagation based on the sampled minibatch. We will discuss the choice of hyper-parameters in the experiments section.

3.5. Discussion

3.5.1. Explainability

Explainable recommender systems (Tintarev and Masthoff, 2007) aim to reveal why a user might like a particular item, which helps improve users' acceptance of and satisfaction with recommendations and increases their trust in RS. The explanations are usually based on community tags (Vig et al., 2009), social networks (Sharma and Cosley, 2013), aspects (Bauman et al., 2017), and phrase sentiment (Zhang et al., 2014). Since RippleNet explores users' interests based on the KG, it provides a new point of view on explainability by tracking the paths from a user's history to an item with high relevance probability (Eq. (4)) in the KG. For example, a user's interest in the film "Back to the Future" might be explained by the path "Forrest Gump → Robert Zemeckis → Back to the Future", if "Back to the Future" has high relevance probability with "Forrest Gump" and "Robert Zemeckis" in the user's 1-hop and 2-hop ripple sets, respectively. Note that, different from path-based methods (Yu et al., 2014; Zhao et al., 2017) where the path patterns are manually designed, RippleNet automatically discovers possible explanation paths according to relevance probability. We will further present a visualized example in the experiments section to intuitively demonstrate the explainability of RippleNet.
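
As a rough illustration of such path tracking, one could keep the highest-relevance triple of each hop; this greedy heuristic is our own simplification, not the exact procedure used in the paper:

```python
def explanation_path(ripple_sets, probs_per_hop, threshold=0.5):
    """Greedily keep the strongest triple of each hop as a candidate path.

    ripple_sets:   S_u^1 .. S_u^H, each a list of (head, relation, tail) triples
    probs_per_hop: relevance probabilities of Eq. (4), aligned with the triples
    """
    path = []
    for triples, probs in zip(ripple_sets, probs_per_hop):
        best = max(range(len(triples)), key=lambda i: probs[i])
        if probs[best] >= threshold:
            path.append(triples[best])
    return path  # e.g. [("Forrest Gump", "directed_by", "Robert Zemeckis"), ...]
```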

3.5.2. Ripple Superposition

A common phenomenon in RippleNet is that a user's ripple sets may be large, which inevitably dilutes his potential interests in preference propagation. However, we observe that the relevant entities of different items in a user's click history often overlap considerably. In other words, an entity can be reached by multiple paths in the KG starting from a user's click history. For example, "Saving Private Ryan" is connected to a user who has watched "The Terminal", "Jurassic Park", and "Braveheart" through the actor "Tom Hanks", the director "Steven Spielberg", and the genre "War", respectively. These parallel paths greatly increase a user's interest in the overlapping entities. We refer to this case as ripple superposition, as it is analogous to the interference phenomenon in physics, in which two waves superpose to form a resultant wave of greater amplitude in particular areas. The phenomenon of ripple superposition is illustrated in the second KG in Figure 2, where the darker red around the two lower-middle entities indicates a higher strength of the user's possible interests. We will also discuss ripple superposition in the experiments section.

3.6. Links to Existing Work

Here we continue our discussion of related work and make comparisons with existing techniques in a broader scope.

3.6.1. Attention Mechanism

The attention mechanism was originally proposed for image classification (Mnih et al., 2014) and machine translation (Bahdanau et al., 2014); it aims to automatically learn where to find the most relevant part of the input while performing the task. The idea was soon transplanted to recommender systems (Wang et al., 2017d; Chen et al., 2017; Seo et al., 2017; Zhou et al., 2017; Wang et al., 2018c). For example, DADM (Wang et al., 2017d) considers factors of specialty and date when assigning attention values to articles for recommendation; D-Attn (Seo et al., 2017) proposes an interpretable, dual attention-based CNN model that combines review text and ratings for product rating prediction; DKN (Wang et al., 2018c) uses an attention network to calculate the weight between a user's clicked item and a candidate item to dynamically aggregate the user's historical interests. RippleNet can be viewed as a special case of attention, in which tails are averaged with weights determined by the similarity between their associated heads and relations and a certain item. The difference between our work and the literature is that RippleNet designs a multi-level attention module based on knowledge triples for preference propagation.

3.6.2. Memory Networks

Memory networks (Weston et al., 2014; Sukhbaatar et al., 2015; Miller et al., 2016) are recurrent attention models that utilize an external memory module for question answering and language modeling. The iterative reading operations on the external memory enable memory networks to extract long-distance dependencies in texts. Researchers have also proposed using memory networks in other tasks such as sentiment classification (Tai et al., 2015; Li et al., 2017) and recommendation (Huang et al., 2017; Chen et al., 2018). Note that these works usually focus on entry-level or sentence-level memories, while our work addresses entity-level connections in the KG, which is more fine-grained and intuitive when performing multi-hop iterations. In addition, our work also incorporates a KGE term as a regularizer for more stable and effective learning.

3.6.3. Knowledge Graph Embedding

RippleNet also connects to a large body of work on KGE methods (Bordes et al., 2013; Wang et al., 2014; Ji et al., 2015; Lin et al., 2015; Wang et al., 2018a; Nickel et al., 2016; Trouillon et al., 2016; Yang et al., 2015). KGE intends to embed entities and relations in a KG into continuous vector spaces while preserving its inherent structure. Readers can refer to (Wang et al., 2017a) for a more comprehensive survey. KGE methods are mainly classified into two categories: (1) Translational distance models, such as TransE (Bordes et al., 2013), TransH (Wang et al., 2014), TransD (Ji et al., 2015), and TransR (Lin et al., 2015), exploit distance-based scoring functions when learning representations of entities and relations. For example, TransE wants $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ when $(h, r, t)$ holds, where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the corresponding representation vectors of $h$, $r$, and $t$. Therefore, TransE assumes the score function $f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_2^2$ is low if $(h, r, t)$ holds, and high otherwise. (2) Semantic matching models, such as ANALOGY (Liu et al., 2017), ComplEx (Trouillon et al., 2016), and DistMult (Yang et al., 2015), measure the plausibility of knowledge triples by matching the latent semantics of entities and relations. For example, DistMult introduces a vector embedding $\mathbf{r} \in \mathbb{R}^d$ for each relation and requires the relation matrix $\mathbf{M}_r = \operatorname{diag}(\mathbf{r})$; the scoring function is hence defined as $f_r(h, t) = \mathbf{h}^\top \operatorname{diag}(\mathbf{r}) \mathbf{t}$. Researchers have also proposed incorporating auxiliary information, such as entity types (Xie et al., 2016), logic rules (Rocktäschel et al., 2015), and textual descriptions (Zhong et al., 2015), to assist KGE. However, these methods are better suited to in-graph applications such as link prediction or triple classification, given their learning objectives. From this point of view, RippleNet can be seen as a specially designed KGE method that serves recommendation directly.
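
For reference, the scoring functions mentioned above, together with RippleNet's full-matrix score $\mathbf{h}^\top \mathbf{R} \mathbf{t}$ from Eq. (11), in numpy form (a sketch; the vectors are 1-D arrays of equal dimension):

```python
import numpy as np

def transe_score(h, r, t):
    """TransE (Bordes et al., 2013): small when h + r ≈ t."""
    return np.sum((h + r - t) ** 2)

def distmult_score(h, r, t):
    """DistMult (Yang et al., 2015): h^T diag(r) t."""
    return h @ (r * t)

def ripplenet_score(h, R, t):
    """RippleNet's KGE term scores a triple with a full relation matrix R."""
    return h @ R @ t
```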

4. Experiments

In this section, we evaluate RippleNet on three real-world scenarios: movie, book, and news recommendation (experiment code is provided at https://github.com/hwwang55/RippleNet). We first introduce the datasets, baselines, and experiment setup, then present the experiment results. We also give a visualized case study and discuss the choice of hyper-parameters.

4.1. Datasets

We utilize the following three datasets in our experiments for movie, book, and news recommendation:

  • MovieLens-1M (https://grouplens.org/datasets/movielens/1m/) is a widely used benchmark dataset for movie recommendation, which consists of approximately 1 million explicit ratings (ranging from 1 to 5) from the MovieLens website.

  • The Book-Crossing dataset (http://www2.informatik.uni-freiburg.de/~cziegler/BX/) contains 1,149,780 explicit ratings (ranging from 0 to 10) of books in the Book-Crossing community.

  • The Bing-News dataset contains 1,025,192 pieces of implicit feedback collected from the server logs of Bing News (https://www.bing.com/news) from October 16, 2016 to August 11, 2017. Each piece of news has a title and a snippet.

Since MovieLens-1M and Book-Crossing contain explicit feedback, we transform them into implicit feedback, where each entry marked with 1 indicates that the user has rated the item (the rating threshold is 4 for MovieLens-1M, while no threshold is set for Book-Crossing due to its sparsity), and we sample an unwatched set marked as 0 for each user, equal in size to his rated set. For MovieLens-1M and Book-Crossing, we use the ID embeddings of users and items as raw input, while for Bing-News, we concatenate the ID embedding of a piece of news and the averaged word embedding of its title as the raw input for the item, since news titles are typically much longer than the names of movies or books and hence provide more useful information for recommendation.
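
A sketch of this explicit-to-implicit conversion with equal-size negative sampling (the function name and fixed seed are ours; it assumes each user has enough unrated items to sample from):

```python
import random

def to_implicit(ratings, all_items, threshold=None, seed=0):
    """Turn explicit (user, item, rating) records into labeled implicit data.

    threshold: minimum rating counted as positive (4 for MovieLens-1M,
               None for Book-Crossing, i.e. every rated item is positive).
    """
    rng = random.Random(seed)
    positives = {}
    for u, v, rating in ratings:
        if threshold is None or rating >= threshold:
            positives.setdefault(u, set()).add(v)
    samples = []
    for u, pos in positives.items():
        samples += [(u, v, 1) for v in pos]
        # Sample an unwatched set of equal size, labeled 0.
        negatives = rng.sample(sorted(set(all_items) - pos), len(pos))
        samples += [(u, v, 0) for v in negatives]
    return samples
```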

We use Microsoft Satori to construct the knowledge graph for each dataset. For MovieLens-1M and Book-Crossing, we first select a subset of triples from the whole KG whose relation name contains "movie" or "book" and whose confidence level is greater than 0.9. Given the sub-KG, we collect the IDs of all valid movies/books by matching their names with the tails of triples (head, film.film.name, tail) or (head, book.book.title, tail). For simplicity, items with no matched or multiple matched entities are excluded. We then match the IDs with the heads and tails of all KG triples, select all well-matched triples from the sub-KG, and extend the set of entities iteratively up to four hops. The construction process is similar for Bing-News, except that: (1) we use entity linking tools to extract entities from news titles; (2) we do not impose restrictions on the names of relations, since the entities in news titles are not within one particular domain. The basic statistics of the three datasets are presented in Table 1.

MovieLens-1M Book-Crossing Bing-News
# users 6,036 17,860 141,487
# items 2,445 14,967 535,145
# interactions 753,772 139,746 1,025,192
# 1-hop triples 20,782 19,876 503,112
# 2-hop triples 178,049 65,360 1,748,562
# 3-hop triples 318,266 84,299 3,997,736
# 4-hop triples 923,718 71,628 6,322,548
Table 1. Basic statistics of the three datasets.
Table 2. Hyper-parameter settings for the three datasets.

4.2. Baselines

We compare the proposed RippleNet with the following state-of-the-art baselines:

  • CKE (Zhang et al., 2016) combines CF with structural knowledge, textual knowledge, and visual knowledge in a unified framework for recommendation. In this paper, we implement CKE as CF plus a structural knowledge module.

  • SHINE (Wang et al., 2018b) designs deep autoencoders to embed a sentiment network, a social network, and a profile (knowledge) network for celebrity recommendation. Here we use autoencoders for user-item interactions and item profiles to predict the click probability.

  • DKN (Wang et al., 2018c) treats entity embeddings and word embeddings as multiple channels and combines them in a CNN for CTR prediction. In this paper, we use movie/book names and news titles as the textual input for DKN.

  • PER (Yu et al., 2014) treats the KG as a HIN and extracts meta-path based features to represent the connectivity between users and items. In this paper, we use all item-attribute-item features for PER (e.g., "movie-director-movie").

  • LibFM (Rendle, 2012) is a widely used feature-based factorization model in CTR scenarios. We concatenate user ID, item ID, and the corresponding averaged entity embeddings learned from TransR (Lin et al., 2015) as input for LibFM.

  • WideDeep (Cheng et al., 2016) is a general deep model for recommendation combining a (wide) linear channel with a (deep) non-linear channel. Similar to LibFM, we use the embeddings of users, items, and entities to feed WideDeep.

Figure 4. The average number of $k$-hop neighbors that two items share in the KG w.r.t. whether they have common raters in (a) MovieLens-1M, (b) Book-Crossing, and (c) Bing-News. (d) The ratio of the two average numbers under different hops.

4.3. Experiment Setup

In RippleNet, we set the hop number $H = 2$ for MovieLens-1M/Book-Crossing and $H = 3$ for Bing-News, since a larger number of hops hardly improves performance but incurs heavier computational overhead according to the experiment results. The complete hyper-parameter settings are given in Table 2, where $d$ denotes the dimension of embeddings for items and the knowledge graph, and $\eta$ denotes the learning rate. The hyper-parameters are determined by optimizing AUC on a validation set. For a fair comparison, the latent dimensions of all compared baselines are set the same as in Table 2, while the other hyper-parameters of the baselines are set based on grid search.

For each dataset, we split the interactions into training, evaluation, and test sets; each experiment is repeated several times, and the average performance is reported. We evaluate our method in two experiment scenarios: (1) In click-through rate (CTR) prediction, we apply the trained model to each interaction in the test set and output the predicted click probability. We use AUC and Accuracy (ACC) to evaluate the performance of CTR prediction. (2) In top-K recommendation, we use the trained model to select the K items with the highest predicted click probability for each user in the test set, and choose Precision@K, Recall@K, and F1@K to evaluate the recommended sets.
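
Both evaluation protocols correspond to standard metric computations; a sketch using scikit-learn for AUC (the helper names are ours):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ctr_metrics(y_true, y_prob):
    """AUC and ACC for CTR prediction (decision threshold 0.5)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    auc = roc_auc_score(y_true, y_prob)
    acc = np.mean((y_prob >= 0.5) == y_true)
    return auc, acc

def precision_recall_at_k(ranked_items, relevant_items, k):
    """Precision@K and Recall@K for one user's ranked item list."""
    hits = len(set(ranked_items[:k]) & set(relevant_items))
    return hits / k, hits / max(len(relevant_items), 1)
```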

4.4. Empirical Study

We conduct an empirical study to investigate the correlation between the average number of common neighbors of an item pair in the KG and whether the pair has common rater(s) in RS. For each dataset, we first randomly sample one million item pairs, then count the average number of $k$-hop neighbors the two items share in the KG under two circumstances: (1) the two items have at least one common rater in RS; (2) the two items have no common rater in RS. The results are presented in Figures 4(a), 4(b), and 4(c), which clearly show that if two items have common rater(s) in RS, they are likely to share more common $k$-hop neighbors in the KG for a fixed $k$. These findings empirically demonstrate that the similarity of the proximity structures of two items in the KG can assist in measuring their relatedness in RS. In addition, we plot the ratio of the two average numbers under different hops (i.e., dividing the higher bar by its immediate lower bar for each hop number) in Figure 4(d), from which we observe that the proximity structures of two items under the two circumstances become more similar as the hop number increases. This is because any two items are likely to share a large number of $k$-hop neighbors in the KG for a large $k$, even if there is no direct similarity between them in reality. This result motivates us to find a moderate hop number $H$ in RippleNet that explores users' potential interests as far as possible while avoiding the introduction of too much noise.
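
The counting procedure of this study can be sketched as follows (illustrative names; `adj` maps each entity to its KG successors, and neighbors are taken within $k$ hops):

```python
def neighbors_within_k(adj, item, k):
    """Entities reachable from `item` within k hops (excluding itself)."""
    frontier, seen = {item}, {item}
    for _ in range(k):
        frontier = {t for h in frontier for t in adj.get(h, ())} - seen
        seen |= frontier
    return seen - {item}

def shared_neighbors(adj, a, b, k):
    """How many k-hop neighbors items a and b share in the KG."""
    return len(neighbors_within_k(adj, a, k) & neighbors_within_k(adj, b, k))

# adj maps an entity to its successors, e.g.
# adj = {"Forrest Gump": {"Tom Hanks", "Robert Zemeckis"}}
```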

4.5. Results

Model MovieLens-1M Book-Crossing Bing-News
AUC ACC AUC ACC AUC ACC
RippleNet* 0.921 0.844 0.729 0.662 0.678 0.632
CKE 0.796 0.739 0.674 0.635 0.560 0.517
SHINE 0.778 0.732 0.668 0.631 0.554 0.537
DKN 0.655 0.589 0.621 0.598 0.661 0.604
PER 0.712 0.667 0.623 0.588 - -
LibFM 0.892 0.812 0.685 0.639 0.644 0.588
WideDeep 0.903 0.822 0.711 0.623 0.654 0.595
Table 3. The results of AUC and ACC in CTR prediction.
Figure 5. Precision@K, Recall@K, and F1@K in top-K recommendation for MovieLens-1M.
Figure 6. Precision@K, Recall@K, and F1@K in top-K recommendation for Book-Crossing.
Figure 7. Precision@K, Recall@K, and F1@K in top-K recommendation for Bing-News.

The results of all methods in CTR prediction and top-K recommendation are presented in Table 3 and Figures 5, 6, and 7, respectively. Several observations stand out:

  • CKE performs relatively poorly compared with the other baselines, probably because we only have structural knowledge available, without visual or textual input.

  • SHINE performs better in movie and book recommendation than in news recommendation. This is because the 1-hop triples of news are too complicated to be taken as profile input.

  • DKN performs best in news recommendation compared with other baselines, but performs worst in movie and book recommendation. This is because movie and book names are too short and ambiguous to provide useful information.

  • PER performs unsatisfactorily on movie and book recommendation because the user-defined meta-paths can hardly be optimal. In addition, it cannot be applied in news recommendation since the types of entities and relations involved in news are too complicated to pre-define meta-paths.

  • As two generic recommendation tools, LibFM and WideDeep achieve satisfactory performance, demonstrating that they can make good use of the knowledge from the KG in their algorithms.

  • RippleNet performs best among all methods on the three datasets. Specifically, RippleNet outperforms the baselines by 2.0% to 40.6%, 2.5% to 17.4%, and 2.6% to 22.4% on AUC in movie, book, and news recommendation, respectively. RippleNet also achieves outstanding performance in top-K recommendation, as shown in Figures 5, 6, and 7. Note that the top-K recommendation performance is much lower for Bing-News because the number of news items is significantly larger than that of movies or books.

Size of ripple set in each hop. We vary the size of a user's ripple set in each hop to further investigate the robustness of RippleNet. The AUC results on the three datasets are presented in Table 4, from which we observe that as the size of the ripple set increases, the performance of RippleNet improves at first, because a larger ripple set can encode more knowledge from the KG. However, the performance drops when the size becomes too large. In general, a size of 16 or 32 is enough for most datasets according to the experiment results.

Size of ripple set 2 4 8 16 32 64
MovieLens-1M 0.903 0.908 0.911 0.918 0.920 0.919
Book-Crossing 0.694 0.696 0.708 0.726 0.706 0.711
Bing-News 0.659 0.672 0.670 0.673 0.678 0.671
Table 4. The AUC results w.r.t. different sizes of a user's ripple set.

Hop number. We also vary the maximal hop number $H$ to see how the performance of RippleNet changes. The results are shown in Table 5, where the best performance is achieved when $H$ is 2 or 3. We attribute this phenomenon to the trade-off between the positive signals from long-distance dependencies and the negative signals from noise: an $H$ that is too small can hardly explore inter-entity relatedness and long-distance dependencies, while an $H$ that is too large introduces much more noise than useful signal, as stated in Section 4.4.

Hop number 1 2 3 4
MovieLens-1M 0.916 0.919 0.915 0.918
Book-Crossing 0.727 0.722 0.730 0.702
Bing-News 0.662 0.676 0.679 0.674
Table 5. The AUC results w.r.t. different hop numbers.

4.6. Case Study

Figure 8. Visualization of relevance probabilities for a randomly sampled user w.r.t. a piece of candidate news with label 1. Links with low relevance values are omitted.

To intuitively demonstrate preference propagation in RippleNet, we randomly sample a user with a number of clicked pieces of news, and select one candidate news item with label 1 from his test set. For each of the user's $k$-hop relevant entities, we calculate the (unnormalized) relevance probability between the entity and the candidate news or its corresponding lower-order responses. The results are presented in Figure 8, in which a darker shade of blue indicates a larger value; we omit the names of relations for clearer presentation. From Figure 8 we observe that RippleNet associates the candidate news with the user's relevant entities with different strengths. The candidate news can be reached via several high-weight paths in the KG from the user's click history, such as "Navy SEAL"-"Special Forces"-"Gun"-"Police". These highlighted paths, automatically discovered by preference propagation, can thus be used to explain the recommendation result, as discussed in Section 3.5.1. Additionally, it is worth noticing that several entities in the KG, such as "U.S.", "World War II", and "Donald Trump", receive more intensive attention from the user's history. These central entities result from the ripple superposition discussed in Section 3.5.2, and can serve as the user's potential interests for future recommendation.

4.7. Parameter Sensitivity

(a) Dimension of embedding
(b) Training weight of KGE term
Figure 9. Parameter sensitivity of RippleNet.

In this section, we investigate the influence of the parameters $d$ and $\lambda_2$ in RippleNet by varying each over a wide range while keeping the other parameters fixed. The AUC results on MovieLens-1M are presented in Figure 9. We observe from Figure 9(a) that, as $d$ increases, the performance is boosted at first, since embeddings of larger dimension can encode more useful information, but it drops afterwards due to possible over-fitting. From Figure 9(b), we can see that RippleNet achieves the best performance with a moderate $\lambda_2$: a KGE term with too small a weight cannot provide enough regularization, while too large a weight misleads the objective function.

5. Conclusion and Future Work

In this paper, we propose RippleNet, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. RippleNet overcomes the limitations of existing embedding-based and path-based KG-aware recommendation methods by introducing preference propagation, which automatically propagates users’ potential preferences and explores their hierarchical interests in the KG. RippleNet unifies the preference propagation with regularization of KGE in a Bayesian framework for click-through rate prediction. We conduct extensive experiments in three recommendation scenarios. The results demonstrate the significant superiority of RippleNet over strong baselines.

For future work, we plan to (1) further investigate the methods of characterizing entity-relation interactions; (2) design non-uniform samplers during preference propagation to better explore users’ potential interests and improve the performance.

References

  • Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
  • Bauman et al. (2017) Konstantin Bauman, Bing Liu, and Alexander Tuzhilin. 2017. Aspect based recommendations: Recommending items with the most valuable aspects based on user reviews. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 717–725.
  • Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems. 2787–2795.
  • Chen et al. (2017) Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive collaborative filtering: Multimedia recommendation with item-and component-level attention. In SIGIR. ACM, 335–344.
  • Chen et al. (2018) Xu Chen, Hongteng Xu, Yongfeng Zhang, Jiaxi Tang, Yixin Cao, Zheng Qin, and Hongyuan Zha. 2018. Sequential Recommendation with User Memory Networks. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining.
  • Cheng et al. (2016) Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7–10.
  • Dong et al. (2015) Li Dong, Furu Wei, Ming Zhou, and Ke Xu. 2015. Question Answering over Freebase with Multi-Column Convolutional Neural Networks. In ACL. 260–269.
  • Huang et al. (2017) Haoran Huang, Qi Zhang, and Xuanjing Huang. 2017. Mention Recommendation for Twitter with End-to-end Memory Network. In IJCAI.
  • Jamali and Ester (2010) Mohsen Jamali and Martin Ester. 2010. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the 4th ACM conference on Recommender systems. ACM, 135–142.
  • Ji et al. (2015) Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Vol. 1. 687–696.
  • Koren (2008) Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 426–434.
  • Koren et al. (2009) Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009).
  • Li et al. (2017) Zheng Li, Yu Zhang, Ying Wei, Yuxiang Wu, and Qiang Yang. 2017. End-to-End Adversarial Memory Network for Cross-domain Sentiment Classification. In Proceedings of the 26th International Joint Conference on Artificial Intelligence.
  • Lin et al. (2015) Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In AAAI. 2181–2187.
  • Liu et al. (2017) Hanxiao Liu, Yuexin Wu, and Yiming Yang. 2017. Analogical Inference for Multi-Relational Embeddings. In Proceedings of the 34th International Conference on Machine Learning. 2168–2178.
  • Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
  • Miller et al. (2016) Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-value memory networks for directly reading documents. arXiv preprint arXiv:1606.03126 (2016).
  • Mnih et al. (2014) Volodymyr Mnih, Nicolas Heess, Alex Graves, et al. 2014. Recurrent models of visual attention. In Advances in Neural Information Processing Systems. 2204–2212.
  • Nickel et al. (2016) Maximilian Nickel, Lorenzo Rosasco, Tomaso A Poggio, et al. 2016. Holographic Embeddings of Knowledge Graphs. In AAAI. 1955–1961.
  • Rendle (2012) Steffen Rendle. 2012. Factorization machines with libfm. ACM Transactions on Intelligent Systems and Technology (TIST) 3, 3 (2012), 57.
  • Rocktäschel et al. (2015) Tim Rocktäschel, Sameer Singh, and Sebastian Riedel. 2015. Injecting logical background knowledge into embeddings for relation extraction. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1119–1129.
  • Seo et al. (2017) Sungyong Seo, Jing Huang, Hao Yang, and Yan Liu. 2017. Interpretable convolutional neural networks with dual local and global attention for review rating prediction. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 297–305.
  • Sharma and Cosley (2013) Amit Sharma and Dan Cosley. 2013. Do social explanations work?: studying and modeling the effects of social explanations in recommender systems. In Proceedings of the 22nd international conference on World Wide Web. ACM, 1133–1144.
  • Sukhbaatar et al. (2015) Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. 2015. End-to-end memory networks. In Advances in Neural Information Processing Systems. 2440–2448.
  • Sun et al. (2017) Yu Sun, Nicholas Jing Yuan, Xing Xie, Kieran McDonald, and Rui Zhang. 2017. Collaborative Intent Prediction with Real-Time Contextual Data. ACM Transactions on Information Systems 35, 4 (2017), 30.
  • Tai et al. (2015) Kai Sheng Tai, Richard Socher, and Christopher D Manning. 2015. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Vol. 1. 1556–1566.
  • Tintarev and Masthoff (2007) Nava Tintarev and Judith Masthoff. 2007. A survey of explanations in recommender systems. In IEEE 23rd International Conference on Data Engineering Workshop. IEEE, 801–810.
  • Trouillon et al. (2016) Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning. 2071–2080.
  • Vig et al. (2009) Jesse Vig, Shilad Sen, and John Riedl. 2009. Tagsplanations: explaining recommendations using tags. In Proceedings of the 14th international conference on Intelligent user interfaces. ACM, 47–56.
  • Wang et al. (2018a) Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018a. Graphgan: Graph representation learning with generative adversarial nets. In AAAI. 2508–2515.
  • Wang et al. (2017c) Hongwei Wang, Jia Wang, Miao Zhao, Jiannong Cao, and Minyi Guo. 2017c. Joint Topic-Semantic-aware Social Recommendation for Online Voting. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management. ACM, 347–356.
  • Wang et al. (2018b) Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, and Qi Liu. 2018b. Shine: Signed heterogeneous information network embedding for sentiment link prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 592–600.
  • Wang et al. (2018c) Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018c. DKN: Deep Knowledge-Aware Network for News Recommendation. In Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1835–1844.
  • Wang et al. (2017b) Jin Wang, Zhongyuan Wang, Dawei Zhang, and Jun Yan. 2017b. Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification. In IJCAI.
  • Wang et al. (2017a) Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017a. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29, 12 (2017), 2724–2743.
  • Wang et al. (2017d) Xuejian Wang, Lantao Yu, Kan Ren, Guanyu Tao, Weinan Zhang, Yong Yu, and Jun Wang. 2017d. Dynamic attention deep model for article recommendation by learning human editors’ demonstration. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2051–2059.
  • Wang et al. (2014) Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge Graph Embedding by Translating on Hyperplanes. In AAAI. 1112–1119.
  • Weston et al. (2014) Jason Weston, Sumit Chopra, and Antoine Bordes. 2014. Memory networks. arXiv preprint arXiv:1410.3916 (2014).
  • Xie et al. (2016) Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2016. Representation Learning of Knowledge Graphs with Hierarchical Types.. In IJCAI. 2965–2971.
  • Xu et al. (2014) Chang Xu, Yalong Bai, Jiang Bian, Bin Gao, Gang Wang, Xiaoguang Liu, and Tie-Yan Liu. 2014. Rc-net: A general framework for incorporating knowledge into word representations. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. ACM, 1219–1228.
  • Yang et al. (2015) Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding entities and relations for learning and inference in knowledge bases. In Proceedings of the 3rd International Conference on Learning Representations.
  • Yu et al. (2014) Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. 2014. Personalized entity recommendation: A heterogeneous information network approach. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. 283–292.
  • Zhang et al. (2016) Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 353–362.
  • Zhang et al. (2014) Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and Shaoping Ma. 2014. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. ACM, 83–92.
  • Zhao et al. (2017) Huan Zhao, Quanming Yao, Jianda Li, Yangqiu Song, and Dik Lun Lee. 2017. Meta-graph based recommendation fusion over heterogeneous information networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 635–644.
  • Zhong et al. (2015) Huaping Zhong, Jianwen Zhang, Zhen Wang, Hai Wan, and Zheng Chen. 2015. Aligning knowledge and text embeddings by entity descriptions. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 267–272.
  • Zhou et al. (2017) Guorui Zhou, Chengru Song, Xiaoqiang Zhu, Xiao Ma, Yanghui Yan, Xingya Dai, Han Zhu, Junqi Jin, Han Li, and Kun Gai. 2017. Deep interest network for click-through rate prediction. arXiv preprint arXiv:1706.06978 (2017).