ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation

05/25/2020 ∙ by Yufei Feng, et al. ∙ Ant Financial ∙ Taobao

A recommender system (RS) is devoted to predicting user preference for a given item and has been widely deployed in most web-scale applications. Recently, knowledge graphs (KG) have attracted much attention in RS due to their abundant connective information. Existing methods either explore independent meta-paths for user-item pairs over the KG, or employ a graph neural network (GNN) on the whole KG to produce representations for users and items separately. Despite their effectiveness, the former type of method fails to fully capture the structural information implied in the KG, while the latter ignores the mutual effect between the target user and item during embedding propagation. In this work, we propose a new framework named Adaptive Target-Behavior Relational Graph network (ATBRG for short) to effectively capture structural relations of target user-item pairs over the KG. Specifically, to associate the given target item with user behaviors over the KG, we propose the graph connect and graph prune techniques to construct an adaptive target-behavior relational graph. To fully distill structural information from the sub-graph connected by rich relations in an end-to-end fashion, we elaborate on the model design of ATBRG, equipped with a relation-aware extractor layer and a representation activation layer. We perform extensive experiments on both industrial and benchmark datasets. Empirical results show that ATBRG consistently and significantly outperforms state-of-the-art methods. Moreover, ATBRG has also achieved a performance improvement of 5.1% on the CTR metric in one popular recommendation scenario of the Taobao APP.


1. Introduction

Figure 1. The comparison of our proposed framework ATBRG with previous models. (a) & (b) indicate the limitations of path based and GNN based methods, respectively, while (c) shows the superiority of ATBRG.

In the era of information overload, recommender systems (RS), which aim to match diverse user interests with tremendous resource items, are widely deployed in various online services, including e-commerce (Sarwar et al., 2001; Zhou et al., 2018b; Xu et al., 2019), social media (Covington et al., 2016; Zhou et al., 2019b) and news (Das et al., 2007; Wang et al., 2018b). Traditional recommendation methods, e.g., matrix factorization (Koren et al., 2009), mainly learn an effective preference prediction function using historical user-item interaction records. Despite their effectiveness, these methods suffer from the cold-start problem due to data sparsity. With the rapid development of web services, some approaches (Hu et al., 2018; Huang et al., 2018) have been proposed to incorporate various auxiliary data for improving recommendation performance.

Recently, the knowledge graph (KG), which is flexible enough to model comprehensive auxiliary data, has attracted increasing attention in RS (Wang et al., 2019c, a, b, 2018a; Cao et al., 2019; Wang et al., 2019d). Generally, a KG stores external heterogeneous knowledge in the ternary form (head entity, relation, tail entity), corresponding to attributes or relationships of entities. Due to its abundant information, current recommender systems mainly aim to incorporate the KG to enrich representations of users and items and promote the interpretability of recommendations.

Despite great improvements, it remains challenging to effectively integrate such heterogeneous information for recommendation. Roughly speaking, state-of-the-art KG based recommendation methods mainly fall into two groups: path based and graph neural network (GNN) based methods. Path based methods (Wang et al., 2019c) infer user preference by exploring multiple meta-paths for target user-item pairs over the KG, which typically requires domain knowledge. More importantly, this type of method ignores the rich structural information implied in the KG, and thus cannot sufficiently characterize the underlying relationships between the given target user and item. As illustrated in Fig. 1 (a), these methods essentially overlook the strong relationships between Blouse and Dress, since each extracted path is modeled independently.

Inspired by the recently emerging graph neural networks, several GNN based methods (Wang et al., 2019a, b) have been proposed and provide strong performance by explicitly modeling high-order connectivities in the KG. Nevertheless, these methods still suffer from three limitations: (L1) These methods mainly apply GNN to enrich the representation of the target user and item separately by aggregating their own original neighbors in the KG, and thus fail to capture their mutual influence during information aggregation. As shown in Fig. 1 (b), current GNN based methods tend to produce the representation for the target item by aggregating its neighbors without considering the target user’s interests (historical behaviors). Subsequently, some unnecessary information (i.e., Cup) is involved in the target item’s refined embedding, which may harm recommendation performance; (L2) The KG in real-world industrial scenarios is extremely large-scale, where one entity can be linked with up to millions of items. Existing works mainly employ random sampling on the neighbors beforehand, which may lose latent critical information for the specific target user and item. As shown in Fig. 1 (b), some neighbors (i.e., Shirt) are abandoned by the random sampling strategy, while they are usually informative during aggregation since the target user has engaged with them; (L3) Most of these methods neglect the rich relations among user behaviors over the KG, while some works (Feng et al., 2019; Zhou et al., 2019a) have demonstrated that capturing the relations among user behaviors is also beneficial for expressing user preference.

To address the above limitations, we aim to distill the original over-informative KG into recommendation in a more effective way, which is expected to satisfy the following key properties: (1) Target-behavior: we hold the insight that an effective KG based recommendation should produce a semantic sub-graph adapted to each target user-item pair, with the aim of capturing the underlying mutual effect characterized by the KG (L1); (2) Adaptive: distinct from random sampling on the whole KG, our idea is to follow the adaptive principle for the sub-graph construction, which adaptively preserves useful information connecting user behaviors and the target item over the KG, driving our model to provide more effective recommendation (L2); (3) Relational: the model architecture should be designed to be relation-aware in order to consider the rich relations among user behaviors and the target item over the KG (L3). For convenience, given a target user-item pair, we call the relational graphical structure bridging user behaviors (i.e., historical click records) with the target item the adaptive target-behavior relational graph (shown in Fig. 1 (c)). Propagating user preference on such a relational structure potentially takes full advantage of the mutual effect for the target user-item pair, and comprehensively captures the structural relations derived from the KG.

In this paper, by integrating the above main ideas, we propose a new framework named Adaptive Target-Behavior Relational Graph Network (ATBRG), which comprises two main parts: (1) Graph construction part. To adaptively extract the effective relational sub-graph for the target user-item pair over the KG, we propose the graph connect and graph prune techniques. First, we explore multiple layers of neighbors over the KG for the target item and each item in the user behaviors, respectively. Among these entity sets, we connect entities that appear in multiple entity sets and prune entities belonging to only one entity set. Subsequently, we construct the adaptive target-behavior relational graph, which characterizes the structural relations among user behaviors and the target item over the KG. (2) Model part. Considering the structural relations derived from the KG, we design the relation-aware extractor layer, which employs a relation-aware attention mechanism to aggregate structural knowledge over the relational graph for each user behavior and the target item. Afterwards, we introduce the representation activation layer to activate the relative relational representations of user behaviors w.r.t. that of the target item.

The main contributions of our work are summarized as follows:

  • To effectively characterize structural relations between the given target user and item, we propose to extract an adaptive target-behavior relational graph, where the graph connect and graph prune strategies are developed to adaptively build relations between user behaviors and target item over KG.

  • We propose a novel framework ATBRG, a graph neural network based architecture that learns relational representations of user behaviors and the target item over the extracted sub-graph. Moreover, we equip it with a relation-aware extractor layer and a representation activation layer to emphasize the rich relations for interaction in the KG.

  • We perform a series of experiments on a benchmark dataset from Yelp and an industrial dataset from Taobao App. Experimental results demonstrate that ATBRG consistently and significantly outperforms various state-of-the-art methods. Moreover, ATBRG has been successfully deployed in one popular recommendation scenario of Taobao APP and gained a performance improvement of 5.1% on the CTR metric.

2. Related Work

In this section, we review the most related studies in behavior based and knowledge aware recommendation.

2.1. Behavior based recommendation

In the early stage of recommendation, researchers focus on recommending a suitable list of items based on historical user-item interaction records. In particular, a series of matrix factorization based methods (Koren et al., 2009) have been proposed to infer user preference towards items through learning latent representations of users and items. Due to the ability of modeling complex interaction between users and items, deep neural network based methods (e.g., YoutubeNet (Covington et al., 2016), DeepFM (Guo et al., 2017)) are widely adopted in industrial recommender systems, and reveal the remarkable strength of incorporating various context information (e.g., user profile and item attributes).

In online e-commerce systems, we are particularly interested in users’ historical behaviors, which imply rich information for inferring user preference. Hence, how to effectively characterize the relationships between user behaviors and the target item remains a continuous research topic. DIN (Zhou et al., 2018b) adaptively learns the representation of user interests from historical behaviors w.r.t. the target item through the attention mechanism. Inspired by DIN, the majority of follow-up works inherit this kind of paradigm. GIN (Li et al., 2019) mines user intention based on a co-occurrence commodity graph in an end-to-end fashion. ATRANK (Zhou et al., 2018a) proposes an attention-based behavior modeling framework to model users’ heterogeneous behaviors. DIEN (Zhou et al., 2019a) and SDM (Lv et al., 2019) are devoted to capturing users’ temporal interests and modeling their sequential relations. DSIN (Feng et al., 2019) focuses on capturing the relationships of users’ inter-session and intra-session behaviors. MIMN (Pi et al., 2019) and HPMN (Ren et al., 2019) apply the neural Turing machine to model users’ lifelong sequential behaviors. Besides these improvements, the knowledge graph, consisting of various semantics and relations, emerges as an assistant to describe relationships between user behaviors and the target item.

2.2. Knowledge Aware Recommendation

As a newly emerging direction, the knowledge graph is widely integrated into recommender systems for enriching relationships among user behaviors and items. One research line utilizes KG aware embeddings (e.g., structural embeddings (Zhang et al., 2016) and semantic embeddings (Wang et al., 2018b)) to enhance the quality of item representations. These methods conduct multi-task learning over the two tasks of recommendation and KG completion and share the embeddings, and thus can hardly take full advantage of high-order information over the KG. In contrast, several efforts (Hu et al., 2018; Wang et al., 2019c) have been made to explore different semantic paths (meta-paths) connecting target users and items over the KG, and then learn the prediction function through multiple path modeling. More recently, some works (Xian et al., 2019) propose to exploit reinforcement learning to explore useful paths for recommendation. Despite their effectiveness, path based methods ignore the rich structural information implied in the KG since each extracted path is modeled independently.

Recently, graph neural networks have shown their potential in learning accurate node embeddings with high-order graph topology. Taking advantage of information propagation, RippleNet (Wang et al., 2018a) propagates users’ potential preferences and explores their hierarchical interests over the KG, while KGCN-LS (Wang et al., 2019a) and KGAT (Wang et al., 2019b) perform embedding propagation by stacking multiple KG aware GNN layers. Although GNN based methods have achieved performance improvements to some extent, they do not take the mutual influence between target user behaviors and the item into consideration during information aggregation. Moreover, the exponential neighborhood expansion over the graph greatly increases the complexity of the system.

3. Preliminary

In a recommendation scenario (e.g., e-commerce and news), we typically have a series of historical interaction records (e.g., purchases and clicks) between a set of users and a set of items. Each interaction record consists of a user u, an item i, the historical behaviors (i.e., item list) of user u at the time item i is recommended, and the implicit feedback of user u w.r.t. item i, which is 1 when an interaction is observed and 0 otherwise. In real-world industrial recommender systems, each user is associated with a user profile consisting of sparse features (e.g., user id and gender) and numerical features (e.g., user age), while each item is also associated with an item profile consisting of sparse features (e.g., item id and brand) and numerical features (e.g., price).

In order to effectively incorporate auxiliary information of items (i.e., item attributes and external knowledge) into recommendation, we frame our recommendation task over knowledge graph, which can be defined as follows:

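Definition 1 (Knowledge Graph). A KG is defined as a directed graph with an entity set and a relation set. Each triplet (head entity, relation, tail entity) denotes the fact that there is a relationship from the head entity to the tail entity, where both entities belong to the entity set and the relation belongs to the relation set.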

For example, a triplet can state the fact that Blouse belongs to the Shirt category. To bridge the knowledge graph with the recommender system, we adopt an item-entity alignment function to align items with entities in the KG.
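As a minimal illustration of Definition 1 and the item-entity alignment, the sketch below stores a toy KG as (head, relation, tail) triplets and looks up the neighbors of an item through its aligned entity. The relation names, item ids, and the helper names `build_adjacency` and `neighbors_of_item` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from collections import defaultdict

# Toy KG as (head, relation, tail) triplets; all names are illustrative.
TRIPLETS = [
    ("Blouse", "category", "Shirt"),
    ("Dress", "style", "Casual"),
    ("Blouse", "style", "Casual"),
]

# Hypothetical item-entity alignment: item id -> entity in the KG.
ITEM_TO_ENTITY = {"item_1": "Blouse", "item_2": "Dress"}


def build_adjacency(triplets):
    """Index outgoing edges by head entity: head -> [(relation, tail), ...]."""
    adjacency = defaultdict(list)
    for head, relation, tail in triplets:
        adjacency[head].append((relation, tail))
    return adjacency


def neighbors_of_item(item_id, adjacency, alignment):
    """Map an item to its aligned entity and return that entity's direct neighbors."""
    entity = alignment[item_id]
    return adjacency[entity]


adjacency = build_adjacency(TRIPLETS)
blouse_neighbors = neighbors_of_item("item_1", adjacency, ITEM_TO_ENTITY)
```

Indexing triplets by head entity keeps neighbor lookup cheap, which matters when the same entity is visited from many items.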

Many efforts, especially GNN based methods, have been made to leverage the KG for better recommendation. However, most of these works overlook the mutual effect between the target user and item when exploiting structural information derived from the KG. To effectively distill structural knowledge through KG based GNN for item recommendation, we particularly investigate the external knowledge connecting user behaviors and the target item in the KG, which can reveal the semantic context of user-item interactions. Formally, we define such context information as follows:

Definition 2 (Adaptive Target-Behavior Relational Graph). Given a target user-item pair and the corresponding user behaviors, an adaptive target-behavior relational graph w.r.t. this pair is defined as a sub-graph extracted from the original KG, connecting the user behaviors and the target item.

Given the above preliminaries, we now formulate the recommendation task to be addressed in this paper:

Definition 3 (Task Description). Given a knowledge graph with historical interaction records, for each user-item pair, we aim to predict the probability that the user would click the item.

4. The Proposed Framework

Notations Description
,
the set of users and items, respectively
the set of historical interaction records
the label and the predicted probability
, ,
user profile, item profile and user behaviors
the specific item in user behavior
the knowledge graph and adaptive target-behavior relational graph, respectively
the set of entity, relation and triples in knowledge graph, respectively
the specific triple in knowledge graph
the embedding of user , item , entity and relation , respectively
the neighbors set in -th relation-aware extractor layer for entity
the relational representation for target item and each item , respectively
the final representation of user
Table 1. Notations.
Figure 2. Overview of the proposed ATBRG framework. Overall, ATBRG consists of two parts, graph construction and model architecture.

In this section, we introduce our proposed framework ATBRG, which aims to take full advantage of knowledge graph for recommendation. The framework is shown in Fig. 2, which is composed of two modules: (1) To effectively extract structural relational knowledge for recommendation, we propose to construct the adaptive target-behavior relational graph for the given target user-item pair over knowledge graph, where the graph connect and graph prune techniques help mine high-order connective structure in an automatic manner; (2) To jointly distill such a relational graph and rich relations among user behaviors in an end-to-end framework, we elaborate on the model design of ATBRG, which propagates user preference on the sub-graph with relation-aware extractor layer and representation activation layer. The key notations we will use throughout the article are summarized in Table 1.

4.1. Graph Construction

A major novelty of our work is to effectively explore the adaptive target-behavior relational graph for improving the modeling of the interaction. In this part, we introduce the strategy to construct the adaptive target-behavior relational graph with the proposed graph connect and graph prune techniques. To model the relationship between the given target user-item pair over the KG, previous works either extract different paths through random walk (Wang et al., 2019c), or directly leverage the neighbors of the target item over the original KG (Wang et al., 2019a, b). Unfortunately, the first strategy neglects the structural relational information of the KG, while the second ignores the mutual effect between user behaviors and the target item. Hence, we argue that the above two strategies achieve only suboptimal performance for recommendation.

Intuitively, the reasons driving a user to click a target item may be implied by his/her historical behaviors, which are expected to guide our model to adequately aggregate useful information over the external KG in an automatic manner. To distill the structural relational information over the KG in a more effective way, we propose to construct the adaptive relational graph w.r.t. user behaviors and the target item. The procedure of the graph construction is presented in Algorithm 1 and the left part of Fig. 2. Specifically, given a target user-item pair, we first exhaustively search the multi-layer entity neighbors for user behaviors and the target item over the KG, and store the paths connecting each entity and item in the graph (lines 1-6). Through this, we connect the user behaviors and the target item by multiple overlapped entities. Afterwards, we prune the entities in the graph that do not connect different items (lines 7-16). Finally, we get the relational graph for the user and the target item, which describes the structural relations for the pair over the KG.

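Algorithm 1 Graph construction
Input: target item; user behaviors; knowledge graph;
Output: adaptive target-behavior relational graph for the target user-item pair;
1: for each item among the user behaviors and the target item do:
2:     for each entity among the multi-layer neighbors of the item in the KG do:
3:         Construct the path (item, relation, entity, …, relation, entity);
4:         Add the path to the entry of the reached entity in the graph; Graph connect.
5:     end for
6: end for
7: for each entity in the graph do:
8:     New item hash set (initially empty);
9:     for each path stored for the entity do:
10:         Collect the item on the path;
11:         Add the item to the hash set;
12:     end for
13:     if the hash set contains only one item then:
14:         Prune the entity in the graph; Graph prune.
15:     end if
16: end for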
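The two phases of Algorithm 1 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the KG is assumed to be a dictionary mapping an entity to its outgoing `(relation, entity)` edges, items are identified with their aligned entities, and the function names `explore_paths` and `build_relational_graph` as well as the toy entities are hypothetical.

```python
from collections import defaultdict


def explore_paths(item, adjacency, max_depth):
    """Enumerate paths (item, r1, e1, ..., rk, ek) with at most max_depth hops."""
    paths = []
    frontier = [(item,)]
    for _ in range(max_depth):
        next_frontier = []
        for path in frontier:
            for relation, entity in adjacency.get(path[-1], []):
                if entity in path[::2]:  # entities sit at even positions; skip cycles
                    continue
                extended = path + (relation, entity)
                paths.append(extended)
                next_frontier.append(extended)
        frontier = next_frontier
    return paths


def build_relational_graph(target_item, behaviors, adjacency, max_depth=2):
    """Graph connect followed by graph prune for one target user-item pair."""
    entity_paths = defaultdict(list)
    # Graph connect: store every path under the entity it reaches.
    for item in list(behaviors) + [target_item]:
        for path in explore_paths(item, adjacency, max_depth):
            entity_paths[path[-1]].append(path)
    # Graph prune: keep only entities reached from at least two different items.
    graph = {}
    for entity, paths in entity_paths.items():
        source_items = {path[0] for path in paths}
        if len(source_items) > 1:
            graph[entity] = paths
    return graph


toy_kg = {
    "Blouse": [("category", "Shirt"), ("style", "Casual")],
    "Dress": [("style", "Casual"), ("category", "Skirt")],
    "Hat": [("category", "Accessory")],
}
graph = build_relational_graph("Dress", ["Blouse", "Hat"], toy_kg, max_depth=2)
```

In this toy example only the entity `Casual` survives pruning, because it is the only entity reached from both a behavior item and the target item. The sketch keeps whole paths for each surviving entity, so an intermediate entity on a kept path may itself have been pruned as a key; how such cases are handled in the original system is not specified in the text.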

4.2. Model Architecture

After obtaining the adaptive target-behavior relational graph derived from the KG, we continue to study how to produce predictive embeddings for target user-item pairs by propagating user preference over such a sub-graph. As shown in the right part of Fig. 2, the model architecture of our proposed ATBRG is composed of four layers: 1) Embedding layer, which transforms high-dimensional sparse features into low-dimensional dense representations; 2) Relation-aware extractor layer, which produces knowledge aware embeddings for user behaviors and the target item by aggregating structural relational information over the adaptive target-behavior relational graph; 3) Representation activation layer, which activates the relative relational representations of user behaviors w.r.t. that of the target item; 4) Feature interaction layer, which combines the user and item profiles with the activated relational representations of user behaviors and the target item for interaction.

4.2.1. Embedding Layer

As mentioned above, users and items in real-world recommendation scenarios are both associated with abundant profile information in the form of sparse and dense features. Hence, we set up an embedding layer to parameterize users and items as vector representations, while preserving the above profile information. Formally, given a user, we have the corresponding raw feature space, comprised of a sparse feature space and a dense feature space. For sparse features, following Covington et al. (2016); Zhou et al. (2018b, 2019a); Feng et al. (2019), we embed each feature value into a low-dimensional dense vector, while dense features can be standardized or batch-normalized to normalize their distribution. Subsequently, each user can be represented as a vector that concatenates the embeddings of the sparse features with the normalized dense features, so its dimension is determined by the sizes of the sparse and dense feature spaces of the user. Similarly, we represent each item as a vector. Moreover, each entity and relation in the adaptive target-behavior relational graph can also be embedded as a dense vector (entities and relations in the KG only have sparse id features).
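A minimal sketch of such an embedding layer is given below, assuming one lookup table per sparse feature and precomputed mean and standard deviation for each dense feature. The class `EmbeddingTable`, the helper `embed_profile`, the initialization range, and the feature statistics are assumptions made for illustration; only the embedding size of 4 follows the experimental setup reported later in the paper.

```python
import random


class EmbeddingTable:
    """Lookup table that maps sparse feature values to dense vectors."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def lookup(self, value):
        # Lazily create a uniformly initialized vector for unseen values.
        if value not in self.table:
            self.table[value] = [self.rng.uniform(-0.05, 0.05) for _ in range(self.dim)]
        return self.table[value]


def standardize(value, mean, std):
    """Standardize one numerical feature with precomputed statistics."""
    return (value - mean) / std


def embed_profile(sparse, dense, tables, stats):
    """Concatenate sparse-feature embeddings with standardized dense features."""
    vector = []
    for name, value in sparse.items():
        vector.extend(tables[name].lookup(value))
    for name, value in dense.items():
        mean, std = stats[name]
        vector.append(standardize(value, mean, std))
    return vector


tables = {"user_id": EmbeddingTable(4, seed=1), "gender": EmbeddingTable(4, seed=2)}
stats = {"age": (30.0, 10.0)}  # hypothetical (mean, std) of the age feature
user_vector = embed_profile({"user_id": "u42", "gender": "f"}, {"age": 25.0}, tables, stats)
```

With two sparse features of size 4 and one dense feature, the resulting user vector has 2 × 4 + 1 = 9 entries. In a trained model the table entries would be learned parameters rather than fixed random values.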

4.2.2. Relation-aware Extractor Layer

This layer is designed to effectively and comprehensively distill the structural relational information from the extracted sub-graph. Previous works (Veličković et al., 2018; Kipf and Welling, 2017) neglect relational edges during aggregation, which play essential roles in real-world settings. In our scenario, a user may click or buy the same item, while the relations click and buy obviously indicate different preferences of the user towards the item. Therefore, we build the relation-aware extractor layer to adequately exploit the rich structural relational information in the KG, taking into account the various relations between entities.

Based on the above discussions and inspired by the study (Gong and Cheng, 2019), we stack the relation-aware extractor layer by layer in order to recursively propagate the embeddings from an entity’s neighbors to refine the entity’s embedding in the KG. Specifically, for each item (i.e., an item in the user behaviors or the target item), we regard it as the center node and aggregate information over the extracted sub-graph through relation-aware aggregation. Given an entity in the extracted relational sub-graph (for convenience, we omit the subscript in this part), we consider its neighbor set and its representation in each layer. We implement each relation-aware aggregation layer as follows,

(1)

Here, the two learnable terms denote the single-layer perceptron and the attentive matrix of the layer, respectively, and the equation also uses the concatenation operation. The relation-aware extractor layer is stacked layer by layer to propagate user preference over the KG. Subsequently, each entity in the sub-graph obtains its refined representation after the relation-aware extractor layers.

Given a target user-item pair and the corresponding adaptive target-behavior relational graph, the knowledge aware representation of the target item is the refined representation of the entity aligned with it. Similarly, we also obtain the set of relational representations for user behaviors.
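To make the idea of relation-aware aggregation concrete, the following sketch implements one aggregation step for a single entity. It should be read as one plausible form under stated assumptions, not as the authors' Equation (1): attention scores are computed from the concatenation of the center, relation, and neighbor embeddings with a hypothetical scoring vector `attn_vector`, the neighbors are summed with softmax weights, and a single-layer perceptron `weight` with an assumed ReLU activation is applied to the concatenation of the center and the aggregated neighbor vector. The entity is assumed to have at least one neighbor.

```python
import math


def dot(a, b):
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_aware_layer(center, neighbors, attn_vector, weight):
    """One relation-aware aggregation step for a single entity.

    center: embedding of the entity from the previous layer.
    neighbors: non-empty list of (relation_embedding, neighbor_embedding) pairs.
    attn_vector: scores the concatenation [center, relation, neighbor] (assumed form).
    weight: single-layer perceptron applied to [center, aggregated neighbors].
    """
    scores = [dot(attn_vector, center + rel + nbr) for rel, nbr in neighbors]
    alphas = softmax(scores)
    dim = len(center)
    aggregated = [
        sum(alpha * nbr[k] for alpha, (_, nbr) in zip(alphas, neighbors))
        for k in range(dim)
    ]
    hidden = [dot(row, center + aggregated) for row in weight]
    return [max(0.0, h) for h in hidden], alphas  # ReLU is an assumption


center = [1.0, 0.0]
neighbors = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 1.0])]
attn_vector = [0.0] * 6  # zero scores give uniform attention in this toy case
weight = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
refined, alphas = relation_aware_layer(center, neighbors, attn_vector, weight)
```

Because the relation embedding enters the attention score, two edges to the same neighbor through different relations (for example click and buy) can receive different weights, which is the property the layer is meant to provide.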

4.2.3. Representation Activation Layer

Intuitively, user behaviors contribute differently to the final prediction. For example, one behavior can be more informative than another for a given target item. For this purpose, we set up a representation activation layer to place different importance on the relational representations of user behaviors. Specifically, we apply the vanilla attention mechanism (Bahdanau et al., 2015) to activate the representations of user behaviors that are more related to the target item, calculated as follows,

(2)

where the learnable parameter is the attentive matrix of the representation activation layer.
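The activation step can be sketched as attention-weighted pooling of behavior representations with respect to the target item. The bilinear scoring form used below (behavior, attentive matrix, target) and the softmax normalization are assumptions made for illustration, since Equation (2) is not reproduced in this text; the function name `activate_behaviors` is hypothetical.

```python
import math


def activate_behaviors(behavior_reprs, target_repr, attn_matrix):
    """Pool behavior representations with attention weights w.r.t. the target item."""
    # Project the target item through the attentive matrix (assumed bilinear score).
    projected = [sum(w * t for w, t in zip(row, target_repr)) for row in attn_matrix]
    scores = [sum(b * p for b, p in zip(behavior, projected)) for behavior in behavior_reprs]
    # Softmax over behaviors, shifted by the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(behavior_reprs[0])
    pooled = [
        sum(w * behavior[k] for w, behavior in zip(weights, behavior_reprs))
        for k in range(dim)
    ]
    return pooled, weights


behaviors = [[1.0, 0.0], [0.0, 1.0]]
target = [1.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
user_repr, weights = activate_behaviors(behaviors, target, identity)
```

In this toy case the first behavior is aligned with the target item and therefore receives the larger weight, so it dominates the pooled user representation.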

4.2.4. Feature Interaction Layer

So far, given a target user-item pair, we have the profile embeddings for the user and the item, and the knowledge aware embeddings from the adaptive target-behavior relational graph for the user behaviors and the target item. We combine the four embedding vectors into a unified representation and employ a multilayer perceptron (MLP) for better feature interaction (Zhou et al., 2018b, a; Feng et al., 2019; Pi et al., 2019; Zhou et al., 2019a).

(3)

where the output of the MLP is passed through the logistic function and represents the predicted probability that the user clicks on the target item.
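The feature interaction step can be sketched as a concatenation of the four embeddings followed by an MLP and a logistic output. The layer sizes, the ReLU hidden activation, the initialization range, and the function names below are assumptions for a toy example; the paper reports hidden layers of size [512, 256, 128] in its experimental setup.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Uniformly initialize one fully connected layer as (weights, biases)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(vector, weights, biases):
    """Affine map producing one output per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, biases)]


def predict_click(user_profile, item_profile, user_repr, item_repr, layers):
    """Concatenate the four embeddings, apply ReLU hidden layers, then a sigmoid."""
    x = user_profile + item_profile + user_repr + item_repr
    for weights, biases in layers[:-1]:
        x = [max(0.0, h) for h in dense(x, weights, biases)]
    weights, biases = layers[-1]
    logit = dense(x, weights, biases)[0]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
sizes = [8, 6, 4, 1]  # toy sizes: input of 8, two hidden layers, one output logit
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
probability = predict_click([0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], layers)
```

The logistic function maps the final logit into the open interval (0, 1), so the output can be read directly as a click probability.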

4.2.5. Loss Function

We reduce the task to a binary classification problem and use the binary cross-entropy loss function, defined as follows:

(4)

where the sum runs over the training dataset and the label represents whether the user clicked on the target item.
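For reference, the standard binary cross-entropy can be sketched as below. Averaging over the number of samples and clipping probabilities with a small `eps` are common conventions assumed here; they are not confirmed by the text, since Equation (4) is not reproduced.

```python
import math


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


uninformed_loss = binary_cross_entropy([1, 0], [0.5, 0.5])
confident_loss = binary_cross_entropy([1, 0], [0.9, 0.1])
```

A model that always predicts 0.5 incurs a loss of log 2, and the loss decreases as the predicted probabilities move toward the observed labels.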

5. Experiments

In this section, we perform a series of experiments on two real-world datasets, with the aims of answering the following research questions:

  • RQ1: How does our proposed model ATBRG perform compared with state-of-the-art methods on the recommendation task?

  • RQ2: How do different experimental settings (i.e., depth of graph, aggregator selection, etc.) influence the performance of ATBRG?

  • RQ3: How does ATBRG provide effective recommendation intuitively?

                        Description            Taobao   Yelp
User-Item Interaction   #Users                 2.2      4.5
                        #Items                 1.1      4.5
                        #Interactions          7.2      1.0
Knowledge Graph         #Entities              1.4      8.3
                        #Relations             34       35
                        #Triplets              3.8      1.6
                        #Max neighbor depth    3        1
Table 2. Statistics of datasets

5.1. Experimental Setup

5.1.1. Datasets

We conduct extensive experiments on two real-world datasets: industrial dataset from Taobao and benchmark dataset from Yelp.

  • The Taobao (www.taobao.com) dataset consists of click logs from 2019/08/22 to 2019/08/29, where samples from the first week are used for training and samples from the last day are used for testing. Moreover, the Taobao dataset also contains user profiles (e.g., id and age), item profiles (e.g., id and category) and up to 10 real-time user behaviors (real-time user behaviors are the user behaviors that occurred before the current action).

  • The Yelp (www.yelp.com/dataset) dataset records interactions between users and local businesses and contains user profiles (e.g., id, review count and fans), item profiles (e.g., id, city and stars) and up to 10 real-time user behaviors. For each observed interaction, we randomly sample 5 items that the target user did not engage with before as negative instances. For each user, we hold out the latest 30 instances as the test set and utilize the remaining data for training.
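The Yelp preprocessing described above (random negative sampling and a per-user hold-out of the latest instances) could look roughly like the following sketch. The function names, the toy item ids, and the assumption that each user's instances are already sorted from oldest to newest are illustrative and not taken from the paper.

```python
import random


def sample_negatives(engaged_items, all_items, num_negatives, rng):
    """Randomly draw items that the user has not engaged with as negatives."""
    candidates = [item for item in all_items if item not in engaged_items]
    return rng.sample(candidates, num_negatives)


def split_latest(instances, num_test):
    """Hold out the latest num_test instances (num_test must be at least 1)."""
    return instances[:-num_test], instances[-num_test:]


rng = random.Random(0)
all_items = [f"item_{k}" for k in range(20)]
engaged = {"item_0", "item_3", "item_7"}
negatives = sample_negatives(engaged, all_items, 5, rng)

history = list(range(40))  # instance ids of one user, sorted from oldest to newest
train, test = split_latest(history, 30)
```

Because such negatives are drawn uniformly from non-engaged items, they tend to be easier to separate from positives than real exposure logs.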

Besides user behaviors, following Luo et al. (2020); Shen et al. (2014); Zhao et al. (2019), we construct item knowledge for Taobao (e.g., category, parent and style). For the Yelp dataset, the KG is organized from the local business information (e.g., location and category). The detailed descriptions of the two datasets are shown in Table 2. Note that the volume of the Taobao dataset is much larger than that of Yelp, which brings more challenges.

5.1.2. Baselines

We compare our ATBRG with three kinds of representative methods: feature based methods (i.e., YoutubeNet and DeepFM) mainly utilizing raw features derived from user and item profile, behavior based methods (i.e., DIN, DIEN and DSIN) capturing user’s historical behaviors and knowledge graph (KG) based methods (i.e., RippleNet, KGAT and KPRN) benefiting from knowledge graphs in recommendation. The comparison methods are given below in detail:

  • YoutubeNet (Covington et al., 2016) is a standard user behavior based method in the industrial recommender system.

  • DeepFM (Guo et al., 2017) combines factorization machine and deep neural network for recommendation.

  • DIN (Zhou et al., 2018b) locates related user behaviors w.r.t. the target item by using the attention mechanism.

  • DIEN (Zhou et al., 2019a) models users’ temporal interests and the interest evolving process via GRU with an attention update gate.

  • DSIN (Feng et al., 2019) models user’s session interests and the evolving process with self-attention mechanism and Bi-LSTM.

  • RippleNet (Wang et al., 2018a) propagates user’s potential preferences over the set of knowledge entities.

  • KPRN (Wang et al., 2019c) is a typical path based recommendation method, which extracts qualified paths between a user and an item.

  • KGAT (Wang et al., 2019b) is a state-of-the-art KG-based recommendation method, which employs GNN on the KG to generate representations of users and items, respectively.

5.1.3. Evaluation Metrics

We adopt the area under the ROC curve (AUC) to evaluate the performance of all methods. A larger AUC indicates better performance. Besides, we also present the relative improvement (RI) w.r.t. AUC that our model achieves over the compared models, which can be formulated as:

(5)

where |·| is the absolute value, and the two AUC terms refer to our proposed framework ATBRG and to the baseline, respectively. Note that a 0.001 improvement w.r.t. AUC is remarkable in the industrial scenario (i.e., the Taobao dataset).
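Equation (5) is not reproduced in this text. The sketch below assumes the form RI = |AUC(model) − AUC(baseline)| / AUC(baseline) × 100%, chosen because it matches the RI values listed in Table 3 to within 0.01 percentage points; the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(auc_model, auc_baseline):
    """Relative improvement in percent of a model's AUC over a baseline's AUC."""
    return abs(auc_model - auc_baseline) / auc_baseline * 100.0


# AUC values of ATBRG and the runner-up KPRN, taken from Table 3.
ri_taobao = relative_improvement(0.6181, 0.6096)
ri_yelp = relative_improvement(0.8958, 0.8260)
```

Under this assumed form, the two values come out close to the +1.39% and +8.45% reported for KPRN in Table 3.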

5.1.4. Implementation

We implement all models in TensorFlow 1.4. Moreover, for fair comparison, pre-training, batch normalization and regularization are not adopted in our experiments. For RippleNet, we set the max depth of ripple to 3. For KGAT, the max neighbor depth of the target user and item is set to 4 and 3, respectively. For KPRN, the max number of extracted paths over the knowledge graph is set to 50. For ATBRG, the max neighbor depth of the item is set to 3. For all models, we employ random uniform initialization for model parameters and adopt Adagrad as the optimizer with a learning rate of 0.001. Moreover, the embedding size of each feature is set to 4 and the architecture of the MLP is set to [512, 256, 128]. We run each model three times and report the mean of the results.

5.1.5. Significance Test

For the experimental results in Tables 4, 5, 6 and 7, we use “*” to indicate that ATBRG is significantly different from the runner-up method based on paired t-tests at the significance level of 0.01.

Model         Taobao              Yelp
              AUC      RI         AUC      RI
YoutubeNet    0.6017   +2.72%     0.7109   +26.00%
DeepFM        0.6037   +2.38%     0.7334   +22.14%
DIN           0.6058   +2.03%     0.7520   +19.12%
DIEN          0.6061   +1.97%     0.7581   +18.16%
DSIN          0.6073   +1.77%     0.7774   +15.23%
RippleNet     0.5975   +3.44%     0.7324   +22.31%
KGAT          0.6062   +1.96%     0.7876   +13.73%
KPRN          0.6096   +1.39%     0.8260   +8.45%
ATBRG         0.6181   -          0.8958   -
  • Note that the relative improvement on the public Yelp dataset is much higher than on the industrial Taobao dataset, since the negative samples of the public Yelp dataset are generated by random sampling and are thus easier to distinguish.

Table 3. Overall performance comparison w.r.t. AUC (bold: best; underline: runner-up).

5.2. Performance Comparison (RQ1)

We report the AUC comparison results of ATBRG and baselines on two datasets in Table 3. The major findings from the experimental results are summarized as follows:

  • Feature based methods (i.e., YoutubeNet and DeepFM) achieve relatively poor performance on both datasets. This indicates that handcrafted feature engineering is insufficient to capture the complex relations between users and items, which limits performance. Moreover, DeepFM outperforms YoutubeNet on both datasets, since it employs an FM part for better feature interaction.

  • Compared to feature based methods, the performance of behavior based methods (i.e., DIN, DIEN and DSIN) verifies that incorporating historical behaviors is beneficial for inferring the user’s preference. Among them, DSIN achieves the best performance on both datasets due to its integration of the user’s session interests.

  • Generally, KG based methods (i.e., RippleNet, KGAT and KPRN) achieve better performance than behavior based methods in most cases, which indicates the effectiveness of the knowledge graph for capturing the underlying interactions between users and items. However, RippleNet underperforms the other KG based baselines on both datasets. One possible reason is that RippleNet ignores the user’s short-term interest implied in historical behaviors. Moreover, KPRN is the strongest baseline on both datasets. This makes sense, since reasonable and explainable target user-item paths extracted from the KG help improve recommendation performance.

  • ATBRG consistently yields the best performance on both datasets. In particular, ATBRG improves over the best baseline w.r.t. AUC by 1.39% and 8.45% on the Taobao and Yelp datasets, respectively. By stacking multiple GNN layers, ATBRG is capable of exploring rich structural and relational information over the KG, while KPRN only models each extracted path independently. This verifies the importance of capturing both the semantics and the topological structures derived from the KG for recommendation. Besides, compared with KGAT, which only represents the target user and item separately by aggregating their own neighbors over the original KG, ATBRG achieves better performance for the following two reasons: 1) ATBRG considers the mutual effect between the given user behaviors and the target item by constructing the adaptive relational sub-graph for them. Propagating on such a sub-graph can better capture the structural relations between user behaviors and the target item and further explore potential reasons driving the user to click the target item; 2) ATBRG integrates relations when aggregating the entities through the relation-aware attention mechanism, and produces the relational representations over the extracted sub-graph for each user behavior and the target item.

5.3. Study of ATBRG (RQ2)

Model   Taobao AUC   Taobao RI   Yelp AUC   Yelp RI
ATBRG   0.6157       +0.38%      0.8858     +1.12%
ATBRG   0.6125       +0.91%      0.8940     +0.20%
ATBRG   0.6181       -           0.8958     -
Table 4. Effect of the representation activation layer and relation-aware mechanism.
Model                      Taobao AUC   Taobao RI   Yelp AUC   Yelp RI
ATBRG (neighbor depth 0)   0.6054       +2.09%      0.7523     +19.07%
ATBRG (neighbor depth 1)   0.6143       +0.61%      0.8958     -
ATBRG (neighbor depth 2)   0.6181       -           -          -
ATBRG (neighbor depth 3)   0.6163       +0.29%      -          -
  • ATBRG means ATBRG explores layers of neighbors over the extracted sub-graph, which corresponds to layers of neighbors in original KG.

Table 5. Effect of the depth of neighbor.
Model   Taobao AUC   Taobao RI   Yelp AUC   Yelp RI
ATBRG   0.6131       +0.64%      0.8901     +0.65%
ATBRG   0.6133       +0.78%      0.8906     +0.58%
ATBRG   0.6145       +0.58%      0.8928     +0.33%
ATBRG   0.6159       +0.35%      0.8946     +0.13%
ATBRG   0.6181       -           0.8958     -
Table 6. Effect of different aggregators.

In this section, we perform a series of experiments to better understand the traits of ATBRG, including its well-designed components (i.e., the relation-aware mechanism and the representation activation layer) and key parameter settings (i.e., neighbor depth and aggregator).

5.3.1. Effect of Relation-aware Mechanism and Representation Activation Layer

ATBRG provides a principled way to characterize various relations in the KG and user behaviors to enhance recommendation performance. To examine the effectiveness of the relation-aware mechanism and the representation activation layer, we prepare two variants of ATBRG:

  • ATBRG: The variant of ATBRG, which removes the relation-aware mechanism (Eq. 1).

  • ATBRG: The variant of ATBRG, which removes the representation activation layer (Eq. 2).

The AUC comparison results of ATBRG and its variants are shown in Table 4. We have the following two observations:

  • The performance of ATBRG clearly degrades on both datasets when the relation-aware mechanism is removed (i.e., this variant scores below the full ATBRG). This demonstrates that different relations in the KG should be distinguished, as disregarding such information leads to worse performance.

  • ATBRG without the representation activation layer also performs consistently worse than the full ATBRG. This indicates that capturing the semantic relations among user behaviors over the KG helps the model better understand the user’s underlying preference, which benefits the final prediction.

Figure 3. Impact of the number of nodes in adaptive target-behavior relational graph w.r.t. CTR.
Figure 4. An illustrative example of how our proposed ATBRG works more effectively than other knowledge graph based methods. (a) indicates the sampled neighbors of the target item over the knowledge graph. (b) & (c) introduce the specific behaviors of users and the corresponding extracted sub-graph. The red number above each edge indicates the calculated weight.

5.3.2. Effect of Neighbor Depth

The proposed ATBRG model can flexibly capture high-order structural information by recursively aggregating the embeddings of an entity’s neighbors to refine the entity’s embedding in the KG. Here, we investigate how the neighbor depth over the KG influences model performance. Specifically, the neighbor depth of the item is explored in the range of {0, 1, 2, 3} on the Taobao dataset and {0, 1} on the Yelp dataset. We summarize the results in Table 5 and have the following two observations:

  • Overall, the model performance increases gradually as the neighbor depth grows (from 0 to 2 on the Taobao dataset and from 0 to 1 on the Yelp dataset). This demonstrates that deepening the neighbor layer helps capture the long-term structural relations between user behaviors and the target item to some extent.

  • The model performance of ATBRG degrades on the Taobao dataset when the neighbor depth increases from 2 to 3. One possible reason is that long-term relations may include many more ineffective connections. Such relations in the graph introduce noise and further harm the model performance.
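To make the notion of neighbor depth concrete, the following sketch collects all entities within a given number of hops of an item in a toy knowledge graph. The entity and relation names are hypothetical and the code only illustrates how the reachable neighborhood grows with depth; it is not the embedding propagation of ATBRG itself.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Build an undirected adjacency map from (head, relation, tail) triples."""
    adjacency = defaultdict(set)
    for head, _relation, tail in triples:
        adjacency[head].add(tail)
        adjacency[tail].add(head)
    return adjacency


def neighbors_within_depth(adjacency, start, depth):
    """Return {entity: hop distance} for all entities within `depth` hops."""
    distance = {start: 0}
    frontier = [start]
    for hop in range(1, depth + 1):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in distance:
                    distance[neighbor] = hop
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return distance


# Toy knowledge graph with hypothetical entities and relations.
toy_kg = [
    ("item_a", "has_brand", "brand_1"),
    ("brand_1", "brand_of", "item_b"),
    ("item_b", "in_category", "category_1"),
    ("category_1", "category_of", "item_c"),
]
adjacency = build_adjacency(toy_kg)

for depth in range(4):
    reached = neighbors_within_depth(adjacency, "item_a", depth)
    print(depth, sorted(reached))
```

Each extra layer pulls in entities that are further from the item, which is consistent with the observation that very deep neighborhoods can add loosely related, noisy entities.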

5.3.3. Effect of Aggregator

In our model, we enrich the information of items by recursively capturing their neighbor information in the KG. To explore the effect of different neighbor aggregators, we design several variants of ATBRG, listed as follows:

  • ATBRG: It applies concatenation operation (Wang et al., 2019a).

  • ATBRG: It applies sum pooling.

  • ATBRG: It applies self-attention mechanism (Veličković et al., 2018).

  • ATBRG: It applies nonlinear transformation (Wang et al., 2019a).

We present the AUC comparison of ATBRG and its variants in Table 6. From the result, we have the following findings:

  • ATBRG with simple aggregators (i.e., concatenation and sum pooling) performs worst on both datasets, since these aggregators ignore the different contributions of neighbors.

  • Generally, ATBRG with complex aggregators (i.e., self-attention and nonlinear transformation) achieves better performance on both datasets. The reason is that the self-attention variant places different importance on neighbors, while the nonlinear transformation variant characterizes complex interactions.

  • The full ATBRG consistently yields the best performance on both datasets. This illustrates that our proposed relation-aware aggregator not only includes the nonlinear transformation in the weight calculation, but also considers the influence of relations during aggregation.
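The contrast between a simple and an attention-style aggregator can be illustrated with the minimal sketch below. It uses plain dot-product scores followed by a softmax, which is a simplifying assumption: it is neither the exact scoring function of graph attention networks nor the relation-aware aggregator of ATBRG, and the vectors are toy values.

```python
import math


def sum_pool(neighbors):
    """Sum pooling: every neighbor contributes equally."""
    return [sum(column) for column in zip(*neighbors)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(center, neighbors):
    """Weight each neighbor by a softmax over dot products with the center."""
    scores = [sum(c * n for c, n in zip(center, nb)) for nb in neighbors]
    weights = softmax(scores)
    pooled = [
        sum(w * nb[i] for w, nb in zip(weights, neighbors))
        for i in range(len(center))
    ]
    return pooled, weights


# Toy embeddings: the first neighbor is aligned with the center entity.
center = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0]]

summed = sum_pool(neighbors)
attended, weights = attention_pool(center, neighbors)
print("sum pooling:", summed)
print("attention weights:", [round(w, 3) for w in weights])
```

Sum pooling treats both neighbors identically, whereas the attention weights favor the neighbor that is more similar to the center entity.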

5.4. Case Study (RQ3)

To better understand the merits of ATBRG intuitively, we first conduct an instance-level analysis of the adaptive target-behavior relational graph. Fig. 3 presents the influence of the number of nodes in the relational graph on the click-through rate (CTR). Here, CTR is calculated by averaging the real labels (1 for click and 0 otherwise). CTR and the number of nodes are clearly positively correlated on both datasets. This suggests that the richer the relations between user behaviors and the target item in the extracted sub-graph, the more likely the user is to click on the target item.
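The CTR computation described above amounts to averaging the click labels of all instances that share the same number of nodes. A minimal sketch follows; the function name is hypothetical and the instances are made up for illustration.

```python
from collections import defaultdict


def ctr_by_node_count(instances):
    """Average the click labels per node count.

    `instances` is an iterable of (num_nodes, label) pairs, where label is
    1 for a click and 0 otherwise.
    """
    clicks = defaultdict(int)
    totals = defaultdict(int)
    for num_nodes, label in instances:
        clicks[num_nodes] += label
        totals[num_nodes] += 1
    return {n: clicks[n] / totals[n] for n in sorted(totals)}


# Made-up instances: (number of nodes in the relational graph, click label).
toy_instances = [
    (2, 0), (2, 0), (2, 1), (2, 0),
    (5, 1), (5, 0), (5, 1), (5, 0),
]

ctr = ctr_by_node_count(toy_instances)
print(ctr)
```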

Moreover, to show how ATBRG addresses the limitations (described in Section 1) of previous GNN based methods for knowledge-aware recommendation, we conduct a case study on the large-scale industrial Taobao dataset. As shown in Fig. 4, the main findings are summarized as follows:

  • In part (a), due to limitation (L2), the original neighbors of the target item over the knowledge graph are randomly sampled beforehand. Hence, some relevant entities connecting users are discarded, while other ineffective entities are retained, which inevitably introduces noise. This demonstrates that previous methods are incapable of adaptively sampling neighbors for target user-item pairs, which further harms recommendation performance.

  • In part (b), we present the users’ recent behaviors, some of which are related to the target item over the knowledge graph while others are not. With the graph connect and graph prune techniques, we adaptively preserve the effective entities and relations over the knowledge graph (L2). Subsequently, in part (c), we construct the specific adaptive target-behavior relational graph for the given target user-item pair, which provides strong evidence for inferring user preference. Propagating embeddings on such a relational structure can take full advantage of the mutual effect of the target user-item pair for recommendation (L1).

  • To account for the rich relations among user behaviors over the KG, we propose the relation-aware extractor layer to weigh various underlying preferences for recommendation. Compared with part (a), we find that the weights are also adaptive for different users (L3). Specifically, user A pays more attention to one entity and scores it higher than another, while the opposite holds for user B. Therefore, the final relational representation can reflect the personalized preferences of different users towards the target item.
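The effect of graph connect and graph prune in this case study can be illustrated with a toy example. The sketch below rests on two assumptions that are not spelled out in this section: connect keeps the triples whose entities lie within a fixed number of hops of a user behavior or the target item, and prune repeatedly drops entities with a single link unless they are a behavior or the target item. All entity and relation names are hypothetical.

```python
from collections import defaultdict


def connect(triples, anchors, depth):
    """Keep triples whose entities lie within `depth` hops of any anchor."""
    adjacency = defaultdict(set)
    for head, _relation, tail in triples:
        adjacency[head].add(tail)
        adjacency[tail].add(head)
    reached = set(anchors)
    frontier = set(anchors)
    for _ in range(depth):
        frontier = {nb for n in frontier for nb in adjacency.get(n, ())} - reached
        reached |= frontier
    return [(h, r, t) for h, r, t in triples if h in reached and t in reached]


def prune(triples, anchors):
    """Repeatedly drop non-anchor entities that have only one link."""
    edges = list(triples)
    while True:
        degree = defaultdict(int)
        for head, _relation, tail in edges:
            degree[head] += 1
            degree[tail] += 1
        dangling = {n for n, d in degree.items() if d == 1 and n not in anchors}
        if not dangling:
            return edges
        edges = [
            (h, r, t) for h, r, t in edges
            if h not in dangling and t not in dangling
        ]


# Toy knowledge graph: two user behaviors and one target item.
toy_kg = [
    ("behavior_1", "has_brand", "brand_1"),
    ("brand_1", "brand_of", "target_item"),
    ("target_item", "in_category", "category_1"),
    ("behavior_2", "sold_by", "shop_1"),
    ("shop_1", "located_in", "city_1"),
]
anchors = {"behavior_1", "behavior_2", "target_item"}

connected = connect(toy_kg, anchors, depth=1)
pruned = prune(connected, anchors)
print(pruned)
```

In this toy graph only the path from behavior_1 through brand_1 to the target item survives, while behavior_2 ends up with no retained connection to the target item, mirroring the related and unrelated behaviors discussed in part (b).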

5.5. Online A/B Testing

Figure 5. The deployment of ATBRG in Taobao APP.

To verify the effectiveness of our proposed framework ATBRG in a real-world setting, ATBRG has been deployed in a popular recommendation scenario of the Taobao APP. As shown in Fig. 5, the deployment pipeline consists of three parts: 1) User response. Users give implicit feedback (click or not) on the recommended items provided by the recommender system; 2) Offline training. In this procedure, we integrate the knowledge graph, the user behaviors and the target item to construct the adaptive target-behavior relational graph. Afterwards, this relational graph, together with the user profile and item profile, makes up the instances, which are fed into ATBRG for training; 3) Online serving. When the user accesses the Taobao APP, candidate items are generated by the pipelines that precede the real-time prediction (RTP) service. The necessary components of ATBRG are obtained and organized in the same way as in offline training. Finally, the candidates are ranked by the predicted scores of ATBRG and truncated to form the final recommendation results.
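The last step of online serving, ranking the candidates by predicted score and truncating the list, can be sketched as follows. The function name, the candidate identifiers and the scores are hypothetical; in deployment the scores would come from the trained ATBRG model.

```python
def rank_and_truncate(candidates, score_fn, top_k):
    """Sort candidates by descending predicted score and keep the top_k."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:top_k]


# Hypothetical predicted click probabilities for four candidate items.
predicted_scores = {
    "item_a": 0.12,
    "item_b": 0.57,
    "item_c": 0.33,
    "item_d": 0.08,
}

recommended = rank_and_truncate(
    list(predicted_scores), predicted_scores.get, top_k=2
)
print(recommended)
```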

Compared with the existing deployed baseline model DIN, ATBRG achieves a 6.8% lift in click count and a 5.1% lift in CTR, at the cost of 8 milliseconds for online inference. This improvement in recommendation performance verifies the effectiveness of our proposed framework ATBRG.

6. Conclusion

In this paper, we propose a novel framework ATBRG for knowledge-aware recommendation. To effectively characterize the structural relations over the KG, we propose the graph connect and graph prune techniques to construct the adaptive target-behavior relational graph. Furthermore, we elaborate on the model design of ATBRG, equipped with the relation-aware extractor layer and the representation activation layer, which aims to take full advantage of structural connective knowledge for recommendation. Extensive experiments on both industrial and benchmark datasets demonstrate the effectiveness of our framework compared to several state-of-the-art methods. Moreover, ATBRG has also achieved a 5.1% improvement on the CTR metric in online experiments after successful deployment in one popular recommendation scenario of the Taobao APP. In the future, we will consider applying causal inference in the KG to improve the interpretability of recommender systems.

References

  • D. Bahdanau, K. Cho, and Y. Bengio (2015) Neural machine translation by jointly learning to align and translate. In ICLR.
  • Y. Cao, X. Wang, X. He, Z. Hu, and T. Chua (2019) Unifying knowledge graph learning and recommendation: towards a better understanding of user preferences. In WWW, pp. 151–161.
  • P. Covington, J. Adams, and E. Sargin (2016) Deep neural networks for YouTube recommendations. In RecSys, pp. 191–198.
  • A. S. Das, M. Datar, A. Garg, and S. Rajaram (2007) Google news personalization: scalable online collaborative filtering. In WWW, pp. 271–280.
  • Y. Feng, F. Lv, W. Shen, M. Wang, F. Sun, Y. Zhu, and K. Yang (2019) Deep session interest network for click-through rate prediction. In IJCAI, pp. 2301–2307.
  • L. Gong and Q. Cheng (2019) Exploiting edge features for graph neural networks. In CVPR, pp. 9211–9219.
  • H. Guo, R. Tang, Y. Ye, Z. Li, and X. He (2017) DeepFM: a factorization-machine based neural network for CTR prediction. In IJCAI, pp. 1725–1731.
  • B. Hu, C. Shi, W. X. Zhao, and P. S. Yu (2018) Leveraging meta-path based context for top-n recommendation with a neural co-attention model. In SIGKDD, pp. 1531–1540.
  • J. Huang, W. X. Zhao, H. Dou, J. Wen, and E. Y. Chang (2018) Improving sequential recommendation with knowledge-enhanced memory networks. In SIGIR, pp. 505–514.
  • T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In ICLR.
  • Y. Koren, R. Bell, and C. Volinsky (2009) Matrix factorization techniques for recommender systems. Computer 42 (8), pp. 30–37.
  • F. Li, Z. Chen, P. Wang, Y. Ren, D. Zhang, and X. Zhu (2019) Graph intention network for click-through rate prediction in sponsored search. In SIGIR, pp. 961–964.
  • X. Luo, L. Liu, Y. Yang, L. Bo, Y. Cao, J. Wu, Q. Li, K. Yang, and K. Q. Zhu (2020) AliCoCo: Alibaba e-commerce cognitive concept net. In SIGMOD.
  • F. Lv, T. Jin, C. Yu, F. Sun, Q. Lin, K. Yang, and W. Ng (2019) SDM: sequential deep matching model for online large-scale recommender system. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 2635–2643.
  • Q. Pi, W. Bian, G. Zhou, X. Zhu, and K. Gai (2019) Practice on long sequential user behavior modeling for click-through rate prediction. In SIGKDD, pp. 2671–2679.
  • K. Ren, J. Qin, Y. Fang, W. Zhang, L. Zheng, W. Bian, G. Zhou, J. Xu, Y. Yu, X. Zhu, and K. Gai (2019) Lifelong sequential modeling with personalized memorization for user response prediction. In SIGIR.
  • B. M. Sarwar, G. Karypis, J. A. Konstan, J. Riedl, et al. (2001) Item-based collaborative filtering recommendation algorithms. In WWW, pp. 285–295.
  • W. Shen, J. Wang, and J. Han (2014) Entity linking with a knowledge base: issues, techniques, and solutions. IEEE Transactions on Knowledge and Data Engineering 27 (2), pp. 443–460.
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2018) Graph attention networks. In ICLR.
  • H. Wang, F. Zhang, J. Wang, M. Zhao, W. Li, X. Xie, and M. Guo (2018a) RippleNet: propagating user preferences on the knowledge graph for recommender systems. In CIKM, pp. 417–426.
  • H. Wang, F. Zhang, X. Xie, and M. Guo (2018b) DKN: deep knowledge-aware network for news recommendation. In WWW, pp. 1835–1844.
  • H. Wang, M. Zhao, X. Xie, W. Li, and M. Guo (2019a) Knowledge graph convolutional networks for recommender systems. In WWW, pp. 3307–3313.
  • X. Wang, X. He, Y. Cao, M. Liu, and T. Chua (2019b) KGAT: knowledge graph attention network for recommendation. In SIGKDD, pp. 950–958.
  • X. Wang, D. Wang, C. Xu, X. He, Y. Cao, and T. Chua (2019c) Explainable reasoning over knowledge graphs for recommendation. In AAAI, pp. 5329–5336.
  • X. Wang, D. Wang, C. Xu, X. He, Y. Cao, and T. Chua (2019d) Explainable reasoning over knowledge graphs for recommendation. In AAAI, pp. 5329–5336.
  • Y. Xian, Z. Fu, S. Muthukrishnan, G. De Melo, and Y. Zhang (2019) Reinforcement knowledge graph reasoning for explainable recommendation. In SIGIR, pp. 285–294.
  • F. Xu, J. Lian, Z. Han, Y. Li, Y. Xu, and X. Xie (2019) Relation-aware graph convolutional networks for agent-initiated social e-commerce recommendation. In CIKM, pp. 529–538.
  • F. Zhang, N. J. Yuan, D. Lian, X. Xie, and W. Ma (2016) Collaborative knowledge base embedding for recommender systems. In SIGKDD, pp. 353–362.
  • W. X. Zhao, G. He, K. Yang, H. Dou, J. Huang, S. Ouyang, and J. Wen (2019) KB4Rec: a data set for linking knowledge bases with recommender systems. Data Intelligence 1 (2), pp. 121–136.
  • C. Zhou, J. Bai, J. Song, X. Liu, Z. Zhao, X. Chen, and J. Gao (2018a) ATRank: an attention-based user behavior modeling framework for recommendation. In AAAI, pp. 4564–4571.
  • G. Zhou, N. Mou, Y. Fan, Q. Pi, W. Bian, C. Zhou, X. Zhu, and K. Gai (2019a) Deep interest evolution network for click-through rate prediction. In AAAI, pp. 5941–5948.
  • G. Zhou, X. Zhu, C. Song, Y. Fan, H. Zhu, X. Ma, Y. Yan, J. Jin, H. Li, and K. Gai (2018b) Deep interest network for click-through rate prediction. In SIGKDD, pp. 1059–1068.
  • X. Zhou, D. Qin, L. Chen, and Y. Zhang (2019b) Real-time context-aware social media recommendation. The VLDB Journal 28 (2), pp. 197–219.