MTBRN: Multiplex Target-Behavior Relation Enhanced Network for Click-Through Rate Prediction

08/13/2020 ∙ by Yufei Feng, et al. ∙ Ant Financial ∙ Indiana University Bloomington

Click-through rate (CTR) prediction is a critical task for many industrial systems, such as display advertising and recommender systems. Recently, modeling user behavior sequences attracts much attention and shows great improvements in the CTR field. Existing works mainly exploit attention mechanism based on embedding product when considering relations between user behaviors and target item. However, this methodology lacks of concrete semantics and overlooks the underlying reasons driving a user to click on a target item. In this paper, we propose a new framework named Multiplex Target-Behavior Relation enhanced Network (MTBRN) to leverage multiplex relations between user behaviors and target item to enhance CTR prediction. Multiplex relations consist of meaningful semantics, which can bring a better understanding on users' interests from different perspectives. To explore and model multiplex relations, we propose to incorporate various graphs (e.g., knowledge graph and item-item similarity graph) to construct multiple relational paths between user behaviors and target item. Then Bi-LSTM is applied to encode each path in the path extractor layer. A path fusion network and a path activation network are devised to adaptively aggregate and finally learn the representation of all paths for CTR prediction. Extensive offline and online experiments clearly verify the effectiveness of our framework.


1. Introduction

Figure 1. An illustration example of multiplex relations between a user’s historical behaviors and the target item in CTR prediction task.

Click-through rate (CTR) prediction lies at the heart of display advertising and recommender systems. It estimates the probability that a user will click on a given target item. The quality of CTR prediction is fundamental to user experience and user retention. Recently, modeling user behavior sequences has become prevalent in CTR prediction. Several related algorithms

(Zhou et al., 2018; Feng et al., 2019; Pi et al., 2019; Zhou et al., 2019; Ni et al., 2018; Ren et al., 2019)

have been proposed and have achieved good performance in real-world applications. They represent behavior sequences as fixed-length vectors of user interests and feed them into a deep neural network for the final CTR prediction. These models mainly exploit an attention mechanism based on the embedding product to aggregate user behaviors w.r.t. the target item.

Although these methods have achieved performance improvements to some extent, they still face a few major weaknesses. Most importantly, they lack concrete semantics and overlook the underlying reasons driving a user to click on a target item. As a result, they fail to precisely comprehend a user's interests. As illustrated in Figure 1, a movie fan of Avengers is likely to click on an "Iron Man" graphic T-shirt, even if he has only browsed some spin-off products of Avengers. Such an underlying reason, also named a multiplex relation, is composed of multi-typed semantic relatedness (e.g., "fan of", "member of" and "theme of") and different types of items or entities (e.g., "members of Avengers" and "film of Avengers"). It is difficult for existing models to capture the multiplex relation in which these spin-off products and T-shirts can be explicitly linked to the same theme of Avengers by meaningful semantics. Furthermore, there is often more than one multiplex relation between user behaviors and the target item. These multiplex relations are particularly helpful to reveal the preferences (i.e., reasons) of users for consuming items from different perspectives. For instance, after watching the movie Avengers: Infinity War, a user may choose either Avengers: End Game or Sherlock Holmes to watch next, because the former is the sequel of Avengers: Infinity War and the latter is also starred in by Robert Downey Jr. Both movies could reasonably be recommended to the user, so both relations with Avengers: Infinity War must be considered. Therefore, without explicitly modeling multiplex relations, it is conceptually difficult to make precise recommendations.

In order to address the aforementioned problems, we propose a new framework named Multiplex Target-Behavior Relation enhanced Network (MTBRN) for CTR prediction. MTBRN transfers the information of multiplex relations between user behaviors and the target item in a unified framework, so as to bridge and associate them. A knowledge graph (KG) emerges as a natural way to describe such relations, as shown by the surge of interest in incorporating KGs into recommender systems for their comprehensive auxiliary data (Zhang et al., 2016; Wang et al., 2019c; Wang:KGCN; Wang et al., 2019b; Hu et al., 2018; Wang et al., 2018a; Huang et al., 2018; Wang et al., 2019a, 2018b; Yu et al., 2014; Wang et al., 2019d; Cao et al., 2019; Wang et al., ). A KG introduces semantic relatedness among items and various entities, which can capture multiple underlying connections between items. This motivates us to model multiplex relations based on a KG.

We explore and construct multiple paths between user behaviors and the target item on the KG to capture multiplex relations via a graph search algorithm. In a modern web-scale recommender system, there are other graphs that can provide useful linking information to describe multiplex relations. An item-item similarity graph, for instance, can establish high-order connections between similar items. In the remaining part of this paper, we demonstrate that our framework can incorporate these graphs in the same way as the KG. To integrate those relational paths into MTBRN, we use Bi-LSTM to encode each path. Relational paths from various graphs can benefit and complement each other, so we devise a fusion network for their higher-order feature interaction. After the fusion network, different path representations are adaptively aggregated into the final representation of multiplex relations through an attention based activation network. At last, this representation and other features are concatenated and fed into the feature interacting layer for CTR prediction. Experiments were conducted on a proprietary industrial dataset and a public dataset, on which our framework achieves state-of-the-art results. MTBRN has been fully deployed in the product recommender of a popular E-commerce Mobile App and achieves a significant CTR improvement of 7.9%.

The main contributions of this paper are summarized as follows:

  • We highlight the importance of multiplex relations between user behaviors and target item in CTR prediction. We propose a path based method to leverage such relations on different graphs.

  • A new CTR prediction framework named MTBRN is proposed to explore and model multiplex relations. Multiple relational paths are extracted from various graphs. Bi-LSTM and a path fusion/activation network are employed to adaptively learn the final representation of multiplex relations.

  • We performed extensive experiments on a proprietary industrial dataset and a public dataset. Experimental results verify the rationality of each graph and the effectiveness of the proposed MTBRN framework.

2. Problem Formulation

Figure 2. The graph construction methods and path exploration and extraction strategy.

In this section, we formulate the problem of click-through rate prediction with multiplex relations between user behaviors and target item. Specifically, we construct the following two graphs to extract multiplex relations, which are illustrated in the left part of Figure 2.

Item-item similarity graph. The item-item similarity graph is denoted as $\mathcal{G}_s = (\mathcal{V}_s, \mathcal{E}_s)$, where $\mathcal{V}_s$ is the set of items and each edge weight $s_{ij} \in \mathcal{E}_s$ describes the similarity between item $i$ and item $j$. We calculate $s_{ij}$ via the idea of item-based collaborative filtering (Sarwar et al., 2001), which can be formulated as follows,

$$s_{ij} = \frac{\sum_{u=1}^{M} R_{ui} \odot R_{uj}}{\sqrt{\sum_{u=1}^{M} R_{ui}} \sqrt{\sum_{u=1}^{M} R_{uj}}} \qquad (1)$$

Here, $\odot$ refers to the element-wise product, $R \in \{0, 1\}^{M \times N}$ is the user-item interaction matrix from users' historical behaviors, where $M$ and $N$ are the numbers of users and items respectively, and $R_{ui} = 1$ indicates that user $u$ interacted with item $i$, $R_{ui} = 0$ otherwise. For example, the triple $(i_1, 0.3, i_2)$ indicates that the similarity of item $i_1$ and item $i_2$ is 0.3. The triple $(i_2, s_{23}, i_3)$ indicates that the similarity between item $i_2$ and item $i_3$ is $s_{23}$. The item $i_2$ is the junction of the two triples.
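The similarity construction above can be sketched as follows, assuming a binary user-item interaction matrix; `top_k_neighbors` mirrors the top-5 pruning described later in Section 4.1.3 (the function names are ours, not from the paper):

```python
import numpy as np

def item_similarity(R):
    """Cosine similarity between item columns of a binary user-item
    interaction matrix R (users x items), in the spirit of item-based CF."""
    R = np.asarray(R, dtype=float)
    co = R.T @ R                      # co-occurrence counts between items
    counts = np.diag(co)              # per-item interaction counts
    denom = np.sqrt(np.outer(counts, counts))
    with np.errstate(divide="ignore", invalid="ignore"):
        S = np.where(denom > 0, co / denom, 0.0)
    np.fill_diagonal(S, 0.0)          # no self-loops in the graph
    return S

def top_k_neighbors(S, k=5):
    """Keep only the k most similar neighbors per item."""
    nbrs = {}
    for i, row in enumerate(S):
        order = np.argsort(-row)[:k]
        nbrs[i] = [(int(j), float(row[j])) for j in order if row[j] > 0]
    return nbrs
```

Each retained entry `(j, s)` corresponds to one triple $(i, s_{ij}, j)$ of the graph.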

Knowledge graph. The knowledge graph describes semantic correlations among items and real-world entities via various relations, and can be denoted as $\mathcal{G}_k = (\mathcal{V}_k, \mathcal{E}_k)$, where $\mathcal{V}_k$ is the set of items and entities, and $\mathcal{E}_k$ is the set of relations. For example, the triple $(i, \textit{category}, e_1)$ indicates that the item $i$ belongs to the category $e_1$. The triple $(e_1, \textit{parent}, \textit{clothing})$ indicates that the parent of $e_1$ is clothing. The entity $e_1$ is the junction of the two triples.

Notation — Description
$y$, $\hat{y}$ — label and the predicted probability
$u$, $v$ — the user and target item
$\mathcal{U}$, $\mathcal{V}$ — the user set and item set
$p_u$, $p_v$, $\mathcal{B}_u$ — user profile, item profile and user behaviors
$b_i$ — the $i$-th user behavior item
$\mathcal{G}_s$, $\mathcal{G}_k$ — item-item similarity graph and knowledge graph
$\mathcal{P}_s$, $\mathcal{P}_k$ — path sets extracted from $\mathcal{G}_s$ and $\mathcal{G}_k$
$e_i$ — embedding vector for the $i$-th feature
$e^p_i$ — embedding vector for the $i$-th node of the path
$h$ — the concatenated output of Bi-LSTM
$\mathbf{H}_s$, $\mathbf{H}_k$ — the outputs of $\mathcal{P}_s$, $\mathcal{P}_k$ in the relational path extractor layer
$\mathbf{G}$ — the output of $\mathbf{H}_s$, $\mathbf{H}_k$ in the relational path fusion layer
$c_s$, $c_k$, $c_g$ — the outputs of $\mathbf{H}_s$, $\mathbf{H}_k$ and $\mathbf{G}$ in the relational path activation layer
Table 1. Notations.

Multiplex relations enhanced CTR prediction. Now, we formulate the CTR prediction problem to be addressed in this paper. We assume a set of historical click records between users and items, denoted as $\mathcal{D}$. Each record is comprised of $(u, v, \mathcal{B}_u, y)$, where $u \in \mathcal{U}$ and $v \in \mathcal{V}$ (with $\mathcal{U}$ and $\mathcal{V}$ respectively representing the user and item sets), $\mathcal{B}_u$ is the set of user behaviors consisting of the item ids that the user has recently clicked on, and $y$ is set to one if and only if user $u$ has clicked on item $v$. Moreover, each user is associated with a user profile $p_u$ consisting of sparse features (e.g., user id and gender) and numerical features (e.g., user age), while each target item is also associated with an item profile $p_v$ consisting of sparse features (e.g., item id and brand) and numerical features (e.g., price). In order to effectively explore and exploit the multiplex relations between user behaviors and the target item, we elaborately construct a knowledge graph $\mathcal{G}_k$ and an item-item similarity graph $\mathcal{G}_s$ to enhance CTR prediction. Formally, our goal is to learn a prediction function $\hat{y} = \mathcal{F}(u, v, \mathcal{B}_u, \mathcal{G}_s, \mathcal{G}_k; \Theta)$, such that $\hat{y}$ represents the predicted probability of user $u$ clicking on target item $v$ and $\Theta$ represents the parameters of the prediction function $\mathcal{F}$. The notations are summarized in Table 1.

3. MTBRN Framework

In this section, we introduce our proposed framework MTBRN. We first propose the path-based algorithm that captures and models multiplex relations from different graphs. Afterwards, we elaborate on the deep neural network architecture of MTBRN, which is proposed to encode the relational information of the paths extracted from auxiliary graphs and adaptively learn how paths contribute to the final prediction.

3.1. Path Exploration and Extraction Strategy

In this part, we introduce the strategy to effectively explore and extract paths between user behaviors and the target item, which is a natural way to describe multiplex relations on graphs. Previous path-based models either use a random walk strategy (Wang et al., 2019c) or design an auxiliary task (e.g., matrix factorization) to assign weights to paths (Hu et al., 2018). Unfortunately, random walk based methods may harm the stability of model performance, while auxiliary task based methods only work under specific settings. To effectively capture the multiplex relations between each user behavior item $b_i$ and the target item $v$, we adopt different search strategies to extract paths on different graphs, listed as follows:

  • For the item-item similarity graph $\mathcal{G}_s$, we exhaustively search all potential paths following breadth-first search (BFS) and a greedy-selection principle according to the similarity score. We only keep the top $K$ paths with the shortest lengths. One demo path extracted from the item-item similarity graph is defined as: $b_i \xrightarrow{s_1} i_1 \xrightarrow{s_2} \cdots \xrightarrow{s_L} v$, where $(i, s_{ij}, j)$ is one triple in $\mathcal{G}_s$.

  • For the knowledge graph $\mathcal{G}_k$, we follow the BFS principle to generate all paths over the linked relations and entities between user behaviors and the target item, and keep the top $K$ paths with the shortest lengths. One demo path extracted from the knowledge graph can be defined as: $b_i \xrightarrow{r_1} e_1 \xrightarrow{r_2} \cdots \xrightarrow{r_L} v$, where $(h, r, t)$ is one triple in $\mathcal{G}_k$.

We illustrate the procedure in Figure 2. Afterwards, we obtain two types of path sets, $\mathcal{P}_s$ and $\mathcal{P}_k$, which capture the multiplex relations between user behaviors and the target item from different perspectives.
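The BFS-based path extraction above can be sketched as follows. This is an illustrative sketch with an assumed adjacency-list format (`{node: [(neighbor, edge_label), ...]}`); the greedy score-based selection is omitted, and edge labels stand in for similarity scores or KG relations:

```python
from collections import deque

def extract_paths(adj, start, target, k=50, max_len=7):
    """Breadth-first enumeration of paths from a behavior item to the
    target item. Returns up to k shortest paths, each a list that
    alternates nodes and edge labels."""
    paths, queue = [], deque([[start]])
    while queue and len(paths) < k:
        path = queue.popleft()
        if len(path) > max_len:
            continue
        node = path[-1]
        if node == target and len(path) > 1:
            paths.append(path)
            continue
        for nbr, label in adj.get(node, []):
            if nbr not in path[::2]:      # nodes sit at even positions; avoid cycles
                queue.append(path + [label, nbr])
    return paths
```

Because the queue is FIFO and every expansion lengthens a path, shorter paths are always emitted first, which matches the "top paths with the shortest length" criterion.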

3.2. Model Architecture

Figure 3. Overview of the proposed framework MTBRN. The right part shows the embedding vectors of user and target item profile features. The left part is our main contribution, which processes the extracted relational paths: we use Bi-LSTM and the path fusion and activation networks to encode multiplex relational paths. Representations from the two parts are concatenated, flattened and then fed into the feature interacting layer for the final prediction.

As shown in Figure 3, MTBRN consists of two parts before the feature interacting layer. The right part contains the embedding vectors transformed from the user profile features and target item profile features. The left part models the extracted relational paths and is composed of three layers from left to right: (1) the relational path extractor layer extracts bi-directional relational information on paths; (2) the relational path fusion layer captures the higher-order interactions of paths; (3) the relational path activation layer adaptively learns the representation of the relational paths w.r.t. the target item. Finally, the outputs of the two parts are concatenated, flattened and fed into the feature interacting layer for the final prediction. Next, we present a detailed illustration of the proposed MTBRN model.

Embedding. Embedding is a popular technique that projects each feature into a dense vector representation. Formally, let $e_i$ be the embedding vector for the $i$-th feature. Since there exist numerical features (e.g., price) among the original features, we rescale the embedding vector by its input feature value to account for real-valued features. In this way, the embedding of the $i$-th feature for the input vector $x$ is calculated as $x_i e_i$. Due to the sparsity of the input features, we only need to preserve the embeddings of non-zero features. Thus, the final embedding of the input feature vector is obtained as:

$$E = [x_{i_1} e_{i_1}; x_{i_2} e_{i_2}; \dots; x_{i_n} e_{i_n}], \quad x_{i_j} \neq 0, \qquad (2)$$

where $[\cdot; \cdot]$ is the concatenation operation. Following the above embedding procedure, the user and target item profile feature spaces can be represented by $E_u$ and $E_v$, respectively. One path in $\mathcal{P}_s$ and $\mathcal{P}_k$ can be uniformly represented by a sequence starting with $b_i$, the $i$-th user behavior item, and ending with $v$, the target item, where intermediate nodes can be either sparse features (i.e., items, relations and entities) or numerical features (i.e., similarity scores). With embedding, the path can be generally represented by $[e^p_1; e^p_2; \dots; e^p_L]$, where $L$ is the length of the path.
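A minimal sketch of the value-rescaled embedding lookup in Eq. (2), assuming a single embedding table and hypothetical feature ids and values (the 4-dimensional embedding size follows Section 4.2.3):

```python
import numpy as np

rng = np.random.default_rng(0)
emb_table = rng.normal(size=(1000, 4))   # one 4-d vector per feature id

def embed(features):
    """Rescale each feature embedding by its input value and concatenate,
    keeping only non-zero features, as in Eq. (2)."""
    parts = [x * emb_table[i] for i, x in features if x != 0]
    return np.concatenate(parts)

# Sparse id features carry value 1.0; numerical features carry their real value.
E = embed([(42, 1.0), (7, 0.0), (513, 0.35)])   # the zero-valued feature is dropped
```

Here `E` concatenates only the two non-zero features, so its length is 2 × 4 = 8.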

Relational Path Extractor Layer. The path sets describe the multiplex relations between the target item and user behaviors from different perspectives, and this layer is designed to extract the information passing along the paths. Previous path-based models use either CNN (Hu et al., 2018) or LSTM (Wang et al., 2019c) to encode the relational paths from the user to the target item. However, while the items in the user behaviors and the target item are in the same semantic space, the transmission of information along a path is always asymmetric. Therefore, we naturally apply Bi-LSTM (Graves and Schmidhuber, 2005) to extract the two-way information transmitted on the path. Mathematically, the LSTM (Hochreiter and Schmidhuber, 1997) network is implemented as follows:

$$\begin{aligned}
i_t &= \sigma(W_i e^p_t + U_i h_{t-1} + b_i), \\
f_t &= \sigma(W_f e^p_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o e^p_t + U_o h_{t-1} + b_o), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c e^p_t + U_c h_{t-1} + b_c), \\
h_t &= o_t \odot \tanh(c_t),
\end{aligned} \qquad (3)$$

where $\sigma$ is the logistic function, and $i$, $f$, $o$ and $c$ are the input gate, forget gate, output gate and cell vectors, respectively. Forward and backward LSTMs model the bi-directional information; that is, the last representation of each path is calculated as follows:

$$h = [\overrightarrow{h_L}; \overleftarrow{h_L}], \qquad (4)$$

where $\overrightarrow{h_L}$ and $\overleftarrow{h_L}$ represent the last hidden states of the forward LSTM and backward LSTM, respectively. Note that the parameters of the Bi-LSTM are shared when encoding paths from the same set. After the relational path extractor layer, $\mathcal{P}_s$ and $\mathcal{P}_k$ are represented by $\mathbf{H}_s$ and $\mathbf{H}_k$, respectively. For example, $\mathbf{H}_s$ denotes the last representations of all paths in $\mathcal{P}_s$.

Relational Path Fusion Layer. Relational paths can benefit and complement each other. The two demo paths above, for example, may bring much more information to light if considered together. Inspired by the improvements from feature interaction in the CTR field (Rendle, 2010; Juan et al., 2016; Cheng et al., 2016; Guo et al., 2017), we further capture the higher-order interactions among the representations of paths. Mathematically, the interactive representation can be calculated as follows:

$$g_i = h_i \odot \sum_{j=1}^{N} h_j, \quad i = 1, \dots, N, \qquad (5)$$

where $h_i \in \mathbf{H}_s \cup \mathbf{H}_k$, $\odot$ is the element-wise multiply and $N$ is the number of all paths.

Relational Path Activation Layer. Intuitively, relational paths contribute unequally to the target item and thus influence the final prediction differently. Taking two paths in $\mathcal{P}_s$ as an example, the path whose links carry higher similarity scores indicates a closer relation to the target item than the other. Meanwhile, relational paths extracted from different graphs are not on the same scale: given one path from $\mathcal{P}_s$ and one from $\mathcal{P}_k$, it is hard to distinguish which one is more effective. Therefore, the weights of the path representations need to be reassigned w.r.t. the target item. For this reason, the attention mechanism (Bahdanau et al., 2014) is applied to conduct alignment between paths and the target item. Mathematically, the adaptive representation of the relational paths in each path set w.r.t. the target item is calculated as follows:

$$a_i = \frac{\exp(h_i^\top W E_v)}{\sum_{j=1}^{N'} \exp(h_j^\top W E_v)}, \quad c = \sum_{i=1}^{N'} a_i h_i, \qquad (6)$$

where $N'$ is the number of paths in each path set and $W$ is the trainable parameter matrix. After the relational path activation layer, $\mathbf{H}_s$, $\mathbf{H}_k$ and $\mathbf{G}$ are respectively encoded into vectors $c_s$, $c_k$ and $c_g$.
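The target-aware attention pooling can be sketched as follows. The bilinear score $h_i^\top W v$ is a common choice for aligning paths with the target item embedding; the exact scoring function is our assumption:

```python
import numpy as np

def softmax(x):
    x = x - x.max()           # numerical stability
    e = np.exp(x)
    return e / e.sum()

def activate_paths(H, v, W):
    """Attention over path representations w.r.t. the target item
    embedding v. H: (num_paths, d), W: (d, d), v: (d,).
    Returns the weighted sum of path vectors."""
    H = np.asarray(H, dtype=float)
    scores = H @ W @ v        # one alignment scalar per path
    a = softmax(scores)       # normalized attention weights
    return a @ H              # c = sum_i a_i h_i
```

Paths better aligned with the target item receive larger weights, which implements the adaptive aggregation described above.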

Feature Interacting Layer. Following previous studies in the CTR prediction field (Zhou et al., 2018; Feng et al., 2019; Pi et al., 2019; Zhou et al., 2019), a Multi-Layer Perceptron (MLP) is applied for better feature interaction. Here we calculate the final output as follows:

$$\hat{y} = \sigma(\mathrm{MLP}([E_u; E_v; c_s; c_k; c_g])), \qquad (7)$$

where $\sigma$ is the logistic function and $\hat{y}$ represents the predicted probability of user $u$ clicking on the target item $v$.

Loss Function. We reduce the CTR prediction task to a binary classification problem with the binary cross-entropy loss function, which can be defined as follows:

$$\mathcal{L} = -\frac{1}{|\mathcal{D}|} \sum_{(u, v, \mathcal{B}_u, y) \in \mathcal{D}} \big( y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \big), \qquad (8)$$

where $\mathcal{D}$ is the training dataset and $y$ represents whether the user $u$ clicked on the target item $v$.
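The binary cross-entropy objective of Eq. (8) can be computed as a simple batch average; the clipping constant is a standard numerical safeguard, not part of the paper:

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy over a batch, matching Eq. (8)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))
```

For an uninformative predictor that always outputs 0.5, the loss equals log 2 ≈ 0.693, a useful sanity-check baseline.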

4. Experiments

We evaluate the proposed framework MTBRN on a proprietary industrial E-commerce dataset and the public Yelp dataset. Moreover, we conduct strict online A/B testing to evaluate the performance of MTBRN after it is deployed in a real-world setting. Specifically, we make comprehensive analyses of MTBRN, with the aim of answering the following questions:

  • RQ1 How does MTBRN perform compared with other state-of-the-art (SOTA) user behavior enhanced CTR models?

  • RQ2 How does MTBRN perform compared with competitors that can leverage the same graph in our framework for recommendation?

  • RQ3 How do the multiplex relations between user behaviors and target item derived from different graphs benefit CTR prediction?

4.1. Datasets and Graph Description

We report detailed description of the two datasets and all graphs utilized by MTBRN in Table 2.

4.1.1. E-commerce Dataset.

It is an industrial real-world recommender dataset collected from a popular E-commerce Mobile App. The dataset consists of impression/click logs over 8 consecutive days, where clicked impressions are treated as positive instances and the rest as negative. Logs from 2019-08-22 to 2019-08-28 are used for training, and logs from 2019-08-29 are used for testing. Moreover, the E-commerce dataset contains user profiles (e.g., id, age, and gender), item profiles (e.g., id, category, and price) and real-time user behaviors (i.e., the user behaviors that occurred before each instance).

4.1.2. Yelp Dataset.

The Yelp dataset records interactions between users and local businesses and contains user profiles (e.g., id, review count, and fans), item profiles (e.g., id, city, and stars) and real-time user behaviors (https://www.yelp.com/dataset). To adapt it to the CTR prediction task, we treat all observed interactions as positive instances. For each user-item pair among the positive instances, we randomly sample 5 items that have no interaction record with the specific user to constitute the negative instance set. Then, in chronological order, we take each user's last 30 instances for testing and the 31st to 120th most recent instances for training.
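The negative sampling scheme described above can be sketched as follows (an illustrative sketch with our own function and variable names):

```python
import random

def build_instances(interactions, all_items, n_neg=5, seed=0):
    """For each observed (user, item) pair, emit the positive instance
    (label 1) plus up to n_neg sampled items the user never interacted
    with (label 0)."""
    rng = random.Random(seed)
    seen = {}
    for u, i in interactions:
        seen.setdefault(u, set()).add(i)
    out = []
    for u, i in interactions:
        out.append((u, i, 1))
        candidates = [j for j in all_items if j not in seen[u]]
        for j in rng.sample(candidates, min(n_neg, len(candidates))):
            out.append((u, j, 0))
    return out
```

Filtering candidates against the user's full interaction set prevents an already-clicked item from appearing as a negative.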

4.1.3. Item-item Similarity Graph.

We construct the item-item similarity graph as introduced in Section 2. To model the real-world recommender system and prevent information leakage, we only construct the graph based on user behaviors that do not exist in the training and testing datasets. Moreover, considering the tremendous number of items, we only keep the top 5 neighbors with the highest similarity scores for each item, and the max depth of each node is set to 3.

4.1.4. Knowledge Graph.

Knowledge-aware recommendation relies heavily on the quality of the knowledge graph. We construct the knowledge graph following the procedures described in Luo et al. (2020). For the E-commerce dataset, relations include category, parent, season, style, etc. For the Yelp dataset, relations include category, location, attribute, etc.

Description                   E-commerce     Yelp
Users                         0.2 billion    45.4 thousand
Items                         0.1 billion    45.1 thousand
Records                       7.2 billion    1.0 million
User behaviors                10             10
Triplets in $\mathcal{G}_s$   26.2 billion   8.1 million
Relations in $\mathcal{G}_k$  34             35
Entities in $\mathcal{G}_k$   14.4 million   83.3 thousand
Triplets in $\mathcal{G}_k$   33.8 billion   1.6 million
Table 2. Statistics of the datasets and graphs.

4.2. Experimental Setup

4.2.1. Competitors

We consider three kinds of representative CTR prediction methods: user behavior sequence enhanced methods (i.e., YoutubeNet, DIN, DIEN and DSIN), an item-item similarity graph based method (i.e., GIN) and knowledge graph based methods (i.e., RippleNet, KPRN and KGAT). To examine the effect of the multiplex relations derived from different graphs and of the relational path fusion layer, we prepare three variants of MTBRN (i.e., MTBRN-sim, MTBRN-kg and MTBRN-nf). The competitors are given below:

  • YoutubeNet (Covington et al., 2016) is designed for video recommendation in Youtube. It treats user behaviors equally and applies average pooling operation.

  • DeepFM (Guo et al., 2017) is technically designed to capture the multi-order interactions of features. It combines FM and deep model, without the need of complicated feature engineering.

  • DIN (Zhou et al., 2018) uses the embedding product attention mechanism to learn the adaptive representation of user behaviors w.r.t. the target item.

  • DIEN (Zhou et al., 2019) designs an auxiliary network to capture user’s temporal interests and proposes AUGRU to model the interest evolution.

  • DSIN (Feng et al., 2019) divides the user behavior sequence into multiple sessions and designs an extractor layer and an evolving layer to extract the session interests and model how they evolve over time.

  • GIN (Li et al., 2019) is the first to mine and aggregate the user’s latent intention on the co-occurrence item graph with graph attention technique. GIN can be easily applied on item-item similarity graph.

  • RippleNet (Wang et al., 2018a) explores the multiple ripples of user behaviors on the knowledge graph and propagates the representation of the target item recursively layer by layer.

  • KPRN (Wang et al., 2019c) applies LSTM to directly model the multiple user-item paths via the knowledge graph and then aggregate them for the final prediction.

  • KGAT (Wang et al., 2019b) recursively propagates the high-order connectivity of the user and item via the knowledge graph and user-item bipartite graph with graph attention technique.

  • MTBRN-sim: MTBRN with only the item-item similarity graph.

  • MTBRN-kg: MTBRN with only the knowledge graph.

  • MTBRN-nf: MTBRN without the relational path fusion layer.

4.2.2. Evaluation Metrics

In our experiments, we evaluate and compare the performance of different methods via AUC (Area Under the ROC Curve) and Logloss (cross entropy), which are widely adopted in the CTR field. The larger the AUC, the better the performance. Based on our practical experience, a 0.1% increase of offline AUC on our proprietary dataset corresponds to a relative 1% online CTR lift.
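AUC can be computed directly as the probability that a randomly chosen positive instance is scored above a randomly chosen negative one (ties counted as half). A minimal pairwise sketch, fine for small arrays and illustrative only:

```python
import numpy as np

def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, which equals the area under the ROC curve."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()   # positive ranked above negative
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count half
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

Production evaluation would use a rank-based O(n log n) formulation, but the pairwise definition makes the metric's meaning explicit.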

4.2.3. Implementation.

We implemented all the models in TensorFlow 1.4. We tailored the models that were not originally designed for the CTR prediction task, including concatenating their outputs with other features and adding MLP layers at the end. We did not apply pre-training, batch normalization or regularization techniques; instead, a random uniform initializer is employed. With computational cost in mind, only 2 layers of neighbours are reserved for each user behavior item for GIN. For KGAT, the neighbour depths of the user and item are set to 5 and 4, respectively. For RippleNet, the depth of the ripple is set to 3. The extracted paths on the item-item similarity graph and knowledge graph are capped at 50. All models are trained using the Adagrad optimizer with learning rate 0.001 and batch size 300. The embedding size of each feature is set to 4. The hidden units in the MLP layers are set to 512, 256 and 128, respectively. We ran each model three times and report the mean to reduce the effect of randomness.

4.3. Performance Comparison (RQ1&RQ2)

Graph            Model        E-commerce AUC  Logloss  Yelp AUC  Logloss
-                YoutubeNet   0.6017          0.6279   0.7109    0.4899
-                DeepFM       0.6037          0.6192   0.7334    0.4882
-                DIN          0.6058          0.5735   0.7520    0.4579
-                DIEN         0.6065          0.5643   0.7581    0.4518
-                DSIN         0.6073          0.5394   0.7774    0.4392
$\mathcal{G}_s$  GIN          0.6073          0.5416   0.7604    0.4471
$\mathcal{G}_s$  MTBRN-sim†   0.6094          0.5329   0.7915    0.4129
$\mathcal{G}_s$  MTBRN-sim‡   0.6103          0.5244   0.7936    0.4075
$\mathcal{G}_k$  RippleNet    0.5975          0.6369   0.7324    0.4844
$\mathcal{G}_k$  KGAT         0.6062          0.5624   0.7876    0.4214
$\mathcal{G}_k$  KPRN         0.6091          0.5292   0.8267    0.3897
$\mathcal{G}_k$  MTBRN-kg     0.6209          0.4628   0.9088    0.3486
ALL              MTBRN-nf     0.6235          0.4509   0.9231    0.3213
ALL              MTBRN        0.6246          0.4482   0.9408    0.3058

  • † (‡) Paths shorter than 5 (7) are reserved, exploring up to 2 (3) layers of neighborhood.

Table 3. Model performance (AUC and Logloss) with each separate graph and with all graphs.

In this section, we start off comparing the performance of MTBRN with SOTA user behavior enhanced CTR models, as well as with other competitors leveraging each auxiliary graph. We report the performance of all models on the two datasets in Table 3. (Note that the relative improvements on the public Yelp dataset are much higher than those on the industrial E-commerce dataset, because the negative samples of the public Yelp dataset are generated by random sampling and are thus easier to distinguish.)

4.3.1. User behavior Enhanced CTR models.

As shown in Table 3, DIN improves AUC noticeably by leveraging the attention mechanism to activate the user's relevant interests w.r.t. the target item. DIEN achieves better performance with its carefully designed auxiliary net and AUGRU to model the interest evolution. DSIN performs better than DIEN by extracting the user's session interests. Nevertheless, most competitors in the CTR field reallocate weights to user behaviors only based on item embeddings with the attention mechanism, which can hardly uncover the complex reasons driving the user to click the target item. Overall, MTBRN significantly outperforms the above state-of-the-art competitors on both datasets, which mainly benefits from two aspects: (1) the extracted multiplex relational paths from different graphs are more reasonable and concrete, and thus provide powerful clues as to why the user will click on the target item; (2) the technical design of MTBRN helps capture the multiplex relations between user behaviors and the target item. Both contribute much to the final prediction and help achieve the best performance. To answer RQ2, we give extensive insights on how each graph and different components of MTBRN contribute to the best performance. More complicated models are used as competitors.

4.3.2. Item-item Similarity Graph.

As shown in Table 3, GIN outperforms DIN, mainly benefiting from the exploration of users' latent intentions in graphs. However, GIN still ignores the relations between user behaviors and the target item, as well as the linked similarity scores between items. In MTBRN-sim with a 2-layer adjacent neighborhood (the same range as GIN), we explore and model the paths between user behaviors and the target item. This variant outperforms GIN on both datasets, which empirically demonstrates the usefulness of the relational paths between user behaviors and the target item for CTR prediction. Furthermore, we flexibly extend the neighbour depth of the graph to 3 to exploit higher-order information on the graph and, not surprisingly, further improvement is observed. This conveys the message that longer paths can capture higher-order similarities between items and benefit the final prediction in the long run.

4.3.3. Knowledge Graph.

We present the AUC performance of various knowledge-aware models for the CTR prediction task on both datasets in Table 3. KGAT and KPRN both outperform DIN, and KPRN specifically offers a larger increase on both datasets. In contrast, RippleNet performs worse than DIN. One reasonable explanation is that explicitly modeling the relations between user behaviors and the target item is more efficacious than propagating user preferences through the KG. KGAT incorporates the knowledge graph and the user-item bipartite graph, hence it yields more improvement compared with DIN. However, KGAT has been found empirically to involve useless information in the final prediction and fails to capture the direct interaction between user behaviors and the target item. KPRN benefits from reasonable and explainable user-target paths derived from the knowledge graph and outperforms KGAT. MTBRN-kg is devoted to capturing the reasonable and explainable knowledge relations between user behaviors and the target item. Moreover, the relational path extractor layer and activation layer help obtain the representations of multiplex relational paths and activate those related to the target item. Therefore, MTBRN-kg outperforms the other competitors using the same knowledge graph.

4.3.4. Effect of Relation Path Fusion Layer.

We conduct extensive experiments to verify the effectiveness of the proposed relational path fusion layer. We report a detailed comparison of the performance of MTBRN with and without path fusion (i.e., MTBRN and MTBRN-nf) in Table 3. Not surprisingly, MTBRN-nf outperforms the single-graph variants (i.e., MTBRN-sim and MTBRN-kg) since it incorporates multiplex relations derived from different graphs for the final prediction. Moreover, the full MTBRN performs better than MTBRN-nf, which demonstrates the effectiveness of the proposed relational path fusion layer.

4.4. Validity Analysis of Paths (RQ3)

Figure 4. Impact of the number of relational paths.
Figure 5. Impact of the average length of relational paths.

In this section, we make comprehensive instance-level analyses of the effectiveness of the multiplex relations (i.e., the extracted relational paths) derived from different graphs on the E-commerce dataset. As shown in Figure 4, CTR is calculated by averaging the real label values (0 for non-click, 1 for click) of instances with the same number of paths in the dataset. The number of relational paths (on both the knowledge graph and the item-item similarity graph) is positively correlated with CTR. This indicates that the more relations there are between the target item and user behaviors, the more likely the user is to click on the target item. We also investigate the influence of the average path length extracted from each sample on CTR. Figure 5 shows that shorter relational paths from the item-item similarity graph and the knowledge graph contribute to higher CTR. It demonstrates that a shorter distance between user behaviors and the target item in a graph implies a closer relation, which makes the user more likely to click on the target item.

4.5. Online A/B Testing

Figure 6. MTBRN deployment for Online CTR.

We have deployed MTBRN in the product recommender of a popular E-commerce mobile app for several months. The deployment pipeline is shown in Figure 6 and consists of three parts: user response, offline training, and online serving. User-item interaction logs from user response, together with the knowledge graph and the item-item similarity graph, are fed into the MTBRN model for training. When a user accesses the app, a series of candidate items is generated by MTBRN in real time. The candidate items are then sorted and truncated by the predicted scores and recommended to the user. We conducted strict online A/B testing to validate the performance of MTBRN. Our online baseline is the latest deployed DIN. Online CTR increased by 7.9% on average compared with DIN, at the cost of 7 additional milliseconds of online inference latency for MTBRN. The substantial increase in commercial revenue, together with the tolerable serving latency, serves as proof of the effectiveness of our proposed MTBRN.
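The sort-and-truncate step of the serving pipeline can be sketched as a simple top-k selection over predicted scores; the function and variable names here are hypothetical, not part of the deployed system:

```python
import heapq

def rank_candidates(candidates, scores, k=10):
    """Sort candidate items by predicted CTR and keep the top-k,
    mirroring the sort-and-truncate step of the serving pipeline."""
    return [item for _, item in heapq.nlargest(k, zip(scores, candidates))]

items = ["a", "b", "c", "d"]
ctrs = [0.12, 0.45, 0.08, 0.30]
print(rank_candidates(items, ctrs, k=2))  # ['b', 'd']
```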

5. Related Work

5.1. User Behaviors Enhanced CTR

Click-through rate (CTR) prediction is an essential task in most industrial recommender systems. Recently, modeling user behavior sequences has attracted much attention and has been widely proven effective in the CTR field. DIN (Zhou et al., 2018) uses an embedding-product based attention mechanism to learn an adaptive representation of the user behavior sequence w.r.t. the target item. Inspired by DIN, the majority of follow-up works inherit this paradigm. DIEN (Zhou et al., 2019) and SDM (Lv et al., 2019) are devoted to capturing users' temporal interests and modeling their sequential relations. DSIN (Feng et al., 2019) focuses on capturing the relations among users' inter-session and intra-session behaviors. Despite these improvements, the embedding-product based attention mechanism fails to capture the multiplex relations between user behaviors and the target item.
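The embedding-product attention paradigm can be illustrated with a minimal sketch; the linear scorer below stands in for DIN's small MLP, and the dimensions are illustrative assumptions, not the published implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(scale=0.1, size=(3 * d,))  # toy linear scorer in place of an MLP

def din_style_attention(behaviors, target):
    """Score each behavior against the target using the concatenation of
    [behavior, target, behavior * target], then pool the behaviors by the
    resulting weights into one adaptive interest vector."""
    T = behaviors.shape[0]
    feats = np.concatenate(
        [behaviors, np.tile(target, (T, 1)), behaviors * target], axis=1)  # (T, 3d)
    scores = feats @ W                      # (T,) relevance of each behavior
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over behaviors
    return weights @ behaviors              # (d,) adaptive interest vector

interest = din_style_attention(rng.normal(size=(5, d)), rng.normal(size=d))
print(interest.shape)  # (8,)
```

The element-wise product `behaviors * target` is the "embedding product" interaction feature; because it is a purely geometric similarity signal, it carries none of the explicit relational semantics that the multiplex paths in MTBRN are designed to supply.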

5.2. Knowledge-aware Recommendation

Many recent studies integrate a knowledge graph, which provides rich side information about items, into recommendation to improve interpretability. RippleNet (Wang et al., 2018a) combines the advantages of the two aforementioned types of methods. KPRN (Wang et al., 2019c) constructs multiple paths between each user-item pair and uses an LSTM to extract the information of each path. Wang et al. use graph convolution networks to automatically discover both the high-order structural information and the semantic information of the knowledge graph. KGAT (Wang et al., 2019b) leverages a graph attention network to model high-order relational connectivity in the knowledge graph and the user-item bipartite graph. Empirically, path-based methods exploit the KG in a more natural and efficient way and are more suitable for depicting the deep relevance between user behaviors and the target item in our CTR prediction task.

6. Conclusion

In this paper, we propose a new framework, MTBRN, to leverage multiplex relations between user behaviors and the target item for CTR prediction. The integration of different graphs and various relational paths ensures the superior performance of MTBRN over related work. We conducted extensive experiments on an industrial-scale dataset and a public dataset to demonstrate the effectiveness of our method, and empirical analyses show the validity of each component of the proposed framework.

References

  • D. Bahdanau, K. Cho, and Y. Bengio (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. Cited by: §3.2.
  • Y. Cao, X. Wang, X. He, Z. Hu, and T. Chua (2019) Unifying knowledge graph learning and recommendation: towards a better understanding of user preferences. In WWW, pp. 151–161. Cited by: §1.
  • H. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, et al. (2016) Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–10. Cited by: §3.2.
  • P. Covington, J. Adams, and E. Sargin (2016) Deep neural networks for youtube recommendations. In RecSys, pp. 191–198. Cited by: 1st item.
  • Y. Feng, F. Lv, W. Shen, M. Wang, F. Sun, Y. Zhu, and K. Yang (2019) Deep session interest network for click-through rate prediction. In IJCAI, Cited by: §1, §3.2, 5th item, §5.1.
  • A. Graves and J. Schmidhuber (2005) Framewise phoneme classification with bidirectional lstm and other neural network architectures. Neural Networks 18 (5-6), pp. 602–610. Cited by: §3.2.
  • H. Guo, R. Tang, Y. Ye, Z. Li, and X. He (2017) DeepFM: a factorization-machine based neural network for ctr prediction. arXiv preprint arXiv:1703.04247. Cited by: §3.2, 2nd item.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §3.2.
  • B. Hu, C. Shi, W. X. Zhao, and P. S. Yu (2018) Leveraging meta-path based context for top-n recommendation with a neural co-attention model. In SIGKDD, pp. 1531–1540. Cited by: §1, §3.1, §3.2.
  • J. Huang, W. X. Zhao, H. Dou, J. Wen, and E. Y. Chang (2018) Improving sequential recommendation with knowledge-enhanced memory networks. In SIGIR, pp. 505–514. Cited by: §1.
  • Y. Juan, Y. Zhuang, W. Chin, and C. Lin (2016) Field-aware factorization machines for ctr prediction. In RecSys, pp. 43–50. Cited by: §3.2.
  • F. Li, Z. Chen, P. Wang, Y. Ren, D. Zhang, and X. Zhu (2019) Graph intention network for click-through rate prediction in sponsored search. In SIGIR, Cited by: 6th item.
  • X. Luo, L. Liu, Y. Yang, L. Bo, Y. Cao, J. Wu, Q. Li, K. Yang, and K. Q. Zhu (2020) AliCoCo: alibaba e-commerce cognitive concept net. In SIGMOD, Cited by: §4.1.4.
  • F. Lv, T. Jin, C. Yu, F. Sun, Q. Lin, K. Yang, and W. Ng (2019) SDM: sequential deep matching model for online large-scale recommender system. In CIKM, pp. 2635–2643. Cited by: §5.1.
  • Y. Ni, D. Ou, S. Liu, X. Li, W. Ou, A. Zeng, and L. Si (2018) Perceive your users in depth: learning universal user representations from multiple e-commerce tasks. In SIGKDD, pp. 596–605. Cited by: §1.
  • Q. Pi, W. Bian, G. Zhou, X. Zhu, and K. Gai (2019) Practice on long sequential user behavior modeling for click-through rate prediction. In SIGKDD, Cited by: §1, §3.2.
  • K. Ren, J. Qin, Y. Fang, W. Zhang, L. Zheng, W. Bian, G. Zhou, J. Xu, Y. Yu, X. Zhu, and K. Gai (2019) Lifelong sequential modeling with personalized memorization for user response prediction. In SIGIR, Cited by: §1.
  • S. Rendle (2010) Factorization machines. In ICDM, pp. 995–1000. Cited by: §3.2.
  • B. Sarwar, G. Karypis, J. Konstan, and J. Riedl (2001) Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web, pp. 285–295. Cited by: §2.
  • H. Wang, F. Zhang, J. Wang, M. Zhao, W. Li, X. Xie, and M. Guo (2018a) Ripplenet: propagating user preferences on the knowledge graph for recommender systems. In CIKM, pp. 417–426. Cited by: §1, 7th item, §5.2.
  • H. Wang, F. Zhang, X. Xie, and M. Guo (2018b) DKN: deep knowledge-aware network for news recommendation. In WWW, pp. 1835–1844. Cited by: §1.
  • H. Wang, F. Zhang, M. Zhang, J. Leskovec, M. Zhao, W. Li, and Z. Wang (2019) Knowledge-aware graph neural networks with label smoothness regularization for recommender systems. In SIGKDD, pp. 968–977. Cited by: §1, §5.2.
  • H. Wang, F. Zhang, M. Zhao, W. Li, X. Xie, and M. Guo (2019a) Multi-task feature learning for knowledge graph enhanced recommendation. In WWW, pp. 2000–2010. Cited by: §1.
  • X. Wang, X. He, Y. Cao, M. Liu, and T. Chua (2019b) KGAT: knowledge graph attention network for recommendation. In SIGKDD, Cited by: §1, 9th item, §5.2.
  • X. Wang, D. Wang, C. Xu, X. He, Y. Cao, and T. Chua (2019c) Explainable reasoning over knowledge graphs for recommendation. In AAAI, Vol. 33, pp. 5329–5336. Cited by: §1, §3.1, §3.2, 8th item, §5.2.
  • X. Wang, D. Wang, C. Xu, X. He, Y. Cao, and T. Chua (2019d) Explainable reasoning over knowledge graphs for recommendation. In AAAI, Vol. 33, pp. 5329–5336. Cited by: §1.
  • X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, and J. Han (2014) Personalized entity recommendation: a heterogeneous information network approach. In WSDM, pp. 283–292. Cited by: §1.
  • F. Zhang, N. J. Yuan, D. Lian, X. Xie, and W. Ma (2016) Collaborative knowledge base embedding for recommender systems. In SIGKDD, pp. 353–362. Cited by: §1.
  • G. Zhou, N. Mou, Y. Fan, Q. Pi, W. Bian, C. Zhou, X. Zhu, and K. Gai (2019) Deep interest evolution network for click-through rate prediction. In AAAI, Vol. 33, pp. 5941–5948. Cited by: §1, §3.2, 4th item, §5.1.
  • G. Zhou, X. Zhu, C. Song, Y. Fan, H. Zhu, X. Ma, Y. Yan, J. Jin, H. Li, and K. Gai (2018) Deep interest network for click-through rate prediction. In SIGKDD, pp. 1059–1068. Cited by: §1, §3.2, 3rd item, §5.1.