Learning Cross-Domain Representation with Multi-Graph Neural Network

05/24/2019 · by Yi Ouyang, et al.

Learning effective embeddings has been proven useful in many real-world problems, such as recommender systems, search ranking and online advertising. However, one of the challenges is data sparsity in learning large-scale item embeddings, as users' historical behavior data are usually lacking or insufficient in an individual domain. In fact, users' behaviors from different domains regarding the same items are usually relevant. Therefore, we can learn complete user behaviors to alleviate the sparsity using complementary information from correlated domains. It is intuitive to model users' behaviors using a graph, and graph neural networks (GNNs) have recently shown great power for representation learning, which can be used to learn item embeddings. However, it is challenging to transfer information across domains and learn cross-domain representation using the existing GNNs. To address these challenges, in this paper, we propose a novel model, Deep Multi-Graph Embedding (DMGE), to learn cross-domain representation. Specifically, we first construct a multi-graph based on users' behaviors from different domains, and then propose a multi-graph neural network to learn cross-domain representation in an unsupervised manner. Particularly, we present a multiple-gradient descent optimizer for efficiently training the model. We evaluate our approach on various large-scale real-world datasets, and the experimental results show that DMGE outperforms other state-of-the-art embedding methods in various tasks.


1. Introduction

Recently, many online personalized services have utilized users’ historical behavior data to characterize user preferences, such as online video sites (Covington et al., 2016), App stores (Cheng et al., 2016), online advertisements (Guo et al., 2017) and E-commerce sites (Zhao et al., 2018; Wang et al., 2018). Learning the representation from user-item interactions is an essential issue in most personalized services. Usually, low-dimensional embeddings can effectively represent attributes of items and preferences of users in a uniform latent semantic space, which are helpful to provide personalized services and improve user experience. Moreover, the representation of users and items has been widely applied to many research topics related to the above real-world scenarios, including large-scale recommendation (Wang et al., 2018; Ying et al., 2018), search ranking (Grbovic and Cheng, 2018; Chu et al., 2018), and the cold-start problem (Zhao et al., 2018).

In large-scale personalized services, there is usually a relatively small portion of active users, and the majority of non-active users often interact with only a small number of items; users’ behavior data are thus lacking or insufficient in an individual domain, which makes it difficult to learn effective embeddings (Wang et al., 2019). On the other hand, though data from a single domain are sparse, users’ behaviors from correlated domains regarding the same items are usually complementary (Zhuang et al., 2017). Take the App store as an example: there are two ways users interact with (e.g., download) items (i.e., Apps). One is downloading Apps recommended on the homepage or category pages of the App store (i.e., the recommendation domain); the other is by searching (i.e., the search domain). User behaviors in the search domain reflect the user’s current needs or intention, while those in the recommendation domain represent the user’s relatively long-term interests. Leveraging the interaction data from the search domain can improve the performance of recommendation. Conversely, interaction data from the recommendation domain can also help to explore the user’s personalized interests and therefore optimize the ranking list in the search domain. Therefore, we are motivated to leverage the complementary information from correlated domains to alleviate the sparsity problem.

Generally, users’ behaviors are sequential (Hidasi et al., 2016) (take the App store as an example, as shown in Figure 1 (a)), and a graph can intuitively model users’ sequential behaviors (Wang et al., 2018). Specifically, in each domain (as shown in Figure 1 (b)), we can construct an item graph by modeling the items as nodes, the item co-occurrences as edges, and the number of co-occurrences in all users’ behavior sequences as the edge weights. By applying graph embedding methods such as DeepWalk (Perozzi et al., 2014; Wang et al., 2018), one can generate abundant item sequences by running random walks on the item graph, and then use the Skip-Gram algorithm (Mikolov et al., 2013a; Mikolov et al., 2013b) to learn item embeddings. Compared with random walk based graph embedding methods (Perozzi et al., 2014; Grover and Leskovec, 2016), graph neural networks (GNNs) have recently shown great power for representation learning on graphs (Xu et al., 2019). As a state-of-the-art GNN, the graph convolutional network (GCN) (Kipf and Welling, 2017) is built on convolutional neural networks and generates node embeddings by operating convolution on the graph. The graph convolution operation in GCN encodes node attributes and graph structure using neural networks, thus GCN performs well in graph embedding and can be used for item embedding. However, these methods are developed for learning single graph embedding, i.e., single-domain embedding. Users’ behaviors across domains are more complex, and it is more reasonable to model users’ behaviors as a multi-graph (as shown in Figure 1 (c)), which consists of a set of nodes and multiple types of edges (i.e., solid and dashed lines represent two types of edges). Concretely, nodes represent the same items across domains and each type of edge denotes the co-occurrences of item pairs in one domain. In a multi-graph, there may exist multiple types of edges between a pair of nodes; each type of edge forms a certain subgraph (i.e., a domain), and these subgraphs are related to each other, as all of them share the same nodes. Thus, each node (i.e., item) is likely to have a different representation in each subgraph (i.e., domain), and all these representations of a node are relevant to each other.
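To make the graph construction concrete, the following is a minimal sketch (plain Python with hypothetical toy data; the variable and function names are ours, not the authors') of building one weighted item graph per domain by counting consecutive item co-occurrences in users' behavior sequences:

```python
from collections import defaultdict

# Hypothetical toy input: for each domain, a list of users' item sequences.
sequences_by_domain = {
    "recommendation": [["app1", "app2", "app3"], ["app2", "app3"]],
    "search": [["app3", "app1"], ["app1", "app2", "app1"]],
}

def build_item_graph(sequences):
    """Undirected weighted item graph: nodes are items, and the weight of an
    edge counts how often the two items occur consecutively in any sequence."""
    weights = defaultdict(int)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                weights[tuple(sorted((a, b)))] += 1
    return dict(weights)

# One subgraph per domain; together they form the multi-graph, since all
# domains share the same item (node) set.
multi_graph = {d: build_item_graph(s) for d, s in sequences_by_domain.items()}
print(multi_graph["recommendation"])  # {('app1', 'app2'): 1, ('app2', 'app3'): 2}
```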

However, the existing single graph embedding methods fail to fuse the complex relations in a multi-graph and to generate effective node embeddings. The cross-domain scenario poses challenges for transferring information across domains and learning cross-domain representation. On the other hand, though GCN is effective, stacking many convolutional layers makes GCN difficult to train, as the iterative graph convolution operation is prone to overfitting, as stated in (Li et al., 2018). This brings additional complexity and challenges when applying GCN to learn cross-domain (or multi-graph) representation. Thus, to better utilize the power of GCN, dedicated efforts are required to design a novel neural network architecture based on GCN for cross-domain representation learning, and to optimize the neural network efficiently to overcome the disadvantages of GCN.

To address these challenges, in this paper, we propose a novel embedding model, named Deep Multi-Graph Embedding (DMGE). We first construct the item graph as a multi-graph based on users’ sequential behaviors from different domains. Specifically, the nodes in the multi-graph represent items, and two nodes are connected by an edge if they occur consecutively in a user’s sequence. Thus, learning the item embedding is converted to learning node embedding in the multi-graph. To utilize the power of GCN for graph embedding, we propose a graph neural network inspired by the multi-task learning regime, which extends GCN to learn cross-domain representation. Specifically, each domain is viewed as a task in the model, and we design domain-specific layers to generate a domain-specific representation for each domain; all domains are correlated by the domain-shared layers, which generate the domain-shared representation. The model is then trained in an unsupervised manner by learning the graph structure. Besides, to overcome the disadvantages of GCN, we introduce a multiple-gradient descent optimizer to train the proposed model, which can adaptively adjust the weight of each domain. In particular, it updates the parameters of the domain-shared layers using the weighted summation of the gradients of all domains, and the parameters of the domain-specific layers using the gradients of the corresponding domain. The main contributions of our work are summarized as follows:

  • We focus on learning cross-domain representation. Innovatively, we model users’ cross-domain behaviors as a multi-graph, and propose a graph neural network to learn domain-shared and domain-specific representations simultaneously.

  • We propose a novel embedding model named Deep Multi-Graph Embedding (DMGE), which is a graph neural network based on multi-task learning. Particularly, we present a multiple-gradient descent optimizer to efficiently train the model in an unsupervised manner.

  • We evaluate DMGE on various large-scale real-world datasets, and the experimental results show that DMGE outperforms other state-of-the-art embedding methods in various tasks.

(a) Users’ behavior sequences in multiple domains.
(b) Item graphs.
(c) Multi-graph.
Figure 1. The construction of multi-graph.

2. Deep Multi-Graph Embedding

In this section, we elaborate on the Deep Multi-Graph Embedding (DMGE) model for cross-domain item embedding. We first present the problem definition. Then, we propose a multi-graph neural network to learn node embedding in the multi-graph. Finally, we present a multiple-gradient descent optimizer to efficiently train the model in an unsupervised manner.

2.1. Problem Definition

Suppose there are $K$ domains. For each domain $k \in \{1, \ldots, K\}$, we first construct the item graph as an undirected weighted graph $G^k = (V, E^k)$. As these domains are correlated and share the same set of items, we then construct the cross-domain item graph as an undirected weighted multi-graph $G = (V, E)$, which contains the node set $V$ with $|V|$ nodes and the edge set $E$ with $K$ types of edges, i.e., $E = \bigcup_{k=1}^{K} E^k$.

Our problem can be formally stated as follows: given an undirected weighted multi-graph $G = (V, E)$ and the node feature matrix $X \in \mathbb{R}^{|V| \times F}$, representing the input of each node as an $F$-dimensional feature vector, our goal is to learn a set of embeddings $\{Z^k\}_{k=1}^{K}$ for all nodes in each subgraph $G^k$, i.e., $Z^k \in \mathbb{R}^{|V| \times d}$ is the node embedding in subgraph $G^k$, with each node having a $d$-dimensional embedding, by solving the following optimization problem:

(1) $\max \; \sum_{k=1}^{K} \sum_{i \in V} \sum_{j \in \mathcal{N}_k(i)} \log p(i, j \mid G^k)$

where $p(i, j \mid G^k)$ is the probability that there exists an edge between node $i$ and node $j$ in subgraph $G^k$, and $\mathcal{N}_k(i)$ is the set of neighbors of node $i$ in subgraph $G^k$.

2.2. Multi-Graph Neural Network

As discussed, graph neural networks (GNNs) (Zhou et al., 2018; Xu et al., 2019) have recently emerged as a powerful approach for representation learning on graphs, such as GCN (Kipf and Welling, 2017). Thus we focus on applying GCN for multi-graph embedding. In a multi-graph, the same set of nodes is shared by all subgraphs. Each node has different neighbors in different subgraphs, thus it is likely to have a different representation in each subgraph. Moreover, all these representations belong to the same node, thus they are inherently related to each other. We present two types of node representations in the multi-graph: each node has a shared representation, which denotes the information shared across the multi-graph, and a specific representation in each subgraph, which encodes the information specific to that subgraph.

To learn these multiple types of node representations, the architecture of DMGE is presented in Figure 2, which follows the multi-task learning regime (Caruana, 1997; Ruder, 2017). Specifically, the domain-shared layers are graph convolutional layers on the multi-graph, which are used to learn the representation shared across domains. The domain-specific layers are also graph convolutional layers, which are used to learn the specific representation of each subgraph (i.e., each domain). The outputs of the graph convolutional layers are node embeddings, which are used to model the probability that a link exists between two nodes.

Figure 2. The architecture of DMGE.

The graph convolutional layers efficiently learn node embeddings based on the neighborhood aggregation scheme. In DMGE, the shared graph convolutional layers generate shared node embeddings by encoding node attributes and the multi-graph structure. Based on the shared embeddings, the specific graph convolutional layers generate specific node embeddings on each subgraph by the same rule.

To learn the shared embedding, the shared graph convolutional layers are defined as follows:

(2) $H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$

where $H^{(l)}$ is the shared embedding of the multi-graph in the $l$-th layer, and $H^{(0)} = X$ is the matrix of node features. $\tilde{A} = A + I$ is the adjacency matrix of the multi-graph $G$ with added self-connections, where $A$ is the adjacency matrix with $A_{ij} = 1$ if there are any links between node $i$ and node $j$, and $I$ is the identity matrix. $\tilde{D}$ is a diagonal matrix with $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$. $W^{(l)}$ is the shared weight matrix of the $l$-th layer, and $\sigma(\cdot)$ is a non-linear activation function. The output of the last shared graph convolutional layer is the shared node embedding $Z_s$.

Based on the shared embedding, the specific graph convolutional layers are defined as follows:

(3) $H_k^{(l+1)} = \sigma\left(\tilde{D}_k^{-\frac{1}{2}} \tilde{A}_k \tilde{D}_k^{-\frac{1}{2}} H_k^{(l)} W_k^{(l)}\right)$

where $H_k^{(l)}$ is the specific embedding of subgraph $G^k$ in the $l$-th layer, and $H_k^{(0)} = Z_s$ is the shared embedding. $\tilde{A}_k = A_k + I$ is the adjacency matrix of subgraph $G^k$ with added self-connections, where $A_k$ is the adjacency matrix and $(A_k)_{ij}$ is the weight of edge $(i, j)$. $\tilde{D}_k$ is a diagonal matrix with $(\tilde{D}_k)_{ii} = \sum_{j} (\tilde{A}_k)_{ij}$. $W_k^{(l)}$ is the specific weight matrix of the $l$-th layer. The output of the specific graph convolutional layers is the set of node embeddings $\{Z^k\}_{k=1}^{K}$.
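To illustrate Eq. (2) and Eq. (3), here is a minimal PyTorch-style sketch of the shared and domain-specific graph convolutional layers (an illustrative sketch only; the class names, the ReLU activation, and the dense normalized adjacency matrices are our assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn

def normalize_adj(adj):
    """A_hat = D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    a_tilde = adj + torch.eye(adj.size(0))
    d_inv_sqrt = a_tilde.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a_tilde * d_inv_sqrt.unsqueeze(0)

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W), as in Eq. (2)/(3)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, a_hat, h):
        return torch.relu(a_hat @ self.linear(h))

class DMGESketch(nn.Module):
    """Shared layer on the multi-graph, plus one specific layer per subgraph."""
    def __init__(self, num_domains, in_dim, shared_dim=64, out_dim=16):
        super().__init__()
        self.shared = GCNLayer(in_dim, shared_dim)
        self.specific = nn.ModuleList(
            [GCNLayer(shared_dim, out_dim) for _ in range(num_domains)])

    def forward(self, a_hat_multi, a_hat_subgraphs, x):
        z_shared = self.shared(a_hat_multi, x)           # shared embedding Z_s
        return [layer(a_hat_k, z_shared)                 # specific embeddings Z^k
                for layer, a_hat_k in zip(self.specific, a_hat_subgraphs)]
```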

To learn node embeddings, we train the neural network in an unsupervised manner by modeling the graph structure. We use the embeddings to model the linkage between two nodes, i.e., the probability that there exists an edge between them. Therefore, we formulate embedding learning as a binary classification problem using the embeddings of two nodes.

The probability that there exists an edge between node $i$ and node $j$ in subgraph $G^k$ is defined in Eq. (4), and the probability that there exists no edge between node $i$ and node $j$ in subgraph $G^k$ is defined in Eq. (5):

(4) $p_1(i, j \mid G^k) = \sigma\left(\mathbf{z}_i^k \cdot \mathbf{z}_j^k\right)$
(5) $p_0(i, j \mid G^k) = 1 - p_1(i, j \mid G^k) = \sigma\left(-\mathbf{z}_i^k \cdot \mathbf{z}_j^k\right)$

where $\mathbf{z}_i^k$ is the $i$-th row of $Z^k$, i.e., the embedding vector of node $i$ in subgraph $G^k$, and $\sigma(\cdot)$ is the sigmoid function.

Therefore, the objective is to generate the embeddings by maximizing the log-likelihood function as follows:

(6) $\mathcal{O}_k = \sum_{(i, j) \in \mathcal{S}_k^{+}} \log p_1(i, j \mid G^k) + \sum_{(i, j') \in \mathcal{S}_k^{-}} \log p_0(i, j' \mid G^k)$

where $\mathcal{S}_k^{+}$ is the set of positive samples in subgraph $G^k$, which contains the tuples $(i, j)$ with an edge between node $i$ and node $j$ in subgraph $G^k$; $\mathcal{S}_k^{-}$ is the set of negative samples in subgraph $G^k$, sampled from the node set $V$ by negative sampling (Mikolov et al., 2013a; Mikolov et al., 2013b), which contains the tuples $(i, j')$ with no edge between node $i$ and node $j'$ in subgraph $G^k$.
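For illustration, the per-subgraph loss implied by Eqs. (4)–(6) (i.e., the negative of the log-likelihood, with negative sampling) could be computed as in the following sketch, assuming the embedding matrix $Z^k$ and index tensors of sampled positive and negative node pairs (the function and variable names are ours):

```python
import torch
import torch.nn.functional as F

def subgraph_loss(z_k, pos_pairs, neg_pairs):
    """Negative of Eq. (6) for subgraph k (averaged over samples for stability).
    z_k: (num_nodes, d) node embeddings; pos_pairs, neg_pairs: (M, 2) index tensors."""
    pos_score = (z_k[pos_pairs[:, 0]] * z_k[pos_pairs[:, 1]]).sum(dim=1)
    neg_score = (z_k[neg_pairs[:, 0]] * z_k[neg_pairs[:, 1]]).sum(dim=1)
    # -log sigma(z_i . z_j) for observed edges, -log sigma(-z_i . z_j') for negatives
    return -(F.logsigmoid(pos_score).mean() + F.logsigmoid(-neg_score).mean())
```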

Therefore, the objective function is defined as Eq. (7), in which $\mathcal{L}_k$ is the loss function of subgraph $G^k$:

(7) $\mathcal{L}_k(\theta_s, \theta_k) = -\mathcal{O}_k, \quad k = 1, \ldots, K$

where $\theta_s$ denotes the parameters of the shared graph convolutional layers and $\theta_k$ denotes the parameters of the specific graph convolutional layers of domain $k$.

2.3. Optimization

In our model, the parameter set $\theta_s$ of the shared graph convolutional layers is shared across domains, while the parameter sets $\{\theta_k\}_{k=1}^{K}$ of the specific graph convolutional layers are domain-specific. To train DMGE and benefit all domains, we need to optimize all the objectives $\{\mathcal{L}_k\}_{k=1}^{K}$. In multi-task learning (Caruana, 1997; Ruder, 2017), a commonly used method to optimize the objective function in Eq. (7) is to minimize the weighted summation of all $\mathcal{L}_k$. However, stacking multiple layers brings additional difficulties to train the model (Li et al., 2018), and it is time-consuming to tune the weights to obtain the optimal solution. Therefore, we formulate the problem as multi-objective optimization, and the optimization objective is defined as follows:

(8) $\min_{\theta_s, \theta_1, \ldots, \theta_K} \; \left(\mathcal{L}_1(\theta_s, \theta_1), \; \mathcal{L}_2(\theta_s, \theta_2), \; \ldots, \; \mathcal{L}_K(\theta_s, \theta_K)\right)$

The goal of Eq. (8) is to find a solution that is optimal for each objective (i.e., each domain). To solve the multi-objective optimization, we introduce a multiple-gradient descent optimizer. First, we state the Karush-Kuhn-Tucker (KKT) conditions (Kuhn and Tucker, 1951) for the multi-objective optimization in Eq. (8), which are necessary conditions for an optimal solution:

(9) $\sum_{k=1}^{K} \alpha_k \nabla_{\theta_s} \mathcal{L}_k(\theta_s, \theta_k) = 0, \quad \nabla_{\theta_k} \mathcal{L}_k(\theta_s, \theta_k) = 0 \;\; \forall k, \quad \text{with } \sum_{k=1}^{K} \alpha_k = 1, \; \alpha_k \geq 0$

where $\alpha_k$ is the weight of objective $\mathcal{L}_k$.

As proved in (Désidéri, 2012), either the solution to Eq. (10) is 0 and the result satisfies the KKT conditions Eq. (9), or the solution gives a descent direction that improves all objectives in Eq. (8). Thus, solving the KKT conditions Eq. (9) is equivalent to optimizing Eq. (10).

(10) $\min_{\alpha_1, \ldots, \alpha_K} \; \left\| \sum_{k=1}^{K} \alpha_k \nabla_{\theta_s} \mathcal{L}_k(\theta_s, \theta_k) \right\|_2^2 \quad \text{s.t.} \;\; \sum_{k=1}^{K} \alpha_k = 1, \; \alpha_k \geq 0 \;\; \forall k$

To clearly illustrate how the optimizer works, we consider the case of two domains. The optimization objective Eq. (10) can be simplified as:

(11) $\min_{\alpha \in [0, 1]} \; \left\| \alpha \nabla_{\theta_s} \mathcal{L}_1(\theta_s, \theta_1) + (1 - \alpha) \nabla_{\theta_s} \mathcal{L}_2(\theta_s, \theta_2) \right\|_2^2$

where $\alpha$ is the weight of $\mathcal{L}_1$ (and $1 - \alpha$ is the weight of $\mathcal{L}_2$).

Eq. (11) is a quadratic function of $\alpha$, and its closed-form solution is:

(12) $\hat{\alpha} = \left[ \frac{\left(\nabla_{\theta_s} \mathcal{L}_2 - \nabla_{\theta_s} \mathcal{L}_1\right)^{\top} \nabla_{\theta_s} \mathcal{L}_2}{\left\| \nabla_{\theta_s} \mathcal{L}_1 - \nabla_{\theta_s} \mathcal{L}_2 \right\|_2^2} \right]_{[0, 1]}$

where $\nabla_{\theta_s} \mathcal{L}_1$ and $\nabla_{\theta_s} \mathcal{L}_2$ are the gradients of the two domain losses with respect to the shared parameters, and $[\cdot]_{[0, 1]}$ denotes clipping to the interval $[0, 1]$.
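A small sketch of the closed-form solution in Eq. (12), treating the two shared-parameter gradients as flattened vectors (illustrative only; the function name is ours):

```python
import torch

def compute_alpha(g1, g2):
    """Eq. (12): weight of domain 1's gradient, clipped to [0, 1].
    g1, g2: flattened gradients of L_1 and L_2 w.r.t. the shared parameters."""
    denom = (g1 - g2).pow(2).sum()
    if denom.item() == 0.0:        # identical gradients: any alpha is optimal
        return torch.tensor(0.5)
    return (((g2 - g1) @ g2) / denom).clamp(0.0, 1.0)
```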

With the weight $\hat{\alpha}$, we update the parameters of the model as follows. As the parameter sets $\theta_1$ and $\theta_2$ are domain-specific, we update $\theta_1$ and $\theta_2$ by using Eq. (13) for each domain, respectively. The parameter set $\theta_s$ is shared across domains, thus we apply the weighted gradient as the gradient update to the shared parameters, as defined in Eq. (14). Notice that $\hat{\alpha}$ is obtained by the optimizer according to the gradient of each domain.

(13) $\theta_k \leftarrow \theta_k - \eta \nabla_{\theta_k} \mathcal{L}_k(\theta_s, \theta_k), \quad k = 1, 2$
(14) $\theta_s \leftarrow \theta_s - \eta \left( \hat{\alpha} \nabla_{\theta_s} \mathcal{L}_1(\theta_s, \theta_1) + (1 - \hat{\alpha}) \nabla_{\theta_s} \mathcal{L}_2(\theta_s, \theta_2) \right)$

where $\eta$ is the learning rate.

Finally, we summarize the learning procedure of DMGE in Algorithm 1. In DMGE, the input includes the multi-graph $G$ and the node feature matrix $X$. In line 1, we first initialize the parameter sets $\theta_s$, $\{\theta_k\}_{k=1}^{K}$ and the weight $\alpha_k$ of each domain. Then, we operate convolution on the multi-graph in line 2. For each subgraph $G^k$, we sample a set of negative samples $\mathcal{S}_k^{-}$ in line 5. We use the link information to train DMGE: we compute the gradients and update the parameters $\theta_k$ of the specific graph convolutional layers in line 6, and we compute the gradients of the shared graph convolutional layers in line 7. Based on the gradients, we compute $\alpha_k$ in line 8, and update the parameters $\theta_s$ of the shared graph convolutional layers in line 9. Finally, we return the set of node embeddings $\{Z^k\}_{k=1}^{K}$ in line 12.

Input: A multi-graph $G = (V, E)$, and the node feature matrix $X$

Parameter: $\theta_s$, $\{\theta_k\}_{k=1}^{K}$, and $\{\alpha_k\}_{k=1}^{K}$

Output: A set of node embeddings $\{Z^k\}_{k=1}^{K}$

1:  Initialize parameters $\theta_s$, $\{\theta_k\}_{k=1}^{K}$, and $\{\alpha_k\}_{k=1}^{K}$.
2:  Operate convolution on the multi-graph by Eq. (2) and Eq. (3).
3:  for each training epoch do
4:     for each subgraph $G^k$ do
5:        Sample a set of negative samples $\mathcal{S}_k^{-}$.
6:        Update $\theta_k \leftarrow \theta_k - \eta \nabla_{\theta_k} \mathcal{L}_k$.
7:        Compute gradients of $\theta_s$: $\nabla_{\theta_s} \mathcal{L}_k$.
8:        Compute $\alpha_k$ by using Eq. (12).
9:        Update $\theta_s \leftarrow \theta_s - \eta \sum_{k} \alpha_k \nabla_{\theta_s} \mathcal{L}_k$.
10:     end for
11:  end for
12:  return A set of node embeddings $\{Z^k\}_{k=1}^{K}$
Algorithm 1 Deep Multi-Graph Embedding
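As a concrete reference, the following is a hedged sketch of one training epoch in the spirit of Algorithm 1 for the two-domain case (illustrative only; the parameter-group handling, the learning rate, and the loss callables are our assumptions). Each domain's loss can be built, for example, from the negative-sampling loss sketched in Section 2.2:

```python
import torch

def dmge_train_epoch(shared_params, specific_params, domain_losses, lr=0.01):
    """One epoch of the multiple-gradient descent update for two domains.
    shared_params: list of shared tensors (requires_grad=True).
    specific_params[k]: list of domain-k tensors (requires_grad=True).
    domain_losses[k]: zero-argument callable returning the loss L_k (Eq. (7))."""
    shared_grads = []
    for k, loss_fn in enumerate(domain_losses):
        loss = loss_fn()
        grads = torch.autograd.grad(loss, specific_params[k] + shared_params)
        n_spec = len(specific_params[k])
        # Line 6 / Eq. (13): update the domain-specific parameters.
        with torch.no_grad():
            for p, g in zip(specific_params[k], grads[:n_spec]):
                p -= lr * g
        # Line 7: keep this domain's gradients w.r.t. the shared parameters.
        shared_grads.append([g.detach() for g in grads[n_spec:]])

    # Line 8 / Eq. (12): closed-form weight for the two-domain case.
    g1 = torch.cat([g.reshape(-1) for g in shared_grads[0]])
    g2 = torch.cat([g.reshape(-1) for g in shared_grads[1]])
    alpha = (((g2 - g1) @ g2) /
             (g1 - g2).pow(2).sum().clamp_min(1e-12)).clamp(0.0, 1.0)

    # Line 9 / Eq. (14): update the shared parameters with the weighted gradient.
    with torch.no_grad():
        for p, ga, gb in zip(shared_params, shared_grads[0], shared_grads[1]):
            p -= lr * (alpha * ga + (1.0 - alpha) * gb)
    return float(alpha)
```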

3. Experiments

In this section, we first present the research questions about DMGE. Then, we introduce the datasets and experimental settings. Finally, we present the experimental results to demonstrate the effectiveness of DMGE.

We first present the following three research questions:

  • RQ1: How does DMGE perform in the recommendation task compared with other state-of-the-art embedding methods for recommendation?

  • RQ2: How does the parameter sensitivity affect the performance of DMGE for recommendation?

  • RQ3: How does DMGE perform in the classic task on graph (e.g., link prediction) compared with other state-of-the-art graph embedding methods?

3.1. Datasets

We evaluate our model on two real-world datasets, the details of datasets are as follows and the statistics of datasets are presented in Table 1.

Dataset Domain/Relation Nodes Edges
Tencent App Store Homepage 18,229 548,930
Search 18,229 936,065
Youtube Friendship 15,088 76,765
Co-friends 15,088 1,940,806
Table 1. Statistics of datasets
  • Tencent App Store: It consists of App download records from a commercial App store, which contains a recommendation domain and a search domain. The time span of the dataset is 31 days, the number of Apps is 18,229, and the number of users is 1,011,567. Based on users’ download records, we construct the item graph for each domain, and the statistics of the graphs are presented in Table 1. We use this dataset for the App recommendation task.

  • YouTube (http://socialcomputing.asu.edu/datasets/YouTube): The YouTube dataset (Zafarani and Liu, 2009) consists of two types of relations among users, i.e., friendship and co-friends. Specifically, the friendship relation means two users are friends, and the co-friends relation means two users have shared friends. We use this dataset for the link prediction task.

3.2. Experimental Settings

3.2.1. Baseline Methods

For both tasks, we choose the following state-of-the-art graph embedding methods as baselines:

  • DeepWalk (Perozzi et al., 2014): It applies random walk on graph to generate node sequences, and uses Skip-Gram algorithm to learn embedding. We apply DeepWalk to each subgraph separately.

  • LINE (Tang et al., 2015): It learns node embedding through preserving both local and global graph structures. We apply LINE to each subgraph separately.

  • node2vec (Grover and Leskovec, 2016): It designs a biased random walk procedure, and can explore diverse neighborhoods. We apply node2vec to each subgraph separately.

  • GCN (Kipf and Welling, 2017): It operates convolution on graph, and can generate node embedding based on neighborhoods. We apply GCN to each subgraph separately.

  • mGCN (Ma et al., 2019): It applies graph convolutional networks for multi-graph embedding. It can generate both general embeddings to capture the information for nodes over the entire graph and dimension-specific embeddings to capture the information for nodes in each subgraph.

  • DMGE ($\alpha$): It is a variant of DMGE. It defines the objective function as the weighted summation of the losses $\mathcal{L}_k$ in Eq. (8) (i.e., $\alpha \mathcal{L}_1 + (1 - \alpha) \mathcal{L}_2$ in the two-domain case) for multi-graph embedding, in which $\alpha$ is the weight of the first domain.

For the App recommendation task, besides the above baselines, we also compare with matrix factorization (MF) (Koren et al., 2009), which factorizes the user-item matrix into user embeddings and item embeddings. We apply MF to each domain separately.

3.2.2. Evaluation Metrics

To evaluate the performance of recommendation, we compare the recommended top-$N$ list with the corresponding ground truth list for each user $u$, and use the following metrics to evaluate the top-$N$ recommended results (a minimal computational sketch of both metrics follows this list):

  • Recall@$N$: It calculates the fraction of the ground truth (i.e., the Apps the user downloaded) that are recommended by different algorithms, as defined in Eq. (15), where $\mathcal{U}$ is the user set, $\mathrm{hit}_u(N)$ denotes the number of downloaded Apps that appear in the candidate top-$N$ App list of user $u$, and $|D_u|$ denotes the number of Apps downloaded by user $u$. A larger value of Recall@$N$ means better performance.

    (15) $\mathrm{Recall@}N = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{\mathrm{hit}_u(N)}{|D_u|}$

  • MRR@$N$: Mean Reciprocal Rank (MRR) uses the multiplicative inverse of the rank of the first hit item in the top-$N$ item list to evaluate ranking performance, as defined in Eq. (16), where $\mathrm{rank}_u$ is the rank of the first hit item for user $u$. A larger value of MRR@$N$ means better performance.

    (16) $\mathrm{MRR@}N = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{1}{\mathrm{rank}_u}$
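As a point of reference, a minimal per-user sketch of both metrics (the function names are ours; the per-user values are then averaged over the user set as in Eqs. (15)–(16)):

```python
def recall_at_n(recommended, ground_truth, n):
    """Per-user Recall@N: fraction of downloaded items appearing in the top-n list."""
    hits = len(set(recommended[:n]) & set(ground_truth))
    return hits / len(ground_truth) if ground_truth else 0.0

def mrr_at_n(recommended, ground_truth, n):
    """Per-user MRR@N: reciprocal rank of the first hit in the top-n list."""
    truth = set(ground_truth)
    for rank, item in enumerate(recommended[:n], start=1):
        if item in truth:
            return 1.0 / rank
    return 0.0

print(recall_at_n(["a", "b", "c"], ["b", "d"], n=2))  # 0.5
print(mrr_at_n(["a", "b", "c"], ["b", "d"], n=3))     # 0.5
```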

To evaluate the performance of link prediction, we use the metrics of binary classification: AUC and F1.

3.2.3. Model Parameters

The parameters of DMGE are set as follows:

  • Network architecture. The numbers of shared and specific graph convolutional layers are both 1; the shared hidden size is 64, and the specific hidden size is 16.

  • Initialization. The node feature matrix $X$ can be initialized randomly or by other embedding methods; we initialize it as the identity matrix.

  • Gradient normalization. We normalize the gradient of the shared parameters for each domain, and then use the normalized gradients to calculate $\hat{\alpha}$ in Eq. (12). The normalized gradient of domain $k$ is $g_k / \lVert g_k \rVert_2$, where $g_k$ is the unnormalized gradient of domain $k$ with respect to the shared parameters.

  • Other hyper-parameters. The number of negative samples is 2; the embedding dimension is 16; the dropout of the shared graph convolutional layers is 0.3 and that of the specific graph convolutional layers is 0.1; the batch size is 256; and we train the model for a maximum of 10 epochs using Adam.

In all methods, the dimension of embedding is set to 16. The parameters of the baselines are fine-tuned and set as follows:

  • MF. It is implemented using LibMF (https://www.csie.ntu.edu.tw/~cjlin/libmf/).

  • DeepWalk. The length of context window is 5; the length of random walk is 20; the number of walks per node is 50.

  • LINE. The number of negative samples is 2.

  • node2vec. The length of context window is 5; the length of random walk is 20; the number of walks per node is 50; the number of negative samples is 2; $p$ is 1 and $q$ is 0.25.

  • GCN. The number of graph convolutional layers is 1.

  • mGCN. The initial general representation size is 64, other parameter settings are the same as (Ma et al., 2019), and we train the model for a maximum of 20 epochs using Adam.

  • DMGE ($\alpha$). Considering that both domains are important, we set the weight $\alpha$ to 0.5; the other parameter settings are the same as DMGE.

3.3. Embedding for Recommendation

To demonstrate the performance of DMGE in recommendation task (RQ1), we compare DMGE with other state-of-the-art embedding methods. The intuition is that learning better item embeddings will achieve better performance of recommendation.

Generally, users’ preferences can be characterized by the items they have interacted with, thus we represent users by aggregating the embeddings of their interacted items. There are several ways to aggregate item embeddings, such as averaging (Zhao et al., 2018) or an RNN (Okura et al., 2017). We apply averaging here, and represent users by the average item embedding of their interacted items:

(17) $\mathbf{u}^k = \frac{1}{|I_u|} \sum_{i \in I_u} \mathbf{z}_i^k$

where $\mathbf{u}^k$ is the embedding of user $u$ in domain $k$, $I_u$ is the set of items user $u$ has interacted with ($|I_u|$ is its size), and $\mathbf{z}_i^k$ is the embedding of item $i$ in domain $k$.

For each domain, we measure user-item similarity by computing the cosine similarity between the user embedding and the item embedding. Based on the user-item similarity, we then generate the candidate top-$N$ items for each user. We use 26 consecutive days of data to train the item embeddings, and measure the performance of recommendation in the next 5 days using the metrics Recall@$N$ and MRR@$N$. The performance of different methods in the recommendation domain and the search domain is presented in Table 2, Table 3, Table 4 and Table 5. (Note that the best results are indicated in bold.)
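For illustration, the candidate-generation step described above (Eq. (17) followed by cosine-similarity ranking) could look like the following NumPy sketch (the array names and toy data are hypothetical):

```python
import numpy as np

def recommend_top_n(item_emb, user_items, n=10):
    """User embedding = average of interacted item embeddings (Eq. (17));
    candidates = items ranked by cosine similarity to the user embedding."""
    user_emb = item_emb[user_items].mean(axis=0)
    scores = item_emb @ user_emb / (
        np.linalg.norm(item_emb, axis=1) * np.linalg.norm(user_emb) + 1e-12)
    return np.argsort(-scores)[:n]

item_emb = np.random.rand(100, 16)            # hypothetical 16-d item embeddings
print(recommend_top_n(item_emb, [3, 7, 42], n=5))
```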

Based on the results, we have the following observations:

  • We first compare the performance of the single-domain methods, including MF, DeepWalk, LINE, node2vec and GCN. We can observe that the graph embedding methods outperform MF, as MF only takes into account the explicit user-item interactions, while ignoring the item co-occurrences in users’ behaviors, which can be captured by graph embedding methods.

  • The overall performance of the cross-domain methods (i.e., mGCN, DMGE ($\alpha$), DMGE) is better than that of the single-domain methods, which demonstrates that fusing information from correlated domains is helpful to learn better cross-domain representation and can improve the performance of recommendation in both domains. When $N$ is less than 40 in the recommendation domain and less than 30 in the search domain, the Recall of mGCN is worse than that of the single-domain methods; a possible reason is that the weight between within-domain and across-domain information in mGCN is a hyper-parameter that has to be tuned, and cannot be adaptively learned according to the importance of each domain. Both DMGE and DMGE ($\alpha$) consistently outperform the single-domain methods.

  • Comparing the cross-domain embedding methods, both DMGE ($\alpha$) and DMGE outperform mGCN, which indicates that our proposed graph neural network is effective in learning better representations.

  • DMGE outperforms DMGE ($\alpha$) (except for Recall@1000 in the recommendation domain). The average of $\hat{\alpha}$ in DMGE is 0.4409, thus when $\alpha = 0.5$, DMGE ($\alpha$) can also achieve good performance. However, in DMGE ($\alpha$), it is time-consuming and computationally expensive to tune the hyper-parameter $\alpha$ to obtain the optimal result, while in DMGE, $\hat{\alpha}$ is computed adaptively during training. Thus, we recommend using the multiple-gradient descent optimizer to train the model.

Overall, the proposed DMGE outperforms the state-of-the-art embedding methods, and improves the performance of recommendation in both domains.

Domain Method Recall@N, N = 10 20 30 40 50 60 70 80 90 100 1000
Single MF 0.0301 0.0453 0.0565 0.0658 0.0739 0.0812 0.0880 0.0942 0.1002 0.1065 0.2932
DeepWalk 0.0730 0.1104 0.1363 0.1558 0.1720 0.1853 0.1975 0.2082 0.2186 0.2273 0.4744
LINE 0.0471 0.0728 0.0933 0.1106 0.1258 0.1395 0.1525 0.1642 0.1754 0.1861 0.4977
node2vec 0.0345 0.0579 0.0773 0.0936 0.1080 0.1207 0.1324 0.1436 0.1534 0.1630 0.4574
GCN 0.0743 0.1078 0.1317 0.1516 0.1688 0.1848 0.1977 0.2098 0.2217 0.2321 0.5624
Cross mGCN 0.0431 0.0835 0.1273 0.1677 0.2002 0.2142 0.2261 0.2383 0.2505 0.2627 0.6323
DMGE () 0.1019 0.1607 0.2069 0.2436 0.2762 0.3035 0.3260 0.3471 0.3660 0.3826 0.7016
DMGE 0.1024 0.1661 0.2109 0.2455 0.2767 0.3042 0.3277 0.3484 0.3669 0.3831 0.6929
Table 2. Recall@ Performance of Different Methods in Recommendation Domain
Domain Method Recall@N, N = 10 20 30 40 50 60 70 80 90 100 1000
Single MF 0.0150 0.0251 0.0335 0.0408 0.0474 0.0533 0.0589 0.0642 0.0691 0.0739 0.2679
DeepWalk 0.0638 0.1043 0.1338 0.1571 0.1761 0.1924 0.2064 0.2185 0.2291 0.2387 0.4676
LINE 0.0392 0.0603 0.0769 0.0909 0.1030 0.1138 0.1238 0.1331 0.1414 0.1495 0.4293
node2vec 0.0289 0.0471 0.0622 0.0753 0.0870 0.0982 0.1082 0.1174 0.1260 0.1346 0.4176
GCN 0.0499 0.0764 0.0956 0.1111 0.1242 0.1363 0.1472 0.1575 0.1666 0.1754 0.4905
Cross mGCN 0.0478 0.0939 0.1454 0.1938 0.2218 0.2328 0.2399 0.2480 0.2565 0.2653 0.5920
DMGE () 0.0823 0.1363 0.1784 0.2134 0.2415 0.2652 0.2857 0.3037 0.3206 0.3360 0.6254
DMGE 0.0885 0.1467 0.1900 0.2238 0.2517 0.2759 0.2971 0.3162 0.3328 0.3473 0.6263
Table 3. Recall@ Performance of Different Methods in Search Domain
Domain Method MRR@N, N = 10 20 30 40 50 60 70 80 90 100 1000
Single MF 0.0149 0.0170 0.0180 0.0185 0.0188 0.0191 0.0193 0.0194 0.0196 0.0197 0.0208
DeepWalk 0.0510 0.0549 0.0563 0.0571 0.0575 0.0578 0.0581 0.0582 0.0584 0.0585 0.0594
LINE 0.0371 0.0398 0.0410 0.0417 0.0421 0.0424 0.0427 0.0429 0.0431 0.0432 0.0444
node2vec 0.0265 0.0290 0.0302 0.0308 0.0312 0.0315 0.0317 0.0319 0.0321 0.0322 0.0334
GCN 0.0558 0.0592 0.0606 0.0613 0.0619 0.0622 0.0625 0.0627 0.0628 0.0630 0.0642
Cross mGCN 0.0264 0.0311 0.0338 0.0354 0.0364 0.0367 0.0370 0.0372 0.0373 0.0375 0.0389
DMGE () 0.0697 0.0756 0.0780 0.0793 0.0801 0.0807 0.0811 0.0814 0.0816 0.0817 0.0829
DMGE 0.0699 0.0761 0.0785 0.0797 0.0805 0.0810 0.0814 0.0817 0.0819 0.0821 0.0832
Table 4. MRR@ Performance of Different Methods in Recommendation Domain
Domain Method MRR@N, N = 10 20 30 40 50 60 70 80 90 100 1000
Single MF 0.0120 0.0134 0.0141 0.0145 0.0148 0.0150 0.0151 0.0152 0.0153 0.0154 0.0164
DeepWalk 0.0468 0.0515 0.0534 0.0543 0.0549 0.0553 0.0556 0.0558 0.0560 0.0561 0.0571
LINE 0.0345 0.0372 0.0384 0.0390 0.0395 0.0398 0.0400 0.0402 0.0403 0.0405 0.0417
node2vec 0.0237 0.0260 0.0271 0.0278 0.0283 0.0286 0.0289 0.0290 0.0292 0.0293 0.0307
GCN 0.0405 0.0437 0.0450 0.0457 0.0462 0.0465 0.0468 0.0470 0.0471 0.0473 0.0486
Cross mGCN 0.0324 0.0382 0.0415 0.0435 0.0443 0.0446 0.0448 0.0449 0.0450 0.0452 0.0465
DMGE () 0.0592 0.0653 0.0678 0.0691 0.0700 0.0705 0.0709 0.0712 0.0714 0.0716 0.0728
DMGE 0.0629 0.0693 0.0718 0.0731 0.0739 0.0744 0.0748 0.0751 0.0753 0.0755 0.0766
Table 5. MRR@ Performance of Different Methods in Search Domain

3.4. Parameter Sensitivity

The key parameter that affects the performance of embedding is the dimension size (RQ2); we analyze how the dimension of the embedding learned by DMGE affects the performance of recommendation. In particular, we test several embedding dimension sizes. Figure 3 shows the results of different embedding dimensions in the recommendation and search domains, with Recall@100 as the evaluation metric.

As shown in Figure 3, in both domains, when the dimension of embedding is 16, our model performs best regarding the metric Recall@100. Therefore, we set the dimension of embedding as 16.

Figure 3. The parameter sensitivity analysis of embedding dimension for recommendation and search domain.

3.5. Link Prediction

To demonstrate the performance of DMGE in the link prediction task (RQ3), we compare DMGE with other state-of-the-art graph embedding methods. The intuition is that learning better node embeddings will achieve better performance of link prediction.

In the multi-graph, we perform link prediction in each subgraph separately. In each subgraph, we randomly remove 30% of the edges, and we aim to predict whether these removed edges exist. We formulate the link prediction task as a binary classification problem using the embeddings of two nodes, and consider two types of combination: element-wise addition and element-wise multiplication.
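The evaluation protocol just described could be sketched as follows (illustrative only, with our own function names; it assumes node-pair index arrays and 0/1 labels, and uses scikit-learn's logistic regression):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, f1_score

def evaluate_link_prediction(emb, train_pairs, train_y, test_pairs, test_y,
                             combine="multiply"):
    """Combine the two node embeddings of each pair (element-wise addition or
    multiplication), train a logistic regression classifier, and report AUC / F1."""
    def features(pairs):
        a, b = emb[pairs[:, 0]], emb[pairs[:, 1]]
        return a * b if combine == "multiply" else a + b

    clf = LogisticRegression(max_iter=1000).fit(features(train_pairs), train_y)
    prob = clf.predict_proba(features(test_pairs))[:, 1]
    return roc_auc_score(test_y, prob), f1_score(test_y, (prob > 0.5).astype(int))
```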

In the training set, we use the remaining node pairs as positive samples, and randomly sample an equal number of unconnected node pairs as negative samples. In the testing set, we use the removed node pairs as positive samples, and randomly sample an equal number of unconnected node pairs as negative samples. We train a binary classifier using logistic regression on the training set, and evaluate the performance of link prediction on the testing set. For each method, we select the optimal combination of embeddings and present the best results. The results of different methods are presented in Figure 4, including the results of each relation and the average performance over all dimensions.

(a) AUC of different methods.
(b) F1 of different methods.
Figure 4. The performance of different methods in link prediction.

Based on the results, we have the following observations:

  • The multi-graph embedding methods (i.e., mGCN, DMGE ($\alpha$), DMGE) outperform the single graph embedding methods (i.e., DeepWalk, LINE, node2vec, GCN), which indicates that using multiple relations in the multi-graph is helpful to learn better representations.

  • DMGE ($\alpha$) and DMGE outperform mGCN, which indicates that our proposed graph neural network is effective in learning better representations.

  • The average performance of DMGE is better than that of DMGE ($\alpha$), which indicates the effectiveness of training the model with multi-objective optimization.

3.6. Discussion

3.6.1. The Usage of Embedding

The embeddings learned by DMGE can be used for candidate item generation in the recall stage. By calculating the pairwise similarities between the embeddings of users and items, we can generate a candidate set of items that users may like, and the candidate set can be further used in the ranking stage to generate the final set of recommended items (Covington et al., 2016). Besides, the embeddings can also be used for transfer learning (Ni et al., 2018), and for alleviating the sparsity and cold-start problems (Zhao et al., 2018).

3.6.2. Cross-Domain Representation Learning

DMGE is not designed only for two domains; it can easily be extended to more domains. Using the Frank-Wolfe algorithm (Frank and Wolfe, 1956; Jaggi, 2013), we can efficiently solve the optimization problem in Eq. (10) when there are more than two domains.

3.6.3. Scalability

The graph convolutional layers in DMGE adopt the graph convolution operator defined in GCN (Kipf and Welling, 2017). However, GCN requires the full graph Laplacian, thus it is computationally expensive to apply GCN for large-scale graph embedding.

To apply DMGE for large-scale multi-graph embedding, we have the following strategies: 1) we can adopt GraphSAGE (Hamilton et al., 2017a) as the graph convolutional layers in DMGE, as GraphSAGE generates embeddings by sampling and aggregating features from a node’s local neighborhood, and only requires local graph structures; 2) we can replace the graph convolutional layers in DMGE with the graph attentional layers, which are presented in GAT (Veličković et al., 2018), as GAT is computationally efficient and parallelizable across all nodes in the graph, and doesn’t require the entire graph structure upfront.

4. Related Work

4.1. Embedding Methods

Representation learning (Bengio et al., 2013) is one of the most fundamental problems in deep learning. As a practical application, effective embedding has proven useful and achieved significant improvements in recommender systems (RSs), including E-commerce (Zhao et al., 2018; Wang et al., 2018), search ranking (Chu et al., 2018; Grbovic and Cheng, 2018) and social media (Ying et al., 2018).

The embedding methods in RSs can be divided into two categories: word embedding based methods and graph embedding based methods. The word embedding based methods (Zhao et al., 2018; Grbovic and Cheng, 2018) learn embeddings by modeling the item co-occurrences in users' behavior sequences. Specifically, they model the items as words and users' behavior sequences as sentences, and apply word embedding methods (Mikolov et al., 2013a; Mikolov et al., 2013b) to represent items in a low-dimensional space. The graph embedding based methods (Ying et al., 2018; Wang et al., 2018) construct an item graph based on users' behaviors: they model the items as nodes and item co-occurrences as edges, and apply graph embedding methods (Hamilton et al., 2017b; Cui et al., 2018; Perozzi et al., 2014; Hamilton et al., 2017a) to learn embeddings. However, these embedding methods are developed to learn embeddings in a single domain, and thus fail to learn effective cross-domain embeddings. Although there are several cross-domain recommendation methods (Man et al., 2017), they aim to improve the recommendation in a target domain by transferring information from a source domain. In our work, we adopt a graph neural network to learn more effective cross-domain embeddings that benefit all domains.

4.2. Graph Neural Networks

Graph neural networks (GNNs) (Zhou et al., 2018; Xu et al., 2019) have emerged as a powerful approach for representation learning on graphs recently, such as GCN (Kipf and Welling, 2017), GraphSAGE (Hamilton et al., 2017a). Through a recursive neighborhood aggregation scheme, GNNs can generate node embedding by aggregating features of neighbors. In this part, we focus on reviewing related works about the convolution based GNNs, which can be categorized as spectral approaches and non-spectral approaches.

The spectral approaches depend on the theory of spectral graph convolutions. Bruna et al. (Bruna et al., 2014) first propose a generalization of convolutional neural networks (CNNs) to graphs; however, it is computationally expensive. Defferrard et al. (Defferrard et al., 2016) design $K$-localized convolutional filters on graphs based on spectral graph theory, which is more computationally efficient. Kipf et al. (Kipf and Welling, 2017) limit the layer-wise convolution operation to first-order neighborhoods to avoid overfitting, and propose the graph convolutional network (GCN), which can be applied to encode both local graph structure and node features through layer-wise propagation. The non-spectral approaches operate spatial convolutions on the graph. Hamilton et al. (Hamilton et al., 2017a) propose GraphSAGE to generate node embeddings by sampling and aggregating features from a node's local neighborhood, which can be applied for large-scale graph embedding. However, these GNNs are developed for single graph embedding and fail to learn effective multi-graph embedding.

5. Conclusion

In this paper, we focus on learning effective cross-domain representation. We propose the Deep Multi-Graph Embedding (DMGE) model, which is a multi-graph neural network based on multi-task learning. We construct the item graphs as a multi-graph based on users’ behaviors from different domains, and then design a graph neural network to learn multi-graph embedding in an unsupervised manner. Particularly, we introduce a multiple-gradient descent optimizer for efficiently training the model. We evaluate our approach on various large-scale real-world datasets, and the experimental results show that DMGE outperforms other state-of-the-art embedding methods in various tasks.

6. Acknowledgments

This work was partially supported by the National Key R&D Program of China (2017YFB1001800) and the National Natural Science Foundation of China (No. 61772428, 61725205).

References

  • Bengio et al. (2013) Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35, 8 (2013), 1798–1828.
  • Bruna et al. (2014) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2014. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations. 1–14.
  • Caruana (1997) Rich Caruana. 1997. Multitask learning. Machine learning 28, 1 (1997), 41–75.
  • Cheng et al. (2016) Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7–10.
  • Chu et al. (2018) Chen Chu, Zhao Li, Beibei Xin, Fengchao Peng, Chuanren Liu, Remo Rohs, Qiong Luo, and Jingren Zhou. 2018. Deep Graph Embedding for Ranking Optimization in E-commerce. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 2007–2015.
  • Covington et al. (2016) Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. ACM, 191–198.
  • Cui et al. (2018) Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. 2018. A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering (2018).
  • Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems. 3844–3852.
  • Désidéri (2012) Jean-Antoine Désidéri. 2012. Multiple-gradient descent algorithm (MGDA) for multiobjective optimization. Comptes Rendus Mathematique 350, 5-6 (2012), 313–318.
  • Frank and Wolfe (1956) Marguerite Frank and Philip Wolfe. 1956. An algorithm for quadratic programming. Naval research logistics quarterly 3, 1-2 (1956), 95–110.
  • Grbovic and Cheng (2018) Mihajlo Grbovic and Haibin Cheng. 2018. Real-time personalization using embeddings for search ranking at Airbnb. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 311–320.
  • Grover and Leskovec (2016) Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 855–864.
  • Guo et al. (2017) Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. 1725–1731.
  • Hamilton et al. (2017a) Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017a. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024–1034.
  • Hamilton et al. (2017b) William L. Hamilton, Rex Ying, and Jure Leskovec. 2017b. Representation Learning on Graphs: Methods and Applications. IEEE Data Engineering Bulletin 40, 3 (2017), 52–74.
  • Hidasi et al. (2016) Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based recommendations with recurrent neural networks. In International Conference on Learning Representations. 1–10.
  • Jaggi (2013) Martin Jaggi. 2013. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. In International Conference on Machine Learning. 427–435.
  • Kipf and Welling (2017) Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations. 1–14.
  • Koren et al. (2009) Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 8 (2009), 30–37.
  • Kuhn and Tucker (1951) HW Kuhn and AW Tucker. 1951. Nonlinear Programming. In Second Berkeley Symposium on Mathematical Statistics and Probability. 481–492.
  • Li et al. (2018) Qimai Li, Zhichao Han, and Xiao-Ming Wu. 2018. Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Ma et al. (2019) Yao Ma, Suhang Wang, Charu C Aggarwal, Dawei Yin, and Jiliang Tang. 2019. Multi-dimensional Graph Convolutional Networks. In Proceedings of the 2019 SIAM International Conference on Data Mining.
  • Man et al. (2017) Tong Man, Huawei Shen, Xiaolong Jin, and Xueqi Cheng. 2017. Cross-Domain Recommendation: An Embedding and Mapping Approach.. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. 2464–2470.
  • Mikolov et al. (2013a) Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. In Workshop on International Conference on Learning Representations. 1–11.
  • Mikolov et al. (2013b) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111–3119.
  • Ni et al. (2018) Yabo Ni, Dan Ou, Shichen Liu, Xiang Li, Wenwu Ou, Anxiang Zeng, and Luo Si. 2018. Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 596–605.
  • Okura et al. (2017) Shumpei Okura, Yukihiro Tagami, Shingo Ono, and Akira Tajima. 2017. Embedding-based news recommendation for millions of users. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1933–1942.
  • Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 701–710.
  • Ruder (2017) Sebastian Ruder. 2017. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098 (2017).
  • Tang et al. (2015) Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web. International World Wide Web Conferences Steering Committee, 1067–1077.
  • Veličković et al. (2018) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In International Conference on Learning Representations. 1–12.
  • Wang et al. (2018) Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation in alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 839–848.
  • Wang et al. (2019) Yaqing Wang, Chunyan Feng, Caili Guo, Yunfei Chu, and Jenq-Neng Hwang. 2019. Solving the Sparsity Problem in Recommendations via Cross-Domain Item Embedding Based on Co-Clustering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. ACM, 717–725.
  • Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How Powerful are Graph Neural Networks?. In International Conference on Learning Representations. 1–17.
  • Ying et al. (2018) Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 974–983.
  • Zafarani and Liu (2009) R. Zafarani and H. Liu. 2009. Social Computing Data Repository at ASU. http://socialcomputing.asu.edu
  • Zhao et al. (2018) Kui Zhao, Yuechuan Li, Zhaoqian Shuai, and Cheng Yang. 2018. Learning and Transferring IDs Representation in E-commerce. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1031–1039.
  • Zhou et al. (2018) Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. 2018. Graph Neural Networks: A Review of Methods and Applications. arXiv preprint arXiv:1812.08434 (2018).
  • Zhuang et al. (2017) Fuzhen Zhuang, Yingmin Zhou, Fuzheng Zhang, Xiang Ao, Xing Xie, and Qing He. 2017. Sequential Transfer Learning: Cross-domain Novelty Seeking Trait Mining for Recommendation. In Proceedings of the 26th International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 881–882.