GESF: A Universal Discriminative Mapping Mechanism for Graph Representation Learning

05/28/2018 ∙ by Shupeng Gui, et al. ∙ King Abdullah University of Science and Technology ∙ University of Michigan ∙ University of Rochester

Graph embedding is a central problem in social network analysis and many other applications, aiming to learn a vector representation for each node. While most existing approaches need to specify the neighborhood and the form of dependence on the neighborhood, which may significantly degrade the flexibility of the representation, we propose a novel graph node embedding method (namely GESF) via the set function technique. Our method can 1) learn an arbitrary form of representation function from the neighborhood, 2) automatically decide the significance of neighbors at different distances, and 3) be applied to heterogeneous graph embedding, which may involve multiple types of nodes. Theoretical guarantees for the representation capability of our method are proved for general homogeneous and heterogeneous graphs, and evaluation results on benchmark data sets show that the proposed GESF outperforms the state-of-the-art approaches at producing node vectors for classification tasks.


1 Introduction

Graph node embedding aims to learn a mapping that represents nodes as points in a low-dimensional vector space, where the geometric relationships reflect the structure of the original graph. Nodes that are “close” in the graph are embedded to have similar vector representations (survey-CAI). The learned node vectors benefit a number of graph analysis tasks, such as node classification (bhagat2011node), link prediction (liben2007link), community detection (fortunato2010community), and many others (survey-jure). In order to preserve the node geometric relations in the embedded space, the similarity/proximity/distance of a node to its neighborhood is generally taken as input to different graph embedding approaches. For example, matrix-factorization approaches work on pre-defined pairwise similarity measures (e.g., different orders of the adjacency matrix). Deepwalk (perozzi2014deepwalk), node2vec (grover2016node2vec), and other recent approaches (dong2017metapath2vec) consider flexible, stochastic measures of node similarity through node co-occurrence on short random walks over the graph (survey-Goyal). Neighborhood autoencoder methods compress the information about a node’s local neighborhood, described as a neighborhood vector containing the node’s pairwise similarity to all other nodes in the graph (SDNE; DNGR). Neural network based approaches such as graph convolutional networks (GCN) and GraphSAGE apply convolution-like functions to a node's surrounding neighborhood to aggregate neighborhood information (kipf2016semi; GraphSAGE).

Although effective, all existing approaches need to specify the neighborhood and the form of dependence on the neighborhood, which significantly degrades their flexibility for general graph representation learning. In this work, we propose a novel graph node embedding method, namely Graph Embedding via Set Function (GESF), that can

  • learn node representations via a universal graph embedding function, without pre-defining pairwise similarity, specifying random walk parameters, or choosing an aggregation function among element-wise mean, a max-pooling neural network, or LSTMs;

  • capture arbitrary relationships between neighbors at different distances to the target node, and automatically decide their significance;

  • be generally applied to any graph, from simple homogeneous graphs to heterogeneous graphs with complicated types of nodes.

The core difficulty of graph node embedding is to characterize an arbitrary relationship to the neighborhood. From a local point of view, switching any two neighbors of a node in the same category would not affect the representation of this node. Based on this key observation, we propose to learn the embedding vector of a node via a partially permutation invariant set function applied to its neighbors' embedding vectors. We provide a neat form to represent such a set function and prove that it can characterize an arbitrary partially permutation invariant set function. Evaluation results on benchmark data sets show that the proposed GESF outperforms the state-of-the-art approaches on producing node vectors for classification tasks.

2 Related Work

The main difference among various graph embedding methods lies in how they define the “closeness” between two nodes (survey-CAI). First-order proximity, second-order proximity, and even higher-order proximity have been widely studied for capturing the structural relationship between nodes (tang2015line; yang2017fast). In this section, we discuss the relevant graph embedding approaches in terms of how node closeness to neighboring nodes is measured, to highlight our contribution of utilizing neighboring nodes in the most general manner. Comprehensive reviews of graph embedding can be found in (survey-CAI; survey-jure; survey-Goyal; yang2017fast).

Matrix Analysis on Graph Embedding

As early as 2011, a spectral clustering method (tang2011leveraging) took the eigenvalue decomposition of a normalized Laplacian matrix of a graph as an effective approach to obtain the embeddings of nodes. Other similar approaches work on different node similarity matrices by applying various similarity functions to trade off between modeling the “first-order similarity” and the “higher-order similarity” (GraRep; HOPE). Node content information can also be easily fused into the pairwise similarity measure, e.g., in TADW (yang2015network), as well as node label information, which results in semi-supervised graph embedding methods, e.g., MMDW (tu2016max).

Random Walks on Graphs for Node Representation Learning

Both deepwalk (perozzi2014deepwalk) and node2vec (grover2016node2vec) are outstanding graph embedding methods for the node representation learning problem. They convert graph structure into a sequential context format via random walks (lovasz1993random). Thanks to the invention of (mikolov2013distributed) for learning word representations from sentences, deepwalk inherited this framework for word representation learning in paragraphs to generate the representations of nodes appearing in random walk contexts. node2vec then evolved this idea with additional hyper-parameter tuning for the trade-off between DFS and BFS to control the direction of the random walk. Planetoid (yang2016revisiting) proposed a semi-supervised learning framework by guiding random walks with available node label information.

Neighborhood Encoders for Graph Embedding

There are also methods focusing on aggregating or encoding neighbors' information to generate node embeddings. DNGR (DNGR) and SDNE (wang2016structural) introduce autoencoders to construct the similarity function between the neighborhood vectors and the embedding of the target node. DNGR defines neighborhood vectors based on random walks, and SDNE introduces the adjacency matrix and Laplacian eigenmaps into the definition of neighborhood vectors. Although the autoencoder idea is a great improvement, these methods become painful when the graph scales up to millions of nodes. Therefore, methods with neighborhood aggregation and convolutional encoders construct a local aggregation for node embedding, such as GCN (kipf2016semi; kipf2016variational; schlichtkrull2017modeling; van2017graph), column networks (pham2017column), and the GraphSAGE algorithm (GraphSAGE). The main idea of these methods is an iterative or recursive aggregation procedure, e.g., convolutional kernels or pooling procedures, that generates the embedding vectors for all nodes, and such aggregation procedures are shared by all nodes in a graph.

The above-mentioned methods differ in how they use neighboring nodes for node representation learning. They require pre-defining a pairwise similarity measure between nodes, specifying random walk parameters, or choosing aggregation functions. In practice, it takes non-trivial effort to tune these parameters or try different measures, especially when graphs are complicated with nodes of multiple types, i.e., heterogeneous graphs. This work hence targets making neighboring nodes play their roles in the most general manner such that their contributions are learned rather than user-defined. The resulting embedding method has the flexibility to work on any type of homogeneous or heterogeneous graph. The heterogeneity of graph nodes is handled by a heterogeneous random walk procedure in dong2017metapath2vec and by deep neural networks in chang2015heterogeneous. GESF has a natural advantage in avoiding any manual design of random walk strategies or of the relationships between different types of nodes. Owing to the invention of set functions in (zaheer2017deep), all existing valid mapping strategies from the neighborhood to the target node can be represented by set functions, which GESF learns automatically.

3 A Universal Graph Embedding Model based on Set Function

In this section, we first introduce a universal mapping function that generates the embedding vector of a node by involving neighborhoods at various step distances, and we propose a partially permutation invariant set function as this universal mapping function. We then introduce a matrix function to process the knowledge of different orders of neighborhood. Finally, we propose the overall learning model that solves for the embedding vectors of nodes with respect to a specific learning task.

3.1 A universal graph embedding model

We aim to design graph embedding models for the most general graphs, which may include different types of nodes. Formally, consider a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where the node set $\mathcal{V} = \mathcal{V}_1 \cup \mathcal{V}_2 \cup \cdots \cup \mathcal{V}_K$, i.e., $\mathcal{V}$ is composed of $K$ disjoint types of nodes. One instance of such a graph is an academic publication network, which includes different types of nodes for papers, publication venues, author names, author affiliations, research domains, etc.

Given a graph $\mathcal{G}$, our goal is to learn the embedding vector $\mathbf{h}_v$ for each node $v$ in this graph. As we know, the position of a node in the embedded space is collaboratively determined by its neighboring nodes. Therefore, we propose a universal embedding model in which the embedding vector of node $v$ is represented by its neighbors' embedding vectors via a set function $f$, whose input for each node type $k$ is a matrix $H_k(v)$ with column vectors corresponding to the embeddings of node $v$'s neighbors of type $k$. Note that the neighbors can be step-1 (or immediate) neighbors, step-2 neighbors, or even higher-order neighbors. However, all neighboring nodes that are reachable from a node in the same number of steps play the same role when localizing this node in the embedded space. Therefore, the function $f$ should be a partially permutation invariant function; that is, if we swap any columns within each $H_k(v)$, the function value remains the same. Unfortunately, such a set function is not directly learnable due to the permutation property.

One straightforward idea to represent a partially permutation invariant function is to define it in the following form:

$$f(H_1, \dots, H_K) = \sum_{P_1 \in \mathcal{P}_{n_1}} \cdots \sum_{P_K \in \mathcal{P}_{n_K}} g\big(H_1 P_1, \dots, H_K P_K\big), \qquad (1)$$

where $\mathcal{P}_{n_k}$ denotes the set of $n_k$-dimensional permutation matrices, $H_k$ denotes the representation matrix consisting of the embedding vectors of the type-$k$ neighbors, and $H_k P_k$ permutes the columns of $H_k$. It is easy to verify that the function defined in (1) is partially permutation invariant, but it is almost not learnable because it involves $\prod_{k=1}^{K} n_k!$ “sum” items.

Our solution for learning the function $f$ is based on the following important theorem, which gives a neat way to represent any partially permutation invariant function. The proof is in the Appendix.

Theorem 3.1.

Let $f$ be a continuous real-valued function defined on a compact set, with the following form:

$$f\big(\underbrace{x_{1,1}, \dots, x_{1,n_1}}_{\text{group }1}, \ \dots, \ \underbrace{x_{K,1}, \dots, x_{K,n_K}}_{\text{group }K}\big).$$

If the function $f$ is partially permutation invariant, that is, any permutation of the values within group $k$ for any $k$ does not change the function value, then there must exist functions $\rho$ and $\phi_1, \dots, \phi_K$ that approximate $f$ with arbitrary precision in the following form:

$$f \approx \rho\Big(\sum_{i=1}^{n_1} \phi_1(x_{1,i}), \ \dots, \ \sum_{i=1}^{n_K} \phi_K(x_{K,i})\Big). \qquad (2)$$

Based on this theorem, we only need to parameterize $\rho$ and $\phi_1, \dots, \phi_K$ to learn the node embedding function. We next formulate the embedding model when considering different orders of neighborhood.

1-step neighbors

From Theorem 3.1, any mapping function for a node $v$ can be characterized by appropriately defined functions $\rho$ and $\phi_1, \dots, \phi_K$:

$$\mathbf{h}_v = \rho\Big(\sum_{u \in \mathcal{N}^{(1)}_1(v)} \phi_1(\mathbf{h}_u), \ \dots, \ \sum_{u \in \mathcal{N}^{(1)}_K(v)} \phi_K(\mathbf{h}_u)\Big),$$

where $\mathcal{N}^{(1)}_k(v)$ denotes the step-1 neighbors of node $v$ in node type $k$.
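As a concrete illustration, the following PyTorch sketch implements a 1-step set-function embedding of this form: a separate $\phi_k$ per node type, a sum over same-type neighbors, and $\rho$ mixing the pooled summaries. The class name, layer sizes, and dimensions are illustrative assumptions, not the authors' released implementation.

```python
# A minimal PyTorch sketch of the 1-step set-function embedding map
# h_v = rho( sum_{u in N_1(v)} phi_1(h_u), ..., sum_{u in N_K(v)} phi_K(h_u) ).
# Names, layer sizes, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class SetFunctionEmbedding(nn.Module):
    def __init__(self, dim=64, num_types=2, hidden=64):
        super().__init__()
        # One phi_k per node type; summing over same-type neighbors makes the
        # map invariant to permuting neighbors within a type.
        self.phi = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, hidden), nn.ReLU()) for _ in range(num_types)]
        )
        # rho mixes the per-type pooled summaries into the target embedding.
        self.rho = nn.Sequential(nn.Linear(num_types * hidden, hidden),
                                 nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, neighbor_embeddings):
        # neighbor_embeddings: list of K tensors, each (n_k, dim), one row per
        # type-k neighbor of the target node (the columns of H_k(v)).
        pooled = [phi_k(H_k).sum(dim=0)
                  for phi_k, H_k in zip(self.phi, neighbor_embeddings)]
        return self.rho(torch.cat(pooled, dim=-1))

# Swapping neighbors within a type does not change the output:
f = SetFunctionEmbedding()
H1, H2 = torch.randn(5, 64), torch.randn(3, 64)
assert torch.allclose(f([H1, H2]), f([H1[[4, 3, 2, 1, 0]], H2]), atol=1e-5)
```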

Multi-step neighbors

High-order proximity has been shown to be beneficial for generating high-quality embedding vectors (yang2017fast). Extending the 1-step neighbor model, we have the more general model where the representation of each node can depend on immediate (1-step) neighbors, 2-step neighbors, 3-step neighbors, and even infinite-step neighbors:

$$\mathbf{h}_v = \rho\Big(\sum_{s \ge 1} w_s \sum_{u \in \mathcal{N}^{(s)}_1(v)} \phi_1(\mathbf{h}_u), \ \dots, \ \sum_{s \ge 1} w_s \sum_{u \in \mathcal{N}^{(s)}_K(v)} \phi_K(\mathbf{h}_u)\Big), \qquad (3)$$

where $w_1, w_2, \dots$ are the weights for neighbors at different steps. Let $A$ be the adjacency matrix indicating all edges. If we define the polynomial matrix function on the adjacency matrix as $M(A) = \sum_{s \ge 1} w_s A^s$, we can cast (3) into its matrix form

$$\mathbf{h}_v = \rho\Big(\bar{\phi}_1(H_1)\, M(A)_{\mathcal{V}_1, v}, \ \dots, \ \bar{\phi}_K(H_K)\, M(A)_{\mathcal{V}_K, v}\Big), \qquad (4)$$

where $H_k$ denotes the representation matrix for nodes of type $k$ (one column per node), $M(A)_{\mathcal{V}_k, v}$ denotes the submatrix of $M(A)$ indexed by column $v$ and rows in $\mathcal{V}_k$, and $\bar{\phi}_k$ (with a bar on top of $\phi_k$) is defined as the column-wise function extension of $\phi_k$, i.e., $\bar{\phi}_k$ applies $\phi_k$ to every column of its matrix argument.

Note that the embedding vectors for different types of nodes may have different dimensions. A homogeneous graph is a special case of the heterogeneous graph with $K = 1$; the proposed model is thus naturally applicable to homogeneous graphs.

To avoid optimizing an infinite number of coefficients $\{w_s\}$, we propose to use a 1-dimensional NN function to equivalently represent the matrix function $M(\cdot)$ and reduce the number of parameters, based on the following observation:

$$M(A) = \sum_{s \ge 1} w_s A^s = U\, m(\Lambda)\, U^\top, \qquad m(\lambda) := \sum_{s \ge 1} w_s \lambda^s,$$

where $A = U \Lambda U^\top$ is the singular value decomposition of $A$ and $m(\Lambda)$ applies the scalar function $m(\cdot)$ to each diagonal entry of $\Lambda$. We parameterize $m(\cdot)$ using a 1-dimensional NN, which allows us to easily control the number of variables to optimize in $M(\cdot)$ by choosing the number of layers and the number of nodes in each layer.
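The sketch below illustrates one way this spectral parameterization could be implemented: eigendecompose a symmetric adjacency matrix, keep the largest eigenpairs, apply a tiny 1-to-1 network to each eigenvalue, and reassemble $U\, m(\Lambda)\, U^\top$. The class name, layer sizes, and the truncation are illustrative assumptions.

```python
# Sketch: parameterizing the polynomial matrix function M(A) = sum_s w_s A^s
# through the spectrum of a symmetric adjacency matrix, M(A) = U m(Lambda) U^T,
# where m(.) is a tiny learnable scalar ("1-dimensional") network applied to
# each eigenvalue. Shapes and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SpectralMatrixFunction(nn.Module):
    def __init__(self, hidden=3, top_k=1000):
        super().__init__()
        self.top_k = top_k
        # 1-to-1 network: maps a single eigenvalue to a single transformed value.
        self.m = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, A):
        # Eigendecomposition of the symmetric adjacency matrix A = U diag(lam) U^T.
        lam, U = torch.linalg.eigh(A)
        # Keep only the largest-magnitude eigenpairs (low-rank pre-processing).
        idx = torch.argsort(lam.abs(), descending=True)[: self.top_k]
        lam, U = lam[idx], U[:, idx]
        m_lam = self.m(lam.unsqueeze(-1)).squeeze(-1)   # m applied per eigenvalue
        return U @ torch.diag(m_lam) @ U.T              # M(A) = U m(Lambda) U^T

A = torch.randn(50, 50)
A = ((A + A.T) > 1.0).float()                # toy symmetric 0/1 adjacency matrix
M = SpectralMatrixFunction(top_k=50)(A)
print(M.shape)                                # torch.Size([50, 50])
```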

3.2 The overall model

For short, we denote the representation function for node $v$ in (4) by $f_v(H_1, \dots, H_K)$. To fulfill the requirement of a specific learning task, we propose the following learning model involving a supervised component:

$$\min_{\{H_k\},\, \rho,\, \{\phi_k\},\, m,\, \theta} \ \sum_{v \in \mathcal{V}} \big\|\mathbf{h}_v - f_v(H_1, \dots, H_K)\big\|_1 \;+\; \lambda \sum_{v \in \mathcal{L}} \ell\big(\mathbf{h}_v, y_v; \theta\big), \qquad (5)$$

where $\mathcal{L}$ denotes the set of labeled nodes and $\lambda$ balances the representation error and the prediction error. The first, unsupervised component restricts the representation error between the target node and its neighbors with the $\ell_1$ norm, since a practical graph is allowed to contain noise. The supervised component $\ell$ is flexible and can be replaced with any designed learning task on the nodes of a graph; for example, a least-squares loss can be chosen for a regression problem, and a cross-entropy loss can be used to formulate a classification problem. To solve the problem in Eq. (5), we apply a stochastic gradient descent (SGD) algorithm to compute effective solutions for all learning variables simultaneously.
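A minimal sketch of how such a joint objective could be optimized follows, assuming an $\ell_1$ representation error, a cross-entropy supervised term, a toy ring graph, and the Adam variant of SGD; all names and sizes are illustrative rather than the authors' exact training code.

```python
# Sketch of jointly optimizing node embeddings, the set function (phi, rho), and
# a classifier on an objective shaped like (5): representation error plus
# lambda times a supervised loss. The l1 penalty, the toy ring graph, and all
# sizes are illustrative assumptions.
import torch
import torch.nn.functional as F

num_nodes, dim, num_classes, lam = 100, 16, 7, 1.0
H = torch.nn.Parameter(0.1 * torch.randn(num_nodes, dim))    # learned node embeddings
phi = torch.nn.Sequential(torch.nn.Linear(dim, dim), torch.nn.ReLU())
rho = torch.nn.Linear(dim, dim)                               # h_v ~ rho(sum phi(h_u))
W = torch.nn.Parameter(0.1 * torch.randn(dim, num_classes))   # node classifier

neighbors = [[(v - 1) % num_nodes, (v + 1) % num_nodes] for v in range(num_nodes)]  # toy ring graph
labeled = torch.arange(0, num_nodes, 5)
labels = torch.randint(0, num_classes, (labeled.numel(),))

opt = torch.optim.Adam([H, W, *phi.parameters(), *rho.parameters()], lr=1e-2)
for step in range(200):
    # Unsupervised term: each embedding should be reproducible from its neighbors.
    recon = torch.stack([rho(phi(H[neighbors[v]]).sum(dim=0)) for v in range(num_nodes)])
    rep_loss = (H - recon).abs().sum(dim=1).mean()            # l1 representation error
    sup_loss = F.cross_entropy(H[labeled] @ W, labels)        # supervised component
    (rep_loss + lam * sup_loss).backward()
    opt.step(); opt.zero_grad()
```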

4 Experiments

This section reports experimental results that validate the proposed method, comparing it to state-of-the-art algorithms on benchmark datasets including both homogeneous and heterogeneous graphs.

4.1 Comparison on homogeneous graphs

We consider the multi-class classification problem on homogeneous graphs. Given a graph with partially labeled nodes, the goal is to learn a representation for each node and predict the classes of the unlabeled nodes.

Datasets

We evaluate the performance of GESF and the compared methods on five datasets.

  • Cora (mccallum2000automating) is a paper citation network. Each node is a paper. There are 2,708 papers and 5,429 citation links in total. Each paper is associated with one of 7 classes.

  • CiteSeer (giles1998citeseer) is another paper citation network. CiteSeer contains 3,312 papers and 4,732 citations in total. All these papers have been classified into 6 classes.

  • Pubmed (sen2008collective) is a larger and more complex citation network compared to the previous two datasets. There are 19,717 vertexes and 88,651 citation links. Papers are classified into 3 classes.

  • Wikipedia (sen2008collective) contains 2,405 online webpages with 17,981 undirected links between pairs of them. All these pages come from 17 categories.

  • Email-eu (leskovec2007graph) is an email communication network that illustrates the email communication relationships between researchers in a large European research institution. There are 1,005 researchers and 25,571 links between them. Department affiliations (42 in total) of the researchers are used as the labels to predict.

Baseline methods

The compared baseline algorithms are listed below:

  • Deepwalk (perozzi2014deepwalk) is an unsupervised graph embedding method which relies on random walks and the word2vec method. For each vertex, we take 80 random walks of length 40, and set the window size to 10. Since deepwalk is unsupervised, we apply a logistic regression on the generated embeddings for node classification.

  • Node2vec (grover2016node2vec) is an improved graph embedding method based on deepwalk. We set the window size to 10, the walk length to 80, and the number of walks per node to 100. node2vec is unsupervised as well, so we apply the same evaluation procedure to its embeddings as we did for deepwalk.

  • MMDW (tu2016max) is a semi-supervised graph embedding framework which combines matrix decomposition and SVM classification. We tune the method multiple times and take 0.01 as the hyper-parameter, as recommended by the authors.

  • Planetoid (yang2016revisiting) is a semi-supervised learning framework. We mute the node attributes used by planetoid since we only focus on the information abstracted from the graph structure.

  • GCN (Graph Convolutional Networks) (kipf2016semi) applies convolutional neural networks to semi-supervised embedding learning on graphs. We eliminate the node attributes for fairness as well.

Experiment setup and results

For fair comparison, the dimension of the representation vectors is chosen to be the same for all algorithms (the dimension is 64). The hyper-parameters are fine-tuned for all of them. The details of GESF for the multi-class case are as follows.

  • Supervised Component: The softmax function is chosen to formulate our supervised component in Eq. (5). For an arbitrary embedding $\mathbf{h}_v$, the probability of the node being predicted as class $c$ is $\exp(\mathbf{w}_c^\top \mathbf{h}_v) / \sum_{c'} \exp(\mathbf{w}_{c'}^\top \mathbf{h}_v)$, where $\mathbf{w}_c$ is the classifier for class $c$. Therefore, the whole supervised component in Eq. (5) is the cross-entropy loss over the labeled nodes plus an $\ell_2$ regularization on the classifiers $\{\mathbf{w}_c\}$, whose weight is fine-tuned (a code sketch follows this list).

  • Unsupervised Embedding Mapping Component: We design a two-layer NN with hidden dimension 64 to form the mapping from the embeddings of neighbors to the target node, and we form a two-layer 1-to-1 NN with a 3-dimensional hidden layer to construct the matrix function for the adjacency matrix. We pre-process the adjacency matrix with an eigenvalue decomposition that preserves the largest 1000 eigenvalues by default. The balance hyper-parameter in Eq. (5) is fine-tuned as well.
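A minimal sketch of the supervised component described above is given below, assuming a standard cross-entropy implementation of the softmax loss and an explicit $\ell_2$ penalty on the per-class classifiers; the function name and the regularization weight are illustrative assumptions.

```python
# Sketch of the multi-class supervised component: a softmax probability per
# class for each labeled embedding, a cross-entropy loss, and an l2 penalty on
# the per-class classifiers. Names and the regularization weight are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def supervised_component(H_labeled, labels, W, reg_weight=1e-3):
    """H_labeled: (n, d) embeddings of labeled nodes; labels: (n,) class ids;
    W: (d, C) one classifier vector per class."""
    logits = H_labeled @ W                    # w_c^T h_v for every class c
    ce = F.cross_entropy(logits, labels)      # -log softmax probability of true class
    l2 = reg_weight * W.pow(2).sum()          # l2 regularization on the classifiers
    return ce + l2

H_labeled = torch.randn(32, 64)
labels = torch.randint(0, 7, (32,))
W = torch.nn.Parameter(0.1 * torch.randn(64, 7))
loss = supervised_component(H_labeled, labels, W)
```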

We conduct experiments on each dataset and compare the performance among all methods mentioned above. Since this is a multi-class classification scenario, we use accuracy as the evaluation criterion. The percentage of labeled samples is chosen from 10% to 90%, and the remaining samples are used for evaluation, except for planetoid (yang2016revisiting). Fixed training and testing sets are used for planetoid due to its optimization strategy, which depends on the matching between the order of the vertexes in the graph and the order of the training and test sets. Therefore, we only provide the results of planetoid from a single run. All other experiments are repeated three times, and we report the mean and standard deviation of their performance in Table 1. We highlight the best performance for each dataset with bold font style and the second-best results with a “*”. We can observe that in most cases, our method outperforms the other methods.

training%        10.00%        20.00%        30.00%        40.00%        50.00%        60.00%        70.00%        80.00%        90.00%

Cora
  deepwalk       75.47*±1.01   78.29*±1.30   79.43±1.29    79.96±0.76    80.80±0.40    81.33±0.82    81.50±0.69    80.89±0.82    83.04±2.47
  node2vec       75.52±1.22    77.81±1.51    78.91±0.94    79.72±0.44    80.86±0.67    80.70±1.32    80.37±1.22    80.59±1.03    81.78±1.41
  MMDW           74.88±0.23    79.18±0.10    81.20*±0.11   82.19*±0.25   83.10*±0.41   84.62*±0.09   85.54*±0.44   85.27*±0.22   87.82*±0.37
  planetoid      51.310        50.620        47.630        44.490        36.120        27.770        29.150        29.150        28.410
  GCN            29.98±0.26    30.67±0.53    30.75±0.51    29.66±0.84    30.35±0.59    30.84±0.42    31.08±1.36    31.49±1.05    27.90±1.86
  GESF           69.58±2.12    78.08±1.16    81.37±2.01    84.16±0.41    85.60±1.49    85.66±1.16    86.17±0.38    87.68±0.46    88.27±0.77

CiteSeer
  deepwalk       51.66±0.73    54.68±1.28    56.15±0.64    56.59±0.37    56.92±1.04    57.78±0.88    56.92±0.70    56.71±1.39    59.40±0.49
  node2vec       51.97*±1.09   55.17*±1.22   56.28±0.41    55.82±1.03    57.44±0.54    57.49±1.07    57.42±1.38    56.89±1.87    57.95±1.07
  MMDW           55.36±0.60    60.98±0.56    62.00±0.17    63.89±0.12    66.59±0.27    69.00*±0.15   69.72*±0.82   70.40*±0.93   70.64*±0.31
  planetoid      41.530        41.620        40.190        37.880        32.910        29.060        20.820        21.110        21.390
  GCN            21.00±0.05    19.07±1.40    20.93±0.28    20.91±0.44    19.52±0.72    20.20±0.68    20.54±2.91    20.90±1.82    18.78±1.49
  GESF           46.50±2.61    53.26±1.62    59.68*±0.43   62.47*±1.32   66.41*±1.20   69.76±0.49    71.77±2.13    72.16±1.55    77.64±1.69

Pubmed
  deepwalk       76.98*±0.33   77.48±0.20    77.68±0.09    77.72±0.17    77.99±0.48    78.00±0.19    78.12±0.50    78.60±0.63    78.21±0.89
  node2vec       77.47±0.20    77.89±0.16    78.09*±0.20   78.25*±0.25   78.53*±0.44   78.34*±0.35   78.29*±0.46   78.81*±0.61   78.56*±0.68
  MMDW           -             -             -             -             -             -             -             -             -
  planetoid      39.310        43.400        40.500        40.410        40.320        40.440        40.820        40.510        41.130
  GCN            39.43±0.38    39.34±0.45    39.56±0.38    39.46±0.59    39.10±0.45    39.08±0.42    39.31±0.19    39.48±0.84    39.07±0.93
  GESF           73.19±0.44    77.70*±0.64   78.47±0.38    79.63±0.46    80.23±0.49    81.05±0.69    81.53±0.36    81.77±0.24    82.62±0.59

Wikipedia
  deepwalk       57.44*±0.67   62.04*±0.84   63.15*±0.77   64.77*±0.56   65.74*±0.50   66.63*±0.84   65.69±1.73    66.61±1.06    66.17±3.04
  node2vec       57.65±1.24    61.73±0.56    62.31±1.42    64.23±1.08    64.94±0.007   66.24±1.06    65.63±1.67    65.99±1.36    66.50±3.66
  MMDW           53.05±0.54    59.45±0.35    62.85±0.55    62.42±0.07    64.26±1.09    66.46±0.85    67.50*±0.48   67.37*±0.56   70.20*±1.22
  planetoid      10.020        12.620        12.040        14.060        16.870        21.880        28.390        14.520        55.600
  GCN            11.01±0.47    11.19±0.23    11.38±0.48    11.34±0.33    11.47±0.43    10.85±0.84    10.49±1.44    12.20±1.15    11.81±0.24
  GESF           55.38±1.66    63.12±1.07    64.92±1.23    67.36±0.48    68.63±0.65    70.72±0.77    70.37±2.59    72.84±1.56    73.61±2.14

Email-eu
  deepwalk       61.11*±4.71   67.81*±2.59   71.15*±2.73   72.54*±1.79   75.30*±0.58   74.53*±1.38   75.48±1.92    75.72*±2.53   77.60*±3.56
  node2vec       60.60±4.97    66.59±2.81    69.84±2.39    71.91±1.16    74.38±0.64    74.28±2.29    75.81*±1.76   75.62±2.20    76.00±2.68
  MMDW           36.76±0.90    40.72±2.54    43.22±0.61    43.01±1.52    46.11±1.23    44.94±0.38    48.08±1.21    53.62±0.83    65.50±1.34
  planetoid      46.470        58.760        55.460        54.300        52.880        50.120        49.670        55.940        57.430
  GCN            0.63±0.13     0.58±0.07     0.80±0.08     0.66±0.00     0.73±0.41     0.66±0.14     4.58±5.64     0.50±0.00     0.50±0.71
  GESF           64.97±6.80    68.78±2.18    72.31±2.56    73.74±0.67    75.43±0.99    76.70±1.42    77.30±2.77    79.27±2.50    81.67±1.53

MMDW requires over 64 GB of memory on the Pubmed experiment, so its results are not available in this comparison.

Table 1: Accuracy (%) of Multi-class Classification Experiments

4.2 Comparison on heterogeneous graphs

We next conduct evaluation on heterogeneous graphs, where learned node embedding vectors are used for multi-label classification.

Datasets

The datasets used include:

  • DBLP (ji2010graph) is an academic community network. Here we obtain a subset of the large network with two types of nodes: authors and key words from the authors' publications. A link between a pair of authors indicates a coauthor relationship, and a link between an author and a word means the word belongs to at least one publication of this author. There are 66,832 edges between pairs of authors and 338,210 edges between authors and words. Each node can have multiple labels out of four in total.

  • BlogCatalog (Wang-etal10) is a social media network with 55,814 users who, according to their interests, are classified into multiple overlapping groups. We take the five largest groups to evaluate the performance of the methods. Users and tags are the two types of nodes. The 5,413 tags are generated by users as keywords of their blogs; therefore, tags are shared among different users and also connect to each other, since some tags are generated from the same blogs. The numbers of edges between users, between tags, and between users and tags are about 1.4M, 619K, and 343K, respectively.

Methods for Comparison

To validate the performance of GESF on heterogeneous graphs, we conduct the experiments in two stages: (1) comparing GESF with Deepwalk (perozzi2014deepwalk) and node2vec (grover2016node2vec) on the graphs by treating all nodes as the same type (GESF with $K = 1$, i.e., in a homogeneous setting); (2) comparing GESF with the state-of-the-art heterogeneous graph embedding method, metapath2vec (dong2017metapath2vec), in a heterogeneous setting. The hyper-parameters of the method are fine-tuned, and metapath2vec++ is chosen as the variant for comparison.

Experiment Setup and Results

For fair comparison, the dimension of the representation vectors is chosen to be the same for all algorithms (the dimension is 64). We fine-tune the hyper-parameters for all of them. The details of GESF for the multi-label case are as follows.

  • Supervised Component: Since this is a multi-label classification problem, each label can be treated as a binary classification problem. Therefore, we apply logistic regression for each label: for an arbitrary embedding $\mathbf{h}_v$ and the $j$-th label $y_{v,j} \in \{0, 1\}$, the supervised component is the logistic (binary cross-entropy) loss $-y_{v,j}\log \sigma(\mathbf{w}_j^\top \mathbf{h}_v) - (1 - y_{v,j})\log\big(1 - \sigma(\mathbf{w}_j^\top \mathbf{h}_v)\big)$, where $\mathbf{w}_j$ is the classifier for the $j$-th label and $\sigma$ is the sigmoid function. The supervised component in Eq. (5) is then the sum of these losses over labels plus an $\ell_2$ regularization for $\{\mathbf{w}_j\}$, whose weight is fine-tuned (a code sketch follows this list).

  • Unsupervised Embedding Mapping Component: We design a two-layer NN with a 64-dimensional hidden layer for each pair of a node type and a neighbor node type, to formulate the mapping from the embeddings of neighbors to the embedding of the target node. We also form a two-layer 1-to-1 NN with a 3-dimensional hidden layer to construct the matrix function for the adjacency matrix of the whole graph. We pre-process the matrix with an eigenvalue decomposition, preserving the largest 1000 eigenvalues by default. We denote the nodes to be classified as type 1 and the other type as type 2. The balance hyper-parameters are set to [0.2, 200].
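A minimal sketch of the per-label logistic-regression component described above is given below, assuming a binary cross-entropy loss summed over labels and an $\ell_2$ penalty on the per-label classifiers; names and the regularization weight are illustrative assumptions.

```python
# Sketch of the multi-label supervised component: one logistic-regression
# (binary cross-entropy) loss per label, summed over labels, plus an l2 penalty
# on the per-label classifiers. Names and the regularization weight are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def multilabel_component(H_labeled, Y, W, reg_weight=1e-3):
    """H_labeled: (n, d) embeddings; Y: (n, L) 0/1 label matrix;
    W: (d, L) one classifier vector per label."""
    logits = H_labeled @ W                    # w_j^T h_v for every label j
    bce = F.binary_cross_entropy_with_logits(logits, Y, reduction="sum") / H_labeled.shape[0]
    return bce + reg_weight * W.pow(2).sum()  # l2 regularization on the classifiers

H_labeled = torch.randn(32, 64)
Y = torch.randint(0, 2, (32, 4)).float()      # e.g., four labels as in DBLP
W = torch.nn.Parameter(0.1 * torch.randn(64, 4))
loss = multilabel_component(H_labeled, Y, W)
```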

For the datasets DBLP and BlogCatalog, we carry out the experiments on each of them and compare the performance among all methods mentioned above. Since it is a multi-label classification task, we take the F1-score (macro, micro) as the evaluation metric for the comparison. The percentage of labeled samples is chosen from 10% to 90%, while the remaining samples are used for evaluation. We repeat all experiments three times and report the mean and standard deviation of their performance in Table 2. We can observe that in most cases, GESF in the heterogeneous setting has the best performance, while GESF in the homogeneous setting achieves the second-best results, demonstrating the validity of our proposed universal graph embedding mechanism.

training%              10.00%        20.00%        30.00%        40.00%        50.00%        60.00%        70.00%        80.00%        90.00%

DBLP (macro)
  deepwalk             74.48*±0.34   74.87±0.14    74.95±0.15    75.10±0.22    75.07±0.22    75.44±0.22    75.33±0.33    74.75±0.31    75.36±0.73
  node2vec             73.37±0.24    73.94±0.11    74.00±0.18    74.25±0.23    74.06±0.31    74.52±0.21    74.52±0.35    74.32±0.26    74.55±0.57
  metapath2vec++       74.82±0.30    75.27±0.08    75.55±0.12    75.63±0.27    75.53±0.22    75.92±0.24    75.92±0.42    75.56±0.36    76.09±0.46
  GESF (homogeneous)   72.51±0.91    76.89*±0.06   79.91*±0.09   82.14*±0.26   84.60*±0.42   86.34*±0.52   87.49*±0.31   88.21*±0.25   89.61*±0.58
  GESF (heterogeneous) 74.06±0.37    78.43±0.61    81.00±0.19    83.18±0.16    84.95±0.22    86.91±0.54    88.30±0.29    89.18±0.17    90.37±0.38

DBLP (micro)
  deepwalk             76.65±0.25    77.03±0.19    77.15±0.15    77.21±0.15    77.20±0.17    77.60±0.20    77.44±0.31    76.87±0.35    77.54±0.72
  node2vec             75.65±0.16    76.21±0.12    76.30±0.16    76.48±0.18    76.33±0.25    76.82±0.22    76.76±0.33    76.44±0.34    76.73±0.55
  metapath2vec++       76.98*±0.21   77.38±0.10    77.66±0.08    77.70±0.21    77.61±0.18    78.04±0.18    77.95±0.39    77.54±0.36    78.02±0.47
  GESF (homogeneous)   74.37±0.88    78.52*±0.09   81.47*±0.11   83.56*±0.28   85.81*±0.42   87.51*±0.54   88.44*±0.27   89.16*±0.22   90.54*±0.50
  GESF (heterogeneous) 77.06±0.29    80.67±0.45    82.87±0.13    84.75±0.16    86.29±0.22    88.09±0.47    89.27±0.25    90.11±0.13    91.22±0.43

BlogCatalog (macro)
  deepwalk             45.13±0.68    44.64±0.21    44.52±0.42    44.64±0.23    44.32±0.21    44.36±0.46    44.78±0.48    44.33±0.62    44.49±0.86
  node2vec             45.78±0.68    45.42±0.30    45.28±0.32    45.41±0.18    45.17±0.20    45.19±0.36    45.57±0.13    45.04±0.31    44.96±0.55
  metapath2vec++       37.46±0.61    36.72±0.36    36.69±0.29    36.58±0.28    36.74±0.17    36.90±0.33    36.89±0.32    36.42±0.41    36.16±0.98
  GESF (homogeneous)   47.63*±3.16   50.99*±0.09   51.70*±0.19   50.04*±1.90   50.60*±1.73   50.34*±0.55   52.20*±1.11   51.88*±1.08   51.36*±0.26
  GESF (heterogeneous) 49.65±0.63    51.47±0.40    52.69±0.24    53.37±0.45    53.73±0.01    53.97±0.43    53.83±0.62    54.07±0.40    53.36±0.84

BlogCatalog (micro)
  deepwalk             47.93±0.48    47.36±0.15    47.25±0.45    47.30±0.19    47.07±0.16    47.09±0.48    47.39±0.49    47.02±0.72    47.27±0.82
  node2vec             48.52±0.50    48.20±0.18    48.01±0.31    48.18±0.17    47.97±0.25    48.04±0.39    48.22±0.15    47.83±0.32    47.95±0.55
  metapath2vec++       40.90±0.34    40.06±0.21    40.10±0.25    39.97±0.22    40.04±0.17    40.22±0.29    40.21±0.22    39.82±0.44    39.66±0.82
  GESF (homogeneous)   50.75*±2.74   54.26*±0.12   55.08*±0.08   53.41*±1.92   53.88*±1.70   53.57*±0.50   55.36*±1.30   54.93*±1.07   55.03*±0.52
  GESF (heterogeneous) 53.06±0.45    54.77±0.21    55.93±0.17    56.41±0.31    56.86±0.06    57.28±0.38    57.13±0.43    57.43±0.14    56.98±0.53
Table 2: F1-score (macro, micro) (%) of Multi-label Classification Experiments

5 Conclusion and Future Work

To summarize, GESF is proposed as a most general graph embedding solution, with a theoretical guarantee for the effectiveness of the whole model and impressive experimental results compared to state-of-the-art algorithms. For future work, our model can be extended to more general cases, e.g., involving node content or attributes in the embedding learning. One possible solution is to introduce the attributes as a special type of neighbor in the graph, and utilize multiple set functions to map the embeddings within a more complex heterogeneous graph structure.

References