Graph Representation Ensemble Learning

09/06/2019 ∙ by Palash Goyal, et al. ∙ University of California, Irvine ∙ Siemens AG ∙ University of Southern California ∙ Stanford University

Representation learning on graphs has been gaining attention due to its wide applicability in predicting missing links, and classifying and recommending nodes. Most embedding methods aim to preserve certain properties of the original graph in the low-dimensional space. However, real-world graphs have a combination of several properties which are difficult to characterize and capture by a single approach. In this work, we introduce the problem of graph representation ensemble learning and provide a first-of-its-kind framework to aggregate multiple graph embedding methods efficiently. We analyze our framework and study -- theoretically and empirically -- the dependence between state-of-the-art embedding methods. We test our models on the node classification task on four real-world graphs and show that the proposed ensemble approaches can outperform the state-of-the-art methods by up to 8% on macro-F1. We further show that the approach is even more beneficial for underrepresented classes, providing an improvement of up to 12%.

Introduction

Graphs are used to represent data in various scientific fields including the social sciences, biology, and physics [Gehrke2003, freeman2000visualizing, theocharidis2009network, goyal2018recommending]. Such representations allow researchers to gain insights about their problems. The most common tasks on graphs are link prediction, node classification, and visualization. For example, link prediction in the social domain is used to determine friendships between people. Node classification in the biology domain is used to identify the classes of genes and proteins. Similarly, visualization is used to identify communities and the structure of a graph. Recently, a significant amount of work has been devoted to learning low-dimensional representations of the nodes in a graph to allow the use of machine learning techniques for these tasks. Graph representation learning techniques embed each node of the network in a low-dimensional space, and map link prediction and node classification in the network space to nearest neighbor search and vector classification in the embedding space [goyal2017graph]. Several of these techniques have shown state-of-the-art performance on graph tasks [Grover2016, Ou2016].
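To make this mapping concrete, the following is a minimal sketch of performing both tasks on a learned embedding matrix. All data and model choices here are illustrative stand-ins, not the paper's setup:

```python
# Sketch: node embeddings reduce graph tasks to standard ML problems.
# The embedding matrix and labels below are random stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

Y = np.random.rand(100, 32)            # stand-in for a learned |V| x d embedding
labels = np.random.randint(0, 4, 100)  # stand-in node labels

# Node classification becomes vector classification in the embedding space.
clf = LogisticRegression(max_iter=1000).fit(Y[:80], labels[:80])
pred = clf.predict(Y[80:])

# Link prediction becomes nearest neighbor search: nodes whose embeddings
# are close are predicted to be linked.
nn = NearestNeighbors(n_neighbors=5).fit(Y)
dist, idx = nn.kneighbors(Y[:1])       # candidate links for node 0
```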

Figure 1: Example motivating the need for ensemble learning. The graph represents interactions in classrooms (in red) and family trees of students (in purple). Such a complex combination of community and structure requires combining multiple embedding methods for accurate predictions.

State-of-the-art techniques in graph representation learning define the characteristics of the graph they aim to capture and an objective function to learn these features in the low-dimensional embedding. For example, HOPE [Ou2016] preserves higher order proximity between nodes using the singular value decomposition of a similarity matrix. Similarly, node2vec [Grover2016] captures the similarity of nodes using random walks on the graph. However, real-world graphs do not follow a simple structure and can be layered with several categories of properties with complex interactions between them. It has been shown that no single method outperforms the others on all network tasks and data sets [goyal2017graph]. We further illustrate this with the example in Figure 1, a social network from two classrooms (shown in red). We also show the family links of individual students and their family members outside the classroom (shown in purple). Here, we consider the task of multi-label node classification with the classes classroom and role in family. This network is complex and has both community and structural properties. Methods such as HOPE [Ou2016], which preserve community, can effectively classify the nodes into classrooms but perform poorly on family links, which follow structure. On the other hand, structure preserving methods can classify the role of an individual student in the family but put nodes in the same classroom into separate categories.

In this work, we introduce graph representation ensemble learning. Given a graph and a list of methods capturing various properties of the graph, we aim to learn a representation of the nodes which combines the embeddings from each method such that it outperforms each of the constituent methods in terms of prediction performance. Ensemble methods have been very successful in the field of machine learning. Methods such as AdaBoost [ratsch2001soft] and Random Forest [liaw2002classification] have been shown to be much more accurate than the individual classifiers that compose them. It has been shown that combining even the simplest but diverse classifiers can yield high performance. However, to the best of our knowledge, no prior work has focused on ensemble learning for graph representation learning.

Here, we formally introduce ensemble learning on graph representation methods and provide a framework for it. We first provide a motivating example to show that a single embedding approach is not enough for accurate predictions on a graph task and that combining methods can improve performance. We then formalize the problem and define a method to measure the correlation of embeddings obtained from various approaches. We also provide an upper bound on this correlation assuming certain properties of the graph, and use the bound to establish the utility of our framework. We focus our experiments on the task of node classification. We compare our method with state-of-the-art embedding methods on four real-world networks, including collaboration, social, and biological networks. Our experiments show that the proposed ensemble approaches outperform the state-of-the-art methods by up to 8% on macro-F1. We further show that the approach is even more beneficial for underrepresented classes, with an improvement of up to 12%.

Overall, our paper makes the following contributions:

  1. We introduce ensemble learning in the field of graph representation learning.

  2. We propose a framework for ensemble learning given a variety of graph embedding methods.

  3. We provide a theoretical analysis of the proposed framework and establish its utility both theoretically and empirically.

  4. We demonstrate that combining multiple diverse methods through ensembles achieves state-of-the-art accuracy.

  5. We publish a library, GraphEnsembleLearning (www.anonymous-url.com), implementing the framework for graph ensemble learning.

Related Work

Methods for graph representation learning (aka graph embedding) typically vary in the properties preserved and the objective function used to capture these properties. Based on the properties preserved, embedding methods can be divided into two broad categories: (i) community preserving, and (ii) structure preserving. Community preserving approaches aim to capture the distances of the original graph in the embedding space. Within this category, methods vary in the order of distances captured. For example, Graph Factorization [Ahmed2013] and Laplacian Eigenmaps [belkin2001laplacian] preserve shorter distances (i.e., low order proximity) in the graph, whereas more recent methods such as Higher Order Proximity Embedding (HOPE) [Ou2016] and GraRep [cao2016deep] capture longer distances (i.e., high order proximity). Structure preserving methods aim to understand the structural similarity between nodes and capture the role of each node. node2vec [Grover2016] uses a mixture of breadth-first and depth-first search for this purpose. Deep learning methods such as Structural Deep Network Embedding (SDNE) [Wang2016] and Deep Network Graph Representation (DNGR) [cao2016deep] use deep autoencoders to preserve distance and structure.

Based on the objective function, embedding methods can be broadly divided into two categories: (i) matrix factorization, and (ii) deep learning methods. Matrix factorization techniques represent the graph as a similarity matrix and decompose it to obtain the embedding. Graph Factorization and HOPE use the adjacency matrix and a higher order proximity matrix, respectively, for this purpose. Deep learning methods, on the other hand, use multiple non-linear layers to capture the underlying manifold of the interactions between nodes. SDNE, DNGR, and VGAE [kipf2016variational] are examples of these methods. Some other recent approaches use graph convolutional networks to learn the graph structure [kipf2016semi, bruna2013spectral, henaff2015deep].
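As a concrete illustration of the matrix factorization family, the following is a minimal sketch of obtaining an embedding by truncated SVD of a proximity matrix. This is not the exact HOPE formulation (which uses a generalized SVD of a higher order proximity matrix); the graph and dimensionality are arbitrary choices for the example:

```python
# Minimal sketch of matrix-factorization-style embedding: factorize a
# proximity matrix with truncated SVD. Illustrative only; HOPE itself
# uses a generalized SVD of a higher order proximity matrix.
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)    # adjacency as the (first order) proximity matrix
d = 8                       # embedding dimension (arbitrary here)

U, S, Vt = np.linalg.svd(A)
Y = U[:, :d] * np.sqrt(S[:d])               # source embedding: |V| x d
A_hat = Y @ (Vt[:d].T * np.sqrt(S[:d])).T   # rank-d reconstruction of A
```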

In machine learning, ensemble approaches [zhou2012ensemble] are algorithms which combine the outputs of a set of classifiers. It has been shown that an ensemble of classifiers is more accurate than any of its individual members if the classifiers are accurate and diverse [hansen1990neural]. There are several ways individual classifiers can be combined. Broadly, they can be divided into four categories: (i) Bayesian voting, (ii) random selection of training examples, (iii) random selection of input features, and (iv) random selection of output labels. Bayesian voting methods combine the predictions from the classifiers weighted by their confidence. On the other hand, methods such as Random Forest [liaw2002classification] and AdaBoost [ratsch2001soft] divide the training data into multiple subsets, train classifiers on each individual subset, and combine the output. The third category of approaches divides the input set of features available to the learning algorithm [opitz1999feature]. Finally, for data with a large number of output labels, some methods divide the set of output labels and learn individual classifiers for each corresponding label subset [ricci1997extending].

In this work, we extend the concept of ensemble learning to graph representation learning and gain insights into the correlations between various graph embedding methods. Based on this, we propose ensemble methods and show the improvement in performance on the node classification task.

Motivating Example

This section presents a case study highlighting the effectiveness of the proposed graph representation ensemble learning on synthetic data. We utilize four synthetic graphs: (a) Barabasi-Albert, (b) Random Geometric, (c) Stochastic Block Model, and (d) Watts-Strogatz (see Figure 2). Each of these graphs exhibits a specific structural property. We use a spring layout to further elucidate the differences in the structural properties of the four synthetic graphs. The Barabasi-Albert model adds new connections through preferential attachment based on the degrees of the existing nodes. The Watts-Strogatz model generates a ring lattice in which each node is connected to its nearest neighbors, with some edges randomly rewired. The Stochastic Block Model creates community clusters, with dense connections within communities and sparse connections between them. The Random Geometric graph places nodes in a metric space and adds edges based on the spatial proximity of the nodes.
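For reference, these four graph families can be generated with networkx as sketched below; the parameter values are illustrative assumptions, since the text only specifies the number of nodes:

```python
# Sketch: generate the four synthetic graph families with networkx.
# Parameter values are illustrative; the paper only specifies 100 nodes.
import networkx as nx

n = 100
g_ba = nx.barabasi_albert_graph(n, m=2)            # preferential attachment
g_rg = nx.random_geometric_graph(n, radius=0.2)    # spatial proximity edges
g_sbm = nx.stochastic_block_model(
    sizes=[25, 25, 25, 25],
    p=[[0.5 if i == j else 0.02 for j in range(4)] for i in range(4)],
)                                                  # community clusters
g_ws = nx.watts_strogatz_graph(n, k=4, p=0.1)      # rewired ring lattice
```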

Figure 2: Four synthetic graphs with different graph properties (node color represents degree), drawn using the spring layout.

We generate each of the synthetic graphs with 100 nodes. As mentioned earlier, different embedding algorithms such as Graph Factorization, Laplacian Eigenmaps, HOPE, SDNE, and node2vec capture different characteristics of a graph. Hence, a single embedding algorithm may not be able to capture the entire complex interaction. To test this hypothesis, we create two node labels for the synthetic graph. The first label is based on the degree of each node, whereas the second is based on its closeness centrality [freeman1978centrality]. The centrality values are binned and the respective bins are used as node labels.
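A minimal sketch of this labeling scheme follows, assuming quantile bins; the number of bins is our assumption, as the paper does not specify it:

```python
# Sketch: derive node labels by binning degree and closeness centrality.
# The number of bins (4) is an assumption; the paper does not specify it.
import networkx as nx
import numpy as np

G = nx.barabasi_albert_graph(100, 2)    # any of the synthetic graphs

deg = np.array([d for _, d in G.degree()])
close = np.array(list(nx.closeness_centrality(G).values()))

def bin_labels(values, n_bins=4):
    """Map continuous values to bin indices via quantile edges."""
    edges = np.unique(np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1]))
    return np.digitize(values, edges)

degree_labels = bin_labels(deg)
centrality_labels = bin_labels(close)
```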

Figure 5: Synthetic graphs merged together (colors represent the updated node degrees). (a) Original layout showing the added edges; (b) new spring layout.

To simulate interactions between the different synthetic graphs, we randomly select node pairs (equal to 40% of the total number of nodes) and add edges between them (with a probability threshold of 0.3). The added edges are shown in Figure 5.
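A sketch of this merging step follows, under our reading that each sampled pair receives an edge with probability 0.3; the exact sampling protocol is not fully specified in the text:

```python
# Sketch: merge the four synthetic graphs and add random cross edges.
# Our reading of the protocol: sample pairs equal to 40% of |V| and add
# an edge to each pair with probability 0.3 (an assumption).
import random
import networkx as nx

n = 100
graphs = [
    nx.barabasi_albert_graph(n, 2),
    nx.random_geometric_graph(n, 0.2),
    nx.stochastic_block_model([25, 25, 25, 25],
                              [[0.5 if i == j else 0.02 for j in range(4)]
                               for i in range(4)]),
    nx.watts_strogatz_graph(n, 4, 0.1),
]
merged = nx.disjoint_union_all(graphs)   # relabels nodes 0..4n-1

nodes = list(merged.nodes())
for _ in range(int(0.4 * len(nodes))):
    u, v = random.sample(nodes, 2)
    if random.random() < 0.3:            # probability threshold of 0.3
        merged.add_edge(u, v)
```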

Methods                          Dimensions             Macro-F1
gf                               128                    0.127
lap                              32                     0.055
hope                             128                    0.157
sdne                             64                     0.177
node2vec                         128                    0.128
sdne, node2vec, hope, gf, lap    128, 64, 32, 64, 64    0.183 (+3.4%)
Table 1: Ensemble methods on the motivating example with degree labels.


The node classification results for the degree labels of the merged synthetic graph are shown in Table 1. The embeddings obtained from the state-of-the-art methods and from the ensemble approach are used to predict the degree labels. Compared to the individual state-of-the-art algorithms, the ensemble-based approach achieves a 3.4% improvement in macro F1-score. Although modest, this still improves the classification accuracy.

Methods                          Dimensions             Macro-F1
gf                               64                     0.108
lap                              32                     0.064
hope                             64                     0.090
sdne                             128                    0.191
node2vec                         128                    0.142
sdne, node2vec, gf, hope, lap    128, 64, 64, 32, 128   0.215 (+12.6%)
Table 2: Ensemble methods on the motivating example with centrality labels.

The classification results for the centrality labels are shown in Table 2. For this label, the ensemble-based method achieves a 12.6% improvement in macro F1-score. Both macro F1-scores show that the ensemble-based approach is able to combine the complementary abilities of the different graph embedding algorithms to capture the structure of the network.
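The text does not spell out the combination rule at this point. A hedged sketch of one natural baseline, concatenating the per-method embeddings and training a single classifier evaluated with macro-F1, is:

```python
# Sketch of one simple ensemble baseline: concatenate per-method embeddings
# and train a single classifier. The combination rule is our assumption,
# not necessarily the exact scheme proposed in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Random stand-ins for embeddings from sdne, node2vec, hope, gf, lap.
n = 400
embeddings = [np.random.rand(n, d) for d in (128, 64, 32, 64, 64)]
labels = np.random.randint(0, 4, n)

Y_ens = np.hstack(embeddings)        # |V| x sum(d_i) ensemble embedding
X_tr, X_te, y_tr, y_te = train_test_split(Y_ens, labels, test_size=0.5)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("macro-F1:", f1_score(y_te, clf.predict(X_te), average="macro"))
```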

Graph Representation Ensemble Learning

In this section, we define the notation and provide the graph ensemble problem statement. We then describe how we measure the diversity of the embeddings obtained from different methods.

Notations

We define a directed graph as $G = (V, E)$, where $V$ is the vertex set and $E$ is the directed edge set. The adjacency matrix of $G$ is denoted by $A$. We define the embedding matrix obtained from a method $m$ as $Y_m \in \mathbb{R}^{|V| \times d}$, where $d$ is the embedding dimension. The embedding matrix can be used to reconstruct the distance between all pairs of nodes in the graph. We denote this reconstruction by $\hat{A}_m$, in which $\hat{A}_m(i, j)$ is the reconstructed distance between nodes $v_i$ and $v_j$.
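As an illustration, assuming an inner-product decoder (one common choice; the actual reconstruction function depends on the embedding method), $\hat{A}_m$ can be computed as:

```python
# Sketch: reconstruct pairwise proximities from an embedding matrix using
# an inner-product decoder. The decoder choice is an assumption; each
# embedding method defines its own reconstruction function.
import numpy as np

Y_m = np.random.rand(100, 32)   # stand-in |V| x d embedding from method m
A_hat_m = Y_m @ Y_m.T           # A_hat_m[i, j]: reconstructed proximity
```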

Problem Statement

In this paper, we introduce the problem of ensemble learning on graph representation learning. We define it as follows: given a set of embedding methods $\{m_1, \ldots, m_k\}$ with corresponding embeddings for a graph $G$ denoted by $\{Y_{m_1}, \ldots, Y_{m_k}\}$ and errors on a graph task denoted by $\{e_{m_1}, \ldots, e_{m_k}\}$, a graph ensemble learning approach aims to learn an embedding $Y_{ens}$ with error $e_{ens}$ such that $e_{ens} \leq \min_i e_{m_i}$.

Measuring Graph Embedding Diversity

Different graph embedding techniques vary in the types of graph properties they preserve and in the models they define. Broadly, embedding techniques can be divided into: (i) structure preserving, and (ii) community preserving models, defined as follows:

Definition 1 (Community Preserving Models). A community preserving model aims to embed nodes with a lower graph distance between them closer together in the embedding space.
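To make the notion of embedding diversity concrete, one simple way to measure the dependence between two methods is to correlate their reconstructed proximity matrices. This is a hedged sketch of the idea, not necessarily the exact correlation measure the paper defines:

```python
# Sketch: one simple diversity measure between two embedding methods --
# the Pearson correlation of their reconstructed proximity matrices.
# Illustrative only; the paper defines its own correlation measure.
import numpy as np

def reconstruction(Y):
    """Inner-product reconstruction of pairwise proximities (assumed decoder)."""
    return Y @ Y.T

def embedding_correlation(Y_a, Y_b):
    """Pearson correlation between the flattened reconstructions of two methods."""
    a = reconstruction(Y_a).ravel()
    b = reconstruction(Y_b).ravel()
    return np.corrcoef(a, b)[0, 1]

Y1 = np.random.rand(100, 16)    # stand-in embeddings from two methods
Y2 = np.random.rand(100, 16)
# High correlation suggests redundant methods; low correlation suggests
# diversity, which is what an ensemble wants to exploit.
print(embedding_correlation(Y1, Y2))
```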