Introduction
Graphs are used to represent data in various scientific fields, including the social sciences, biology, and physics [Gehrke2003, freeman2000visualizing, theocharidis2009network, goyal2018recommending]. Such representations allow researchers to gain insights into their problems. The most common tasks on graphs are link prediction, node classification, and visualization. For example, link prediction in the social domain is used to determine friendships between people. Node classification in the biology domain is used to identify genes of proteins. Similarly, visualization is used to identify communities and the structure of a graph. Recently, a significant amount of work has been devoted to learning low-dimensional representations of the nodes in a graph so that machine learning techniques can be applied to these tasks. Graph representation learning techniques embed each node of the network in a low-dimensional space, and map link prediction and node classification in the network space to nearest neighbor search and vector classification in the embedding space [goyal2017graph]. Several of these techniques have shown state-of-the-art performance on graph tasks [Grover2016, Ou2016].
State-of-the-art techniques in graph representation learning define the characteristics of the graph they aim to capture and an objective function to learn these features in the low-dimensional embedding. For example, HOPE [Ou2016] preserves higher-order proximity between nodes using the singular value decomposition of the similarity matrix. Similarly, node2vec [Grover2016] captures the similarity of nodes using random walks on the graph. However, real-world graphs do not follow a simple structure and can be layered with several categories of properties with complex interactions between them. It has been shown that no single method outperforms the others on all network tasks and data sets [goyal2017graph]. We further illustrate this with the example in Figure 1, a social network from two classrooms (represented by the pink color). We also show the family links of individual students in the classroom and represent family members outside the classroom (represented by the blue color). Here, we consider the task of multi-label node classification with the classes classroom and role in family. This network is complex and has both community and structural properties. Methods such as HOPE [Ou2016], which preserve community, can effectively classify the nodes into classrooms but perform poorly on family links, which follow structure. On the other hand, structure-preserving methods can classify the role of an individual student in the family but put nodes in the same classroom into separate categories.
In this work, we introduce graph representation ensemble learning. Given a graph and a list of methods capturing various properties of the graph, we aim to learn a representation of nodes that combines the embeddings from each method such that it outperforms each of the constituent methods in terms of prediction performance. Ensemble methods have been very successful in the field of machine learning. Methods such as AdaBoost [ratsch2001soft] and Random Forest [liaw2002classification] have been shown to be much more accurate than the individual classifiers that compose them. It has been shown that combining even the simplest but diverse classifiers can yield high performance. However, to the best of our knowledge, no work has focused on ensemble learning for graph representation learning.
Here, we formally introduce ensemble learning on graph representation methods and provide a framework for it. We first provide a motivating example to show that a single embedding approach is not enough for accurate predictions on a graph task and that combining methods can improve performance. We then formalize the problem and define a method to measure correlations of embeddings obtained from various approaches. Then, we provide an upper bound on the correlation assuming certain properties of the graph. The upper bound is used to establish the utility of our framework. We focus our experiments on the task of node classification. We compare our method with state-of-the-art embedding methods and show its performance on four real-world networks, including collaboration networks, social networks, and biology networks. Our experiments show that the proposed ensemble approaches outperform the state-of-the-art methods by 8% on macro-F1. We further show that the approach is even more beneficial for underrepresented classes, yielding an improvement of 12%.
Overall, our paper makes the following contributions:

We introduce ensemble learning in the field of graph representation learning.

We propose a framework for ensemble learning given a variety of graph embedding methods.

We provide a theoretical analysis of the proposed framework and show its utility theoretically and empirically.

We demonstrate that combining multiple diverse methods through an ensemble achieves state-of-the-art accuracy.

We publish a library, GraphEnsembleLearning (www.anonymousurl.com), implementing the framework for graph ensemble learning.
Related Work
Methods for graph representation learning (also known as graph embedding) typically vary in the properties they preserve and the objective function used to capture these properties. Based on the properties, embedding methods can be divided into two broad categories: (i) community preserving, and (ii) structure preserving. Community preserving approaches aim to capture the distances of the original graph in the embedding space. Within this category, methods vary in the level of distances captured. For example, Graph Factorization [Ahmed2013] and Laplacian Eigenmaps [belkin2001laplacian] preserve shorter distances (i.e., low-order proximity) in the graph, whereas more recent methods such as Higher Order Proximity Embedding (HOPE) [Ou2016] and GraRep [cao2016deep] capture longer distances (i.e., high-order proximity). Structure preserving methods aim to understand the structural similarity between nodes and capture the role of each node. node2vec [Grover2016] uses a mixture of breadth-first and depth-first search for this. Deep learning methods such as Structural Deep Network Embedding (SDNE) [Wang2016] and Deep Network Graph Representation (DNGR) [cao2016deep] use deep autoencoders to preserve distance and structure.
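The mixture of breadth-first and depth-first exploration mentioned above can be illustrated with a biased second-order random walk. The following is a minimal sketch rather than the reference node2vec implementation: the function name `biased_walk`, the adjacency-dictionary input, and the toy graph are assumptions made for illustration, while `p` and `q` play the roles of the return and in-out parameters.

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Second-order random walk in the style of node2vec (illustrative sketch).

    adj: dict mapping node -> set of neighbours (undirected, unweighted).
    p:   return parameter; a small p makes stepping back to the previous node likely.
    q:   in-out parameter; q > 1 keeps the walk local (BFS-like),
         q < 1 pushes it outward (DFS-like).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break  # dead end: no neighbour to move to
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:            # return to the previous node
                weights.append(1.0 / p)
            elif nxt in adj[prev]:     # stays at distance 1 from the previous node
                weights.append(1.0)
            else:                      # moves away from the previous node
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Toy graph, purely for illustration.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
walk = biased_walk(adj, start=0, length=10, p=1.0, q=0.5, rng=random.Random(0))
```

In a full pipeline, many such walks would be fed to a skip-gram style model to obtain the embedding; that step is omitted here.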
Based on the objective function, embedding methods can be broadly divided into two categories: (i) matrix factorization, and (ii) deep learning methods. Matrix factorization techniques represent the graph as a similarity matrix and decompose it to obtain the embedding. Graph Factorization and HOPE use the adjacency matrix and a higher-order proximity matrix, respectively, for this. Deep learning methods, on the other hand, use multiple nonlinear layers to capture the underlying manifold of the interactions between nodes. SDNE, DNGR and VGAE [kipf2016variational] are examples of these methods. Some other recent approaches use graph convolutional networks to learn graph structure [kipf2016semi, bruna2013spectral, henaff2015deep].
In machine learning, ensemble approaches [zhou2012ensemble] are algorithms that combine the outputs of a set of classifiers. It has been shown that an ensemble of classifiers is more accurate than any of its individual members if the classifiers are accurate and diverse [hansen1990neural]. There are several ways in which individual classifiers can be combined. Broadly, they can be divided into four categories: (i) Bayesian voting, (ii) random selection of training examples, (iii) random selection of input features, and (iv) random selection of output labels. Bayesian voting methods combine the predictions from the classifiers weighted by their confidence. On the other hand, methods such as Random Forest [liaw2002classification] and AdaBoost [ratsch2001soft] divide the training data into multiple subsets, train classifiers on each individual subset, and combine the outputs. The third category of approaches divides the input set of features available to the learning algorithm [opitz1999feature]. Finally, for data with a large number of output labels, some methods divide the set of output labels and train individual classifiers on their corresponding label subsets [ricci1997extending].
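As a small illustration of the first category, the sketch below combines classifier outputs by summing confidences per label. It is a simplified stand-in for Bayesian voting, not a full Bayesian treatment; the function name `weighted_vote` and the example confidences are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions):
    """Pick the label with the largest total confidence.

    predictions: list of (label, confidence) pairs, one per classifier.
    """
    scores = defaultdict(float)
    for label, confidence in predictions:
        scores[label] += confidence
    return max(scores, key=scores.get)


# Three hypothetical classifiers: one confident vote for "A", two weaker votes for "B".
votes = [("A", 0.9), ("B", 0.6), ("B", 0.5)]
winner = weighted_vote(votes)  # "B" wins with a total of 1.1 against 0.9
```

The example shows why diversity matters: two moderately confident classifiers that agree can outweigh a single confident one.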
In this work, we extend the concept of ensemble learning to graph representation learning and gain insights into the correlations between various graph embedding methods. Based on these insights, we propose ensemble methods for graph embeddings and show the resulting improvement in performance on the node classification task.
Motivating Example
This section presents a motivating case study to highlight the effectiveness of the proposed graph representation ensemble learning on a synthetic dataset. We base the analysis on four synthetic graphs: (a) Barabasi-Albert, (b) Random Geometric, (c) Stochastic Block Model, and (d) Watts-Strogatz (see Figure 2). Each of these graphs exhibits a specific structural property. We use a spring layout to further elucidate the differences in the structural properties of the four synthetic graphs. The Barabasi-Albert graph makes new connections through preferential attachment based on the degree of the existing nodes. The Watts-Strogatz graph generates a ring of nodes, with edges added between each node and its neighbors. The Stochastic Block Model creates clusters of nodes that preserve community structure. The Random Geometric graph generates nodes and adds edges based on the spatial proximity of the nodes.
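To make the preferential attachment mechanism concrete, the following sketch grows a Barabasi-Albert style graph in pure Python. It is an illustration under stated assumptions, not the generator used for the experiments: the function name `barabasi_albert`, the star-shaped seed graph, and the choice of `m=2` edges per new node are all assumed here.

```python
import random


def barabasi_albert(n, m, rng=None):
    """Grow a graph on n nodes; each new node attaches to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = rng or random.Random()
    edges = set()
    # `pool` lists every node once per incident edge, so a uniform draw
    # from it is a degree-proportional draw over nodes.
    pool = []
    # Seed: a star on m + 1 nodes, so every seed node has nonzero degree.
    for v in range(1, m + 1):
        edges.add((0, v))
        pool += [0, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # draw until m distinct targets are found
            targets.add(rng.choice(pool))
        for t in targets:
            edges.add((t, new))
            pool += [t, new]
    return edges


# 100 nodes as in the text; m = 2 is an assumed setting.
edges = barabasi_albert(n=100, m=2, rng=random.Random(0))
```

Because early nodes accumulate entries in the pool, they keep attracting new edges, which produces the hub-dominated degree distribution characteristic of this model.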
We generated each of the synthetic graphs with 100 nodes. As mentioned earlier, different embedding algorithms such as Graph Factorization, Laplacian Eigenmaps, Higher Order Proximity Embedding, Structural Deep Network Embedding, and node2vec capture different characteristics of a graph. Hence, a single embedding algorithm may not be able to capture the entire complex interaction. To test this hypothesis, we created two node labels for the synthetic graph. The first label is based on node degree, whereas the second is based on the closeness centrality measure [freeman1978centrality]. The centrality values are binned, and the respective bins are used as node labels.
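The label construction described above can be sketched as follows. The binning scheme is not specified in the text, so equal-width bins and the number of bins are assumptions, as are the helper names `closeness_centrality` and `bin_labels` and the toy path graph.

```python
from collections import deque


def closeness_centrality(adj, source):
    """Closeness of `source`: (reachable nodes - 1) / sum of shortest-path distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in unweighted graphs
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def bin_labels(values, num_bins):
    """Map each value to an equal-width bin index in [0, num_bins - 1]."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / num_bins
    if width == 0:
        width = 1.0  # all values identical: everything falls into bin 0
    return {k: min(int((v - lo) / width), num_bins - 1) for k, v in values.items()}


# Toy path graph 0-1-2-3-4, purely for illustration.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
degree = {v: len(nbrs) for v, nbrs in adj.items()}
closeness = {v: closeness_centrality(adj, v) for v in adj}
degree_labels = bin_labels(degree, num_bins=2)
closeness_labels = bin_labels(closeness, num_bins=2)
```

On this toy graph the two end nodes fall into the low bin for both labels, while the interior nodes fall into the high bin.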
To simulate the interaction between the different synthetic graphs, we randomly selected node pairs (equal in number to 40% of the total number of nodes) and added edges between them (with a probability threshold of 0.3). The added edges are shown in Figure 5.

Methods                        Dimensions            Macro-F1
gf                             128                   0.127
lap                            32                    0.055
hope                           128                   0.157
sdne                           64                    0.177
node2vec                       128                   0.128
sdne, node2vec, hope, gf, lap  128, 64, 32, 64, 64   0.183 (3.4%)
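The random linking of the synthetic graphs could be implemented along the following lines. This is one possible reading of the description, not the authors' procedure: it assumes each sampled pair joins nodes from two different graphs and that "probability threshold" means an edge is added when a uniform draw falls below 0.3. The function name `cross_graph_edges` and the graph names are hypothetical.

```python
import random


def cross_graph_edges(graphs, pair_fraction=0.4, edge_prob=0.3, rng=None):
    """Sample candidate node pairs across graphs and keep each with probability edge_prob.

    graphs: dict mapping graph name -> list of node ids.
    Returns a list of ((graph_a, node_a), (graph_b, node_b)) edges.
    """
    rng = rng or random.Random()
    names = sorted(graphs)
    total_nodes = sum(len(nodes) for nodes in graphs.values())
    num_pairs = round(pair_fraction * total_nodes)  # 40% of all nodes by default
    new_edges = []
    for _ in range(num_pairs):
        g1, g2 = rng.sample(names, 2)  # two distinct graphs (assumption)
        u = (g1, rng.choice(graphs[g1]))
        v = (g2, rng.choice(graphs[g2]))
        if rng.random() < edge_prob:   # keep the pair with probability 0.3
            new_edges.append((u, v))
    return new_edges


# Four graphs with 100 nodes each, matching the sizes stated in the text.
graphs = {name: list(range(100)) for name in ("ba", "geometric", "sbm", "ws")}
new_edges = cross_graph_edges(graphs, rng=random.Random(0))
```

With 400 nodes in total, 160 candidate pairs are drawn, so on average roughly 48 cross-graph edges are added under this reading.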
The results of node classification for the degree labels of the merged synthetic graph are shown in Table 1. The embeddings obtained from the state-of-the-art methods and from the ensemble approach are used to predict the degree labels. Compared to the state-of-the-art algorithms, the ensemble-based approach achieves a 3.4% improvement in macro-F1 score. Although the gain is not significant, the ensemble still improves classification accuracy.
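For reference, the sketch below shows how a macro-F1 score and a relative improvement can be computed. The helper names are hypothetical, and the macro-F1 function covers only the single-label case. The reported 3.4% is consistent with a relative gain of the ensemble score (0.183) over the best single method in Table 1 (sdne, 0.177), which is the interpretation assumed here.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (single-label classification)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def relative_improvement(ensemble_score, best_single_score):
    """Relative gain of the ensemble over the best single method, in percent."""
    return 100.0 * (ensemble_score - best_single_score) / best_single_score


# Scores taken from Table 1: ensemble 0.183, best single method (sdne) 0.177.
gain = relative_improvement(0.183, 0.177)  # about 3.4
```

Because macro-F1 averages over classes without weighting by class size, a rare class that is never predicted pulls the score down as much as a frequent one would.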
Methods                        Dimensions            Macro-F1
gf                             64                    0.108
lap                            32                    0.064
hope                           64                    0.090
sdne                           128                   0.191
node2vec                       128                   0.142
sdne, node2vec, gf, hope, lap  128, 64, 64, 32, 128  0.215 (12.6%)
The classification results for the centrality labels are shown in Table 2. For this label, the ensemble-based method achieves a 12.6% improvement in macro-F1 score. Both macro-F1 results indicate that the ensemble-based approach can exploit the complementary strengths of the different graph embedding algorithms in capturing the structure of the network.
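This section does not state how the individual embeddings are combined. One simple possibility, assumed here purely for illustration, is to concatenate the per-method vectors of each node and pass the result to a classifier. The function name `concatenate_embeddings` and the toy vectors are hypothetical.

```python
def concatenate_embeddings(embeddings):
    """Concatenate per-method node vectors into one vector per node.

    embeddings: list of dicts mapping node -> list of floats, one dict per method.
    The methods may use different dimensions but must cover the same nodes.
    """
    nodes = set(embeddings[0])
    for emb in embeddings[1:]:
        if set(emb) != nodes:
            raise ValueError("all methods must embed the same node set")
    return {v: [x for emb in embeddings for x in emb[v]] for v in nodes}


# Two hypothetical methods with dimensions 2 and 1.
emb_a = {"u": [0.1, 0.2], "v": [0.3, 0.4]}
emb_b = {"u": [1.0], "v": [2.0]}
ensemble = concatenate_embeddings([emb_a, emb_b])
```

Under this assumption, the dimension of the ensemble vector is the sum of the dimensions chosen for each constituent method.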
Graph Representation Ensemble Learning
In this section, we define the notations and provide the graph ensemble problem statement. We then explain multiple variations of deep learning models capable of capturing temporal patterns in dynamic graphs. Finally, we design the loss functions and optimization approach.
Notations
We define a directed graph as G = (V, E), where V is the vertex set and E is the directed edge set. The adjacency matrix is denoted as . We define the embedding matrix from a method as . The embedding matrix can be used to reconstruct the distance between all pairwise nodes in the graph. We denote this as , in which .
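As an illustration of reconstructing pairwise distances from an embedding matrix, the sketch below computes Euclidean distances between all rows. The choice of Euclidean distance is an assumption, since the distance function is not specified here, and the names `pairwise_distances`, `Y`, and `D` are hypothetical.

```python
import math


def pairwise_distances(embedding):
    """Return the n x n matrix of Euclidean distances between embedding rows.

    embedding: list of n vectors, one per node (the rows of the embedding matrix).
    """
    n = len(embedding)
    return [[math.dist(embedding[i], embedding[j]) for j in range(n)]
            for i in range(n)]


# Three nodes embedded in two dimensions, purely for illustration.
Y = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = pairwise_distances(Y)  # D[0][1] is 5.0 and D[0][2] is 10.0
```

The resulting matrix is symmetric with a zero diagonal, so only the upper triangle needs to be stored for large graphs.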
Problem Statement
In this paper, we introduce the problem of ensemble learning on graph representation learning. We define it as follows: Given a set of embedding methods with corresponding embeddings for a graph as and errors on a graph task , a graph ensemble learning approach aims to learn an embedding with error such that .
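The symbols of this statement did not survive in the text above. One way to write it, using notation assumed here for illustration only (methods m_i, embeddings Y_i, errors \epsilon_i, task T), and reading "outperforms each constituent method" as a strictly lower error, is:

```latex
\text{Given methods } m_1, \dots, m_k \text{ with embeddings } Y_1, \dots, Y_k
\text{ and errors } \epsilon_1, \dots, \epsilon_k \text{ on a graph task } T,
\text{ learn } Y_{\mathrm{ens}} \text{ with error } \epsilon_{\mathrm{ens}}
\text{ such that } \epsilon_{\mathrm{ens}} < \min_{1 \le i \le k} \epsilon_i .
```

Requiring the ensemble error to fall below the minimum is equivalent to requiring it to fall below every individual error.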
Measuring Graph Embedding Diversity
Different graph embedding techniques vary in the types of properties of the graphs preserved by them and the model defined. Broadly, embedding techniques can be divided into: (i) structure preserving, and (ii) community preserving models, defined as follows:
Definition 1.
(Community Preserving Models) A community preserving model aims to embed nodes that are a short distance apart in the graph close to each other in the embedding space.