AGE
Active Graph Embedding
Graph embedding provides an efficient solution for graph analysis by converting the graph into a low-dimensional space which preserves the structure information. In contrast to graph-structured data, the i.i.d. node embeddings can be processed efficiently in terms of both time and space. Current semi-supervised graph embedding algorithms assume the labelled nodes are given, which may not always hold in the real world. Since manually labelling all training data is infeasible, how to select the subset of training data to label so as to maximize the graph analysis task performance is of great importance. This motivates our proposed active graph embedding (AGE) framework, in which we design a general active learning query strategy for any semi-supervised graph embedding algorithm. AGE selects the most informative nodes as the labelled training nodes based on the graphical information (i.e., node centrality) as well as the learnt node embeddings (i.e., node classification uncertainty and node embedding representativeness). The different query criteria are combined with time-sensitive parameters which shift the focus from graph-based query criteria to embedding-based criteria as the learning progresses. Experiments have been conducted on three public data sets, and the results verify the effectiveness of each component of our query strategy and the power of combining them using time-sensitive parameters. Our code is available online at: https://github.com/vwz/AGE.
Nowadays graphs (or networks) are becoming more and more popular in many areas, e.g., citation graphs in research, social graphs in social media networks, and so on. Directly analysing these graphs can be both time consuming and space inefficient. One fundamental and effective solution is graph embedding, which embeds a graph into a low-dimensional space that preserves the graph structure and other inherent information. With such node representations, graph analytic tasks such as node classification, node clustering and link prediction can be conducted efficiently in both time and space (Ou et al., 2016).
Graph embedding algorithms can be divided into two categories based on whether label information is involved in the training: unsupervised and semi-supervised. In this work, we focus on the latter. Due to the success of deep learning in different areas, the latest semi-supervised graph embedding algorithms (e.g., (Yang et al., 2016; Kipf and Welling, 2017)) are devoted to designing neural network models to embed the nodes. However, these methods assume the labelled training data is given, which may not always hold in the real world. Take Twitter as an example: for a Twitter network graph in which each node represents a user and a link represents the following relationship between two users, the node label can be any of various user attributes such as occupation, interest and so on. Manually labelling all users for training is infeasible. To embed such a Twitter graph, we need a certain amount of users with label information for training. Obviously, different sets of labelled training nodes will lead to different graph embedding performance. Given a labelling budget, how to select the labelled training nodes so as to maximize the final performance is thus of great importance. Active learning (AL) is proposed to solve exactly this kind of problem.
Given a labelling budget, our objective is to design an active graph embedding framework which optimizes the performance of semi-supervised graph embedding algorithms by actively selecting the labelled training nodes. There are two main challenges for active graph embedding. First, different from traditional AL algorithms, which are designed for independent and identically distributed data, active graph embedding should consider the graph structure when selecting the "informative" nodes to label. Second, there are two major components in an active graph embedding framework: the active learning component and the graph embedding component. How to combine these two processes so that they reinforce each other and maximize the performance is non-trivial.
In this paper, we propose an effective Active Graph Embedding (AGE) framework which tackles the above-mentioned challenges. Specifically, we consider two popular AL query criteria, uncertainty and representativeness, to select the most informative nodes to label. For uncertainty, an information entropy score is calculated. For representativeness, in addition to the information density score which is widely adopted in most AL algorithms, we also propose a graph centrality score which calculates the PageRank centrality of each node to evaluate its representativeness. All three informativeness scores are combined linearly. The active learning query is raised at the end of every epoch of the graph embedding training process. As the process progresses, the graph embedding generates more and more accurate node embeddings as more informative labelled nodes are provided for training. Meanwhile, with the more accurate node embeddings, the AL query strategy is able to find increasingly informative nodes, because both the uncertainty and information density scores are calculated from the embedding results. Moreover, considering that these two scores are based on node embeddings which may be inaccurate at the beginning of training, inspired by
(Zhang et al., 2017), we combine the three AL scores with time-sensitive parameters which give a higher weight to graph centrality at the beginning and shift the focus to the other two scores as the training process progresses. In this paper, we use GCN as the example graph embedding algorithm; our AGE framework can be directly applied to any other graph embedding algorithm. More details of the AGE framework and our proposed AL criteria are introduced in Section 4. The contributions of this paper are summarized as below:
To the best of our knowledge, we are the first to propose an active graph embedding framework which optimizes the graph embedding performance by actively selecting the labelled training nodes.
We define three node informativeness criteria, including uncertainty, information density and graph centrality, and extensively study their impact on active graph embedding performance.
We conduct comprehensive experiments on three public citation datasets. The results prove the superiority of our proposed AGE framework over other AL algorithms and a pipeline baseline.
The rest of this paper is organized as follows. We review the literature related to graph embedding and active learning in Section 2. Section 3 introduces the example graph embedding algorithm GCN and the problem to be solved in this paper. Our proposed active graph embedding algorithm is elaborated in Section 4, followed by the analysis of the experimental results in Section 5. Finally, we conclude the paper in Section 6.
In this paper, we focus on actively selecting labelled training instances so as to maximize the graph embedding performance with a limited labelling budget. In this section, we review the literature on two relevant topics: graph embedding and active learning.
Graph embedding aims to embed a graph into a low-dimensional space which preserves the graph structure information. The earlier studies (Tenenbaum et al., 2000; Belkin and Niyogi, 2002; Roweis and Saul, 2000) tend to first construct an affinity graph based on feature similarity and then solve for the eigenvectors of the affinity graph as the node embeddings. These methods usually suffer from high computational cost. Recently, some graph embedding studies (e.g., LINE (Tang et al., 2015), GraRep (Cao et al., 2015)) carefully designed objective functions to preserve the first-order, second-order and/or higher-order proximities. However, both LINE and GraRep are suboptimal as the embeddings are learnt separately for different-step neighbours. Motivated by the recent success of deep learning, some researchers have started to learn node embeddings using deep models. Some of them use truncated random walks (e.g., DeepWalk (Perozzi et al., 2014)) or biased random walks (e.g., Node2vec (Grover and Leskovec, 2016)) to sample paths from graphs, and then apply skip-gram on the sampled paths so as to preserve the second-order proximities. In contrast to those that adopt skip-gram from language modelling, SDNE (Wang et al., 2016) proposed a new deep model which jointly optimizes first-order and second-order proximity and addresses the high non-linearity challenge. We aim to actively select the labelled training data so as to optimize the learnt embeddings given a fixed labelling budget. None of the above methods are applicable to our setting as they are unsupervised. Graph-based semi-supervised learning usually defines the loss function as a weighted sum of the supervised loss over labelled instances and a graph Laplacian regularization term. The supervised loss is a standard loss function such as squared loss, log loss or hinge loss. Various regularization terms have been designed in the literature to incur a large penalty when connected nodes with large edge weights are predicted to have different labels (Talukdar and Crammer, 2009; Belkin et al., 2006), or different embeddings (Weston et al., 2012). Planetoid (Yang et al., 2016) proposes a feed-forward neural network framework and formulates the regularization term as the log loss of predicting the context using the node embedding. However, graph Laplacian regularization relies on the assumption that connected nodes in the graph are likely to share the same labels. This assumption may not always hold, as graph edges can encode other information in addition to node similarity. In observation of this, (Kipf and Welling, 2017) proposed GCN to encode the graph structure directly using a neural network model and train on a supervised loss function for all nodes, thus avoiding explicit graph-based regularization in the loss function. GCN has shown its superiority by outperforming other state-of-the-art algorithms on node classification. We adopt GCN as the example graph embedding framework in this work and introduce it in more detail in Sect. 3.1.

Table 1. Comparison of the major active learning query strategy categories.

| Categories | Subcategories | Main idea | Strong points | Weaknesses |
|---|---|---|---|---|
| Heterogeneity based | Uncertainty Sampling (Settles and Craven, 2008) | Label the most uncertain instances | Simple and fast approaches to identify the most unknown regions | May find the noisy and unrepresentative regions |
| | Query-by-Committee (Bilgic et al., 2010) | Label the instances that multiple classifiers disagree on most | | |
| | Expected Model Change (Zhang et al., 2017) | Label the instances which differ most from the current known model | | |
| Performance based | Expected Error Reduction (Guo and Greiner, 2007) | Minimize the label uncertainty of the remaining unlabelled instances | Directly optimize the model performance | Too expensive to compute for all unlabelled data |
| | Expected Variance Reduction (Schein and Ungar, 2007) | The variance typically reduces when the error of the model reduces | Efficiently express model variances in closed form | Only applicable to limited models, e.g., neural networks, mixture models |
| Representativeness based (Li and Guo, 2013) | | Label the instances that represent the underlying distribution of training instances | Avoid outliers via the representativeness component | Not informative enough on its own, usually combined with other criteria |
In many domains, labelled data is often expensive to obtain. Active learning (AL) is thus proposed to train a classifier that accurately predicts the labels of new instances while requesting as few training labels as possible (Aggarwal et al., 2014). An AL framework usually consists of two primary components: a query system which picks an instance from the training data to query its label, and an oracle which labels the queried instance. Researchers have proposed various algorithms to optimize the training performance given a fixed labelling budget. Based on the query strategy, the majority of work can be divided into three categories (Aggarwal et al., 2014): heterogeneity based, performance based and representativeness based. Detailed comparisons between them are listed in Table 1. Generally speaking, different implementations of the three major AL categories can be proposed for different classification algorithms; there does not exist an "optimal" AL solution for all classification tasks. Our active graph embedding is distinct from most AL algorithms in two ways: the training instances have graph structure rather than being i.i.d., and the representations of the training nodes are learnt during the classifier training process instead of being given as fixed input. On one hand, several attempts have been made for AL on graphs (Bilgic et al., 2010; Gu et al., 2013), in which the graph structure is utilized to train the classifiers and/or calculate the AL query scores when selecting the nodes to label. Compared with them, we utilize not only the graph structure but also the embeddings learnt during the training process to select the informative nodes to label. Moreover, they do not learn node embeddings but simply perform classification on graphs. On the other hand, only limited work (i.e., (Zhang et al., 2017)) has considered AL strategies for instance representation learning algorithms.
In (Zhang et al., 2017), the authors proposed to select the examples that are most likely to affect the representation-level parameters (embeddings) for text classification with embeddings. Their algorithm is specifically designed for classification models which have embeddings as model parameters, and is thus not applicable to more general graph embedding methods such as the GCN in our framework.
The notations used in this paper are summarized in Table 2. Next we introduce the semi-supervised graph embedding algorithm adopted in our framework, i.e., GCN (Kipf and Welling, 2017).
Table 2. Notations.

| Notations | Descriptions |
|---|---|
| $G = (V, E)$ | Graph with node set $V$ and edge set $E$ |
| $A$, $D$ | The adjacency matrix and degree matrix of $G$ |
| $X$ | Node feature matrix; each row corresponds to the feature vector of a node in $G$ |
| $N$, $C$ | Number of nodes and number of classes in $G$ |
| $F$ | Feature dimensionality of a node in $G$ |
| $\mathcal{L}$, $\mathcal{U}$ | The sets of labelled and unlabelled nodes |
| $|\mathcal{L}|$, $|\mathcal{U}|$ | The numbers of labelled and unlabelled nodes |
| $Y_{vc}$ | The indicator of node $v$ having label $c$ |
| $Z_{vc}$ | The probability of node $v$ having label $c$ predicted by GCN |
| $H^{(l)}$, $W^{(l)}$ | The matrix of activations and the trainable weight matrix in the $l$-th layer of GCN |
Given a graph $G = (V, E)$ with node set $V$, edge set $E$, an adjacency matrix $A$ (binary or weighted), a degree matrix $D$, a node feature matrix $X$ (i.e., an $F$-dimensional feature vector for each of the $N$ nodes), and a label matrix $Y$ for labelled nodes (i.e., $Y_{vc} = 1$ indicates node $v$ has label $c$), (Kipf and Welling, 2017) proposes a multi-layer Graph Convolutional Network (GCN) for semi-supervised node classification on $G$. Unlike traditional graph-based semi-supervised learning, which assumes that connected nodes are likely to share the same labels, GCN avoids such explicit graph-based regularization in the loss function by encoding the graph structure directly in its neural network model.
Specifically, the layer-wise propagation rule of GCN is defined as:

$$H^{(l+1)} = \sigma\big(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\big) \qquad (1)$$

where $\tilde{A} = A + I_N$ is the adjacency matrix of $G$ with added self-connections, $I_N$ is the identity matrix and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. The activation function $\sigma(\cdot)$ is defined as $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$ for all layers except the output layer. $W^{(l)}$ and $H^{(l)}$ denote the trainable weight matrix and the matrix of activations in the $l$-th layer respectively, with $H^{(0)} = X$. To train a GCN model $f(X, A)$ with $K$ layers, $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is first calculated in a preprocessing step. The $K$-th layer (the output layer) takes the following form:
$$Z = \mathrm{softmax}\big(\hat{A} H^{(K-1)} W^{(K-1)}\big) \qquad (2)$$

where $H^{(K-1)}$ is derived from Eq. 1, and $W^{(K-1)}$ is a hidden-to-output weight matrix. The activation function in the last layer is $\mathrm{softmax}(x_c) = \frac{\exp(x_c)}{\sum_{c'} \exp(x_{c'})}$, and it is applied row-wise.
Finally the supervised loss function is defined as the cross-entropy error over all labelled nodes:

$$\mathcal{L}_{sup} = -\sum_{v \in \mathcal{L}} \sum_{c=1}^{C} Y_{vc} \ln Z_{vc} \qquad (3)$$

where $\mathcal{L}$ is the set of labelled nodes and $Z$ is derived from Eq. 2.
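The forward pass and loss above can be sketched in NumPy (a minimal illustration under our own function names, not the authors' implementation):

```python
import numpy as np

def normalize_adj(A):
    """Compute A_hat = D~^{-1/2} (A + I) D~^{-1/2} as in the GCN preprocessing step."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def softmax(Z):
    """Row-wise softmax (numerically stabilised)."""
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A_hat, X, W0, W1):
    """Two-layer GCN: ReLU hidden layer (Eq. 1), softmax output layer (Eq. 2)."""
    H1 = np.maximum(A_hat @ X @ W0, 0.0)
    return softmax(A_hat @ H1 @ W1)

def cross_entropy(Z, Y, labelled):
    """Eq. 3: cross-entropy summed over the labelled nodes only."""
    return -np.sum(Y[labelled] * np.log(Z[labelled] + 1e-12))
```

Each output row of `gcn_forward` is a probability distribution over the $C$ classes, which is exactly the $Z_{vc}$ consumed by the uncertainty score later on.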
The input of the active graph embedding problem includes a graph $G = (V, E)$, along with its adjacency matrix $A$, its degree matrix $D$, its node feature matrix $X$, an oracle to label a queried node, and a labelling budget $B$. Among the $N$ nodes, a small number are initially labelled. Denote the set of labelled nodes as $\mathcal{L}$ and the set of unlabelled nodes as $\mathcal{U}$. The objective of this work is to optimize the performance of a semi-supervised graph embedding algorithm (we use GCN, introduced in Sect. 3.1, as the example in this work) by designing an active learning query strategy that selects $B$ nodes from $\mathcal{U}$ for the oracle to label and add to $\mathcal{L}$ for graph embedding training.
Given a fixed labelling budget, we propose an Active Graph Embedding (AGE) method to actively select the labelled training instances so as to optimize graph embedding performance. Next, we introduce the details of our proposed AL strategy for graph embedding methods. Note that our proposed AL strategy can be applied to any semi-supervised graph embedding algorithm. In this work, we adopt the state-of-the-art algorithm GCN as the example graph embedding method for illustration.
The framework of our proposed Active Graph Embedding (AGE) method is illustrated in Fig. 1. AGE takes a graph $G = (V, E)$ and a small set of initially labelled nodes as input. GCN (Kipf and Welling, 2017) is then applied to the training data for graph embedding and node classification. At the end of every epoch of GCN, AGE checks whether the labelling budget has been reached. If so, another training epoch of GCN is run directly. Otherwise, our proposed AL query strategy picks the best candidate (or a few, in batch mode) from all unlabelled nodes ($\mathcal{U}$), asks the oracle to label it, and moves it to the labelled node set ($\mathcal{L}$). Then another epoch of GCN is trained on the updated training data with the newly added labelled node. This procedure is repeated until GCN converges.
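The interleaving of training epochs and label queries can be summarised in sketch form (the interfaces here are hypothetical stand-ins: `train_one_epoch` for a GCN epoch, `oracle.label` for the human annotator, and `score_fn` for the query-scoring function):

```python
def age_training_loop(model, labelled, unlabelled, budget, n_epochs, oracle, score_fn):
    """Sketch of the AGE loop: one AL query per training epoch until the budget is spent."""
    for epoch in range(n_epochs):
        model.train_one_epoch(labelled)               # one GCN epoch on current labels
        if budget > 0 and unlabelled:
            # score every unlabelled node with the current model state
            scores = {v: score_fn(v, model, epoch) for v in unlabelled}
            best = max(scores, key=scores.get)        # most informative candidate
            oracle.label(best)                        # query the oracle for its label
            unlabelled.remove(best)
            labelled.add(best)
            budget -= 1
    return model
```

Because the query fires inside the training loop, each newly labelled node immediately influences the next epoch's embeddings, which is what lets the two processes reinforce each other.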
Here a question arises: what is the best candidate node to label at each iteration? We follow most AL studies and select two widely adopted AL query criteria, i.e., uncertainty (e.g., (Settles and Craven, 2008)) and representativeness (e.g., (Li and Guo, 2013)), in our proposed AL query strategy. Next, we introduce how we define the uncertainty and representativeness in this work, as well as how we combine these two criteria in one objective function.
As one of the most commonly used AL query strategies, uncertainty sampling queries the labels of the nodes about which the current model is least certain with respect to the classification prediction. In this paper, we use a general uncertainty measure, information entropy, as our informativeness metric. The information entropy of a candidate node $v$ is calculated as:

$$\phi_{entropy}(v) = -\sum_{c=1}^{C} P(y_v = c \mid X, A) \log P(y_v = c \mid X, A) \qquad (4)$$

where $P(y_v = c \mid X, A)$ is the probability of node $v$ belonging to class $c$ as predicted by GCN, i.e., $Z_{vc}$ in Eq. 3. The larger $\phi_{entropy}(v)$ is, the more uncertain the current model is about $v$.
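Eq. 4 is a one-liner given the GCN's class-probability matrix (a minimal sketch; the function name is ours):

```python
import numpy as np

def entropy_score(P, eps=1e-12):
    """Information entropy per node (Eq. 4); P is the (N, C) class-probability matrix."""
    return -np.sum(P * np.log(P + eps), axis=1)
```

A node with a near-uniform predicted distribution gets a high score, while a confidently classified node gets a score near zero.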
One drawback of uncertainty-sampling-based AL query strategies is that they may find noisy and unrepresentative regions as they try to explore the most unknown regions of the data (Aggarwal et al., 2014). Consequently, a representativeness-based AL criterion is often combined with uncertainty sampling so as to find the most informative node to label. We consider two representativeness measurements: the information density $\phi_{density}$ and the graph centrality $\phi_{centrality}$. The first aims to find nodes that are "representative" of the underlying data distribution in the embedded space, while the second measures nodes by their centrality in the graph. Next, we introduce these two methods one by one.
To find the nodes that lie in dense regions of the embedded space, we calculate a density score for each candidate node by first applying K-means to the embeddings of all unlabelled nodes and then computing the Euclidean distance between each node and its cluster centre. The density score of node $v$ is obtained by converting the distance value into a similarity score:

$$\phi_{density}(v) = \frac{1}{1 + \mathrm{ED}(\mathbf{e}_v, \mathbf{cc}_v)} \qquad (5)$$

where $\mathrm{ED}$ is the Euclidean distance, $\mathbf{e}_v$ is the embedding of node $v$ and $\mathbf{cc}_v$ is the centre of the cluster that $v$ belongs to. The larger $\phi_{density}(v)$ is, the more representative $v$ is in the embedding space.
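A self-contained sketch of the density score, with a tiny K-means written inline rather than imported from a library (the helper names and the exact K-means details are our own):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's K-means: random initial centres, then assign/update iterations."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = X[assign == c].mean(axis=0)
    return centers, assign

def density_score(E, k=2):
    """Eq. 5: 1 / (1 + distance to own cluster centre), computed on embeddings E."""
    centers, assign = kmeans(E, k)
    d = np.linalg.norm(E - centers[assign], axis=1)
    return 1.0 / (1.0 + d)
```

Nodes near a cluster centre of the embedding space get scores close to 1; outliers far from every centre get scores close to 0.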
One characteristic that distinguishes our AGE from most other AL algorithms is that the input instances are not i.i.d., but connected by links. The graph structure is therefore utilized to calculate another node representativeness score based on graph centrality. Graph centrality was first proposed to reflect a node's sociological importance in social network analysis (Newman, 2010). Various metrics have been proposed to measure the centrality of a node, from the classic methods (e.g., degree centrality, closeness centrality (Freeman, 1978)) to the more recent eigenvector-based metrics (e.g., PageRank centrality (Rodriguez, 2008)). In this work, we adopt PageRank centrality as $\phi_{centrality}$ because it outperforms the others, as shown in Sect. 5.2. The PageRank centrality of a candidate node $v$ is calculated as:

$$\phi_{centrality}(v) = \rho \sum_{u:(u,v) \in E} \frac{\phi_{centrality}(u)}{\mathrm{deg}(u)} + \frac{1-\rho}{N} \qquad (6)$$

where $\rho$ is the damping parameter.
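The fixed point of Eq. 6 can be computed by power iteration on the normalised adjacency matrix; this sketch does so directly in NumPy rather than calling a graph library:

```python
import numpy as np

def pagerank_centrality(A, damping=0.85, n_iter=100):
    """PageRank by power iteration; A is the (possibly weighted) adjacency matrix."""
    n = A.shape[0]
    out_deg = A.sum(axis=0).astype(float)
    out_deg[out_deg == 0] = 1.0        # guard against dangling nodes
    M = A / out_deg                    # column-stochastic transition matrix
    r = np.full(n, 1.0 / n)            # uniform initial scores
    for _ in range(n_iter):
        r = damping * M @ r + (1 - damping) / n   # one application of Eq. 6
    return r / r.sum()
```

On a star graph the hub accumulates the mass of all its leaves, so it receives the highest centrality score, matching the intuition that central nodes are the most representative early labelling targets.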
The scores derived from the different criteria are on incomparable scales, so we convert them into percentiles as in (Zhang et al., 2017). Denote $\mathcal{P}_{\phi}(v)$ as the percentile of nodes in $\mathcal{U}$ which have smaller scores than node $v$ under metric $\phi$. Then the objective function of our proposed AGE for selecting the node to label is defined as:

$$\phi_{AGE}(v) = \alpha \, \mathcal{P}_{entropy}(v) + \beta \, \mathcal{P}_{density}(v) + \gamma \, \mathcal{P}_{centrality}(v) \qquad (7)$$

where $\alpha + \beta + \gamma = 1$. Our objective is to select the node $v^* \in \mathcal{U}$ which maximizes the above objective function (Eq. 7).
Instead of predefining the parameters $\alpha$, $\beta$ and $\gamma$, we follow the settings in (Zhang et al., 2017) and draw the parameters as time-sensitive random variables. More specifically, since $\phi_{entropy}$ and $\phi_{density}$ are calculated based on the GCN outputs (i.e., the node classification predictions $Z$ and the node embeddings), the parameters of these two metrics (i.e., $\alpha$ and $\beta$) should be smaller at the beginning of the AL iterations, because the outputs may not be very accurate in the first few epochs. In contrast, $\gamma$ (i.e., the parameter for $\phi_{centrality}$, which relies purely on the graph structure) should be larger. As learning progresses and GCN runs more epochs with more labelled training data, the model can pay more attention to $\phi_{entropy}$ and $\phi_{density}$. The time sensitivity is realized by drawing the parameters from beta distributions whose expectations change over time: $\alpha$ and $\beta$ are drawn with larger and larger expectation as the number of AL iterations increases, while $\gamma$ is drawn from a beta distribution whose expectation decreases as the AL iterations increase. Finally, the $\alpha$, $\beta$ and $\gamma$ drawn at each timestamp are normalized to sum up to 1.

We design experiments to: 1) verify the model design of our proposed AGE framework; and 2) compare AGE with other AL baselines. For the first objective, we verify the design of AGE from two perspectives: the adoption of PageRank centrality as the graph centrality metric and the time-sensitive parameters for combining the different AL criteria. For the second objective, we compare AGE with different active learning baselines.
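The percentile conversion and the time-sensitive combination can be sketched as follows. The exact Beta parameterisation below is an illustrative choice of ours (any schedule whose expectations drift the right way would fit the description), not the paper's exact setting:

```python
import numpy as np

def percentile(scores):
    """Rank-normalise scores into [0, 1]: the fraction of candidates ranked below."""
    ranks = np.argsort(np.argsort(scores))
    return ranks / max(len(scores) - 1, 1)

def combined_score(entropy, density, centrality, epoch, n_epochs, seed=0):
    """Eq. 7 with time-sensitive weights: early epochs favour graph centrality,
    later epochs favour the two embedding-based criteria."""
    rng = np.random.default_rng(seed)
    t = (epoch + 1) / n_epochs                  # training progress in (0, 1]
    alpha = rng.beta(1 + 9 * t, 10 - 9 * t)     # entropy weight: expectation grows with t
    beta_ = rng.beta(1 + 9 * t, 10 - 9 * t)     # density weight: expectation grows with t
    gamma = rng.beta(10 - 9 * t, 1 + 9 * t)     # centrality weight: expectation shrinks with t
    s = alpha + beta_ + gamma                   # normalize so the weights sum to 1
    alpha, beta_, gamma = alpha / s, beta_ / s, gamma / s
    return (alpha * percentile(entropy)
            + beta_ * percentile(density)
            + gamma * percentile(centrality))
```

Because every criterion is reduced to a percentile before mixing, scores with wildly different ranges (entropy in nats, density in similarity units, PageRank mass) contribute on equal footing.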
We first introduce our experimental setup, followed by the analysis of the experimental results.
We follow the experimental setup of the state-of-the-art semi-supervised graph embedding methods (Kipf and Welling, 2017; Yang et al., 2016).
All experiments are conducted on three public citation network datasets: Citeseer, Cora and Pubmed. Each dataset contains a list of documents, each represented by a sparse bag-of-words feature vector. The documents are connected by citation links, which are treated as undirected and unweighted edges. The statistics of these datasets are summarized in Table 3.
Table 3. Dataset statistics.

| Dataset | Nodes | Edges | Classes | Feature Dim. | Label Rate |
|---|---|---|---|---|---|
| Citeseer | 3,327 | 4,732 | 6 | 3,703 | 0.036 |
| Cora | 2,708 | 5,429 | 7 | 1,433 | 0.052 |
| Pubmed | 19,717 | 44,338 | 3 | 500 | 0.003 |
For each dataset, we use 500 nodes for validation, 1,000 nodes for testing and the remaining nodes for training. The test set is the same as in (Kipf and Welling, 2017; Yang et al., 2016). We randomly sample 500 nodes from the non-test nodes and fix them as the validation set across all experiments, to ensure that the performance variation in the experiments is due to the different active learning query strategies. We repeat this process ten times and run all experiments on each of the ten validation sets separately. We follow the label rate (i.e., the number of labelled nodes used for GCN training divided by the total number of nodes) used in the existing work (Kipf and Welling, 2017; Yang et al., 2016), which corresponds to $20C$ labelled nodes in total, where $C$ is the number of classes in each dataset. The label budget for the active learning methods is then $B = 20C - |\mathcal{L}_0|$, where $|\mathcal{L}_0|$ is the number of initially labelled nodes.
The active learning framework takes a few labelled nodes at the very beginning. Following the settings in the existing work (e.g., (Kipf and Welling, 2017; Yang et al., 2016)), which consider the label balance across classes, we randomly sample the same number of nodes for each class from the non-test and non-validation nodes as the initially labelled nodes. We repeat this process 200 times and report the average results for all experiments. In both (Kipf and Welling, 2017) and (Yang et al., 2016), the number of labelled nodes per class is set to 20. In our experiment, we initially label 4 nodes for each class. For fair comparison, the budget of the AL strategy is set to $B = 20C - 4C = 16C$, so that the same total number of labelled nodes is used to train GCN for all methods. Note that each algorithm is tested 2,000 times (10 validation sets × 200 initial labelled sets).
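The budget arithmetic above reduces to a few lines (the function is a hypothetical helper for illustration):

```python
def label_budget(num_classes, per_class_total=20, per_class_initial=4):
    """Match the total label rate of prior work (20 labels per class) while
    labelling only 4 per class up front; the rest is the active-learning budget B."""
    initial = per_class_initial * num_classes          # |L_0| = 4C
    budget = per_class_total * num_classes - initial   # B = 20C - 4C = 16C
    return initial, budget
```

For example, Citeseer with 6 classes starts from 24 labelled nodes with a budget of 96 queries; Cora with 7 classes starts from 28 with a budget of 112.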
One common task for evaluating graph embedding performance is node classification. In this work, we adopt Macro-F1 and Micro-F1 (Perozzi et al., 2014), two classic classification evaluation measures, to assess the node classification performance.
All experiments are conducted on Linux computers equipped with Intel(R) 3.50GHz CPUs and 16GB RAMs.
In this section, we first explain the reason for adopting PageRank centrality (Eq. 6) in our AGE framework. Five different graph centrality metrics are compared: degree centrality, closeness centrality, betweenness centrality, PageRank centrality and harmonic centrality. All five metrics are implemented using NetworkX (https://networkx.github.io/). As shown in Fig. 2, PageRank centrality consistently outperforms the other centrality metrics on both Citeseer and Cora. Although betweenness centrality achieves slightly better performance on Pubmed, considering that PageRank outperforms betweenness on the other two datasets by 7.4% (Macro-F1) and 5.1% (Micro-F1) on Citeseer and 5.7% (Macro-F1) and 2.8% (Micro-F1) on Cora, we adopt PageRank as the graph centrality metric in our work.
Now we verify the superiority of using time-sensitive parameters over predefined fixed parameters. For fair comparison, we tune the value of one parameter with the other two set to split the remaining weight equally. The tuned value ranges over $[0.1, 0.9]$ with step $0.1$.
Table 4. Node classification performance with fixed versus time-sensitive parameters.

| Fixed parameter value | Citeseer Macro-F1 | Citeseer Micro-F1 | Cora Macro-F1 | Cora Micro-F1 | Pubmed Macro-F1 | Pubmed Micro-F1 |
|---|---|---|---|---|---|---|
| 0.1 | 0.6576 | 0.7099 | 0.8040 | 0.8170 | 0.7717 | 0.7792 |
| 0.2 | 0.6559 | 0.7050 | 0.8001 | 0.8142 | 0.7721 | 0.7790 |
| 0.3 | 0.6601 | 0.7067 | 0.8020 | 0.8163 | 0.7751 | 0.7826 |
| 0.4 | 0.6583 | 0.7037 | 0.8061 | 0.8190 | 0.7704 | 0.7779 |
| 0.5 | 0.6520 | 0.6975 | 0.8078 | 0.8201 | 0.7754 | 0.7819 |
| 0.6 | 0.6503 | 0.6955 | 0.8096 | 0.8216 | 0.7747 | 0.7817 |
| 0.7 | 0.6396 | 0.6847 | 0.8111 | 0.8230 | 0.7798 | 0.7864 |
| 0.8 | 0.6399 | 0.6848 | 0.8080 | 0.8210 | 0.7806 | 0.7870 |
| 0.9 | 0.6416 | 0.6918 | 0.7993 | 0.8126 | 0.7824 | 0.7877 |
| time-sensitive | **0.6685** | **0.7206** | **0.8123** | **0.8245** | **0.7933** | **0.8012** |
The results are shown in Table 4, with the best results highlighted in bold. Compared with deterministically setting the parameters for combining the different active learning criteria, the time-sensitive parameters provide a more flexible way to balance the various criteria at different times, and thus achieve better performance. The best results with a predefined parameter are achieved at 0.3 for Citeseer, 0.7 for Cora and 0.9 for Pubmed. As shown in Table 4, compared with the best fixed-parameter setting, the time-sensitive parameters relatively improve the node classification performance by 1.3% (Macro-F1) and 2% (Micro-F1) on Citeseer and by 1.4% (Macro-F1) and 1.7% (Micro-F1) on Pubmed.
In this section, we compare AGE with the following AL query strategies. For all strategies except "Random" and "Pipeline", we randomly label a small balanced set of initial nodes, as introduced in Sect. 5.1.3. Then, during the training of GCN, we actively select nodes to label based on the different AL metrics, as shown in Fig. 1.
Random: Randomly label nodes to train GCN.
Entropy based: Actively select nodes to label by Eq. 4.
Density based: Actively select nodes to label by Eq. 5.
Centrality based: Actively select nodes to label by Eq. 6.
Pipeline: Randomly label nodes to train GCN. After GCN converges, actively select nodes to label by Eq. 7 with time-sensitive parameters $\alpha$, $\beta$ and $\gamma$. Finally, train GCN again with all labelled nodes. The pipeline approach is designed to verify that the AL process and the graph embedding process can reinforce each other during the training.
AGE-fp: Actively select nodes to label by Eq. 7 with fixed (tuned) parameters $\alpha$, $\beta$ and $\gamma$.
AGE: Actively select nodes to label by Eq. 7 with time-sensitive parameters $\alpha$, $\beta$ and $\gamma$.
Furthermore, we also compare with the semi-supervised graph embedding baseline (i.e., GCN (Kipf and Welling, 2017)) to show the effectiveness of our proposed active learning query strategy. For GCN, we follow the settings in their work by randomly sampling 20 nodes per class as the labelled training data.
As shown in Fig. 3, by combining the different node informativeness metrics (i.e., information entropy, density and graph centrality) and using the time-sensitive parameters, our proposed AGE outperforms all the other baselines in terms of node classification performance. Specifically, compared to the random baseline, AGE improves the node classification accuracy by 7.6% (Macro-F1) and 3.2% (Micro-F1) on Citeseer, 5.6% (Macro-F1) and 3.5% (Micro-F1) on Cora, and 4.9% (Macro-F1) and 4.0% (Micro-F1) on Pubmed. Each AL criterion alone can only improve the performance to a certain extent. Among the three AL criteria, information density is the most unstable, and even brings a negative effect on the Cora dataset. This explains why, in the literature, representativeness-based AL criteria (e.g., information density) are usually combined with heterogeneity-based criteria (e.g., information entropy). As illustrated in Fig. 3, compared with the traditional "entropy-density based" algorithm, involving the graph centrality score improves the performance by 2% in terms of Macro-F1 and 0.9% in terms of Micro-F1 on average. And as analyzed in Section 5.3, by adding the time-sensitive parameters, the performance is further improved by 0.9% (Macro-F1) and 1.3% (Micro-F1) on average. The pipeline approach does not provide satisfying performance. The reason may be that, with the limited number of initially labelled nodes, GCN cannot embed the graph correctly; the nodes then selected based on those node embeddings may not be informative enough to train a good GCN. In our AGE framework, we select the nodes to label during the training of GCN, so the two processes, active learning and graph embedding, reinforce each other during the training phase.
Compared with the semisupervised graph embedding baseline GCN, AGE achieves 0.3% (MacroF1) and 2.2% (MicroF1) improvements on Citeseer, 3.5% (MacroF1) and 3.7% (MicroF1) improvements on Cora, 3.4% (MacroF1) and 3.7% (MicroF1) improvements on Pubmed.
In this paper, we proposed a novel active learning framework for graph embedding named Active Graph Embedding (AGE). Unlike traditional active learning algorithms, AGE processes data with structural information and learnt representations (node embeddings), and it is carefully designed to address the challenges brought by these two characteristics. First, to exploit the graphical information, a graph centrality based measurement is considered in addition to the popular information entropy based and information density based query criteria. Second, the active learning and graph embedding processes are run jointly by posing the label query at the end of every epoch of the graph embedding training process. Moreover, time-sensitive weights are put on the three active learning query criteria, focusing on graph centrality at the beginning and shifting the focus to the other two embedding-based criteria as the training process progresses (i.e., as more accurate embeddings are learnt). We evaluated AGE on three public citation network datasets and verified the effectiveness of our framework design, including the three query criteria and the time-sensitive parameters, on the node classification task. We further compared our proposed AGE with a pipeline baseline to show that active learning and graph embedding reinforce each other during the training process.
Andrew I. Schein and Lyle H. Ungar. 2007. Active learning for logistic regression: an evaluation. Machine Learning 68, 3 (2007), 235–265.