
The Vulnerabilities of Graph Convolutional Networks: Stronger Attacks and Defensive Techniques

by Huijun Wu, et al.

Graph deep learning models, such as graph convolutional networks (GCN), achieve remarkable performance for tasks on graph data. Similar to other types of deep models, graph deep learning models often suffer from adversarial attacks. However, compared with non-graph data, the discrete features, graph connections and different definitions of imperceptible perturbations bring unique challenges and opportunities for adversarial attacks and defences on graph data. In this paper, we propose both attack and defence techniques. For attack, we show that the discrete feature problem can be resolved by introducing integrated gradients, which accurately reflect the effect of perturbing certain features or edges while still benefiting from parallel computation. For defence, we propose to partially learn the adjacency matrix to integrate the information of distant nodes, so that the prediction of a certain target is supported by more global graph information rather than just a few neighbour nodes. This makes attacks harder, since one needs to perturb more features/edges for an attack to succeed. Our experiments on a number of datasets show the effectiveness of the proposed methods.



1 Introduction

Graphs are commonly used to model many real-world relationships, such as social networks [Newman et al.2002], citation networks and transactions [Ron and Shamir2013]. Recent advances [Kipf and Welling2016, Veličković et al.2017, Cao et al.2016, Henaff et al.2015] in deep learning have expanded its applications to graph data. One common task on graph data is node classification: given a graph and the labels of a portion of its nodes, the goal is to predict the labels of the unlabelled nodes. This can be used to classify unknown roles in the graph, for example the topics of papers in a citation network or the customer types in a recommender system.

Compared with the classic methods [Bhagat et al.2011][Xu et al.2013], deep learning has started to push forward the performance of node classification tasks. Graph convolutional networks [Bruna et al.2013, Edwards and Xie2016] and their recent variants [Kipf and Welling2016] perform convolution operations in the graph domain by aggregating and combining the information of neighbour nodes. In these works, both the node features and the graph structure (i.e., edges) are considered for classifying nodes.

Deep learning methods are often criticized for their lack of robustness [Goodfellow et al.2014]. In other words, it is not difficult to craft adversarial examples that fool deep neural networks into giving incorrect predictions by perturbing only a tiny portion of an example. Graph convolutional networks are no exception. These vulnerabilities under adversarial attacks are major obstacles to using deep learning applications in safety-critical scenarios. In graph neural networks, one node can be a user of a social network or an e-commerce website. A malicious user may manipulate their profile or connect to targeted users on purpose to mislead the analytics system. Similarly, adding fake comments to specific products can fool the recommender system of a website.

The key challenge in applying existing adversarial attack techniques designed for non-graph data to graph convolutional networks is the discrete input problem. Specifically, the features of the graph nodes are often discrete. The edges, especially those in unweighted graphs, are also discrete. To address this, some recent studies have proposed greedy methods [Wang et al.2018][Zügner et al.2018] to attack graph-based deep learning systems. A greedy method perturbs either the features or the graph structure iteratively, while preserving the graph structure and feature statistics during the attack. In this paper, we show that despite the discrete input issue, the gradients can still be approximated accurately by integrated gradients. Integrated gradients approximate Shapley values [Hart1989][Lundberg and Lee2016] by integrating partial gradients with respect to the input features from a reference input to the actual input. Integrated gradients greatly improve the efficiency of node and edge selection in comparison to iterative methods.

Compared with explorations in attacks, the defence against adversarial examples in graph models is not well-studied. In this paper, we show that one key reason for the vulnerabilities of graph models, such as GCN, is that these models rely heavily on nearest-neighbour information when making predictions on target nodes. As a result, these models are not robust enough and the attacker can attack the target node easily with few perturbations. Apart from the robustness concern, ignoring the information of distant nodes/edges beyond the nearest neighbours may also result in sub-optimal accuracy [Li et al.2018]. In this paper, we show that we can learn an adjacency matrix in which the influence of the distant nodes is optimized for both accuracy and robustness.

Our results on a number of real-world datasets show the effectiveness and efficiency of the proposed attack and defence.

2 Preliminaries

2.1 Graph Convolutional Network

Given an attributed graph $G = (A, X)$, $A \in \{0, 1\}^{N \times N}$ is the adjacency matrix and $X \in \{0, 1\}^{N \times D}$ represents the $D$-dimensional binary node features. We assume the indices for nodes and features are $V = \{1, 2, \ldots, N\}$ and $F = \{1, 2, \ldots, D\}$, respectively. We then consider the task of semi-supervised node classification, where a subset of nodes $V_L \subseteq V$ is labelled with labels from classes $C = \{1, 2, \ldots, K\}$. The target of the task is to map each node in the graph to a class label. This is often called transductive learning, given that the test nodes are already known at training time.

In this work, we study the Graph Convolutional Network (GCN) [Kipf and Welling2016], a well-established method for semi-supervised node classification. For GCN, initially, $H^{(0)} = X$. The GCN model then applies the following rule to aggregate the neighbouring features:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{1}$$

where $\tilde{A} = A + I_N$ is the adjacency matrix of the graph with self connections added, $\tilde{D}$ is a diagonal matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, and $\sigma$ is the activation function that introduces non-linearity. Each application of the above equation corresponds to one graph convolution layer. A fully connected layer with softmax loss is usually used after $L$ graph convolution layers for the classification. A two-layer GCN is commonly used for semi-supervised node classification tasks [Kipf and Welling2016]. The model can, therefore, be described as:

$$Z = f(X, A) = \mathrm{softmax}\left(\hat{A} \, \sigma\left(\hat{A} X W^{(0)}\right) W^{(1)}\right) \tag{2}$$

where $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is essentially the symmetrically normalized adjacency matrix. $W^{(0)}$ and $W^{(1)}$ are the input-to-hidden and hidden-to-output weights, respectively.
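As a rough illustration of Eq. (2), the following sketch evaluates a two-layer GCN on a tiny graph using only the Python standard library. The helper names (`normalize_adjacency`, `gcn_forward`), the choice of ReLU as the activation, and all numeric values are assumptions made for this example; the weights are arbitrary rather than trained.

```python
import math

def normalize_adjacency(adj):
    """Return D~^-1/2 (A + I) D~^-1/2 for a dense 0/1 adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # degrees including the self loop
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(p, q):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def gcn_forward(adj, x, w0, w1):
    """Z = softmax(A_hat * relu(A_hat * X * W0) * W1), one row per node."""
    a_hat = normalize_adjacency(adj)
    hidden = relu(matmul(a_hat, matmul(x, w0)))
    return softmax_rows(matmul(a_hat, matmul(hidden, w1)))

# Toy path graph 0 - 1 - 2 with two binary features per node (made-up data).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1, 0], [0, 1], [1, 1]]
w0 = [[0.5, -0.2], [0.1, 0.3]]   # arbitrary, untrained weights
w1 = [[1.0, -1.0], [-0.5, 0.5]]
z = gcn_forward(adj, x, w0, w1)
```

Each row of `z` is a class distribution for one node; because every node only mixes in its normalized neighbours twice, the output for a node depends on its two-hop neighbourhood only.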

2.2 Gradients Based Adversarial Attack for DNN Models

Gradients are commonly exploited to attack deep learning models [Yuan et al.2019]. One can use either the gradients of the loss function or the gradients of the model output w.r.t. the input data to mount an attack. Two examples are the Fast Gradient Sign Method (FGSM) attack and the Jacobian-based Saliency Map Approach (JSMA) attack. FGSM [Goodfellow et al.2014] generates adversarial examples by performing a gradient update along the direction of the sign of the gradients of the loss function w.r.t. each pixel for image data. The perturbation can be expressed as:

$$\eta = \epsilon \, \mathrm{sign}\left(\nabla_x J_\theta(x, l)\right) \tag{3}$$

where $J_\theta$ is the loss of the model with parameters $\theta$, $l$ is the label of $x$, and $\epsilon$ is the magnitude of the perturbation. The generated example is $x' = x + \eta$.
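To make the FGSM update concrete, here is a minimal sketch on a logistic-regression model, for which the gradient of the cross-entropy loss w.r.t. the input has a closed form. The model, the function name `fgsm_perturbation`, and the numbers are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def fgsm_perturbation(x, w, b, label, epsilon):
    """eta = epsilon * sign(dJ/dx) for the cross-entropy loss J of a
    logistic-regression model p = sigmoid(w . x + b)."""
    logit = sum(wi * xi for wi, xi in zip(w, x)) + b
    prob = 1.0 / (1.0 + math.exp(-logit))
    grad = [(prob - label) * wi for wi in w]  # dJ/dx = (p - y) * w
    return [epsilon * sign(g) for g in grad]

x = [0.2, 0.7, 0.5]          # continuous input, e.g. pixel intensities
w = [1.5, -2.0, 0.0]         # made-up model weights
eta = fgsm_perturbation(x, w, b=0.1, label=1, epsilon=0.1)
x_adv = [xi + ei for xi, ei in zip(x, eta)]
```

Every input dimension moves by exactly `epsilon` (or not at all when its gradient is zero), which is natural for continuous pixels but has no direct counterpart for binary features or edges.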

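The JSMA attack was first proposed in [Papernot et al.2016]. By exploiting the forward derivative of a DNN model, one can find adversarial perturbations that force the model to misclassify the test point into a specific target class. Given a feed-forward neural network $F$ and sample $X$, the Jacobian is computed by:

$$\nabla F(X) = \frac{\partial F(X)}{\partial X} = \left[ \frac{\partial F_j(X)}{\partial x_i} \right]_{i \in 1 \ldots N, \; j \in 1 \ldots M} \tag{4}$$

where the dimensions for the model output and input data are $M$ and $N$, respectively. To achieve a target class $t$, one wants $F_t(X)$ to increase while $F_j(X)$ for all the other $j \neq t$ decrease. This is accomplished by exploiting the adversarial saliency map, which is defined by:

$$S(X, t)[i] = \begin{cases} 0, & \text{if } \dfrac{\partial F_t(X)}{\partial x_i} < 0 \text{ or } \sum_{j \neq t} \dfrac{\partial F_j(X)}{\partial x_i} > 0 \\ \dfrac{\partial F_t(X)}{\partial x_i} \left| \sum_{j \neq t} \dfrac{\partial F_j(X)}{\partial x_i} \right|, & \text{otherwise} \end{cases} \tag{5}$$

Starting from a normal example, the attacker follows the saliency map and iteratively perturbs the example by a very small amount until the predicted label is flipped. For an untargeted attack, one tries to minimize the prediction score for the winning class.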
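The saliency map can be computed directly from a Jacobian. The sketch below assumes the Jacobian is already available as a list of rows, one per output class, and the values are invented for the example; `saliency_map` is a hypothetical helper name.

```python
def saliency_map(jacobian, target):
    """jacobian[j][i] holds dF_j/dx_i; returns S(X, t)[i] for every input i."""
    n_inputs = len(jacobian[0])
    scores = []
    for i in range(n_inputs):
        d_target = jacobian[target][i]
        d_others = sum(row[i] for j, row in enumerate(jacobian) if j != target)
        if d_target < 0 or d_others > 0:
            # Increasing x_i would hurt the target class or help the others.
            scores.append(0.0)
        else:
            scores.append(d_target * abs(d_others))
    return scores

# Three classes, three input dimensions (made-up derivatives).
jac = [[0.4, -0.1, 0.3],
       [-0.3, 0.2, 0.1],
       [-0.2, 0.1, -0.05]]
s = saliency_map(jac, target=0)
best_input = max(range(len(s)), key=s.__getitem__)
```

Only the first input dimension gets a non-zero score here: it raises the target class while lowering the others, so it is the one the attacker would perturb first.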

3 Integrated Gradients Guided Attack

Although FGSM and JSMA are not the most sophisticated attack techniques, they are still not well-studied for graph models. The success of FGSM and JSMA benefits from the continuous features in pixel color space. However, recent explorations of graph adversarial attack techniques [Dai et al.2018][Zügner et al.2018][Wang et al.2018] show that simply applying these methods may not lead to successful attacks. These works address the problem by using either greedy methods or reinforcement learning based methods, which are often expensive.

The node features in a graph are often bag-of-words features, which can be either 1 or 0. The edges in a graph are also frequently used to express the existence of specific relationships, so the adjacency matrix contains only 1s and 0s. When attacking the model, the adversarial perturbations are limited to changing 1 to 0 or vice versa. The main issue in applying vanilla FGSM and JSMA to graph models is inaccurate gradients. Given a target node $t$, for FGSM, $\nabla_X J = \partial J / \partial X$ measures the importance of the features of all nodes to the loss function value. Here, $X$ is the feature matrix, each row of which describes the features of a node in the graph. For a specific feature $j$ of node $n$, a larger value of $\nabla_X J_{nj}$ indicates that perturbing the feature to 1 helps to get the target node misclassified. However, following this gradient may not help, for two reasons. First, the feature value might already be 1, so that we cannot perturb it any further. Second, even if the feature value is 0, a GCN model may not learn a locally linear function between 0 and 1 for this feature, so the result of the perturbation is unpredictable. The same holds for JSMA, as the Jacobian of the model shares all the limitations of the gradients of the loss. In other words, vanilla gradients suffer from local gradient problems. Take a simple ReLU network $f(x)$ as an example: when $x$ increases from 0 to 1, the function value also increases by 1. However, computing the gradient at $x = 0$ gives 0, which does not capture the model behaviour accurately. To address this, we propose an integrated gradients based method rather than directly using vanilla derivatives for the attacks. Integrated gradients were initially proposed by [Sundararajan et al.2017] to provide sensitivity and implementation invariance for feature attribution in deep neural networks, particularly convolutional neural networks for images.
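The local gradient problem can be checked numerically. The sketch below uses a hypothetical one-dimensional network $f(x) = \mathrm{ReLU}(2x - 1)$, chosen only because it reproduces the behaviour described above (it is not necessarily the network the authors had in mind), and compares the vanilla gradient at $x = 0$ with an average of gradients taken along the straight path from 0 to 1, which is the idea behind integrated gradients.

```python
def f(x):
    """Hypothetical ReLU network: f(0) = 0 and f(1) = 1."""
    return max(0.0, 2.0 * x - 1.0)

def grad_f(x):
    """Derivative of f; zero wherever the ReLU is inactive."""
    return 2.0 if 2.0 * x - 1.0 > 0.0 else 0.0

def integrated_gradient(grad, x, baseline, steps):
    """Riemann-sum approximation of the integrated gradient for a scalar input."""
    total = sum(grad(baseline + (k / steps) * (x - baseline))
                for k in range(1, steps + 1))
    return (x - baseline) * total / steps

vanilla = grad_f(0.0)
ig = integrated_gradient(grad_f, x=1.0, baseline=0.0, steps=20)
true_change = f(1.0) - f(0.0)
```

The vanilla gradient reports no effect at all, whereas the path average recovers the full change in the function value when the input is flipped from 0 to 1.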

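The integrated gradient is defined as follows: for a given model $F: \mathbb{R}^n \to [0, 1]$, let $x \in \mathbb{R}^n$ be the input and $x'$ the baseline input (e.g., the black image for image data). Consider a straight-line path from $x'$ to the input $x$; the integrated gradients are obtained by accumulating the gradients at all points along the path. Formally, for the $i$-th feature of $x$, the integrated gradient (IG) is as follows:

$$IG_i(F(x)) ::= (x_i - x'_i) \times \int_{\alpha=0}^{1} \frac{\partial F(x' + \alpha \times (x - x'))}{\partial x_i} \, d\alpha \tag{6}$$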
For GCN on graph data, we propose a generic attack framework. Given the adjacency matrix $A$, feature matrix $X$, and the target node $t$, we compute the integrated gradients of a function $F_W(A, X, t)$ w.r.t. $I$, where $I$ is the input for the attack. $I = A$ indicates an edge attack while $I = X$ indicates a feature attack. When $F_W(A, X, t)$ is the loss function of the GCN model, we call this attack technique an FGSM-like attack with integrated gradients, namely IG-FGSM. Similarly, we call the attack technique IG-JSMA when $F_W(A, X, t)$ is the prediction output of the GCN model. For a targeted IG-JSMA or IG-FGSM attack, the optimization goal is to maximize the value of $F$. Therefore, for the features or edges having the value of 1, we select the features/edges which have the lowest negative IG scores and perturb them to 0. The untargeted IG-JSMA attack aims to minimize the prediction score for the winning class, so we set the input dimensions with high IG scores to 0.

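Note that unlike image feature attribution, where the baseline input is the black image, we use all-zero or all-one feature/adjacency matrices as baselines to represent the $1 \to 0$ or $0 \to 1$ perturbations. When removing a specific edge or setting a specific feature from 1 to 0, we set the baseline adjacency matrix or feature matrix, respectively, to all-zero, since we want to describe the overall change pattern of the target function while gradually adding edges/features up to the current state of $A$ and $X$. Conversely, to add edges/features, we compute the change pattern by gradually removing edges/features from the all-one state down to the current state, thus setting the baseline for either $A$ or $X$ to an all-one matrix. To keep the direction of gradients consistent and ensure the computation is tractable, the IG (for edge attack) is computed as follows:

$$IG(F(X, A, t))[i, j] \approx \begin{cases} (A_{ij} - 0) \times \sum_{k=1}^{m} \dfrac{\partial F\left(\frac{k}{m} \times (A - 0)\right)}{\partial A_{ij}} \times \dfrac{1}{m}, & \text{removing edges} \\ (1 - A_{ij}) \times \sum_{k=1}^{m} \dfrac{\partial F\left(A + \frac{k}{m} \times (1 - A)\right)}{\partial A_{ij}} \times \dfrac{1}{m}, & \text{adding edges} \end{cases} \tag{7}$$

where $m$ is the number of steps used to approximate the integral.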
Algorithm 1 shows the pseudo-code for the untargeted IG-JSMA attack. We compute the integrated gradients of the prediction score for the winning class $c$ w.r.t. the entries of $A$ and $X$. The integrated gradients are then used as metrics to measure the priority of perturbing specific features or edges in the graph $G$. Note that we compute the integrated gradients for 0 and 1 features/edges differently, following Eq. (7). Therefore, for a feature or an edge with high perturbation priority, we perturb it by simply flipping it to the other binary value.

Input: Graph G = (A, X), target node t
       F: the GCN model trained on G
       budget: the maximum number of perturbations
Output: Modified graph G' = (A', X')
Procedure Attack()
    // compute the integrated gradients as the perturbation scores for nodes and edges;
    // c is the label of the winning class
    node_scores = IG(F_c(A, X, t), X)
    edge_scores = IG(F_c(A, X, t), A)
    // sort nodes and edges according to their scores
    iter_nodes = argsort(node_scores)
    iter_edges = argsort(edge_scores)
    v = iter_nodes.first, e = iter_edges.first
    while budget > 0 do
        // decide which to perturb
        if node_scores[v] > edge_scores[e] then
            if X[v] == 0 then
                X[v] = 1
            else
                X[v] = 0
            end if
            v = iter_nodes.next
        else
            if A[e] == 0 then
                A[e] = 1
            else
                A[e] = 0
            end if
            e = iter_edges.next
        end if
        budget -= 1
    end while
    return G' = (A, X)    // the graph model is then retrained on the corrupted graph
Algorithm 1 IG-JSMA: Integrated Gradient Guided untargeted JSMA attack on GCN
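The perturbation loop of Algorithm 1 can be sketched as follows. The sketch assumes the integrated-gradient scores have already been computed and are passed in as dictionaries keyed by matrix index, with a higher score meaning a higher flipping priority; the function name `greedy_flip`, the dictionary layout and the symmetric update of the adjacency matrix (undirected graph) are assumptions of this example.

```python
def greedy_flip(features, adjacency, feature_scores, edge_scores, budget):
    """Flip the highest-scoring binary entries until the budget is spent.

    feature_scores maps (node, feature) -> score and edge_scores maps
    (u, v) -> score. Inputs are copied, so the clean graph is left untouched.
    """
    feats = [row[:] for row in features]
    adj = [row[:] for row in adjacency]
    feat_order = sorted(feature_scores, key=feature_scores.get, reverse=True)
    edge_order = sorted(edge_scores, key=edge_scores.get, reverse=True)
    fi = ei = 0
    while budget > 0 and (fi < len(feat_order) or ei < len(edge_order)):
        # Take a feature if no edges are left or it outranks the best edge.
        take_feature = ei >= len(edge_order) or (
            fi < len(feat_order)
            and feature_scores[feat_order[fi]] > edge_scores[edge_order[ei]]
        )
        if take_feature:
            n, d = feat_order[fi]
            feats[n][d] = 1 - feats[n][d]           # flip 0 <-> 1
            fi += 1
        else:
            u, v = edge_order[ei]
            adj[u][v] = adj[v][u] = 1 - adj[u][v]   # flip the undirected edge
            ei += 1
        budget -= 1
    return feats, adj

features = [[0, 1], [1, 0]]
adjacency = [[0, 1], [1, 0]]
feature_scores = {(0, 0): 0.9, (1, 0): 0.2}   # made-up IG scores
edge_scores = {(0, 1): 0.5}
new_feats, new_adj = greedy_flip(features, adjacency,
                                 feature_scores, edge_scores, budget=2)
```

With a budget of two, the feature with score 0.9 is flipped first and the edge with score 0.5 second; the remaining feature is never touched.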

When setting the number of steps $m$ for computing integrated gradients, one size does not fit all. Essentially, more steps are required to accurately estimate the discrete gradients when the function learned for certain features/edges is non-linear. A simple yet useful heuristic for measuring the non-linearity is the prediction score. A low classification margin for a node $t$ indicates that the model does not fit the node well. Correspondingly, there is a high chance that the model overfits the features/edges associated with node $t$. In this case, selecting a large $m$ may lead to better approximations of the discrete gradients. Therefore, we enlarge the number of steps when attacking nodes with low classification margins until stable performance is achieved. This adaptive strategy has unique advantages compared with existing methods.

To ensure the perturbations are unnoticeable, the graph structure and feature statistics should be preserved for edge attack and feature attack, respectively. The specific properties to preserve highly depend on the application requirements. For our IG based attacks, we simply check against these application level requirements while selecting an edge or a feature for perturbation. In practice, this process can be trivial as many statistics can be pre-computed or re-computed incrementally [Zügner et al.2018].

4 Defence by Avoiding Over-Localization

As mentioned earlier, GCN models can be easily attacked if we perturb the neighbouring nodes since these models heavily rely on the nearest neighbouring information while aggregating the features. The distant nodes can be highly useful for the predictions but their information is not well captured due to the shallow network structure [Xu et al.2018][Li et al.2018]. The direct consequence of over-relying on nearest neighbours is that an attacker does not need to perturb many features/edges to get the predictions flipped. It is natural to ask the following question: When the node predictions are influenced by more nodes, will the attacker have to perturb more features/edges in order to attack the model?

Figure 1: The average accuracy and classification margin vs. number of layers for GCN model on CORA-ML dataset.

To verify the hypothesis, we perform an FGSM node attack (budget = deg(v)) on 40 correctly classified sample nodes in the CORA-ML citation network dataset. The model is retrained after the graph is perturbed by the attacks. We measure the average accuracy and classification margins for the samples. For a target node $v$, the classification margin of $v$ is $Z_{v,c} - \max_{c' \neq c} Z_{v,c'}$, where $c$ is the ground truth class and $Z_{v,c}$ is the probability of class $c$ given to the node $v$ by the graph model. A lower classification margin indicates better attack performance. We keep all the other hyperparameters fixed but set the number of GCN layers to 2, 4, …, 12. As shown in Figure 1, we found that shallow GCN models exhibit better overall accuracy but are vulnerable to attack. The accuracy drop of deeper models is the over-smoothing problem revealed by existing research [Li et al.2018]. Nevertheless, for deeper GCN models, the information of much more distant nodes is better captured during learning, so that each target node is supported indirectly by more nodes. Therefore, it becomes harder to mount adversarial attacks.
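For reference, the classification margin used throughout the evaluation can be computed from a node's predicted class probabilities in a few lines. The function name and the probability vectors below are illustrative only.

```python
def classification_margin(probs, true_class):
    """Probability of the true class minus the best probability among the rest."""
    best_other = max(p for c, p in enumerate(probs) if c != true_class)
    return probs[true_class] - best_other

margin_clean = classification_margin([0.7, 0.2, 0.1], true_class=0)
margin_attacked = classification_margin([0.1, 0.8, 0.1], true_class=0)
```

A positive margin means the node is still classified correctly; a margin close to -1 means the model confidently predicts a wrong class.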

In addition, an existing study [Zügner et al.2018] also shows that high-degree nodes tend to be hard to attack. This is essentially similar to the above observations, as one can regard adding more layers as bringing the influence of distant nodes to the target node. Simply adding more layers, however, may drag down the model accuracy, because over-parameterization often leads to over-fitting.

To address this problem, we propose an optimized GCN model which learns the influence of nodes beyond two-hop neighbours without adding too many layers. The key idea is that, rather than having a fixed adjacency matrix $A$ describing the direct edges in the graph, we learn $A$ during training. For a node $v_j$ that lies multiple hops away from the target node $v_i$, the weight learned in $A_{ij}$ represents the indirect influence of $v_j$ on $v_i$. However, learning $A$ is non-trivial, as the number of parameters is $N^2$, where $N$ is the number of nodes in the graph. Such a large number of parameters is hard to train. In fact, not all the entries in $A$ need to be trained. The purpose of learning a non-zero entry is to allow the information of a training node to be aggregated effectively to its distant neighbours which are reachable within $k$ hops. Specifically, for a given split of the dataset, suppose we want to learn the influence of the nodes within $k$ hops of a target node $v_i$. Starting from each node $v_i$ in the graph, we follow the original edges in the graph to find its $k$-hop neighbours. The neighbour set $\mathcal{N}_k(v_i)$ contains all the nodes which appear as a $k$-hop neighbour of node $v_i$. Based on this, we set an entry $A_{ij}$ trainable only when it connects a training node to one of its $k$-hop neighbours, i.e. when: (1) $v_i$ is in the training set; (2) $v_j \in \mathcal{N}_k(v_i)$. This reduces the number of parameters significantly. Training only parts of the adjacency matrix can easily be done in existing ML frameworks. We simply mask the gradients before applying them to the parameters in TensorFlow.
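The masking idea can be sketched without any ML framework. The example below builds a 0/1 mask over the adjacency matrix with a breadth-first search and applies it to a plain gradient step. It assumes that the trainable entries are exactly those linking a training node to a node within $k$ hops; the helper names, the toy gradients and the learning rate are all hypothetical, and in a real framework the same mask would be multiplied into the gradient tensor.

```python
from collections import deque

def k_hop_neighbours(adj, source, k):
    """Nodes reachable from `source` within k hops (excluding itself), via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v, connected in enumerate(adj[u]):
            if connected and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v in dist if v != source}

def trainable_mask(adj, train_nodes, k):
    """mask[i][j] = 1 only if i is a training node and j is within k hops of i."""
    n = len(adj)
    mask = [[0] * n for _ in range(n)]
    for i in train_nodes:
        for j in k_hop_neighbours(adj, i, k):
            mask[i][j] = 1
    return mask

def masked_sgd_step(weights, grads, mask, lr):
    """Gradient step that leaves every masked-out entry unchanged."""
    return [[w - lr * g * m for w, g, m in zip(w_row, g_row, m_row)]
            for w_row, g_row, m_row in zip(weights, grads, mask)]

# Path graph 0 - 1 - 2 - 3 with node 0 as the only training node.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
mask = trainable_mask(adj, train_nodes=[0], k=2)
weights = [[float(v) for v in row] for row in adj]   # start from the original edges
grads = [[1.0] * 4 for _ in range(4)]                # placeholder gradients
updated = masked_sgd_step(weights, grads, mask, lr=0.1)
```

Only the entries linking node 0 to nodes 1 and 2 change after the step, so the number of trainable parameters grows with the size of the $k$-hop neighbourhoods rather than with $N^2$.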

Randomly initializing the adjacency matrix parameters turns out not to work well: a randomized start does not fully utilize the knowledge of the original graph structure, whereas a useful prior for initializing the parameters often improves the training dramatically. To address this, for the original edges in the graph, we keep the values in the adjacency matrix. For those remotely connected indirect neighbours,

5 Evaluation

We use the widely used CORA-ML, CITESEER [Bojchevski and Günnemann2017] and Polblogs [Adamic and Glance2005] datasets. The overview of the datasets is listed below.

Dataset Nodes Features Edges
CORA-ML 2708 1433 5429
Citeseer 3327 3703 4732
Polblogs 1490 - 19025

We split each graph into labeled (20%) and unlabeled (80%) nodes. Among the labeled nodes, half are used for training and the other half for validation. For the Polblogs dataset, since there are no feature attributes, we set the attribute matrix to an identity matrix.

5.1 Transductive Attack

As mentioned, due to the transductive setting, the models are not regarded as fixed while attacking. After perturbing either features or edges, the model is retrained to evaluate the attack effectiveness. To verify the effectiveness of the attack, we select nodes with different prediction scores. Specifically, we select 40 nodes in total: the 10 nodes with the top scores, the 10 nodes with the lowest scores and 20 randomly selected nodes. We compare the proposed IG-JSMA with several baselines including random attacks, FGSM, and netattack. Note that for the baselines, we conducted direct attacks on the features of the target node or the edges directly connected to the target node. Direct attacks are much more effective and therefore act as stronger baselines. Figure 5 shows the classification margins of nodes after re-training the model on the modified graph. Lower classification margins indicate more successful attacks.

Figure 5: The classification margin under different attack techniques on (a) CORA, (b) Citeseer and (c) Polblogs.

We found that IG-JSMA outperforms the baselines. More remarkably, IG-JSMA is quite stable, as its classification margins have much less variance. As stated in [Zügner et al.2018], vanilla gradient based methods such as FGSM are not able to capture the actual change of loss for discrete data. Similarly, when used to compute the saliency map, vanilla gradients are also not accurate.

To demonstrate the effectiveness of IG-JSMA, we also compare it with the original JSMA method where the saliency map is computed by the vanilla gradients.

Dataset CORA Citeseer Polblogs
JSMA 0.04 0.06 0.04
IG_JSMA 0.00 0.01 0.01
Table 1: The ratio of correctly classified nodes under JSMA and IG-JSMA attacks.

Table 1 compares the ratio of correctly classified nodes after JSMA and IG-JSMA attacks, respectively. A lower value is better, as more nodes are misclassified. We can see that IG-JSMA outperforms the JSMA attack. This shows that the saliency map computed by integrated gradients approximates the change patterns of the discrete features/edges better.

Figure 9 gives an intuitive example of this. For this graph, we conducted an evasion attack, where the parameters of the model are kept fixed as trained on the clean graph. For a target node in the graph, given a two-layer GCN model, the prediction of the target node only relies on its two-hop ego graph. We define the importance of a feature or an edge as follows. For a target node $t$, the brute-force method to measure the importance of nodes and edges is to remove one node or one edge at a time from the graph and check the change in the prediction score of the target node.

Assume the prediction score for the winning class $c$ is $p_c$. After setting entry $A_{ij}$ of the adjacency matrix from 1 to 0, $p_c$ changes to $p'_c$. We define the importance of the edge by $\Delta p_c = p_c - p'_c$. To measure the importance of a node, we can simply remove all the edges connected to the node and see how the prediction scores change. The importance values can be regarded as the ground truth discrete gradients.
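The brute-force procedure amounts to one model evaluation per removed edge. The sketch below assumes an undirected graph and the sign convention "score before removal minus score after removal"; `score_fn` stands for any function returning the target node's winning-class score, and the neighbour-voting function used here is only a stand-in for a trained GCN.

```python
def edge_importance(score_fn, adj):
    """Importance of each existing edge: score before minus score after removal."""
    base = score_fn(adj)
    importance = {}
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] == 1:
                pruned = [row[:] for row in adj]
                pruned[i][j] = pruned[j][i] = 0   # remove the undirected edge
                importance[(i, j)] = base - score_fn(pruned)
    return importance

# Stand-in scorer (NOT a GCN): the share of label-1 nodes among the target
# node 0 and its direct neighbours.
labels = [1, 1, 0]

def toy_score(adj, target=0):
    members = [target] + [v for v, c in enumerate(adj[target]) if c]
    return sum(labels[v] for v in members) / len(members)

adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # star graph centred on node 0
imp = edge_importance(toy_score, adj)
```

The edge to the same-label neighbour gets a positive importance (removing it lowers the score) and the edge to the other-label neighbour gets a negative one, mirroring the two kinds of edges distinguished in Figure 9.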

Both vanilla gradients and integrated gradients are approximations of the ground truth importance scores. The node importance can be approximated by the sum of the gradients of the prediction score w.r.t. all the features of the node and the gradients w.r.t. the corresponding entries of the adjacency matrix.

In Figure 9, the node color represents the class of the node. Round nodes indicate positive importance scores while diamond nodes indicate negative importance scores. The node size indicates the magnitude of the importance score: a larger node means higher importance. Similarly, red edges have positive importance scores while blue ones have negative importance scores, and a thicker edge corresponds to a more important edge. The pentagram represents the target node of the attack.

Figure 9: The approximations of node/edge importance: (a) ground truth, (b) vanilla gradients, (c) integrated gradients.

Figures 9(a), 9(b) and 9(c) show the node importance results of the brute-force, vanilla gradients and integrated gradients approaches, respectively (number of steps = 20). The vanilla gradients reveal little information about node/edge importance, as almost all the edges are assigned some importance score and it is difficult to see the actual node/edge influence. In the brute-force case, however, the majority of edges are considered unimportant for the target node. Moreover, vanilla gradients underestimate the importance of nodes overall. The integrated gradients, shown in Figure 9(c), are consistent with the ground truth produced by the brute-force approach shown in Figure 9(a). With only 20 steps along the path, integrated gradients provide accurate approximations of the importance scores. This shows that the integrated gradients approach is effective when used to guide adversarial attacks on graphs with discrete values.

5.2 Defences

Dataset   Attack     Accuracy  Accuracy (w/ defence)  Defence depth  CM (attack)  CM (attack and defence)
CORA      JSMA       0.809     0.789                  3              -0.8021      0.5698
CORA      FGSM       0.809     0.789                  3              -0.8335      0.6894
CORA      netattack  0.809     0.789                  3              -0.7912      0.6609
CORA      IG_JSMA    0.809     0.789                  3              -0.9120      0.6980
CORA      JSMA       0.809     0.782                  4              -0.8021      0.7162
CORA      FGSM       0.809     0.782                  4              -0.8335      0.7220
CORA      netattack  0.809     0.782                  4              -0.7912      0.4921
CORA      IG_JSMA    0.809     0.782                  4              -0.9120      0.5324
Citeseer  JSMA       0.682     0.675                  3              -0.7928      0.5893
Citeseer  FGSM       0.682     0.675                  3              -0.7712      0.5490
Citeseer  netattack  0.682     0.675                  3              -0.7802      0.6001
Citeseer  IG_JSMA    0.682     0.675                  3              -0.9338      0.5998
Citeseer  JSMA       0.682     0.671                  4              -0.7928      0.5427
Citeseer  FGSM       0.682     0.671                  4              -0.7712      0.5134
Citeseer  netattack  0.682     0.671                  4              -0.7802      0.5256
Citeseer  IG_JSMA    0.682     0.671                  4              -0.9338      0.5242
Table 2: Results of accuracy and classification margins (CM) under different attack settings.

In the following, we study the effectiveness of the proposed defence technique under different settings. The results are given in Table 2, where we show the test accuracy before and after applying the defence technique. The defence depth is the maximum depth of the trainable edge entries in the adjacency matrix. With a large defence depth value, the GCN model takes the distant nodes into account during training. We also compare the average classification margin (CM) for samples with and without the defence technique applied. The CM results under attack without defence are for the original two-layer GCN models, where the knowledge of distant nodes is not introduced; they therefore differ only across attack techniques. For fair comparison, the attack budgets are set to be the same in all cases.

Overall, the proposed defence method is able to improve the robustness of the model. The classification margins of the model under attack are improved significantly. Nevertheless, a larger defence depth inevitably introduces more trainable parameters, making the model prone to overfitting. Correspondingly, we noticed a slight accuracy drop with larger defence depth values. However, the number of parameters in our defence technique is still much smaller than in an enhanced GCN model with additional layers. As a result, the accuracy obtained by the proposed model is much better than that of deep GCN models (see Figure 1).

6 Conclusions

Graph neural networks (GNN) have significantly improved the analytic performance on many types of graph data. However, like deep neural networks on other types of data, GNNs suffer from robustness problems. In this paper, we gave insight into the robustness problem in graph convolutional networks (GCN). We proposed an integrated gradients based attack method that outperformed existing iterative and gradient-based techniques in terms of attack performance. We also revealed that the robustness issue is rooted in the local aggregation in GCN model training, and we gave an effective defence method to improve the robustness of GCN models. We demonstrated the effectiveness and efficiency of our methods on benchmark data. The code of this paper will be available in the StellarGraph library.