DeepRobust
A pytorch adversarial library for attack and defense methods on images and graphs
Deep neural networks (DNNs) have achieved significant performance in various tasks. However, recent studies have shown that DNNs can be easily fooled by small perturbations on the input, known as adversarial attacks. As extensions of DNNs to graphs, Graph Neural Networks (GNNs) have been demonstrated to inherit this vulnerability. An adversary can mislead GNNs into giving wrong predictions by modifying the graph structure, e.g., by manipulating a few edges. This vulnerability has raised tremendous concerns about adopting GNNs in safety-critical applications and has attracted increasing research attention in recent years. Thus, it is necessary and timely to provide a comprehensive overview of existing graph adversarial attacks and their countermeasures. In this survey, we categorize existing attacks and defenses, and review the corresponding state-of-the-art methods. Furthermore, we have developed a repository with representative algorithms (https://github.com/DSE-MSU/DeepRobust/tree/master/deeprobust/graph). The repository enables us to conduct empirical studies that deepen our understanding of attacks and defenses on graphs.
Graphs can be used to denote a large number of systems across various areas such as social science (social networks), natural science (physical systems and protein-protein interaction networks) and knowledge graphs. Graph Neural Networks (GNNs), which generalize traditional deep neural networks (DNNs) to graphs, pave a new way to effectively learn representations for graphs (Wu et al., 2019b). Due to their strong representation learning capability, GNNs have gained practical significance in various applications ranging from data mining (Kipf and Welling, 2016) and natural language processing (Marcheggiani and Titov, 2017) to computer vision (Landrieu and Simonovsky, 2018) and healthcare and biology (Ma et al., 2018).

As new generalizations of traditional DNNs to graphs, GNNs inherit both the advantages and the disadvantages of traditional DNNs. Like traditional DNNs, GNNs are powerful in learning representations of graphs and have permeated numerous areas of science and technology. Unfortunately, traditional DNNs are easily fooled by adversarial attacks (Goodfellow et al., 2014; Xu et al., 2019a): the adversary can insert slight perturbations during either the training or the test phase, and the DNN models will totally fail. It is evident (Zügner et al., 2018) that GNNs inherit this drawback as well. The attacker can generate graph adversarial perturbations by manipulating the graph structure or node features to fool GNN models. As illustrated in Figure 1, a node originally classified by the GNN model as green is misclassified as blue after it creates a new connection with another node and modifies its own features. Such vulnerability of GNNs has raised tremendous concerns about applying them in safety-critical applications such as financial systems and risk management. For example, in a credit scoring system, fraudsters can fake connections with several high-credit customers to evade fraud detection models, and spammers can easily create fake followers to increase the chance of fake news being recommended and spread. Therefore, there is an urgent need to investigate graph adversarial attacks and their countermeasures.

Pushing this research forward has great potential to facilitate the successful adoption of GNNs in a broader range of fields, which has encouraged increasing attention to graph adversarial attacks and defenses in recent years. Thus, it is necessary and timely to provide a comprehensive and systematic overview of existing algorithms. Meanwhile, it is of great importance to deepen our understanding of graph adversarial attacks via empirical studies. These understandings can not only provide knowledge about the behaviors of attacks but also offer insights for designing defense strategies. These motivate this survey with the following key purposes:
We categorize existing attack methods from various perspectives in Section 3 and review representative algorithms in Section 4.
We classify existing countermeasures according to their defense strategies and give a review on representative algorithms for each category in Section 5.
We perform empirical studies based on the repository we developed that provide comprehensive understandings on graph attacks and defenses in Section 6.
We discuss some promising future directions in Section 7.
Before presenting the review and empirical studies, we first introduce concepts, notations and definitions in this section.
In this survey, we use $G = (V, E)$ to denote the structure of a graph, where $V$ is the set of $N$ nodes and $E$ is the edge set. We use the matrix $A \in \{0,1\}^{N \times N}$ to denote the adjacency matrix of $G$, where the entry $A_{ij} = 1$ means nodes $v_i$ and $v_j$ are connected in $G$. Furthermore, we use $X \in \mathbb{R}^{N \times d}$ to denote the node attribute matrix, where $d$ is the dimension of the node feature vectors. Thus, graph data can be denoted as $G = (A, X)$. There are many learning tasks on graphs; in this work, we focus on classification problems on graphs. Furthermore, we use $f_\theta$ with parameters $\theta$ to denote the learning models in this survey.

Node-Level Classification. For node-level classification, each node in the graph belongs to a class in the label set $Y$. The graph model aims to learn a neural network, based on the labeled nodes (training nodes), denoted as $V_L$, to predict the class of the unlabeled nodes (test nodes). The training objective function can be formulated as:
(1) $\min_{\theta} \mathcal{L}_{train}(f_\theta(G)) = \sum_{v_i \in V_L} \ell\big(f_\theta(X, A)_i, y_i\big)$

where $f_\theta(X, A)_i$ and $y_i$ are the predicted and the true label of node $v_i$, and $\ell(\cdot, \cdot)$ is a loss function such as cross entropy.
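As a concrete toy illustration of Eq. (1), the sketch below averages the cross-entropy over labeled nodes only; the function names and shapes are ours, not from the survey's repository.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def node_classification_loss(logits, labels, labeled_mask):
    # Eq. (1): average the cross-entropy over labeled (training) nodes only;
    # logits has shape (num_nodes, num_classes)
    probs = softmax(logits)
    idx = np.where(labeled_mask)[0]
    return -np.mean(np.log(probs[idx, labels[idx]] + 1e-12))
```

Test-set nodes are simply masked out of the sum, matching the restriction to $V_L$ in Eq. (1).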
Graph-Level Classification. For graph-level classification, each individual graph has a class in the label set $Y$. We use $\mathcal{G}$ to denote a set of graphs, and $\mathcal{G}_L$ is the labeled set (training set) of $\mathcal{G}$. The goal of graph-level classification is to learn a mapping function $f_\theta$ to predict the labels of unlabeled graphs. Similar to node-level classification, the objective function can be formulated as

(2) $\min_{\theta} \mathcal{L}_{train}(f_\theta(\mathcal{G})) = \sum_{G_i \in \mathcal{G}_L} \ell\big(f_\theta(G_i), y_i\big)$

where $G_i$ is a labeled graph with ground truth $y_i$ and $f_\theta(G_i)$ is the prediction for graph $G_i$.
Based on the objectives in Section 2.1, we can define a general form of the objective for adversarial attacks, which aims to maximize the loss value of the model in order to get wrong predictions. Thus, the problem of nodelevel graph adversarial attacks can be stated as:
Given $G = (A, X)$ and a victim node subset $V_t \subseteq V$, let $y_u$ denote the class of node $u$ (predicted or ground truth). The goal of the attacker is to find a perturbed graph $\hat{G} = (\hat{A}, \hat{X})$ that maximizes the loss value on the victim nodes,

(3) $\max_{\hat{G} \in \Phi(G)} \sum_{u \in V_t} \ell\big(f_{\theta^*}(\hat{G})_u, y_u\big) \quad \text{s.t.} \quad \theta^* = \arg\min_{\theta} \sum_{v_i \in V_L} \ell\big(f_\theta(G')_i, y_i\big)$

where $G'$ can either be $G$ or $\hat{G}$. Note that $\hat{G}$ is chosen from a constrained domain $\Phi(G)$. Given a fixed perturbation budget $\Delta$, a typical $\Phi(G)$ can be implemented as,

(4) $\|\hat{A} - A\|_0 + \|\hat{X} - X\|_0 \le \Delta$
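The budget constraint of Eq. (4) simply counts how many entries of the structure and feature matrices were changed. A minimal sketch (the function name is ours):

```python
import numpy as np

def within_budget(A, A_hat, X, X_hat, delta):
    # Eq. (4): the L0 size of the perturbation, i.e. the number of changed
    # entries in structure and features, must not exceed the budget delta
    changes = int(np.sum(A != A_hat) + np.sum(X != X_hat))
    return changes <= delta
```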
Table 1: Commonly used notations.

| Notation | Description | Notation | Description |
|---|---|---|---|
| $G$ | Graph | $u$ | Target node |
| $\hat{G}$ | Perturbed graph | $y_i$ | Label of node $v_i$ |
| $V$ | The set of nodes | $f_\theta$ | Neural network model |
| $V_L$ | The set of labeled nodes | $\ell(\cdot,\cdot)$ | Loss function |
| $E$ | The set of edges | — | Pairwise loss function |
| $A$ | Adjacency matrix | $\|\cdot\|$ | Norm |
| $\hat{A}$ | Perturbed adjacency matrix | $\Delta$ | Perturbation budget |
| $X$ | Node attribute matrix | $Z$ | Predicted probability |
| $\hat{X}$ | Perturbed node attribute matrix | $h_i$ | Hidden representation of node $v_i$ |
| $d$ | Dimension of node features | $e_{ij}$ | Edge between nodes $v_i$ and $v_j$ |
We omit the definition of graph-level adversarial attacks since (1) graph-level adversarial attacks can be defined similarly and (2) the majority of adversarial attacks and defenses focus on the node level. Though adversarial attacks have been extensively studied in the image domain, we still need dedicated efforts for graphs due to unique challenges: (1) the graph structure is discrete; (2) the nodes in a graph are not independent; and (3) it is difficult to measure whether a perturbation on the graph is imperceptible or not.
With the aforementioned definitions, we list all the notations which will be used in the following sections in Table 1.
In this section, we briefly introduce the main taxonomy of adversarial attacks on graph-structured data. Attack algorithms can be categorized into different types based on the attacker's goal, resources, knowledge and capacity. We aim to give a clear overview of the main components of graph adversarial attacks.
Adversarial attacks can happen in two phases, i.e., model training and model testing. The attacker's capacity determines when the adversarial perturbation can be inserted:
Evasion Attack: The attack happens after the GNN model is trained, i.e., in the test phase. The model is fixed, and the attacker cannot change the model parameters or structure. The attacker performs an evasion attack when $G' = G$ in Eq. (3).
Poisoning Attack: The attack happens before the GNN model is trained. The attacker can add "poisons" into the training data so that the trained model malfunctions. This is the case when $G' = \hat{G}$ in Eq. (3).
The attacker can insert adversarial perturbations in different ways. The perturbations can be categorized as modifying node features, adding/deleting edges, and injecting fake nodes. The attacker should also keep the perturbation unnoticeable, otherwise it would be easily detected.

Modifying Features: The attacker can slightly change the node features while maintaining the graph structure.

Adding or Deleting Edges: The attacker can add or delete edges under a given budget of total actions.

Injecting Nodes: The attacker can insert fake nodes into the graph and link them with some benign nodes in the graph.
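The three perturbation types above are simple operations on the adjacency and attribute matrices. A NumPy sketch (helper names are ours, not from any library):

```python
import numpy as np

def flip_edge(A, i, j):
    # add the edge (i, j) if absent, delete it if present (undirected graph)
    A = A.copy()
    A[i, j] = A[j, i] = 1 - A[i, j]
    return A

def inject_node(A, X, links, feat):
    # append one fake node with feature vector `feat`, wired to nodes `links`
    n = A.shape[0]
    A2 = np.zeros((n + 1, n + 1), dtype=A.dtype)
    A2[:n, :n] = A
    for j in links:
        A2[n, j] = A2[j, n] = 1
    return A2, np.vstack([X, feat])
```

Feature modification is simply an in-place edit of rows of $X$; the attack methods reviewed later differ mainly in *which* of these operations they choose and how they score candidates.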
According to the goal of the attack, we can divide attacks into the following two categories:

Targeted Attack: There is a small set of target test nodes, and the attacker aims to make the trained model misclassify these test samples. This is the case when $V_t \subset V$ in Eq. (3). Targeted attacks can be further divided into (1) direct attacks, where the attacker directly modifies the features or edges of the target nodes, and (2) influencer attacks, where the attacker can only manipulate other nodes to influence the targets.

Untargeted Attack: The attacker aims to insert poisons so that the trained model has bad overall performance on all test data. This is the case when $V_t$ contains all test nodes in Eq. (3).
The attacker's knowledge means how much information the attacker has about the model to be attacked. Usually, there are three settings:

White-box Attack: All information about the model parameters, training input (e.g., the adjacency matrix and attribute matrix) and labels is given to the attacker.

Gray-box Attack: The attacker only has limited knowledge of the victim model. For example, the attacker cannot access the model parameters but can access the training labels; it can then utilize the training data to train surrogate models to estimate the information of the victim model.

Black-box Attack: The attacker has no access to the model's parameters or training labels. It can access the adjacency matrix and attribute matrix, and perform black-box queries for output scores or labels.
In this part we summarize the victim models that have been proven susceptible to adversarial examples.
Graph Neural Networks. Graph neural networks are powerful tools for learning representations of graphs (Sun et al., 2018). One of the most successful GNN variants is the Graph Convolutional Network (GCN) (Kipf and Welling, 2016). GCN learns the representation of each node by iteratively aggregating and transforming information from its neighbor nodes. Though GNNs achieve high performance in various tasks, studies have demonstrated that GNNs, including GCN, are vulnerable to adversarial attacks (Zügner et al., 2018; Sun et al., 2018).

Other Graph Learning Algorithms. In addition to graph neural networks, adversaries may attack other important graph algorithms, such as network embeddings including LINE (Tang et al., 2015) and DeepWalk (Perozzi et al., 2014), graph-based semi-supervised learning (GSSL) (Zhu and Ghahramani, 2002), and knowledge graph embeddings (Bordes et al., 2013; Lin et al., 2015).
Table 2: Categorization of representative attack methods.

| Attack Method | Knowledge | Target | Phase | Perturbation Type | Application | Victim Model |
|---|---|---|---|---|---|---|
| PGD, Min-max (Xu et al., 2019b) | White-box | Untargeted | Both | Add/Delete edges | Node Classification | GNN |
| IG-FGSM, IG-JSMA (Wu et al., 2019a) | White-box | Both | Evasion | Add/Delete edges, Modify features | Node Classification | GNN |
| (Wang and Gong, 2019) | — | Targeted | Poisoning | Add/Delete edges | Node Classification | GNN |
| Nettack (Zügner et al., 2018) | Gray-box | Targeted | Both | Add/Delete edges, Modify features | Node Classification | GNN |
| Metattack (Zügner and Günnemann, 2019a) | Gray-box | Untargeted | Poisoning | Add/Delete edges | Node Classification | GNN |
| NIPA (Sun et al., 2019) | Gray-box | Untargeted | Poisoning | Inject nodes | Node Classification | GNN |
| RL-S2V (Dai et al., 2018) | Black-box | Targeted | Evasion | Add/Delete edges | Node & Graph Classification | GNN |
| ReWatt (Ma et al., 2019) | Black-box | Untargeted | Evasion | Add/Delete edges | Graph Classification | GNN |
| (Liu et al., 2019) | — | Untargeted | Poisoning | — | — | GSSL |
| GF-Attack (Chang et al., 2019) | Black-box | Targeted | Evasion | Add/Delete edges | Node Classification | Graph embedding models |
| (Bojchevski and Günnemann, 2018) | Black-box | Both | Poisoning | Add/Delete edges | Node Classification | Node embedding models |
| (Zhang et al., 2019) | White-box | Targeted | Poisoning | Add/Delete facts | Plausibility Prediction | Knowledge graph embedding |
| CD-Attack (Li et al., 2020) | Black-box | Targeted | Poisoning | Add/Delete edges | Community Detection | — |
In this section, we review representative algorithms for graph adversarial attacks. Following the categorizations in the previous section, we first divide these algorithms into white-box, gray-box and black-box attacks, and then, within each category, we further group them into targeted and untargeted attacks. An overall categorization of representative attack methods is shown in Table 2. In addition, some open source implementations of representative algorithms are listed in Table 5.

In the white-box attack setting, the adversary has access to all information about the victim model, such as model parameters, training data, labels, and predictions. Although in most real-world cases we do not have access to such information, we can still assess the vulnerability of victim models under this worst-case situation. Typically, white-box attacks use gradient information from the victim model to guide the generation of attacks (Chen et al., 2018b; Xu et al., 2019b; Wu et al., 2019a; Chen et al., 2019a).
Targeted attacks aim to mislead the victim model into making wrong predictions on some target samples. Many studies follow the white-box targeted attack setting, covering a wide range of real-world applications. FGA (Chen et al., 2018b) extracts the link gradient information from GCN, and then greedily selects the pair of nodes with the maximum absolute gradient to modify the graph iteratively. Q-Attack, based on a genetic algorithm, is proposed to attack a number of community detection algorithms (Chen et al., 2019a). Iterative gradient attack (IGA), which is based on the gradient information of a trained graph auto-encoder, is introduced to attack link prediction (Chen et al., 2018a). Furthermore, the vulnerability of knowledge graph embeddings is investigated in (Zhang et al., 2019), where the attacker can effectively manipulate the plausibility of arbitrary facts in a knowledge graph. Recommender systems based on GNNs are also shown to be vulnerable to adversarial attacks (Zhou et al., 2020). In addition, there are great efforts on attacking node classification. Traditional attacks in the image domain use the model's gradients to find adversarial examples. However, due to the discrete nature of graph data, directly calculating gradients of models could fail. To solve this issue, the work (Wu et al., 2019a) suggests using integrated gradients (Sundararajan et al., 2017) to better search for adversarial edges and feature perturbations. During the attacking process, the attacker iteratively chooses the edge or feature that has the strongest effect on the adversarial objective. In this way, it can cause the victim model to misclassify target nodes with a higher success rate. The work (Zang et al., 2020) assumes there is a set of "bad actor" nodes in a graph: when their edges to any target node are flipped, the GNN model makes a wrong prediction on that target. These "bad actor" nodes are critical to the safety of GNN models. For example, Wikipedia contains hoax articles that have few and random connections to real articles; manipulating the connections of these hoax articles will cause the system to mispredict the categories of real articles.

Currently there are not many studies on untargeted white-box attacks, and the topology attack (Xu et al., 2019b) is one representative algorithm.
It first constructs a binary symmetric perturbation matrix $S \in \{0,1\}^{n \times n}$, where $S_{ij} = 1$ indicates flipping the edge between $v_i$ and $v_j$ and $S_{ij} = 0$ means no modification of $A_{ij}$. Thus, the goal of the attacker is to find an $S$ that minimizes a predefined attack loss $\mathcal{L}_{atk}$ given a finite budget of edge perturbations $\Delta$, i.e., $\|S\|_0 \le \Delta$. It considers two attack scenarios: attacking a pre-trained GNN with fixed parameters $\theta$, and attacking a retrainable GNN $f_\theta$. For attacking a fixed $f_\theta$, the problem can be formulated as,

(5) $\min_{\|S\|_0 \le \Delta} \mathcal{L}_{atk}\big(f_\theta(S, A, X)\big)$

It utilizes the Projected Gradient Descent (PGD) algorithm of (Madry et al., 2017) to search for the optimal $S$; note that (Madry et al., 2017) is also a popular attack algorithm in the image domain. For retrainable GNNs, the parameters $\theta$ are retrained after the adversarial manipulation. The attack problem is then formulated in a min-max form, where the inner maximization can be solved by gradient ascent and the outer minimization by PGD.
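The fixed-model case can be sketched as projected gradient descent over a continuous relaxation of the edge-flip variables. This is a simplified stand-in: `grad_fn` (the gradient of the attack loss) is assumed given, the budget projection is a crude rescaling rather than the paper's exact bisection, and discretization keeps the top-budget entries instead of sampling.

```python
import numpy as np

def pgd_topology_attack(grad_fn, n_edges, budget, steps=50, lr=0.1):
    # sketch of the PGD topology attack (Xu et al., 2019b): optimize a relaxed
    # edge-flip vector s in [0,1]^n, then discretize
    s = np.zeros(n_edges)
    for _ in range(steps):
        s = s - lr * grad_fn(s)           # gradient step on the attack loss
        s = np.clip(s, 0.0, 1.0)          # box projection onto [0,1]
        if s.sum() > budget:              # crude budget projection; the paper
            s *= budget / s.sum()         # uses an exact bisection method
    flips = np.zeros(n_edges, dtype=int)  # discretize: flip the top-budget
    flips[np.argsort(-s)[:budget]] = 1    # entries (the paper samples instead)
    return flips
```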
White-box attacks assume that the attacker can calculate gradients through the model parameters, which is not always practical in real-world scenarios. Gray-box attacks are proposed to generate attacks with limited knowledge of the victim model (Zügner et al., 2018; Zügner and Günnemann, 2019a; Sun et al., 2019). Usually they first train a surrogate model on the labeled training data to approximate the information of the victim model, and then generate perturbations to attack the surrogate model. Note that these methods need access to the labels of the training data, so they are not black-box attacks, which will be introduced in the following subsection.
The early work on targeted gray-box attacks is for graph clustering (Chen et al., 2017). It demonstrates that injecting noise into a DNS query graph can degrade the performance of graph embedding models. Different from (Chen et al., 2017), the work (Zügner et al., 2018) proposes an attack method called Nettack that generates structure and feature attacks, aiming at solving Eq. (3). Besides, they argue that only limiting the perturbation budget cannot always make the perturbation "unnoticeable": the perturbed graphs should also maintain important graph properties, including the degree distribution and feature co-occurrence. Therefore, Nettack first selects possible perturbation candidates that do not violate the degree distribution and feature co-occurrence of the original graph. Then it greedily chooses the perturbation with the largest score to modify the graph, where the score is defined as,

(6) $s(u) = \max_{c \ne y_u} \ln Z^*_{u,c} - \ln Z^*_{u,y_u}$

where $Z^*_{u,c}$ is the probability of the target node $u$ belonging to class $c$ as predicted by the surrogate model. Thus, the goal of the attacker is to maximize the difference in the log-probabilities of the target node $u$. By repeating this until the perturbation budget $\Delta$ is reached, the final modified graph is obtained. Furthermore, it suggests that such graph attacks can also transfer from model to model, just as attacks do in the image domain (Goodfellow et al., 2014). The authors also conduct influencer attacks where they can only manipulate nodes other than the target. It turns out that, given the same perturbation budget, influencer attacks lead to a smaller decrease in performance than directly modifying the target node.
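The score of Eq. (6) and the greedy selection step can be sketched as follows; how candidates are represented and how the surrogate is queried are our assumptions, not Nettack's actual interfaces.

```python
import numpy as np

def nettack_score(log_probs, y):
    # Eq. (6): margin between the best wrong class and the true class of the
    # target node, in log-probability space (larger = more damaging)
    wrong = np.delete(log_probs, y)
    return wrong.max() - log_probs[y]

def best_perturbation(candidates, surrogate_log_probs, y):
    # greedy step: pick the candidate perturbation whose surrogate
    # prediction maximizes the score for the target node
    scores = [nettack_score(surrogate_log_probs(c), y) for c in candidates]
    return candidates[int(np.argmax(scores))]
```

A positive score means the surrogate already prefers a wrong class for the target, i.e., the candidate perturbation (if applied) would flip the prediction.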
Although it trains a surrogate model in the same way as Nettack, Metattack (Zügner and Günnemann, 2019a) is an untargeted poisoning attack. It tackles the bi-level problem in Eq. (3) by using meta-gradients. Basically, it treats the graph structure matrix as a hyperparameter, and the gradient of the attacker's loss with respect to it can be obtained by:

(7) $\nabla_G^{meta} = \nabla_G \mathcal{L}_{atk}\big(f_{\theta^*}(G)\big) \quad \text{s.t.} \quad \theta^* = \mathrm{opt}_\theta\big(\mathcal{L}_{train}(f_\theta(G))\big)$

Note that $f_{\theta^*}(G)$ is actually a function of both $G$ and $\theta^*$. If $\theta^*$ is obtained by differentiable operations, we can compute $\nabla_G^{meta}$ as follows,

(8) $\nabla_G^{meta} = \nabla_f \mathcal{L}_{atk}\big(f_{\theta^*}(G)\big) \cdot \Big[\nabla_G f_{\theta^*}(G) + \nabla_{\theta^*} f_{\theta^*}(G) \cdot \nabla_G \theta^*\Big]$

where $\theta^*$ is often obtained by gradient descent over a fixed number of iterations $T$. At iteration $t+1$, the gradient of $\theta_{t+1}$ with respect to $G$ can be formulated as,

(9) $\nabla_G \theta_{t+1} = \nabla_G \theta_t - \alpha \nabla_G \nabla_{\theta_t} \mathcal{L}_{train}\big(f_{\theta_t}(G)\big)$

where $\alpha$ denotes the learning rate of the gradient descent operation. By unrolling the training procedure from $\theta_T$ back to $\theta_0$, we can obtain $\nabla_G \theta_T$ and then $\nabla_G^{meta}$. A greedy approach is applied to select the perturbations based on the meta-gradient.
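The unrolling of Eqs. (7)–(9) can be made concrete with a deliberately tiny toy problem: a scalar parameter trained by gradient descent on a quadratic loss stands in for the GNN, and the "graph" is a single scalar $g$. This is our illustration of the mechanism, not Metattack's implementation (which uses automatic differentiation through the full training trajectory).

```python
def meta_gradient(g, alpha=0.1, T=20):
    # Toy scalar version of the Metattack meta-gradient (Eqs. (7)-(9)).
    # Inner training: L_train(theta, g) = 0.5*(theta - g)^2, solved by T steps
    # of gradient descent; attacker loss: L_atk(theta) = 0.5*theta^2.
    # We unroll the inner updates while tracking d(theta_t)/dg, then apply
    # the chain rule to obtain d(L_atk(theta_T))/dg.
    theta, dtheta_dg = 0.0, 0.0
    for _ in range(T):
        grad = theta - g                                    # dL_train/dtheta
        dtheta_dg = dtheta_dg - alpha * (dtheta_dg - 1.0)   # Eq. (9), by hand
        theta = theta - alpha * grad
    return theta * dtheta_dg   # chain rule: dL_atk/dg = theta_T * d(theta_T)/dg
```

The result matches a finite-difference derivative of the unrolled attacker loss, which is exactly the check one would use to validate a meta-gradient implementation.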
Instead of modifying the connectivity of existing nodes, a novel reinforcement learning method for node injection poisoning attacks (NIPA) (Sun et al., 2019) is proposed to inject fake nodes into the graph data. Specifically, NIPA first injects singleton nodes into the original graph. Then, in each action, the attacker first chooses an injected node to connect with another node in the graph and then assigns a label to the injected node. By doing this sequentially, the final graph is statistically similar to the original graph but can degrade the overall model performance.
Different from gray-box attacks, black-box attacks (Dai et al., 2018; Ma et al., 2019; Sun et al., 2019; Bojchevski and Günnemann, 2018; Chang et al., 2019) are more challenging, since the attacker can only access the input and output of the victim model; access to the parameters, labels and predicted probabilities is prohibited.

As mentioned earlier, training a surrogate model requires access to the labels of the training data, which is not always practical. Instead, we hope to either query the victim model in a black-box fashion (Dai et al., 2018) or attack the victim in an unsupervised fashion (Bojchevski and Günnemann, 2018; Chang et al., 2019).
To perform black-box queries on the victim model, reinforcement learning is introduced. RL-S2V (Dai et al., 2018) is the first work to employ reinforcement learning to generate adversarial attacks on graph data under the black-box setting. They model the attack procedure as a Markov Decision Process (MDP), and the attacker is allowed to modify $m$ edges to change the predicted label of the target node $u$. They study both node-level (targeted) and graph-level (untargeted) attacks. For the node-level attack, they define the MDP as follows:

State. The state $s_t$ is represented by the tuple $(\hat{G}_t, u)$, where $\hat{G}_t$ is the modified graph at time step $t$.

Action. A single action at time step $t$ is denoted as $a_t$. For each action, the attacker can choose to add or remove an edge from the graph. Furthermore, a hierarchical structure is applied to decompose the action space.

Reward. Since the goal of the attacker is to change the classification result of the target node $u$, RL-S2V gives a non-zero reward to the attacker only at the end of the MDP:

$r(s_m, a_m) = \begin{cases} 1 & \text{if } f(\hat{G}_m)_u \ne y_u \\ -1 & \text{otherwise} \end{cases}$

In the intermediate steps, the attacker receives no reward, i.e., $r(s_t, a_t) = 0$ for $t < m$.

Termination. The process terminates when the attacker finishes modifying $m$ edges.
Since they define the MDP of the graph-level attack in a similar way, we omit the details. Further, the Q-learning algorithm (Mnih et al., 2013) is adopted to solve the MDP and guide the attacker to modify the graph.
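The sparse terminal reward described above is straightforward to sketch; `predict` stands for the black-box query, which is only assumed to return a label.

```python
def rl_s2v_reward(predict, graph, target, y_true, terminal):
    # sparse reward of the RL-S2V MDP (sketch): non-zero only at the end of
    # the episode; +1 if the target's predicted label changed, -1 otherwise
    if not terminal:
        return 0
    return 1 if predict(graph, target) != y_true else -1
```

The sparsity of this signal is one reason RL-S2V needs Q-learning with a hierarchical action decomposition rather than simple greedy search.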
Instead of attacking node classification, the work (Bojchevski and Günnemann, 2018) shows a way to attack the family of node embedding models in the black-box setting. Inspired by the observation that DeepWalk can be formulated in matrix factorization form (Qiu et al., 2018), they maximize the unsupervised DeepWalk loss via matrix perturbation theory by performing edge flips. It is further demonstrated that the perturbed structure is transferable to other models such as GCN and label propagation. However, this method only considers structure information. GF-Attack (Chang et al., 2019) is proposed to incorporate feature information into the attack model. Specifically, they formulate the connection between graph embedding methods and general graph signal processing with graph filters, and construct the attacker based on the graph filter and the attribute matrix. GF-Attack can also be transferred to other network embedding models and achieves better performance than the method in (Bojchevski and Günnemann, 2018).
It is argued that a perturbation constraint that only limits the number of modified edges may not be unnoticeable enough. A novel framework, ReWatt (Ma et al., 2019), is proposed to address this problem and perform untargeted graph-level attacks. Still employing a reinforcement learning framework, ReWatt adopts a rewiring operation instead of simply adding/deleting an edge in a single modification, making the perturbation less noticeable. One rewiring operation involves three nodes $v_1$, $v_2$ and $v_3$: ReWatt removes the existing edge between $v_1$ and $v_2$ and connects $v_1$ and $v_3$. ReWatt also constrains $v_3$ to be a 2-hop neighbor of $v_1$ to keep the perturbation small. Such a rewiring operation does not change the number of nodes and edges in the graph, and it is further proved that it affects the algebraic connectivity and the effective graph resistance, both important graph properties based on the graph Laplacian, less than adding/deleting edges does.
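The rewiring operation and its 2-hop constraint can be sketched directly on the adjacency matrix (helper names are ours):

```python
import numpy as np

def rewire(A, v1, v2, v3):
    # one ReWatt rewiring: delete existing edge (v1, v2), add edge (v1, v3);
    # node and edge counts are preserved
    assert A[v1, v2] == 1 and A[v1, v3] == 0
    A = A.copy()
    A[v1, v2] = A[v2, v1] = 0
    A[v1, v3] = A[v3, v1] = 1
    return A

def two_hop_neighbors(A, v):
    # candidate v3 must lie within 2 hops of v1 (keeps the perturbation small)
    reach = ((A + A @ A) > 0).astype(int)
    return [u for u in np.where(reach[v])[0] if u != v and A[v, u] == 0]
```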
In previous sections, we have shown that graph neural networks can be easily fooled by unnoticeable perturbations on graph data. This vulnerability poses great challenges to applying GNNs in safety-critical applications, and different countermeasure strategies have been proposed to defend them against these attacks. Existing methods can be categorized into the following types: (1) adversarial training, (2) adversarial perturbation detection, (3) certifiable robustness, (4) graph purification, and (5) attention mechanisms.

Adversarial training is a widely used countermeasure against adversarial attacks on image data (Goodfellow et al., 2014). The main idea of adversarial training is to inject adversarial examples into the training set such that the trained model can correctly classify future adversarial examples. Similarly, we can adopt this strategy to defend against graph adversarial attacks as follows,
(10) $\min_{\theta} \max_{\delta_A \in \mathcal{P}_A,\, \delta_X \in \mathcal{P}_X} \mathcal{L}_{train}\big(f_\theta(A + \delta_A,\, X + \delta_X)\big)$

where $\delta_A$, $\delta_X$ denote the perturbations on $A$, $X$, respectively, and $\mathcal{P}_A$, $\mathcal{P}_X$ stand for the domains of imperceptible perturbations. The min-max optimization problem in Eq. (10) indicates that adversarial training involves two processes: (1) generating perturbations that maximize the prediction loss and (2) updating model parameters that minimize the prediction loss. By alternating the two processes iteratively, we can train a model that is robust against adversarial attacks. Since there are two inputs, i.e., the adjacency matrix $A$ and the attribute matrix $X$, adversarial training can be done on them separately. To generate perturbations on the adjacency matrix, it is proposed to randomly drop edges during adversarial training (Dai et al., 2018). Though this simple strategy cannot lead to a significant improvement in classification accuracy (about a 1% increase), it shows some effectiveness at very low cost. Furthermore, projected gradient descent is used to generate perturbations on the discrete input structure, instead of randomly dropping edges (Xu et al., 2019b). On the other hand, an adversarial training strategy with dynamic regularization is proposed to perturb the input features (Feng et al., 2019). Specifically, it includes the divergence between the prediction of the target example and its connected examples in the objective of adversarial training, aiming to attack and reconstruct graph smoothness. Furthermore, batch virtual adversarial training (Deng et al., 2019) is proposed to promote the smoothness of GNNs and make GNNs more robust against adversarial perturbations. Several other variants of adversarial training on the input layer are introduced in (Chen et al., 2019b; Dai et al., 2019; Wang et al., 2019).
The aforementioned adversarial training strategies face two main shortcomings: (1) they generate perturbations on $A$ and $X$ separately, and (2) it is not easy to perturb the graph structure due to its discreteness. To overcome these shortcomings, instead of generating perturbations on the input, a latent adversarial training method injects perturbations into the first hidden layer (Jin and Zhang, 2019):

(11) $\min_{\theta} \max_{\delta \in \mathcal{P}} \mathcal{L}_{train}\big(f_\theta(G;\, H^{(1)} + \delta)\big)$

where $H^{(1)}$ denotes the representation matrix of the first hidden layer and $\delta \in \mathcal{P}$ is some perturbation on $H^{(1)}$. It is noted that the hidden representation is continuous and incorporates information from both the graph structure and node attributes.
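One alternating step of the min-max scheme in Eq. (10) can be sketched as follows. This is a simplified stand-in: the inner maximization is approximated by random search over bounded perturbations (the papers above use random edge dropping or PGD instead), and `loss_fn`/`grad_fn` are assumed given.

```python
import numpy as np

def adversarial_training_step(params, loss_fn, grad_fn, G, eps=0.1,
                              trials=10, lr=0.01, rng=None):
    # One min-max step of Eq. (10), sketched: approximate the inner max by
    # sampling a few bounded perturbations of the input and keeping the worst,
    # then take one gradient step on the parameters against that worst case.
    if rng is None:
        rng = np.random.default_rng(0)
    worst = max(
        (G + rng.uniform(-eps, eps, size=G.shape) for _ in range(trials)),
        key=lambda G_pert: loss_fn(params, G_pert),
    )
    return params - lr * grad_fn(params, worst)
```

For the latent variant of Eq. (11), the same loop would perturb the hidden matrix $H^{(1)}$ rather than the input, sidestepping the discreteness of $A$.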
To resist graph adversarial attacks during the test phase, one main strategy is adversary detection. These detection models protect GNN models by exploring the intrinsic differences between adversarial edges/nodes and clean edges/nodes (Xu et al., 2018; Ioannidis et al., 2019). The work (Xu et al., 2018) is the first to propose detection approaches to find adversarial examples on graph data. It introduces four methods to distinguish adversarial edges or nodes from clean ones: (1) link prediction, (2) subgraph link prediction, (3) graph generation models, and (4) outlier detection. These methods have shown some help in correctly detecting adversarial perturbations. The work (Ioannidis et al., 2019) introduces a method to randomly draw subsets of nodes and relies on graph-aware criteria to judiciously filter out contaminated nodes and edges before employing a semi-supervised learning (SSL) module. The proposed model can be used to detect different anomaly generation models, as well as adversarial attacks.

The previously introduced adversarial training strategies are heuristic and only show experimental benefits; we still do not know whether adversarial examples exist even when current attacks fail. Therefore, there are works (Zügner and Günnemann, 2019b; Bojchevski and Günnemann, 2019; Jia et al., 2020) that seriously reason about the safety of graph neural networks by trying to certify the GNN's robustness. As we know, a GNN's prediction on one node always depends on its neighbor nodes. In (Zügner and Günnemann, 2019b), they ask the question: which nodes in a graph are safe under any admissible perturbation of their neighboring nodes' attributes? To answer this question, for each node $v$ and its corresponding label $y_v$, they try to find an upper bound $\hat{m}(v)$ of the maximized margin loss:

(12) $\hat{m}(v) \ge \max_{\hat{X} \in \mathcal{P}} \, \max_{c \ne y_v} \Big( f_\theta(\hat{X}, A)_{v,c} - f_\theta(\hat{X}, A)_{v, y_v} \Big)$

where $\mathcal{P}$ denotes the set of all allowed attribute perturbations. This upper bound is called the certificate of node $v$, and it is tractable to calculate. Thus, for node $v$, if $\hat{m}(v) < 0$, no attribute perturbation in $\mathcal{P}$ can change the model's prediction, because the maximized margin loss stays below 0. During the test phase, they calculate the certificate for all test nodes, so they know how many nodes in a graph are absolutely safe under attribute perturbations. Moreover, this certificate is trainable: directly minimizing the certificates helps more nodes become safe. However, the work (Zügner and Günnemann, 2019b) only considers perturbations on node attributes. Analyzing certifiable robustness from a different perspective, the work (Bojchevski and Günnemann, 2019) deals with the case where the attacker only manipulates the graph structure. It derives robustness certificates (similar to Eq. (12)) as a linear function of personalized PageRank (Jeh and Widom, 2003), which makes the optimization tractable. Besides the works concentrating on GNN node classification, there are also works studying certifiable robustness for other GNN applications, such as community detection (Jia et al., 2020).
Both adversarial training or certifiable defense methods only target on resisting evasion attacks, which means that the attack happens during the test time. While, graph purification defense methods mainly focus on defending poisoning attacks. Since the poisoning attacks insert poisons into the training graph, purification methods first purify the perturbed graph data and then train the GNN model on the purified graph. By this way, the GNN model is trained on a clean graph. The work (Wu et al., 2019a) proposes a purification method based on two empirical observations of the attack methods: (1) Attackers usually prefer adding edges over removing edges or modifying features and (2) Attackers tend to connect dissimilar nodes. As a result, they propose a defense method by eliminating the edges whose two end nodes have small Jaccard Similarity (Said et al., ). Because these two nodes are different and it is not likely they are connected in reality, the edge between them may be adversarial. The experimental results demonstrate the effectiveness and efficiency of the proposed defense method. However, this method can only work when the node features are available. In (Entezari et al., 2020), it is observed that Nettack (Zügner et al., 2018)
generates perturbations that mainly change the small singular values of the graph adjacency matrix. Thus it proposes to purify the perturbed adjacency matrix by using truncated SVD to obtain its low-rank approximation. It further shows that keeping only the largest singular values of the adjacency matrix is able to defend against Nettack and improve the performance of GNNs. Different from the purification methods, which try to exclude adversarial perturbations, attention-based defense methods aim to train a robust GNN model by penalizing the model’s weights on adversarial edges or nodes. Basically, these methods learn an attention mechanism to distinguish adversarial edges and nodes from clean ones, so that the adversarial perturbations contribute less to the aggregation process of GNN training. The work (Zhu et al., 2019) first assumes that adversarial nodes may have high prediction uncertainty, since the adversary tends to connect the target node with nodes from other communities. In order to penalize the influence of these uncertain nodes, they propose to model the $\ell$-th layer hidden representation $\mathbf{h}_i^{(\ell)}$
of node $i$ as a Gaussian distribution with mean $\boldsymbol{\mu}_i^{(\ell)}$ and variance $\boldsymbol{\sigma}_i^{(\ell)}$,

$$\mathbf{h}_i^{(\ell)} \sim \mathcal{N}\big(\boldsymbol{\mu}_i^{(\ell)}, \operatorname{diag}(\boldsymbol{\sigma}_i^{(\ell)})\big), \qquad (13)$$
where the uncertainty is reflected in the variance $\boldsymbol{\sigma}_i^{(\ell)}$. When aggregating the information from neighbor nodes, it applies an attention mechanism to penalize the nodes with high variance,

$$\alpha_i^{(\ell)} = \exp\big(-\gamma\, \boldsymbol{\sigma}_i^{(\ell)}\big), \qquad (14)$$

where $\alpha_i^{(\ell)}$ is the attention score assigned to node $i$ and $\gamma$ is a hyperparameter. Furthermore, it is verified that the attacked nodes do have higher variances than normal nodes, and that the proposed attention mechanism does help mitigate the impact brought by adversarial attacks.
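The variance-based attention idea can be illustrated with a small sketch (our own code, not the RGCN implementation; representations are scalars here and the variance values are made up):

```python
import math

# Sketch of variance-aware attention: neighbors whose hidden representations
# have high variance (high uncertainty) get exponentially smaller weights.
GAMMA = 1.0  # hyperparameter controlling how hard to penalize variance

def attention_scores(variances, gamma=GAMMA):
    """alpha_i = exp(-gamma * sigma_i): high variance -> low weight."""
    return {node: math.exp(-gamma * var) for node, var in variances.items()}

def aggregate(neighbor_values, variances, gamma=GAMMA):
    """Attention-weighted mean of (scalar) neighbor representations."""
    alpha = attention_scores(variances, gamma)
    total = sum(alpha.values())
    return sum(alpha[n] * neighbor_values[n] for n in neighbor_values) / total

# A suspicious (high-variance) neighbor contributes little to the aggregate:
values = {'a': 1.0, 'b': 1.0, 'c': 10.0}    # 'c' is the outlier
variances = {'a': 0.1, 'b': 0.1, 'c': 5.0}  # 'c' is highly uncertain
print(aggregate(values, variances))  # close to 1.0, far below the plain mean 4.0
```

The plain mean of the three values is 4.0; the attention-weighted aggregate stays near 1.0 because the high-variance neighbor is almost ignored.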
The work in (Tang et al., 2020) suggests that to improve the robustness of one target GNN model, it is beneficial to include information from other clean graphs which share similar topological distributions and node attributes with the target graph. For example, Facebook and Twitter have social network graph data from similar domains; Yelp and Foursquare have similar co-review graph data. Thus, it first generates adversarial edges on the clean graphs, which serve as supervision of known perturbations. With this supervision knowledge, it further designs the following loss function to reduce the attention score of adversarial edges:
$$\mathcal{L}_{dist} = -\min\Big(\eta,\; \mathbb{E}_{e \in \mathcal{E} \setminus \mathcal{P}}\, a_e - \mathbb{E}_{e \in \mathcal{P}}\, a_e\Big), \qquad (15)$$

where $\mathbb{E}$ denotes the expectation, $\mathcal{E} \setminus \mathcal{P}$ represents the normal (unperturbed) edges in the graph, $\mathcal{P}$ the adversarial edges, $a_e$ is the attention score assigned to edge $e$, and $\eta$ is a hyperparameter controlling the margin between the two expectations. It then adopts meta-optimization to train a model initialization and fine-tunes it on the target poisoned graph to obtain a robust GNN model.
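A minimal sketch of this margin-style attention penalty (our own function and argument names, not the paper's code) makes the behavior concrete:

```python
# Sketch: push the mean attention on normal edges to exceed the mean
# attention on known adversarial edges by at least a margin eta.
# The loss bottoms out (at -eta) once the gap between the two
# expectations reaches eta, so it stops pushing beyond the margin.
def attention_margin_loss(normal_scores, adversarial_scores, eta=0.5):
    mean_normal = sum(normal_scores) / len(normal_scores)
    mean_adv = sum(adversarial_scores) / len(adversarial_scores)
    return -min(eta, mean_normal - mean_adv)

print(attention_margin_loss([0.9, 0.8], [0.2, 0.1]))  # gap 0.7 >= eta -> -0.5
print(attention_margin_loss([0.5, 0.5], [0.4, 0.6]))  # gap 0.0 -> loss 0
```

Minimizing this quantity during meta-training drives the attention mechanism to assign low scores to edges that look like the known perturbations.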
We have developed a repository that includes the majority of the representative attack and defense algorithms on graphs (https://github.com/DSE-MSU/DeepRobust/tree/master/deeprobust/graph). The repository enables us to deepen our understanding of graph attacks and defenses via empirical studies. Next, we first introduce the experimental settings and then present the empirical results and findings.
Different attack and defense methods have been designed under different settings. Due to the page limitation, we perform the experiments under one of the most popular settings – the untargeted poisoning setting. Correspondingly, we choose representative attack and defense methods that have been designed for this setting. Three representative attack methods are adopted to generate perturbations: DICE (Waniek et al., 2018), Metattack (Zügner and Günnemann, 2019a) and Topology attack (Xu et al., 2019b). It is noted that DICE is a white-box attack which randomly connects nodes with different labels or drops edges between nodes sharing the same label. To evaluate the performance of different defense methods under adversarial attacks, we compare the robustness of the naturally trained GCN (Kipf and Welling, 2016) and four defense methods on the attacked graphs, i.e., GCN-Jaccard (Wu et al., 2019a), GCN-SVD (Entezari et al., 2020), RGCN (Zhu et al., 2019) and GAT (Veličković et al., 2017). Following (Zügner and Günnemann, 2019a), we use three datasets: Cora, Citeseer (Sen et al., 2008) and Polblogs (Adamic and Glance, 2005). For each dataset, we randomly choose 10% of the nodes for training, 10% for validation and the remaining 80% for test. We repeat each experiment 5 times and report the average performance. On the Cora and Citeseer datasets, the most destructive variant CE-min-max (Xu et al., 2019b) is adopted to implement the Topology attack; since CE-min-max cannot converge on the Polblogs dataset, we adopt another variant called CE-PGD (Xu et al., 2019b) on that dataset.
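Since DICE is fully specified by the heuristic described above, a minimal sketch of it (our own implementation, not the one used in the experiments) clarifies how its perturbations are generated:

```python
import random

def dice_attack(edges, labels, n_perturbations, seed=0):
    """Sketch of the DICE heuristic ('Delete Internally, Connect Externally'):
    each perturbation either adds an edge between two differently labeled
    nodes or drops an edge whose endpoints share a label."""
    rng = random.Random(seed)
    edges = set(edges)
    nodes = range(len(labels))
    for _ in range(n_perturbations):
        if rng.random() < 0.5:
            # connect externally: add a cross-label edge (bounded retries)
            for _ in range(100):
                u, v = rng.sample(nodes, 2)
                if labels[u] != labels[v] and (u, v) not in edges and (v, u) not in edges:
                    edges.add((u, v))
                    break
        else:
            # delete internally: drop a same-label edge, if one exists
            internal = [e for e in edges if labels[e[0]] == labels[e[1]]]
            if internal:
                edges.remove(rng.choice(internal))
    return edges
```

Note that DICE needs the node labels (hence white-box) but no gradients, which is why it serves as a simple random baseline next to Metattack and the Topology attack.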
One way to understand the behaviors of attack methods is to compare the properties of the clean graph and the attacked graph. In this subsection, we perform this analysis from both global and local perspectives. Global Measure We have collected five global properties from both clean graphs and the perturbed graphs generated by the three attacks on the three datasets. These properties include the number of added edges, the number of deleted edges, the total number of edges, the rank of the adjacency matrix, and the clustering coefficient. We only show the results of Metattack in Table 3. Results for DICE and Topology attacks can be found in Appendix A. Note that we vary the perturbation rate from 0% to 25% with a step of 5%, where a perturbation rate of 0% denotes the original clean graph. It can be observed from the table:
Attackers favor adding edges over deleting edges.
Attacks are likely to increase the rank of the adjacency matrix.
Attacks are likely to reduce the connectivity of a graph. The clustering coefficients of a perturbed graph decrease with the increase of the perturbation rate.
Dataset    r (%)   edge+   edge-   #edges   rank   clustering coefficient
Cora         0        0       0     5069    2192   0.2376
             5      226      27     5268    2263   0.2228
            10      408      98     5380    2278   0.2132
            15      604     156     5518    2300   0.2071
            20      788     245     5633    2305   0.1983
            25      981     287     5763    2321   0.1943
Citeseer     0        0       0     3668    1778   0.1711
             5      181       2     3847    1850   0.1616
            10      341      25     3985    1874   0.1565
            15      485      65     4089    1890   0.1523
            20      614     119     4164    1902   0.1483
            25      743     174     4236    1888   0.1467
Polblogs     0        0       0    16714    1060   0.3203
             5      732     103    17343    1133   0.2719
            10     1347     324    17737    1170   0.2825
            15     1915     592    18038    1193   0.2851
            20     2304    1038    17980    1193   0.2877
            25     2500    1678    17536    1197   0.2723
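As an illustration, the edge count and average clustering coefficient reported above can be computed with a short sketch like the following (our own helper, assuming an undirected graph given as a set of edges; the adjacency-matrix rank would additionally require a linear-algebra routine such as numpy.linalg.matrix_rank):

```python
from itertools import combinations

def global_measures(edges, n_nodes):
    """Edge count and average local clustering coefficient of an
    undirected graph with nodes 0..n_nodes-1."""
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    coeffs = []
    for v in range(n_nodes):
        k = len(adj[v])
        if k < 2:
            coeffs.append(0.0)
            continue
        # count links among v's neighbors (closed triangles through v)
        links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return {'edges': len(edges), 'clustering': sum(coeffs) / n_nodes}

# Triangle 0-1-2 plus a pendant node 3 attached to node 2:
print(global_measures({(0, 1), (1, 2), (0, 2), (2, 3)}, 4))
```

Comparing these quantities before and after an attack reproduces the trend in the table: added cross-community edges open triangles and drive the clustering coefficient down.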
Local Measure We have also studied two local properties, feature similarity and label equality, between the two end nodes of three kinds of edges: the newly added edges, the deleted edges and the normal edges which have not been changed by the attack methods. Since features are binary in our datasets, we use the Jaccard similarity as the measure of feature similarity. For label equality, we report the ratio of edges whose two end nodes share the same label (or have different labels). The feature similarity and label equality results are demonstrated in Figures 2 and 3, respectively. We show the results for Metattack; results for DICE and Topology attacks can be found in Appendix B. Note that we do not have feature similarity results on Polblogs since this dataset does not have node features. We can make the following observations from the figures.
Attackers tend to connect nodes with different labels and dissimilar features.
Attackers tend to remove edges between nodes which share similar features and the same label.
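The two local measures can be sketched as follows (our own helper names; features are binary vectors as in the datasets above):

```python
def jaccard(f_u, f_v):
    """Jaccard similarity of two binary feature vectors."""
    inter = sum(1 for a, b in zip(f_u, f_v) if a and b)
    union = sum(1 for a, b in zip(f_u, f_v) if a or b)
    return inter / union if union else 0.0

def edge_set_stats(edges, features, labels):
    """Average feature similarity and label-equality ratio over an edge set
    (e.g., the added, deleted, or unchanged edges of an attacked graph)."""
    sims = [jaccard(features[u], features[v]) for u, v in edges]
    same = [labels[u] == labels[v] for u, v in edges]
    return sum(sims) / len(sims), sum(same) / len(same)

features = {0: [1, 1, 0], 1: [1, 0, 0], 2: [0, 0, 1]}  # toy binary features
labels = {0: 0, 1: 0, 2: 1}
print(edge_set_stats([(0, 1), (0, 2)], features, labels))
```

The same Jaccard similarity is the quantity that the GCN-Jaccard defense (Wu et al., 2019a) thresholds when pruning suspicious edges.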
In this subsection, we study how the attack methods perform and whether the defense methods can help resist the attacks. As before, we vary the perturbation rate from 0% to 25% with a step of 5%. The results are demonstrated in Table 4. We show the performance for Metattack; results for DICE and Topology attacks are shown in Appendix C. Note that we do not report the performance of the Jaccard defense model on Polblogs since this model requires node features and Polblogs does not provide them. According to the results, we have the following observations:
With the increase of the perturbation rate, the performance of GCN dramatically decreases. This result suggests that Metattack leads to a significant reduction in the accuracy of the GCN model.
When the perturbation rate is small, we observe only a small performance reduction for the defense methods, which suggests their effectiveness. However, when the graphs are heavily poisoned, their performance also degrades significantly, which indicates that more efforts are needed to defend against heavy poisoning attacks.
                         Perturbation rate r (%)
Dataset    Model           0      5     10     15     20     25
Cora       GCN           83.10  76.69  65.58  54.88  48.66  38.44
           GCN-Jaccard   82.39  81.02  77.28  72.74  69.16  64.56
           GCN-SVD       77.97  75.67  70.51  64.34  55.89  45.92
           RGCN          84.81  81.32  72.12  60.25  49.75  37.76
           GAT           81.69  74.75  61.69  52.56  45.30  38.52
Citeseer   GCN           74.53  72.59  63.96  61.66  50.58  44.32
           GCN-Jaccard   74.82  73.60  73.50  72.80  72.97  72.53
           GCN-SVD       70.32  71.30  67.58  63.86  56.91  45.28
           RGCN          74.41  72.68  71.15  69.38  67.93  67.24
           GAT           74.23  72.01  67.12  57.70  47.97  38.70
Polblogs   GCN           95.80  73.93  72.07  67.69  62.29  52.97
           GCN-SVD       94.99  82.64  71.27  66.09  61.37  52.82
           RGCN          95.60  72.01  67.12  57.70  47.97  38.70
           GAT           95.40  84.83  77.03  69.94  53.62  53.76
In this survey, we give a comprehensive overview of an emerging research field: adversarial attacks and defenses on graph data. We investigate the taxonomy of graph adversarial attacks, and review representative adversarial attacks and the corresponding countermeasures. Furthermore, we conduct an empirical study to show how different defense methods behave under different attacks, as well as how important graph properties are changed by the attacks. Via this comprehensive study, we have gained deep understandings of this area that enable us to discuss some promising research directions.
Imperceptible perturbation measure. Different from image data, for graphs humans cannot easily tell whether a perturbation is imperceptible or not. A norm constraint on the perturbation is definitely not enough. Currently, only very few existing works study this problem; thus, finding a concise perturbation evaluation measure is of great urgency.
Different graph data. Existing works mainly focus on static graphs with node attributes. Complex graphs, such as graphs with edge attributes and dynamic graphs, are not well-studied yet.
Existence and transferability of graph adversarial examples. There are only a few works discussing the existence and transferability of graph adversarial examples. Studying this topic is important for understanding graph learning algorithms, and thus helps us build robust models.
Graph structure learning. By analyzing the attacked graphs, we find that attacks are likely to change certain properties of graphs. Therefore, we can learn a clean graph structure from a poisoned graph by exploring these properties, so as to build robust GNNs.
Dataset    r (%)   edge+   edge-   #edges   rank   clustering coefficient
Cora         0        0       0     5069    2192   0.2376
             5      255       0     5324    2292   0.2308
            10      508       0     5577    2369   0.2185
            15      762       0     5831    2417   0.2029
            20     1015       0     6084    2442   0.1875
            25     1269       0     6338    2456   0.1736
Citeseer     0        0       0     3668    1778   0.1711
             5      185       0     3853    1914   0.1666
            10      368       0     4036    2003   0.1568
            15      552       0     4220    2058   0.1429
            20      735       0     4403    2077   0.1306
            25      918       0     4586    2087   0.1188
Polblogs     0        0       0    16714    1060   0.3203
             5      716      96    17334    1213   0.2659
            10     1532     128    18118    1220   0.2513
            15     2320     146    18887    1221   0.2408
            20     3149     155    19708    1221   0.2317
            25     3958     163    20509    1221   0.2238
Dataset    r (%)   edge+   edge-   #edges   rank   clustering coefficient
Cora         0        0       0     5069    2192   0.2376
             5      125     128     5066    2210   0.2163
            10      251     255     5065    2238   0.1966
            15      377     383     5063    2246   0.1786
            20      504     509     5063    2261   0.1583
            25      625     642     5053    2270   0.1448
Citeseer     0        0       0     3668    1778   0.1711
             5       91      92     3667    1803   0.1576
            10      183     183     3668    1828   0.1408
            15      276     274     3670    1840   0.1288
            20      368     365     3672    1860   0.1187
            25      462     455     3675    1871   0.1084
Polblogs     0        0       0    16714    1060   0.3203
             5      420     415    16719    1155   0.2822
            10      846     825    16736    1192   0.2487
            15     1273    1234    16752    1208   0.2224
            20     1690    1652    16752    1214   0.2009
            25     2114    2064    16765    1217   0.1821
                         Perturbation rate r (%)
Dataset    Model           0      5     10     15     20     25
Cora       GCN           83.10  82.20  81.15  80.54  79.40  77.78
           GCN-Jaccard   82.39  81.66  80.94  80.24  79.41  78.31
           GCN-SVD       77.97  76.55  74.35  72.71  59.77  70.41
           RGCN          84.81  83.87  82.72  81.64  80.77  79.53
           GAT           81.69  79.33  77.36  75.23  73.78  72.05
Citeseer   GCN           74.53  74.21  73.90  72.36  72.27  71.50
           GCN-Jaccard   74.82  74.56  74.14  73.51  73.22  72.22
           GCN-SVD       70.32  70.91  70.27  69.19  67.63  66.82
           RGCN          74.41  74.72  74.22  73.42  72.71  72.16
           GAT           74.23  73.78  72.86  71.48  70.25  69.68
Polblogs   GCN           95.80  92.78  90.78  90.12  88.28  87.79
           GCN-SVD       94.99  93.09  92.39  91.31  90.72  90.61
           RGCN          95.60  92.72  90.70  89.80  88.34  87.28
           GAT           95.40  93.56  91.82  91.27  89.65  89.30
                         Perturbation rate r (%)
Dataset    Model           0      5     10     15     20     25
Cora       GCN           83.10  71.82  68.96  66.77  64.21  62.52
           GCN-Jaccard   82.39  73.05  72.62  71.84  71.41  70.85
           GCN-SVD       77.97  78.17  75.92  73.69  72.03  70.11
           RGCN          84.81  72.68  71.15  69.38  67.92  67.23
           GAT           81.69  71.03  68.80  65.66  64.29  62.58
Citeseer   GCN           74.53  79.29  75.47  72.89  70.12  68.49
           GCN-Jaccard   74.82  79.07  76.76  74.29  71.87  69.55
           GCN-SVD       70.32  78.17  75.92  73.69  72.03  70.11
           RGCN          74.41  78.13  75.93  73.93  72.32  70.60
           GAT           74.23  77.52  74.09  71.90  69.62  66.99
Polblogs   GCN           95.80  72.04  65.87  63.35  61.06  58.49
           GCN-SVD       94.99  71.90  65.42  63.01  60.74  58.26
           RGCN          95.60  71.27  65.30  62.76  60.25  57.89
           GAT           95.40  72.56  65.97  63.35  60.94  58.77