Black-Box Adversarial Attacks on Graph Neural Networks with Limited Node Access

by   JiaQi Ma, et al.
University of Michigan

We study black-box attacks on graph neural networks (GNNs) under a novel and realistic constraint: attackers have access to only a subset of nodes in the network, and they can only attack a small number of them. A node selection step is essential under this setup. We demonstrate that the structural inductive biases of GNN models can be an effective source of information for this type of attack. Specifically, by exploiting the connection between the backward propagation of GNNs and random walks, we show that common gradient-based white-box attacks can be generalized to the black-box setting via the connection between the gradient and an importance score similar to PageRank. In practice, we find that attacks based on this importance score indeed increase the classification loss by a large margin, but they fail to significantly increase the mis-classification rate. Our theoretical and empirical analyses suggest that there is a discrepancy between the loss and the mis-classification rate, as the latter presents a diminishing-return pattern when the number of attacked nodes increases. Therefore, we propose a greedy procedure to correct the importance score that takes the diminishing-return pattern into account. Experimental results show that the proposed procedure can significantly increase the mis-classification rate of common GNNs on real-world data without access to model parameters or predictions.



1 Introduction

Graph neural networks (GNNs) (Wu et al., 2020), the family of deep learning models on graphs, have shown promising empirical performance on various applications of machine learning to graph data, such as recommender systems (Ying et al., 2018), social network analysis (Li et al., 2017), and drug discovery (Shi et al., 2020). Like other deep learning models, GNNs have also been shown to be vulnerable to adversarial attacks (Zügner et al., 2018), which has recently attracted increasing research interest (Jin et al., 2020). Indeed, adversarial attacks have been an efficient tool to analyze both the theoretical properties and the practical accountability of graph neural networks. As graph data have more complex structures than image or text data, researchers have come up with diverse adversarial attack setups. For example, there are different tasks (node classification and graph classification), assumptions about the attacker's knowledge (white-box, grey-box, and black-box), strategies (node feature modification and graph structure modification), and corresponding budgets or other constraints (norm of feature changes or number of edge changes).

Despite these research efforts, there is still a considerable gap between the existing attack setups and reality. It is unreasonable to assume that an attacker can alter the input of a large proportion of nodes, and even if there is a budget limit, it is unreasonable to assume that they can attack any node they wish. For example, in a real-world social network, attackers usually only have access to a few bot accounts, which are unlikely to be among the top nodes in the network; it is difficult for attackers to hack and alter the properties of celebrity accounts. Moreover, an attacker usually has limited knowledge about the underlying machine learning model used by the platform (e.g., they may roughly know what types of models are used but have no access to the model parameters or training labels). Motivated by this real-world attack scenario, in this paper we study a new type of black-box adversarial attack for node classification tasks, which is more restricted and more realistic, assuming that the attacker has no access to the model parameters or predictions. Our setup differs from existing work through a novel constraint on node access: attackers only have access to a subset of nodes in the graph, and they can only manipulate a small number of them.

The proposed black-box adversarial attack requires a two-step procedure: 1) selecting a small subset of nodes to attack under the limits of node access; 2) altering the node attributes or edges under a per-node budget. In this paper, we focus on the first step and study the node selection strategy. The key insight of the proposed strategy lies in the observation that, with no access to the GNN parameters or predictions, the strong structural inductive biases of the GNN models can be exploited as an effective information source for attacks. The structural inductive biases encoded by various neural architectures (e.g., the convolution kernel in convolutional neural networks) play important roles in the success of deep learning models. GNNs have even more explicit structural inductive biases due to the graph structure and their heavy weight-sharing design. Theoretical analyses have shown that the understanding of structural inductive biases could lead to better designs of GNN models (Xu et al., 2018b; Klicpera et al., 2018). From a new perspective, our work demonstrates that such structural inductive biases can turn into security concerns in a black-box attack, as the graph structure is usually exposed to the attackers.

Following this insight, we derive a node selection strategy with a formal analysis of the proposed black-box attack setup. By exploiting the connection between the backward propagation of GNNs and random walks, we first generalize the gradient norm in a white-box attack into a model-independent importance score similar to PageRank. In practice, attacking the nodes with high importance scores increases the classification loss significantly but does not have the same effect on the mis-classification rate. Our theoretical and empirical analyses suggest that this discrepancy is due to the diminishing-return effect of the mis-classification rate. We further propose a greedy correction procedure for calculating the importance scores. Experiments on three real-world benchmark datasets and popular GNN models show that the proposed attack strategy significantly outperforms baseline methods. We summarize our main contributions as follows:

  1. We propose a novel setup of black-box attacks for GNNs with a constraint of limited node access, which is by far the most restricted and realistic compared to existing work.

  2. We demonstrate that the structural inductive biases of GNNs can be exploited as an effective information source of black-box adversarial attacks.

  3. We analyze the discrepancy between classification loss and mis-classification rate and propose a practical greedy method of adversarial attacks for node classification tasks.

  4. We empirically verify the effectiveness of the proposed method on three benchmark datasets with popular GNN models.

2 Related Work

2.1 Adversarial Attack on GNNs

The study of adversarial attacks on graph neural networks has surged recently. A taxonomy of existing work has been summarized by Jin et al. (2020), and we give a brief introduction here. First, there are two types of machine learning tasks on graphs that are commonly studied, node-level classification and graph-level classification. We focus on the node-level classification in this paper. Next, there are a couple of choices of the attack form. For example, the attack can happen either during model training (poisoning) or during model testing (evasion); the attacker may aim to mislead the prediction on specific nodes (targeted attack) (Zügner et al., 2018) or damage the overall task performance (untargeted attack) (Zügner and Günnemann, 2019); the adversarial perturbation can be done by modifying node features, adding or deleting edges, or injecting new nodes (Sun et al., 2019). Our work belongs to untargeted evasion attacks. For the adversarial perturbation, most existing works of untargeted attacks apply global constraints on the proportion of node features or the number of edges to be altered. Our work sets a novel local constraint on node access, which is more realistic in practice: perturbation on top (e.g., celebrity) nodes is prohibited and only a small number of nodes can be perturbed. Finally, depending on the attacker’s knowledge about the GNN model, existing work can be split into three categories: white-box attacks (Xu et al., 2019; Chen et al., 2018; Wu et al., 2019) have access to full information about the model, including model parameters, input data, and labels; grey-box attacks (Zügner and Günnemann, 2019; Zügner et al., 2018; Sun et al., 2019) have partial information about the model and the exact setups vary in a range; in the most challenging setting, black-box attacks (Dai et al., 2018; Bojchevski and Günnemann, 2018; Chang et al., 2020) can only access the input data and sometimes the black-box predictions of the model. 
In this work, we consider an even stricter black-box attack setup, where model predictions are invisible to the attackers. As far as we know, the only existing works that conduct untargeted black-box attacks without access to model predictions are those by Bojchevski and Günnemann (2018) and Chang et al. (2020). However, both of them require access to the embeddings of nodes, which is also prohibited in our setup.

2.2 Structural Inductive Bias of GNNs

While having an extremely restricted black-box setup, we demonstrate that effective adversarial attacks are still possible due to the strong and explicit structural inductive biases of GNNs.

Structural inductive biases refer to the structures encoded by various neural architectures, such as the weight sharing mechanisms in convolution kernels of convolutional neural networks, or the gating mechanisms in recurrent neural networks. Such neural architectures have been recognized as a key factor for the success of deep learning models (Zoph and Le, 2016), which has (partially) motivated some recent developments in neural architecture search (Zoph and Le, 2016), Bayesian deep learning (Wilson, 2020), the Lottery Ticket Hypothesis (Frankle and Carbin, 2018), etc. The natural graph structure and the heavy weight sharing mechanism grant GNN models even more explicit structural inductive biases. Indeed, GNN models have been theoretically shown to behave similarly to Weisfeiler-Lehman tests (Morris et al., 2019; Xu et al., 2018a) or random walks (Xu et al., 2018b). On the positive side, such theoretical analyses have led to better GNN model designs (Xu et al., 2018b; Klicpera et al., 2018).

Our work instead studies the negative impact of the structural inductive biases in the context of adversarial attacks: when the graph structure is exposed to the attacker, such structural information can turn into the knowledge source for an attack. While most existing attack strategies utilize some structural properties of GNNs to some degree, they do so in a data-driven manner that requires querying the GNN model, e.g., learning to edit the graph via a trial-and-error interaction with the GNN model (Dai et al., 2018). We formally establish connections between the structural properties and attack strategies without any queries to the GNN model.

3 Principled Black-Box Attack Strategies with Limited Node Access

In this section, we derive principled strategies to attack GNNs under the novel black-box setup with limited node access. We first analyze the corresponding white-box attack problem in Section 3.2 and then adapt the theoretical insights from the white-box setup to the black-box setup and propose a black-box attack strategy in Section 3.3. Finally, in Section 3.4, we correct the proposed strategy by taking into account the diminishing-return effect of the mis-classification rate.

3.1 Preliminary Notations

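We first introduce necessary notations. We denote a graph as $G = (V, E)$, where $V = \{1, 2, \ldots, N\}$ is the set of $N$ nodes, and $E \subseteq V \times V$ is the set of edges. For a node classification problem, the nodes of the graph are collectively associated with node features $X \in \mathbb{R}^{N \times D}$ and labels $y \in \{1, 2, \ldots, K\}^N$, where $D$ is the dimensionality of the feature vectors and $K$ is the number of classes. Each node $i$'s local neighborhood including itself is denoted as $\mathcal{N}_i = \{j \in V \mid (i, j) \in E\} \cup \{i\}$, and its degree as $d_i = |\mathcal{N}_i|$. To ease the notation, for any matrix $A \in \mathbb{R}^{D_1 \times D_2}$ in this paper, we use $A_j$ to refer to the transpose of the $j$-th row of the matrix, i.e., $A_j \in \mathbb{R}^{D_2}$.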

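GNN models. Given the graph $G$, a GNN model is a function $f_G: \mathbb{R}^{N \times D} \to \mathbb{R}^{N \times K}$ that maps the node features $X$ to output logits of each node. We denote the output logits of all nodes as a matrix $H \in \mathbb{R}^{N \times K}$ and $H = f_G(X)$. A GNN is usually built by stacking a certain number ($L$) of layers, with the $l$-th layer, $1 \le l \le L$, taking the following form:

$$H^{(l)}_i = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W_l H^{(l-1)}_j\Big), \quad (1)$$

where $H^{(l)} \in \mathbb{R}^{N \times D_l}$ is the hidden representation of nodes with $D_l$ dimensions, output by the $l$-th layer; $W_l$ is a learnable linear transformation matrix; $\sigma$ is an element-wise nonlinear activation function; and different GNNs have different normalization terms $\alpha_{ij}$. For instance, $\alpha_{ij} = 1/\sqrt{d_i d_j}$ or $\alpha_{ij} = 1/d_i$ in Graph Convolutional Networks (GCN) (Kipf and Welling, 2016). In addition, $H^{(0)} = X$ and $H^{(L)} = H$.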
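
As a minimal sketch of the layer-wise propagation described above, the following pure-Python function applies a single layer with mean aggregation, i.e., it assumes the normalization term equals one over the neighborhood size and uses ReLU as the activation. The function name `gcn_layer` and the list-of-lists data layout are illustrative assumptions, not part of the original work.

```python
def gcn_layer(neighbors, H_prev, W, activation=lambda z: max(z, 0.0)):
    """Apply one mean-aggregation GNN layer.

    neighbors[i] lists the neighborhood of node i (assumed to include i itself),
    H_prev[j] is the previous-layer representation of node j, and
    W is an (out_dim x in_dim) weight matrix stored as a list of rows.
    """
    in_dim = len(W[0])
    H_next = []
    for nbrs in neighbors:
        # Average the neighbors' representations (normalization 1 / d_i).
        mean = [sum(H_prev[j][c] for j in nbrs) / len(nbrs) for c in range(in_dim)]
        # Linear transformation followed by the element-wise activation.
        H_next.append(
            [activation(sum(w_row[c] * mean[c] for c in range(in_dim))) for w_row in W]
        )
    return H_next


# Two connected nodes with one-dimensional features 1.0 and 3.0.
toy_output = gcn_layer([[0, 1], [0, 1]], [[1.0], [3.0]], [[2.0]])
```

In this toy call both nodes average the features 1.0 and 3.0 and then scale the mean by the single weight, so they end up with the same hidden representation.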

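Random walks. A random walk (Lovász et al., 1993) on $G$ is specified by the matrix of transition probabilities, $M \in \mathbb{R}^{N \times N}$, where

$$M_{ij} = \begin{cases} 1/d_i, & \text{if } (i, j) \in E \text{ or } j = i, \\ 0, & \text{otherwise.} \end{cases}$$

Each $M_{ij}$ represents the probability of transiting from $i$ to $j$ at any given step of the random walk. Raising the transition matrix to the power $t$ gives the $t$-step transition matrix $M^t$.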
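
The transition matrix and its powers can be sketched in a few lines of pure Python. This is an illustrative sketch only: it assumes an undirected graph given as 0-based edge pairs and a neighborhood that includes the node itself, and the helper names `transition_matrix` and `matrix_power` are hypothetical.

```python
def transition_matrix(num_nodes, edges):
    """Build the row-stochastic random-walk matrix with self-loops."""
    neighbors = [{i} for i in range(num_nodes)]  # each node is its own neighbor
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    M = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            M[i][j] = 1.0 / len(nbrs)  # uniform step over the neighborhood
    return M


def matrix_power(M, steps):
    """Return M raised to a non-negative integer power (the steps-step matrix)."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = [
            [sum(result[i][k] * M[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return result


# Path graph 0 - 1 - 2.
M = transition_matrix(3, [(0, 1), (1, 2)])
M2 = matrix_power(M, 2)
```

Each row of the resulting matrices sums to one, since every row is a probability distribution over the nodes reachable in the given number of steps.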

3.2 White-Box Adversarial Attacks with Limited Node Access

Problem formulation. Given a classification loss $\mathcal{L}: \mathbb{R}^{N \times K} \times \{1, \ldots, K\}^N \to \mathbb{R}$, the problem of white-box attack with limited node access can be formulated as an optimization problem as follows:

$$\max_{S \subseteq V} \; \mathcal{L}(H, y) \quad (2)$$

$$\text{subject to} \quad |S| \le r, \quad d_i \le m \;\; \forall i \in S, \quad H = f(\tau(X, S)),$$

where $r$ and $m$ respectively specify the maximum number of nodes and the maximum degree of nodes that can be attacked. Intuitively, we treat high-degree nodes as a proxy of celebrity accounts in a social network. For simplicity, we have omitted the subscript $G$ of the learned GNN classifier $f_G$. The function $\tau: \mathbb{R}^{N \times D} \times 2^V \to \mathbb{R}^{N \times D}$ perturbs the feature matrix $X$ based on the selected node set $S$ (i.e., attack set). Under the white-box setup, theoretically $\tau$ can also be optimized to maximize the loss. However, as our goal is to study the node selection strategy under the black-box setup, we set $\tau$ as a pre-determined function. In particular, we define the $j$-th row of the output of $\tau$ as $\tau(X, S)_j = X_j + \mathbb{1}[j \in S] \cdot \epsilon$, where $\epsilon \in \mathbb{R}^D$ is a small constant noise vector constructed from the attackers' domain knowledge about the features. In other words, the same small noise vector is added to the features of every attacked node.

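We use the Carlini-Wagner loss for our analysis, which is a close approximation of the cross-entropy loss and has been used in the analysis of adversarial attacks on image classifiers (Carlini and Wagner, 2017):

$$\mathcal{L}(H, y) = \sum_{j=1}^{N} \mathcal{L}_j(H_j, y_j), \qquad \mathcal{L}_j(H_j, y_j) = \max_{k \in \{1, \ldots, K\}} H_{jk} - H_{j y_j}.$$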
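
For intuition, a margin-style loss of this kind can be sketched as below. The sketch assumes the per-node term is the largest logit minus the logit of the true class, so a correctly classified node contributes zero; the function name and the 0-based class labels are illustrative assumptions.

```python
def carlini_wagner_loss(logits, labels):
    """Sum over nodes of (largest logit) - (logit of the true class).

    logits[j] is the list of class logits for node j and labels[j] is its
    true class index (0-based). A correctly classified node contributes 0.
    """
    return sum(max(row) - row[y] for row, y in zip(logits, labels))


# Node 0 is classified correctly; node 1 is mis-classified with margin 2.5.
toy_loss = carlini_wagner_loss([[2.0, 1.0], [0.5, 3.0]], [0, 0])
```

Because the loss only grows when some wrong class overtakes the true class by a larger margin, a large total loss can come from a few badly mis-classified nodes rather than from many mis-classified nodes.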


The change of loss under perturbation. Next we investigate how the overall loss changes when we select and perturb different nodes. We define the change of loss when perturbing the node $i$ as a function of the perturbed feature vector $x$:

$$\Delta_i(x) = \mathcal{L}(f(X'), y) - \mathcal{L}(f(X), y), \quad \text{where } X'_i = x \text{ and } X'_j = X_j \text{ for all } j \ne i.$$

To concretize the analysis, we consider the GCN model with $\alpha_{ij} = 1/d_i$ in our following derivations. Suppose $f$ is an $L$-layer GCN. With the connection between GCN and random walk (Xu et al., 2018b) and Assumption 1 on the label distribution, we can show that, in expectation, the first-order Taylor approximation $\tilde{\Delta}_i(x) = \Delta_i(X_i) + (\nabla_x \Delta_i(X_i))^T (x - X_i)$ is related to the sum of the $i$-th column of the $L$-step random walk transition matrix $M^L$. We formally summarize this finding in Proposition 1.

Assumption 1 (Label Distribution).

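Assume the distribution of the labels of all nodes follows the same constant categorical distribution, i.e.,

$$\Pr(y_j = k) = q_k, \quad \forall j \in V,$$

where $0 < q_k < 1$ for $k = 1, 2, \ldots, K$ and $\sum_{k=1}^{K} q_k = 1$. Moreover, since the classifier $f$ has been well-trained and fixed, the prediction of $f$ should capture certain relationships among the classes. Specifically, we assume the chance of $f$ predicting any node $j$ as any class $k$, conditioned on the node label $l$, follows a certain distribution $p(k \mid l)$, i.e.,

$$\Pr\big(\arg\max_{c} H_{jc} = k \,\big|\, y_j = l\big) = p(k \mid l).$$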

Proposition 1.

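For an $L$-layer GCN model, if Assumption 1 and a technical assumption about the GCN (an assumption made by Xu et al. (2018b), which we list as Assumption 5 in Appendix A.1) hold, then

$$\delta_i := \mathbb{E}\Big[\tilde{\Delta}_i(x)\Big|_{x = \tau(X, \{i\})_i}\Big] = C \sum_{j=1}^{N} [M^L]_{ji},$$

where $C$ is a constant independent of $i$.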

3.3 Adaptation from the White-Box Setup to the Black-Box Setup

Now we turn to the black-box setup where we have no access to the model parameters or predictions. This means we are no longer able to evaluate the objective function of the optimization problem (2). Proposition 1 shows that the relative ratio of the expected first-order loss change $\delta_i$ between different nodes only depends on the random walk transition matrix, which we can easily calculate based on the graph $G$. This implies that we can still approximately optimize the problem (2) in the black-box setup.

Node selection with importance scores. Consider the change of loss under the perturbation of a set of nodes $S$. If we write the change of loss as a function of the perturbed features and take the first-order Taylor expansion, which we denote as $\tilde{\Delta}_S$, we have $\mathbb{E}[\tilde{\Delta}_S] = \sum_{i \in S} \delta_i$, where $\delta_i$ is the expected first-order loss change from perturbing node $i$ alone. Therefore $\mathbb{E}[\tilde{\Delta}_S]$ is maximized by the set of $r$ nodes with degrees bounded by $m$ and the largest possible $\delta_i$, where $r$ and $m$ are the limits of node access defined in the problem (2). Therefore, we can define an importance score for each node $i$ as the sum of the $i$-th column of $M^L$, i.e., $I_i = \sum_{j=1}^{N} [M^L]_{ji}$, and simply select the nodes with the highest importance scores to attack. We denote this strategy as RWCS (Random Walk Column Sum). We note that RWCS is similar to PageRank. The difference is that PageRank uses the stationary transition matrix of a random walk with restart, whereas RWCS uses the $L$-step transition matrix.
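
The column-sum selection rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is undirected, neighborhoods include the node itself, the degree limit is treated as inclusive, and ties are broken by node index. The function names and parameters (`rwcs_scores`, `select_rwcs`, `max_nodes`, `max_degree`) are hypothetical. Instead of forming the full matrix power, the sketch propagates an all-ones row vector through the transition matrix, which yields the same column sums.

```python
def rwcs_scores(num_nodes, edges, num_steps):
    """Column sums of the num_steps-step random-walk transition matrix."""
    neighbors = [{i} for i in range(num_nodes)]  # neighborhoods include self
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    degree = [len(nbrs) for nbrs in neighbors]
    scores = [1.0] * num_nodes  # all-ones row vector
    for _ in range(num_steps):
        # One left-multiplication by M: new[j] = sum_i old[i] * M[i][j],
        # where M[i][j] = 1 / degree[i] whenever j is in the neighborhood of i.
        scores = [
            sum(scores[i] / degree[i] for i in neighbors[j])
            for j in range(num_nodes)
        ]
    return scores, degree


def select_rwcs(num_nodes, edges, num_steps, max_nodes, max_degree):
    """Pick the highest-scoring nodes among those within the degree limit."""
    scores, degree = rwcs_scores(num_nodes, edges, num_steps)
    allowed = [i for i in range(num_nodes) if degree[i] <= max_degree]
    ranked = sorted(allowed, key=lambda i: (-scores[i], i))
    return ranked[:max_nodes]


# Star graph: node 0 is the hub, nodes 1..4 are leaves.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
star_scores, star_degree = rwcs_scores(5, star, 1)
hub_allowed = select_rwcs(5, star, 1, max_nodes=1, max_degree=5)
hub_excluded = select_rwcs(5, star, 1, max_nodes=2, max_degree=2)
```

With a generous degree limit the hub is selected first; once the limit excludes the hub, the selection falls back to the leaves, which mirrors the constraint that top nodes are off-limits to the attacker.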

Empirically, RWCS indeed significantly increases the classification loss (as shown in Section 4.2). The nonlinear loss actually increases linearly w.r.t. the perturbation strength (the norm of the perturbation noise $\epsilon$) for a wide range, which indicates that the first-order Taylor approximation is a good approximation of the actual change of loss. Surprisingly, RWCS fails to continue to increase the mis-classification rate (which matters more in real applications) when the perturbation strength becomes larger. Details of this empirical finding are shown in Figure 1 in Section 4.2. We conduct additional formal analyses on the mis-classification rate in the following section and find a diminishing-return effect of adding more nodes to the attack set when the perturbation strength is adequate.

3.4 Diminishing-Return of Mis-classification Rate and its Correction

Analysis of the diminishing-return effect. Our analysis examines whether each target node becomes mis-classified as the attack set grows.

To assist the analysis, we first define the concepts of vulnerable function and vulnerable set below.

Definition 1 (Vulnerable Function).

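We define the vulnerable function $g_i: 2^V \to \{0, 1\}$ of a target node $i \in V$ as, for a given attack set $S \subseteq V$,

$$g_i(S) = \begin{cases} 1, & \text{if } i \text{ is mis-classified when attacking } S, \\ 0, & \text{otherwise.} \end{cases}$$

Definition 2 (Vulnerable Set).

We define the vulnerable set of a target node $i \in V$ as the set of all attack sets that could lead to $i$ being mis-classified:

$$A_i = \{S \subseteq V \mid g_i(S) = 1\}.$$

We also make the following assumption about the vulnerable function.

Assumption 2.

$g_i$ is non-decreasing for all $i \in V$, i.e., if $T \subseteq S \subseteq V$, then $g_i(T) \le g_i(S)$.

With the definitions above, the mis-classification rate $h(S)$ can be written as the average of the vulnerable functions: $h(S) = \frac{1}{N} \sum_{i=1}^{N} g_i(S)$. By Assumption 2, $h$ is also clearly non-decreasing.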

We further define the basic vulnerable set to characterize the minimal attack sets that can lead a target node to being mis-classified.

Definition 3 (Basic Vulnerable Set).

, we call a basic vulnerable set of if,

  1. ; if ;

  2. if , for any nonempty , there exists a s.t. ;

  3. for any distinct , .

The existence of such a basic vulnerable set is guaranteed by Proposition 2.

Proposition 2.

For any target node, there exists a unique basic vulnerable set.

The distribution of the sizes of the element sets of the basic vulnerable set is closely related to the perturbation strength on the features. When the perturbation is small, we may have to perturb multiple nodes before the target node is mis-classified, and thus the element sets of the basic vulnerable set will be large. When the perturbation is relatively large, we may be able to cause a target node to be mis-classified by perturbing a single, well-chosen node. In this case the basic vulnerable set will contain many singleton sets.

Our following analysis (Proposition 3) shows that the mis-classification rate has a diminishing-return effect if the vulnerable sets of nodes on the graph present homophily (Assumption 3), which is common in real-world networks, and the perturbation on features becomes considerably large (Assumption 4).

Assumption 3 (Homophily).

and , there are nodes s.t., for any node among these nodes, .

Intuitively, the vulnerable sets present strong homophily if ’s are large.

Assumption 4 (Considerable Perturbation).

and if , then there are nodes s.t., for any node among these nodes, there exists a set , , and . And .

Proposition 3.

If Assumptions 3 and 4 hold, is -approximately submodular for some , i.e., there exists a non-decreasing submodular function , s.t. ,

As greedy methods are guaranteed to enjoy a constant approximation ratio for such approximately submodular functions (Horel and Singer, 2016), Proposition 3 motivates us to develop a greedy correction procedure to compensate for the diminishing-return effect when calculating the importance scores.
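
The diminishing-return pattern can be illustrated with a toy example. The sketch below assumes that each target node comes with a hand-made list of minimal attack sets and is mis-classified as soon as the chosen attack set contains one of them; the data and the helper names are purely illustrative and are not derived from any real model.

```python
def vulnerable(minimal_sets, attack_set):
    """Return 1 if the attack set contains one of the target's minimal sets."""
    return int(any(basic <= attack_set for basic in minimal_sets))


def misclassification_rate(minimal_sets_per_node, attack_set):
    """Average of the per-node vulnerable indicators for a given attack set."""
    attack_set = frozenset(attack_set)
    flags = [vulnerable(sets, attack_set) for sets in minimal_sets_per_node]
    return sum(flags) / len(flags)


# Hypothetical minimal attack sets for four target nodes.
toy_sets = [
    [frozenset({0})],                  # target 0 falls if node 0 is attacked
    [frozenset({0}), frozenset({1})],  # target 1 falls if node 0 or node 1 is attacked
    [frozenset({1})],                  # target 2 falls if node 1 is attacked
    [frozenset({2, 3})],               # target 3 needs both node 2 and node 3
]

gain_alone = misclassification_rate(toy_sets, {1}) - misclassification_rate(toy_sets, set())
gain_after = misclassification_rate(toy_sets, {0, 1}) - misclassification_rate(toy_sets, {0})
```

Attacking node 1 on its own flips two targets, but once node 0 is already in the attack set, adding node 1 flips only one further target, because target 1 was already mis-classified. The marginal gain shrinks even though the loss on target 1 may keep growing.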

The greedy correction procedure. We propose an iterative node selection procedure and apply two greedy correction steps on top of the RWCS strategy, motivated by Assumptions 3 and 4.

To accommodate Assumption 3, after each node is selected into the attack set, we exclude a $k$-hop neighborhood of the selected node for the next iteration, for a given constant integer $k$. The intuition is that nodes in a local neighborhood may contribute to similar target nodes due to homophily. To accommodate Assumption 4, we adopt an adaptive version of RWCS scores. First, we binarize the $L$-step random walk transition matrix $M^L$ as $\tilde{M}$, i.e.,

$$[\tilde{M}]_{ij} = \begin{cases} 1, & \text{if } [M^L]_{ij} \text{ is among the top-}l \text{ entries of the } i\text{-th row of } M^L \text{ and } [M^L]_{ij} \ne 0, \\ 0, & \text{otherwise,} \end{cases} \quad (4)$$

where $l$ is a given constant integer. Next, we define a new adaptive influence score as a function of a matrix $Q$: $\tilde{I}_i(Q) = \sum_{j=1}^{N} [Q]_{ji}$. In the iterative node selection procedure, we initialize $Q$ as $\tilde{M}$ and repeatedly select the node with the highest score $\tilde{I}_i(Q)$. After each iteration, supposing we have selected the node $i$ in this iteration, we update $Q$ by setting to zero all the rows where the element in the $i$-th column is $1$. The underlying assumption of this operation is that adding $i$ to the selected set is likely to mis-classify all the target nodes corresponding to the aforementioned rows, which complies with Assumption 4. We name this iterative procedure the GC-RWCS (Greedily Corrected RWCS) strategy, and summarize it in Algorithm 1 in Appendix A.3.
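
The iterative procedure described above can be sketched as follows. This is not the authors' Algorithm 1 but a minimal reading of the text under several assumptions, each hypothetical: the graph is undirected with self-inclusive neighborhoods, the degree limit is inclusive, excluded neighbors stay excluded for all later iterations, and ties are broken by node index. The names `gc_rwcs`, `top_l`, and `hop_k` are illustrative.

```python
def gc_rwcs(num_nodes, edges, num_steps, max_nodes, max_degree, top_l, hop_k):
    """Greedy node selection on a binarized multi-step random-walk matrix."""
    n = num_nodes
    neighbors = [{i} for i in range(n)]  # neighborhoods include self
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Transition matrix M and its num_steps-th power P.
    M = [[1.0 / len(neighbors[i]) if j in neighbors[i] else 0.0 for j in range(n)]
         for i in range(n)]
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(num_steps):
        P = [[sum(P[i][k] * M[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    # Binarize: in each row keep the top_l largest nonzero entries.
    Q = [[0] * n for _ in range(n)]
    for i in range(n):
        nonzero = [j for j in range(n) if P[i][j] > 0.0]
        for j in sorted(nonzero, key=lambda c: (-P[i][c], c))[:top_l]:
            Q[i][j] = 1
    candidates = {i for i in range(n) if len(neighbors[i]) <= max_degree}
    selected = []
    while candidates and len(selected) < max_nodes:
        # Adaptive score: column sum of the current Q (ties go to the lower index).
        best = min(candidates, key=lambda i: (-sum(Q[j][i] for j in range(n)), i))
        selected.append(best)
        # Targets already covered by the selected node no longer count.
        for j in range(n):
            if Q[j][best] == 1:
                Q[j] = [0] * n
        # Exclude the hop_k-hop neighborhood of the selected node.
        excluded, frontier = {best}, {best}
        for _ in range(hop_k):
            frontier = {v for u in frontier for v in neighbors[u]} - excluded
            excluded |= frontier
        candidates -= excluded
    return selected


# Two hubs (0 and 3) joined by an edge, each with two leaves.
two_hubs = [(0, 1), (0, 2), (0, 3), (3, 4), (3, 5)]
with_exclusion = gc_rwcs(6, two_hubs, 1, 2, 10, top_l=10, hop_k=1)
without_exclusion = gc_rwcs(6, two_hubs, 1, 2, 10, top_l=10, hop_k=0)
```

In this toy graph the two hubs have the highest column sums. Without neighborhood exclusion the procedure picks both hubs; with a one-hop exclusion the second hub is removed from the candidates after the first is chosen, so the second pick moves to a leaf on the other side of the graph.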

Finally, we want to mention that, while the derivation of RWCS and GC-RWCS requires knowledge of the number of layers $L$ of the GCN, we find that the empirical performance of the proposed attack strategies is not sensitive to the choice of $L$. Therefore, the proposed methods are applicable to the black-box setup where we do not know the exact number of layers of the model.

4 Experiments

4.1 Experiment Setup

GNN models. We evaluate the proposed attack strategies on two common GNN models, GCN (Kipf and Welling, 2016) and JK-Net (Xu et al., 2018b). For JK-Net, we test its two variants, JKNetConcat and JKNetMaxpool, which apply concatenation and element-wise max at the last layer respectively. We set the number of layers for GCN as 2 and the number of layers for both JKNetConcat and JKNetMaxpool as 7. The hidden size of each layer is 32. For the training, we closely follow the hyper-parameter setup in Xu et al. (2018b).

Datasets. We adopt three citation networks, Citeseer, Cora, and Pubmed, which are standard node classification benchmark datasets (Yang et al., 2016). Following the setup of JK-Net (Xu et al., 2018b), we randomly split each dataset by , , and for training, validation, and testing. And we draw 40 random splits.

Baseline methods for comparison. As we summarized in Section 2.1, our proposed black-box adversarial attack setup is by far the most restricted, and none of the existing attack strategies for GNNs can be applied. We compare the proposed attack strategies with baseline strategies that select nodes with top centrality metrics. We compare with three well-known network metrics capturing different aspects of node centrality: Degree, Betweenness, and PageRank, and name the attack strategies correspondingly. In the classical network analysis literature (Newman, 2018), real-world networks are shown to be fragile under attacks on high-centrality nodes. Therefore we believe these centrality metrics serve as reasonable baselines under our restricted black-box setup. As a sanity check, we also include a trivial baseline, Random, which randomly selects the nodes to be attacked.

Hyper-parameters for GC-RWCS. For the proposed GC-RWCS strategy, we fix the number of random walk steps, the neighbor-hop parameter, and the top-entry parameter for the binarized matrix in Eq. (4) to the same values for all models on all datasets. Note that the number of steps is different from the number of layers of both GCN and JK-Nets in our experiments, yet we still achieve effective attack performance. We also conduct a sensitivity analysis in Appendix A.5 and demonstrate that the proposed method is not sensitive to the number of steps.

Nuisance parameters of the attack procedure. For each dataset, we fix the limit on the number of nodes to attack, $r$, as a fixed percentage of the graph size. After the node selection step, we also need to specify how to perturb the node features, i.e., the design of $\epsilon$ in function $\tau$ in the optimization problem (2). In a real-world scenario, $\epsilon$ should be designed with domain knowledge about the classification task, without access to the GNN models. In our experiments, we have to simulate the domain knowledge due to the lack of semantic meaning of each individual feature in the benchmark datasets. Formally, we construct the constant perturbation $\epsilon \in \mathbb{R}^D$ as follows, for $j = 1, 2, \ldots, D$,

$$\epsilon_j = \begin{cases} \lambda \cdot \operatorname{sign}\Big(\sum_{i=1}^{N} \frac{\partial \mathcal{L}(H, y)}{\partial X_{ij}}\Big), & \text{if } j \in \operatorname{arg\,top}\text{-}J\Big(\Big[\Big|\sum_{i=1}^{N} \frac{\partial \mathcal{L}(H, y)}{\partial X_{il}}\Big|\Big]_{l = 1, 2, \ldots, D}\Big), \\ 0, & \text{otherwise,} \end{cases} \quad (5)$$

where $\lambda$ is the magnitude of modification and $J$ is the number of perturbed features, which we fix for all datasets. While gradients of the model are involved, we emphasize that we only use extremely limited information from the gradients: determining a small number of important features and the binary direction to perturb for each selected feature, only at the global level by averaging gradients on all nodes. We believe such coarse information is usually available from domain knowledge about the classification task. The perturbation magnitude for each feature is fixed as a constant and is independent of the model. In addition, the same perturbation vector is added to the features of all the selected nodes. The construction of the perturbation is totally independent of the selected nodes.
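
The construction of the constant perturbation can be sketched as below. The sketch assumes that the only input is a per-feature gradient summed over all nodes, from which a few features with the largest absolute value are selected and shifted by a fixed magnitude in the direction of the sign. The function and parameter names are hypothetical, and the toy gradient values are made up for illustration.

```python
def build_perturbation(gradient_sums, num_selected_features, magnitude):
    """Keep the features with the largest |gradient sum| and perturb them by
    +/- magnitude according to the sign of the gradient sum; leave the rest at 0."""
    ranked = sorted(range(len(gradient_sums)),
                    key=lambda j: (-abs(gradient_sums[j]), j))
    epsilon = [0.0] * len(gradient_sums)
    for j in ranked[:num_selected_features]:
        if gradient_sums[j] > 0:
            epsilon[j] = magnitude
        elif gradient_sums[j] < 0:
            epsilon[j] = -magnitude
    return epsilon


def perturb_features(features, attack_set, epsilon):
    """Add the same perturbation vector to every attacked node's features."""
    return [
        [x + e for x, e in zip(row, epsilon)] if i in attack_set else list(row)
        for i, row in enumerate(features)
    ]


toy_epsilon = build_perturbation([0.1, -2.0, 0.0, 1.5], 2, 10.0)
toy_features = perturb_features(
    [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]], {1}, toy_epsilon
)
```

Note that the perturbation vector is built once and does not depend on which nodes are later selected; only `perturb_features` uses the attack set.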

4.2 Experiment Results

Verifying the discrepancy between the loss and the mis-classification rate. We first provide empirical evidence for the discrepancy between classification loss (cross-entropy) and mis-classification rate. We compare the RWCS strategy to baseline strategies with varying perturbation strength, as measured by the magnitude $\lambda$ in Eq. (5). The results shown in Figure 1 are obtained by attacking GCN on Citeseer. First, we observe that RWCS increases the classification loss almost linearly as $\lambda$ increases, indicating that our approximation of the loss by a first-order Taylor expansion works well in practice. Not surprisingly, RWCS performs very similarly to PageRank. RWCS also performs much better than the other centrality metrics in increasing the classification loss, showing the effectiveness of Proposition 1. However, the decrease of classification accuracy when attacked by RWCS (and PageRank) quickly saturates as $\lambda$ increases. The GC-RWCS strategy, which is proposed to correct the importance scores, decreases the classification accuracy the most as $\lambda$ becomes larger, although it increases the classification loss the least.

Figure 1: Experiments of attacking GCN on Citeseer with increasing perturbation strength: (a) loss on test set; (b) accuracy on test set. Results are averaged over 40 random trials and error bars indicate the standard error of the mean.

Full experiment results. We then provide the full experiment results of attacking GCN, JKNetConcat, and JKNetMaxpool on all three datasets in Table 1. The perturbation strength is set as . The thresholds and indicate that we set the limit on the maximum degree as the lowest degree of the top and nodes respectively.

The results clearly demonstrate the effectiveness of the proposed GC-RWCS strategy. GC-RWCS achieves the best attack performance in almost all experiment settings, and the difference from the second-best strategy is significant in almost all cases. It is also worth noting that the proposed GC-RWCS strategy is able to decrease the node classification accuracy by up to 33.5 percentage points (JKNetConcat on Cora, from 86.2 to 52.7 in Table 1), and GC-RWCS achieves a larger decrease of the accuracy than the Random baseline in most cases (see Table 4 in Appendix A.5). This is achieved by merely adding the same constant perturbation vector to the features of a small fraction of the nodes in the graph. This verifies that the explicit structural inductive biases of GNN models make them vulnerable even in the extremely restricted black-box attack setup.

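| Threshold | Method | Cora GCN | Cora JKNetConcat | Cora JKNetMaxpool | Citeseer GCN | Citeseer JKNetConcat | Citeseer JKNetMaxpool | Pubmed GCN | Pubmed JKNetConcat | Pubmed JKNetMaxpool |
|---|---|---|---|---|---|---|---|---|---|---|
| - | None | 85.6 ± 0.3 | 86.2 ± 0.2 | 85.8 ± 0.3 | 75.1 ± 0.2 | 72.9 ± 0.3 | 73.2 ± 0.3 | 85.7 ± 0.1 | 85.8 ± 0.1 | 85.7 ± 0.1 |
| First | Random | 81.3 ± 0.3 | 68.8 ± 0.8 | 68.8 ± 1.3 | 71.3 ± 0.3 | 60.8 ± 0.8 | 61.7 ± 0.9 | 82.0 ± 0.3 | 75.9 ± 0.7 | 75.4 ± 0.7 |
| First | Degree | 78.2 ± 0.4 | 60.7 ± 1.0 | 59.9 ± 1.5 | 67.5 ± 0.4 | 52.5 ± 0.8 | 53.7 ± 1.0 | 78.9 ± 0.5 | 63.4 ± 1.0 | 63.3 ± 1.2 |
| First | Pagerank | 79.4 ± 0.4 | 71.6 ± 0.6 | 70.0 ± 1.0 | 70.1 ± 0.3 | 61.5 ± 0.5 | 62.6 ± 0.6 | 80.3 ± 0.3 | 71.3 ± 0.8 | 71.2 ± 0.8 |
| First | Betweenness | 79.7 ± 0.4 | 60.5 ± 0.9 | 60.3 ± 1.6 | 68.9 ± 0.3 | 53.5 ± 0.8 | 55.1 ± 1.0 | 78.5 ± 0.6 | 67.1 ± 1.1 | 66.2 ± 1.1 |
| First | RWCS | 79.5 ± 0.3 | 71.2 ± 0.5 | 69.9 ± 1.0 | 69.9 ± 0.3 | 60.8 ± 0.6 | 62.2 ± 0.7 | 79.8 ± 0.3 | 70.7 ± 0.8 | 70.7 ± 0.8 |
| First | GC-RWCS | 78.5 ± 0.5 | 52.7 ± 1.0* | 53.3 ± 1.9* | 65.1 ± 0.5* | 46.6 ± 0.8* | 48.2 ± 1.1* | 77.3 ± 0.7 | 62.1 ± 1.2 | 60.6 ± 1.4* |
| Second | Random | 82.6 ± 0.4 | 70.7 ± 1.1 | 71.8 ± 1.1 | 72.6 ± 0.3 | 62.7 ± 0.8 | 63.9 ± 0.8 | 82.6 ± 0.2 | 77.3 ± 0.4 | 77.4 ± 0.5 |
| Second | Degree | 80.7 ± 0.4 | 64.9 ± 1.4 | 67.0 ± 1.5 | 70.4 ± 0.4 | 56.9 ± 0.8 | 58.7 ± 0.9 | 81.5 ± 0.4 | 72.4 ± 0.7 | 72.3 ± 0.7 |
| Second | Pagerank | 82.6 ± 0.3 | 79.6 ± 0.4 | 79.7 ± 0.4 | 72.9 ± 0.2 | 70.2 ± 0.3 | 70.3 ± 0.3 | 83.0 ± 0.2 | 79.3 ± 0.3 | 79.6 ± 0.3 |
| Second | Betweenness | 81.8 ± 0.4 | 64.1 ± 1.3 | 65.9 ± 1.4 | 70.7 ± 0.3 | 56.3 ± 0.8 | 58.3 ± 0.9 | 81.3 ± 0.3 | 74.1 ± 0.5 | 74.6 ± 0.5 |
| Second | RWCS | 82.8 ± 0.3 | 79.3 ± 0.5 | 79.5 ± 0.4 | 72.9 ± 0.2 | 69.8 ± 0.3 | 70.1 ± 0.3 | 82.1 ± 0.2 | 77.8 ± 0.3 | 78.4 ± 0.3 |
| Second | GC-RWCS | 80.7 ± 0.5 | 59.1 ± 1.6* | 61.1 ± 1.6* | 67.8 ± 0.5* | 49.0 ± 0.9* | 50.7 ± 1.1* | 80.3 ± 0.5* | 69.2 ± 0.7* | 70.0 ± 0.7* |

Table 1: Summary of the attack performance. The lower the accuracy (in %), the better the attack; the best performance in each column and threshold block is the lowest accuracy. The asterisk (*) means the difference between the best strategy and the second-best strategy is statistically significant by a t-test at significance level 0.05. The error bar (±) denotes the standard error of the mean over 40 independent trials. The "First" and "Second" blocks correspond to the two degree thresholds, in the order they are introduced in the text.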

5 Conclusion

In this paper, we propose a novel black-box adversarial attack setup for GNN models with the constraint of limited node access, which we believe is by far the most restricted and realistic black-box attack setup. Nonetheless, through both theoretical analyses and empirical experiments, we demonstrate that the strong and explicit structural inductive biases of GNN models still make them vulnerable to this type of adversarial attack. We also propose a principled attack strategy, GC-RWCS, based on our theoretical analyses of the connection between the GCN model and random walks, which corrects for the diminishing-return effect of the mis-classification rate. Our experimental results show that the proposed strategy significantly outperforms competing attack strategies under the same setup.

Broader Impact

On the potential positive side, we anticipate that this work may raise public attention to the security and accountability issues of graph-based machine learning techniques, especially when they are applied to real-world social networks. Even without access to any information about the model training, the graph structure alone can be exploited to damage a deep learning framework with a readily executable strategy.

On the potential negative side, as our work demonstrates that existing GNN models can be attacked effectively with no knowledge beyond the graph structure, this may raise a serious alert for technology companies that maintain platforms and operate various applications based on graphs. However, we believe that making this security concern transparent can help practitioners detect potential attacks of this form and better defend machine-learning-driven applications.


Appendix A

A.1 Proof of Proposition 1

We first remind the reader of some notation: a GCN model is denoted as a function , the feature matrix is , and the output logits . The -step random walk transition matrix is . More details can be found in Section 3.1.

We give in Lemma 1 the connection between GCN models and random walks. Lemma 1 relies on a technical assumption about the GCN model (Assumption 5) and the proof can be found in Xu et al. (2018b).

Assumption 5 (Xu et al. (2018b)).

All paths in the computation graph of the given GCN model are independently activated with the same probability of success .

Lemma 1.

(Xu et al. (2018b).) Given an -layer GCN with averaging as in Eq. 1, assume that all paths in the computation graph of the model are activated with the same probability of success (Assumption 5). Then, for any node ,


where is the learnable parameter at the -th layer.

Then we are able to prove Proposition 1 below.


First, we derive the gradient of the loss w.r.t. the feature of node ,


where is the th row of but being transposed as column vectors and is the true label of node . Note that , and .

Next, we plug Eq. 7 into . For simplicity, we write as in the rest of the proof.


Denote . From the definition of loss

we have

for . Under Assumption 1, the expectation of each element of is

which is a constant independent of and . Therefore, we can write

where is a constant vector independent of .

Taking the expectation of Eq. (8) and plugging in the result of Lemma 1,

where is a constant scalar independent of . ∎
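The proof ties the expected gradient for a node to the random-walk transition matrix of Lemma 1. As a hedged illustration, assuming the resulting importance score of a node is the corresponding column sum of the multi-step transition matrix (the reading suggested by the RWCS baseline name), and assuming self-loops plus row normalisation for the averaging step, a small pure-Python sketch is shown below. The function name and the toy graph are hypothetical.

```python
def random_walk_column_sums(adjacency, num_steps):
    """Column sums of the num_steps-step random-walk transition matrix.

    adjacency: square list of lists with 0/1 entries. Self-loops are added
    here (an assumption of this sketch) so every row has positive degree.
    """
    n = len(adjacency)
    with_loops = [
        [1.0 if (i == j or adjacency[i][j]) else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Row-normalise: one random-walk step, i.e. neighbourhood averaging.
    step = [[value / sum(row) for value in row] for row in with_loops]
    # Start from the identity and multiply by the one-step matrix num_steps times.
    walk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(num_steps):
        walk = [
            [sum(walk[i][k] * step[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    # Score of node j: total probability mass arriving at j from all start nodes.
    return [sum(walk[i][j] for i in range(n)) for j in range(n)]


# Star graph: node 0 is the hub, nodes 1..3 are leaves.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = random_walk_column_sums(star, num_steps=2)
print(scores)
```

Because every row of the transition matrix sums to one, the scores always sum to the number of nodes; the hub of the star receives the largest share.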

A.2 Proofs for Propositions in Section 3.4

Proof of Proposition 2.


If , so . The three conditions of Definition 3 are also trivially true. Below we investigate the case .

Existence follows from a constructive proof. We check the nonempty elements in one by one, in any order. If an element is a superset of any other element in , we skip it. Otherwise, we put it into . We then verify that the resulting is a basic vulnerable set for . For condition 1), clearly, and if , all nonempty elements in are skipped so . For condition 2), given , for any nonempty , if , the condition holds. If , by construction, there exists a nonempty strict subset and . If , the condition holds. If , we can similarly find a nonempty strict subset and . Recursively, we can get a series . As is finite, we will eventually reach a set that no longer has a strict subset, so . Therefore the condition holds. Condition 3) means that no set in is a subset of another set in . This condition holds by construction.

Now we prove the uniqueness. Suppose there are two distinct basic vulnerable sets . Without loss of generality, we assume but . so . Further , hence . As , , and satisfies condition 2), there will be a nonempty s.t. . If , then condition 3) is violated for . If , there will be a nonempty s.t. . But also violates condition 3). By contradiction we prove the uniqueness. ∎
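The constructive step in the existence argument amounts to keeping the minimal nonempty members of a family of sets. The following minimal sketch illustrates it; the function name and the example family are hypothetical, and each vulnerable set is represented as a plain collection of Python sets.

```python
def basic_vulnerable_set(vulnerable_sets):
    """Keep the nonempty sets that have no nonempty strict subset in the family."""
    family = {frozenset(s) for s in vulnerable_sets if s}  # nonempty, de-duplicated
    basic = set()
    for candidate in family:  # the order of inspection does not matter
        # Skip the candidate if it is a strict superset of another member.
        if any(other < candidate for other in family):
            continue
        basic.add(candidate)
    return basic


family = [{1}, {1, 2}, {2, 3}, {1, 2, 3}, {3, 4}]
basic = basic_vulnerable_set(family)

# Condition 2): every nonempty member of the family contains some basic set.
covers = all(any(b <= frozenset(s) for b in basic) for s in family if s)
# Condition 3): no basic set is a strict subset of another basic set.
antichain = not any(a < b for a in basic for b in basic)
print(sorted(sorted(s) for s in basic), covers, antichain)
```

The result does not depend on the order in which the members are inspected, which matches the uniqueness claim.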

In order to prove Proposition 3, we first would like to construct a submodular function that is close to , with the help of Lemma 2 below.

Lemma 2.

If , is either empty or only contains singleton sets, then is submodular.


We first prove the case when .

First, we show that , if , for any nonempty if and only if or . On one hand, if , then . If . If , by condition 2) of the basic vulnerable set, . On the other hand, if , , by Assumption 2, , so . If , as , if , the condition 2) of Definition 3 will be violated. Therefore so . Still by Assumption 2, , so .

Define a function s.t. for any node ,

Given is either empty or only contains singleton sets for any , for any nonempty


is a constant independent of . Therefore, maximizing over with is equivalent to maximizing over with , which is a maximum coverage problem. Therefore is submodular.

The case of allowing some nodes to have empty vulnerable sets can be easily proved by removing such nodes in Eq. (9), as their corresponding vulnerable functions are always equal to zero. ∎
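The reduction to maximum coverage can be made concrete with a toy instance. In the sketch below, which is only an illustration under stated assumptions, each target node is mapped to the set of single nodes whose perturbation alone would flip it (the singleton case of Lemma 2); the dictionary, the function names, and the greedy selection rule are hypothetical choices rather than the authors' procedure.

```python
def covered_count(selected, singleton_attackers):
    """Number of target nodes hit by at least one selected node."""
    return sum(1 for attackers in singleton_attackers.values() if attackers & selected)


def greedy_max_coverage(candidates, singleton_attackers, budget):
    """Pick up to `budget` nodes, each time adding the largest marginal gain."""
    selected = set()
    for _ in range(budget):
        remaining = sorted(candidates - selected)
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda node: covered_count(selected | {node}, singleton_attackers),
        )
        selected.add(best)
    return selected


# Target node -> nodes that can each flip it on their own (empty set: never flipped).
singleton_attackers = {"a": {1}, "b": {1, 2}, "c": {2}, "d": {3}, "e": set()}

chosen = greedy_max_coverage({1, 2, 3}, singleton_attackers, budget=2)

# Diminishing returns: adding node 2 helps less once node 1 is already selected.
gain_on_empty = covered_count({2}, singleton_attackers) - covered_count(set(), singleton_attackers)
gain_after_one = covered_count({1, 2}, singleton_attackers) - covered_count({1}, singleton_attackers)
print(chosen, gain_on_empty, gain_after_one)
```

The shrinking marginal gain is the submodularity property that the lemma establishes for this special case.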

Proof of Proposition 3. For simplicity, we assume for any . The proof below can be easily adapted to the general case without this assumption, by removing the nodes with empty vulnerable sets as in the proof of Lemma 2.


, define . We can then define a new group of vulnerable sets on for . Let

Then it is clear that is a valid basic vulnerable set corresponding to , for . If we define as

we can easily verify that is a valid vulnerable function corresponding to , for . Further let as

By Lemma 2, as is either empty or only contains singleton sets, we know is submodular.

Next we investigate the difference between and . First, for any , if , clearly ; if , it’s easy to show . Second, for any and , by Assumption 3, there are exactly (omitting the in ) nodes whose vulnerable set contains . Without loss of generality, let us assume the indexes of nodes are . Then, for any node , . For node , , and

By Assumption 4, there are at least (omitting the in ) nodes like s.t. . Therefore, and . Hence . ∎

A.3 Algorithm Details of GC-RWCS

We summarize the GC-RWCS strategy in Algorithm 1.

Input: number of nodes limit ; maximum degree limit ; neighbor hops ; binarized transition matrix ; the adaptive influence score function .
Output: the set to be attacked.
1 Initialize the candidate set , and the score matrix ;
2 Initialize ;
3 for  do
4       ;
5       ;
6       ;
7       ;
8       for  do
9             if  is  then
10                   ;
14return ;
Algorithm 1 The GC-RWCS Strategy for Node Selection.
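The listing above names its inputs (a node budget, a maximum degree, a neighbor-hop radius, a binarized transition matrix, and an adaptive influence score) and its loop structure, but the individual statements are not legible here. The following Python sketch is therefore one possible reading, not the authors' exact procedure. It assumes that the adaptive score of a node is the column sum of the current score matrix, that rows already reached by a selected node are zeroed out, and that nodes within the hop radius of a selected node leave the candidate set. All identifiers are hypothetical.

```python
from collections import deque


def k_hop_neighbors(adj_list, source, hops):
    """Nodes within `hops` edges of `source` (including itself), found by BFS."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue
        for nxt in adj_list[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def gc_rwcs_select(adj_list, binarized, budget, max_degree, hops):
    """Greedy node selection under the assumptions stated in the lead-in."""
    n = len(adj_list)
    candidates = {i for i in range(n) if len(adj_list[i]) <= max_degree}
    score = [row[:] for row in binarized]  # working copy of the score matrix
    selected = []
    for _ in range(budget):
        if not candidates:
            break
        # Assumed adaptive score: column sum of the current score matrix.
        best = max(
            sorted(candidates),
            key=lambda i: sum(score[j][i] for j in range(n)),
        )
        selected.append(best)
        # Drop the chosen node and its close neighbors from the candidates.
        candidates -= k_hop_neighbors(adj_list, best, hops)
        # Rows already reached by `best` stop contributing to later scores.
        for j in range(n):
            if score[j][best] == 1:
                score[j] = [0] * n
    return selected


# Path graph 0-1-2-3-4 with a one-step binarized matrix that includes self-loops.
path = [[1], [0, 2], [1, 3], [2, 4], [3]]
binarized = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
selected = gc_rwcs_select(path, binarized, budget=2, max_degree=2, hops=1)
print(selected)
```

Zeroing the reached rows is what encodes the diminishing-return correction in this reading: a node that is already affected by one selected node no longer rewards selecting another node that also reaches it.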

A.4 Additional Experiment Details

Datasets. We adopt the Deep Graph Library (Wang et al., 2019) version of Cora, Citeseer, and Pubmed in our experiments. Summary statistics of the datasets are given in Table 2. The number of edges does not include self-loops.

Dataset Nodes Edges Classes Features
Citeseer 3,327 4,552 6 3,703
Cora 2,708 5,278 7 1,433
Pubmed 19,717 44,324 3 500
Table 2: Summary statistics of datasets.

A.5 Additional Experiment Results

In this section, we provide results for more experiment setups and conduct a sensitivity analysis of the hyper-parameter L in GC-RWCS in Table 3. We provide a setup with a 20% threshold in addition to the 10% and 30% thresholds shown in Section 4.2, to give a better resolution of the results. The results for the 20% threshold are consistent with the other setups. We also show the results of GC-RWCS with L ∈ {3, 4, 5, 6, 7}. Note that GCN has 2 layers and the JK-Nets have 7 layers. The GC-RWCS results typically vary by only a few percentage points over the provided range of L, indicating that the proposed GC-RWCS strategy does not rely on exact knowledge of the number of layers in the GNN models to be effective.

Cora Citeseer Pubmed
Method GCN JKNetConcat JKNetMaxpool GCN JKNetConcat JKNetMaxpool GCN JKNetConcat JKNetMaxpool
None 85.6 ± 0.3 86.2 ± 0.2 85.8 ± 0.3 75.1 ± 0.2 72.9 ± 0.3 73.2 ± 0.3 85.7 ± 0.1 85.8 ± 0.1 85.7 ± 0.1
Threshold 10%
Random 81.3 ± 0.3 68.8 ± 0.8 68.8 ± 1.3 71.3 ± 0.3 60.8 ± 0.8 61.7 ± 0.9 82.0 ± 0.3 75.9 ± 0.7 75.2 ± 0.7
Degree 78.2 ± 0.4 60.7 ± 1.0 59.9 ± 1.5 67.5 ± 0.4 52.5 ± 0.8 53.7 ± 1.0 78.9 ± 0.5 63.4 ± 1.0 63.2 ± 1.2
Pagerank 79.4 ± 0.4 71.6 ± 0.6 70.0 ± 1.0 70.1 ± 0.3 61.5 ± 0.5 62.6 ± 0.6 80.3 ± 0.3 71.3 ± 0.8 71.2 ± 0.8
Betweenness 79.7 ± 0.4 60.5 ± 0.9 60.3 ± 1.6 68.9 ± 0.3 53.5 ± 0.8 55.1 ± 1.0 78.5 ± 0.6 67.1 ± 1.1 66.1 ± 1.1
RWCS 79.4 ± 0.4 71.7 ± 0.5 70.3 ± 0.9 69.9 ± 0.3 62.4 ± 0.4 63.1 ± 0.6 79.8 ± 0.3 70.7 ± 0.8 70.7 ± 0.8
GC-RWCS-3 78.6 ± 0.5 52.1 ± 1.1* 53.0 ± 1.9* 64.8 ± 0.5* 46.4 ± 0.8* 48.2 ± 1.0* 78.1 ± 0.6 62.3 ± 1.2 61.6 ± 1.5
GC-RWCS-4 78.5 ± 0.5 52.7 ± 1.0* 53.3 ± 1.9* 65.1 ± 0.5* 46.6 ± 0.8* 48.2 ± 1.1* 77.3 ± 0.7 62.1 ± 1.2 60.6 ± 1.4*
GC-RWCS-5 78.9 ± 0.5 53.5 ± 1.1* 54.2 ± 1.9* 65.3 ± 0.5* 46.6 ± 0.8* 48.4 ± 1.0* 78.4 ± 0.5 64.2 ± 1.2 62.5 ± 1.4
GC-RWCS-6 78.5 ± 0.5 54.3 ± 1.1* 54.9 ± 1.9* 65.5 ± 0.5* 47.1 ± 0.8 48.9 ± 1.1* 78.0 ± 0.6 63.7 ± 1.1 62.6 ± 1.4
GC-RWCS-7 78.1 ± 0.5 54.2 ± 1.1* 54.8 ± 1.9* 66.1 ± 0.4* 47.5 ± 0.8 49.3 ± 1.1* 78.7 ± 0.5 64.9 ± 1.2 63.3 ± 1.3
Threshold 20%
Random 82.3 ± 0.3 71.7 ± 1.1 69.8 ± 1.1 72.1 ± 0.3 62.1 ± 0.7 62.6 ± 0.9 82.6 ± 0.2 77.9 ± 0.5 77.5 ± 0.5
Degree 79.3 ± 0.4 64.2 ± 1.2 61.6 ± 1.3 69.2 ± 0.4 56.0 ± 0.8 56.4 ± 1.0 80.6 ± 0.4 69.5 ± 0.8 69.4 ± 1.0
Pagerank 80.8 ± 0.3 74.5 ± 0.8 73.0 ± 0.8 72.1 ± 0.3 68.3 ± 0.3 68.2 ± 0.4 82.2 ± 0.2 77.7 ± 0.4 77.8 ± 0.4
Betweenness 80.7 ± 0.4 62.2 ± 1.4 60.1 ± 1.4 70.1 ± 0.4 54.8 ± 0.8 55.8 ± 1.1 80.2 ± 0.4 72.4 ± 0.8 72.0 ± 0.7
RWCS 81.4 ± 0.3 76.8 ± 0.6 76.0 ± 0.6 72.4 ± 0.3 68.9 ± 0.3 69.0 ± 0.4 81.3 ± 0.2 76.0 ± 0.4 76.5 ± 0.4
GC-RWCS-3 79.4 ± 0.5 57.5 ± 1.6* 53.1 ± 1.5* 67.1 ± 0.4* 48.4 ± 0.9* 49.3 ± 1.2* 79.0 ± 0.5* 67.4 ± 0.9* 66.3 ± 1.0*
GC-RWCS-4 79.4 ± 0.5 57.5 ± 1.7* 53.2 ± 1.4* 67.3 ± 0.5* 47.9 ± 0.9* 48.8 ± 1.3* 79.0 ± 0.5* 67.4 ± 1.0* 66.3 ± 1.0*
GC-RWCS-5 79.4 ± 0.5 59.0 ± 1.7* 54.5 ± 1.4* 67.3 ± 0.4* 48.4 ± 0.9* 49.4 ± 1.3* 79.2 ± 0.5* 68.5 ± 0.9 68.1 ± 0.9
GC-RWCS-6 79.5 ± 0.5 59.3 ± 1.7 54.9 ± 1.5* 68.1 ± 0.4* 49.2 ± 0.9* 50.2 ± 1.3* 79.1 ± 0.5* 68.4 ± 0.9 68.5 ± 1.0
GC-RWCS-7 79.4 ± 0.5 59.3 ± 1.6 55.3 ± 1.5* 68.1 ± 0.4* 50.0 ± 0.9* 50.8 ± 1.3* 79.2 ± 0.5* 68.7 ± 0.9 68.2 ± 0.8
Threshold 30%
Random 82.6 ± 0.4 70.7 ± 1.1 71.8 ± 1.1 72.6 ± 0.3 62.7 ± 0.8 63.9 ± 0.8 82.6 ± 0.2 77.3 ± 0.4 77.3 ± 0.5
Degree 80.7 ± 0.4 64.9 ± 1.4 67.0 ± 1.5 70.4 ± 0.4 56.9 ± 0.8 58.7 ± 0.9 81.5 ± 0.4 72.4 ± 0.7 72.1 ± 0.8
Pagerank 82.6 ± 0.3 79.6 ± 0.4 79.7 ± 0.4 72.9 ± 0.2 70.2 ± 0.3 70.3 ± 0.3 83.0 ± 0.2 79.3 ± 0.3 79.5 ± 0.3
Betweenness 81.8 ± 0.4 64.1 ± 1.3 65.9 ± 1.4 70.7 ± 0.3 56.3 ± 0.8 58.3 ± 0.9 81.3 ± 0.3 74.1 ± 0.5 74.5 ± 0.5
RWCS 82.9 ± 0.3 79.7 ± 0.4 80.0 ± 0.4 72.9 ± 0.2 70.2 ± 0.3 70.4 ± 0.3 82.1 ± 0.2 77.8 ± 0.3 78.4 ± 0.3
GC-RWCS-3 80.2 ± 0.6 57.3 ± 1.7* 59.0 ± 1.6* 67.9 ± 0.5* 49.1 ± 0.9* 50.8 ± 1.1* 80.3 ± 0.5* 69.0 ± 0.7* 69.8 ± 0.7*
GC-RWCS-4 80.7 ± 0.5 59.1 ± 1.6* 61.1 ± 1.6* 67.8 ± 0.5* 49.0 ± 0.9* 50.7 ± 1.1* 80.3 ± 0.5* 69.2 ± 0.7* 70.0 ± 0.7*
GC-RWCS-5 80.8 ± 0.5 59.8 ± 1.6* 61.5 ± 1.6* 68.4 ± 0.5* 49.2 ± 0.9* 51.2 ± 1.1* 80.2 ± 0.5* 70.4 ± 0.6* 71.5 ± 0.6
GC-RWCS-6 80.7 ± 0.5 59.8 ± 1.5* 61.4 ± 1.5* 68.5 ± 0.5* 50.5 ± 0.9* 52.2 ± 1.1* 80.2 ± 0.5* 70.5 ± 0.5* 71.6 ± 0.6
GC-RWCS-7 80.7 ± 0.5 60.2 ± 1.5* 61.9 ± 1.5* 68.7 ± 0.5* 50.7 ± 0.9* 52.6 ± 1.1* 80.3 ± 0.4* 70.9 ± 0.5* 71.9 ± 0.6
Table 3: Summary of the accuracy (in %) when L ∈ {3, 4, 5, 6, 7}; the suffix of each GC-RWCS row gives the value of L. The bold numbers and the asterisk (*) have the same meaning as in Table 1. The underline marker denotes GC-RWCS values that outperform all the baselines.

Further, we also compare the relative decrease of accuracy between the proposed GC-RWCS strategy (L = 4) and the Random strategy in Table 4. GC-RWCS is able to decrease the node classification accuracy by up to 33.5%, and achieves a larger decrease of the accuracy than the Random baseline in most cases. As GC-RWCS and Random use exactly the same feature perturbation and the node selection step of Random does not include any information about the graph structure, this relative comparison can be roughly viewed as an indicator of the attack effectiveness attributed to the structural inductive biases of the GNN models.

Cora Citeseer Pubmed
Method GCN JKNetConcat JKNetMaxpool GCN JKNetConcat JKNetMaxpool GCN JKNetConcat JKNetMaxpool
Threshold 10%
Random 4.3 17.4 17 3.8 12.1 11.5 3.7 9.9 10.3
GC-RWCS 7.1 33.5 32.5 10.0 26.3 25.0 8.4 23.7 25.1
GC-RWCS/Random 165.12% 192.53% 191.18% 263.16% 217.36% 217.39% 227.03% 239.39% 243.69%
Threshold 30%
Random 3.0 15.5 14 2.5 10.2 9.3 3.1 8.5 8.3
GC-RWCS 4.9 27.1 24.7 7.3 23.9 22.5 5.4 16.6 15.7
GC-RWCS/Random 163.33% 174.84% 176.43% 292.00% 234.31% 241.94% 174.19% 195.29% 189.16%
Table 4: Accuracy decrease (in %) compared with the clean dataset.
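The entries of Table 4 follow directly from Table 3. As a small worked example, the sketch below reproduces the Cora / JKNetConcat column at the 10% threshold from the "None", "Random", and "GC-RWCS-4" rows of Table 3. The helper names are hypothetical.

```python
def accuracy_drop(clean_acc, attacked_acc):
    """Absolute accuracy decrease in percentage points, rounded as in Table 4."""
    return round(clean_acc - attacked_acc, 1)


def relative_drop(drop, baseline_drop):
    """Ratio of two accuracy decreases, expressed in percent."""
    return round(100.0 * drop / baseline_drop, 2)


# Cora, JKNetConcat, threshold 10% (values copied from Table 3).
clean = 86.2        # row "None"
random_acc = 68.8   # row "Random"
gc_rwcs_acc = 52.7  # row "GC-RWCS-4"

drop_random = accuracy_drop(clean, random_acc)
drop_gc_rwcs = accuracy_drop(clean, gc_rwcs_acc)
ratio = relative_drop(drop_gc_rwcs, drop_random)
print(drop_random, drop_gc_rwcs, f"{ratio:.2f}%")
```

The three printed values match the Random, GC-RWCS, and GC-RWCS/Random entries of that column.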