1 Introduction
Graphs are a powerful representation that can model diverse data from virtually any domain, such as biology (protein interaction networks), chemistry (molecules), or social networks (Facebook). Not surprisingly, machine learning on graph data has a longstanding history, with tasks ranging from node classification, over community detection, to generative modeling.
In this paper, we study node classification, which is an instance of semi-supervised classification: given a single (attributed) network and a subset of nodes whose class labels are known (e.g., the topic of a paper in a citation graph), the goal is to infer the classes of the unlabeled nodes. While there exist many classical approaches to node classification (London & Getoor, 2014; Chapelle et al., 2006), deep learning on graphs has recently gained much attention (Monti et al., 2017; Bojchevski & Günnemann, 2018a; Battaglia et al., 2018; Perozzi et al., 2014; Bojchevski et al., 2018; Klicpera et al., 2019). Specifically, graph convolutional approaches (Kipf & Welling, 2017; Pham et al., 2017) have improved the state of the art in node classification.
However, recent works have also shown that such approaches are vulnerable to adversarial attacks both at test time (evasion) and at training time (poisoning) (Zügner et al., 2018; Dai et al., 2018). A core strength of models using graph convolution, namely exploiting the information in a node’s neighborhood to improve classification, is also a major vulnerability: because of these propagation effects, an attacker can change a single node’s prediction without changing any of its attributes or edges. This is because the foundational assumption that all samples are independent of each other does not hold for node classification. Network effects such as homophily (London & Getoor, 2014) support the classification, but they also enable indirect adversarial attacks.
So far, all existing attacks on node classification models are targeted, that is, they aim to provoke misclassification of a specific single node, e.g. a person in a social network. In this work, we propose the first algorithm for poisoning attacks that is able to compromise the global node classification performance of a model. We show that even under restrictive attack settings and without access to the target classifier, our attacks can render it near-useless in production (i.e., on test data).
Our approach is based on the principle of meta learning, which has traditionally been used for hyperparameter optimization (Bengio, 2000) or, more recently, few-shot learning (Finn et al., 2017). In essence, we turn the gradient-based optimization procedure of deep learning models upside down and treat the input data (the graph at hand) as a hyperparameter to learn.
2 Related Work
Adversarial attacks on machine learning models have been studied in both the machine learning and security communities and for many different model types (Mei & Zhu, 2015). It is important to distinguish attacks from outliers: while the latter occur naturally in graphs (Bojchevski & Günnemann, 2018), adversarial examples are deliberately created with the goal of misleading machine learning models and are often designed to be unnoticeable. Deep neural networks are highly sensitive to these small adversarial perturbations to the data (Szegedy et al., 2014; Goodfellow et al., 2015). The vast majority of attacks and defenses assume the data instances to be independent and continuous. This assumption clearly does not hold for node classification and many other tasks on graphs.
Works on adversarial attacks for graph learning tasks are generally sparse. Chen et al. (2017) have measured the changes in the resulting graph clustering when injecting noise into a bipartite graph that represents DNS queries. However, their focus is not on generating attacks in a principled way. Torkamani & Lowd (2013) consider adversarial noise in the node features in order to improve robustness of collective classification via associative Markov networks.
Only recently have researchers started to study adversarial attacks on deep learning for graphs. Dai et al. (2018) consider test-time (i.e., evasion) attacks on graph classification (i.e., classification of graphs themselves) and node classification. However, they do not consider poisoning (i.e., training-time) attacks or evaluate the transferability of their attacks, and they restrict the attacks to edge deletions only. Moreover, they focus on targeted attacks, i.e. attacks designed to change the prediction of a single node. Zügner et al. (2018) consider both test-time and training-time attacks on node classification models. They circumvent explicitly tackling the bilevel optimization problem underlying poisoning attacks by performing their attacks based on a (static) surrogate model and evaluating their impact by training a classifier on the data modified by their algorithm. In contrast to Dai et al. (2018), their attacks can both insert and remove edges, as well as modify node attributes in the form of binary vectors. Again, their algorithm is suited only to targeted attacks on single nodes; the problem of training-time attacks on the overall performance of node classification models remains unexplored.
Bojchevski & Günnemann (2018b) propose poisoning attacks on a different task: unsupervised node representation learning (or node embeddings). They exploit perturbation theory to maximize the loss obtained after training DeepWalk. In this work, we focus on semi-supervised learning.
Meta-learning (Thrun & Pratt, 1998; Naik & Mammone, 1992), or learning to learn, is the task of optimizing the learning algorithm itself, e.g., by optimizing the hyperparameters (Bengio, 2000), learning to update the parameters of a neural network (Schmidhuber, 1992; Bengio et al., 1992), or learning the activation function of a model (Agostinelli et al., 2014). Gradient-based hyperparameter optimization works by differentiating the training phase of a model to obtain the gradients w.r.t. the hyperparameters to optimize.
The key idea of this work is to use meta-learning for the opposite: modifying the training data to worsen the performance after training (i.e., training-time or poisoning attacks). Muñoz-González et al. (2017) demonstrate that meta learning can indeed be used to create training-time attacks on simple, linear classification models. On continuous data, they report little success when attacking deep neural networks, and on discrete datasets, they do not consider deep learning models or problems with more than two classes. Like most works on adversarial attacks, they assume independent data instances. In this work, for the first time, we propose an algorithm for global attacks on (deep) node classification models at training time. In contrast to Zügner et al. (2018), we explicitly tackle the bilevel optimization problem of poisoning attacks using meta learning.
3 Problem Formulation
We consider the task of (semi-supervised) node classification. Given a single (attributed) graph and a set of labeled nodes, the goal is to infer the class labels of the unlabeled nodes. Formally, let $G = (A, X)$ be an attributed graph with adjacency matrix $A \in \{0,1\}^{N \times N}$ and node attribute matrix $X \in \mathbb{R}^{N \times D}$, where $N$ is the number of nodes and $D$ the dimension of the node feature vectors. W.l.o.g., we assume the node IDs to be $\mathcal{V} = \{1, \dots, N\}$.
Given the set of labeled nodes $\mathcal{V}_L \subseteq \mathcal{V}$, where nodes are assigned exactly one class in $\mathcal{C} = \{c_1, \dots, c_K\}$, the goal is to learn a function $f_\theta$, which maps each node $v \in \mathcal{V}$ to exactly one of the $K$ classes in $\mathcal{C}$ (or, in a probabilistic formulation, to the $K$-simplex). Note that this is an instance of transductive learning, since all test samples (i.e., the unlabeled nodes) as well as their attributes and edges (but not their class labels!) are known and used during training (Chapelle et al., 2006). The parameters $\theta$ of the function $f_\theta$ are generally learned by minimizing a loss function $\mathcal{L}_{train}$ (e.g. cross-entropy) on the labeled training nodes:
$$\theta^* = \arg\min_\theta \mathcal{L}_{train}(f_\theta(G)) \tag{1}$$
where we overload the notation of $f_\theta$ to indicate that we feed in the whole graph $G$.
3.1 Attack Model
Adversarial attacks are small deliberate perturbations of data samples in order to achieve the outcome desired by the attacker when applied to the machine learning model at hand. The attacker is constrained in the knowledge they have about the data and the model they attack, as well as the adversarial perturbations they can perform.
Attacker’s goal. In our work, the attacker’s goal is to increase the misclassification rate (i.e., one minus the accuracy) of a node classification algorithm achieved after training on the data (i.e., graph) modified by our algorithm. In contrast to Zügner et al. (2018) and Dai et al. (2018), our algorithm is designed for global attacks reducing the overall classification performance of a model. That is, the goal is to have the test samples classified as any class different from the true class.
Attacker’s knowledge. The attacker can have different levels of knowledge about the training data (i.e., the graph), the target machine learning model, and the trained model parameters. In our work, we focus on limited-knowledge attacks where the attacker has no knowledge about the classification model and its trained weights, but the same knowledge about the data as the classifier. In other words, the attacker can observe all nodes’ attributes, the graph structure, and the labels of the labeled subset of nodes, and uses a surrogate model to modify the data. Afterwards, this modified data is used to train deep neural networks in order to degrade their performance. Besides assuming knowledge about the full data, we also perform experiments where only a subset of the data is given.
Attacker’s capability. In order to be effective and remain undiscovered, adversarial attacks should be unnoticeable. To account for this, we largely follow the attacker capabilities of Zügner et al. (2018). First, we impose a budget constraint $\Delta$ on the attacks, i.e. we limit the number of changed entries of the adjacency matrix to at most $2\Delta$ (here we have $2\Delta$ since we assume the graph to be symmetric). Furthermore, we make sure that no node becomes disconnected (i.e. a singleton) during the attack. One of the most fundamental properties of a graph is its degree distribution. Any significant changes to it are very likely to be noticed; to prevent such large changes, we employ the unnoticeability constraint on the degree distribution of Zügner et al. (2018). Essentially, it ensures that the graph’s degree distribution can only marginally be modified by the attacker. The authors also derive an efficient way to check for violations of the constraint so that it adds only minimal computational overhead to the attacks. While in this work we focus on changing the graph structure only, our algorithm can easily be modified to change the node features as well. We summarize all these constraints and denote the set of admissible perturbations on the data as $\Phi(G)$, where $G$ is the graph at hand.
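As a minimal sketch of how two of these constraints could be checked, the following Python function tests the budget and the no-singleton condition for a candidate perturbed graph. The function name `is_admissible`, the nested-list adjacency representation, and the parameter `budget` are illustrative assumptions rather than part of the original method description; the degree-distribution test of Zügner et al. (2018) is deliberately left out.

```python
def is_admissible(adj_orig, adj_pert, budget):
    """Check the budget and no-singleton constraints (illustrative sketch).

    adj_orig, adj_pert: symmetric 0/1 adjacency matrices as nested lists.
    budget: maximum number of undirected edge flips (hypothetical parameter).
    """
    n = len(adj_orig)
    # Each undirected edge flip changes two symmetric entries.
    changed_entries = sum(
        adj_orig[i][j] != adj_pert[i][j] for i in range(n) for j in range(n)
    )
    within_budget = changed_entries <= 2 * budget
    # No node may end up without any incident edge.
    no_singletons = all(sum(row) > 0 for row in adj_pert)
    return within_budget and no_singletons
```

For a path graph on three nodes, adding the edge between the two end nodes is admissible with a budget of one, whereas removing an edge that leaves an end node isolated is rejected.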
3.2 Overall Goal
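Poisoning attacks can be mathematically formulated as a bilevel optimization problem:
$$\min_{\hat{G} \in \Phi(G)} \mathcal{L}_{atk}(f_{\theta^*}(\hat{G})) \quad \text{s.t.} \quad \theta^* = \arg\min_\theta \mathcal{L}_{train}(f_\theta(\hat{G})) \tag{2}$$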
$\mathcal{L}_{atk}$ is the loss function the attacker aims to optimize. In our case of global and unspecific (regarding the type of misclassification) attacks, the attacker tries to decrease the generalization performance of the model on the unlabeled nodes. Since the test data’s labels are not available, we cannot directly optimize this loss. One way to approach this is to maximize the loss $\mathcal{L}_{train}$ on the labeled (training) nodes, arguing that if a model has a high training error, it is very likely to also generalize poorly (the opposite is not true; when overfitting on the training data, a high generalization loss can correspond to a low training loss). Thus, our first attack option is to choose $\mathcal{L}_{atk} = -\mathcal{L}_{train}$.
Recall that semi-supervised node classification is an instance of transductive learning: all data samples (i.e., nodes) and their attributes are known at training time (but not all labels!). We can use this insight to obtain a second variant of $\mathcal{L}_{atk}$. The attacker can learn a model on the labeled data to estimate the labels $\hat{C}_U$ of the unlabeled nodes $\mathcal{V}_U = \mathcal{V} \setminus \mathcal{V}_L$. The attacker can now perform self-learning, i.e. use these predicted labels and compute the loss of a model on the unlabeled nodes, yielding our second option $\mathcal{L}_{atk} = -\mathcal{L}_{self}$, where $\mathcal{L}_{self} = \mathcal{L}(\mathcal{V}_U, \hat{C}_U)$. Note that, at all times, only the labels of the labeled nodes are used for training; $\mathcal{L}_{self}$ is only used to estimate the generalization loss after training. In our experimental evaluation, we compare both versions of $\mathcal{L}_{atk}$ outlined above.
Importantly, notice the bilevel nature of the problem formulation in Eq. (2): the attacker aims to maximize the classification loss achieved after optimizing the model parameters on the modified (poisoned) graph $\hat{G}$. Optimizing such a bilevel problem is highly challenging by itself. Even worse, in our graph setting the data and the action space of the attacker are discrete: the graph structure is $A \in \{0,1\}^{N \times N}$, and the possible actions are edge insertions and deletions. This makes the problem even more difficult in two ways. First, the action space is vast; given a budget of $\Delta$ perturbations, the number of possible attacks is $\binom{N^2}{\Delta}$, ignoring symmetry, and thus in $\mathcal{O}(N^{2\Delta})$; exhaustive search is clearly infeasible. Second, a discrete data domain means that we cannot use gradient-based methods such as gradient descent to make small (real-valued) updates on the data to optimize a loss.
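To give a feeling for the size of this action space, the short sketch below counts the candidate attacks under the assumption that an attack is a choice of budget-many entries out of the $N^2$ adjacency entries, ignoring symmetry. The function name `num_possible_attacks` and the example sizes are illustrative and not taken from the experiments.

```python
from math import comb


def num_possible_attacks(num_nodes, budget):
    """Number of ways to choose `budget` of the N^2 adjacency entries."""
    return comb(num_nodes ** 2, budget)


# Even a tiny graph with 100 nodes and a budget of 5 flips yields
# far more candidates than could be enumerated and retrained on.
count = num_possible_attacks(100, 5)
upper_bound = 100 ** (2 * 5)  # the N^(2 * budget) bound
```

Since every candidate would require retraining the model, enumeration is out of reach long before graphs reach realistic sizes.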
4 Graph Structure Poisoning via MetaLearning
4.1 Poisoning via Metagradients
In this work, we tackle the bilevel problem described in Eq. (2) using meta-gradients, which have traditionally been used in meta-learning. The field of meta-learning (or learning to learn) tries to make the process of learning machine learning models more time and/or data efficient, e.g. by finding suitable hyperparameter configurations (Bengio, 2000) or initial weights that enable rapid adaptation to new tasks or domains in few-shot learning (Finn et al., 2017).
Meta-gradients (e.g., gradients w.r.t. hyperparameters) are obtained by backpropagating through the learning phase of a differentiable model (typically a neural network). The core idea behind our adversarial attack algorithm is to treat the graph structure matrix as a hyperparameter and compute the gradient of the attacker’s loss after training with respect to it:
$$\nabla_G^{meta} := \nabla_G \mathcal{L}_{atk}(f_{\theta^*}(G)) \quad \text{s.t.} \quad \theta^* = \mathrm{opt}_\theta(\mathcal{L}_{train}(f_\theta(G))) \tag{3}$$
where $\mathrm{opt}(\cdot)$ is a differentiable optimization procedure (e.g. gradient descent and its stochastic variants) and $\mathcal{L}_{train}$ the training loss. Notice the similarity of the meta-gradient to the bilevel formulation in Eq. (2); the meta-gradient indicates how the attacker loss after training will change for small perturbations on the data, which is exactly what a poisoning attacker needs to know.
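As an illustration, consider an example where we instantiate $\mathrm{opt}$ with vanilla gradient descent with learning rate $\alpha$, starting from some initial parameters $\theta_0$:
$$\theta_{t+1} = \theta_t - \alpha \nabla_{\theta_t} \mathcal{L}_{train}(f_{\theta_t}(G)) \tag{4}$$
The attacker’s loss after training for $T$ steps is $\mathcal{L}_{atk}(f_{\theta_T}(G))$. The meta-gradient can be expressed by unrolling the training procedure:
$$\nabla_G^{meta} = \nabla_G \mathcal{L}_{atk}(f_{\theta_T}(G)) = \nabla_f \mathcal{L}_{atk}(f_{\theta_T}(G)) \cdot \left[ \nabla_G f_{\theta_T}(G) + \nabla_{\theta_T} f_{\theta_T}(G) \cdot \nabla_G \theta_T \right], \quad \text{where} \quad \nabla_G \theta_{t+1} = \nabla_G \theta_t - \alpha \nabla_G \nabla_{\theta_t} \mathcal{L}_{train}(f_{\theta_t}(G)) \tag{5}$$
Note that the parameters $\theta_t$ themselves depend on the graph $G$ (see Eq. 4); they are not fixed. Thus, the derivative w.r.t. the graph has to be taken into account, chaining back until $\theta_0$. Given this, the attacker can use the meta-gradient to perform a meta update $M$ on the data to minimize $\mathcal{L}_{atk}$:
$$G^{(k+1)} \leftarrow M(G^{(k)}) \tag{6}$$
The final poisoned data $G^{(\Delta)}$ is obtained after performing $\Delta$ meta updates. A straightforward way to instantiate $M$ is (meta) gradient descent with some step size $\beta$: $M(G) = G - \beta \nabla_G \mathcal{L}_{atk}(f_{\theta_T}(G))$.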
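The unrolling can be made concrete on a deliberately tiny problem. The sketch below is a toy stand-in, not the graph setting: a single scalar `g` plays the role of the data, a scalar `theta` is the model parameter, the training loss is assumed to be `0.5 * (theta * g - y) ** 2`, and the attacker loss is its negative. The derivative of `theta` with respect to `g` is carried along every gradient-descent step, which is the scalar analogue of chaining back through training. All names and default values are illustrative.

```python
def train(g, y, theta0=0.0, alpha=0.1, steps=5):
    """Gradient descent on 0.5*(theta*g - y)^2, tracking d(theta)/d(g)."""
    theta, dtheta_dg = theta0, 0.0  # the initial parameter does not depend on g
    for _ in range(steps):
        grad_theta = (theta * g - y) * g  # dL_train / dtheta
        # Total derivative of grad_theta w.r.t. g: direct part plus the
        # part that flows through theta's own dependence on g.
        dgrad_dg = (2.0 * theta * g - y) + g * g * dtheta_dg
        theta, dtheta_dg = theta - alpha * grad_theta, dtheta_dg - alpha * dgrad_dg
    return theta, dtheta_dg


def attack_loss(g, y, **kwargs):
    """Attacker loss after training: the negative training loss."""
    theta, _ = train(g, y, **kwargs)
    return -0.5 * (theta * g - y) ** 2


def meta_gradient(g, y, **kwargs):
    """Derivative of attack_loss w.r.t. g, including the training dynamics."""
    theta, dtheta_dg = train(g, y, **kwargs)
    residual = theta * g - y
    return -residual * (theta + g * dtheta_dg)
```

A useful sanity check for any such implementation is to compare `meta_gradient` with a central finite difference of `attack_loss`; the two should agree up to numerical error.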
It has to be noted that such a gradient-based update rule is neither possible nor well-suited for problems with discrete data (such as graphs). Due to the discreteness, the gradients are not defined. Thus, in our approach we simply relax the data’s discreteness condition. However, we still perform discrete updates (actions), since the above simple gradient update would lead to dense (and continuous) adjacency matrices, which are neither desired nor efficient to handle. Thus, in the following section, we propose a greedy approach to preserve the data’s sparsity and discreteness.
4.2 Greedy Poisoning Attacks via Meta Gradients
We assume that the attacker does not have access to the target classifier’s parameters, outputs, or even knowledge about its architecture; the attacker thus uses a surrogate model to perform the poisoning attacks. Afterwards the poisoned data is used to train deep learning models for node classification (e.g. a GCN) to evaluate the performance degradation due to the attack. We use the same surrogate model as Zügner et al. (2018), which is a linearized two-layer graph convolutional network:
$$f_\theta(A, X) = \mathrm{softmax}(\hat{A}^2 X W) \tag{7}$$
where $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, $\tilde{A} = A + I$, $A$ is the adjacency matrix, $X$ are the node features, $\tilde{D}$ the diagonal matrix of the node degrees, and $\theta = \{W\}$ the set of learnable parameters. In contrast to Zügner et al. (2018), we do not linearize the output (softmax) layer.
Note that we only perform changes to the graph structure $A$, hence we treat the node attributes $X$ as a constant during our attacks. For clarity, we replace $G$ with $A$ in the meta-gradient formulation.
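A forward pass of such a linearized two-layer surrogate could look as follows in NumPy. This is a sketch under assumptions: it uses the symmetric normalization with self-loops that is standard for GCNs (Kipf & Welling, 2017), and the function name `surrogate_forward` as well as the argument names are illustrative.

```python
import numpy as np


def surrogate_forward(adj, features, weights):
    """Softmax over A_hat @ A_hat @ X @ W with a normalized adjacency A_hat."""
    adj_tilde = adj + np.eye(adj.shape[0])  # add self-loops
    deg = adj_tilde.sum(axis=1)             # node degrees incl. self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
    adj_hat = d_inv_sqrt[:, None] * adj_tilde * d_inv_sqrt[None, :]
    logits = adj_hat @ adj_hat @ features @ weights
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)
```

Because the two propagation steps are linear, the only nonlinearity is the final softmax, which keeps the model cheap to differentiate with respect to the adjacency matrix.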
We define a score function $S: \mathcal{V} \times \mathcal{V} \to \mathbb{R}$ that assigns each possible action a numerical value indicating its (estimated) impact on the attacker objective $\mathcal{L}_{atk}$. Given the meta-gradient for a node pair $(u, v)$, we define $S(u, v) = \nabla_{a_{uv}}^{meta} \cdot (-2 \cdot a_{uv} + 1)$, where $a_{uv}$ is the entry at position $(u, v)$ in the adjacency matrix $A$. We essentially flip the sign of the meta-gradients for connected node pairs, as this yields the gradient for a change in the negative direction (i.e., removing the edge).
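We greedily pick the perturbation $e' = (u', v')$ with the highest score one at a time:
$$e' = \arg\max_{e = (u, v):\, M(A, e) \in \Phi(G)} S(u, v) \tag{8}$$
where the condition $M(A, e) \in \Phi(G)$ ensures that we only perform changes compliant with our attack constraints (e.g., unnoticeability). The meta update function $M(A, e)$ inserts the edge $e = (u, v)$ by setting $a_{uv} = 1$ if nodes $(u, v)$ are currently not connected and otherwise deletes the edge by setting $a_{uv} = 0$.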
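One greedy step can be sketched in a few lines of Python. The sketch assumes that `meta_grad` holds, for every node pair, the gradient of the quantity the attacker wants to increase, and that `allowed` is a hypothetical callback encoding the admissibility constraints; both names, like `best_flip` and `apply_flip`, are illustrative.

```python
def best_flip(adj, meta_grad, allowed=None):
    """Return the admissible node pair with the highest score and its score."""
    n = len(adj)
    best_pair, best_score = None, float("-inf")
    for u in range(n):
        for v in range(u + 1, n):
            if allowed is not None and not allowed(u, v):
                continue
            # Flip the sign for existing edges: their only possible
            # change is a removal, i.e. a step in the negative direction.
            score = meta_grad[u][v] * (1 - 2 * adj[u][v])
            if score > best_score:
                best_pair, best_score = (u, v), score
    return best_pair, best_score


def apply_flip(adj, pair):
    """Insert the edge if absent, delete it if present (symmetric graph)."""
    u, v = pair
    new_adj = [row[:] for row in adj]
    new_adj[u][v] = new_adj[v][u] = 1 - adj[u][v]
    return new_adj
```

In a full attack loop the meta-gradient would be recomputed after every flip, since each change to the graph alters the training trajectory of the surrogate model.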
4.3 Approximating MetaGradients
Computing the meta-gradients is expensive both from a computational and a memory point of view. To alleviate this issue, Finn et al. (2017) propose a first-order approximation, leading to
$$\tilde{\nabla}_G^{meta} := \nabla_f \mathcal{L}_{atk}(f_{\tilde{\theta}_T}(G)) \cdot \nabla_G f_{\tilde{\theta}_T}(G) \tag{9}$$
We denote by $\tilde{\theta}_t$ the parameters at time $t$ independent of the data $G$ (and $\tilde{\theta}_{t-1}$), i.e. $\nabla_G \tilde{\theta}_t = 0$; the gradient is thus not propagated through $\tilde{\theta}_t$. This corresponds to taking the gradient of the attack loss w.r.t. the data after training the model for $T$ steps. We compare against this baseline in our experiments, as also done in Zügner et al. (2018). However, unlike the meta-gradient, this approximation completely disregards the training dynamics.
Nichol & Schulman (2018) propose a heuristic of the meta-gradient in which they update the initial weights $\theta_0$ on a line towards the local optimum $\theta_T$ to achieve faster convergence in a multi-task learning setting: $\nabla_{\theta_0}^{meta} := \sum_{t=1}^{T} \nabla_{\tilde{\theta}_t} \mathcal{L}_{train}(f_{\tilde{\theta}_t}(A; X))$. Again, they assume $\tilde{\theta}_t$ to be independent of $\tilde{\theta}_{t-1}$. While there is no direct connection to the formulation of the meta-gradient in Eq. (5), there is an intuition behind it: the heuristic meta-gradient is the direction in which, on average, we have observed the strongest increase in the training loss during the training procedure. The authors’ experimental evaluation further indicates that this heuristic achieves similar results as the meta-gradient while being much more efficient to compute (see Appendix C for a discussion on complexity).
Table 1: Misclassification rates of GCN and CLN on Cora-ML and Citeseer for the clean graph and the attacks A-Meta-Train, A-Meta-Self, and A-Meta-Both.
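Adapted to our adversarial attack setting on graphs, we get $\nabla_A^{meta} := \sum_{t=1}^{T} \nabla_A \mathcal{L}_{train}(f_{\tilde{\theta}_t}(A; X))$. We can view this as a heuristic of the meta-gradient when $\mathcal{L}_{atk} = -\mathcal{L}_{train}$. Likewise, again taking the transductive learning setting into account, we can use self-learning to estimate the loss on the unlabeled nodes, replacing $\mathcal{L}_{train}$ by $\mathcal{L}_{self}$. Indeed, we combine these two views:
$$\nabla_A^{meta} := \sum_{t=1}^{T} \lambda \nabla_A \mathcal{L}_{train}(f_{\tilde{\theta}_t}(A; X)) + (1 - \lambda) \nabla_A \mathcal{L}_{self}(f_{\tilde{\theta}_t}(A; X)) \tag{10}$$
where $\lambda$ can be used to weight the two objectives. This approximation has a much smaller memory footprint than the exact meta-gradient since we do not have to store the whole training trajectory in memory; additionally, there are no second-order derivatives to be computed. A summary of our algorithm can be found in Appendix A.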
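The accumulation scheme can be illustrated on a toy problem. In the sketch below, a scalar `g` again stands in for the data and `theta` for the model parameter; both losses are assumed to be squared errors, one against a known target `y_train` and one against an estimated target `y_self`. At each training step the data gradients are taken with `theta` treated as a constant and summed with a weight `lam`. All names and values are illustrative assumptions and not the graph model itself.

```python
def approx_meta_gradient(g, y_train, y_self, lam=0.5,
                         theta0=0.0, alpha=0.1, steps=20):
    """Sum weighted data gradients along the training trajectory."""
    theta, total = theta0, 0.0
    for _ in range(steps):
        # One gradient-descent step on the training loss only.
        theta = theta - alpha * (theta * g - y_train) * g
        # Data gradients with theta held fixed (no second-order terms).
        grad_train = (theta * g - y_train) * theta
        grad_self = (theta * g - y_self) * theta
        total += lam * grad_train + (1.0 - lam) * grad_self
    return total
```

Only the running sum and the current parameter are kept, so memory use is independent of the number of training steps, in contrast to unrolling the full trajectory.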
5 Experiments
Setup. We evaluate our approach on the well-known Citeseer (Sen et al., 2008), Cora-ML (McCallum et al., 2000), and PolBlogs (Adamic & Glance, 2005) datasets; an overview is given in Table 4. We split the datasets into labeled (10%) and unlabeled (90%) nodes. The labels of the unlabeled nodes are never visible to the attacker or the classifiers and are only used to evaluate the generalization performance of the models. Our code is available at https://www.kdd.in.tum.de/gnnmetaattack.
We evaluate the transferability of adversarial attacks by training deep node classification models on the modified (poisoned) data. For this purpose, we use Graph Convolutional Networks (GCN) (Kipf & Welling, 2017) and Column Networks (CLN) (Pham et al., 2017). Both are models utilizing the message passing framework (a.k.a. graph convolution) and are trained in a semi-supervised way. We further evaluate the node classification performance achieved by training a standard logistic regression model on the node embeddings learned by DeepWalk (Perozzi et al., 2014). DeepWalk itself is trained in an unsupervised way and without node attributes or graph convolutions; thus, this is arguably an even more difficult transfer task.
We repeat all of our attacks on five different splits of labeled/unlabeled nodes and train all target classifiers ten times per attack (using the split that was used to create the attack). In our tables, the uncertainty indicates confidence intervals of the mean obtained via bootstrapping. For our meta-gradient approaches, we compute the meta-gradient by using gradient descent with momentum for 100 iterations. We refer to our meta-gradient approach with self-training as Meta-Self and to the variant without self-training as Meta-Train. Similarly, we refer to our approximations as A-Meta-Self, A-Meta-Train, and A-Meta-Both, which differ in how they weight the two objectives in Eq. (10).
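For readers unfamiliar with bootstrapped confidence intervals of a mean, a minimal percentile-bootstrap sketch is given below. The confidence level, the number of resamples, and the percentile method are assumptions chosen for illustration; the text does not specify which variant was used.

```python
import random


def bootstrap_mean_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lower_idx = int((1.0 - level) / 2.0 * n_resamples)
    upper_idx = int((1.0 + level) / 2.0 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]
```

Applied to the misclassification rates from repeated training runs, the returned pair gives the lower and upper end of the interval around the mean.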
Comparing meta-gradient heuristics. First, we analyze the different meta-gradient heuristics described in Section 4.3. The results can be seen in Table 1. All variants successfully increase the misclassification rate on the unlabeled (test) nodes compared to the results obtained with the unperturbed graph. Since A-Meta-Self consistently shows a weaker performance than A-Meta-Both, we do not further consider A-Meta-Self in the following.
Table 2: Misclassification rates of GCN, CLN, and DeepWalk on Cora-ML, Citeseer, and PolBlogs, together with the average rank of each attack. Rows: Clean, DICE, First-order, Nettack*, A-Meta-Train, A-Meta-Both, Meta-Train, Meta-Self, Meta w/ Oracle.
* Did not finish within three days on Cora-ML and PolBlogs.
Comparison with competing methods. We compare our meta-gradient approach as well as its approximations with various baselines and Nettack (Zügner et al., 2018). DICE (‘delete internally, connect externally’) is a baseline where, for each perturbation, we randomly choose whether to insert or remove an edge. Edges are only removed between nodes from the same class, and only inserted between nodes from different classes. This baseline has all true class labels (train and test) available and thus more knowledge than all competing methods. First-order refers to the approximation proposed by Finn et al. (2017), i.e. ignoring all second-order derivatives. Note that Nettack is not designed for global attacks. In order to be able to compare to it, for each perturbation we randomly select one target node from the unlabeled nodes and attack it using Nettack while considering all nodes in the network. In this setting, its time and memory requirements were so high that it was not feasible to run it on any but the sparsest dataset. Meta w/ Oracle corresponds to our meta-gradient approach when supplied with all true class labels on the test data; this only serves as a reference point since it cannot be carried out in real scenarios where the test nodes’ labels are unknown. For all methods, we enforce the unnoticeability constraint introduced by Zügner et al. (2018), which ensures that the graph’s degree distribution changes only slightly. In Appendix D we show that the unnoticeability constraint does not significantly limit the impact of our attacks.
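The DICE baseline described above is simple enough to sketch directly. The version below assumes that edges are stored as a set of `(u, v)` tuples with `u < v` and that `labels` contains the true class of every node; it does not enforce the singleton or degree-distribution constraints, and it rebuilds the candidate lists in every step for clarity rather than speed. The function name and arguments are illustrative.

```python
import random


def dice_attack(edges, labels, budget, seed=0):
    """Delete same-class edges and insert different-class edges at random."""
    rng = random.Random(seed)
    edges = set(edges)  # work on a copy
    n = len(labels)
    for _ in range(budget):
        removable = sorted(e for e in edges if labels[e[0]] == labels[e[1]])
        insertable = [
            (u, v) for u in range(n) for v in range(u + 1, n)
            if labels[u] != labels[v] and (u, v) not in edges
        ]
        # Flip a fair coin between deletion and insertion, falling back
        # to the other action when no candidate of the chosen type exists.
        if rng.random() < 0.5 and removable:
            edges.remove(rng.choice(removable))
        elif insertable:
            edges.add(rng.choice(insertable))
        elif removable:
            edges.remove(rng.choice(removable))
    return edges
```

Since inserted edges always connect different classes and deleted edges always connect the same class, no step can undo an earlier one, so the number of changed edges equals the budget whenever candidates remain.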
In Table 2 we see the misclassification rates (i.e., 1 − accuracy on unlabeled nodes) achieved by changing 5% of edges according to the different methods (larger is better, except for the average rank). That is, each method is allowed to modify a number of edges equal to 5% of the number of edges present in the graph before the attack. We present similar tables for 1% and 10% changes in Appendix F. Our meta-gradient with self-training (Meta-Self) produces the strongest drop in performance across all models and datasets, as indicated by the average rank. Changing only 5% of the edges leads to a relative increase of up to 48% in the misclassification rate of GCN on Cora-ML.
Remarkably, our memory-efficient meta-gradient approximations lead to strong increases in misclassifications as well. They outperform both baselines and are in many cases even on par with the more expensive meta-gradient. In Appendix F, Table 9 we also show that using only a limited number of training iterations of the surrogate models for computing the meta-gradient (or its approximations) can significantly hurt the performance across models and datasets. Moreover, in Table 6 in Appendix F we show that our heuristic is successful at attacking a dataset with roughly 20K nodes.
While the focus of our work is poisoning attacks by modifying the graph structure, our method can be applied to node feature attacks as well. In Appendix E we show a proof of concept that our attacks are also effective when attacking by perturbing both node features and the graph structure.
In Fig. 2 we see the drop in classification performance of GCN on Cora-ML for increasing numbers of edge insertions/deletions (similar plots for the remaining datasets and models are provided in Appendix F). Meta-Self is even able to reduce the classification accuracy below 50%. Fig. 2 shows the classification accuracy of GCN and CLN as well as a baseline operating on the node attributes only, i.e. ignoring the graph. Not surprisingly, deep models achieve higher accuracy than the baseline when trained on the clean Citeseer graph: exploiting the network information improves classification. However, by only perturbing 5% of the edges, we obtain the opposite: GCN and CLN perform worse than the baseline, i.e. the graph structure now hurts classification.
Impact of graph structure and trained weights. Another interesting property of our attacks can be seen in Table 4, where we compare the weights trained on the clean Cora-ML network with the weights trained on a version poisoned by our algorithm (here with an even higher share of modified edges). Note that the classification accuracy barely changes when modifying the underlying network for a given set of trained weights; even when applying the clean weights on the highly corrupted graph, the performance drops only marginally. Likewise, even the clean graph only leads to a low accuracy when using it with the weights trained on the poisoned graph. This result emphasizes the importance of the training procedure for the performance of graph models and shows that our poisoning attack works by derailing the training procedure from the start, i.e. leading to ‘bad’ weights.
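Table 3: Share of perturbations (in %) by type (DEL: edge deletion, INS: edge insertion) and by class relation of the affected node pair.
Type  Same class  Different class
DEL  15.3  3.9
INS  9.4  71.4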
Analysis of attacks. An interesting question to ask is why the adversarial changes created by our meta-gradient approach are so destructive, and what patterns they follow. If we can find out what makes an edge insertion or deletion a strong adversarial change, we can circumvent expensive meta-gradient computations or even use this knowledge to detect adversarial attacks.
In Fig. 5 we compare edges inserted by our meta-gradient approach to the edges originally present in the Cora-ML network. Fig. 5 (a) shows the shortest path lengths between node pairs before being connected by adversarially inserted edges vs. shortest path lengths between all nodes in the original graph. In Fig. 5 (b) we compare the edge betweenness centrality of adversarially inserted edges to the centrality of edges present in the original graph. In (c) we see the node degree distributions of the original graph and the node degrees of the nodes that are picked for adversarial edges. For all three measures no clear distinction can be made. There is, however, a slight tendency for the algorithm to connect nodes that have higher-than-average shortest paths and low degrees.
As we can see in Table 3, roughly 80% of our meta attack’s perturbations are edge insertions (INS). As expected by the homophily assumption, in most cases the inserted edges connect nodes from different classes and the deleted edges connect same-class nodes. However, as the comparison with the DICE baseline shows, this by itself cannot explain the destructive performance of the meta-gradient either.
Limited knowledge about the graph structure. In the experiments described above, the attacker has full knowledge about the graph structure and all node attributes (as typical in a transductive setting). We also tested our algorithm on a subgraph of Cora-ML and Citeseer. That is, we select the 10% labeled nodes and randomly select neighbors of these until the subgraph reaches the target number of nodes. We run our attacks on this small subgraph, and afterwards plug the perturbations into the original graphs to train GCN and CLN as before. Table 4 summarizes the results: even in this highly restricted setting, our attacks consistently increase the misclassification rate across datasets and models, highlighting the effectiveness of our method.
6 Conclusion
We propose an algorithm for training-time adversarial attacks on (attributed) graphs, focusing on the task of node classification. We use meta-gradients to solve the bilevel optimization problem underlying the challenging class of poisoning adversarial attacks. Our experiments show that attacks created using our meta-gradient approach consistently lead to a strong decrease in classification performance of graph convolutional models and even transfer to unsupervised models. Remarkably, even small perturbations to a graph based on our approach can lead to graph neural networks performing worse than a baseline ignoring all relational information. We further propose approximations of the meta-gradients that are less expensive to compute and, in many cases, have a similarly destructive impact on the training of node classification models. While we are able to show small statistical differences between adversarial and ‘normal’ edges, it is still an open question what makes the edges inserted/removed by our algorithm so destructive; an answer could then be used to detect or defend against attacks.
Acknowledgements
This research was supported by the German Research Foundation, grant GU 1409/21.
References
 Adamic & Glance (2005) Lada A Adamic and Natalie Glance. The political blogosphere and the 2004 US election: divided they blog. In International workshop on Link discovery, pp. 36–43, 2005.
 Agostinelli et al. (2014) Forest Agostinelli, Matthew Hoffman, Peter Sadowski, and Pierre Baldi. Learning activation functions to improve deep neural networks. arXiv preprint arXiv:1412.6830, 2014.
 Battaglia et al. (2018) Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.
 Bengio et al. (1992) Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei. On the optimization of a synaptic learning rule. In Preprints Conf. Optimality in Artificial and Biological Neural Networks, pp. 6–8. Univ. of Texas, 1992.
 Bengio (2000) Yoshua Bengio. Gradient-based optimization of hyperparameters. Neural computation, 12(8):1889–1900, 2000.
 Bojchevski & Günnemann (2018a) Aleksandar Bojchevski and Stephan Günnemann. Deep gaussian embedding of graphs: Unsupervised inductive learning via ranking. In ICLR, 2018a.
 Bojchevski & Günnemann (2018b) Aleksandar Bojchevski and Stephan Günnemann. Adversarial attacks on node embeddings. arXiv preprint arXiv:1809.01093, 2018b.
 Bojchevski & Günnemann (2018) Aleksandar Bojchevski and Stephan Günnemann. Bayesian robust attributed graph clustering: Joint learning of partial anomalies and group structure. In AAAI, pp. 2738–2745, 2018.
 Bojchevski et al. (2018) Aleksandar Bojchevski, Oleksandr Shchur, Daniel Zügner, and Stephan Günnemann. NetGAN: Generating graphs via random walks. In ICML, 2018.
 Chapelle et al. (2006) Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. Adaptive Computation and Machine Learning series. The MIT Press, 2006.
 Chen et al. (2017) Yizheng Chen, Yacin Nadji, Athanasios Kountouras, Fabian Monrose, Roberto Perdisci, Manos Antonakakis, and Nikolaos Vasiloglou. Practical attacks against graph-based clustering. arXiv preprint arXiv:1708.09056, 2017.
 Dai et al. (2018) Hanjun Dai, Hui Li, Tian Tian, Xin Huang, Lin Wang, Jun Zhu, and Le Song. Adversarial attack on graph structured data. In ICML, 2018.
 Finn et al. (2017) Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, 2017.
 Goodfellow et al. (2015) Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
 Kipf & Welling (2017) Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
 Klicpera et al. (2019) Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. Predict then propagate: Graph neural networks meet personalized pagerank. In ICLR, 2019.
 London & Getoor (2014) Ben London and Lise Getoor. Collective classification of network data. Data Classification: Algorithms and Applications, 399, 2014.
 McCallum et al. (2000) Andrew Kachites McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore. Automating the construction of internet portals with machine learning. Information Retrieval, 3(2):127–163, 2000.
 Mei & Zhu (2015) Shike Mei and Xiaojin Zhu. Using machine teaching to identify optimal trainingset attacks on machine learners. In AAAI, pp. 2871–2877, 2015.
 Monti et al. (2017) Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodola, Jan Svoboda, and Michael M Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. In CVPR, volume 1, pp. 3, 2017.
 Muñoz-González et al. (2017) Luis Muñoz-González, Battista Biggio, Ambra Demontis, Andrea Paudice, Vasin Wongrassamee, Emil C Lupu, and Fabio Roli. Towards poisoning of deep learning algorithms with back-gradient optimization. In 10th ACM Workshop on Artificial Intelligence and Security, pp. 27–38, 2017.
 Naik & Mammone (1992) Devang K Naik and RJ Mammone. Meta-neural networks that learn by learning. In IJCNN, pp. 437–442, 1992.
 Nichol & Schulman (2018) Alex Nichol and John Schulman. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018.
 Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In SIGKDD, pp. 701–710, 2014.
 Pham et al. (2017) Trang Pham, Truyen Tran, Dinh Q. Phung, and Svetha Venkatesh. Column networks for collective classification. In AAAI, pp. 2485–2491, 2017.
 Schmidhuber (1992) Jürgen Schmidhuber. Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation, 4(1):131–139, 1992.
 Sen et al. (2008) Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI magazine, 29(3):93, 2008.
 Szegedy et al. (2014) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.
 Thrun & Pratt (1998) Sebastian Thrun and Lorien Pratt. Learning to learn: Introduction and overview. In Learning to learn, pp. 3–17. Springer, 1998.
 Torkamani & Lowd (2013) Mohamad Ali Torkamani and Daniel Lowd. Convex adversarial collective classification. In ICML, pp. 642–650, 2013.
 Zügner et al. (2018) Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks on neural networks for graph data. In SIGKDD, pp. 2847–2856, 2018.
Appendix A Algorithm
Appendix B Dataset statistics
Dataset  Nodes  Edges  Features  Classes
Cora-ML  2,810  7,981  2,879  7
Citeseer  2,110  3,757  3,703  6
PolBlogs  1,222  16,714  -  2
Pubmed  19,717  44,324  500  3
Appendix C Complexity analysis
In our attack we handle both edge insertions and deletions, i.e. each element in the adjacency matrix can be changed. This means that without further optimization, the (approximate) meta gradient for each node pair has to be computed, leading to a baseline memory and computational complexity of $\mathcal{O}(N^2)$, where $N$ is the number of nodes. For the meta gradient computation we additionally have to store the entire weight trajectory during training, adding $\mathcal{O}(T \cdot |\theta|)$ to the memory cost, where $T$ is the number of inner training steps and $|\theta|$ the number of weights. Thus, the memory complexity of our meta gradient attack is $\mathcal{O}(N^2 + T \cdot |\theta|)$. The second-order derivatives at each step in the meta gradient formulation can be computed in $\mathcal{O}(N^2)$ using Hessian-vector products, leading to a computational complexity of $\mathcal{O}(T \cdot N^2)$.
For the meta gradient heuristics, the computational complexity is similar since we have to evaluate the gradient w.r.t. the adjacency matrix at every training step. However, the training trajectory of the weights does not have to be kept in memory, yielding a memory complexity of $\mathcal{O}(N^2)$. This is highly beneficial, as memory (especially on GPUs) is limited.
The computational and memory complexity of our adversarial attacks implies that, as-is, they can be executed for graphs with roughly 20K nodes using a commodity GPU. The complexity, however, can be drastically reduced by pre-filtering the elements in the adjacency matrix for which the (meta) gradient needs to be computed, since only a fraction of entries in the adjacency matrix are promising candidate perturbations. We leave such performance optimization for future work.
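To give a feeling for the memory terms discussed in this appendix, the following sketch converts them into byte counts: one value per node pair, plus the stored weight trajectory for the exact meta gradient. It assumes dense 32-bit floats. The function name, the hidden size of 16, and the 100 training steps are illustrative assumptions and not values taken from our experiments; the node, feature, and class counts are those of Pubmed from Appendix B.

```python
def attack_memory_bytes(n_nodes, n_steps=0, n_weights=0, bytes_per_value=4):
    """Rough memory estimate: one value per node pair plus a stored weight trajectory.

    n_steps and n_weights describe the inner training run. Leave them at 0 for
    the heuristics, which do not keep the weight trajectory in memory.
    """
    pairwise_scores = n_nodes * n_nodes      # one (meta) gradient entry per node pair
    weight_trajectory = n_steps * n_weights  # weights after every inner training step
    return (pairwise_scores + weight_trajectory) * bytes_per_value


# Pubmed-sized graph (Appendix B): 19,717 nodes, 500 features, 3 classes.
# Hypothetical two-layer model with 16 hidden units, trained for 100 steps.
n_weights = 500 * 16 + 16 * 3
exact = attack_memory_bytes(19_717, n_steps=100, n_weights=n_weights)
heuristic = attack_memory_bytes(19_717)

print(f"meta gradient: {exact / 1e9:.2f} GB")
print(f"heuristic:     {heuristic / 1e9:.2f} GB")
```

The estimate only covers the two terms named in the text. Intermediate activations kept by an automatic differentiation framework are not included, so actual usage will be higher.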
Appendix D Unnoticeability constraint
In all our experiments, we enforce the unnoticeability constraint on the degree distribution proposed by Zügner et al. (2018). In Fig. 6 we show that this constraint does not significantly limit the destructive performance of our attacks. We therefore conclude that the constraint should always be enforced, since it improves unnoticeability while our attacks remain effective.
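The constraint itself is defined in Zügner et al. (2018). The snippet below is only a rough sketch of the general idea as we understand it: fit a power law to the degree sequence before and after the perturbation, and accept the perturbation if a likelihood-ratio statistic stays below a threshold. The estimator (a continuous approximation with a `d_min - 0.5` shift), the value of `d_min`, the default threshold, and all function names are assumptions made for illustration.

```python
import math


def fit_power_law(degrees, d_min=2):
    """Approximate maximum-likelihood power-law fit for degrees >= d_min.

    Returns (alpha, log_likelihood). Uses a continuous power law whose
    lower cut-off is shifted to d_min - 0.5 to account for discrete degrees.
    """
    x_min = d_min - 0.5
    tail = [d for d in degrees if d >= d_min]
    n = len(tail)
    log_sum = sum(math.log(d / x_min) for d in tail)
    alpha = 1.0 + n / log_sum
    log_lik = n * math.log(alpha - 1.0) - n * math.log(x_min) - alpha * log_sum
    return alpha, log_lik


def degree_change_statistic(deg_before, deg_after, d_min=2):
    """Likelihood-ratio statistic: separate fits versus one joint fit."""
    _, ll_before = fit_power_law(deg_before, d_min)
    _, ll_after = fit_power_law(deg_after, d_min)
    _, ll_joint = fit_power_law(list(deg_before) + list(deg_after), d_min)
    return -2.0 * ll_joint + 2.0 * (ll_before + ll_after)


def is_unnoticeable(deg_before, deg_after, threshold=0.004, d_min=2):
    """Accept the perturbed degree sequence if the statistic is below the threshold.

    The default threshold is an assumed value, not one verified against the
    reference implementation.
    """
    return degree_change_statistic(deg_before, deg_after, d_min) < threshold


degrees = [2, 2, 3, 3, 4, 5, 8, 13]
print(is_unnoticeable(degrees, degrees))       # unchanged degree sequence
print(is_unnoticeable([2] * 8, [20] * 8))      # drastically changed degree sequence
```

In a greedy attack, such a check would be evaluated for each candidate edge flip, and candidates that fail it would be skipped.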
Appendix E Attacks with changes to the node features
While the focus of our work is poisoning attacks by modifying the graph structure, our method can be applied to node feature attacks as well. The most straightforward case is when the node features are binary, since then we can use the same greedy algorithm as for the graph structure (ignoring the degree distribution constraints). Among the datasets we evaluated, Citeseer has binary node features, hence in Table 5 we display the results when attacking both node features and graph structure (while the total number of perturbations stays the same). We observe that the impact of the combined attacks is slightly lower than that of the structure-only attack. We attribute this to the fact that we assign the same ‘cost’ to structure and feature changes, although we expect a structure perturbation to have a stronger effect on performance than a feature perturbation. Future work can provide a framework where structure and feature changes impose a different ‘cost’ on the attacker. When the node features are continuous, the meta step size additionally needs to be tuned, and one has to decide whether multiple features per instance can be changed in a single step.
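As an illustration of the binary case, the following is a minimal sketch of a single greedy step, assuming that a (meta) gradient of the attacker's objective with respect to every feature entry has already been computed. The scoring rule (gradient times flip direction), the function name `best_flip`, and the toy values are assumptions for illustration and not a restatement of our implementation.

```python
def best_flip(features, grads):
    """Return (score, row, col) of the single binary flip with the highest score.

    features: list of rows with 0/1 entries.
    grads: same shape, gradient of the attacker's objective w.r.t. each entry.
    Flipping 0 -> 1 changes an entry by +1 and flipping 1 -> 0 changes it by -1,
    so the first-order gain of a flip is grad * (1 - 2 * value).
    """
    best = None
    for i, (row, grad_row) in enumerate(zip(features, grads)):
        for j, (value, grad) in enumerate(zip(row, grad_row)):
            score = grad * (1 - 2 * value)
            if best is None or score > best[0]:
                best = (score, i, j)
    return best


features = [[0, 1, 0],
            [1, 0, 0]]
grads = [[0.2, -0.7, 0.1],
         [0.5, 0.3, -0.4]]

score, i, j = best_flip(features, grads)
features[i][j] = 1 - features[i][j]  # apply the flip; the gradients would then be recomputed
```

A combined attack could score feature entries and adjacency entries in the same way and pick the best candidate across both, which corresponds to assigning the same cost to both kinds of change.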
Table 5: Results on Citeseer for GCN and CLN. Rows: Clean, Meta-Self with features, Meta-Self.
Appendix F Additional Results
In this section we present additional results of our experiments. In Table 6 we see that our heuristic is successful at attacking Pubmed, a dataset with roughly 20K nodes. Tables 7 and 8 show misclassification rates with 1% and 10% perturbed edges, respectively. Table 9 displays results when training the surrogate model for iterations to obtain the (meta) gradients. Finally, Figures 8 through 14 show how the respective models’ classification accuracies change for different attack methods and datasets.
Table 6: Results on Pubmed for GCN, CLN, and DeepWalk. Rows: Clean, DICE, A-Meta-Self.
Table 7: Results on Cora, Citeseer, and PolBlogs for GCN, CLN, and DeepWalk, with the average rank of each attack. Rows: Clean, DICE, First-order, Nettack, A-Meta-Train, A-Meta-Both, Meta-Train, Meta-Self, Meta with Oracle. Nettack did not finish within three days on Cora-ML and PolBlogs.
Table 8: Results on Cora, Citeseer, and PolBlogs for GCN, CLN, and DeepWalk, with the average rank of each attack. Rows: Clean, DICE, First-order, Nettack, A-Meta-Train, A-Meta-Both, Meta-Train, Meta-Self, Meta with Oracle. Nettack did not finish within three days for any dataset.
Table 9: Results on Cora, Citeseer, and PolBlogs for GCN, CLN, and DeepWalk, with the average rank of each attack. Rows: Clean, A-Meta-Both, Meta-Self.