1 Introduction
Node representations, which represent each node with a low-dimensional vector, have proven effective in a variety of applications such as node classification (Perozzi et al., 2014), link prediction (Grover & Leskovec, 2016), and visualization (Tang et al., 2016). Popular node embedding methods include DeepWalk (Perozzi et al., 2014), LINE (Tang et al., 2015), and node2vec (Grover & Leskovec, 2016). These methods learn the node embeddings by preserving graph structures in a way that does not depend on specific tasks. As a result, the learned node embeddings are very general and can be useful for multiple downstream tasks.
However, although these methods are very effective and have been used for a variety of tasks, none of the existing work has studied their robustness. As a result, these methods are at risk of being maliciously attacked. Take the task of link prediction in a social network (e.g., Twitter) as an example, which is one of the most important applications of node embedding methods. A malicious party may create malicious users in a social network and attack the graph structures (e.g., by adding and removing edges) so that the effectiveness of node embedding methods is maximally degraded. For example, the attacker may slightly change the graph structures (e.g., by following more users) so that the probability of a specific user being recommended/linked is significantly increased or decreased. This kind of attack is known as data poisoning. In this paper we are interested in the robustness of node embedding methods w.r.t. data poisoning, and their vulnerability to adversarial attacks in the worst case.
We are inspired by the existing literature on adversarial attacks, which have been extensively studied for different machine learning systems (Szegedy et al., 2013; Goodfellow et al., 2014; Moosavi-Dezfooli et al., 2016; Carlini & Wagner, 2017; Xiao et al., 2018c, b, a; Xie et al., 2017; Cisse et al., 2017; Yang et al., 2018). Specifically, it has been shown that deep neural networks are very sensitive to adversarial attacks, which can significantly change the prediction results by slightly perturbing the input data. However, most existing work on adversarial attacks focuses on image data (Szegedy et al., 2013; Goodfellow et al., 2014; Moosavi-Dezfooli et al., 2016; Carlini & Wagner, 2017; Xiao et al., 2018c) and text data (Cheng et al., 2018; Jia & Liang, 2017), where examples are independently distributed, while this work focuses on graph data. Some very recent work has studied adversarial attacks on graph data (Dai et al., 2018; Zugner et al., 2018). However, that work mainly studied graph neural networks, which are supervised methods, so the gradients for changing the output label can be leveraged. Therefore, in this paper we look for an approach that is able to attack unsupervised node embedding methods for graphs.
In this paper, we introduce a systematic approach to adversarial attacks against unsupervised node embedding methods. We assume that the attacker can poison the graph structures by either removing or adding edges. We study two types of adversarial goals: integrity attack, which aims to attack the probabilities of specific links, and availability attack, which aims to increase overall prediction errors. We propose a unified optimization framework based on projected gradient descent to optimally attack both goals. In addition, we conduct a case study on a coauthor network to better understand our attack method. To summarize, we make the following contributions:

We formulate the problem of attacking unsupervised node embeddings for the task of link prediction and introduce a complete characterization of attacker utilities.

We propose an efficient algorithm based on projected gradient descent to attack unsupervised node embedding algorithms, specifically DeepWalk and LINE, based on the first-order Karush-Kuhn-Tucker (KKT) conditions.

We conduct extensive experiments on real-world graphs to show the efficacy of our proposed attack model on the task of link prediction. Moreover, results show that our proposed attack model is transferable across different node embedding methods.

Finally, we conduct a case study on a coauthor network and give an intuitive understanding of our attack method.
2 Related Work
Adversarial attacks against image classification have been extensively studied in recent years (Szegedy et al., 2013; Goodfellow et al., 2014; Moosavi-Dezfooli et al., 2016; Carlini & Wagner, 2017; Xiao et al., 2018c, b). However, adversarial attacks against graphs have rarely been investigated. Existing work (Dai et al., 2018; Zugner et al., 2018) on adversarial attacks on graphs is limited to graph neural networks (Kipf & Welling, 2017), a supervised learning method. Our work, instead, shows the vulnerabilities of unsupervised methods on graphs. Here we briefly summarize previous work on graph embedding methods, then give an overview of adversarial attacks on graph data, and finally discuss related work on the connection to matrix factorization.
Unsupervised Learning on Graph
Previous work on knowledge mining in graphs has mainly focused on embedding methods, where the goal is to learn a latent embedding for each node in the graph. DeepWalk (Perozzi et al., 2014), LINE (Tang et al., 2015) and node2vec (Grover & Leskovec, 2016) are the three most representative unsupervised methods on graphs.
Adversarial Attack on Graph
There is some prior work on adversarial attacks on graphs. Test-time attacks on graph convolutional networks have been investigated (Dai et al., 2018), and poisoning attacks against graphs have also been studied (Zugner et al., 2018). However, both consider only attacks against graph convolutional networks.
Matrix Factorization
The skip-gram model from the NLP community has been shown to perform implicit matrix factorization (Levy & Goldberg, 2014). Building on that work, it has recently been shown that most of the popular unsupervised methods for graphs also perform implicit matrix factorization (Qiu et al., 2018). Moreover, poisoning attacks have been demonstrated for the matrix factorization problem (Li et al., 2016).
3 Preliminaries
We first introduce the graph embedding problem and the link prediction problem. Then we give an overview of the two existing algorithms for computing the embedding. Given a graph $G = (V, E)$, where $V$ is the node set and $E$ is the edge set, the goal of graph embedding methods is to learn a mapping from $V$ to $\mathbb{R}^d$ which maps each node in the graph to a $d$-dimensional vector. We use $A$ to denote the adjacency matrix of the graph $G$. For each node $i$, we use $N(i)$ to denote the set of node $i$'s neighbors. $X$ represents the learnt node embedding matrix, where $X_i$ is the embedding of node $i$.
In link prediction, the goal is to predict the missing edges or the edges that are most likely to emerge in the future. Formally, given a set of node pairs, the task is to predict a score for each node pair. In this paper, we compute the score of each edge from the cosine similarity matrix of the node embeddings.
Now we briefly review two popular graph embedding methods: DeepWalk (Perozzi et al., 2014) and LINE (Tang et al., 2015). DeepWalk extends the idea of Word2vec (Mikolov et al., 2013) to graphs: it views each node as a word and uses random walks generated on the graph as sentences. It then uses Word2vec to obtain the node embeddings. LINE learns the node embeddings by preserving both the first-order proximity (LINE$_{1st}$), which describes local pairwise proximity, and the second-order proximity (LINE$_{2nd}$) for sampled node pairs. For DeepWalk and LINE, there is a context embedding matrix computed together with the node embedding matrix. We use $Y$ to denote the context embedding matrix.
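The cosine-similarity scoring used for link prediction above can be illustrated with a short sketch. This is a minimal NumPy example, not the authors' code; the function names `cosine_similarity_matrix` and `score_pairs` and the toy embedding matrix are hypothetical and introduced only for illustration.

```python
import numpy as np

def cosine_similarity_matrix(X, eps=1e-12):
    """Pairwise cosine similarities between the rows (node embeddings) of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_unit = X / np.maximum(norms, eps)  # guard against all-zero rows
    return X_unit @ X_unit.T

def score_pairs(X, pairs):
    """Score each (i, j) node pair by the cosine similarity of its two embeddings."""
    S = cosine_similarity_matrix(X)
    pairs = np.asarray(pairs)
    return S[pairs[:, 0], pairs[:, 1]]

# Toy embeddings (illustrative only): nodes 0 and 1 are parallel, node 2 is orthogonal.
X_toy = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
toy_scores = score_pairs(X_toy, [(0, 1), (0, 2)])
```

A higher score is read as a higher likelihood that the pair is linked.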
Previous work (Qiu et al., 2018) has shown that DeepWalk and LINE are implicitly doing matrix factorization.

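DeepWalk is solving the following matrix factorization problem:
$$\log\left(\mathrm{vol}(G)\left(\frac{1}{T}\sum_{r=1}^{T}\left(D^{-1}A\right)^{r}\right)D^{-1}\right) - \log b = XY^{\top} \qquad (1)$$
LINE is solving the following matrix factorization problem:
$$\log\left(\mathrm{vol}(G)\,D^{-1}AD^{-1}\right) - \log b = XY^{\top} \qquad (2)$$
where $A$ is the adjacency matrix, $X$ and $Y$ are the node and context embedding matrices, $\mathrm{vol}(G) = \sum_{i,j} A_{ij}$ is the volume of graph $G$, $D$ is the diagonal matrix where each element represents the degree of the corresponding node, $T$ is the context window size and $b$ is the number of negative samples. We use $Z$ to denote the matrix that DeepWalk and LINE are factorizing. We denote $\Omega = \{(i,j) \mid Z_{ij} \neq 0\}$ as the observable elements in $Z$ when solving matrix factorization and $\Omega_i = \{j \mid Z_{ij} \neq 0\}$ as the nonzero elements in row $i$ of $Z$. With these notations defined, we now give a unified formulation for DeepWalk and LINE:
$$\min_{X,Y}\ \left\lVert R_{\Omega}\left(Z - XY^{\top}\right)\right\rVert_F^2 \qquad (3)$$
where $[R_{\Omega}(M)]_{ij}$ is $M_{ij}$ if $(i,j) \in \Omega$ and $0$ otherwise, and $\lVert M \rVert_F^2$ denotes the squared Frobenius norm of matrix $M$.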
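To make the unified formulation more concrete, the following sketch computes the factorized matrix for a small graph, following the closed forms reported by Qiu et al. (2018), and evaluates the masked squared Frobenius loss. It is an illustrative NumPy sketch under stated assumptions: the names `factorized_matrix` and `masked_loss` are hypothetical, dense matrices are used, the graph is assumed to have no isolated nodes, and cells where the argument of the logarithm is zero are treated as unobserved.

```python
import numpy as np

def factorized_matrix(A, window=1, neg=1.0):
    """Matrix implicitly factorized by DeepWalk (window = T) or LINE (window = 1).

    Returns (Z, mask), where mask marks the observed cells. Assumption: cells
    whose log argument is zero are unobserved and set to 0 in Z.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    vol = A.sum()                        # volume of the graph
    D_inv = np.diag(1.0 / deg)           # assumes no isolated nodes
    P = D_inv @ A                        # random-walk transition matrix
    P_power = np.eye(len(A))
    P_sum = np.zeros_like(A)
    for _ in range(window):              # accumulate P^1 + ... + P^T
        P_power = P_power @ P
        P_sum += P_power
    M = vol * (P_sum / window) @ D_inv / neg   # log(M) = log(...) - log(b)
    mask = M > 0
    Z = np.zeros_like(M)
    Z[mask] = np.log(M[mask])
    return Z, mask

def masked_loss(Z, mask, X, Y):
    """Squared Frobenius norm of (Z - X Y^T) restricted to observed cells."""
    R = np.where(mask, Z - X @ Y.T, 0.0)
    return float(np.sum(R ** 2))

# Toy path graph 0 - 1 - 2 (illustrative only).
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
Z_line, mask_line = factorized_matrix(A_path, window=1)
Z_dw, mask_dw = factorized_matrix(A_path, window=2)
```

With `window=1` the expression reduces to the LINE case, which is one way to see why a single formulation covers both methods.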
4 Problem Statement
In this section we introduce the attack model, including the attacker's actions, the attacker's utilities and the constraints on the attacker. We assume that the attacker can manipulate the graph by adding or deleting edges, and we consider these two types of manipulation separately. We refer to the manipulated graph as the poisoned graph.
We characterize two kinds of adversarial goals:
Integrity attack: Here the attacker's goal is either to increase or decrease the probability (similarity score) of a target node pair. For example, in a social network, the attacker may be interested in increasing (or decreasing) the probability that a friendship occurs between two people. Similarly, in a recommendation system, an attacker associated with the producer of a product may be interested in increasing the probability that the product is recommended to users. Specifically, the attacker aims to change the probability of the edge connecting a pair of nodes whose embeddings are learnt from the poisoned graph.
For integrity attack, we consider two kinds of constraints on the attacker: 1. Direct Attack: the attacker can only manipulate edges adjacent to the target node pair; 2. Indirect Attack: the attacker can only manipulate edges that are not adjacent to the target node pair.
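The two constraints can be expressed as a boolean mask over node pairs. The sketch below is one possible encoding, not the authors' implementation; the function name `candidate_mask` and the `mode` argument are hypothetical. It also excludes the target pair itself, in line with the experimental setup in which the attacker cannot directly modify target pairs.

```python
import numpy as np

def candidate_mask(n, target, mode):
    """Boolean n x n mask of node pairs the attacker is allowed to modify.

    mode="direct":   pairs that share a node with the target pair.
    mode="indirect": pairs that touch neither node of the target pair.
    """
    if mode not in ("direct", "indirect"):
        raise ValueError("mode must be 'direct' or 'indirect'")
    i, j = target
    touches = np.zeros(n, dtype=bool)
    touches[[i, j]] = True
    adjacent = touches[:, None] | touches[None, :]   # pair shares a node with the target
    mask = adjacent if mode == "direct" else ~adjacent
    mask = mask & ~np.eye(n, dtype=bool)              # no self-loops
    mask[i, j] = mask[j, i] = False                   # never edit the target pair itself
    return mask

direct = candidate_mask(4, (0, 1), "direct")
indirect = candidate_mask(4, (0, 1), "indirect")
```

Multiplying a gradient or a candidate ranking elementwise by such a mask restricts the attack to the permitted cells.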
Availability attack: Here the adversarial goal is to reduce the prediction performance over a test set consisting of a set of node pairs. (The test set consists of both positive examples, indicating the existence of edges, and negative examples, indicating the absence of edges.) In this paper, we choose the average precision score (AP score) to evaluate the attack performance. Specifically, we consider an attacker whose goal is to decrease the AP score over the test set by adding a small perturbation to a given graph.
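For reference, the AP score over a set of labeled node pairs can be computed as below. This is a minimal sketch of the standard definition (mean of precision at each rank where a positive appears); the function name `average_precision` is hypothetical, tied scores are broken by input order, and library implementations may treat ties differently.

```python
import numpy as np

def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive example appears."""
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = labels[order]                        # labels sorted by decreasing score
    hits = np.cumsum(ranked)                      # positives seen up to each rank
    precision_at_k = hits / np.arange(1, len(ranked) + 1)
    return float(np.sum(precision_at_k * ranked) / np.sum(ranked))

# Positives at ranks 1 and 3: AP = (1/1 + 2/3) / 2.
ap_toy = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

An availability attack succeeds to the extent that this number drops when the embeddings are learnt from the poisoned graph instead of the clean one.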
5 Attacking Unsupervised Graph Embedding
In this section we show our algorithm for computing the adversarial strategy. Given that DeepWalk and LINE are implicitly doing matrix factorization, we can directly derive the backpropagated gradient based on the first-order KKT condition. Our algorithm has two steps: 1. Projected Gradient Descent (PGD) step: gradient descent on the weighted adjacency matrix. 2. Projection step: projection of the weighted adjacency matrix onto $\{0,1\}^{|V|\times|V|}$. We first describe the projected gradient descent step and then describe the projection method we use to choose which edges to add or delete.
5.1 Projected Gradient Descent (PGD)
Based on the matrix factorization formulation above, we describe the algorithm we use to generate the adversarial graph. The core part of our method is the projected gradient descent (PGD) step. In this step, the adjacency matrix is continuous, since we view the graph as a weighted graph, which allows us to use gradient descent.
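First we describe the loss function we use. We use $L$ to denote the loss function. For integrity attack, $L$ is $\pm$ the similarity score of $(i, j)$, where $(i, j)$ is the target node pair. (Here the $+$ or $-$ sign depends on whether the attacker wants to increase or decrease the score of the target edge.) For availability attack, the loss function is . The update of the weighted adjacency matrix $A$ in iteration $t$ is as follows:
$$A^{t+1} = \mathrm{Proj}_{[0,1]}\left(A^{t} - s_t \cdot \nabla_A L\right) \qquad (4)$$
Here $\mathrm{Proj}_{[0,1]}$ is the projection function which projects the matrix to the space $[0,1]^{|V|\times|V|}$ and $s_t$ is the step size in iteration $t$. The nontrivial part is to compute $\nabla_A L$. We note that
$$\nabla_A L = \nabla_X L \cdot \nabla_A X \qquad (5)$$
The computation of $\nabla_X L$ is trivial. Now to compute $\nabla_A X$, using the chain rule, we have: $\nabla_A X = \nabla_Z X \cdot \nabla_A Z$. First we show how to compute $\nabla_Z X$.
Next we show how to compute $\nabla_A Z$ for DeepWalk and LINE separately. In the derivation of $\nabla_A Z$, we view $\mathrm{vol}(G)$ and $D$ as constant.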
DeepWalk We show how to compute for DeepWalk. For DeepWalk, From Eq. 1:
(8) 
Now to compute , let , then we only need to derive for each . Note that . Then since computing and is easy, once we have , we can compute .
LINE We show the derivation of for LINE:
(9) 
since where . We have:
(10) 
Once we have and , we can compute .
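The overall shape of the update in Eq. 4 (a gradient step followed by a projection that keeps the relaxed weights in a valid range) can be sketched as follows. This is a hedged illustration only: it assumes the relaxed weights live in [0, 1], uses a fixed step size rather than a per-iteration one, and replaces the gradient derived in this section with a hypothetical toy gradient `toy_grad`. The names `pgd_attack` and `toy_grad` are not from the paper.

```python
import numpy as np

def pgd_attack(A_init, grad_fn, step_size=0.1, n_iter=50):
    """Gradient descent on a weighted adjacency matrix, clipped to [0, 1] each step."""
    A = np.array(A_init, dtype=float)
    for _ in range(n_iter):
        g = grad_fn(A)
        g = (g + g.T) / 2.0                       # keep the graph undirected
        A = np.clip(A - step_size * g, 0.0, 1.0)  # gradient step, then projection
        np.fill_diagonal(A, 0.0)                  # no self-loops
    return A

def toy_grad(A):
    """Hypothetical stand-in: gradient of L(A) = -A[0, 3] - A[3, 0].

    Descending this loss pushes the weight of the pair (0, 3) up. The actual
    method would instead backpropagate the loss through X and Z.
    """
    g = np.zeros_like(A)
    g[0, 3] = g[3, 0] = -1.0
    return g

A_clean_toy = np.array([[0, 1, 0, 0],
                        [1, 0, 1, 0],
                        [0, 1, 0, 1],
                        [0, 0, 1, 0]], dtype=float)
A_weighted_toy = pgd_attack(A_clean_toy, toy_grad)
```

Swapping `toy_grad` for a function that returns the gradient of the attack loss with respect to the adjacency matrix would give the PGD step proper.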
5.2 Projection
In the projected gradient descent step, we compute a weighted adjacency matrix, which we therefore need to project back to a binary adjacency matrix in $\{0,1\}^{|V|\times|V|}$. Now we describe the projection method we use, which is straightforward. First, we show our projection method for an attacker that can add edges. To add edges, the attacker needs to choose some cells that are 0 in the adjacency matrix of the clean graph and turn them into 1. Our projection strategy is that the attacker chooses the cells whose values in the weighted adjacency matrix are closest to 1 as the candidate set of edges to add. Deleting edges works in a similar way. The only difference is that we start from the cells that are originally 1 in the adjacency matrix of the clean graph and choose the cells whose values in the weighted adjacency matrix are closest to 0 as the candidate set of edges to delete.
Now we briefly discuss when to use the projection step. A natural choice is to project once after the projected gradient descent step. Another choice is to incorporate the projection step into the gradient descent computation, where we project every $k$ iterations. The second projection strategy induces less loss in the projection step and is more accurate, but can take more iterations to converge than the first projection strategy. In our experiments, we choose the first projection strategy for its ease of computation.
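The projection described above amounts to a top-k selection over candidate cells. The sketch below is one way to write it with NumPy, assuming an undirected graph, relaxed weights in [0, 1], and a fixed edge budget `k`; the function names `project_add` and `project_delete` are hypothetical.

```python
import numpy as np

def project_add(A_clean, A_weighted, k):
    """Add the k absent edges whose relaxed weight is closest to 1."""
    iu, ju = np.triu_indices(len(A_clean), k=1)     # each undirected pair once
    absent = A_clean[iu, ju] == 0
    ci, cj = iu[absent], ju[absent]
    top = np.argsort(-A_weighted[ci, cj], kind="stable")[:k]
    A_adv = A_clean.copy()
    A_adv[ci[top], cj[top]] = 1
    A_adv[cj[top], ci[top]] = 1                      # keep the matrix symmetric
    return A_adv

def project_delete(A_clean, A_weighted, k):
    """Delete the k existing edges whose relaxed weight is closest to 0."""
    iu, ju = np.triu_indices(len(A_clean), k=1)
    present = A_clean[iu, ju] > 0
    ci, cj = iu[present], ju[present]
    low = np.argsort(A_weighted[ci, cj], kind="stable")[:k]
    A_adv = A_clean.copy()
    A_adv[ci[low], cj[low]] = 0
    A_adv[cj[low], ci[low]] = 0
    return A_adv

# Toy path graph 0 - 1 - 2 - 3 with an illustrative relaxed matrix W.
A0 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
W = A0.copy()
W[0, 3] = W[3, 0] = 0.9
W[0, 2] = W[2, 0] = 0.2
W[1, 2] = W[2, 1] = 0.1
W[2, 3] = W[3, 2] = 0.7
A_added = project_add(A0, W, k=1)
A_deleted = project_delete(A0, W, k=1)
```

In this toy case the pair (0, 3) is added and the edge (1, 2) is deleted, since they have the largest and smallest relaxed weights among their respective candidates.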
6 Experiments
In this section, we show the results of poisoning attacks against DeepWalk and LINE on real-world graph datasets. In the experiments, we denote our poisoning attack method as 'Opt-attack'. We evaluate our attack method on three real-world graph datasets: 1) Facebook (Leskovec & Mcauley, 2012): a social network with 4039 nodes and 88234 edges. 2) Cora (Sen et al., 2008): a citation network with 2708 nodes and 2708 edges. 3) Citeseer (Giles et al., 1998): a citation network with 2110 nodes and 7336 edges.
Baselines: We compare with several baselines: (1) random attack: this baseline is general and used for both integrity attack and availability attack. We randomly add or remove edges; (2) personalized PageRank (Bahmani et al., 2010): this baseline is only used for integrity attack. Given a target edge (A, B), we use personalized PageRank to calculate the importance of the nodes. Given the list of nodes ranked by their importance, we select the edges which connect the top-ranked nodes to A or B; (3) degree sum: this baseline is used for availability attack. We rank the node pairs by the sum of the degrees of their two nodes. Then we add or delete the node pairs with the largest degree sum; (4) shortest path: this baseline is used for availability attack. We rank the edges by the number of shortest paths between node pairs in the graph that pass through them, and delete the highest-ranked edges.
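As an illustration of how simple these baselines are, the degree sum baseline can be sketched in a few lines. This is a hedged NumPy sketch for an undirected graph; the function name `degree_sum_ranking` and the `existing` flag are hypothetical, and ties are broken by node index.

```python
import numpy as np

def degree_sum_ranking(A, existing):
    """Rank node pairs by the sum of their endpoint degrees, largest first.

    existing=False ranks absent pairs (candidates to add);
    existing=True ranks present edges (candidates to delete).
    """
    deg = A.sum(axis=1)
    iu, ju = np.triu_indices(len(A), k=1)
    keep = (A[iu, ju] > 0) == existing
    iu, ju = iu[keep], ju[keep]
    order = np.argsort(-(deg[iu] + deg[ju]), kind="stable")
    return list(zip(iu[order].tolist(), ju[order].tolist()))

# Toy graph: node 0 is a hub linked to 1, 2, 3; node 3 is also linked to 4.
A_star = np.zeros((5, 5))
for u, v in [(0, 1), (0, 2), (0, 3), (3, 4)]:
    A_star[u, v] = A_star[v, u] = 1.0
to_add = degree_sum_ranking(A_star, existing=False)
to_delete = degree_sum_ranking(A_star, existing=True)
```

Taking the first few entries of `to_add` or `to_delete` gives the edges this baseline would modify under a given budget.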
In our experiments, we choose 128 as the latent embedding dimension. We use the default parameter settings for DeepWalk and LINE. We generate the test set and validation set with a proportion of 2:1, where positive examples are sampled by removing 15% of the edges from the graph and negative examples are obtained by sampling an equal number of node pairs that have no edge connecting them. For both our attack and the random attack, we guarantee that the attacker can't directly modify any node pairs in the target set.
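The sampling of positive and negative examples described above can be sketched as follows. This is an illustrative NumPy version under assumptions of our own: the graph is undirected and dense, the function name `sample_link_prediction_pairs` is hypothetical, the 2:1 split into test and validation sets is left out, and no check is made that the remaining graph stays connected.

```python
import numpy as np

def sample_link_prediction_pairs(A, frac=0.15, seed=0):
    """Hold out a fraction of edges as positives and sample as many non-edges as negatives."""
    rng = np.random.default_rng(seed)
    iu, ju = np.triu_indices(len(A), k=1)
    is_edge = A[iu, ju] > 0
    edges = np.stack([iu[is_edge], ju[is_edge]], axis=1)
    non_edges = np.stack([iu[~is_edge], ju[~is_edge]], axis=1)
    n_pos = int(round(frac * len(edges)))
    pos = edges[rng.choice(len(edges), size=n_pos, replace=False)]
    neg = non_edges[rng.choice(len(non_edges), size=n_pos, replace=False)]
    A_train = A.copy()
    A_train[pos[:, 0], pos[:, 1]] = 0       # remove held-out edges in both directions
    A_train[pos[:, 1], pos[:, 0]] = 0
    return A_train, pos, neg

# Toy ring graph with 20 nodes and 20 edges (illustrative only).
idx = np.arange(20)
A_ring = np.zeros((20, 20))
A_ring[idx, (idx + 1) % 20] = 1.0
A_ring[(idx + 1) % 20, idx] = 1.0
A_train, pos_pairs, neg_pairs = sample_link_prediction_pairs(A_ring)
```

The embedding method is then trained on `A_train` (or on its poisoned version), and the held-out pairs are scored to compute the AP.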
6.1 Integrity Attack
For each attack scenario, we choose 32 different target node pairs and use our algorithm to generate the adversarial graph. For increasing the score of the target edge, the target node pair is randomly sampled from the negative examples in the test set. For decreasing the score of the target edge, the target node pair is randomly sampled from the positive examples in the test set. We report the average score increase and compare it with the random attack baseline. We consider two kinds of attacker’s actions: adding or deleting edges and two constraints: direct attack and indirect attack. Results for Citeseer dataset are deferred to the appendix.
Adding edges
We consider the adversary which can only add edges. Figure 1 shows the results under the direct attack setting. In Figure 1(a), 1(c), 1(e) and 1(g), when the adversarial goal is to increase the score of the target node pair, our attack method outperforms the random attack baseline by a significant margin. We also plot the personalized PageRank baseline. This second baseline, which we propose, is very strong, with attack performance at the same level as our proposed method. To further understand this, we analyze the edges we add in this attack scenario. We find the following pattern: if the target node pair is (A, B), then our attack tends to add edges from node A to the neighbors of node B and also from node B to the neighbors of node A. This is intuitive because connecting to the other node's neighbors can increase the similarity score of two nodes. In Figure 1(d) and 1(h), when the adversarial goal is to decrease the score of the target node pair, our method is better than the random attack baseline for attacking LINE. For attacking DeepWalk (Figure 1(b) and 1(f)), on the Facebook dataset (Figure 1(b)) our attack is better than the random baseline when the number of added edges is large. Although for Cora (Figure 1(f)) our attack is close to the random attack baseline, we note that in this attack case the random attack is already powerful and can lead to a large drop (e.g., 0.8) in similarity score.
Figure 2 shows our results for indirect attack. We can see that for DeepWalk, even if the attacker can't modify the edges adjacent to the target node pair, it can still manipulate the score of the target edge with a few added edges. We also analyze the edges our algorithm chooses when the adversarial goal is to increase the score of the target node pair (the case in Figure 2(a) and 2(c)). We find that our attack tends to add edges between the neighbors of node A and the neighbors of node B. This also follows the intuition that connecting the neighbors of two nodes can increase the similarity of the two nodes. When the goal is to decrease the score (Figure 2(b) and 2(d)), our attack is still better than the random baseline by a noticeable margin.
Deleting edges
Now we consider the adversary which can delete existing edges. Figure 3 summarizes our results for direct attack. We can see that our attack method works well for attacking DeepWalk (Figure 3(a), 3(b), 3(e) and 3(f)). The large variance may be because different target edges have different sensitivity to edge deletion. Also, we notice that LINE is more robust to deleting edges, with an average magnitude of score increase (decrease) lower than that of DeepWalk. Figure 4 summarizes our results for indirect attack. Still, on average, our attack is able to outperform the random attack baseline.
6.2 Availability Attack
In this part, we show the results for availability attack. We report our results on two datasets: Cora and Citeseer. Results for Citeseer are deferred to the appendix. For both datasets, we choose the test set as the set of node pairs to attack. Table 1 summarizes our results. We can see that our attack almost always outperforms the baselines. When adding edges, our optimization attack outperforms all other baselines by a significant margin. When deleting edges, we can see that LINE is more robust than DeepWalk. In general, we notice that adding edges is more powerful than deleting edges in our attack.
6.3 Transferability
In this part, we show that our attack can be transferred across different embedding methods. Besides DeepWalk and LINE, we choose another three embedding methods to test the transferability of our approach: 1. Variational Graph Autoencoder (GAE) (Kipf & Welling, 2016); 2. Spectral Clustering (Tang & Liu, 2011); 3. Node2Vec (Grover & Leskovec, 2016). For GAE, we use the default setting as in the original paper. For Node2Vec, we first tune the parameters $p$ and $q$ on a validation set and use the best $p$ and $q$.
Figure 5 shows our results for the transferability test of our attack, where the number of added (deleted) edges is 200. Results for 100 and 300 added (deleted) edges are deferred to the appendix. Comparing Figure 5(a), 5(c) with Figure 5(b), 5(d), we can see that adding edges is more effective than deleting edges in our attack. The attack on DeepWalk has higher transferability compared with the other four methods (including LINE). Comparing the decrease of AP score for all five methods, we can see that GAE is more robust against transferability-based attacks.
7 Case Study: Attacking DeepWalk on a Coauthor Network
In this section, we conduct a case study on a real-world coauthor network extracted from DBLP (Tang et al., 2008). We construct a coauthor network from two different research communities: machine learning & data mining (ML&DM), and security (Security). For each research community, we select some conferences in the field: ICML, ICLR, NIPS, KDD, WWW, ICWD, ICDM from ML&DM and IEEE S&P, CCS, Usenix and NDSS from Security. We sort the authors according to the number of published papers and keep the top-500 authors that published the most papers in each community, which yields a coauthor network with 1,000 nodes in total. The constructed coauthor graph contains 27260 in-field edges and 1014 cross-field edges. We analyze both integrity attack and availability attack on this coauthor network.
Integrity Attack: We consider an indirect attack setting where the adversarial goal is to decrease the score of a target node pair and the attacker's action is deleting existing edges. We show the subgraph that contains the nodes in the target edge and the 5 edges chosen by our algorithm, as well as the nodes that coauthored more than 3 papers with them, i.e., frequent collaborators. We visualize the original graph in Figure 6(a). Green nodes represent ML&DM authors and blue nodes denote authors from the Security community.
In Figure 6(b), we show the target node pair A and B. Node A denotes John C. Mitchell, a professor at Stanford University from the security community, and node B denotes Trevor Hastie, also a Stanford professor, from the ML&DM community. After the attack, the similarity score of the target node pair is reduced from 0.67 to 0.37. We make the following observations: 1) We find that the top 2 edges (d, e) and (e, f) chosen by our attack lie on the shortest path between A and B, which corresponds to the intuitive understanding that cutting the paths connecting A with B makes it less likely to predict that an edge exists between A and B; 2) We find that many of the edges chosen by our algorithm are cross-field edges (Figure 6(c)): edge (i, j) and edge (d, e). Considering how small a proportion of the graph the cross-field edges make up (about 3.6% of all edges), we hypothesize that cutting the cross-field edges could impede the information flow between the two communities and therefore make the similarity score of a cross-field link lower.
Availability Attack: For availability attack, we analyze the adversary that adds edges. It turns out that our algorithm tends to add cross-field edges: of the top 40 added edges chosen by our attack method, 39 are cross-field edges. We hypothesize that this is because adding more edges between two communities can disrupt the existing information flow and therefore lead the learnt embedding to carry incorrect information about the network structure.
8 Conclusion
In this paper, we investigate data poisoning attacks against unsupervised node embedding methods and take the task of link prediction as an example. We study two types of data poisoning attacks: integrity attack and availability attack. We propose a unified optimization framework to optimally attack the node embedding methods for both types of attacks. Experimental results on several real-world graphs show that our proposed approach can effectively attack the results of link prediction by adding or removing a few edges. Results also show that the adversarial examples discovered by our proposed approach are transferable across different node embedding methods. Finally, we conduct a case study analysis to better understand our attack method. In the future, we plan to study how to design effective defense strategies for node embedding methods.
References
 Bahmani et al. (2010) Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast incremental and personalized pagerank. In VLDB, 2010.
 Carlini & Wagner (2017) Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.
 Cheng et al. (2018) Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu Chen, and Cho-Jui Hsieh. Seq2sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. arXiv preprint arXiv:1803.01128, 2018.
 Cisse et al. (2017) Moustapha Cisse, Yossi Adi, Natalia Neverova, and Joseph Keshet. Houdini: Fooling deep structured prediction models. arXiv preprint arXiv:1707.05373, 2017.
 Dai et al. (2018) Hanjun Dai, Hui Li, Tian Tian, Xin Huang, Lin Wang, Jun Zhu, and Le Song. Adversarial attack on graph structured data. In ICML, 2018.
 Giles et al. (1998) C. Lee Giles, Kurt D. Bollacker, and Steve Lawrence. Citeseer: an automatic citation indexing system. In The Third ACM Conference on Digital Libraries, 1998.
 Goodfellow et al. (2014) Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
 Grover & Leskovec (2016) Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD, 2016.
 Jia & Liang (2017) Robin Jia and Percy Liang. Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328, 2017.
 Kipf & Welling (2016) Thomas Kipf and Max Welling. Variational graph autoencoders. arXiv preprint arXiv:1611.07308, 2016.
 Kipf & Welling (2017) Thomas Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
 Leskovec & Mcauley (2012) Jure Leskovec and Julian J. Mcauley. Learning to discover social circles in ego networks. In NIPS, 2012.
 Levy & Goldberg (2014) Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In NIPS, 2014.
 Li et al. (2016) Bo Li, Yining Wang, Aarti Singh, and Yevgeniy Vorobeychik. Data poisoning attacks on factorization-based collaborative filtering. In NIPS, 2016.
 Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
 Moosavi-Dezfooli et al. (2016) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In CVPR, 2016.
 Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In KDD, 2014.
 Qiu et al. (2018) Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network embedding as matrix factorization. In WSDM, 2018.
 Sen et al. (2008) Prithviraj Sen, Galileo Mark Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93–106, 2008.
 Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 Tang et al. (2015) Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In WWW, 2015.
 Tang et al. (2016) Jian Tang, Jingzhou Liu, Ming Zhang, and Qiaozhu Mei. Visualizing large-scale and high-dimensional data. In WWW, 2016.
 Tang et al. (2008) Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. Arnetminer: extraction and mining of academic social networks. In KDD, 2008.
 Tang & Liu (2011) Lei Tang and Huan Liu. Leveraging social media networks for classification. Data Mining and Knowledge Discovery, 23(3):447–478, November 2011.
 Xiao et al. (2018a) Chaowei Xiao, Ruizhi Deng, Bo Li, Fisher Yu, Dawn Song, et al. Characterizing adversarial examples based on spatial consistency information for semantic segmentation. In ECCV, 2018a.
 Xiao et al. (2018b) Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Song. Generating adversarial examples with adversarial networks. In IJCAI, 2018b.
 Xiao et al. (2018c) Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song. Spatially transformed adversarial examples. In ICLR, 2018c.
 Xie et al. (2017) Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. In ICCV, 2017.
 Yang et al. (2018) Dawei Yang, Chaowei Xiao, Bo Li, Jia Deng, and Mingyan Liu. Realistic adversarial examples in 3d meshes. arXiv preprint arXiv:1810.05206, 2018.
 Zugner et al. (2018) Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks on neural networks for graph data. In KDD, 2018.
Appendix A Implementation Detail
In this part, we discuss the initialization of the weighted adjacency matrix in the projected gradient descent step. From the formulation in Section 5.1, if we initialize all cells which are initially 0 to 0, there won't be any backpropagated gradient on these cells. (This is because the set of observed elements won't contain these cells.) To handle this issue, we initialize these cells with a small value, for example 0.001, which allows the gradient on these cells to be efficiently computed.
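This initialization can be written in a couple of lines. The sketch below is illustrative only; the function name `init_weighted_adjacency` is hypothetical, and keeping the diagonal at zero (no self-loops) is our own assumption rather than something stated in the text.

```python
import numpy as np

def init_weighted_adjacency(A_clean, eps=1e-3):
    """Replace zero cells by a small positive value so they can receive gradient."""
    A0 = np.where(A_clean > 0, A_clean, eps).astype(float)
    np.fill_diagonal(A0, 0.0)   # assumption: self-loops stay excluded
    return A0

A_small = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_init = init_weighted_adjacency(A_small)
```

Existing edges keep their weight of 1, while every absent off-diagonal pair starts at `eps`.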
Appendix B Additional Experimental Results
B.1 Integrity attack
Here we show additional experimental results. Figures 7 and 8 summarize the results of integrity attack on the Citeseer dataset. Figure 7 shows our results for direct integrity attack and Figure 8 shows our results for indirect integrity attack.
B.2 Availability attack
Here we show additional results for availability attack. Table 2 shows the results of availability attack on the Citeseer dataset. Figures 9 and 10 show additional results of the transferability analysis.