The General Black-box Attack Method for Graph Neural Networks

08/04/2019 · by Heng Chang, et al. · Tsinghua University · The University of Texas at Arlington · Georgia Institute of Technology · Tencent

With the great success of Graph Neural Networks (GNNs) in representation learning on graph-structured data, the robustness of GNNs against adversarial attacks has inevitably become a central problem in the graph learning domain. Despite fruitful progress, current works suffer from two main limitations: first, attack methods must be developed case by case for each model; second, most of them are restricted to the white-box setting. This paper promotes current frameworks in a more general and flexible sense -- we demand only one single method to attack various kinds of GNNs, and this attacker is black-box driven. To this end, we begin by investigating the theoretical connections between different kinds of GNNs in a principled way and integrate different GNN models into a unified framework, dubbed General Spectral Graph Convolution. On this basis, a generalized adversarial attacker is proposed for two families of GNNs: convolution-based models and sampling-based models. More interestingly, our attacker does not require any knowledge of the target classifiers used on top of the GNNs. Extensive experimental results validate the effectiveness of our method on several benchmark datasets. In particular, even a small graph perturbation such as a one-edge flip consistently produces a strong attack against different GNN models.


1 Introduction

Graph Neural Networks (GNNs) scarselli2009GNN , which bring the expressive power of deep learning to graph-structured data, have achieved promising success in various domains, such as predicting properties of molecules duvenaud2015convolutional , biological analysis Hamilton2017Inductive , and financial surveillance paranjape2017motifs , to name a few. That being said, a number of recent works have exposed the risk of GNNs against adversarial attacks, mirroring the concerns researchers have raised for conventional deep neural networks akhtar2018threat . For instance, ICML2018Adversarial ; KDD2018Adversarial have already shown that convolution-based graph models are vulnerable to adversarial attacks at both test time (i.e., evasion) and training time (i.e., poisoning).

Regarding the amount of information required to generate adversarial examples, attack methods on graphs broadly fall into three categories ICML2018Adversarial (arranged in ascending order of difficulty):


  • White-box attack (WBA): can access any information of the target model.

  • Practical black-box attack (PBA): can only access the predictions of the target model.

  • Restricted black-box attack (RBA): has no access to any information of the target model.

While existing works KDD2018Adversarial ; ICML2018Adversarial ; sun2018adversarial on both WBA and PBA are fruitful, performing RBA is more challenging yet more meaningful in practice, since it avoids looking into the model internals; this motivates the study in this work.

Reviewing various kinds of GNNs, the Graph Convolutional Network (GCN) ICLR2017SemiGCN and DeepWalk perozzi2014deepwalk are representatives of two families of graph learning models. At first glance, GCN and DeepWalk are quite different from each other: GCN requires vertex attributes and its learned weights are shared over the graph, while DeepWalk is attribute-agnostic and its learned parameters are the vertex embeddings themselves. Owing to these differences, previous attack approaches are applicable to either of them but not both. For example, the methods in KDD2018Adversarial ; ICML2018Adversarial ; ICLR2019Meta are designed for attacking GCN, and the model in arXiv2018Adversarial aims at perturbing DeepWalk. How the adversarial examples learned for one model generalize to others is still an open question.

GNN Model                       | Graph-shift filter $S$                          | Polynomial function $f(S)$              | Input signal | Parameters
GCN ICLR2017SemiGCN             | $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$     | $S$                                     | $X$          | Any
SGC sgc_icml19                  | $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$     | $S^K$                                   | $X$          | Any
ChebyNet Defferrard2016ChebNet  | $\frac{2}{\lambda_{max}} L - I_n$               | $\sum_{k=0}^{K} \theta_k T_k(S)$        | $X$          | Any
LINE WWW2015Line                | $D^{-1}A$                                       | $S$                                     | $D^{-1}$     | $vol(G)/b$
DeepWalk perozzi2014deepwalk    | $D^{-1}A$                                       | $\frac{1}{K}\sum_{k=1}^{K} S^k$         | $D^{-1}$     | $vol(G)/b$

Table 1: The theoretical connection between different GNN models (entries follow Lemmas 1-2, Theorem 4 and Corollary 1 below).

In this paper, we propose a more general and flexible framework that uses one attacker to attack two families of GNN models on node classification tasks under the RBA setting. We first investigate the theoretical connections between different kinds of GNN models from the viewpoint of graph signal processing. We propose a General Spectral Graph Convolution model which detaches the trainable parameters from the graph filtering procedure. We show the equivalence between two convolution-based models and further prove that sampling-based models, such as LINE WWW2015Line and DeepWalk perozzi2014deepwalk , can be modeled as graph filtering with given vertex features and fixed parameters (see Table 1 for a summary). Accordingly, we establish a general optimization problem for the restricted black-box attack on GNNs and derive an effective algorithm to solve it. Four typical GNN models, convolution-based (GCN and SGC) and sampling-based (DeepWalk and LINE), are chosen to illustrate the algorithm and evaluate the attack performance. Empirical results show that our general attack method effectively produces adversarial attacks on real-world datasets without access to the classifier, under both evasion and poisoning settings.

2 Related work

Regarding the explanation of graph neural network families, xu2018how and WSDM2018NetworkEmbedding provide insights into understanding convolution-based and sampling-based GNNs, respectively. However, they focus on proposing new frameworks within each type of GNN rather than building a theoretical connection between the types.

Adversarial attacks on deep learning for graphs have only recently drawn attention from researchers. ICML2018Adversarial considers evasion attacks on both graph classification and node classification and exploits a reinforcement learning based framework under the RBA setting. However, it restricts attacks to edge deletions only for node classification, and considers neither the harder poisoning attacks nor transferability. KDD2018Adversarial proposes both evasion and poisoning attacks based on a surrogate model and, in contrast to ICML2018Adversarial , handles both edge insertion and deletion. But their method assumes full knowledge about the model, i.e., the white-box attack setting. Further, ICLR2019Meta utilizes meta-gradients to conduct poisoning attacks under a black-box setting by assuming the attacker uses a surrogate model as in KDD2018Adversarial . Its performance highly depends on this surrogate assumption, and it focuses on the global attack setting. arXiv2018Adversarial considers a different poisoning attack task on node embeddings: inspired by WSDM2018NetworkEmbedding , it maximizes the loss of DeepWalk via eigenvalue perturbation theory. In contrast, we focus on semi-supervised learning for node classification. Remarkably, although all of the above works except ICML2018Adversarial show the existence of transferability in GNNs through experiments, they all lack a theoretical analysis of this implicit connection. In this work, for the first time, we theoretically connect different kinds of GNNs and propose a general optimization problem from the perspective of parametric graph signal processing. An effective algorithm is then developed, considering both poisoning and evasion attacks under the RBA setting.

3 Background and Preliminaries

We begin with some notations and basic definitions. Let $G = (V, E)$ be an attributed graph, where $V$ is a vertex set with size $n$ and $E$ is an edge set. Denote by $A \in \{0,1\}^{n \times n}$ the adjacency matrix containing the edge connections and by $X \in \mathbb{R}^{n \times l}$ the feature matrix with dimension $l$ for the vertices. $D = \mathrm{diag}(d_1, \dots, d_n)$ with $d_i = \sum_j A_{ij}$ refers to the degree matrix, and $vol(G) = \sum_i d_i$ denotes the volume of $G$.

3.1 Graph Neural Networks

Graph Neural Networks (GNNs) are proposed to collectively aggregate information from the graph structure into an embedded representation for each vertex. Concretely, given a graph $G$, the goal is to learn a mapping function on $G$ that represents each vertex in a $d$-dimensional vector space while preserving the structural ($E$) and non-structural ($X$) properties as much as possible. Current GNN models can be divided into two categories: convolution-based GNNs Defferrard2016ChebNet ; ICLR2017SemiGCN and sampling-based GNNs WWW2015Line ; perozzi2014deepwalk .

3.1.1 Convolution-based GNN

Convolution-based GNNs extend the definition of convolution to the irregular graph structure and learn a representation vector for each vertex from the feature matrix $X$. Namely, the Fourier transform is generalized to graphs to define the convolution operation: $g_\theta \star x = U g_\theta(\Lambda) U^\top x$, where $U \Lambda U^\top$ is the eigen-decomposition of the normalized graph Laplacian $L = I_n - D^{-1/2} A D^{-1/2}$. To accelerate the calculation, ChebyNet Defferrard2016ChebNet proposes a polynomial filter and approximates $g_\theta$ by a truncated expansion in terms of Chebyshev polynomials $T_k(x)$:

(1)  $g_\theta \star x \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{L})\, x,$

where $\tilde{L} = \frac{2}{\lambda_{max}} L - I_n$ and $\lambda_{max}$ is the largest eigenvalue of the Laplacian matrix $L$. $\theta \in \mathbb{R}^{K+1}$ is now the vector of coefficients of the Chebyshev polynomials $T_k$, and $K$ denotes the order of the polynomial in the Laplacian.
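To make the truncated expansion in (1) concrete, here is a minimal NumPy sketch (ours, not the authors' code; the function and variable names are hypothetical) that applies a K-order Chebyshev filter to a signal via the recurrence $T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)$, assuming the rescaled Laplacian $\tilde{L}$ is precomputed:

    import numpy as np

    def chebyshev_filter(L_tilde: np.ndarray, x: np.ndarray, theta) -> np.ndarray:
        """Approximate g_theta * x by sum_k theta_k T_k(L~) x using the
        recurrence T_k(L~) x = 2 L~ T_{k-1}(L~) x - T_{k-2}(L~) x."""
        T_prev, T_curr = x, L_tilde @ x          # T_0 x and T_1 x
        out = theta[0] * T_prev
        if len(theta) > 1:
            out = out + theta[1] * T_curr
        for k in range(2, len(theta)):           # three-term recurrence
            T_prev, T_curr = T_curr, 2.0 * (L_tilde @ T_curr) - T_prev
            out = out + theta[k] * T_curr
        return out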

GCN ICLR2017SemiGCN constructs a layer-wise model which only considers the first-order Chebyshev polynomial and introduces the re-normalization trick to avoid gradient exploding/vanishing:

(2)  $H^{(m+1)} = \sigma\!\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}\, H^{(m)} W^{(m)}\right),$

where $\tilde{A} = A + I_n$, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $W^{(m)}$ are the parameters of the $m$-th layer, and $\sigma$ is a nonlinear function, e.g. the ReLU function. SGC sgc_icml19 utilizes a single linear transformation to achieve computationally efficient graph convolution, i.e., $\sigma$ in SGC is a linear activation function.
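As a concrete illustration of (2) and of SGC's linearization, the following minimal NumPy sketch (ours, not the authors' code; `renormalized_adjacency`, `sgc_logits` and `gcn_layer` are hypothetical helper names) shows the re-normalization trick and the resulting propagation rules:

    import numpy as np

    def renormalized_adjacency(A: np.ndarray) -> np.ndarray:
        """The re-normalization trick: S = D~^{-1/2} (A + I) D~^{-1/2}."""
        A_tilde = A + np.eye(A.shape[0])
        d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
        return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

    def gcn_layer(S: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
        """One GCN layer, Eq. (2): ReLU(S H W)."""
        return np.maximum(S @ H @ W, 0.0)

    def sgc_logits(A: np.ndarray, X: np.ndarray, W: np.ndarray, K: int = 2):
        """K-layer SGC with linear activation collapses to S^K X W."""
        S = renormalized_adjacency(A)
        H = X
        for _ in range(K):          # apply the graph-shift filter K times
            H = S @ H
        return H @ W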

3.1.2 Sampling-based GNN

Sampling-based GNNs learn vertex representations from sampled vertices, vertex sequences, or network motifs. For instance, LINE WWW2015Line with second-order proximity intends to learn two representation matrices $U, V \in \mathbb{R}^{n \times d}$ by maximizing the NEG loss of the skip-gram model:

(3)  $\ell = \sum_{(i,j) \in E} \Big( \log \sigma(u_i^\top v_j) + b \cdot \mathbb{E}_{j' \sim P_n}\big[\log \sigma(-u_i^\top v_{j'})\big] \Big),$

where $u_i$ and $v_j$ are rows of $U$ and $V$, respectively; $\sigma(\cdot)$ is the sigmoid function; $b$ is the negative-sampling parameter; and $P_n$ denotes the noise distribution generating negative samples. DeepWalk perozzi2014deepwalk adopts a similar loss function, except that the edge set is replaced by an indicator of whether vertices $i$ and $j$ are sampled in the same sequence within a given window size $K$. Most sampling-based GNNs only consider the structural information and ignore the vertex feature matrix $X$: the output representation matrix is purely learned from the graph topology. The output representations are subsequently used for tasks such as node classification and link prediction.
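For concreteness, here is a small sketch (our notation, not the authors' code) of the per-pair NEG objective in (3), assuming `noise_dist` is a length-$n$ probability vector for the noise distribution $P_n$:

    import numpy as np

    def neg_loss(U, V, i, j, b, noise_dist, rng=np.random.default_rng(0)):
        """NEG term for one observed pair (i, j) with b negative samples."""
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        pos = np.log(sigmoid(U[i] @ V[j]))                  # pull pair together
        negs = rng.choice(len(V), size=b, p=noise_dist)     # b noise vertices
        neg = np.sum(np.log(sigmoid(-U[i] @ V[negs].T)))    # push noise apart
        return pos + neg                                    # to be maximized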

3.2 Adversarial Attack on GNN

Given a GNN model $f_\Theta$ parameterized by $\Theta$ and a graph $G = (V, E)$, an adversarial attack on the graph aims to perturb the learned vertex representations so as to damage the performance of the downstream learning tasks. There are three components of a graph that can be attacked as targets:

  • Attack on $V$: add or delete vertices in the graph. This operation may change the dimension of the adjacency matrix $A$.

  • Attack on $E$: add or delete edges in the graph. This operation leads to changes of entries in the adjacency matrix $A$. This kind of attack is also known as a structural attack.

  • Attack on $X$: modify the attributes attached to the vertices.

Here, we mainly focus on adversarial attacks on the graph structure $E$ under the RBA setting, since attacking $E$ is more practical than the alternatives in real applications CIKM2012Gelling . Our attack model can easily be extended to attack the vertices and the feature matrix.

4 General Spectral Graph Convolution

In this section, we provide the theoretical motivation for our adversarial attack model. Graph Signal Processing (GSP) focuses on analyzing and processing data points whose relations are modeled as a graph shuman2013GSP ; ortega2018graph . Similar to Discrete Signal Processing, these data points can be treated as signals. A graph signal is thus defined as a mapping from the vertex set to the real numbers, $x: V \to \mathbb{R}$. In this sense, the feature matrix $X$ can be treated as a collection of graph signals with $l$ channels.

Inspired by GSP, we aim to formulate the GNN model as a generalization of signal processing. Namely, a GNN model can be treated as producing new graph signals by a graph filter together with feature convolution:

(4)  $Z = \sigma\!\big(f(S)\, X\, \Theta\big),$

where $f(S)$ denotes a graph signal filter, $\sigma$ denotes the activation function of the neural network, and $\Theta$ denotes a convolution filter from $l$ input channels to $d$ output channels. We construct $f(S)$ by a polynomial function $f$ of a graph-shift filter $S$, i.e., $f(S) = \sum_k a_k S^k$. We call this general model General Spectral Graph Convolution (GSGC). GSGC introduces the trainable weight matrix $\Theta$ to enable stronger expressiveness, fusing structural and non-structural information. In the following, we show that different kinds of GNN models can be formulated as GSGC with different graph signal filters $f(S)$.

Graph signal filtering: In (4), the new graph signals are produced according to $f(S)X$. $f(S)$ is a linear, shift-invariant graph filter, constructed as a polynomial function $f$ of the graph-shift filter $S$.

Feature convolution: In (4), the output of the graph signal filtering is passed into a convolution filter with an activation function. $\sigma$ is the activation function, and the parameter matrix $\Theta$ is a convolution filter which accepts a graph signal with $l$ input channels and produces an output signal with $d$ channels.
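The two stages of (4) are easy to express directly. Below is a minimal NumPy sketch (ours; the function names are hypothetical), assuming the polynomial coefficients $a_k$ and the graph-shift filter $S$ are given:

    import numpy as np

    def polynomial_filter(S: np.ndarray, coeffs) -> np.ndarray:
        """Graph filter f(S) = sum_k a_k S^k (linear, shift-invariant)."""
        H, S_pow = np.zeros_like(S), np.eye(S.shape[0])
        for a_k in coeffs:
            H = H + a_k * S_pow
            S_pow = S_pow @ S
        return H

    def gsgc_forward(S, X, Theta, coeffs, act=np.tanh):
        """Eq. (4): graph signal filtering f(S) X, then feature convolution
        with parameter Theta and activation sigma."""
        filtered = polynomial_filter(S, coeffs) @ X   # n x l filtered signals
        return act(filtered @ Theta)                  # n x d output signals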

4.1 Graph Convolution as GSGC

In Graph Signal Processing, graph convolution is a special case of signal filtering, because filtering a signal is equivalent to multiplying the spectrum of the signal by the frequency response of the filter sandryhaila2013discrete . In this sense, we can detach the parameter $\Theta$ from the frequency response $f(S)$ and formulate different graph convolutional models with different graph filters $f(S)$. For example, it is straightforward to represent the single-layer GCN/SGC and ChebyNet as follows:

Lemma 1.

The single-layer GCN/SGC with activation function $\sigma$ and weight matrix $W$ is equivalent to filtering the graph signal $X$ with the graph-shift filter $S = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$: $Z = \sigma(S X W)$.

Lemma 2.

The $K$-localized single-layer ChebyNet with activation function $\sigma$ and weight matrix $W$ is equivalent to filtering the graph signal $X$ with a polynomial filter $f(S) = \sum_{k=0}^{K} \theta_k T_k(S)$ with graph-shift filter $S = \frac{2}{\lambda_{max}} L - I_n$, where $T_k$ represents the Chebyshev polynomial of order $k$: $Z = \sigma(f(S) X W)$.

Proof.

Please refer to the Appendix. ∎

According to Lemmas 1 and 2, we observe that, as in graph signal processing, the graph-shift filter plays an important role in constructing graph convolution models. From this point of view, we can build the connection between the $K$-layer GCN/SGC and the $K$-localized single-layer ChebyNet.

Theorem 3.

The $K$-layer SGC is equivalent to the $K$-localized single-layer ChebyNet with a $K$-order polynomial of the graph-shift filter $S = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, i.e., $f(S) = S^K$.

Proof.

Please refer to the Appendix. ∎

Theorem 3 clearly explains why the multi-layer SGC can preserve the same higher-order proximity of graphs as the multi-layer GCN described in huang2018adaptive , since the graph filter of the multi-layer SGC is a $K$-localized graph filter. Although non-linearity prevents an explicit expression for the graph-shift filter of the $K$-layer GCN, the spectral analysis in sgc_icml19 demonstrates that GCN and SGC share similar graph filtering behavior. Furthermore, our general attack model for the multi-layer SGC also shows excellent performance on the multi-layer GCN in practice. Thus, we place the $K$-layer GCN in the same framework as Theorem 3, but with non-linear activation functions.
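Theorem 3 can also be checked numerically: composing $K$ linear SGC layers gives exactly the $K$-order filter $S^K$ with the reparameterized weight $W = W^{(1)} \cdots W^{(K)}$. A small self-contained sketch (ours) on a random graph:

    import numpy as np

    rng = np.random.default_rng(0)
    n, l, d = 8, 5, 3
    A = (rng.random((n, n)) + rng.random((n, n)).T > 1.0).astype(float)
    np.fill_diagonal(A, 0.0)                      # random symmetric 0/1 graph
    X = rng.standard_normal((n, l))

    A_t = A + np.eye(n)
    d_is = 1.0 / np.sqrt(A_t.sum(1))
    S = d_is[:, None] * A_t * d_is[None, :]       # renormalized adjacency

    W1, W2 = rng.standard_normal((l, d)), rng.standard_normal((d, d))
    layered = S @ (S @ X @ W1) @ W2               # two linear SGC layers
    collapsed = S @ S @ X @ (W1 @ W2)             # K-order filter, merged W
    assert np.allclose(layered, collapsed)        # Theorem 3 for K = 2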

4.2 LINE & DeepWalk as GSGC

In this section, we show the theoretical connection between sampling-based GNNs and convolution-based GNNs. In a sampling-based GNN, the embedding matrix is obtained by generating a training corpus for the skip-gram model from the adjacency matrix or from a set of random walks. yang2015Comprehend ; WSDM2018NetworkEmbedding show that Point-wise Mutual Information (PMI) matrices are implicitly factorized by sampling-based embedding approaches, which indicates that sampling-based models can be rewritten in a matrix factorization form. Inspired by this insight, we prove that DeepWalk can be viewed in a convolutional manner as well:

Theorem 4.

DeepWalk is a special case of General Spectral Graph Convolution, with the polynomial filter $f(S) = \frac{1}{K} \sum_{k=1}^{K} S^k$. $f(S)$ is constructed from the graph-shift filter $S = I_n - L_{rw} = D^{-1}A$, where $L_{rw}$ is the random-walk normalized Laplacian.

Proof.

Please refer to the Appendix. ∎

Note that DeepWalk is formulated from the optimum of the unsupervised NEG loss of the skip-gram model. Thus, the parameters and the value of the NEG loss in Theorem 4 are fixed at the optimal point of the model for the given graph signals.

Corollary 1.

The output of the $K$-window DeepWalk with $b$ negative samples is equivalent to filtering the set of graph signals $X = D^{-1}$ with the given parameters $\Theta = \frac{vol(G)}{b}$. Equation (4) can be rewritten as:

$Z = \log\!\left( \frac{vol(G)}{b}\, f(S)\, D^{-1} \right), \qquad f(S) = \frac{1}{K} \sum_{k=1}^{K} S^k .$

Since LINE is the special case of DeepWalk with window size $K = 1$ WSDM2018NetworkEmbedding , it is straightforward to rewrite LINE in a convolutional style as:

$Z = \log\!\left( \frac{vol(G)}{b}\, S\, D^{-1} \right).$

From Sections 4.1 and 4.2, we can theoretically conclude that the transferability of an adversarial attack between convolution-based and sampling-based GNNs arises naturally from their underlying graph filtering connection. This connection inspires the general attack model on GSGC proposed in the following section.
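To make the filtering view of DeepWalk tangible, the sketch below (ours, following the NetMF form restated in Lemma 7 of the Appendix; we assume the element-wise logarithm is clipped at zero entries) builds the matrix that the $K$-window DeepWalk implicitly factorizes, exposing the polynomial filter $\frac{1}{K}\sum_k S^k$, the input signal $D^{-1}$ and the fixed parameter $vol(G)/b$:

    import numpy as np

    def deepwalk_filter_matrix(A: np.ndarray, K: int, b: int) -> np.ndarray:
        """Matrix implicitly factorized by K-window DeepWalk (NetMF form)."""
        d = A.sum(axis=1)
        vol = d.sum()                          # vol(G): sum of degrees
        P = A / d[:, None]                     # S = D^{-1} A, random-walk shift
        poly, P_pow = np.zeros_like(A), np.eye(A.shape[0])
        for _ in range(K):                     # f(S) = (1/K) sum_{k=1..K} S^k
            P_pow = P_pow @ P
            poly += P_pow / K
        M = (vol / b) * poly / d[None, :]      # apply signal D^{-1}, Theta = vol/b
        return np.log(np.maximum(M, 1e-12))    # log activation, clipped at zeros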

5 The Attack Model on GSGC

In Section 4, we built GSGC as a unified framework for understanding different GNN models. Next, we introduce how to attack these models through GSGC. Concretely, given a fixed budget $\beta$ indicating that the attacker is only allowed to modify $2\beta$ entries in the symmetric $A$ (the graph is undirected), the graph adversarial attack asks to solve the following problem, modifying the graph from $G = (A, X)$ to $G' = (A', X)$:

$\arg\max_{A'} \; \mathcal{L}\big(f_{\Theta^*}(A', X),\, Z^*\big) \quad \text{s.t.} \quad \Theta^*, Z^* = \arg\min_{\Theta} \mathcal{L}\big(f_{\Theta}(A, X)\big), \;\; \|A' - A\|_0 \le 2\beta,$

where $Z^*$ denotes the embedding output of the model and $\mathcal{L}$ is the loss function the model minimizes. This is a bi-level optimization problem. We ease it by investigating the evasion attack scenario, where $\Theta^*$ and $Z^*$ are learned on the clean graph and remain unchanged during the attack. The poisoning attack is also analyzed through experiments.

Since the model is trained before the graph adversarial attack, the optimal parameters $\Theta^*$ and the optimal embedding output $Z^*$ are fixed in the problem above, which can therefore be rewritten as:

(5)  $\arg\max_{A'} \; \mathcal{L}\big(f_{\Theta^*}(A', X),\, Z^*\big) \quad \text{s.t.} \quad \|A' - A\|_0 \le 2\beta .$

Typically, the attack model differs across GNN models. In this section, we utilize the theoretical connection between models established by GSGC and propose a uniform attacker, a single model capable of attacking various kinds of GNNs.

By detaching the parameters of GSGC, the restricted black-box attack on GSGC can be performed by directly attacking the graph filter combined with the signal, i.e., $f(S)X$. Evaluating the quality of the output embedding via low-rank approximation WSDM2018NetworkEmbedding , we establish the general optimization problem as a $T$-rank approximation problem:

(6)  $\arg\max_{A'} \; \mathcal{L} = \big\| f(S')X - \big[f(S')X\big]_T \big\|_F^2 \quad \text{s.t.} \quad \|A' - A\|_0 \le 2\beta,$

where $f(S')$ is the polynomial graph filter and $S'$ is the graph-shift filter constructed from the perturbed adjacency matrix $A'$. $[f(S')X]_T$ is the $T$-rank approximation of $f(S')X$. According to low-rank approximation theory, $\mathcal{L}$ can be rewritten and bounded as:

(7)  $\mathcal{L} = \Big\| \sum_{i=T+1}^{n} \lambda_i'\, u_i u_i^\top X \Big\|_F^2 \;\le\; \sum_{i=T+1}^{n} \lambda_i'^{\,2} \cdot \|X\|_F^2,$

where $n$ is the number of vertices, $U \Lambda U^\top$ is the eigen-decomposition of the graph filter $f(S)$ ($f(S)$ is a symmetric matrix), and $\lambda_i$, $u_i$ are the eigenvalues and eigenvectors of $f(S)$, respectively, in decreasing order of magnitude. $\lambda_i'$ is the corresponding eigenvalue after perturbation. While $\mathcal{L}$ is hard to optimize directly, from (7) we can compute its upper bound instead. Accordingly, the goal of the adversarial attack becomes maximizing this upper bound of the loss. Thus the overall adversarial attack model for GSGC is:

(8)  $\arg\max_{A'} \; \sum_{i=T+1}^{n} \lambda_i'^{\,2} \cdot \|X\|_F^2 \quad \text{s.t.} \quad \|A' - A\|_0 \le 2\beta .$

This adversarial attack model is a general attacker: any GNN model that can be formulated as a variant of GSGC can in principle be attacked via (8). Furthermore, the eigen-decomposition of $f(S)$ can easily be computed from the eigen-decomposition of $S$ due to the linearity and shift invariance of the polynomial graph filter, namely $f(S) = U f(\Lambda) U^\top$. In this sense, our attacker is also suitable for attacking complex multi-layer GNN models.
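For illustration, here is a minimal sketch (ours) of the objective in (8) for an SGC-style filter $f(S) = S^K$: it scores a candidate perturbed adjacency matrix by the sum of the squared filtered eigenvalues outside the top $T$. The spectrum of $S'$ is recomputed exactly here for clarity, whereas the algorithm below estimates it cheaply via perturbation theory (Theorem 6):

    import numpy as np

    def attack_score(A_pert: np.ndarray, X: np.ndarray, T: int, K: int) -> float:
        """Upper bound of the T-rank approximation loss for filter S'^K."""
        A_t = A_pert + np.eye(A_pert.shape[0])
        d_inv_sqrt = 1.0 / np.sqrt(A_t.sum(1))
        S = d_inv_sqrt[:, None] * A_t * d_inv_sqrt[None, :]   # perturbed shift
        lam = np.linalg.eigvalsh(S)                           # eigenvalues of S'
        filtered = np.sort(np.abs(lam) ** K)                  # |lambda'|^K, ascending
        residual = filtered[: len(lam) - T]                   # outside the top T
        return float(np.sum(residual ** 2)) * np.linalg.norm(X) ** 2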

More importantly, according to the constructed connections between different graph-shift filters, we can build a uniform model to attack different kinds of GNNs. In the following, we choose two GNN models, the multi-layer SGC and DeepWalk, as examples to illustrate the power of our general attack model.

Multi-layer SGC. As stated in Theorem 3, the graph-shift filter of SGC is defined as $S = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, the normalized adjacency matrix. Thus, for the $K$-layer SGC, we can decompose the graph filter as $f(S) = S^K = U \Lambda^K U^\top$. The corresponding adversarial attack loss for the $K$-order SGC can be rewritten as:

(9)  $\arg\max_{A'} \; \sum_{i=T+1}^{n} \lambda_i'^{\,2K} \cdot \|X\|_F^2,$

where $\lambda_i'$ refers to the $i$-th largest (in magnitude) eigenvalue of $S'$ after perturbation.

DeepWalk. As stated in Theorem 4, the graph-shift filter of DeepWalk is defined as $S = D^{-1}A$. Therefore, the graph filter of the $K$-window DeepWalk can be decomposed as $f(S) = \frac{1}{K} \sum_{k=1}^{K} S^k$.

In order to uniformly establish the adversarial attack loss for SGC and DeepWalk, the following Lemma 5 provides bounds for the eigenvalues of the DeepWalk filter w.r.t. the eigenvalues of the normalized adjacency matrix:

Lemma 5.

WSDM2018NetworkEmbedding Let $\lambda_1 \ge \cdots \ge \lambda_n$ be the eigenvalues of the normalized adjacency matrix $D^{-1/2} A D^{-1/2}$ and let $\mathcal{H}$ be the graph filter of DeepWalk. Then the decreasing-order eigenvalues of $\mathcal{H}$ are bounded in magnitude by $\frac{1}{d_{min}} \big| \frac{1}{K} \sum_{k=1}^{K} \lambda_{\pi(i)}^{k} \big|$, where $\pi$ is a permutation of $\{1, \dots, n\}$ ensuring that the transformed eigenvalues are in non-increasing order and $d_{min}$ is the smallest degree in $G$; the smallest eigenvalue of $\mathcal{H}$ is bounded analogously by the smallest transformed eigenvalue.

For the proof of Lemma 5, please refer to WSDM2018NetworkEmbedding . From Lemma 5, both the magnitudes of the eigenvalues and the smallest eigenvalue of the DeepWalk filter are always bounded by the corresponding transformed eigenvalues of the normalized adjacency matrix. Thus, the spectrum of the perturbed DeepWalk filter can also be well approximated using the eigenvalues of the perturbed normalized adjacency matrix.

Thus, the corresponding adversarial attack loss of the $K$-order DeepWalk can be rewritten as:

(10)  $\arg\max_{A'} \; \sum_{i=T+1}^{n} \Big( \frac{1}{K} \sum_{k=1}^{K} \lambda_i'^{\,k} \Big)^{2} \cdot \|X\|_F^2 .$

To maximize the two losses, (9) and (10), we need to estimate the eigenvalues $\lambda_i'$ after the perturbation. To this end, we derive an explicit formulation of the perturbed $\lambda_i'$ induced by a change $\Delta A$ of the adjacency matrix, following the approach in zhu2018high :

Theorem 6.

Let $A' = A + \Delta A$ be a perturbed version of $A$ obtained by adding/removing edges, and let $\Delta D$ be the respective change in the degree matrix. Let $\lambda_i$ and $u_i$ be an eigen-pair of eigenvalue and eigenvector solving the generalized eigen-problem $A u = \lambda D u$ with $u^\top D u = 1$. Then the perturbed generalized eigenvalue $\lambda_i'$ is approximately:

$\lambda_i' \approx \lambda_i + u_i^\top \Delta A\, u_i - \lambda_i\, u_i^\top \Delta D\, u_i .$

Proof.

Please refer to the Appendix. ∎
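Theorem 6 yields a cheap vectorized update. The sketch below (ours) estimates all generalized eigenvalues after flipping a single undirected edge $(i, j)$, assuming the eigenvectors are $D$-normalized so that $u^\top D u = 1$:

    import numpy as np

    def perturbed_eigenvalues(A, lam, U, i, j):
        """First-order estimate of all generalized eigenvalues of A u = lam D u
        after flipping the undirected edge (i, j).
        lam: (n,) eigenvalues; U: (n, n) D-normalized eigenvectors as columns."""
        w = 1.0 - 2.0 * A[i, j]            # +1 to add the edge, -1 to remove it
        # dA has entry w at (i, j) and (j, i); dD adds w to degrees i and j, so
        # u^T dA u = 2 w u_i u_j and u^T dD u = w (u_i^2 + u_j^2) per eigenvector.
        return lam + 2.0 * w * U[i] * U[j] - lam * w * (U[i] ** 2 + U[j] ** 2)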

Now the general attack loss is established. For a given target vertex, we perform the targeted attack by sequentially calculating the corresponding loss w.r.t. the graph-shift filter obtained by adding/deleting an edge between the target and every other node in the graph, and we select the action with the maximum loss; a sketch of this loop follows below. Further, in practice we adopt the hierarchical strategy in ICML2018Adversarial to decompose the single edge selection into selecting the two ends of this edge.
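Putting the pieces together, the following sketch (ours; the parameter defaults are illustrative, not the paper's settings) implements the greedy single-edge selection for a target vertex $v$: it solves the generalized eigen-problem once on the clean graph, applies the Theorem 6 update inline for every candidate flip, and keeps the flip maximizing the (9)-style bound (the constant $\|X\|$ factor is dropped since it does not affect the ranking):

    import numpy as np
    import scipy.linalg

    def select_edge_flip(A, v, T=128, K=2):
        """Greedy RBA edge selection for target vertex v under a one-edge budget."""
        n = len(A)
        # Spectrum of S = D~^{-1/2} A~ D~^{-1/2} via (A+I) u = lam (D+I) u;
        # scipy normalizes eigenvectors so that U.T @ (D+I) @ U = I.
        lam, U = scipy.linalg.eigh(A + np.eye(n), np.diag(A.sum(1) + 1.0))
        best_cand, best_score = None, -np.inf
        for cand in range(n):
            if cand == v:
                continue
            w = 1.0 - 2.0 * A[v, cand]        # +1 adds, -1 removes edge (v, cand)
            # Theorem 6: lam' ~= lam + u^T dA u - lam u^T dD u, vectorized
            lam_new = lam + 2.0 * w * U[v] * U[cand] \
                      - lam * w * (U[v] ** 2 + U[cand] ** 2)
            filt = np.sort(np.abs(lam_new) ** K)     # |lambda'|^K, ascending
            score = np.sum(filt[: n - T] ** 2)       # Eq. (9)-style upper bound
            if score > best_score:
                best_cand, best_score = cand, score
        return best_cand, best_score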

6 Experiments

Datasets. We evaluate our approach on three real-world datasets: Cora Dataset2000Cora , Citeseer and Pubmed Dataset2008Citeseer . In all three citation networks, vertices are documents with corresponding bag-of-words features and edges are citation links. The data preprocessing closely follows the benchmark setup in ICLR2017SemiGCN . A statistical overview of the datasets is given in the Appendix. Following the setting in KDD2018Adversarial , we split each network into labeled (20%) and unlabeled (80%) vertices, and further split the labeled vertices into equal parts for training and validation. Note that the labels and the classifier are invisible to the attacker under the RBA setting. The attack performance is evaluated by the decrease in classification accuracy, following ICML2018Adversarial .

Baselines. In the current literature, many models utilize the information of the classifier and perform white-box attacks, such as NETTACK KDD2018Adversarial , GradArgmax and GeneticAlg ICML2018Adversarial . Since our model targets the restricted black-box attack, it would be unfair to compare against white-box attack models. Hence, we compare the proposed method with the following three baselines:


  • Random ICML2018Adversarial : the simplest method, which for each perturbation randomly chooses to insert or remove an edge in graph $G$. We report averages over five different seeds to alleviate the influence of randomness.

  • Degree CIKM2012Gelling : a degree-based method that inserts or removes an edge based on degree centrality, i.e., the sum of the endpoint degrees in the original graph $G$.

  • RL-S2V ICML2018Adversarial : a reinforcement-learning-based attack method, which learns a generalizable attack policy for GCN under the RBA scenario.

Target Models. To validate the generalization ability of our proposed attacker, we choose four popular GNN models for evaluation: GCN ICLR2017SemiGCN , SGC sgc_icml19 , DeepWalk perozzi2014deepwalk and LINE WWW2015Line . The first two are convolution-based and the others are sampling-based. For DeepWalk, the hyperparameters are set to commonly used values: window size K = 5, negative-sample number b = 5, and the top-128 largest singular values/vectors. A logistic regression classifier is connected to the output embeddings of the sampling-based methods for classification. Unless otherwise stated, all convolution-based models contain two layers.

Attack Configuration. A small budget $\beta$ is applied to regulate all attackers. To make the attack task more challenging, $\beta$ is set to 1: the attacker is limited to adding or deleting a single edge for a given target vertex $v$. For our method, the only parameter of the general attack model is the rank $T$, which determines how many of the smallest eigenvalues enter the loss. Unless otherwise indicated, the order of GSGC in the attack model is kept fixed.

Method        Cora              Citeseer          Pubmed
              GCN      SGC      GCN      SGC      GCN      SGC
(unattacked)  80.20%   78.82%   72.50%   69.68%   80.40%   80.21%
Random        78.30%   77.60%   69.64%   68.21%   78.65%   78.44%
Degree        77.99%   74.40%   67.82%   63.49%   76.56%   75.77%
RL-S2V        75.00%   73.20%   66.00%   65.60%   74.00%   74.10%
GSGC          72.60%   69.09%   64.72%   63.49%   72.44%   73.01%
Table 2: Summary of single-edge perturbation on convolution-based models under the RBA setting (classification accuracy after attack; lower means a stronger attack).
Method        Cora                 Citeseer             Pubmed
              DeepWalk  LINE       DeepWalk  LINE       DeepWalk  LINE
(unattacked)  77.23%    76.75%     69.68%    65.15%     78.69%    72.12%
Random        75.47%    74.91%     63.06%    63.33%     77.44%    72.11%
Degree        74.15%    64.35%     60.01%    45.60%     76.26%    59.07%
RL-S2V        71.94%    66.37%     57.55%    45.05%     72.59%    58.91%
GSGC          71.92%    63.48%     58.18%    43.04%     71.26%    57.96%
Table 3: Summary of single-edge perturbation on sampling-based models under the RBA setting (classification accuracy after attack; lower means a stronger attack).

6.1 Attack Performance Evaluation

In this section, we evaluate the overall attack performance of the different attackers.

Attack on Convolution-based Models. Table 2 summarizes the attack results of the different attackers on convolution-based models. Our GSGC attacker outperforms the other attackers on all datasets and models. Moreover, GSGC performs quite well even on the 2-layer GCN with nonlinear activation, which implies the generalization ability of our attacker across convolution-based models.

Attack on Sampling-based Models. Table 3 summarizes the attack results of the different attackers on sampling-based models. As expected, our attacker achieves the best performance in almost all cases, validating the effectiveness of our method on sampling-based models. Another interesting observation is that the attack performance on LINE is much better than on DeepWalk. This may be due to the deterministic structure of LINE, whereas the random sampling procedure in DeepWalk may raise its resistance to adversarial attacks.

6.2 Evaluation of Multi-layer Convolution-based Models

To further inspect the transferability of our attacker, we attack multi-layer convolution-based models while varying the order of the GSGC model. Figure 1 presents the attack results on GCN and SGC with different numbers of layers and different filter orders; the number after GSGC indicates the order of the graph-shift filter in Theorem 3. From Figure 1, we make three major observations. First, the transferability of our general model is demonstrated, since filters of all orders yield effective attacks. Second, GSGC-4 achieves almost the best attack performance, which implies that a higher-order filter captures higher-order information and has a positive effect even when attacking simpler models. Third, the attack performance on SGC is consistently better than on GCN under all settings. We conjecture that the non-linearity between GCN layers successively adds robustness to GCN.

Figure 1: Comparison between order of GSGC and number of layers in GCN/SGC on Citeseer.

6.3 Evaluation under Multi-edge and Poisoning Settings

The results of multiple-edge perturbations under the RBA setting are reported in Figure 2. Figure 2(a) shows the multi-edge evasion attack on the Cora dataset. Clearly, as the number of perturbed edges increases, the attack performance improves for every attacker, and GSGC outperforms the other methods at every number of edge perturbations.

(a) Evasion attack on Cora.
(b) Poisoning attack on Citeseer.
Figure 2: Multiple-edge perturbation under RBA setting.

Aside from the evasion attack, which is consistent with the RL-S2V setting, we also investigate the performance of our method in the poisoning setting with multiple edge perturbations on Citeseer. Since RL-S2V is an evasion method, we only compare with the Random baseline. As shown in Figure 2(b), even under the more difficult poisoning attack, our method still performs very well, with nearly half of the accuracy dropped.

7 Conclusion

In this paper, we consider adversarial attacks on different kinds of graph neural networks under the restricted black-box attack scenario. From the graph signal processing point of view, we first investigate the theoretical connections between two families of GNN models and propose the General Spectral Graph Convolution model. Thereby, a general optimization problem is constructed, considering both evasion and poisoning attacks, and an effective algorithm is derived to solve it. Experiments show the vulnerability of different kinds of GNNs to our attack model.

References

  • (1) F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The graph neural network model," IEEE Transactions on Neural Networks 20, no. 1 (2009) 61–80.
  • (2) D. K. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, R. Gómez-Bombarelli, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, "Convolutional networks on graphs for learning molecular fingerprints," Neural Information Processing Systems (2015) 2224–2232.
  • (3) W. L. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," Neural Information Processing Systems (2017) 1024–1034.
  • (4) A. Paranjape, A. R. Benson, and J. Leskovec, "Motifs in temporal networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 601–610, 2017.
  • (5) N. Akhtar and A. S. Mian, "Threat of adversarial attacks on deep learning in computer vision: A survey," IEEE Access 6 (2018) 14410–14430.
  • (6) H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song, "Adversarial attack on graph structured data," International Conference on Machine Learning (2018) 1115–1124.
  • (7) D. Zügner, A. Akbarnejad, and S. Günnemann, "Adversarial attacks on neural networks for graph data," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2847–2856, 2018.
  • (8) L. Sun, J. Wang, P. S. Yu, and B. Li, "Adversarial attack and defense on graph data: A survey," arXiv preprint arXiv:1812.10528 (2018).
  • (9) T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," International Conference on Learning Representations (2017).
  • (10) B. Perozzi, R. Al-Rfou, and S. Skiena, "DeepWalk: Online learning of social representations," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710, ACM, 2014.
  • (11) D. Zügner and S. Günnemann, "Adversarial attacks on graph neural networks via meta learning," International Conference on Learning Representations (2019).
  • (12) A. Bojchevski and S. Günnemann, "Adversarial attacks on node embeddings," arXiv preprint (2018).
  • (13) F. Wu, T. Zhang, A. H. Souza Jr., C. Fifty, T. Yu, and K. Q. Weinberger, "Simplifying graph convolutional networks," in Proceedings of the 36th International Conference on Machine Learning (ICML), 2019.
  • (14) M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," Neural Information Processing Systems (2016) 3844–3852.
  • (15) J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, "LINE: Large-scale information network embedding," in Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077, 2015.
  • (16) K. Xu, W. Hu, J. Leskovec, and S. Jegelka, "How powerful are graph neural networks?," arXiv preprint arXiv:1810.00826 (2018).
  • (17) J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang, "Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 459–467, 2018.
  • (18) H. Tong, B. A. Prakash, T. Eliassi-Rad, M. Faloutsos, and C. Faloutsos, "Gelling, and melting, large graphs by edge manipulation," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 245–254, 2012.
  • (19) D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine 30, no. 3 (2013) 83–98.
  • (20) A. Ortega, P. Frossard, J. Kovačević, J. M. Moura, and P. Vandergheynst, "Graph signal processing: Overview, challenges, and applications," Proceedings of the IEEE 106, no. 5 (2018) 808–828.
  • (21) A. Sandryhaila and J. M. Moura, "Discrete signal processing on graphs," IEEE Transactions on Signal Processing 61, no. 7 (2013) 1644–1656.
  • (22) W. Huang, T. Zhang, Y. Rong, and J. Huang, "Adaptive sampling towards fast graph representation learning," in Advances in Neural Information Processing Systems, pp. 4563–4572, 2018.
  • (23) C. Yang and Z. Liu, "Comprehend DeepWalk as matrix factorization," arXiv preprint arXiv:1501.00358 (2015).
  • (24) D. Zhu, P. Cui, Z. Zhang, J. Pei, and W. Zhu, "High-order proximity preserved embedding for dynamic networks," IEEE Transactions on Knowledge and Data Engineering 30 (2018) 2134–2144.
  • (25) A. K. McCallum, K. Nigam, J. Rennie, and K. Seymore, "Automating the construction of internet portals with machine learning," Information Retrieval 3, no. 2 (2000) 127–163.
  • (26) P. Sen, G. M. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad, "Collective classification in network data," AI Magazine 29, no. 3 (2008) 93–106.
  • (27) G. H. Golub and C. F. Van Loan, "Matrix Computations (3rd ed.)," Johns Hopkins University Press, 1996.

8 Appendix

A Overview of GSGC

Figure 3: An overview.

Figure 3 provides an overview of our proposed GSGC model.

B Dataset Statistics

Dataset    N (vertices)   E (edges)   Classes   Features
Cora       2,485          5,069       7         1,433
Citeseer   2,110          3,757       6         3,703
Pubmed     19,717         44,325      3         500
Table 4: Dataset statistics. Only the largest connected component (LCC) is considered.

Table 4 summarizes the characteristics of the datasets used in this paper.

C Proofs and derivations

Proof of Lemma 1 & 2.

Lemma 1 follows directly from the expression of the single-layer GCN, $Z = \sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X W)$. For Lemma 2, the $K$-localized single-layer ChebyNet with activation function $\sigma$ is $Z = \sigma\big(\sum_{k=0}^{K} \theta_k T_k(\tilde{L})\, X W\big)$. Thus, we can directly read off the graph-shift filter $S = \tilde{L}$ and the linear, shift-invariant filter $f(S) = \sum_{k=0}^{K} \theta_k T_k(S)$. ∎

Proof of Theorem 3.

We can write the $K$-layer SGC as $Z = \sigma\big(S \cdots S\, X\, W^{(1)} \cdots W^{(K)}\big)$. Since the $W^{(m)}$ are parameters learned by the neural network, we can employ the reparameterization trick and absorb their product into a single new parameter $W = W^{(1)} \cdots W^{(K)}$. Then we can rewrite the $K$-layer SGC by polynomial expansion as $Z = \sigma(S^K X W)$. Therefore, the $K$-layer SGC has graph filter $f(S) = S^K$, a $K$-order polynomial with the same linear, shift-invariant form as the $K$-localized single-layer ChebyNet. ∎

Lemma 7.

WSDM2018NetworkEmbedding Given a context window size $K$ and a number of negative samples $b$ in skip-gram, the result of DeepWalk in matrix form is equivalent to factorizing the matrix:

(11)  $M = \log\!\left( \frac{vol(G)}{b} \Big( \frac{1}{K} \sum_{k=1}^{K} (D^{-1}A)^k \Big) D^{-1} \right),$

where $vol(G)$ denotes the volume of graph $G$, and $K$ and $b$ are the context window size and the number of negative samples in skip-gram, respectively.

Proof of Theorem 4.

With Lemma 7, we can explicitly write DeepWalk as $Z = \log\big(\frac{vol(G)}{b}\, f(S)\, D^{-1}\big)$, where $f(S) = \frac{1}{K} \sum_{k=1}^{K} S^k$ and $S = D^{-1}A$. Therefore, DeepWalk has graph-shift filter $S = D^{-1}A$ and linear, shift-invariant filter $f(S) = \frac{1}{K} \sum_{k=1}^{K} S^k$, with the given input signal $D^{-1}$ and fixed parameters $\frac{vol(G)}{b}$. ∎

Proof of Theorem 6.

Since $\lambda$ is an eigenvalue of the normalized adjacency matrix $D^{-1/2} A D^{-1/2}$ with eigenvector $v = D^{1/2} u$ if and only if $\lambda$ and $u$ solve the generalized eigen-problem $A u = \lambda D u$, we can transfer the original problem of estimating the eigenvalues of the normalized adjacency matrix into the above generalized eigen-problem.

We denote $\Delta\lambda$ and $\Delta u$ as the change in the eigenvalue and the eigenvector, respectively. Thus, for a specific eigen-pair we have:

$(A + \Delta A)(u + \Delta u) = (\lambda + \Delta\lambda)(D + \Delta D)(u + \Delta u).$

Expanding and using the fact that $A u = \lambda D u$, we have:

$A \Delta u + \Delta A\, u + \Delta A\, \Delta u = \lambda D \Delta u + \lambda \Delta D\, u + \Delta\lambda\, D u + \text{(higher-order terms)}.$

According to golub1996matrix , the higher-order terms can be removed since they have a limited effect on the solution. Then we have:

$A \Delta u + \Delta A\, u = \lambda D \Delta u + \lambda \Delta D\, u + \Delta\lambda\, D u .$

Left-multiplying by $u^\top$, utilizing the symmetry of $A$ and $D$ (so that $u^\top A = \lambda u^\top D$, which cancels the terms in $\Delta u$) and the normalization $u^\top D u = 1$, we have:

$u^\top \Delta A\, u = \lambda\, u^\top \Delta D\, u + \Delta\lambda .$

By solving this equation, we obtain the result:

$\Delta\lambda = u^\top \Delta A\, u - \lambda\, u^\top \Delta D\, u, \qquad \lambda' \approx \lambda + u^\top \Delta A\, u - \lambda\, u^\top \Delta D\, u . \;\; ∎$