Graph Structure Learning for Robust Graph Neural Networks

05/20/2020 ∙ by Wei Jin, et al. ∙ Penn State University, Microsoft, Michigan State University

Graph Neural Networks (GNNs) are powerful tools in representation learning for graphs. However, recent studies show that GNNs are vulnerable to carefully-crafted perturbations, called adversarial attacks. Adversarial attacks can easily fool GNNs into making wrong predictions for downstream tasks. This vulnerability has raised increasing concerns about applying GNNs in safety-critical applications. Therefore, developing robust algorithms to defend against adversarial attacks is of great significance. A natural idea for defending against adversarial attacks is to clean the perturbed graph. It is evident that real-world graphs share some intrinsic properties. For example, many real-world graphs are low-rank and sparse, and the features of two adjacent nodes tend to be similar. In fact, we find that adversarial attacks are likely to violate these graph properties. Therefore, in this paper, we explore these properties to defend against adversarial attacks on graphs. In particular, we propose a general framework, Pro-GNN, which can jointly learn a structural graph and a robust graph neural network model from the perturbed graph guided by these properties. Extensive experiments on real-world graphs demonstrate that the proposed framework achieves significantly better performance than state-of-the-art defense methods, even when the graph is heavily perturbed. We release the implementation of Pro-GNN in our DeepRobust repository for adversarial attacks and defenses (https://github.com/DSE-MSU/DeepRobust). The specific experimental settings to reproduce our results can be found at https://github.com/ChandlerBang/Pro-GNN.


1. Introduction

Graphs are ubiquitous data structures in numerous domains, such as chemistry (molecules), finance (trading networks) and social media (the Facebook friend network). Given their prevalence, it is particularly important to learn effective representations of graphs and apply them to downstream tasks. Recent years have witnessed great success of Graph Neural Networks (GNNs) (Li et al., 2015; Hamilton et al., 2017; Kipf and Welling, 2016a; Veličković et al., 2018) in representation learning on graphs. GNNs follow a message-passing scheme (Gilmer et al., 2017), in which a node's embedding is obtained by aggregating and transforming the embeddings of its neighbors. Owing to their strong performance, GNNs have been applied to various analytical tasks including node classification (Kipf and Welling, 2016a), link prediction (Kipf and Welling, 2016b), and recommender systems (Ying et al., 2018).

Although promising results have been achieved, recent studies have shown that GNNs are vulnerable to adversarial attacks (Jin et al., 2020; Zügner et al., 2018; Zügner and Günnemann, 2019a; Dai et al., 2018; Wu et al., 2019b). In other words, the performance of GNNs can degrade greatly under unnoticeable perturbations of the graph. This lack of robustness can lead to severe consequences in critical applications pertaining to safety and privacy. For example, in credit card fraud detection, fraudsters can create several transactions with only a few high-credit users to disguise themselves, thus evading GNN-based detection. Hence, developing robust GNN models that resist adversarial attacks is of significant importance. Modifying graph data can perturb either node features or graph structure. However, given the complexity of structural information, the majority of existing adversarial attacks on graph data have focused on modifying the graph structure, especially adding, deleting or rewiring edges (Xu et al., 2019). Thus, in this work, we aim to defend against the most common setting of adversarial attacks on graph data, i.e., poisoning adversarial attacks on graph structure. Under this setting, the graph structure has already been perturbed by modifying edges before the GNN is trained, while node features are unchanged.

Figure 1. An illustrative example of the property changes of the adjacency matrix caused by adversarial attacks: (a) singular values; (b) rank growth; (c) rank decrease rate; (d) feature smoothness.

One perspective for designing an effective defense algorithm is to clean the perturbed graph, e.g., by removing the adversarial edges and restoring the deleted edges (Zhu et al., 2019; Tang et al., 2019). The key challenge from this perspective is deciding which criteria to follow when cleaning the perturbed graph. It is well known that real-world graphs often share certain properties. First, many real-world clean graphs are low-rank and sparse (Zhou et al., 2013). For instance, in a social network, most individuals are connected with only a small number of neighbors, and only a few factors influence the connections among users (Zhou et al., 2013; Fortunato, 2010). Second, connected nodes in a clean graph are likely to share similar features or attributes (feature smoothness) (McPherson et al., 2001). For example, in a citation network, two connected publications often share similar topics (Kipf and Welling, 2016a). Figure 1 demonstrates these properties on clean and poisoned graphs. Specifically, we apply the state-of-the-art graph poisoning attack, metattack (Zügner and Günnemann, 2019a), to perturb the graph data and visualize the graph properties before and after metattack. As shown in Figure 1(a), metattack enlarges the singular values of the adjacency matrix, and Figure 1(b) illustrates that metattack quickly increases the rank of the adjacency matrix. Moreover, when we remove adversarial edges and normal edges from the perturbed graph respectively, we observe that removing adversarial edges reduces the rank faster than removing normal edges, as demonstrated in Figure 1(c). In addition, we depict the density distribution of the feature difference between connected nodes of the attacked graph in Figure 1(d). We observe that metattack tends to connect nodes with large feature differences. These observations indicate that adversarial attacks could violate these properties. Thus, these properties have the potential to serve as guidance for cleaning the perturbed graph. However, work exploring these properties to build robust graph neural networks is rather limited.

In this paper, we explore the graph properties of sparsity, low rank and feature smoothness to design robust graph neural networks. Note that there could be more properties to explore, which we leave as future work. In essence, we face two challenges: (i) how to learn a clean graph structure from poisoned graph data guided by these properties; and (ii) how to jointly learn the parameters of a robust graph neural network and the clean structure. To address these two challenges, we propose a general framework, Property GNN (Pro-GNN), which simultaneously learns the clean graph structure from the perturbed graph and the GNN parameters to defend against adversarial attacks. Extensive experiments on a variety of real-world graphs demonstrate that our proposed model can effectively defend against different types of adversarial attacks and outperforms state-of-the-art defense methods.

The rest of the paper is organized as follows. In Section 2, we review related work. In Section 3, we introduce notations and formally define the problem. We explain our proposed framework in Section 4 and report our experimental results in Section 5. Finally, we conclude the work with future directions in the last section.

2. Related Work

In line with the focus of our work, we briefly describe related work on GNNs and on adversarial attacks and defenses for graph data.

2.1. Graph Neural Networks

Over the past few years, graph neural networks have achieved great success in solving machine learning problems on graph data. To learn effective representations of graph data, two main families of GNNs have been proposed, i.e., spectral methods and spatial methods. The first family learns node representations based on graph spectral theory (Kipf and Welling, 2016a; Bruna et al., 2013; Defferrard et al., 2016). Bruna et al. (Bruna et al., 2013) generalize the convolution operation from Euclidean data to non-Euclidean data by using the Fourier basis of a given graph. To simplify spectral GNNs, Defferrard et al. (Defferrard et al., 2016) propose ChebNet, which uses Chebyshev polynomials as the convolution filter. Kipf and Welling (Kipf and Welling, 2016a) propose GCN, which simplifies ChebNet by using its first-order approximation. Further, Simple Graph Convolution (SGC) (Wu et al., 2019a) reduces graph convolution to a linear model but still achieves competitive performance. The second family of models defines graph convolutions in the spatial domain as aggregating and transforming local information (Hamilton et al., 2017; Gilmer et al., 2017; Veličković et al., 2018). For instance, DCNN (Atwood and Towsley, 2016) treats graph convolutions as a diffusion process and assigns a certain transition probability to information transferred from one node to an adjacent node. Hamilton et al. (Hamilton et al., 2017) propose to learn aggregators by sampling and aggregating neighbor information. Veličković et al. (Veličković et al., 2018) propose the graph attention network (GAT), which learns different attention scores for neighbors when aggregating information. To further improve training efficiency, FastGCN (Chen et al., 2018) interprets graph convolutions as integral transforms of embedding functions under probability measures and performs importance sampling to sample a fixed number of nodes for each layer. For a thorough review, we refer the reader to recent surveys (Zhou et al., 2018; Wu et al., 2019c; Battaglia et al., 2018).

2.2. Adversarial Attacks and Defense for GNNs

Extensive studies have demonstrated that deep learning models are vulnerable to adversarial attacks. In other words, slight or unnoticeable perturbations to the input can fool a neural network into outputting a wrong prediction. GNNs also suffer from this problem (Jin et al., 2020; Zügner et al., 2018; Dai et al., 2018; Zügner and Günnemann, 2019a; Ma et al., 2019; Liu et al., 2019; Bojchevski and Günnemann, 2019; Wu et al., 2019b). Different from image data, the graph structure is discrete and the nodes are dependent on each other, which makes the problem far more challenging. Nettack (Zügner et al., 2018) generates unnoticeable perturbations by preserving the degree distribution and imposing constraints on feature co-occurrence. RL-S2V (Dai et al., 2018) employs reinforcement learning to generate adversarial attacks. However, both methods are designed for targeted attacks and can only degrade the performance of GNNs on target nodes. To perturb the graph globally, metattack (Zügner and Günnemann, 2019a) generates poisoning attacks based on meta-learning.

Although increasing efforts have been devoted to developing adversarial attacks on graph data, research on improving the robustness of GNNs has only recently started (Zhu et al., 2019; Wu et al., 2019b; Tang et al., 2019; Zügner and Günnemann, 2019b). One way to solve the problem is to learn a robust network by penalizing the attention scores of adversarial edges. RGCN (Zhu et al., 2019) models Gaussian distributions as hidden layers to absorb the effects of adversarial attacks in the variances. PA-GNN (Tang et al., 2019) leverages supervision knowledge from clean graphs and applies meta-optimization to learn attention scores for robust graph neural networks. However, it requires additional graph data from a similar domain. The other way is to preprocess the perturbed graphs to obtain clean graphs and train GNNs on the cleaned ones. Wu et al. (Wu et al., 2019b) found that attackers tend to connect nodes with different features, and they propose to remove the links between dissimilar nodes. Entezari et al. (Entezari et al., 2020) observed that nettack results in changes in the high-rank spectrum of the graph and propose to preprocess the graph with its low-rank approximations. However, due to the simplicity of two-stage preprocessing methods, they may fail to counteract complex global attacks.

Different from the aforementioned defense methods, we aim to explore important graph properties to recover the clean graph while simultaneously learning the GNN parameters, which enables the proposed model to extract the intrinsic structure from the perturbed graph under different attacks.

3. Problem Statement

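Before we present the problem statement, we first introduce some notations and basic concepts. The Frobenius norm of a matrix $\mathbf{A}$ is defined by $\|\mathbf{A}\|_F^2 = \sum_{ij} A_{ij}^2$. The $\ell_1$ norm of a matrix $\mathbf{A}$ is given by $\|\mathbf{A}\|_1 = \sum_{ij} |A_{ij}|$, and the nuclear norm of a matrix $\mathbf{A}$ is defined as $\|\mathbf{A}\|_* = \sum_{i=1}^{\mathrm{rank}(\mathbf{A})} \sigma_i$, where $\sigma_i$ is the $i$-th singular value of $\mathbf{A}$. $(\mathbf{A})_+$ denotes the element-wise positive part of matrix $\mathbf{A}$, where $((\mathbf{A})_+)_{ij} = \max\{A_{ij}, 0\}$, and $\mathrm{sgn}(\mathbf{A})$ indicates the sign matrix of $\mathbf{A}$, where $\mathrm{sgn}(\mathbf{A})_{ij} = 1$, $0$, or $-1$ if $A_{ij} > 0$, $A_{ij} = 0$, or $A_{ij} < 0$, respectively. We use $\odot$ to denote the Hadamard product of matrices. Finally, we use $\mathrm{tr}(\mathbf{A})$ to indicate the trace of matrix $\mathbf{A}$, i.e., $\mathrm{tr}(\mathbf{A}) = \sum_i A_{ii}$.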
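For readers who want to check these definitions numerically, the NumPy sketch below evaluates each quantity on a small matrix. It is our own illustration, not code from Pro-GNN or DeepRobust, and the function names are hypothetical; note that $\|\cdot\|_1$ here is the entry-wise sum of absolute values, not the induced matrix 1-norm.

```python
import numpy as np

def frobenius_norm(A: np.ndarray) -> float:
    # ||A||_F = sqrt(sum_ij A_ij^2)
    return float(np.sqrt(np.sum(A ** 2)))

def l1_norm(A: np.ndarray) -> float:
    # Entry-wise l1 norm: sum_ij |A_ij|
    return float(np.sum(np.abs(A)))

def nuclear_norm(A: np.ndarray) -> float:
    # Sum of singular values (zero singular values contribute nothing)
    return float(np.sum(np.linalg.svd(A, compute_uv=False)))

def positive_part(A: np.ndarray) -> np.ndarray:
    # (A)_+ : keep positive entries, set the rest to zero
    return np.maximum(A, 0.0)

A = np.array([[3.0, -4.0],
              [0.0,  0.0]])

print(frobenius_norm(A))   # 5.0
print(l1_norm(A))          # 7.0
print(nuclear_norm(A))     # about 5.0: A has rank one with singular value 5
print(positive_part(A))    # entries [[3, 0], [0, 0]]
print(np.sign(A))          # sgn(A): entries [[1, -1], [0, 0]]
print(A * A)               # Hadamard product A ⊙ A
print(np.trace(A))         # tr(A) = 3.0
```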

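Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a graph, where $\mathcal{V} = \{v_1, v_2, \dots, v_N\}$ is the set of $N$ nodes and $\mathcal{E}$ is the set of edges. The edges describe the relations between nodes and can also be represented by an adjacency matrix $\mathbf{A} \in \mathbb{R}^{N \times N}$, where $A_{ij}$ denotes the relation between nodes $v_i$ and $v_j$. Furthermore, we use $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N]^\top \in \mathbb{R}^{N \times d}$ to denote the node feature matrix, where $\mathbf{x}_i$ is the feature vector of node $v_i$. Thus a graph can also be denoted as $\mathcal{G} = (\mathbf{A}, \mathbf{X})$. Following the common node classification setting, only a subset of nodes $\mathcal{V}_L = \{v_1, v_2, \dots, v_l\}$ is associated with corresponding labels $\mathcal{Y}_L = \{y_1, y_2, \dots, y_l\}$, where $y_i$ denotes the label of $v_i$.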

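Given a graph $\mathcal{G} = (\mathbf{A}, \mathbf{X})$ and the partial labels $\mathcal{Y}_L$, the goal of node classification for a GNN is to learn a function $f_\theta: \mathcal{V}_L \to \mathcal{Y}_L$ that maps the nodes to the set of labels so that $f_\theta$ can predict the labels of unlabeled nodes. The objective function can be formulated as

$$\min_{\theta} \mathcal{L}_{GNN}(\theta, \mathbf{A}, \mathbf{X}, \mathcal{Y}_L) = \sum_{v_i \in \mathcal{V}_L} \ell\big(f_\theta(\mathbf{X}, \mathbf{A})_i, y_i\big) \tag{1}$$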

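where $\theta$ denotes the parameters of $f_\theta$, $f_\theta(\mathbf{X}, \mathbf{A})_i$ is the prediction for node $v_i$, and $\ell(\cdot, \cdot)$ measures the difference between the prediction and the true label, such as cross entropy. Though there exist a number of different GNN methods, in this work we focus on the Graph Convolutional Network (GCN) of (Kipf and Welling, 2016a). Note that it is straightforward to extend the proposed framework to other GNN models. Specifically, a two-layer GCN with $\theta = (\mathbf{W}_1, \mathbf{W}_2)$ implements $f_\theta$ as

$$f_\theta(\mathbf{X}, \mathbf{A}) = \mathrm{softmax}\big(\hat{\mathbf{A}}\,\sigma(\hat{\mathbf{A}}\mathbf{X}\mathbf{W}_1)\,\mathbf{W}_2\big) \tag{2}$$

where $\hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-1/2}(\mathbf{A} + \mathbf{I})\tilde{\mathbf{D}}^{-1/2}$ and $\tilde{\mathbf{D}}$ is the diagonal matrix of $\mathbf{A} + \mathbf{I}$ with $\tilde{D}_{ii} = 1 + \sum_j A_{ij}$. $\sigma$ is the activation function, such as ReLU.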
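The following is a minimal dense NumPy sketch of Eq. (2) together with the summed cross-entropy of Eq. (1), assuming randomly initialized weights and a toy 4-node graph. It is illustrative only: the function names are hypothetical, and practical GCN implementations add details (e.g., sparse matrix operations and dropout) that are omitted here.

```python
import numpy as np

def normalize_adj(A: np.ndarray) -> np.ndarray:
    """Renormalized adjacency: D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                    # D~_ii = 1 + sum_j A_ij
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(Z: np.ndarray) -> np.ndarray:
    Z = Z - Z.max(axis=1, keepdims=True)       # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def gcn_forward(A, X, W1, W2):
    """Two-layer GCN of Eq. (2) with sigma = ReLU."""
    A_hat = normalize_adj(A)
    H = np.maximum(A_hat @ X @ W1, 0.0)
    return softmax(A_hat @ H @ W2)

def gnn_loss(P, labels, train_idx):
    """Eq. (1) with cross-entropy, summed over the labeled nodes only."""
    picked = P[train_idx, labels[train_idx]]   # predicted probability of the true class
    return float(-np.sum(np.log(picked + 1e-12)))

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)      # a 4-node path graph
X = rng.normal(size=(4, 3))
W1 = rng.normal(size=(3, 8))
W2 = rng.normal(size=(8, 2))
P = gcn_forward(A, X, W1, W2)                  # (4, 2) class probabilities
labels = np.array([0, 0, 1, 1])
train_idx = np.array([0, 3])                   # only nodes 0 and 3 are labeled
loss = gnn_loss(P, labels, train_idx)
```

Only the rows in `train_idx` enter the loss; the predictions for the remaining rows are what the trained model would use to label unlabeled nodes.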

With the aforementioned notations and definitions, the problem we aim to study in this work can be formally stated as: Given $\mathbf{A}$ and partial node labels $\mathcal{Y}_L$, with $\mathbf{A}$ poisoned by adversarial edges and the feature matrix $\mathbf{X}$ unperturbed, simultaneously learn a clean graph structure with the graph adjacency matrix $\mathbf{S} \in \mathcal{S} = [0, 1]^{N \times N}$ and the GNN parameters $\theta$ to improve node classification performance for unlabeled nodes.

4. The Proposed Framework

Adversarial attacks generate carefully-crafted perturbations on graph data. We refer to these carefully-crafted perturbations as adversarial structure. Adversarial structure can cause the performance of GNNs to drop rapidly. Thus, to defend against adversarial attacks, one natural strategy is to eliminate the crafted adversarial structure while maintaining the intrinsic graph structure. In this work, we aim to achieve this goal by exploring the graph structure properties of low rank, sparsity and feature smoothness. The framework is illustrated in Figure 2, where edges in black are normal edges and edges in red are adversarial edges introduced by an attacker to reduce node classification performance. To defend against the attacks, Pro-GNN iteratively reconstructs the clean graph by preserving the low rank, sparsity and feature smoothness properties of a graph so as to reduce the negative effects of adversarial structure. Meanwhile, to make sure that the reconstructed graph helps node classification, Pro-GNN simultaneously updates the GNN parameters on the reconstructed graph by solving the optimization problem in an alternating scheme. In the following subsections, we give the details of the proposed framework.

Figure 2. Overall framework of Pro-GNN. Dashed lines indicate smaller weights.

4.1. Exploring Low rank and Sparsity Properties

Many real-world graphs are naturally low-rank and sparse, as entities usually tend to form communities and are connected with only a small number of neighbors (Zhou et al., 2013). Adversarial attacks on GCNs tend to add adversarial edges that link nodes of different communities, as this is a more efficient way to reduce the node classification performance of GCN. Introducing links between nodes of different communities in a sparse graph can significantly increase the rank of the adjacency matrix and enlarge its singular values, thus damaging the low-rank and sparsity properties of graphs, as verified in Figure 1(a) and Figure 1(b). Thus, to recover the clean graph structure from the noisy and perturbed graph, one potential way is to learn a clean adjacency matrix $\mathbf{S}$ that is close to the adjacency matrix of the poisoned graph while enforcing the properties of low rank and sparsity on it. As demonstrated in Figure 1(c), the rank decreases much faster when removing adversarial edges than when removing normal edges. This implies that low-rank and sparsity constraints tend to remove adversarial edges rather than normal edges. Given the adjacency matrix $\mathbf{A}$ of a poisoned graph, we can formulate the above process as follows:

$$\arg\min_{\mathbf{S} \in \mathcal{S}} \mathcal{L}_0 = \|\mathbf{A} - \mathbf{S}\|_F^2 + R(\mathbf{S}), \quad \text{s.t.} \; \mathbf{S} = \mathbf{S}^\top \tag{3}$$
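The effect on rank and singular values can be reproduced on a tiny toy example of our own (not the paper's experiment): two 4-node communities, written with self-loops as in $\mathbf{A} + \mathbf{I}$ so that each community block has rank one, plus a single cross-community edge.

```python
import numpy as np

# Two 4-node communities, each a clique. Self-loops are included (as in A + I),
# so each community block is an all-ones matrix of rank one.
block = np.ones((4, 4))
zeros = np.zeros((4, 4))
clean = np.block([[block, zeros],
                  [zeros, block]])

# One adversarial-style edge linking node 0 (community 1) to node 4 (community 2).
attacked = clean.copy()
attacked[0, 4] = attacked[4, 0] = 1.0

def nuclear_norm(M: np.ndarray) -> float:
    # Sum of singular values
    return float(np.sum(np.linalg.svd(M, compute_uv=False)))

print(np.linalg.matrix_rank(clean), np.linalg.matrix_rank(attacked))  # 2 4
print(nuclear_norm(clean), nuclear_norm(attacked))                    # about 8.0 and about 9.58
```

The clean matrix has rank 2 and nuclear norm 8, while the attacked one has rank 4 and nuclear norm $5 + \sqrt{21} \approx 9.58$: a single inter-community edge already damages the low-rank structure.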

Since adversarial attacks aim to make unnoticeable perturbations to graphs, the first term ensures that the new adjacency matrix $\mathbf{S}$ stays close to $\mathbf{A}$. As we assume that the graph is undirected, the new adjacency matrix should be symmetric, i.e., $\mathbf{S} = \mathbf{S}^\top$. $R(\mathbf{S})$ denotes the constraints on $\mathbf{S}$ that enforce the properties of low rank and sparsity. According to (Candès and Recht, 2009; Koltchinskii et al., 2011; Richard et al., 2012), minimizing the $\ell_1$ norm and the nuclear norm of a matrix can force the matrix to be sparse and low-rank, respectively. Hence, to obtain a sparse and low-rank graph, we minimize the $\ell_1$ norm and the nuclear norm of $\mathbf{S}$. Eq. (3) can be rewritten as:

$$\arg\min_{\mathbf{S} \in \mathcal{S}} \mathcal{L}_0 = \|\mathbf{A} - \mathbf{S}\|_F^2 + \alpha\|\mathbf{S}\|_1 + \beta\|\mathbf{S}\|_*, \quad \text{s.t.} \; \mathbf{S} = \mathbf{S}^\top \tag{4}$$

where $\alpha$ and $\beta$ are predefined parameters that control the contributions of the sparsity and low-rank properties, respectively. One important benefit of minimizing the nuclear norm $\|\mathbf{S}\|_*$ is that it shrinks every singular value, thus alleviating the enlargement of singular values caused by adversarial attacks.
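As a sanity check of Eq. (4), the sketch below (our own NumPy illustration with a hypothetical function name; the symmetry constraint and the feasible set $\mathcal{S}$ are not enforced) evaluates $\mathcal{L}_0$ for two extreme candidates: keeping the poisoned graph unchanged, which pays only the regularizers, and deleting every edge, which pays only the reconstruction term.

```python
import numpy as np

def structure_loss(S: np.ndarray, A: np.ndarray, alpha: float, beta: float) -> float:
    """L_0 of Eq. (4): ||A - S||_F^2 + alpha * ||S||_1 + beta * ||S||_*."""
    fidelity = np.sum((A - S) ** 2)                          # stay close to the observed graph
    sparsity = np.sum(np.abs(S))                             # entry-wise l1 norm
    low_rank = np.sum(np.linalg.svd(S, compute_uv=False))    # nuclear norm
    return float(fidelity + alpha * sparsity + beta * low_rank)

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # a 3-node star graph

# Keeping the poisoned graph pays only the regularizers: 0.1 * 4 + 0.1 * 2 * sqrt(2), about 0.683
print(structure_loss(A, A, alpha=0.1, beta=0.1))
# Dropping every edge pays only the fidelity term (4 nonzero entries removed)
print(structure_loss(np.zeros_like(A), A, alpha=0.1, beta=0.1))  # 4.0
```

Any useful $\mathbf{S}$ lies between these extremes; $\alpha$ and $\beta$ decide how much reconstruction error is traded for sparsity and low rank.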

4.2. Exploring Feature Smoothness

It is evident that connected nodes in a graph are likely to share similar features. In fact, this observation has been made on graphs from numerous domains. For example, two connected users in a social graph are likely to share similar attributes (McPherson et al., 2001), two linked web pages in a webpage graph tend to have similar contents (Vaughan et al., 2007), and two connected papers in a citation network usually have similar topics (Kipf and Welling, 2016a). Meanwhile, it has recently been demonstrated that adversarial attacks on graphs tend to connect nodes with distinct features (Wu et al., 2019b). Thus, we aim to ensure feature smoothness in the learned graph. Feature smoothness can be captured by the following term $\mathcal{L}_s$:

$$\mathcal{L}_s = \frac{1}{2} \sum_{i,j=1}^{N} S_{ij} \|\mathbf{x}_i - \mathbf{x}_j\|^2 \tag{5}$$

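where $\mathbf{S}$ is the new adjacency matrix, $S_{ij}$ indicates the connection of $v_i$ and $v_j$ in the learned graph, and $\|\mathbf{x}_i - \mathbf{x}_j\|^2$ measures the feature difference between $v_i$ and $v_j$. $\mathcal{L}_s$ can be rewritten as:

$$\mathcal{L}_s = \mathrm{tr}(\mathbf{X}^\top \mathbf{L} \mathbf{X}) \tag{6}$$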

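where $\mathbf{L} = \mathbf{D} - \mathbf{S}$ is the graph Laplacian matrix of $\mathbf{S}$ and $\mathbf{D}$ is the diagonal degree matrix of $\mathbf{S}$, with $D_{ii} = \sum_j S_{ij}$. In this work, we use the normalized Laplacian matrix $\hat{\mathbf{L}} = \mathbf{D}^{-1/2} \mathbf{L} \mathbf{D}^{-1/2}$ instead of $\mathbf{L}$ to make feature smoothness independent of the degrees of the graph nodes (Ando and Zhang, 2007), i.e.,

$$\mathcal{L}_s = \mathrm{tr}(\mathbf{X}^\top \hat{\mathbf{L}} \mathbf{X}) = \frac{1}{2} \sum_{i,j=1}^{N} S_{ij} \left\| \frac{\mathbf{x}_i}{\sqrt{d_i}} - \frac{\mathbf{x}_j}{\sqrt{d_j}} \right\|^2 \tag{7}$$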

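where $d_i$ denotes the degree of $v_i$ in the learned graph. In the learned graph, if $v_i$ and $v_j$ are connected (i.e., $S_{ij} \neq 0$), we expect the feature difference $\|\mathbf{x}_i - \mathbf{x}_j\|^2$ to be small. In other words, if the features of two connected nodes are quite different, $\mathcal{L}_s$ will be very large. Therefore, the smaller $\mathcal{L}_s$ is, the smoother the features $\mathbf{X}$ are on the graph $\mathbf{S}$. Thus, to enforce feature smoothness in the learned graph, we should minimize $\mathcal{L}_s$. We therefore add the feature smoothness term to the objective function of Eq. (4) to penalize rapid changes in features between adjacent nodes:

$$\arg\min_{\mathbf{S} \in \mathcal{S}} \mathcal{L} = \mathcal{L}_0 + \lambda \mathcal{L}_s = \mathcal{L}_0 + \lambda\, \mathrm{tr}(\mathbf{X}^\top \hat{\mathbf{L}} \mathbf{X}), \quad \text{s.t.} \; \mathbf{S} = \mathbf{S}^\top \tag{8}$$

where $\lambda$ is a predefined parameter that controls the contribution of feature smoothness.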
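The identity in Eq. (7) and the intuition behind Eq. (8) can be checked numerically. The NumPy sketch below is our own illustration with hypothetical function names; it assumes a symmetric $\mathbf{S}$ in which every node has positive degree, since $d_i = 0$ would make $\hat{\mathbf{L}}$ undefined. A graph that links nodes with identical features has zero smoothness penalty, while one that links only dissimilar nodes is penalized.

```python
import numpy as np

def normalized_laplacian(S: np.ndarray) -> np.ndarray:
    """L_hat = D^{-1/2} (D - S) D^{-1/2}; assumes every node has positive degree."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.diag(d) - S
    return L * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def smoothness_trace(S: np.ndarray, X: np.ndarray) -> float:
    """Middle expression of Eq. (7): tr(X^T L_hat X)."""
    return float(np.trace(X.T @ normalized_laplacian(S) @ X))

def smoothness_pairwise(S: np.ndarray, X: np.ndarray) -> float:
    """Right-hand side of Eq. (7): 1/2 sum_ij S_ij ||x_i/sqrt(d_i) - x_j/sqrt(d_j)||^2."""
    Xn = X / np.sqrt(S.sum(axis=1))[:, None]
    sq_dist = np.sum((Xn[:, None, :] - Xn[None, :, :]) ** 2, axis=2)
    return float(0.5 * np.sum(S * sq_dist))

# Nodes 0 and 1 share the feature pattern (1, 0); nodes 2 and 3 share (0, 1).
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
S_smooth = np.array([[0, 1, 0, 0], [1, 0, 0, 0],
                     [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)    # links similar nodes
S_attacked = np.array([[0, 0, 1, 0], [0, 0, 0, 1],
                       [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)  # links dissimilar nodes

print(smoothness_trace(S_smooth, X))    # about 0.0
print(smoothness_trace(S_attacked, X))  # about 4.0
```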

4.3. Objective Function of Pro-GNN

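Intuitively, we could follow the preprocessing strategy (Wu et al., 2019b; Entezari et al., 2020) to defend against adversarial attacks: we first learn a graph from the poisoned graph via Eq. (8) and then train a GNN model on the learned graph. However, with such a two-stage strategy, the learned graph may be suboptimal for the GNN model on the given task. Thus, we propose a better strategy that jointly learns the graph structure and the GNN model for a specific downstream task. We empirically show in Section 5.4.2 that jointly learning the GNN model and the adjacency matrix is better than the two-stage approach. The final objective function of Pro-GNN is given as

$$\arg\min_{\mathbf{S} \in \mathcal{S}, \theta} \mathcal{L} = \|\mathbf{A} - \mathbf{S}\|_F^2 + \alpha\|\mathbf{S}\|_1 + \beta\|\mathbf{S}\|_* + \gamma\, \mathcal{L}_{GNN}(\theta, \mathbf{S}, \mathbf{X}, \mathcal{Y}_L) + \lambda\, \mathrm{tr}(\mathbf{X}^\top \hat{\mathbf{L}} \mathbf{X}), \quad \text{s.t.} \; \mathbf{S} = \mathbf{S}^\top \tag{9}$$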

where $\mathcal{L}_{GNN}$ is a loss function for the GNN model whose contribution is controlled by a predefined parameter $\gamma$. Another benefit of this formulation is that the information from $\mathcal{L}_{GNN}$ can also guide the graph learning process to defend against adversarial attacks, since the goal of graph adversarial attacks is to maximize $\mathcal{L}_{GNN}$.

4.4. An Optimization Algorithm

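Jointly optimizing $\theta$ and $\mathbf{S}$ in Eq. (9) is challenging, and the constraints on $\mathbf{S}$ further exacerbate the difficulty. Thus, in this work, we use an alternating optimization scheme to iteratively update $\theta$ and $\mathbf{S}$.

Update $\theta$. To update $\theta$, we fix $\mathbf{S}$ and remove the terms that are irrelevant to $\theta$; the objective function in Eq. (9) then reduces to:

$$\min_{\theta} \mathcal{L}_{GNN}(\theta, \mathbf{S}, \mathbf{X}, \mathcal{Y}_L) = \sum_{v_i \in \mathcal{V}_L} \ell\big(f_\theta(\mathbf{X}, \mathbf{S})_i, y_i\big) \tag{10}$$

which is a typical GNN optimization problem, and we can learn $\theta$ via stochastic gradient descent.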

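Update $\mathbf{S}$. Similarly, to update $\mathbf{S}$, we fix $\theta$ and arrive at

$$\arg\min_{\mathbf{S}} \mathcal{L}(\mathbf{S}, \mathbf{A}) + \alpha\|\mathbf{S}\|_1 + \beta\|\mathbf{S}\|_*, \quad \text{s.t.} \; \mathbf{S} = \mathbf{S}^\top, \; \mathbf{S} \in \mathcal{S} \tag{11}$$

where $\mathcal{L}(\mathbf{S}, \mathbf{A})$ is defined as

$$\mathcal{L}(\mathbf{S}, \mathbf{A}) = \|\mathbf{A} - \mathbf{S}\|_F^2 + \gamma\, \mathcal{L}_{GNN}(\theta, \mathbf{S}, \mathbf{X}, \mathcal{Y}_L) + \lambda\, \mathrm{tr}(\mathbf{X}^\top \hat{\mathbf{L}} \mathbf{X}) \tag{12}$$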

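Note that both the $\ell_1$ norm and the nuclear norm are non-differentiable. For an optimization problem with only one non-differentiable regularizer $R(\mathbf{S})$, we can use Forward-Backward splitting methods (Combettes and Pesquet, 2011). The idea is to alternate a gradient descent step and a proximal step:

$$\mathbf{S}^{(k)} = \mathrm{prox}_{\eta R}\big(\mathbf{S}^{(k-1)} - \eta \nabla_{\mathbf{S}} \mathcal{L}(\mathbf{S}, \mathbf{A})\big) \tag{13}$$

where $\eta$ is the learning rate and $\mathrm{prox}_R$ is the proximal operator:

$$\mathrm{prox}_R(\mathbf{Z}) = \arg\min_{\mathbf{S} \in \mathbb{R}^{N \times N}} \frac{1}{2}\|\mathbf{S} - \mathbf{Z}\|_F^2 + R(\mathbf{S}) \tag{14}$$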

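In particular, the proximal operators of the $\ell_1$ norm and the nuclear norm can be expressed as (Richard et al., 2012; Beck and Teboulle, 2009)

$$\mathrm{prox}_{\alpha\|\cdot\|_1}(\mathbf{Z}) = \mathrm{sgn}(\mathbf{Z}) \odot (|\mathbf{Z}| - \alpha)_+ \tag{15}$$

$$\mathrm{prox}_{\beta\|\cdot\|_*}(\mathbf{Z}) = \mathbf{U}\, \mathrm{diag}\big((\sigma_i - \beta)_+\big)_i \, \mathbf{V}^\top \tag{16}$$

where $\mathbf{Z} = \mathbf{U}\, \mathrm{diag}(\sigma_1, \dots, \sigma_N)\, \mathbf{V}^\top$ is the singular value decomposition of $\mathbf{Z}$. To optimize an objective function with two non-differentiable regularizers, the Incremental Proximal Descent method (Raguet et al., 2013), built on these proximal operators, can be used. By iterating the updating process in a cyclic manner, we can update $\mathbf{S}$ as follows:

$$\mathbf{S}^{(k)} = \mathrm{prox}_{\eta\beta\|\cdot\|_*}\Big(\mathrm{prox}_{\eta\alpha\|\cdot\|_1}\big(\mathbf{S}^{(k-1)} - \eta \nabla_{\mathbf{S}} \mathcal{L}(\mathbf{S}, \mathbf{A})\big)\Big) \tag{17}$$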
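Eqs. (15) to (17) translate directly into array operations. The NumPy sketch below is illustrative, not the released implementation, and its function names are hypothetical; in particular, `grad` stands for $\nabla_{\mathbf{S}}\mathcal{L}(\mathbf{S}, \mathbf{A})$ from Eq. (12), which the caller is assumed to supply (in practice it would come from automatic differentiation through the GNN).

```python
import numpy as np

def prox_l1(Z: np.ndarray, thresh: float) -> np.ndarray:
    """Eq. (15): soft thresholding sgn(Z) ⊙ (|Z| - thresh)_+, applied entry-wise."""
    return np.sign(Z) * np.maximum(np.abs(Z) - thresh, 0.0)

def prox_nuclear(Z: np.ndarray, thresh: float) -> np.ndarray:
    """Eq. (16): singular value thresholding U diag((sigma_i - thresh)_+) V^T."""
    U, sigma, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(sigma - thresh, 0.0)) @ Vt   # scales column i of U by the shrunk sigma_i

def proximal_update(S, grad, eta, alpha, beta):
    """One update of Eq. (17): gradient step, then the l1 prox, then the nuclear prox."""
    Z = S - eta * grad
    Z = prox_l1(Z, eta * alpha)
    return prox_nuclear(Z, eta * beta)

Z = np.array([[3.0, -0.5],
              [-2.0, 0.2]])
print(prox_l1(Z, 1.0))                           # entries [[2, 0], [-1, 0]]: small entries become exactly zero
print(prox_nuclear(np.diag([3.0, 1.0]), 2.0))    # diag(1, 0): the smaller singular value is removed
```

Soft thresholding drives small entries of $\mathbf{S}$ exactly to zero (sparsity), and singular value thresholding removes small singular values (low rank); both thresholds are scaled by the learning rate $\eta$, as in Eq. (17).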

After we learn a relaxed $\mathbf{S}$, we project $\mathbf{S}$ to satisfy the constraints. For the symmetric constraint, we let $\mathbf{S} = \frac{\mathbf{S} + \mathbf{S}^\top}{2}$. For the constraint $S_{ij} \in [0, 1]$, we project entries with $S_{ij} < 0$ to 0 and entries with $S_{ij} > 1$ to 1. We denote these projection operations as $P_{\mathcal{S}}(\mathbf{S})$.

Training Algorithm. With these updating and projection rules, the optimization algorithm is shown in Algorithm 1. In line 1, we first initialize the estimated graph $\mathbf{S}$ as the poisoned graph $\mathbf{A}$. In line 2, we randomly initialize the GNN parameters $\theta$. From lines 3 to 10, we update $\mathbf{S}$ and the GNN parameters $\theta$ alternately and iteratively. Specifically, we train the GNN parameters in each iteration, while training the graph reconstruction model every $\tau$ iterations.
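The projection $P_{\mathcal{S}}$ described above is cheap to implement. A minimal NumPy sketch (the function name is hypothetical) is:

```python
import numpy as np

def project_feasible(S: np.ndarray) -> np.ndarray:
    """P_S: symmetrize, then clip every entry to [0, 1]."""
    S = (S + S.T) / 2.0            # enforce S = S^T
    return np.clip(S, 0.0, 1.0)    # entries < 0 -> 0, entries > 1 -> 1

S = np.array([[1.4, -0.2],
              [0.6,  0.3]])
print(project_feasible(S))         # entries [[1.0, 0.2], [0.2, 0.3]]
```

Symmetrizing before clipping is safe because clipping acts entry by entry, so equal entries $S_{ij}$ and $S_{ji}$ remain equal.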

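Data: Adjacency matrix $\mathbf{A}$, Attribute matrix $\mathbf{X}$, Labels $\mathcal{Y}_L$, Hyper-parameters $\alpha, \beta, \gamma, \lambda, \tau$, Learning rates $\eta, \eta'$
Result: Learned adjacency $\mathbf{S}$, GNN parameters $\theta$
1 Initialize $\mathbf{S} \leftarrow \mathbf{A}$
2 Randomly initialize $\theta$
3 while Stopping condition is not met do
4       $\mathbf{S} \leftarrow \mathbf{S} - \eta \nabla_{\mathbf{S}} \big(\|\mathbf{S} - \mathbf{A}\|_F^2 + \gamma \mathcal{L}_{GNN} + \lambda\, \mathrm{tr}(\mathbf{X}^\top \hat{\mathbf{L}} \mathbf{X})\big)$
5       $\mathbf{S} \leftarrow \mathrm{prox}_{\eta\alpha\|\cdot\|_1}(\mathbf{S})$
6       $\mathbf{S} \leftarrow \mathrm{prox}_{\eta\beta\|\cdot\|_*}(\mathbf{S})$
7       $\mathbf{S} \leftarrow P_{\mathcal{S}}(\mathbf{S})$
8       for i=1 to $\tau$ do
9             $g \leftarrow \partial \mathcal{L}_{GNN}(\theta, \mathbf{S}, \mathbf{X}, \mathcal{Y}_L) / \partial \theta$
10            $\theta \leftarrow \theta - \eta' g$
11      end
Return $\mathbf{S}, \theta$
Algorithm 1 Pro-GNN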
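The control flow of Algorithm 1 can be sketched in NumPy as follows. This is a structural sketch under stated assumptions, not the released Pro-GNN code: the gradients of $\gamma\mathcal{L}_{GNN} + \lambda\,\mathrm{tr}(\mathbf{X}^\top\hat{\mathbf{L}}\mathbf{X})$ with respect to $\mathbf{S}$ and of $\mathcal{L}_{GNN}$ with respect to $\theta$ are supplied by the caller through the hypothetical hooks `grad_rest_S` and `grad_theta`, and a fixed number of outer iterations `n_outer` stands in for the stopping condition.

```python
import numpy as np

def pro_gnn_alternating(A, theta, grad_rest_S, grad_theta,
                        alpha, beta, eta, eta_theta, tau, n_outer):
    """Sketch of the alternating loop of Algorithm 1.

    grad_rest_S(S, theta): gradient w.r.t. S of gamma*L_GNN + lambda*tr(X^T L_hat X)
    grad_theta(S, theta):  gradient of L_GNN w.r.t. theta
    Both are hypothetical caller-supplied hooks (e.g. from an autodiff library).
    """
    S = A.astype(float).copy()                                   # line 1: S <- A
    # line 2 (random initialization of theta) is left to the caller
    for _ in range(n_outer):                                     # line 3: fixed budget as stopping rule
        grad = 2.0 * (S - A) + grad_rest_S(S, theta)             # gradient of Eq. (12)
        S = S - eta * grad                                       # line 4: gradient step
        S = np.sign(S) * np.maximum(np.abs(S) - eta * alpha, 0)  # line 5: l1 prox
        U, sig, Vt = np.linalg.svd(S, full_matrices=False)
        S = (U * np.maximum(sig - eta * beta, 0)) @ Vt           # line 6: nuclear prox
        S = np.clip((S + S.T) / 2.0, 0.0, 1.0)                   # line 7: projection P_S
        for _ in range(tau):                                     # lines 8-10: tau GNN steps
            theta = theta - eta_theta * grad_theta(S, theta)
    return S, theta

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
zero_grad = lambda S, th: np.zeros_like(S)
S_out, theta_out = pro_gnn_alternating(
    A, theta=0.0, grad_rest_S=zero_grad,
    grad_theta=lambda S, th: th - 3.0,   # toy quadratic loss (th - 3)^2 / 2 in place of a GNN
    alpha=0.0, beta=0.0, eta=0.01, eta_theta=0.5, tau=5, n_outer=10)
```

With both regularization weights set to zero and no extra gradient, $\mathbf{S}$ stays at $\mathbf{A}$ and only $\theta$ moves (here towards the minimizer 3 of the toy quadratic), which is a quick way to check the plumbing before plugging in a real GNN.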

5. Experiments

In this section, we evaluate the effectiveness of Pro-GNN against different graph adversarial attacks. In particular, we aim to answer the following questions:

  • RQ1: How does Pro-GNN perform compared to state-of-the-art defense methods under different adversarial attacks?

  • RQ2: Does the learned graph work as expected?

  • RQ3: How do different properties affect the performance of Pro-GNN?

Before presenting our experimental results and observations, we first introduce the experimental settings.

5.1. Experimental settings

5.1.1. Datasets

Following (Zügner et al., 2018; Zügner and Günnemann, 2019a), we validate the proposed approach on four benchmark datasets: three citation graphs, i.e., Cora, Citeseer and Pubmed, and one blog graph, i.e., Polblogs. The statistics of the datasets are shown in Table 1. Note that node features are not available for the Polblogs graph. In this case, we set the attribute matrix to the identity matrix.

Dataset N_LCC E_LCC Classes Features
Cora 2,485 5,069 7 1,433
Citeseer 2,110 3,668 6 3,703
Polblogs 1,222 16,714 2 /
Pubmed 19,717 44,338 3 500
Table 1. Dataset Statistics (N_LCC and E_LCC: numbers of nodes and edges in the largest connected component). Following (Zügner et al., 2018; Zügner and Günnemann, 2019a; Entezari et al., 2020), we only consider the largest connected component (LCC).
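The text does not say how the LCC is extracted. One straightforward way is a breadth-first search over the adjacency matrix, sketched below in NumPy; this is our own illustration with a hypothetical function name, not the preprocessing code used for the experiments.

```python
from collections import deque

import numpy as np

def largest_connected_component(A: np.ndarray) -> np.ndarray:
    """Return the sorted node indices of the largest connected component (undirected graph)."""
    n = A.shape[0]
    seen = np.zeros(n, dtype=bool)
    best = []
    for start in range(n):
        if seen[start]:
            continue
        component, queue = [], deque([start])
        seen[start] = True
        while queue:                                   # breadth-first search from `start`
            u = queue.popleft()
            component.append(u)
            for v in np.flatnonzero(A[u]):
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        if len(component) > len(best):
            best = component
    return np.array(sorted(best), dtype=int)

# Components {0, 1, 2} and {3, 4}; the LCC is the first one.
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1
idx = largest_connected_component(A)
A_lcc = A[np.ix_(idx, idx)]   # restrict the adjacency matrix (and, likewise, X[idx]) to the LCC
```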
Dataset Ptb Rate (%) GCN GAT RGCN GCN-Jaccard* GCN-SVD Pro-GNN-fs Pro-GNN*
Cora 0 83.50±0.44 83.97±0.65 83.09±0.44 82.05±0.51 80.63±0.45 83.42±0.52 82.98±0.23
5 76.55±0.79 80.44±0.74 77.42±0.39 79.13±0.59 78.39±0.54 82.78±0.39 82.27±0.45
10 70.39±1.28 75.61±0.59 72.22±0.38 75.16±0.76 71.47±0.83 77.91±0.86 79.03±0.59
15 65.10±0.71 69.78±1.28 66.82±0.39 71.03±0.64 66.69±1.18 76.01±1.12 76.40±1.27
20 59.56±2.72 59.94±0.92 59.27±0.37 65.71±0.89 58.94±1.13 68.78±5.84 73.32±1.56
25 47.53±1.96 54.78±0.74 50.51±0.78 60.82±1.08 52.06±1.19 56.54±2.58 69.72±1.69
Citeseer 0 71.96±0.55 73.26±0.83 71.20±0.83 72.10±0.63 70.65±0.32 73.26±0.38 73.28±0.69
5 70.88±0.62 72.89±0.83 70.50±0.43 70.51±0.97 68.84±0.72 73.09±0.34 72.93±0.57
10 67.55±0.89 70.63±0.48 67.71±0.30 69.54±0.56 68.87±0.62 72.43±0.52 72.51±0.75
15 64.52±1.11 69.02±1.09 65.69±0.37 65.95±0.94 63.26±0.96 70.82±0.87 72.03±1.11
20 62.03±3.49 61.04±1.52 62.49±1.22 59.30±1.40 58.55±1.09 66.19±2.38 70.02±2.28
25 56.94±2.09 61.85±1.12 55.35±0.66 59.89±1.47 57.18±1.87 66.40±2.57 68.95±2.78
Polblogs 0 95.69±0.38 95.35±0.20 95.22±0.14 - 95.31±0.18 93.20±0.64 -
5 73.07±0.80 83.69±1.45 74.34±0.19 - 89.09±0.22 93.29±0.18 -
10 70.72±1.13 76.32±0.85 71.04±0.34 - 81.24±0.49 89.42±1.09 -
15 64.96±1.91 68.80±1.14 67.28±0.38 - 68.10±3.73 86.04±2.21 -
20 51.27±1.23 51.50±1.63 59.89±0.34 - 57.33±3.15 79.56±5.68 -
25 49.23±1.36 51.19±1.49 56.02±0.56 - 48.66±9.93 63.18±4.40 -
Pubmed 0 87.19±0.09 83.73±0.40 86.16±0.18 87.06±0.06 83.44±0.21 87.33±0.18 87.26±0.23
5 83.09±0.13 78.00±0.44 81.08±0.20 86.39±0.06 83.41±0.15 87.25±0.09 87.23±0.13
10 81.21±0.09 74.93±0.38 77.51±0.27 85.70±0.07 83.27±0.21 87.25±0.09 87.21±0.13
15 78.66±0.12 71.13±0.51 73.91±0.25 84.76±0.08 83.10±0.18 87.20±0.09 87.20±0.15
20 77.35±0.19 68.21±0.96 71.18±0.31 83.88±0.05 83.01±0.22 87.09±0.10 87.15±0.15
25 75.50±0.17 65.41±0.77 67.95±0.15 83.66±0.06 82.72±0.18 86.71±0.09 86.76±0.19
* GCN-Jaccard and Pro-GNN cannot be directly applied to datasets where node features are not available.

Table 2. Node classification performance (Accuracy ± Std) under non-targeted attack (metattack).

5.1.2. Baselines

To evaluate the effectiveness of Pro-GNN, we compare it with the state-of-the-art GNN and defense models by using the adversarial attack repository DeepRobust (Li et al., 2020):

  • GCN (Kipf and Welling, 2016a): while there exist a number of different Graph Convolutional Network (GCN) models, we focus on the most representative one (Kipf and Welling, 2016a).

  • GAT (Veličković et al., 2018): Graph Attention Network (GAT) is composed of attention layers that can learn different weights for different nodes in the neighborhood. It is often used as a baseline for defending against adversarial attacks.

  • RGCN (Zhu et al., 2019): RGCN models node representations as Gaussian distributions to absorb the effects of adversarial attacks. It also employs an attention mechanism to penalize nodes with high variance.

  • GCN-Jaccard (Wu et al., 2019b): Since attackers tend to connect nodes with dissimilar features or different labels, GCN-Jaccard preprocesses the network by eliminating edges that connect nodes whose feature Jaccard similarity is smaller than a threshold. Note that this method only works when node features are available.

  • GCN-SVD (Entezari et al., 2020): This is another preprocessing method to resist adversarial attacks. Noting that nettack is a high-rank attack, GCN-SVD proposes to vaccinate GCN with the low-rank approximation of the perturbed graph. Note that it was originally designed to defend against nettack; however, it is straightforward to extend it to non-targeted and random attacks.
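Both preprocessing baselines are easy to sketch. The NumPy code below is our own simplified illustration (hypothetical function names), assuming binary node features for the Jaccard score and a dense truncated SVD; the similarity threshold and the rank `k` are the hyper-parameters tuned in the experiments.

```python
import numpy as np

def jaccard_prune(A: np.ndarray, X: np.ndarray, threshold: float) -> np.ndarray:
    """GCN-Jaccard-style preprocessing: drop edges whose endpoints have low feature Jaccard similarity."""
    A = A.copy()
    Xb = X > 0                                   # treat features as binary attributes
    rows, cols = np.nonzero(np.triu(A, k=1))     # each undirected edge once
    for i, j in zip(rows, cols):
        union = np.sum(Xb[i] | Xb[j])
        sim = np.sum(Xb[i] & Xb[j]) / union if union > 0 else 0.0
        if sim < threshold:
            A[i, j] = A[j, i] = 0.0
    return A

def svd_lowrank(A: np.ndarray, k: int) -> np.ndarray:
    """GCN-SVD-style preprocessing: rank-k approximation via truncated SVD."""
    U, s, Vt = np.linalg.svd(A)
    return (U[:, :k] * s[:k]) @ Vt[:k]

X = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)        # edge 1-2 links nodes with disjoint features
A_jac = jaccard_prune(A, X, threshold=0.1)    # keeps edge 0-1, removes edge 1-2
A_svd = svd_lowrank(A, k=1)
```

Unlike the Jaccard variant, the SVD variant returns a dense, real-valued matrix, so every node pair may receive a small nonzero weight.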

In addition to these representative baselines, we also include one variant of the proposed framework, Pro-GNN-fs, which eliminates the feature smoothness term (i.e., sets the weight of the feature smoothness term to zero).

5.1.3. Parameter Settings

For each graph, we randomly choose 10% of the nodes for training, 10% for validation and the remaining 80% for testing. For each experiment, we report the average performance over 10 runs. The hyper-parameters of all models are tuned based on the loss and accuracy on the validation set. For GCN and GAT, we adopt the default parameter settings in the authors' implementations. For RGCN, the number of hidden units is tuned over a set of candidate values. For GCN-Jaccard, the similarity threshold for removing dissimilar edges is chosen from a set of candidate values. For GCN-SVD, the reduced rank of the perturbed graph is likewise tuned over a set of candidate values.
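A plain random 10%/10%/80% split can be sketched as follows (our own NumPy illustration with a hypothetical function name; the exact splitting procedure used in the experiments may differ, e.g. in how the fractions are rounded).

```python
import numpy as np

def random_split(n_nodes: int, train: float = 0.1, val: float = 0.1, seed=None):
    """Randomly partition node indices into train / val / test (10% / 10% / rest by default)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_nodes)
    n_train = int(train * n_nodes)
    n_val = int(val * n_nodes)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

# Cora's LCC has 2,485 nodes (Table 1).
idx_train, idx_val, idx_test = random_split(2485, seed=0)
print(len(idx_train), len(idx_val), len(idx_test))   # 248 248 1989
```

Repeating the experiment with 10 different seeds gives the 10 runs whose mean and standard deviation are reported.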

5.2. Defense Performance

To answer the first question, we evaluate the node classification performance of Pro-GNN against three types of attacks, i.e., non-targeted attack, targeted attack and random attack:

  • Targeted Attack: A targeted attack generates attacks on specific nodes and aims to fool GNNs on these target nodes. We adopt nettack (Zügner et al., 2018), the state-of-the-art targeted attack on graph data, as the targeted attack method.

  • Non-targeted Attack: Different from a targeted attack, the goal of a non-targeted attack is to degrade the overall performance of GNNs on the whole graph. We adopt one representative non-targeted attack, metattack (Zügner and Günnemann, 2019a).

  • Random Attack: It randomly injects fake edges into the graph. It can also be viewed as adding random noise to the clean graph.
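The text describes the random attack only as injecting fake edges. The sketch below assumes, as our own simplification, that the fake edges are drawn uniformly from the node pairs that are not yet connected; the function name is hypothetical.

```python
import numpy as np

def random_edge_injection(A: np.ndarray, n_add: int, seed=None) -> np.ndarray:
    """Insert n_add undirected edges between node pairs chosen uniformly among non-edges."""
    rng = np.random.default_rng(seed)
    A = A.copy()
    rows, cols = np.triu_indices(A.shape[0], k=1)   # each unordered pair once, no self-loops
    non_edges = np.flatnonzero(A[rows, cols] == 0)
    chosen = rng.choice(non_edges, size=n_add, replace=False)
    A[rows[chosen], cols[chosen]] = 1
    A[cols[chosen], rows[chosen]] = 1               # keep the graph undirected
    return A

# A 5-node path graph (4 edges) with 3 injected fake edges.
path = np.diag(np.ones(4), k=1)
path = path + path.T
noisy = random_edge_injection(path, n_add=3, seed=0)
```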

Figure 3. Results of different models under nettack: (a) Cora; (b) Citeseer; (c) Polblogs; (d) Pubmed.
Figure 4. Results of different models under random attack: (a) Cora; (b) Citeseer; (c) Polblogs; (d) Pubmed.

We first use the attack method to poison the graph. We then train Pro-GNN and baselines on the poisoned graph and evaluate the node classification performance achieved by these methods.

5.2.1. Against Non-targeted Adversarial Attacks

We first evaluate the node classification accuracy of different methods against non-targeted adversarial attacks. Specifically, we adopt metattack and keep all the default parameter settings in the authors' original implementation. Metattack has several variants. For the Cora, Citeseer and Polblogs datasets, we apply Meta-Self since it is the most destructive attack variant, while for Pubmed, the approximate version of Meta-Self, A-Meta-Self, is applied to save memory and time. We vary the perturbation rate, i.e., the ratio of changed edges, from 0% to 25% with a step of 5%. As mentioned before, all experiments are conducted 10 times, and we report the average accuracy with standard deviation in Table 2. The best performance is highlighted in bold. From the table, we make the following observations:

  • [leftmargin=*]

  • Our method consistently outperforms other methods under different perturbation rates. For instance, on Polblogs dataset our model improves GCN over 20% at 5% perturbation rate. Even under large perturbation, our method outperforms other baselines by a larger margin. Specifically, under the perturbation rate on the three datasets, vanilla GCN performs very poorly and our model improves GCN by 22%, 12% and 14%, respectively.

  • Although GCN-SVD also employs SVD to get low-rank approximation of the graph, the performance of GCN-SVD drops rapidly. This is because GCN-SVD is designed for targeted attack, it cannot adapt well to the non-targeted adversarial attack. Similarly, GCN-Jaccard does not perform as well as Pro-GNN under different perturbation rates. This is because simply preprocessing the perturbed graph once cannot recover the complex intrinsic graph structure from the carefully-crafted adversarial noises. On the contrary, simultaneously updating the graph structure and GNN parameters with the low rank, sparsity and feature smoothness constraints helps recover better graph structure and learn robust GNN parameters.

  • Pro-GNN achieves higher accuracy than Pro-GNN-fs, its variant without the feature smoothness term, especially when the perturbation rate is large. This demonstrates the effectiveness of feature smoothing in removing adversarial edges.
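GCN-Jaccard (Wu et al., 2019b) is an example of the one-shot preprocessing discussed above. Before training, it deletes edges whose endpoints have dissimilar features. Below is a minimal pure-Python sketch, not the baseline's reference implementation. It assumes binary node features stored as sets of active feature indices. The threshold value is a hypothetical choice.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of active binary feature indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_prune(edges, features, threshold):
    """Keep only edges whose endpoints have Jaccard similarity >= threshold.

    edges: iterable of (u, v) pairs; features: dict node -> set of feature ids.
    The graph is cleaned once, before any GNN training, and never revisited.
    """
    return [(u, v) for u, v in edges
            if jaccard(features[u], features[v]) >= threshold]


# Toy example: nodes 0 and 1 share two of their four distinct features,
# while nodes 0 and 2 share none.
features = {0: {1, 2, 3}, 1: {1, 2, 4}, 2: {7, 8}}
edges = [(0, 1), (0, 2)]
print(jaccard_prune(edges, features, threshold=0.1))  # [(0, 1)]
```

Because the pruning decision is fixed before training, any adversarial edge between feature-similar nodes survives, which is the limitation the observation above points to.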

5.2.2. Against Targeted Adversarial Attack

In this experiment, we adopt nettack as the targeted attack method and use the default parameter settings of the authors' original implementation. Following (Zhu et al., 2019), we vary the number of perturbations made on every targeted node. Nodes in the test set whose degree exceeds a fixed threshold are selected as target nodes. For the Pubmed dataset, we sample only 10% of them to reduce the running time of nettack, while for the other datasets we use all target nodes. The node classification accuracy on the target nodes is shown in Figure 3. As the number of perturbations increases, our method outperforms the other methods on the attacked target nodes in most cases. For instance, on the Citeseer dataset with 5 perturbations per targeted node, our model improves over vanilla GCN by 23% and outperforms the other defense methods by 11%. This demonstrates that our method can also resist targeted adversarial attacks.
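The target-node selection described above can be sketched as follows, assuming an undirected graph stored as an adjacency list. The degree threshold is a parameter, and the default of 10 below is only illustrative. The optional subsampling mirrors the 10% sample used for Pubmed, with a fixed seed so that the same targets are reused across methods.

```python
import random


def select_target_nodes(adj_list, test_nodes, min_degree=10,
                        sample_frac=None, seed=0):
    """Return test nodes whose degree exceeds min_degree, optionally subsampled.

    adj_list: dict node -> set of neighbours (undirected graph).
    min_degree: illustrative threshold (hypothetical default).
    sample_frac: if given (e.g. 0.1), keep that fraction of the candidates,
    sampled without replacement with a fixed seed.
    """
    candidates = sorted(n for n in test_nodes if len(adj_list[n]) > min_degree)
    if sample_frac is None:
        return candidates
    k = int(len(candidates) * sample_frac)
    rng = random.Random(seed)
    return sorted(rng.sample(candidates, k))


# Toy graph: a complete graph on 21 nodes, so every node has degree 20.
n = 21
adj_list = {u: {v for v in range(n) if v != u} for u in range(n)}
test_nodes = list(range(n))
all_targets = select_target_nodes(adj_list, test_nodes)
sampled = select_target_nodes(adj_list, test_nodes, sample_frac=0.1)
print(len(all_targets), len(sampled))  # 21 2
```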

5.2.3. Against Random Attack

In this subsection, we evaluate how Pro-GNN behaves under different ratios of random noise. The results are reported in Figure 4, which shows that Pro-GNN consistently outperforms all other baselines and successfully resists random attack. Together with the observations from Sections 5.2.1 and 5.2.2, we can conclude that Pro-GNN is able to defend against various types of attacks. This is a desirable property in practice, since attackers can adopt any kind of attack to fool the system.
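A minimal sketch of a random attack is shown below. It assumes that the random noise consists of randomly added fake edges and that the noise ratio is measured relative to the number of existing edges. Both are assumptions for illustration. The function name and toy graph are hypothetical.

```python
import random


def random_add_edges(num_nodes, edges, noise_ratio, seed=0):
    """Inject int(noise_ratio * |E|) uniformly random new undirected edges.

    edges: set of (u, v) pairs with u < v. Returns a new set containing the
    original edges plus node pairs that were not connected before.
    """
    rng = random.Random(seed)
    target = len(edges) + int(noise_ratio * len(edges))
    max_edges = num_nodes * (num_nodes - 1) // 2
    if target > max_edges:
        raise ValueError("not enough non-edges to inject")
    noisy = set(edges)
    while len(noisy) < target:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct nodes
        noisy.add((min(u, v), max(u, v)))       # duplicates are ignored by the set
    return noisy


# Toy example: a path graph on 10 nodes (9 edges) with 100% noise.
path = {(i, i + 1) for i in range(9)}
noisy = random_add_edges(10, path, noise_ratio=1.0)
print(len(noisy))  # 18
```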

5.3. Importance of Graph Structure Learning

In the previous subsection, we demonstrated the effectiveness of the proposed framework. In this subsection, we aim to understand the learned graph and answer the second question.

Figure 5. Weight density distributions of normal and adversarial edges on the learned graph: (a) Pubmed; (b) Polblogs.

5.3.1. Normal Edges Against Adversarial Edges

Adversaries tend to add edges rather than delete them (Wu et al., 2019b; Zügner and Günnemann, 2019a). Therefore, if the model learns a clean graph structure, it should assign small weights to the injected edges and thereby mitigate their impact. We therefore investigate the weights of normal and adversarial edges in the learned adjacency matrix S and visualize their weight density distributions in Figure 5. Due to space limits, we only show results on Pubmed and Polblogs under metattack. As the figure shows, on both datasets the weights of adversarial edges are much smaller than those of normal edges. This indicates that Pro-GNN can alleviate the effect of adversarial edges and thus learn robust GNN parameters.
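The quantity visualized in Figure 5 can be tabulated by looking up the learned weight of every known clean edge and every known injected edge. The sketch below does this with a coarse histogram in place of a density plot. The 4-node learned matrix and the edge labels are made up for illustration. They are not results from the paper.

```python
from statistics import mean


def edge_weights(S, edge_list):
    """Look up the learned weight S[u][v] of every (u, v) pair in edge_list."""
    return [S[u][v] for u, v in edge_list]


def coarse_histogram(values, num_bins=5):
    """Count values in [0, 1] falling into num_bins equal-width bins."""
    counts = [0] * num_bins
    for x in values:
        counts[min(int(x * num_bins), num_bins - 1)] += 1
    return counts


# Made-up learned adjacency over 4 nodes (symmetric, entries in [0, 1]).
S = [
    [0.0, 0.9, 0.0, 0.1],
    [0.9, 0.0, 0.05, 0.0],
    [0.0, 0.05, 0.0, 0.8],
    [0.1, 0.0, 0.8, 0.0],
]
clean_edges = [(0, 1), (2, 3)]
adversarial_edges = [(0, 3), (1, 2)]  # assumed to be the injected edges

normal_w = edge_weights(S, clean_edges)
adv_w = edge_weights(S, adversarial_edges)
print(mean(normal_w), mean(adv_w))
print(coarse_histogram(normal_w), coarse_histogram(adv_w))  # [0, 0, 0, 0, 2] [2, 0, 0, 0, 0]
```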

5.3.2. Performance on Heavily Poisoned Graph

In this subsection, we study the performance when the graph is heavily poisoned. In particular, we poison the graph with 25% perturbation by metattack. If a graph is heavily poisoned, the performance of GCN degrades considerably. One straightforward solution is to remove the poisoned graph structure altogether. When the graph structure is removed, the adjacency matrix is all zeros. Because GCN adds self-loops before normalization, the zero matrix becomes the identity matrix, and the prediction is made entirely from node features. In this case, GCN effectively becomes a feed-forward neural network. We denote this baseline as GCN-NoGraph. Table 3 reports the performance of GCN, GCN-NoGraph and Pro-GNN when the graph is heavily poisoned.
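The claim that GCN turns an all-zero adjacency matrix into the identity follows from the renormalization trick of Kipf and Welling, D̃^{-1/2}(A + I)D̃^{-1/2}, where D̃ is the degree matrix of A + I. A minimal pure-Python sketch:

```python
from math import sqrt


def gcn_normalize(A):
    """Renormalization trick: D^-1/2 (A + I) D^-1/2, D = degree matrix of A + I.

    A: square adjacency matrix given as a list of lists.
    """
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in A_hat]  # every degree >= 1 thanks to self-loops
    return [[A_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


# Removing the graph structure: an all-zero adjacency matrix.
zero = [[0.0] * 3 for _ in range(3)]
print(gcn_normalize(zero))  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

With the identity as the propagation matrix, a GCN layer σ(ÂHW) reduces to σ(HW), i.e., a layer of a feed-forward network applied to each node independently.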

From the table, we first observe that when the graph structure is heavily poisoned, GCN-NoGraph, which discards the graph structure, outperforms GCN. This suggests that defending against poisoning attacks on graphs is necessary, because a poisoned graph structure can be useless or even harmful to prediction. We also note that Pro-GNN obtains much better results than GCN-NoGraph. This suggests that Pro-GNN can learn useful graph structural information even when the graph is heavily poisoned.

Dataset    GCN            GCN-NoGraph    Pro-GNN
Cora       47.53 ± 1.96   62.12 ± 1.55   69.72 ± 1.69
Citeseer   56.94 ± 2.09   63.75 ± 3.23   68.95 ± 2.78
Polblogs   49.23 ± 1.36   51.79 ± 0.62   63.18 ± 4.40
Pubmed     75.50 ± 0.17   84.14 ± 0.11   86.86 ± 0.19
Table 3. Node classification accuracy (%) given the graph under 25% perturbation by metattack.
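The mean accuracies in Table 3 can be differenced directly. Doing so shows that the 22%, 12% and 14% improvements quoted in Section 5.2.1 for Cora, Citeseer and Polblogs are absolute percentage-point gains over GCN at this perturbation rate. The snippet below recomputes them, leaving out the standard deviations.

```python
# Mean accuracies (%) from Table 3: (GCN, GCN-NoGraph, Pro-GNN).
TABLE3 = {
    "Cora": (47.53, 62.12, 69.72),
    "Citeseer": (56.94, 63.75, 68.95),
    "Polblogs": (49.23, 51.79, 63.18),
    "Pubmed": (75.50, 84.14, 86.86),
}


def gains(table):
    """Percentage-point gains of Pro-GNN over GCN and over GCN-NoGraph."""
    return {
        name: (round(pro - gcn, 2), round(pro - no_graph, 2))
        for name, (gcn, no_graph, pro) in table.items()
    }


for name, (vs_gcn, vs_nograph) in gains(TABLE3).items():
    print(f"{name}: +{vs_gcn:.2f} over GCN, +{vs_nograph:.2f} over GCN-NoGraph")
```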

5.4. Ablation Study

To better understand how the different components help our model defend against adversarial attacks, we conduct ablation studies and answer the third question in this subsection.

5.4.1. Regularizers

There are four key predefined parameters, α, β, γ and λ, which control the contributions of sparsity, low rank, the GNN loss and feature smoothness, respectively. To understand the impact of each component, we vary the value of one parameter while setting the others to zero, and check how the performance changes. Correspondingly, we create four model variants: Pro-GNN-α, Pro-GNN-β, Pro-GNN-γ and Pro-GNN-λ. For example, Pro-GNN-α denotes the variant in which we vary α while setting β, γ and λ to zero. Figure 6 shows results on Cora and Citeseer only, since similar patterns are observed in the other cases.
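The following recalls how the four weights α (sparsity), β (low rank), γ (GNN loss) and λ (feature smoothness) enter the overall objective, based on our reading of the framework described earlier. The notation is assumed: A is the perturbed adjacency matrix, S is the learned adjacency matrix constrained to a feasible set 𝒮, X contains the node features, 𝒴_L the training labels, θ the GNN parameters, and L̂ is the normalized Laplacian of S. Each ablation variant keeps one weighted term and zeroes the other three weights.

```latex
\min_{\mathbf{S} \in \mathcal{S},\, \theta}\;
  \|\mathbf{A} - \mathbf{S}\|_F^2
  + \alpha \,\|\mathbf{S}\|_1
  + \beta \,\|\mathbf{S}\|_*
  + \gamma \,\mathcal{L}_{\mathrm{GNN}}(\theta, \mathbf{S}, \mathbf{X}, \mathcal{Y}_L)
  + \lambda \,\mathrm{tr}\big(\mathbf{X}^\top \hat{\mathbf{L}} \mathbf{X}\big)
```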

From the figure, we can see that Pro-GNN-α does not boost the model's performance much under small perturbations. When the perturbation becomes large, however, Pro-GNN-α outperforms vanilla GCN. As shown in Section 5.3.2, this is because it can learn a graph structure that is better than the heavily poisoned adjacency matrix. Pro-GNN-β and Pro-GNN-λ also perform much better than vanilla GCN. Notably, Pro-GNN-β outperforms all other variants except the full Pro-GNN, indicating that the nuclear norm is of great significance in reducing the impact of adversarial attacks. This is in line with our observation that adversarial attacks increase the rank of the graph and enlarge its singular values. Another observation from the figure is that Pro-GNN-γ works better under small perturbations, but its performance degrades as the perturbation rate increases. Overall, different components play different roles in defending against adversarial attacks. By incorporating all of them, Pro-GNN can exploit the graph properties and thus consistently outperform state-of-the-art baselines.
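A toy example shows how a single adversarial edge can raise the rank of a graph. The sketch below is an idealized illustration, not a reproduction of the paper's analysis. It builds two disjoint communities as all-ones blocks, with self-loops included, so that the clean matrix has rank 2. It then inserts one cross-community edge and computes the exact rank with Gaussian elimination over the rationals.

```python
from fractions import Fraction


def matrix_rank(M):
    """Exact rank via Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    rank, n_cols = 0, len(rows[0])
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in col.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def two_communities(size):
    """Two disjoint all-ones blocks (self-loops included): an idealized low-rank graph."""
    n = 2 * size
    return [[1 if (i < size) == (j < size) else 0 for j in range(n)] for i in range(n)]


clean = two_communities(3)
perturbed = [row[:] for row in clean]
perturbed[0][3] = perturbed[3][0] = 1  # one inserted cross-community edge

print(matrix_rank(clean), matrix_rank(perturbed))  # 2 4
```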

5.4.2. Two-Stage vs One-Stage

Ptb Rate (%)   0              5              10             15             20             25
Pro-GNN-two    73.31 ± 0.71   73.70 ± 1.02   73.69 ± 0.81   75.38 ± 1.10   73.22 ± 1.08   70.57 ± 0.61
Pro-GNN        82.98 ± 0.23   82.27 ± 0.45   79.03 ± 0.59   76.40 ± 1.27   73.32 ± 1.56   69.72 ± 1.69
Table 4. Classification accuracy (%) of Pro-GNN-two and Pro-GNN on the Cora dataset under metattack.

To study the contribution of jointly learning the structure and the GNN parameters, we conduct experiments with the variant Pro-GNN-two under metattack. Pro-GNN-two is the two-stage variant of Pro-GNN, which first obtains the clean graph and then trains a GNN model on it. Due to the page limit, we only show the results on Cora in Table 4. Although Pro-GNN-two achieves good performance under large perturbations, it performs much worse than Pro-GNN when the perturbation rate is relatively low (e.g., 73.70% vs. 82.27% at a 5% perturbation rate). The results demonstrate that jointly learning the structure and the GNN parameters helps defend against attacks.
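The gap between the one-stage and two-stage variants can be read directly off Table 4. The snippet below recomputes it from the mean accuracies, leaving out the standard deviations.

```python
# Mean accuracies (%) on Cora from Table 4.
rates = [0, 5, 10, 15, 20, 25]
pro_gnn_two = [73.31, 73.70, 73.69, 75.38, 73.22, 70.57]
pro_gnn = [82.98, 82.27, 79.03, 76.40, 73.32, 69.72]

# Positive values: the jointly trained Pro-GNN is ahead of the two-stage variant.
gaps = {r: round(one - two, 2) for r, one, two in zip(rates, pro_gnn, pro_gnn_two)}
print(gaps)
```

The gap is about 9.7 points on the clean graph. It shrinks steadily as the perturbation rate grows and becomes slightly negative (about -0.85) at 25%. This matches the observation that the two-stage variant only keeps up under large perturbations.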

Figure 6. Classification performance of Pro-GNN variants: (a) Cora; (b) Citeseer.

5.5. Parameter Analysis

In this subsection, we explore the sensitivity of Pro-GNN to the hyper-parameters α, β, γ and λ. In the experiments, we alter the values of α, β, γ and λ to see how they affect the performance of our model. More specifically, we vary α, γ and λ on a log scale of base 2 and β on a linear scale. We only report the results on the Cora dataset under a single metattack perturbation rate, since similar observations are made in other settings.
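Sweeping a hyper-parameter on a base-2 log scale means multiplying it by 2 at each step. The sketch below assumes that each parameter is swept separately while the others stay at fixed defaults. The start value, number of points and default values are hypothetical. They are not the ranges used for Figure 7.

```python
def log2_grid(start: float, num_points: int) -> list:
    """Return start, 2*start, 4*start, ...: num_points values evenly spaced in log2."""
    return [start * 2 ** k for k in range(num_points)]


def one_at_a_time(defaults: dict, name: str, values: list) -> list:
    """Build one config per value of `name`, keeping every other key at its default."""
    return [{**defaults, name: v} for v in values]


# Hypothetical defaults and range, for illustration only.
defaults = {"alpha": 1.0, "beta": 1.0, "gamma": 1.0, "lam": 1.0}
grid = log2_grid(0.0625, 9)  # 0.0625, 0.125, ..., 16
configs = one_at_a_time(defaults, "gamma", grid)
print(len(configs), configs[0]["gamma"], configs[-1]["gamma"])  # 9 0.0625 16.0
```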

The performance change of Pro-GNN is illustrated in Figure 7. As we can see, the accuracy of Pro-GNN can be boosted by choosing appropriate values for all the hyper-parameters. Unlike γ, the parameters α and λ boost the performance at appropriate values but greatly hurt it at large values. This is because focusing too heavily on sparsity and feature smoothness leads to an inaccurate estimate of the graph structure. For example, if we let α and λ go to infinity, we obtain the trivial solution S = 0 for the new adjacency matrix. Notably, an appropriate value of β can greatly increase the model's performance (by more than 10%) compared with the variant without β, while too large or too small a value of β hurts the performance. This is also consistent with our observation in Section 5.4.1 that the low-rank property plays an important role in defending against adversarial attacks.
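The collapse to S = 0 under a very large sparsity weight can be seen from the entrywise proximal step for the ℓ1 norm (soft-thresholding), as used in proximal-gradient methods (e.g., Beck and Teboulle, 2009). In the sketch below, the threshold tau plays the role of the scaled sparsity weight. The projection onto [0, 1] is an assumption made for illustration. It should not be read as the exact update used by Pro-GNN.

```python
def soft_threshold(x: float, tau: float) -> float:
    """Proximal operator of tau * |x|: shrink x toward zero by tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def prox_l1_clip(S, tau):
    """Entrywise soft-thresholding followed by projection onto [0, 1]."""
    return [[min(max(soft_threshold(s, tau), 0.0), 1.0) for s in row] for row in S]


S = [[0.0, 0.9, 0.2],
     [0.9, 0.0, 0.6],
     [0.2, 0.6, 0.0]]
weak_removed = prox_l1_clip(S, 0.3)  # the 0.2 entries vanish, others shrink by 0.3
collapsed = prox_l1_clip(S, 10.0)    # every entry is zeroed: the trivial S = 0
print(collapsed)  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

Any threshold at least as large as the biggest entry zeroes the whole matrix. This mirrors why an overly large sparsity weight destroys the learned structure.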

Figure 7. Results of parameter analysis on the Cora dataset.

6. Conclusion

Graph neural networks can be easily fooled by graph adversarial attacks. To defend against different types of graph adversarial attacks, we introduced Pro-GNN, a novel defense approach that learns the graph structure and the GNN parameters simultaneously. Our experiments show that our model consistently outperforms state-of-the-art baselines and improves overall robustness under various adversarial attacks. In the future, we aim to explore more graph properties to further improve the robustness of GNNs.

7. Acknowledgements

This research is supported by the National Science Foundation (NSF) under grant numbers IIS1907704, IIS1928278, IIS1714741, IIS1715940, IIS1845081, IIS1909702 and CNS1815636.

References

  • R. K. Ando and T. Zhang (2007) Learning on graph with laplacian regularization. In NeurIPS, Cited by: §4.2.
  • J. Atwood and D. Towsley (2016) Diffusion-convolutional neural networks. In NeurIPS, Cited by: §2.1.
  • P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, et al. (2018) Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261. Cited by: §2.1.
  • A. Beck and M. Teboulle (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences. Cited by: §4.4.
  • A. Bojchevski and S. Günnemann (2019) Adversarial attacks on node embeddings via graph poisoning. In ICML, Cited by: §2.2.
  • J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203. Cited by: §2.1.
  • E. J. Candès and B. Recht (2009) Exact matrix completion via convex optimization. Foundations of Computational mathematics 9 (6), pp. 717. Cited by: §4.1.
  • J. Chen, T. Ma, and C. Xiao (2018) Fastgcn: fast learning with graph convolutional networks via importance sampling. In ICLR, Cited by: §2.1.
  • P. L. Combettes and J. Pesquet (2011) Proximal splitting methods in signal processing. In Fixed-point algorithms for inverse problems in science and engineering, pp. 185–212. Cited by: §4.4.
  • H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song (2018) Adversarial attack on graph structured data. In ICML, Cited by: §1, §2.2.
  • M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In NeurIPS, Cited by: §2.1.
  • N. Entezari, S. A. Al-Sayouri, A. Darvishzadeh, and E. E. Papalexakis (2020) All you need is low (rank) defending against adversarial attacks on graphs. In WSDM, Cited by: §2.2, §4.3, 5th item, Table 1.
  • S. Fortunato (2010) Community detection in graphs. Physics reports. Cited by: §1.
  • J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl (2017) Neural message passing for quantum chemistry. In ICML, pp. 1263–1272. Cited by: §1, §2.1.
  • W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In NeurIPS, Cited by: §1, §2.1.
  • W. Jin, Y. Li, H. Xu, Y. Wang, and J. Tang (2020) Adversarial attacks and defenses on graphs: a review and empirical study. arXiv preprint arXiv:2003.00653. Cited by: §1, §2.2.
  • T. N. Kipf and M. Welling (2016a) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §1, §1, §2.1, §3, §4.2, 1st item.
  • T. N. Kipf and M. Welling (2016b) Variational graph auto-encoders. arXiv preprint arXiv:1611.07308. Cited by: §1.
  • V. Koltchinskii, K. Lounici, A. B. Tsybakov, et al. (2011) Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. The Annals of Statistics 39 (5), pp. 2302–2329. Cited by: §4.1.
  • Y. Li, W. Jin, H. Xu, and J. Tang (2020) DeepRobust: a pytorch library for adversarial attacks and defenses. arXiv preprint arXiv:2005.06149. Cited by: §5.1.2.
  • Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §1.
  • X. Liu, S. Si, J. Zhu, Y. Li, and C. Hsieh (2019) A unified framework for data poisoning attack to graph-based semi-supervised learning. In NeurIPS, Cited by: §2.2.
  • Y. Ma, S. Wang, L. Wu, and J. Tang (2019) Attacking graph convolutional networks via rewiring. arXiv preprint arXiv:1906.03750. Cited by: §2.2.
  • M. McPherson, L. Smith-Lovin, and J. M. Cook (2001) Birds of a feather: homophily in social networks. Annual review of sociology 27 (1), pp. 415–444. Cited by: §1, §4.2.
  • H. Raguet, J. Fadili, and G. Peyré (2013) A generalized forward-backward splitting. SIAM Journal on Imaging Sciences 6 (3), pp. 1199–1226. Cited by: §4.4.
  • E. Richard, P. Savalle, and N. Vayatis (2012) Estimation of simultaneously sparse and low rank matrices. In ICML, Cited by: §4.1, §4.4.
  • X. Tang, Y. Li, Y. Sun, H. Yao, P. Mitra, and S. Wang (2019) Robust graph neural network against poisoning attacks via transfer learning. arXiv preprint arXiv:1908.07558. Cited by: §1, §2.2.
  • L. Vaughan, M. E. Kipp, and Y. Gao (2007) Why are websites co-linked? the case of canadian universities. Scientometrics 72 (1), pp. 81–92. Cited by: §4.2.
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2018) Graph attention networks. In ICLR, Cited by: §1, §2.1, 2nd item.
  • F. Wu, T. Zhang, A. H. d. Souza Jr, C. Fifty, T. Yu, and K. Q. Weinberger (2019a) Simplifying graph convolutional networks. arXiv preprint arXiv:1902.07153. Cited by: §2.1.
  • H. Wu, C. Wang, Y. Tyshetskiy, A. Docherty, K. Lu, and L. Zhu (2019b) Adversarial examples for graph data: deep insights into attack and defense. In IJCAI, Cited by: §1, §2.2, §4.2, §4.3, 4th item, §5.3.1.
  • Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu (2019c) A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596. Cited by: §2.1.
  • H. Xu, Y. Ma, H. Liu, D. Deb, H. Liu, J. Tang, and A. Jain (2019) Adversarial attacks and defenses in images, graphs and text: a review. arXiv preprint arXiv:1909.08072. Cited by: §1.
  • R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec (2018) Graph convolutional neural networks for web-scale recommender systems. In KDD, Cited by: §1.
  • J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, and M. Sun (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434. Cited by: §2.1.
  • K. Zhou, H. Zha, and L. Song (2013) Learning social infectivity in sparse low-rank networks using multi-dimensional hawkes processes. In Artificial Intelligence and Statistics, pp. 641–649. Cited by: §1, §4.1.
  • D. Zhu, Z. Zhang, P. Cui, and W. Zhu (2019) Robust graph convolutional networks against adversarial attacks. In KDD, Cited by: §1, §2.2, 3rd item, §5.2.2.
  • D. Zügner, A. Akbarnejad, and S. Günnemann (2018) Adversarial attacks on neural networks for graph data. In KDD, Cited by: §1, §2.2, 1st item, §5.1.1, Table 1.
  • D. Zügner and S. Günnemann (2019a) Adversarial attacks on graph neural networks via meta learning. arXiv preprint arXiv:1902.08412. Cited by: §1, §1, §2.2, 2nd item, §5.1.1, §5.3.1, Table 1.
  • D. Zügner and S. Günnemann (2019b) Certifiable robustness and robust training for graph convolutional networks. In KDD, Cited by: §2.2.