Graph Neural Networks (GNNs) [kipf2017semi, hamilton2017inductive, velickovic2018graph] play an important role in deep learning-based graph representation learning. By extending the convolution operation to graph-structured data, GNNs show excellent performance in many applications, such as node classification [yang2016revisiting, kipf2017semi], link prediction [grover2016node2vec, schlichtkrull2018modeling], and graph classification [niepert2016learning, gilmer2017neural].
Despite the success of GNNs in graph structure learning, recent studies have shown that GNNs are vulnerable to adversarial attacks, i.e., a small perturbation of the graph can lead to a drastic performance degradation [szegedy2013intriguing, goodfellow2014explaining]. Specifically, by injecting imperceptible perturbations into the node features or the graph structure [dai2018adversarial, zugner2018adversarial, zugner2018adversarial_, xu2019topology], an attacker can easily manipulate the predictions of GNNs. Therefore, the robustness of GNNs has received increasing attention from the community. Existing methods for improving the robustness of GNNs can be divided into the following categories: adversarial training, robustness certification, and structure learning. This paper focuses on structure learning.
The low-order structure of the graph is vulnerable to adversarial attacks, and structure learning-based methods aim to mitigate the impact of such attacks and help GNNs learn the true distribution of graph structures [zheng2020robust, jin2020graph, zhang2020gnnguard]. Compared to the initial structure, the high-order graph structure, which is reflected in the powers of the adjacency matrix, is naturally more robust. Although adversarial attacks may perturb the low-order graph structure, they can hardly affect the high-order graph structure, as the attacks can only change a small fraction of the paths within the high-order graph. As a result, the high-order structural information can naturally act as a denoising filter that makes the low-order graph features smoother.
Inspired by this, we propose a novel method that incorporates high-order structural information into the learning process for robust graph structure learning. In this paper, we first analyze graph adversarial attacks from the perspective of graph feature smoothness, which is defined as the distance between connected nodes. We show both theoretically and empirically that adversarial structure perturbation essentially increases the local feature smoothness. We then devise a novel method that explores high-order structural information for graph structure learning. Intuitively, the high-order adjacency matrix reflects the common neighbors between two nodes, which can guide the structure learning. Though an adversarial attack may perturb some graph edges, it is less likely to substantially perturb the overall distribution of the high-order graph, and thus the high-order structure can be used to alleviate the influence of adversarial perturbations. Furthermore, we theoretically show that the high-order graph is effective in smoothing the graph and reducing the influence of adversarial perturbations. Our main contributions are as follows:
We analyze the graph adversarial attack from the perspective of feature smoothness, i.e., high-order graph structure is a smoother filter whose overall distribution is less influenced by the adversarial attack.
We explore the high-order graph structure to alleviate the influence of adversarial attack, which can be formulated as the normalized adjacency matrix regularization to guide the graph structure learning.
We conduct extensive experiments on several popular datasets using different types of attacks, and analyze the performance and sensitivity under different attack settings. Experimental results demonstrate that our method is a universal method to defend against graph adversarial attacks.
2 Related Work
In this section, we briefly review recent works on graph neural networks and on adversarial attacks and defenses for graph neural networks.
2.1 Graph Neural Networks.
Graph Neural Networks are deep learning-based methods for processing graph-structured data, and have shown excellent performance in many realistic tasks [yang2016revisiting, grover2016node2vec, niepert2016learning]. Generally, Graph Neural Networks can be classified into spectral and spatial methods. The spectral methods learn graph filters based on graph spectral theory [estrach2014spectral]. The method of [defferrard2016convolutional] utilizes Chebyshev polynomials as fast localized spectral filters for computational efficiency. Graph Convolution Network [kipf2017semi] exploits a localized first-order approximation of spectral filters to further simplify the filtering operation. The spatial methods directly propagate information based on message passing in the spatial domain. GraphSAGE [hamilton2017inductive] proposes an inductive learning framework on graphs that generates embeddings by sampling and aggregating neighbor information. Graph Attention Network [velickovic2018graph] applies an attention mechanism on graphs so as to learn different aggregation weights for neighbors based on their dependency. Many other GNN variants have also been proposed recently.
2.2 Adversarial Attack and Defense.
Deep learning models have been shown to be vulnerable to adversarial perturbation [szegedy2013intriguing, goodfellow2014explaining], and so have Graph Neural Networks. A large number of studies have focused on adversarial attacks and defenses recently [zheng2021graph, sun2018adversarial, he2020robustness]. Generally, attack models can be categorized into black-box and white-box attacks, based on the information about the model that the attacker can access. Typical attack methods include the following: Nettack [zugner2018adversarial] derives an incremental attack method that utilizes approximation; RL-S2V [dai2018adversarial] uses Q-learning to add or drop edges from the graph sequentially; NIPA [sun2020non] proposes a more practical node injection attack; GF-Attack [chang2020restricted] aims to attack the graph filter of given models; Metattack [zugner2018adversarial_] treats the graph structure matrix as a hyperparameter to learn; IG-JSMA [wu2019adversarial] introduces methods based on integrated gradients [sundararajan2017axiomatic]; and PGD [xu2019topology] utilizes projected gradient descent from the perspective of first-order optimization.
On the other hand, defense methods aim to eliminate the influence of adversarial attacks as much as possible. Based on the design principle, defense models can generally be divided into certificate methods, adversarial training, and structure learning. The general idea of certificate methods is to ensure that the prediction of the model does not change within a certified perturbation radius; typical methods include attribute-oriented [zugner2019certifiable], structure-oriented [bojchevski2019certifiable], sparsity-aware [bojchevski2020efficient], and collective [schuchardt2020collective] certificates. Adversarial training aims to directly improve the robustness of the model by training with adversarial examples; typical methods include RAWEN [ding2020improving] and DWNS [dai2019adversarial]. Structure learning methods aim to learn the graph structure from the perturbed graph; typical methods include NeuralSparse [zheng2020robust], which learns to remove potentially task-irrelevant edges from input graphs, Pro-GNN [jin2020graph], which explores the graph properties of sparsity, low rank and feature smoothness, and GNNGUARD [zhang2020gnnguard], which exploits the relationship between the graph structure and node features to mitigate the negative effects of attacks.
In this section, we first introduce the notations used in this paper. We then analyze how high-order structure helps defend against adversarial attacks on graphs. Lastly, we introduce the proposed high-order structure learning.
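We introduce the notations used in this paper as follows. Specifically, we use bold upper case letters for matrices, bold lower case letters for vectors, and regular letters for scalars. Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of nodes and $\mathcal{E}$ is the set of edges, let $N = |\mathcal{V}|$ denote the number of nodes. For each node $v_i \in \mathcal{V}$, we have a corresponding attribute feature vector $\mathbf{x}_i \in \mathbb{R}^{d}$, and all nodes together form an attribute feature matrix $\mathbf{X} \in \mathbb{R}^{N \times d}$, where $d$ denotes the dimension of the attribute. In addition, the adjacency matrix of $\mathcal{G}$ is defined as $\mathbf{A} \in \{0, 1\}^{N \times N}$, where $\mathbf{A}_{ij} = 1$ indicates an edge between nodes $v_i$ and $v_j$, and $\mathbf{A}_{ij} = 0$ indicates no edge. For the node classification task, a GNN can be seen as a function $f_{\theta}(\mathbf{A}, \mathbf{X})$ with parameters $\theta$ that maps the nodes into different classes, using the adjacency matrix $\mathbf{A}$ and the feature matrix $\mathbf{X}$ as inputs. A general two-layer graph convolution network can be formulated as $f_{\theta}(\mathbf{A}, \mathbf{X}) = \mathrm{softmax}\big(\hat{\mathbf{A}}\, \sigma(\hat{\mathbf{A}} \mathbf{X} \mathbf{W}_1)\, \mathbf{W}_2\big)$, where $\hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-1/2} (\mathbf{A} + \mathbf{I}) \tilde{\mathbf{D}}^{-1/2}$, $\tilde{\mathbf{D}}$ is a diagonal matrix with $\tilde{\mathbf{D}}_{ii} = 1 + \sum_{j} \mathbf{A}_{ij}$, $\mathbf{W}_1$ and $\mathbf{W}_2$ indicate trainable weight matrices, and $\sigma$ is an activation function such as ReLU [maas2013rectifier].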
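To make the notation concrete, the two-layer graph convolution network described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the implementation used in the experiments: the function names (`normalize_adjacency`, `gcn_forward`), the toy path graph, and the random weights are assumptions introduced here, and ReLU is assumed as the activation.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D~^{-1/2} (A + I) D~^{-1/2}."""
    adj_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(adj_tilde.sum(axis=1))
    return adj_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(adj, x, w1, w2):
    """Two-layer GCN: softmax(A_hat * relu(A_hat X W1) * W2)."""
    a_hat = normalize_adjacency(adj)
    hidden = np.maximum(a_hat @ x @ w1, 0.0)  # ReLU (assumed activation)
    return softmax(a_hat @ hidden @ w2)

# Hypothetical toy input: a 4-node path graph, 3 features, 2 classes.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))
w1 = rng.normal(size=(3, 5))
w2 = rng.normal(size=(5, 2))

probs = gcn_forward(adj, x, w1, w2)
print(probs.shape)  # one class distribution per node
```

Each row of `probs` is a probability distribution over classes for one node; in a trained model the weight matrices would be learned rather than sampled.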
Given an adjacency matrix poisoned by the adversarial attack, our goal is to learn the estimated adjacency matrix that could alleviate the influence of attack, and the GNN parameters for downstream tasks. Consistent with previous symbols, we use to denote the normalized estimated adjacency matrix. The Laplacian of the graph is defined as .
3.2 Adversarial Attacks Increase Smoothness
As illustrated in previous work [zhu2021improving], the attack loss increases only when removing a homophilous edge from, or adding a heterophilous edge to, the target node. In this subsection, we analyze the adversarial attack problem from the perspective of feature smoothness. We show both theoretically and empirically that the above-mentioned attacks actually increase the feature smoothness of the graph.
Similar to [jin2020graph], we define the average graph feature smoothness as follows:
where $\mathbf{x}_i$ and $\mathbf{x}_j$ are the features of nodes $v_i$ and $v_j$, respectively. This formula uses the difference between node features to describe the overall graph smoothness, where a larger distance indicates larger smoothness.
Let be a graph with adjacency matrix and feature matrix . The node features of is , where is a vector with all elements equal to , and is the uniform noise strength. We also assume a homophilous edge fraction of each neighbors that belongs to the same class and holds. The edge set can be divided into two sets and , where and . The adversarial attack leads to the increase of feature smoothness.
We consider a targeted attack, and the local smoothness is defined as:
where is the degree of node , where , and where . We then add adversarial perturbation on the graph. To simplify the graph without loss of generality, we only modify one edge to see how the smoothness changes as follows.
Add a heterophilous edge. The local feature smoothness of node becomes:
Delete a homophilous edge. The local feature smoothness of node becomes:
Therefore, either adding a heterophilous edge or deleting a homophilous edge will increase the local feature smoothness.
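The effect of the two single-edge perturbations can be checked numerically on a toy example. The sketch below is illustrative only: it assumes that the local feature smoothness of a node is the mean squared Euclidean distance between its feature and the features of its neighbors, which may differ in normalization from the definition used in the analysis, and the graph, features, and the name `local_smoothness` are hypothetical.

```python
import numpy as np

def local_smoothness(adj, x, i):
    """Mean squared feature distance between node i and its neighbors
    (assumed form of the local smoothness measure)."""
    neighbors = np.flatnonzero(adj[i])
    diffs = x[neighbors] - x[i]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

# Nodes 0, 1, 2 belong to one class; nodes 3, 4 belong to another class.
x = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [1.0, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])

# Clean graph: node 0 has two homophilous neighbors (1, 2)
# and one heterophilous neighbor (3).
clean = np.zeros((5, 5))
for u, v in [(0, 1), (0, 2), (0, 3)]:
    clean[u, v] = clean[v, u] = 1.0

# Perturbation 1: add a heterophilous edge (0, 4).
added = clean.copy()
added[0, 4] = added[4, 0] = 1.0

# Perturbation 2: delete a homophilous edge (0, 1).
deleted = clean.copy()
deleted[0, 1] = deleted[1, 0] = 0.0

base = local_smoothness(clean, x, 0)
after_add = local_smoothness(added, x, 0)
after_delete = local_smoothness(deleted, x, 0)
print(base, after_add, after_delete)
```

In this toy setting both perturbations raise the measure for node 0, in line with the statement above; the example requires that node 0 already has at least one heterophilous neighbor, otherwise deleting a homophilous edge need not change the average in this direction.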
3.2.1 Empirical Evidence.
Intuitively, adding a heterophilous edge causes the feature of a node to be aggregated with features that differ from its own, which moves the feature towards those of a different label and causes misclassification, while deleting a homophilous edge conversely removes features similar to the node's own from the aggregation. To validate our analysis, we conduct experiments and calculate the graph smoothness using the definition above. Here we conduct attack experiments with different perturbation rates on several datasets. As shown in Figure 2, the feature smoothness increases as the perturbation rate increases, which is consistent with our proposition. This phenomenon also indicates that the adversarial attack essentially increases the local smoothness of the graph to fool the model into making wrong predictions.
3.3 High-Order Graph is a Smooth Filter
The high-order graph refers to the powers of the normalized adjacency matrix. Intuitively, the high-order structure is perturbed much less than the initial graph structure. The normalized graph Laplacian is given as follows:
where the symmetric graph Laplacian is positive semi-definite.
Let be the adjacency matrix with self-loops. Denote the eigenvalues of as . For each eigenvalue , we have that the eigenvalues satisfy , i.e., for any .
Let be any real vector of unit norm, then we have
Therefore, we have that the largest eigenvalue of is upper bounded by 2, i.e., the eigenvalue of satisfy and for any . Note that the is the largest eigenvalue of , so is the eigenvalue of , and for the rest of the eigenvalues we have .
From Theorem 1, we know that after taking powers of the normalized adjacency matrix, the spectrum allows the filter to act as a low-pass filter. This filter lowers the frequencies of the graph signal, i.e., the local graph features become smoother.
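The spectral behavior described above can be inspected numerically. The following sketch is an illustration under stated assumptions rather than part of the proof: it uses a hypothetical 5-node path graph, assumes the symmetric normalization with self-loops, and the helper name `normalize_adjacency` is introduced here.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D~^{-1/2} (A + I) D~^{-1/2}."""
    adj_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(adj_tilde.sum(axis=1))
    return adj_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Hypothetical toy graph: a path on 5 nodes.
n = 5
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

a_hat = normalize_adjacency(adj)
eigenvalues = np.linalg.eigvalsh(a_hat)  # real, sorted in ascending order

# The k-th power of the filter has eigenvalues lambda^k: the eigenvalue 1
# is preserved, while every eigenvalue with magnitude below 1 shrinks.
for k in (1, 2, 3):
    print(k, np.round(eigenvalues ** k, 3))

second_order = np.linalg.matrix_power(a_hat, 2)
second_order_eigs = np.linalg.eigvalsh(second_order)
```

For this connected graph the largest eigenvalue equals 1 and the smallest stays strictly above -1, so raising the filter to a power leaves the lowest-frequency component untouched and damps the others. The even power has only non-negative eigenvalues, whereas odd powers keep the sign of each eigenvalue.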
3.4 High-Order Structure Learning
In this subsection, we explore the high-order normalized adjacency matrix for graph structure learning, which can naturally preserve the high-order structural similarity and smooth the graph. The high-order structure learning can be formulated as:
where the first term minimizes the difference between the learned adjacency matrix and the perturbed adjacency matrix. High-order terms are used to improve the structure learning. One notable difference between the second- and third-order terms is the sign of the eigenvalues: the second-order filter has only non-negative eigenvalues. Though the filter becomes smoother as the order becomes larger, a larger order also leads to the over-smoothing problem [li2018deeper], where the representations become too smooth to distinguish. In practice, we use the second- or third-order terms. From the geometric view, each element of the $k$-th order normalized adjacency matrix denotes the $k$-hop transition probability between the corresponding two nodes, where node pairs with larger connectivity are assigned larger weights. Intuitively, though the adversarial attack may add or delete an edge between two nodes, it has less influence on the distribution of the high-order adjacency power. Therefore, exploring the high-order adjacency matrix can be helpful to alleviate the influence of adversarial attacks.
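The geometric interpretation can be illustrated on a small graph. Note that the sketch below swaps in two simpler matrices for clarity, which is an assumption on our side: the unnormalized adjacency matrix, whose square counts common neighbors, and the random-walk normalization (degree-inverse times adjacency), whose powers are exact multi-hop transition probabilities. The symmetric normalization used in the method weights the same paths differently. The 4-cycle graph and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy graph: the 4-cycle 0 - 1 - 3 - 2 - 0.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)

# Entry (i, j) of A^2 counts the 2-hop paths from i to j,
# i.e. the number of common neighbors of i and j.
two_hop_counts = adj @ adj
print(two_hop_counts[0, 3])  # nodes 0 and 3 share neighbors 1 and 2

# Random-walk normalization: each row of D^{-1} A sums to one, so the
# k-th power gives k-step transition probabilities.
degree = adj.sum(axis=1)
transition = adj / degree[:, None]
two_hop_prob = np.linalg.matrix_power(transition, 2)
print(two_hop_prob[0])
```

Node pairs that are connected through more short paths receive larger entries in the power matrix, which is the sense in which well-connected pairs are assigned larger weights.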
Let be the graph Laplacian defined as , and be the 2-order graph Laplacian. For an identity feature matrix , we have .
According to the properties of Laplacian matrix, we have:
Here the second last equation is because that from the diagonal elements of is the paths number from node to itself, apparently that in a graph with self-loops there are paths, where is the degree of node . For an identity feature matrix , we have similar to , so that .
From the above corollary, we see that the trace of is larger than , which indicates the second-order filter smoothes the local structure.
3.4.1 Overall Loss Function
Since natural graphs always exhibit sparsity and low-rank properties [jin2020graph], we also add sparsity and low-rank regularization terms to the structure learning objective. We can also add the feature smoothness term when features are available. The overall loss function can then be formulated as:
where the term is the classification loss of the GNN, such as cross entropy, and is the nuclear norm. We iteratively optimize the parameters of the GNN and the learned adjacency matrix . The optimization algorithm is shown in Algorithm 1, where we use proximal operators for the optimization of the $\ell_1$ norm and the nuclear norm [beck2009fast, richard2012estimation], and is a projection function that projects to and to .
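The proximal steps mentioned above have well-known closed forms, sketched here in NumPy. This is a generic illustration of the standard operators, not a reproduction of Algorithm 1: soft-thresholding for the sparsity term (assumed to be an entrywise absolute-value penalty), singular value thresholding for the nuclear norm, and a clipping projection that is assumed to map entries into the interval from 0 to 1. The function names and the threshold value are hypothetical.

```python
import numpy as np

def prox_l1(s, alpha):
    """Soft-thresholding: proximal operator of alpha * (sum of |entries|)."""
    return np.sign(s) * np.maximum(np.abs(s) - alpha, 0.0)

def prox_nuclear(s, alpha):
    """Singular value thresholding: proximal operator of alpha * nuclear norm."""
    u, sigma, vt = np.linalg.svd(s, full_matrices=False)
    shrunk = np.maximum(sigma - alpha, 0.0)
    return (u * shrunk) @ vt  # equals U diag(shrunk) V^T

def project_unit_interval(s):
    """Clip entries into [0, 1] (assumed form of the projection step)."""
    return np.clip(s, 0.0, 1.0)

# Hypothetical intermediate estimate of the adjacency matrix
# after a gradient step on the smooth part of the loss.
s = np.array([[0.90, -0.20, 0.05],
              [-0.20, 1.30, 0.40],
              [0.05, 0.40, 0.10]])

s_sparse = prox_l1(s, alpha=0.1)
s_low_rank = prox_nuclear(s, alpha=0.1)
s_valid = project_unit_interval(s_sparse)
print(s_valid)
```

Soft-thresholding drives small entries exactly to zero, which promotes sparsity, while thresholding the singular values removes weak spectral components, which promotes low rank. How these steps are ordered and combined within each iteration is left unspecified here.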
In this section, we evaluate the effectiveness of our proposed method against different types of adversarial attacks. We compare our method with several state-of-the-art methods, and analyze the parameter sensitivity of our method.
4.1 Experimental Settings
We evaluate our method on three benchmark datasets, i.e., Cora, Citeseer and Polblogs, as shown in Table 1. Specifically, Cora and Citeseer are citation networks that both have node features, while Polblogs is a web network dataset whose node features are not available.
4.1.2 Implementation Details
To validate the effectiveness of our method, we compare it with several state-of-the-art defense methods. Our experiments are conducted based on the adversarial attack repository DeepRobust [li2020deeprobust]. Following the classical semi-supervised classification setting, we randomly choose of the nodes for training, for validation, and for testing for each graph. For each experiment, we run 10 times and report the mean performance and variance of each method. We adopt the default parameter settings for the baseline methods. We use for all the datasets for implementation.
4.2 Defense Performance
We conduct experiments on three attack settings, including non-targeted attack, targeted attack and random attack.
Non-Targeted Attack. It considers the whole graph and aims to degrade the overall performance. We adopt a recent state-of-the-art attack, Metattack [zugner2018adversarial_].
Targeted Attack. It aims to attack specific nodes. We adopt a recent state-of-the-art attack, Nettack [zugner2018adversarial].
Random Attack. It randomly perturbs the graph structure, thereby injecting random noise.
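Of the three settings, the random attack is simple enough to sketch directly. The code below is a generic illustration and not the routine from the attack repository used in the experiments: it assumes that the perturbation budget is a fraction of the number of existing edges and that each perturbation flips one randomly chosen node pair (adding the edge if absent, removing it if present). The function name `random_attack` and the ring graph are hypothetical.

```python
import numpy as np

def random_attack(adj, rate, seed=0):
    """Flip round(rate * num_edges) random node pairs in a symmetric graph."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    rows, cols = np.triu_indices(n, k=1)       # all unordered node pairs
    n_edges = int(adj[rows, cols].sum())
    n_flips = int(round(rate * n_edges))
    chosen = rng.choice(len(rows), size=n_flips, replace=False)
    r, c = rows[chosen], cols[chosen]
    perturbed = adj.copy()
    perturbed[r, c] = 1 - perturbed[r, c]      # add or remove the edge
    perturbed[c, r] = perturbed[r, c]          # keep the matrix symmetric
    return perturbed

# Hypothetical toy graph: a ring on 6 nodes (6 edges).
n = 6
ring = np.zeros((n, n))
for i in range(n):
    ring[i, (i + 1) % n] = ring[(i + 1) % n, i] = 1.0

perturbed = random_attack(ring, rate=0.5, seed=0)
n_changed = int(np.triu(perturbed != ring, k=1).sum())
print(n_changed)
```

Unlike the non-targeted and targeted attacks, this perturbation uses no information about the model or the labels, so it serves as a noise baseline rather than a worst-case adversary.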
4.2.1 Non-targeted Attack
We first evaluate the performance of each method against non-targeted attack. In this experiment, we adopt Metattack for implementation, and keep all the parameter settings of the original paper. To test the performance under different degrees of perturbation, we vary the perturbation rate from to with the step size of . For each experiment, we run times with different seeds and report the mean accuracy and variance of each method, which is shown in Table 2. We highlight the best performance in bold under each perturbation rate. From this table, we can make the following observations:
Our method consistently outperforms other methods under different attack degrees. For example, on each dataset, our method achieves different degrees of improvement over ProGNN, and the improvement is more pronounced at larger perturbation rates. Compared to vanilla GCN, our method improves , and on Cora, Citeseer and Polblogs, respectively.
On datasets without features, our method improves more over the baseline methods. For example, on Polblogs, our method improves substantially compared to these methods. Especially under perturbation rate, our method improves compared to baselines.
4.2.2 Targeted Attack
We then evaluate the performance of each method against targeted attack. In this experiment, we adopt Nettack for attack implementation and use the default parameter settings. We vary the number of perturbations on every targeted node from 1 to 5 with the step size of 1, and the nodes with degree larger than are chosen as targets.
As shown in Figure 3, our method outperforms the baseline methods under different degrees of attack. Moreover, compared to the baselines, the performance of our method declines more slowly as the number of perturbations increases.
4.2.3 Random Attack
We evaluate the performance of each method against random attack. In this experiment, we vary the random noise from to with the step size of . As shown in Figure 4, our method also performs well under random attack. Moreover, our method performs better under larger perturbation rates, which indicates that it can better defend against random noise.
4.3 Parameter Analysis
In this part, we explore the sensitivity of our method to the hyper-parameter of each order. We first fix the hyper-parameters and observe how the performance changes with the perturbation rate. We then fix the perturbation rate and check the sensitivity to each hyper-parameter.
4.3.1 Parameter Effect
In this experiment, we use only one order of regularization term at a time and see how each affects the performance of our method. As shown in Figure 5, when the perturbation rate is small, the low-order structure hyper-parameter contributes more to the classification performance. As the perturbation rate increases, the effect of drops sharply, and the high-order structure hyper-parameters and gradually perform better than the low-order structure. For the high-order parameters, performs better when the perturbation is at a small level, while has better performance when the perturbation is relatively large.
4.3.2 Parameter Sensitivity
We then analyze the sensitivity of different hyper-parameters. In this part, we set the perturbation rate fixed, and analyze the performance by adjusting the value of hyper-parameters. For illustration, we use as the perturbation rate for all datasets, and tune the hyper-parameter to see the performance. The result is shown in Figure 6.
As shown in Figure 6, the accuracy of can be boosted when choosing appropriate values for all the hyper-parameters. For all three datasets, always gets the best performance; is more sensitive to the value of the hyper-parameter; seems to contribute less at a high perturbation level compared to the other two parameters. Specifically, seems to help very little on the Citeseer dataset at a high perturbation level, while and have similar effects.
Graph Neural Networks are vulnerable to adversarial attacks, where a small structure perturbation can fool GNNs into making wrong predictions. To improve the robustness of GNNs, we analyze the adversarial attack problem from the perspective of feature smoothing. We prove that the high-order graph is a smooth filter, which can be used to defend against adversarial attacks on graphs. Therefore, we propose a novel structure learning method that explores the high-order structure of the graph to help the learning process. Extensive experiments on defending against graph adversarial attacks show that our method can effectively improve the robustness of GNNs, and that the proposed method outperforms recent state-of-the-art methods by a clear margin.