Exploring High-Order Structure for Robust Graph Structure Learning

03/22/2022
by Guangqian Yang, et al.

Recent studies show that Graph Neural Networks (GNNs) are vulnerable to adversarial attacks, i.e., an imperceptible structure perturbation can fool GNNs into making wrong predictions. Some studies exploit specific properties of clean graphs, such as feature smoothness, to defend against such attacks, but these properties have not been well analyzed. In this paper, we analyze adversarial attacks on graphs from the perspective of feature smoothness, which further leads to an efficient new adversarial defense algorithm for GNNs. We discover that the high-order graph structure acts as a smoother filter for processing graph structures. Intuitively, the high-order graph structure reflects the number of paths between nodes, where a larger number indicates a closer connection, so it naturally helps defend against adversarial perturbations. Further, we propose a novel algorithm that incorporates high-order structural information into graph structure learning. We perform experiments on three popular benchmark datasets, Cora, Citeseer and Polblogs. Extensive experiments demonstrate the effectiveness of our method in defending against graph adversarial attacks.


1 Introduction

Graph Neural Networks (GNNs) [kipf2017semi, hamilton2017inductive, velickovic2018graph] play an important role in deep learning-based graph representation learning. By extending the convolution operation to graph-structured data, GNNs show excellent performance in many applications, such as node classification [yang2016revisiting, kipf2017semi], link prediction [grover2016node2vec, schlichtkrull2018modeling], and graph classification [niepert2016learning, gilmer2017neural].

Despite the success of GNNs, recent studies have shown that they are vulnerable to adversarial attacks, i.e., a small perturbation of the graph can lead to a drastic performance degradation [szegedy2013intriguing, goodfellow2014explaining]. Specifically, by injecting imperceptible perturbations into node features or the graph structure [dai2018adversarial, zugner2018adversarial, zugner2018adversarial_, xu2019topology], an attacker can easily manipulate the predictions of GNNs. Therefore, the robustness of GNNs has received increasing attention from the community. Existing methods for improving the robustness of GNNs fall into three categories: adversarial training, robustness certification, and structure learning. This paper focuses on structure learning.

The low-order structure of a graph is easily corrupted by adversarial attacks, and structure learning-based methods aim to mitigate the impact of such attacks and help GNNs learn the true graph structure [zheng2020robust, jin2020graph, zhang2020gnnguard]. Compared to the initial structure, the high-order graph structure, which is reflected in the powers of the adjacency matrix, is naturally more robust. Although adversarial attacks may perturb the low-order graph structure, they can hardly affect the high-order graph structure, since the attacks can only change a small fraction of the paths within the high-order graph. As a result, high-order structural information can naturally act as a denoising filter that makes the low-order graph features smoother.

Inspired by this, we propose a novel method that incorporates high-order structural information into the learning process for robust graph structure learning. We first analyze graph adversarial attacks from the perspective of graph feature smoothness, which is defined via the feature distance between connected nodes. We show, both theoretically and empirically, that adversarial structure perturbations essentially increase the local feature smoothness. We then devise a novel method that explores high-order structural information for graph structure learning. Intuitively, the high-order adjacency matrix reflects the common neighbors of two nodes, which can guide structure learning. Although an adversarial attack may perturb some graph edges, it is unlikely to change the overall distribution of the high-order graph much, so the high-order structure can be used to alleviate the influence of adversarial perturbations. Furthermore, we theoretically show that the high-order graph is effective in smoothing the graph and eliminating the influence of adversarial perturbations. Our main contributions are as follows:

  • We analyze graph adversarial attacks from the perspective of feature smoothness and show that the high-order graph structure is a smoother filter whose overall distribution is less influenced by adversarial attacks.

  • We explore the high-order graph structure to alleviate the influence of adversarial attacks, which we formulate as a normalized adjacency matrix regularization that guides graph structure learning.

  • We conduct extensive experiments on several popular datasets under different types of attacks, and analyze the performance and parameter sensitivity under different attack settings. Experimental results demonstrate that our method consistently defends against different types of graph adversarial attacks.

2 Related Work

In this section, we briefly review recent work on graph neural networks and on adversarial attacks and defenses for graph neural networks.

2.1 Graph Neural Networks.

Graph Neural Networks are deep learning-based methods for processing graph-structured data and have shown excellent performance in many real-world tasks [yang2016revisiting, grover2016node2vec, niepert2016learning]. Generally, GNNs can be classified into spectral and spatial methods. Spectral methods learn graph filters based on spectral graph theory. The work in [estrach2014spectral] first generalizes Convolutional Neural Networks to graph signals based on hierarchical clustering and the graph Laplacian. ChebNet [defferrard2016convolutional] uses Chebyshev polynomials as fast localized spectral filters for computational efficiency. Graph Convolutional Network [kipf2017semi] exploits a localized first-order approximation of spectral filters to further simplify the filtering operation. Spatial methods directly propagate information via message passing in the spatial domain. GraphSAGE [hamilton2017inductive] proposes an inductive learning framework on graphs that generates embeddings by sampling and aggregating neighbor information. Graph Attention Network [velickovic2018graph] applies the attention mechanism to graphs to learn different aggregation weights for neighbors based on their dependencies. Many other methods have also been proposed recently.

2.2 Adversarial Attack and Defense.

Deep learning models have been shown to be vulnerable to adversarial perturbations [szegedy2013intriguing, goodfellow2014explaining], and so are Graph Neural Networks. A large body of research has recently focused on adversarial attacks and defenses on graphs [zheng2021graph, sun2018adversarial, he2020robustness]. Generally, attack models can be categorized into black-box and white-box attacks, depending on the information about the model that the attacker can access. Typical attack methods are as follows: Nettack [zugner2018adversarial] derives an efficient incremental attack based on an approximation of the model; RL-S2V [dai2018adversarial] uses Q-learning to add or drop edges from the graph sequentially; NIPA [sun2020non] proposes a more practical node injection attack; GF-Attack [chang2020restricted] attacks the graph filter of given models; Metattack [zugner2018adversarial_] treats the graph structure matrix as a hyperparameter to learn; IG-JSMA [wu2019adversarial] introduces methods based on integrated gradients [sundararajan2017axiomatic]; and PGD [xu2019topology] applies projected gradient descent from the perspective of first-order optimization.

On the other side, defense methods aim to eliminate the influence of adversarial attacks as much as possible. Based on their design principles, defense models can generally be divided into certification methods, adversarial training, and structure learning. The general idea of certification methods is to guarantee that the prediction of the model does not change within a certified perturbation radius; typical methods include attribute-oriented [zugner2019certifiable], structure-oriented [bojchevski2019certifiable], sparsity-aware [bojchevski2020efficient], and collective [schuchardt2020collective] certificates. Adversarial training aims to directly improve the robustness of the model by training with adversarial examples; typical methods include RAWEN [ding2020improving] and DWNS [dai2019adversarial]. Structure learning methods aim to learn the graph structure from the perturbed graph; typical methods include NeuralSparse [zheng2020robust], which learns to remove potentially task-irrelevant edges from input graphs, Pro-GNN [jin2020graph], which explores the graph properties of sparsity, low rank, and feature smoothness, and GNNGUARD [zhang2020gnnguard], which exploits the relationship between the graph structure and node features to mitigate the negative effects of attacks.

Figure 1: Workflow of our method. The (perturbed) input graph is used both to learn an estimated graph and to obtain the high-order graph. The high-order graph can be viewed as a weighted graph, in which nodes with larger connectivity are more closely linked. It is further used to guide the graph structure learning process.

3 Methods

In this section, we first introduce the notation used in this paper. We then analyze how the high-order structure helps defend against adversarial attacks on graphs. Lastly, we introduce the proposed high-order structure learning method.

3.1 Preliminaries

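We introduce the notation used in this paper as follows. We use bold uppercase letters for matrices, bold lowercase letters for vectors, and regular letters for scalars. Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{v_1, \dots, v_N\}$ is the set of nodes and $\mathcal{E}$ is the set of edges, let $N$ denote the number of nodes. Each node $v_i$ has a corresponding attribute feature vector $\mathbf{x}_i \in \mathbb{R}^{d}$, and the feature vectors of all nodes form an attribute feature matrix $\mathbf{X} \in \mathbb{R}^{N \times d}$, where $d$ denotes the attribute dimension. The adjacency matrix of $\mathcal{G}$ is $\mathbf{A} \in \{0, 1\}^{N \times N}$, where $\mathbf{A}_{ij} = 1$ indicates an edge between nodes $v_i$ and $v_j$, and $\mathbf{A}_{ij} = 0$ indicates no edge. For the node classification task, a GNN can be seen as a function $f_\theta$ with parameters $\theta$ that maps the nodes to classes, taking the adjacency matrix $\mathbf{A}$ and the feature matrix $\mathbf{X}$ as inputs. A general two-layer graph convolutional network can be formulated as $f_\theta(\mathbf{A}, \mathbf{X}) = \mathrm{softmax}\big(\hat{\mathbf{A}}\, \sigma(\hat{\mathbf{A}} \mathbf{X} \mathbf{W}_1)\, \mathbf{W}_2\big)$, where $\hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-1/2} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-1/2}$ with $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$, $\tilde{\mathbf{D}}$ is a diagonal matrix with $\tilde{\mathbf{D}}_{ii} = \sum_j \tilde{\mathbf{A}}_{ij}$, $\mathbf{W}_1$ and $\mathbf{W}_2$ are trainable weight matrices, and $\sigma(\cdot)$ is an activation function such as ReLU [maas2013rectifier].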
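To make the two-layer GCN formula concrete, here is a minimal pure-Python sketch of the symmetric normalization $\hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-1/2}(\mathbf{A}+\mathbf{I})\tilde{\mathbf{D}}^{-1/2}$ and the forward pass. It uses dense lists instead of a tensor library. The helper names (`normalize_adjacency`, `gcn_forward`) and the toy weights are hypothetical and serve only as an illustration.

```python
import math

def normalize_adjacency(adj):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a dense 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # degrees counted with the added self-loop
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(p, q):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def gcn_forward(adj, x, w1, w2):
    """softmax(A_hat . ReLU(A_hat . X . W1) . W2), i.e. the two-layer GCN above."""
    a_hat = normalize_adjacency(adj)
    hidden = relu(matmul(matmul(a_hat, x), w1))
    return softmax_rows(matmul(matmul(a_hat, hidden), w2))

# Toy usage: a 3-node path graph, 2 input features, 2 classes, hand-picked weights.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
probs = gcn_forward(adj, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                    [[1.0, -1.0], [0.5, 2.0]], [[1.0, 0.0], [0.0, 1.0]])
```

A real implementation would use sparse tensors and learned weights. The dense loops here only mirror the formula term by term.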

Given an adjacency matrix $\mathbf{A}$ poisoned by an adversarial attack, our goal is to learn an estimated adjacency matrix $\mathbf{S}$ that alleviates the influence of the attack, together with the GNN parameters $\theta$ for downstream tasks. Consistent with the notation above, we use $\hat{\mathbf{S}}$ to denote the normalized estimated adjacency matrix. The Laplacian of the graph is defined as $\mathbf{L} = \mathbf{D} - \mathbf{A}$, where $\mathbf{D}$ is the diagonal degree matrix.

3.2 Adversarial Attacks Increase Smoothness

As shown in previous work [zhu2021improving], the attack loss increases only when a homophilous edge is removed from, or a heterophilous edge is added to, the target node $v_i$. In this subsection, we analyze the adversarial attack problem from the perspective of feature smoothness. We show, both theoretically and empirically, that such perturbations increase the feature smoothness of the graph.

Similar to [jin2020graph], we define the average graph feature smoothness as follows:

(1)

where $\mathbf{x}_i$ and $\mathbf{x}_j$ are the features of nodes $v_i$ and $v_j$, respectively. This formula uses the differences between node features to characterize the overall graph smoothness, where a larger distance indicates larger smoothness.
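Eq. (1) did not survive extraction. The sketch below therefore assumes, in the spirit of [jin2020graph], that the average smoothness is the mean squared Euclidean distance between the features of connected nodes. The function name and the toy graph are hypothetical.

```python
def average_smoothness(edges, features):
    """Mean squared Euclidean feature distance over the undirected edges of the graph.

    Assumed form of Eq. (1): larger values mean neighbouring features differ more.
    """
    if not edges:
        return 0.0
    total = 0.0
    for i, j in edges:
        total += sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
    return total / len(edges)

features = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
clean_edges = [(0, 1), (1, 2)]
perturbed_edges = clean_edges + [(0, 2)]  # one extra edge between the two most distant nodes
```

Adding the edge (0, 2) raises the value from 1.0 to 4/3. This matches the convention above that a larger distance means larger smoothness.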

Proposition 1

Let $\mathcal{G}$ be a graph with adjacency matrix $\mathbf{A}$ and feature matrix $\mathbf{X}$. The node features of are , where is a vector with all elements equal to , and is the uniform noise strength. We also assume that a fraction of each node's neighbors belong to the same class (homophilous edges) and that holds. The edge set can be divided into two sets, and , where and . Then the adversarial attack increases the feature smoothness.

We consider a targeted attack on node $v_i$, whose local smoothness is defined as:

(2)

where $d_i$ is the degree of node $v_i$, where , and where . We then add an adversarial perturbation to the graph. Without loss of generality, we modify only one edge and examine how the local smoothness changes.

  • Add a heterophilous edge. The local feature smoothness of node $v_i$ becomes:

  • Delete a homophilous edge. The local feature smoothness of node $v_i$ becomes:

Therefore, either adding a heterophilous edge or deleting a homophilous edge increases the local feature smoothness.
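Assume the local smoothness in Eq. (2) is the average squared distance between $v_i$ and its $d_i$ neighbors. Then the proposition reduces to a fact about averages. Adding a neighbor whose distance exceeds the current average raises the average. Removing a neighbor whose distance is below the current average also raises it. The hypothetical toy check below places class-0 features near 0 and class-1 features near 1. Both the averaging form and the feature values are assumptions.

```python
def local_smoothness(node, neighbors, features):
    """Average squared distance between `node` and its neighbors (assumed form of Eq. (2))."""
    xi = features[node]
    dists = [sum((a - b) ** 2 for a, b in zip(xi, features[j])) for j in neighbors]
    return sum(dists) / len(dists)

# Hypothetical toy setting: class-0 nodes sit near 0.0, class-1 nodes near 1.0.
features = {0: [0.0], 1: [0.1], 2: [-0.1], 3: [0.1], 4: [1.0], 5: [0.9]}
neighbors = [1, 2, 3, 4]  # node 0 has three homophilous neighbors and one heterophilous one

clean = local_smoothness(0, neighbors, features)
after_add = local_smoothness(0, neighbors + [5], features)  # add heterophilous edge (0, 5)
after_delete = local_smoothness(0, [2, 3, 4], features)     # delete homophilous edge (0, 1)
```

Here the clean local smoothness is 0.2575. Adding the heterophilous edge gives 0.368, and deleting the homophilous edge gives 0.34. Both perturbations move the value upward, as the proposition states.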

Figure 2: The influence of different perturbation rates on the smoothness.

3.2.1 Empirical Evidence.

Intuitively, adding a heterophilous edge causes node $v_i$ to aggregate features that differ from its own. This moves its features toward those of a different label and causes misclassification. Deleting a homophilous edge works the other way around: it removes features similar to the node's own. To validate our analysis, we conduct attack experiments with different perturbation rates on several datasets and calculate the graph smoothness by Eq. (1). As shown in Figure 2, the feature smoothness increases with the perturbation rate, which is consistent with our proposition. This phenomenon also indicates that adversarial attacks essentially increase the local smoothness of the graph to fool the model into making wrong predictions.

3.3 High-Order Graph is a Smooth Filter

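The high-order graph refers to the powers of the normalized adjacency matrix. Intuitively, an adversarial perturbation changes the high-order structure much less than the original graph structure. The normalized graph Laplacian is defined as

$\tilde{\mathbf{L}} = \mathbf{I} - \tilde{\mathbf{D}}^{-1/2} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-1/2} = \mathbf{I} - \hat{\mathbf{A}}, \qquad (3)$

where $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$, $\tilde{\mathbf{D}}$ is its diagonal degree matrix, and the symmetric normalized Laplacian $\tilde{\mathbf{L}}$ is positive semi-definite.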

Theorem 1

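Let $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ be the adjacency matrix with self-loops, and denote the eigenvalues of $\hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-1/2} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-1/2}$ as $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$. Then every eigenvalue satisfies $-1 < \lambda_i \le 1$, i.e., $-1 < \lambda_i^k \le 1$ for any positive integer $k$.

Proof. Let $\mathbf{x}$ be any real vector of unit norm, and let $\tilde{d}_i = \tilde{\mathbf{D}}_{ii}$. Then

$\mathbf{x}^\top \tilde{\mathbf{L}} \mathbf{x} = \frac{1}{2} \sum_{i,j} \tilde{\mathbf{A}}_{ij} \left( \frac{x_i}{\sqrt{\tilde{d}_i}} - \frac{x_j}{\sqrt{\tilde{d}_j}} \right)^2 \le \sum_{i,j} \tilde{\mathbf{A}}_{ij} \left( \frac{x_i^2}{\tilde{d}_i} + \frac{x_j^2}{\tilde{d}_j} \right) = 2,$

where equality would require $x_i = -x_i$ for every node because of the self-loops, which is impossible for a unit vector. Therefore, the largest eigenvalue of $\tilde{\mathbf{L}}$ is strictly smaller than 2, so the eigenvalues of $\hat{\mathbf{A}} = \mathbf{I} - \tilde{\mathbf{L}}$ satisfy $\lambda_i > -1$. They also satisfy $\lambda_i \le 1$ because $\tilde{\mathbf{L}}$ is positive semi-definite. Note that $\lambda_1 = 1$ is the largest eigenvalue of $\hat{\mathbf{A}}$ (with eigenvector $\tilde{\mathbf{D}}^{1/2}\mathbf{1}$), so $1$ is also an eigenvalue of $\hat{\mathbf{A}}^k$. For a connected graph, the remaining eigenvalues satisfy $|\lambda_i| < 1$, so $\lambda_i^k \to 0$ as $k$ grows.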

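From Theorem 1, by taking powers $\hat{\mathbf{A}}^k$, whose spectrum is $\{\lambda_i^k\}$, the filter acts as a low-pass filter. Every component whose eigenvalue has magnitude below 1 is attenuated as $k$ grows. The component with $\lambda_1 = 1$ is preserved; it has zero frequency, since $\tilde{\mathbf{L}}$ has eigenvalues $1 - \lambda_i$. This filter lowers the frequencies of the graph signal, i.e., the local graph features become smoother.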
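Here is a small numerical check of Theorem 1 and of the low-pass claim, written as a sketch on a hypothetical 6-node path graph. Rayleigh quotients of $\tilde{\mathbf{L}}$ for random vectors should stay in $[0, 2)$. The quadratic form $\mathbf{x}^\top\tilde{\mathbf{L}}\mathbf{x}$ evaluated at $\hat{\mathbf{A}}^k\mathbf{x}$, i.e. its non-smoothness, should shrink as $k$ grows for an alternating, high-frequency signal. Function names are illustrative only.

```python
import math
import random

def normalized_adjacency(n, edges):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2} for an undirected graph given as an edge list."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def energy(a_hat, x):
    """Quadratic form x^T (I - A_hat) x of the normalized Laplacian in Eq. (3)."""
    ax = matvec(a_hat, x)
    return sum(v * v for v in x) - sum(v * w for v, w in zip(x, ax))

n = 6
a_hat = normalized_adjacency(n, [(i, i + 1) for i in range(n - 1)])  # path graph

# Theorem 1: Rayleigh quotients of I - A_hat lie in [0, 2), so eigenvalues of A_hat lie in (-1, 1].
rng = random.Random(0)
quotients = []
for _ in range(1000):
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    quotients.append(energy(a_hat, x) / sum(v * v for v in x))

# Low-pass behaviour: the energy of A_hat^k x shrinks as k grows for a high-frequency signal.
signal = [(-1.0) ** i for i in range(n)]  # alternating signs: maximally non-smooth
energies = []
for k in range(4):
    energies.append(energy(a_hat, signal))  # energy of A_hat^k x
    signal = matvec(a_hat, signal)
```

In the eigenbasis of $\hat{\mathbf{A}}$, this energy equals $\sum_i c_i^2 \lambda_i^{2k}(1-\lambda_i)$, where $c_i$ are the coordinates of $\mathbf{x}$. It therefore cannot increase with $k$ once $|\lambda_i| \le 1$.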

3.4 High-Order Structure Learning

In this subsection, we explore the high-order normalized adjacency matrix for graph structure learning, which naturally preserves high-order structural similarity and smooths the graph. The high-order structure learning can be formulated as:

(4)

where the first term minimizes the difference between the learned adjacency matrix and the perturbed adjacency matrix, and the high-order terms are used to improve structure learning. One notable difference between the second- and third-order terms is the sign of the eigenvalues: the second-order filter has only non-negative eigenvalues. Although the filter becomes smoother as $k$ grows, a large $k$ also leads to the over-smoothing problem [li2018deeper], where the representations become too smooth to distinguish. In practice, we use $k = 2$ or $k = 3$. From a geometric view, the $(i, j)$ element of the $k$-th power of the normalized adjacency matrix reflects the $k$-hop transition probability between nodes $v_i$ and $v_j$, so node pairs with larger connectivity are assigned larger weights. Intuitively, although an adversarial attack may add or delete an edge between nodes $v_i$ and $v_j$, it has less influence on the distribution of the high-order adjacency powers. Therefore, exploring the high-order adjacency matrix can help alleviate the influence of adversarial attacks.
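The robustness intuition can be seen on integer path counts. The sketch below uses the unnormalized $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ instead of the normalized matrix used in Eq. (4). With this choice, $(\tilde{\mathbf{A}}^k)_{ij}$ is exactly the number of length-$k$ walks from $v_i$ to $v_j$, because self-loops let a walk pause. The toy graph is hypothetical. In it, nodes 0 and 1 are linked and share three neighbors, and the simulated attack deletes the edge between them.

```python
def matmul(p, q):
    """Dense integer matrix product of two list-of-lists matrices."""
    n, m, r = len(p), len(q), len(q[0])
    return [[sum(p[i][k] * q[k][j] for k in range(m)) for j in range(r)] for i in range(n)]

def adjacency_with_self_loops(n, edges):
    """A + I for an undirected graph given as an edge list."""
    a = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a

# Hypothetical toy graph: nodes 0 and 1 are linked and share neighbours 2, 3 and 4.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 5)]
clean = adjacency_with_self_loops(6, edges)
attacked = adjacency_with_self_loops(6, [e for e in edges if e != (0, 1)])  # delete edge (0, 1)

first_order = (clean[0][1], attacked[0][1])                                # direct connection
second_order = (matmul(clean, clean)[0][1], matmul(attacked, attacked)[0][1])  # 2-hop walks
```

Deleting the edge removes the first-order connection entirely (1 to 0). The second-order count only drops from 5 to 3, because the three common neighbors still provide 2-hop walks.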

Corollary 1

Let be the graph Laplacian defined as , and be the 2-order graph Laplacian. For an identity feature matrix , we have .

According to the properties of Laplacian matrix, we have:

Here, the second-to-last equality holds because the diagonal elements of are the numbers of paths from a node to itself; in a graph with self-loops there are such paths, where is the degree of node . For an identity feature matrix , we have similar to , so that .

From the above corollary, we see that the trace of is larger than that of , which indicates that the second-order filter smooths the local structure.

3.4.1 Overall Loss Function

Since real-world graphs typically exhibit sparsity and low-rank properties [jin2020graph], we also add sparsity and low-rank regularization terms to structure learning. We can also add the feature smoothness term when node features are available. The overall loss function can then be formulated as:

where the term is the classification loss of the GNN, such as cross-entropy, and is the nuclear norm. We iteratively optimize the GNN parameters and the learned adjacency matrix. The optimization procedure is shown in Algorithm 1. We use proximal operators to optimize the $\ell_1$ norm and the nuclear norm [beck2009fast, richard2012estimation], and is a projection function that projects to and to .

Input: Adjacency matrix , feature matrix , labels , hyper-parameters , , , , , Learning rate ,
Output: Learned adjacency matrix , GNN parameters .

1:  Initialize
2:  Randomly initialize
3:  while stopping condition is not met do
4:     
5:     
6:     
7:     
8:     for  to  do
9:        
10:     end for
11:  end while
12:  return  ,
Algorithm 1 Optimization Algorithm.
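The update steps of Algorithm 1 were lost in extraction. As a hedged illustration of the ingredients named above, the sketch below performs one proximal-gradient update on a learned adjacency matrix $\mathbf{S}$ for a simplified objective $\|\mathbf{S}-\mathbf{A}\|_F^2 + \alpha\|\mathbf{S}\|_1$, where $\alpha$ is a hypothetical weight. The update consists of a gradient step, the $\ell_1$ proximal operator (soft-thresholding), and a projection that symmetrizes the matrix and clips its entries to $[0, 1]$. The step order, the projection, and all names are assumptions modeled on Pro-GNN-style optimization, not the authors' exact algorithm. The GNN loss, the high-order and smoothness terms, and the nuclear-norm proximal step (which requires an SVD) are omitted.

```python
import math

def soft_threshold(s, tau):
    """Proximal operator of tau * ||S||_1: shrink each entry toward zero by tau."""
    return [[math.copysign(abs(v) - tau, v) if abs(v) > tau else 0.0 for v in row] for row in s]

def project(s):
    """Hypothetical projection: symmetrize S, then clip every entry to [0, 1]."""
    n = len(s)
    return [[min(1.0, max(0.0, 0.5 * (s[i][j] + s[j][i]))) for j in range(n)] for i in range(n)]

def structure_step(s, a, lr, l1_weight):
    """One proximal-gradient step on ||S - A||_F^2 + l1_weight * ||S||_1, then projection.

    The GNN loss, the high-order terms of Eq. (4), the smoothness term and the
    nuclear-norm proximal step (which needs an SVD) are deliberately left out.
    """
    n = len(s)
    # Gradient of ||S - A||_F^2 is 2 (S - A).
    moved = [[s[i][j] - lr * 2.0 * (s[i][j] - a[i][j]) for j in range(n)] for i in range(n)]
    return project(soft_threshold(moved, lr * l1_weight))

# Toy usage: a 2-node graph whose estimate S has drifted away from A.
a = [[0.0, 1.0], [1.0, 0.0]]
s = [[0.2, 0.6], [0.4, 0.0]]
s_next = structure_step(s, a, lr=0.1, l1_weight=0.5)
```

After one step, the off-diagonal entries are pulled toward the observed edge and made symmetric (0.55 each), and the spurious diagonal entry shrinks from 0.2 to 0.11.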

4 Experiments

In this section, we evaluate the effectiveness of our proposed method against different types of adversarial attacks. We compare our method with several state-of-the-art methods, and analyze the parameter sensitivity of our method.

4.1 Experimental Settings

4.1.1 Datasets

We evaluate our method on three benchmark datasets, Cora, Citeseer, and Polblogs, whose statistics are listed in Table 1. Cora and Citeseer are citation networks with node features, while Polblogs is a web network dataset whose node features are not available.

Dataset Nodes Edges Features Classes
Cora 2,708 5,429 1,433 7
Citeseer 3,327 4,732 3,703 6
Polblogs 1,222 16,714 - 2
Table 1: Description of datasets used for node classification.

4.1.2 Implementation Details

To validate the effectiveness of our method, we compare it with several state-of-the-art defense methods. Our experiments are built on the adversarial attack repository DeepRobust [li2020deeprobust]. Following the classical semi-supervised classification setting, we randomly choose of the nodes for training, for validation, and for testing in each graph. We run each experiment 10 times and report the mean performance and variance of each method. We adopt the default parameter settings for the baseline methods. We use for all datasets in our implementation.

Dataset Ptb Rate GCN GAT RGCN Jaccard ProGNN ElasticGNN Ours
Cora 0.00
0.05
0.10
0.15
0.20
0.25
Citeseer 0.00
0.05
0.10
0.15
0.20
0.25
Polblogs 0.00
0.05
0.10
0.15
0.20
0.25
Table 2: Experiment results of node classification tasks against Metattack.

4.2 Defense Performance

We conduct experiments under three attack settings: non-targeted attacks, targeted attacks, and random attacks.

  • Non-Targeted Attack. It considers the whole graph and aims to degrade the overall performance. We adopt a recent state-of-the-art attack, Metattack [zugner2018adversarial_].

  • Targeted Attack. It aims to attack specific nodes. We adopt a recent state-of-the-art attack, Nettack [zugner2018adversarial].

  • Random Attack. It randomly perturbs the graph structure, i.e., it injects random noise into the graph.
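For the random attack setting, here is a minimal sketch of one possible perturbation scheme: inject a given number of fake edges between uniformly chosen, previously unconnected node pairs. The text does not state whether the experiments add, remove, or flip edges, so the add-only choice and the function name are assumptions.

```python
import random

def random_add_attack(n, edges, n_perturbations, seed=0):
    """Hypothetical random attack: insert `n_perturbations` new undirected edges uniformly at random.

    Returns the perturbed edge list as sorted (i, j) tuples with i < j.
    """
    existing = {tuple(sorted(e)) for e in edges}
    capacity = n * (n - 1) // 2 - len(existing)  # number of node pairs not yet connected
    if n_perturbations > capacity:
        raise ValueError("not enough non-edges to add")
    rng = random.Random(seed)
    added = set()
    while len(added) < n_perturbations:
        i, j = rng.sample(range(n), 2)  # two distinct nodes, so no self-loops
        pair = (min(i, j), max(i, j))
        if pair not in existing:
            added.add(pair)
    return sorted(existing | added)

# Toy usage: a 10-node path graph with 5 random fake edges injected.
clean = [(i, i + 1) for i in range(9)]
perturbed = random_add_attack(10, clean, n_perturbations=5, seed=42)
```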

Figure 3: The classification accuracy against Nettack.
Figure 4: The classification accuracy against random attack.
Figure 5: The classification accuracy of different hyper-parameters with the increase of perturbation rate.
Figure 6: The parameter sensitivity of each hyper-parameter on three datasets.

4.2.1 Non-targeted Attack

We first evaluate each method against non-targeted attacks. In this experiment, we use Metattack and keep all the parameter settings from the original paper. To test the performance under different degrees of perturbation, we vary the perturbation rate from 0 to 0.25 with a step size of 0.05. For each experiment, we run 10 times with different seeds and report the mean accuracy and variance of each method in Table 2. The best performance under each perturbation rate is highlighted in bold. From this table, we make the following observations:

  • Our method consistently outperforms the other methods under different attack strengths. For example, our method improves over ProGNN to varying degrees on each dataset, and the improvement is more pronounced at larger perturbation rates. Compared with vanilla GCN, our method improves accuracy by , , and on Cora, Citeseer, and Polblogs, respectively.

  • On the dataset without features, our method improves more over the baseline methods. For example, on Polblogs, our method improves substantially over these methods. In particular, under a perturbation rate of , our method improves by compared to the baselines.
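The perturbation rates in Table 2 are fractions of the edge set. Assume the common convention that a rate $r$ allows $\lfloor r\,|\mathcal{E}| \rfloor$ edge modifications. Under that assumption, the budgets implied by the edge counts in Table 1 can be computed as below. The actual counts depend on how each graph is preprocessed (for example, whether only its largest connected component is kept), so treat these numbers as illustrative.

```python
def perturbation_budget(num_edges, ptb_rate):
    """Assumed budget rule: modify floor(ptb_rate * |E|) edges."""
    return int(ptb_rate * num_edges)

# Edge counts as listed in Table 1.
edge_counts = {"Cora": 5429, "Citeseer": 4732, "Polblogs": 16714}
rates = [0.0, 0.05, 0.10, 0.15, 0.20, 0.25]
budgets = {name: [perturbation_budget(m, r) for r in rates] for name, m in edge_counts.items()}
```

For Cora, this gives budgets of 0, 271, 542, 814, 1085 and 1357 edges.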

4.2.2 Targeted Attack

We then evaluate each method against targeted attacks. In this experiment, we use Nettack with its default parameter settings. We vary the number of perturbations on every targeted node from 1 to 5 with a step size of 1, and nodes with degree larger than are chosen as targets. As shown in Figure 3, our method outperforms the baseline methods under different attack strengths. Moreover, compared with the baselines, the accuracy of our method declines more slowly as the number of perturbations increases.

4.2.3 Random Attack

We then evaluate each method against random attacks. In this experiment, we vary the random noise from to with a step size of . As shown in Figure 4, our method also performs well under random attacks. Its advantage is larger at higher perturbation rates, which indicates that our method is more robust to random noise.

4.3 Parameter Analysis

In this part, we explore the sensitivity of the hyper-parameter of each order in our method. We first fix the hyper-parameters and examine how each performs as the perturbation rate changes. We then fix the perturbation rate and check the sensitivity of each hyper-parameter.

4.3.1 Parameter Effect

In this experiment, we use only one order of the regularization term at a time to see how each affects the performance of our method. The results are shown in Figure 5. When the perturbation rate is small, the low-order structure hyper-parameter contributes more to the classification performance. As the perturbation rate increases, the effect of drops sharply, and the high-order structure hyper-parameters and gradually outperform the low-order one. Among the high-order parameters, performs better when the perturbation is small, while performs better when the perturbation is relatively large.

4.3.2 Parameter Sensitivity

We then analyze the sensitivity of the different hyper-parameters. Here we fix the perturbation rate and evaluate the performance while adjusting the hyper-parameter values. For illustration, we use as the perturbation rate for all datasets. The results are shown in Figure 6.

As shown in Figure 6, the accuracy of can be boosted by choosing appropriate values for all the hyper-parameters. On all three datasets, always achieves the best performance; is more sensitive to its value; and seems to contribute less than the other two parameters at a high perturbation level. Specifically, seems to help very little on the Citeseer dataset at a high perturbation level, while and have similar effects.

5 Conclusion

Graph Neural Networks are vulnerable to adversarial attacks, where a small structure perturbation can fool GNNs into making wrong predictions. To improve the robustness of GNNs, we analyze the adversarial attack problem from the perspective of feature smoothness. We prove that the high-order graph is a smooth filter, which can be used to defend against adversarial attacks on graphs. Building on this, we propose a novel structure learning method that explores the high-order structure of the graph to guide the learning process. Extensive experiments on defending against graph adversarial attacks show that our method effectively improves the robustness of GNNs and outperforms recent state-of-the-art methods by a clear margin.

References