DeepRobust
A pytorch adversarial library for attack and defense methods on images and graphs
DeepRobust is a PyTorch adversarial learning library which aims to build a comprehensive and easy-to-use platform to foster this research field. It currently contains more than 10 attack algorithms and 8 defense algorithms in the image domain and 9 attack algorithms and 4 defense algorithms in the graph domain, under a variety of deep learning architectures. In this manual, we introduce the main contents of DeepRobust with detailed instructions. The library is kept updated and can be found at https://github.com/DSE-MSU/DeepRobust.
Deep learning has advanced numerous machine learning tasks such as image classification, speech recognition and graph representation learning. Since deep learning has been increasingly adopted in real-world safety-critical applications such as autonomous driving, healthcare and education, it is crucial to examine its vulnerability and safety issues. Szegedy et al. [szegedy2013intriguing] first found that Deep Neural Networks (DNNs) are vulnerable to small designed perturbations, which are called adversarial perturbations. Figure 1(a) demonstrates adversarial examples in the domains of images and graphs. Since then, tremendous efforts have been made to develop attack methods that fool DNNs and to design their countermeasures. As a result, there is a growing need to build a comprehensive platform for adversarial attacks and defenses. Such a platform enables us to systematically experiment on existing algorithms and efficiently test new algorithms, which can deepen our understanding and thus foster this research field.
Currently there are some existing libraries in this research field, such as Cleverhans [papernot2018cleverhans] and advertorch [ding2019advertorch]. They mainly focus on attack methods in the image domain, and little attention has been paid to defense methods. Furthermore, the majority of them are dedicated to the image domain while largely ignoring other domains such as graph-structured data. Our library DeepRobust not only provides representative attack and defense methods in the image domain, but also covers algorithms for graph data. This repository contains most classic and advanced algorithms.
The remainder of this report is organized as follows. Section 2 introduces key concepts for adversarial attacks and defenses. Section 3 gives an overview of the DeepRobust library. Sections 4 and 5 introduce the math background and implementation details of the algorithms in the library for the image and graph domains, respectively. Section 6 provides concrete examples to demonstrate how to use the library.
The main goal of attack algorithms is to add imperceptible perturbations to data so that a classifier which normally performs well on clean data makes wrong predictions. The study of attacks can be categorized from different perspectives, such as the attackers' goal and the attackers' ability.
According to attackers’ goal, attack methods can be categorized as follows:
Poisoning Attack vs. Evasion Attack
Poisoning Attack: Poisoning attacks refer to attack algorithms that allow an attacker to insert fake samples into, or modify samples in, the training data of a DNN algorithm. Training on the poisoned data can then cause poor performance on the test data.
Evasion Attack: In an evasion attack, the victim classifier is fixed and normally performs well on benign test samples. The adversaries have no authority to change the classifier or its parameters; instead, they craft fake samples that the classifier can neither classify correctly nor flag as unusual inputs. In other words, the adversaries generate fraudulent examples to evade detection by the classifier.
Targeted Attack vs. Non-Targeted Attack
Targeted Attack: Given a victim sample $(x, y)$, where $x$ is the feature vector and $y$ is the ground truth label of $x$, the adversary aims to induce the classifier to give a specific label $t$ to the perturbed sample $x'$.
Non-Targeted Attack: If no specific $t$ is given, an adversarial example is viewed as a successful attack as long as it is classified with any wrong label.
According to attackers’ ability, attack methods can be grouped as follows:
White-Box attack: the adversary has access to all the information of the target model, including its architecture, parameters, gradients, etc.
Black-Box attack: In a black-box attack setting, the inner configuration of the target model is unavailable to adversaries. Methods of this type are often based on a large number of attack queries.
Grey-Box attack: In a grey-box attack setting, the attacker trains a generative model for producing adversarial examples in a white-box setting. Once the generative model is trained, it can be used to craft adversarial examples in a black-box setting.
To mitigate the risk of adversarial attacks, different countermeasures have been investigated. There are four main categories of defenses. The first one is robustness optimization, i.e., adversarial training, which uses adversarial examples to retrain the model. The second is adversarial example detection, whose goal is to distinguish adversarial examples from samples of the normal data distribution. The third is gradient masking, which mainly includes pre-processing methods that hide the gradient in order to make the optimization in the attack process much harder. The last type, provable defense, has gradually become an important stream of defense; these methods aim to provide guarantees of adversarial robustness.
In this section, we provide an overview of the DeepRobust library, including its environment requirements and overall design.
DeepRobust works on Python and PyTorch. All dependencies are listed in Appendix A. After downloading this repository, run setup.py to install DeepRobust into the local Python environment.
This repository mainly includes two components – the image component and the graph component as below. The directory structure can be found in Appendix B.
Image package
attack
defense
netmodels
evaluation
configs
Graph package
targeted_attack
global_attack
defense
data
The Image Component: According to the function of each program, the image component is divided into several sub-packages and contents. The attack sub-package includes the attack base class and attack algorithms. The defense sub-package contains the defense base class and defense algorithms. In Section 4, we give a specific introduction to each algorithm. Netmodels contains different network model classes; users can generate a victim model simply by instantiating one model class. Through the evaluation program, we provide an easy-to-use API to test attacks against defenses. All default parameters are saved in configs.
The Graph Component: The graph component contains several sub-packages and contents organized by function. The targeted_attack sub-package includes the targeted attack base class and well-known targeted attack algorithms. Similarly, the global attack base class and global attack algorithms are included in the global_attack sub-package. The defense sub-package contains the GCN model and other methods for defending against graph adversarial attacks. In addition, the data sub-package provides easy access to public benchmark datasets including Cora, Cora-ml, Citeseer, Polblogs and Pubmed, as well as pre-attacked graph data.
In this section, we introduce the interface for attack and defense methods in the image domain, together with an algorithm description and implementation details for each algorithm. A comprehensive survey of attack and defense methods in the image domain can be found in [xu2019adversarial].
This subsection introduces the API for the attack methods in the image domain. Currently, this package covers nine representative attack algorithms including LBFGS[szegedy2013intriguing], FGSM[goodfellow2014explaining], PGD[madry2017towards], CW[carlini2017towards], onepixel[su2019one], deepfool[moosavi2016deepfool], BPDA[athalye2018obfuscated], Universal[moosavi2017universal] and Nattack[li2019nattack].
For these algorithms, we currently support the following neural networks and datasets:
Supported networks:
CNN
ResNet-18/34
DenseNet
VGG-11/13/16/19
Supported datasets:
MNIST
CIFAR10
deeprobust.image.attack.base_attack
In order to make further development flexible and extensible, we organize the functions shared by different methods in one module as the attack base class. The main body of each algorithm is overridden in the corresponding subclass. The functions contained in this class are described in detail below:
__init__(self, model, device='cuda')
Initialization is completed in this function.
Parameters:
model: the victim model.
device: whether the program is run on GPU or CPU.
check_type_device(self, image, label, **kwargs)
The main purpose for this function is to convert the input into a unified data type so that they can be correctly used in the algorithm procedure.
Parameters:
image: clean input.
label: ground truth label corresponding to the clean input.
**kwargs: optional input dependent on each derived class.
parse_params(self, **kwargs)
This function provides the interface for user-defined parameters.
Parameters:
**kwargs: optional input dependent on each derived class.
generate(self, image, label, **kwargs)
Call generate() to launch the attack algorithms.
Parameters:
**kwargs: optional input. Parameters for the attack algorithms.
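The interface described above can be pictured with a small skeleton. This is only a sketch and not the library's source: the class names `BaseAttackSketch` and `ToyShiftAttack`, the `epsilon` parameter, the `toy_model` stand-in and the use of plain Python lists instead of PyTorch tensors are all assumptions made for illustration.

```python
class BaseAttackSketch:
    """Hypothetical mirror of the base-class interface; not DeepRobust source."""

    def __init__(self, model, device="cuda"):
        self.model = model      # the victim model
        self.device = device    # where the attack would run: "cuda" or "cpu"

    def check_type_device(self, image, label, **kwargs):
        # Convert the inputs into one unified type (plain floats and ints here).
        self.image = [float(v) for v in image]
        self.label = int(label)
        return True

    def parse_params(self, **kwargs):
        # Subclasses store their user-defined parameters here.
        return True

    def generate(self, image, label, **kwargs):
        # Subclasses override this with the main body of the attack.
        raise NotImplementedError


class ToyShiftAttack(BaseAttackSketch):
    """Toy subclass: moves each pixel by epsilon in a model-chosen direction."""

    def parse_params(self, epsilon=0.1):
        self.epsilon = epsilon
        return True

    def generate(self, image, label, **kwargs):
        self.check_type_device(image, label)
        self.parse_params(**kwargs)
        # The stand-in "model" is assumed to return a direction (+1 or -1) per pixel.
        directions = self.model(self.image, self.label)
        return [min(1.0, max(0.0, v + self.epsilon * d))
                for v, d in zip(self.image, directions)]


def toy_model(image, label):
    # Stand-in victim: pushes dark pixels up and bright pixels down.
    return [1 if v < 0.5 else -1 for v in image]


attack = ToyShiftAttack(toy_model, device="cpu")
adv = attack.generate([0.2, 0.9], 1, epsilon=0.1)
```

The point of the layout is that a new attack only has to supply `parse_params` and `generate`, while input handling stays in the shared base class.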
deeprobust.image.attack.lbfgs
The L-BFGS attack [szegedy2013intriguing] is the key work that first drew attention to the vulnerability of neural networks to small perturbations. It tries to find a minimally distorted adversarial example by solving an intuitive box-constrained optimization problem:
$$\min_{x'} \; \|x - x'\|_2^2 \quad \text{subject to} \quad C(x') = t \ \text{ and } \ x' \in [0, 1]^m \tag{1}$$
where $x'$ is the adversarial example, $x$ is the clean example with $m$ dimensions and $C$ is the classifier. We aim to find an adversarial example $x'$ that is classified as a certain class $t$ and is close to the clean image. From the implementation perspective, solving this optimization problem with two constraints is hard. Thus, this work instead solves an alternative problem: it performs a binary search on a parameter $c$ that balances the trade-off between the perturbation constraint and attack success for the target class, and then uses the L-BFGS algorithm to obtain an approximate solution:
$$\min_{x'} \; c\,\|x - x'\|_2^2 + \mathcal{L}(\theta, x', t) \quad \text{subject to} \quad x' \in [0, 1]^m \tag{2}$$
where $\mathcal{L}(\theta, x', t)$ denotes the loss value of $x'$ for the target class $t$ under the model parameters $\theta$.
deeprobust.image.attack.fgsm
The Fast Gradient Sign Method (FGSM) [goodfellow2014explaining] is a one-step attack. The intuition is to move the input sample along the gradient direction so as to increase the loss with respect to the ground truth label as much as possible. The perturbed sample can thus fool the network with high confidence. To guarantee that the adversarial example lies in a small neighborhood of the starting point $x$, the gradient step is followed by a clip operation. The process is formulated as follows:
$$x' = \mathrm{Clip}_{x,\epsilon}\big(x + \epsilon\,\mathrm{sign}(\nabla_x \mathcal{L}(\theta, x, y))\big) \tag{3}$$
Here, $\mathcal{L}(\theta, x, y)$ is the loss of the model with parameters $\theta$ on input $x$ with ground truth label $y$, and $\mathrm{Clip}_{x,\epsilon}$ denotes a function that projects its argument onto the surface of the $\epsilon$-neighbor ball of $x$.
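As a rough illustration of the one-step update, the sketch below applies it to a hypothetical logistic-regression victim whose input gradient is known in closed form, so no deep learning framework is needed. The parameters `w` and `b`, the helper names and the choice of an L-infinity ball for the clip step are assumptions for this example, not the library's implementation.

```python
import math


def bce_loss_and_grad(w, b, x, y):
    """Binary cross-entropy loss of a logistic model and its gradient w.r.t. x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    _, grad = bce_loss_and_grad(w, b, x, y)
    # One step of size epsilon along the sign of the input gradient.
    stepped = [xi + epsilon * sign(g) for xi, g in zip(x, grad)]
    # Clip: keep every coordinate within epsilon of the clean input.
    return [min(xi + epsilon, max(xi - epsilon, s)) for xi, s in zip(x, stepped)]


w, b = [2.0, -1.0], 0.0          # hypothetical victim parameters
x, y = [0.5, 0.5], 1             # clean input and its ground truth label
x_adv = fgsm(w, b, x, y, epsilon=0.1)
clean_loss, _ = bce_loss_and_grad(w, b, x, y)
adv_loss, _ = bce_loss_and_grad(w, b, x_adv, y)
```

Comparing `adv_loss` with `clean_loss` shows the intended effect: the perturbed input has a higher loss for the true label while staying within `epsilon` of the clean input.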
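deeprobust.image.attack.pgd
Projected Gradient Descent (PGD) [madry2017towards] is an iterative version of the FGSM attack. The adversarial example $x'$ is generated by:
$$x^{(0)} = x; \qquad x^{(k+1)} = \mathrm{Clip}_{x,\epsilon}\big(x^{(k)} + \alpha\,\mathrm{sign}(\nabla_x \mathcal{L}(\theta, x^{(k)}, y))\big) \tag{4}$$
where $\mathcal{L}(\theta, x, y)$ is the loss of the model with parameters $\theta$ for the ground truth label $y$, $\alpha$ is the step size of each iteration and $\mathrm{Clip}_{x,\epsilon}$ projects its argument back into the $\epsilon$-neighbor ball of $x$. It chooses the original image as the starting point. The PGD attack can create strong adversarial examples and is often used as a baseline attack for evaluating defense methods.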
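The iterative procedure can be sketched in a few lines. As a simplifying assumption, the victim is again a hypothetical logistic-regression model with an analytic input gradient; `alpha`, `steps` and the L-infinity projection are illustrative choices, not values taken from the library.

```python
import math


def loss_and_grad(w, b, x, y):
    """Cross-entropy loss of a logistic model and its gradient w.r.t. the input."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss, [(p - y) * wi for wi in w]


def pgd(w, b, x, y, epsilon, alpha, steps):
    x_adv = list(x)                       # start from the original input
    for _ in range(steps):
        _, grad = loss_and_grad(w, b, x_adv, y)
        # Small signed-gradient step of size alpha.
        stepped = [xa + alpha * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        # Project back into the L-infinity ball of radius epsilon around x.
        x_adv = [min(xi + epsilon, max(xi - epsilon, s)) for xi, s in zip(x, stepped)]
    return x_adv


w, b = [2.0, -1.0], 0.0                   # hypothetical victim parameters
x, y = [0.5, 0.5], 1
x_adv = pgd(w, b, x, y, epsilon=0.1, alpha=0.04, steps=5)
clean_loss, _ = loss_and_grad(w, b, x, y)
adv_loss, _ = loss_and_grad(w, b, x_adv, y)
```

With a step size smaller than `epsilon`, the iterate takes several steps before the projection becomes active, which is the main difference from the single-step variant.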
deeprobust.image.attack.deepfool
The DeepFool attack aims to find the shortest path that lets the data point $x_0$ cross the decision boundary. It starts from a binary classifier $f(x) = w^T x + b$. Denote the hyperplane as $\mathcal{F} = \{x : w^T x + b = 0\}$. The minimum perturbation is the distance from the data point to the hyperplane, which can be computed as:
$$r_*(x_0) = \arg\min_{r} \|r\|_2 \ \ \text{subject to} \ \ \mathrm{sign}(f(x_0 + r)) \neq \mathrm{sign}(f(x_0)) \;=\; -\frac{f(x_0)}{\|w\|_2^2}\, w \tag{5}$$
This calculation can be extended to general classifiers and to perturbations constrained by an $\ell_p$ norm. Compared to other methods such as FGSM and PGD, the DeepFool attack needs a smaller perturbation to succeed.
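For the linear binary case, the minimal perturbation is simply the orthogonal projection of the point onto the hyperplane. The sketch below computes it for a hypothetical classifier `f(x) = w·x + b`; the small `overshoot` factor that pushes the point just past the boundary is an assumption of this example.

```python
def linear_score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def deepfool_linear(w, b, x, overshoot=0.02):
    """Closest point on the other side of the hyperplane w.x + b = 0."""
    f = linear_score(w, b, x)
    norm_sq = sum(wi * wi for wi in w)
    # Orthogonal projection onto the hyperplane: r = -f(x) / ||w||^2 * w.
    r = [-(f / norm_sq) * wi for wi in w]
    # Scale slightly so the point actually crosses the boundary.
    return [xi + (1.0 + overshoot) * ri for xi, ri in zip(x, r)]


w, b = [3.0, 4.0], -5.0          # hypothetical linear classifier
x0 = [3.0, 4.0]                  # f(x0) = 20, so x0 is on the positive side
x_adv = deepfool_linear(w, b, x0)
perturbation_norm = sum((a - c) ** 2 for a, c in zip(x_adv, x0)) ** 0.5
```

Here the distance from `x0` to the hyperplane is `|f(x0)| / ||w|| = 4`, and the returned point lies just beyond it on the negative side.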
deeprobust.image.attack.cw
Carlini and Wagner's attack [carlini2017towards] aims to solve the same problem as defined in the L-BFGS attack, namely finding the minimally distorted perturbation. It addresses the problem by instead solving:
$$\min_{x'} \; \|x - x'\|_2^2 + c \cdot f(x', t) \quad \text{subject to} \quad x' \in [0, 1]^m \tag{6}$$
where $f$ is defined as $f(x', t) = \big(\max_{i \neq t} Z(x')_i - Z(x')_t\big)^{+}$. Here, $Z(\cdot)$ is the logits function. Minimizing $f(x', t)$ encourages the algorithm to find an $x'$ that has a larger score for class $t$ than for any other label, so that the classifier will predict $x'$ as class $t$. Next, by applying a line search on the constant $c$, it can find the $x'$ that has the least distance to $x$.
deeprobust.image.attack.universal
Previous methods only consider one specific victim sample $x$. However, the work [moosavi2017universal] devises an algorithm that successfully misleads a classifier's decision on almost all test images. It tries to find a perturbation $\delta$ satisfying:
1. $\|\delta\|_p \leq \epsilon$
2. $\mathbb{P}_{x \sim D(x)}\big(C(x + \delta) \neq C(x)\big) \geq 1 - \sigma$
where $C$ is the classifier and $x$ is drawn from the data distribution $D(x)$. The first condition constrains the norm of the perturbation and the second sets a threshold on the probability of misclassification. In other words, it aims to find an imperceptible perturbation $\delta$ such that the classifier gives wrong decisions on most of the samples.
deeprobust.image.attack.onepixel
The one-pixel attack [Su_2019] constrains the perturbation by the $\ell_0$ norm instead of the $\ell_2$ norm or the $\ell_\infty$ norm. That is to say, it finds the minimum number of pixels to perturb. A differential evolution (DE) algorithm is applied to solve this optimization problem.
deeprobust.image.attack.bpda
Backward Pass Differentiable Approximation (BPDA) [athalye2018obfuscated] is a technique for attacking defenses whose gradients are not readily available. BPDA solves this problem by finding an approximation function for the non-differentiable layer and calculating the gradient through the approximation function.
deeprobust.image.attack.nattack
Nattack [li2019nattack] is a black-box attack that tries to find a probability density distribution over a small region centered around the input, such that a sample drawn from this distribution is likely to be an adversarial example.
It uses an intuitive way to find this distribution. First, it initializes the distribution with random parameters. Then, it samples from this distribution several times, feeds the samples to the neural network and calculates their average loss value. The average loss is thus a function of the distribution parameters. Finally, it performs gradient descent on the average loss and updates the distribution parameters. It iterates this process until the attack succeeds.
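The loop described above (sample, average the loss, take a gradient step on the distribution parameters) can be sketched with a simplified sampling-based gradient estimate. This is written in the spirit of the procedure, not as the published algorithm: the Gaussian search distribution with a fixed standard deviation, the stand-in victim `true_class_margin`, the mean-loss baseline and all hyperparameters are assumptions of this example.

```python
import random


def true_class_margin(x):
    # Hypothetical black-box victim: only this scalar output can be queried.
    # A positive value means the input is still classified correctly.
    w, b = [2.0, -1.0], 0.0
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sampling_attack(x, epsilon=0.5, sigma=0.1, lr=0.05,
                    n_samples=50, max_steps=100, seed=0):
    rng = random.Random(seed)
    dim = len(x)
    # Distribution over perturbations: Gaussian with mean `mu`, fixed std `sigma`.
    mu = [rng.gauss(0.0, 0.01) for _ in range(dim)]

    def perturb(delta):
        # Keep the perturbation inside the epsilon-ball (L-infinity) around x.
        return [xi + max(-epsilon, min(epsilon, d)) for xi, d in zip(x, delta)]

    for _ in range(max_steps):
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]
        losses = [true_class_margin(perturb([m + sigma * n for m, n in zip(mu, noise)]))
                  for noise in noises]
        mean_loss = sum(losses) / n_samples
        # Monte Carlo estimate of d(mean loss)/d(mu); subtracting the mean
        # loss is a common variance-reduction baseline.
        grad = [sum((l - mean_loss) * noise[j] for l, noise in zip(losses, noises))
                / (n_samples * sigma) for j in range(dim)]
        mu = [m - lr * g for m, g in zip(mu, grad)]   # gradient descent step
        candidate = perturb(mu)
        if true_class_margin(candidate) < 0:          # attack succeeded
            return candidate
    return None


x_clean = [0.5, 0.5]
x_adv = sampling_attack(x_clean)
```

Note that the attack never touches the victim's parameters or gradients; it only queries the output, which is what makes the setting black-box.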
This subsection introduces the API for the defense methods in the image domain. So far, this package covers three categories of defense methods: adversarial training, gradient masking and detection.
deeprobust.image.defense.base_defense
This module is the base class of all the adversarial training algorithms. It provides basic components for defense methods. Following functions are contained in this class:
__init__(self, model, device)
Parameter initialization is completed in this function.
Parameters:
model: the attack victim model.
device: whether the program is run on GPU or CPU.
parse_params(self, **kwargs)
This function provides the interface for user defined parameters.
Parameters:
**kwargs: optional input dependent on each derived class.
generate(self, train_loader, test_loader, **kwargs)
Call generate() to launch the defense.
Parameters:
**kwargs: optional input dependent on each derived class.
loss(self, output, target)
Calculate the training loss. This function will be overridden by each defense class according to the algorithms requirements.
Parameters:
output: model output.
target: ground truth label.
adv_data(self, model, data, target, **kwargs)
Generate adversarial training samples for robust training. This function will be overridden by each defense class according to algorithms requirements.
Parameters:
model: The victim model used to generate the adversarial example.
data: clean data.
target: target label
**kwargs: optional parameters.
train(self, train_loader, optimizer, epoch)
Call train() to train the adversarial model.
Parameters:
train_loader: the dataloader used to train robust model.
optimizer: the optimizer for the training process.
epoch: maximum epoch for the training process.
**kwargs: optional input dependent on each derived class.
test(self, test_loader)
Call test() function to test adversarial model.
Parameters:
test_loader: the dataloader used to test model.
deeprobust.image.defense.fgsmtraining
FGSM adversarial training [goodfellow2014explaining] aims to improve the model's accuracy under attack by training with adversarial examples. It generates adversarial examples in each iteration and updates the model parameters using these adversarial examples.
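The alternation between crafting adversarial examples and updating the parameters can be sketched on a toy problem. The example below is an assumption-laden illustration, not the library's training loop: the victim is a hypothetical one-feature logistic-regression model, training uses full-batch gradient descent, and the data, learning rate and epoch count are made up.

```python
import math


def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(w, b, x, y, epsilon):
    # Input gradient of the cross-entropy loss is (p - y) * w for this model.
    p = predict(w, b, x)
    return [xi + epsilon * (((p - y) * wi > 0) - ((p - y) * wi < 0))
            for xi, wi in zip(x, w)]


def fgsm_adversarial_training(data, epsilon=0.1, lr=1.0, epochs=2000):
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in data:
            x_adv = fgsm(w, b, x, y, epsilon)       # craft with the current model
            err = predict(w, b, x_adv) - y          # d(loss)/d(logit)
            grad_w = [g + err * xi for g, xi in zip(grad_w, x_adv)]
            grad_b += err
        # Update the parameters on the adversarial batch.
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


train = [([0.1], 0), ([0.2], 0), ([0.8], 1), ([0.9], 1)]
w, b = fgsm_adversarial_training(train)
robust_correct = sum(
    (predict(w, b, fgsm(w, b, x, y, 0.1)) > 0.5) == (y == 1) for x, y in train
)
```

After training, `robust_correct` counts how many training points are still classified correctly when each one is perturbed by FGSM against the final model.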
deeprobust.image.defense.fast
Fast [wong2020fast] is an improved version of FGSM adversarial training. This work finds that simply adding a random initialization to the generation of adversarial training samples improves model robustness significantly.
deeprobust.image.defense.pgdtraining
PGD adversarial training uses adversarial examples generated by PGD instead of FGSM to train the model and achieves high overall performance.
deeprobust.image.defense.YOPO
You-Only-Propagate-Once (YOPO) [zhang2019you] is an accelerated version of PGD adversarial training. When it generates the PGD adversarial examples for a multi-layer network, it approximates the derivative of the first layer as a constant, so there is no need to run the whole back-propagation in every iteration. Thus, the training time is remarkably reduced.
deeprobust.image.defense.trades
The work [zhang2019theoretically] proposes an adversarial training strategy which encourages clean samples and adversarial examples to be close in feature space. Its training objective is to minimize the loss:
$$\min_{f} \; \mathbb{E}\Big\{ \mathcal{L}(f(x), y) + \max_{\|x' - x\| \leq \epsilon} \mathcal{L}(f(x), f(x')) / \lambda \Big\} \tag{7}$$
where $f$ is the classifier, $(x, y)$ is a clean sample with its label, $x'$ is an adversarial example within distance $\epsilon$ of $x$ and $\lambda$ balances the two terms. This loss function can be divided into two parts: the first part is the natural loss, while the second part sets the goal of minimizing the distance between the classifier outputs for examples that are close in input space. Similar to the PGD adversarial training strategy, in each step it first solves the inner maximization problem to find an optimal $x'$, and then updates the model parameters to minimize the outer loss.
deeprobust.image.defense.TherEncoding
Thermometer encoding [buckman2018thermometer] is one way to mask the gradient information of DNN models, in order to prevent the attacker from finding successful adversarial examples. It uses a preprocessor to discretize an image's pixel value $x_i$ into an $l$-dimensional vector $\tau(x_i)$ (e.g., when $l = 10$, $\tau(0.66) = 1111110000$). The vector $\tau(x_i)$ acts as a "thermometer" to record the value of the pixel $x_i$.
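A thermometer code is easy to write down for a single pixel. The sketch below assumes pixel values in [0, 1], 10 levels and the convention that bit k is on when the pixel exceeds k divided by the number of levels; the function name and this threshold convention are choices made for illustration.

```python
def thermometer_encode(pixel, levels=10):
    """Return the thermometer code of a pixel value in [0, 1] as a list of bits."""
    # Bit k is on when the pixel exceeds the k-th threshold k / levels.
    return [1 if pixel > k / levels else 0 for k in range(1, levels + 1)]


code = thermometer_encode(0.66)
print("".join(str(bit) for bit in code))  # prints 1111110000
```

Because the encoding is a step function of the pixel value, its derivative is zero almost everywhere, which is why it hides useful gradient information from gradient-based attacks.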
deeprobust.image.defense.LIDclassifier
Local Intrinsic Dimensionality (LID) detection [ma2018characterizing] trains a classifier to distinguish adversarial examples from normal examples based on LID features. Starting from a sample, it calculates the number of data points in a ball of a certain distance, and LID features measure the growth rate of the number of data points as the distance increases.
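One common way to turn this growth rate into a number is the maximum-likelihood LID estimate computed from the distances to the k nearest neighbors. The sketch below implements that estimator on synthetic points; the function name, the choice of `k` and the toy data are assumptions for illustration, and a real detector would compute such features from network activations.

```python
import math


def lid_mle(sample, reference_points, k):
    """Maximum-likelihood LID estimate from the k nearest neighbor distances."""
    distances = sorted(math.dist(sample, p) for p in reference_points)
    distances = [d for d in distances if d > 0][:k]   # drop the sample itself
    r_max = distances[-1]
    mean_log_ratio = sum(math.log(d / r_max) for d in distances) / len(distances)
    return -1.0 / mean_log_ratio


origin = (0.0, 0.0)
# Points spread along a line (intrinsically 1-D) and over a grid (2-D).
line = [(float(i), 0.0) for i in range(1, 1001)]
grid = [(float(i), float(j)) for i in range(-10, 11) for j in range(-10, 11)]
lid_line = lid_mle(origin, line, k=100)
lid_grid = lid_mle(origin, grid, k=100)
```

The estimate is close to 1 for the points on a line and close to 2 for the planar grid, reflecting how fast the neighbor count grows with the radius in each case.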
The design of the graph package is slightly different from that of the image package. Specifically, the graph package includes three main components: targeted attack, untargeted attack and defense. For these algorithms, the supported networks and datasets are listed as follows:
Supported network:
GCN
Supported datasets:
Cora
Cora-ml
Citeseer
Polblogs
Pubmed
More details about adversarial attack and defense can be found in [jin2020adversarial]. In the following, we are going to illustrate the details of various subpackages.
deeprobust.graph.targeted_attack
This module introduces the API for targeted attack methods in the graph package. In total, this package covers 5 algorithms: FGA [chen2018fga], Nettack [nettack], RL-S2V [rl-s2v], IG-Attack [deep-insight-jaccard] and RND [nettack].
deeprobust.graph.targeted_attack.fga
FGSM [goodfellow2014explaining] can also be applied to attack graph data, but it needs some modification to fit the binary nature of graph data. One representative method that solves this problem is FGA [chen2018fga]. Basically, FGA first calculates the gradient of the attack loss with respect to the graph structure and then greedily chooses the perturbation with the largest gradient.
deeprobust.graph.targeted_attack.nettack
The work [nettack] proposes an attack method called Nettack to generate structure and feature attacks on graphs. Nettack first selects possible perturbation candidates that would not violate the degree distribution and feature co-occurrence of the original graph. Then it greedily chooses the perturbation that has the largest score to modify the graph. By repeating this until the perturbation budget is reached, it obtains the final modified graph.
deeprobust.graph.targeted_attack.rl_s2v
To perform black-box queries on the victim model, reinforcement learning is introduced. RL-S2V [rl-s2v] employs reinforcement learning to generate adversarial attacks on graph data under the black-box setting. It models the attack procedure as a Markov Decision Process (MDP), in which the attacker is allowed to modify a limited number of edges to change the predicted label of the target node. Further, the Q-learning algorithm [mnih2013playing] is adopted to solve the MDP and guide the attacker in modifying the graph.
deeprobust.graph.targeted_attack.ig_attack
Due to the discrete nature of graph data, precisely approximating the gradient of adversarial perturbations is a big challenge. To address this issue, IG attack [deep-insight-jaccard] suggests using integrated gradients [sundararajan2017axiomatic-ig] to better search for adversarial edge and feature perturbations. During the attack process, the attacker iteratively chooses the edge or feature that has the strongest effect on the adversarial objective.
deeprobust.graph.targeted_attack.rnd
RND is a baseline attack method used in [nettack]. Based on the assumption that unequal class labels hinder classification, it modifies the graph structure sequentially. To be specific, given the target node, in each step it randomly samples nodes whose labels are different from that of the target node and connects them to the target node in the graph.
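The random baseline is simple enough to sketch directly on a dense adjacency matrix. The function name, the list-of-lists graph representation and the toy graph below are assumptions for illustration; they do not reflect the library's data structures.

```python
import random


def rnd_attack(adj, labels, target, n_perturbations, seed=0):
    """Connect `target` to randomly chosen nodes that carry a different label."""
    rng = random.Random(seed)
    modified = [row[:] for row in adj]            # leave the clean graph intact
    candidates = [v for v in range(len(labels))
                  if labels[v] != labels[target] and modified[target][v] == 0]
    for v in rng.sample(candidates, min(n_perturbations, len(candidates))):
        modified[target][v] = modified[v][target] = 1
    return modified


labels = [0, 0, 0, 1, 1, 1]
adj = [[0] * 6 for _ in range(6)]
for u, v in [(0, 1), (1, 2), (3, 4), (4, 5)]:
    adj[u][v] = adj[v][u] = 1
perturbed = rnd_attack(adj, labels, target=0, n_perturbations=2)
new_edges = [(0, v) for v in range(6) if perturbed[0][v] and not adj[0][v]]
```

Every edge in `new_edges` links the target node to a node from the other class, which is exactly the kind of connection the baseline assumes will hurt classification.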
deeprobust.graph.global_attack
This module introduces the API for untargeted attack methods in the graph package. Currently, this package covers 4 algorithms: Metattack [metattack], PGD [xu2019topology-attack], Min-max [xu2019topology-attack] and DICE [waniek2018hiding-dice].
deeprobust.graph.global_attack.metattack
Metattack [metattack] is an untargeted poisoning attack that modifies the graph structure. Basically, it treats the graph structure matrix as a hyper-parameter and calculates the meta-gradient of the loss function with respect to the graph structure. A greedy approach is then applied to select the perturbation based on the meta-gradient.
deeprobust.graph.global_attack.topology_attack
The work [xu2019topology-attack] considers two different settings: 1) attacking a fixed GNN and 2) attacking a re-trainable GNN. For attacking a fixed GNN, it utilizes the Projected Gradient Descent (PGD) algorithm of [madry2017towards] to search for the optimal structure perturbation. This is called the PGD attack. For re-trainable GNNs, the attack problem is formulated in a min-max form, where the inner maximization can be solved by gradient ascent and the outer minimization can be solved by PGD. This is called the Min-max attack.
deeprobust.graph.global_attack.dice
DICE [waniek2018hiding-dice] stands for "delete internally, connect externally": it randomly connects nodes with different labels or drops edges between nodes sharing the same label. Note that DICE is a white-box attack and is widely used as a baseline when comparing the performance of untargeted attacks.
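The "delete internally, connect externally" rule can be sketched as follows. This is a simplified version under stated assumptions: node pairs are drawn uniformly at random and a pair is flipped only if the rule allows it, the graph is a dense list-of-lists adjacency matrix, and `max_tries` is a safeguard added for this example.

```python
import random


def dice_attack(adj, labels, n_perturbations, seed=0, max_tries=10000):
    rng = random.Random(seed)
    modified = [row[:] for row in adj]
    changed = set()
    for _ in range(max_tries):
        if len(changed) == n_perturbations:
            break
        u, v = rng.sample(range(len(labels)), 2)
        pair = (min(u, v), max(u, v))
        if pair in changed:
            continue
        same_label = labels[u] == labels[v]
        if same_label and modified[u][v] == 1:
            modified[u][v] = modified[v][u] = 0    # delete internally
            changed.add(pair)
        elif not same_label and modified[u][v] == 0:
            modified[u][v] = modified[v][u] = 1    # connect externally
            changed.add(pair)
    return modified, sorted(changed)


labels = [0, 0, 0, 1, 1, 1]
adj = [[0] * 6 for _ in range(6)]
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    adj[u][v] = adj[v][u] = 1                      # two label-pure triangles
perturbed, changed = dice_attack(adj, labels, n_perturbations=4)
```

The rule uses the node labels directly, so the attacker needs label information even though no model gradients are involved.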
deeprobust.graph.defense.adv_training
Since adversarial training is a widely used countermeasure against adversarial attacks on image data [goodfellow2014explaining], we can also adopt this strategy to defend against graph adversarial attacks. The min-max optimization problem indicates that adversarial training involves two processes: (1) generating perturbations that maximize the prediction loss and (2) updating model parameters to minimize the prediction loss. By alternating these two processes iteratively, we can train a model that is robust against adversarial attacks. Since graphs have two inputs, i.e., the adjacency matrix and the attribute matrix, adversarial training can be done on them separately.
deeprobust.graph.defense.gcn_jaccard
The work [deep-insight-jaccard] proposes a preprocessing method based on two empirical observations about attack methods: (1) attackers usually prefer adding edges over removing edges or modifying features, and (2) attackers tend to connect dissimilar nodes. Based on these findings, it proposes a defense method that eliminates the edges whose two end nodes have a small Jaccard similarity [said2010social].
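The preprocessing step can be sketched as a filter over the edges of the graph. The function names, the binary feature vectors, the dense adjacency matrix and the threshold value are assumptions made for this example rather than the library's implementation.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of two binary feature vectors."""
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return intersection / union if union else 0.0


def drop_dissimilar_edges(adj, features, threshold):
    """Remove edges whose end nodes have Jaccard similarity below `threshold`."""
    cleaned = [row[:] for row in adj]
    for u in range(len(adj)):
        for v in range(u + 1, len(adj)):
            if adj[u][v] and jaccard_similarity(features[u], features[v]) < threshold:
                cleaned[u][v] = cleaned[v][u] = 0
    return cleaned


features = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]     # edge (0, 2) joins dissimilar nodes
cleaned = drop_dissimilar_edges(adj, features, threshold=0.1)
```

In this toy graph, nodes 0 and 2 share no features, so their edge is dropped, while the edge between the similar nodes 0 and 1 is kept. A model would then be trained on the cleaned graph.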
deeprobust.graph.defense.gcn_svd
It is observed that Nettack [nettack] generates perturbations which mainly change the small singular values of the graph adjacency matrix [entezari2020all-svd]. Thus, the work [entezari2020all-svd] proposes to preprocess the perturbed adjacency matrix by using truncated SVD to get its low-rank approximation.
deeprobust.graph.defense.rgcn
Different from the above preprocessing methods, which try to exclude adversarial perturbations, RGCN [rgcn] aims to train a robust GNN model by penalizing the model's weights on adversarial edges or nodes. Based on the assumption that adversarial nodes may have high prediction uncertainty, it models the hidden representations of nodes as Gaussian distributions with a mean and a variance, where the uncertainty is reflected in the variance. When aggregating information from neighbor nodes, it applies an attention mechanism to penalize nodes with high variance.
In this section, we give concrete examples to illustrate how to use this repository. For each type of method, we provide one piece of demo code.
In deeprobust.image.netmodels, we provide several deep network architectures. Call train() to train a model.
To launch an attack method, the first step is to import the attack class from deeprobust.image.attack. Then, we need to initialize a victim model and create a dataloader that contains the test images from which adversarial examples will be generated. Finally, we feed the model and data to the attack method. The output is the adversarial examples.
Defense methods can be imported from deeprobust.image.defense. We need to feed a model structure and a dataloader to the defense method. The output is the adversarially trained model and its performance on both clean data and adversarial data.
We provide a simple way to evaluate the performance of an attack against a defense.
We show an example of attacking graph neural networks. We will use a linearized GCN as the surrogate model and apply untargeted Metattack to generate perturbed graph on the Cora citation dataset.
First, we import the packages we are going to use at the top of the code and load the Cora dataset.
Then set up the surrogate model to be attacked.
Then we use Metattack to generate perturbations to attack the surrogate model. Here the variable modified_adj is the perturbed graph generated by Metattack.
We show an example of defending graph adversarial attacks. We will use Metattack as the attacking method and GCN-Jaccard as the defense method.
First, we import all the packages we need to use and load the clean graph and pre-attacked graph of Cora dataset.
Then we set up the defense model GCN-Jaccard and test its performance on the perturbed graph.
As a comparison, we can also set up GCN model and test its performance on the perturbed graph.
Our main goal is to provide a comprehensive, easy-to-use platform for researchers who are interested in adversarial attack and defense. In the future, we will support larger datasets and more model architectures. Moreover, we will keep adding the newest models and updating this repository.
Dependency | Version |
---|---|
torch | 1.2.0 |
torchvision | 0.4.0 |
numpy | 1.17.1 |
matplotlib | 3.1.1 |
scipy | 1.3.1 |
Pillow | 7.0.0 |
scikit_learn | 0.22.1 |
skimage | 0 |
tensorboardX | 2 |
tqdm | 4.42.1 |
texttable | 1.6.2 |
numba | 0.48.0 |