1 Introduction
Deep learning has advanced numerous machine learning tasks such as image classification, speech recognition and graph representation learning. Since deep learning has been increasingly adopted in real-world safety-critical applications such as autonomous driving, healthcare and education, it is crucial to examine its vulnerability and safety issues. Szegedy et al. [szegedy2013intriguing] first found that Deep Neural Networks (DNNs) are vulnerable to small, carefully designed perturbations, which are called adversarial perturbations. Figure 1(a) demonstrates adversarial examples in the domains of images and graphs. Since then, tremendous efforts have been made to develop attack methods that fool DNNs and to design countermeasures against them. As a result, there is a growing need for a comprehensive platform for adversarial attacks and defenses. Such a platform enables us to systematically experiment with existing algorithms and efficiently test new ones, which can deepen our understanding and thus foster this research field.
Several libraries already exist in this research field, such as Cleverhans [papernot2018cleverhans] and advertorch [ding2019advertorch]. They mainly focus on attack methods in the image domain, and little attention has been paid to defense methods. Furthermore, the majority of them are dedicated to the image domain and largely ignore other domains such as graph-structured data. Our library DeepRobust not only provides representative attack and defense methods in the image domain, but also covers algorithms for graph data. This repository contains most classic and advanced algorithms.
The remainder of this report is organized as follows. Section 2 introduces key concepts for adversarial attacks and defenses. Section 3 gives an overview of the DeepRobust library. Sections 4 and 5 introduce the mathematical background and implementation details of the algorithms in the library for the image and graph domains, respectively. Section 6 provides concrete examples to demonstrate how to use the library.
2 Foundations of Adversarial Attacks and Defenses
The main goal of attack algorithms is to apply imperceptible perturbations to data so that a classifier which normally performs well on clean data makes wrong predictions. Attacks can be categorized from different perspectives, such as the attacker's goal and the attacker's ability.
According to the attacker's goal, attack methods can be categorized as follows:


Poisoning Attack vs. Evasion Attack
Poisoning Attack: Poisoning attacks are attacking algorithms that allow an attacker to insert fake samples into, or modify samples in, the training data of a DNN algorithm. Training on these samples can degrade performance on the test data.
Evasion Attack: In an evasion attack, the victim classifier is fixed and normally performs well on benign test samples. The adversaries have no authority to change the classifier or its parameters; instead, they craft fake samples that the classifier can neither classify correctly nor flag as unusual inputs. In other words, the adversaries generate fraudulent examples to evade detection by the classifier.

Targeted Attack vs. Non-Targeted Attack
Targeted Attack: Given a victim sample $(x, y)$, where $x$ is the feature vector and $y$ is the ground-truth label of $x$, the adversary aims to induce the classifier to assign a specific label $t$ to the perturbed sample $x'$.
Non-Targeted Attack: If no specific label $t$ is given, an adversarial example is viewed as a successful attack as long as it is classified with any wrong label.
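The two success criteria can be written down in a few lines. The following is a minimal sketch; the function name and argument names are hypothetical and are not part of DeepRobust.

```python
from typing import Optional


def attack_succeeded(pred_label: int, true_label: int,
                     target_label: Optional[int] = None) -> bool:
    """Return True if a perturbed sample counts as a successful attack."""
    if target_label is not None:
        # Targeted: the classifier must output the attacker's chosen label.
        return pred_label == target_label
    # Non-targeted: any wrong label is enough.
    return pred_label != true_label
```

A targeted attack is therefore the stricter requirement: a prediction that is wrong but differs from the chosen label counts as a non-targeted success only.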
According to attackers’ ability, attack methods can be grouped as follows:


White-Box attack: The adversary has access to all the information about the target model, including its architecture, parameters, gradients, etc.

Black-Box attack: In a black-box attack setting, the inner configuration of the target model is unavailable to adversaries. Methods of this type are often based on a large number of attack queries.

Grey-Box attack: In a grey-box attack setting, the attacker trains a generative model for producing adversarial examples in a white-box setting. Once the generative model is trained, it can be used to craft adversarial examples in a black-box setting.
To mitigate the risk of adversarial attacks, different countermeasures have been investigated. There are four main categories of defenses. The first is robustness optimization, i.e., adversarial training, which uses adversarial examples to retrain the model. The second is adversarial example detection, whose goal is to distinguish adversarial examples from samples of the clean data distribution. The third is gradient masking, which mainly includes preprocessing methods that hide the gradient in order to make the optimization in the attack process much harder. The last type, provable defense, has gradually become an important stream of defense; these methods aim to provide guarantees of adversarial robustness.
3 An Overview of DeepRobust
In this section, we provide an overview of the DeepRobust library, including its environment requirements and overall design.
3.1 Environment Requirements and Setup
DeepRobust works with Python and PyTorch. All dependencies are listed in Appendix A. After downloading the repository, run setup.py to install DeepRobust into the local Python environment.
3.2 The Overview Design
This repository mainly includes two components – the image component and the graph component as below. The directory structure can be found in Appendix B.

Image package

attack

defense

netmodels

evaluation

configs


Graph package

targeted_attack

global_attack

defense

data

The Image Component: According to the function of each program, the image component is divided into several subpackages. The attack subpackage includes the attack base class and the attack algorithms. The defense subpackage contains the defense base class and the defense algorithms. Section 4 introduces each algorithm in detail. Netmodels contains different network model classes; users can generate a victim model simply by instantiating one model class. Through the evaluation program, we provide an easy-to-use API to test attacks against defenses. All default parameters are saved in configs.
The Graph Component: The graph component contains several subpackages organized by function. The targeted_attack subpackage includes the targeted attack base class and well-known targeted attack algorithms. Similarly, the global attack base class and global attack algorithms are included in the global_attack subpackage. The defense subpackage contains the GCN model and other methods for defending against graph adversarial attacks. In addition, the data subpackage provides easy access to public benchmark datasets, including Cora, Cora-ML, Citeseer, Polblogs and Pubmed, as well as pre-attacked graph data.
4 Image Component
In this section, we introduce the interface for attack and defense methods in the image domain, and provide an algorithm description and implementation details for each algorithm. A comprehensive survey of attack and defense methods in the image domain can be found in [xu2019adversarial].
4.1 Attacks
This subsection introduces the API for the attack methods in the image domain. Currently, this package covers nine representative attack algorithms: L-BFGS [szegedy2013intriguing], FGSM [goodfellow2014explaining], PGD [madry2017towards], CW [carlini2017towards], one-pixel [su2019one], DeepFool [moosavi2016deepfool], BPDA [athalye2018obfuscated], Universal [moosavi2017universal] and Nattack [li2019nattack].
For these algorithms, we currently support the following neural networks and datasets:
Supported networks:

CNN

ResNet18/34

DenseNet

VGG11/13/16/19
Supported datasets:

MNIST

CIFAR10
4.1.1 Attack Base Class
deeprobust.image.attack.base_attack
To keep further development flexible and extensible, we organize the functions shared by different methods into one module, the attack base class. The main body of each algorithm is overridden in the corresponding subclass. The functions contained in this class are described below:

__init__(self, model, device = 'cuda')
Initialization is completed in this function.
Parameters:
model: the victim model.

device: whether the program is run on GPU or CPU.


check_type_device(self, image, label, **kwargs)
The main purpose for this function is to convert the input into a unified data type so that they can be correctly used in the algorithm procedure.
Parameters:
image: clean input.

label: ground truth label corresponding to the clean input.

**kwargs: optional input dependent on each derived class.


parse_params(self, **kwargs)
This function provides the interface for user-defined parameters.
Parameters:
**kwargs: optional input dependent on each derived class.


generate(self, image, label, **kwargs)
Call generate() to launch the attack algorithms.
Parameters:
image: clean input.

label: ground truth label corresponding to the clean input.

**kwargs: optional parameters for the attack algorithm.
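The division of labor described above (shared plumbing in the base class, algorithm body in the subclass) can be illustrated with a small sketch. The classes below are hypothetical stand-ins written in plain Python; they only mirror the four method names listed above and are not the library's source code.

```python
class BaseAttackSketch:
    """Illustrative stand-in for an attack base class (not the library's code)."""

    def __init__(self, model, device="cuda"):
        self.model = model
        self.device = device

    def check_type_device(self, image, label, **kwargs):
        # Convert the inputs to one agreed type; here, a list of floats and an int.
        self.image = [float(v) for v in image]
        self.label = int(label)
        return True

    def parse_params(self, **kwargs):
        # Subclasses override this to read their own hyperparameters.
        return True

    def generate(self, image, label, **kwargs):
        # Subclasses override this with the main body of the attack.
        raise NotImplementedError


class SignStepAttack(BaseAttackSketch):
    """Hypothetical subclass that shifts every feature by +/- epsilon."""

    def parse_params(self, epsilon=0.1, direction=1):
        self.epsilon = epsilon
        self.direction = direction
        return True

    def generate(self, image, label, **kwargs):
        self.check_type_device(image, label)
        self.parse_params(**kwargs)
        return [v + self.direction * self.epsilon for v in self.image]
```

With this layout, adding a new attack only requires overriding parse_params and generate, while input conversion stays in one place.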

4.1.2 Attack algorithms
deeprobust.image.attack.lbfgs The L-BFGS attack [szegedy2013intriguing] is the key work that drew attention to neural networks' vulnerability to small perturbations. It tries to find a minimally distorted adversarial example by solving an intuitive box-constrained optimization problem:
(1) $\min_{x'} \ \|x - x'\|_2^2 \quad \text{s.t.} \quad C(x') = t \ \text{and} \ x' \in [0, 1]^m$
where $x'$ is the adversarial example and $x$ is the clean example. We aim to find an adversarial example that is classified as a certain class $t$ and is close to the clean image. From the implementation perspective, addressing this optimization problem with two constraints is hard. Thus, this work instead solves an alternative problem: it performs a binary search on a parameter $c$ that balances the trade-off between the perturbation constraint and attack success for the target class, and then uses the L-BFGS algorithm to obtain an approximate solution:
(2) $\min_{x'} \ c\,\|x - x'\|_2^2 + \mathcal{L}(\theta, x', t) \quad \text{s.t.} \quad x' \in [0, 1]^m$
where $\mathcal{L}(\theta, x', t)$ denotes the loss value of $x'$ for the target class t.
deeprobust.image.attack.fgsm The Fast Gradient Sign Method (FGSM) [goodfellow2014explaining] is a one-step attack. The intuition is to move the input sample along the gradient direction to reach a higher loss value for the ground-truth label, so that the perturbed sample can fool the network with high confidence. To guarantee that the adversarial example $x'$ lies in a small neighborhood of the starting point $x$, the gradient step is followed by a clip operation. The process is formulated as follows:
(3) $x' = \mathrm{Clip}_{x,\epsilon}\big(x + \epsilon\,\mathrm{sign}(\nabla_x \mathcal{L}(\theta, x, y))\big)$
Here, Clip denotes a function that projects its argument back into the $\epsilon$-neighbor ball of $x$.
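The one-step update can be sketched in NumPy. The sketch assumes a logistic-regression victim, whose input gradient has a closed form; the model, the weights and the helper names are illustrative and do not reproduce the DeepRobust implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    The assumed model is p(y=1|x) = sigmoid(w.x + b), with y in {0, 1}.
    """
    return (sigmoid(w @ x + b) - y) * w


def fgsm(w, b, x, y, epsilon):
    """One signed-gradient step that increases the loss of the true label."""
    x_adv = x + epsilon * np.sign(loss_gradient(w, b, x, y))
    # Clip back into the epsilon-ball around x and into the valid range [0, 1].
    x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
    return np.clip(x_adv, 0.0, 1.0)


w = np.array([2.0, -1.0, 0.5])
b = -0.5
x = np.array([0.6, 0.2, 0.4])
y = 1
x_adv = fgsm(w, b, x, y, epsilon=0.3)
```

In this toy setting the clean input has a positive logit (class 1), while the perturbed input has a negative logit, so a single step of size 0.3 is enough to flip the prediction.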
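deeprobust.image.attack.pgd Projected Gradient Descent (PGD) [madry2017towards] is an iterative version of the FGSM attack. The adversarial example $x'$ is generated by:
(4) $x^{0} = x; \quad x^{t+1} = \mathrm{Clip}_{x,\epsilon}\big(x^{t} + \alpha\,\mathrm{sign}(\nabla_x \mathcal{L}(\theta, x^{t}, y))\big)$
It takes the original image as the starting point. The PGD attack can create strong adversarial examples and is often used as a baseline attack for evaluating defense methods.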
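A minimal NumPy sketch of the iteration follows, again assuming a logistic-regression victim so that the input gradient is available in closed form. The step size, the number of steps and all names are illustrative choices, not the library's defaults.

```python
import numpy as np


def input_gradient(w, b, x, y):
    """Gradient of the logistic (cross-entropy) loss with respect to x."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - y) * w


def pgd(w, b, x, y, epsilon, step_size, num_steps):
    x_adv = x.copy()  # start from the original input
    for _ in range(num_steps):
        x_adv = x_adv + step_size * np.sign(input_gradient(w, b, x_adv, y))
        # Project back onto the L-infinity ball around x, then onto [0, 1].
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


w = np.array([2.0, -1.0, 0.5])
b = -0.5
x = np.array([0.6, 0.2, 0.4])
y = 1
x_adv = pgd(w, b, x, y, epsilon=0.3, step_size=0.1, num_steps=10)
```

Because the projection is applied after every step, the iterate never leaves the allowed region, no matter how many steps are taken.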
deeprobust.image.attack.deepfool The DeepFool attack [moosavi2016deepfool] aims to find the shortest path that moves the data point $x_0$ across the decision boundary. It starts from a binary classifier $f(x) = w^T x + b$. Denote the hyperplane as $\mathcal{F} = \{x : w^T x + b = 0\}$. The minimum perturbation is the distance from the data point to the hyperplane, which can be computed as:
(5) $r^{*}(x_0) = -\frac{f(x_0)}{\|w\|_2^2}\, w$
This calculation can be extended to general classifiers and to $\ell_p$-norm constrained perturbations. Compared to other methods such as FGSM and PGD, the DeepFool attack needs less perturbation to attack successfully.
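For the linear binary case, the minimal perturbation is simply the orthogonal projection onto the hyperplane. The sketch below assumes the classifier $f(x) = w^T x + b$; the small overshoot factor that pushes the point just past the boundary is an added assumption, and the function name is hypothetical.

```python
import numpy as np


def deepfool_linear(w, b, x0, overshoot=0.02):
    """Closed-form minimal L2 perturbation for f(x) = w.x + b."""
    f0 = w @ x0 + b
    r = -(f0 / (w @ w)) * w  # orthogonal projection of x0 onto the hyperplane
    # Step slightly beyond the hyperplane so that the predicted sign flips.
    return x0 + (1.0 + overshoot) * r, r


w = np.array([3.0, 4.0])
b = -5.0
x0 = np.array([3.0, 4.0])
x_adv, r = deepfool_linear(w, b, x0)
```

Here $f(x_0) = 20$ and $\|w\|_2 = 5$, so the perturbation has norm $|f(x_0)| / \|w\|_2 = 4$, which is exactly the distance from $x_0$ to the hyperplane.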
deeprobust.image.attack.cw Carlini and Wagner's attack [carlini2017towards] aims to solve the same problem as defined in the L-BFGS attack, namely finding the minimally distorted perturbation.
It addresses the problem by instead solving:
(6) $\min_{x'} \ \|x - x'\|_2^2 + c \cdot f(x', t) \quad \text{s.t.} \quad x' \in [0, 1]^m$
where $f$ is defined as $f(x', t) = \big(\max_{i \neq t} Z(x')_i - Z(x')_t\big)^{+}$. Here, $Z(\cdot)$ is the logits function. Minimizing $f(x', t)$ encourages the algorithm to find an $x'$ whose score for class $t$ is larger than that of any other label, so that the classifier predicts $x'$ as class $t$. Next, by applying a line search on the constant $c$, it can find the $x'$ that has the least distance to $x$.
deeprobust.image.attack.universal Previous methods only consider one specific targeted victim sample $x$. However, the work [moosavi2017universal] devises an algorithm that successfully misleads a classifier's decision on almost all test images. It tries to find a perturbation $\delta$ satisfying:

1. $\|\delta\|_p \leq \epsilon$.

2. $\mathbb{P}_{x \sim D(x)}\big(C(x + \delta) \neq C(x)\big) \geq 1 - \sigma$.
Formulation 1 constrains the norm of the perturbation, and formulation 2 sets a threshold on the probability of misclassification. In effect, it aims to find an imperceptible perturbation $\delta$ such that the classifier gives wrong decisions on most of the samples.
deeprobust.image.attack.onepixel The one-pixel attack [su2019one] constrains the perturbation by the $\ell_0$ norm instead of the $\ell_2$ norm or the $\ell_\infty$ norm. That is to say, it looks for the minimum number of pixels to perturb. A differential evolution (DE) algorithm is applied to solve this optimization problem.
deeprobust.image.attack.bpda Backward Pass Differentiable Approximation (BPDA) [athalye2018obfuscated] is a technique for attacking defenses whose gradients are not readily available. BPDA solves this problem by finding a differentiable approximation of the non-differentiable layer and computing the gradient through that approximation.
deeprobust.image.attack.nattack Nattack [li2019nattack] is a black-box attack that tries to find a probability density distribution over a small region centered around the input, such that a sample drawn from this distribution is likely to be an adversarial example.
It finds this density distribution in an intuitive way. First, it initializes the distribution with random parameters. Then, it samples from this distribution several times, feeds the samples to the neural network and calculates their average loss value. The average loss is thus a function of the distribution parameters. Finally, it performs gradient descent on the average loss and updates the distribution parameters. It iterates this process until the attack succeeds.
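The sample, average and update loop can be sketched with a search-gradient estimator that uses only queries to the victim. This is a simplified illustration under several assumptions: the distribution is an isotropic Gaussian with a fixed standard deviation, its mean is initialized at zero rather than randomly, the loop runs for a fixed number of iterations instead of stopping at the first success, and the black-box model is a toy linear scorer. None of the names come from DeepRobust.

```python
import numpy as np


def margin(x, w, b):
    """Black-box score: positive means the true class is still predicted."""
    return w @ x + b


def nes_attack(x, w, b, epsilon=0.3, sigma=0.1, lr=1.0,
               num_samples=100, num_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.zeros_like(x)  # parameters of the search distribution
    for _ in range(num_iters):
        noise = rng.standard_normal((num_samples, x.size))
        z = mu + sigma * noise                 # samples from N(mu, sigma^2 I)
        candidates = x + epsilon * np.tanh(z)  # tanh keeps them in the epsilon-ball
        losses = np.array([margin(c, w, b) for c in candidates])
        # Estimate d(average loss)/d(mu) from the queried loss values only.
        grad = ((losses - losses.mean())[:, None] * noise).mean(axis=0) / sigma
        mu = mu - lr * grad                    # gradient descent on the average loss
    return x + epsilon * np.tanh(mu)


w = np.array([2.0, -1.0, 0.5])
b = -0.5
x = np.array([0.6, 0.2, 0.4])
x_adv = nes_attack(x, w, b)
```

The attacker never touches w or b directly; they appear only inside the queried score, which is what makes the procedure applicable in the black-box setting.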
4.2 Defense Subpackage
This subsection introduces the API for the defense methods in the image domain. Currently, this package covers three categories of defense methods: adversarial training, gradient masking and detection.
4.2.1 Defense Base Class
deeprobust.image.defense.base_defense
This module is the base class of all the adversarial training algorithms. It provides basic components for defense methods. The following functions are contained in this class:

__init__(self, model, device)
Parameter initialization is completed in this function.
Parameters:
model: the attack victim model.

device: whether the program is run on GPU or CPU.


parse_params(self, **kwargs)
This function provides the interface for user defined parameters.
Parameters:
**kwargs: optional input dependent on each derived class.


generate(self, train_loader, test_loader, **kwargs)
Call generate() to launch the defense.
Parameters:
**kwargs: optional input dependent on each derived class.


loss(self, output, target)
Calculate the training loss. This function will be overridden by each defense class according to the algorithms requirements.
Parameters:
output: model output.

target: ground truth label.


adv_data(self, model, data, target, **kwargs)
Generate adversarial training samples for robust training. This function will be overridden by each defense class according to algorithms requirements.
Parameters:
model: The victim model used to generate the adversarial example.

data: clean data.

target: target label

**kwargs: optional parameters.


train(self, train_loader, optimizer, epoch)
Call train to train the adversarial model.
Parameters:
train_loader: the dataloader used to train robust model.

optimizer: the optimizer for the training process.

epoch: maximum epoch for the training process.

**kwargs: optional input dependent on each derived class.


test(self, test_loader)
Call test() function to test adversarial model.
Parameters:
test_loader: the dataloader used to test model.

4.2.2 Adversarial Training
deeprobust.image.defense.fgsmtraining FGSM adversarial training [goodfellow2014explaining] aims to improve model robustness by training with adversarial examples. It generates adversarial examples in each iteration and updates the model parameters on these adversarial examples.
deeprobust.image.defense.fast Fast [wong2020fast] is an improved version of FGSM adversarial training. This work finds that simply adding a random initialization to the process that generates the adversarial training samples significantly improves model robustness.
deeprobust.image.defense.pgdtraining PGD adversarial training uses adversarial examples generated by PGD instead of FGSM to train the model and achieve overall high performance.
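The loop shared by the FGSM and PGD variants (craft adversarial examples for the current model, then update the parameters on them) can be sketched as follows. The sketch assumes a logistic-regression model trained by full-batch gradient descent on synthetic two-cluster data and uses FGSM as the inner attack; all names and hyperparameters are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_batch(w, b, X, y, epsilon):
    """FGSM for logistic regression: step along the sign of dLoss/dX."""
    grad_X = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
    return X + epsilon * np.sign(grad_X)


def adversarial_train(X, y, epsilon=0.1, lr=1.0, epochs=200):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        X_adv = fgsm_batch(w, b, X, y, epsilon)  # craft examples for the current model
        err = sigmoid(X_adv @ w + b) - y         # dLoss/dlogit on the adversarial inputs
        w -= lr * (X_adv.T @ err) / len(y)       # update the parameters on them
        b -= lr * err.mean()
    return w, b


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-0.3, 0.05, (50, 2)), rng.normal(0.3, 0.05, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, b = adversarial_train(X, y)
X_attacked = fgsm_batch(w, b, X, y, epsilon=0.1)
robust_acc = ((sigmoid(X_attacked @ w + b) > 0.5) == (y == 1)).mean()
```

Replacing fgsm_batch with a multi-step attack would turn the same loop into PGD-style adversarial training.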
deeprobust.image.defense.YOPO You-Only-Propagate-Once (YOPO) [zhang2019you] is an accelerated version of PGD adversarial training. When generating the PGD adversarial examples for a multi-layer network, it treats the derivative of the first layer as a constant, so there is no need to run the whole back-propagation process in every iteration. The training time is thus remarkably reduced.
deeprobust.image.defense.trades The work [zhang2019theoretically] proposes an adversarial training strategy that encourages clean samples and adversarial examples to be close in feature space. Its training objective is to minimize the loss:
(7) $\min_{f} \ \mathbb{E}\Big\{ \mathcal{L}(f(x), y) + \max_{\|x' - x\| \leq \epsilon} \mathcal{L}(f(x), f(x')) / \lambda \Big\}$
This loss function can be divided into two parts: the first part is the natural loss, while the second part aims to minimize the distance between the classifier outputs for examples that are close in input space. Similar to the PGD adversarial training strategy, in each step it first solves the inner maximization problem to find an optimal $x'$, and then updates the model parameters to minimize the outer loss value.
4.2.3 Gradient Masking
deeprobust.image.defense.TherEncoding Thermometer encoding [buckman2018thermometer] is one way to mask the gradient information of DNN models, in order to prevent the attacker from finding successful adversarial examples. It uses a preprocessor to discretize an image's pixel value $x_i$ into an $l$-dimensional vector $\tau(x_i)$ (e.g., when $l = 10$, $\tau(0.66) = 1111110000$). The vector $\tau(x_i)$ acts as a “thermometer” that records the value of pixel $x_i$.
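A minimal sketch of the discretization, assuming the convention that entry $k$ of the code (for $k = 1, \dots, l$) is 1 when the pixel value exceeds $k / l$; the function name and this thresholding convention are illustrative assumptions.

```python
def thermometer_encode(pixel, levels=10):
    """Entry k (k = 1..levels) is 1 when pixel > k / levels, else 0."""
    return [1 if pixel > k / levels else 0 for k in range(1, levels + 1)]


code = thermometer_encode(0.66)
```

The encoding is piecewise constant in the pixel value, so its derivative is zero almost everywhere, which is what hides useful gradient information from a gradient-based attacker.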
4.2.4 Detection
deeprobust.image.defense.LIDclassifier Local Intrinsic Dimensionality (LID) detection [ma2018characterizing] trains a classifier to distinguish adversarial examples from normal examples based on LID features. Starting from a sample, it counts the number of data points within a ball of a given radius; the LID features measure the growth rate of this count as the radius increases.
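The growth rate can be estimated from nearest-neighbor distances. The sketch below uses the common maximum-likelihood estimator, $\widehat{\mathrm{LID}}(x) = -\big(\tfrac{1}{k}\sum_{i=1}^{k} \log \tfrac{r_i(x)}{r_k(x)}\big)^{-1}$, where $r_i(x)$ is the distance from $x$ to its $i$-th nearest neighbor. The synthetic data and the function name are assumptions made for illustration.

```python
import numpy as np


def lid_mle(query, reference, k):
    """Maximum-likelihood LID estimate from the k nearest reference points."""
    dists = np.sort(np.linalg.norm(reference - query, axis=1))
    dists = dists[dists > 0][:k]  # ignore exact duplicates of the query
    return -1.0 / np.mean(np.log(dists / dists[-1]))


rng = np.random.default_rng(0)
n, k = 5000, 200
# Points on a 1-D line and on a 2-D plane, both embedded in 5 dimensions.
line = np.zeros((n, 5))
line[:, 0] = rng.uniform(-1, 1, n)
plane = np.zeros((n, 5))
plane[:, :2] = rng.uniform(-1, 1, (n, 2))
origin = np.zeros(5)
lid_line = lid_mle(origin, line, k)
lid_plane = lid_mle(origin, plane, k)
```

The estimate is close to 1 for the line and close to 2 for the plane, even though both sets live in the same 5-dimensional space. A detector would compute such features for each input and feed them to a classifier.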
5 Graph Package
The design of the graph package is slightly different from that of the image package. Specifically, the graph package includes three main components: targeted attack, untargeted attack and defense. For these algorithms, the supported networks and datasets are listed as follows:
Supported network:

GCN
Supported datasets:

Cora

Cora-ML

Citeseer

Polblogs

Pubmed
More details about adversarial attacks and defenses on graphs can be found in [jin2020adversarial]. In the following, we describe the details of the various subpackages.
5.1 Targeted Attack Subpackage
deeprobust.graph.targeted_attack This module introduces the API for targeted attack methods in the graph package. In total, this package covers five algorithms: FGA [chen2018fga], Nettack [nettack], RL-S2V [rls2v], IGAttack [deepinsightjaccard] and RND [nettack].
deeprobust.graph.targeted_attack.fga FGSM [goodfellow2014explaining] can also be applied to attack graph data but it needs some modification to fit into the binary nature of graph data. One representative method to solve this problem is FGA [chen2018fga]. Basically, FGA first calculates the gradient of attack loss with respect to the graph structure and greedily chooses the perturbation with largest gradient.
deeprobust.graph.targeted_attack.nettack The work [nettack] proposes an attack method called Nettack to generate structure and feature attacks on graphs. Nettack first selects perturbation candidates that would not violate the degree distribution and feature co-occurrence of the original graph. Then it greedily chooses the perturbation with the largest score to modify the graph. By repeating this until the perturbation budget is reached, it obtains the final modified graph.
deeprobust.graph.targeted_attack.rl_s2v
Reinforcement learning is introduced to perform black-box queries on the victim model. RL-S2V [rls2v] employs reinforcement learning to generate adversarial attacks on graph data under the black-box setting. It models the attack procedure as a Markov Decision Process (MDP) in which the attacker is allowed to modify a limited number of edges to change the predicted label of the target node. Further, the Q-learning algorithm [mnih2013playing] is adopted to solve the MDP and guide the attacker in modifying the graph.
deeprobust.graph.targeted_attack.ig_attack Due to the discrete nature of graph data, precisely approximating the gradient of adversarial perturbations is a big challenge. To address this issue, IG attack [deepinsightjaccard] suggests using integrated gradients [sundararajan2017axiomaticig] to better search for adversarial edge and feature perturbations. During the attack, the attacker iteratively chooses the edge or feature that has the strongest effect on the adversarial objective.
deeprobust.graph.targeted_attack.rnd RND is a baseline attack method used in [nettack]. Based on the assumption that connecting nodes with unequal class labels hinders classification, it modifies the graph structure sequentially. Specifically, given the target node, in each step it randomly samples nodes whose labels differ from that of the target node and connects them to the target node in the graph.
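A minimal sketch of this baseline on an edge-set representation of the graph. The function name, the edge-set format with (smaller id, larger id) tuples and the fixed seed are assumptions for illustration only.

```python
import random


def rnd_attack(edges, labels, target, n_perturbations, seed=0):
    """Connect the target node to randomly chosen nodes of a different class."""
    rng = random.Random(seed)
    edges = set(edges)
    # Candidates: differently labeled nodes not yet connected to the target.
    candidates = [v for v in range(len(labels))
                  if labels[v] != labels[target]
                  and (min(v, target), max(v, target)) not in edges]
    budget = min(n_perturbations, len(candidates))
    for v in rng.sample(candidates, budget):
        edges.add((min(v, target), max(v, target)))
    return edges


labels = [0, 0, 1, 1, 1]
clean_edges = {(0, 1), (2, 3)}
attacked_edges = rnd_attack(clean_edges, labels, target=0, n_perturbations=2)
```

Only edge insertions incident to the target node are made, and the original edge set is left unchanged because the function works on a copy.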
5.2 Untargeted Attack Subpackage
deeprobust.graph.global_attack This module introduces the API for untargeted attack methods in the graph package. Currently, this package covers four algorithms: Metattack [metattack], PGD [xu2019topologyattack], Minmax [xu2019topologyattack] and DICE [waniek2018hidingdice].
deeprobust.graph.global_attack.metattack Metattack [metattack] is an untargeted poisoning attack that modifies the graph structure. It treats the graph structure matrix as a hyperparameter and calculates the meta-gradient of the loss function with respect to the graph structure. A greedy approach is then applied to select the perturbation based on the meta-gradient.
deeprobust.graph.global_attack.topology_attack The work [xu2019topologyattack] considers two different settings: (1) attacking a fixed GNN and (2) attacking a re-trainable GNN. For attacking a fixed GNN, it uses the Projected Gradient Descent (PGD) algorithm of [madry2017towards] to search for the optimal structure perturbation; this is called the PGD attack. For re-trainable GNNs, the attack problem is formulated in a min-max form, where the inner maximization can be solved by gradient ascent and the outer minimization can be solved by PGD; this is called the Min-max attack.
deeprobust.graph.global_attack.dice DICE [waniek2018hidingdice] stands for “delete internally, connect externally”: it randomly connects nodes with different labels or drops edges between nodes sharing the same label. Note that DICE is a white-box attack and is widely used as a baseline when comparing the performance of untargeted attacks.
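The heuristic can be sketched on an edge-set representation. The even split between deletions and insertions, the (smaller id, larger id) edge format and the function name are assumptions made for this illustration.

```python
import random


def dice_attack(edges, labels, n_perturbations, seed=0):
    """Randomly delete same-label edges or insert different-label edges."""
    rng = random.Random(seed)
    edges = set(edges)
    n = len(labels)
    for _ in range(n_perturbations):
        internal = sorted(e for e in edges if labels[e[0]] == labels[e[1]])
        if internal and rng.random() < 0.5:
            edges.remove(rng.choice(internal))  # delete internally
        else:
            external = [(u, v) for u in range(n) for v in range(u + 1, n)
                        if labels[u] != labels[v] and (u, v) not in edges]
            if external:
                edges.add(rng.choice(external))  # connect externally
    return edges


labels = [0, 0, 0, 1, 1, 1]
clean_edges = {(0, 1), (1, 2), (3, 4), (4, 5), (2, 3)}
attacked_edges = dice_attack(clean_edges, labels, n_perturbations=4)
```

Every removed edge joins two nodes of the same class and every added edge joins two nodes of different classes, so the perturbation needs access to the labels but not to the model.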
5.3 Defense Subpackage
5.3.1 Adversarial Training
deeprobust.graph.defense.adv_training Since adversarial training is a widely used countermeasure against adversarial attacks in the image domain [goodfellow2014explaining], we can also adopt this strategy to defend against graph adversarial attacks. The min-max formulation of adversarial training involves two processes: (1) generating perturbations that maximize the prediction loss and (2) updating model parameters to minimize the prediction loss. By alternating these two processes iteratively, we can train a model that is robust against adversarial attacks. Since there are two inputs for graphs, i.e., the adjacency matrix and the attribute matrix, adversarial training can be done on them separately.
5.3.2 Preprocessing
deeprobust.graph.defense.gcn_jaccard The work [deepinsightjaccard] proposes a preprocessing method based on two empirical observations about attack methods: (1) attackers usually prefer adding edges over removing edges or modifying features, and (2) attackers tend to connect dissimilar nodes. Based on these findings, they propose a defense that eliminates the edges whose two end nodes have a small Jaccard similarity [said2010social].
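A minimal sketch of this preprocessing step, assuming binary node features stored as sets of active feature indices. The function names, the data layout and the threshold value are illustrative and do not reflect the library's interface.

```python
def jaccard_similarity(a, b):
    """|a intersect b| / |a union b| for two sets of active feature indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_dissimilar_edges(edges, features, threshold):
    """Keep only the edges whose end nodes are at least `threshold` similar."""
    return [(u, v) for u, v in edges
            if jaccard_similarity(features[u], features[v]) >= threshold]


features = {0: {1, 2, 3}, 1: {2, 3, 4}, 2: {7, 8}}
edges = [(0, 1), (0, 2)]
kept = drop_dissimilar_edges(edges, features, threshold=0.1)
```

Nodes 0 and 1 share two of four features (similarity 0.5) and stay connected, while nodes 0 and 2 share none, so that edge is removed before the GCN is trained on the cleaned graph.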
deeprobust.graph.defense.gcn_svd It is observed that Nettack [nettack] generates perturbations that mainly change the small singular values of the graph adjacency matrix [entezari2020allsvd]. Thus, it is proposed to preprocess the perturbed adjacency matrix with truncated SVD to obtain its low-rank approximation.
5.3.3 Attention Mechanism
deeprobust.graph.defense.rgcn Different from the above preprocessing methods, which try to exclude adversarial perturbations, RGCN [rgcn] aims to train a robust GNN model by penalizing the model's weights on adversarial edges or nodes. Based on the assumption that adversarial nodes may have high prediction uncertainty, they propose to model the hidden representations of nodes as Gaussian distributions with a mean and a variance, where the uncertainty is reflected in the variance. When aggregating information from neighbor nodes, it applies an attention mechanism to penalize nodes with high variance.
6 Hands-on Case Studies
In this section, we give concrete examples to illustrate how to use this repository. For each type of method, we provide one demo.
6.1 Image Case Studies
6.1.1 Train Network
In deeprobust.image.netmodels, we provide several deep network architectures. Call train() to train a model.
6.1.2 Attack
To launch an attack, the first step is to import the desired attack class from deeprobust.image.attack. Then, we initialize a victim model and create a dataloader that contains the test images from which adversarial examples will be generated. Finally, we feed the model and data to the attack method. The output is the adversarial examples.
6.1.3 Defense
Defense methods can be imported from deeprobust.image.defense. We need to feed a model structure and a dataloader to the defense method. The output is an adversarially trained model and its performance on both clean data and adversarial data.
6.1.4 Evaluation
We provide simple access to evaluate the performance of attacks against defenses.
6.2 Graph Case Studies
6.2.1 Attack Graph Neural Networks
We show an example of attacking graph neural networks. We use a linearized GCN as the surrogate model and apply untargeted Metattack to generate a perturbed graph on the Cora citation dataset.
First, we import the packages we are going to use at the head of the code and load the Cora dataset.
Then, we set up the surrogate model to be attacked.
Finally, we use Metattack to generate perturbations to attack the surrogate model. Here the variable modified_adj is the perturbed graph generated by Metattack.
6.2.2 Defend Graph Adversarial Attacks
We show an example of defending against graph adversarial attacks. We use Metattack as the attack method and GCNJaccard as the defense method.
First, we import all the packages we need and load the clean graph and the pre-attacked graph of the Cora dataset.
Then, we set up the defense model GCNJaccard and test its performance on the perturbed graph.
As a comparison, we can also set up a GCN model and test its performance on the perturbed graph.
7 Conclusion
Our main goal is to provide a comprehensive, easy-to-use platform for researchers who are interested in adversarial attacks and defenses. In the future, we plan to support larger datasets and more model architectures. Moreover, we will keep adding the newest models and updating this repository.
References
Appendix A Environment Dependencies
Dependency  Version 

torch  1.2.0 
torchvision  0.4.0 
numpy  1.17.1 
matplotlib  3.1.1 
scipy  1.3.1 
Pillow  7.0.0 
scikit_learn  0.22.1 
skimage  0 
tensorboardX  2 
tqdm  4.42.1 
texttable  1.6.2 
numba  0.48.0 