DeepRobust: A PyTorch Library for Adversarial Attacks and Defenses

Yaxin Li et al., Michigan State University. 05/13/2020.

DeepRobust is a PyTorch adversarial learning library which aims to build a comprehensive and easy-to-use platform to foster this research field. It currently contains more than 10 attack algorithms and 8 defense algorithms in the image domain, and 9 attack algorithms and 4 defense algorithms in the graph domain, under a variety of deep learning architectures. In this manual, we introduce the main contents of DeepRobust with detailed instructions. The library is kept updated and can be found at https://github.com/DSE-MSU/DeepRobust.


1 Introduction

Deep learning has advanced numerous machine learning tasks such as image classification, speech recognition and graph representation learning. Since deep learning has been increasingly adopted by real-world safety-critical applications such as autonomous driving, healthcare and education, it is crucial to examine its vulnerability and safety issues. Szegedy et al. [szegedy2013intriguing] first found that Deep Neural Networks (DNNs) are vulnerable to small, deliberately designed perturbations, which are called adversarial perturbations. Figure 1 demonstrates adversarial examples in the image and graph domains. Since then, tremendous efforts have been made on developing attack methods to fool DNNs and on designing countermeasures against them. As a result, there is a growing need for a comprehensive platform for adversarial attacks and defenses. Such a platform enables us to systematically experiment with existing algorithms and efficiently test new ones, which can deepen our understanding and thus immensely foster this research field.

Figure 1: An illustration of adversarial examples: (a) an image adversarial example; (b) a graph adversarial example.

Several libraries already exist in this research field, such as Cleverhans [papernot2018cleverhans] and advertorch [ding2019advertorch]. They mainly focus on attack methods in the image domain, while little attention has been paid to defense methods. Furthermore, the majority of them are dedicated to the image domain and largely ignore other domains such as graph-structured data. Our library DeepRobust not only provides representative attack and defense methods in the image domain, but also covers algorithms for graph data. The repository contains both classic and state-of-the-art algorithms.

The remainder of this report is organized as follows. Section 2 introduces key concepts of adversarial attacks and defenses. Section 3 gives an overview of the DeepRobust library. Sections 4 and 5 introduce the mathematical background and implementation details of the algorithms in the library for the image and graph domains, respectively. Section 6 provides concrete examples to demonstrate how to use the library.

2 Foundations of Adversarial Attacks and Defenses

The main goal of attack algorithms is to add imperceptible perturbations to data that lead a classifier, which normally performs well on clean data, to wrong predictions. The study of attacks can be categorized from different perspectives, such as the attacker's goal and the attacker's ability.

According to attackers’ goal, attack methods can be categorized as follows:

  • Poisoning Attack vs. Evasion Attack
    Poisoning Attack:
    Poisoning attacks refer to attack algorithms that allow an attacker to insert or modify several fake samples in the training data of a DNN. Training with these fake samples can cause poor performance on the test data.

    Evasion Attack: In an evasion attack, the victim classifier is fixed and normally performs well on benign test samples. The adversary has no authority to change the classifier or its parameters; instead, it crafts fake samples that the classifier cannot correctly classify yet does not flag as unusual inputs. In other words, the adversary generates fraudulent examples to evade detection by the classifier.

  • Targeted Attack vs. Non-Targeted Attack
    Targeted Attack: Given a victim sample (x, y), where x is the feature vector and y is the ground-truth label of x, the adversary aims to induce the classifier to assign a specific (wrong) label t to the perturbed sample x'.

    Non-Targeted Attack: If no specific target label t is given, an adversarial example is considered a successful attack as long as it is classified with any wrong label.

According to attackers’ ability, attack methods can be grouped as follows:

  • White-Box attack: the adversary has access to all the information of the target model, including its architecture, parameters, gradients, etc.

  • Black-Box attack: In a black-box attack setting, the inner configuration of the target model is unavailable to the adversary. Such methods are often based on a large number of queries to the target model.

  • Grey-Box attack: In a grey-box attack setting, the attacker trains a generative model for producing adversarial examples in a white-box setting. Once the generative model is trained, it can be used to craft adversarial examples in a black-box setting.

To mitigate the risk of adversarial attacks, different countermeasures have been investigated, falling into four main categories. The first is robustness optimization, i.e., adversarial training, which retrains the model with adversarial examples. The second is adversarial example detection, whose goal is to distinguish adversarial examples from the data distribution. The third is gradient masking; this type of defense mainly includes pre-processing methods that hide gradient information in order to make the optimization in the attack process much harder. The last type, provable defense, has gradually become an important stream of defense; these methods aim to provide guarantees on adversarial robustness.

3 An Overview of DeepRobust

In this section, we provide an overview of the DeepRobust library, including its environment requirements and overall design.

3.1 Environment Requirements and Setup

DeepRobust runs on Python and PyTorch; all dependencies and tested versions are listed in Appendix A. After downloading the repository, run setup.py to install DeepRobust into the local Python environment.

3.2 The Overview Design

This repository mainly includes two components – the image component and the graph component as below. The directory structure can be found in Appendix B.

  • Image package

    • attack

    • defense

    • netmodels

    • evaluation

    • configs

  • Graph package

    • targeted_attack

    • global_attack

    • defense

    • data

The Image Component: According to its function, the image component is divided into several sub-packages. The attack sub-package includes the attack base class and attack algorithms. The defense sub-package contains the defense base class and defense algorithms. In Section 4, we give a detailed introduction to each algorithm. Netmodels contains different network model classes; users can simply generate a victim model by instantiating one of them. Through the evaluation program, we provide an easy-to-use API to test attacks against defenses. All the default parameters are saved in configs.

The Graph Component: The graph component contains several sub-packages based on their functions. The targeted-attack sub-package includes the targeted attack base class and well-known targeted attack algorithms. Similarly, the global attack base class and global (untargeted) attack algorithms are included in the global-attack sub-package. The defense sub-package contains the GCN model and other methods for defending against graph adversarial attacks. Besides, the data sub-package provides easy access to public benchmark datasets, including Cora, Cora-ML, Citeseer, Polblogs and Pubmed, as well as pre-attacked graph data.

4 Image Component

In this section, we aim to give an introduction of the interface for attack and defense methods in the image domain. Meanwhile, we provide algorithm description and implementation details for each algorithm. A comprehensive survey about attack and defense methods in the image domain can be found in [xu2019adversarial].

4.1 Attacks

This subsection introduces the API for the attack methods in the image domain. Currently, this package covers nine representative attack algorithms: LBFGS [szegedy2013intriguing], FGSM [goodfellow2014explaining], PGD [madry2017towards], CW [carlini2017towards], one-pixel [su2019one], DeepFool [moosavi2016deepfool], BPDA [athalye2018obfuscated], Universal [moosavi2017universal] and Nattack [li2019nattack].

For these algorithms, we currently support the following neural networks and datasets.

Supported networks:

  • CNN

  • ResNet-18/34

  • DenseNet

  • VGG-11/13/16/19

Supported datasets:

  • MNIST

  • CIFAR10

4.1.1 Attack Base Class

deeprobust.image.attack.base_attack

In order to make further development flexible and extensible, we organize the functions shared by different methods into one module as the attack base class. The main body of each algorithm is overridden in the corresponding subclass. The functions contained in this class are described below; a minimal sketch of a custom attack built on this interface follows the list.

  • __init__(self, model, device='cuda')

    Initialization is completed in this function.
    Parameters:

    • model: the victim model.

    • device: whether the program is run on GPU or CPU.

  • check_type_device(self, image, label, **kwargs)

    The main purpose for this function is to convert the input into a unified data type so that they can be correctly used in the algorithm procedure.
    Parameters:

    • image: clean input.

    • label: ground truth label corresponding to the clean input.

    • **kwargs: optional input dependent on each derived class.

  • parse_params(self, **kwargs)

    This function provides the interface for user-defined parameters.
    Parameters:

    • **kwargs: optional input dependent on each derived class.

  • generate(self, image, label, **kwargs)

    Call generate() to launch the attack algorithms.
    Parameters:

    • **kwargs: optional input. Parameters for the attack algorithms.
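
A minimal sketch of a custom attack written against this interface is shown below. It is not part of the library: the single gradient-sign step and the epsilon parameter are purely illustrative, and we assume the base class is exposed as BaseAttack in deeprobust.image.attack.base_attack and stores the model and device as self.model and self.device.

import torch
import torch.nn.functional as F
from deeprobust.image.attack.base_attack import BaseAttack

class MyAttack(BaseAttack):
    def __init__(self, model, device='cuda'):
        super(MyAttack, self).__init__(model, device)

    def parse_params(self, epsilon=0.1, **kwargs):
        # read user-defined parameters passed through generate(**kwargs)
        self.epsilon = epsilon
        return True

    def generate(self, image, label, **kwargs):
        self.check_type_device(image, label)      # unify input type and device
        self.parse_params(**kwargs)
        image = image.to(self.device).clone().detach().requires_grad_(True)
        label = label.to(self.device)
        loss = F.cross_entropy(self.model(image), label)
        grad = torch.autograd.grad(loss, image)[0]
        # one gradient-sign step, clipped back to the valid pixel range [0, 1]
        return (image + self.epsilon * grad.sign()).clamp(0, 1).detach()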

4.1.2 Attack algorithms

deeprobust.image.attack.lbfgs  The L-BFGS attack [szegedy2013intriguing] is the seminal work that drew attention to neural networks' vulnerability to small perturbations. It tries to find a minimally distorted adversarial example by solving an intuitive box-constrained optimization problem:

(1)   \min_{x'} \|x' - x\|_2^2 \quad \text{s.t.} \quad C(x') = t, \; x' \in [0, 1]^m

where x' is the adversarial example, x is the clean example, C is the classifier and t is the target class. We aim to find an adversarial example that is classified as the target class while staying close to the clean image. From the implementation perspective, solving this optimization problem with two constraints is hard. Thus, the work turns to an alternative problem: it performs a binary search on a parameter c that balances the trade-off between the perturbation constraint and attack success on the target class, and then uses the L-BFGS algorithm to obtain an approximate solution:

(2)   \min_{x'} \; c\,\|x' - x\|_2^2 + L(\theta, x', t) \quad \text{s.t.} \quad x' \in [0, 1]^m

where L(\theta, x', t) denotes the loss of x' with respect to the target class t.


deeprobust.image.attack.fgsm  The Fast Gradient Sign Method (FGSM) [goodfellow2014explaining] is a one-step attack. The intuition is to move the input sample along the gradient direction so as to increase the loss with respect to the ground-truth label; the perturbed sample can then fool the network with high confidence. To guarantee that the adversarial example stays in a small neighborhood of the starting point x, the gradient step is followed by a clip operation. The process is formulated as:

(3)   x' = \text{Clip}_{x,\epsilon}\big(x + \epsilon \cdot \text{sign}(\nabla_x L(\theta, x, y))\big)

Here, \text{Clip}_{x,\epsilon} denotes a function that projects its argument onto the \epsilon-neighborhood ball around x.


deeprobust.image.attack.pgd  Projected Gradient Descent (PGD) [madry2017towards] is an iterative version of the FGSM attack. The update for generating x^{t+1} is:

(4)   x^{t+1} = \text{Proj}_{x,\epsilon}\big(x^{t} + \alpha \cdot \text{sign}(\nabla_x L(\theta, x^{t}, y))\big)

It chooses the original image as the starting point x^0 = x. The PGD attack creates strong adversarial examples and is often used as a baseline attack for evaluating defense methods.
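
To make the projected iteration in Eq. (4) concrete, below is a minimal sketch (not DeepRobust's implementation); the step size alpha, budget epsilon and number of steps are illustrative.

import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, epsilon=0.3, alpha=0.01, steps=40):
    # iterate gradient-sign steps and project back into the
    # l_inf epsilon-ball around the clean input x
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # projection onto the epsilon-ball, then onto the valid pixel range
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0, 1)
    return x_adv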


deeprobust.image.attack.deepfool  The DeepFool attack [moosavi2016deepfool] aims to find the shortest path that moves a data point x_0 across the decision boundary. It starts from a binary linear classifier whose decision hyperplane is F = \{x : w^{\top} x + b = 0\}. The minimum perturbation is then the distance from the data point to the hyperplane:

(5)   r^{*} = -\frac{w^{\top} x_0 + b}{\|w\|_2^2}\, w

This calculation can be extended to general classifiers and to \ell_p norm-constrained perturbations. Compared to methods such as FGSM and PGD, DeepFool attacks successfully with smaller perturbations.


deeprobust.image.attack.cw  Carlini and Wagner's attack [carlini2017towards] aims to solve the same problem as the L-BFGS attack, namely finding the minimally distorted perturbation. It addresses the problem by instead solving:

(6)   \min_{x'} \; \|x' - x\|_2^2 + c \cdot f(x', t)

where f is defined as f(x', t) = \max\big(\max_{i \neq t} Z(x')_i - Z(x')_t, \, -\kappa\big). Here, Z denotes the logits function. Minimizing f encourages the algorithm to find an x' whose score for class t is larger than that of any other label, so that the classifier predicts x' as class t. Next, by applying a line search on the constant c, it finds the x' that has the smallest distance to x.
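
For reference, the margin term f can be computed directly from the logits; the helper below is an illustrative sketch (not the library's code), with kappa denoting the confidence margin.

import torch
import torch.nn.functional as F

def cw_margin(logits, target, kappa=0.0):
    # f(x', t) = max(max_{i != t} Z(x')_i - Z(x')_t, -kappa):
    # becomes -kappa once class t leads every other logit by at least kappa
    onehot = F.one_hot(target, num_classes=logits.size(1)).bool()
    target_logit = logits[onehot]                                   # Z(x')_t
    other_best = logits.masked_fill(onehot, float('-inf')).max(dim=1).values
    return torch.clamp(other_best - target_logit, min=-kappa)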


deeprobust.image.attack.universal  The previous methods only consider one specific victim sample x. In contrast, the work [moosavi2017universal] devises an algorithm that successfully misleads a classifier's decision on almost all test images. It tries to find a single perturbation \delta satisfying:

  1. \|\delta\|_p \leq \epsilon.

  2. \mathbb{P}_{x \sim D}\big(C(x + \delta) \neq C(x)\big) \geq 1 - \sigma.

Condition 1 constrains the norm of the perturbation, and condition 2 sets a threshold on the probability of misclassification. In effect, the goal is to find an imperceptible perturbation \delta such that the classifier gives wrong decisions on most of the samples.


deeprobust.image.attack.onepixel  The one-pixel attack [su2019one] constrains the perturbation by the \ell_0 norm instead of the \ell_2 or \ell_\infty norm; that is, it finds the minimum number of pixels to perturb. A differential evolution (DE) algorithm is applied to solve this optimization problem.


deeprobust.image.attack.bpda  Backward Pass Differentiable Approximation (BPDA) [athalye2018obfuscated] is a technique to attack defenses whose gradients are not readily available. BPDA solves this problem by finding an approximation function for the non-differentiable layer and calculating the gradient through the approximation function.


deeprobust.image.attack.nattack  Nattack [li2019nattack] is a black-box attack that tries to find a probability density distribution over a small region centered around the input, such that a sample drawn from this distribution is likely to be an adversarial example.

It finds this distribution in an intuitive way. First, it initializes the distribution with random parameters. Then, it draws several samples from this distribution, feeds them to the neural network, and computes the average loss of those samples; the average loss is thus a function of the distribution parameters. Finally, it performs gradient descent on the average loss to update the distribution parameters, and iterates this process until the attack succeeds.

4.2 Defense Subpackage

This subsection introduces the API for the defense methods in the image domain. Currently, this package covers three categories of defense methods: adversarial training, gradient masking and detection.

4.2.1 Defense Base Class

deeprobust.image.defense.base_defense

This module is the base class of the adversarial training algorithms and provides basic components for defense methods. The following functions are contained in this class; a conceptual sketch of how they fit together is given after the list:

  • __init__(self, model, device)

    Parameter initialization is completed in this function.
    Parameters:

    • model: the attack victim model.

    • device: whether the program is run on GPU or CPU.

  • parse_params(self, **kwargs)

    This function provides the interface for user defined parameters.
    Parameters:

    • **kwargs: optional input dependent on each derived class.

  • generate(self, train_loader, test_loader, **kwargs)

    Call generate() to launch the defense.
    Parameters:

    • **kwargs: optional input dependent on each derived class.

  • loss(self, output, target)

    Calculate the training loss. This function will be overridden by each defense class according to the algorithms requirements.
    Parameters:

    • output: model output.

    • target: ground truth label.

  • adv_data(self, model, data, target, **kwargs)

    Generate adversarial training samples for robust training. This function will be overridden by each defense class according to algorithms requirements.
    Parameters:

    • model: The victim model used to generate the adversarial example.

    • data: clean data.

    • target: target label

    • **kwargs: optional parameters.

  • train(self, train_loader, optimizer, epoch)

    Call train() to train the robust model.
    Parameters:

    • train_loader: the dataloader used to train robust model.

    • optimizer: the optimizer for the training process.

    • epoch: maximum epoch for the training process.

    • **kwargs: optional input dependent on each derived class.

  • test(self, test_loader)

    Call test() to test the robust model.
    Parameters:

    • test_loader: the dataloader used to test model.
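
To see how these pieces typically fit together, the loop below is a conceptual sketch rather than the library's actual train() implementation: adv_data() crafts adversarial training samples for the current model state and loss() supplies the defense-specific objective. It assumes the base class keeps the model and device as attributes.

import torch

def train_one_epoch(defense, train_loader, optimizer, epoch):
    # conceptual adversarial-training loop built on the base-class interface
    defense.model.train()
    for data, target in train_loader:
        data, target = data.to(defense.device), target.to(defense.device)
        data_adv = defense.adv_data(defense.model, data, target)  # craft adversarial batch
        optimizer.zero_grad()
        output = defense.model(data_adv)
        loss = defense.loss(output, target)                       # defense-specific loss
        loss.backward()
        optimizer.step()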

4.2.2 Adversarial Training

deeprobust.image.defense.fgsmtraining  FGSM adversarial training [goodfellow2014explaining] aims to improve the model's robustness (i.e., its accuracy on adversarial examples) by training with adversarial examples. It generates adversarial examples in each iteration and updates the model parameters using these adversarial examples.


deeprobust.image.defense.fast  Fast [wong2020fast] is an improved version of FGSM adversarial training. This work finds that by simply adding random initialization to the generation of adversarial training samples, model robustness improves significantly.


deeprobust.image.defense.pgdtraining  PGD adversarial training [madry2017towards] uses adversarial examples generated by PGD instead of FGSM to train the model, and achieves overall higher robustness.


deeprobust.image.defense.YOPO  You-Only-Propagate-Once (YOPO) [zhang2019you] is an accelerated version of PGD adversarial training. When generating the PGD adversarial examples for a multi-layer network, it approximates the derivative with respect to the first layer as a constant, so there is no need to run the whole back-propagation in every inner iteration. Thus, the training time is remarkably reduced.


deeprobust.image.defense.trades  The work [zhang2019theoretically] proposes an adversarial training strategy, TRADES, which encourages clean samples and their adversarial examples to be close in the output space. Its training objective is to minimize the loss:

(7)   \min_{f} \; \mathbb{E}\Big\{ L\big(f(x), y\big) + \max_{\|x' - x\| \leq \epsilon} L\big(f(x), f(x')\big) / \lambda \Big\}

This loss function can be divided into two parts: the first part is the natural loss, while the second part minimizes the distance between the classifier outputs for examples that are close in the input space, with \lambda balancing the two terms. Similar to the PGD adversarial training strategy, in each step it first solves the inner maximization problem to find an optimal x', and then updates the model parameters to minimize the outer loss.
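
A sketch of the objective in Eq. (7), using the KL divergence between clean and adversarial predictions as the surrogate loss as in the TRADES paper; beta plays the role of 1/\lambda and its value here is illustrative, and this is not the library's exact code.

import torch.nn.functional as F

def trades_loss(model, x, x_adv, y, beta=6.0):
    # natural loss + beta * KL(clean prediction || adversarial prediction)
    natural_loss = F.cross_entropy(model(x), y)
    robust_loss = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                           F.softmax(model(x), dim=1),
                           reduction='batchmean')
    return natural_loss + beta * robust_loss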

4.2.3 Gradient Masking

deeprobust.image.defense.TherEncoding  Thermometer encoding [buckman2018thermometer] is one way to mask the gradient information of DNN models, in order to prevent the attacker from finding successful adversarial examples through gradient-based optimization. It uses a preprocessor to discretize each pixel value into an l-dimensional binary vector that acts as a "thermometer" recording the pixel's value: the entries up to the pixel's quantization level are set to 1 and the rest to 0.
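
The discretization step can be sketched as follows (illustrative only; the exact quantization convention and the training of the network on encoded inputs follow the original paper).

import torch

def thermometer_encode(x, l=10):
    # encode pixel values x in [0, 1] as l-dimensional 0/1 "thermometer" vectors;
    # under this convention, l=10 maps 0.66 to 1111110000
    levels = torch.arange(1, l + 1, dtype=x.dtype, device=x.device) / l
    return (x.unsqueeze(-1) >= levels).to(x.dtype)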

4.2.4 Detection

deeprobust.image.defense.LIDclassifier  Local Intrinsic Dimensionality (LID) detection [ma2018characterizing] trains a classifier to distinguish adversarial examples from normal examples based on LID features. Starting from a sample, LID characterizes the growth rate of the number of data points contained in a ball around the sample as the ball's radius increases.
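
For reference, the LID of a sample is commonly estimated from the distances to its k nearest neighbors with the maximum-likelihood estimator below; this is a sketch of the estimator rather than the library's feature-extraction code.

import numpy as np

def lid_mle(knn_distances):
    # maximum-likelihood LID estimate from a sample's k-NN distances:
    # LID = -( (1/k) * sum_i log(r_i / r_k) )^(-1)
    r = np.sort(np.asarray(knn_distances, dtype=float))
    return -1.0 / np.mean(np.log((r + 1e-12) / (r[-1] + 1e-12)))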

5 Graph Package

The design of the graph package is slightly different from that of the image package. Specifically, the graph package includes three main components: targeted attack, untargeted attack and defense. For these algorithms, the supported network and datasets are listed as follows.

Supported network:

  • GCN

Supported datasets:

  • Cora

  • Cora-ml

  • Citeseer

  • Polblogs

  • Pubmed

More details about graph adversarial attacks and defenses can be found in [jin2020adversarial]. In the following, we illustrate the details of the various sub-packages.

5.1 Targeted Attack Subpackage

deeprobust.graph.targeted_attack  This module introduces the API for targeted attack methods in the graph package. In total, this package covers 5 algorithms: FGA [chen2018fga], Nettack [nettack], RL-S2V [rl-s2v], IG-Attack [deep-insight-jaccard] and RND [nettack].


deeprobust.graph.targeted_attack.fga  FGSM [goodfellow2014explaining] can also be applied to attack graph data, but it needs modification to fit the binary nature of graph data. One representative method addressing this is FGA [chen2018fga]. Basically, FGA first calculates the gradient of the attack loss with respect to the graph structure and then greedily chooses the perturbation with the largest gradient.


deeprobust.graph.targeted_attack.nettack  The work [nettack] proposes an attack method called Nettack to generate structure and feature attacks on graphs. Nettack first selects possible perturbation candidates that would not violate the degree distribution and feature co-occurrence of the original graph. Then it greedily chooses the perturbation with the largest score to modify the graph. By repeating this until the perturbation budget is reached, it obtains the final modified graph.


deeprobust.graph.targeted_attack.rl_s2v  To perform black-box queries on the victim model, reinforcement learning is introduced. RL-S2V [rl-s2v] employs reinforcement learning to generate adversarial attacks on graph data under the black-box setting. It models the attack procedure as a Markov Decision Process (MDP) in which the attacker is allowed to modify a limited number of edges to change the predicted label of the target node. Further, the Q-learning algorithm [mnih2013playing] is adopted to solve the MDP and guide the attacker in modifying the graph.


deeprobust.graph.targeted_attack.ig_attack  Due to the discrete nature of graph data, precisely approximating the gradient of adversarial perturbations is a big challenge. To solve this issue, IG attack [deep-insight-jaccard] suggests using integrated gradients [sundararajan2017axiomatic-ig] to better search for adversarial edge and feature perturbations. During the attack, it iteratively chooses the edge or feature perturbation that has the strongest effect on the adversarial objective.


deeprobust.graph.targeted_attack.rnd  RND is a baseline attack method used in [nettack]. Based on the assumption that connections to nodes with different class labels hinder classification, it modifies the graph structure sequentially: given the target node, in each step it randomly samples nodes whose labels differ from the target node's and connects them to the target node in the graph.

5.2 Untargeted Attack Subpackage

deeprobust.graph.global_attack  This module introduces the API for untargeted attack methods in the graph package. Currently, this package covers 4 algorithms: Metattack [metattack], PGD [xu2019topology-attack], Min-max [xu2019topology-attack] and DICE [waniek2018hiding-dice].


deeprobust.graph.global_attack.metattack  Metattack [metattack] is an untargeted poisoning attack that modifies the graph structure. Basically, it treats the graph structure matrix as a hyper-parameter and calculates the meta-gradient of the loss function with respect to the graph structure. A greedy approach is then applied to select perturbations based on the meta-gradient.


deeprobust.graph.global_attack.topology_attack  The work [xu2019topology-attack] considers two different settings: (1) attacking a fixed GNN and (2) attacking a re-trainable GNN. For attacking a fixed GNN, it utilizes the Projected Gradient Descent (PGD) algorithm from [madry2017towards] to search for the optimal structure perturbation; this is called the PGD attack. For re-trainable GNNs, the attack problem is formulated in a min-max form where the inner maximization can be solved by gradient ascent and the outer minimization can be solved by PGD; this is called the Min-max attack.


deeprobust.graph.global_attack.dice  DICE [waniek2018hiding-dice] stands for "delete internally, connect externally": it randomly connects nodes with different labels or drops edges between nodes sharing the same label. Note that DICE is a white-box attack and is widely used as a baseline when comparing the performance of untargeted attacks.
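
Because the heuristic is so simple, a sketch helps: the toy function below (illustrative, operating on a dense symmetric 0/1 adjacency matrix, and not the library's implementation) applies the "delete internally, connect externally" rule under a perturbation budget.

import numpy as np

def dice_perturb(adj, labels, n_perturbations, seed=0):
    # each accepted perturbation either removes an intra-class edge
    # ("delete internally") or adds an inter-class edge ("connect externally")
    rng = np.random.default_rng(seed)
    adj = adj.copy()
    n = adj.shape[0]
    done = 0
    while done < n_perturbations:
        u, v = rng.integers(0, n, size=2)
        if u == v:
            continue
        if labels[u] == labels[v] and adj[u, v] == 1:
            adj[u, v] = adj[v, u] = 0
            done += 1
        elif labels[u] != labels[v] and adj[u, v] == 0:
            adj[u, v] = adj[v, u] = 1
            done += 1
    return adj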

5.3 Defense Subpackage

5.3.1 Adversarial Training

deeprobust.graph.defense.adv_training  Since adversarial training is a widely used countermeasure against adversarial attacks on image data [goodfellow2014explaining], we can also adopt this strategy to defend against graph adversarial attacks. The min-max optimization formulation indicates that adversarial training involves two processes: (1) generating perturbations that maximize the prediction loss and (2) updating model parameters to minimize the prediction loss. By alternating the above two processes iteratively, we can train a robust model against adversarial attacks. Since a graph has two inputs, i.e., the adjacency matrix and the attribute matrix, adversarial training can be performed on each of them separately.

5.3.2 Pre-processing

deeprobust.graph.defense.gcn_jaccard  The work [deep-insight-jaccard] proposes a preprocessing method based on two empirical observations about attack methods: (1) attackers usually prefer adding edges over removing edges or modifying features, and (2) attackers tend to connect dissimilar nodes. Based on these findings, it defends by eliminating edges whose two end nodes have a small Jaccard similarity [said2010social].
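
The preprocessing idea can be sketched as follows (illustrative only; it assumes dense 0/1 numpy arrays and a hypothetical similarity threshold, not the library's exact defaults).

import numpy as np

def jaccard_filter(adj, features, threshold=0.01):
    # drop edges whose two end nodes have binary-feature Jaccard similarity
    # below the threshold
    adj = adj.copy()
    rows, cols = np.nonzero(np.triu(adj, k=1))
    for u, v in zip(rows, cols):
        inter = np.count_nonzero(features[u] * features[v])
        union = np.count_nonzero(features[u] + features[v])
        similarity = inter / union if union > 0 else 0.0
        if similarity < threshold:
            adj[u, v] = adj[v, u] = 0
    return adj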

deeprobust.graph.defense.gcn_svd  It is observed that Nettack [nettack] generates perturbations that mainly affect the small singular values of the graph adjacency matrix [entezari2020all-svd]. Thus, this work proposes to preprocess the perturbed adjacency matrix with truncated SVD to obtain its low-rank approximation.
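
A sketch of this preprocessing step (illustrative; the rank k and the dense-matrix assumption are for exposition only, not the library's defaults):

import numpy as np

def svd_preprocess(adj, k=10):
    # replace the (dense) adjacency matrix with its rank-k approximation
    U, S, Vt = np.linalg.svd(adj)
    return (U[:, :k] * S[:k]) @ Vt[:k, :]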

5.3.3 Attention Mechanism

deeprobust.graph.defense.rgcn  Different from the above preprocessing methods, which try to exclude adversarial perturbations, RGCN [rgcn] aims to train a robust GNN model by penalizing the model's weights on adversarial edges or nodes. Based on the assumption that adversarial nodes may have high prediction uncertainty, it models the hidden representations of nodes as Gaussian distributions with a mean and a variance, where the uncertainty is reflected in the variance. When aggregating information from neighboring nodes, it applies an attention mechanism to penalize nodes with high variance.

6 Hands-on Case Studies

In this section, we give concrete examples to illustrate how to use this repository. For each type of method, we provide one demo.

6.1 Image Case Studies

6.1.1 Train Network

In deeprobust.image.netmodels, we provide several deep network architectures. Call train() to train a model.

import deeprobust.image.netmodels.train_model as trainmodel
trainmodel.train('CNN', 'MNIST', 'cuda', 20)

6.1.2 Attack

To launch an attack, the first step is to import the corresponding attack class from deeprobust.image.attack. Then, we initialize a victim model and create a dataloader containing the test images from which adversarial examples will be generated. Finally, we feed the model and data to the attack method; the output is the adversarial examples.

import torch
from torchvision import datasets, transforms
from deeprobust.image.attack.pgd import PGD
from deeprobust.image.config import attack_params
import deeprobust.image.netmodels.resnet as resnet

device = 'cuda'
model = resnet.ResNet18().to(device)
# path to a ResNet18 checkpoint trained on CIFAR10 (adjust to your own file)
model.load_state_dict(torch.load("./trained_models/CIFAR10_ResNet18_epoch_20.pt"))
model.eval()
transform_val = transforms.Compose([transforms.ToTensor()])
test_loader = torch.utils.data.DataLoader(
                datasets.CIFAR10('deeprobust/image/data',
                train=False, download=True, transform=transform_val),
                batch_size=10, shuffle=True)
x, y = next(iter(test_loader))
x = x.to(device).float()
y = y.to(device)
adversary = PGD(model, device)
Adv_img = adversary.generate(x, y, **attack_params['PGD_CIFAR10'])

6.1.3 Defense

Defense methods can be imported from deeprobust.image.defense. We need to feed a model structure and dataloaders to the defense method. The output is an adversarially trained model together with its performance on both clean data and adversarial data.

from deeprobust.image.defense.pgdtraining import PGDtraining
from deeprobust.image.config import defense_params
from deeprobust.image.netmodels.CNN import Net
import torch
from torchvision import datasets, transforms

model = Net()
train_loader = torch.utils.data.DataLoader(
               datasets.MNIST('deeprobust/image/defense/data',
               train=True, download=True,
               transform=transforms.Compose([transforms.ToTensor()])),
               batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(
              datasets.MNIST('deeprobust/image/defense/data',
              train=False,
              transform=transforms.Compose([transforms.ToTensor()])),
              batch_size=1000, shuffle=True)
defense = PGDtraining(model, 'cuda')
defense.generate(train_loader, test_loader, **defense_params["PGDtraining_MNIST"])

6.1.4 Evaluation

We provide simple access for evaluating the performance of an attack against a defense.

cd DeepRobust
# create a victim model
python examples/image/test_train.py
# evaluate the attack
python deeprobust/image/evaluation_attack.py --attack_method PGD \
    --attack_model CNN --dataset MNIST

6.2 Graph Case Studies

6.2.1 Attack Graph Neural Networks

We show an example of attacking graph neural networks. We use a linearized GCN as the surrogate model and apply the untargeted Metattack to generate a perturbed graph on the Cora citation dataset.

First, we import the packages we are going to use at the top of the script and load the Cora dataset.

import torch
import numpy as np
from deeprobust.graph.data import Dataset
from deeprobust.graph.defense import GCN
from deeprobust.graph.global_attack import Metattack

# load dataset
data = Dataset(root='/tmp/', name='cora', setting='nettack')
adj, features, labels = data.adj, data.features, data.labels
idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
# unlabeled nodes (validation + test), used by Metattack below
idx_unlabeled = np.union1d(idx_val, idx_test)

Then set up the surrogate model to be attacked.

# set up surrogate model
device = torch.device("cuda:0" \
                      if torch.cuda.is_available() else "cpu")
surrogate = GCN(nfeat=features.shape[1], nclass=labels.max().item()+1,
                nhid=16,with_relu=False, device=device)
surrogate = surrogate.to(device)
surrogate.fit(features, adj, labels, idx_train)

Then we use Metattack to generate perturbations and attack the surrogate model. The variable modified_adj is the perturbed graph generated by Metattack.

# use Metattack to generate attacks
model = Metattack(surrogate, nnodes=adj.shape[0],
                  feature_shape=features.shape, device=device)
model = model.to(device)
perturbations = int(0.05 * (adj.sum() // 2))  # set attack budget
model.attack(features, adj, labels, idx_train,
             idx_unlabeled, perturbations)
modified_adj = model.modified_adj

6.2.2 Defend Graph Adversarial Attacks

We show an example of defending against graph adversarial attacks, using Metattack as the attack method and GCN-Jaccard as the defense method.

First, we import all the packages we need and load both the clean graph and the pre-attacked graph of the Cora dataset.

import torch
import numpy as np
from deeprobust.graph.data import Dataset, PtbDataset
from deeprobust.graph.defense import GCN, GCNJaccard

np.random.seed(15)
# load clean graph
data = Dataset(root='/tmp/', name='cora', setting='nettack')
adj, features, labels = data.adj, data.features, data.labels
idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
# load pre-attacked graph generated by Metattack
perturbed_data = PtbDataset(root='/tmp/', name='cora')
perturbed_adj = perturbed_data.adj

Then we set up the defense model GCN-Jaccard and test its performance on the perturbed graph.

# Set up defense model and test performance
device = torch.device("cuda:0" \
                      if torch.cuda.is_available() else "cpu")
model = GCNJaccard(nfeat=features.shape[1], nclass=labels.max()+1,
                   nhid=16, device=device)
model = model.to(device)
model.fit(features, perturbed_adj, labels, idx_train)
model.eval()
output = model.test(idx_test)

For comparison, we can also set up a plain GCN model and test its performance on the perturbed graph.

# Test GCN on the perturbed graph
model = GCN(nfeat=features.shape[1], nclass=labels.max()+1,
            nhid=16, device=device)
model = model.to(device)
model.fit(features, perturbed_adj, labels, idx_train)
model.eval()
output = model.test(idx_test)

7 Conclusion

Our main goal is to provide a comprehensive, easy-to-use platform for researchers who are interested in adversarial attacks and defenses. In the future, we will support larger datasets and more model architectures. Moreover, we will keep adding the newest models and updating this repository.

References

Appendix A Environment Dependencies

Dependency Version
torch 1.2.0
torchvision 0.4.0
numpy 1.17.1
matplotlib 3.1.1
scipy 1.3.1
Pillow 7.0.0
scikit_learn 0.22.1
skimage 0
tensorboardX 2
tqdm 4.42.1
texttable 1.6.2
numba 0.48.0
Table 1: Dependencies

Appendix B Structure tree

|___    LICENSE
|___    README.md
|___    adversary_examples
|___    deeprobust
|   |___    __init__.py
|   |___    graph
|   |   |___    README.md
|   |   |___    __init__.py
|   |   |___    black_box.py
|   |   |___    data
|   |   |   |___    __init__.py
|   |   |   |___    attacked_data.py
|   |   |   |__ dataset.py
|   |   |___    defense
|   |   |   |___    __init__.py
|   |   |   |___    adv_training.py
|   |   |   |___    gcn.py
|   |   |   |___    gcn_preprocess.py
|   |   |   |___    r_gcn.py
|   |   |   |__ r_gcn.py.backup
|   |   |___    examples
|   |   |   |___    test_adv_train_evasion.py
|   |   |   |___    test_adv_train_poisoning.py
|   |   |   |___    test_dice.py
|   |   |   |___    test_fgsm.py
|   |   |   |___    test_gcn.py
|   |   |   |___    test_gcn_jaccard.py
|   |   |   |___    test_gcn_svd.py
|   |   |   |___    test_mettack.py
|   |   |   |___    test_nettack.py
|   |   |   |___    test_nipa.py
|   |   |   |___    test_random.py
|   |   |   |___    test_rgcn.py
|   |   |   |___    test_rl_s2v.py
|   |   |   |___    test_rnd.py
|   |   |   |__ test_topology_attack.py
|   |   |___    global_attack
|   |   |   |___    __init__.py
|   |   |   |___    base_attack.py
|   |   |   |___    dice.py
|   |   |   |___    mettack.py
|   |   |   |___    nipa.py
|   |   |   |___    random.py
|   |   |   |__ topology_attack.py
|   |   |___    requirements.txt
|   |   |___    rl
|   |   |   |___    env.py
|   |   |   |___    nipa.py
|   |   |   |___    nipa_config.py
|   |   |   |___    nipa_env.py
|   |   |   |___    nipa_nstep_replay_mem.py
|   |   |   |___    nipa_q_net_node.py
|   |   |   |___    nstep_replay_mem.py
|   |   |   |___    q_net_node.py
|   |   |   |___    rl_s2v.py
|   |   |   |___    rl_s2v_config.py
|   |   |   |__ rl_s2v_env.py
|   |   |___    targeted_attack
|   |   |   |___    __init__.py
|   |   |   |___    base_attack.py
|   |   |   |___    evaluation.py
|   |   |   |___    fgsm.py
|   |   |   |___    nettack.py
|   |   |   |___    rl_s2v.py
|   |   |   |__ rnd.py
|   |   |__ utils.py
|   |__image
|       |___    README.md
|       |___    __init__.py
|       |___    adversary_examples
|       |___    attack
|       |   |___    BPDA.py
|       |   |___    Nattack.py
|       |   |___    Universal.py
|       |   |___    YOPOpgd.py
|       |   |___    __init__.py
|       |   |___    base_attack.py
|       |   |___    cw.py
|       |   |___    deepfool.py
|       |   |___    fgsm.py
|       |   |___    l2_attack.py
|       |   |___    lbfgs.py
|       |   |___    onepixel.py
|       |   |__ pgd.py
|       |___    config.py
|       |___    data
|       |___    defense
|       |   |___    LIDclassifier.py
|       |   |___    TherEncoding.py
|       |   |___    YOPO.py
|       |   |___    __init__.py
|       |   |___    advexample_pgd.png
|       |   |___    base_defense.py
|       |   |___    fast.py
|       |   |___    fgsmtraining.py
|       |   |___    pgdtraining.py
|       |   |___    test_PGD_defense.py
|       |   |___    trade.py
|       |   |__ trades.py
|       |___    evaluation_attack.py
|       |___    netmodels
|       |   |___    CNN.py
|       |   |___    CNN_multilayer.py
|       |   |___    YOPOCNN.py
|       |   |___    __init__.py
|       |   |___    resnet.py
|       |   |___    train_model.py
|       |   |__ train_resnet.py
|       |___    optimizer.py
|       |___    synset_words.txt
|       |__ utils.py
|___    examples
|   |___    graph
|   |   |___    test_adv_train_evasion.py
|   |   |___    test_adv_train_poisoning.py
|   |   |___    test_dice.py
|   |   |___    test_fgsm.py
|   |   |___    test_gcn.py
|   |   |___    test_gcn_jaccard.py
|   |   |___    test_gcn_svd.py
|   |   |___    test_mettack.py
|   |   |___    test_nettack.py
|   |   |___    test_nipa.py
|   |   |___    test_random.py
|   |   |___    test_rgcn.py
|   |   |___    test_rl_s2v.py
|   |   |__ test_rnd.py
|   |__ image
|       |___    __init__.py
|       |___    __pycache__
|       |   |___    __init__.cpython-36.pyc
|       |   |__ test_cw.cpython-36.pyc
|       |___    test1.py
|       |___    test_PGD.py
|       |___    test_cw.py
|       |___    test_deepfool.py
|       |___    test_fgsm.py
|       |___    test_lbfgs.py
|       |___    test_nattack.py
|       |___    test_onepixel.py
|       |___    test_pgdtraining.py
|       |___    test_trade.py
|       |___    test_train.py
|       |__ testprint_mnist.py
|___    get-pip.py
|___    requirements.txt
|___    setup.py
|___    tree.md
|__ tutorials
    |___    __init__.py
    |___    test1.py
    |___    test_PGD.py
    |___    test_cw.py
    |___    test_deepfool.py
    |___    test_fgsm.py
    |___    test_lbfgs.py
    |___    test_nattack.py
    |___    test_onepixel.py
    |___    test_pgdtraining.py
    |___    test_trade.py
    |___    test_train.py
    |__ testprint_mnist.py