Defensive Dropout for Hardening Deep Neural Networks under Adversarial Attacks

09/13/2018 ∙ by Siyue Wang, et al. ∙ Boston University, Northeastern University, Florida International University

Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. That is, adversarial examples, obtained by adding delicately crafted distortions onto original legal inputs, can mislead a DNN into classifying them as any target labels. This work provides a solution to hardening DNNs under adversarial attacks through defensive dropout. Besides using dropout during training for the best test accuracy, we propose to use dropout also at test time to achieve strong defense effects. We consider the problem of building robust DNNs as an attacker-defender two-player game, where the attacker and the defender know each other's strategies and try to optimize their own strategies towards an equilibrium. Based on observations of the effect of the test dropout rate on test accuracy and attack success rate, we propose a defensive dropout algorithm to determine an optimal test dropout rate given the neural network model and the attacker's strategy for generating adversarial examples. We also investigate the mechanism behind the outstanding defense effects achieved by the proposed defensive dropout. Compared with stochastic activation pruning (SAP), another defense method that introduces randomness into the DNN model, we find that our defensive dropout achieves much larger variances of the gradients, which is the key to the improved defense effects (much lower attack success rate). For example, our defensive dropout can reduce the attack success rate from 100% to 13.89% under the currently strongest attack, i.e., the C&W attack.


1. Introduction

Deep neural networks (DNNs) are powerful models that achieve extraordinary performance in various speech and vision tasks, including speech recognition (Hinton et al., 2012a), natural language processing (Collobert and Weston, 2008), scene understanding (Karpathy and Fei-Fei, 2015), and object recognition (Krizhevsky et al., 2012; LeCun et al., 1998; LeCun et al., 1989). However, recent studies (Szegedy et al., 2013; Kurakin et al., 2016; Goodfellow et al., 2014) show that DNNs are vulnerable to adversarial attacks implemented by generating adversarial examples, i.e., adding imperceptible but well-designed distortions to the original legal inputs. Delicately crafted adversarial examples can mislead a DNN to classify them as any target labels, while they appear recognizable and visually normal to human eyes.

Evidence has shown that there exist audio/visual inputs that sound/look like speech/objects to DNNs but are nonsense to humans (Carlini et al., 2016; Nguyen et al., 2015). Recently, Kurakin, Goodfellow, and Bengio demonstrated the existence of adversarial attacks not only in theoretical models but also in the physical world (Kurakin et al., 2016). They mimicked the scenario of a physical-world application of DNNs by feeding adversarial examples to a DNN through a cellphone camera, and found that the adversarial examples remain misclassified by the DNN even when perceived through a camera.

Concerns have been raised about applying DNNs in security-critical tasks. The security properties of DNNs have been widely investigated from two aspects: (i) crafting adversarial examples to test the vulnerability of DNNs and (ii) enhancing the robustness of DNNs under adversarial attacks. For the former aspect, adversarial examples have been generated by solving optimization problems (Szegedy et al., 2013; Carlini and Wagner, 2017; Chen et al., 2017; Athalye et al., 2018; Goodfellow et al., 2014; Papernot et al., 2016b). For the latter aspect, research has been conducted on either filtering out the added distortions (Guo et al., 2017; Bhagoji et al., 2017; Dziugaite et al., 2016; Xie et al., 2017) or revising DNN models (Papernot et al., 2016c; Dhillon et al., 2018; Feinman et al., 2017) to defend against adversarial attacks. These two aspects mutually benefit each other towards hardening DNNs under adversarial attacks.

In this work, we provide a new solution to hardening DNNs under adversarial attacks through defensive dropout. Dropout is a commonly used regularization method to deal with the overfitting issue due to limited training data (Srivastava et al., 2014). As a regularization method, dropout is applied during training such that for each training case in a mini-batch, a sub-network is sampled by dropping some of the units (i.e., neural nodes). Based on observations from preliminary experiments, we propose to use dropout also at test time as a defense method against adversarial attacks. By introducing dropout at test time, we achieve shorter and fatter distributions of the gradients, which is the key to the improved defense effects (lower attack success rate) compared with another model-randomness-based defense method, i.e., stochastic activation pruning (SAP) (Dhillon et al., 2018). For the MNIST dataset, our defensive dropout reduces the attack success rate from 100% to 13.89% under the currently strongest attack, i.e., the C&W attack (Carlini and Wagner, 2017), while distillation as a defense (Papernot et al., 2016c) and adversarial training (Tramèr et al., 2017) are totally vulnerable under the C&W attack. For the CIFAR dataset, our defensive dropout reduces the attack success rate to 43.33%, while SAP can only reduce the attack success rate to 77.78% under the C&W attack based on the same neural network model.

The contributions of this work are summarized as follows:

(i) We consider the problem of building robust DNNs as an attacker-defender two-player game, where the attacker and the defender know each other's strategies and try to optimize their own strategies towards an equilibrium.

(ii) We propose a defensive dropout algorithm that determines an optimal test dropout rate given the neural network model and the attacker's strategy for generating adversarial examples. Basically, we need to trade off between the defense effects and the test accuracy.

(iii) We explain the mechanism behind the outstanding defense effects achieved by the proposed defensive dropout. The shorter and fatter gradient distributions make it difficult for the attacker to generate adversarial examples using the gradients from the sampled sub-networks.

2. Related Work

2.1. Preliminaries

In this paper we focus on neural networks used as image classifiers. In this case, the input images can be denoted as 3-dimensional tensors $x \in \mathbb{R}^{h \times w \times c}$, where $h$, $w$, and $c$ denote the height, width, and number of channels. For a gray-scale image (e.g., MNIST), $c = 1$; and for a colored RGB image (e.g., CIFAR-10), $c = 3$. For both attacks and defenses, all pixel values in the images are scaled to $[0, 1]$ for easy calculation, and therefore a valid input image should lie inside a unit cube in the high-dimensional space. We use $F$ to denote a neural network model, where $F$ accepts an input $x$ and generates an output $y = F(x)$.

Suppose the neural network is an $m$-class classifier and the output layer performs the softmax operation. Let $Z(x)$ denote the output of all layers except for the softmax layer, so that we have $F(x) = \mathrm{softmax}(Z(x)) = y$. The input to the softmax layer, $Z(x)$, is called the logits. The element $y_i$ of the output vector $y$ represents the probability that input $x$ belongs to the $i$-th class. The output vector $y$ is treated as a probability distribution, and its elements satisfy $0 \le y_i \le 1$ and $\sum_{i=1}^{m} y_i = 1$. The neural network classifies input $x$ according to the maximum probability, i.e., $C(x) = \arg\max_i y_i$.

The adversarial attack can be either targeted or untargeted. Given an original legal input $x_0$ with its correct label $l_0$ and a target label $l \ne l_0$, the targeted adversarial attack is to find an input $x$ such that $C(x) = l$ while $x$ and $x_0$ are close according to some measure of the distortion between $x$ and $x_0$. The input $x$ is then called an adversarial example. The untargeted adversarial attack is to find an input $x$ satisfying $C(x) \ne l_0$ while $x$ and $x_0$ are close according to some measure of the distortion. The untargeted adversarial attack does not specify any target label to mislead the classifier. In this work, we consider targeted adversarial attacks since they are believed to be stronger than untargeted attacks.

The general problem of constructing adversarial examples can be formulated as follows: Given an original legal input $x_0$,

$$\min_{\delta} \; D(\delta) \quad \text{s.t.} \quad C(x_0 + \delta) = l, \;\; x_0 + \delta \in [0, 1]^{h \times w \times c}, \qquad (1)$$

where $\delta$ is the distortion added onto input $x_0$ and $D(\cdot)$ is a measure of the added distortion.

We need to measure the distortion between the original legal input $x_0$ and the adversarial example $x$. $L_p$ norms are the most commonly used measures in the literature. The $L_p$ norm of the distortion is defined as:

$$\|x - x_0\|_p = \Big( \sum_{i=1}^{n} |x_i - x_{0,i}|^p \Big)^{1/p}, \qquad (2)$$

where $n$ is the total number of elements in the input. We see the use of $L_0$, $L_1$, $L_2$, and $L_\infty$ norms in different attacks; a small numerical sketch of these measures follows the list below.

  • $L_0$ norm: measures the number of mismatched elements between $x$ and $x_0$.

  • $L_1$ norm: measures the sum of the absolute values of the differences between $x$ and $x_0$.

  • $L_2$ norm: measures the standard Euclidean distance between $x$ and $x_0$.

  • $L_\infty$ norm: measures the maximum difference between $x_i$ and $x_{0,i}$ over all $i$'s.
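
As a concrete illustration, the following minimal NumPy sketch (variable names are ours, not from the paper) computes these four norms for a given distortion:

```python
import numpy as np

def distortion_norms(x, x0):
    """Compute the L0, L1, L2, and L-infinity norms of the distortion x - x0."""
    delta = (x - x0).ravel()
    return {
        "L0":   np.count_nonzero(delta),      # number of changed elements
        "L1":   np.sum(np.abs(delta)),        # total absolute change
        "L2":   np.sqrt(np.sum(delta ** 2)),  # Euclidean distance
        "Linf": np.max(np.abs(delta)),        # largest per-element change
    }

# Example: a 28x28x1 MNIST-like image and a slightly perturbed copy.
x0 = np.random.rand(28, 28, 1)
x = np.clip(x0 + 0.01 * np.sign(np.random.randn(28, 28, 1)), 0.0, 1.0)
print(distortion_norms(x, x0))
```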

2.2. Attacks

2.2.1. Fault Injection (Liu et al., 2017):

published in ICCAD 2017, this work proposes two kinds of fault injection attacks that only require slight changes to the DNN's parameters to achieve misclassification: the single bias attack (SBA) and the gradient descent attack (GDA). SBA is able to achieve misclassification by modifying only one bias value in the network, while GDA achieves higher stealthiness and efficiency by using layer-wise searching and modification compression techniques. It implements very efficient attacks on the MNIST and CIFAR-10 datasets. This work perceives the DNN attack problem from a different angle, i.e., modifying the DNN models, while all the other attacks and defenses mentioned in this paper assume the modifications are performed on the inputs.

2.2.2. Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014):

is an $L_\infty$ attack that utilizes the gradient of the loss function to determine the direction in which to modify the pixels. It is designed to be fast, rather than optimal. It can be used for adversarial training by directly changing the loss function instead of explicitly injecting adversarial examples into the training data. FGSM generates adversarial examples following:

$$x = x_0 - \epsilon \cdot \mathrm{sign}\big(\nabla_{x_0} J(x_0, l)\big), \qquad (3)$$

where $\epsilon$ is the magnitude of the added distortion and $l$ is the target label. Using backpropagation, FGSM calculates the gradient of the loss function $J$ with respect to the target label $l$ to determine the direction in which to change the pixel values.
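
As a rough sketch (assuming a PyTorch classifier `model` that returns logits for a batch of inputs; the function and argument names are illustrative, not the authors' code), the targeted update in Eq. (3) can be written as:

```python
import torch
import torch.nn.functional as F

def targeted_fgsm(model, x0, target_label, epsilon):
    """One-step targeted FGSM: step against the gradient of the loss computed
    with respect to the target label, then clip back to the valid pixel range."""
    x = x0.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target_label)  # J(x, l), with l the target label
    loss.backward()
    x_adv = x - epsilon * x.grad.sign()             # move toward the target class
    return torch.clamp(x_adv, 0.0, 1.0).detach()    # keep pixels in [0, 1]
```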

2.2.3. Jacobian-based Saliency Map Attack (JSMA) (Papernot et al., 2016b):

is an $L_0$ attack that uses a greedy algorithm which picks the most influential pixels by calculating a Jacobian-based saliency map and modifies the pixels iteratively. The computational complexity of this attack method is very high.

2.2.4. C&W (Carlini and Wagner, 2017):

is a series of $L_0$, $L_2$, and $L_\infty$ attacks that achieve 100% attack success rate with much lower distortions compared with the above-mentioned attacks. In particular, the C&W attack is superior to other attacks because it uses a better objective function. C&W formulates the problem of generating adversarial examples in an alternative way that can be better optimized:

$$\min_{\delta} \; \|\delta\|_p + c \cdot f(x_0 + \delta) \quad \text{s.t.} \quad x_0 + \delta \in [0, 1]^n, \qquad (4)$$

where $c$ is a constant to be chosen and the objective function $f$ has the following form:

$$f(x) = \max\Big( \max_{i \ne l} Z(x)_i - Z(x)_l, \; -\kappa \Big). \qquad (5)$$

Here, $\kappa$ is a parameter that controls the confidence in attacks. Stochastic gradient descent methods can be used to solve this problem. For example, the Adam optimizer (Kingma and Ba, 2015) is used due to its fast and robust convergence behavior.
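
To make the formulation concrete, here is a minimal PyTorch sketch of the $L_2$ version of Eqs. (4)-(5), assuming `logits = Z(x0 + delta)` for a batch and hypothetical names throughout; the full attack additionally enforces the box constraint (e.g., via a change of variables) and binary-searches over $c$:

```python
import torch

def cw_objective(logits, delta, target, c, kappa=0.0):
    """C&W-style objective: L2 distortion term plus c times the margin loss f."""
    target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
    masked = logits.clone()
    masked.scatter_(1, target.unsqueeze(1), float("-inf"))  # exclude the target class
    best_other = masked.max(dim=1).values                   # largest non-target logit
    f = torch.clamp(best_other - target_logit, min=-kappa)  # Eq. (5)
    return (delta ** 2).flatten(1).sum(dim=1) + c * f       # Eq. (4), L2 version
```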

2.3. Defense Methods

2.3.1. Adversarial Training (Tramèr et al., 2018):

injects adversarial examples with correct labels into the training dataset and then retrains the neural network, thus increasing robustness of DNNs under adversarial attacks.

2.3.2. Distillation as a Defense (Papernot et al., 2016c):

introduces a temperature into the softmax layer and uses a higher temperature for training and a lower temperature for testing. The training phase first trains a teacher model that produces soft labels for the training dataset and then trains a distilled model using the training dataset with the soft labels. The distilled model with reduced temperature is preserved for testing. The modified softmax function utilized in the distilled model is given by:

$$F_i(x) = \frac{e^{z_i(x)/T}}{\sum_{j} e^{z_j(x)/T}}, \qquad (6)$$

where $z_i(x)$ is the $i$-th logit corresponding to the last hidden layer output before the softmax, and $T$ is the temperature.
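
A small NumPy sketch of this temperature-scaled softmax (Eq. (6)), with illustrative values, is:

```python
import numpy as np

def softmax_with_temperature(logits, T):
    """Softmax at temperature T, as used in defensive distillation (Eq. (6))."""
    z = np.asarray(logits, dtype=np.float64) / T
    z -= z.max()                      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax_with_temperature([2.0, 1.0, 0.1], T=1.0))    # sharper distribution
print(softmax_with_temperature([2.0, 1.0, 0.1], T=20.0))   # much softer labels
```

With a large $T$ the output probabilities are smoothed, which is what produces the soft labels used to train the distilled model.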

2.3.3. Stochastic Activation Pruning (SAP) (Dhillon et al., 2018):

SAP uses $\ell_1$ normalization of the activation magnitudes to define a multinomial distribution over every activation layer. For the hidden activation vector $h^i$ at the $i$-th layer, SAP defines the probability of sampling (i.e., keeping rather than pruning) the $j$-th activation output as:

$$p_j^i = \frac{|h_j^i|}{\sum_{k} |h_k^i|}. \qquad (7)$$

The remaining activation outputs are scaled up according to the sampling probability and the number of outputs kept at this layer by using the reweighting factor

$$\frac{1}{1 - (1 - p_j^i)^{r^i}}, \qquad (8)$$

where $r^i$ is the number of outputs sampled at layer $i$.
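
The following rough NumPy sketch reflects our reading of Eqs. (7)-(8) for a single activation vector; variable names are ours and details of the original SAP implementation may differ:

```python
import numpy as np

def stochastic_activation_pruning(h, num_samples, rng=None):
    """Sample activation indices with probability proportional to |h_j| (Eq. (7)),
    zero out everything not sampled, and reweight the survivors by the inverse
    probability of being sampled at least once (Eq. (8))."""
    if rng is None:
        rng = np.random.default_rng()
    p = np.abs(h) / np.sum(np.abs(h))            # multinomial over activations
    sampled = rng.choice(len(h), size=num_samples, replace=True, p=p)
    keep = np.zeros(len(h), dtype=bool)
    keep[sampled] = True
    out = np.zeros_like(h, dtype=float)
    out[keep] = h[keep] / (1.0 - (1.0 - p[keep]) ** num_samples)
    return out
```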

3. Proposed Defensive Dropout

3.1. Dropout Preliminaries

Figure 1. (a) A standard neural network with 2 hidden layers. (b) A sub-network produced by applying dropout. Units in grey color are dropped from the whole network.

Motivated by the mixability theory in evolutionary biology (Livnat et al., 2010), dropout was proposed as a regularization method in machine learning to prevent the overfitting issue with limited training data (Srivastava et al., 2014). The term "dropout" refers to dropping out units (hidden and visible) in a neural network. By dropping a unit out, the unit along with all its incoming and outgoing connections is temporarily removed from the network. Fig. 1 shows that applying dropout to a neural network amounts to sampling a sub-network from it.

The feedforward operation of a standard neural network can be described as:

$$z_i^{(l+1)} = \mathbf{w}_i^{(l+1)} \mathbf{y}^{(l)} + b_i^{(l+1)}, \qquad (9)$$
$$y_i^{(l+1)} = f\big(z_i^{(l+1)}\big), \qquad (10)$$

where $\mathbf{y}^{(l)}$ denotes the vector of outputs from layer $l$, $\mathbf{z}^{(l)}$ denotes the vector of inputs to layer $l$, $\mathbf{w}_i^{(l+1)}$ and $b_i^{(l+1)}$ are the $i$-th row of the weight matrix and the $i$-th element of the bias vector of layer $l+1$, and $f(\cdot)$ is any activation function. With dropout, the feedforward operation becomes (Hinton et al., 2012b):

$$r_j^{(l)} \sim \mathrm{Bernoulli}(p), \qquad (11)$$
$$\tilde{\mathbf{y}}^{(l)} = \mathbf{r}^{(l)} \ast \mathbf{y}^{(l)}, \qquad (12)$$
$$z_i^{(l+1)} = \mathbf{w}_i^{(l+1)} \tilde{\mathbf{y}}^{(l)} + b_i^{(l+1)}, \qquad (13)$$
$$y_i^{(l+1)} = f\big(z_i^{(l+1)}\big), \qquad (14)$$

where $\mathbf{r}^{(l)}$ is a vector of independent Bernoulli random variables, each with probability $p$ of being 1, and $\ast$ denotes an element-wise product.
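
As a minimal NumPy sketch of Eqs. (11)-(14) for a single layer (any activation function $f$; the names are illustrative):

```python
import numpy as np

def dropout_feedforward(y_prev, W, b, p, f=np.tanh, rng=None):
    """One layer of the dropout feedforward pass: sample a Bernoulli mask,
    thin the previous layer's outputs, then apply the usual affine transform
    and activation."""
    if rng is None:
        rng = np.random.default_rng()
    r = rng.binomial(1, p, size=y_prev.shape)  # r_j ~ Bernoulli(p): 1 means "keep"
    y_tilde = r * y_prev                       # element-wise product with the mask
    z = W @ y_tilde + b
    return f(z)
```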

A neural network with $n$ units can be seen as a collection of $2^n$ possible sampled sub-networks, which all share weights so that the total number of parameters is still $O(n^2)$. During training of a neural network with dropout, stochastic gradient descent is used as in standard training, except that for each training case in a mini-batch, a sub-network is sampled by dropping out units, and forward and backpropagation for that training case are done only on this sub-network. The gradients for each parameter are averaged over the training cases in each mini-batch. Therefore, training a neural network with $n$ units using dropout can be seen as training a collection of $2^n$ sub-networks with extensive weight sharing.

Figure 2. (a) At training time, a unit is present with probability $p$. (b) At test time, the unit is always present and the outgoing weights are multiplied by $p$.

The purpose of applying dropout is to prevent units from co-adapting too much by combining the predictions of many sub-networks with shared weights. However, at test time, it is not feasible to explicitly average the predictions from exponentially many sub-networks. A very simple approximate averaging method is to use a single neural network at test time without dropout, whose weights are scaled-down versions of the trained weights. If a unit is present with probability $p$ during training with dropout, the outgoing weights of that unit are multiplied by $p$ at test time, as shown in Fig. 2. This ensures that the output at test time is the same as the expected output at training time.

3.2. Defensive Dropout Implementations in Training and Test

Dropout is a commonly used regularization method. To achieve very good test accuracy, in practice dropout is usually applied to units in the fully-connected layer close to the output layer of the neural network (Goodfellow et al., 2016). Also, the dropout rate $r_{\text{train}}$, instead of the probability of presence for a unit, is used during training with dropout (Bouthillier et al., 2015). For each training case in a mini-batch, the units are dropped with rate $r_{\text{train}}$ and a sub-network is sampled for the training case. The gradient for each parameter (weight) is then calculated based on the sampled sub-network. Please note that, for a unit with rate $r_{\text{train}}$ of being dropped, if it is present in the sub-network, we need to divide the output of its activation function by $1 - r_{\text{train}}$ when evaluating the loss function for gradient calculation. This makes the output at test time roughly the same as the expected output at training time. If a parameter is not used in the sub-network, a zero gradient is set for it. Gradients for each parameter are averaged over all training cases in the mini-batch. When dropout is applied purely as a regularization method to deal with overfitting, the whole neural network without dropout is used at test time; no further scaling is needed, because the activation outputs of units dropped with rate $r_{\text{train}}$ were already divided by $1 - r_{\text{train}}$ during training.

Intuitively, introducing randomness at test time can also help to harden deep neural networks against adversarial attacks. Therefore, we propose to apply dropout also at test time as a defense method. If dropout was applied to units in a specific layer during training with dropout rate $r_{\text{train}}$, we apply dropout to the same layer at test time with dropout rate $r_{\text{test}}$. For each test case, units are dropped with rate $r_{\text{test}}$ and a sub-network is sampled for it. To have roughly the same expected output as the whole neural network at test time, we also need to scale up the activation functions of the retained units in the dropout layer of the sub-network by $1/(1 - r_{\text{test}})$. Please note that $r_{\text{train}}$ is optimized during deep neural network training, while $r_{\text{test}}$ is the optimization variable of our defense method, derived by the algorithm in Section 3.4. A minimal sketch of this layer-level operation is given below.
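
This is only a sketch under our own notation, not the authors' code: the mask-and-rescale operation applied to one layer's activations, with `drop_rate` set to $r_{\text{train}}$ during training and $r_{\text{test}}$ at test time. Each call samples a fresh mask, so repeated queries on the same input traverse different sub-networks.

```python
import numpy as np

def defensive_dropout_layer(y, drop_rate, rng=None):
    """Drop units with probability `drop_rate` and scale the survivors by
    1 / (1 - drop_rate) so the expected output matches the full network."""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.binomial(1, 1.0 - drop_rate, size=y.shape)  # 1 means "keep"
    return (mask * y) / (1.0 - drop_rate)
```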

Our work aims at defending against the strongest attacks, i.e., white-box attacks, in which the attacker has perfect information about the neural network architecture and parameters. Therefore, when we defend against adversarial attacks, we assume the attacker knows not only the complete neural network model but also the stochastics in the model (i.e., which layer applies dropout and the dropout rates $r_{\text{train}}$ and $r_{\text{test}}$). Specifically, when the attacker generates adversarial examples by solving an optimization problem based on stochastic gradient descent, the gradients are calculated in a similar manner as training with dropout, i.e., using sampled sub-networks with the activation functions scaled up by $1/(1 - r_{\text{test}})$. By doing this, we give the attacker full access to the neural network model and we are able to evaluate our defense method against the strongest white-box attacks.

3.3. Observations and Motivations

Figure 3. (a) Test accuracy, (b) attack success rate, and (c) $L_2$ norm under C&W attack on the MNIST dataset using different training dropout rates and test dropout rates.

We perform some preliminary experiments that motivate and support our defensive dropout method. We pick the currently strongest attack, i.e., the C&W attack (Carlini and Wagner, 2017), and experiment with different training dropout rates and different test dropout rates to analyze test accuracy and defense effects. Fig. 3 presents the results, where the x-axes denote the test dropout rate and each curve represents one training dropout rate. From Fig. 3 (a), we can observe that test accuracy decreases with increasing test dropout rate. Also, a training dropout rate of 0.3 achieves the highest test accuracy for the MNIST dataset. The increase in test accuracy is more prominent for other datasets. For example, a training dropout rate of 0.7 increases the test accuracy by 7.5% in our neural network model on the CIFAR-10 dataset. In summary, the decrease in test accuracy due to the test dropout rate can be compensated to some extent by the training dropout rate.

Fig. 3 (b) and (c) demonstrate the defense effects of the training and test dropout rates. In general, increasing the test dropout rate reduces the attack success rate, see Fig. 3 (b). And the $L_2$ norm of the added distortions in the adversarial examples reaches a peak at a certain test dropout rate, see Fig. 3 (c). The solution for defending against the C&W attack on the MNIST dataset is to use a test dropout rate of 0.5 when the training dropout rate is 0.3, which loses 0.8% test accuracy but decreases the attack success rate from 100% to 13.89% with the largest $L_2$ norm of the distortion (the peak point in Fig. 3 (c)), indicating that the added distortion in the adversarial examples might be large enough to be recognized by humans. Please note that under the C&W attack, distillation as a defense (Papernot et al., 2016c) and adversarial training (Tramèr et al., 2017) could not decrease the attack success rate at all (Carlini and Wagner, 2017), i.e., the attack success rate remains 100% under these defense methods.

Figure 4. Probability density of sampled gradients when generating an adversarial example using the C&W attack on CIFAR-10. The same neural network architecture, the same original input image, and the same training dropout rate of 0.7 (for the best test accuracy) are used throughout (a)-(f). Each color shows the histogram and the corresponding fitted probability density curve for one dimension of the sampled gradients and includes 50 data points. (a)-(e) are from our proposed defensive dropout with different test dropout rates, and (f) is from stochastic activation pruning (SAP).

We also investigate the mechanism behind the outstanding defense effects achieved by the proposed defensive dropout. It is intuitive to attribute the defense effects to the model randomness introduced by adding dropout at test time. However, there are also other defenses that introduce model randomness, such as stochastic activation pruning (SAP) (Dhillon et al., 2018) and mitigation through randomization (MTR) (Xie et al., 2017), that only achieve limited defense effects against C&W. We therefore explain the mechanism in a different way. Fig. 4 plots the probability density of uniformly sampled gradients when generating an adversarial example using the C&W attack on the CIFAR-10 dataset. Please note that the gradients are high-dimensional, and therefore we select 5 dimensions for visualization, each presented in a different color and each containing 50 data points throughout Fig. 4 (a)-(f). Also, for a fair comparison, we use the same neural network model with a training dropout rate of 0.7 (for the best test accuracy) and the same original input image for Fig. 4 (a)-(f), covering both our defensive dropout and SAP.

From Fig. 4 (a)-(e), with increasing test dropout rate, the probability densities become shorter and fatter, demonstrating increasing variances of the gradients, which is the key to the improved defense effects (decreasing attack success rate) with increasing test dropout rate. The larger the variances of the gradients, the more difficult it is for the attacker to generate effective adversarial examples using stochastic gradient descent when solving the optimization problem. This cross-validates the conclusion from Fig. 3 (a) and (b). Of course, we cannot simply use the largest test dropout rate for the strongest defense effects, because the test accuracy might become very low. We need to trade off the defense effects against the test accuracy. Fig. 4 (f) shows the probability densities of the gradients from stochastic activation pruning (SAP) (Dhillon et al., 2018), which exhibit very small variances compared with our defense method. That is the reason our defensive dropout outperforms SAP. The sketch below illustrates how such gradient variances can be estimated empirically.
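
The following PyTorch sketch (our own illustration, with 50 samples to mirror the histograms in Fig. 4; all names are hypothetical) shows one way to estimate these per-dimension gradient variances for a model whose dropout layers remain active:

```python
import torch

def input_gradient_variance(model, loss_fn, x, target, num_samples=50):
    """Estimate the per-dimension variance of input gradients across sampled
    sub-networks. Keeping the model in train() mode leaves its dropout layers
    active, mimicking an attacker querying the defensive-dropout model."""
    model.train()
    grads = []
    for _ in range(num_samples):
        x_var = x.clone().detach().requires_grad_(True)
        loss_fn(model(x_var), target).backward()
        grads.append(x_var.grad.detach().flatten())
    grads = torch.stack(grads)        # shape: (num_samples, input_dim)
    return grads.var(dim=0)           # larger variances -> noisier attack gradients
```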

3.4. Defensive Dropout Algorithm

Towards hardening deep neural networks under adversarial attacks, the attacker and the defender improve their own strategies like in a two-player game. In such a game, the attacker and the defender know each others’ strategies and try to optimize their own strategies towards an equilibrium. The defender can benefit from the improvement of the attacker’s strategy. Therefore, we need to take into consideration the attacker’s strategy of generating adversarial examples when designing our defense.

Based on the observations of the test dropout rate's effect on test accuracy and attack success rate, we design the defensive dropout algorithm that helps us determine an optimal test dropout rate given the neural network model and the attacker's strategy for generating adversarial examples. In the algorithm, we also optimize the training dropout rate along with the test dropout rate, but that is only for the purpose of training the neural network for the best test accuracy. Basically, we first train a neural network model by finding a proper training dropout rate $r_{\text{train}}$. Then we fix the model and search for the largest test dropout rate $r_{\text{test}}$ that gives the strongest defense effects while satisfying the test accuracy requirement. Pseudo code of the defensive dropout algorithm is given in Algorithm 1.

Input: Dataset $D$; neural network model $M$; attacker strategy $A$ (e.g., C&W, FGSM, JSMA); maximum decrease of test accuracy $\Delta$
Output: Test dropout rate $r_{\text{test}}$; training dropout rate $r_{\text{train}}$
1:  Retrain neural network model $M$ using a proper training dropout rate $r_{\text{train}}$ and obtain the best test accuracy $acc_{\text{best}}$;
2:  $r_{\text{test}} \leftarrow 0$;
3:  $acc \leftarrow acc_{\text{best}}$;
4:  while $acc \ge acc_{\text{best}} - \Delta$ do
5:      $r_{\text{test}} \leftarrow r_{\text{test}} +$ step_size;
6:     Generate adversarial examples using attacker strategy $A$, neural network model $M$, and test dropout rate $r_{\text{test}}$;
7:     Evaluate test accuracy $acc$ using neural network model $M$ and test dropout rate $r_{\text{test}}$;
8:     Evaluate attack success rate using neural network model $M$ and test dropout rate $r_{\text{test}}$;
9:  end while
Algorithm 1 Defensive Dropout Algorithm
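
A compact Python sketch of this search loop follows. It is a hypothetical wrapper rather than the authors' implementation: the three callables stand in for the dataset, the retrained model, and the attacker strategy, and each takes the current test dropout rate.

```python
def defensive_dropout_search(evaluate_accuracy, evaluate_asr, generate_adv,
                             max_acc_drop, step_size=0.1, max_rate=0.9):
    """Sketch of Algorithm 1: raise r_test until test accuracy drops by more
    than max_acc_drop from the best accuracy of the retrained model."""
    best_acc = evaluate_accuracy(0.0)          # accuracy with no test-time dropout
    r_test, acc = 0.0, best_acc
    history = []
    while acc >= best_acc - max_acc_drop and r_test + step_size <= max_rate:
        r_test += step_size                    # step 5: increase the test dropout rate
        adv = generate_adv(r_test)             # step 6: adversarial examples under r_test
        acc = evaluate_accuracy(r_test)        # step 7: test accuracy under r_test
        asr = evaluate_asr(adv, r_test)        # step 8: attack success rate under r_test
        history.append((r_test, acc, asr))
    return r_test, history
```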

4. Experimental Results

4.1. Setup

As in other attack and defense work, we use two datasets: MNIST and CIFAR-10. The MNIST dataset (Modified National Institute of Standards and Technology database) (Yann et al., 1998) is a collection of handwritten digits that is commonly used for training and testing various machine learning tasks. It consists of 70,000 (60,000 training images and 10,000 test images) 28x28 grey-scale images of the digits 0 to 9. The CIFAR-10 dataset (Canadian Institute For Advanced Research) (Krizhevsky and Hinton, 2009) is a collection of 60,000 32x32 color images. It contains images in 10 different classes (e.g., cars, birds, airplanes, etc.).

For the DNN models in our experiments, we use standard convolutional neural networks with 4 convolutional layers and 2 fully-connected layers. This architecture has been used as a standard model in much previous work (Carlini and Wagner, 2017; Papernot et al., 2016c). While the overall neural network architecture is the same for the MNIST and CIFAR-10 datasets, the neural network model for CIFAR-10 is slightly larger than that for MNIST, since CIFAR-10 images have higher resolution. The activation function is the rectified linear unit (ReLU) for all convolutional and fully-connected layers. The architectures of the neural network models for MNIST and CIFAR-10 are summarized in Table 1. In both models we apply defensive dropout to fully connected layer 1. After training, the models achieve state-of-the-art test accuracies of 99.4% and 80% for the MNIST and CIFAR-10 datasets, respectively.

Model for MNIST Model for CIFAR-10
Conv layer 32 filters with size (3,3) 64 filters with size (3,3)
Conv layer 32 filters with size (3,3) 64 filters with size (3,3)
Pooling layer pool size (2,2) pool size (2,2)
Conv layer 64 filters with size (3,3) 128 filters with size (3,3)
Conv layer 64 filters with size (3,3) 128 filters with size (3,3)
Pooling layer pool size (2,2) pool size (2,2)
Fully connected 1 200 units 256 units
Fully connected 2 200 units 256 units
Output layer 10 units 10 units
Table 1. Architectures of neural network models for MNIST and CIFAR-10
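
As one possible PyTorch rendering of Table 1 (padding, initialization, and training details are not specified in the text, so treat this as an approximation rather than the authors' exact model):

```python
import torch.nn as nn

def make_model(in_channels, conv_channels, fc_units, dropout_rate):
    """Table 1 architecture: 4 conv layers, 2 pooling layers, 2 FC layers,
    with (defensive) dropout applied after the first fully connected layer."""
    c1, c2 = conv_channels
    return nn.Sequential(
        nn.Conv2d(in_channels, c1, 3), nn.ReLU(),
        nn.Conv2d(c1, c1, 3), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(c1, c2, 3), nn.ReLU(),
        nn.Conv2d(c2, c2, 3), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.LazyLinear(fc_units), nn.ReLU(),      # "Fully connected 1"
        nn.Dropout(dropout_rate),                # defensive dropout layer
        nn.Linear(fc_units, fc_units), nn.ReLU(),  # "Fully connected 2"
        nn.Linear(fc_units, 10),                 # output layer (softmax in the loss)
    )

mnist_model = make_model(1, (32, 64), fc_units=200, dropout_rate=0.3)
cifar_model = make_model(3, (64, 128), fc_units=256, dropout_rate=0.7)
```

Keeping such a model in `train()` mode at inference time leaves the `nn.Dropout` layer active, which corresponds to defensive dropout; PyTorch's dropout already rescales the surviving activations by $1/(1 - \text{rate})$.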

We implement the FGSM, JSMA, and C&W attacks based on the CleverHans package (Papernot et al., 2016a). For FGSM, we use a fixed $\epsilon$ as suggested in the original paper (Goodfellow et al., 2014). For JSMA, we use the code from CleverHans directly. For C&W, we perform binary search for the constant $c$; with a selected $c$, we then run iterations of gradient descent with the Adam optimizer. We compare with other defenses such as adversarial training (Tramèr et al., 2018), distillation as a defense (Papernot et al., 2016c), and stochastic activation pruning (SAP) (Dhillon et al., 2018); among these we implement SAP ourselves, and the defense effects of adversarial training and distillation as a defense are cited from (Carlini and Wagner, 2017).

4.2. Results

We use two metrics to evaluate the defense effects against adversarial attacks, i.e., the attack success rate (ASR) and the $L_2$ norm of the distortion. A lower attack success rate and a higher $L_2$ norm imply stronger defense effects.

Figure 5. (a) Test accuracy, (b) attack success rate, and (c) $L_2$ norm under C&W attack on the CIFAR-10 dataset using different training dropout rates and test dropout rates.
Dropout rate: train 0 | train 0.1 | train 0.3 | train 0.5 | train 0.7 | train 0.9
Test acc.: 72.07% | 75.69% | 76.39% | 78.12% | 78.15% | 68.35%
C&W ASR: 54.44% | 64.44% | 77.78% | 78.89% | 85.56% | 70%
$L_2$ norm: 0.504 | 0.522 | 0.618 | 0.498 | 0.679 | 0.784
Table 2. Test accuracy, attack success rate, and $L_2$ norm using SAP against C&W attack on CIFAR-10 (columns: training dropout rates).
Dropout rate: test 0 | test 0.1 | test 0.3 | test 0.5 | test 0.7 | test 0.9
train 0: 32.48% | – | – | – | – | –
train 0.5: 15.87% | 14.46% | 13.89% | – | – | –
Table 3. Attack success rate using our defensive dropout against FGSM attack on CIFAR-10.

We first compare with SAP on the defense effects against the C&W attack using the CIFAR-10 dataset. The results of SAP are summarized in Table 2. The results of our defensive dropout are summarized in Fig. 5. In Table 2, we perform SAP on neural network models trained with different training dropout rates (in columns). The second to fourth rows report the test accuracy, attack success rate (ASR), and $L_2$ norm. If we allow the test accuracy to decrease by up to 4%, SAP can reduce the attack success rate from 100% to 77.78% with a test accuracy of 76.39%. From Fig. 5, we can observe that at a training dropout rate of 0.7 and a test dropout rate of 0.7, our defensive dropout can reduce the attack success rate to 43.33% with a test accuracy of 77%, demonstrating defense effects superior to SAP. Also, the $L_2$ norm of our defensive dropout is around 1.1, which is much higher than that of SAP (i.e., 0.618 from Table 2). Table 3 shows the attack success rate using our defensive dropout against the FGSM attack on CIFAR-10, where defensive dropout reduces the attack success rate from 32.48% to 13.89% at a test accuracy of 77.89%. In general, the attack success rate of FGSM is much lower than that of C&W, because it is a faster, but not optimal, attack.

We also summarize the results on the MNIST dataset using our defensive dropout against the FGSM, JSMA, and C&W attacks in Tables 4, 5, and 6, respectively, with a 1% test accuracy drop. For FGSM, defensive dropout reduces the attack success rate from 40.67% to 16.44%. For JSMA, defensive dropout reduces the attack success rate from 91.89% to 26.78%. For C&W, defensive dropout reduces the attack success rate from 100% to 13.89%. Please note that adversarial training and distillation as a defense are totally vulnerable to the C&W attack (Carlini and Wagner, 2017).

Dropout rate: test 0 | test 0.1 | test 0.3 | test 0.5 | test 0.7 | test 0.9
train 0.7: 22.74% | 21.89% | 20.67% | 19.56% | 16.44% | –
Table 4. Attack success rate using our defensive dropout against FGSM attack on MNIST.

Dropout rate: test 0 | test 0.1 | test 0.3 | test 0.5 | test 0.7 | test 0.9
train 0.7: 90.67% | 60.56% | 43.78% | 35.67% | 26.78% | –
Table 5. Attack success rate using our defensive dropout against JSMA attack on MNIST.

Dropout rate: test 0 | test 0.1 | test 0.3 | test 0.5 | test 0.7 | test 0.9
train 0.3: 100% | 24.66% | 24.00% | 13.89% | – | –
Table 6. Attack success rate using our defensive dropout against C&W attack on MNIST.

5. Conclusion

In this paper, we propose defensive dropout for hardening deep neural networks under adversarial attacks. Considering the problem of building robust DNNs as an attacker-defender two-player game, we provide a defensive dropout algorithm that determines an optimal test dropout rate given the neural network model and the attacker’s strategy for generating adversarial examples. We also explain the mechanism behind the outstanding defense effects achieved by the proposed defensive dropout.

Acknowledgements

This work is supported by the National Science Foundation (CCF-1733701, CNS-1618379, DMS-1737897, and CNS-1840813), the Air Force Research Laboratory (FA8750-18-2-0058), and the Naval Research Laboratory. We thank researchers at the US Naval Research Laboratory for their comments on previous drafts of this paper.

References