Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack

11/22/2018 · Adnan Siraj Rakin et al., University of Central Florida

Recent developments in the field of Deep Learning have exposed the underlying vulnerability of Deep Neural Networks (DNNs) to adversarial examples. In image classification, an adversarial example is a carefully modified image that is visually indistinguishable from the original image but can cause a DNN model to misclassify it. Training the network with Gaussian noise is an effective technique to perform model regularization, thus improving model robustness against input variation. Inspired by this classical method, we explore utilizing the regularization characteristic of noise injection to improve DNN robustness against adversarial attack. In this work, we propose Parametric Noise Injection (PNI), which involves trainable Gaussian noise injection at each layer, on either activations or weights, through solving a min-max optimization problem embedded with adversarial training. These parameters are trained explicitly to achieve improved robustness. To the best of our knowledge, this is the first work that uses trainable noise injection to improve network robustness against adversarial attacks, rather than manually configuring the injected noise level through cross-validation. Extensive results show that our proposed PNI technique effectively improves robustness against a variety of powerful white-box and black-box attacks such as PGD, C&W, FGSM, the transferable attack, and the ZOO attack. Last but not least, the PNI method improves both clean- and perturbed-data accuracy in comparison to the state-of-the-art defense methods, outperforming the current unbroken PGD-based defense by 1.1% on clean data and 6.8% on perturbed data, respectively, using the ResNet-20 architecture.


1 Introduction

Deep Neural Networks (DNNs) have achieved great success in a variety of applications, including but not limited to image classification [1], speech recognition [2], machine translation [3], and autonomous driving [4]. Despite the remarkable accuracy improvement [5], recent studies [6, 7, 8] have shown that DNNs are vulnerable to adversarial examples. In the image classification task, an adversarial example is a natural image intentionally perturbed by a visually imperceptible variation that can cause drastic classification accuracy degradation. Fig. 1 provides an illustration of an adversarial example and its original counterpart. In addition to image classification, attacks on other DNN-powered tasks have also been actively investigated, such as visual question answering [9, 10], image captioning [11], semantic segmentation [12, 10], etc. [13, 14, 15].

Figure 1: An adversarial attack misclassifying a cat image as a hen with higher confidence.

There has been a cohort of works on generating adversarial attacks and developing corresponding defense methods. Adversarial attacks can be categorized as white-box or black-box attacks based on the attacker's knowledge of the target model. In a white-box attack [6, 8], the adversary has full access to the network architecture and parameters, whereas in a black-box attack [16, 17, 18] only the input and output of the network can be externally accessed. White-box attacks can often achieve high success rates for various applications [6, 19, 20, 21, 22, 23, 18, 8].

Recently, different works [24] have viewed the problem of adversarial examples from a unified perspective of model robustness and regularization. Conventional regularization mainly serves the purpose of reducing the generalization error, thus preventing the model from overfitting the training set. Traditional regularization methods have been effective in neural network training. For example, dropout [25], Batch Normalization (BN) [26], and quantization [27, 28] all serve the purpose of model regularization. However, BN is especially effective in convolutional networks, while dropout is mainly applicable to fully connected networks. Hinton discusses how adding Gaussian noise to the model (input, weight, and activation) during training acts as a regularizer in his lecture notes (https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec9.pdf) and in the dropout work [25].

It is evident that a more general model regularization method specifically directed at improving neural network robustness can serve the purpose of defending against adversarial examples more effectively. Recently, different works have implemented noise regularization methods during both the training and inference phases [29, 30, 31, 24]. In this work, we propose to improve neural network robustness against adversarial attack by regularizing the training process through adding noise during the training phase. We believe adversarial defense methodologies that focus on defending the network at inference time only will eventually fall short with the advent of new attack methods. Thus, a more general regularized training method can produce a robust DNN that defends against a wide range of attacks.

Overview of our approach: In this work, we propose a novel noise injection method called Parametric Noise Injection (PNI) to improve neural network robustness against adversarial attack. It has the flexibility to inject trainable noise at the input (to the whole network), activations, and weights during both training and inference. The proposed PNI is embedded with the well-known adversarial training, where Gaussian noise with trainable parameters can adjust the injected noise level at each neural network layer. We conduct a wide range of white-box and black-box adversarial attack experiments to demonstrate the effectiveness of our proposed PNI method across different popular DNN architectures. Our simulations show accuracy improvements on both clean data and data under attack. PNI achieves a 1.1% improvement in clean test accuracy on ResNet-20 compared to vanilla ResNet-20 with adversarial training. Along with the improvement on clean test data, our defense shows a 6.8% improvement in test accuracy under the PGD white-box attack. Additionally, our results show improved robustness under FGSM, C&W, and various black-box attacks.

2 Related works

2.1 Adversarial Attack

Recently, various powerful adversarial attack methods have been proposed to fool a trained deep neural network by introducing barely visible perturbations to the input data. Several state-of-the-art white-box (i.e., PGD [32], FGSM [7] and C&W [8]) and black-box (i.e., Substitute [33] and ZOO [18]) adversarial attack methods are briefly introduced as follows.

FGSM Attack:

Fast Gradient Sign Method (FGSM) [6] is a single-step, efficient adversarial attack method, which alters each element of the natural sample $x$ along the direction of its gradient w.r.t. the loss function $\mathcal{L}$. The generation of the adversarial example $\hat{x}$ can be described as:

$$\hat{x} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_{x} \mathcal{L}(f(x;\theta), t)\big) \quad (1)$$

where the attack is followed by a clipping operation to ensure $\hat{x} \in [0, 1]^n$. The attack strength is determined by the perturbation constraint $\epsilon$.
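For concreteness, the following is a minimal PyTorch sketch of the single-step FGSM update in Eq. 1; the classifier `model`, the integer label tensor `target`, and the use of cross-entropy loss are assumptions for illustration rather than details from the text.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, target, epsilon):
    """Single-step FGSM (Eq. 1): step along the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), target)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Perturb by epsilon in the gradient-sign direction, then clip to the valid image range.
    x_adv = x_adv.detach() + epsilon * grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()
```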

PGD Attack:

Projected Gradient Descent (PGD) [32] is the multi-step variant of FGSM, which is one of the strongest adversarial example generation algorithms. With $\hat{x}^{0} = x$ as the initialization, the iterative update of the perturbed data $\hat{x}^{k}$ can be expressed as:

$$\hat{x}^{k+1} = \Pi_{P_\epsilon}\Big(\hat{x}^{k} + a \cdot \mathrm{sign}\big(\nabla_{x} \mathcal{L}(f(\hat{x}^{k};\theta), t)\big)\Big) \quad (2)$$

where $\Pi_{P_\epsilon}$ is the projection onto the space $P_\epsilon$ bounded by $\epsilon$, $k$ is the step index up to $N_{\mathrm{step}}$, and $a$ is the step size. Madry et al. [32] proposed that PGD is a universal adversary among all the first-order adversaries (i.e., attacks that only rely on first-order information).
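A minimal multi-step PGD sketch of Eq. 2 follows, reusing the same assumed `model` interface as above; the $l_\infty$ projection is implemented with element-wise clipping.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, target, epsilon, step_size, num_steps):
    """Iterative PGD (Eq. 2) with projection onto the epsilon-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # FGSM-like step, then projection onto the L_inf ball and the valid image range.
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()
```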

C & W Attack:

In the C&W attack method, Carlini and Wagner [8] consider the generation of an adversarial example as an optimization problem, which optimizes the $l_p$-norm of the distance metric $\delta$ w.r.t. the given input data $x$, and can be described as:

$$\min \; \|\delta\|_p + c \cdot f(x + \delta) \quad (3)$$
$$\text{s.t.} \quad x + \delta \in [0, 1]^n \quad (4)$$

where $\delta$ is taken as the perturbation added upon the input data $x$, and a proper loss function $f(\cdot)$ is chosen in [8] to solve the optimization problem via the gradient descent method. $c$ is a constant set by the attacker. In this work, we use the $l_2$-norm based C&W attack and take the $l_2$-norm as the evaluation metric to measure the network's robustness, where a higher value of the $l_2$-norm indicates a more robust network or a potential failure of the attack.
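The snippet below is a simplified sketch of a C&W-style $l_2$ objective in the spirit of Eqs. 3-4 (not the full attack with binary search over $c$); the margin-style loss, the step count, and the ADAM learning rate are assumptions for illustration.

```python
import torch

def cw_l2_sketch(model, x, target, c=0.01, steps=100, lr=0.01):
    """Minimize ||delta||_2^2 + c * f(x + delta) with ADAM (Eqs. 3-4, simplified)."""
    delta = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = torch.clamp(x + delta, 0.0, 1.0)
        logits = model(x_adv)
        # Margin-style loss: push the true-class logit below the best competing logit.
        true_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
        other_logit = logits.scatter(1, target.unsqueeze(1), float('-inf')).max(dim=1).values
        f_loss = torch.clamp(true_logit - other_logit, min=0.0)
        loss = (delta.flatten(1).pow(2).sum(dim=1) + c * f_loss).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return torch.clamp(x + delta, 0.0, 1.0).detach()
```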

Black-box Attacks:

The most popular black-box attack is conducted using a substitute model [33], where the attacker trains a substitute model to mimic the functionality of the target model, then uses adversarial examples generated from the substitute model to attack the target model. In this work, we specifically investigate the transferable adversarial attack [16], which is a variant of the substitute-model attack. In a transferable adversarial attack, the adversarial example is generated from one source model to attack another target model. The source and target models can have completely different structures but are trained on the identical dataset. Moreover, the Zeroth Order Optimization (ZOO) attack [18] is also considered. Rather than training a substitute model, it directly approximates the gradient of the target model based only on the input data and output scores, using zeroth-order stochastic coordinate descent.
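A tiny sketch of the transferable-attack evaluation, reusing the pgd_attack sketch above: adversarial examples are crafted on an assumed `source_model` and then used to measure the perturbed-data accuracy of an assumed `target_model`; both names are illustrative.

```python
def transfer_attack_accuracy(source_model, target_model, x, target,
                             epsilon, step_size, num_steps):
    # Craft adversarial examples on the source model, evaluate them on the target model.
    x_adv = pgd_attack(source_model, x, target, epsilon, step_size, num_steps)
    preds = target_model(x_adv).argmax(dim=1)
    return (preds == target).float().mean().item()
```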

2.2 Adversarial Defenses:

Improving network robustness by training the model with adversarial examples [6, 32] is the most popular defense approach nowadays. Most later works have followed this path, supplementing their defenses with adversarial training [34, 35]. The first step in adversarial training is to choose an attack model to generate adversarial examples. Adopting a Projected Gradient Descent (PGD) based attack model for adversarial training has become popular since it can generate universal adversarial examples among first-order approaches [32]. Additionally, among many recent defense methods, only PGD-based adversarial training can sustain state-of-the-art accuracy under attack [8, 6, 23]. The reported DNN accuracy on the CIFAR-10 dataset remains a major success in defending against very strong adversarial attacks [23].

Recent works have merged the concept of improving model robustness through regularization to defend against adversarial examples. Among them, a unified perspective of regularization and robustness was presented by [24]. Again, randomly pruning some activations during inference [36] or randomizing the input layer [37] serves the purpose of injecting randomness to prevent the attacker from accessing the gradient. However, these approaches achieve good success against gradient-based attacks at the cost of obfuscated gradients [23].

In order to make the model more robust to adversarial attack, several works have adopted the concept of adding a noise layer just before the convolution layer during both the training and inference phases [29, 38]. Even though we agree with the core idea of these works, as they certainly make the model more robust, there are some fundamental advantages of our work compared to theirs. PNI improves model robustness by regularizing the model more effectively during training. As classical machine learning has demonstrated, weight noise can perform regularization even better [25], and we also show experimentally that adding noise specifically to the weights improves robustness even more. While these works [29, 30] choose the level of injected noise manually, we propose to inject a different level of noise at each layer using trainable parameters, since manually choosing the noise level for different layers, even with a validation set, is not practically feasible.

3 Approach

In this section, we first introduce the proposed Parametric Noise Injection (PNI) function and then investigate the impact of noise injection on the input (to the whole DNN), weights, and activations.

3.1 Parametric Noise Injection

Definition.

The method that we propose to inject noise into different components or locations within a DNN can be described as:

$$\tilde{v}_i = f_{\mathrm{PNI}}(v_i) = v_i + \alpha_i \cdot \eta, \qquad \eta \sim \mathcal{N}(0, \sigma^2) \quad (5)$$

$$\sigma = \sqrt{\frac{1}{N}\sum_{i}(v_i - \mu_{v})^2} \quad (6)$$

where $v_i$ is an element of the noise-free tensor $v$, and such $v$ can be the input/weight/inter-layer tensor in this work. $\eta$ is the additive noise term, which follows a Gaussian distribution with zero mean and standard deviation $\sigma$, and $\alpha_i$ is the coefficient that scales the magnitude of the injected noise $\eta$. We adopt the scheme in which $\eta$ shares the identical standard deviation of $v$ as in Eq. 6, thus the injected additive noise is correlated to the distribution of $v$ and to $\alpha_i$ simultaneously. Moreover, rather than manually configuring $\alpha_i$ to restrict the noise level, we set $\alpha_i$ as a learnable parameter which can be optimized for network robustness improvement. We name this method Parametric Noise Injection (PNI). Considering the over-parameterization and the convergence of training $\alpha_i$, we make the element-wise noise terms ($\alpha_i \cdot \eta$) share the same scaling coefficient across the entire tensor. Assuming we perform the proposed PNI on the weight tensors of the convolution/fully-connected layers throughout the entire DNN, for each parametric layer there is only one layer-wise noise scaling coefficient to be optimized. We take such layer-wise configuration as the default in this work.
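As a concrete illustration, here is a minimal PyTorch sketch of layer-wise PNI on the weights (PNI-W) of a convolution layer, following Eqs. 5-6; the `alpha_init` value and the `noise_enabled` ablation flag are assumptions added for illustration rather than details taken from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PNIConv2d(nn.Conv2d):
    """Conv2d with layer-wise Parametric Noise Injection on the weight tensor."""

    def __init__(self, *args, alpha_init=0.25, **kwargs):  # alpha_init is an assumed default
        super().__init__(*args, **kwargs)
        # One trainable noise scaling coefficient per layer (layer-wise PNI-W).
        self.alpha = nn.Parameter(torch.tensor(alpha_init))
        self.noise_enabled = True  # allows ablations with the noise term switched off

    def forward(self, x):
        weight = self.weight
        if self.noise_enabled:
            # Eq. 6: the noise std is tied to the std of the clean (noise-free) weights.
            sigma = self.weight.detach().std()
            eta = torch.randn_like(self.weight) * sigma  # eta ~ N(0, sigma^2)
            weight = self.weight + self.alpha * eta      # Eq. 5
        return F.conv2d(x, weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

A ResNet-style model can then swap its nn.Conv2d layers for PNIConv2d, so that each layer's alpha is learned jointly with the weights as described next.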

Optimization

In this work, we treat the noise scaling coefficient $\alpha$ as a model parameter which can be optimized through the back-propagation training process. For the configuration that shares the noise scaling coefficient layer-wise, the gradient computation can be described as:

$$\frac{\partial \mathcal{L}}{\partial \alpha} = \sum_{i} \frac{\partial \mathcal{L}}{\partial f_{\mathrm{PNI}}(v_i)} \cdot \frac{\partial f_{\mathrm{PNI}}(v_i)}{\partial \alpha} \quad (7)$$

where $\sum_i$ takes the summation over the entire tensor $v$, and $\partial \mathcal{L}/\partial f_{\mathrm{PNI}}(v_i)$ is the gradient back-propagated from the following layers. The gradient of the PNI function w.r.t. $\alpha$ is:

$$\frac{\partial f_{\mathrm{PNI}}(v_i)}{\partial \alpha} = \eta \quad (8)$$

It is noteworthy that even though $\eta$ is a Gaussian random variable, each sample of $\eta$ is taken as a constant during back-propagation. Using the gradient descent optimizer with momentum, the optimization of $\alpha$ at step $j$ can be written as:

$$V_j = m \cdot V_{j-1} - \varepsilon \cdot \frac{\partial \mathcal{L}}{\partial \alpha_{j-1}}; \qquad \alpha_j = \alpha_{j-1} + V_j \quad (9)$$

where $m$ is the momentum, $\varepsilon$ is the learning rate, and $V$ is the updating velocity. Moreover, since weight decay tends to make the learned noise scaling coefficient converge to zero, no weight decay term is applied to $\alpha$ during parameter updating in this work. All noise scaling coefficients share the same constant default initialization.
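A short sketch of how this optimizer setup can look in PyTorch, assuming a model built from the PNIConv2d layers above: the alpha coefficients share the momentum-SGD update of Eq. 9 but are placed in a parameter group without weight decay. The hyperparameter values shown are illustrative assumptions.

```python
import torch

def build_optimizer(model, lr=0.1, momentum=0.9, weight_decay=1e-4):
    alpha_params, other_params = [], []
    for name, param in model.named_parameters():
        # Route the PNI noise scaling coefficients into their own parameter group.
        (alpha_params if name.endswith('alpha') else other_params).append(param)
    return torch.optim.SGD(
        [{'params': other_params, 'weight_decay': weight_decay},
         {'params': alpha_params, 'weight_decay': 0.0}],  # no weight decay on alpha
        lr=lr, momentum=momentum)
```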

Robust Optimization.

We expect to utilize the aforementioned PNI technique to improve network robustness. However, directly optimizing the noise scaling coefficient $\alpha$ normally leads it to converge to a small, close-to-zero value, since the model optimization tends to over-fit the training dataset (see Table 1).

In order to succeed in adversarial defense, we jointly use the PNI method with robust optimization (a.k.a. adversarial training), which can boost the inference accuracy for perturbed data under attack. Given inputs $x$ and target labels $t$, adversarial training obtains the optimal solution of the network parameters $\theta$ for the following min-max problem:

$$\min_{\theta} \; \mathbb{E}_{(x, t) \sim \mathcal{D}} \left[ \max_{\hat{x} \in P_\epsilon(x)} \mathcal{L}\big(f(\hat{x};\theta), t\big) \right] \quad (10)$$

where the inner maximization acquires the perturbed data $\hat{x}$, and $P_\epsilon(x)$ is the input perturbation set constrained by $\epsilon$. The outer minimization is optimized through the gradient descent method, as in regular network training. The PGD attack [32] is adopted as the default inner maximization solver (i.e., for generating $\hat{x}$). Note that, in order to prevent label leaking during adversarial training, the perturbed data $\hat{x}$ is generated by taking the predicted result of $f(x;\theta)$ as the label (i.e., $t$ in Eq. 2).

Moreover, in order to balance clean-data accuracy and perturbed-data accuracy for practical applications, rather than performing the outer minimization solely on the loss of perturbed data as in Eq. 10, we minimize an ensemble loss which is the weighted sum of the losses on clean and perturbed data. The ensemble loss is described as:

$$\mathcal{L}_{\mathrm{ens}} = w_c \cdot \mathcal{L}\big(f(x;\theta), t\big) + w_a \cdot \mathcal{L}\big(f(\hat{x};\theta), t\big) \quad (11)$$

where $w_c$ and $w_a$ are the weights for the clean-data loss and the adversarial-data loss, fixed to a default configuration in this work. Optimizing the ensemble loss with the gradient descent method leads to successful training of both the model's inherent parameters (e.g., weights, biases) and the add-on noise scaling coefficients $\alpha$ from PNI.
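The following sketch shows one robust-optimization step with the ensemble loss of Eq. 11, reusing the pgd_attack sketch from Section 2.1. The equal weights w_c = w_a = 0.5 and the use of the ground-truth label in the inner maximization (rather than the predicted label the text uses to avoid label leaking) are simplifying assumptions.

```python
import torch.nn.functional as F

def train_step(model, optimizer, x, target, epsilon, step_size, num_steps,
               w_c=0.5, w_a=0.5):
    model.train()
    # Inner maximization: generate perturbed data with PGD, PNI noise kept enabled.
    x_adv = pgd_attack(model, x, target, epsilon, step_size, num_steps)
    # Outer minimization: weighted sum of the clean-data and perturbed-data losses (Eq. 11).
    loss = (w_c * F.cross_entropy(model(x), target)
            + w_a * F.cross_entropy(model(x_adv), target))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```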

4 Experiments

4.1 Experiment setup

Datasets and network architectures.

The CIFAR-10 [39] dataset is composed of 50K training samples and 10K test samples of 32×32 color images. For CIFAR-10, the classical Residual Network [40] architectures (ResNet-20/32/44/56) are used, and ResNet-20 is taken as the baseline for most of the comparative experiments and ablation studies. A more redundant network, ResNet-18, is also used to report performance on CIFAR-10, since large network capacity is helpful for adversarial defense. Moreover, rather than including the input normalization within the data augmentation, we place a non-trainable data normalization layer in front of the DNN to perform the identical function, so the attacker can directly add the perturbation to the natural image. Note that, since both PNI and the PGD attack [32] include randomness, we report accuracy in the format of mean±std% over 5 trials to alleviate error.
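A small sketch of such a non-trainable normalization layer placed in front of the DNN, so attacks operate directly on [0, 1] images; the CIFAR-10 mean/std constants here are the commonly used statistics and are an assumption, not values taken from the text.

```python
import torch
import torch.nn as nn

class InputNormalize(nn.Module):
    """Non-trainable per-channel normalization applied as the first layer of the DNN."""

    def __init__(self, mean=(0.4914, 0.4822, 0.4465), std=(0.2470, 0.2435, 0.2616)):
        super().__init__()
        self.register_buffer('mean', torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer('std', torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, x):
        return (x - self.mean) / self.std
```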

Adversarial attacks.

To evaluate the performance of our proposed PNI technique, we employ multiple powerful white-box and black-box attacks, as introduced in Section 2.1. For the PGD attack on MNIST and CIFAR-10, $\epsilon$ is set to 0.3/1 and 8/255, and $N_{\mathrm{step}}$ is set to 40 and 7, respectively. The FGSM attack adopts the same setup as PGD. The attack configurations of PGD and FGSM are identical to the setup in [34, 32]. For the C&W attack, we set the constant $c$ to 0.01, and ADAM [41] is used to optimize Eq. 4. We choose 0 for the confidence coefficient used by the C&W attack, as defined in [8]. The number of binary search steps for the attack is 9, while the number of gradient descent iterations is 10. Moreover, we also evaluate the PNI defense against several state-of-the-art black-box attacks (i.e., substitute [33], ZOO [18], and transferable [16] attacks) in Section 4.2.2 to examine the robustness improvement resulting from the proposed PNI technique.

Competing methods for adversarial defense.

As far as we know, adversarial training with PGD [32] is the only unbroken defense method [23]; it is labeled as vanilla adversarial training and taken as the baseline in this work. Beyond that, several recent works that utilize a similar concept to ours in their defense methods are discussed as well, including certified robustness [30] and random self-ensemble [29].

4.2 PNI for adversarial attacks

4.2.1 PNI against white-box attacks

Layer Index | Vanilla Training | PNI-W+Adv. Train. (without PNI in adv. example generation) | PNI-W+Adv. Train. (with PNI in adv. example generation)
Conv0 0.003 0.004 0.146
Conv1.0 0.002 0.005 0.081
Conv1.1 0.004 0.004 0.049
Conv1.2 0.002 0.001 0.097
Conv1.3 0.004 5.856 0.771
Conv1.4 0.005 0.005 0.004
Conv1.5 0.002 0.001 0.006
Conv2.0 0.004 0.000 0.006
Conv2.1 0.006 0.003 0.004
Conv2.2 0.004 0.003 0.030
Conv2.3 0.001 0.006 0.003
Conv2.4 0.003 0.001 0.033
Conv2.5 0.002 0.001 0.023
Conv3.0 0.007 0.001 0.008
Conv3.1 0.003 0.001 0.006
Conv3.2 0.007 0.002 0.001
Conv3.3 0.006 0.001 0.002
Conv3.4 0.009 0.002 0.001
Conv3.5 0.005 0.000 0.001
FC 0.002 0.002 0.001
Clean 92.11% 71.00% 84.89±0.11%
PGD 0.00±0.00% 18.11% 45.94±0.11%
FGSM 14.08% 26.34% 54.48±0.44%
Table 1: Convergence of PNI: ResNet-20 with layer-wise weight PNI (PNI-W) on the CIFAR-10 dataset. (Top) The converged layer-wise noise scaling coefficient α under various training schemes. (Bottom) Test accuracy on clean and perturbed data under PGD and FGSM attack.
Figure 2: The evolution curves of the trainable noise scaling coefficient α for layer-wise PNI on weights (PNI-W). Only the front 5 layers (bold in Table 1) of ResNet-20 [40] are shown. The learning rate of the SGD optimizer is reduced at epochs 80 and 120.

Optimization method of PNI

As discussed in Section 3.1, the noise scaling coefficient α will not be properly trained without utilizing adversarial training (i.e., solving the min-max problem). We conduct experiments training layer-wise PNI on the weights (PNI-W) of ResNet-20 to compare the convergence of the trained noise. As tabulated in Table 1, simply performing vanilla training with the momentum SGD optimizer totally fails the adversarial defense, and the noise scaling coefficients converge to negligible values. On the contrary, with the aid of adversarial training (i.e., optimization of Eq. 11), the convolution layers at the network's front-end obtain relatively large α (the bold values in Table 1), and the corresponding evolution curves are shown in Fig. 2.

Since the PGD attack [32] is taken as the inner maximization solver, the generation of the adversarial example in Eq. 2 is reformatted as:

$$\hat{x}^{k+1} = \Pi_{P_\epsilon}\Big(\hat{x}^{k} + a \cdot \mathrm{sign}\big(\nabla_{x} \mathcal{L}(f_{\mathrm{PNI}}(\hat{x}^{k};\theta, \alpha), t)\big)\Big) \quad (12)$$

where the difference between Eq. 2 and Eq. 12 is whether PNI is included in the $\hat{x}$ generation. It is noteworthy that keeping the noise term in the model for both adversarial example generation (Eq. 12) and the model parameter update is the critical factor for PNI optimization with adversarial training, since the optimization of α is also a min-max game: increasing the noise level enhances the defense strength but hampers the inference accuracy on natural clean images, while lowering α makes the network vulnerable to adversarial attack. As listed in Table 1, omitting PNI-W from the $\hat{x}$ generation indeed leads to the failure of PNI optimization, and the large value (5.856 at Conv1.3 in Table 1) is not converged, likely due to gradient explosion.
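A short sketch of the two attack-generation modes compared in Table 1, reusing the PNIConv2d and pgd_attack sketches above: the assumed `noise_enabled` flag keeps the PNI noise term in the $\hat{x}$ generation (Eq. 12) or removes it (Eq. 2).

```python
def generate_adv(model, x, target, epsilon, step_size, num_steps, with_pni=True):
    # Toggle the PNI noise term during adversarial example generation.
    for module in model.modules():
        if hasattr(module, 'noise_enabled'):
            module.noise_enabled = with_pni
    x_adv = pgd_attack(model, x, target, epsilon, step_size, num_steps)
    # Re-enable the noise term for the subsequent forward/backward passes.
    for module in model.modules():
        if hasattr(module, 'noise_enabled'):
            module.noise_enabled = True
    return x_adv
```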

Scheme | Test with PNI (Clean / PGD / FGSM) | Test without PNI (Clean / PGD / FGSM)
Vanilla adv. train [32] | - / - / - | 83.84 / 39.14±0.05 / 46.55
PNI-W | 84.89±0.11 / 45.94±0.11 / 54.48±0.44 | 85.48 / 31.45±0.07 / 42.55
PNI-I | 85.10±0.08 / 43.25±0.16 / 50.78±0.16 | 84.82 / 34.87±0.05 / 44.07
PNI-A-a | 85.22±0.18 / 43.83±0.10 / 51.41±0.08 | 85.20 / 33.93±0.05 / 44.32
PNI-A-b | 84.66±0.16 / 43.63±0.20 / 51.26±0.09 | 83.97 / 33.53±0.05 / 43.37
PNI-W+A-a | 85.12±0.10 / 43.57±0.12 / 51.15±0.21 | 84.88 / 33.23±0.05 / 43.59
PNI-W+A-b | 84.33±0.11 / 43.80±0.19 / 51.14±0.07 | 84.42 / 33.30±0.05 / 43.43
Table 2: Effect of PNI location: ResNet-20 [40] clean- and perturbed-data (under PGD and FGSM attack) accuracy (mean±std%) on the CIFAR-10 test set, with the PNI technique applied at different network locations. The baseline is ResNet-20 with vanilla adversarial training, and all PNI combinations are optimized through adversarial training by default.
Model | Capacity | No defense (Clean / PGD / FGSM) | Vanilla adv. train (Clean / PGD / FGSM) | PNI-W+adv. train, test with PNI (Clean / PGD / FGSM) | PNI-W+adv. train, test without PNI (Clean / PGD / FGSM)
Net20 | 269,722 | 92.1 / 0.0±0.0 / 14.1 | 83.8 / 39.1±0.1 / 46.6 | 84.9±0.1 / 45.9±0.1 / 54.5±0.4 | 85.5 / 31.6±0.1 / 42.6
Net32 | 464,154 | 92.8 / 0.0±0.0 / 17.8 | 85.6 / 42.1±0.0 / 50.3 | 85.9±0.1 / 43.5±0.3 / 51.5±0.1 | 86.4 / 35.3±0.1 / 45.5
Net44 | 658,586 | 93.1 / 0.0±0.0 / 23.9 | 85.9 / 40.8±0.1 / 48.2 | 84.7±0.2 / 48.5±0.2 / 55.8±0.1 | 86.0 / 39.6±0.1 / 49.9
Net56 | 853,018 | 93.3 / 0.0±0.0 / 24.2 | 86.5 / 40.1±0.1 / 48.8 | 86.8±0.2 / 46.3±0.3 / 53.9±0.1 | 87.3 / 41.6±0.1 / 51.1
Net20 (1.5×) | 605,026 | 93.5 / 0.0±0.0 / 15.9 | 85.8 / 42.0±0.0 / 49.6 | 86.0±0.1 / 46.7±0.2 / 54.5±0.2 | 87.0 / 38.4±0.1 / 49.1
Net20 (2×) | 1,073,962 | 94.0 / 0.0±0.0 / 13.0 | 86.3 / 43.1±0.1 / 52.6 | 86.2±0.1 / 46.1±0.2 / 54.6±0.2 | 86.8 / 39.1±0.0 / 50.3
Net20 (4×) | 4,286,026 | 94.0 / 0.0±0.0 / 14.2 | 87.5 / 46.1±0.1 / 54.1 | 87.7±0.1 / 49.1±0.3 / 57.0±0.2 | 88.1 / 43.8±0.1 / 54.2
Table 3: Effect of network depth and width: clean- and perturbed-data (under PGD and FGSM attack) accuracy (mean±std%) on the CIFAR-10 test set, utilizing different robust optimization configurations. For network depth, the classical ResNet-20/32/44/56 with increasing depth are reported. For network width, ResNet-20 (1×) is adopted as the baseline, and we compare wide ResNet-20 variants with the input and output channels scaled by 1.5×/2×/4×. Capacity denotes the number of trainable parameters in the model.
Effect of PNI on weight, activation and input.

In this work, even though the scheme of injecting noise on the weights (PNI-W) is taken as the default PNI setup, results for PNI on activations (PNI-A-a/b), on the input (PNI-I), and in hybrid mode (e.g., PNI-W+A) are also provided in Table 2 for a comprehensive study. PNI-A-a/PNI-A-b denote injecting noise on the output/input tensor of the convolution/fully-connected layer, respectively. Moreover, the PNI-A-b scheme intrinsically includes PNI-I, since PNI-I applies the noise on the input tensor of the first layer. Note that all models with PNI variants are jointly trained with PGD-based adversarial training [32], as discussed above. Then, with the same trained model, we report the accuracy with/without the trained noise term (left/right in Table 2) during the test phase. As shown in Table 2, with the noise term enabled during the test phase, PNI-W on ResNet-20 gives the best performance in defending against PGD and FGSM attacks, in comparison to PNI at other locations. Although it is elusive to fully understand the mechanism by which PNI-W outperforms its counterparts, the intuition is that PNI-W is the generalization of PNI-A over each connection instead of each output unit, similar to the relation between the regularization techniques DropConnect [42] and Dropout [25].

Furthermore, we also observe that disabling PNI during the test phase leads to a significant accuracy drop in defending against PGD and FGSM attacks, while the clean-data accuracy remains at the same level as with PNI enabled. Such observation raises two concerns about our PNI technique: 1) Does the improvement of clean-/perturbed-data accuracy with PNI mainly come from the attack strength reduction caused by randomness (potential gradient obfuscation [23])? 2) Is PNI just a negligible trick, or does it perform model regularization to construct a more robust model? Our answer to both concerns is no: as elaborated in Section 5, the improvement does not stem from gradient obfuscation, and PNI does perform model regularization rather than being a negligible trick.

Effect of network capacity.

In order to investigate the relation between network capacity (i.e., the number of trainable parameters) and the robustness improvement by PNI, we examine various network architectures in terms of both depth and width. For different network depths, experiments on ResNet-20/32/44/56 [40] are conducted under vanilla adversarial training [32] and our proposed PNI robust optimization method. For different network widths, we adopt the original ResNet-20 as the baseline and expand the input and output channels of each layer by 1.5×/2×/4×, respectively. As in Table 2, we report clean- and perturbed-data accuracy with/without the PNI term during the test phase. The results in Table 3 indicate that increasing the model's capacity indeed improves network robustness against white-box adversarial attacks, and our proposed PNI outperforms vanilla adversarial training in terms of both clean-data and perturbed-data accuracy under PGD and FGSM attacks. This observation demonstrates that the perturbed-data accuracy improvement does not come from trading off clean-data accuracy, as reported in [34, 43]. With increasing network capacity, the drop in perturbed-data accuracy when disabling the PNI noise term during the test phase also becomes less significant. Although both adversarial training and the PNI technique perform regularization, the network structure still needs careful construction to prevent the over-fitting resulting from over-parameterization.

Robustness evaluation with C&W attack.

Improved robustness does not necessarily mean improving the test-data accuracy against any particular attack method. Typically, the $l_2$-norm based C&W attack [8] should reach a 100% success rate against any defense, so the average $l_2$-norm required to fool the network gives more insight into a network's robustness in general [8]. The results presented in Table 4 summarize the overall performance of our model against the C&W attack. Our method of training the noise parameter becomes more effective for a more redundant network. We demonstrate this phenomenon through a comparison between the ResNet-20 and ResNet-18 architectures: ResNet-18 clearly shows a larger robustness improvement over vanilla adversarial training than ResNet-20 against the C&W attack.

C&W attack $l_2$-norm
Model | Capacity | No defense | Vanilla adv. train | PNI-W
ResNet-20 (4×) | 4,286,026 | 0.12 | 1.97 | 1.97
ResNet-18 | 11,173,962 | 0.12 | 2.39 | 2.63
Table 4: C&W attack $l_2$-norm comparison.

4.2.2 PNI against black-box attack

In this section, we test our proposed PNI technique against the transferable adversarial attack [16] and the ZOO attack. Following the transferable adversarial attack [16], two trained neural networks are taken as the source model and the target model. The adversarial examples are generated from the source model and then used to attack the target model, which is denoted as source → target. We take ResNet-18 on CIFAR-10 as an example: we train two ResNet-18 models (model-A and model-B) on the CIFAR-10 dataset to attack each other, where model-A is optimized through vanilla adversarial training, while model-B is trained using our proposed PNI variants (i.e., PNI-W/A-a/W+A-a) with robust optimization. Table 5 shows almost equal perturbed-data accuracy for A → B and B → A under the various PNI scenarios, which indicates that our PNI technique does not reduce the attack strength.

Training scheme of B | Transferable attack: A → B | Transferable attack: B → A | ZOO attack success rate
PNI-W | 75.13±0.17 | 75.23±0.18 | 57.72%
PNI-A-a | 74.67±0.11 | 75.86±0.13 | 69.61%
PNI-W+A-a | 75.14±0.10 | 74.92±0.13 | 50.00%
Table 5: PNI against black-box attacks: on the CIFAR-10 test set, (Left) perturbed-data accuracy under the transferable PGD attack, and (Right) the attack success rate of the ZOO attack. Model-A is a ResNet-18 trained by vanilla adversarial training, and Model-B is a ResNet-18 trained by PNI-W/A-a/W+A-a with adversarial training.

For the ZOO attack [18], we test our defense on 200 randomly selected test samples for the un-targeted attack. The attack success rate denotes the percentage of test samples whose classification changes to a wrong class after the attack. The ZOO attack success rate for vanilla ResNet-18 with adversarial training is close to 80%. The robustness of PNI is more evident in Table 5, as the attack success rate drops significantly for PNI-W+A-a and PNI-W. However, PNI-A-a fails to resist the ZOO attack, even though it still maintains a lower success rate than the baseline. The failure of PNI-A-a shows that just adding noise in front of the activation does not necessarily achieve the desired robustness, as claimed by some of the previous defenses [30, 29].

4.2.3 Comparison to competing methods

As discussed in Section 2.2, a large number of adversarial defense works have been proposed recently; however, most of them have already been broken by stronger attacks proposed in [44, 23]. As a result, in this work we choose to compare with the most effective one to date, PGD-based adversarial training [32]. Additionally, we compare with other randomness-based works [29, 30] in Table 6 to examine the effectiveness of PNI.

Defense method | Model | Clean | PGD
PGD adv. train [32] | ResNet-20 (4×) | 87 | 46.1
DP [30] | 28-10 Wide ResNet (L=0.1) | 87.0 | 25
RSE [29] | ResNeXt | 87.5 | 40
PNI-W (this work) | ResNet-20 (4×) | 87.7 | 49.1
Table 6: Comparison of state-of-the-art adversarial defense methods with clean- and perturbed-data accuracy on CIFAR-10 under PGD attack.

Previous defense works [43, 34] have shown a trade-off between clean-data accuracy and perturbed-data accuracy, where the perturbed-data accuracy improvement normally comes at the cost of lowering the clean-data accuracy. It is worth highlighting that our proposed PNI improves both clean- and perturbed-data accuracy under white-box attack, in comparison to PGD-based adversarial training [32]. Differential Privacy (DP) [30] is a similar method that utilizes noise injection at various locations within the network. Although it guarantees a certified defense, it does not perform well against $l_\infty$-norm based attacks (e.g., PGD and FGSM). In order to achieve a higher level of certified defense, DP also significantly sacrifices clean-data accuracy. Another randomness-based approach is Random Self-Ensemble (RSE) [29], which inserts a noise layer before every convolution layer. Even though this defense performs well against the C&W attack, it performs poorly against the strong PGD attack. In our black-box attack evaluation (Table 5), we demonstrate that adding activation noise may not be as effective as weight noise. Beyond that, both DP and RSE manually configure the noise level, which makes it extremely difficult to find the optimal setup. In contrast, in our proposed PNI method, the noise level is determined by a trainable layer-wise noise scaling coefficient and the distribution of the noise-injected location.

5 Discussion

The defense performance improvement brought by our proposed PNI does not come from stochastic gradients. A stochastic gradient incorrectly approximates the true gradient based on a single sample. We show that PNI does not rely on gradient obfuscation from two perspectives: 1) our proposed PNI method passes each inspection item proposed by [23] to identify gradient obfuscation; 2) under PGD attack with an increasing number of attack steps, our PNI robust optimization method still outperforms vanilla adversarial training (certified as having non-obfuscated gradients in [23]).

Characteristic behaviors that identify gradient obfuscation | Pass | Fail
1. One-step attacks perform better than iterative attacks | ✓ |
2. Black-box attacks are better than white-box attacks | ✓ |
3. Unbounded attacks do not reach 100% success | ✓ |
4. Random sampling finds adversarial examples | ✓ |
5. Increasing the distortion bound doesn't increase success | ✓ |
Table 7: Checklist examining the characteristic behaviors caused by obfuscated and masked gradients [23] for PNI; a check under Pass means the behavior is not observed.
Inspections of gradient obfuscation.

The well-known gradient obfuscation work [23] enumerates several characteristic behaviors, listed in Table 7, which can be observed when a defense method exhibits gradient obfuscation. Our experiments show that PNI passes each inspection item in Table 7.

For item 1, all the experiments in Table 2 and Table 3 show that the FGSM attack (one-step) performs worse than the PGD attack (iterative). For item 2, our black-box attack experiments in Table 5 show that the black-box attack strength is weaker than that of the white-box attack. For item 3, as plotted in Fig. 3, we run experiments with an increasing distortion bound $\epsilon$; the results show that unbounded attacks do lead to 0% accuracy under attack. For item 4, the prerequisite is that gradient-based attacks (e.g., PGD and FGSM) cannot find the adversarial examples; however, the experiments in Fig. 3 reveal that our method can still be broken by increasing the distortion bound, so PNI merely increases the resistance against adversarial attacks in comparison to vanilla adversarial training. For item 5, again as shown in Fig. 3, increasing the distortion bound increases the attack success rate.

Figure 3: On the CIFAR-10 test set, the perturbed-data accuracy of ResNet-18 under PGD attack (Top) versus the attack bound $\epsilon$, and (Bottom) versus the number of attack steps $N_{\mathrm{step}}$.
PNI does not rely on stochastic gradients.

As shown in Fig. 3, gradually increasing the number of PGD attack steps raises the attack strength [32], thus degrading the perturbed-data accuracy for both vanilla adversarial training and our PNI technique. However, in both cases the perturbed-data accuracy saturates and does not degrade any further beyond a certain number of steps. If PNI's success came from stochastic gradients, which give incorrect gradients owing to the single sample, increasing the attack steps would be expected to eventually break the PNI defense, which is not observed here. Our PNI method still outperforms vanilla adversarial training even when $N_{\mathrm{step}}$ is increased up to 100. Therefore, we can draw the conclusion that, even if PNI does include some gradient obfuscation, the stochastic gradient does not play the dominant role in PNI's robustness improvement.
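A sketch of the sanity check described above, reusing the pgd_attack sketch: sweep the number of PGD steps and verify that perturbed-data accuracy saturates rather than collapsing; the `loader` interface and the step list are assumptions for illustration.

```python
import torch

@torch.no_grad()
def batch_accuracy(model, x, target):
    return (model(x).argmax(dim=1) == target).float().mean().item()

def sweep_pgd_steps(model, loader, epsilon, step_size, step_list=(7, 20, 40, 100)):
    for num_steps in step_list:
        accs = []
        for x, target in loader:
            x_adv = pgd_attack(model, x, target, epsilon, step_size, num_steps)
            accs.append(batch_accuracy(model, x_adv, target))
        # Accuracy should plateau as num_steps grows if gradients are not obfuscated.
        print(f"PGD steps={num_steps}: perturbed-data accuracy={sum(accs) / len(accs):.3f}")
```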

6 Conclusion

In this paper, we present a Parametric Noise Injection technique in which the noise intensity is trained by solving a min-max optimization problem during adversarial training. Through extensive experiments, the proposed PNI method outperforms the state-of-the-art defense method in terms of both clean-data accuracy and perturbed-data accuracy.

References