1 Introduction
Deep Neural Networks (DNNs) have achieved great success in a variety of applications, including but not limited to image classification [1], speech recognition [2], machine translation [3], and autonomous driving [4]. Despite the remarkable accuracy improvement [5], recent studies [6, 7, 8] have shown that DNNs are vulnerable to adversarial examples. In the image classification task, an adversarial example is a natural image intentionally perturbed by a visually imperceptible variation that can nevertheless cause a drastic degradation in classification accuracy. Fig. 1 illustrates an adversarial example and its original counterpart. Beyond image classification, attacks on other DNN-powered tasks have also been actively investigated, such as visual question answering [9, 10], image captioning [11], and semantic segmentation [12, 10], among others [13, 14, 15].
There has been a cohort of works on generating adversarial attacks and developing the corresponding defense methods. Adversarial attacks can be categorized as white-box or black-box based on the attacker's knowledge of the target model. For a white-box attack [6, 8], the adversary has full access to the network architecture and parameters, whereas a black-box attack [16, 17, 18] can only externally access the input and output of the network. White-box attacks often achieve high success rates across various applications [6, 19, 20, 21, 22, 23, 18, 8].
Recently, different works [24] have viewed the problem of adversarial examples from a unified perspective of model robustness and regularization. Conventional regularization mainly serves the purpose of reducing the generalization error, thus preventing the model from overfitting the training set. Traditional regularization methods have been effective in neural network training; for example, dropout [25], Batch Normalization (BN) [26], and quantization [27, 28] all serve the purpose of model regularization. However, BN is specifically effective in convolutional networks, while dropout is mainly applicable to fully-connected networks. Hinton discusses, in his lecture note^1 (^1 https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec9.pdf) and in the dropout work [25], that adding Gaussian noise to the model (input, weights, and activations) during training acts as a regularizer. It is evident that a more general model regularization method specifically directed at improving neural network robustness could defend against adversarial examples more effectively. Recently, different works have implemented noise regularization during both the training and inference phases [29, 30, 31, 24]. In this work, we propose to improve neural network robustness against adversarial attacks by regularizing the training process through noise injection during the training phase. We believe adversarial defense methodologies that focus on defending the network at inference only will eventually fall short with the advent of new attack methods; a more general regularized training method can thus produce robust DNNs that defend against a wide range of attacks.
Overview of our approach: In this work, we propose a novel noise injection method called Parametric Noise Injection (PNI) to improve neural network robustness against adversarial attacks. It has the flexibility to inject trainable noise at the input (to the whole network), activations, and weights during both training and inference. The proposed PNI is embedded within the well-known adversarial training framework, where Gaussian noise with trainable parameters adjusts the injected noise level at each network layer. We conduct a wide range of white-box and black-box adversarial attack experiments to demonstrate the effectiveness of the proposed PNI method across popular DNN architectures. Our experiments show accuracy improvement on both clean data and data under attack. PNI achieves a 1.1% improvement in clean test accuracy on ResNet20 compared to vanilla ResNet20 with adversarial training. Along with the improvement on clean test data, our defense shows a 6.8% improvement in test accuracy under the PGD white-box attack. Additionally, our results show improved robustness under FGSM, C&W, and various black-box attacks.
2 Related works
2.1 Adversarial Attack
Recently, various powerful adversarial attack methods have been proposed to fool a trained deep neural network by introducing barely visible perturbations upon the input data. Several state-of-the-art white-box (i.e., PGD [32], FGSM [7] and C&W [8]) and black-box (i.e., substitute [33] and ZOO [18]) adversarial attack methods are briefly introduced as follows.
FGSM Attack:
Fast Gradient Sign Method (FGSM) [6] is a single-step, efficient adversarial attack method that alters each element of the natural sample x along the sign of its gradient w.r.t. the loss function \mathcal{L}. The generation of the adversarial example \hat{x} can be described as:

\hat{x} = x + \epsilon \cdot \text{sign}\big(\nabla_x \mathcal{L}(f(x;\theta), t)\big)    (1)

where the attack is followed by a clipping operation to ensure that \hat{x} remains within the valid input range. The attack strength is determined by the perturbation constraint \epsilon.
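To make Eq. 1 concrete, the single FGSM step can be sketched in NumPy on a toy logistic-regression "network" whose input gradient is analytic; the model, names, and ε value here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def fgsm(x, grad_x, eps):
    # Eq. (1): perturb each element of x by eps along the sign of the
    # loss gradient, then clip back into the valid input range [0, 1].
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

def input_gradient(w, x, t):
    # Toy logistic model: d(cross-entropy)/dx = (sigmoid(w.x) - t) * w
    p = 1.0 / (1.0 + np.exp(-(w @ x)))
    return (p - t) * w

rng = np.random.default_rng(0)
w = rng.normal(size=8)                    # hypothetical model weights
x = rng.uniform(0.2, 0.8, size=8)         # a "natural sample" in [0, 1]
x_adv = fgsm(x, input_gradient(w, x, t=1.0), eps=8 / 255)
```

For a real DNN, the gradient w.r.t. the input would come from one back-propagation pass rather than a closed form.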
PGD Attack:
Projected Gradient Descent (PGD) [32] is the multi-step variant of FGSM and one of the strongest adversarial example generation algorithms. With \hat{x}^0 = x as the initialization, the iterative update of the perturbed data can be expressed as:

\hat{x}^{k+1} = \Pi_{P_\epsilon}\Big(\hat{x}^k + a \cdot \text{sign}\big(\nabla_x \mathcal{L}(f(\hat{x}^k;\theta), t)\big)\Big)    (2)

where \Pi_{P_\epsilon} is the projection onto the space P_\epsilon bounded by \epsilon, k is the step index up to N_step, and a is the step size. Madry et al. [32] proposed that PGD is a universal adversary among all first-order adversaries (i.e., attacks that only rely on first-order information).
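A minimal sketch of the PGD iteration in Eq. 2, again using a toy analytic gradient in place of back-propagation (the ε, a, and N_step values are illustrative):

```python
import numpy as np

def pgd(x, t, grad_fn, eps, a, n_step):
    # Eq. (2): repeated FGSM-like steps of size a, each followed by a
    # projection onto the eps-ball P_eps around x and the valid range.
    x_adv = x.copy()                                  # x^0 = x
    for _ in range(n_step):
        x_adv = x_adv + a * np.sign(grad_fn(x_adv, t))
        x_adv = np.clip(x_adv, x - eps, x + eps)      # projection onto P_eps
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

# Toy logistic "network" so the input gradient has a closed form.
rng = np.random.default_rng(1)
w = rng.normal(size=8)
grad_fn = lambda xa, t: (1.0 / (1.0 + np.exp(-(w @ xa))) - t) * w
x = rng.uniform(0.2, 0.8, size=8)
x_adv = pgd(x, t=1.0, grad_fn=grad_fn, eps=8 / 255, a=2 / 255, n_step=7)
```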
C&W Attack:
In the C&W attack method, Carlini and Wagner [8] formulate the generation of an adversarial example as an optimization problem that minimizes the L_p norm of the perturbation w.r.t. the given input data x:

\min_{\delta} \; \|\delta\|_p + c \cdot f(x + \delta)    (3)

\text{s.t.} \quad x + \delta \in [0, 1]^n    (4)

where \delta is the perturbation added to the input data, f is a properly chosen loss function [8], and the optimization is solved via gradient descent; c is a constant set by the attacker. In this work, we use the L2-norm-based C&W attack and take the average L2 norm of the successful perturbations as the evaluation metric to measure the network's robustness, where a higher value indicates a more robust network or a potential failure of the attack.

Black-box Attacks:
The most popular black-box attack is conducted using a substitute model [33], where the attacker trains a substitute model to mimic the functionality of the target model, then uses adversarial examples generated from the substitute model to attack the target model. In this work, we specifically investigate the transferable adversarial attack [16], a variant of the substitute-model attack in which the adversarial example is generated from one source model to attack another target model. The source and target models may have entirely different structures but are trained on the identical dataset. Moreover, the Zeroth Order Optimization (ZOO) attack [18] is also considered; rather than training a substitute model, it directly approximates the gradient of the target model based only on the input data and output scores, using stochastic coordinate descent.
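The core of the ZOO attack, estimating one coordinate of the input gradient purely from black-box score queries, can be sketched with a symmetric finite difference (the function names and the quadratic test score are illustrative assumptions):

```python
import numpy as np

def zoo_coordinate_grad(score_fn, x, i, h=1e-4):
    # Zeroth-order estimate of d(score)/d(x_i) using only two queries to
    # the target model's output score, as in the ZOO attack [18].
    e = np.zeros_like(x)
    e[i] = h
    return (score_fn(x + e) - score_fn(x - e)) / (2.0 * h)

# Sanity check against a score with a known gradient: f(x) = sum(x^2),
# whose true partial derivative is 2 * x_i.
score = lambda v: float(np.sum(v ** 2))
x = np.array([0.3, -0.2, 0.5])
g_est = zoo_coordinate_grad(score, x, i=2)
```

In the actual attack, such estimates drive a stochastic coordinate-descent update of the perturbation rather than a single query pair.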
2.2 Adversarial Defenses
Improving network robustness by training the model with adversarial examples [6, 32] is the most popular defense approach nowadays. Most later works have followed this path, supplementing their defenses with adversarial training [34, 35]. The first step in adversarial training is to choose an attack model that generates the adversarial examples. Adopting a Projected Gradient Descent (PGD) based attack model for adversarial training has become popular, since PGD generates universal adversarial examples among first-order approaches [32]. Additionally, among many recent defense methods, only PGD-based adversarial training can sustain state-of-the-art accuracy under attack [8, 6, 23]; the reported DNN accuracy on the CIFAR-10 dataset remains a major success in defending against very strong adversarial attacks [23].
Recent works have merged the concept of improving model robustness through regularization to defend against adversarial examples. Among them, a unified perspective of regularization and robustness was presented by [24]. In addition, randomly pruning some activations during inference [36] or randomizing the input layer [37] injects randomness that prevents the attacker from accessing the true gradient. However, these approaches achieve their success against gradient-based attacks at the cost of obfuscated gradients [23].
In order to make the model more robust to adversarial attacks, several works have adopted the concept of adding a noise layer just before the convolution layer during both the training and inference phases [29, 38]. While we agree with the core idea of these works, as injected noise certainly makes the model more robust, our work has some fundamental advantages over theirs. PNI improves model robustness by regularizing the model more effectively during training; as classical machine learning has demonstrated, weight noise performs regularization even better [25], and we show experimentally that injecting noise into the weights in particular improves robustness even more. Furthermore, while these works [29, 30] choose the level of injected noise manually, we propose to inject a different level of noise at each layer using trainable parameters, since manually choosing per-layer noise levels, even via a validation set, is not practically feasible.

3 Approach
In this section, we first introduce the proposed Parametric Noise Injection (PNI) function and then investigate the impact of noise injection on the input (to the whole DNN), weights, and activations.
3.1 Parametric Noise Injection
Definition.
The method we propose for injecting noise into different components or locations within a DNN can be described as:

\tilde{v}_i = f_{\text{PNI}}(v_i) = v_i + \alpha_l \cdot \eta    (5)

\eta \sim \mathcal{N}(0, \sigma^2), \quad \sigma = \sqrt{\tfrac{1}{N} \textstyle\sum_i (v_i - \mu_v)^2}    (6)

where v_i is an element of the noise-free tensor v, which can be an input, weight, or inter-layer tensor in this work; \eta is the additive noise term, which follows a Gaussian distribution with zero mean and standard deviation \sigma; and \alpha_l is a coefficient that scales the magnitude of the injected noise \eta. We adopt the scheme in which \eta shares the standard deviation of v, as in Eq. 6, so the injected additive noise is correlated with the distribution of v. Moreover, rather than manually configuring \alpha_l to restrict the noise level, we set \alpha_l as a learnable parameter that can be optimized for robustness improvement. We name this method Parametric Noise Injection (PNI). Considering over-parameterization and training convergence, we make the element-wise noise terms share the same scaling coefficient across the entire tensor: assuming the proposed PNI is applied to the weight tensors of the convolution/fully-connected layers throughout the entire DNN, each parametric layer has only one layer-wise noise scaling coefficient \alpha_l to be optimized. We take this layer-wise configuration as the default in this work.
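The forward computation of Eqs. 5-6 can be sketched as follows; in a real implementation α_l would be registered as a trainable parameter of each layer, whereas this NumPy sketch (with illustrative names and values) shows only the noise injection itself:

```python
import numpy as np

def pni(v, alpha, rng):
    # Eqs. (5)-(6): add Gaussian noise whose standard deviation matches the
    # standard deviation of the noise-free tensor v, scaled by the layer-wise
    # coefficient alpha (trainable in the actual method).
    sigma = v.std()
    eta = rng.normal(0.0, sigma, size=v.shape)
    return v + alpha * eta

rng = np.random.default_rng(0)
v = rng.normal(0.0, 0.1, size=100_000)    # stand-in for a weight tensor
v_noisy = pni(v, alpha=0.5, rng=rng)
```

Because σ tracks the tensor's own statistics, the effective noise magnitude adapts layer by layer even before α_l is trained.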
Optimization.
In this work, we treat the noise scaling coefficient \alpha_l as a model parameter optimized through the back-propagation training process. For the configuration that shares the noise scaling coefficient layer-wise, the gradient computation can be described as:

\frac{\partial \mathcal{L}}{\partial \alpha_l} = \sum_i \frac{\partial \mathcal{L}}{\partial f_{\text{PNI}}(v_i)} \cdot \frac{\partial f_{\text{PNI}}(v_i)}{\partial \alpha_l}    (7)

where the summation is taken over the entire tensor v, and \partial\mathcal{L}/\partial f_{\text{PNI}}(v_i) is the gradient back-propagated from the subsequent layers. The gradient of the PNI function itself is:

\frac{\partial f_{\text{PNI}}(v_i)}{\partial \alpha_l} = \eta    (8)

It is noteworthy that even though \eta is a Gaussian random variable, each sample of \eta is taken as a constant during back-propagation. Using the gradient descent optimizer with momentum, the optimization of \alpha_l at step j can be written as:

V^j = m \cdot V^{j-1} + \frac{\partial \mathcal{L}}{\partial \alpha_l}, \quad \alpha_l^j = \alpha_l^{j-1} - \epsilon \cdot V^j    (9)

where m is the momentum, \epsilon is the learning rate, and V is the updating velocity. Moreover, since weight decay tends to make the learned noise scaling coefficient converge to zero, no weight decay term is applied to \alpha_l during parameter updating in this work. A fixed positive value is used as the default initialization of \alpha_l.
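The update in Eq. 9 amounts to plain momentum SGD on α_l with the weight-decay term dropped; a sketch (all constants here are illustrative, not the paper's training hyper-parameters):

```python
def alpha_update(alpha, velocity, grad, m=0.9, lr=0.1):
    # Eq. (9): momentum SGD step on the noise scaling coefficient.
    # There is deliberately no weight-decay term, which would otherwise
    # pull the learned alpha toward zero.
    velocity = m * velocity + grad
    alpha = alpha - lr * velocity
    return alpha, velocity

alpha, vel = 0.1, 0.0                 # hypothetical starting point
alpha, vel = alpha_update(alpha, vel, grad=0.5)
```

In frameworks with grouped optimizers, the same effect is achieved by placing the α parameters in a parameter group whose weight decay is set to zero.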
Robust Optimization.
We expect to utilize the aforementioned PNI technique to improve network robustness. However, directly optimizing the noise scaling coefficient \alpha_l normally converges to a small, close-to-zero value, since the model optimization tends to overfit the training dataset (see Table 1).
In order to succeed in adversarial defense, we jointly use the PNI method with robust optimization (a.k.a. adversarial training), which can boost the inference accuracy for perturbed data under attack. Given inputs x and target labels t, adversarial training obtains the optimal network parameters \theta for the following min-max problem:

\min_{\theta} \max_{\hat{x} \in P_\epsilon(x)} \mathcal{L}\big(f(\hat{x};\theta), t\big)    (10)

where the inner maximization acquires the perturbed data \hat{x}, and P_\epsilon(x) is the input perturbation set constrained by \epsilon. The outer minimization is optimized through gradient descent, as in regular network training. The PGD attack [32] is adopted as the default inner maximization solver (i.e., for generating \hat{x}). Note that, in order to prevent label leaking during adversarial training, the perturbed data \hat{x} is generated by taking the model's predicted result on x as the label (i.e., t in Eq. 2).
Moreover, in order to balance clean-data accuracy and perturbed-data accuracy for practical applications, rather than performing the outer minimization solely on the loss of the perturbed data as in Eq. 10, we minimize an ensemble loss, i.e., the weighted sum of the losses for clean and perturbed data:

\mathcal{L}_{\text{ens}} = w_c \cdot \mathcal{L}\big(f(x;\theta), t\big) + w_a \cdot \mathcal{L}\big(f(\hat{x};\theta), t\big)    (11)

where w_c and w_a are the weights for the clean-data loss and the adversarial-data loss; equal weighting (w_c = w_a) is the default configuration in this work. Optimizing the ensemble loss with gradient descent leads to successful training of both the model's inherent parameters (e.g., weights, biases) and the add-on noise scaling coefficients from PNI.
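Putting Eqs. 10-11 together, one training step on a toy logistic model might look like the sketch below: the inner PGD loop attacks the model's own prediction (to avoid label leaking), and the outer update descends the ensemble loss. The model, hyper-parameters, and loss weights are illustrative assumptions, not the paper's exact training recipe:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, t):
    # Binary cross-entropy with a small epsilon for numerical safety.
    return -(t * np.log(p + 1e-12) + (1.0 - t) * np.log(1.0 - p + 1e-12))

def adv_train_step(w, x, t, eps=8/255, a=2/255, n_step=7,
                   w_c=0.5, w_a=0.5, lr=0.1):
    # Inner maximization (Eq. 10): PGD on the input, targeting the model's
    # current prediction instead of the true label to avoid label leaking.
    t_pred = float(sigmoid(w @ x) > 0.5)
    x_adv = x.copy()
    for _ in range(n_step):
        g_x = (sigmoid(w @ x_adv) - t_pred) * w          # d loss / d x
        x_adv = np.clip(x_adv + a * np.sign(g_x), x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    # Outer minimization (Eq. 11): gradient step on the ensemble loss.
    g_clean = (sigmoid(w @ x) - t) * x                   # d loss / d w (clean)
    g_adv = (sigmoid(w @ x_adv) - t) * x_adv             # d loss / d w (adv.)
    loss_ens = w_c * bce(sigmoid(w @ x), t) + w_a * bce(sigmoid(w @ x_adv), t)
    w_new = w - lr * (w_c * g_clean + w_a * g_adv)
    return w_new, loss_ens

rng = np.random.default_rng(2)
w = rng.normal(size=8)
x = rng.uniform(0.2, 0.8, size=8)
w_new, loss_ens = adv_train_step(w, x, t=1.0)
```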
4 Experiments
4.1 Experiment setup
Datasets and network architectures.
The CIFAR-10 [39] dataset is composed of 50K training samples and 10K test samples of 32×32 color images. For CIFAR-10, the classical Residual Network [40] architectures (ResNet20/32/44/56) are used, and ResNet20 is taken as the baseline for most of the comparative experiments and ablation studies. A more redundant network, ResNet18, is also used to report performance on CIFAR-10, since a large network capacity is helpful for adversarial defense. Moreover, rather than including input normalization within the data augmentation, we place a non-trainable data normalization layer in front of the DNN to perform the identical function, so that the attacker can directly add the perturbation to the natural image. Note that, since both PNI and the PGD attack [32] include randomness, we report accuracy in the format of mean±std% over 5 trials to alleviate error.
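The non-trainable normalization front layer can be sketched as below, so that attacks operate directly on raw [0, 1] images while the network still sees normalized inputs (the mean/std values here are placeholders, not the exact CIFAR-10 statistics used):

```python
import numpy as np

class InputNormalize:
    """Non-trainable per-channel normalization folded into the model's
    front end; the attacker perturbs the raw image, and normalization
    happens inside the forward pass."""

    def __init__(self, mean, std):
        self.mean = np.asarray(mean, dtype=float).reshape(-1, 1, 1)
        self.std = np.asarray(std, dtype=float).reshape(-1, 1, 1)

    def __call__(self, x):
        # x: (C, H, W) image with raw pixel values in [0, 1].
        return (x - self.mean) / self.std

norm = InputNormalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
x = np.full((3, 4, 4), 0.75)      # a dummy 3-channel image
y = norm(x)
```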
Adversarial attacks.
To evaluate the performance of the proposed PNI technique, we employ multiple powerful white-box and black-box attacks, as introduced in Section 2.1. For the PGD attack on MNIST and CIFAR-10, \epsilon is set to 0.3 and 8/255, and N_step is set to 40 and 7, respectively. The FGSM attack adopts the same setup as PGD. The attack configurations of PGD and FGSM are identical to the setups in [34, 32]. For the C&W attack, we set the constant c to 0.01. ADAM [41] is used to optimize Eq. 4. We choose 0 for the confidence coefficient used by the C&W attack, as defined in [8]. The number of binary search steps for the attack is 9, while the number of gradient descent iterations is 10. Moreover, we also evaluate the PNI defense against several state-of-the-art black-box attacks (i.e., substitute [33], ZOO [18], and transferable [16] attacks) in Section 4.2.2 to examine the robustness improvement brought by the proposed PNI technique.
Competing methods for adversarial defense.
As far as we know, adversarial training with PGD [32] is the only unbroken defense method to date [23]; it is labeled as vanilla adversarial training and taken as the baseline in this work. Beyond that, several recent works that utilize a concept similar to ours in their defense methods are discussed as well, including certified robustness [30] and Random Self-ensemble [29].
4.2 PNI for adversarial attacks
4.2.1 PNI against white-box attacks
Table 1: Layer-wise noise scaling coefficient α of ResNet20 (PNI-W) under different training schemes, together with the resulting clean- and perturbed-data accuracy. The learning rate of the SGD optimizer is reduced at epochs 80 and 120.

| Layer    | Vanilla train | Adv. train (w/o PNI in attack gen.) | PNI-W + adv. train |
|----------|---------------|-------------------------------------|--------------------|
| Conv0    | 0.003         | 0.004                               | 0.146              |
| Conv1.0  | 0.002         | 0.005                               | 0.081              |
| Conv1.1  | 0.004         | 0.004                               | 0.049              |
| Conv1.2  | 0.002         | 0.001                               | 0.097              |
| Conv1.3  | 0.004         | 5.856                               | 0.771              |
| Conv1.4  | 0.005         | 0.005                               | 0.004              |
| Conv1.5  | 0.002         | 0.001                               | 0.006              |
| Conv2.0  | 0.004         | 0.000                               | 0.006              |
| Conv2.1  | 0.006         | 0.003                               | 0.004              |
| Conv2.2  | 0.004         | 0.003                               | 0.030              |
| Conv2.3  | 0.001         | 0.006                               | 0.003              |
| Conv2.4  | 0.003         | 0.001                               | 0.033              |
| Conv2.5  | 0.002         | 0.001                               | 0.023              |
| Conv3.0  | 0.007         | 0.001                               | 0.008              |
| Conv3.1  | 0.003         | 0.001                               | 0.006              |
| Conv3.2  | 0.007         | 0.002                               | 0.001              |
| Conv3.3  | 0.006         | 0.001                               | 0.002              |
| Conv3.4  | 0.009         | 0.002                               | 0.001              |
| Conv3.5  | 0.005         | 0.000                               | 0.001              |
| FC       | 0.002         | 0.002                               | 0.001              |
| Clean    | 92.11%        | 71.00%                              | 84.89±0.11%        |
| PGD      | 0.00±0.00%    | 18.11%                              | 45.94±0.11%        |
| FGSM     | 14.08%        | 26.34%                              | 54.48±0.44%        |
Optimization method of PNI.
As discussed in Section 3.1, the noise scaling coefficient \alpha_l will not be properly trained without utilizing adversarial training (i.e., solving the min-max problem). We conduct experiments training the layer-wise PNI on the weights (PNI-W) of ResNet20 to compare the convergence of the trained noise. As tabulated in Table 1, simply performing vanilla training with the momentum SGD optimizer totally fails as an adversarial defense, since the noise scaling coefficients converge to negligible values. On the contrary, with the aid of adversarial training (i.e., optimizing Eq. 11), the convolution layers at the network's front end obtain relatively large \alpha_l (the bold values in Table 1), and the corresponding evolution curves are shown in Fig. 2.
Since the PGD attack [32] is taken as the inner maximization solver, the generation of the adversarial example in Eq. 2 is reformatted as:

\hat{x}^{k+1} = \Pi_{P_\epsilon}\Big(\hat{x}^k + a \cdot \text{sign}\big(\nabla_x \mathcal{L}(f_{\text{PNI}}(\hat{x}^k;\theta), t)\big)\Big)    (12)

where the difference between Eq. 2 and Eq. 12 is the presence of PNI in the generation of \hat{x}. It is noteworthy that keeping the noise term in the model for both adversarial example generation (Eq. 12) and the model parameter update is the critical factor for PNI optimization with adversarial training, since the optimization of \alpha_l is also a min-max game: increasing the noise level enhances the defense strength but hampers inference accuracy on natural clean images, while lowering \alpha_l makes the network vulnerable to adversarial attack. As listed in Table 1, omitting PNI-W from the generation of \hat{x} indeed leads to the failure of PNI optimization, and the large value (5.856 in Table 1) does not converge, likely due to gradient explosion.
Table 2: Clean- and perturbed-data accuracy (%) of ResNet20 with PNI variants on CIFAR-10, evaluated with and without the trained noise term during the test phase.

| Method                  | Test with PNI: Clean | PGD        | FGSM       | Test without PNI: Clean | PGD        | FGSM  |
|-------------------------|----------------------|------------|------------|-------------------------|------------|-------|
| Vanilla adv. train [32] | -                    | -          | -          | 83.84                   | 39.14±0.05 | 46.55 |
| PNI-W                   | 84.89±0.11           | 45.94±0.11 | 54.48±0.44 | 85.48                   | 31.45±0.07 | 42.55 |
| PNI-I                   | 85.10±0.08           | 43.25±0.16 | 50.78±0.16 | 84.82                   | 34.87±0.05 | 44.07 |
| PNI-A-a                 | 85.22±0.18           | 43.83±0.10 | 51.41±0.08 | 85.20                   | 33.93±0.05 | 44.32 |
| PNI-A-b                 | 84.66±0.16           | 43.63±0.20 | 51.26±0.09 | 83.97                   | 33.53±0.05 | 43.37 |
| PNI-W+A-a               | 85.12±0.10           | 43.57±0.12 | 51.15±0.21 | 84.88                   | 33.23±0.05 | 43.59 |
| PNI-W+A-b               | 84.33±0.11           | 43.80±0.19 | 51.14±0.07 | 84.42                   | 33.30±0.05 | 43.43 |
Table 3: Effect of network capacity: clean- and perturbed-data accuracy (%) on CIFAR-10 under no defense, vanilla adversarial training, and PNI-W (tested with and without the noise term).

| Model       | Capacity  | No defense: Clean / PGD / FGSM | Vanilla adv. train: Clean / PGD / FGSM | PNI-W (test w/ noise): Clean / PGD / FGSM | PNI-W (test w/o noise): Clean / PGD / FGSM |
|-------------|-----------|--------------------------------|----------------------------------------|-------------------------------------------|--------------------------------------------|
| Net20       | 269,722   | 92.1 / 0.0±0.0 / 14.1          | 83.8 / 39.1±0.1 / 46.6                 | 84.9±0.1 / 45.9±0.1 / 54.5±0.4            | 85.5 / 31.6±0.1 / 42.6                     |
| Net32       | 464,154   | 92.8 / 0.0±0.0 / 17.8          | 85.6 / 42.1±0.0 / 50.3                 | 85.9±0.1 / 43.5±0.3 / 51.5±0.1            | 86.4 / 35.3±0.1 / 45.5                     |
| Net44       | 658,586   | 93.1 / 0.0±0.0 / 23.9          | 85.9 / 40.8±0.1 / 48.2                 | 84.7±0.2 / 48.5±0.2 / 55.8±0.1            | 86.0 / 39.6±0.1 / 49.9                     |
| Net56       | 853,018   | 93.3 / 0.0±0.0 / 24.2          | 86.5 / 40.1±0.1 / 48.8                 | 86.8±0.2 / 46.3±0.3 / 53.9±0.1            | 87.3 / 41.6±0.1 / 51.1                     |
| Net20(1.5×) | 605,026   | 93.5 / 0.0±0.0 / 15.9          | 85.8 / 42.0±0.0 / 49.6                 | 86.0±0.1 / 46.7±0.2 / 54.5±0.2            | 87.0 / 38.4±0.1 / 49.1                     |
| Net20(2×)   | 1,073,962 | 94.0 / 0.0±0.0 / 13.0          | 86.3 / 43.1±0.1 / 52.6                 | 86.2±0.1 / 46.1±0.2 / 54.6±0.2            | 86.8 / 39.1±0.0 / 50.3                     |
| Net20(4×)   | 4,286,026 | 94.0 / 0.0±0.0 / 14.2          | 87.5 / 46.1±0.1 / 54.1                 | 87.7±0.1 / 49.1±0.3 / 57.0±0.2            | 88.1 / 43.8±0.1 / 54.2                     |
Effect of PNI on weight, activation and input.
In this work, although injecting noise into the weights (PNI-W) is taken as the default PNI setup, results for PNI on activations (PNI-A-a/b), input (PNI-I), and hybrid modes (e.g., PNI-W+A) are also provided in Table 2 for a comprehensive study. PNI-A-a/PNI-A-b denote injecting noise into the output/input tensor of the convolution or fully-connected layer, respectively. Moreover, the PNI-A-b scheme intrinsically includes PNI-I, since PNI-I applies noise to the input tensor of the first layer. Note that all models with PNI variants are jointly trained with PGD-based adversarial training [32], as discussed above. Then, with the same trained model, we report the accuracy with/without the trained noise term (left/right in Table 2) during the test phase. As shown in Table 2, with the noise term enabled during the test phase, PNI-W on ResNet20 gives the best defense against the PGD and FGSM attacks, in comparison to PNI at other locations. Although it is elusive to fully understand why PNI-W outperforms its counterparts, the intuition is that PNI-W generalizes PNI-A from each output unit to each connection, similar to the relation between the regularization techniques DropConnect [42] and Dropout [25].
Furthermore, we also observe that disabling PNI during the test phase leads to a significant accuracy drop when defending against PGD and FGSM attacks, while the clean-data accuracy remains at the same level as with PNI enabled. This observation raises two concerns about our PNI technique: 1) Does the improvement of clean/perturbed-data accuracy with PNI mainly come from a reduction in attack strength caused by randomness (potential gradient obfuscation [23])? 2) Is PNI just a negligible trick, or does it perform model regularization to construct a more robust model? Our answers to both questions are negative; the explanations are elaborated in Section 5.
Effect of network capacity.
In order to investigate the relation between network capacity (i.e., number of trainable parameters) and the robustness improvement from PNI, we examine various network architectures in terms of both depth and width. For different network depths, experiments on ResNet 20/32/44/56 [40] are conducted under vanilla adversarial training [32] and our proposed PNI robust optimization method. For different network widths, we adopt the original ResNet20 as the baseline and expand the input and output channels of each layer by 1.5×/2×/4×, respectively. As in Table 2, we report clean- and perturbed-data accuracy with and without the PNI term during the test phase. The results in Table 3 indicate that increasing the model's capacity indeed improves network robustness against white-box adversarial attacks, and our proposed PNI outperforms vanilla adversarial training in terms of both clean-data accuracy and perturbed-data accuracy under the PGD and FGSM attacks. This observation demonstrates that the perturbed-data accuracy improvement does not come from trading off clean-data accuracy, as reported in [34, 43]. As the network capacity increases, the drop in perturbed-data accuracy when disabling the PNI noise term during the test phase also becomes less significant. Although both adversarial training and PNI perform regularization, the network structure still needs careful construction to prevent the overfitting resulting from over-parameterization.
Robustness evaluation with C&W attack.
Improved robustness does not necessarily mean improved test accuracy against any particular attack method. Typically, the L2-norm-based C&W attack [8] reaches a 100% success rate against any defense; thus, the average L2 norm required to fool the network gives more insight into a network's robustness in general [8]. The results presented in Table 4 represent the overall performance of our model against the C&W attack. Our method of training the noise parameter becomes more effective for more redundant networks. We demonstrate this phenomenon through a comparative study between the ResNet20 and ResNet18 architectures: ResNet18 clearly shows a much larger robustness improvement over vanilla adversarial training than ResNet20 against the C&W attack.
Table 4: Average C&W L2 norm required to fool the network.

| Model        | Capacity   | No defense | Vanilla adv. train | PNI-W |
|--------------|------------|------------|--------------------|-------|
| ResNet20 (4×)| 4,286,026  | 0.12       | 1.97               | 1.97  |
| ResNet18     | 11,173,962 | 0.12       | 2.39               | 2.63  |
4.2.2 PNI against black-box attacks
In this section, we test the proposed PNI technique against the transferable adversarial attack [16] and the ZOO attack. Following the transferable adversarial attack [16], two trained neural networks are taken as the source model and target model. The adversarial examples generated from the source model are used to attack the target model, denoted as source→target. We take ResNet18 on CIFAR-10 as an example: we train two ResNet18 models (model A and model B) on the CIFAR-10 dataset to attack each other, where model A is optimized through vanilla adversarial training, while model B is trained using our proposed PNI variants (i.e., PNI-W/A-a/W+A-a) with robust optimization. Table 5 shows almost equal perturbed-data accuracy for A→B and B→A under the various PNI scenarios, which indicates that our PNI technique does not reduce the attack strength.
Table 5: Perturbed-data accuracy (%) under the transferable attack and success rate (%) of the ZOO attack on ResNet18.

| Train. scheme of B | Transferable A→B | Transferable B→A | ZOO success rate |
|--------------------|------------------|------------------|------------------|
| PNI-W              | 75.13±0.17       | 75.23±0.18       | 57.72            |
| PNI-A-a            | 74.67±0.11       | 75.86±0.13       | 69.61            |
| PNI-W+A-a          | 75.14±0.10       | 74.92±0.13       | 50.00            |
For the ZOO attack [18], we test our defense on 200 randomly selected test samples under the untargeted attack. The attack success rate denotes the percentage of test samples whose classification changes to a wrong class after the attack. The ZOO attack success rate for vanilla ResNet18 with adversarial training is close to 80%. The robustness of PNI is more evident from Table 5, as the attack success rate drops significantly for PNI-W+A-a and PNI-W. However, PNI-A-a fails to resist the ZOO attack, even though it still maintains a lower success rate than the baseline. The failure of PNI-A-a shows that merely adding noise in front of the activation does not necessarily achieve the desired robustness, as claimed by some previous defenses [30, 29].
4.2.3 Comparison to competing methods
As discussed in Section 2.2, a large number of adversarial defense works have been proposed recently; however, most of them have already been broken by the stronger attacks proposed in [44, 23]. As a result, in this work we choose to compare with the most effective one to date: PGD-based adversarial training [32]. Additionally, we compare with other randomness-based works [29, 30] in Table 6 to examine the effectiveness of PNI.
Table 6: Comparison of clean- and perturbed-data accuracy (%) with competing defense methods.

| Defense method       | Model         | Clean | PGD  |
|----------------------|---------------|-------|------|
| PGD adv. train [32]  | ResNet20 (4×) | 87    | 46.1 |
| DP [30]              | -             | 87.0  | 25   |
| RSE [29]             | ResNext       | 87.5  | 40   |
| PNI-W (this work)    | ResNet20 (4×) | 87.7  | 49.1 |
Previous defense works [43, 34] have shown a trade-off between clean-data accuracy and perturbed-data accuracy, where the improvement in perturbed-data accuracy normally comes at the cost of lowering the clean-data accuracy. It is worth highlighting that our proposed PNI improves both clean- and perturbed-data accuracy under white-box attack, in comparison to PGD-based adversarial training [32]. Differential Privacy (DP) [30] is a similar method that injects noise at various locations within the network. Although DP guarantees a certified defense, it does not perform well against norm-based attacks (e.g., PGD and FGSM), and to achieve a higher level of certified defense, DP significantly sacrifices clean-data accuracy as well. Another randomness-based approach is Random Self-ensemble (RSE) [29], which inserts a noise layer before every convolution layer. RSE performs well against the C&W attack but poorly against the strong PGD attack. In our black-box attack evaluation (Table 5), we demonstrate that adding activation noise may not be as effective as weight noise. Beyond that, both DP and RSE configure the noise level manually, which makes finding the optimal setup extremely difficult; in our proposed PNI method, the noise level is determined by a trainable layer-wise noise scaling coefficient and the distribution of the tensor at the noise-injected location.
5 Discussion
The defense performance improvement from our proposed PNI does not come from stochastic gradients. A stochastic gradient incorrectly approximates the true gradient based on a single sample. We show that PNI does not rely on gradient obfuscation from two perspectives: 1) our proposed PNI method passes each inspection item proposed by [23] to identify gradient obfuscation; 2) under the PGD attack, as the number of attack steps increases, our PNI robust optimization method still outperforms vanilla adversarial training (certified as having non-obfuscated gradients in [23]).
Table 7: Characteristic behaviors that identify gradient obfuscation [23], and whether PNI passes each inspection.

| Characteristic to identify gradient obfuscation             | Pass | Fail |
|-------------------------------------------------------------|------|------|
| 1. One-step attack performs better than iterative attacks   | ✓    |      |
| 2. Black-box attacks are better than white-box attacks      | ✓    |      |
| 3. Unbounded attacks do not reach 100% success              | ✓    |      |
| 4. Random sampling finds adversarial examples               | ✓    |      |
| 5. Increasing distortion bound doesn't increase success     | ✓    |      |
Inspections of gradient obfuscation.
The well-known gradient obfuscation work [23] enumerates several characteristic behaviors, listed in Table 7, that can be observed when a defense method relies on gradient obfuscation. Our experiments show that PNI passes each inspection item in Table 7.
For item 1, all the experiments in Table 2 and Table 3 report that the FGSM attack (one-step) performs worse than the PGD attack (iterative). For item 2, our black-box attack experiments in Table 5 show that the black-box attack strength is weaker than the white-box attack. For item 3, as plotted in Fig. 3, we run experiments with an increasing distortion bound; the results show that unbounded attacks do lead to 0% accuracy under attack. For item 4, the prerequisite is that gradient-based attacks (e.g., PGD and FGSM) cannot find the adversarial examples; however, the experiments in Fig. 3 reveal that our method can still be broken by increasing the distortion bound, and it only increases the resistance against adversarial attacks in comparison to vanilla adversarial training. For item 5, again as shown in Fig. 3, increasing the distortion bound does increase the attack success rate.
PNI does not rely on stochastic gradients.
As shown in Fig. 3, gradually increasing the number of PGD attack steps raises the attack strength [32], thus degrading the perturbed-data accuracy for both vanilla adversarial training and our PNI technique. However, in both cases the perturbed-data accuracy saturates and does not degrade any further once the number of attack steps grows large. If PNI's success came from stochastic gradients, which give an incorrect gradient owing to a single sample, increasing the attack steps should eventually break the PNI defense, which is not observed here: our PNI method still outperforms vanilla adversarial training even when N_step is increased up to 100. Therefore, we can draw the conclusion that, even if PNI does include some gradient obfuscation, the stochastic gradient is not the dominant factor in PNI's robustness improvement.
6 Conclusion
In this paper, we present a parametric noise injection technique in which the noise intensity is trained by solving a min-max optimization problem during adversarial training. Through extensive experiments, we show that the proposed PNI method outperforms the state-of-the-art defense method in terms of both clean-data accuracy and perturbed-data accuracy.
References
 [1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
 [2] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
 [3] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
 [4] Chenyi Chen, Ari Seff, Alain Kornhauser, and Jianxiong Xiao. Deepdriving: Learning affordance for direct perception in autonomous driving. In Computer Vision (ICCV), 2015 IEEE International Conference on, pages 2722–2730. IEEE, 2015.
 [5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing humanlevel performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.
 [6] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
 [7] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
 [8] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.
 [9] Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darrell, and Dawn Song. Can you fool AI with adversarial examples on a visual turing test? arXiv preprint arXiv:1709.08693, 2017.
 [10] Naveed Akhtar and Ajmal Mian. Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 6:14410–14430, 2018.
 [11] Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, and Cho-Jui Hsieh. Show-and-Fool: Crafting adversarial examples for neural image captioning. arXiv preprint arXiv:1712.02051, 2017.
 [12] Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, and Volker Fischer. Universal adversarial perturbations against semantic image segmentation. stat, 1050:19, 2017.
 [13] Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu Chen, and Cho-Jui Hsieh. Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. arXiv preprint arXiv:1803.01128, 2018.
 [14] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. arXiv preprint arXiv:1801.01944, 2018.
 [15] Mengying Sun, Fengyi Tang, Jinfeng Yi, Fei Wang, and Jiayu Zhou. Identify susceptible locations in medical records via adversarial attacks on deep predictive models. arXiv preprint arXiv:1802.04822, 2018.
 [16] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
 [17] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
 [18] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
 [19] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
 [20] Jernej Kos and Dawn Song. Delving into adversarial attacks on deep policies. arXiv preprint arXiv:1705.06452, 2017.
 [21] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pages 372–387. IEEE, 2016.
 [22] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
 [23] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
 [24] Alberto Bietti, Grégoire Mialon, and Julien Mairal. On regularization and robustness of deep neural networks. arXiv preprint arXiv:1810.00363, 2018.
 [25] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014.
 [26] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pages 448–456, 2015.
 [27] Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160, 2016.
 [28] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. BinaryConnect: Training deep neural networks with binary weights during propagations. In Advances in neural information processing systems, pages 3123–3131, 2015.
 [29] Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. arXiv preprint arXiv:1712.00673, 2017.
 [30] M Lecuyer, V Atlidakis, R Geambasu, D Hsu, and S Jana. Certified robustness to adversarial examples with differential privacy. arXiv e-prints, 2018.
 [31] Yuichi Yoshida and Takeru Miyato. Spectral norm regularization for improving the generalizability of deep learning. arXiv preprint arXiv:1705.10941, 2017.
 [32] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
 [33] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.
 [34] Jacob Buckman, Aurko Roy, Colin Raffel, and Ian Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. In International Conference on Learning Representations, 2018. Accepted as poster.
 [35] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018.
 [36] Gokula Krishnan Santhanam and Paulina Grnarova. Defending against adversarial attacks by leveraging an entire GAN. CoRR, abs/1805.10652, 2018.
 [37] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018.
 [38] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. On the connection between differential privacy and adversarial robustness in machine learning. arXiv preprint arXiv:1802.03471, 2018.
 [39] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
 [40] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
 [41] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [42] Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. Regularization of neural networks using dropconnect. In International Conference on Machine Learning, pages 1058–1066, 2013.
 [43] Anonymous. L2-nonexpansive neural networks. In Submitted to International Conference on Learning Representations, 2019. Under review.
 [44] Anish Athalye and Nicholas Carlini. On the robustness of the CVPR 2018 white-box adversarial example defenses. CoRR, abs/1804.03286, 2018.