Deep Neural Networks (DNNs) have achieved great success in a variety of applications, including but not limited to image classification, speech recognition, machine translation, and autonomous driving. Despite the remarkable accuracy improvement, recent studies [6, 7, 8] have shown that DNNs are vulnerable to adversarial examples. In the image classification task, an adversarial example is a natural image intentionally perturbed by a visually imperceptible variation that can nevertheless cause drastic classification accuracy degradation. Fig. 1 provides an illustration of an adversarial example and its original counterpart. In addition to image classification, attacks on other DNN-powered tasks have also been actively investigated, such as visual question answering [9, 10], image captioning, semantic segmentation [12, 10], and others [13, 14, 15].
There has been a cohort of works on generating adversarial attacks and developing corresponding defense methods. Adversarial attacks can be categorized as white-box or black-box based on the attacker's knowledge of the target model. In a white-box attack [6, 8], the adversary has full access to the network architecture and parameters, whereas a black-box attack [16, 17, 18] can only externally access the input and output of the network. White-box attacks can often achieve high success rates for various applications [6, 19, 20, 21, 22, 23, 18, 8].
Recently, different works have viewed the problem of adversarial examples from a unified perspective of model robustness and regularization. Conventional regularization mainly serves the purpose of reducing the generalization error, thus preventing the model from overfitting the training set. Traditional regularization methods have been effective in neural network training. For example, dropout, Batch Normalization (BN) and quantization [27, 28] all serve the purpose of model regularization. However, BN is particularly effective in convolutional networks, while dropout is mainly applicable to fully-connected networks. Hinton discusses in his lecture notes (https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec9.pdf) and in the dropout work that adding Gaussian noise to the model (input, weights and activations) during training acts as a regularizer.
It is evident that a more general model regularization method specifically directed at improving neural network robustness can serve the purpose of defending against adversarial examples more effectively. Recently, different works have applied noise regularization during both the training and inference phases [29, 30, 31, 24]. In this work, we propose to improve neural network robustness against adversarial attacks by regularizing the training process through noise injection during the training phase. We believe adversarial defense methodologies that focus on defending the network at inference time only will eventually fall short in the advent of new attack methods; a more general regularized training method can instead produce a robust DNN that defends against a wide range of attacks.
Overview of our approach: In this work, we propose a novel noise injection method called Parametric Noise Injection (PNI) to improve neural network robustness against adversarial attacks. It has the flexibility to inject trainable noise at the input (to the whole network), activations and weights during both training and inference. The proposed PNI is combined with the well-known adversarial training, where Gaussian noise with trainable parameters adjusts the injected noise level at each network layer. We conduct a wide range of white-box and black-box adversarial attack experiments to demonstrate the effectiveness of the proposed PNI method across different popular DNN architectures. Our simulations show accuracy improvement on both clean data and data under attack. PNI achieves a 1.1% improvement on clean test data on ResNet-20 compared to vanilla ResNet-20 with adversarial training. Along with the improvement on clean test data, our defense shows a 6.8% improvement in test accuracy under the PGD white-box attack. Additionally, our results show improved robustness under FGSM, C&W and various black-box attacks.
2 Related works
2.1 Adversarial Attack
Recently, various powerful adversarial attack methods have been proposed to fool a trained deep neural network by introducing barely visible perturbations upon the input data. Several state-of-the-art white-box (i.e., PGD, FGSM and C&W) and black-box (i.e., substitute and ZOO) adversarial attack methods are briefly introduced as follows.
Fast Gradient Sign Method (FGSM) is a single-step, efficient adversarial attack method, which alters each element of the natural sample $x$ along the sign of its gradient w.r.t. the loss function $\mathcal{L}$. The generation of the adversarial example $\hat{x}$ can be described as:

$$\hat{x} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x \mathcal{L}(x, t)\big)$$

where $t$ is the label, and the attack is followed by a clipping operation to ensure $\hat{x} \in [0, 1]$. The attack strength is determined by the perturbation constraint $\epsilon$.
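As a minimal sketch (in NumPy, assuming the loss gradient w.r.t. the input has already been computed by some external model), the single FGSM step with the follow-up clipping can be written as:

```python
import numpy as np

def fgsm_attack(x, grad, epsilon):
    """Single-step FGSM: move x by epsilon along the sign of the loss
    gradient, then clip back to the valid input range [0, 1]."""
    x_adv = x + epsilon * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)
```

Note that `grad` here stands in for the back-propagated gradient of the loss; in practice it would come from a framework's autodiff.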
Projected Gradient Descent (PGD) is the multi-step variant of FGSM, and one of the strongest adversarial-example generation algorithms. With $\hat{x}^{0} = x$ as the initialization, the iterative update of the perturbed data can be expressed as:

$$\hat{x}^{k+1} = \Pi_{P_\epsilon}\Big(\hat{x}^{k} + a \cdot \mathrm{sign}\big(\nabla_x \mathcal{L}(\hat{x}^{k}, t)\big)\Big)$$

where $\Pi_{P_\epsilon}$ is the projection onto the space $P_\epsilon$ bounded by $\epsilon$, $k$ is the step index up to $N_{\mathrm{step}}$, and $a$ is the step size. Madry et al. proposed that PGD is a universal adversary among all first-order adversaries (i.e., attacks that only rely on first-order information).
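The iteration above can be sketched as follows (NumPy, with `grad_fn` a hypothetical callable returning the loss gradient at the current iterate; the projection onto the $\epsilon$-ball reduces to elementwise clipping under the $\ell_\infty$ constraint):

```python
import numpy as np

def pgd_attack(x, grad_fn, epsilon, alpha, n_steps):
    """Multi-step PGD: repeat an FGSM-style step of size alpha, projecting
    back onto the epsilon-ball around x and the valid range [0, 1]."""
    x_adv = x.copy()
    for _ in range(n_steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        # projection onto the l-inf epsilon-ball centered at the clean x
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

With a constant positive gradient, the iterate walks to the upper boundary of the $\epsilon$-ball and stays there, which is the expected projection behavior.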
C & W Attack:
In the C & W attack method, Carlini and Wagner consider the generation of an adversarial example as an optimization problem that minimizes the $\ell_p$-norm of the distance metric $\delta$ w.r.t. the given input data $x$, which can be described as:

$$\min_{\delta} \; \|\delta\|_p + c \cdot f(x + \delta), \quad \text{s.t. } x + \delta \in [0, 1]^n$$

where $\delta$ is the perturbation added upon the input data, and a proper loss function $f(\cdot)$ is chosen to solve the optimization problem via a gradient descent method. $c$ is a constant set by the attacker. In this work, we use the $\ell_2$-norm based C&W attack and take the average $\ell_2$ distortion as the evaluation metric to measure the network's robustness, where a higher value indicates a more robust network or potential failure of the attack.
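The evaluation metric above, average $\ell_2$ distortion over a batch of samples, can be computed as in this small sketch (NumPy; the function name is illustrative):

```python
import numpy as np

def mean_l2_distortion(x_clean, x_adv):
    """Average L2 distance between clean samples and their adversarial
    counterparts; a larger value means the attack needed a bigger
    perturbation, i.e., a more robust network."""
    diff = (x_adv - x_clean).reshape(len(x_clean), -1)
    return float(np.linalg.norm(diff, axis=1).mean())
```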
The most popular black-box attack uses a substitute model: the attacker trains a substitute model to mimic the functionality of the target model, then uses adversarial examples generated from the substitute model to attack the target model. In this work, we specifically investigate the transferable adversarial attack, a variant of the substitute-model attack in which the adversarial example is generated from one source model to attack another target model. The source and target models may have entirely different structures but are trained on the identical dataset. Moreover, the Zeroth Order Optimization (ZOO) attack is also considered. Rather than training a substitute model, ZOO directly approximates the gradient of the target model based only on the input data and output scores, using stochastic coordinate descent.
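The zeroth-order gradient approximation that ZOO relies on can be illustrated with a symmetric finite-difference estimate along one coordinate (a simplified sketch; the actual attack batches many coordinates and adds further machinery):

```python
import numpy as np

def zoo_grad_estimate(f, x, i, h=1e-4):
    """Zeroth-order estimate of d f / d x_i via symmetric finite
    differences, using only black-box input/output access to f."""
    e = np.zeros_like(x)
    e[i] = h
    return (f(x + e) - f(x - e)) / (2.0 * h)
```

For example, with $f(z) = \sum_j z_j^2$ the estimate at $x = (1, 2)$ along coordinate 1 recovers the true partial derivative 4.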
2.2 Adversarial Defenses:
Improving network robustness by training the model with adversarial examples [6, 32] is currently the most popular defense approach. Most later works have followed this path and supplement their defenses with adversarial training [34, 35]. The first step in adversarial training is to choose an attack model to generate adversarial examples. Adopting the Projected Gradient Descent (PGD) attack model for adversarial training has become popular, since it generates universal adversarial examples among first-order approaches. Additionally, among many recent defense methods, only PGD-based adversarial training sustains state-of-the-art accuracy under attack [8, 6, 23]. The reported DNN accuracy on the CIFAR-10 dataset remains a major success in defending against very strong adversarial attacks.
Recent works have merged the concept of improving model robustness through regularization to defend against adversarial examples. Among them, a unified perspective of regularization and robustness was presented by . Again, randomly pruning some activations during inference or randomizing the input layer injects randomness to somehow prevent the attacker from accessing the gradient. However, these approaches achieve good success against gradient-based attacks at the cost of obfuscated gradients.
In order to make the model more robust to adversarial attacks, several works have adopted the concept of adding a noise layer just before the convolution layer during both training and inference phases [29, 38]. While we agree with the core idea of these works, as the added noise certainly makes the model more robust, our work has some fundamental advantages over theirs. PNI improves model robustness by regularizing the model more effectively during training. As classical machine learning has demonstrated, weight noise performs the regularization even better, and we show experimentally that adding noise particularly to the weights improves robustness even more. Moreover, while these works [29, 30] choose the level of injected noise manually, we propose to inject a different level of noise at each layer using trainable parameters, since choosing the noise level manually for different layers, even via a validation set, is not practically feasible.
In this section, we first introduce the proposed Parametric Noise Injection (PNI) function, then investigate the impact of noise injection on the input (to the whole DNN), weights and activations.
3.1 Parametric Noise Injection
The method we propose to inject noise into different components or locations within the DNN can be described as:

$$\tilde{v}_i = f_{\mathrm{PNI}}(v_i) = v_i + \alpha \cdot \eta, \quad \eta \sim \mathcal{N}(0, \sigma^2)$$

where $v_i$ is an element of the noise-free tensor $v$, and such $v$ can be the input/weight/inter-layer tensor in this work; $\eta$ is the additive noise sampled from a Gaussian distribution, and $\alpha$ is the coefficient that scales the magnitude of the injected noise $\eta$. We adopt the scheme in which $\eta$ shares the identical standard deviation $\sigma$ as $v$ (Eq. 6), thus the injected additive noise is correlated to the distributions of $v$ and $\eta$ simultaneously. Moreover, rather than manually configuring $\alpha$ to restrict the noise level, we set $\alpha$ as a learnable parameter which can be optimized to improve network robustness. We name this method Parametric Noise Injection (PNI). Considering over-parameterization and the convergence of training, we make the element-wise noise terms share the same scaling coefficient $\alpha$ across the entire tensor. Assuming we perform the proposed PNI on the weight tensors of the convolution/fully-connected layers throughout the entire DNN, each parametric layer has only one layer-wise noise scaling coefficient to be optimized. We take such a layer-wise configuration as the default in this work.
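The PNI function above can be sketched in a few lines of NumPy (the function name is illustrative; in a real network `alpha` would be a learnable parameter of the layer):

```python
import numpy as np

def pni(v, alpha, rng):
    """Parametric Noise Injection: tilde_v = v + alpha * eta, where eta is
    Gaussian noise whose standard deviation matches that of the noise-free
    tensor v, and alpha is a single layer-wise scaling coefficient shared
    across the whole tensor."""
    eta = rng.normal(0.0, v.std(), size=v.shape)
    return v + alpha * eta
```

With `alpha = 0` the function reduces to the identity, which matches the observation later in the paper that a coefficient converging to zero disables the defense.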
In this work, we treat the noise scaling coefficient $\alpha$ as a model parameter which can be optimized through the back-propagation training process. For the configuration that shares the noise scaling coefficient layer-wise, the gradient computation can be described as:

$$\frac{\partial \mathcal{L}}{\partial \alpha} = \sum_i \frac{\partial \mathcal{L}}{\partial \tilde{v}_i} \cdot \frac{\partial \tilde{v}_i}{\partial \alpha} = \sum_i g_{\tilde{v}_i} \cdot \eta_i$$

where the summation is taken over the entire tensor $v$, and $g_{\tilde{v}_i}$ is the gradient back-propagated from the following layers. The gradient of the PNI function itself is:

$$\frac{\partial \tilde{v}_i}{\partial v_i} = 1, \quad \frac{\partial \tilde{v}_i}{\partial \alpha} = \eta_i$$
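The layer-wise accumulation of the $\alpha$ gradient can be sketched directly (NumPy; `grad_out` stands in for the back-propagated gradient tensor, `eta` for the noise sample drawn in the forward pass):

```python
import numpy as np

def alpha_grad(grad_out, eta):
    """Gradient of the loss w.r.t. the shared layer-wise coefficient alpha.
    Since d(v + alpha * eta)/d(alpha) = eta elementwise, the shared
    coefficient sums the elementwise products over the whole tensor; each
    eta sample is treated as a constant in the backward pass."""
    return float(np.sum(grad_out * eta))
```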
It is noteworthy that even though $\eta$ is a Gaussian random variable, each sample of $\eta$ is taken as a constant during back-propagation. Using a gradient descent optimizer with momentum, the optimization of $\alpha$ at step $j$ can be written as:

$$V_j = m \cdot V_{j-1} + \frac{\partial \mathcal{L}}{\partial \alpha_{j-1}}; \quad \alpha_j = \alpha_{j-1} - \epsilon_{lr} \cdot V_j$$

where $m$ is the momentum, $\epsilon_{lr}$ is the learning rate, and $V$ is the updating velocity. Moreover, since weight decay tends to make the learned noise scaling coefficient converge to zero, no weight decay term is applied to $\alpha$ during the parameter update in this work. A small constant is used as the default initialization of $\alpha$.
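The momentum update of $\alpha$ can be sketched as follows (a plain-Python sketch using the common velocity formulation, which is algebraically equivalent to the update above; the hyperparameter values in the defaults are illustrative, not the paper's):

```python
def update_alpha(alpha, grad, velocity, lr=0.01, momentum=0.9):
    """One momentum-SGD step for the noise coefficient alpha. Note that no
    weight-decay term is applied here, since decay would drive the learned
    noise coefficient toward zero."""
    velocity = momentum * velocity - lr * grad
    return alpha + velocity, velocity
```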
We expect to utilize the aforementioned PNI technique to improve network robustness. However, directly optimizing the noise scaling coefficient normally leads to convergence at a small, close-to-zero value, because the model optimization tends to over-fit the training dataset (referring to Table 1).
In order to succeed in adversarial defense, we use the PNI method jointly with robust optimization (a.k.a. adversarial training), which boosts the inference accuracy for perturbed data under attack. Given inputs $x$ and target labels $t$, adversarial training obtains the optimal solution of the network parameters $\theta$ for the following min-max problem:

$$\min_{\theta} \; \mathbb{E}_{(x, t)} \Big[ \max_{\hat{x} \in P_\epsilon} \mathcal{L}\big(g(\hat{x}; \theta), t\big) \Big]$$

where the inner maximization acquires the perturbed data $\hat{x}$, and $P_\epsilon$ is the input-data perturbation set constrained by $\epsilon$. The outer minimization is optimized through a gradient descent method as in regular network training. The PGD attack is adopted as the default inner maximization solver (i.e., generating $\hat{x}$). Note that, in order to prevent label leaking during adversarial training, the perturbed data $\hat{x}$ is generated by taking the predicted result of $g(x; \theta)$ as the label (i.e., $t$ in Eq. 2).
Moreover, in order to balance the clean-data accuracy and perturbed-data accuracy for practical applications, rather than performing the outer minimization solely on the loss of perturbed data as in Eq. 10, we minimize the ensemble loss, which is the weighted sum of the losses for clean and perturbed data. The ensemble loss is described as:

$$\mathcal{L}_{\mathrm{ens}} = w_c \cdot \mathcal{L}\big(g(x; \theta), t\big) + w_a \cdot \mathcal{L}\big(g(\hat{x}; \theta), t\big)$$

where $w_c$ and $w_a$ are the weights for the clean-data loss and the adversarial-data loss; equal weighting is the default configuration in this work. Optimizing the ensemble loss with a gradient descent method leads to successful training of both the model's inherent parameters (e.g., weights, biases) and the add-on noise scaling coefficients from PNI.
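The ensemble loss is a one-line combination; a sketch, assuming equal weights for the two terms as a placeholder default:

```python
def ensemble_loss(loss_clean, loss_adv, w_c=0.5, w_a=0.5):
    """Weighted sum of the clean-data loss and the adversarial-data loss.
    The equal weights here are an assumption for this sketch."""
    return w_c * loss_clean + w_a * loss_adv
```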
4.1 Experiment setup
Datasets and network architectures.
The CIFAR-10 dataset is composed of 50K training samples and 10K test samples of 32×32 color images. For CIFAR-10, the classical Residual Network (ResNet-20/32/44/56) architectures are used, and ResNet-20 is taken as the baseline for most of the comparative experiments and ablation studies. A more redundant network, ResNet-18, is also used to report performance on CIFAR-10, since a larger network capacity is helpful for adversarial defense. Moreover, rather than including input normalization within the data augmentation, we place a non-trainable data normalization layer in front of the DNN to perform the identical function, so that the attacker can directly add the perturbation to the natural image. Note that, since both PNI and the PGD attack include randomness, we report accuracy in the format of mean±std% over 5 trials to reduce error.
To evaluate the performance of our proposed PNI technique, we employ multiple powerful white-box and black-box attacks as introduced in Section 2.1. For the PGD attack on MNIST and CIFAR-10, $\epsilon$ is set to 0.3/1 and 8/255, and $N_{\mathrm{step}}$ is set to 40 and 7, respectively. The FGSM attack adopts the same setup as PGD. The attack configurations of PGD and FGSM are identical to the setup in [34, 32]. For the C&W attack, we set the constant $c$ to 0.01. ADAM is used to optimize Eq. 4. We choose 0 for the confidence coefficient used by the C&W attack. The number of binary search steps for the attack is 9, while the number of gradient descent iterations is 10. Moreover, we also conduct the PNI defense against several state-of-the-art black-box attacks (i.e., substitute, ZOO and transferable attacks) in Section 4.2.2 to examine the robustness improvement resulting from the proposed PNI technique.
Competing methods for adversarial defense.
To the best of our knowledge, adversarial training with PGD is the only unbroken defense method to date; we label it as vanilla adversarial training and take it as the baseline in this work. Beyond that, several recent works that utilize a similar concept to ours in their defense methods are also discussed, including certified robustness and random self-ensemble.
4.2 PNI for adversarial attacks
4.2.1 PNI against white-box attacks
The learning rate of the SGD optimizer is reduced at epochs 80 and 120.
Optimization method of PNI
As discussed in Section 3.1, the noise scaling coefficient will not be properly trained without utilizing adversarial training (i.e., solving the min-max problem). We conduct experiments training the layer-wise PNI on the weights (PNI-W) of ResNet-20 to compare the convergence of the trained noise. As tabulated in Table 1, simply performing vanilla training using the momentum SGD optimizer totally fails the adversarial defense, as the noise scaling coefficients converge to negligible values. On the contrary, with the aid of adversarial training (i.e., optimization of Eq. 11), the convolution layers in the network's front-end obtain relatively large $\alpha$ (the bold values in Table 1), and the corresponding evolution curves are shown in Fig. 2.
where the difference between Eq. 2 and Eq. 12 is the presence of PNI within the adversarial-example generation. It is noteworthy that keeping the noise term in the model for both adversarial example generation (Eq. 12) and model parameter update is the critical factor for PNI optimization with adversarial training, since the optimization of $\alpha$ is also a min-max game: increasing the noise level enhances the defense strength but hampers the network's inference accuracy on natural clean images, while lowering it makes the network vulnerable to adversarial attacks. As listed in Table 1, omitting PNI-W from the adversarial-example generation indeed leads to the failure of PNI optimization, and the large $\alpha$ value in Table 1 does not converge, likely due to gradient explosion.
| Defense | Test with PNI (Clean / PGD / FGSM) | Test without PNI (Clean / PGD / FGSM) |
| Vanilla adv. train | - / - / - | 83.84 / 39.14±0.05 / 46.55 |
Effect of PNI on weight, activation and input.
In this work, even though the scheme of injecting noise on the weights (PNI-W) is taken as the default PNI setup, further results on PNI for activations (PNI-A-a/b), the input (PNI-I) and hybrid modes (e.g., PNI-W+A) are provided in Table 2 for a comprehensive study. PNI-A-a/PNI-A-b denotes injecting noise on the output/input tensor of the convolution/fully-connected layer, respectively. Moreover, the PNI-A-b scheme intrinsically includes PNI-I, since PNI-I applies noise to the input tensor of the first layer. Note that all models with PNI variants are jointly trained with PGD-based adversarial training as discussed above. Then, with the same trained model, we report the accuracy with/without the trained noise term (left/right in Table 2) during the test phase. As shown in Table 2, with the noise term enabled during the test phase, PNI-W on ResNet-20 gives the best performance in defending against the PGD and FGSM attacks, in comparison to PNI at other locations. Although it is elusive to fully understand the mechanism by which PNI-W outperforms its counterparts, the intuition is that PNI-W is the generalization of PNI-A on each connection instead of each output unit, similar to the relation between the regularization techniques DropConnect and Dropout.
Furthermore, we also observe that disabling PNI during the test phase leads to a significant accuracy drop when defending against PGD and FGSM attacks, while the clean-data accuracy remains at the same level as with PNI enabled. This observation raises two concerns about our PNI technique: 1) Does the improvement of clean-/perturbed-data accuracy with PNI mainly come from attack-strength reduction caused by randomness (potential gradient obfuscation)? 2) Is PNI just a negligible trick, or does it perform model regularization to construct a more robust model? Our answers to both questions are negative; the explanations are elaborated in Section 5.
Effect of network capacity.
In order to investigate the relation between network capacity (i.e., number of trainable parameters) and the robustness improvement by PNI, we examine various network architectures in terms of both depth and width. For different network depths, experiments on ResNet-20/32/44/56 are conducted under vanilla adversarial training and our proposed PNI robust optimization method. For different network widths, we adopt the original ResNet-20 as the baseline and expand the input and output channels of each layer by 1.5×/2×/4×, respectively. As in Table 2, we report clean- and perturbed-data accuracy with/without the PNI term during the test phase. The results in Table 3 indicate that increasing the model's capacity indeed improves network robustness against white-box adversarial attacks, and our proposed PNI outperforms vanilla adversarial training in terms of both clean-data and perturbed-data accuracy for PGD and FGSM attacks. This observation demonstrates that the perturbed-data accuracy improvement does not come from trading off clean-data accuracy, as reported in [34, 43]. As network capacity increases, the drop in perturbed-data accuracy when disabling the PNI noise term during the test phase also becomes less significant. Although both adversarial training and PNI perform regularization, the network structure still needs careful construction to prevent the over-fitting that results from over-parameterization.
Robustness evaluation with C&W attack.
Improved robustness does not necessarily mean improving the test-data accuracy against any particular attack method. Typically, an $\ell_2$-norm based C & W attack should reach a 100% success rate against any defense; thus, the average $\ell_2$-norm required to fool the network gives more insight into a network's robustness in general. The results presented in Table 4 represent the overall performance of our model against the C & W attack. Our method of training the noise parameter becomes more effective for a more redundant network. We demonstrate this phenomenon by performing a comparison study between the ResNet-20 and ResNet-18 architectures. Clearly, ResNet-18 shows a larger robustness improvement over vanilla adversarial training than ResNet-20 against the C & W attack.
4.2.2 PNI against black-box attack
In this section, we test our proposed PNI technique against the transferable adversarial attack and the ZOO attack. Following the transferable adversarial attack setup, two trained neural networks are taken as the source model and the target model. The adversarial examples are generated from the source model and then used to attack the target model. We take ResNet-18 on CIFAR-10 as an example: we train two ResNet-18 models (model A and model B) on the CIFAR-10 dataset to attack each other, where model A is optimized through vanilla adversarial training, while model B is trained using our proposed PNI variants (i.e., PNI-W/A-a/W+A-a) with robust optimization. Table 5 shows almost equal perturbed-data accuracy for A→B and B→A under the various PNI scenarios, which indicates that our PNI technique does not reduce the attack strength.
| | Transferable attack | | ZOO attack |
| Train. scheme of B | A→B | B→A | success rate |
For the ZOO attack, we test our defense on 200 randomly selected test samples for an un-targeted attack. The attack success rate denotes the percentage of test samples whose classification changes to a wrong class after the attack. The ZOO attack success rate for vanilla ResNet-18 with adversarial training is close to 80%. The robustness of PNI is more evident from Table 5, as the attack success rate drops significantly for PNI-W+A-a and PNI-W. However, PNI-A-a fails to resist the ZOO attack, even though it still maintains a lower success rate than the baseline. The failure of PNI-A-a shows that merely adding noise in front of the activation does not necessarily achieve the desired robustness claimed by some previous defenses [30, 29].
4.2.3 Comparison to competing methods
As discussed in Section 2.2, a large number of adversarial defense works have been proposed recently; however, most of them have already been broken by the stronger attacks proposed in [44, 23]. As a result, in this work we choose to compare with the most effective one to date, PGD-based adversarial training. Additionally, we compare with other randomness-based works [29, 30] in Table 6 to examine the effectiveness of PNI.
| Defense | Model | Clean (%) | PGD (%) |
| PGD adv. train | ResNet-20 (4×) | 87 | 46.1 |
| PNI-W (this work) | ResNet-20 (4×) | 87.7 | 49.1 |
Previous defense works [43, 34] have shown a trade-off between clean-data accuracy and perturbed-data accuracy, where improving perturbed-data accuracy normally comes at the cost of lowering clean-data accuracy. It is worth highlighting that our proposed PNI improves both clean- and perturbed-data accuracy under white-box attack, in comparison to PGD-based adversarial training. Differential Privacy (DP) is a similar method utilizing noise injection at various locations within the network. Although DP guarantees a certified defense, it does not perform well against $\ell_\infty$-norm based attacks (e.g., PGD and FGSM). In order to achieve a higher level of certified defense, DP significantly sacrifices clean-data accuracy as well. Another randomness-based approach is Random Self-Ensemble (RSE), which inserts a noise layer before every convolution layer. Their defense performs well against the C & W attack, but poorly against the strong PGD attack. In our black-box attack simulation (Table 5) we demonstrate that adding activation noise may not be as effective as weight noise. Beyond that, both DP and RSE configure the noise level manually, which makes it extremely difficult to find the optimal setup, whereas in our proposed PNI method the noise level is determined by a trainable layer-wise noise scaling coefficient and the distribution of the noise-injected location.
The defense performance improvement provided by our proposed PNI does not come from stochastic gradients. A stochastic gradient incorrectly approximates the true gradient when it is based on a single sample. We show that PNI does not rely on gradient obfuscation from two perspectives: 1) our proposed PNI method passes each inspection item proposed by  to identify gradient obfuscation; 2) under PGD attack, when increasing the number of attack steps, our PNI robust optimization method still outperforms vanilla adversarial training (certified as non-obfuscated gradients in ).
| Characteristics to identify gradient obfuscation | Pass | Fail |
| 1. One-step attack performs better than iterative attacks | ✓ | |
| 2. Black-box attacks are better than white-box attacks | ✓ | |
| 3. Unbounded attacks do not reach 100% success | ✓ | |
| 4. Random sampling finds adversarial examples | ✓ | |
| 5. Increasing distortion bound doesn't increase success | ✓ | |
Inspections of gradient obfuscation.
The well-known gradient obfuscation work  enumerates several characteristic behaviors, listed in Table 7, which can be observed when a defense method exhibits gradient obfuscation. Our experiments show that PNI passes each inspection item in Table 7.
For item 1, all the experiments in Table 2 and Table 3 report that the FGSM attack (one-step) performs worse than the PGD attack (iterative). For item 2, our black-box attack experiments in Table 5 show that the black-box attack strength is weaker than the white-box attack. For item 3, as plotted in Fig. 3, we run experiments with increasing distortion bound $\epsilon$; the results show that unbounded attacks do lead to 0% accuracy under attack. For item 4, the prerequisite is that gradient-based attacks (e.g., PGD and FGSM) cannot find adversarial examples; however, the experiments in Fig. 3 reveal that our method can still be broken by increasing the distortion bound. It only increases the resistance against adversarial attacks in comparison to vanilla adversarial training. For item 5, again as shown in Fig. 3, increasing the distortion bound increases the attack success rate.
PNI does not rely on stochastic gradients.
As shown in Fig. 3, gradually increasing the number of PGD attack steps raises the attack strength, thus leading to perturbed-data accuracy degradation for both vanilla adversarial training and our PNI technique. However, for both cases the perturbed-data accuracy saturates and does not degrade further beyond a certain number of steps. If PNI's success came from stochastic gradients, which give an incorrect gradient owing to the single sample, increasing the attack steps should eventually break the PNI defense, which is not observed here. Our PNI method still outperforms vanilla adversarial training even when the number of steps is increased up to 100. Therefore, we can draw the conclusion that, even if PNI does include some gradient obfuscation, the stochastic gradient does not play the dominant role in PNI's robustness improvement.
In this paper, we present a parametric noise injection technique in which the noise intensity can be trained by solving the min-max optimization problem during adversarial training. Through extensive experiments, we show that the proposed PNI method can outperform the state-of-the-art defense method in terms of both clean-data accuracy and perturbed-data accuracy.
-  Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
-  Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
-  Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
-  Chenyi Chen, Ari Seff, Alain Kornhauser, and Jianxiong Xiao. Deepdriving: Learning affordance for direct perception in autonomous driving. In Computer Vision (ICCV), 2015 IEEE International Conference on, pages 2722–2730. IEEE, 2015.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.
-  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
-  Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
-  Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.
-  Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darell, and Dawn Song. Can you fool ai with adversarial examples on a visual turing test? arXiv preprint arXiv:1709.08693, 2017.
-  Naveed Akhtar and Ajmal Mian. Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 6:14410–14430, 2018.
-  Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, and Cho-Jui Hsieh. Show-and-fool: Crafting adversarial examples for neural image captioning. arXiv preprint arXiv:1712.02051, 2017.
-  Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, and Volker Fischer. Universal adversarial perturbations against semantic image segmentation. stat, 1050:19, 2017.
-  Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu Chen, and Cho-Jui Hsieh. Seq2sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. arXiv preprint arXiv:1803.01128, 2018.
-  Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. arXiv preprint arXiv:1801.01944, 2018.
-  Mengying Sun, Fengyi Tang, Jinfeng Yi, Fei Wang, and Jiayu Zhou. Identify susceptible locations in medical records via adversarial attacks on deep predictive models. arXiv preprint arXiv:1802.04822, 2018.
-  Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
-  Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
-  Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
-  Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
-  Jernej Kos and Dawn Song. Delving into adversarial attacks on deep policies. arXiv preprint arXiv:1705.06452, 2017.
-  Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pages 372–387. IEEE, 2016.
-  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
-  Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
-  Alberto Bietti, Grégoire Mialon, and Julien Mairal. On regularization and robustness of deep neural networks. arXiv preprint arXiv:1810.00363, 2018.
-  Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014.
-  Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pages 448–456, 2015.
-  Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160, 2016.
-  Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in neural information processing systems, pages 3123–3131, 2015.
-  Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. arXiv preprint arXiv:1712.00673, 2017.
-  Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint, 2018.
-  Yuichi Yoshida and Takeru Miyato. Spectral norm regularization for improving the generalizability of deep learning. arXiv preprint arXiv:1705.10941, 2017.
-  Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
-  Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.
-  Jacob Buckman, Aurko Roy, Colin Raffel, and Ian Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. In International Conference on Learning Representations, 2018.
-  Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018.
-  Gokula Krishnan Santhanam and Paulina Grnarova. Defending against adversarial attacks by leveraging an entire gan. CoRR, abs/1805.10652, 2018.
-  Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018.
-  Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. On the connection between differential privacy and adversarial robustness in machine learning. arXiv preprint arXiv:1802.03471, 2018.
-  Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
-  Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
-  Li Wan, Matthew Zeiler, Sixin Zhang, Yann Le Cun, and Rob Fergus. Regularization of neural networks using dropconnect. In International Conference on Machine Learning, pages 1058–1066, 2013.
-  Anonymous. L2-nonexpansive neural networks. Submitted to International Conference on Learning Representations, 2019. Under review.
-  Anish Athalye and Nicholas Carlini. On the robustness of the CVPR 2018 white-box adversarial example defenses. CoRR, abs/1804.03286, 2018.