1 Introduction
Deep neural networks (DNNs) are vulnerable to adversarial examples, in which small and often imperceptible perturbations change the class label of an image
[33, 9, 23, 24]. Because of the security concerns this raises, there is increasing interest both in studying these attacks themselves and in designing mechanisms to defend against them. Adversarial examples were originally formed by selecting a single base image, and then sneaking that base image into a different class using a small perturbation [9, 7, 16]. This is done most effectively using (potentially expensive) iterative optimization procedures [8, 16, 2].
Different from per-instance perturbation attacks, Moosavi-Dezfooli et al. [19, 20] show there exist "universal" perturbations that can be added to any image to change its class label (fig. 1). Universal perturbations empower attackers who cannot generate per-instance adversarial examples on the fly, or who want to change the identity of an object to be selected later in the field. What's worse, universal perturbations have good cross-model transferability, which facilitates black-box attacks.
Among various methods for hardening networks to per-instance attacks, adversarial training [16] is known to dramatically increase robustness [2]. In this process, adversarial examples are produced for each minibatch during training and injected into the training data. While effective at increasing robustness, the high cost of this process precludes its use on large and complex datasets. This cost comes from the adversarial example generation process, which frequently requires 5-30 iterations to produce an example. Unfortunately, adversarial training using cheap, non-iterative methods generally does not result in robustness against stronger iterative adversaries [16].
Contributions
This paper studies effective methods for producing and deflecting universal adversarial attacks. First, we pose the creation of universal perturbations as an optimization problem that can be effectively solved by stochastic gradient methods. This method dramatically reduces the time needed to produce attacks as compared to [19]. The efficiency of this formulation empowers us to consider universal adversarial training. We formulate the adversarial training problem as a min-max optimization where the minimization is over the network parameters and the maximization is over the universal perturbation. This problem can be solved quickly using alternating stochastic gradient methods with no inner loops, making it far more efficient than per-instance adversarial training with a strong adversary.
Interestingly, universal adversarial training has a number of unexpected and useful side effects. While our models are trained to resist universal perturbations, they are also quite resistant to black-box per-instance attacks, achieving resistance comparable to 7-step PGD adversarial training at a fraction of the cost. Furthermore, per-instance adversarial examples built for attacking our universally hardened model transfer very well to other (black-box) natural and robust models.
2 Related work
We briefly review per-instance perturbation attack techniques that are closely related to our paper and will be used in our experiments. The Fast Gradient Sign Method (FGSM) [9] is one of the most popular one-step gradient-based approaches for ℓ∞-bounded attacks. FGSM applies one step of gradient ascent in the direction of the sign of the gradient of the loss function with respect to the input image. When a model is adversarially trained, the gradient of the loss function may be very small near unmodified images. In this case, the R-FGSM method remains effective by first using a random perturbation to step off the image manifold, and then applying FGSM [34]. Projected Gradient Descent (PGD) [15, 16] iteratively applies FGSM multiple times, and is one of the strongest per-instance attacks [16, 2]. The PGD version we use in this paper is adopted from [16] and applies an initial random perturbation before multiple steps of gradient ascent. Finally, DeepFool [21] is an iterative method based on a linear approximation of the training loss objective. This method formed the backbone of the original method for producing universal adversarial examples.

Adversarial training, in which adversarial attacks are injected into the dataset during training, is an effective method for learning a robust model resistant to attacks [16, 2, 11, 29, 31]. Robust models adversarially trained with FGSM can resist FGSM attacks [15], but can be vulnerable to PGD attacks [16]. Madry et al. [16] suggest that strong attacks are important, and they use the iterative PGD method in the inner loop for generating adversarial examples when optimizing the min-max problem. PGD adversarial training is effective but time-consuming: the cost of the inner PGD loop is high, although it can sometimes be replaced with neural models for attack generation [3, 26, 36]. These robust models are adversarially trained to fend off per-instance perturbations and have not been designed for, or tested against, universal perturbations.
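To make the attack procedure concrete, the following NumPy sketch implements an ℓ∞ PGD attack on a toy logistic model: a random start inside the ε-ball, sign-of-gradient ascent steps, and projection back onto the ball after each step. The toy loss, function names, and parameter values are illustrative assumptions, not the networks or settings used in this paper.

```python
import numpy as np

def pgd_attack(x, y, w, eps=0.1, alpha=0.02, steps=20, seed=0):
    """l_inf PGD on the toy logistic loss l(x) = log(1 + exp(-y * w.x))."""
    rng = np.random.default_rng(seed)
    # random start inside the eps-ball, as in [16]
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        margin = y * np.dot(w, x_adv)
        # gradient of the logistic loss w.r.t. the input
        grad = -y * w / (1.0 + np.exp(margin))
        x_adv = x_adv + alpha * np.sign(grad)       # sign-of-gradient step
        x_adv = x + np.clip(x_adv - x, -eps, eps)   # project onto the eps-ball
    return x_adv
```

Dropping the random start and using a single step essentially recovers FGSM; keeping the random start with a single step gives an R-FGSM-style attack.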
Unlike per-instance perturbations, universal perturbations can be directly added to any test image to fool the classifier. In [19], universal perturbations for image classification are generated by iteratively optimizing the per-instance adversarial loss for training samples using DeepFool [21]. In addition to classification tasks, universal perturbations are also shown to exist for semantic segmentation [18]. Robust universal adversarial examples are generated as a universal targeted adversarial patch in [5]; they are targeted since they cause misclassification of the images to a given target class. Moosavi-Dezfooli et al. [20] prove the existence of small universal perturbations under certain curvature conditions of decision boundaries. Data-independent universal perturbations are also shown to exist and can be generated by maximizing spurious activations at each layer; these universal perturbations are slightly weaker than the data-dependent approaches [22].
There has been very little work on defending against universal attacks. To the best of our knowledge, the only dedicated study is by Akhtar et al., who propose a perturbation rectifying network that preprocesses input images to remove the universal perturbation [1]. The rectifying network is trained on universal perturbations that are built for the downstream classifier. While other methods of data sanitization exist [28, 17], it has been shown (at least for per-instance adversarial examples) that this type of defense is easily subverted by an attacker who is aware that a defense network is being used [6].
A recent preprint [25] models the problem of defending against universal perturbations as a two-player min-max game. However, unlike us, and similar to per-instance adversarial training, after each gradient descent iteration for updating the DNN parameters, they generate a universal adversarial example in an iterative fashion. Since the generation of universal adversarial perturbations is very time-consuming [1], this makes their approach very slow in practice and prevents them from training the neural network parameters for many iterations.
3 Optimization for universal perturbation
Given a set of training samples X = {x_i} and a network f(w, ·) with frozen parameters w that maps images onto labels, Moosavi-Dezfooli et al. [19] propose to find universal perturbations δ that satisfy

\|\delta\|_p \le \epsilon \quad \text{and} \quad \rho(\delta) \ge 1 - \xi,   (1)

where ρ(δ) represents the "fooling ratio": the fraction of images x whose perturbed class label f(w, x + δ) differs from the original label f(w, x). The parameter ε controls the ℓ_p diameter of the bounded perturbation, and ξ is a small tolerance hyperparameter. Problem (1) is solved by the iterative method in algorithm 1 [19]. This iterative solver relies on an inner loop that applies DeepFool [21] to each training instance, which makes the solver slow. Moreover, the outer loop of algorithm 1 is not guaranteed to converge.

Different from [19], we consider the following optimization problem for building universal perturbations,
\max_{\delta}\ \frac{1}{N}\sum_{i=1}^{N} l(w, x_i + \delta, y_i) \quad \text{s.t.} \quad \|\delta\|_p \le \epsilon,   (2)

where l(w, x, y) represents the loss used for training DNNs, with the network weights w held fixed. This simple formulation (2) searches for a universal perturbation that maximizes the training loss, and thus forces images into the wrong class.
The naive formulation (2) suffers from a potentially significant drawback: the cross-entropy loss is unbounded from above, and can be arbitrarily large when evaluated on a single image. In the worst case, a perturbation that causes misclassification of just a single image can maximize (2) by forcing the average loss to infinity. To force the optimizer to find a perturbation that fools many instances, we propose a "clipped" version of the cross-entropy loss,
\hat{l}(w, x_i + \delta, y_i) = \min\{\, l(w, x_i + \delta, y_i),\ \beta \,\}.   (3)

We cap the loss function at β to prevent any single image from dominating the objective in (2), giving us a better surrogate for misclassification accuracy. In section 5.2, we investigate the effect of clipping with different values of β.
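The clipped loss of eq. (3) amounts to a single min against the cap; the sketch below shows it for a softmax cross-entropy on one image. The cap value and all names here are illustrative.

```python
import numpy as np

def clipped_ce(logits, label, beta):
    """Cross-entropy capped at beta, in the spirit of eq. (3)."""
    z = logits - logits.max()                 # stabilized log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    ce = -log_probs[label]
    return min(ce, beta)                      # cap so no image dominates
```

A confidently misclassified image would otherwise contribute an arbitrarily large loss; with the cap it contributes at most β, so the optimizer must fool additional images to keep increasing the objective.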
We directly solve eq. 2 with the stochastic gradient method described in algorithm 2. Each iteration begins by using gradient ascent to update the universal perturbation δ to maximize the loss. Then, δ is projected onto the ℓ_p-norm ball to prevent it from growing too large. We experiment with various optimizers for this ascent step, including Stochastic Gradient Descent (SGD), Momentum SGD (MSGD), Projected Gradient Descent (PGD), and ADAM [12]. We test this method by attacking a naturally trained WideResNet CIFAR-10 model from [16]. Stochastic gradient methods that use "normalized" gradients (ADAM and PGD) are less sensitive to the learning rate and converge faster, as shown in fig. 2. We visualize the universal perturbations generated by the different optimizers in fig. 3. Compared to the noisy perturbation generated by SGD, the normalized gradient methods produced stronger attacks with more well-defined geometric structures and checkerboard patterns. The final evaluation accuracies (on test examples) after adding the universal perturbations were 42.56% for the SGD perturbation, 13.08% for MSGD, 13.30% for ADAM, and 13.79% for PGD. The clean test accuracy of the WideResNet is 95.2%.
The proposed method of universal attack using a clipped loss function has several advantages. It is based on a standard stochastic gradient method that comes with convergence guarantees when a decreasing learning rate is used [4]. Also, each iteration is based on a minibatch of samples instead of one instance, which accelerates computation on a GPU. Finally, each iteration requires a simple gradient update instead of the complex DeepFool inner loop; we empirically verify fast convergence and good performance of the proposed method (see section 5).
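A minimal sketch of algorithm 2 on a toy logistic model: sign-of-gradient ascent on a single shared δ over minibatches, with projection onto the ℓ∞ ball after every step. The toy classifier and all hyperparameter values below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def universal_perturbation(X, y, w, eps=0.1, lr=0.02, epochs=5, batch=16):
    """Stochastic sign-gradient ascent on one universal delta
    for a frozen toy logistic classifier with weights w."""
    delta = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        for i in range(0, n, batch):
            xb, yb = X[i:i + batch], y[i:i + batch]
            margin = yb * ((xb + delta) @ w)
            # gradient of the mean logistic loss w.r.t. delta
            coef = -np.mean(yb / (1.0 + np.exp(margin)))
            grad = coef * w
            delta = delta + lr * np.sign(grad)   # ascent step
            delta = np.clip(delta, -eps, eps)    # project onto l_inf ball
    return delta
```

Each update uses a whole minibatch rather than a single instance, mirroring the efficiency argument above, and the projection keeps the perturbation inside the ε-ball at every iteration.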
4 Universal adversarial training
We now consider training robust classifiers that are resistant to universal perturbations. In particular, we consider universal adversarial training, and formulate this problem as a min-max optimization problem,
\min_{w}\ \max_{\|\delta\|_p \le \epsilon}\ \frac{1}{N}\sum_{i=1}^{N} l(w, x_i + \delta, y_i),   (4)

where w represents the neural network weights, {(x_i, y_i)} represents the training samples, δ represents the universal perturbation, and l is the loss function. We solve eq. 4 with the alternating stochastic gradient method in algorithm 3. Each iteration alternately updates the neural network weights w using gradient descent, and then updates the universal perturbation δ using gradient ascent.
We compare our formulation (4) and algorithm 3 with PGD-based adversarial training [16], which trains a robust model by optimizing the following min-max problem,
\min_{w}\ \frac{1}{N}\sum_{i=1}^{N}\ \max_{\|\delta_i\|_p \le \epsilon} l(w, x_i + \delta_i, y_i).   (5)

The standard formulation (5) searches for per-instance perturbed images x_i + δ_i, while our formulation in (4) maximizes using a single universal perturbation δ. Madry et al. [16] solve (5) by a stochastic method: in each iteration, an adversarial example is generated for an input instance by the iterative PGD method, and the neural network parameters w are updated once [16]. Our formulation (algorithm 3) maintains only one perturbation δ that is used and refined across all iterations. For this reason, we need only update w and δ once per step (i.e., there is no expensive inner loop), and these updates accumulate for both w and δ through training.
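The alternating scheme of algorithm 3 can be sketched on the same toy logistic model used earlier: each step performs one gradient-descent update on the weights w and one FGSM-style ascent update on the shared universal perturbation δ, with no inner loop. The model, data, and step sizes below are illustrative assumptions.

```python
import numpy as np

def universal_adv_train(X, y, eps=0.1, lr_w=0.1, lr_d=0.02, epochs=30):
    """Alternating min-max updates, one of each per step (no inner loop)."""
    w = np.zeros(X.shape[1])
    delta = np.zeros(X.shape[1])
    for _ in range(epochs):
        margin = y * ((X + delta) @ w)
        s = 1.0 / (1.0 + np.exp(margin))         # sigmoid(-margin)
        # descent step on the weights w (mean logistic loss)
        grad_w = -np.mean((y * s)[:, None] * (X + delta), axis=0)
        w = w - lr_w * grad_w
        # FGSM-style ascent step on the shared perturbation delta
        grad_d = -np.mean(y * s) * w
        delta = delta + lr_d * np.sign(grad_d)
        delta = np.clip(delta, -eps, eps)        # l_inf projection
    return w, delta
```

Because both w and δ are updated exactly once per step, the cost per iteration is comparable to natural training, in contrast to the K-step inner loop of PGD adversarial training.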
In fig. 4, we present training curves for the universal adversarial training process on the WideResNet model from [16] using the CIFAR-10 dataset. We consider different rules for updating δ during universal adversarial training,
\delta \leftarrow \delta + lr \cdot \mathrm{sign}\big(\nabla_\delta\, l(w, x + \delta, y)\big) \qquad \text{(FGSM)}   (6)

\delta \leftarrow \delta + lr \cdot \nabla_\delta\, l(w, x + \delta, y) \qquad \text{(SGD)}   (7)

and ADAM [12]. We found that the FGSM update rule was most effective when combined with the SGD optimizer for updating the neural network weights w.
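The two update rules differ only in whether the gradient is passed through a sign. The small demo below, using a made-up gradient, shows why the FGSM rule acts like a "normalized" step: every coordinate moves by the full learning rate regardless of the gradient's magnitude, whereas the SGD rule barely moves coordinates with tiny gradients.

```python
import numpy as np

def fgsm_update(delta, grad, lr):
    """Eq. (6)-style rule: full lr step in every coordinate."""
    return delta + lr * np.sign(grad)

def sgd_update(delta, grad, lr):
    """Eq. (7)-style rule: step scales with the raw gradient."""
    return delta + lr * grad
```

With a gradient like [0.001, -2.0], the FGSM rule steps 0.1 in both coordinates (for lr = 0.1), while the SGD rule steps only 0.0001 in the first.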
One way to assess an update rule is to plot the model accuracy before and after the ascent step (i.e., the perturbation update). It is well known that adversarial training is more effective when stronger attacks are used; in the extreme case of a do-nothing adversary, adversarial training degenerates to natural training. In fig. 5, we see a gap between the accuracy curves plotted before and after gradient ascent. We find that the FGSM update rule leads to a larger gap, indicating a stronger adversary. Correspondingly, the FGSM update rule yields networks that are more robust to attacks than the SGD update (see fig. 5). Interestingly, although models universally trained with the FGSM update and the ADAM update are both robust to universal perturbation attacks, the model trained with the FGSM update is more robust to per-instance attacks than the one trained with the ADAM update rule. The accuracy of a universally hardened network against a white-box per-instance PGD attack is 17.21% for FGSM universal training, but only 2.57% for ADAM universal training.
4.1 Attacking hardened models
We evaluate the robustness of different models by applying algorithm 2 to search for universal perturbations. We attack universally adversarially trained models (produced by eq. 4) that use the FGSM universal update rule (uFGSM) or the SGD universal update rule (uSGD). We also consider robust models from per-instance adversarial training (eq. 5) with adversarial steps of the FGSM and PGD type [16].
The training curves for the robust WideResNet models on CIFAR-10 are plotted in fig. 5. Robust models adversarially trained with weaker attackers such as uSGD and FGSM are relatively vulnerable to universal perturbations, while robust models from PGD [16] and uFGSM can resist universal perturbations. We apply PGD (using the sign of the gradient) and ADAM in algorithm 2 to generate universal perturbations for these robust models, and show such perturbations in fig. 6. Comparing fig. 6 (a,b,c,d) with fig. 6 (e,f,g,h), we see that the universal perturbations generated by PGD and ADAM are different but have similar patterns. Universal perturbations generated for the weaker robust models have more texture, as shown in fig. 6 (a,d,e,h).
5 Universal perturbations for ImageNet
To validate the performance of our proposed optimization on different architectures and more complex datasets, we apply algorithm 2 to several popular architectures designed for classification on the ImageNet dataset [27]. We compare our method of universal perturbation generation with the current state-of-the-art method, iterative DeepFool (iDeepFool for short) [19]. We use the authors' code to run the iDeepFool attack on these classification networks. For a fair comparison, we execute both our method and iDeepFool on the exact same 5000 training data points and terminate both methods after 10 epochs. We use the ℓ∞ constraint ε from [19], a step size of 1.0 for our method, and the suggested parameters for iDeepFool. We execute iDeepFool ourselves because we are interested in the accuracy of the classifier on attacked images, a metric not reported in their paper.¹

¹ They report the "fooling ratio," the fraction of examples whose label prediction changes after applying the universal perturbation. This has become an uncommon metric, since the fooling ratio can increase if the universal perturbation causes an example that was originally misclassified to become correctly classified.
5.1 Benefits of the proposed method
We compare the performance of our stochastic gradient method for eq. 2 with the iDeepFool method for eq. 1 [19]. We generate universal perturbations for Inception [32] and VGG [30] networks trained on ImageNet [27], and report the top-1 accuracy in table 1. Universal perturbations generated by both iDeepFool and our method fool the networks and degrade classification accuracy. The universal perturbations generated on the training samples generalize well and cause the accuracy on the validation samples to drop. However, given a fixed computation budget, such as the number of passes over the training data (i.e., epochs), our method outperforms iDeepFool by a large margin. Our stochastic gradient method also generates the universal perturbations much faster than iDeepFool: about 20× faster on Inception-V1 and 6× faster on VGG16 (13× faster on average).
After verifying the effectiveness and efficiency of our stochastic gradient method,² we use algorithm 2 to generate universal perturbations for more advanced architectures such as ResNet-V1 152 [10] and Inception-V3 [32] (and for the other experiments in the remaining sections). Our attacks degrade the validation accuracy of ResNet-V1 152 and Inception-V3 from 76.8% and 78% to 16.4% and 20.1%, respectively. The final universal perturbations used for the results presented in this section are illustrated in fig. 7.

² Unless otherwise specified, we use sign-of-gradient PGD as the stochastic gradient optimizer in algorithm 2.
Table 1: Top-1 accuracy of Inception-V1 and VGG16 under universal perturbations built by iDeepFool and by our method, and the time needed to generate them.

                      Inception-V1   VGG16
Natural      Train    76.9%          81.4%
             Val      69.7%          70.9%
iDeepFool    Train    43.5%          39.5%
             Val      40.7%          36.0%
Ours         Train    17.2%          23.1%
             Val      19.8%          22.5%
iDeepFool time (s)    9856           6076
Our time (s)          482            953
5.2 The effect of clipping
In this section we analyze the effect of the clipping parameter β in the loss (3) used within eq. 2. For this purpose, similar to our other ablation experiments, we generate universal perturbations by solving eq. 2 with PGD for Inception-V3 on ImageNet.
Since the results can vary slightly with different random initializations, we run each experiment with 5 random subsets of the training data. The accuracy reported is the classification accuracy on the entire validation set of ImageNet after adding the universal perturbation. The results, summarized in fig. 8, showcase the value of our proposed loss function for finding universal adversarial perturbations.
5.3 How much training data does the attack need?
As in [19], we analyze how the number of training points affects the strength of the universal perturbations in fig. 9. In particular, we build δ using varying amounts of training data. For each experiment, we report the accuracy on the entire validation set after adding the perturbation δ. We consider four training-set sizes: 500, 1000, 2000, and 4000 samples.³

³ The number of epochs in algorithm 2 was 100 for 500 data samples, 40 for 1000 and 2000 samples, and 10 for 4000 samples.
6 Experiment: universal adversarial training
In this section, we analyze our robust models that are universally adversarially trained by solving the min-max problem (section 4) using algorithm 3. We use the ℓ∞ constraint from [16] for CIFAR-10, and the one from [19] for ImageNet.
6.1 Defense against whitebox attack
Table 2: Validation accuracy of naturally and adversarially trained models on CIFAR-10 under white-box attacks.

           Attack method
Model      UnivPert   FGSM    R-FGSM   PGD
Natural    9.2%       13.3%   7.3%     0.0%
FGSM       51.0%      95.2%   90.2%    0.0%
R-FGSM     57.0%      97.5%   96.1%    0.0%
PGD        86.1%      56.2%   67.2%    45.8%
Ours       91.8%      37.3%   48.6%    17.2%
We compare our universally adversarially trained model's robustness with that of other hardened models against white-box attacks, where the (robust) models are fully revealed to the attackers. We attack the hardened and natural models using universal perturbations (section 3) and per-instance perturbations (FGSM [9], R-FGSM [34], and a 20-step ℓ∞-bounded PGD attack with step size 2 [16]). We also report the performance of per-instance adversarially trained models, which are trained with per-instance attacks such as FGSM, R-FGSM, and PGD [16]. We use universal adversarial training from algorithm 3 to build a robust WideResNet [37] on CIFAR-10 [13], and a robust AlexNet [14] using 5000 training samples of ImageNet [27]. The PGD per-instance adversarial training is done by training on adversarial examples that are built using multiple steps of PGD following [16], which makes it slower than non-iterative adversarial training methods such as our universal adversarial training, FGSM, and R-FGSM adversarial training.
We summarize the CIFAR-10 results in table 2. The natural model, as also seen in section 3, is vulnerable to both universal and per-instance perturbations. Our robust model achieves the highest classification accuracy (i.e., highest robustness) against universal perturbation attacks. The 20-step PGD attack fools the natural, FGSM-robust, and R-FGSM-robust models almost every time. Interestingly, our model is relatively resistant to the PGD attack, though not as robust as the PGD-based robust model. This result is particularly interesting given that our method is hardened using only universal perturbations. While the computational cost of our method is similar to that of non-iterative per-instance adversarial training methods (FGSM and R-FGSM), our model is considerably more robust against the PGD attack, which is known to be the strongest per-instance attack.
Since our universal adversarial training algorithm is cheap, it scales to large datasets such as ImageNet. As seen in fig. 10 (a), the AlexNet trained with our universal adversarial training algorithm (algorithm 3) is robust against universal attacks generated by both algorithm 1 and algorithm 2, whereas the naturally trained AlexNet is susceptible to universal attacks. The final attacks generated for the robust and natural models are presented in fig. 10 (b,c). The universal perturbation generated for the robust AlexNet model has little structure compared to the one built for the naturally trained AlexNet. This matches the trend we observed in fig. 3 and fig. 6 for the WideResNet models trained on CIFAR-10.
6.2 Transferability and blackbox robustness
We study the transferability of our robust model in the black-box threat setting, in which we generate adversarial examples on a source model and use them to attack a target model. We study the transferability of 20-step PGD per-instance adversarial examples between various models with the WideResNet architecture trained on CIFAR-10: the naturally trained model, the FGSM-trained robust model, the R-FGSM-trained robust model, the PGD-trained robust model [16], and our robust model.
Table 3: Validation accuracy of target models (rows) on CIFAR-10 under black-box per-instance PGD attacks built for different source models (columns).

           Attack source
Target     Natural   FGSM    R-FGSM   PGD     Ours
Natural    -         34.1%   64.9%    77.4%   22.0%
FGSM       53.9%     -       14.1%    69.6%   22.7%
R-FGSM     71.5%     16.0%   -        71.7%   20.3%
PGD        84.1%     86.3%   86.3%    -       76.3%
Ours       90.0%     90.8%   91.0%    70.4%   -
Average    74.9%     56.8%   64.1%    72.3%   35.4%
The results are summarized in table 3. Examining the rows of table 3, we see that both the PGD-based robust model and our robust model are fairly hardened against black-box attacks built for various source models. Examining the columns of table 3, we can compare the transferability of the attacks built for various source models. By this metric, the attacks built for our robust model are the strongest in terms of transferability and can degrade the performance of both natural and other robust models. An adversary can thus enhance her black-box attack by first making her source model universally robust!
6.3 Visualizing attacks on robust models
Tsipras et al. [35] use several visualization techniques to analyze PGD-based robust models and show some unexpected benefits of adversarial robustness. Similarly, we generate large per-instance adversarial examples using a PGD attack without random initialization; the large perturbation budget makes the perturbations visible. Adversarial examples built in this way for both a natural model and our robust model are illustrated in fig. 11. Many of the adversarial examples for the natural model look similar to the original image and have a lot of "random" noise in the background, while adversarial examples for our robust model exhibit salient characteristics of another class and align well with human perception. The elimination of structured universal perturbations during universal adversarial training seems to have this interesting side effect, which was only recently shown for PGD adversarial training.
7 Conclusion
We proposed using stochastic gradient methods and a "clipped" loss function as an effective universal attack that generates universal perturbations much faster than previous methods. To defend against these universal adversaries, we proposed training robust models by optimizing a min-max problem using alternating stochastic gradient methods. We systematically studied the robustness of our robust model under the white-box and black-box threat models. Our experiments suggest that our robust model can resist white-box universal perturbations and, to some extent, per-instance perturbations. In the black-box threat model, our robust model is as robust as the much more expensive PGD adversarial training. Moreover, the per-instance adversarial examples generated for our robust model transfer better when attacking other models. Due to the relatively low computational overhead of our universal adversarial training algorithm, we can easily train robust models for large-scale datasets such as ImageNet.
Acknowledgements: Goldstein and his students were supported by DARPA's Lifelong Learning Machines and YFA programs, the Office of Naval Research, the AFOSR MURI program, and the Sloan Foundation. Davis and his students were supported by the Office of the Director of National Intelligence (ODNI), and IARPA (2014-14071600012). The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.
References
 [1] N. Akhtar, J. Liu, and A. Mian. Defense against universal adversarial perturbations. CVPR, 2018.
 [2] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML, 2018.

 [3] S. Baluja and I. Fischer. Adversarial transformation networks: Learning to generate adversarial examples. AAAI, 2018.
 [4] L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018.
 [5] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer. Adversarial patch. arXiv preprint arXiv:1712.09665, 2017.

 [6] N. Carlini and D. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM Workshop on Artificial Intelligence and Security, pages 3–14. ACM, 2017.
 [7] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
 [8] Y. Dong, F. Liao, T. Pang, H. Su, X. Hu, J. Li, and J. Zhu. Boosting adversarial attacks with momentum. CVPR, 2017.
 [9] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. ICLR, 2014.
 [10] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
 [11] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvári. Learning with a strong adversary. arXiv preprint arXiv: 1511.03034, 2015.
 [12] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2014.
 [13] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.

 [14] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
 [15] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. ICLR, 2017.

 [16] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. ICLR, 2018.
 [17] D. Meng and H. Chen. Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 135–147. ACM, 2017.
 [18] J. H. Metzen, M. C. Kumar, T. Brox, and V. Fischer. Universal adversarial perturbations against semantic image segmentation. In ICCV, 2017.
 [19] S.M. MoosaviDezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. In CVPR, pages 1765–1773, 2017.
 [20] S.M. MoosaviDezfooli, A. Fawzi, O. Fawzi, P. Frossard, and S. Soatto. Analysis of universal adversarial perturbations. arXiv preprint arXiv:1705.09554, 2017.
 [21] S.M. MoosaviDezfooli, A. Fawzi, and P. Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In CVPR, pages 2574–2582, 2016.
 [22] K. R. Mopuri, U. Garg, and R. V. Babu. Fast feature fool: A data independent approach to universal adversarial perturbations. BMVC, 2017.
 [23] A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In CVPR, pages 427–436, 2015.
 [24] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pages 372–387. IEEE, 2016.
 [25] J. Perolat, M. Malinowski, B. Piot, and O. Pietquin. Playing the game of universal adversarial perturbations. arXiv preprint arXiv:1809.07802, 2018.
 [26] O. Poursaeed, I. Katsman, B. Gao, and S. Belongie. Generative adversarial perturbations. CVPR, 2018.
 [27] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. Imagenet large scale visual recognition challenge. IJCV, 2015.
 [28] P. Samangouei, M. Kabkab, and R. Chellappa. Defensegan: Protecting classifiers against adversarial attacks using generative models. ICLR, 2018.
 [29] U. Shaham, Y. Yamada, and S. Negahban. Understanding adversarial training: Increasing local stability of neural nets through robust optimization. arXiv preprint arXiv:1511.05432, 2015.
 [30] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint, 2014.
 [31] A. Sinha, H. Namkoong, and J. Duchi. Certifying some distributional robustness with principled adversarial training. ICLR, 2018.

 [32] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, pages 2818–2826, 2016.
 [33] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. ICLR, 2013.
 [34] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. ICLR, 2018.
 [35] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry. Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152, 2018.
 [36] C. Xiao, B. Li, J.Y. Zhu, W. He, M. Liu, and D. Song. Generating adversarial examples with adversarial networks. IJCAI, 2018.
 [37] S. Zagoruyko and N. Komodakis. Wide residual networks. arXiv preprint, 2016.