1 Introduction
Neural networks produce state-of-the-art results across a large number of domains ([16], [36], [35], [12]). Despite the increasing adoption of neural networks in commercial settings, recent work has shown that such models are susceptible to inputs with imperceptible perturbations crafted to cause misclassification ([33], [10]). It is thus important to investigate additional vulnerabilities as well as defenses against them.
In this paper we investigate the problem of adversarial attacks on image classification systems. Attacks so far have only considered the conventional neural network architecture, which outputs softmax predictions and is trained by minimizing the cross-entropy loss function. We thus propose and evaluate the robustness of neural networks with the following modifications against adversarial attacks:

Train the model to minimize mean-squared error (MSE) rather than cross-entropy.

Replace traditional one-hot target representations with codewords generated from a random codebook.
We evaluate our proposed modifications from multiple angles. First, we measure the robustness of the modified model using attacks under multiple threat scenarios. Second, we introduce an attack which, without sacrificing its efficacy against conventional architectures, is tailored to our proposed modifications. Finally, we conduct spectral analysis on the model's parameters to compute their upper Lipschitz bounds, a measure that has been shown to correlate with a model's robustness. Our results in Section 5 demonstrate that, across all three evaluations, our proposed model displays increased robustness compared to its conventional counterpart.
2 Background
2.1 Neural networks
A neural network is a nonlinear function $f_\theta: \mathbb{R}^n \to \mathbb{R}^k$ that maps data $x \in \mathbb{R}^n$ to targets $y \in \mathbb{R}^k$, where $n$ and $k$ are the dimensions of the input and target spaces, respectively, and $\theta$ represents the parameters of the neural network. For conventional neural networks and classification tasks, $y$ is typically a one-hot representation of the class label and $k$ is the number of classes in the dataset. In this work, we use the DenseNet architecture [13] as the existing benchmark, which has recently produced state-of-the-art results on several image datasets.
2.2 Adversarial examples
The goal of an adversarial attack is to cause a misclassification by the target neural network. In particular, [33] has shown that it is possible to construct $x_{adv} = x + \delta$ by adding a minimal perturbation $\delta$ to the original input $x$ such that the model misclassifies $x_{adv}$. Here, $x_{adv}$ is commonly referred to as an adversarial example, while the original data point $x$ is referred to as a clean example. Apart from image classification, adversarial attacks have been proposed in both natural language and audio domains ([6], [2], [40]).
2.3 Attacks
Settings.
We explore two adversarial settings, namely the white-box and black-box scenarios. In the white-box setting, the attacker has access to and utilizes the model's parameters, outputs, target representations, and loss function to generate adversarial examples. In the black-box scenario, the attacker has no access to the model's parameters or specifications and can only query it for predictions. In this work, we employ transfer attacks, a type of black-box attack where adversarial examples are generated using a proxy model to which the adversary has access.
Types.
There are mainly two types of attacks. In a targeted attack, the adversary generates an adversarial example $x_{adv}$ so that the target model returns some chosen target class $t \neq y$. A targeted attack is evaluated by its success rate, the proportion of images for which the target class was successfully predicted (the lower the better from the perspective of the defense). In an untargeted attack, the attacker simply causes the model to return some incorrect prediction $\hat{y} \neq y$. It is evaluated by the accuracy of the target model, the proportion of images which failed to get misclassified (the higher the better from the perspective of the defense).
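As a concrete illustration, the two evaluation metrics can be computed as follows (a minimal sketch; the function names are ours, not from the paper):

```python
import numpy as np

def targeted_success_rate(predictions, targets):
    """Fraction of examples classified as the adversary's chosen target class
    (lower is better from the defense's perspective)."""
    return float(np.mean(np.asarray(predictions) == np.asarray(targets)))

def untargeted_accuracy(predictions, true_labels):
    """Fraction of examples that still receive the correct label under attack
    (higher is better from the defense's perspective)."""
    return float(np.mean(np.asarray(predictions) == np.asarray(true_labels)))
```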
The following sections describe the attacks used in this work.
Fast Gradient Sign Method (FGSM).
The Fast Gradient Sign Method [10], one of the earliest gradient-based attacks, generates adversarial examples via:
$$x_{adv} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right)$$
where $J$ is the loss function of the neural network, $y$ is the target class, and $\epsilon$ is a parameter which controls the magnitude of the perturbations made to the original input $x$. The gradient, taken w.r.t. the input, determines the direction in which each pixel should be perturbed in order to maximize the loss function and cause a misclassification.
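The update above can be sketched end-to-end on a stand-in model. The snippet below applies FGSM to a linear softmax classifier rather than a DenseNet (an assumption made purely to keep the example self-contained); the final clip to [0, 1] keeps the result a valid image:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fgsm(x, y_onehot, W, b, eps):
    """One FGSM step on a linear softmax classifier (stand-in for a network).

    For cross-entropy loss J, the input gradient is W.T @ (softmax(Wx+b) - y),
    and the adversarial example is x + eps * sign(grad), clipped to [0, 1].
    """
    p = softmax(W @ x + b)
    grad = W.T @ (p - y_onehot)                      # dJ/dx
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)
```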
Basic Iterative Method (BIM).
The Basic Iterative Method, proposed by [18], applies FGSM iteratively with a small step size, clipping after each step to remain within an $\epsilon$-neighborhood of the original input, in order to find more effective adversarial examples.
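A minimal sketch of BIM on the same kind of stand-in linear softmax classifier (the `eps_iter` and `nb_iter` names mirror the Cleverhans parameters listed later in Table 2; the model itself is our simplification, not the paper's DenseNet):

```python
import numpy as np

def bim(x, y_onehot, W, b, eps, eps_iter, nb_iter):
    """Basic Iterative Method: repeated small FGSM steps on a linear softmax
    classifier, clipped to the eps-ball around x and to the valid range [0, 1]."""
    x_adv = x.copy()
    for _ in range(nb_iter):
        z = W @ x_adv + b
        p = np.exp(z - z.max()); p /= p.sum()        # softmax probabilities
        grad = W.T @ (p - y_onehot)                  # cross-entropy input gradient
        x_adv = x_adv + eps_iter * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)     # stay within the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)             # stay a valid image
    return x_adv
```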
Momentum Iterative Method (MIM).
The Momentum Iterative Method [9] augments the iterative gradient updates of BIM with a momentum term, which stabilizes the update directions and improves the transferability of the resulting adversarial examples.
L-BFGS Attack.
[33] proposed the L-BFGS attack, the first targeted white-box attack on convolutional neural networks, which solves the following constrained optimization problem:
$$\text{minimize} \quad c \cdot \|x - x_{adv}\|_2 + J(\theta, x_{adv}, t) \qquad \text{s.t.} \quad x_{adv} \in [0, 1]^n$$
The above formulation minimizes two objectives: the left term measures the distance ($L_2$ norm) between the input and the adversarial example, while the right term represents the cross-entropy loss. It is used only as a targeted attack.
Deep Fool.
The Deep Fool attack, proposed by [24], approximates the decision boundaries of a neural network as linear hyperplanes and uses an iterative optimization algorithm similar to the Newton-Raphson method to find the smallest perturbation which causes a misclassification. It is used only as an untargeted attack.
Madry et al.
[22] proposed an attack based on projected gradient descent (PGD), which relies on local first-order information about the target model. The method is similar to FGSM and BIM, except that it uses random starting positions for generating adversarial examples.
Carlini & Wagner L2 (CWL2).
The Carlini & Wagner L2 attack [5] follows an optimization problem similar to that of L-BFGS but replaces cross-entropy with a cost function that depends on the pre-softmax logits of the network. In particular, the attack solves the following problem:
$$\text{minimize} \quad \|\delta\|_2 + c \cdot f(x + \delta) \qquad \text{s.t.} \quad x + \delta \in [0, 1]^n$$
where $\delta$ is the perturbation made to the input and $f$ is the objective function:
$$f(x') = \max\left(\max_{i \neq t} Z(x')_i - Z(x')_t,\ -\kappa\right)$$
Here, $Z(\cdot)$ represents the pre-softmax logits of the network and $\kappa$ controls the desired confidence margin. In short, the attack aims to maximize the logit value of the target class while minimizing the norm of the input perturbations.
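The logit-based cost can be written down directly. Below is a sketch of the standard CW targeted cost with confidence parameter kappa (the clamp at -kappa follows [5]); it is non-positive exactly when the target logit dominates all others by at least kappa:

```python
import numpy as np

def cw_cost(logits, target, kappa=0.0):
    """CW targeted cost on pre-softmax logits Z:
    f(x') = max(max_{i != t} Z_i - Z_t, -kappa)."""
    z = np.asarray(logits, dtype=float)
    other = np.max(np.delete(z, target))   # largest non-target logit
    return max(other - z[target], -kappa)
```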
3 Improving adversarial robustness
In this work we make two proposals. First, we propose changes to the conventional neural network architecture and target representations to defend against the adversarial attacks described in Section 2.3. Second, we propose a modified, more effective CWL2 attack that is specifically tailored to our proposed defense.
3.1 Training on meansquared error
Instead of the conventional cross-entropy loss, we propose to use MSE to compute the error between the output $f_\theta(x)$ of the model and the target $y_c \in T$, where $T$ is the set of target representations for all classes. During inference, we select the output class whose target representation yields the smallest Euclidean distance to $f_\theta(x)$.
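A sketch of this training loss and inference rule, assuming a matrix `target_reps` holding one target representation per class (one-hot rows or codewords alike); the function names are ours:

```python
import numpy as np

def mse_loss(output, target):
    """Mean-squared error between the network output and the class's target."""
    return float(np.mean((np.asarray(output) - np.asarray(target)) ** 2))

def nearest_target_class(output, target_reps):
    """Decode a network output as the class whose target representation is
    closest in Euclidean distance (the inference rule described above)."""
    dists = np.linalg.norm(np.asarray(target_reps) - np.asarray(output), axis=1)
    return int(np.argmin(dists))
```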
3.2 Randomized target representations
Instead of using one-hot encodings as target representations, we represent each target class as a codeword from a random codebook. Specifically, the target representations corresponding to the $k$ classes are sampled once at the beginning of training from a uniform distribution seeded by a secret key. To match the representation space of the network output to the targets, the conventional softmax layer is replaced with a tanh activation whose number of outputs equals the codeword length.
3.3 Modified CWL2 attack
The Carlini & Wagner L2 attack makes several assumptions about the target network's architecture through the cost function described in Section 2.3, chiefly that the highest logit value corresponds to the most likely class. Applying our proposed modifications breaks these assumptions: the outputs of the network are tanh activations, and the length of the output no longer corresponds to the number of classes in the dataset. We thus propose a simple modification to the CWL2 attack in which the cost function considers the distance, in some metric space $d$, between the logits and the target representations.
Like with the Carlini & Wagner L2 attack, the cost satisfies $f(x_{adv}) \leq 0$ if and only if the model predicts the target class. Using the change-of-variables formulation of [5] to enforce box constraints on the perturbations, our attack finds a perturbation $\delta$ which optimizes the following objective:
$$\text{minimize} \quad \|\delta\|_2 + c \cdot f(x + \delta)$$
where $c$ is a trade-off constant that controls the importance of the size of the perturbations (larger values of $c$ allow for larger distortions). For our experiments, we define $d$ as the Euclidean distance.
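A sketch of a distance-based cost consistent with the description above. The margin form below is one natural instantiation, chosen because it is non-positive exactly when the target codeword is the nearest one; it is our reading, not necessarily the authors' exact $f$:

```python
import numpy as np

def modified_cw_cost(logits, codebook, target):
    """Distance-based targeted cost (our illustrative margin form):
    d(Z, y_t) - min_{c != t} d(Z, y_c), with d the Euclidean distance.
    Non-positive exactly when the target codeword is nearest, i.e. when the
    nearest-codeword decoder predicts the target class."""
    d = np.linalg.norm(np.asarray(codebook) - np.asarray(logits), axis=1)
    other = np.min(np.delete(d, target))   # distance to closest non-target codeword
    return float(d[target] - other)
```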
3.4 Lipschitz bounds and robustness
Earlier works have suggested that the sensitivity of neural networks to adversarial perturbations can be measured with the upper Lipschitz bound of each network layer [33]. Parseval Networks [7], for example, introduced a layer-wise regularization technique for improving robustness by enforcing smaller global Lipschitz bounds. More specifically, [7] showed that the expected adversarial loss is bounded by the clean generalization loss plus a term that grows with the network's Lipschitz constant. In other words, the efficacy of an adversarial attack depends on the generalization error of the target model as well as the Lipschitz bounds of its layers, which suggests that smaller Lipschitz bounds indicate a more robust model. For both fully-connected and convolutional layers, the bound can be measured by calculating the layer's operator norm. The operator norm of the $l$-th fully-connected layer is simply the largest singular value of its weight matrix $W_l$, so the Lipschitz constant of the $l$-th layer is $\Lambda_l = \sigma_{\max}(W_l)$. For convolutional kernels, we rely on the formulation in [33], which applies the two-dimensional discrete Fourier transform to find the largest singular values.
Section 5.6 presents empirical results demonstrating that simply changing the loss function from cross-entropy to mean-squared error can yield model parameters with significantly smaller Lipschitz bounds.
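The two operator-norm computations can be sketched as follows. The dense case is a plain SVD; the convolutional case follows the Fourier-transform formulation (a per-frequency SVD of the transformed kernel, exact for circular convolution; `n` is the spatial size of the layer input and is an assumption of this sketch):

```python
import numpy as np

def dense_operator_norm(W):
    """Spectral (operator) norm of a fully-connected weight matrix:
    its largest singular value."""
    return float(np.linalg.svd(W, compute_uv=False)[0])

def conv_operator_norm(kernel, n):
    """Operator norm of a circular 2-D convolution with kernel of shape
    (h, w, c_in, c_out) applied to an n x n input: the maximum singular
    value over the per-frequency c_in x c_out matrices of the 2-D DFT
    of the (zero-padded) kernel."""
    c_in, c_out = kernel.shape[2], kernel.shape[3]
    transforms = np.fft.fft2(kernel, (n, n), axes=(0, 1))   # (n, n, c_in, c_out)
    svals = np.linalg.svd(transforms.reshape(n * n, c_in, c_out),
                          compute_uv=False)                 # batched SVD
    return float(svals.max())
```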
4 Experimental setup
In this section we describe the evaluation datasets, evaluation models and adversarial image generation process.
4.1 Datasets
CIFAR10 [16] is a small image classification dataset with 10 classes. It contains 60,000 thumbnail-sized images of dimensions 32×32×3, of which 10,000 are withheld for testing.
MNIST [21] is another image classification dataset, containing monochromatic 28×28 thumbnails of handwritten digits. It comprises 60,000 training images and 10,000 testing images.
Fashion-MNIST [38] is a relatively new image classification dataset containing thumbnail images of 10 types of clothing (shoes, shirts, etc.); it acts as a drop-in replacement for MNIST.
4.2 Models evaluated
We use three variants of the DenseNet model to generate adversarial examples:

O:SOFTMAX:CE refers to a DenseNet model with softmax activations trained on cross-entropy loss and one-hot target representations.

O:SOFTMAX:MSE refers to a DenseNet model with softmax activations trained on MSE and one-hot target representations.

R:TANH:MSE refers to a DenseNet model with tanh activations trained on MSE using codeword target representations of fixed length.
We have evaluated the robustness of the R:TANH:MSE model with different codeword lengths (64, 256, and 1024) but found no significant discrepancies in the results.
Table 1: Parameter varied for each attack.
Attack  Modified parameter
Basic Iterative Method  epsilon (ε)
Carlini & Wagner L2  initial constant (c)
Deep Fool  max iterations
Fast Gradient Sign Method  epsilon (ε)
L-BFGS Attack  initial constant (c)
Madry et al.  epsilon (ε)
Momentum Iterative Method  epsilon (ε)
Table 2: Attack parameters held constant (Cleverhans defaults).
Attack  Parameters
Basic Iterative Method  eps_iter = 0.05, nb_iter = 10
Carlini & Wagner L2  binary_search_steps = 5, max_iterations = 1000
Deep Fool  nb_candidate = 10, overshoot = 0.02
L-BFGS  binary_search_steps = 5, max_iterations = 1000
Madry et al.  eps_iter = 0.01, nb_iter = 40
Momentum Iterative Method  eps_iter = 0.06, nb_iter = 10
Table 3: Model accuracies under untargeted white-box attacks (higher is better).
CWL2 (c)  MIM (ε)  Deep Fool (max iterations)
Setting  0.01  0.1  1  10  0.01  0.05  0.1  0.2  10  20  30  40
CIFAR10  
O:SOFTMAX:CE  0.022  0.022  0.022  0.022  0.682  0.043  0.041  0.041  0.159  0.049  0.034  0.031 
O:SOFTMAX:MSE  0.078  0.044  0.039  0.039  0.838  0.595  0.509  0.467  0.112  0.069  0.065  0.061 
R:TANH:MSE  0.583  0.584  0.586  0.585  0.919  0.701  0.593  0.536  0.583  0.582  0.582  0.582 
MNIST  
O:SOFTMAX:CE  0.008  0.008  0.008  0.008  0.994  0.661  0.012  0.007  0.009  0.008  0.008  0.008 
O:SOFTMAX:MSE  0.897  0.182  0.123  0.118  0.997  0.986  0.956  0.831  0.100  0.074  0.059  0.049 
R:TANH:MSE  0.995  0.995  0.975  0.983  0.995  0.995  0.994  0.973  0.815  0.815  0.815  0.815 
FMNIST  
O:SOFTMAX:CE  0.041  0.041  0.041  0.041  0.196  0.038  0.035  0.034  0.049  0.042  0.041  0.041 
O:SOFTMAX:MSE  0.156  0.076  0.057  0.049  0.836  0.304  0.211  0.142  0.082  0.064  0.059  0.056 
R:TANH:MSE  0.946  0.942  0.946  0.945  0.902  0.691  0.574  0.568  0.935  0.935  0.935  0.935 
BIM (ε)  FGSM (ε)  Madry et al. (ε)
Setting  0.01  0.05  0.1  0.2  0.01  0.05  0.1  0.2  0.02  0.04  0.08  0.1 
CIFAR10  
O:SOFTMAX:CE  0.751  0.053  0.042  0.042  0.743  0.291  0.193  0.139  0.301  0.050  0.041  0.041 
O:SOFTMAX:MSE  0.807  0.424  0.240  0.174  0.879  0.729  0.666  0.535  0.790  0.707  0.668  0.608 
R:TANH:MSE  0.850  0.634  0.390  0.213  0.923  0.699  0.604  0.451  0.923  0.897  0.877  0.839 
MNIST  
O:SOFTMAX:CE  0.994  0.628  0.015  0.008  0.994  0.949  0.654  0.227  0.983  0.809  0.263  0.008 
O:SOFTMAX:MSE  0.997  0.929  0.490  0.196  0.997  0.988  0.985  0.774  0.992  0.983  0.975  0.896 
R:TANH:MSE  0.995  0.882  0.429  0.196  0.995  0.995  0.918  0.332  0.995  0.995  0.995  0.993 
FMNIST  
O:SOFTMAX:CE  0.564  0.038  0.037  0.036  0.659  0.321  0.225  0.147  0.046  0.036  0.033  0.029 
O:SOFTMAX:MSE  0.815  0.296  0.176  0.142  0.882  0.509  0.362  0.224  0.731  0.542  0.425  0.315 
R:TANH:MSE  0.799  0.233  0.089  0.051  0.905  0.671  0.389  0.185  0.901  0.863  0.829  0.802 
4.3 Generating adversarial examples
For each dataset mentioned in Section 4.1, we train a model on the training set and generate adversarial examples using the test set. For targeted attacks, we randomly sample a target class for each image in the test set.
We evaluate the robustness of each model listed in Section 4.2 against the attacks listed in Table 1 under the white-box setting. For the R:TANH:MSE model, the attacker has access to the codeword representations. We also evaluate model robustness against transfer attacks, the black-box setting described in Section 2.3. Finally, we further measure the robustness of our proposed model using the modified CWL2 attack.
All experiments are implemented using TensorFlow [1], a popular framework for building deep learning algorithms.
4.3.1 Attack parameters
For a given attack, we generate adversarial examples across a range of values for a particular parameter which controls the magnitude of the perturbations. Table 1 lists the parameter that is varied for each attack, whereas Table 2 lists the parameters held constant. We use the default values defined in Cleverhans for the constant parameters.
4.3.2 Adapting attacks to our proposed techniques
The attacks described in Section 2.3 are implemented using the Cleverhans library [25]. By default, the attacks assume that the model outputs softmax predictions and that the targets are represented as one-hot vectors; hence the internal loss function for some attacks (e.g., gradient-based iterative attacks) is predefined as cross-entropy. Because the cross-entropy loss function is not compatible with the R:TANH:MSE model, we have adapted the library to use mean-squared error whenever the target model was also trained on mean-squared error. These adaptations are important in preserving the white-box assumption of each attack.
5 Experimental observations
In this section, we present and analyze the performance of the evaluated models under different attack scenarios: untargeted and targeted attacks (Section 5.2), black-box attacks (Section 5.3), and our modified CWL2 attack (Section 5.4). Benchmark performance on the original datasets is presented in Section 5.1.
5.1 Clean test performance
Table 4 lists the accuracy of each model on each clean test dataset. We observe minimal differences in accuracy across the models; hence our proposed modifications maintain state-of-the-art classification performance.
Table 4: Accuracy of each model on the clean test sets.
CIFAR10  MNIST  FMNIST

O:SOFTMAX:CE  0.933  0.996  0.948 
O:SOFTMAX:MSE  0.931  0.997  0.948 
R:TANH:MSE  0.930  0.996  0.945 
Table 5: Targeted white-box attack success rates (lower is better).
L-BFGS (c)  BIM (ε)  Madry et al. (ε)
Setting  0.01  0.1  1  10  0.1  0.2  0.3  0.4  0.04  0.06  0.08  0.1 
CIFAR10  
O:SOFTMAX:CE  1.00  1.00  1.00  1.00  0.997  1.00  1.00  1.00  0.934  0.998  1.00  1.00 
O:SOFTMAX:MSE  0.667  0.864  0.955  0.994  0.461  0.624  0.658  0.664  0.266  0.343  0.402  0.441 
R:TANH:MSE  0.272  0.475  0.554  0.564  0.230  0.337  0.353  0.353  0.242  0.345  0.426  0.467 
MNIST  
O:SOFTMAX:CE  1.00  1.00  1.00  1.00  0.851  0.992  0.997  0.999  0.057  0.464  0.828  0.94 
O:SOFTMAX:MSE  0.040  0.536  0.92  0.991  0.316  0.539  0.597  0.612  0.008  0.042  0.163  0.269 
R:TANH:MSE  0.045  0.457  0.72  0.776  0.057  0.129  0.169  0.184  0.007  0.068  0.154  0.245 
FMNIST  
O:SOFTMAX:CE  1.00  1.00  1.00  1.00  0.957  1.00  1.00  1.00  0.999  1.00  1.00  1.00 
O:SOFTMAX:MSE  0.571  0.87  0.97  0.992  0.457  0.581  0.600  0.603  0.464  0.589  0.648  0.659 
R:TANH:MSE  0.644  0.808  0.826  0.832  0.807  0.926  0.938  0.940  0.626  0.724  0.794  0.834 
CWL2 (c)  MIM (ε)  FGSM (ε)
Setting  0.01  0.1  1  10  0.1  0.2  0.3  0.4  0.1  0.2  0.3  0.4 
CIFAR10  
O:SOFTMAX:CE  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.445  0.316  0.231  0.182 
O:SOFTMAX:MSE  0.756  0.842  0.861  0.867  0.351  0.433  0.459  0.468  0.046  0.061  0.071  0.083 
R:TANH:MSE  0.368  0.362  0.361  0.346  0.095  0.136  0.137  0.160  0.028  0.044  0.081  0.082 
MNIST  
O:SOFTMAX:CE  1.00  1.00  1.00  1.00  0.908  0.997  0.998  0.998  0.095  0.131  0.115  0.113 
O:SOFTMAX:MSE  0.176  0.592  0.677  0.669  0.153  0.319  0.347  0.357  0.014  0.028  0.052  0.067 
R:TANH:MSE  0.006  0.006  0.006  0.002  0.023  0.040  0.052  0.050  0.007  0.031  0.047  0.067 
FMNIST  
O:SOFTMAX:CE  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.213  0.158  0.122  0.109 
O:SOFTMAX:MSE  0.658  0.812  0.845  0.851  0.394  0.396  0.394  0.382  0.073  0.085  0.102  0.107 
R:TANH:MSE  0.583  0.592  0.576  0.548  0.114  0.140  0.143  0.147  0.048  0.081  0.088  0.097 
5.2 Untargeted and targeted attacks
Table 3 lists the accuracies of the models against untargeted white-box attacks. Both the O:SOFTMAX:MSE and R:TANH:MSE models demonstrate higher accuracies on the adversarial examples than the O:SOFTMAX:CE model; we observe an increase in accuracy of up to 98.7 percentage points. Similar results can be observed in Table 5, where the O:SOFTMAX:MSE and R:TANH:MSE models achieve a consistent decrease in targeted attack success rates of up to 99.8 percentage points.
5.3 Black-box attacks
Table 6 shows the accuracies of the O:SOFTMAX:MSE and R:TANH:MSE models under transfer attacks. Our proposed models demonstrate more robustness to these black-box attacks than to white-box attacks with the same configurations. Though this is expected behavior, it is important to evaluate a defense under multiple threat scenarios.
Table 6: Model accuracies under black-box transfer attacks.
Setting  CWL2  Deep Fool  MIM
CIFAR10  
O:SOFTMAX:MSE  0.483  0.451  0.895 
R:TANH:MSE  0.612  0.617  0.926 
MNIST  
O:SOFTMAX:MSE  0.996  0.984  0.997 
R:TANH:MSE  0.996  0.973  0.995 
FMNIST  
O:SOFTMAX:MSE  0.937  0.933  0.839 
R:TANH:MSE  0.952  0.946  0.935 
Table 7: Targeted success rates of the original CWL2 attack and our modified attack (lower is better).
CWL2 (c)  Ours (c)
Setting  0.1  1.0  0.1  1.0
CIFAR10  
O:SOFTMAX:CE  1.000  1.000  1.000  1.000 
R:TANH:MSE  0.368  0.362  0.859  0.868 
MNIST  
O:SOFTMAX:CE  1.000  1.000  1.000  1.000 
R:TANH:MSE  0.006  0.006  0.715  0.772 
FMNIST  
O:SOFTMAX:CE  1.000  1.000  1.000  1.000 
R:TANH:MSE  0.583  0.592  0.798  0.829 
5.4 Modified CWL2 attack
Table 7 compares our proposed attack with the CWL2 attack. The results show that our attack maintains its efficacy against the O:SOFTMAX:CE model while significantly increasing its success rate against the R:TANH:MSE model, by up to 70.9 percentage points. We note that increasing the initial constant $c$ for our attack yields increased success rates, which aligns with the intuition that this parameter controls the importance of the attack's success, as highlighted in Section 3.3. We also observe that, despite the increase in the attack's efficacy, the R:TANH:MSE model remains more robust than the O:SOFTMAX:CE model, with a decrease in success rates of up to 28.5 percentage points.
5.5 Distortion vs. performance
On page 1, Figure 1 displays adversarial images generated from targeted white-box Madry et al. attacks on the O:SOFTMAX:CE and R:TANH:MSE models, respectively. We choose the lowest $\epsilon$ for which the attack achieves a success rate of 100%. It is clear that the R:TANH:MSE model requires much larger perturbations for an attack to achieve the same success rate as against the O:SOFTMAX:CE model.
Figure 2 displays adversarial images generated using the Momentum Iterative Method against both the O:SOFTMAX:CE and R:TANH:MSE models at a fixed perturbation budget. We observe that the R:TANH:MSE model is robust even against adversarial images whose perturbations are clearly perceptible to humans.
Finally, we visualize adversarial examples generated using our modified CWL2 attack against the R:TANH:MSE model in Figure 3, where the attack achieves higher success rates than the original attack. The perturbations made to these images are much less perceptible than those in the adversarial examples of Figures 1 and 2.
5.6 Comparing upper Lipschitz bounds
Figure 4 compares the upper Lipschitz bounds of the convolutional layers of the O:SOFTMAX:CE and O:SOFTMAX:MSE models. The upper bounds for the O:SOFTMAX:MSE model are consistently smaller than those of the O:SOFTMAX:CE model across each dataset, by up to a factor of three, supporting our hypothesis that models trained to minimize mean-squared error are more robust to small perturbations.
6 Related work
Several defenses have also been proposed. To date, the most effective defense technique is adversarial training ([19], [37], [31], [34]), in which the model is trained on a mix of clean and adversarial data. This has been shown to provide a regularization effect that makes models more robust to attacks.
[27] proposed defensive distillation, a mechanism whereby a model is trained on soft labels generated by another 'teacher' network in order to prevent overfitting. Other methods include introducing randomness to, or applying transformations on, the input data and/or the layers of the network ([11], [8], [28], [39]). However, [3] identified that the apparent robustness of several defenses can be attributed to computations and transformations that mask the gradients and thus break existing attacks that rely on gradients to generate adversarial examples. Their work demonstrates that small, tailored modifications to the attacks can circumvent these defenses completely.
7 Conclusion
We have reported experimental results demonstrating the adversarial robustness of models that do not follow conventional specifications. We have observed that simply changing the loss function minimized during training can greatly impact the robustness of a neural network against adversarial attacks. Our evaluation strategy is manifold, consisting of existing attacks, new attacks adjusted to our proposed modifications, and a spectral analysis of the model's parameters. The increase in robustness observed in our experiments suggests the importance of considering alternatives to conventional design choices when making neural networks more secure. Future work will investigate why such modifications improve the robustness of neural networks.
References

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[2] Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. Generating natural language adversarial examples. arXiv preprint arXiv:1804.07998, 2018.
[3] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
[4] Arjun Nitin Bhagoji, Warren He, Bo Li, and Dawn Song. Exploring the space of black-box attacks on deep neural networks. arXiv preprint arXiv:1712.09491, 2017.
[5] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016.
[6] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. arXiv preprint arXiv:1801.01944, 2018.
[7] Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. arXiv preprint arXiv:1704.08847, 2017.
[8] Guneet S. Dhillon, Kamyar Azizzadenesheli, Zachary C. Lipton, Jeremy Bernstein, Jean Kossaifi, Aran Khanna, and Anima Anandkumar. Stochastic activation pruning for robust adversarial defense. arXiv preprint arXiv:1803.01442, 2018.
[9] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[10] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[11] Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens van der Maaten. Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117, 2017.
[12] Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, et al. Deep Speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567, 2014.
[13] Gao Huang, Zhuang Liu, Kilian Q. Weinberger, and Laurens van der Maaten. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[14] Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush. Character-aware neural language models. In AAAI, pages 2741–2749, 2016.
[15] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[16] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
[17] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[18] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
[19] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
[20] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[21] Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010.
[22] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[23] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[24] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
[25] Nicolas Papernot, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Fartash Faghri, Alexander Matyasko, Karen Hambardzumyan, Yi-Lin Juang, Alexey Kurakin, Ryan Sheatsley, Abhibhav Garg, and Yen-Chen Lin. cleverhans v2.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768, 2017.
[26] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.
[27] Nicolas Papernot, Patrick D. McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. arXiv preprint arXiv:1511.04508, 2015.
[28] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605, 2018.
[29] John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning, pages 1889–1897, 2015.
[30] Hanie Sedghi, Vineet Gupta, and Philip M. Long. The singular values of convolutional layers. arXiv preprint arXiv:1805.10408, 2018.
[31] Aman Sinha, Hongseok Namkoong, and John Duchi. Certifying some distributional robustness with principled adversarial training. 2018.
[32] Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766, 2017.
[33] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[34] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
[35] Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W. Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. In SSW, page 125, 2016.
[36] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
[37] Xi Wu, Uyeong Jang, Jiefeng Chen, Lingjiao Chen, and Somesh Jha. Reinforcing adversarial robustness using model confidence induced by adversarial training. In International Conference on Machine Learning, pages 5330–5338, 2018.
[38] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017.
[39] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991, 2017.
[40] Hiromu Yakura and Jun Sakuma. Robust audio adversarial example for a physical attack. arXiv preprint arXiv:1810.11793, 2018.
[41] Zhuolin Yang, Bo Li, Pin-Yu Chen, and Dawn Song. Characterizing audio adversarial examples using temporal dependency. arXiv preprint arXiv:1809.10875, 2018.
[42] Xiaoyong Yuan, Pan He, Qile Zhu, Rajendra Rana Bhat, and Xiaolin Li. Adversarial examples: Attacks and defenses for deep learning. arXiv preprint arXiv:1712.07107, 2017.