1 Introduction
Deep neural networks (DNNs) have been widely used to tackle numerous machine learning problems that were once believed to be challenging. With their remarkable ability to fit training data, DNNs have achieved revolutionary successes in many fields such as computer vision, natural language processing, and robotics. However, they have been shown to be vulnerable to adversarial examples, which are generated by adding carefully crafted perturbations to original images. These adversarial perturbations can arbitrarily change the network's prediction while often being too small to affect human recognition [26, 12]. This phenomenon raises security concerns for practical applications of deep learning.
Two main types of attack settings have been considered in recent research [10, 3, 6, 22]: black-box and white-box. In the black-box setting, the attacker can provide arbitrary inputs and receive the corresponding predictions, but cannot access the gradients or model parameters. In the white-box setting, the attacker can analytically compute the model's gradients and has full access to the model architecture and weights. In this paper, we focus on defending against white-box attacks, the harder task.
Recent work [25] presented both theoretical arguments and an empirical one-to-one relationship between input dimension and adversarial vulnerability, showing that the vulnerability of neural networks grows with the input dimension. Therefore, reducing the data dimension may help improve the robustness of deep neural networks. Furthermore, a consensus in the high-dimensional data analysis community is that a method works well on high-dimensional data precisely because the data are not truly high-dimensional [14]. Such high-dimensional data, e.g., images, are actually embedded in a space of much lower dimension. Hence, carefully reducing the input dimension may improve the robustness of the model without sacrificing performance.

Inspired by the observation that the intrinsic dimension of image data is much smaller than its pixel-space dimension [14] and that the vulnerability of a model grows with its input dimension [25]
, we propose a defense framework that embeds input images into a low-dimensional space using a deep encoder and performs classification based on the latent embedding with a classifier network. However, arbitrarily projecting input images to a low-dimensional space with a deep encoder does not guarantee improved robustness, because many mapping functions from the raw input space to the low-dimensional space, including pathological ones, are capable of minimizing the classification loss. To constrain the mapping function, we regularize the distribution of the embedding space by leveraging optimal transport theory. We call the new classification framework the Optimal Transport Classifier (OT-Classifier). More specifically, we introduce a discriminator in the latent space which tries to separate the generated code vectors produced by the encoder network from ideal code vectors sampled from a prior distribution, i.e., a standard Gaussian distribution. Employing a competitive mechanism similar to the one demonstrated by Generative Adversarial Networks [9], the discriminator enforces the embedding space of the model to follow the prior distribution.

In our OT-Classifier framework, the encoder and discriminator together project the input data to a well-shaped low-dimensional space, and the classifier then performs prediction based on the low-dimensional embedding. Based on optimal transport theory, the proposed OT-Classifier minimizes the discrepancy between the distribution of the true label and the distribution of the framework output, thus retaining only the features important for classification in the embedding space. With a small embedding dimension, the effect of an adversarial perturbation is largely diminished through the projection process.
We compare OT-Classifier with other state-of-the-art defense methods on MNIST, CIFAR-10, STL-10 and Tiny ImageNet. Experimental results demonstrate that our proposed OT-Classifier outperforms the other defense methods by a large margin. To sum up, this paper makes the following three main contributions:


A novel unified end-to-end robust deep neural network framework against adversarial attacks is proposed, in which the input image is first projected to a low-dimensional space and then classified.

An objective is introduced to minimize the optimal transport cost between the true class distribution and the framework output distribution, guiding the encoder and discriminator to project the input image to a low-dimensional space without losing features important for classification.

Extensive experiments demonstrate the robustness of the proposed OT-Classifier framework under white-box attacks, and show that OT-Classifier combined with adversarial training outperforms other state-of-the-art approaches on several benchmark image datasets.
2 Related Work
In this section, we summarize related work in three categories: attack methods, defense mechanisms and optimal transport theory. We first discuss different white-box attack methods, followed by a description of defense mechanisms against these attacks, and finally review optimal transport theory.
2.1 Attack Methods
Under the white-box setting, attackers have all information about the targeted neural network, including the network structure and gradients. Most white-box attacks generate adversarial examples based on the gradient of the loss function with respect to the input. The fast gradient sign method (FGSM), proposed in [10], generates adversarial examples based on the sign of the gradient. Many other white-box attack methods have been proposed recently [20, 5, 17, 4]; among them, the C&W and PGD attacks have been widely used to test the robustness of machine learning models.

C&W attack: The adversarial attack method proposed by Carlini and Wagner [4]
is one of the strongest white-box attack methods. It formulates the adversarial example generation process as an optimization problem, whose objective aims at increasing the probability of the target class while minimizing the distance between the adversarial example and the original input image. Therefore, the C&W attack can be viewed as a gradient-descent based adversarial attack.

PGD attack: The projected gradient descent attack, proposed by [17], finds adversarial examples within an ℓ∞ ball around the image. The PGD attack takes steps in the direction that decreases the probability of the original class most, then projects the result back onto the ℓ∞ ball around the input. An advantage of the PGD attack over the C&W attack is that it allows direct control of the distortion level by changing ε, whereas for the C&W attack one can only do so indirectly via hyperparameter tuning.
Both the C&W attack and the PGD attack have been frequently used to benchmark defense algorithms due to their effectiveness [2]. In this paper, we mainly use the untargeted PGD attack to evaluate the effectiveness of defense methods in the white-box setting.
Instead of crafting a different adversarial perturbation for each input image, [19] proposed an algorithm to construct a universal perturbation that causes natural images to be misclassified. However, since this universal perturbation is image-agnostic, it is usually larger than the image-specific perturbations generated by PGD and C&W.
2.2 Defense Mechanisms
Much work has been done to improve the robustness of deep neural networks. Defenses that aim to increase model robustness fall into three main categories: i) augmenting the training data with adversarial examples to enhance existing classifiers [17, 21, 10]; ii) leveraging model-specific strategies to enforce model properties such as smoothness [23]; and iii) trying to remove adversarial perturbations from the inputs [28, 24, 18]. We select three representative methods that are effective in the white-box setting.
Adversarial training: Augmenting the training data with adversarial examples can increase the robustness of a deep neural network. Madry et al. [17] recently introduced a min-max formulation against adversarial attacks. The proposed model is trained not only on the original dataset but also on adversarial examples within an ℓ∞ ball around each input image.
Random Self-Ensemble: Another effective defense method in the white-box setting is RSE [15]. The authors proposed a "noise layer" that fuses the output of each layer with Gaussian noise, and empirically showed that the noise layer helps improve the robustness of deep neural networks. The noise layer is applied in both the training and testing phases, so prediction accuracy is not largely affected.
Defense-GAN: Defense-GAN [24] leverages the expressive capability of GANs to defend deep neural networks against adversarial examples. It is trained to project input images onto the range of the GAN's generator to remove the effect of adversarial perturbations. Another defense method that uses a generative model to filter out noise is MagNet, proposed by [18]. However, the differences between OT-Classifier and these two methods are clear: OT-Classifier focuses on reducing the dimension and performing classification based on the low-dimensional embedding, while Defense-GAN and MagNet mainly apply a generative model to filter out the adversarial noise, with classification still performed in the original input space. [24] showed that Defense-GAN is more robust than MagNet, so we only compare with Defense-GAN in the experiments.
2.3 Optimal Transport Theory
There are various ways to define the distance or divergence between the target distribution and the model distribution. In this paper, we turn to optimal transport theory (more details are available at https://optimaltransport.github.io/slides/), which induces a much weaker topology than many alternatives. In real applications, data are usually embedded in a space of much lower dimension, such as a nonlinear manifold. The Kullback-Leibler divergence, Jensen-Shannon divergence and total variation distance are not sensible cost functions when learning distributions supported on low-dimensional manifolds [1]. In contrast, the optimal transport cost remains sensible in this setting.
Kantorovich's distance induced by the optimal transport problem is given by

$$W_c(P_X, P_Y) := \inf_{\Gamma \in \mathcal{P}(X \sim P_X,\, Y \sim P_Y)} \mathbb{E}_{(X, Y) \sim \Gamma}\,[c(X, Y)],$$

where $\mathcal{P}(X \sim P_X, Y \sim P_Y)$ is the set of all joint distributions of $(X, Y)$ with marginals $P_X$ and $P_Y$, and $c(x, y)$ is any measurable cost function. $W_c(P_X, P_Y)$ measures the divergence between the probability distributions $P_X$ and $P_Y$. When the probability measures are on a metric space and $c(x, y) = d^p(x, y)$ for a metric $d$ and $p \ge 1$, the $p$-th root of $W_c$ is called the $p$-Wasserstein distance. Recently, Tolstikhin et al. [27] introduced a new algorithm to build a generative model of the target data distribution based on the Wasserstein distance. The proposed generative model generates samples of better quality, as measured by the FID score.
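To make the transport cost concrete, the sketch below computes the 1-Wasserstein distance between two one-dimensional empirical distributions, where the optimal coupling is simply the sorted (quantile) matching. The function name and toy data are illustrative and not part of the paper's implementation.

```python
import numpy as np

def w1_empirical(a, b):
    """1-Wasserstein distance between two equal-size 1-D empirical
    distributions: the optimal coupling matches sorted samples."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

# Shifting a distribution by +1 transports every unit of mass a distance
# of exactly 1, so the cost is 1 regardless of the distribution's shape.
rng = np.random.default_rng(0)
z = rng.standard_normal(1000)
print(w1_empirical(z, z + 1.0))  # ≈ 1.0
```

Note how the cost depends only on how far mass must move, which is what makes it a sensible discrepancy even between distributions with disjoint supports.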
3 Proposed Framework: Optimal Transport Classifier
We propose a novel defense framework, OT-Classifier, which aims at projecting the image data to a low-dimensional space to remove noise and stabilize the classification model by minimizing the optimal transport cost between the true label distribution and the distribution of the OT-Classifier output. The encoder and discriminator structures together help diminish the effect of the adversarial perturbation by projecting the input data to a space of lower dimension; the classifier part then performs classification based on the low-dimensional embedding.
3.1 Notations
In this paper, we use the ℓ∞ and ℓ2 distortion metrics to measure similarity. We report ℓ∞ distance in the normalized [0, 1] space, and ℓ2 distance as the total root-mean-square distortion normalized by the total number of pixels.
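As a minimal illustration of these two metrics, the following sketch measures the ℓ∞ and per-pixel root-mean-square ℓ2 distortion of a toy perturbation in the normalized [0, 1] space; the 8×8 "image" and the perturbation are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))                               # toy image in [0, 1]
delta = 0.03 * np.sign(rng.standard_normal((8, 8)))  # crafted perturbation
x_adv = np.clip(x + delta, 0.0, 1.0)                 # keep pixels valid

# l_inf distortion: the largest absolute per-pixel change.
linf = np.abs(x_adv - x).max()

# l_2 distortion: root-mean-square per-pixel change, one common way to
# normalize the total distortion by the number of pixels.
l2 = np.sqrt(np.mean((x_adv - x) ** 2))

print(float(linf), float(l2))  # the RMS can never exceed the max
```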
We use calligraphic letters for sets (e.g., $\mathcal{X}$), capital letters for random variables (e.g., $X$), and lower case letters for their values (e.g., $x$). Probability distributions are denoted with capital letters (e.g., $P_X$) and the corresponding densities with lower case letters (e.g., $p_X$).

Images are projected to a low-dimensional embedding vector through the encoder. The discriminator discriminates between the generated code produced by the encoder and the ideal code sampled from a prior distribution. The classifier performs classification based on the generated code, producing the output label prediction. An overview of the framework is shown in Figure 1.
3.2 Framework Details
At the training stage, the encoder first maps the input to a low-dimensional space, producing the generated code. An ideal code is sampled from the prior distribution, and the discriminator discriminates between the ideal code (positive data) and the generated code (negative data). The classifier predicts the image label based on the generated code. Details of the training process can be found in Algorithm 1.
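The quantities computed in one training iteration can be sketched as follows with stand-in linear networks. This is an illustrative skeleton of the loss computations only (WGAN-style critic gap on the codes, cross-entropy on the latent code, and weight clipping); the shapes and networks are hypothetical, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, in_dim, z_dim, n_cls = 32, 784, 4, 10

# Stand-in linear networks for the encoder, critic and classifier.
W_e = 0.01 * rng.standard_normal((z_dim, in_dim))
w_d = 0.1 * rng.standard_normal(z_dim)
W_c = rng.standard_normal((n_cls, z_dim))

x = rng.random((batch, in_dim))                  # batch of input images
y = rng.integers(0, n_cls, size=batch)           # oracle labels
z_fake = np.tanh(x @ W_e.T)                      # generated codes
z_real = rng.standard_normal((batch, z_dim))     # ideal codes from the prior

# WGAN-style critic gap: the critic is trained to score prior samples
# above generated codes; the encoder is trained to close the gap.
d_gap = (z_real @ w_d).mean() - (z_fake @ w_d).mean()

# Cross-entropy classification loss computed on the latent codes only.
logits = z_fake @ W_c.T
logits = logits - logits.max(axis=1, keepdims=True)
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
cls_loss = -log_probs[np.arange(batch), y].mean()

# Weight clipping, as in Wasserstein GAN, keeps the critic roughly
# 1-Lipschitz between updates.
w_d = np.clip(w_d, -0.01, 0.01)

print(float(d_gap), float(cls_loss))
```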
At inference time, only the encoder and the classifier are used. The input image is first mapped to the low-dimensional space by the encoder, and the resulting latent code is fed into the classifier to obtain the predicted label.
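With linear stand-ins, the inference path reduces to two matrix products, as in this hypothetical sketch (layer shapes and weights are illustrative; a real deployment would use the trained deep encoder and classifier):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 784-dim input (e.g. a flattened 28x28 image) is
# projected to a 4-dim latent code, then classified into 10 classes.
W_enc = 0.01 * rng.standard_normal((4, 784))   # stand-in encoder weights
W_cls = rng.standard_normal((10, 4))           # stand-in classifier weights

def predict(x):
    z = np.tanh(W_enc @ x)   # low-dimensional latent code: the only
    logits = W_cls @ z       # information the classifier ever sees
    return int(np.argmax(logits))

x = rng.random(784)
print(predict(x))
```

The point of the design is visible in the shapes: any adversarial perturbation must survive the 784-to-4 projection before it can influence the logits.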
Our framework can be combined with other state-of-the-art defense methods, such as adversarial training. Since the dimension of the input is reduced to a much lower value, adversarial training also benefits from this dimension reduction. In the experiments, we combine OT-Classifier with adversarial training and compare it with other defense methods.
3.3 Theoretical Analysis
The OT-Classifier framework embeds the features important for classification by minimizing the discrepancy between the distribution of the true label, $P_{Y^*}$, and the distribution of the framework output, $P_C$. In the framework, the classifier $C$ maps a latent code $Z$, sampled from a fixed distribution $P_Z$ on a latent space $\mathcal{Z}$, to the output $Y$. The density of the OT-Classifier output is defined as

$$p_C(y) := \int_{\mathcal{Z}} p_C(y \mid z)\, p_z(z)\, dz, \quad \forall y \in \mathcal{Y}. \qquad (1)$$
In this paper we use a standard Gaussian as the prior distribution $P_Z$, but other priors may be used in different cases. Assume there is an oracle $f$ assigning each image $X$ its true label $Y^* = f(X)$. To minimize the optimal transport cost between the distribution of the true label, $P_{Y^*}$, and the distribution of the OT-Classifier output, $P_C$, it is sufficient to find a conditional distribution $Q(Z|X)$ such that its marginal distribution $Q_Z$ is identical to the prior distribution $P_Z$.
Theorem 1  For $P_C$ as defined in (1) with a deterministic $P_C(Y|Z)$ and any function $C: \mathcal{Z} \to \mathcal{Y}$,

$$W_c(P_{Y^*}, P_C) = \inf_{Q(Z|X):\, Q_Z = P_Z} \mathbb{E}_{P_X}\, \mathbb{E}_{Q(Z|X)}\big[c\big(f(X), C(Z)\big)\big],$$

where the infimum defining $W_c$ on the l.h.s. is taken over $\mathcal{P}(Y^* \sim P_{Y^*}, Y \sim P_C)$, the set of all joint distributions of $(Y^*, Y)$ with marginals $P_{Y^*}$ and $P_C$; $c$ is any measurable cost function; and $Q_Z$ is the marginal distribution of $Z$ when $X \sim P_X$ and $Z \sim Q(Z|X)$. (The proof is deferred to the Appendix.)
Therefore, optimizing the objective on the r.h.s. is equivalent to minimizing the discrepancy between the true label distribution $P_{Y^*}$ and the output distribution $P_C$, so that the features important for classification are embedded in the low-dimensional space. This is the core idea of the paper: summarizing the high-dimensional data in a space of much lower dimension without losing the features important for classification. To implement the r.h.s. objective, the constraint on $Q_Z$ can be relaxed by adding a penalty term. The final objective of OT-Classifier is:
$$D_{OT}(P_{Y^*}, P_C) := \inf_{Q(Z|X) \in \mathcal{Q}} \mathbb{E}_{P_X}\, \mathbb{E}_{Q(Z|X)}\big[c\big(f(X), C(Z)\big)\big] + \lambda \cdot \mathcal{D}_Z(Q_Z, P_Z), \qquad (2)$$

where $\mathcal{Q}$ is any nonparametric set of probabilistic encoders, $\lambda > 0$ is a hyperparameter and $\mathcal{D}_Z$ is an arbitrary divergence between $Q_Z$ and $P_Z$.
To estimate the divergence between $Q_Z$ and $P_Z$, we apply a GAN-based framework, fitting a discriminator $D$ to estimate the 1-Wasserstein distance between $Q_Z$ and $P_Z$ through its dual form:

$$W_1(Q_Z, P_Z) = \sup_{\|D\|_L \le 1} \mathbb{E}_{Z \sim Q_Z}[D(Z)] - \mathbb{E}_{Z \sim P_Z}[D(Z)].$$

We have also tried the Jensen-Shannon divergence but, as expected, the Wasserstein distance provides more stable training and better results. When training the framework, the weight-clipping method proposed for the Wasserstein GAN [1] is applied to help stabilize the training of the discriminator $D$.
4 Experiments
In this section, we compare the performance of our proposed algorithm (OT-Classifier) with other state-of-the-art defense methods on several benchmark datasets:


MNIST [13]: handwritten digit dataset, consisting of 60,000 training images and 10,000 testing images. These are 28×28 black-and-white images in ten classes.

CIFAR-10 [11]: natural image dataset, containing 50,000 training images and 10,000 testing images in ten classes. These are low-resolution 32×32 color images.

STL-10 [7]: color image dataset similar to CIFAR-10, but containing only 5,000 training images and 8,000 testing images in ten classes. The images are of higher resolution, 96×96.

Tiny ImageNet [8]: a subset of the ImageNet dataset. Tiny ImageNet has 200 classes, each with 500 training images and 50 testing images, making it a challenging benchmark for the defense task. The resolution of the images is 64×64.
Various defense methods have been proposed to improve the robustness of deep neural networks. Here we compare our algorithm with state-of-the-art methods that are robust in the white-box setting. Madry's adversarial training (Madry's Adv), proposed in [17], has been recognized as one of the most successful defense methods in the white-box setting, as shown in [2].
The Random Self-Ensemble (RSE) method introduced by [15] adds stochastic components to the neural network, achieving performance similar to Madry's adversarial training algorithm.
Another method we compare with is Defense-GAN [24]. It first trains a generative adversarial network to model the distribution of the training data. At inference time, it finds an output of the generator close to the input image and feeds that output into the classifier. This process "projects" input images onto the range of the GAN's generator, which helps remove the effect of adversarial perturbations. In [24], the authors demonstrated the performance of Defense-GAN on MNIST and Fashion-MNIST, so we compare our method with Defense-GAN on MNIST.
The Optimal Transport Classifier can be combined with other state-of-the-art defense methods. In general, Madry's adversarial training is more robust than RSE, so we combine OT-Classifier with adversarial training (OTCLA+Adv) in our experiments.
4.1 Evaluate Models Under White-box PGD Attack
In this section, we evaluate the defense methods against the untargeted PGD attack, one of the strongest white-box attack methods. Starting from $x^0 = x$, the PGD attack iteratively performs projected gradient descent to update the adversarial example:

$$x^{t+1} = \Pi_{B_\infty(x, \epsilon)}\big(x^t + \alpha \cdot \mathrm{sign}\big(\nabla_x L(M(x^t), y^*)\big)\big),$$

where $M$ is the targeted model, $\Pi_{B_\infty(x, \epsilon)}$ is the projection onto the $\ell_\infty$ ball of radius $\epsilon$ around $x$, $y^*$ is the label of $x$, $L$ is the classification loss, and $\alpha$ is the step size. Clearly, a larger $\epsilon$ allows a larger distortion of the original image. Models are evaluated under different distortion levels $\epsilon$: the larger the distortion, the stronger the attack. Depending on the image scale and type, different datasets are sensitive to different attack strengths.
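The update above can be sketched on a toy differentiable model; here a linear logistic model with an analytic input gradient stands in for the attacked network $M$, and all hyperparameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
w = rng.standard_normal(d)   # weights of a toy linear logistic model

def grad_loss(x, y):
    # Gradient w.r.t. x of the logistic loss -log sigmoid(y * w.x),
    # standing in for the back-propagated gradient of a real network.
    s = 1.0 / (1.0 + np.exp(-y * (w @ x)))
    return -(1.0 - s) * y * w

def pgd(x0, y, eps=0.1, alpha=0.02, steps=10):
    x = x0.copy()
    for _ in range(steps):
        x = x + alpha * np.sign(grad_loss(x, y))  # ascend the loss
        x = x0 + np.clip(x - x0, -eps, eps)       # project back onto the
    return x                                      # l_inf ball around x0

x0 = rng.random(d)
x_adv = pgd(x0, y=1)
print(float(np.abs(x_adv - x0).max()))  # never exceeds eps = 0.1
```

The projection step is what gives the attacker the direct control over the distortion level discussed above: the final perturbation is capped at ε per pixel by construction.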
Models on MNIST are evaluated under ℓ∞ distortion levels from 0 to 0.4, in increments of 0.1. Models on CIFAR-10 and STL-10 are evaluated under distortion levels up to 0.06, and models on Tiny ImageNet up to 0.02. As mentioned in the notation section, all distortion levels are reported in the normalized [0, 1] space. The experimental results are shown in Figure 2. To present the results more clearly, we show part of them in Table 1.
Data | Defense | ε = 0 | 0.1 | 0.2 | 0.3 | 0.4
MNIST | Adv. Training | 99.2 | 97.3 | 86.8 | 35.4 | 2.7
MNIST | OTCLA+Adv | 99.1 | 98.7 | 97.2 | 94.9 | 71.1

Data | Defense | ε = 0 | 0.015 | 0.03 | 0.045 | 0.06
CIFAR-10 | Adv. Training | 82.6 | 68.0 | 42.3 | 21.6 | 12.0
CIFAR-10 | OTCLA+Adv | 84.0 | 67.5 | 51.3 | 35.8 | 23.3
STL-10 | Adv. Training | 63.6 | 53.5 | 36.8 | 25.0 | 18.7
STL-10 | OTCLA+Adv | 60.7 | 52.1 | 40.3 | 30.6 | 24.5

Data | Defense | ε = 0 | 0.004 | 0.01 | 0.016 | 0.02
Tiny ImageNet | Adv. Training | 57.3 | 48.6 | 26.5 | 15.1 | 12.0
Tiny ImageNet | OTCLA+Adv | 54.6 | 50.0 | 36.7 | 25.6 | 21.1
Based on Figure 2 and Table 1, we can see that OT-Classifier improves the robustness of deep neural networks. Comparing OT-Classifier with the model without any defense, OT-Classifier is much more robust on all benchmark datasets. Besides, when the distortion level ε is large, OT-Classifier tends to perform better than other state-of-the-art defense methods on MNIST, CIFAR-10 and Tiny ImageNet. This is most evident on CIFAR-10, where OT-Classifier even outperforms OTCLA+Adv when the attack is strong.
In general, OT-Classifier combined with adversarial training (OTCLA+Adv) is the most robust method across datasets, although on some datasets the clean testing accuracy of OTCLA+Adv is slightly worse than that of Madry's adversarial training.
We also compare Defense-GAN with our method OTCLA+Adv on MNIST. Both methods are evaluated against the C&W untargeted attack, one of the strongest white-box attacks, proposed in [4]. Defense-GAN is evaluated using the method proposed in [2] (code publicly available at https://github.com/anishathalye/obfuscatedgradients/tree/master/defensegan). OTCLA+Adv is evaluated against the C&W untargeted attack with the same hyperparameter values as those used in the evaluation of Defense-GAN. The results are shown in Table 2.
Method | Testing Accuracy
Defense-GAN | 55.0
OTCLA+Adv | 99.1
Based on Table 2, OTCLA+Adv is much more robust than Defense-GAN under this attack.
4.2 Evaluate the Effect of the Discriminator
The OT-Classifier framework consists of three parts, and the classification task is done by the encoder and the classifier. Without the discriminator, the encoder can still project the input images to a low-dimensional space. However, arbitrarily projecting the images to a low-dimensional space with only the encoder cannot improve the robustness of the model; sometimes it even decreases it.

To show this, we fit a framework with only the encoder and classifier parts (ECLA), where the encoder and classifier have the same structures as in OT-Classifier, and compare ECLA with the full OT-Classifier framework. The results are shown in Figure 3.

Based on Figure 3, we observe that OT-Classifier is much more robust than the encoder-plus-classifier structure on MNIST, CIFAR-10 and Tiny ImageNet. It is also more robust on STL-10, though by a smaller margin. The reason might be that STL-10 contains only 5,000 training images at a higher resolution, so it is harder to learn a good embedding from a limited number of images. However, even with limited training data, OT-Classifier is still much more robust than the ECLA structure, which demonstrates that OT-Classifier learns a robust embedding. Notice that the performance of the ECLA structure is similar to that of the model without any defense on CIFAR-10, STL-10 and Tiny ImageNet, and worse on MNIST, which means the robustness of OT-Classifier does not come from the structure design alone.
4.3 Dimension of Embedding Space
One important hyperparameter of the OT-Classifier is the dimension of the embedding space. If the dimension is too small, important features are "collapsed" onto the same dimension; if the dimension is too large, the projection extracts little useful structure, resulting in too much noise and instability. The maximum-likelihood estimator of intrinsic dimension proposed in [14] (code publicly available at https://github.com/OFAI/hubtoolboxpython3) is used to calculate the intrinsic dimension of each image dataset, serving as a guide for selecting the embedding dimension. Changing the sample size used in calculating the intrinsic dimension does not influence the results much. Based on the intrinsic dimension calculated by [14], we test several values around the suggested intrinsic dimension and evaluate the models against the PGD attack. The experimental results are shown in Figure 4.
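A minimal sketch of the Levina-Bickel estimator used above, assuming Euclidean nearest-neighbor distances; this is a compact re-implementation for illustration, not the referenced toolbox.

```python
import numpy as np

def intrinsic_dim_mle(X, k=10):
    """Levina-Bickel maximum-likelihood intrinsic dimension estimate [14]:
    m(x) = [ (1/(k-1)) * sum_{j<k} log(T_k(x)/T_j(x)) ]^{-1},
    averaged over all points x, where T_j is the j-th nearest-neighbor
    distance."""
    # Full pairwise Euclidean distance matrix (fine for small n).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    T = np.sort(D, axis=1)[:, 1:k + 1]   # drop the zero self-distance
    m = (k - 1) / np.log(T[:, -1:] / T[:, :-1]).sum(axis=1)
    return float(m.mean())

# Data on a 2-D plane linearly embedded in 10-D space: even though the
# ambient dimension is 10, the estimate should land near 2.
rng = np.random.default_rng(0)
Z = rng.random((500, 2))
X = Z @ rng.standard_normal((2, 10))
print(round(intrinsic_dim_mle(X), 1))
```

This mirrors the motivating observation of Section 1: the estimator recovers the dimension of the manifold the data lie on, not the pixel-space dimension.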
The final embedding dimension is chosen based on robustness, number of parameters, and testing accuracy when there is no attack. The final embedding dimensions and suggested intrinsic dimensions are shown in Table 3.
Data | Data dim. | Intrinsic dim. | Embedding dim.
MNIST | 784 | 13 | 4
CIFAR-10 | 3,072 | 17 | 16
STL-10 | 27,648 | 20 | 16
Tiny ImageNet | 12,288 | 19 | 20
Based on Figure 4, embedding dimensions close to the calculated intrinsic dimension usually offer better results, except on MNIST. One explanation may be that MNIST is a simple handwritten digit dataset, so classification on MNIST may not require that many dimensions.
4.4 Embedding Visualization
In this section, we compare the embedding learned by the encoder-plus-classifier structure (ECLA) with the embedding learned by OT-Classifier on several datasets. We first generate embeddings of the testing data using the encoder, then project the embedding points to 2D space with t-SNE [16]. Next, we generate adversarial images against ECLA and OT-Classifier using the PGD attack. The adversarial embedding is obtained by feeding the adversarial images into the encoder. Finally, we project the adversarial embedding points to 2D space. The results are shown in Figure 5. The plots in the first row are embedding visualizations for ECLA, and the plots in the second row are embedding visualizations for OT-Classifier. In the adversarial embedding plots, a misclassified point is marked as a "down triangle" (the PGD attack successfully changed the prediction), and a correctly classified point is marked as a "point" (the attack failed).
Based on Figure 5, we can see that ECLA learns a good embedding on legitimate MNIST images: embedding points of different classes are well separated in the 2D space, but under adversarial attack, some embedding points of different classes are mixed together. OT-Classifier, in contrast, generates well-separated embeddings on both legitimate and adversarial images. On CIFAR-10, ECLA cannot generate well-separated embeddings for either legitimate or adversarial images, while OT-Classifier can for both.
5 Conclusion
In this paper, we propose a new defense framework, OT-Classifier, which projects input images to a low-dimensional space to remove adversarial perturbations and stabilizes the model by minimizing the discrepancy between the true label distribution and the framework output distribution. We empirically show that OTCLA+Adv is much more robust than other state-of-the-art defense methods on several benchmark datasets. Future work includes further exploration of the low-dimensional space to improve the robustness of deep neural networks.
6 Appendix
6.1 Proof of Theorem 1
The proof of Theorem 1 is adapted from the proof of Theorem 1 in [27]. Consider certain sets of joint probability distributions of three random variables $(Y^*, Y, Z)$. Here $Y^* = f(X)$ can be taken as the true label of an input image, $Y$ as the output of the framework, and $Z$ as the latent code. $P_{C,Z}(Y, Z)$ represents a joint distribution of the pair $(Y, Z)$, where $Z$ is first sampled from $P_Z$ and then $Y$ from $P_C(Y|Z)$. $P_C$ defined in (1) is the marginal distribution of $Y$ when $(Y, Z) \sim P_{C,Z}$.

The joint distributions, or couplings, between values of $Y^*$ and $Y$ can be written as $\Gamma(Y^*, Y) = \Gamma(Y \mid Y^*)\, P_{Y^*}(Y^*)$ due to the marginal constraint. $\Gamma(Y \mid Y^*)$ can be decomposed into an encoding distribution $Q(Z|X)$ and the generating distribution $P_C(Y|Z)$, and Theorem 1 mainly shows how to factor it through $Z$.

In the first part, we show that if the $P_C(Y|Z)$ are Dirac measures, we have

$$W_c(P_{Y^*}, P_C) = \inf_{P \in \mathcal{P}_{Y^*, Y}} \mathbb{E}_{(Y^*, Y) \sim P}\,[c(Y^*, Y)], \qquad (3)$$

where $\mathcal{P}(Y^* \sim P_{Y^*}, Y \sim P_C)$ denotes the set of all joint distributions of $(Y^*, Y)$ with marginals $P_{Y^*}$ and $P_C$, and likewise for $\mathcal{P}(Y^* \sim P_{Y^*}, Z \sim P_Z)$. The set of all joint distributions of $(Y^*, Y, Z)$ such that $Y^* \sim P_{Y^*}$, $(Y, Z) \sim P_{C,Z}$, and $Y$ is conditionally independent of $Y^*$ given $Z$ is denoted by $\mathcal{P}_{Y^*, Y, Z}$. $\mathcal{P}_{Y^*, Y}$ and $\mathcal{P}_{Y^*, Z}$ denote the sets of marginals on $(Y^*, Y)$ and $(Y^*, Z)$ induced by $\mathcal{P}_{Y^*, Y, Z}$.

From the definitions, it is clear that $\mathcal{P}_{Y^*, Y} \subseteq \mathcal{P}(Y^* \sim P_{Y^*}, Y \sim P_C)$. Therefore, we have

$$W_c(P_{Y^*}, P_C) \le \inf_{P \in \mathcal{P}_{Y^*, Y}} \mathbb{E}_{(Y^*, Y) \sim P}\,[c(Y^*, Y)]. \qquad (4)$$

The identity is satisfied if the $P_C(Y|Z)$ are Dirac measures, such as $P_C(Y|Z) = \delta_{C(Z)}$. This is proved by the following lemma in [27].

Lemma 1  $\mathcal{P}_{Y^*, Y} \subseteq \mathcal{P}(Y^* \sim P_{Y^*}, Y \sim P_C)$, with identity if the $P_C(Y \mid Z = z)$ are Dirac for all $z$ (see details in [27]).

In the following part, we show that

$$\inf_{P \in \mathcal{P}_{Y^*, Y}} \mathbb{E}_{(Y^*, Y) \sim P}\,[c(Y^*, Y)] = \inf_{Q:\, Q_Z = P_Z} \mathbb{E}_{P_X}\, \mathbb{E}_{Q(Z|X)}\big[c\big(f(X), C(Z)\big)\big]. \qquad (5)$$

By definition, $\mathcal{P}_{Y^*, Y, Z}$, $\mathcal{P}_{Y^*, Y}$ and $\mathcal{P}_{Y^*, Z}$ depend on the choice of the conditional distributions $P_C(Y|Z)$, but $\mathcal{P}(Y^* \sim P_{Y^*}, Z \sim P_Z)$ does not. It is also easy to check that $\mathcal{P}_{Y^*, Z} = \mathcal{P}(Y^* \sim P_{Y^*}, Z \sim P_Z)$. The tower rule of expectation and the conditional independence property of $\mathcal{P}_{Y^*, Y, Z}$ imply

$$\inf_{P \in \mathcal{P}_{Y^*, Y}} \mathbb{E}_{(Y^*, Y) \sim P}\,[c(Y^*, Y)] = \inf_{P \in \mathcal{P}_{Y^*, Y, Z}} \mathbb{E}_{P}\big[\mathbb{E}\,[c(Y^*, Y) \mid Z]\big] = \inf_{P \in \mathcal{P}_{Y^*, Z}} \mathbb{E}_{(Y^*, Z) \sim P}\,[c(Y^*, C(Z))], \qquad (6)$$

which, combined with (3) and the identification $\mathcal{P}_{Y^*, Z} = \mathcal{P}(Y^* \sim P_{Y^*}, Z \sim P_Z)$, yields the statement of Theorem 1.
References
 [1] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein gan. arXiv preprint arXiv:1701.07875, 2017.
 [2] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.

 [3] N. Carlini and D. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec '17, pages 3–14, New York, NY, USA, 2017. ACM.
 [4] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.
 [5] P.-Y. Chen, Y. Sharma, H. Zhang, J. Yi, and C.-J. Hsieh. EAD: Elastic-net attacks to deep neural networks via adversarial examples. In AAAI, 2018.
 [6] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
 [7] A. Coates, A. Ng, and H. Lee. An analysis of singlelayer networks in unsupervised feature learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 215–223, 2011.

 [8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
 [9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
 [10] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
 [11] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
 [12] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.

 [13] Y. LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
 [14] E. Levina and P. J. Bickel. Maximum likelihood estimation of intrinsic dimension. In Advances in neural information processing systems, pages 777–784, 2005.
 [15] X. Liu, M. Cheng, H. Zhang, and C.-J. Hsieh. Towards robust neural networks via random self-ensemble. arXiv preprint arXiv:1712.00673, 2017.
 [16] L. v. d. Maaten and G. Hinton. Visualizing data using t-SNE. Journal of machine learning research, 9(Nov):2579–2605, 2008.
 [17] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
 [18] D. Meng and H. Chen. Magnet: a twopronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 135–147. ACM, 2017.
 [19] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. arXiv preprint, 2017.
 [20] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
 [21] T. Na, J. H. Ko, and S. Mukhopadhyay. Cascade adversarial machine learning regularized with a unified embedding. arXiv preprint arXiv:1708.02582, 2017.
 [22] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. Practical blackbox attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.
 [23] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 582–597. IEEE, 2016.
 [24] P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605, 2018.
 [25] C.-J. Simon-Gabriel, Y. Ollivier, B. Schölkopf, L. Bottou, and D. Lopez-Paz. Adversarial vulnerability of neural networks increases with input dimension. arXiv preprint arXiv:1802.01421, 2018.
 [26] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 [27] I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schoelkopf. Wasserstein autoencoders. arXiv preprint arXiv:1711.01558, 2017.
 [28] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991, 2017.