1 Introduction
Deep neural networks (DNNs) have achieved remarkable performance on many visual (Sutskever et al., 2012; He et al., 2016) and speech (Hinton et al., 2012) recognition tasks, but recent studies have shown that state-of-the-art DNNs are surprisingly vulnerable to adversarial perturbations, small imperceptible input transformations that are designed to switch the prediction of the classifier (Szegedy et al., 2014; Goodfellow et al., 2015). This has led to a vigorous arms race between heuristic defenses (Papernot et al., 2016; Madry et al., 2018; Chakraborty et al., 2018; Wang et al., 2019) that propose ways to defend against existing attacks and newly devised attacks (Carlini and Wagner, 2017; Athalye et al., 2018; Tramer et al., 2020) that are able to penetrate such defenses. Reliable defenses appear to be elusive, despite progress on provable defenses, including formal verification (Katz et al., 2017; Tjeng et al., 2019) and relaxation-based certification methods (Sinha et al., 2018; Raghunathan et al., 2018; Wong and Kolter, 2018; Gowal et al., 2019; Wang et al., 2018). Even the strongest of these defenses leave large opportunities for adversaries to find adversarial examples, while suffering from high computation costs and scalability issues.
Witnessing the difficulties of constructing robust classifiers, a line of recent works (Gilmer et al., 2018; Fawzi et al., 2018; Mahloujifar et al., 2019a; Shafahi et al., 2019) aims to understand the limitations of robust learning by providing theoretical bounds on adversarial robustness for arbitrary classifiers. By imposing different assumptions on the underlying data distributions and allowable perturbations, all of these theoretical works show that no adversarially robust classifiers exist for an assumed metric probability space, as long as the perturbation strength is sublinear in the typical norm of the inputs. Although such impossibility results seem disheartening to the goal of building robust classifiers, it remains unknown to what extent real image distributions satisfy the assumptions needed to obtain these results.
In this paper, we aim to bridge the gap between the theoretical robustness analyses on well-behaved data distributions and the maximum achievable adversarial robustness, which we call intrinsic robustness (formally defined by Definition 3), for typical image distributions. More specifically, we assume the underlying data lie on a separable low-dimensional manifold, which can be captured using a conditional generative model, then systematically study the intrinsic robustness based on the conditional generating process from both theoretical and experimental perspectives. Our main contributions are:

Building upon a trained conditional generative model that mimics the underlying data generating process, we empirically evaluate the intrinsic robustness on image distributions based on MNIST and ImageNet (Section 5.2). Our estimates of intrinsic robustness demonstrate that there is still a large gap between the limits implied by our theory and the state-of-the-art robustness achieved by robust training methods (Section 5.3).
We theoretically characterize the fundamental relationship between the in-distribution adversarial risk (which restricts adversarial examples to lie on the image manifold, and is formally defined by Definition 3) and the intrinsic robustness (Remark 4), and propose an optimization method to search for in-distribution adversarial examples with respect to a given classifier. Our estimated in-distribution robustness for state-of-the-art adversarially trained classifiers, together with the derived intrinsic robustness bound, provides a better understanding of the intrinsic robustness for natural image distributions (Section 5.4).
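Notation. We use lower boldfaced letters such as $\mathbf{x}$ to denote vectors, and $[n]$ to denote the index set $\{1, 2, \ldots, n\}$. For any $\mathbf{x} \in \mathcal{X}$ and $\epsilon \geq 0$, denote by $\mathcal{B}(\mathbf{x}, \epsilon, \Delta) = \{\mathbf{x}' \in \mathcal{X} : \Delta(\mathbf{x}', \mathbf{x}) \leq \epsilon\}$ the ball around $\mathbf{x}$ with radius $\epsilon$ in some distance metric $\Delta$. When the metric is free of context, we simply write $\mathcal{B}(\mathbf{x}, \epsilon)$. We use $\mathcal{N}(\mathbf{0}, \mathbf{I}_d)$ to denote the $d$-dimensional standard Gaussian distribution, and let $\nu_d$ be its probability measure. For the one-dimensional case, we use $\Phi(x)$ to denote the cumulative distribution function (CDF) of $\mathcal{N}(0, 1)$, and use $\Phi^{-1}(x)$ to denote its inverse function. For any function $g : \mathcal{Z} \to \mathcal{X}$ and probability measure $\nu$ defined over $\mathcal{Z}$, $g_*(\nu)$ denotes the push-forward measure of $\nu$. The $\ell_p$-norm of a vector $\mathbf{x} \in \mathbb{R}^n$ is defined as $\|\mathbf{x}\|_p = \big(\sum_{i \in [n]} |x_i|^p\big)^{1/p}$.
2 Related Work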
Several recent works (Gilmer et al., 2018; Mahloujifar et al., 2019a; Shafahi et al., 2019; Dohmatob, 2019; Bhagoji et al., 2019) derived theoretical bounds on the maximum achievable adversarial robustness using isoperimetric inequalities under different assumptions on the input space. For instance, based on the assumption that the input data are uniformly distributed over two concentric spheres (Gilmer et al., 2018) or the underlying metric probability space satisfies a concentration property (Mahloujifar et al., 2019a), any classifier with constant test error was proven to be vulnerable to adversarial perturbations sublinear to the input dimension. Shafahi et al. (2019) showed that adversarial examples are inevitable, provided the maximum density of the underlying input distribution is small relative to uniform density. However, none of the above theoretical works provide any experiments to justify that the imposed assumptions hold for real datasets, thus it is unclear whether the derived theoretical bounds are meaningful for typical image distributions. Our work belongs to this line of research, but encompasses the practical goal of understanding the robustness limits for real image distributions.
The most related literature to ours is Fawzi et al. (2018), which proved a classifier-independent upper bound on intrinsic robustness, provided the underlying distribution is well captured by a smoothed generative model with Gaussian latent space and small Lipschitz parameter. However, their proposed theory cannot be applied to image distributions that lie on a low-dimensional, non-smooth manifold, as their framework requires examples from different classes to be close enough in the latent space. In contrast, our proposed theoretical bounds on intrinsic robustness are more general in that they can be applied to non-smoothed data manifolds, such as image distributions generated by conditional models. In addition, we propose an empirical method to estimate the intrinsic robustness on the generated image distributions under worst-case perturbations.
Mahloujifar et al. (2019b) proposed to understand the inherent limitations of robust learning using heuristic methods to measure the concentration of measure based on a given set of i.i.d. samples. However, it is unclear to what extent the estimated sample-based concentration approximates the actual intrinsic robustness with respect to the underlying data distribution. In comparison, we assume the underlying data distribution can be captured by a conditional generative model and directly study the robustness limit on the generated data distribution.
3 Preliminaries
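We focus on the task of image classification. Let $(\mathcal{X}, \mu, \Delta)$ be a metric probability space, where $\mathcal{X}$ denotes the input space, $\mu$ is a probability distribution over $\mathcal{X}$, and $\Delta$ is some distance metric defined on $\mathcal{X}$. Suppose there exists a ground-truth function, $f^* : \mathcal{X} \to [K]$, that gives a label to any image $\mathbf{x} \in \mathcal{X}$, where $[K]$ denotes the set of all possible class labels. The objective of classification is to learn a function $f : \mathcal{X} \to [K]$ that approximates $f^*$ well. In the context of adversarial examples, $f$ is typically evaluated based on risk, which captures the classification accuracy of $f$ on normal examples, and adversarial risk, which captures the classifier's robustness against adversarial perturbations:
Let $(\mathcal{X}, \mu, \Delta)$ be a metric probability space and $f^*$ be the ground-truth classifier. For any classifier $f$, the risk of $f$ is defined as:
$$\mathrm{Risk}_\mu(f) = \Pr_{\mathbf{x} \sim \mu}\big[f(\mathbf{x}) \neq f^*(\mathbf{x})\big].$$
The adversarial risk of $f$ against perturbations with strength $\epsilon$ in metric $\Delta$ is defined as:
$$\mathrm{AdvRisk}_\mu^\epsilon(f) = \Pr_{\mathbf{x} \sim \mu}\big[\exists\, \mathbf{x}' \in \mathcal{X} \text{ with } \Delta(\mathbf{x}', \mathbf{x}) \leq \epsilon \text{ s.t. } f(\mathbf{x}') \neq f^*(\mathbf{x}')\big].$$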
Other definitions of adversarial risk also exist in the literature, such as the definition used in Madry et al. (2018) and the one proposed in Fawzi et al. (2018). However, these definitions are equivalent to each other under the assumption that small perturbations do not change the ground-truth labels. Another closely related definition for adversarial robustness is the expected distance to the nearest error (see Diochnos et al. (2018) for the relation between these definitions). Our results can be applied to this definition as well.
Under different assumptions of the input metric probability space, previous works proved model-independent bounds on adversarial robustness. Intrinsic robustness, defined originally by Mahloujifar et al. (2019b), captures the maximum adversarial robustness that can be achieved for a given robust learning problem:
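Using the same settings as in Definition 3, let $\mathcal{F}$ be some class of classifiers. The intrinsic robustness with respect to $\mathcal{F}$ is defined as:
$$\overline{\mathrm{Rob}}_\epsilon(\mathcal{F}) = 1 - \inf_{f \in \mathcal{F}} \mathrm{AdvRisk}_\mu^\epsilon(f).$$
In this work, we consider the class of imperfect classifiers that have risk at least some $\alpha \in (0, 1)$, namely $\mathcal{F}_\alpha = \{f : \mathrm{Risk}_\mu(f) \geq \alpha\}$.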
Motivated by the great success of producing natural-looking images using conditional generative adversarial nets (GANs) (Mirza and Osindero, 2014; Odena et al., 2017; Brock et al., 2019), we assume the underlying data distribution can be modeled by some conditional generative model. A generative model can be seen as a function $g : \mathcal{Z} \to \mathcal{X}$ that maps some latent distribution, usually assumed to be multivariate Gaussian, to some generated distribution over $\mathcal{X}$.
Conditional generative models incorporate the additional class information into the data generating process. A conditional generative model can be considered as a set of generative models $\{g_k\}_{k \in [K]}$, where images from the $k$th class can be generated by transforming latent Gaussian vectors through $g_k$. More rigorously, we say a probability distribution $\mu$ can be generated by a conditional generative model $\{g_k\}_{k \in [K]}$, if $\mu = \sum_{k=1}^{K} p_k \cdot (g_k)_*(\nu_d)$, where $K$ is the total number of different class labels, and $p_k \in [0, 1]$ represents the probability of sampling an image from class $k$.
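The generating process described above can be illustrated with a short sketch: pick a class according to the class probabilities, draw a standard Gaussian latent vector, and push it through the generator of that class. The function name `sample_conditional`, the toy generators, and the chosen probabilities are hypothetical and serve only as an illustration; they are not the trained ACGAN or BigGAN models used in the experiments.

```python
import random


def sample_conditional(generators, priors, latent_dim, n_samples, seed=0):
    """Draw (image, label) pairs: label k with probability priors[k], image g_k(z), z ~ N(0, I)."""
    rng = random.Random(seed)
    labels = list(range(len(generators)))
    samples = []
    for _ in range(n_samples):
        # Sample the class label from the class probabilities.
        k = rng.choices(labels, weights=priors, k=1)[0]
        # Sample a latent vector from the standard Gaussian distribution.
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        # Push the latent vector through the generator of class k.
        samples.append((generators[k](z), k))
    return samples


# Hypothetical toy generators: class k shifts the latent vector by 10 * k.
toy_generators = [lambda z, k=k: [zi + 10.0 * k for zi in z] for k in range(3)]
toy_priors = [0.5, 0.3, 0.2]
data = sample_conditional(toy_generators, toy_priors, latent_dim=2, n_samples=1000)
```

Because each sample carries the label of the generator that produced it, the generated dataset comes with labels for free, which is what allows classifiers to be trained and evaluated on the generated distribution.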
Based on the conditional model, we introduce the definition of in-distribution adversarial risk:
Consider the same settings as in Definition 3. Suppose can be captured by a conditional generative model . For any given classifier , the in-distribution adversarial risk of against perturbations is defined as:
Given the fact that the in-distribution adversarial risk restricts the adversarial examples to be on the image manifold, it holds that, for any classifier , . As will be shown in the next section, such a notion of in-distribution adversarial risk is closely related to the intrinsic robustness for the considered class of imperfect classifiers.
4 Main Theoretical Results
In this section, we present our main theoretical results on intrinsic robustness, provided the underlying distribution can be modeled by some conditional generative model (our results and proof techniques could also be easily applied to unconditional generative models). Based on the underlying generative process, the following local Lipschitz condition connects perturbations in the image space to the latent space.
Condition .
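Let $g : \mathbb{R}^d \to \mathcal{X}$ be a generative model that maps the latent Gaussian distribution $\nu_d$ to some generated distribution. Consider Euclidean distance as the distance metric for $\mathbb{R}^d$, and $\Delta$ as the metric for $\mathcal{X}$. Given $r > 0$, $g$ is said to be $L$-locally Lipschitz with probability at least $1 - \delta$, if it satisfies
$$\Pr_{\mathbf{z} \sim \nu_d}\Big[\max_{\mathbf{z}' \in \mathcal{B}(\mathbf{z}, r),\, \mathbf{z}' \neq \mathbf{z}} \frac{\Delta\big(g(\mathbf{z}'), g(\mathbf{z})\big)}{\|\mathbf{z}' - \mathbf{z}\|_2} \leq L\Big] \geq 1 - \delta.$$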
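As the main tool for bounding the intrinsic robustness, we present the Gaussian Isoperimetric Inequality for the sake of completeness. This inequality, proved by Borell (1975) and Sudakov and Tsirelson (1978), bounds the minimum expansion of any subset with respect to the standard Gaussian measure.
[Gaussian Isoperimetric Inequality] Consider metric probability space $(\mathbb{R}^d, \nu_d, \|\cdot\|_2)$, where $\nu_d$ is the probability measure for the $d$-dimensional standard Gaussian distribution $\mathcal{N}(\mathbf{0}, \mathbf{I}_d)$, and $\|\cdot\|_2$ denotes the Euclidean distance. For any subset $\mathcal{A} \subseteq \mathbb{R}^d$ and $\epsilon \geq 0$, let $\mathcal{A}_\epsilon = \{\mathbf{z} \in \mathbb{R}^d : \exists\, \mathbf{z}' \in \mathcal{A} \text{ s.t. } \|\mathbf{z}' - \mathbf{z}\|_2 \leq \epsilon\}$ be the $\epsilon$-expansion of $\mathcal{A}$, then it holds that
$$\nu_d(\mathcal{A}_\epsilon) \geq \Phi\big(\Phi^{-1}(\nu_d(\mathcal{A})) + \epsilon\big), \qquad (1)$$
where $\Phi$ is the CDF of $\mathcal{N}(0, 1)$, and $\Phi^{-1}$ denotes its inverse.
In particular, when $\mathcal{A}$ belongs to the set of half-spaces, the equality is achieved in (1).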
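The half-space case can be checked numerically with a minimal sketch. For a half-space of the form {z : z_1 <= a}, the Gaussian measure is the one-dimensional CDF evaluated at a, and its expansion by a Euclidean radius is the half-space with threshold a plus that radius. The helper names and the chosen threshold and radius are illustrative assumptions; only the Python standard library class `statistics.NormalDist` is used.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # one-dimensional standard Gaussian N(0, 1)


def expansion_lower_bound(measure, eps):
    """Isoperimetric lower bound on the Gaussian measure of the eps-expansion of a set."""
    if not 0.0 < measure < 1.0:
        raise ValueError("measure must lie strictly between 0 and 1")
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(measure) + eps)


def halfspace_expansion_measure(threshold, eps):
    """Exact Gaussian measure of the eps-expansion of the half-space {z : z_1 <= threshold}."""
    return _STD_NORMAL.cdf(threshold + eps)


threshold, eps = -1.5, 1.0           # illustrative values, not taken from the paper
mass = _STD_NORMAL.cdf(threshold)    # Gaussian measure of the half-space itself
bound = expansion_lower_bound(mass, eps)
exact = halfspace_expansion_measure(threshold, eps)
```

Here `bound` and `exact` agree up to floating-point error, which is the equality case of the inequality; for any other set with the same measure, the expansion can only be larger.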
Making use of the Gaussian Isoperimetric Inequality and the local Lipschitz condition of the conditional generator, the following theorem proves a lower bound on the (in-distribution) adversarial risk for any given classifier, provided the underlying distribution can be captured by a conditional generative model.
Let be a metric probability space and be the underlying groundtruth. Suppose can be generated by a conditional generative model . Given , suppose there exist constants and such that for any , satisfies local Lipschitz property with probability at least and . Then for any classifier , it holds that
where is the push-forward measure of through , for any .
We provide a proof in Appendix A.1. Theorem 4 suggests the (in-distribution) adversarial risk is related to the risk on each data manifold and the ratio between the perturbation strength and the Lipschitz constant.
The following theorem, proved in Appendix A.2, gives a theoretical upper bound on the intrinsic robustness with respect to the class of imperfect classifiers.
Under the same setting as in Theorem 4, let . Consider the class of imperfect classifiers with , then the intrinsic robustness with respect to can be bounded as,
provided that for any . In addition, if we consider the family of classifiers that have conditional risk at least for each class, namely , then the intrinsic robustness with respect to can be bounded by
Theorem 4 shows that if the data distribution can be captured by a conditional generative model, the intrinsic robustness bound with respect to imperfect classifiers will largely depend on the ratio $\epsilon / L$ between the perturbation strength and the local Lipschitz constant. For instance, if we assume the ratio , then Theorem 4 suggests that no classifier with initial risk at least can achieve robust accuracy exceeding for the assumed data generating process. In addition, if we assume the local Lipschitz parameter $L$ is some constant, then adversarial robustness is indeed not achievable for high-dimensional data distributions, provided the perturbation strength $\epsilon$ is sublinear to the input dimension, which is the typical setting considered.
The intrinsic robustness is closely related to the in-distribution adversarial risk. For the class of classifiers , one can prove that the intrinsic robustness is equivalent to the maximum achievable in-distribution adversarial robustness:
(2) 
Trivially, holds for any . For a given , one can construct an such that if and otherwise, where denotes the error region of and is the considered image manifold. The construction immediately suggests , which implies,
Combining both directions proves the soundness of (2). This equivalence suggests the in-distribution adversarial robustness of any classifier in can be viewed as a lower bound on the actual intrinsic robustness, which motivates us to study the intrinsic robustness by estimating the in-distribution adversarial robustness of trained robust models in our experiments.
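To make the dependence of the robustness limit on the ratio between perturbation strength and Lipschitz constant concrete, the following sketch evaluates a simplified form of the upper bound. It is an illustration under stated assumptions, not the exact statement of the theorem: it assumes every class-conditional risk is at least `alpha`, and it assumes the local Lipschitz property holds with probability one, so the failure probability is dropped. Under these assumptions, combining the Lipschitz condition with the Gaussian Isoperimetric Inequality gives an adversarial risk of at least $\Phi(\Phi^{-1}(\alpha) + \epsilon / L)$. The function name and the numbers used are hypothetical.

```python
from statistics import NormalDist


def intrinsic_robustness_upper_bound(alpha, eps, lipschitz):
    """Simplified bound 1 - Phi(Phi^{-1}(alpha) + eps / L).

    Assumptions (not taken verbatim from the theorem): each class-conditional
    risk is at least alpha, and the failure probability delta is set to 0.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if lipschitz <= 0.0 or eps < 0.0:
        raise ValueError("lipschitz must be positive and eps non-negative")
    std_normal = NormalDist()
    return 1.0 - std_normal.cdf(std_normal.inv_cdf(alpha) + eps / lipschitz)


# Bound as a function of the ratio eps / L for an illustrative risk threshold of 5%.
bounds = {ratio: intrinsic_robustness_upper_bound(0.05, ratio, 1.0)
          for ratio in (0.0, 0.5, 1.0, 2.0)}
```

With a ratio of zero the bound reduces to one minus the risk threshold, and it decreases monotonically as the ratio grows, which matches the qualitative discussion of the theorem.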
5 Experiments
This section provides our empirical evaluations of the intrinsic robustness on real image distributions to evaluate the tightness of our bound. We test our bound on two image distributions generated using MNIST (LeCun et al., 1998) and ImageNet (Deng et al., 2009) datasets.
5.1 Conditional GAN Models
Instead of directly evaluating the robustness on real datasets, we make use of conditional GAN models to generate datasets from the learned data distributions and evaluate the robustness of several state-of-the-art robust models trained on the generated dataset for a fair comparison with the theoretical robustness limits. Note that this approach is only feasible with conditional generative models, as unconditional models cannot provide the corresponding labels for the generated data samples. For MNIST, we adopt ACGAN (Odena et al., 2017), which features an additional auxiliary classifier for better conditional image generation. The ACGAN model generates images from a dimension latent space concatenated with an additional dimension one-hot encoding of the conditional class labels. For ImageNet, we adopt the BigGAN model (Brock et al., 2019), which is the state-of-the-art GAN model in conditional image generation. It generates images from a dimension latent space. We downsampled the generated images to for efficiency purposes. We consider a standard Gaussian as the latent distribution for both conditional generative models (the original BigGAN model uses a truncated Gaussian; we adapted it to a standard Gaussian distribution). Figure 1 shows examples of the generated MNIST and ImageNet images. For both figures, each column of images corresponds to a particular label class of the considered dataset.
5.2 Local Lipschitz Constant Estimation
From Theorem 4, we observe that given a class of classifiers with risk at least , the derived intrinsic robustness upper bound is mainly decided by the perturbation strength and the local Lipschitz constant . While is usually pre-designated in common robustness evaluation settings, the local Lipschitz constant is unknown for most real-world tasks. Computing an exact Lipschitz constant of a deep neural network is a difficult open problem. Thus, instead of obtaining the exact value, we approximate using a sample-based approach with respect to the generative models.
Recalling Definition 4, we consider as the distance and and are easy to compute via the generator network. Computing , however, is much more complicated as it requires obtaining a maximum value within a radius ball. To deal with this, our approach approximates by sampling points in the neighborhood around and takes the maximum value as the estimation of the true maximum value within the ball. Since the definition of local Lipschitz is probabilistic, we take multiple samples of the latent vectors to estimate the local Lipschitz constant . The estimation procedure is summarized in Algorithm 1, which gives an underestimate of the underlying truth. Developing better Lipschitz estimation methods is an active area in machine learning research, but is not the main focus of this work.
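The sampling procedure described above can be sketched as follows. This is a minimal illustration and not a transcription of Algorithm 1: the function names, the use of Euclidean distance in both spaces, uniform sampling inside the ball, and the aggregation across latent samples by an empirical (1 - delta)-quantile are all assumptions made for the example. Because only finitely many neighbors are inspected, the per-point maximum can only underestimate the true local maximum.

```python
import math
import random


def _l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _sample_in_ball(center, radius, rng):
    """Sample a point uniformly from the Euclidean ball around center."""
    d = len(center)
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in direction))
    scale = radius * rng.random() ** (1.0 / d) / norm
    return [c + scale * v for c, v in zip(center, direction)]


def estimate_local_lipschitz(g, latent_dim, radius, n_latent, n_neighbors, delta, seed=0):
    """Sample-based estimate of a local Lipschitz constant of generator g."""
    rng = random.Random(seed)
    per_point = []
    for _ in range(n_latent):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        gz = g(z)
        best = 0.0
        for _ in range(n_neighbors):
            z_near = _sample_in_ball(z, radius, rng)
            dist = _l2(z, z_near)
            if dist > 0.0:  # skip the degenerate case z_near == z
                best = max(best, _l2(gz, g(z_near)) / dist)
        per_point.append(best)
    per_point.sort()
    # Assumed aggregation: smallest value covering a (1 - delta) fraction of latent samples.
    idx = min(len(per_point) - 1, max(0, math.ceil((1.0 - delta) * len(per_point)) - 1))
    return per_point[idx]


# Hypothetical linear generator with Lipschitz constant exactly 3.
estimate = estimate_local_lipschitz(lambda z: [3.0 * zi for zi in z],
                                    latent_dim=4, radius=0.5,
                                    n_latent=50, n_neighbors=20, delta=0.1)
```

For the linear toy generator every ratio equals 3, so the estimate recovers the true constant; for a real generator network the callable `g` would wrap a forward pass, and the estimate would depend on how densely the ball is sampled.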
Tables 1 and 2 summarize the local Lipschitz constants estimated for the trained ACGAN and BigGAN generators conditioned on each class. In particular, we report both the mean estimates averaged over repeated trials and the standard deviations. For both conditional generators, we set , , and in Algorithm 1 for Lipschitz estimation. For BigGAN, the specifically selected classes from ImageNet are reported in Table 2.
Compared with unconditional generative models, conditional ones generate each class using a separate generator. Thus, the local Lipschitz constant of each class-conditioned generator is expected to be smaller than that of unconditional ones, as the within-class variation is usually much smaller than the between-class variation for a given classification dataset. For instance, we trained an unconditional GAN generator (Goodfellow et al., 2014) on the MNIST dataset, which yields an overall local Lipschitz constant of from Algorithm 1 under the same parameter settings. If we plug this estimated Lipschitz constant into the theoretical results in Fawzi et al. (2018), the implied intrinsic robustness bound is in fact vacuous (above ) with perturbation strength in distance.
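Table 1: Local Lipschitz constants estimated for the ACGAN generator, conditioned on each MNIST class.
Class  digit 0  digit 1  digit 2  digit 3  digit 4
Lipschitz
Class  digit 5  digit 6  digit 7  digit 8  digit 9
Lipschitz
Table 2: Local Lipschitz constants estimated for the BigGAN generator, conditioned on each selected ImageNet class.
Class  airliner  jeep  goldfinch  tabby cat  hartebeest
Lipschitz
Class  Maltese dog  bullfrog  sorrel  pirate ship  pickup
Lipschitz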
5.3 Comparisons with Robust Classifiers
We compare our derived intrinsic robustness upper bound with the empirical adversarial robustness achieved by the current state-of-the-art defense methods under perturbations. Specifically, we consider three robust training methods: LPCertify: optimization-based certified robust defense (Wong et al., 2018); AdvTrain: PGD attack based adversarial training (Madry et al., 2018); and TRADES: adversarial training by accuracy and robustness trade-off (Zhang et al., 2019). We adopt these robust training methods to train robust classifiers over a set of generated training images and evaluate their robustness on the corresponding generated test set.
For MNIST, we use our trained ACGAN model to generate classes of handwritten digits with training images and testing images. For ImageNet, we use the BigGAN model to generate selected classes of images, which contains images for the training set and images for the test set. We refer to the class BigGAN generated dataset as 'ImageNet10'. We set for training robust models using AdvTrain and TRADES for both generated datasets, whereas we only train the LP-based certified robust classifier with on generated MNIST data, as it is not able to scale to ImageNet10 or to generated MNIST with larger (see Appendix B.1 for all the selected hyperparameters and network architectures).
A commonly used method to evaluate the robustness of a given model is to perform carefully designed adversarial attacks. Here we adopt the PGD attack (Madry et al., 2018), and report the robust accuracy (classification accuracy on inputs generated using the PGD attack) as the empirically measured model robustness. We test both the natural classification accuracy and the robustness of the aforementioned adversarially trained classifiers under perturbations with perturbation strength selected from . See Appendix B.1 for PGD parameter settings.
Dataset  Method  Natural Accuracy  Adversarial Robustness  
Generated MNIST  LPCertify  
AdvTrain  
TRADES  
Our Bound    
ImageNet10  AdvTrain  
TRADES  
Our Bound   
Table 3 compares the empirically measured robustness of the trained robust classifiers and the derived theoretical upper bound on intrinsic robustness. For empirically measured adversarial robustness, we report both the mean and the standard deviation with respect to repeated trials. For computing our theoretical robust bounds, we plug the estimated local Lipschitz constants into Theorem 4 with risk threshold for generated MNIST and for ImageNet10, to reflect the best natural accuracy achieved by the considered robust classifiers.
Under most settings, there exists a large gap between the robust limit implied by our theory and the best adversarial robustness achieved by state-of-the-art robust classifiers. For instance, AdvTrain and TRADES only achieve less than robust accuracy on the generated ImageNet10 data with , whereas the estimated robustness bound is as high as . The gap becomes even larger when we increase the perturbation strength . In contrast to the previous theoretical results on artificial distributions, for these image classification problems we cannot simply conclude from the intrinsic robustness bound that adversarial examples are inevitable. This huge gap between the empirical robustness of the best current image classifiers and the estimated theoretical bound suggests that either there is a way to train better robust models or that there exist other explanations for the inherent limitations of robust learning against adversarial examples.
5.4 In-distribution Adversarial Robustness
In Section 5.3, we empirically show the unconstrained robustness of existing robust classifiers is far below the intrinsic robustness upper bound implied by our theory for real distributions. However, it is not clear whether the reason is that current robust training methods are far from perfect, or that our derived upper bound is not tight enough due to the Lipschitz relaxation step used for proving such a bound. In this section, we empirically study the in-distribution adversarial risk for a better characterization of the actual intrinsic robustness. As shown in Remark 4, the in-distribution adversarial robustness of any classifier with risk at least can be regarded as a lower bound for the intrinsic robustness . This provides us a more accurate characterization of the intrinsic robustness bound and enables better understanding of intrinsic robustness.
While there are many types of attack algorithms in the literature that can be used to evaluate the unconstrained robustness of a given classifier in the image space, little has been done in terms of how to evaluate the in-distribution robustness. In order to empirically evaluate the in-distribution robustness, we straightforwardly formulate the following optimization problem to find adversarial examples on the image manifold:
(3) 
where , is the data sample in the image space to be attacked, is the given classifier, and denotes the adversarial loss function. The goal of (3) is to optimize the latent vector to lower the adversarial loss (make the robust classifier misclassify some generated images) while keeping the distance between the generated image and the test image within the perturbation limit. The key difficulty in solving (3) lies in the fact that we cannot perform any type of projection operation, as we are optimizing over but the constraints are imposed on the generated image space . This prohibits the use of common attack algorithms such as PGD. In order to solve (3), we transform (3) into the following Lagrangian formulation:
(4) 
This formulation ignores the perturbation constraint of and tries to find the in-distribution adversarial examples with the smallest possible perturbation. In order to evaluate the intrinsic robustness under a given perturbation budget, we need to further check all in-distribution adversarial examples found and only count those with perturbations within the constraint. Note that even though (4) provides us a feasible way to compute the in-distribution robustness of a classifier, equation (4) itself could be hard to solve in general. First, it is not obvious how to initialize . Random initialization of could lead to bad local optima which prevent the optimizer from efficiently solving (4) or even finding a that could make close enough to . Second, the hyperparameter could be quite sensitive to different test examples. Failing to choose a proper could also lead to failures in finding in-distribution adversarial examples within the constraint. In order to tackle the aforementioned challenges, we propose to solve another optimization problem for the initialization of and adopt binary search for the best choice of (see Appendix B.2 for more details of our implementation).
Figure 2 summarizes results from our empirical evaluations on intrinsic robustness of the generated MNIST and ImageNet10 data. We evaluate the empirical robustness of three types of robust training methods at different time points during the training procedure. To be more specific, we evaluate the robustness of the intermediate models produced every training epochs. For each method, we plot both the unconstrained robustness measured by PGD attacks and the in-distribution robustness measured using the aforementioned strategies. In addition, based on the local Lipschitz constants estimated in Section 5.2, we plot the implied theoretical bound on intrinsic robustness as the dotted curve for direct comparison.
Compared with the intrinsic robustness upper bound (dotted curve), the unconstrained robustness of various robustly trained models is much smaller, and the gap between them becomes more obvious as we increase . This aligns with our observations in Section 5.3. However, under all the considered settings, the estimated in-distribution adversarial robustness is much higher than the unconstrained one and closer to the theoretical upper bound, especially for the ImageNet10 data. Note that according to Remark 4, the actual intrinsic robustness should lie between the in-distribution robustness of any given classifier with risk at least and the derived intrinsic robustness upper bound. Observing the big gap between the estimated in-distribution and unconstrained robustness of various robustly trained models, one would expect the current state-of-the-art robust models are still far from approaching the actual intrinsic robustness limit for real image distributions.
6 Conclusions
We studied the intrinsic robustness of typical image distributions using conditional generative models. By deriving theoretical upper bounds on intrinsic robustness and providing empirical estimates on the generated image distributions, we observed a large gap between the theoretical intrinsic robustness limit and the best robustness achieved by state-of-the-art robust classifiers. Our results imply that the inevitability of adversarial examples claimed in recent theoretical studies, such as Fawzi et al. (2018), does not apply to real image distributions, and suggest that there is a need for a deeper understanding of the intrinsic robustness limitations for real data distributions.
Appendix A Proof of Main Theorem
A.1 Proof of Theorem 4
Proof.
Let be the error region in the image space and be the expansion of in metric . By Definition 3, we have
Since according to Definition 3, we have for any . Thus, it remains to lower bound each term individually. For any classifier , we have
(5) 
where the first inequality is due to , and the second inequality holds because is locally Lipschitz with probability at least and for any .
A.2 Proof of Theorem 4
Proof.
According to Definition 3 and Theorem 4, for any , we have
(7) 
where the last inequality holds because is monotonically increasing. For any , let be the error region and be the measure of under the th conditional distribution.
Thus, to obtain an upper bound on using (A.2), it remains to solve the following optimization problem:
(8) 
Note that for classifier in , by definition, we can simply replace in (8), which proves the upper bound on .
Next, we are going to show that the optimal value of (8) is achieved, only if there exists a class such that and for any . Consider the simplest case where . Note that and are both monotonically increasing functions, which implies that holds when optimum achieved, thus the optimization problem for can be formulated as follows
(9) 
Suppose holds for the initial setting. Now consider another setting where , . Let and . According to the equality constraint of the optimization problem (9), we have
(10) 
Let for simplicity. By simple algebra, we have
where the first inequality holds because for any , the second inequality follows from (10) and the fact that , and the last inequality holds because for any . Therefore, the optimal value of (9) will be achieved when or . For general setting with , since are independent in the objective, we can fix and optimize and first, then deal with incrementally using the same technique. ∎
Appendix B Experimental Details
This section provides additional details for our experiments.
B.1 Network Architectures and Hyperparameter Settings
For the certified robust defense (LPCertify), we adopt the same four-layer neural network architecture as implemented in Wong et al. (2018), with two convolutional layers and two fully connected layers, and use an Adam optimizer with learning rate and batch size for training the robust classifier. In particular, the adversarial loss function is based on the robust certificate under proposed in Wong et al. (2018).
For training attack-based robust models (AdvTrain and TRADES), we use a seven-layer CNN architecture which contains four convolutional layers and three fully connected layers. We use an SGD optimizer to minimize the attack-based adversarial loss with learning rate on MNIST and learning rate on ImageNet10. Table 4 summarizes all the hyperparameters we used for training the robust models ( is an additional parameter specifically used in TRADES).
For evaluating the unconstrained adversarial robustness, we implemented the PGD attack with metric. Table 5 shows all the hyperparameters we used for robustness evaluation.
Para.  Generated MNIST  ImageNet10  
LPCertified  Adv Training  TRADES  Adv Training  TRADES  
(in )  
optimizer  ADAM  SGD  SGD  SGD  SGD 
learning rate  
#epochs  
attack step size    
#attack steps    
     
Para.  Generated MNIST  ImageNet10  
attack step size  
#attack steps 
B.2 Strategies for Estimating In-distribution Adversarial Robustness
Initialization of : For MNIST data, we design an initialization strategy for in order to make sure the perturbation term can be efficiently optimized. To be more specific, starting from random noise, we first solve another optimization problem:
By setting as our initial point, we minimize the initial perturbation distance. Here can start from any random initial point as we will then optimize the generated image under distance.
For ImageNet10 data, even applying the above optimization procedure does not result in an initial such that when is small. Therefore, we use another strategy: we record the when generating the test sample , i.e., , and adopt as the initial point for in solving (4). This ensures that the whole optimization procedure can find at least one point satisfying the perturbation constraint. (Footnote 2: We did not use as the initialization for MNIST data, as our empirical study shows that the optimization-based initialization achieves better performance on MNIST.)
The choice of : Inspired by Carlini and Wagner (2017), we also adopt a binary search strategy for finding a better regularization parameter . Specifically, we set the initial and, if we successfully find an adversarial example, we lower the value of via binary search. Otherwise, we raise the value of . For each batch of examples, we perform times binary search in order to find qualified in-distribution adversarial examples.
Hyperparameters: We use the Adam optimizer with learning rate for finding in-distribution adversarial examples. We set the maximum number of iterations for each binary search as .
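The binary search over the regularization parameter can be sketched as follows. This is a minimal illustration in the style of Carlini and Wagner (2017), not the exact procedure used in the experiments: `attack_succeeds` is a hypothetical callback that runs the inner optimization for a given parameter value and reports whether an adversarial example was found, and multiplying the parameter by 10 while no successful value is known is an assumption.

```python
def binary_search_c(attack_succeeds, c_init, num_search_steps):
    """Return the smallest tried value of c for which the attack succeeded.

    Returns None if the attack never succeeded within the search budget.
    """
    lower, upper = 0.0, None  # upper stays None until the first success
    c = c_init
    best_c = None
    for _ in range(num_search_steps):
        if attack_succeeds(c):
            # Success: remember c and lower it by bisecting toward `lower`.
            best_c = c if best_c is None else min(best_c, c)
            upper = c
            c = (lower + upper) / 2.0
        else:
            # Failure: raise c, by bisection if an upper bound is known,
            # otherwise by a fixed growth factor (assumed to be 10 here).
            lower = c
            c = c * 10.0 if upper is None else (lower + upper) / 2.0
    return best_c


# Toy usage: pretend the attack succeeds exactly when c >= 3.
best = binary_search_c(lambda c: c >= 3.0, c_init=1.0, num_search_steps=10)
```

In practice the search would run per example within a batch, keeping a separate lower and upper bound for each example.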
Acknowledgements
This research was sponsored in part by the National Science Foundation SaTC1717950 and SaTC1804603, and additional support from Amazon, Baidu, and Intel. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agencies.
References
 Athalye et al. (2018) Athalye, A., Carlini, N., and Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning (ICML).
 Bhagoji et al. (2019) Bhagoji, A. N., Cullina, D., and Mittal, P. (2019). Lower bounds on adversarial robustness from optimal transport. In Advances in Neural Information Processing Systems (NeurIPS).
 Borell (1975) Borell, C. (1975). The Brunn-Minkowski inequality in Gauss space. Inventiones mathematicae, 30(2):207–216.
 Brock et al. (2019) Brock, A., Donahue, J., and Simonyan, K. (2019). Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations (ICLR).
 Carlini and Wagner (2017) Carlini, N. and Wagner, D. (2017). Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy.
 Chakraborty et al. (2018) Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A., and Mukhopadhyay, D. (2018). Adversarial attacks and defences: A survey. arXiv preprint arXiv:1810.00069.

 Deng et al. (2009) Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Conference on Computer Vision and Pattern Recognition (CVPR).
 Diochnos et al. (2018) Diochnos, D., Mahloujifar, S., and Mahmoody, M. (2018). Adversarial risk and robustness: General definitions and implications for the uniform distribution. In Advances in Neural Information Processing Systems (NeurIPS).
 Dohmatob (2019) Dohmatob, E. (2019). Generalized no free lunch theorem for adversarial robustness. In International Conference on Machine Learning (ICML).
 Fawzi et al. (2018) Fawzi, A., Fawzi, H., and Fawzi, O. (2018). Adversarial vulnerability for any classifier. In Advances in Neural Information Processing Systems (NeurIPS).
 Gilmer et al. (2018) Gilmer, J., Metz, L., Faghri, F., Schoenholz, S. S., Raghu, M., Wattenberg, M., and Goodfellow, I. (2018). Adversarial spheres. arXiv preprint arXiv:1801.02774.
 Goodfellow et al. (2014) Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS).
 Goodfellow et al. (2015) Goodfellow, I., Shlens, J., and Szegedy, C. (2015). Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR).
 Gowal et al. (2019) Gowal, S., Dvijotham, K., Stanforth, R., Bunel, R., Qin, C., Uesato, J., Mann, T., and Kohli, P. (2019). Scalable verified training for provably robust image classification. In International Conference on Computer Vision (ICCV).
 He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition (CVPR).
 Hinton et al. (2012) Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Kingsbury, B., et al. (2012). Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29.
 Katz et al. (2017) Katz, G., Barrett, C., Dill, D. L., Julian, K., and Kochenderfer, M. J. (2017). Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification.
 LeCun et al. (1998) LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.

 Madry et al. (2018) Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR).
 Mahloujifar et al. (2019a) Mahloujifar, S., Diochnos, D. I., and Mahmoody, M. (2019a). The curse of concentration in robust learning: Evasion and poisoning attacks from concentration of measure. In AAAI Conference on Artificial Intelligence.
 Mahloujifar et al. (2019b) Mahloujifar, S., Zhang, X., Mahmoody, M., and Evans, D. (2019b). Empirically measuring concentration: Fundamental limits on intrinsic robustness. In Advances in Neural Information Processing Systems (NeurIPS).
 Mirza and Osindero (2014) Mirza, M. and Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.
 Odena et al. (2017) Odena, A., Olah, C., and Shlens, J. (2017). Conditional image synthesis with auxiliary classifier GANs. In International Conference on Machine Learning (ICML).
 Papernot et al. (2016) Papernot, N., McDaniel, P., Wu, X., Jha, S., and Swami, A. (2016). Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy.
 Raghunathan et al. (2018) Raghunathan, A., Steinhardt, J., and Liang, P. (2018). Certified defenses against adversarial examples. In International Conference on Learning Representations (ICLR).
 Shafahi et al. (2019) Shafahi, A., Huang, W. R., Studer, C., Feizi, S., and Goldstein, T. (2019). Are adversarial examples inevitable? In International Conference on Learning Representations (ICLR).
 Sinha et al. (2018) Sinha, A., Namkoong, H., and Duchi, J. (2018). Certifying some distributional robustness with principled adversarial training. In International Conference on Learning Representations (ICLR).
 Sudakov and Tsirelson (1978) Sudakov, V. N. and Tsirelson, B. S. (1978). Extremal properties of halfspaces for spherically invariant measures. Journal of Soviet Mathematics, 9(1):9–18.

 Sutskever et al. (2012) Sutskever, I., Hinton, G. E., and Krizhevsky, A. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS).
 Szegedy et al. (2014) Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2014). Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR).
 Tjeng et al. (2019) Tjeng, V., Xiao, K. Y., and Tedrake, R. (2019). Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations (ICLR).
 Tramer et al. (2020) Tramer, F., Carlini, N., Brendel, W., and Madry, A. (2020). On adaptive attacks to adversarial example defenses. arXiv preprint arXiv:2002.08347.
 Wang et al. (2018) Wang, S., Chen, Y., Abdou, A., and Jana, S. (2018). MixTrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625.
 Wang et al. (2019) Wang, Y., Ma, X., Bailey, J., Yi, J., Zhou, B., and Gu, Q. (2019). On the convergence and robustness of adversarial training. In International Conference on Machine Learning (ICML).
 Wong and Kolter (2018) Wong, E. and Kolter, Z. (2018). Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning (ICML).
 Wong et al. (2018) Wong, E., Schmidt, F., Metzen, J. H., and Kolter, J. Z. (2018). Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems (NeurIPS).
 Zhang et al. (2019) Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., and Jordan, M. (2019). Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning (ICML).