# Limitations of adversarial robustness: strong No Free Lunch Theorem

This manuscript presents some new results on adversarial robustness in machine learning, a very important yet largely open problem. We show that if conditioned on a class label the data distribution satisfies the generalized Talagrand transportation-cost inequality (for example, this condition is satisfied if the conditional distribution has density which is log-concave), any classifier can be adversarially fooled with high probability once the perturbations are slightly greater than the natural noise level in the problem. We call this result The Strong "No Free Lunch" Theorem as some recent results (Tsipras et al. 2018, Fawzi et al. 2018, etc.) on the subject can be immediately recovered as very particular cases. Our theoretical bounds are demonstrated on both simulated and real data (MNIST). These bounds readily extend to distributional robustness (with 0/1 loss). We conclude the manuscript with some speculation on possible future research directions.


## 1 Introduction

An adversarial attack operates as follows:

• A classifier is trained and deployed (e.g., the road traffic-sign recognition system on a self-driving car).

• At test / inference time, an attacker may submit queries to the classifier by sampling a real data point $x$ with true label $k$, and modifying it $x \mapsto x^{\mathrm{adv}}$ according to a prescribed threat model: for example, modifying a few pixels on a road traffic sign Su et al. (2017), or modifying the intensity of pixels by a limited amount determined by a prescribed tolerance level $\epsilon$ Tsipras et al. (2018), etc.

• The goal of the attacker is to fool the classifier into classifying $x^{\mathrm{adv}}$ as a label different from $k$.

• A robust classifier tries to limit this failure mode, at a prescribed tolerance $\epsilon$.

### 1.1 Terminology

$\mathcal{X}$ will denote the feature space and $\mathcal{Y} := \{1, 2, \ldots, K\}$ will be the set of class labels, where $K \ge 2$ is the number of classes, with $K = 2$ for binary classification.

$P$ will be the (unknown) joint probability distribution over $\mathcal{X} \times \mathcal{Y}$ of two prototypical random variables $X$ and $Y$, referred to as the features and the target variable, which take values in $\mathcal{X}$ and $\mathcal{Y}$ respectively. Random variables will be denoted by capital letters $X$, $Y$, $Z$, etc., and realizations thereof will be denoted $x$, $y$, $z$, etc. respectively.

For a given class label $k \in \mathcal{Y}$, $\mathcal{X}_k \subseteq \mathcal{X}$ will denote the set of all samples whose label is $k$ with positive probability under $P$. It is the support of the restriction of $P$ onto the plane $\mathcal{X} \times \{k\}$. This restriction is denoted $P_{X|Y=k}$ or just $P_{X|k}$, and defines the conditional distribution of the features $X$ given that the class label has the value $k$. We will assume that all the $\mathcal{X}_k$'s are finite-dimensional smooth Riemannian manifolds. This is the so-called manifold assumption, which is common in the machine learning literature. A classifier is just a mapping $h: \mathcal{X} \to \mathcal{Y}$, from features to class labels.

#### Threat models.

Let $d$ be a distance / metric on the input space $\mathcal{X}$ and $\epsilon \ge 0$ be a tolerance level. The $d$ threat model at tolerance $\epsilon$ is a scenario where the attacker is allowed to perturb any input point $x \mapsto x'$, with the constraint that $d(x', x) \le \epsilon$. When $\mathcal{X}$ is a manifold, the threat model considered will be that induced by the geodesic distance $d_{geo}$, and will be naturally referred to as the geodesic threat model.

#### Flat threat models.

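In the special case of Euclidean space $\mathcal{X} = \mathbb{R}^p$, we will always consider the distances defined for $q \in [1, \infty]$ by $d(x, z) = \|x - z\|_q$, where

$$\|a\|_q := \begin{cases}\left(\sum_{j=1}^p |a_j|^q\right)^{1/q}, & \text{if } 1 \le q < \infty,\\ \max\{|a_1|, \ldots, |a_p|\}, & \text{if } q = \infty.\end{cases} \tag{1}$$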

The $\ell_\infty$ / sup case where $q = \infty$ Tsipras et al. (2018) is particularly important: the corresponding threat model allows the adversary to separately increase or decrease each feature by an amount at most $\epsilon$. The sparse case $q = 1$ is a convex proxy for so-called “few-pixel” attacks Su et al. (2017), wherein the total number of features that can be tampered with by the adversary is limited.

#### Adversarial robustness accuracy and test error.

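The adversarial robustness accuracy of $h$ at tolerance $\epsilon$ for a class label $k \in \mathcal{Y}$ and w.r.t the $d$ threat model, denoted $\mathrm{acc}_{d,\epsilon}(h|k)$, is defined by

$$\mathrm{acc}_{d,\epsilon}(h|k) := P_{X|k}\big(h(x') = k \ \ \forall x' \in \mathrm{Ball}_{\mathcal{X}}(X; \epsilon)\big). \tag{2}$$

This is simply the probability that a sample point $x$ with true class label $k$ cannot be perturbed by an amount $\le \epsilon$, measured by the distance $d$, so that it gets misclassified by $h$. This is an adversarial version of the standard class-conditional accuracy $\mathrm{acc}(h|k) := P_{X|k}(h(X) = k)$, which corresponds to $\epsilon = 0$. The corresponding adversarial robustness error is then $\mathrm{err}_{d,\epsilon}(h|k) := 1 - \mathrm{acc}_{d,\epsilon}(h|k)$. This is the adversarial analogue of the standard notion of class-conditional generalization / test error $\mathrm{err}(h|k) := 1 - \mathrm{acc}(h|k)$, corresponding to $\epsilon = 0$.

Similarly, one defines the unconditional adversarial accuracy

$$\mathrm{acc}_\epsilon(h) := P_{(X,Y)}\big(h(x') = Y \ \ \forall x' \in \mathrm{Ball}_{\mathcal{X}}(X; \epsilon)\big), \tag{3}$$

which is an adversarial version of the standard accuracy $\mathrm{acc}(h) := P_{(X,Y)}(h(X) = Y)$. Finally, the adversarial robustness radius of $h$ on class $k$ is defined by

$$d(h|k) := \mathbb{E}_{X|k}\big[d(X, B(h,k))\big]. \tag{4}$$

This is the average distance of a sample point with true label $k$ from the set $B(h,k)$ of samples classified by $h$ as being of another label.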
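As a concrete illustration of definitions (2) and (4), the sketch below estimates $\mathrm{acc}_{d,\epsilon}(h|k)$ and $d(h|k)$ by Monte Carlo in the special case of a linear classifier on $\mathbb{R}^p$ with binary labels $k \in \{-1, +1\}$ and the $\ell_q$ threat model. It relies on the standard fact that the $\ell_q$ distance from $x$ to the hyperplane $\{w^T x + b = 0\}$ is $|w^T x + b| / \|w\|_{q^*}$, where $q^*$ is the Hölder conjugate of $q$. The linear model and the function names are assumptions made for illustration, not part of the manuscript.

```python
import numpy as np


def dual_exponent(q):
    """Hölder conjugate q* of q, i.e. 1/q + 1/q* = 1 (with inf <-> 1)."""
    if q == np.inf:
        return 1.0
    if q == 1:
        return np.inf
    return q / (q - 1.0)


def linear_adv_metrics(X, k, w, b, eps, q):
    """Monte Carlo estimates of acc_{d,eps}(h|k) and d(h|k) for the (hypothetical) linear
    classifier h(x) = +1 if w.x + b > 0 else -1, where d is the l_q distance.

    X : (n, p) array of samples from P_{X|k}; k in {-1, +1} is their true label.
    """
    margin = k * (X @ w + b)  # > 0 iff h labels x correctly
    dual_norm = np.linalg.norm(w, ord=dual_exponent(q))
    # distance from x to the error set B(h, k): 0 if x is already misclassified,
    # otherwise the l_q distance to the decision hyperplane
    dist = np.maximum(margin, 0.0) / dual_norm
    # x is eps-robust iff no point of its closed eps-ball is misclassified
    # (boundary ties have probability zero for a continuous P_{X|k})
    acc_eps = float(np.mean(dist > eps))
    radius = float(np.mean(dist))  # estimate of d(h|k) = E[d(X, B(h, k))]
    return acc_eps, radius


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 100
    X = 0.1 + rng.standard_normal((10_000, p))  # illustrative class k = +1 samples
    w = np.ones(p) / p
    for eps in (0.0, 0.01, 0.05):
        print(eps, linear_adv_metrics(X, 1, w, 0.0, eps, q=np.inf))
```

Setting `eps = 0` recovers the ordinary class-conditional accuracy, as noted after (2).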

### 1.2 Highlight of main contributions

In this manuscript, we prove that under some “curvature conditions” (to be made precise later) on the conditional density of the data, it holds that

• For geodesic / faithful attacks:

• Every classifier can be adversarially fooled with high probability by moving sample points an amount $\epsilon = O(\sigma_k)$ along the data manifold, where $\sigma_k$ is the “natural noise level” in the data points with class label $k$.

• Moreover, the average distance of a sample point of true label $k$ to the error set is upper-bounded:

$$d(h|k) \le \epsilon(h|k) + \sigma_k\sqrt{\frac{\pi}{2}} = O(\sigma_k).$$

• For attacks in flat space $\mathbb{R}^p$:

• In particular, if the data points live in $\mathbb{R}^p$ (where $p$ is the number of features), then every classifier can be adversarially fooled with high probability by changing each feature by an amount $\epsilon = O(\sigma_k/\sqrt{p})$, or more precisely, once

$$\epsilon \ge \epsilon_\infty(h|k) := \sigma_k\sqrt{\frac{2\log(1/\mathrm{err}(h|k))}{p}} = O(\sigma_k/\sqrt{p}).$$

• Moreover, we have the bound

$$d(h|k) \le \epsilon_\infty(h|k) + \frac{\sigma_k}{\sqrt{p}}\sqrt{\frac{\pi}{2}} = O(\sigma_k/\sqrt{p}).$$

We call this result The Strong “No Free Lunch” Theorem as some recent results (e.g., Tsipras et al. (2018), Fawzi et al. (2018a), Gilmer et al. (2018b), etc.) on the subject can be immediately recovered as very particular cases. Thus adversarial (non-)robustness should really be thought of as a measure of complexity of a problem. A similar remark has been recently made in Bubeck et al. (2018).

The sufficient “curvature conditions” alluded to above imply concentration of measure phenomena, which in turn imply our impossibility bounds. These conditions are satisfied in a large number of situations, including cases where the class-conditional data manifold $\mathcal{X}_k$ is a compact homogeneous Riemannian manifold; the class-conditional data distribution $P_{X|k}$ is supported on a smooth manifold and has a log-concave density w.r.t the curvature of the manifold, or the manifold is compact; $P_{X|k}$ is the pushforward, via a Lipschitz continuous map, of another distribution which verifies these curvature conditions; etc.

###### Remark 1.

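By the properties of expectation and conditioning, it holds that $\min_k \mathrm{acc}_\epsilon(h|k) \le \mathrm{acc}_\epsilon(h) = \sum_k \pi_k\,\mathrm{acc}_\epsilon(h|k) \le \max_k \mathrm{acc}_\epsilon(h|k)$, where $\pi_k := P(Y = k)$. Thus, bounds on the $\mathrm{acc}_\epsilon(h|k)$'s imply bounds on $\mathrm{acc}_\epsilon(h)$.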

### 1.3 High-level overview of the manuscript

In section 1.4, we start off by presenting a simple motivating classification problem from Tsipras et al. (2018), which, as shown by the authors, already exhibits the “No Free Lunch” issue. In section 2.1 we present some notions from geometric probability theory which will be relevant for our work, especially Talagrand’s transportation-cost inequality and Marton’s blowup Lemma. Then in section 2.3, we present the main result of this manuscript, namely, that on a rich set of distributions no classifier can be robust even to modest perturbations (comparable to the natural noise level in the problem). This generalizes the results of Tsipras et al. (2018), Gilmer et al. (2018b) and, to some extent, Fawzi et al. (2018a). Section 2.5 extends the results to distributional robustness, a much more difficult setting. All proofs are presented in Appendix A. An in-depth presentation of related works is given in section 3.

Section 4 presents experiments on both simulated and real data that confirm our theoretical results. Finally, section 5 concludes the manuscript with possible future research directions.

### 1.4 A toy example illustrating the fundamental issue

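To motivate things, consider the following "toy" problem from Tsipras et al. (2018), which consists of classifying a target $Y$, uniformly distributed on $\{\pm 1\}$, based on explanatory variables $X := (X_1, X_2, \ldots, X_p) \in \{\pm 1\} \times \mathbb{R}^{p-1}$ given by

$$X_1 \mid Y = \begin{cases} +Y, & \text{w.p. } 70\%,\\ -Y, & \text{w.p. } 30\%,\end{cases}$$

and $X_j \mid Y \sim \mathcal{N}(\eta Y, 1)$ for $j = 2, \ldots, p$, where $\eta$ is a fixed scalar which (as we will see) controls the difficulty of the problem. Now, as was shown in Tsipras et al. (2018), the above problem can be solved with generalization accuracy $\approx 100\%$, but the "champion" estimator can also be fooled, perfectly! Indeed, the linear estimator given by $h_{avg}(x) := \mathrm{sign}(w^T x)$ with $w := (0, 1/(p-1), \ldots, 1/(p-1)) \in \mathbb{R}^p$, where we allow $\ell_\infty$-perturbations of maximum size $\epsilon \ge 2\eta$, has the aforementioned properties. Indeed,

$$\begin{aligned}\mathrm{acc}(h_{avg}) &:= P_{(X,Y)}(h_{avg}(X) = Y) = P(Y w^T X \ge 0) = P_Y\Big(\frac{Y}{p-1}\sum_{j \ge 2}\mathcal{N}(\eta Y, 1) \ge 0\Big)\\ &= P\big(\mathcal{N}(\eta, 1/(p-1)) \ge 0\big) = P\big(\mathcal{N}(0, 1/(p-1)) \ge -\eta\big) = P\big(\mathcal{N}(0, 1/(p-1)) \le \eta\big) \ge 1 - e^{-(p-1)\eta^2/2},\end{aligned}$$

which is $\ge 1 - \delta$ if $\eta \ge \sqrt{2\log(1/\delta)/(p-1)}$. Likewise, for $\epsilon \ge \eta$, the adversarial robustness accuracy of $h_{avg}$ writes

$$\begin{aligned}\mathrm{acc}_\epsilon(h_{avg}) &:= P_{(X,Y)}\big(Y h_{avg}(X + \Delta x) \ge 0 \ \ \forall \|\Delta x\|_\infty \le \epsilon\big) = P_{(X,Y)}\Big(\inf_{\|\Delta x\|_\infty \le \epsilon} Y w^T (X + \Delta x) \ge 0\Big)\\ &= P_{(X,Y)}\big(Y w^T X - \epsilon\|Y w\|_1 \ge 0\big) = P_{(X,Y)}\big(Y w^T X - \epsilon \ge 0\big) = P\big(\mathcal{N}(0, 1/(p-1)) \ge \epsilon - \eta\big) \le e^{-(p-1)(\epsilon - \eta)^2/2},\end{aligned}$$

where we used $\|Y w\|_1 = \|w\|_1 = 1$. Thus, for $\eta \ge \sqrt{2\log(1/\delta)/(p-1)}$ and $\epsilon \ge 2\eta$, one has $\mathrm{acc}_\epsilon(h_{avg}) \le e^{-(p-1)\eta^2/2} \le \delta$.

By the way, we note that an optimal adversarial attack can be done by taking $\Delta x_1 = 0$ and $\Delta x_j = -\epsilon Y$ for all $j = 2, \ldots, p$.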
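These closed-form computations can be sanity-checked by simulation. The sketch below is a minimal numpy illustration with hypothetical helper names: it samples the toy problem, evaluates $h_{avg}$ on clean inputs, and then applies the attack $\Delta x_1 = 0$, $\Delta x_j = -\epsilon Y$. Taking $\eta = \sqrt{2\log(1/\delta)/(p-1)}$ and $\epsilon = 2\eta$, the clean accuracy should be at least about $1 - \delta$ and the adversarial accuracy at most about $\delta$, up to Monte Carlo error.

```python
import numpy as np


def sample_toy(n, p, eta, rng):
    """Draw n samples (X, Y) from the toy problem of section 1.4."""
    y = rng.choice([-1.0, 1.0], size=n)
    # X_1 = +Y with probability 70%, -Y with probability 30%
    x1 = np.where(rng.random(n) < 0.7, y, -y)
    # X_j | Y ~ N(eta * Y, 1) for j = 2, ..., p
    rest = eta * y[:, None] + rng.standard_normal((n, p - 1))
    return np.column_stack([x1, rest]), y


def h_avg_weights(p):
    """w = (0, 1/(p-1), ..., 1/(p-1)), so that ||w||_1 = 1."""
    return np.concatenate([[0.0], np.full(p - 1, 1.0 / (p - 1))])


def optimal_attack(X, y, eps):
    """Keep X_1 unchanged and move every other feature by -eps * Y."""
    X_adv = X.copy()
    X_adv[:, 1:] -= eps * y[:, None]
    return X_adv


def accuracy(X, y, w):
    """Fraction of samples with Y * w.X >= 0, i.e. the accuracy of h_avg as in the text."""
    return float(np.mean(y * (X @ w) >= 0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, delta = 401, 0.01
    eta = np.sqrt(2 * np.log(1 / delta) / (p - 1))
    X, y = sample_toy(5000, p, eta, rng)
    w = h_avg_weights(p)
    print("clean accuracy:", accuracy(X, y, w))
    print("adversarial accuracy at eps = 2 * eta:", accuracy(optimal_attack(X, y, 2 * eta), y, w))
```

Because $\|w\|_1 = 1$, the attack lowers every margin $Y w^T X$ by exactly $\epsilon$, which is why it attains the infimum in the computation above.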

#### An autopsy of what is going on.

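Recall that the entropy of a univariate Gaussian $\mathcal{N}(\mu, \sigma^2)$ is $\ln(\sqrt{2\pi e}\,\sigma)$ nats. Now, for $j = 2, 3, \ldots, p$, the distribution of feature $X_j$ is the Gaussian mixture $\frac{1}{2}\sum_{y = \pm 1}\mathcal{N}(\eta y, 1)$, and so one computes the mutual information between $X_j$ and the class label $Y$ as

$$\begin{aligned}MI(X_j; Y) &:= \mathrm{Ent}(X_j) - \mathrm{Ent}(X_j \mid Y) = \mathrm{Ent}\Big(\frac{1}{2}\sum_{y = \pm 1}\mathcal{N}(\eta y, 1)\Big) - \frac{1}{2}\sum_{y = \pm 1}\mathrm{Ent}\big(\mathcal{N}(\eta y, 1)\big)\\ &= \ln(\sqrt{2\pi e}) + \eta^2 - r - 2 \cdot \frac{1}{2}\ln(\sqrt{2\pi e}) = \eta^2 - r \le \eta^2,\end{aligned}$$

where (see Michalowicz et al. (2008) for the details)

$$r := \frac{2}{\sqrt{2\pi}\,\eta}e^{-\eta^2/2}\int_0^\infty e^{-\frac{z^2}{2\eta^2}}\cosh(z)\ln(\cosh(z))\,dz \ge 0.$$

Thus $MI(X_j; Y) \le \eta^2$. Since $\eta$ is small (of order $1/\sqrt{p}$ in the regime where $h_{avg}$ is accurate but not robust), we conclude that these features barely share any information with the target variable $Y$. Indeed, Tsipras et al. (2018) showed improved robustness on the above problem, with feature selection based on mutual information.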
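The identity $MI(X_j; Y) = \eta^2 - r$ can be checked numerically. The sketch below (numpy only; the helper names are hypothetical) computes the mutual information in two independent ways: directly, as the entropy of the mixture minus the entropy of $\mathcal{N}(0, 1)$, and through the correction term $r$ of Michalowicz et al. (2008). Both integrals use a plain trapezoidal rule on a truncated grid, which is an assumption of the sketch; the two values should agree up to quadrature error and lie below both $\eta^2$ and $\ln 2$ (the entropy of the binary label).

```python
import numpy as np


def trapezoid(y, x):
    """Plain trapezoidal rule (avoids depending on np.trapz vs np.trapezoid naming)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def log_cosh(z):
    """Numerically stable log(cosh(z))."""
    z = np.abs(z)
    return z + np.log1p(np.exp(-2.0 * z)) - np.log(2.0)


def mi_via_r(eta, n=200_001):
    """MI(X_j; Y) = eta^2 - r, with r given by the closed-form integral in the text."""
    # the integrand peaks near z = eta^2 with width ~eta, so truncate well beyond that
    z = np.linspace(0.0, eta**2 + 12.0 * eta + 1.0, n)
    # cosh(z) * exp(-z^2 / (2 eta^2)), written in exponent form to avoid overflow
    weight = 0.5 * (np.exp(z - z**2 / (2 * eta**2)) + np.exp(-z - z**2 / (2 * eta**2)))
    integral = trapezoid(weight * log_cosh(z), z)
    r = 2.0 / (np.sqrt(2 * np.pi) * eta) * np.exp(-eta**2 / 2) * integral
    return eta**2 - r


def mi_direct(eta, n=400_001):
    """MI(X_j; Y) = Ent(mixture) - Ent(N(0, 1)), all entropies in nats."""
    x = np.linspace(-eta - 12.0, eta + 12.0, n)

    def phi(t):
        return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

    f = 0.5 * (phi(x - eta) + phi(x + eta))  # density of X_j
    ent_mix = -trapezoid(f * np.log(f), x)
    return ent_mix - 0.5 * np.log(2 * np.pi * np.e)


if __name__ == "__main__":
    for eta in (0.1, 0.5, 1.0):
        print(eta, mi_direct(eta), mi_via_r(eta), eta**2)
```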

#### Basic “No Free Lunch” Theorem.

Reading the information calculations above, a skeptic could point out that the underlying issue here is that the estimator $h_{avg}$ over-exploits the fragile / non-robust variables $X_2, \ldots, X_p$ to boost ordinary generalization accuracy, at the expense of adversarial robustness. However, it was rigorously shown in Tsipras et al. (2018) that on this particular problem, every estimator is vulnerable. Precisely, the authors proved the following basic “No Free Lunch” theorem.

###### Theorem 1 (Basic No Free Lunch,  Tsipras et al. (2018)).

For the problem above, any estimator which has ordinary accuracy at least $1 - \delta$ must have adversarial robustness accuracy at most $7\delta/3$ against $\ell_\infty$-perturbations of maximum size $\epsilon \ge 2\eta$.

## 2 Strong “No Free Lunch” Theorem for adversarial robustness

### 2.1 Terminology and background

#### Blowups and sample point robustness radius.

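The $\epsilon$-blowup (aka $\epsilon$-fattening, aka $\epsilon$-enlargement) of a subset $B$ of a metric space $\mathcal{X} = (\mathcal{X}, d)$, denoted $B^\epsilon_{\mathcal{X}}$, is defined by $B^\epsilon_{\mathcal{X}} := \{x \in \mathcal{X} \mid d(x, B) \le \epsilon\}$, where $d(x, B) := \inf\{d(x, y) \mid y \in B\}$ is the distance of $x$ from $B$. Note that $B^\epsilon_{\mathcal{X}}$ is an increasing function of both $B$ and $\epsilon$; that is, if $A \subseteq B \subseteq \mathcal{X}$ and $0 \le \epsilon_1 \le \epsilon_2$, then $A \subseteq A^{\epsilon_1}_{\mathcal{X}} \subseteq B^{\epsilon_1}_{\mathcal{X}} \subseteq B^{\epsilon_2}_{\mathcal{X}}$. In particular, $B \subseteq B^0_{\mathcal{X}}$ and $B^\infty_{\mathcal{X}} = \mathcal{X}$. Also observe that each $B^\epsilon_{\mathcal{X}}$ can be rewritten in the form

$$B^\epsilon_{\mathcal{X}} = \bigcup_{x \in B}\mathrm{Ball}_{\mathcal{X}}(x; \epsilon), \tag{5}$$

where $\mathrm{Ball}_{\mathcal{X}}(x; \epsilon) := \{x' \in \mathcal{X} \mid d(x', x) \le \epsilon\}$ is the closed ball in $\mathcal{X}$ with center $x$ and radius $\epsilon$. Refer to Fig. 1.

In a bid to simplify notation, when there is no confusion about the underlying metric space, we will simply write $B^\epsilon$ for $B^\epsilon_{\mathcal{X}}$. When there is no confusion about the underlying set but not the metric thereupon, we will write $B^\epsilon_d$. For example, in the metric space $(\mathbb{R}^p, \ell_q)$, we will write $B^\epsilon_{\ell_q}$ instead of $B^\epsilon_{(\mathbb{R}^p, \ell_q)}$ for the $\epsilon$-blowup of $B$.

An example which will be central to us is when $h: \mathcal{X} \to \mathcal{Y}$ is a classifier, $k \in \mathcal{Y}$ is a class label, and we take $B$ to be the “bad set” $B(h,k) \subseteq \mathcal{X}$ of inputs which are assigned a label different from $k$, i.e.

$$B(h,k) := \{x \in \mathcal{X} \mid h(x) \ne k\} = \bigcup_{k' \ne k} h^{-1}(k'). \tag{6}$$

$B(h,k)^\epsilon$ is then nothing but the event that there is a data point with a “bad $\epsilon$-neighbor”, i.e. the example can be misclassified by applying a small perturbation of size $\le \epsilon$. This interpretation of blowups will be central in the sequel, and we will be concerned with lower-bounding the probability of the event $X \in B(h,k)^\epsilon$ under the conditional measure $P_{X|k}$. This is the proportion of points $x \in \mathcal{X}$ with true class label $k$ such that $h$ assigns a label $\ne k$ to some $\epsilon$-neighbor $x' \in \mathcal{X}$ of $x$. Alternatively, one could study the local robustness radii $r_h(x, k) := \inf\{d(x', x) \mid x' \in \mathcal{X},\ h(x') \ne k\} = d(x, B(h,k))$, for $(x, k) \sim P$, as was done in Fawzi et al. (2018a), albeit for a very specific problem setting (generative models with Gaussian noise). Indeed, $r_h(x, k) \le \epsilon$ iff $x \in B(h,k)^\epsilon$. More on this in section 3.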

### 2.2 Measure concentration on metric spaces

For our main results, we will need some classical inequalities from optimal transport theory, mainly the Talagrand transportation-cost inequality and Marton’s blowup inequality (see definitions below). Let $\mu$ be a probability distribution on a metric space $\mathcal{X} = (\mathcal{X}, d)$ and let $c > 0$.

###### Definition 1 ($T_2(c)$ property, a.k.a. Talagrand $W_2$ transportation-cost inequality).

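$\mu$ is said to satisfy $T_2(c)$ if for every other distribution $\nu$ on $\mathcal{X}$ which is absolutely continuous w.r.t $\mu$ (written $\nu \ll \mu$), one has

$$W_2(\nu, \mu) \le \sqrt{2c\,\mathrm{kl}(\nu \| \mu)}, \tag{7}$$

where for $s \ge 1$, $W_s(\nu, \mu)$ is the Wasserstein $s$-distance between $\nu$ and $\mu$ defined by

$$W_s(\nu, \mu) := \Big(\inf_{\mathrm{law}(X') = \nu,\ \mathrm{law}(X) = \mu}\mathbb{E}\big[d_{\mathcal{X}}(X', X)^s\big]\Big)^{1/s}. \tag{8}$$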

Note that if $1 \le s \le s'$, then $W_s \le W_{s'}$. The inequality (7) in the above definition is a generalization of the well-known Pinsker’s inequality for the total variation distance between probability measures. Unlike Pinsker’s inequality, which holds unconditionally, (7) is a privilege only enjoyed by special classes of reference distributions $\mu$. These include: log-concave distributions on manifolds (e.g. multivariate Gaussians), distributions on compact homogeneous manifolds (e.g. hyper-spheres), pushforwards of distributions that satisfy some $T_2$ inequality, etc. In section 2.4, these classes of distributions will be discussed in detail as sufficient conditions for our impossibility theorems.
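In one dimension everything in (7) is explicit, which gives a quick sanity check. For $\mu = \mathcal{N}(0, \sigma^2)$ and $\nu = \mathcal{N}(m, s^2)$, one has $W_2(\nu, \mu)^2 = m^2 + (s - \sigma)^2$ and $\mathrm{kl}(\nu\|\mu) = \ln(\sigma/s) + (s^2 + m^2)/(2\sigma^2) - 1/2$, and (7) with $c = \sigma^2$ reduces to the elementary inequality $\ln t \le t - 1$ with $t = s/\sigma$, with equality for pure translations ($s = \sigma$). The numpy sketch below (illustrative names) evaluates the slack of (7) on such pairs.

```python
import numpy as np


def w2_gauss_1d(m1, s1, m2, s2):
    """Closed-form Wasserstein-2 distance between N(m1, s1^2) and N(m2, s2^2)."""
    return np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def kl_gauss_1d(m, s, sigma):
    """KL( N(m, s^2) || N(0, sigma^2) ) in nats."""
    return np.log(sigma / s) + (s**2 + m**2) / (2 * sigma**2) - 0.5


def t2_slack(m, s, sigma):
    """sqrt(2 c KL) - W2 with c = sigma^2; it is >= 0 exactly when (7) holds for this pair."""
    kl = np.maximum(kl_gauss_1d(m, s, sigma), 0.0)  # guard against tiny negative rounding
    return np.sqrt(2 * sigma**2 * kl) - w2_gauss_1d(m, s, 0.0, sigma)


if __name__ == "__main__":
    print(t2_slack(1.5, 2.0, 2.0))  # pure translation: slack is (numerically) zero
    print(t2_slack(0.0, 0.5, 2.0))  # change of scale: strictly positive slack
```

The equality case for translations shows that the constant $c = \sigma^2$ cannot be improved for Gaussians.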

###### Definition 2 (Blowup(c) property).

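$\mu$ is said to satisfy BLOWUP($c$) if for every Borel $B \subseteq \mathcal{X}$ with $\mu(B) > 0$ and for every $\epsilon \ge \sqrt{2c\log(1/\mu(B))}$, it holds that

$$\mu(B^\epsilon) \ge 1 - e^{-\frac{1}{2c}\left(\epsilon - \sqrt{2c\log(1/\mu(B))}\right)^2}. \tag{9}$$

It is a classical result that the Gaussian distribution $\mathcal{N}(0, \sigma^2 I_p)$ on $\mathbb{R}^p$ has BLOWUP($\sigma^2$) and $T_2(\sigma^2)$, a phenomenon known as Gaussian isoperimetry. These results date back at least to the works of E. Borel, P. Lévy, M. Talagrand and K. Marton Boucheron et al. (2013).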
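For a Gaussian measure, the blowup of a half-space is again a half-space, so (9) can be compared against an exact value. The stdlib-only sketch below takes $\mu = \mathcal{N}(0, \sigma^2 I_p)$ and $B = \{x : x_1 \le a\}$, so that only the first coordinate matters and, in the Euclidean metric, $\mu(B^\epsilon) = \Phi((a + \epsilon)/\sigma)$; it then evaluates the right-hand side of (9) with $c = \sigma^2$. The half-space choice and helper names are ours, for illustration only.

```python
import math
from statistics import NormalDist


def blowup_lower_bound(mu_B, eps, c):
    """Right-hand side of (9); returns None when eps is below the threshold."""
    threshold = math.sqrt(2 * c * math.log(1 / mu_B))
    if eps < threshold:
        return None
    return 1 - math.exp(-((eps - threshold) ** 2) / (2 * c))


def halfspace_check(a, eps, sigma):
    """Exact mu(B^eps) for B = {x : x_1 <= a} under N(0, sigma^2 I_p), and (9) with c = sigma^2."""
    first_coord = NormalDist(0.0, sigma)  # law of x_1
    mu_B = first_coord.cdf(a)
    exact = first_coord.cdf(a + eps)  # the eps-blowup of a half-space is a shifted half-space
    return exact, blowup_lower_bound(mu_B, eps, sigma ** 2)


if __name__ == "__main__":
    for eps in (1.0, 2.0, 3.0, 4.0):
        print(eps, halfspace_check(-1.0, eps, 1.0))
```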

The following lemma is the most important tool we will use to derive our bounds.

###### Lemma 1 (Marton’s Blowup lemma).

On a fixed metric space, it holds that $T_2(c) \implies \text{BLOWUP}(c)$.

###### Proof.

The proof is classical, and is a variation of original arguments by Marton. We provide it in Appendix A, for the sake of completeness. ∎

### 2.3 Strong “No Free Lunch” Theorem

We are now ready to present the main results of this manuscript.

###### Theorem 2 (Strong “No Free Lunch” on curved space).

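Suppose that for some $\sigma_k > 0$, $P_{X|k}$ has the $T_2(\sigma_k^2)$ property on the conditional manifold $\mathcal{X}_k$. Given a classifier $h: \mathcal{X} \to \mathcal{Y}$ for which $\mathrm{acc}(h|k) < 1$ (i.e. the classifier is not perfect on the class $k$), define

$$\epsilon(h|k) := \sigma_k\sqrt{2\log(1/\mathrm{err}(h|k))} = O(\sigma_k). \tag{10}$$

Then for the geodesic threat model, we have

• Bound on adversarial robustness accuracy:

$$\mathrm{acc}_\epsilon(h|k) \le \min\Big(\mathrm{acc}(h|k),\ e^{-\frac{1}{2\sigma_k^2}(\epsilon - \epsilon(h|k))_+^2}\Big). \tag{11}$$

Furthermore, if $\epsilon \ge 2\epsilon(h|k)$, then

$$\mathrm{acc}_\epsilon(h|k) \le \min\big(\mathrm{acc}(h|k), \mathrm{err}(h|k)\big). \tag{12}$$

• Bound on average distance to error set:

$$d(h|k) \le \sigma_k\Big(\sqrt{2\log(1/\mathrm{err}(h|k))} + \sqrt{\frac{\pi}{2}}\Big). \tag{13}$$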
###### Proof.

The main idea is to invoke Lemma 1, and then apply the bound (9) with $B = B(h,k)$, $\mu = P_{X|k}$, and $c = \sigma_k^2$. See Appendix A for details. ∎

In the particular case of attacks happening in Euclidean space (this is the default setting in the literature), the above theorem has the following corollary.

###### Corollary 1 (Strong “No Free Lunch” Theorem on flat space).

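Let $q \in [1, \infty]$. If in addition to the assumptions of Theorem 2 the conditional data manifold $\mathcal{X}_k$ is flat, i.e. $\mathcal{X}_k = \mathbb{R}^p$, then for the $\ell_q$ threat model

• Bound on adversarial robustness accuracy:

$$\mathrm{acc}_\epsilon(h|k) \le \min\Big(\mathrm{acc}(h|k),\ e^{-\frac{p^{1-2/q}}{2\sigma_k^2}(\epsilon - \epsilon_q(h|k))_+^2}\Big), \tag{14}$$

where $\epsilon_q(h|k) := \epsilon(h|k)/p^{1/2 - 1/q}$. Furthermore, if $\epsilon \ge 2\epsilon_q(h|k)$, then

$$\mathrm{acc}_\epsilon(h|k) \le \min\big(\mathrm{acc}(h|k), \mathrm{err}(h|k)\big). \tag{15}$$

• Bound on average distance to error set:

$$d(h|k) \le \frac{\sigma_k}{p^{1/2 - 1/q}}\Big(\sqrt{2\log(1/\mathrm{err}(h|k))} + \sqrt{\frac{\pi}{2}}\Big). \tag{16}$$

In particular, for the $\ell_\infty$ threat model, we have

• Bound on adversarial robustness accuracy:

$$\mathrm{acc}_\epsilon(h|k) \le \min\Big(\mathrm{acc}(h|k),\ e^{-\frac{p}{2\sigma_k^2}(\epsilon - \epsilon(h|k)/\sqrt{p})_+^2}\Big). \tag{17}$$

Furthermore, if $\epsilon \ge 2\epsilon(h|k)/\sqrt{p}$, then

$$\mathrm{acc}_\epsilon(h|k) \le \min\big(\mathrm{acc}(h|k), \mathrm{err}(h|k)\big). \tag{18}$$

• Bound on average distance to error set:

$$d(h|k) \le \frac{\sigma_k}{\sqrt{p}}\Big(\sqrt{2\log(1/\mathrm{err}(h|k))} + \sqrt{\frac{\pi}{2}}\Big). \tag{19}$$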
###### Proof.

See Appendix A. ∎

#### Making sense of the theorems.

Fig. 2 gives an instructive illustration of the bounds in the above theorems. For perfect classifiers, the test error $\mathrm{err}(h|k)$ is zero and so the factor $\sqrt{2\log(1/\mathrm{err}(h|k))}$ appearing in the definitions of $\epsilon(h|k)$ and $\epsilon_q(h|k)$ is $\infty$; otherwise, this classifier-specific factor grows only very slowly (the log function grows very slowly) as $\mathrm{acc}(h|k)$ increases towards the perfect limit where $\mathrm{acc}(h|k) = 1$. As predicted by Corollary 1, we observe in Fig. 2 that beyond the critical value $\epsilon = \epsilon_\infty(h|k) := \epsilon(h|k)/\sqrt{p}$, the adversarial accuracy $\mathrm{acc}_\epsilon(h|k)$ decays at a Gaussian rate, and eventually passes below the horizontal line $\mathrm{err}(h|k)$ as soon as $\epsilon \ge 2\epsilon_\infty(h|k)$.

Comparing to the Gaussian special case below, we see that the curvature parameter $\sigma_k$ appearing in the theorems is an analogue of the natural noise level in the problem. The flat case $\mathcal{X}_k = \mathbb{R}^p$ with an $\ell_\infty$ threat model is particularly instructive. The critical values of $\epsilon$, namely $\epsilon_\infty(h|k)$ and $2\epsilon_\infty(h|k)$, beyond which the compromising conclusions of Corollary 1 come into play, are proportional to $\sigma_k/\sqrt{p}$.

Finally, note that the threat model corresponding to $q = 1$ in Corollary 1 is a convex proxy for the “few-pixel” threat model which was investigated in Su et al. (2017).
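The critical radii discussed above are cheap to compute. The stdlib-only sketch below evaluates $\epsilon(h|k)$ from (10), $\epsilon_\infty(h|k) = \epsilon(h|k)/\sqrt{p}$, and the right-hand side of (17) for a given class-conditional test error. The numbers in the usage part ($p = 784$, i.e. a $28 \times 28$ image, $\sigma_k = 1$, $\mathrm{err}(h|k) = 0.02$) are purely illustrative and are not estimates for any real dataset.

```python
import math


def critical_radii(err, sigma, p):
    """Return (eps(h|k), eps_inf(h|k)) from (10) and Corollary 1, for 0 < err(h|k) < 1."""
    eps_geo = sigma * math.sqrt(2 * math.log(1 / err))
    return eps_geo, eps_geo / math.sqrt(p)


def linf_accuracy_bound(eps, err, sigma, p):
    """Right-hand side of (17): an upper bound on acc_eps(h|k) for the l_inf threat model."""
    _, eps_inf = critical_radii(err, sigma, p)
    gap = max(eps - eps_inf, 0.0)  # the positive part (.)_+ in (17)
    return min(1 - err, math.exp(-p * gap ** 2 / (2 * sigma ** 2)))


if __name__ == "__main__":
    err, sigma, p = 0.02, 1.0, 784  # illustrative values only
    _, eps_inf = critical_radii(err, sigma, p)
    for factor in (0.5, 1.0, 1.5, 2.0, 3.0):
        eps = factor * eps_inf
        print(f"eps = {eps:.4f}: acc_eps(h|k) <= {linf_accuracy_bound(eps, err, sigma, p):.4f}")
```

At $\epsilon = 2\epsilon_\infty(h|k)$ the exponential term equals exactly $\mathrm{err}(h|k)$, which is the content of (18).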

### 2.4 Some applications of the bounds

It turns out that the general “No Free Lunch” Theorem 2 and Corollary 1 apply to a broad range of problems. We discuss some of them hereunder.

#### Conditional log-concave data distributions on manifolds.

Consider a conditional data model of the form $P_{X|k} \propto e^{-v_k(x)}$ supported on a complete $p$-dimensional smooth Riemannian manifold $\mathcal{X}_k$ satisfying the Bakry-Émery curvature condition Bakry and Émery (1985)

$$\mathrm{Hess}_x(v_k) + \mathrm{Ric}_x(\mathcal{X}_k) \succeq (1/\sigma_k^2)\, I_p, \tag{20}$$

for some $\sigma_k > 0$. Such a distribution is called log-concave. By (Otto and Villani, 2000, Corollary 1.1) and (Bobkov and Goetze, 1999, Corollary 3.2), $P_{X|k}$ has the $T_2(\sigma_k^2)$ property and therefore, by Lemma 1, the BLOWUP($\sigma_k^2$) property, so Theorem 2 (and Corollary 1 for flat space) applies.

#### Elliptical Gaussian conditional data distributions.

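Consider the flat manifold $\mathcal{X}_k = \mathbb{R}^p$ and a multivariate Gaussian distribution $P_{X|k} \propto e^{-v_k(x)}$ thereupon, where $v_k(x) = \frac{1}{2}(x - m_k)^T\Sigma_k^{-1}(x - m_k)$, for some vector $m_k \in \mathbb{R}^p$ (called the mean) and positive-definite matrix $\Sigma_k$ (called the covariance matrix) all of whose eigenvalues are $\le \sigma_k^2$. A direct computation gives $\mathrm{Hess}_x(v_k) + \mathrm{Ric}_x(\mathbb{R}^p) = \Sigma_k^{-1} + 0 \succeq (1/\sigma_k^2)\, I_p$ for all $x \in \mathbb{R}^p$. So this is an instance of the above log-concave example, and so the same bounds hold. Thus we get an elliptical version (and therefore a strict generalization) of the basic “No Free Lunch” theorem in Tsipras et al. (2018), with exactly the same constants in the bounds.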
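For the elliptical Gaussian case, condition (20) reduces to $\Sigma_k^{-1} \succeq I_p/\sigma_k^2$, so the smallest admissible noise level is $\sigma_k = \sqrt{\lambda_{\max}(\Sigma_k)}$. The small numpy sketch below (function names are hypothetical) computes this $\sigma_k$ and checks the matrix inequality directly.

```python
import numpy as np


def gaussian_noise_level(cov):
    """Smallest sigma_k with Sigma_k^{-1} >= I / sigma_k^2, i.e. sigma_k = sqrt(lambda_max(Sigma_k))."""
    cov = np.asarray(cov, dtype=float)
    eigvals = np.linalg.eigvalsh(cov)  # eigenvalues of a symmetric matrix, in ascending order
    if eigvals[0] <= 0:
        raise ValueError("the covariance matrix must be positive-definite")
    return float(np.sqrt(eigvals[-1]))


def curvature_condition_holds(cov, sigma, tol=1e-10):
    """Check (20) on flat space (Ric = 0): Sigma_k^{-1} - I / sigma^2 must be positive semi-definite."""
    cov = np.asarray(cov, dtype=float)
    gap = np.linalg.inv(cov) - np.eye(cov.shape[0]) / sigma ** 2
    return bool(np.linalg.eigvalsh(gap)[0] >= -tol)


if __name__ == "__main__":
    cov = np.array([[4.0, 0.0], [0.0, 1.0]])
    s = gaussian_noise_level(cov)
    print(s, curvature_condition_holds(cov, s), curvature_condition_holds(cov, 1.5))
```

In words, the largest-variance direction of $\Sigma_k$ alone sets the noise level entering $\epsilon(h|k)$.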

#### Perturbed log-concave distributions.

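The Holley-Stroock perturbation theorem ensures that if $P_{X|k} \propto e^{-v_k(x) - u_k(x)}$, where $v_k$ satisfies the curvature condition (20) and $u_k$ is bounded, then Theorem 2 (and Corollary 1 for flat space) still holds, with the noise parameter $\sigma_k$ degraded to a larger value $\tilde{\sigma}_k$ which depends only on $\sigma_k$ and on the oscillation $\mathrm{osc}(u_k) := \sup_x u_k(x) - \inf_x u_k(x)$.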

#### Distributions on compact homogeneous manifolds.

By Rothaus (1998), such distributions satisfy log-Sobolev inequalities (LSI), which imply $T_2(c)$. The constant $c$ can be taken to be any positive scalar less than the hyper-contractivity constant of the manifold. A prime example of a compact homogeneous manifold is a hyper-sphere of radius $R > 0$. For this example, with the sphere of radius $R$ sitting in $\mathbb{R}^p$, one can take $c = R^2/(p-1)$. The “concentric spheres” dataset considered in Gilmer et al. (2018b) is an instance (more on this in section 3).
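To get a feel for how quickly measure concentrates on a hyper-sphere, the Monte Carlo sketch below (numpy, hypothetical helper names) blows up the hemisphere $B = \{x : x_1 \le 0\}$ of the sphere of radius $R$ in $\mathbb{R}^p$ in the geodesic metric and compares $\mu(B^\epsilon)$ with the right-hand side of (9). It assumes the constant $c = R^2/(p-1)$ for the uniform measure; treat that value as an assumption of the sketch, motivated by the sharp log-Sobolev constant of the sphere.

```python
import numpy as np


def sphere_hemisphere_blowup(p, radius, eps, n, rng):
    """Monte Carlo estimate of mu(B^eps), with mu uniform on the sphere of the given radius
    in R^p, B = {x : x_1 <= 0} (so mu(B) = 1/2) and the geodesic distance."""
    g = rng.standard_normal((n, p))
    x = radius * g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform on the sphere
    # geodesic distance to the hemisphere: 0 inside it, otherwise the arc length
    # from x to the equator {x_1 = 0}, namely radius * arcsin(x_1 / radius)
    s = np.clip(x[:, 0] / radius, -1.0, 1.0)
    dist = np.where(s <= 0, 0.0, radius * np.arcsin(s))
    return float(np.mean(dist <= eps))


def blowup_bound(mu_B, eps, c):
    """Right-hand side of (9), set to 0 below the threshold sqrt(2 c log(1/mu(B)))."""
    threshold = np.sqrt(2 * c * np.log(1 / mu_B))
    return float(1 - np.exp(-max(eps - threshold, 0.0) ** 2 / (2 * c)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, radius = 50, 1.0
    c = radius ** 2 / (p - 1)  # assumed T_2 constant, see the lead-in
    for eps in (0.1, 0.2, 0.3, 0.5):
        print(eps, sphere_hemisphere_blowup(p, radius, eps, 20_000, rng), blowup_bound(0.5, eps, c))
```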

#### Lipschitzian pushforward of distributions having T2 property.

Lemma 2.1 of Djellout et al. (2004) ensures that if $P_{X|k}$ is the pushforward, via an $L_k$-Lipschitz map ($L_k > 0$) $\mathcal{Z}_k \to \mathcal{X}_k$ between metric spaces (an assumption which is implicitly made when machine learning practitioners model images using generative neural networks¹, for example), of a distribution $\mu_k$ which satisfies $T_2(\tilde{\sigma}_k^2)$ on $\mathcal{Z}_k$ for some $\tilde{\sigma}_k > 0$, then $P_{X|k}$ satisfies $T_2(L_k^2\tilde{\sigma}_k^2)$ on $\mathcal{X}_k$, and so Theorem 2 (and Corollary 1 for flat space) holds with $\sigma_k = L_k\tilde{\sigma}_k$. This is precisely the data model assumed by Fawzi et al. (2018a), with a standard Gaussian latent distribution $\mu_k$ (so that $\tilde{\sigma}_k = 1$) for all $k$.

¹ The Lipschitz constant of a feed-forward neural network with 1-Lipschitz activation function, e.g. ReLU, sigmoid, etc., is bounded by the product of the operator norms of the layer-to-layer parameter matrices.
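The bound on the Lipschitz constant of a feed-forward network mentioned above (the product of the operator norms of the layer matrices) can be sketched as follows, assuming a bias-free ReLU network and the $\ell_2$ norm on inputs and outputs; both assumptions, and the function names, are ours. With a standard Gaussian latent distribution ($\tilde{\sigma}_k = 1$), the returned value $L$ would give an admissible noise level $\sigma_k = L$ in Theorem 2.

```python
import numpy as np


def lipschitz_upper_bound(weights):
    """Product of the spectral norms ||W_l||_2: an upper bound on the l_2 Lipschitz constant of
    x -> W_L relu(W_{L-1} ... relu(W_1 x)), since ReLU is 1-Lipschitz (biases would not change it)."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))


def relu_net(weights, x):
    """Forward pass of a bias-free feed-forward ReLU network (no activation after the last layer)."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((32, 8)), rng.standard_normal((32, 32)), rng.standard_normal((4, 32))]
    L = lipschitz_upper_bound(weights)
    ratios = []
    for _ in range(200):
        x, z = rng.standard_normal(8), rng.standard_normal(8)
        ratios.append(np.linalg.norm(relu_net(weights, x) - relu_net(weights, z)) / np.linalg.norm(x - z))
    print("upper bound:", L, "largest observed stretch:", max(ratios))
```

The observed stretch is typically far below the product bound, so $\sigma_k = L$ is a conservative choice.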

### 2.5 Distributional No “Free Lunch” Theorem

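As before, let $h: \mathcal{X} \to \mathcal{Y}$ be a classifier and $\epsilon \ge 0$ be a tolerance level. Let $\widetilde{\mathrm{acc}}_\epsilon(h)$ denote the distributional robustness accuracy of $h$ at tolerance $\epsilon$, that is, the worst possible classification accuracy at test time when the conditional distribution $P$ is changed by at most $\epsilon$ in the Wasserstein-1 sense. More precisely,

$$\widetilde{\mathrm{acc}}_\epsilon(h) := \inf_{Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}),\ W_1(Q, P) \le \epsilon} Q(h(x) = y), \tag{21}$$

where the Wasserstein 1-distance $W_1(Q, P)$ (see equation (8) for the definition) in the constraint is with respect to the pseudo-metric $\tilde{d}$ on $\mathcal{X} \times \mathcal{Y}$ defined by

$$\tilde{d}((x', y'), (x, y)) := \begin{cases} d(x', x), & \text{if } y' = y,\\ \infty, & \text{else.}\end{cases}$$

The choice of $\tilde{d}$ ensures that we only consider alternative distributions that conserve the label marginals $P_Y$; robustness is only considered w.r.t changes in the class-conditional distributions $P_{X|k}$.

Note that we can rewrite $\widetilde{\mathrm{acc}}_\epsilon(h) = 1 - \widetilde{\mathrm{err}}_\epsilon(h)$, where

$$\widetilde{\mathrm{err}}_\epsilon(h) := \sup_{Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}),\ W_1(Q, P) \le \epsilon} Q(X \in B(h, Y)) \tag{22}$$

is the distributional robustness test error and $B(h, y) := \{x \in \mathcal{X} \mid h(x) \ne y\}$ as before. Of course, the goal of a machine learning algorithm is to select a classifier (perhaps from a restricted family) for which the average adversarial accuracy is maximized. This can be seen as a two-player game: the machine learner chooses a strategy $h$, to which an adversary replies by choosing a perturbed version $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ of the data distribution, used to measure the bad event “$X \in B(h, Y)$”.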

It turns out that the upper bounds on adversarial accuracy obtained in Theorem 2 apply to distributional robustness as well.

###### Corollary 2 (No “Free Lunch” for distributional robustness).

Theorem 2 holds for distributional robustness, i.e. with $\mathrm{acc}_\epsilon$ replaced with $\widetilde{\mathrm{acc}}_\epsilon$.

###### Proof.

See Appendix A. ∎

## 3 Related works

There is now a rich literature trying to understand adversarial robustness. Just to name a few, let us mention Tsipras et al. (2018), Schmidt et al. (2018), Bubeck et al. (2018), Gilmer et al. (2018b), Fawzi et al. (2018a), Mahloujifar et al. (2018), Sinha et al. (2017), Blanchet and Murthy (2016), Mohajerin Esfahani and Kuhn (2017). Below, we discuss a representative subset of these works, which is most relevant to our own contributions presented in this manuscript. These all use some kind of Gaussian isoperimetric inequality Boucheron et al. (2013), and turn out to be very special cases of the general bounds presented in Theorem 2 and Corollary 1.

#### Gaussian and Bernoulli models.

We have already mentioned the work Tsipras et al. (2018), which first showed that on the motivating problem presented in section 1.4, every classifier can be fooled with high probability. In a follow-up paper, Schmidt et al. (2018) also suggested that the sample complexity for robust generalization is much higher than for standard generalization. These observations are also strengthened by the independent work of Bubeck et al. (2018).

#### Generative models.

Fawzi et al. (2018a) posit a model in which data is generated by pushing forward a multivariate Gaussian distribution through a (surjective) Lipschitz continuous mapping², called the generator. The authors then studied the per-sample robustness radius defined by $r_h(x) := \inf\{d(x', x) \mid x' \in \mathcal{X},\ h(x') \ne h(x)\}$. In the notation of our manuscript, this can be rewritten as $r_h(x) = d(x, B(h, h(x)))$, from which it is clear that $r_h(x) \le \epsilon$ iff $x \in B(h, h(x))^\epsilon$. Using the basic Gaussian isoperimetric inequality Boucheron et al. (2013), the authors then proceed to obtain bounds on the probability that the classifier changes its output on an $\epsilon$-perturbation of some point on the data manifold, namely $P(r_h(X) \le \epsilon)$, which is related to the mass of the annulus depicted in Fig. 1. Our bounds in Theorem 2 and Corollary 1 can then be seen as generalizing the methods and bounds in Fawzi et al. (2018a) to more general data distributions satisfying transportation-cost inequalities $T_2(c)$, with $c > 0$.

² Strictly speaking, Fawzi et al. (2018a) imposes a condition on the pushforward map which is slightly weaker than Lipschitz continuity.

The work which is most similar in flavor to ours is the recent “Adversarial Spheres” paper Gilmer et al. (2018b), wherein the authors consider a 2-class problem on a so-called “concentric spheres” dataset. This problem can be described in our notation as: $P_{X|+}$ is the uniform distribution on the $(p-1)$-dimensional sphere of unit radius in $\mathbb{R}^p$, and $P_{X|-}$ is the uniform distribution on the $(p-1)$-dimensional sphere of radius $R$. Thus, the classification problem is to decide which of the two concentric spheres a sampled point came from. One first observes that these two class-conditional distributions are constant (and therefore log-concave) over manifolds of constant curvature, namely the spheres $\mathbb{S}^{p-1}$ and $R\,\mathbb{S}^{p-1}$ respectively. The situation is therefore an instance of the Bakry-Émery curvature condition (20), with potentials $v_\pm \equiv 0$. Whence, these distributions satisfy $T_2(1/(p-1))$ and $T_2(R^2/(p-1))$ respectively (see the discussion of compact homogeneous manifolds in section 2.4). Consequently, Theorem 2 and Corollary 1 kick in and bound the average distance of sample points with true label $k \in \{\pm\}$ to the error set (set of misclassified samples): $d(h|k) \le \sigma_k\big(\sqrt{2\log(1/\mathrm{err}(h|k))} + \sqrt{\pi/2}\big)$, with $\sigma_+ = 1/\sqrt{p-1}$ and $\sigma_- = R/\sqrt{p-1}$, for the geodesic threat model $d_{geo}$, and the same bound for the $\ell_2$ threat model (spheres are locally flat, so this makes sense). To link more explicitly with the bound proposed in (Gilmer et al., 2018b, Theorem 5.1), one notes the following elementary (and very crude) bound on the Gaussian quantile function $\Phi^{-1}$: $\Phi^{-1}(1 - a) \le \sqrt{2\log(1/a)}$ for $a \in (0, 1/2]$, with both sides of the same order as $a \to 0$. Thus, the quantile-based bound of Gilmer et al. (2018b) and our $\sqrt{\log(1/\mathrm{err}(h|k))}$-based bound are of the same order, for large $p$. Consequently, our bounds can be seen as a strict generalization of the bounds in Gilmer et al. (2018b).
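The comparison with Gilmer et al. (2018b) rests on a crude bound on the Gaussian quantile function. Assuming it takes the form $\Phi^{-1}(1 - a) \le \sqrt{2\log(1/a)}$ for $a \in (0, 1/2]$ (a direct consequence of the tail bound $1 - \Phi(t) \le e^{-t^2/2}$ for $t \ge 0$), the stdlib-only sketch below evaluates both sides; the helper name is ours. The ratio of the two sides creeps towards 1 as $a \to 0$, but only slowly, which is why the bound is called very crude.

```python
import math
from statistics import NormalDist


def quantile_and_crude_bound(a):
    """Return (Phi^{-1}(1 - a), sqrt(2 log(1/a))) for a in (0, 1/2]."""
    return NormalDist().inv_cdf(1 - a), math.sqrt(2 * math.log(1 / a))


if __name__ == "__main__":
    for a in (0.5, 0.1, 1e-3, 1e-6, 1e-10):
        q, crude = quantile_and_crude_bound(a)
        print(f"a = {a:g}: quantile = {q:.4f}, crude bound = {crude:.4f}, ratio = {q / crude:.3f}")
```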

#### Distributional robustness and regularization.

On a completely different footing, Blanchet and Murthy (2016), Mohajerin Esfahani and Kuhn (2017), Sinha et al. (2017) have linked distributional robustness to robust estimation theory from classical statistics and regularization. An interesting by-product of these developments is that penalized regression problems like the square-root Lasso and sparse logistic regression have been recovered as distributionally robust counterparts of the unregularized problems.

## 4 Experimental evaluation

### 4.1 Simulated data

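The simulated data are those discussed in section 1.4, where the mean shift $\eta$ of the features $X_j \mid Y \sim \mathcal{N}(\eta Y, 1)$, $j = 2, \ldots, p$, is set through an SNR parameter which controls the difficulty of the problem. The results are shown in Fig. 2. Here the classifier is the linear classifier $h_{avg}$ presented in section 1.4. As predicted by the theorem, we observe that beyond the critical value $\epsilon = \epsilon_\infty(h|k) := \epsilon(h|k)/\sqrt{p}$, where $\epsilon(h|k) := \sigma_k\sqrt{2\log(1/\mathrm{err}(h|k))}$, the adversarial accuracy decays exponentially fast, and passes below the horizontal line $\mathrm{err}(h|k)$ as soon as $\epsilon \ge 2\epsilon_\infty(h|k)$.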

### 4.2 Real data

Wondering whether the phase transition and bounds predicted by Theorem 2 and Corollary 2 hold for real data, we trained a deep feed-forward CNN for classification on the MNIST dataset LeCun and Cortes (2010), a standard benchmark problem in supervised machine learning. The results are shown in Fig. 3. This model attains a classification accuracy of 98% on held-out data. We consider the performance of the model on adversarially modified images according to the $\ell_\infty$ threat model, at a given tolerance level (maximum allowed modification per pixel) $\epsilon$. As $\epsilon$ is increased, the performance degrades slowly and then eventually hits a phase-transition point; it then decays exponentially fast and the performance is eventually reduced to chance level. This behavior is in accordance with Corollary 1, and suggests that the range of applicability of our results may be much larger than what we have been able to theoretically establish in Theorem 2 and Corollary 1.

Of course, a more extensive experimental study would be required to strengthen this empirical observation.

## 5 Conclusion and Future Work

Our results would encourage one to conjecture that the modulus of concentration of the probability distribution (e.g. in $T_2$ inequalities) on a manifold completely characterizes the adversarial or distributionally robust accuracy in classification problems. Since under mild conditions every distribution can be approximated by a Gaussian mixture and is therefore locally log-concave, one could conjecture that the adversarial robustness of a classifier varies over the input space as a function of the local curvature of the density of the distribution. Such a conjecture is also supported by empirical studies in Fawzi et al. (2018b), where the authors observed that the local curvature of the decision boundary of a classifier around a point dictates the degree of success of adversarial attacks on points sampled around that point.

One could consider the following open questions, as natural continuation of our work:

• Extend Theorem 2 and Corollary 1 to more general data distributions.

• Study more complex threat models, e.g small deformations.

• Fine-grained analysis of sample complexity and of the complexity of the hypothesis class, with respect to adversarial and distributional robustness. This question has been partially studied in Schmidt et al. (2018), Bubeck et al. (2018) in the adversarial case, and Sinha et al. (2017) in the distributionally robust scenario.

• Study more general threat models. Gilmer et al. (2018a) have argued that most of the proof-of-concept problems studied in theory papers might not be completely aligned with real security concerns faced by machine learning applications. It would be interesting to see how the theoretical bounds presented in our manuscript translate to real-world datasets, beyond MNIST, on which we showed some preliminary experimental results.

• Develop more geometric insights linking adversarial robustness and curvature of decision boundaries. This view was first introduced in  Fawzi et al. (2018b).

#### Acknowledgments.

I wish to thank Noureddine El Karoui for stimulating discussions, and Alberto Bietti and Albert Thomas for their useful comments and remarks.

## Appendix A Proofs

###### Proof of Lemma 1.

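Let $B$ be a Borel subset of $\mathcal{X}$ with $\mu(B) > 0$, and let $\mu|_B$ be the restriction of $\mu$ onto $B$ defined by $\mu|_B(A) := \mu(A \cap B)/\mu(B)$ for every Borel $A \subseteq \mathcal{X}$. Note that $\mu|_B \ll \mu$ with Radon-Nikodym derivative $\frac{d\mu|_B}{d\mu} = \frac{1}{\mu(B)}1_B$. A direct computation then reveals that

$$\mathrm{kl}(\mu|_B \| \mu) = \int\log\Big(\frac{d\mu|_B}{d\mu}\Big)\,d\mu|_B = \int\log(1/\mu(B))\,1_B\,d\mu|_B = \log(1/\mu(B))\,\mu|_B(B) = \log(1/\mu(B)).$$

On the other hand, if $X$ is a random variable with law $\mu|_B$ and $X'$ is a random variable with law $\mu|_{\mathcal{X}\setminus B^\epsilon}$ (we may assume $\mu(B^\epsilon) < 1$, otherwise there is nothing to prove), then the definition of $B^\epsilon$ ensures that $d_{\mathcal{X}}(X, X') \ge \epsilon$, and so by definition (8), one has $W_2(\mu|_B, \mu|_{\mathcal{X}\setminus B^\epsilon}) \ge \epsilon$. Putting things together yields

$$\begin{aligned}\epsilon &\le W_2(\mu|_B, \mu|_{\mathcal{X}\setminus B^\epsilon}) \le W_2(\mu|_B, \mu) + W_2(\mu, \mu|_{\mathcal{X}\setminus B^\epsilon}) \le \sqrt{2c\,\mathrm{kl}(\mu|_B \| \mu)} + \sqrt{2c\,\mathrm{kl}(\mu|_{\mathcal{X}\setminus B^\epsilon} \| \mu)}\\ &= \sqrt{2c\log(1/\mu(B))} + \sqrt{2c\log(1/\mu(\mathcal{X}\setminus B^\epsilon))} = \sqrt{2c\log(1/\mu(B))} + \sqrt{2c\log(1/(1 - \mu(B^\epsilon)))},\end{aligned}$$

where the second inequality is the triangle inequality for $W_2$ and the third is the $T_2(c)$ property assumed in the Lemma. Rearranging the above inequality gives

$$\sqrt{2c\log(1/(1 - \mu(B^\epsilon)))} \ge \epsilon - \sqrt{2c\log(1/\mu(B))},$$

and if $\epsilon \ge \sqrt{2c\log(1/\mu(B))}$, we can square both sides, multiply by $-1/(2c)$ and apply the increasing function $t \mapsto e^t$, to get the claimed inequality. ∎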

###### Proof of Theorem 2.

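Let $h: \mathcal{X} \to \mathcal{Y}$ be a classifier, and for a fixed class label $k \in \mathcal{Y}$, define the set $B(h,k) := \{x \in \mathcal{X} \mid h(x) \ne k\}$. Because we only consider $P_X$-a.e. continuous classifiers, each $B(h,k)$ is Borel. Conditioned on the event “$Y = k$”, the probability of $B(h,k)$ is precisely the average error made by the classifier on the class label $k$. That is, $P_{X|k}(B(h,k)) = \mathrm{err}(h|k)$. Now, the assumptions imply, by virtue of Lemma 1, that $P_{X|k}$ has the BLOWUP($\sigma_k^2$) property. Thus, if $\epsilon \ge \epsilon(h|k)$, then one has

$$\begin{aligned}\mathrm{acc}_\epsilon(h|k) = 1 - P_{X|k}\big(B(h,k)^\epsilon_{d_{geo}}\big) &\le e^{-\frac{1}{2\sigma_k^2}\left(\epsilon - \sigma_k\sqrt{2\log(1/P_{X|k}(B(h,k)))}\right)^2} = e^{-\frac{1}{2\sigma_k^2}\left(\epsilon - \sigma_k\sqrt{2\log(1/\mathrm{err}(h|k))}\right)^2}\\ &= e^{-\frac{1}{2\sigma_k^2}(\epsilon - \epsilon(h|k))^2} \le e^{-\frac{1}{2\sigma_k^2}\epsilon(h|k)^2} = \mathrm{err}(h|k), \quad \text{if } \epsilon \ge 2\epsilon(h|k).\end{aligned}$$

On the other hand, it is clear that $\mathrm{acc}_\epsilon(h|k) \le \mathrm{acc}(h|k)$ for any $\epsilon \ge 0$ since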