Monge beats Bayes: Hardness Results for Adversarial Training

06/08/2018 · Zac Cranko et al. · University of Illinois at Chicago

The last few years have seen extensive empirical study of the robustness of neural networks, with a concerning conclusion: several state-of-the-art approaches are highly sensitive to adversarial perturbations of their inputs. There has been an accompanying surge of interest in learning with defense mechanisms against specific adversaries, known as adversarial training. Despite some impressive advances, little is known about how best to frame a resource-bounded adversary so that it is severely detrimental to learning, a non-trivial problem which entails at a minimum the choice of loss and classifiers. We suggest here a formal answer to this question, and pin down a simple sufficient property for any given class of adversaries to be detrimental to learning. This property involves a central measure of "harmfulness" which generalizes the well-known class of integral probability metrics, and thus the maximum mean discrepancy. A key feature of our result is that it holds for all proper losses, and for a popular subset of these the optimisation of this central measure appears to be independent of the loss. We then deliver a sufficient condition for this sufficient property to hold for Lipschitz classifiers, which relies on framing it within optimal transport theory. Finally, we deliver a negative boosting result which shows how weakly contractive adversaries for an RKHS can be combined to build a maximally detrimental adversary, show that some existing implemented adversaries involve proxies of our optimal transport adversaries, and provide a toy experiment assessing such adversaries in a simple context, displaying that additional robustness in testing can be granted through adversarial training.







1 Introduction

Starting from the observation that deep nets are sensitive to imperceptible perturbations of their inputs (Szegedy et al., 2013), a surge of recent work has focussed on new adversarial training approaches to supervised learning (Athalye et al., 2018a, b; Bastani et al., 2016; Buckman et al., 2018; Bubeck et al., 2018; Cai et al., 2018; Dhillon et al., 2018; Fawzi et al., 2018; Gilmer et al., 2018; Goswami et al., 2018; Guo et al., 2018; Ilyas et al., 2018; Kurakin et al., 2017; Ma et al., 2018; Madry et al., 2018; Samangouei et al., 2018; Song et al., 2018; Tramèr et al., 2018; Uesato et al., 2018; Wang et al., 2018; Wong and Kolter, 2018) (and references within). In the now popular model of Madry et al. (2018), we want to learn a classifier from a given set, given a distribution of clean examples and a loss. Adversarial training then seeks to find


where the constraint involves a norm and the budget of the adversary. It has recently been observed that adversarial training damages standard accuracy as the data size and the adversary's budget increase (Tsipras et al., 2019). A Bayesian explanation is given for a particular setting in Tsipras et al. (2019), and the authors conclude their findings by questioning the interplay between adversarial robustness and standard accuracy.
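To make the inner maximisation in the objective above concrete, here is a minimal illustrative sketch (our own toy example, not the paper's method): a one-step gradient-sign attack approximating the inner supremum of the logistic loss over an l-infinity ball, in the spirit of Madry et al. (2018). All names and values are hypothetical.

```python
import numpy as np

def logistic_loss(w, x, y):
    # y in {-1, +1}; logistic loss log(1 + exp(-y <w, x>)).
    return np.log1p(np.exp(-y * np.dot(w, x)))

def one_step_attack(w, x, y, eps):
    # One gradient-sign step: a crude proxy for the inner sup of the
    # loss over the l-infinity ball of radius eps around x.
    margin = y * np.dot(w, x)
    grad_x = -y * w / (1.0 + np.exp(margin))  # gradient of the loss in x
    return x + eps * np.sign(grad_x)

w = np.array([1.0, -2.0])   # hypothetical linear classifier
x = np.array([0.5, 0.5])    # clean example
y = 1.0                     # its label
x_adv = one_step_attack(w, x, y, eps=0.1)
```

By construction the perturbation stays within the budget, and the loss at the perturbed point can only increase relative to the clean one.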

In this paper, we dig into this relationship (i) by casting the standard accuracy and loss in (1) in the broad context of Bayesian decision theory Grünwald and Dawid (2004) and (ii) by considering a general form of adversaries, not restricted to the ones used in (1). In particular, we assume that the loss is proper, which is just a general form of statistical unbiasedness that many popular choices meet (Hendrickson and Buehler, 1971; Reid and Williamson, 2010). The minimization of a proper loss gives guarantees on the accuracy (for example, Kearns and Mansour (1996)), so it directly connects to the setting of Tsipras et al. (2019). Regarding the adversaries, instead of relying on the local adversarial modification for some , we consider a set of possible local modifications for some ( fixed). We then analyze the conditions on under which, for some ,


where the right-hand side is the loss of the "blunt" predictor which predicts nothing. If the classifier has a symmetric range, this blunt predictor is in general 0 (for the log loss, square loss, etc.), which translates into a class probability estimate of 1/2 for all observations and a global accuracy of 1/2 for two classes, i.e. that of an unbiased coin. We see the connection of (2) to the accuracy: as the slack vanishes, the learner will be tricked into converging to a predictor with extremely poor accuracy. How one can design such provably efficient adversaries, furthermore under tight budget constraints, is the starting point of our paper.

Our first contribution (Section 4) analyzes budgeted adversaries that can enforce (2). Our main finding shows that (2) is implied by a very simple condition involving a central quantity generalizing the celebrated integral probability metrics (Sriperumbudur et al., 2009). Furthermore, under an additional condition on the loss, satisfied by the log, square and Matsushita losses, the adversarial optimization of this quantity does not depend on the loss. In other words,

the adversary can attack the learner disregarding its loss.

Table 1: Top: compression of the optimal transport (OT) plan for a Mixup adversary on toy 1D data; panels show the clean class marginals, the adversarial class marginals, and the OT plan. Bottom: transformations performed by a Monge adversary for the digit-1 vs digit-3 classification problem on four USPS digits, for adversarial budgets from 0 (clean data) to 0.6 (see Section 7 for details).

Our second contribution (Section 5) considers the adversarial optimisation of this quantity when the classifiers satisfy a generalized form of Lipschitz continuity. Controlling Lipschitz continuity has recently emerged as a solution to limit the impact of adversarial examples (Cissé et al., 2017). In this context, efficient budgeted adversaries take a particular form: we show that, for an adversary to minimize it,

it is sufficient to compress the optimal transport plan between class marginals using the Lipschitz function as transportation cost, disregarding the learner’s .

This result brings the machinery of optimal transport (OT) (Villani, 2009) to the table of adversarial design, with a new purpose (the compression of OT plans). These two findings turn out to be very useful from an experimental standpoint: we have implemented two kinds of adversaries inspired by our theory (called Mixup and Monge for their respective links with Zhang et al. (2018) and Villani (2009)); Table 1 displays their behaviour on two simple problems. We have observed that training a learner against a "weak" (severely budgeted) adversary improves generalization on clean data, a phenomenon also observed elsewhere (Tsipras et al., 2019; Zhang et al., 2018). The digit experiment displays how our adversaries progressively transform observations of one class into credible observations of the other (see Experiments in Section 7, and the Supplement).

Our third contribution (Section 6) is an adversarial boosting result: it answers the question of whether one can efficiently craft an arbitrarily strong adversary from sole access to a black-box weak adversary. In the framework of reproducing kernel Hilbert spaces (RKHS), we show that

this "weak adversary" "strong adversary" design does exist, and our proof is constructive: we build one.

Our proof revolves around a standard concept of fixed point theory: contractive mappings. We stress the computational efficiency of this design, linear in the coding size of the Wasserstein distance between class marginals. It shows that, on some adversarial training problems, the existence of the weakest forms of adversaries implies that much stronger ones may be available at cheap computational cost.

2 Related work

Formal approaches to the problem of adversarial training are sparse compared to the growing literature on the arms race of experimental results. The formal trend started with adversarial changes to a loss to be optimized (Sinha et al., 2018) or more directly to a classifier's output (Hein and Andriushchenko, 2017; Raghunathan et al., 2018). For example, Sinha et al. (2018) add a Wasserstein penalty to a loss, computing the distance between the true and adversarial distributions. They provide smoothness guarantees for the loss and robustness in generalization for its minimization. Raghunathan et al. (2018) directly penalize the classifier's output (not a loss per se), in the context of shallow networks, and compute adversarial perturbations in a bounded ball. A similar approach (but in a different norm) is taken by Hein and Andriushchenko (2017) for kernel methods and shallow networks. Recent works also focus on introducing general robustness constraints (Bastani et al., 2016).

More recently, a handful of works have started to investigate the limits of learning in an adversarial training setting, but they are limited in that they address particular simulated domains with a particular loss to be optimized, and consider particular adversaries (Bubeck et al., 2018; Fawzi et al., 2018; Gilmer et al., 2018). The distribution can involve mixtures of Gaussians (Bubeck et al., 2018; Fawzi et al., 2018) or data lying on concentric spheres (Gilmer et al., 2018). The loss involves a distance based on a norm, and the adversary makes local shifts to data of bounded radius. In the case of Bubeck et al. (2018), access to the data is restricted to statistical queries. The essential results are either that robustness requires too much information compared to not requiring robustness (Bubeck et al., 2018), or that the "safety" radius of inoffensive modifications is in fact small relative to some of the problem's parameters, meaning even "cheap" adversaries can sometimes generate damaging adversarial examples (Fawzi et al., 2018; Gilmer et al., 2018). This paints a rather negative picture of adversarial training, negative but local: all these results share the same design pattern of relying on particular choices for all key components of the problem: domain, loss and adversaries (and possibly classifiers). No approach to date relaxes any of these choices, even less so all of them simultaneously.

3 Definitions and notations

We present some important definitions and notations.

Proper losses. Many of our notations follow Reid and Williamson (2010). Suppose we have a prediction problem with binary labels. We let a general loss function to be minimized be given, where the left argument is a class and the right argument is a class probability estimate (taken in the closed unit interval). Its conditional Bayes risk function is the best achievable loss when labels are drawn with a particular positive base-rate,


where the label is drawn from a Bernoulli distribution with the given base-rate. We call the loss proper iff Bayes prediction locally achieves the minimum everywhere (losses for which properness makes particular sense are called class probability estimation losses (Reid and Williamson, 2010)). One value of the conditional Bayes risk is interesting in our context: the one which corresponds to Bayes rule returning maximal "uncertainty", i.e., for the base-rate equal to 1/2,


Without further ado, we give the key definition which makes more precise the framework sketched in (2). For any proper loss and integrable with respect to some distribution , the adversarial loss is defined as


For any , we say that is -defeated by on iff


Intuitively, if the adversary can modify instances such that the learner does not do much better than a trivial blunt constant predictor, the adversary can declare victory. The additional quantities (such as the integrability condition) are given later in this section. To finish up with general proper losses, as an example, the log-loss given by and is proper, with conditional Bayes risk given by the Shannon entropy .
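As a numerical illustration of properness (a sketch with hypothetical helper names, not code from the paper): the conditional risk of the log-loss at a given base-rate is minimised by predicting exactly that base-rate, and the minimum equals the Shannon entropy.

```python
import numpy as np

def log_loss(y, p):
    # Binary log-loss: -log p for class 1, -log(1 - p) for class 0.
    return -np.log(p) if y == 1 else -np.log(1.0 - p)

def conditional_risk(pi, p):
    # Expected loss when the positive base-rate is pi and we predict p.
    return pi * log_loss(1, p) + (1.0 - pi) * log_loss(0, p)

def shannon_entropy(pi):
    return -pi * np.log(pi) - (1.0 - pi) * np.log(1.0 - pi)

pi = 0.3
grid = np.linspace(0.01, 0.99, 99)
# Properness: the risk is minimised by predicting the base-rate itself...
best = grid[np.argmin([conditional_risk(pi, p) for p in grid])]
# ...and the minimum is the conditional Bayes risk, the Shannon entropy.
```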

Composite, canonical proper losses. We let a fixed set of classifiers be given. To convert real-valued predictions into class probability estimates (McCullagh and Nelder, 1989), one traditionally uses an invertible link function, forming a composite loss (Reid and Williamson, 2010). We shall hereafter drop the adjective composite for simplicity, and leave the link implicit from context whenever appropriate. The canonical link for a proper loss, unique up to multiplication or addition by a scalar (Buja et al., 2005), is defined from the conditional Bayes risk (Reid and Williamson, 2010, Section 6.1; Buja et al., 2005). As an example, for the log-loss we find the canonical link to be the logit, with inverse the well-known sigmoid. A proper loss will also be assumed to be twice differentiable; twice differentiability is a technical convenience to simplify derivations, and can be removed (Reid and Williamson, 2010, Footnote 6). A canonical proper loss is a proper loss using the canonical link.
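For the log-loss example, the canonical link and its sigmoid inverse can be sketched as follows (a minimal illustration; function names are ours):

```python
import numpy as np

def logit(p):
    # Canonical link of the log-loss: log(p / (1 - p)).
    return np.log(p / (1.0 - p))

def sigmoid(v):
    # Inverse canonical link: 1 / (1 + exp(-v)).
    return 1.0 / (1.0 + np.exp(-v))
```

The link maps the maximally uncertain estimate 1/2 to the real-valued prediction 0, the blunt predictor of the previous paragraphs.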

Adversaries. Let a set of adversaries be given, so that any of its elements is allowed to transform instances in some way (e.g., change pixel values on an image). Suppose a fixed distribution over instances is given, along with the corresponding distributions conditioned on each of the two classes. The only assumption we make about adversaries is one of measurability: the composition of any classifier with any adversary remains integrable with respect to both class-conditional distributions. For the sake of simplicity, we shall call such a tuple integrable. Assuming the loss is proper composite with a link, there is one interesting constant:


because this value delivers the real valued prediction corresponding to maximal uncertainty in (4). For example, when the loss is proper canonical and furthermore required to be symmetric, i.e. there is no class-dependent misclassification cost, we have (Nock and Nielsen, 2008)


which corresponds to a classifier always abstaining and indeed delivering maximal uncertainty on prediction. It is not hard to check that is the loss of constant . So we can now see that in Definition 3, as , training against the adversarial loss essentially produces a classifier no better than predicting nothing. We do not assume that , but keep in mind that such prediction with maximal uncertainty is the baseline against which a learner has to compete to "learn" something.

The adversarial distortion parameter . We now unveil the key parameter used earlier in the Introduction. For any , , we let:


For any , the adversarial distortion is:



Finally, an overall distortion is obtained from these quantities. While abstract, we shall shortly see that they relate to a well-known object in the study of distances between probability distributions. Let


As an example, we have a closed form for the log-loss, with the convention 0 log 0 = 0. We remark that the quantity in (11) is related to that in (10):


for the singleton classifier which makes the same hard prediction over and over (hereafter, we use a shorthand notation for it). Remark that such a classifier is not affected by a particular adversary, but it is not implementable in the general case as it would require knowing the class of an observation.

4 Main result: the hardness theorem

We now show a lower bound on the adversarial loss of (5).

For any proper loss , link and any integrable with respect to , the following holds true:




(All other parameters are implicit in the definition; proof in the Supplement, Section 10.) This pins down a simple condition for the adversary to defeat the classifiers. Under the conditions and with the notations of Theorem 4, if there exist parameters such that


then the classifiers are defeated by the adversaries in the sense of Definition 3. (Proof in the Supplement, Section 10.) We remark that a simplification holds whenever the loss is canonical, and so


We also note that constants factor out of the maximization problem in (10), so when the loss is canonical the optimisation of the distortion does not depend on the loss at hand; hence, its optimisation by an adversary could be done without knowing the loss that the learner is going to minimise. We also remark that the condition for the classifiers to be defeated does not involve an algorithmic component: it means that any learning algorithm minimising the loss will end up with a poor predictor if (15) is satisfied, regardless of its computational resources.

Relationships with integral probability metrics. In a special case, the somewhat abstract quantity can be related to the more familiar class of integral probability metrics (IPMs) (Sriperumbudur et al., 2009). The latter are a class of metrics on probability distributions, capturing e.g. the total variation divergence, Wasserstein distance, and maximum mean discrepancy. The proof of the following Corollary is immediate.

Suppose the set of classifiers is closed under negation. Then

which is the integral probability metric for the class of classifiers on the pair of adversarial class marginals. Here, the constant is defined in (14). We may now interpret Theorem 4 as saying: for an adversary to defeat a learner minimising a proper loss, it suffices to make a suitable IPM between the class-conditionals small. The particular choice of IPM arises from the learner's choice of hypothesis class. Of particular interest is the case where this class comprises kernelized scorers, as we now detail.

Relationships with the maximum mean discrepancy. The maximum mean discrepancy (MMD) (Gretton et al., 2006) corresponds to an IPM where is the unit-ball in an RKHS. We have the following re-expression of for this hypothesis class, which turns out to involve the MMD.
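Before stating the result, a hedged numerical sketch of the MMD itself (illustrative Gaussian kernel and synthetic data, not the paper's setup): the biased squared-MMD estimator between two empirical class-conditionals shrinks when an adversary pulls one sample towards the other.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian kernel matrix between sample sets a and b.
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-np.sum(d * d, axis=-1) / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    # Biased estimator of the squared MMD between samples X ~ P and
    # Y ~ Q for the unit ball of the Gaussian RKHS.
    return (gaussian_kernel(X, X, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))   # one class-conditional sample
Y = rng.normal(3.0, 1.0, size=(200, 2))   # the other class-conditional
Y_shrunk = 0.1 * Y + 0.9 * X              # adversary pulling Q towards P
```

Shrinking the gap between the two samples shrinks the MMD, which is exactly the "harmfulness" direction an adversary wants.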

Figure 1: Suppose an adversary can guarantee an upper bound on the distortion as displayed in thick red. For some fixed parameters, we display the range of values (in pink) for which the adversary defeats the learner. Notice that outside this interval, it may not be possible for the adversary to win (in grey, tagged "?"), and if the bound is large enough (orange, tagged "safe"), then condition (15) cannot be satisfied anymore.

Suppose is proper canonical and let denote the unit ball of a reproducing kernel Hilbert space (RKHS) of functions with reproducing kernel . Denote


the adversarial mean embedding of on . If and , then


The constraints are for readability: the proof (in the Supplement, Section 11) shows a more general result with unrestricted parameters. The right-hand side of (18) is proportional to the MMD between the adversarial class marginals. In the more general case, the right-hand side of (18) is replaced by a corrected quantity. Figure 1 displays an example picture (for the unrestricted case) for some canonical proper but asymmetric loss, when an adversary with a given upper-bound guarantee on the distortion can indeed defeat some learner. We remark that while this may be possible for a whole range of parameters, it may not be possible for all. The picture would be different if the loss were symmetric (Corollary 4 below), since in this case a guarantee to defeat the learner for some parameter value would imply a guarantee for all. Loss asymmetry thus brings a difficulty for the adversary which, we recall, cannot act on the loss.

Simultaneously defeating over sets of losses. Satisfying (15) involves at least the knowledge of one value of the loss, if not of the loss itself. It turns out that if the loss is canonical and the adversary has only partial knowledge of it, it may in fact still be possible for the adversary to determine whether (15) can be satisfied over this set, as we now show.

Let be a set of canonical proper losses satisfying the following property: such that . Assuming integrable with respect to , if


then is jointly -defeated by on all losses of . Notice that all the adversary needs to know is . The result easily follows from remarking that we have in this case:

which we then plug into (15) to get the statement of the Corollary. Corollary 4 is interesting for two reasons. First, it applies to all proper symmetric losses (Nock and Nielsen, 2008; Reid and Williamson, 2010), which include popular losses like the square, logistic and Matsushita losses. Second, it does not just offer the adversarial strategy to defeat classifiers that would be learned with any such loss; it also applies to more sophisticated learning strategies that would tune the loss at learning time (Nock and Nielsen, 2008; Reid and Williamson, 2010) or tailor the loss to specific constraints (Buja et al., 2005).

5 Monge efficient adversaries

We now highlight a sufficient condition on adversaries for (15) to be satisfied, which considers classifiers in the increasingly popular framework of "Lipschitz classification" for adversarial training (Cissé et al., 2017), and turns out to frame adversaries in optimal transport (OT) theory (Villani, 2009). We proceed in three steps, first framing OT adversaries, then Lipschitz classifiers and finally showing how the former defeats the latter. Given any and some , we say that is -Monge efficient for cost on marginals iff , with

and is the set of all joint probability measures whose marginals are and . Hence, Monge efficiency relates to an efficient compression of the transport plan between class marginals. In fact, we should require to satisfy some mild additional assumptions for the existence of optimal couplings (Villani, 2009, Theorem 4.1), such as lower semicontinuity. We skip them for the sake of simplicity, but note that infinite costs are possible without endangering the existence of optimal couplings of (Villani, 2009), which is convenient for the following generalized notion of Lipschitz continuity. Let . For some and , set is said to be -Lipschitz with respect to iff


We shall also write that a set is Lipschitz when Definition 5 holds with the parameters implicit. Actual Lipschitz continuity would restrict the cost to be a distance, and the state of the art of adversarial training would further restrict the distance to be norm-based (Cissé et al., 2017). Equipped with this, we obtain the main result of this section.

Fix any and proper canonical loss . Suppose such that:

  1. is -Lipschitz with respect to ;

  2. is -Monge efficient for cost on marginals for


Then the learner is defeated by the adversary. The proof (in the Supplement, Section 12) is given for the more general case of any proper loss, not necessarily canonical. We also show in the proof that the cost cannot be a distance in the general case. We take this as a potential difficulty for the adversary which, we recall, cannot act on these choices.
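The optimal coupling appearing in the Monge efficiency definition can be computed exactly for tiny discrete marginals. Below is our own brute-force sketch (assuming uniform marginals of equal size, in which case an optimal plan is a permutation matrix by Birkhoff's theorem); names and data are illustrative, not the paper's solver.

```python
import itertools
import numpy as np

def ot_cost_uniform(C):
    # For uniform marginals of equal size, an optimal coupling is a
    # permutation matrix (Birkhoff), so tiny problems can be solved by
    # brute force over permutations.
    n = C.shape[0]
    best_perm, best = None, np.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(C[i, j] for i, j in enumerate(perm)) / n
        if cost < best:
            best_perm, best = perm, cost
    return best_perm, best

# Two 1D empirical class marginals and a squared-distance cost.
x = np.array([0.0, 1.0])
y = np.array([1.0, 2.0])
C = (x[:, None] - y[None, :]) ** 2
perm, cost = ot_cost_uniform(C)   # matches 0 -> 1 and 1 -> 2
```

For larger problems one would use a dedicated linear-program or network-flow OT solver instead of brute force.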

Theorem 5 is particularly interesting with respect to currently developing strategies for adversarial training that "Lipschitzify" classifiers (Cissé et al., 2017). Such strategies assume that the loss is Lipschitz (remark that we do not make such an assumption). In short, if we rename the inner part of (5), those strategies exploit the fact that (omitting key parameters for readability)


where the first term is the adversary-free loss and the second involves the Lipschitz constant of the loss or of the classifier learned. One might think that minimizing (22) is not a good strategy in light of Theorem 5, because the regularization enforces a minimization of the Lipschitz constant, so we seemingly relax the constraints on the adversary to be Monge efficient in (21) and can end up more easily defeated. This conclusion is however too simplistic and does not take into account the other parameters at play, as we now explain in the context of Cissé et al. (2017). Consider the logistic loss (Cissé et al., 2017), for which:


Suppose we can reduce both quantities (which is in fact not hard to ensure for deep architectures (Miyato et al., 2018, Section 2.1; Cranko et al., 2018)). Reorganizing, for the learner to be defeated we in fact get a constraint which reframes the constraint in (21) as (see also the Supplement, (57)),


which no longer depends on this quantity.

The proof of Theorem 5 is followed in the Supplement by a proof of an interesting generalization in light of those recent results (Cissé et al., 2017; Cranko et al., 2018; Miyato et al., 2018): the Monge efficiency requirement can be weakened under a form of dominance (similar to a Lipschitz condition) of the canonical link with respect to the chosen link of the loss. We now provide a simple family of Monge efficient adversaries.

Mixup adversaries. Very recently, it was experimentally demonstrated that a simple modification of a training sample yields models more likely to be robust to adversarial examples and to generalize better (Zhang et al., 2018). The process can be summarized in a simple way: perform a random interpolation between two randomly chosen training examples to create a new example (and repeat as necessary). Since we do not allow the adversary to tamper with the class, we define as mixup (with a mixing parameter) the process which creates, for two observations having different classes, the following adversarial observation (with the same class as the first):


We make the assumption that the instance space is metric, with an associated distance. We analyze a very simple case of mixup, which mixes all observations towards a single fixed target, replacing the second observation by that target in (25). Notice that in the limit, we converge to the maximally harmful adversary mentioned in the introduction. Intuition thus suggests that the set of all such mixups (where we vary the mixing parameter) in fact designs an arbitrarily Monge efficient adversary, where the optimal transport problem involves the associated distance. This is indeed true and in fact simple to show: the set of all mixups to a fixed target is Monge efficient at a level governed by the 1-Wasserstein distance between the class marginals. (Proof in the Supplement, Section 14.) The mixup methodology as defined in Zhang et al. (2018) can be specialized in numerous ways: for example, instead of mixing up with a single observation, we could perform all possible mixups within the sample, in a spirit closer to Zhang et al. (2018), or mixups with several distinguished observations (e.g. after clustering), etc. Many such choices would be eligible to be at least Monge efficient, but while they can be computationally simple, they are only surrogates for Monge efficiency: tackling the compression of the optimal transport plan head-on is the more direct route.
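The interpolation in (25) can be sketched in a few lines (illustrative names and data; the mixing parameter plays the role of the adversarial budget):

```python
import numpy as np

def mixup(x, x_opp, lam):
    # Interpolate observation x towards an opposite-class observation
    # x_opp; the adversarial point keeps the label of x.
    return (1.0 - lam) * x + lam * x_opp

x = np.array([0.0, 0.0])      # hypothetical observation of class +1
x_opp = np.array([4.0, 4.0])  # hypothetical observation of class -1
# Growing lam moves the point towards the other class, shrinking the
# gap between class marginals.
d_half = np.linalg.norm(mixup(x, x_opp, 0.5) - x_opp)
d_zero = np.linalg.norm(mixup(x, x_opp, 0.0) - x_opp)
```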

6 From weak to strong Monge efficiency

In Theorem 5, we showed how Monge efficiency allows adversaries to "take over" Lipschitz classifiers and defeat them. Suppose now that the set of adversaries we have is weak, in that all its elements are Monge efficient only for large cost levels. In other words, we cannot satisfy condition (2) in Theorem 5. Is there another set of adversaries whose elements would combine the elements of the weak set in a computationally savvy way, and which would achieve any desired level of Monge efficiency? Such a question parallels that of the boosting framework in supervised learning, in which one combines classifiers just slightly better than random to achieve an arbitrarily accurate combination (Schapire and Freund, 2012).

We now answer our question in the affirmative, in the context of kernel machines. Let an RKHS and a feature map of the RKHS be given, and define the cost

A function is said to be contractive for the cost at some rate iff it shrinks the cost by that rate, and a set of adversaries is said contractive iff it contains at least one contractive adversary (we make no assumption on the others). Define now the 1-Wasserstein distance between class marginals in the feature map. If the set of adversaries is contractive for the cost, then composing its contractive element yields a set that is Monge efficient at any desired level. (Proof in the Supplement, Section 13.) To amplify the difference between the two sets, remark that the worst case of Monge efficiency is trivial, since it amounts to contracting nothing. So, as the contraction rate approaches 1, there is barely any guarantee we can get from the weakly contractive set, while the boosted set can still be arbitrarily Monge efficient for a number of compositions linear in the coding size of the Wasserstein distance between class marginals.
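The weak-to-strong amplification can be illustrated on a toy contraction (our sketch, not the paper's RKHS construction): composing a contractive map with itself multiplies contraction rates, so a weakly contractive adversary iterated enough times becomes arbitrarily contractive.

```python
def compose(f, k):
    # Compose a map with itself k times.
    def g(x):
        for _ in range(k):
            x = f(x)
        return x
    return g

lam = 0.9                   # rate of a weakly contractive adversary
weak = lambda x: lam * x    # lam-contractive map towards the origin
strong = compose(weak, 50)  # boosted adversary: lam**50-contractive
gap = abs(strong(1.0) - strong(-1.0))
```

Fifty compositions shrink the distance between the two points by a factor of roughly 0.9^50, i.e. far below the single weak adversary's guarantee.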

7 Experiments

Figure 2: Left (1D toy problem): results as a function of the adversary's parameter. Left plot: the expected log loss for the training/testing distribution pairs a/a, a/c and c/a, where a (respectively c) denotes the adversarial (respectively clean) data distribution; for a/c we optimised the logistic regression classifier on the adversarial distribution and computed the log loss on the clean distribution. Right plot: the optimal transport cost (left scale) and the norm of the logistic regression weights (right scale). Right (USPS handwritten digits): sample transformations in digit-1 and digit-3 as performed by the OT adversary, for budgets from 0 to 0.6 (convention follows Figure 1).
        c/c    c/a    a/c    a/a
0.15    0.03   0.11   0.00   0.02
0.30    0.03   0.25   0.00   0.12
0.45    0.03   0.48   0.01   0.55
0.60    0.03   0.74   0.20   0.96
Table 2: USPS log-loss results. The first column is the strength of the adversary; the convention follows Figure 2. Bold faces denote results better than the c/c baseline.

We have performed toy experiments to demonstrate our new setting. Our objective is not to compete with the wealth of results recently published in the field, but rather to touch on the interest such a novel setting might have for further experimental investigations. Compared to the state of the art, ours is a clear two-stage setting where we first compute the adversaries assuming relevant knowledge of the learner (in our case, we rely on Theorem 6 and therefore assume that the adversary knows at least the cost, see below), and then learn from an adversarially transformed set of examples. This process has the advantage over the direct minimization of (2) that it extracts the computation of the adversarial examples from the training loop: we can generate the adversarial examples once, then store, share or reuse them to robustly train various models (recall that under general Lipschitz assumptions on classifiers, such examples can fit the adversarial training of different kinds of models, see Theorem 5). This process is also reminiscent of the training process for invariant support vector machines (DeCoste and Schölkopf, 2002), and can also be viewed as a particular form of vicinal risk minimization (Chapelle et al., 2000). We have performed two experiments: a 1D experiment involving a particular Mixup adversary, and a USPS experiment involving a closer proxy of the optimal transport compression that we call a Monge adversary.

1D experiment, mixup adversary. Our example involves the unit interval with two class-conditional distributions. We let the adversary set contain a single deterministic mapping: the mixup to the unconditional mean, following Section 5. We further let the classifiers be the space of linear functions, which is the RKHS with linear kernel (assuming the features include the constant 1). The transport cost of interest is the distance on the interval. We discretize to simplify the computation of the OT cost. Results are summarized in Figure 2 (and the Supplement). We theoretically achieve the blunt loss in the limit. There are several interesting observations from Figure 2. First, the mixup adversary indeed works like a Monge efficient adversary: by tuning its parameter, we can achieve any desired level of Monge efficiency. Second, the left plot completes, in this simple case, the observations of Tsipras et al. (2019) and Zhang et al. (2018): the worst result is consistently obtained for training on clean data and testing on adversarial data, which indicates that our adversaries may be useful for gaining robustness through adversarial training.

USPS digits, Monge adversary. We have picked 100 examples of each of the "1" and "3" classes of the 8×8-pixel greyscale USPS handwritten digit dataset. Under the budget constraint, the Monge adversary optimizes the Wasserstein distance between the empirical class marginals. We achieve this by combining a generic gradient-free optimiser with a linear program solver (code available upon request to CW). We learn using logistic regression. We demonstrate several strengths of adversary, expressed as multiples of the distance between the (clean) class-conditional means. Sample transformations obtained by the Monge adversary are displayed in Figure 2 (more in the Supplement), and Table 2 provides log-loss values for different training/test schemes, following the scheme of the 1D data. Two facts clearly emerge: (i) as the budget increases, the Monge adversary smoothly transforms digits into credible adversarial examples, and (ii), as previously observed, training against a tightly budgeted adversary tends to increase generalization abilities (Tsipras et al., 2019; Zhang et al., 2018).

8 Conclusion

It has been observed over the past years that classifiers can be extremely sensitive to changes in inputs that would be imperceptible to humans. Understanding how such tightly resource-constrained changes can be so damaging to machine learning, and how to find a cure, has grown into a very active area of research. So far there is little understanding on the formal side, and many experimental approaches rely on adversarial data that, in some way, shrinks the gap between classes in a controlled way.

In this paper, we studied the intuition that such a process can indeed be beneficial for adversarial training. Our answer involves a simple, sufficient (and sometimes loss-independent) property for any given class of adversaries to be detrimental to learning. This property involves a measure of “harmfulness”, which relates to (and generalizes) integral probability metrics and the maximum mean discrepancy. We presented a sufficient condition for this sufficient property to hold for Lipschitz classifiers, which relies on framing it within optimal transport theory. This provides a general way to formalize how adversaries can indeed "shrink the gap" between classes with the objective of being detrimental to learning. As an example, we delivered a negative boosting result which shows how weakly contractive adversaries for an RKHS can be combined to build a maximally detrimental adversary. We also provided justifications that several experimental approaches to adversarial training involve proxies for adversaries like the ones we analyze. On the experimental side, we provided a simple toy assessment of the ways one can compute and then use such adversaries in a two-stage process.
Our experimental results, even though carried out on toy domains, bring additional reasons to consider such adversaries, this time from a generalization standpoint: they indicate that such adversaries could at least be useful to gain additional robustness in generalization.


The authors warmly thank Kamalika Chaudhuri, Giorgio Patrini, Bob Williamson, and Xinhua Zhang for numerous remarks and stimulating discussions around this material.


  • Amari and Nagaoka [2000] S.-I. Amari and H. Nagaoka. Methods of Information Geometry. Oxford University Press, 2000.
  • Athalye et al. [2018a] A. Athalye, N. Carlini, and D.-A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In 35 ICML, 2018a.
  • Athalye et al. [2018b] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok. Synthesizing robust adversarial examples. In 35 ICML, 2018b.
  • Bastani et al. [2016] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A.-V. Nori, and A. Criminisi. Measuring neural net robustness with constraints. In NIPS*29, 2016.
  • Bubeck et al. [2018] S. Bubeck, E. Price, and I. Razenshteyn. Adversarial examples from computational constraints. CoRR, abs/1805.10204, 2018.
  • Buckman et al. [2018] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow. Thermometer encoding: one hot way to resist adversarial examples. In 6 ICLR, 2018.
  • Buja et al. [2005] A. Buja, W. Stuetzle, and Y. Shen. Loss functions for binary class probability estimation and classification: structure and applications, 2005. Technical Report, University of Pennsylvania.
  • Cai et al. [2018] Q.-Z. Cai, M. Du, C. Liu, and D. Song. Curriculum adversarial training. In IJCAI-ECAI’18, 2018.
  • Chapelle et al. [2000] O. Chapelle, J. Weston, L. Bottou, and V. Vapnik. Vicinal risk minimization. In Advances in Neural Information Processing Systems*13, 2000.
  • Cissé et al. [2017] M. Cissé, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier. Parseval networks: improving robustness to adversarial examples. In 34 ICML, 2017.
  • Cranko et al. [2018] Z. Cranko, S. Kornblith, Z. Shi, and R. Nock. Lipschitz networks and distributional robustness. CoRR, abs/1809.01129, 2018.
  • DeCoste and Schölkopf [2002] D. DeCoste and B. Schölkopf. Training invariant support vector machines. Machine Learning, 46:161–190, 2002.
  • Dhillon et al. [2018] G. Dhillon, K. Azizzadenesheli, Z.-C. Lipton, J. Bernstein, J. Kossaifi, A. Khanna, and A. Anandkumar. Stochastic activation pruning for robust adversarial defense. In 6 ICLR, 2018.
  • Fawzi et al. [2018] A. Fawzi, H. Fawzi, and O. Fawzi. Adversarial vulnerability for any classifier. CoRR, abs/1802.08686, 2018.
  • Gilmer et al. [2018] J. Gilmer, L. Metz, F. Faghri, S.-S. Schoenholz, M. Raghu, M. Wattenberg, and I. Goodfellow. Adversarial spheres. In 6 ICLR Workshops, 2018.
  • Goswami et al. [2018] G. Goswami, N. Ratha, A. Agarwal, R. Singh, and M. Vatsa. Unravelling robustness of deep learning based face recognition against adversarial attacks. In AAAI’18, 2018.
  • Gretton et al. [2006] A. Gretton, K.-M. Borgwardt, M.-J. Rasch, B. Schölkopf, and A.-J. Smola. A kernel method for the two-sample-problem. In NIPS*19, pages 513–520, 2006.
  • Grünwald and Dawid [2004] P. Grünwald and P. Dawid. Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Ann. of Stat., 32:1367–1433, 2004.
  • Guo et al. [2018] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. In 6 ICLR, 2018.
  • Hein and Andriushchenko [2017] M. Hein and M. Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In NIPS*30, 2017.
  • Hendrickson and Buehler [1971] A.-D. Hendrickson and R.-J. Buehler. Proper scores for probability forecasters. Annals of Mathematical Statistics, 42:1916–1921, 1971.
  • Ilyas et al. [2018] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin. Adversarial attacks under restricted threat models. In 35 ICML, 2018.
  • Kearns and Mansour [1996] M. Kearns and Y. Mansour. On the boosting ability of top-down decision tree learning algorithms. In Proc. of the 28 ACM STOC, pages 459–468, 1996.
  • Kurakin et al. [2017] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In 5 ICLR, 2017.
  • Ma et al. [2018] X. Ma, B. Li, Y. Wang, S.-M. Erfani, S. Wijewickrema, G. Schoenebeck, D. Song, M.-E. Houle, and J. Bayley. Characterizing adversarial subspaces using local intrinsic dimensionality. In 6 ICLR, 2018.
  • Madry et al. [2018] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In 6 ICLR, 2018.
  • McCullagh and Nelder [1989] P. McCullagh and J. Nelder. Generalized Linear Models. Chapman Hall/CRC, 1989.
  • Miyato et al. [2018] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. In ICLR’18, 2018.
  • Nock and Nielsen [2008] R. Nock and F. Nielsen. On the efficient minimization of classification-calibrated surrogates. In NIPS*21, pages 1201–1208, 2008.
  • Raghunathan et al. [2018] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. In 6 ICLR, 2018.
  • Reid and Williamson [2010] M.-D. Reid and R.-C. Williamson. Composite binary losses. JMLR, 11:2387–2422, 2010.
  • Samangouei et al. [2018] P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: protecting classifiers against adversarial attacks using generative models. In 6 ICLR, 2018.
  • Schapire and Freund [2012] R.-E. Schapire and Y. Freund. Boosting, Foundations and Algorithms. MIT Press, 2012.
  • Shuford et al. [1966] E. Shuford, A. Albert, and H.-E. Massengil. Admissible probability measurement procedures. Psychometrika, pages 125–145, 1966.
  • Sinha et al. [2018] A. Sinha, H. Namkoong, and J. Duchi. Certifying some distributional robustness with principled adversarial training. In 6 ICLR, 2018.
  • Song et al. [2018] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. PixelDefend: leveraging generative models to understand and defend against adversarial examples. In 6 ICLR, 2018.
  • Sriperumbudur et al. [2009] B.-K. Sriperumbudur, K. Fukumizu, A. Gretton, B. Schölkopf, and G.-R.-G. Lanckriet. On integral probability metrics, φ-divergences and binary classification. CoRR, abs/0901.2698, 2009.
  • Szegedy et al. [2013] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
  • Tramèr et al. [2018] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel. Ensemble adversarial training: attacks and defenses. In 6 ICLR, 2018.
  • Tsipras et al. [2019] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry. Robustness may be at odds with accuracy. In 7 ICLR, 2019.
  • Uesato et al. [2018] J. Uesato, B. O’Donoghue, P. Kohli, and A. van den Oord. Adversarial risk and the dangers of evaluating against weak attacks. In 35 ICML, 2018.
  • Villani [2009] C. Villani. Optimal transport, old and new. Springer, 2009.
  • Wang et al. [2018] Y. Wang, S. Jha, and K. Chaudhuri. Analyzing the robustness of nearest neighbors to adversarial examples. In 35 ICML, 2018.
  • Wong and Zico Kolter [2018] E. Wong and J. Zico Kolter. Provable defense against adversarial examples via the outer adversarial polytope. In 35 ICML, 2018.
  • Zhang et al. [2018] H. Zhang, M. Cisse, Y.-D. Dauphin, and D. Lopez-Paz. mixup: beyond empirical risk minimization. In 6 ICLR, 2018.

9 Appendix

10 Proof of Theorem 4 and Corollary 4

Our proof assumes basic knowledge about proper losses (see for example Reid and Williamson [2010]). From [Reid and Williamson, 2010, Theorem 1, Corollary 3] and Shuford et al. [1966], the loss being twice differentiable and proper, its conditional Bayes risk $\underline{L}$ and partial losses $\ell_1$ and $\ell_{-1}$ are related by:

$$-\frac{\ell_1'(c)}{1-c} \:\: = \:\: \frac{\ell_{-1}'(c)}{c} \:\: = \:\: -\underline{L}''(c).$$
The weight function [Reid and Williamson, 2010, Theorem 1] being also $w(c) = -\underline{L}''(c)$, we get from the integral representation of partial losses [Reid and Williamson, 2010, eq. (5)],

$$\ell_1(c) \:\: = \:\: \int_c^1 (1-t)\, w(t)\, \mathrm{d}t,$$
from which we derive by integrating by parts and then using the Legendre conjugate of $-\underline{L}$ (assuming the loss is fair, so that $\underline{L}(1)=0$),

$$\ell_1(c) \:\: = \:\: \underline{L}(c) + (1-c)\,\underline{L}'(c) \:\: = \:\: (-\underline{L})^\star\!\big({-\underline{L}'(c)}\big) + \underline{L}'(c).$$
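As a sanity check of these standard identities, one can instantiate them for the log loss, whose partial losses and conditional Bayes risk have simple closed forms. This numerical sketch is our own illustration, not part of the proof:

```python
import numpy as np
from scipy.integrate import quad

# Log loss (binary, c in (0,1)): partial losses and conditional Bayes risk.
l1  = lambda c: -np.log(c)                          # partial loss for y = +1
lm1 = lambda c: -np.log(1 - c)                      # partial loss for y = -1
L   = lambda c: -c*np.log(c) - (1-c)*np.log(1-c)    # conditional Bayes risk
w   = lambda c: 1.0 / (c * (1 - c))                 # weight function w = -L''
Lp  = lambda c: np.log((1 - c) / c)                 # L'

c = 0.3
# Integral representation of the partial losses:
#   l1(c) = int_c^1 (1-t) w(t) dt,   lm1(c) = int_0^c t w(t) dt
i1,  _ = quad(lambda t: (1 - t) * w(t), c, 1)
im1, _ = quad(lambda t: t * w(t), 0, c)
print(i1, l1(c))      # should agree
print(im1, lm1(c))    # should agree

# After integration by parts (L(0) = L(1) = 0 for the log loss):
#   l1(c) = L(c) + (1-c) L'(c),   lm1(c) = L(c) - c L'(c)
print(L(c) + (1 - c) * Lp(c), l1(c))
print(L(c) - c * Lp(c), lm1(c))
```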
Now, suppose that the way a real-valued prediction is fit in the loss is through a general inverse link . Let


Since , the proper composite loss with link on prediction is the same as the proper composite loss with link on prediction . This last loss is in fact using its canonical link and so is proper canonical [Reid and Williamson, 2010, Section 6.1], [Buja et al., 2005]. Letting in this case , we get that the partial loss satisfies


Notice the constant appearing on the right hand side. Notice also that if we see (10) as a Bregman divergence, , then the canonical link is the function that defines uniquely the dual affine coordinate system of the divergence [Amari and Nagaoka, 2000] (see also [Reid and Williamson, 2010, Appendix B]).

We can repeat the derivations for the partial loss $\ell_{-1}$, which yields [Reid and Williamson, 2010, eq. (5)]:

$$\ell_{-1}(c) \:\: = \:\: \int_0^c t\, w(t)\, \mathrm{d}t,$$
and using the canonical link $\psi \doteq -\underline{L}'$, we get this time (assuming $\underline{L}(0)=0$)

$$\ell_{-1}(c) \:\: = \:\: \underline{L}(c) - c\,\underline{L}'(c) \:\: = \:\: (-\underline{L})^\star\!\big(\psi(c)\big).$$
We get from (31) and (34) the canonical proper composite loss


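For the log loss, the canonical link is the logit and the Legendre conjugate of the negative conditional Bayes risk is the softplus, so the canonical composite partial losses are exactly the usual logistic losses. The brute-force conjugate below is our own numerical illustration of this standard fact:

```python
import numpy as np

L   = lambda c: -c*np.log(c) - (1-c)*np.log(1-c)   # conditional Bayes risk
psi = lambda c: np.log(c / (1 - c))                # canonical link = -L' (logit)

def conj(z, grid=np.linspace(1e-6, 1 - 1e-6, 2_000_001)):
    """(-L)^*(z) by brute-force sup over c of c*z + L(c)."""
    return np.max(grid * z + L(grid))

z = 1.7
print(conj(z), np.log1p(np.exp(z)))   # (-L)^*(z) = softplus(z)

# Canonical composite partial losses recover the logistic loss:
sigma = 1 / (1 + np.exp(-z))          # psi^{-1}(z), the sigmoid
print(np.log1p(np.exp(z)) - z, -np.log(sigma))      # l1  o psi^{-1}
print(np.log1p(np.exp(z)),     -np.log(1 - sigma))  # l-1 o psi^{-1}
```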
Note that for the optimisation of the loss in the real-valued prediction, we could discount the right-hand side parenthesis, which acts just like a constant with respect to the prediction. Using the Fenchel–Young inequality yields the non-negativity of the loss, as it brings


from Jensen’s inequality (the conditional Bayes risk is always concave [Reid and Williamson, 2010]). Now, if we consider the alternative use of the Fenchel–Young inequality,


then if we let


then we get


It follows from (36) and (39),


and we get,




and we recall