# Monge beats Bayes: Hardness Results for Adversarial Training


## 1 Introduction

Starting from the observation that deep nets are sensitive to imperceptible perturbations of their inputs (Szegedy et al., 2013), a surge of recent work has focussed on new adversarial training approaches to supervised learning (Athalye et al., 2018a, b; Bastani et al., 2016; Buckman et al., 2018; Bubeck et al., 2018; Cai et al., 2018; Dhillon et al., 2018; Fawzi et al., 2018; Gilmer et al., 2018; Goswami et al., 2018; Guo et al., 2018; Ilias et al., 2018; Kurakin et al., 2017; Ma et al., 2018; Madry et al., 2018; Samangouei et al., 2018; Song et al., 2018; Tramèr et al., 2018; Uesato et al., 2018; Wang et al., 2018; Wong and Zico Kolter, 2018) (and references within). In the now popular model of Madry et al. (2018), we want to learn a classifier $h$ from a set $\mathcal{H}$, given a distribution $D$ of clean examples and a loss $\ell$. Adversarial training then seeks to find

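$$\arg\min_{h \in \mathcal{H}} \mathbb{E}_{(\mathsf{X},\mathsf{Y}) \sim D}\left[\max_{\boldsymbol{\delta} : \|\boldsymbol{\delta}\| \le \delta_*} \ell(\mathsf{Y}, h(\mathsf{X} + \boldsymbol{\delta}))\right], \tag{1}$$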

where $\|\cdot\|$ is a norm and $\delta_*$ is the budget of the adversary. It has recently been observed that adversarial training damages standard accuracy as data size and the adversary's budget ($\delta_*$) increase (Tsipras et al., 2019). A Bayesian explanation is given for a particular $D$ in Tsipras et al. (2019), and the authors conclude by questioning the interplay between adversarial robustness and standard accuracy.

In this paper, we dig into this relationship (i) by casting the standard accuracy and loss in (1) in the broad context of Bayesian decision theory (Grünwald and Dawid, 2004) and (ii) by considering a general form of adversaries, not restricted to the ones used in (1). In particular, we assume that the loss is proper, which is just a general form of statistical unbiasedness that many popular choices meet (Hendrickson and Buehler, 1971; Reid and Williamson, 2010). The minimization of a proper loss gives guarantees on the accuracy (for example, Kearns and Mansour (1996)), so it directly connects to the setting of Tsipras et al. (2019). Regarding the adversaries, instead of relying on the local adversarial modification $h(\mathsf{X} + \boldsymbol{\delta})$ for some $\|\boldsymbol{\delta}\| \le \delta_*$, we consider a set of possible local modifications $h \circ a(\mathsf{X})$ for some $a \in \mathcal{A}$ ($\mathcal{A}$ fixed). We then analyze the conditions on $\mathcal{A}$ under which, for some $\epsilon$,

$$\min_{h \in \mathcal{H}} \mathbb{E}_{(\mathsf{X},\mathsf{Y}) \sim D}\left[\max_{a \in \mathcal{A}} \ell(\mathsf{Y}, h \circ a(\mathsf{X}))\right] \ge (1 - \epsilon) \cdot \ell^\circ, \tag{2}$$

where $\ell^\circ$ is the loss of the "blunt" predictor which predicts nothing. If $h$ has range $\mathbb{R}$, this blunt predictor is in general 0 (for the log loss, square loss, etc.), which translates into a class probability estimate of $1/2$ for all observations and a global accuracy of $1/2$ for two classes, i.e. that of an unbiased coin. We see the connection of (2) to the accuracy: as $\epsilon \to 0$, the learner will be tricked into converging to an extremely poorly accurate predictor. How one can design such provably efficient adversaries, furthermore under tight budget constraints, is the starting point of our paper.

Our first contribution (Section 4) analyzes budgeted adversaries that can enforce (2). Our main finding shows that (2) is implied by a very simple condition involving a central quantity, $\gamma$, generalizing the celebrated integral probability metrics (Sriperumbudur et al., 2009). Furthermore, under some additional condition on the loss, satisfied by the log, square and Matsushita losses, the adversarial optimization of $\gamma$ does not depend on the loss. In other words,

the adversary can attack the learner disregarding its loss.

Our second contribution (Section 5) considers the adversarial optimisation of $\gamma$ when the classifiers in $\mathcal{H}$ satisfy a generalized form of Lipschitz continuity. Controlling Lipschitz continuity has recently emerged as a solution to limit the impact of adversarial examples (Cissé et al., 2017). In this context, efficient budgeted adversaries take a particular form: we show that, for an adversary to minimize $\gamma$,

it is sufficient to compress the optimal transport plan between class marginals using the Lipschitz function as transportation cost, disregarding the learner's $\mathcal{H}$.

This result brings the machinery of optimal transport (OT) (Villani, 2009) to the table of adversarial design, with a new purpose (the compression of OT plans). These two findings turn out to be very useful from an experimental standpoint: we have implemented two kinds of adversaries inspired by our theory (called Mixup and Monge for their respective links with Zhang et al. (2018) and Villani (2009)); Table 1 displays their behaviour on two simple problems. We have observed that training a learner against a "weak" (severely budgeted) adversary improves generalization on clean data, a phenomenon also observed elsewhere (Tsipras et al., 2019; Zhang et al., 2018). The digit experiment displays how our adversaries progressively transform observations of one class into credible observations of the other (see Experiments in Section 7, and the Supplement).

Our third contribution (Section 6) is an adversarial boosting result: it answers the question as to whether one can efficiently craft an arbitrarily strong adversary from the sole access to a black box weak adversary. In the framework of reproducing kernel Hilbert spaces (RKHS), we show that

this "weak adversary" $\Rightarrow$ "strong adversary" design does exist, and our proof is constructive: we build one.

Our proof revolves around a standard concept of fixed point theory: contractive mappings. We insist on the computational efficiency of this design, linear in the coding size of the Wasserstein distance between class marginals. It shows that, on some adversarial training problems, the existence of the weakest forms of adversaries implies that much stronger ones may be available at cheap (computational) cost.

## 2 Related work

Formal approaches to the problem of adversarial training are sparse compared to the growing literature on the arms race of experimental results. The formal trend started with adversarial changes to a loss to be optimized (Sinha et al., 2018) or more directly to a classifier's output (Hein and Andriushchenko, 2017; Raghunathan et al., 2018). For example, Sinha et al. (2018) add a Wasserstein penalty to a loss, computing the distance between the true and adversarial distributions. They provide smoothness guarantees for the loss and robustness in generalization for its minimization. Raghunathan et al. (2018) directly penalize the classifier's output (not a loss per se), in the context of shallow networks, and compute adversarial perturbations in a bounded $L_\infty$ ball. A similar approach (but in $L_2$-norm) is taken in Hein and Andriushchenko (2017) for kernel methods and shallow networks. Recent works also focus on introducing general robustness constraints (Bastani et al., 2016).

## 3 Definitions and notations

We present some important definitions and notations.

Proper losses. Many of our notations follow Reid and Williamson (2010). Suppose we have a prediction problem with binary labels. We let $\ell : \mathcal{Y} \times [0,1] \to \overline{\mathbb{R}}$ denote a general loss function to be minimized, where the left argument is a class in $\mathcal{Y}$ and the right argument is a class probability estimate in $[0,1]$ ($\overline{\mathbb{R}}$ is the closure of $\mathbb{R}$). Its conditional Bayes risk function is the best achievable loss when labels are drawn with a particular positive base-rate,

$$\underline{L}(\pi) \doteq \inf_{c} \mathbb{E}_{\mathsf{Y} \sim \pi}\, \ell(\mathsf{Y}, c), \tag{3}$$

where $\mathsf{Y} \sim \pi$ means that the label is positive with probability $\pi$. We call the loss proper iff Bayes prediction locally achieves the minimum everywhere, that is, iff $\underline{L}(\pi) = \mathbb{E}_{\mathsf{Y} \sim \pi}\, \ell(\mathsf{Y}, \pi)$ for all $\pi$ (losses for which properness makes particular sense are called class probability estimation losses (Reid and Williamson, 2010)). One value of $\underline{L}$ is interesting in our context, the one which corresponds to Bayes rule returning maximal "uncertainty", i.e. for $\pi = 1/2$,

$$\ell^\circ \doteq \underline{L}\left(\tfrac{1}{2}\right). \tag{4}$$

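Without further ado, we give the key definition which makes more precise the framework sketched in (2). For any proper loss $\ell$ and $(\mathcal{H}, \mathcal{A})$ integrable with respect to some distribution $D$, the adversarial loss $\ell(\mathcal{H}, \mathcal{A}, D)$ is defined as

$$\ell(\mathcal{H}, \mathcal{A}, D) \doteq \min_{h \in \mathcal{H}} \mathbb{E}_{(\mathsf{X},\mathsf{Y}) \sim D}\left[\max_{a \in \mathcal{A}} \ell(\mathsf{Y}, h \circ a(\mathsf{X}))\right]. \tag{5}$$

For any $\epsilon \in [0,1]$, we say that $\mathcal{H}$ is $\epsilon$-defeated by $\mathcal{A}$ on $\ell$ iff

$$\ell(\mathcal{H}, \mathcal{A}, D) \ge (1 - \epsilon) \cdot \ell^\circ. \tag{6}$$

Intuitively, if the adversary can modify instances such that the learner does not do much better than a trivial blunt constant predictor, the adversary can declare victory. The additional quantities (such as the integrability condition) are given later in this section. To finish up with general proper losses, as an example, the log-loss, which charges $-\log c$ to a class probability estimate $c$ on a positive label and $-\log(1-c)$ on a negative label, is proper, with conditional Bayes risk given by the Shannon entropy $\underline{L}(\pi) = -\pi \log \pi - (1-\pi) \log(1-\pi)$.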
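The properness of the log-loss and the value of the blunt loss in (4) can be checked numerically. The sketch below is only an illustration: the function names and the grid search standing in for the infimum in (3) are our own assumptions, and logarithms are natural, so the blunt loss comes out as $\log 2$ (it is 1 with base-2 logarithms).

```python
import math


def log_loss(y_is_positive: bool, c: float) -> float:
    """Log-loss of the class probability estimate c on one label."""
    return -math.log(c) if y_is_positive else -math.log(1.0 - c)


def expected_loss(pi: float, c: float) -> float:
    """Expected log-loss of estimate c when the label is positive w.p. pi."""
    return pi * log_loss(True, c) + (1.0 - pi) * log_loss(False, c)


def shannon_entropy(pi: float) -> float:
    """Shannon entropy in nats, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in (pi, 1.0 - pi) if p > 0.0)


def grid_bayes_risk(pi: float, steps: int = 999) -> tuple[float, float]:
    """Approximate the infimum in (3) on a grid; returns (risk, argmin c)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    best_c = min(grid, key=lambda c: expected_loss(pi, c))
    return expected_loss(pi, best_c), best_c


if __name__ == "__main__":
    for pi in (0.1, 0.5, 0.8):
        risk, c_star = grid_bayes_risk(pi)
        # Properness: the minimiser is c = pi and the risk is the entropy.
        print(pi, c_star, risk, shannon_entropy(pi))
    print("blunt loss:", shannon_entropy(0.5), "log 2:", math.log(2))
```

On this grid the minimiser coincides with the base-rate itself, which is exactly what properness requires.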

Composite, canonical proper losses. We let $\mathcal{H}$ denote a set of real valued classifiers. To convert real valued predictions into class probability estimates (McCullagh and Nelder, 1989), one traditionally uses an invertible link function $\psi$, forming a composite loss (Reid and Williamson, 2010). We shall hereafter drop the adjective composite for simplicity, and leave the link implicit from context whenever appropriate. The unique (up to multiplication or addition by a scalar (Buja et al., 2005)) canonical link for a proper loss is defined from the conditional Bayes risk as $\psi \doteq -\underline{L}'$ (Reid and Williamson, 2010, Section 6.1), (Buja et al., 2005). As an example, for the log-loss we find the canonical link $\psi(u) = \log\frac{u}{1-u}$, with inverse the well-known sigmoid $\psi^{-1}(z) = (1 + \exp(-z))^{-1}$. A proper loss will also be assumed to be twice differentiable. Twice differentiability is a technical convenience to simplify derivations; it can be removed (Reid and Williamson, 2010, Footnote 6). A canonical proper loss is a proper loss using the canonical link.
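As a sanity check of the log-loss example, the canonical link can be recovered numerically as minus the derivative of the conditional Bayes risk. This is a minimal sketch under our own assumptions: the derivative is approximated by a central finite difference, natural logarithms are used, and all function names are illustrative.

```python
import math


def bayes_risk(pi: float) -> float:
    """Conditional Bayes risk of the log-loss: Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in (pi, 1.0 - pi) if p > 0.0)


def canonical_link(u: float, eps: float = 1e-6) -> float:
    """Minus the derivative of the Bayes risk, by central finite difference."""
    return -(bayes_risk(u + eps) - bayes_risk(u - eps)) / (2.0 * eps)


def logit(u: float) -> float:
    """Closed form of the canonical link of the log-loss."""
    return math.log(u / (1.0 - u))


def sigmoid(z: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for u in (0.2, 0.5, 0.9):
        print(u, canonical_link(u), logit(u), sigmoid(logit(u)))
```

At $u = 1/2$ both expressions vanish: the real valued prediction of maximal uncertainty is 0 for this loss.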

Adversaries. Let $\mathcal{A}$ denote a set of adversaries, so that any $a \in \mathcal{A}$ is allowed to transform instances in some way (e.g., change pixel values on an image). Suppose $D$ (fixed) denotes a distribution over $\mathcal{X} \times \mathcal{Y}$ and $P$ (resp. $N$) is the corresponding distribution conditioned on the positive (resp. negative) class. The only assumption we make about adversaries is a measurability one. We assume that for all $h \in \mathcal{H}$ and $a \in \mathcal{A}$, $h \circ a$ is integrable with respect to $P$ and $N$. For the sake of simplicity, we shall call the tuple $(\mathcal{H}, \mathcal{A})$ integrable with respect to $D$. Assuming the loss $\ell$ is proper composite with link $\psi$, there is one interesting constant $h^\circ$:

$$h^\circ \doteq \psi\left(\tfrac{1}{2}\right), \tag{7}$$

because this value delivers the real valued prediction corresponding to maximal uncertainty in (4). For example, when the loss is proper canonical and furthermore required to be symmetric, i.e. there is no class-dependent misclassification cost, we have (Nock and Nielsen, 2008)

$$h^\circ = 0, \tag{8}$$

which corresponds to a classifier always abstaining and indeed delivering maximal uncertainty on prediction. It is not hard to check that $\ell^\circ$ is the loss of the constant $h^\circ$. So we can now see that in Definition 3, as $\epsilon \to 0$, training against the adversarial loss essentially produces a classifier no better than predicting nothing. We do not assume that $h^\circ \in \mathcal{H}$, but keep in mind that such prediction with maximal uncertainty is the baseline against which a learner has to compete to "learn" something.

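The adversarial distortion parameter $\gamma$. We now unveil the key parameter used earlier in the Introduction. For any distribution $Q$ over $\mathcal{X}$, any $f : \mathcal{X} \to \mathbb{R}$ and any $u, v \in \mathbb{R}$, we let:

$$\phi(Q, f, u, v) \doteq \int_{\mathcal{X}} u \cdot (f(\boldsymbol{x}) + v)\, \mathrm{d}Q(\boldsymbol{x}). \tag{9}$$

For any $g : \mathbb{R} \to \mathbb{R}$, $a \in \mathcal{A}$ and $b, c \in \mathbb{R}$, the adversarial distortion is:

$$\gamma^{g}_{\mathcal{H},a}(P, N, \pi, b, c) \doteq \max_{h \in \mathcal{H}} \left\{\phi(P, g \circ h \circ a, \pi, b) - \phi(N, g \circ h \circ a, 1 - \pi, -c)\right\}. \tag{10}$$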

Finally, $\gamma_{\mathcal{H},a} \doteq \gamma^{\mathrm{Id}}_{\mathcal{H},a}$. While abstract, we shall shortly see that these quantities relate to a well-known object in the study of distances between probability distributions. Let

$$\gamma^{\ell}_{\pi} \doteq \pi \underline{L}(1) + (1 - \pi) \underline{L}(0). \tag{11}$$

As an example, we have $\gamma^{\ell}_{\pi} = 0$ for the log-loss, with the convention $0 \log 0 = 0$. We remark that $\gamma^{\ell}_{\pi}$ in (11) is related to $\gamma^{g}_{\mathcal{H},a}$ in (10):

$$\gamma^{\ell}_{\pi} = \gamma^{g^*}_{\mathcal{H}^*,a}(P, N, \pi, 0, 0), \tag{12}$$

for a suitable $g^*$ and the singleton $\mathcal{H}^*$ whose classifier makes the hard prediction of the true class over $P$ and over $N$. Remark that such a classifier is not affected by a particular adversary, but it is not implementable in the general case as it would require knowing the class of an observation.

## 4 Main result: the hardness theorem

We now show a lower bound on the adversarial loss of (5).

For any proper loss $\ell$, link $\psi$ and any $(\mathcal{H}, \mathcal{A})$ integrable with respect to $D$, the following holds true:

$$\ell(\mathcal{H}, \mathcal{A}, D) \ge \left(\ell^\circ - \frac{1}{2} \cdot \min_{a \in \mathcal{A}} \beta_a\right)_+, \tag{13}$$

where:

$$x_+ \doteq \max\{0, x\}, \quad \beta_a \doteq \gamma^{g}_{\mathcal{H},a}(P, N, \pi, 2\underline{L}(1), 2\underline{L}(0)), \quad g \doteq (-\underline{L}') \circ \psi^{-1}. \tag{14}$$

(All other parameters are implicit in the definition of $\beta_a$; proof in the Supplement, Section 10.)

This pins down a simple condition for the adversary to defeat $\mathcal{H}$. Under the conditions and with the notations of Theorem 4, if there exist $\epsilon$ and $a \in \mathcal{A}$ such that

$$\beta_a \le 2\epsilon\ell^\circ, \tag{15}$$

then $\mathcal{H}$ is $\epsilon$-defeated by $\mathcal{A}$ on $\ell$. (Proof in the Supplement, Section 10.) We remark that $g = \mathrm{Id}$ whenever $\psi$ is canonical, and so

$$\beta_a = \gamma_{\mathcal{H},a}(P, N, \pi, 2\underline{L}(1), 2\underline{L}(0)). \tag{16}$$

We also note that constants get out of the maximization problem in (10), so when $\psi$ is canonical, the optimisation of $\beta_a$ does not depend on the loss at hand; hence, its optimisation by an adversary could be done without knowing the loss that the learner is going to minimise. We also remark that the condition for $\mathcal{H}$ to be $\epsilon$-defeated by $\mathcal{A}$ does not involve an algorithmic component: it means that any learning algorithm minimising the loss $\ell$ will end up with a poor predictor if (15) is satisfied, regardless of its computational resources.
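For completeness, the step from the lower bound (13) to the defeat condition (6) can be spelled out. The chain below uses only (13), (15) and the definition of $x_+$ in (14), for any adversary $a \in \mathcal{A}$ satisfying (15):

```latex
\begin{align*}
\ell(\mathcal{H},\mathcal{A},D)
  &\ge \Big(\ell^\circ - \tfrac{1}{2}\min_{a'\in\mathcal{A}}\beta_{a'}\Big)_+
     && \text{by (13)}\\
  &\ge \ell^\circ - \tfrac{1}{2}\,\beta_a
     && \text{since } x_+ \ge x \text{ and } \min_{a'\in\mathcal{A}}\beta_{a'} \le \beta_a\\
  &\ge \ell^\circ - \tfrac{1}{2}\cdot 2\epsilon\,\ell^\circ
     && \text{by (15)}\\
  &= (1-\epsilon)\cdot\ell^\circ ,
\end{align*}
```

which is exactly the inequality (6).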

Relationships with integral probability metrics. In a special case, the somewhat abstract quantity $\beta_a$ can be related to the more familiar class of integral probability metrics (IPMs) (Sriperumbudur et al., 2009). The latter are a class of metrics on probability distributions, capturing e.g. the total variation divergence, Wasserstein distance, and maximum mean discrepancy. The proof of the following Corollary is immediate.

Suppose and is closed by negation. Then

$$2 \cdot \beta_a = \max_{h \in \mathcal{H}} \left|\int_{\mathcal{X}} g \circ h \circ a(\boldsymbol{x})\, \mathrm{d}P(\boldsymbol{x}) - \int_{\mathcal{X}} g \circ h \circ a(\boldsymbol{x})\, \mathrm{d}N(\boldsymbol{x})\right|,$$

which is the integral probability metric for the class $\{g \circ h \circ a : h \in \mathcal{H}\}$ on $P$ and $N$. Here, $g$ is defined in (14). We may now interpret Theorem 4 as saying: for an adversary to defeat a learner minimising a proper loss, it suffices to make a suitable IPM between the class-conditionals small. The particular choice of IPM arises from the learner's choice of hypothesis class, $\mathcal{H}$. Of particular interest is when this comprises kernelized scorers, as we now detail.

Relationships with the maximum mean discrepancy. The maximum mean discrepancy (MMD) (Gretton et al., 2006) corresponds to an IPM where the function class is the unit ball in an RKHS. We have the following re-expression of $\beta_a$ for this hypothesis class, which turns out to involve the MMD.

Suppose $\ell$ is proper canonical and let $\mathcal{H}$ denote the unit ball of a reproducing kernel Hilbert space (RKHS) of functions with reproducing kernel $\kappa$. Denote

$$\mu_{a,Q} \doteq \int_{\mathcal{X}} \kappa(a(\boldsymbol{x}), \cdot)\, \mathrm{d}Q(\boldsymbol{x}) \tag{17}$$

the adversarial mean embedding of on . If and , then

$$2 \cdot \beta_a = \frac{1}{2} \cdot \|\mu_{a,P} - \mu_{a,N}\|_{\mathcal{H}}. \tag{18}$$

The constraints on are for readability: the proof (in , Section 11) shows a more general result, with unrestricted . The right-hand side of (18) is proportional to the MMD between and . In the more general case, the right-hand side of (18) is replaced by . Figure 1 displays an example picture (for unrestricted ) for some canonical proper but asymmetric loss () when an adversary with a given upperbound guarantee on can indeed -defeat some . We remark that while this may be possible for a whole range of , this may not be possible for all. The picture would be different if the loss were symmetric (Corollary 4 below), since in this case a guarantee to -defeat for some would imply a guarantee for all. Loss asymmetry thus brings a difficulty for the adversary which, we recall, cannot act on .
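To make the adversarial mean embedding (17) concrete, the RKHS distance between the two embeddings can be estimated from samples through the kernel trick. The sketch below is illustrative only: the Gaussian kernel, the plug-in (biased) estimate, the toy samples and every function name are our own assumptions and not the setting of the experiments.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, bandwidth: float = 1.0) -> float:
    """Gaussian reproducing kernel (an assumed choice of kernel)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Point], ys: Sequence[Point], kernel) -> float:
    """Average kernel value over all pairs, i.e. <mu_xs, mu_ys> in the RKHS."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def adversarial_mmd(pos: Sequence[Point], neg: Sequence[Point],
                    adversary: Callable[[Point], Point],
                    kernel=gaussian_kernel) -> float:
    """Plug-in estimate of the norm of mu_{a,P} - mu_{a,N}."""
    ap = [adversary(x) for x in pos]
    an = [adversary(x) for x in neg]
    squared = (mean_kernel(ap, ap, kernel) + mean_kernel(an, an, kernel)
               - 2.0 * mean_kernel(ap, an, kernel))
    return math.sqrt(max(squared, 0.0))  # clip tiny negative rounding errors


if __name__ == "__main__":
    pos = [(1.0,), (1.5,), (2.0,)]
    neg = [(-1.0,), (-1.5,), (-2.0,)]
    print("clean     :", adversarial_mmd(pos, neg, lambda x: x))
    print("shrunk    :", adversarial_mmd(pos, neg, lambda x: tuple(0.1 * v for v in x)))
    print("collapsed :", adversarial_mmd(pos, neg, lambda x: (0.0,)))
```

An adversary that pulls both classes towards a common point drives the estimate towards zero, which is the regime in which the bound of Section 4 becomes binding for the learner.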

Simultaneously defeating over sets of losses. Satisfying (15) involves at least the knowledge of one value of the loss, if not of the loss itself. It turns out that if the loss is canonical and the adversary has just a partial knowledge of it, it may in fact still be possible for him to guess whether (15) can be satisfied over this set, as we now show.

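Let $\mathcal{L}$ be a set of canonical proper losses satisfying the following property: $\exists \underline{L}^\dagger$ such that $\underline{L}(0) = \underline{L}(1) = \underline{L}^\dagger$ for every loss in $\mathcal{L}$. Assuming $(\mathcal{H}, \mathcal{A})$ integrable with respect to $D$, if

$$\exists a \in \mathcal{A} : \gamma_{\mathcal{H},a}(P, N, \pi, 0, 0) \le \epsilon \cdot \inf_{\ell \in \mathcal{L}} \ell^\circ - \underline{L}^\dagger, \tag{19}$$

then $\mathcal{H}$ is jointly $\epsilon$-defeated by $\mathcal{A}$ on all losses of $\mathcal{L}$. Notice that all the adversary needs to know is $\underline{L}^\dagger$. The result easily follows from remarking that we have in this case:

$$\beta_a = 2\underline{L}^\dagger + \gamma_{\mathcal{H},a}(P, N, \pi, 0, 0),$$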

which we then plug in (15) to get the statement of the Corollary. Corollary 4 is interesting for two reasons. First, it applies to all proper symmetric losses (Nock and Nielsen, 2008; Reid and Williamson, 2010), which include popular losses like the square, logistic and Matsushita losses. Second, it does not just offer the adversarial strategy to defeat classifiers that would be learned on any of such losses, it also applies to more sophisticated learning strategies that would tune the loss at learning time (Nock and Nielsen, 2008; Reid and Williamson, 2010) or tailor the loss to specific constraints (Buja et al., 2005).

We now highlight a sufficient condition on adversaries for (15) to be satisfied, which considers classifiers in the increasingly popular framework of "Lipschitz classification" for adversarial training (Cissé et al., 2017), and turns out to frame adversaries in optimal transport (OT) theory (Villani, 2009). We proceed in three steps, first framing OT adversaries, then Lipschitz classifiers and finally showing how the former defeat the latter. Given any cost $c : \mathcal{X} \times \mathcal{X} \to \overline{\mathbb{R}}$ and some $\delta$, we say that $\mathcal{A}$ is $\delta$-Monge efficient for cost $c$ on marginals $P, N$ iff $\exists a \in \mathcal{A} : C(a, P, N) \le \delta$, with

$$C(a, P, N) \doteq \inf_{\mu \in \Pi(P,N)} \int c(a(\boldsymbol{x}), a(\boldsymbol{x}'))\, \mathrm{d}\mu(\boldsymbol{x}, \boldsymbol{x}'),$$

and $\Pi(P, N)$ is the set of all joint probability measures whose marginals are $P$ and $N$. Hence, Monge efficiency relates to an efficient compression of the transport plan between class marginals. In fact, we should require $c$ to satisfy some mild additional assumptions for the existence of optimal couplings (Villani, 2009, Theorem 4.1), such as lower semicontinuity. We skip them for the sake of simplicity, but note that infinite costs are possible without endangering the existence of optimal couplings (Villani, 2009), which is convenient for the following generalized notion of Lipschitz continuity. Let . For some and , set is said to be -Lipschitz with respect to iff

 (20)

We shall also write that is -Lipschitz if Definition 5 holds for ( implicit). Actual Lipschitz continuity would restrict to involve a distance, and the state of the art of adversarial training would restrict further the distance to be based on a norm (Cissé et al., 2017). Equipped with this, we obtain the main result of this Section.
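The transport cost $C(a, P, N)$ behind Monge efficiency can be computed exactly on small samples. The sketch below is an illustration under our own assumptions: both classes contribute the same number of equally weighted points, in which case an optimal coupling can be taken to be a one-to-one matching, and the matching is found by brute force over permutations, which is only viable for a handful of points. The function names, the Euclidean cost and the toy data are hypothetical.

```python
import itertools
import math
from typing import Callable, Sequence

Point = Sequence[float]


def euclidean(x: Point, y: Point) -> float:
    """Euclidean distance, used here as the transport cost c."""
    return math.dist(x, y)


def transport_cost(pos: Sequence[Point], neg: Sequence[Point],
                   adversary: Callable[[Point], Point],
                   cost: Callable[[Point, Point], float]) -> float:
    """Empirical C(a, P, N) for equal-size, uniformly weighted samples."""
    if len(pos) != len(neg):
        raise ValueError("this sketch assumes equal sample sizes")
    ap = [adversary(x) for x in pos]
    an = [adversary(x) for x in neg]
    n = len(ap)
    # Brute-force search for the cheapest one-to-one matching.
    best = min(
        sum(cost(ap[i], an[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(n))
    )
    return best / n


if __name__ == "__main__":
    pos = [(0.0, 0.0), (0.0, 1.0)]
    neg = [(1.0, 0.0), (1.0, 1.0)]
    print(transport_cost(pos, neg, lambda x: x, euclidean))
    print(transport_cost(pos, neg, lambda x: tuple(0.5 * v for v in x), euclidean))
```

An adversary is useful in the sense of Monge efficiency precisely when the second kind of number, the cost after transformation, falls below the budget $\delta$. For realistic sample sizes, a linear program or assignment solver would replace the enumeration.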

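Fix any $\epsilon$ and proper canonical loss $\ell$. Suppose there exist a cost $c$ and a constant $K$ such that:

1. $\mathcal{H}$ is $K$-Lipschitz with respect to $c$;

2. $\mathcal{A}$ is $\delta$-Monge efficient for cost $c$ on marginals $P, N$ for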

 δ ≤ 4ϵ\properloss∘−2\upgammaellpiK. (21)

Then is -defeated by on . The proof (in , Section 12) is given for the more general case where is not necessarily and any proper loss, not necessarily canonical. We also show in the proof that unless , cannot be a distance in the general case. We take it as a potential difficulty for the adversary which, we recall, cannot act on .

Theorem 5 is particularly interesting with respect to the current developing strategies around adversarial training that "Lipschitzify" classifiers (Cissé et al., 2017). Such strategies assume that the loss is Lipschitz (remark that we do not make such an assumption). In short, if we rename the inner part (within ) in (5), those strategies exploit the fact that (omitting key parameters for readability)

where is the adversary-free loss and is the Lipschitz constant of the loss () or classifier learned (). One might think that minimizing (22) is not a good strategy in the light of Theorem 5 because the regularization enforces a minimization of ( in Theorem 5), so we seemingly alleviate constraints on the adversary to be -Monge efficient in (21) and can end up being more easily defeated. This is however a too simplistic conclusion that does not take into account the other parameters at play, as we now explain in the context of Cissé et al. (2017). Consider the logistic loss (Cissé et al., 2017), for which:

$$\ell^\circ = K_\ell = 1, \quad \gamma^{\ell}_{\pi} = 0. \tag{23}$$

Suppose we can reduce both and (which is in fact not hard to ensure for deep architectures (Miyato et al., 2018, Section 2.1), (Cranko et al., 2018)) so that . Reorganizing, we get , so for to be -defeated, we in fact get a constraint on : , which reframes the constraint on in (21) as (see also , (57)),

 δ ≤ 4\properloss∘−\upgammaellpiKh=4, (24)

which does not depend anymore on .

The proof of Theorem 5 is followed in the Supplement by a proof of an interesting generalization in the light of those recent results (Cissé et al., 2017; Cranko et al., 2018; Miyato et al., 2018): the Monge efficiency requirement can be weakened under a form of dominance (similar to a Lipschitz condition) of the canonical link with respect to the chosen link of the loss. We now provide a simple family of Monge efficient adversaries.

Mixup adversaries. Very recently, it was experimentally demonstrated how a simple modification of a training sample yields models more likely to be robust to adversarial examples and to generalize better (Zhang et al., 2018). The process can be summarized in a simple way: perform random interpolation between two randomly chosen training examples to create a new example (repeat as necessary). Since we do not allow the adversary to tamper with the class, we define as $\lambda$-mixup (for $\lambda \in [0,1]$) the process which creates, for two observations $\boldsymbol{x}$ and $\boldsymbol{x}'$ having a different class, the following adversarial observation (same class as $\boldsymbol{x}$):

$$a(\boldsymbol{x}) \doteq \lambda \cdot \boldsymbol{x} + (1 - \lambda) \cdot \boldsymbol{x}'. \tag{25}$$

We make the assumption that $\mathcal{X}$ is metric with an associated distance that stems from this metric. We analyze a very simple case of $\lambda$-mixup, which we call $\lambda$-mixup to $\boldsymbol{x}^*$, which replaces $\boldsymbol{x}'$ by some $\boldsymbol{x}^*$ in $\mathcal{X}$ in (25). Notice that as $\lambda \to 0$, we converge to the maximally harmful adversary mentioned in the introduction. Intuition thus suggests that the set of all $\lambda$-mixups to some $\boldsymbol{x}^*$ (where we vary $\lambda$) in fact designs an arbitrarily Monge efficient adversary, where the optimal transport problem involves the associated distance of $\mathcal{X}$. This is indeed true and in fact simple to show. For any the set of all -mixups to is -Monge efficient for , where is the 1-Wasserstein distance between the class marginals. (Proof in the Supplement, Section 14.) The mixup methodology as defined in Zhang et al. (2018) can be specialized in numerous ways: for example, instead of mixing up with a single observation, we could perform all possible mixups within in a spirit closer to Zhang et al. (2018), or mixups with several distinguished observations (e.g. after clustering), etc. Many choices like these would be eligible to be at least Monge efficient, but while they can be computationally simple, they are just surrogates for Monge efficiency: directly tackling the compression of the optimal transport plan is a more direct route to Monge efficiency.
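The effect of a mixup to a fixed point on the transport cost is easy to observe in one dimension. The sketch below assumes real valued observations with the absolute difference as distance and equal-size, equally weighted samples, for which the 1-Wasserstein distance is obtained by matching sorted samples. The function names and the toy data are hypothetical. Under these assumptions, the mixup with parameter $\lambda$ multiplies the 1-Wasserstein distance between the class samples by $\lambda$.

```python
from typing import Callable, Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """1-Wasserstein distance between equal-size 1D samples (uniform weights).

    In one dimension the optimal coupling matches sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def mixup_to(x_star: float, lam: float) -> Callable[[float], float]:
    """Adversary a(x) = lam * x + (1 - lam) * x_star, as in (25)."""
    return lambda x: lam * x + (1.0 - lam) * x_star


if __name__ == "__main__":
    pos = [0.6, 0.8, 1.0]
    neg = [0.0, 0.1, 0.3]
    clean = wasserstein_1d(pos, neg)
    for lam in (1.0, 0.5, 0.25, 0.0):
        a = mixup_to(0.5, lam)
        shrunk = wasserstein_1d([a(x) for x in pos], [a(x) for x in neg])
        print(lam, shrunk, lam * clean)
```

Choosing $\lambda$ small enough therefore brings the transport cost below any positive budget, at the price of moving every observation closer to $\boldsymbol{x}^*$.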

## 6 From weak to strong Monge efficiency

In Theorem 5, we showed how Monge efficiency for adversaries can "take over" Lipschitz classifiers and defeat them for some $\epsilon$. Suppose now that the $\mathcal{A}$ we have is weak in that all its elements are Monge efficient but only for large values of $\delta$. In other words, we cannot satisfy condition (2) in Theorem 5. Is there another set of adversaries whose elements would combine the elements of $\mathcal{A}$ in a computationally savvy way, and which would achieve any desired level of Monge efficiency? Such a question parallels that of the boosting framework in supervised learning, in which one combines classifiers just different from random to achieve a combination that is arbitrarily accurate (Schapire and Freund, 2012).

We now answer our question in the affirmative, in the context of kernel machines. Let $\mathcal{H}$ denote a RKHS and $\Phi$ a feature map of the RKHS. For any $f : \mathcal{X} \to \mathcal{X}$, define the cost

$$C_\Phi(f, P, N) \doteq \inf_{\mu \in \Pi(P,N)} \int_{\mathcal{X}} \|\Phi \circ f(\boldsymbol{x}) - \Phi \circ f(\boldsymbol{x}')\|_{\mathcal{H}}\, \mathrm{d}\mu(\boldsymbol{x}, \boldsymbol{x}').$$

Function is said -contractive for , for some iff . Set is said -contractive for iff it contains at least one adversary -contractive for (and we make no assumption on the others). Define now for any , and , the 1-Wasserstein distance between class marginals in the feature map. Let denote a RKHS with feature map and be -contractive for . Then is -Monge efficient for . Furthermore, , is -Monge efficient when . (Proof in , Section 13) To amplify the difference between and , remark that the worst case of Monge efficiency is , since it is just the Monge efficiency for contracting nothing. So, as , there is barely any guarantee we can get from the -contractive while can still be arbitrarily Monge efficient for a linear in the coding size of the Wasserstein distance between class marginals.
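The "linear in the coding size" remark can be illustrated with elementary arithmetic. The sketch below rests on an assumption of ours rather than on the statement of the theorem: composing a contractive adversary with itself $J$ times multiplies the transport cost in feature space by at most $\eta^J$, with $0 < \eta < 1$. Under that assumption, the number of compositions needed to bring a cost $W$ below a budget $\delta$ grows like $\log(W/\delta)$. The function name and parameters are hypothetical, and the bound computed here is not claimed to be the one proved in the Supplement.

```python
import math


def compositions_needed(w1: float, delta: float, eta: float) -> int:
    """Smallest J such that eta**J * w1 <= delta, for 0 < eta < 1."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    if delta <= 0.0 or w1 < 0.0:
        raise ValueError("need delta > 0 and w1 >= 0")
    if w1 <= delta:
        return 0
    j = math.ceil(math.log(w1 / delta) / math.log(1.0 / eta))
    # Correct possible off-by-one errors caused by floating point rounding.
    while eta ** j * w1 > delta:
        j += 1
    while j > 0 and eta ** (j - 1) * w1 <= delta:
        j -= 1
    return j


if __name__ == "__main__":
    for eta in (0.5, 0.9, 0.99):
        print(eta, compositions_needed(1.0, 0.01, eta))
```

Even for $\eta$ close to 1, where a single application of the adversary gives barely any guarantee, the number of compositions stays logarithmic in the ratio between the initial cost and the target budget.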

## 7 Experiments

We have performed toy experiments to demonstrate our new setting. Our objective is not to investigate the competition with respect to the wealth of results that have been recently published in the field, but rather to touch upon the interest that such a novel setting might have for further experimental investigations. Compared to the state of the art, ours is a clear two-stage setting where we first compute the adversaries assuming relevant knowledge of the learner (in our case, we rely on Theorem 6 and therefore assume that the adversary knows at least the cost $c$, see below), and then we learn based on an adversarially transformed set of examples. This process has the advantage over the direct minimization of (2) that it extracts the computation of the adversarial examples from the training loop: we can generate the adversarial examples once, then store them and / or share / reuse them to robustly train various models (recall that under general Lipschitz assumptions on classifiers, such examples can fit the adversarial training of different kinds of models, see Theorem 5). This process is also reminiscent of the training process for invariant support vector machines (DeCoste and Schölkopf, 2002) and can also be viewed as a particular form of vicinal risk minimization (Chapelle et al., 2000). We have performed two experiments: a 1D experiment involving a particular Mixup adversary and a USPS experiment involving a closer proxy of the optimal transport compression that we call Monge adversary.

1D experiment, mixup adversary. Our example involves the unit interval with and . We let contain a single deterministic mapping parametrised by as . Notice that this adversary is just the -mixup to the unconditional mean, following Section 6. We further let be the space of linear functions , , which is the RKHS with linear kernel (assuming that and include the constant 1), and . The transport cost function of interest is . We discretize to simplify the computation of the OT cost. Results are summarized in Figure 2 (and ). We theoretically achieve loss as . There are several interesting observations from Figure 2: first, the mixup adversary indeed works like a Monge efficient adversary: by tuning , we can achieve any desired level of Monge efficiency. The left plot completes in this simple case observations of Tsipras et al. (2019); Zhang et al. (2018): the worst result is consistently obtained for training on clean data and testing on adversarial data, which indicates that our adversaries may be useful to get robustness using adversarial training.

USPS digits, Monge adversary. We have picked 100 examples of each of the "1" and "3" classes of the 88 pixel greyscale USPS handwritten digit dataset. The set of Monge adversaries is , in which, under the budget constraint, we optimize the Wasserstein distance between the empirical class marginals. We achieve this by combining a generic gradient-free optimiser with a linear program solver (code available upon request to CW). We learn using logistic regression. We demonstrate three strengths of adversary, namely where is distance between the (clean) class conditional means. Sample transformations as obtained by the Monge adversary are displayed in Figure 2 (more in the Supplement), and Table 2 provides log loss values for different training / test schemes, following the scheme of the 1D data. Two facts clearly emerge: (i) as the budget increases, the Monge adversary smoothly transforms digits into credible adversarial examples, and (ii) as previously observed, training against a tight budget adversary tends to increase generalization abilities (Tsipras et al., 2019; Zhang et al., 2018).

## 8 Conclusion

It has been observed over the past years that classifiers can be extremely sensitive to changes in inputs that would be imperceptible to humans. How such tightly limited, resource-constrained changes can affect and be so damaging to machine learning, and how to find a cure, has grown into a very intensive area of research. There is so far little understanding on the formal side and many experimental approaches would rely on adversarial data that, in some way, shrinks the gap between classes in a controlled way.

In this paper, we studied the intuition that such a process can indeed be beneficial for adversarial training. Our answer involves a simple, sufficient (and sometimes loss-independent) property for any given class of adversaries to be detrimental to learning. This property involves a measure of “harmfulness”, which relates to (and generalizes) integral probability metrics and the maximum mean discrepancy. We presented a sufficient condition for this sufficient property to hold for Lipschitz classifiers, which relies on framing it into optimal transport theory. This brings a general way to formalize how adversaries can indeed "shrink the gap" between classes with the objective to be detrimental to learning. As an example, we delivered a negative boosting result which shows how weakly contractive adversaries for a RKHS can be combined to build a maximally detrimental adversary. We also provided justifications that several experimental approaches to adversarial training involve proxies for adversaries like the ones we analyze. On the experimental side, we provided a simple toy assessment of the ways one can compute and then use such adversaries in a two-stage process.
Our experimental results, even when carried out on a toy domain, bring additional reasons to consider such adversaries, this time from a generalization standpoint: our results might indeed indicate that they could at least be useful to gain additional robustness in generalization.

## Acknowledgments

The authors warmly thank Kamalika Chaudhuri, Giorgio Patrini, Bob Williamson, Xinhua Zhang for numerous remarks and stimulating discussions around this material.

## References

• Amari and Nagaoka [2000] S.-I. Amari and H. Nagaoka. Methods of Information Geometry. Oxford University Press, 2000.
• Athalye et al. [2018a] A. Athalye, N. Carlini, and D.-A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In 35 ICML, 2018a.
• Athalye et al. [2018b] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok. Synthesizing robust adversarial examples. In 35 ICML, 2018b.
• Bastani et al. [2016] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A.-V. Nori, and A. Criminisi. Measuring neural net robustness with constraints. In NIPS*29, 2016.
• Bubeck et al. [2018] S. Bubeck, E. Price, and I. Razenshteyn. Adversarial examples from computational constraints. CoRR, abs/1805.10204, 2018.
• Buckman et al. [2018] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow. Thermometer encoding: one hot way to resist adversarial examples. In 6 ICLR, 2018.
• Buja et al. [2005] A. Buja, W. Stuetzle, and Y. Shen. Loss functions for binary class probability estimation and classification: structure and applications, 2005. Technical Report, University of Pennsylvania.
• Cai et al. [2018] Q.-Z. Cai, M. Du, C. Liu, and D. Song. Curriculum adversarial training. In IJCAI-ECAI’18, 2018.
• Chapelle et al. [2000] O. Chapelle, J. Weston, L. Bottou, and V. Vapnik. Vicinal risk minimization. In Advances in Neural Information Processing Systems*13, 2000.
• Cissé et al. [2017] M. Cissé, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier. Parseval networks: improving robustness to adversarial examples. In 34 ICML, 2017.
• Cranko et al. [2018] Z. Cranko, S. Kornblith, Z. Shi, and R. Nock. Lipschitz networks and distributional robustness. CoRR, abs/1809.01129, 2018.
• DeCoste and Schölkopf [2002] D. DeCoste and B. Schölkopf. Training invariant support vector machines. Machine Learning, 46:161–190, 2002.
• Dhillon et al. [2018] G. Dhillon, K. Azizzadenesheli, Z.-C. Lipton, J. Bernstein, J. Kossaifi, A. Khanna, and A. Anandkumar. Stochastic activation pruning for robust adversarial defense. In 6 ICLR, 2018.
• Fawzi et al. [2018] A. Fawzi, H. Fawzi, and O. Fawzi. Adversarial vulnerability for any classifier. CoRR, abs/1802.08686, 2018.
• Gilmer et al. [2018] J. Gilmer, L. Metz, F. Faghri, S.-S. Schoenholz, M. Raghu, M. Wattenberg, and I. Goodfellow. Adversarial spheres. In 6 ICLR Workshops, 2018.
• Goswami et al. [2018] G. Goswami, N. Ratha, A. Agarwal, R. Singh, and M. Vatsa. Unravelling robustness of deep learning based face recognition against adversarial attacks. In AAAI’18, 2018.
• Gretton et al. [2006] A. Gretton, K.-M. Borgwardt, M.-J. Rasch, B. Schölkopf, and A.-J. Smola. A kernel method for the two-sample-problem. In NIPS*19, pages 513–520, 2006.
• Grünwald and Dawid [2004] P. Grünwald and P. Dawid. Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Ann. of Stat., 32:1367–1433, 2004.
• Guo et al. [2018] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. In 6 ICLR, 2018.
• Hein and Andriushchenko [2017] M. Hein and M. Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In NIPS*30, 2017.
• Hendrickson and Buehler [1971] A.-D. Hendrickson and R.-J. Buehler. Proper scores for probability forecasters. Annals of Mathematical Statistics, 42:1916–1921, 1971.
• Ilias et al. [2018] A. Ilias, L. Engstrom, A. Athalye, and J. Lin. Adversarial attacks under restricted threat models. In 35 ICML, 2018.
• Kearns and Mansour [1996] M. Kearns and Y. Mansour. On the boosting ability of top-down decision tree learning algorithms. In Proc. of the 28 ACM STOC, pages 459–468, 1996.
• Kurakin et al. [2017] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In 5 ICLR, 2017.
• Ma et al. [2018] X. Ma, B. Li, Y. Wang, S.-M. Erfani, S. Wijewickrema, G. Schoenebeck, D. Song, M.-E. Houle, and J. Bayley. Characterizing adversarial subspaces using local intrinsic dimensionality. In 6 ICLR, 2018.
• Madry et al. [2018] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In 6 ICLR, 2018.
• McCullagh and Nelder [1989] P. McCullagh and J. Nelder. Generalized Linear Models. Chapman Hall/CRC, 1989.
• Miyato et al. [2018] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. In ICLR’18, 2018.
• Nock and Nielsen [2008] R. Nock and F. Nielsen. On the efficient minimization of classification-calibrated surrogates. In NIPS*21, pages 1201–1208, 2008.
• Raghunathan et al. [2018] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. In 6 ICLR, 2018.
• Reid and Williamson [2010] M.-D. Reid and R.-C. Williamson. Composite binary losses. JMLR, 11:2387–2422, 2010.
• Samangouei et al. [2018] P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: protecting classifiers against adversarial attacks using generative models. In 6 ICLR, 2018.
• Schapire and Freund [2012] R.-E. Schapire and Y. Freund. Boosting, Foundations and Algorithms. MIT Press, 2012.
• Shuford et al. [1966] E. Shuford, A. Albert, and H.-E. Massengil. Admissible probability measurement procedures. Psychometrika, pages 125–145, 1966.
• Sinha et al. [2018] A. Sinha, H. Namkoong, and J. Duchi. Certifying some distributional robustness with principled adversarial training. In 6 ICLR, 2018.
• Song et al. [2018] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. PixelDefend: leveraging generative models to understand and defend against adversarial examples. In 6 ICLR, 2018.
• Sriperumbudur et al. [2009] B.-K. Sriperumbudur, K. Fukumizu, A. Gretton, B. Schölkopf, and G.-R.-G. Lanckriet. On integral probability metrics, φ-divergences and binary classification. CoRR, abs/0901.2698, 2009.
• Szegedy et al. [2013] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
• Tramèr et al. [2018] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel. Ensemble adversarial training: attacks and defenses. In 6 ICLR, 2018.
• Tsipras et al. [2019] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry. Robustness may be at odds with accuracy. In 7 ICLR, 2019.
• Uesato et al. [2018] J. Uesato, B. O’Donoghue, P. Kohli, and A. van den Oord. Adversarial risk and the dangers of evaluating against weak attacks. In 35 ICML, 2018.
• Villani [2009] C. Villani. Optimal transport, old and new. Springer, 2009.
• Wang et al. [2018] Y. Wang, S. Jha, and K. Chaudhuri. Analyzing the robustness of nearest neighbors to adversarial examples. In 35 ICML, 2018.
• Wong and Zico Kolter [2018] E. Wong and J. Zico Kolter. Provable defense against adversarial examples via the outer adversarial polytope. In 35 ICML, 2018.
• Zhang et al. [2018] H. Zhang, M. Cisse, Y.-D. Dauphin, and D. Lopez-Paz. mixup: beyond empirical risk minimization. In 6 ICLR, 2018.

## 10 Proof of Theorem 4 and Corollary 4

Our proof assumes basic knowledge about proper losses (see for example Reid and Williamson [2010]). From [Reid and Williamson, 2010, Theorem 1, Corollary 3] and Shuford et al. [1966], $\properloss$ being twice differentiable and proper, its conditional Bayes risk $\cbr$ and partial losses $\properloss_1$ and $\properloss_{-1}$ are related by:

$$
-\cbr''(c) = \frac{\properloss_{-1}'(c)}{c} = -\frac{\properloss_{1}'(c)}{1-c}, \quad \forall c \in (0,1). \tag{26}
$$

The weight function [Reid and Williamson, 2010, Theorem 1] being also $-\cbr''$, we get from the integral representation of partial losses [Reid and Williamson, 2010, eq. (5)],

$$
\properloss_1(c) = -\int_c^1 (1-u)\,\cbr''(u)\,\mathrm{d}u, \tag{27}
$$

from which we derive by integrating by parts and then using the Legendre conjugate of $-\cbr$,

$$
\begin{aligned}
\properloss_1(c) + \cbr(1) &= -\left[(1-u)\,\cbr'(u)\right]_c^1 - \int_c^1 \cbr'(u)\,\mathrm{d}u + \cbr(1) \\
&= (1-c)\,\cbr'(c) + \cbr(c) - \cbr(1) + \cbr(1) \\
&= -(-\cbr')(c) + c\cdot(-\cbr')(c) - (-\cbr)(c) \\
&= -(-\cbr')(c) + (-\cbr)^\star\left((-\cbr)'(c)\right).
\end{aligned} \tag{29}
$$
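As a sanity check of (27) and (29), the following minimal sketch instantiates them for the log loss, which is an assumption made here for illustration and not a choice made in the proof. In that case the conditional Bayes risk is the binary entropy, $(-\cbr)^\star(z) = \log(1 + e^z)$, $\cbr(1) = 0$ by continuity, and both routes should return the partial loss $-\log c$. All function names are hypothetical.

```python
import math

def cbr(c):
    """Conditional Bayes risk of the log loss (binary entropy, in nats)."""
    if c in (0.0, 1.0):
        return 0.0  # limit value, by continuity
    return -c * math.log(c) - (1.0 - c) * math.log(1.0 - c)

def neg_cbr_prime(c):
    """(-cbr)'(c); for the log loss this is the logit of c."""
    return math.log(c / (1.0 - c))

def neg_cbr_conj(z):
    """Legendre conjugate of -cbr for the log loss: log(1 + exp(z))."""
    return math.log1p(math.exp(z))

def partial_loss_pos_integral(c, n=20000):
    """Midpoint-rule estimate of -int_c^1 (1 - u) cbr''(u) du, as in (27)."""
    # Assumption: log loss, for which cbr''(u) = -1 / (u (1 - u)).
    h = (1.0 - c) / n
    total = 0.0
    for i in range(n):
        u = c + (i + 0.5) * h
        total += (1.0 - u) * (-1.0 / (u * (1.0 - u)))
    return -total * h

def partial_loss_pos_conjugate(c):
    """Last line of (29), with cbr(1) moved to the right-hand side."""
    z = neg_cbr_prime(c)
    return -z + neg_cbr_conj(z) - cbr(1.0)

for c in (0.1, 0.5, 0.9):
    print(c, partial_loss_pos_integral(c), partial_loss_pos_conjugate(c), -math.log(c))
```

The three printed values on each row should agree up to the quadrature error of the midpoint rule.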

Now, suppose that the way a real-valued prediction $v$ is fit in the loss is through a general inverse link $\psi^{-1}$. Let

$$
v_{\ell,\psi} \defeq (-\cbr') \circ \psi^{-1}(v). \tag{30}
$$

Since $(-\cbr')^{-1}(v_{\ell,\psi}) = \psi^{-1}(v)$, the proper composite loss with link $\psi$ on prediction $v$ is the same as the proper composite loss with link $-\cbr'$ on prediction $v_{\ell,\psi}$. This last loss is in fact using its canonical link and so is proper canonical [Reid and Williamson, 2010, Section 6.1], [Buja et al., 2005]. Letting in this case $c \defeq (-\cbr')^{-1}(v_{\ell,\psi})$, we get that the partial loss satisfies

$$
\properloss_1(c) = -v_{\ell,\psi} + (-\cbr)^\star(v_{\ell,\psi}) - \cbr(1). \tag{31}
$$

Notice the constant appearing on the right hand side. Notice also that if we see (10) as a Bregman divergence, , then the canonical link is the function that defines uniquely the dual affine coordinate system of the divergence [Amari and Nagaoka, 2000] (see also [Reid and Williamson, 2010, Appendix B]).

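We can repeat the derivations for the partial loss $\properloss_{-1}$, which yields [Reid and Williamson, 2010, eq. (5)]:

$$
\begin{aligned}
\properloss_{-1}(c) + \cbr(0) &= -\int_0^c u\,\cbr''(u)\,\mathrm{d}u + \cbr(0) \\
&= -\left[u\,\cbr'(u)\right]_0^c + \int_0^c \cbr'(u)\,\mathrm{d}u + \cbr(0) \\
&= -c\,\cbr'(c) + \cbr(c) - \cbr(0) + \cbr(0) \\
&= c\cdot(-\cbr')(c) - (-\cbr)(c) \\
&= (-\cbr)^\star\left((-\cbr)'(c)\right),
\end{aligned} \tag{33}
$$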

and using the canonical link, we get this time

$$
\properloss_{-1}(c) = (-\cbr)^\star(v_{\ell,\psi}) - \cbr(0). \tag{34}
$$

We get from (31) and (34) the canonical proper composite loss

$$
\properloss(y,v) = (-\cbr)^\star(v_{\ell,\psi}) - \frac{y+1}{2}\cdot v_{\ell,\psi} - \frac{1}{2}\cdot\left((1-y)\cdot\cbr(0) + (1+y)\cdot\cbr(1)\right). \tag{35}
$$
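To illustrate (35), here is a small sketch under the assumption that the loss is the log loss, for which $(-\cbr)^\star$ is the softplus function and $\cbr(0) = \cbr(1) = 0$. Under that assumption, (35) should reduce to the familiar logistic loss $\log(1 + \exp(-y \cdot v_{\ell,\psi}))$. The function names are hypothetical, and `z` stands for $v_{\ell,\psi}$.

```python
import math

def canonical_loss(y, z, neg_cbr_conj, cbr0, cbr1):
    """Right-hand side of (35), with z standing for v_{l,psi}."""
    return (neg_cbr_conj(z)
            - 0.5 * (y + 1) * z
            - 0.5 * ((1 - y) * cbr0 + (1 + y) * cbr1))

def softplus(z):
    """(-cbr)^* for the log loss, log(1 + exp(z)), computed stably."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def logistic_loss(y, z):
    """Logistic loss log(1 + exp(-y z)), used here as the reference value."""
    return softplus(-y * z)

for y in (-1, 1):
    for z in (-3.0, 0.0, 2.5):
        # Assumption: log loss, so cbr(0) = cbr(1) = 0.
        print(y, z, canonical_loss(y, z, softplus, 0.0, 0.0), logistic_loss(y, z))
```

For another twice differentiable proper loss, one would swap in the corresponding conjugate and the values of the conditional Bayes risk at 0 and 1.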

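Note that for the optimisation of $\properloss(y,v)$ for $v$, we could discount the right-hand side parenthesis, which acts just like a constant with respect to $v$. Using Fenchel-Young inequality yields the non-negativity of $\properloss(y,v)$ as it brings $(-\cbr)^\star(v_{\ell,\psi}) - \frac{y+1}{2}\cdot v_{\ell,\psi} \geq \cbr\left(\frac{1+y}{2}\right)$ and so

$$
\begin{aligned}
\properloss(y,v) &\geq \cbr\left(\frac{1+y}{2}\right) - \frac{1}{2}\cdot\left((1-y)\cdot\cbr(0) + (1+y)\cdot\cbr(1)\right) \\
&= \cbr\left(\frac{1}{2}\cdot(1-y)\cdot 0 + \frac{1}{2}\cdot(1+y)\cdot 1\right) - \frac{1}{2}\cdot\left((1-y)\cdot\cbr(0) + (1+y)\cdot\cbr(1)\right) \\
&\geq 0, \quad \forall y \in \{-1,1\}, \forall v \in \mathbb{R},
\end{aligned} \tag{36}
$$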

from Jensen’s inequality (the conditional Bayes risk is always concave [Reid and Williamson, 2010]). Now, if we consider the alternative use of Fenchel-Young inequality,

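$$
(-\cbr)^\star(v_{\ell,\psi}) - \frac{1}{2}\cdot v_{\ell,\psi} \geq \cbr\left(\frac{1}{2}\right), \tag{37}
$$

then if we let

$$
\Delta(y) \defeq \cbr\left(\frac{1}{2}\right) - \frac{1}{2}\cdot\left((1-y)\cdot\cbr(0) + (1+y)\cdot\cbr(1)\right), \tag{38}
$$

then we get

$$
\properloss(y,v) \geq \Delta(y) - \frac{y}{2}\cdot v_{\ell,\psi}, \quad \forall y \in \{-1,1\}, \forall v \in \mathbb{R}. \tag{39}
$$

It follows from (36) and (39),

$$
\properloss(y,v) \geq \max\left\{0, \Delta(y) - \frac{y}{2}\cdot v_{\ell,\psi}\right\}, \quad \forall y \in \{-1,1\}, \forall v \in \mathbb{R}, \tag{40}
$$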

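and we get, $\forall h \in H, \forall a \in A$,

$$
\begin{aligned}
\E_{(\X,\Y)\sim D}\left[\properloss(\Y, h\circ a(\X))\right]
&\geq \E_{(\X,\Y)\sim D}\left[\max\left\{0, \Delta(\Y) - \frac{\Y}{2}\cdot(h\circ a)_{\ell,\psi}(\X)\right\}\right] \\
&\geq \max\left\{0, \E_{(\X,\Y)\sim D}\left[\Delta(\Y) - \frac{\Y}{2}\cdot(h\circ a(\X))_{\ell,\psi}\right]\right\} \\
&= \max\left\{0, \cbr\left(\frac{1}{2}\right) - \frac{1}{2}\cdot\E_{(\X,\Y)\sim D}\left[\Y\cdot(h\circ a(\X))_{\ell,\psi} + (1-\Y)\cdot\cbr(0) + (1+\Y)\cdot\cbr(1)\right]\right\} \\
&= \max\left\{0, \cbr\left(\frac{1}{2}\right) - \frac{1}{2}\cdot\left(\E_{\X\sim P}\left[\pi\cdot\left((h\circ a(\X))_{\ell,\psi} + 2\cbr(1)\right)\right] - \E_{\X\sim N}\left[(1-\pi)\cdot\left((h\circ a(\X))_{\ell,\psi} - 2\cbr(0)\right)\right]\right)\right\} \\
&= \max\left\{0, \cbr\left(\frac{1}{2}\right) - \frac{1}{2}\cdot\left(\phi(P, (h\circ a)_{\ell,\psi}, \pi, 2\cbr(1)) - \phi(N, (h\circ a)_{\ell,\psi}, 1-\pi, -2\cbr(0))\right)\right\},
\end{aligned} \tag{41}
$$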

with

$$
\phi(Q, f, b, c) \defeq \int_{X} b\cdot(f(\vex) + c)\,\mathrm{d}Q(\vex), \tag{42}
$$

and we recall

$$
(h\circ a)_{\ell,\psi} = (-\cbr')\circ\psi^{-1}\circ h\circ a. \tag{43}
$$
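The pointwise bound (40) and the resulting bound (41) can be checked numerically. The sketch below is only an illustration under several assumptions that are not part of the proof: the loss is the log loss (so $\cbr(1/2) = \log 2$ and $\cbr(0) = \cbr(1) = 0$), the link is canonical, the adversary $a$ is the identity, the classifier is a one-dimensional linear map, and $P$ and $N$ are replaced by empirical samples from two Gaussians. All names are hypothetical, and `g` stands for $(h\circ a)_{\ell,\psi}$.

```python
import math
import random

# Assumption: log loss, for which these are the values of cbr at 0, 1 and 1/2.
CBR0, CBR1, CBR_HALF = 0.0, 0.0, math.log(2.0)

def logistic_loss(y, z):
    """Canonical proper composite log loss, log(1 + exp(-y z))."""
    t = -y * z
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def delta(y):
    """Delta(y) from (38)."""
    return CBR_HALF - 0.5 * ((1 - y) * CBR0 + (1 + y) * CBR1)

def phi(sample, f, b, c):
    """Empirical version of (42): average of b * (f(x) + c) over the sample."""
    return sum(b * (f(x) + c) for x in sample) / len(sample)

def lower_bound(pos, neg, g, pi):
    """Last line of (41), with g standing for (h o a)_{l,psi}."""
    gap = phi(pos, g, pi, 2 * CBR1) - phi(neg, g, 1 - pi, -2 * CBR0)
    return max(0.0, CBR_HALF - 0.5 * gap)

def expected_loss(pos, neg, g, pi):
    """Left-hand side of (41) on the empirical samples."""
    mean_pos = sum(logistic_loss(+1, g(x)) for x in pos) / len(pos)
    mean_neg = sum(logistic_loss(-1, g(x)) for x in neg) / len(neg)
    return pi * mean_pos + (1 - pi) * mean_neg

rng = random.Random(0)
pos = [rng.gauss(+1.0, 1.0) for _ in range(500)]  # toy stand-in for P
neg = [rng.gauss(-1.0, 1.0) for _ in range(500)]  # toy stand-in for N
pi = 0.5
for scale in (0.0, 0.5, 2.0):
    g = lambda x, s=scale: s * x  # hypothetical linear classifier
    print(scale, expected_loss(pos, neg, g, pi), lower_bound(pos, neg, g, pi))
```

In each printed row the first value (the expected loss) should be at least the second (the bound); for the null classifier (`scale = 0.0`) the two coincide at $\log 2$ up to rounding.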

Hence,

 minh∈H\E(\X,\Y)∼D[maxa∈A\properloss(\Y,h∘a(\X))] ≥ minh∈Hmaxa∈A\E(\X,\Y)∼D[\properloss(\Y,h∘a(\X))] ≥ minh∈Hmaxa∈Amax{0,\cbr(12)−12⋅(ϕ(P,(h∘a)ℓ,ψ,π,2\cbr(1))−ϕ(N,(h∘a)ℓ,ψ,1−π,−2\cbr(0)))} ≥ maxa∈Aminh∈Hmax{0,\cbr(12)−12⋅(ϕ(P,(h∘a)ℓ,ψ,π,2\cbr(1))−ϕ(N,(h∘a)ℓ,ψ,1−π,−2\cbr(0)))} =maxa∈Amax{0,minh∈H(\cbr(12)−12⋅(ϕ(P,(h∘a)ℓ,ψ,π,2\cbr(1))−ϕ(N,(h∘a)ℓ,ψ,1−π,−2\cbr(0))))} = maxa∈Amax{0,\cbr(12)−12⋅maxh∈