# Theoretical Analysis of Adversarial Learning: A Minimax Approach

We propose a general theoretical method for analyzing the risk bound in the presence of adversaries. In particular, we fit the adversarial learning problem into the minimax framework. We first show that the original adversarial learning problem can be reduced to a minimax statistical learning problem by introducing a transport map between distributions. We then prove a risk bound for this minimax problem in terms of covering numbers. In contrast to previous minimax bounds [lee, far], our bound is informative when the radius of the ambiguity set is small. Our method can be applied to multi-class classification problems and commonly used loss functions such as the hinge loss and ramp loss. As two illustrative examples, we derive the adversarial risk bounds for kernel SVMs and deep neural networks. Our results indicate that a stronger adversary might have a negative impact on the complexity of the hypothesis class and that the existence of a margin could serve as a defense mechanism to counter adversarial attacks.


## 1 Introduction

Machine learning models, especially deep neural networks, have achieved impressive performance across a variety of domains including image classification, natural language processing, and speech recognition. However, these techniques can easily be fooled by adversarial examples, i.e., carefully perturbed input samples aimed to cause misclassification during the test phase. This phenomenon was first studied in spam filtering [15, 32, 33] and has attracted considerable attention since 2014, when Szegedy et al. [43] noticed that small perturbations in images can cause misclassification in neural network classifiers. Since then, there has been considerable focus on developing adversarial attacks against machine learning algorithms [22, 9, 8, 4, 45], and, in response, many defense mechanisms have also been proposed to counter these attacks [23, 21, 16, 42, 34]. These works focus on creating optimization-based robust algorithms, but their generalization performance under adversarial input perturbations is still not fully understood.

Schmidt et al. [39] recently discussed the generalization problem in the adversarial setting and showed that the sample complexity of learning a specific distribution in the presence of $\ell_\infty$-bounded adversaries increases by an order of $\sqrt{d}$ for all classifiers. The same paper recognized that deriving a distribution-agnostic generalization bound remained an open problem [39]. In a subsequent study, Cullina et al. [14] extended the standard PAC-learning framework to the adversarial setting by defining a corrupted hypothesis class and showed that the VC dimension of this corrupted hypothesis class for halfspace classifiers does not increase in the presence of an adversary. While their work provided a theoretical understanding of the problem of learning with adversaries, it had two limitations. First, their results could only be applied to binary problems, whereas in practice we usually need to handle multi-class problems. Second, the 0-1 loss function used in their work is not convex and thus very hard to optimize.

In this paper, we propose a general theoretical method for analyzing generalization performance in the presence of adversaries. In particular, we fit the adversarial learning problem into the minimax framework [29]. In contrast to traditional statistical learning, where the underlying data distribution is unknown but fixed, the minimax framework accounts for uncertainty about the distribution by introducing an ambiguity set and then aims to minimize the risk with respect to the worst-case distribution in this set. Motivated by Lee and Raginsky [29], we first note that the adversarial expected risk over a distribution $P$ is equivalent to the standard expected risk under a new distribution $P'$. Since this new distribution is not fixed and depends on the hypothesis, we instead consider the worst case. In this way, the original adversarial learning problem is reduced to a minimax problem, and we use the minimax approach to derive the risk bound for the adversarial expected risk. Our contributions can be summarized as follows.

• We propose a general method for analyzing the risk bound in the presence of adversaries. Our method is general in several respects. First, the adversary we consider is general and encompasses all $\ell_q$-bounded adversaries for $q \ge 1$. Second, our method can be applied to multi-class problems and other commonly used loss functions such as the hinge loss and ramp loss, whereas Cullina et al. [14] only considered the binary classification problem and the 0-1 loss.

• We prove a new bound for the local worst-case risk under a weak version of the Lipschitz condition. Our bound is always better than that of Lee and Raginsky [30], and can recover the standard non-adversarial risk bound by setting the radius $\epsilon_{\mathcal{B}}$ of the adversary to 0, whereas Lee and Raginsky [30] give an $\epsilon_{\mathcal{B}}$-free bound.

• We derive the adversarial risk bounds for SVM, deep neural networks, and PCA. Our bounds have two data-dependent terms, suggesting that minimizing the sum of the two terms can help achieve adversarial robustness.

The remainder of this paper is structured as follows. In Section 2, we discuss related works. Section 3 formally defines the problem, and we present our theoretical method in Section 4. The adversarial risk bounds for SVM, neural networks, and PCA are described in Section 5, and we conclude and discuss future directions in Section 6.

## 2 Related Work

Our work leverages some of the benefits of statistical machine learning, summarized as follows.

### 2.1 Generalization in Supervised Learning

Generalization is a central problem in supervised learning, and the generalization capability of learning algorithms has been extensively studied. Here we review the salient aspects of generalization in supervised learning relevant to this work.

Two main approaches are used to analyze the generalization bound of a learning algorithm. The first is based on the complexity of the hypothesis class, such as the VC dimension [46, 47] for binary classification, Rademacher and Gaussian complexities [7, 5], and the covering number [55, 54, 6]. Note that hypothesis complexity-based analyses of generalization error are algorithm independent and consider the worst-case generalization over all functions in the hypothesis class. In contrast, the second approach is based on the properties of a learning algorithm and is therefore algorithm dependent. The properties characterizing the generalization of a learning algorithm include, for example, algorithmic stability [11, 40, 31], robustness [51], and algorithmic luckiness [25]. Some other methods exist for analyzing the generalization error in machine learning such as the PAC-Bayesian approach [36, 2], compression-based bounds [28, 3], and information-theoretic approaches [50, 1, 38, 53].

### 2.2 Minimax Statistical Learning

In contrast to standard empirical risk minimization in supervised learning, where test data follow the same distribution as training data, minimax statistical learning arises in problems of distributionally robust learning [17, 19, 29, 30, 41] and minimizes the worst-case risk over a family of probability distributions. Thus, it can be applied to the learning setting in which the test data distribution differs from that of the training data, such as in domain adaptation and transfer learning [12]. In particular, Gao and Kleywegt [19] proposed a dual representation of worst-case risk over the ambiguity set of probability distributions, which was given by balls in Wasserstein space. Then, Lee and Raginsky [29] derived the risk bound for minimax learning by exploiting the dual representation of worst-case risk proposed by Gao and Kleywegt [19]. However, the minimax risk bound proposed in Lee and Raginsky [29] goes to infinity and thus becomes vacuous as $\epsilon_{\mathcal{B}} \to 0$. During the preparation of the initial draft of this paper, Lee and Raginsky [30] presented a new bound by imposing a Lipschitz assumption to avoid this problem. However, their new bound is $\epsilon_{\mathcal{B}}$-free and cannot recover the usual risk bound by setting $\epsilon_{\mathcal{B}} = 0$. Sinha et al. [41] also provided a similar upper bound on the worst-case population loss over distributions defined by a certain distributional Wasserstein distance; their bound is efficiently computable by a principled adversarial training procedure and hence certifies a level of robustness. However, their training procedure requires the penalty parameter to be large enough and thus can only achieve a small amount of robustness. Here we improve on the results in Lee and Raginsky [29, 30] and present a new risk bound for the minimax problem.

The existence of adversaries during the test phase of a learning algorithm makes learning systems untrustworthy. There is extensive literature on the analysis of adversarial robustness [48, 18, 24, 20] and the design of provable defenses against adversarial attacks [49, 37, 34, 41], in contrast to the relatively limited literature on risk bound analysis of adversarial learning. A comprehensive review of works on adversarial machine learning can be found in Biggio and Roli [10]. Concurrently to our work, Khim and Loh [26] and Yin et al. [52] provided different approaches for deriving adversarial risk bounds. Khim and Loh [26] derived adversarial risk bounds for linear classifiers and neural networks using a method called function transformation. However, their approach can only be applied to binary classification. Yin et al. [52] gave adversarial risk bounds similar to those of Khim and Loh [26] through the lens of Rademacher complexity. Although they provided risk bounds in the multi-class setting, their work focused on $\ell_\infty$ adversarial attacks and was limited to one-hidden-layer ReLU neural networks. After the initial preprint of this paper, Khim and Loh [27] extended their method to the multi-class setting at the expense of incurring an extra factor of the number of classes in their bound. In contrast, our multi-class bound does not have explicit dependence on this number. We hope that our method can provide new insight into the analysis of adversarial risk bounds.

## 3 Problem Setup

We consider a standard statistical learning framework. Let $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$ be a measurable instance space where $\mathcal{X}$ and $\mathcal{Y}$ represent feature and label spaces, respectively. We assume that examples are independently and identically distributed according to some fixed but unknown distribution $P$. The learning problem is then formulated as follows. The learner considers a class $\mathcal{H}$ of hypotheses $h$ and a loss function $l$. The learner receives $n$ training examples $z_i = (x_i, y_i)$, $i = 1, \dots, n$, drawn i.i.d. from $P$ and tries to select a hypothesis $h \in \mathcal{H}$ that has a small expected risk. However, in the presence of adversaries, there will be imperceptible perturbations to the input of examples, which are called adversarial examples. We assume that the adversarial examples are generated by adversarially choosing an example from a neighborhood $N(x)$. We require $N(x)$ to be nonempty so that some choice of example is always available. Throughout this paper, we assume that $N(x) = \{x' : x' - x \in \mathcal{B}\}$, where $\mathcal{B}$ is a nonempty, closed, convex, origin-symmetric set. Note that the definition of $\mathcal{B}$ is very general and encompasses all $\ell_q$-bounded adversaries when $q \ge 1$. We next give the formal definitions of adversarial expected and empirical risk to measure the learner's performance in the presence of adversaries.

###### Definition 1.

(Adversarial Expected Risk). The adversarial expected risk of a hypothesis $h \in \mathcal{H}$ over the distribution $P$ in the presence of an adversary constrained by $\mathcal{B}$ is

$$R_P(h,\mathcal{B}) = \mathbb{E}_{(x,y)\sim P}\Big[\max_{x'\in N(x)} l(h(x'),y)\Big].$$

If $\mathcal{B}$ is the zero-dimensional space $\{0\}$, then the adversarial expected risk reduces to the standard expected risk without an adversary. Since the true distribution is usually unknown, we instead use the empirical distribution $P_n$ to approximate the true distribution, which is equal to $(x_i, y_i)$ with probability $1/n$ for each $i \in \{1, \dots, n\}$. That gives us the following definition of adversarial empirical risk.

###### Definition 2.

(Adversarial Empirical Risk). The adversarial empirical risk of $h$ in the presence of an adversary constrained by $\mathcal{B}$ is

$$R_{P_n}(h,\mathcal{B}) = \mathbb{E}_{(x,y)\sim P_n}\Big[\max_{x'\in N(x)} l(h(x'),y)\Big],$$

where $P_n$ represents the empirical distribution.
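As a concrete illustration of Definition 2, the following minimal sketch evaluates the adversarial empirical risk of a linear score $h(x) = w \cdot x$ under the hinge loss when the adversary is an $\ell_\infty$ ball of radius `eps`. The choice of model, loss, data, and all function names are assumptions made for illustration and do not come from the paper. Because the hinge loss is convex in $x'$, the inner maximum over the box is attained at a corner, which the brute-force version enumerates; the closed form $\max(0, 1 - y\, w \cdot x + \epsilon \|w\|_1)$ is a standard identity for this special case.

```python
import itertools


def hinge(score, y):
    """Hinge loss for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


def adv_empirical_risk_bruteforce(w, data, eps):
    """Average worst-case hinge loss over l_inf balls of radius eps.

    The loss is convex in x', so the maximum over the box is attained at
    one of its 2^d corners; enumerating them is only viable for small d.
    """
    total = 0.0
    for x, y in data:
        worst = 0.0
        for signs in itertools.product((-1.0, 1.0), repeat=len(x)):
            x_adv = [xi + eps * s for xi, s in zip(x, signs)]
            score = sum(wi * xi for wi, xi in zip(w, x_adv))
            worst = max(worst, hinge(score, y))
        total += worst
    return total / len(data)


def adv_empirical_risk_closed_form(w, data, eps):
    """Same quantity via max(0, 1 - y * w.x + eps * ||w||_1)."""
    l1_norm = sum(abs(wi) for wi in w)
    total = 0.0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x))
        total += max(0.0, 1.0 - y * score + eps * l1_norm)
    return total / len(data)


# Hypothetical toy data: three labelled points in R^2.
w = [1.0, -2.0]
data = [([0.5, 0.1], 1), ([-0.3, 0.4], -1), ([0.2, 0.2], 1)]

for eps in (0.0, 0.1, 0.3):
    print(eps,
          adv_empirical_risk_bruteforce(w, data, eps),
          adv_empirical_risk_closed_form(w, data, eps))
```

With `eps = 0.0` the neighborhood collapses to the point itself and both functions return the ordinary empirical hinge risk, mirroring the remark that $\mathcal{B} = \{0\}$ recovers the standard risk.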

In the next section, we derive the adversarial risk bounds.

## 4 Main Results

In this section, we present our main results. The trick is to push forward the original distribution $P$ into a new distribution $P'$ using a transport map $T_h$ satisfying

$$R_P(h,\mathcal{B}) = R_{P'}(h),$$

where $R_{P'}(h) = \mathbb{E}_{(x,y)\sim P'}[l(h(x),y)]$ is the standard expected risk without the adversary. Therefore, an upper bound on the expected risk over the new distribution leads to an upper bound on the adversarial expected risk.

Note that the new distribution $P'$ is not fixed and depends on the hypothesis $h$. As a result, traditional statistical learning cannot be directly applied. However, these new distributions lie within a Wasserstein ball centered on $P$. If we consider the worst case within this Wasserstein ball, then the original adversarial learning problem can be reduced to a minimax problem. We can therefore use the minimax approach to derive the adversarial risk bound. We first introduce the Wasserstein distance and the minimax framework.

### 4.1 Wasserstein Distance and Local Worst-case Risk

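Let $(\mathcal{Z}, d_{\mathcal{Z}})$ be a metric space where $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$ and $d_{\mathcal{Z}}$ is defined as

$$d_{\mathcal{Z}}^p(z,z') = d_{\mathcal{Z}}^p\big((x,y),(x',y')\big) = d_{\mathcal{X}}^p(x,x') + d_{\mathcal{Y}}^p(y,y')$$

with $d_{\mathcal{X}}$ and $d_{\mathcal{Y}}$ representing the metrics in the feature space and label space, respectively. For example, for a discrete label space, $d_{\mathcal{Y}}(y,y')$ can be the indicator $\mathbf{1}(y \neq y')$. In this paper, we require that $d_{\mathcal{X}}$ is translation invariant, i.e., $d_{\mathcal{X}}(x,x') = d_{\mathcal{X}}(x - x', 0)$. With this metric, we denote by $\mathcal{P}(\mathcal{Z})$ the space of all Borel probability measures on $\mathcal{Z}$, and by $\mathcal{P}_p(\mathcal{Z})$ the space of all $P \in \mathcal{P}(\mathcal{Z})$ with finite $p$th moments for $p \ge 1$:

$$\mathcal{P}_p(\mathcal{Z}) := \big\{P \in \mathcal{P}(\mathcal{Z}) : \mathbb{E}_P[d_{\mathcal{Z}}^p(z,z_0)] < \infty \text{ for } z_0 \in \mathcal{Z}\big\}.$$

Then, the $p$th Wasserstein distance between two probability measures $P, Q \in \mathcal{P}_p(\mathcal{Z})$ is defined as

$$W_p(P,Q) := \inf_{M \in \Gamma(P,Q)} \big(\mathbb{E}_M[d_{\mathcal{Z}}^p(z,z')]\big)^{1/p},$$

where $\Gamma(P,Q)$ denotes the collection of all measures on $\mathcal{Z} \times \mathcal{Z}$ with marginals $P$ and $Q$ on the first and second factors, respectively.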
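To make the definition of $W_p$ tangible, here is a small sketch for the simplest possible case: two empirical distributions on the real line with the same number of equally weighted atoms and $d(z,z') = |z - z'|$. In that special case the infimum over couplings is attained by matching sorted samples, so no optimization is needed. The function name and the restriction to one dimension are assumptions for illustration; general Wasserstein distances require solving an optimal transport problem.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two equal-size empirical measures on R.

    For uniform weights, equal sample sizes, and cost |a - b|^p with p >= 1,
    the optimal coupling pairs the i-th smallest atom of xs with the i-th
    smallest atom of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(xs)
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


# Shifting every atom by 1 moves the distribution by exactly 1 in W_1.
print(wasserstein_1d([0.0, 1.0], [1.0, 2.0]))
# The order in which the atoms are listed does not matter.
print(wasserstein_1d([0.0, 1.0], [2.0, 1.0]))
```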

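Now we define the local worst-case risk of $h$ at $P$,

$$R_{\epsilon,p}(P,h) := \sup_{Q \in B^W_{\epsilon,p}(P)} R_Q(h),$$

where $B^W_{\epsilon,p}(P) := \{Q \in \mathcal{P}_p(\mathcal{Z}) : W_p(P,Q) \le \epsilon\}$ is the $p$-Wasserstein ball of radius $\epsilon \ge 0$ centered at $P$.

With these definitions, we next show that the adversarial expected risk can be related to the local worst-case risk by a transport map $T_h$.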

### 4.2 Transport Map

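Define a mapping $T_h : \mathcal{Z} \to \mathcal{Z}$,

$$z = (x,y) \mapsto (x^*, y),$$

where $x^* = \arg\max_{x' \in N(x)} l(h(x'),y)$. By the definition of $d_{\mathcal{Z}}$, it is easy to obtain $d_{\mathcal{Z}}((x,y),(x^*,y)) = d_{\mathcal{X}}(x,x^*)$. We now prove that the adversarial expected risk can be related to the standard expected risk via the mapping $T_h$.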

###### Lemma 1.

Let $P' = T_h \# P$, the pushforward of $P$ by $T_h$. Then we have

$$R_P(h,\mathcal{B}) = R_{P'}(h).$$

###### Proof.

By the definition, we have

$$R_P(h,\mathcal{B}) = \mathbb{E}_{(x,y)\sim P}\Big[\max_{x'\in N(x)} l(h(x'),y)\Big] = \mathbb{E}_{(x,y)\sim P}\big[l(h(x^*),y)\big] = \mathbb{E}_{(x,y)\sim P'}\big[l(h(x),y)\big].$$

So $R_P(h,\mathcal{B}) = R_{P'}(h)$. ∎

By this lemma, the adversarial expected risk over a distribution $P$ is equivalent to the standard expected risk over a new distribution $P'$. However, since the new distribution is not fixed and depends on the hypothesis $h$, traditional statistical learning cannot be directly applied. Fortunately, the following lemma proves that all these new distributions lie within a Wasserstein ball centered at $P$.

###### Lemma 2.

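Define the radius of the adversary $\mathcal{B}$ as $\epsilon_{\mathcal{B}} := \sup_{x \in \mathcal{B}} d_{\mathcal{X}}(x, 0)$. For any hypothesis $h$ and the corresponding $P' = T_h \# P$, we have

$$W_p(P,P') \le \epsilon_{\mathcal{B}}.$$

###### Proof.

By the definition of the Wasserstein distance,

$$W_p^p(P,P') \le \mathbb{E}_P\big[d_{\mathcal{Z}}^p(Z, T_h(Z))\big] = \mathbb{E}_P\big[d_{\mathcal{X}}^p(x,x^*)\big] \le \epsilon_{\mathcal{B}}^p,$$

where the first inequality holds because $(Z, T_h(Z))$ is one particular coupling of $P$ and $P'$, and the last inequality uses the translation invariance of $d_{\mathcal{X}}$ together with $x^* - x \in \mathcal{B}$. Therefore, we have $W_p(P,P') \le \epsilon_{\mathcal{B}}$. ∎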
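The two lemmas can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's construction: features are one-dimensional, the neighborhood $N(x) = [x - \epsilon, x + \epsilon]$ is discretized on a finite grid, the hypothesis is a hypothetical affine score, and the loss is the hinge loss. It builds the transported sample, confirms that the adversarial empirical risk equals the standard empirical risk on the transported sample (Lemma 1 applied to $P_n$), and confirms that the coupling cost, which upper bounds $W_1(P_n, P_n')$, does not exceed $\epsilon$ (Lemma 2).

```python
def hinge(score, y):
    """Hinge loss for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


def neighborhood(x, eps, grid=101):
    """Finite grid over N(x) = [x - eps, x + eps] (a discretization)."""
    return [x + eps * (2.0 * k / (grid - 1) - 1.0) for k in range(grid)]


def transport(h, x, y, eps):
    """T_h: (x, y) -> (x*, y), with x* a loss maximizer over the grid."""
    x_star = max(neighborhood(x, eps), key=lambda xp: hinge(h(xp), y))
    return x_star, y


def h(x):
    """Hypothetical affine score used only for this illustration."""
    return 2.0 * x - 0.5


sample = [(0.1, -1), (0.4, 1), (0.9, 1), (-0.2, -1)]
eps = 0.15

# Adversarial empirical risk: mean of the worst-case loss in each neighborhood.
adv_risk = sum(
    max(hinge(h(xp), y) for xp in neighborhood(x, eps)) for x, y in sample
) / len(sample)

# Standard empirical risk on the transported (pushed-forward) sample.
pushed = [transport(h, x, y, eps) for x, y in sample]
std_risk_pushed = sum(hinge(h(xs), y) for xs, y in pushed) / len(pushed)

# Cost of the coupling (z, T_h(z)); it upper bounds W_1(P_n, P_n').
coupling_cost = sum(
    abs(x - xs) for (x, _), (xs, _) in zip(sample, pushed)
) / len(sample)

print(adv_risk, std_risk_pushed, coupling_cost)
```

Note that the transported points depend on `h`: changing the hypothesis changes the pushed-forward sample, which is exactly why the analysis moves to the worst case over the whole Wasserstein ball.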

From this lemma, we can see that all possible new distributions $P'$ lie within a Wasserstein ball of radius $\epsilon_{\mathcal{B}}$ centered on $P$. So, by upper bounding the worst-case risk in the ball, we can bound the adversarial expected risk: combining Lemmas 1 and 2 gives $R_P(h,\mathcal{B}) = R_{P'}(h) \le R_{\epsilon_{\mathcal{B}},p}(P,h)$. Note that this inequality holds for any $p \ge 1$. So, in the rest of the paper, we only discuss the case $p = 1$; that is,

$$R_P(h,\mathcal{B}) \le R_{\epsilon_{\mathcal{B}},1}(P,h), \quad \forall h \in \mathcal{H}. \tag{4.1}$$

In this subsection, we first prove a bound for the local worst-case risk. Then, the adversarial risk bounds can be derived directly from (4.1). For convenience, we denote by $\mathcal{F}$ the function class obtained by composing the functions in $\mathcal{H}$ with the loss function $l$, i.e., $\mathcal{F} := \{(x,y) \mapsto l(h(x),y) : h \in \mathcal{H}\}$. The key ingredient of a bound on the local worst-case risk is the following strong duality result due to Gao and Kleywegt [19]:

###### Proposition 1.

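For any upper semicontinuous function $f : \mathcal{Z} \to \mathbb{R}$ and for any $P \in \mathcal{P}_1(\mathcal{Z})$,

$$R_{\epsilon_{\mathcal{B}},1}(P,f) = \min_{\lambda \ge 0}\big\{\lambda\epsilon_{\mathcal{B}} + \mathbb{E}_P[\varphi_{\lambda,f}(z)]\big\},$$

where $\varphi_{\lambda,f}(z) := \sup_{z' \in \mathcal{Z}}\{f(z') - \lambda d_{\mathcal{Z}}(z,z')\}$.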
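The duality can be sanity-checked on a case where the primal value is known in closed form. The sketch below makes several simplifying assumptions that are not in the paper: $\mathcal{Z} = [0,1]$ with $d(z,z') = |z - z'|$, $f(z) = Lz$, and $P$ a point mass at $z_0$ with $z_0 + \epsilon \le 1$. Since $\mathbb{E}_Q[z] - z_0 \le W_1(P,Q) \le \epsilon$, the worst-case risk is $L(z_0 + \epsilon)$, attained by moving all mass to $z_0 + \epsilon$. The dual side is evaluated by a plain grid search over $\lambda$ and $z'$, reading $\varphi_{\lambda,f}(z)$ as $\sup_{z'}\{f(z') - \lambda\,d(z,z')\}$.

```python
def phi(lam, f, z, grid):
    """phi_{lam,f}(z) = sup over z' of f(z') - lam * |z - z'|, on a grid."""
    return max(f(zp) - lam * abs(z - zp) for zp in grid)


def dual_value(f, sample, eps, grid, lambdas):
    """min over lam >= 0 of lam * eps + E_{P_n}[phi_{lam,f}(z)], by grid search."""
    return min(
        lam * eps + sum(phi(lam, f, z, grid) for z in sample) / len(sample)
        for lam in lambdas
    )


L = 2.0    # slope (and Lipschitz constant) of the toy function
z0 = 0.3   # location of the point mass
eps = 0.2  # radius of the Wasserstein ball


def f(z):
    return L * z


grid = [k / 1000 for k in range(1001)]    # discretization of Z = [0, 1]
lambdas = [k / 100 for k in range(401)]   # candidate multipliers in [0, 4]

dual = dual_value(f, [z0], eps, grid, lambdas)
primal = L * (z0 + eps)  # closed-form worst-case risk for this toy case
print(dual, primal)
```

In this example the minimizing multiplier is $\lambda = L$: for smaller $\lambda$ the inner supremum escapes to the boundary $z' = 1$, and for larger $\lambda$ the penalty $\lambda\epsilon$ grows while $\varphi_{\lambda,f}(z_0)$ stays at $f(z_0)$.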

We begin with some assumptions.

###### Assumption 1.

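The instance space $\mathcal{Z}$ is bounded: $\mathrm{diam}(\mathcal{Z}) := \sup_{z,z' \in \mathcal{Z}} d_{\mathcal{Z}}(z,z') < \infty$.

###### Assumption 2.

The functions in $\mathcal{F}$ are upper semicontinuous and uniformly bounded: $0 \le f(z) \le M$ for all $f \in \mathcal{F}$ and $z \in \mathcal{Z}$.

###### Assumption 3.

For any function $f \in \mathcal{F}$ and any $z \in \mathcal{Z}$, there exists a constant $\lambda_{f,z}$ such that $f(z') - f(z) \le \lambda_{f,z}\, d_{\mathcal{Z}}(z,z')$ for any $z' \in \mathcal{Z}$.

Note that Assumption 3 is a weak version of the Lipschitz condition since the constant $\lambda_{f,z}$ is not fixed and depends on $f$ and $z$. It is easy to see that if the function $f$ is $L$-Lipschitz with respect to the metric $d_{\mathcal{Z}}$, i.e., $|f(z) - f(z')| \le L\, d_{\mathcal{Z}}(z,z')$, Assumption 3 automatically holds with $\lambda_{f,z}$ always being $L$. Assumption 3 is natural but not convenient to use directly in our proof. For this reason, we give an equivalent formulation of Assumption 3 in the following lemma.

###### Lemma 3.

Assumption 3 holds if and only if for any function $f \in \mathcal{F}$ and any empirical distribution $P_n$, the set $\{\lambda \ge 0 : \psi_{f,P_n}(\lambda) = 0\}$ is nonempty, where $\psi_{f,P_n}(\lambda) := \mathbb{E}_{P_n}\big[\sup_{z' \in \mathcal{Z}}\{f(z') - \lambda d_{\mathcal{Z}}(z,z') - f(z)\}\big]$.

The proof of this lemma is contained in Appendix A.

We denote the smallest value in this set by $\lambda^+_{f,P_n}$. In order to prove the local worst-case risk bound, we need two technical lemmas.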

###### Lemma 4.

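Fix some $f \in \mathcal{F}$. Define $\bar\lambda$ via

$$\bar\lambda := \arg\min_{\lambda \ge 0}\big\{\lambda\epsilon_{\mathcal{B}} + \mathbb{E}_{P_n}[\varphi_{\lambda,f}(Z)]\big\}.$$

Then

$$\bar\lambda \in \begin{cases} \big[0, \tfrac{M}{\epsilon_{\mathcal{B}}}\big] & \text{if } \epsilon_{\mathcal{B}} \ge \tfrac{M}{\lambda^+_{f,P_n}}, \\[4pt] \big[\lambda^-_{f,P_n}, \lambda^+_{f,P_n}\big] & \text{if } \epsilon_{\mathcal{B}} < \tfrac{M}{\lambda^+_{f,P_n}}, \end{cases}$$

where $\lambda^-_{f,P_n} := \sup\{\lambda : \psi_{f,P_n}(\lambda) = \lambda^+_{f,P_n}\epsilon_{\mathcal{B}}\}$ if the set is nonempty, otherwise $\lambda^-_{f,P_n} := 0$.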

###### Proof.

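If $\epsilon_{\mathcal{B}} \ge M/\lambda^+_{f,P_n}$, by Proposition 1, $R_{\epsilon_{\mathcal{B}},1}(P_n,f) = \bar\lambda\epsilon_{\mathcal{B}} + \mathbb{E}_{P_n}[\varphi_{\bar\lambda,f}(z)]$, and since $\varphi_{\bar\lambda,f} \ge f \ge 0$ we have

$$\bar\lambda\epsilon_{\mathcal{B}} \le R_{\epsilon_{\mathcal{B}},1}(P_n,f).$$

Since $f(z) \le M$ for any $z \in \mathcal{Z}$, we get $R_{\epsilon_{\mathcal{B}},1}(P_n,f) \le M$. So

$$\bar\lambda \le \frac{M}{\epsilon_{\mathcal{B}}}.$$

For the other side, we first show that $\psi_{f,P_n}$ is continuous and monotonically non-increasing. The monotonicity is easy to verify from the definition. For continuity, for any $\lambda_1 \le \lambda_2$, suppose that

$$\hat z = \arg\max_{z' \in \mathcal{Z}}\{f(z') - \lambda_1 d_{\mathcal{Z}}(z,z') - f(z)\}, \qquad z^* = \arg\max_{z' \in \mathcal{Z}}\{f(z') - \lambda_2 d_{\mathcal{Z}}(z,z') - f(z)\}.$$

Then we have

$$\psi_{f,P_n}(\lambda_1) - \psi_{f,P_n}(\lambda_2) = \mathbb{E}_{P_n}\Big(\sup_{z' \in \mathcal{Z}}\{f(z') - \lambda_1 d_{\mathcal{Z}}(z,z') - f(z)\} - \sup_{z' \in \mathcal{Z}}\{f(z') - \lambda_2 d_{\mathcal{Z}}(z,z') - f(z)\}\Big) \le \mathbb{E}_{P_n}\big((\lambda_2 - \lambda_1)\, d_{\mathcal{Z}}(z,\hat z)\big) \le (\lambda_2 - \lambda_1)\,\mathrm{diam}(\mathcal{Z}).$$

So $\psi_{f,P_n}$ is $\mathrm{diam}(\mathcal{Z})$-Lipschitz and thus continuous.

Now we prove $\bar\lambda \in [\lambda^-_{f,P_n}, \lambda^+_{f,P_n}]$. If $\lambda \ge \lambda^+_{f,P_n}$, by the monotonicity and nonnegativity of $\psi_{f,P_n}$, we have $\psi_{f,P_n}(\lambda) = 0$, which implies $\lambda\epsilon_{\mathcal{B}} + \mathbb{E}_{P_n}[\varphi_{\lambda,f}(z)] = \lambda\epsilon_{\mathcal{B}} + \mathbb{E}_{P_n}[f(z)] \ge \lambda^+_{f,P_n}\epsilon_{\mathcal{B}} + \mathbb{E}_{P_n}[f(z)]$. Therefore the optimal $\bar\lambda \le \lambda^+_{f,P_n}$. To show $\bar\lambda \ge \lambda^-_{f,P_n}$, first notice that $\psi_{f,P_n}(\lambda)$ belongs to $[0, M]$ for any $\lambda \ge 0$. We define

$$\lambda^-_{f,P_n} := \sup\{\lambda : \psi_{f,P_n}(\lambda) = \lambda^+_{f,P_n} \cdot \epsilon_{\mathcal{B}}\}.$$

Note that this set might be empty if $\psi_{f,P_n}(0) < \lambda^+_{f,P_n}\epsilon_{\mathcal{B}}$. In this case, we just let $\lambda^-_{f,P_n} = 0$, and $\bar\lambda$ must belong to $[0, \lambda^+_{f,P_n}]$. Otherwise, there must exist some $\lambda \in [0, \lambda^+_{f,P_n}]$ which satisfies $\psi_{f,P_n}(\lambda) = \lambda^+_{f,P_n}\epsilon_{\mathcal{B}}$ by the intermediate value theorem for continuous functions. We choose $\lambda^-_{f,P_n}$ to be the maximal one in that set. Then, for any $\lambda \le \lambda^-_{f,P_n}$, since $\psi_{f,P_n}$ is monotonically non-increasing, we have

$$\mathbb{E}_{P_n}\Big(\sup_{z' \in \mathcal{Z}}\{f(z') - \lambda d_{\mathcal{Z}}(z,z') - f(z)\}\Big) \ge \lambda^+_{f,P_n} \cdot \epsilon_{\mathcal{B}}.$$

By rearranging the terms on both sides, we obtain

$$\lambda\epsilon_{\mathcal{B}} + \mathbb{E}_{P_n}[\varphi_{\lambda,f}(z)] \ge \lambda^+_{f,P_n} \cdot \epsilon_{\mathcal{B}} + \mathbb{E}_{P_n}(f(z))$$

for any $\lambda \le \lambda^-_{f,P_n}$. Therefore, $\bar\lambda \ge \lambda^-_{f,P_n}$, and we complete the proof. ∎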

Remark 1. We can show by using language as follows. , define . Then, for any , we have . By the definition of , . Since is monotonically non-increasing, we have . Therefore, .

###### Lemma 5.

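Define the function class $\Phi := \{\varphi_{\lambda,f} : \lambda \in [a,b],\ f \in \mathcal{F}\}$ where $b \ge a \ge 0$. Then, the expected Rademacher complexity of the function class $\Phi$ satisfies

$$\mathfrak{R}_n(\Phi) \le \frac{12\,C(\mathcal{F})}{\sqrt{n}} + \frac{6\sqrt{\pi}}{\sqrt{n}}(b - a)\cdot\mathrm{diam}(\mathcal{Z}),$$

where $C(\mathcal{F})$ is a complexity term defined through the covering numbers of $\mathcal{F}$.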

The proof of this lemma is contained in Appendix B.

The following theorem gives the generalization bound for the local worst-case risk. We first introduce the corresponding notation: denotes expression (4.2), and . It is straightforward to check that from expression (4.2).

###### Theorem 1.

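If Assumptions 1–3 hold, then for any $f \in \mathcal{F}$, we have

$$R_{\epsilon_{\mathcal{B}},1}(P,f) - R_{\epsilon_{\mathcal{B}},1}(P_n,f) \le \frac{24\,C(\mathcal{F})}{\sqrt{n}} + M\sqrt{\frac{\log(1/\delta)}{2n}} + \frac{12\sqrt{\pi}}{\sqrt{n}}\Lambda_{\epsilon_{\mathcal{B}}}\cdot\mathrm{diam}(\mathcal{Z})$$

with probability at least $1 - \delta$.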

###### Proof.

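For any $f \in \mathcal{F}$, define

$$\bar\lambda := \arg\min_{\lambda \ge 0}\big\{\lambda\epsilon_{\mathcal{B}} + \mathbb{E}_{P_n}[\varphi_{\lambda,f}(Z)]\big\}.$$

Then using Proposition 1, we can write

$$R_{\epsilon_{\mathcal{B}},1}(P,f) - R_{\epsilon_{\mathcal{B}},1}(P_n,f) = \min_{\lambda \ge 0}\Big\{\lambda\epsilon_{\mathcal{B}} + \int_{\mathcal{Z}}\varphi_{\lambda,f}(z)\,P(dz)\Big\} - \Big(\bar\lambda\epsilon_{\mathcal{B}} + \int_{\mathcal{Z}}\varphi_{\bar\lambda,f}(z)\,P_n(dz)\Big) \le \int_{\mathcal{Z}}\varphi_{\bar\lambda,f}(z)\,(P - P_n)(dz).$$

By Lemma 4, $\bar\lambda$ lies in an interval $[a,b]$ of length $\Lambda_{\epsilon_{\mathcal{B}}}$. Define the function class $\Phi := \{\varphi_{\lambda,f} : \lambda \in [a,b],\ f \in \mathcal{F}\}$. Then, we have

$$R_{\epsilon_{\mathcal{B}},1}(P,f) - R_{\epsilon_{\mathcal{B}},1}(P_n,f) \le \sup_{\varphi \in \Phi}\Big[\int_{\mathcal{Z}}\varphi(z)\,(P - P_n)(dz)\Big].$$

Since all $f \in \mathcal{F}$ take values in $[0, M]$, the same holds for all $\varphi \in \Phi$. Therefore, by a standard symmetrization argument [35],

$$R_{\epsilon_{\mathcal{B}},1}(P,f) - R_{\epsilon_{\mathcal{B}},1}(P_n,f) \le 2\,\mathfrak{R}_n(\Phi) + M\sqrt{\frac{\log(1/\delta)}{2n}}$$

with probability at least $1 - \delta$, where $\mathfrak{R}_n(\Phi)$ is the expected Rademacher complexity of $\Phi$. Using the bound of Lemma 5 with $b - a = \Lambda_{\epsilon_{\mathcal{B}}}$, we get the following result:

$$R_{\epsilon_{\mathcal{B}},1}(P,f) - R_{\epsilon_{\mathcal{B}},1}(P_n,f) \le \frac{24\,C(\mathcal{F})}{\sqrt{n}} + M\sqrt{\frac{\log(1/\delta)}{2n}} + \frac{12\sqrt{\pi}}{\sqrt{n}}\Lambda_{\epsilon_{\mathcal{B}}}\cdot\mathrm{diam}(\mathcal{Z}). \qquad ∎$$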

Remark 2. Lee and Raginsky [30] prove a bound with under the Lipschitz assumption with representing the Lipschitz constant. Our result improves a lot on theirs. First, our Assumption 3 is weaker than the Lipschitz assumption in Lee and Raginsky [30]. Second, even under our weaker assumptions, our bound is still better than that of Lee and Raginsky [30] for the case since . Third, if further assuming the same Lipschitz condition as Lee and Raginsky [30], we can get by the definition of and , which is always better than the ones in Lee and Raginsky [30]. Finally, the term in our bound will vanish as or whereas Lee and Raginsky [30] give a -free bound with always being a constant .

This leads to the following upper bound on the adversarial expected risk.

###### Corollary 1.

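With the conditions in Theorem 1, for any $f \in \mathcal{F}$, we have

$$R_P(f,\mathcal{B}) \le \frac{1}{n}\sum_{i=1}^n f(z_i) + \min_{\lambda \ge 0}\big\{\lambda\epsilon_{\mathcal{B}} + \psi_{f,P_n}(\lambda)\big\} + \frac{24\,C(\mathcal{F})}{\sqrt{n}} + \frac{12\sqrt{\pi}}{\sqrt{n}}\Lambda_{\epsilon_{\mathcal{B}}}\cdot\mathrm{diam}(\mathcal{Z}) + M\sqrt{\frac{\log(1/\delta)}{2n}} \tag{4.3}$$

and

$$R_P(f,\mathcal{B}) \le \frac{1}{n}\sum_{i=1}^n f(z_i) + \lambda^+_{f,P_n}\epsilon_{\mathcal{B}} + \frac{24\,C(\mathcal{F})}{\sqrt{n}} + \frac{12\sqrt{\pi}}{\sqrt{n}}\Lambda_{\epsilon_{\mathcal{B}}}\cdot\mathrm{diam}(\mathcal{Z}) + M\sqrt{\frac{\log(1/\delta)}{2n}} \tag{4.4}$$

with probability at least $1 - \delta$.

###### Proof.

By Proposition 1, $R_{\epsilon_{\mathcal{B}},1}(P_n,f)$ can be written as

$$R_{\epsilon_{\mathcal{B}},1}(P_n,f) = \min_{\lambda \ge 0}\big\{\lambda\epsilon_{\mathcal{B}} + \mathbb{E}_{P_n}[\varphi_{\lambda,f}(z)]\big\} = \min_{\lambda \ge 0}\big\{\lambda\epsilon_{\mathcal{B}} + \mathbb{E}_{P_n}[\varphi_{\lambda,f}(z) - f(z)]\big\} + \mathbb{E}_{P_n}[f(z)] = \min_{\lambda \ge 0}\big\{\lambda\epsilon_{\mathcal{B}} + \psi_{f,P_n}(\lambda)\big\} + \frac{1}{n}\sum_{i=1}^n f(z_i),$$

where the last equality uses the definition of $\psi_{f,P_n}$. Substituting the above equation into Theorem 1 and using inequality (4.1), we get result (4.3). To obtain (4.4), we can make use of the following inequality:

$$\min_{\lambda \ge 0}\big\{\lambda\epsilon_{\mathcal{B}} + \psi_{f,P_n}(\lambda)\big\} \le \lambda^+_{f,P_n}\epsilon_{\mathcal{B}} + \psi_{f,P_n}(\lambda^+_{f,P_n}) = \lambda^+_{f,P_n}\epsilon_{\mathcal{B}},$$

where the equality follows from the definition of $\lambda^+_{f,P_n}$. ∎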

Remark 3. We are interested in how the adversarial risk bounds differ from the case in which the adversary is absent. Plugging $\epsilon_{\mathcal{B}} = 0$ into inequality (4.4) yields the usual generalization bound of the form

$$R_P(f) \le \frac{1}{n}\sum_{i=1}^n f(z_i) + \frac{24\,C(\mathcal{F})}{\sqrt{n}} + M\sqrt{\frac{\log(1/\delta)}{2n}}.$$

So the effect of an adversary is to introduce an extra complexity term and an additional term, linear in $\epsilon_{\mathcal{B}}$, that adds to the empirical risk.

Remark 4. As mentioned in Remark 2, the extra complexity term $\frac{12\sqrt{\pi}}{\sqrt{n}}\Lambda_{\epsilon_{\mathcal{B}}}\cdot\mathrm{diam}(\mathcal{Z})$ decreases as $\epsilon_{\mathcal{B}}$ gets bigger if $\epsilon_{\mathcal{B}} \ge M/\lambda^+_{f,P_n}$, indicating that a stronger adversary might have a negative impact on the hypothesis class complexity. This is intuitive, since different hypotheses might have the same performance in the presence of a strong adversary and, therefore, the hypothesis class complexity decreases. We emphasize that this phenomenon does not occur in the concurrent works of Khim and Loh [26] and Yin et al. [52]. In both, this term increases linearly as $\epsilon_{\mathcal{B}}$ grows.

Remark 5. We should point out that $\lambda^+_{f,P_n}$ is data-dependent and might be difficult to compute exactly. Fortunately, it is easy to upper bound. For example, if $f$ is $L$-Lipschitz, then by the definition of $\lambda^+_{f,P_n}$ we have $\lambda^+_{f,P_n} \le L$. See Section 5 for more examples. In particular, whenever $\lambda^+_{f,P_n} = 0$, the additional term $\lambda^+_{f,P_n}\epsilon_{\mathcal{B}}$ in (4.4) disappears.
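The data dependence of $\lambda^+_{f,P_n}$ can be seen on a toy problem. The sketch below is an illustration under assumptions that are not in the paper: $\mathcal{Z} = [0,1]$ with $d(z,z') = |z - z'|$, $f(z) = z^2$ (which is $2$-Lipschitz on $[0,1]$), and a hypothetical three-point sample. It evaluates $\psi_{f,P_n}(\lambda) = \mathbb{E}_{P_n}[\sup_{z'}\{f(z') - \lambda d(z,z') - f(z)\}]$ on a grid and returns the smallest grid value of $\lambda$ at which it vanishes (up to a tolerance).

```python
def psi(lam, f, sample, grid):
    """psi_{f,P_n}(lam): empirical mean of sup_{z'} f(z') - lam*|z - z'| - f(z)."""
    return sum(
        max(f(zp) - lam * abs(z - zp) for zp in grid) - f(z) for z in sample
    ) / len(sample)


def lambda_plus(f, sample, grid, lambdas, tol=1e-9):
    """Smallest candidate lam (lambdas sorted ascending) with psi(lam) <= tol."""
    for lam in lambdas:
        if psi(lam, f, sample, grid) <= tol:
            return lam
    return None  # no candidate on the grid makes psi vanish


def f(z):
    return z * z


L = 2.0                                   # Lipschitz constant of f on [0, 1]
grid = [k / 1000 for k in range(1001)]    # discretization of Z = [0, 1]
lambdas = [k / 100 for k in range(301)]   # candidate values in [0, 3]
sample = [0.2, 0.5, 0.7]                  # hypothetical sample; points lie on the grid

lam_hat = lambda_plus(f, sample, grid, lambdas)
print(lam_hat, "<=", L)
```

For this choice one can check by hand that $\sup_{z'}\{z'^2 - z^2 - \lambda|z' - z|\} = 0$ exactly when $\lambda \ge 1 + z$, so $\lambda^+_{f,P_n} = 1 + \max_i z_i = 1.7$, strictly below the global Lipschitz constant $L = 2$. The gap closes only if the sample contains the point $z = 1$, where the slope of $f$ is largest.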

## 5 Example Bounds

In this section, we illustrate the application of Corollary 1 to several commonly-used models: SVMs, neural networks, and PCA.

### 5.1 Support Vector Machines

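We start with SVMs. Let $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$, where the feature space $\mathcal{X} = \{x \in \mathbb{R}^d : \|x\|_2 \le r\}$ and the label space $\mathcal{Y} = \{-1, +1\}$. Equip $\mathcal{Z}$ with the Euclidean metric

$$d_{\mathcal{Z}}(z,z') = d_{\mathcal{Z}}\big((x,y),(x',y')\big) = \|x - x'\|_2 + \mathbf{1}(y \neq y').$$

Consider the hypothesis space $\mathcal{F} = \{(x,y) \mapsto \max\{0, 1 - y f(x)\} : f \in \mathcal{H}\}$, where $\mathcal{H} = \{x \mapsto w \cdot x : \|w\|_2 \le \Lambda\}$. We can now derive the expected risk bound for SVMs in the presence of an adversary.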

###### Corollary 2.

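For the SVM setting considered above, for any $f \in \mathcal{F}$, with probability at least $1 - \delta$,

$$R_P(f,\mathcal{B}) \le \frac{1}{n}\sum_{i=1}^n f(z_i) + \lambda^+_{f,P_n}\epsilon_{\mathcal{B}} + \frac{144}{\sqrt{n}}\Lambda r\sqrt{d} + \frac{12\sqrt{\pi}}{\sqrt{n}}\Lambda_{\epsilon_{\mathcal{B}}}\cdot(2r+1) + (1 + \Lambda r)\sqrt{\frac{\log(1/\delta)}{2n}},$$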

where .

The proof of Corollary 2 can be found in Appendix C.

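Our result can easily be extended to kernel SVMs. Here, we take a Gaussian kernel as an example. Let $K$ be a Gaussian kernel on $\mathcal{X}$, let $\tau$ be a feature mapping associated with $K$, and let $\langle\cdot,\cdot\rangle_{\mathbb{H}}$ and $\|\cdot\|_{\mathbb{H}}$ denote the inner product and the induced norm of the reproducing kernel Hilbert space $\mathbb{H}$. Suppose $\mathcal{X}$ is compact and the space $\mathcal{Z}$ is equipped with the metric

$$d_{\mathcal{Z}}(z,z') = \|\tau(x) - \tau(x')\|_{\mathbb{H}} + \mathbf{1}(y \neq y')$$

for $z = (x,y)$ and $z' = (x',y')$. It is easy to show from the definition of the Gaussian kernel that $d_{\mathcal{X}}$ is translation invariant. To apply Corollary 1, we must calculate the covering numbers of $\mathcal{F}$. To this end, we embed the hypothesis class into the space $C(\mathcal{X})$ of continuous real-valued functions on $\mathcal{X}$, equipped with the sup norm $\|\cdot\|_\infty$.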

We can now derive the adversarial risk bounds for the Gaussian-kernel SVM.

###### Corollary 3.

For the Gaussian-kernel SVM described above, for any $f \in \mathcal{F}$, with probability at least $1 - \delta$,

 RP(f,B)≤1n∑ni=1f(zi)+λ+f,PnϵB+24√nΛ√dC3+30√π√nΛϵB+(1+Λ)√log(1/δ)2n,

where , , and