Adversarial Risk Bounds for Binary Classification via Function Transformation

by Justin Khim, et al.

We derive new bounds for a notion of adversarial risk, characterizing the robustness of binary classifiers. Specifically, we study the cases of linear classifiers and neural network classifiers, and introduce transformations with the property that the risk of the transformed functions upper-bounds the adversarial risk of the original functions. This reduces the problem of deriving adversarial risk bounds to the problem of deriving risk bounds using standard learning-theoretic techniques. We then derive bounds on the Rademacher complexities of the transformed function classes, obtaining error rates on the same order as the generalization error of the original function classes. Finally, we provide two algorithms for optimizing the adversarial risk bounds in the linear case, and discuss connections to regularization and distributional robustness.






1 Introduction

Deep learning systems are becoming ubiquitous in everyday life. From virtual assistants on phones to image search and translation, neural networks have vastly improved the performance of many computerized systems in a short amount of time (Goodfellow et al., 2016). However, neural networks have a variety of shortcomings: A peculiarity that has gained much attention over the past few years has been the apparent lack of robustness of neural network classifiers to adversarial perturbations. Szegedy et al. (2013) noticed that small perturbations to images could cause neural network classifiers to predict the wrong class. Further, these perturbations could be carefully chosen so as to be imperceptible to humans.

Such observations have instigated a deluge of research in finding adversarial attacks (Athalye et al., 2018; Goodfellow et al., 2014; Papernot et al., 2016; Szegedy et al., 2013), defenses against adversaries for neural networks (Madry et al., 2018; Raghunathan et al., 2018; Sinha et al., 2018; Wong and Kolter, 2018), evidence that adversarial examples are inevitable (Shafahi et al., 2018), and theory suggesting that constructing robust classifiers is computationally infeasible (Bubeck et al., 2018). Attacks are usually constructed assuming a white-box framework, in which the adversary has access to the network, and adversarial examples are generated using a perturbation roughly in the direction of the gradient of the loss function with respect to a training data point. This idea generally produces adversarial examples that can break ad-hoc defenses in image classification.

Currently, strategies for creating robust classification algorithms are much more limited. One approach (Madry et al., 2018; Suggala et al., 2018) is to formalize the problem of robustifying the network as a novel optimization problem, where the objective function is the expectation of the supremum of the loss over possible perturbations. However, Madry et al. (2018) note that the objective function is often not concave in the perturbation. Other authors (Raghunathan et al., 2018; Wong and Kolter, 2018) have leveraged convex relaxations to provide optimization-based certificates on the adversarial loss of the training data. However, how well these training-set guarantees generalize to unseen examples is still not understood.

The optimization community has long been interested in constructing robust solutions for various problems, such as portfolio management (Ben-Tal et al., 2009), and deriving theoretical guarantees. Robust optimization has been studied in the context of regression and classification (Trafalis and Gilbert, 2007; Xu et al., 2009a, b). More recently, a notion of robustness that attempts to minimize the risk with respect to the worst-case distribution close to the empirical distribution has been the subject of extensive work (Ben-Tal et al., 2013; Namkoong and Duchi, 2016, 2017). Researchers have also considered a formulation known as distributionally robust optimization, using the Wasserstein distance as a metric between distributions (Esfahani and Kuhn, 2015; Blanchet and Kang, 2017; Gao et al., 2017; Sinha et al., 2018). With the exception of Sinha et al. (2018), generalization bounds of a learning-theoretic nature are nonexistent, with most papers focusing on studying properties of a regularized reformulation of the problem. Sinha et al. (2018) provide bounds for Wasserstein distributionally robust generalization error based on covering numbers for sufficiently small perturbations. This is sufficient for ensuring a small amount of adversarial robustness and is quite general; but for classification using neural networks, known covering number bounds (Bartlett et al., 2017) are substantially weaker than Rademacher complexity bounds (Golowich et al., 2018).

Although neural networks are rightly the subject of attention due to their ubiquity and utility, the theory that has been developed to explain the phenomena arising from adversarial examples is still far from complete. For example, Goodfellow et al. (2014) argue that non-robustness may be due to the linear nature of neural networks. However, analyses of linear classifiers (Fawzi et al., 2018) argue against the linearity explanation and suggest instead that the function classes should be more expressive than linear classifiers.

In this paper, we provide upper bounds for a notion of adversarial risk in the case of linear classifiers and neural networks. These bounds may be viewed as a sample-based guarantee on the risk of a trained classifier, even in the presence of adversarial perturbations on the inputs. The key step is to transform a classifier into an “adversarially perturbed” classifier by modifying the loss function. The risk of the transformed function can then be analyzed in place of the adversarial risk of the original classifier; in particular, we can more easily provide bounds on the Rademacher complexities necessary for bounding the robust risk. Finally, our transformations suggest algorithms for minimizing the adversarially robust empirical risk. Thus, from the theory developed in this paper, we can show that adversarial perturbations have somewhat limited effects from the point of view of generalization error.

This paper is organized as follows: We introduce the precise mathematical framework in Section 2. In Section 3, we discuss our main results. In Section 4, we provide results on optimizing the adversarial risk bounds. In Section 5, we prove our key theoretical contributions. Finally, we conclude with a discussion of future avenues of research in Section 6.

Notation: For a matrix $A$, we write $\|A\|_p$ to denote the $\ell_p$-operator norm. We write $\|A\|_F$ to denote the Frobenius norm. For a vector $v$, we write $\|v\|_p$ to denote the $\ell_p$-norm.

2 Setup

We consider a standard statistical learning setup. Let be a space of covariates, and define the space of labels to be . Let . Suppose we have observations , drawn i.i.d. according to some unknown distribution . We write .

A classifier corresponds to a function , where . Thus, the function may express uncertainty in its decision; e.g., prediction in allows the classifier to select an expected outcome.

2.1 Risk and Losses

Given a loss function , our goal is to minimize the adversarially robust risk, defined by

where the adversarially chosen perturbation lies in the $\ell_p$-ball of radius $\epsilon$. For simplicity, we write , so the input is perturbed by a vector in the $\ell_p$-ball of radius $\epsilon$, but still classified according to its original label. Usually in the literature, $p$ is taken to be $1$, $2$, or $\infty$; the case $p = \infty$ has received particular interest. Also note that if $\epsilon = 0$, the adversarial risk reduces to the usual statistical risk, for which upper bounds based on the empirical risk are known as generalization error bounds. For some discussion of the relationship between the adversarial risk and the distributionally robust risk, see Appendix E.

We now define a few specific loss functions. The indicator loss

is of primary interest in classification; in both the linear classifier and neural network classification settings, we will primarily be interested in bounding the adversarial risk with respect to the indicator loss. As is standard in linear classification, we also define the hinge loss

which is a convex surrogate for the indicator loss, and will appear in some of our bounds. We also introduce the indicator of whether the hinge loss is positive, defined by

For analyzing neural networks, we will also employ the cross-entropy loss, defined by

where is the softmax function:

Note that in all of the cases above, we can also write the loss , for an appropriately defined loss . Furthermore, and are 1-Lipschitz.
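As a concrete reference for the losses named above, the following minimal sketch writes the indicator loss, the hinge loss, and the indicator of a positive hinge loss as functions of a real-valued score and a label in {+1, -1}. The function names are hypothetical, the formulas are the standard textbook forms rather than a transcription of the displayed definitions, and counting a score of exactly zero as an error is an assumption.

```python
def indicator_loss(score, label):
    """0-1 loss: 1 if the sign of the score disagrees with the label.

    Assumption: a score of exactly zero is counted as an error.
    """
    return 1.0 if label * score <= 0 else 0.0


def hinge_loss(score, label):
    """Standard hinge loss max(0, 1 - y * f(x)), a convex surrogate."""
    return max(0.0, 1.0 - label * score)


def hinge_positive_indicator(score, label):
    """Indicator of whether the hinge loss is strictly positive."""
    return 1.0 if hinge_loss(score, label) > 0 else 0.0
```

Under these conventions, both the hinge loss and the hinge-positivity indicator dominate the indicator loss pointwise, which is what allows them to appear on the right-hand side of bounds on the indicator risk.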

2.2 Function Classes and Rademacher Complexity

We are particularly interested in two function classes: linear classifiers and neural networks. We denote the first class by , and we write an element of , parametrized by and , as

We similarly denote the class of neural networks as , and we write a neural network , parametrized by and , as

where each is a matrix and each is a monotonically increasing -Lipschitz activation function applied elementwise to vectors, such that . For example, we might have the map $u \mapsto \max\{0, u\}$, which is the ReLU function. The matrix is of dimension , where and . We use to denote the th row of , with th entry . Also, when discussing indices, we write as shorthand for .

A standard measure of the complexity of a class of functions is the Rademacher complexity. The empirical Rademacher complexity of a function class and a sample is


where the $\sigma_i$’s are i.i.d. Rademacher random variables; i.e., the $\sigma_i$’s are random variables taking the values $+1$ and $-1$, each with probability $1/2$. Note that $\mathbb{E}_\sigma$ denotes the expectation with respect to the $\sigma_i$’s. Finally, we note that the standard Rademacher complexity is obtained by taking an expectation of the empirical Rademacher complexity over the data.
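To make the definition concrete, the sketch below computes the empirical Rademacher complexity exactly for a small sample by enumerating all sign vectors. It assumes the standard form of the definition, namely the expectation over the signs of the supremum over the class of (1/n) Σ σ_i f(x_i), and it assumes a hypothetical class of bias-free linear functions with ℓ2-norm at most `radius`, for which the supremum has the closed form (radius/n)·‖Σ σ_i x_i‖_2 by the Cauchy-Schwarz inequality. The function names and the sample are illustrative only.

```python
import itertools
import math


def empirical_rademacher_linear(xs, radius):
    """Exact empirical Rademacher complexity of {x -> theta . x : ||theta||_2 <= radius}.

    Enumerates all 2^n sign vectors, so it is only practical for small n.
    """
    n = len(xs)
    dim = len(xs[0])
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        # Signed sum of the sample points for this sign vector.
        signed_sum = [sum(s * x[k] for s, x in zip(signs, xs)) for k in range(dim)]
        # Supremum over the l2-ball, in closed form via Cauchy-Schwarz.
        total += radius * math.sqrt(sum(v * v for v in signed_sum)) / n
    return total / 2 ** n


def jensen_upper_bound(xs, radius):
    """Upper bound (radius / n) * sqrt(sum_i ||x_i||_2^2) from Jensen's inequality."""
    n = len(xs)
    return radius * math.sqrt(sum(v * v for x in xs for v in x)) / n


SAMPLE = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 2.0), (0.5, -0.5), (2.0, 0.0)]
```

Comparing `empirical_rademacher_linear(SAMPLE, 1.0)` with `jensen_upper_bound(SAMPLE, 1.0)` illustrates the familiar 1/√n-type behavior for norm-bounded linear classes.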

3 Main Results

We introduce our main results in this section. The trick is to push the supremum through the loss and incorporate it into the function , yielding a transformed function . We require this transformation to satisfy

so an upper bound on the transformed risk leads to an upper bound on the adversarial risk. We call the proposed functions the supremum transformation and tree transformation in the cases of linear classifiers and neural networks, respectively.

In both cases, we have to make a minor assumption about the loss. The assumption is that $\ell(f(x), y)$ is monotonically decreasing in $y f(x)$: specifically, $\ell(f(x), +1)$ is decreasing in $f(x)$ and $\ell(f(x), -1)$ is increasing in $f(x)$. This is not a stringent assumption, and is satisfied by all of the loss functions mentioned earlier.

One technicality is that the transformed function needs to be a function of both and ; i.e., we have . Thus, the loss of a transformed function is . We now define the essential transformations studied in our paper.

Definition 1.

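The supremum (sup) transform $\Psi$ is defined by

$$\Psi f(x, y) := -y \sup_{\|w\|_p \le \epsilon} \bigl(-y f(x + w)\bigr).$$

Additionally, we define $\Psi\mathcal{F}$ to be the transformed function class

$$\Psi\mathcal{F} := \{\Psi f : f \in \mathcal{F}\}.$$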

We now have the following result:

Proposition 1.

Let be a loss function that is monotonically decreasing in . Then

Remark 1.

The consequence of the supremum transformation can be seen by taking the expectation:

Thus, we can bound the adversarial risk of a function with a bound on the usual risk of via Rademacher complexities. For linear classifiers, we shall see momentarily that the supremum transformation can be calculated exactly.

3.1 The Supremum Transformation and Linear Classification

We start with an explicit formula for the supremum transform.

Proposition 2.

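Let $f(x) = \theta^\top x + b$. Then the supremum transformation takes the explicit form

$$\Psi f(x, y) = \theta^\top x + b - y\,\epsilon\,\|\theta\|_q,$$

where $q$ satisfies $1/p + 1/q = 1$.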

The proof is contained in Section 5.
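The explicit formula can be sanity-checked numerically. The sketch below assumes a linear classifier f(x) = θᵀx + b, perturbations in the ℓ∞-ball (so the dual exponent is q = 1), and a sup transform of the form −y · sup_w(−y f(x + w)); all of these are stated assumptions, and the function names are hypothetical. Because a linear function attains its maximum over an ℓ∞-ball at a corner, the brute-force version only needs to enumerate the corners.

```python
import itertools


def linear(theta, b, x):
    """Evaluate the linear classifier theta . x + b."""
    return sum(t * v for t, v in zip(theta, x)) + b


def sup_transform_closed_form(theta, b, x, y, eps):
    """Closed form f(x) - y * eps * ||theta||_1 for l-infinity perturbations."""
    dual_norm = sum(abs(t) for t in theta)  # q = 1 is dual to p = infinity
    return linear(theta, b, x) - y * eps * dual_norm


def sup_transform_brute_force(theta, b, x, y, eps):
    """Evaluate -y * max_w (-y * f(x + w)) over the corners of the l-infinity ball."""
    best = max(
        -y * linear(theta, b, [v + eps * s for v, s in zip(x, signs)])
        for signs in itertools.product((-1.0, 1.0), repeat=len(x))
    )
    return -y * best
```

For example, with `theta = [2.0, -1.0, 0.5]`, `b = 0.25`, `x = [1.0, 2.0, -1.0]`, and `eps = 0.1`, the two functions agree up to floating-point error for both labels: the adversary shifts the score by ε‖θ‖₁ against the true label.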

Next, the key ingredient to a generalization bound is an upper bound on the Rademacher complexity of .

Lemma 1.

Let be a compact linear function class such that and for all , where . Suppose for all . Then we have

This leads to the following upper bound on adversarial risk, proved in Appendix C:

Corollary 1.

Let be a collection of linear classifiers such that, for any classifier in , we have and . Let be a constant such that for all . Then for any , we have




with probability at least .

As seen in the proof of Corollary 1, the loss involved in defining the adversarial risk could be replaced by another loss, which would then need to be upper-bounded by a Lipschitz loss function (in this case, the hinge loss). The empirical version of the latter loss would then appear on the right-hand side of the bounds.

Remark 2.

An immediate question is how our adversarial risk bounds compare with the case when perturbations are absent. Plugging a perturbation radius of zero into the equations above yields the usual generalization bounds of the form

so the effect of an adversarial perturbation is essentially to introduce an additional term as well as an additional contribution to the empirical risk that depends linearly on . The additional empirical risk term vanishes if classifies adversarially perturbed points correctly, since in that case.

Remark 3.

Clearly, we could further upper-bound the regularization term in equation (3) by . This is essentially the bound obtained for the empirical risk for Wasserstein distributionally robust linear classification (Gao et al., 2017). However, this bound is loose when a good robust linear classifier exists, i.e., when is small relative to . Thus, when good robust classifiers exist, distributional robustness is relatively conservative for solving the adversarially robust problem (cf. Appendix E).

3.2 The Tree Transformation and Neural Networks

In this section, we consider adversarial risk bounds for neural networks. We begin by introducing the tree transformation, which unravels the neural network into a tree in some sense.

Definition 2.

Let be a neural network given by

Define the terms and by



Then the tree transform is defined by


Intuitively, the tree transform (5) can be thought of as a new neural network classifier where the adversary can select a different worst-case perturbation for each path through the neural network from the input to the output indexed by . This leads to distinct paths through the network for given inputs and , and if these paths were laid out, they would form a tree (see Section 3.3).

Next, we show that the risk of the tree transform upper-bounds the adversarial risk of the original neural network.

Proposition 3.

Let be monotonically decreasing in . Then we have the inequality

As an immediate corollary, we obtain

so it suffices to bound this latter expectation. We have the following bound on the Rademacher complexity of :

Lemma 2.

Let be a class of neural networks of depth satisfying and , for each , and let . Additionally, suppose and for all . Then we have the bound

Finally, we have our adversarial risk bounds for neural networks. The proof is contained in Appendix C.

Corollary 2.

Let be a class of neural networks of depth . Let . Under the same assumptions as Lemma 2, for any , we have the upper bounds



with probability at least .

Remark 4.

As in the linear case, we can essentially recover pre-existing non-adversarial risk bounds by setting the perturbation radius to zero (Bartlett et al., 2017; Golowich et al., 2018). Again, the effect of adversarial perturbations on the adversarial risk is the addition of on top of the empirical risk bounds for the unperturbed loss. Finally, the bound (6) includes an extra perturbation term that is linear in , with coefficient reflecting the Lipschitz coefficient of the neural network, as well as a term , which decreases as improves as a classifier because is small when is small. A similar term appears in the bound (3).

3.3 A Visualization of the Tree Transform

In this section, we provide a few pictures to illustrate the tree transform. Consider the following two-layer network with two hidden units per layer:

We begin by visualizing the network in Figure 1.

Figure 1: A visualization of the network. The input is fed up through the network.

Next, we examine what happens when the supremum is taken inside the first layer. The resulting transformed function (cf. Lemma 3 in Section 5) becomes


The corresponding network is shown in Figure 2.

Figure 2: A visualization of the function of equation (7). Note that two different perturbations, and , are fed upward through different paths in the network.

Finally, we examine the entire tree transform. This is


the result, shown in Figure 3, yields a tree-structured network.

Figure 3: A visualization of the function in equation (8). Note that four distinct perturbed inputs are fed through the network via different paths. The resulting tree-structured graph leads to the name “tree transform.”

In particular, the visualization of the transformed network now reveals a tree, which is the reason the transformation is called the tree transform.

4 Optimization of Risk Bounds

In practice, our sample-based upper bounds on adversarial risk suggest the strategy of optimizing the bounds in the corollaries, rather than simply the empirical risk, to achieve robustness of the trained networks against adversarial perturbations. Accordingly, we provide two algorithms for optimizing the upper bounds appearing in Corollary 1. One idea is to optimize the first bound (2) directly. Recalling the form of , this leads to the following optimization problem:


Note that the optimization problem of equation (9) is convex in and ; therefore, this is a computationally tractable problem. We summarize this approach in Algorithm 1.

Algorithm 1: Convex risk
Input: Data, function class.
Solve equation (9) to obtain the optimal parameters.
Return the resulting linear classifier defined by these parameters.
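One way to carry out the convex program in Algorithm 1 is plain subgradient descent. The sketch below is illustrative only: it takes the objective to be the empirical hinge loss of the sup-transformed linear classifier, (1/n) Σ max(0, 1 − y_i(θᵀx_i + b) + ε‖θ‖₁), which is an assumption about the exact form of equation (9); it assumes ℓ∞ perturbations (dual norm ℓ1), omits any norm constraints on the parameters, and uses a hypothetical toy data set, step size, and iteration count. A dedicated convex solver would be a reasonable alternative.

```python
def robust_hinge_objective(theta, b, xs, ys, eps):
    """Empirical hinge loss of the sup-transformed linear classifier."""
    penalty = eps * sum(abs(t) for t in theta)
    total = 0.0
    for x, y in zip(xs, ys):
        margin = y * (sum(t * v for t, v in zip(theta, x)) + b)
        total += max(0.0, 1.0 - margin + penalty)
    return total / len(xs)


def sign(t):
    """Sign of t, with sign(0) = 0 (a valid subgradient of |t| at zero)."""
    return (t > 0) - (t < 0)


def minimize_robust_hinge(xs, ys, eps, steps=200, lr=0.1):
    """Subgradient descent from zero; returns the best (value, theta, b) seen."""
    n, dim = len(xs), len(xs[0])
    theta, b = [0.0] * dim, 0.0
    best = (robust_hinge_objective(theta, b, xs, ys, eps), list(theta), b)
    for t in range(1, steps + 1):
        penalty = eps * sum(abs(v) for v in theta)
        g_theta, g_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(v * u for v, u in zip(theta, x)) + b)
            if 1.0 - margin + penalty > 0:  # this sample's hinge term is active
                for k in range(dim):
                    g_theta[k] += (-y * x[k] + eps * sign(theta[k])) / n
                g_b += -y / n
        step = lr / t ** 0.5  # diminishing step size
        theta = [v - step * g for v, g in zip(theta, g_theta)]
        b -= step * g_b
        value = robust_hinge_objective(theta, b, xs, ys, eps)
        if value < best[0]:
            best = (value, list(theta), b)
    return best


# Hypothetical linearly separable toy data.
TOY_X = [(2.0, 2.0), (3.0, 1.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -1.0), (-2.0, -3.0)]
TOY_Y = [1, 1, 1, -1, -1, -1]
```

Calling `minimize_robust_hinge(TOY_X, TOY_Y, eps=0.5)` returns the best objective value found together with the corresponding parameters. Tracking the best iterate is a design choice made because subgradient steps are not guaranteed to decrease the objective monotonically.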

The second approach involves optimizing the second adversarial risk bound (3). Although this bound is generally looser than the bound (2), we comment on optimization due to the fact that regularization has been suggested as a way to encourage generalization. However, note that the regularization coefficient in the bound (3) depends on . Thus, we propose to perform a grid search over the value of the regularization parameter.

Specifically, define


We then have the optimization problem


Note, however, that is nonconvex, and the form as a function of and is complicated. We propose to take for and solve


At the end, we simply pick the solution minimizing the objective function in equation (11) over all . Note that this involves evaluating equation (10), but this is easy to do in the linear case. This method is summarized in Algorithm 2.

Algorithm 2: Regularized risk
Input: Data, function class.
for each value of the regularization parameter in the grid do
      Calculate the classifier minimizing equation (12) for this value. Save its robust empirical risk, i.e., the objective of equation (11).
end for
Return the classifier with the minimum robust empirical risk.
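The grid search in Algorithm 2 can be sketched as follows. Several details here are assumptions rather than the paper's exact formulation: the inner problem is taken to be hinge loss plus λ‖θ‖₁, the robust empirical risk used for model selection is taken to be the hinge loss of the sup-transformed linear classifier under ℓ∞ perturbations, and the solver is a basic subgradient method. The grid, data, and function names are all hypothetical.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def l1_norm(theta):
    return sum(abs(t) for t in theta)


def hinge_risk(theta, b, xs, ys, shift=0.0):
    """Mean of max(0, 1 - y * f(x) + shift); shift = 0 gives the plain hinge risk."""
    return sum(max(0.0, 1.0 - y * (dot(theta, x) + b) + shift)
               for x, y in zip(xs, ys)) / len(xs)


def fit_l1_regularized(xs, ys, lam, steps=200, lr=0.1):
    """Subgradient descent on hinge risk + lam * ||theta||_1; returns the last iterate."""
    n, dim = len(xs), len(xs[0])
    theta, b = [0.0] * dim, 0.0
    for t in range(1, steps + 1):
        # Subgradient of the l1 penalty (zero at theta_k = 0).
        g_theta = [lam * ((v > 0) - (v < 0)) for v in theta]
        g_b = 0.0
        for x, y in zip(xs, ys):
            if 1.0 - y * (dot(theta, x) + b) > 0:  # hinge term is active
                for k in range(dim):
                    g_theta[k] -= y * x[k] / n
                g_b -= y / n
        step = lr / t ** 0.5
        theta = [v - step * g for v, g in zip(theta, g_theta)]
        b -= step * g_b
    return theta, b


def grid_search(xs, ys, eps, lambdas):
    """Fit one classifier per lambda and keep the one with the lowest robust risk."""
    results = []
    for lam in lambdas:
        theta, b = fit_l1_regularized(xs, ys, lam)
        robust = hinge_risk(theta, b, xs, ys, shift=eps * l1_norm(theta))
        results.append((robust, lam, theta, b))
    return min(results, key=lambda r: r[0]), results


# Hypothetical toy data and grid.
GRID_X = [(2.0, 2.0), (3.0, 1.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -1.0), (-2.0, -3.0)]
GRID_Y = [1, 1, 1, -1, -1, -1]
LAMBDAS = [0.0, 0.01, 0.1, 1.0]
```

`grid_search(GRID_X, GRID_Y, 0.5, LAMBDAS)` returns the selected tuple `(robust_risk, lambda, theta, b)` along with the results for every grid point, so the selection can be inspected.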

5 Proofs

We now present the proofs of our core theoretical results regarding the transform functions and .

Proof of Proposition 1.

We break our analysis into two cases. If , then is decreasing in . Thus, we have

If instead , then is increasing in , so

This completes the proof. ∎
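The two cases can be written out as follows. This is a sketch under two assumptions: that the sup transform has the form $\Psi f(x,y) = -y \sup_{\|w\|_p \le \epsilon}(-y f(x+w))$, and that the loss is monotone in the sense described in Section 3. The inequalities hold in general and become equalities when the infimum or supremum of $f$ over the ball is attained, for instance when $f$ is continuous.

```latex
\begin{align*}
y = +1: \quad
  \Psi f(x, +1) &= \inf_{\|w\|_p \le \epsilon} f(x + w), \\
  \sup_{\|w\|_p \le \epsilon} \ell\bigl(f(x + w), +1\bigr)
    &\le \ell\Bigl(\inf_{\|w\|_p \le \epsilon} f(x + w), +1\Bigr)
     = \ell\bigl(\Psi f(x, +1), +1\bigr), \\[4pt]
y = -1: \quad
  \Psi f(x, -1) &= \sup_{\|w\|_p \le \epsilon} f(x + w), \\
  \sup_{\|w\|_p \le \epsilon} \ell\bigl(f(x + w), -1\bigr)
    &\le \ell\Bigl(\sup_{\|w\|_p \le \epsilon} f(x + w), -1\Bigr)
     = \ell\bigl(\Psi f(x, -1), -1\bigr).
\end{align*}
```

The first case uses that $\ell(\cdot, +1)$ is decreasing, so the loss is largest where $f$ is smallest; the second uses that $\ell(\cdot, -1)$ is increasing.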

Proof of Proposition 2.

Using the definition of the sup transform, we have

where the final equality comes from the variational definition of the dual $\ell_q$-norm, with $1/p + 1/q = 1$. This completes the proof. ∎
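For reference, the chain of equalities can be spelled out as below, assuming $f(x) = \theta^\top x + b$, labels $y \in \{+1, -1\}$ (so that $y^2 = 1$), and the sup transform written as $-y \sup_{\|w\|_p \le \epsilon}(-y f(x+w))$; the symbols are assumptions chosen to match the statement of the proposition.

```latex
\begin{align*}
\Psi f(x, y)
  &= -y \sup_{\|w\|_p \le \epsilon} \Bigl(-y \bigl(\theta^\top (x + w) + b\bigr)\Bigr) \\
  &= -y \Bigl( -y (\theta^\top x + b) + \sup_{\|w\|_p \le \epsilon} \bigl(-y\, \theta^\top w\bigr) \Bigr) \\
  &= \theta^\top x + b - y \sup_{\|w\|_p \le \epsilon} \bigl(-y\, \theta^\top w\bigr) \\
  &= \theta^\top x + b - y\, \epsilon\, \|\theta\|_q .
\end{align*}
```

The last step uses $\sup_{\|w\|_p \le \epsilon} u^\top w = \epsilon \|u\|_q$ with $u = -y\theta$, together with $\|-y\theta\|_q = \|\theta\|_q$.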

Before we begin the proof of Proposition 3, we state, prove, and remark upon a helpful lemma. We want to apply this iteratively to push the supremum inside the layers of the neural network.

Lemma 3.

Let be a function and define to be a monotonically increasing function applied elementwise to vectors. Then we have the inequality


Proof of Lemma 3.

Denote the left-hand side of the desired inequality by . First, we can push the supremum inside the sum to obtain

Next, note that


Since is monotonically increasing, we see that the map is monotonically increasing, as well. Thus, the supremum in equation (13) is obtained when is maximized. Hence, we obtain

which completes the proof. ∎
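The idea of the lemma can be illustrated on a one-hidden-layer ReLU network with ℓ∞ perturbations. The sketch below assumes the bound takes the per-unit form Σ_j a_j s(sgn(a_j) · sup_w sgn(a_j) g_j(x + w)) with linear g_j(x) = w_jᵀx, in which case the inner supremum equals the pre-activation shifted by sgn(a_j) ε‖w_j‖₁. The network weights, input, and function names are hypothetical.

```python
import random


def relu(u):
    return max(0.0, u)


def network(a, W, x):
    """One-hidden-layer network: sum_j a_j * relu(w_j . x)."""
    return sum(a_j * relu(sum(w * v for w, v in zip(row, x)))
               for a_j, row in zip(a, W))


def per_unit_upper_bound(a, W, x, eps):
    """Upper bound on sup_w network(x + w) over the l-infinity ball of radius eps.

    Each hidden unit gets its own worst-case perturbation: units with a
    positive outer weight are pushed up, units with a negative one pushed down.
    """
    total = 0.0
    for a_j, row in zip(a, W):
        sgn = 1.0 if a_j >= 0 else -1.0
        pre_activation = sum(w * v for w, v in zip(row, x))
        worst = pre_activation + sgn * eps * sum(abs(w) for w in row)
        total += a_j * relu(worst)
    return total


# Hypothetical small network and input.
A_OUT = [1.0, -2.0, 0.5]
W_HID = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.3]]
X_IN = [0.2, -0.4]


def sample_perturbed_values(eps, count, seed=0):
    """Network outputs at randomly perturbed inputs inside the l-infinity ball."""
    rng = random.Random(seed)
    return [network(A_OUT, W_HID, [v + rng.uniform(-eps, eps) for v in X_IN])
            for _ in range(count)]
```

Every value returned by `sample_perturbed_values(0.1, 500)` is at most `per_unit_upper_bound(A_OUT, W_HID, X_IN, 0.1)`. The bound is generally not tight, since no single perturbation need be worst-case for all hidden units at once.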

Remark 5.

Note that if , where , this lemma yields

If we apply Lemma 3 again, we obtain

In particular, we note that the sign terms accumulate within the supremum, but when we take the supremum inside another layer, the sign terms remaining in the previous layers cancel out and are incorporated into the of the next layer.

Proof of Proposition 3.

First note that the assumption that is monotonically decreasing in is equivalent to being monotonically increasing in . As in the proof of Proposition 1, if , we want to show that ; if , we want to show that . Thus, it is our goal to establish the inequality


We define and show how to take the supremum inside each layer of the neural network to yield . To this end, we simply apply Lemma 3 and Remark 5 iteratively until the remaining function is linear. Thus, we see that

and simplifying gives

The final supremum clearly evaluates to . Recalling the definition (4) of , we then have

which proves the proposition. ∎

6 Discussion

We have presented a method of transforming classifiers to obtain upper bounds on the adversarial risk. We have shown that bounding the generalization error of the transformed classifiers may be performed using similar machinery for obtaining traditional generalization bounds in the case of linear classifiers and neural network classifiers. In particular, since the Rademacher complexity of neural networks only has a small additional term due to adversarial perturbations, generalization even in the presence of adversarial perturbations should not be impossibly difficult for binary classification.

We mention several future directions for research. First, one might be interested in extending the supremum transformation to other types of classifiers. The most interesting avenues would include calculating explicit representations as in the case of linear classifiers, suitable alternative transformations as in the case of neural networks, and bounds on the resulting Rademacher complexities.

A second direction is to understand the tree transformation better and develop algorithms for optimizing the resulting adversarial risk bounds. One view that we have taken in this paper is to bound the difference between the empirical risk of and as a regularization term, but one could also optimize the empirical risk of directly. An immediate idea would be to train a good and then use the resulting , since the empirical risk of provides an upper bound on the adversarial risk of . For computational reasons, this may not be practical for the tree transform, in which case one might need to explore alternative transformations.


Appendix A Rademacher Complexity Proofs

In this section, we prove Lemmas 1 and 2, which are the bounds on the empirical Rademacher complexities of and . The proofs are largely based on pre-existing proofs for bounding the empirical Rademacher complexities of and , and this simplicity is part of what makes and attractive.

Proof of Lemma 1.

Using Proposition 2, we have

By Lemma 10, the empirical Rademacher complexity of a linear function class is given by

Thus, it remains to analyze the second term in the upper bound.

If the sum of the ’s is negative, the maximizing the supremum is the zero vector. Alternatively, if the sum is positive, we clearly have the upper bound . Thus, we have

where follows because and have the same distribution, and the last inequality follows by Jensen’s inequality. The last term is equal to , using the fact that the ’s are independent, zero-mean, and unit-variance random variables. Putting everything together yields

which completes the proof. ∎
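The Jensen step in this proof bounds the expected absolute value of a sum of n Rademacher variables by the square root of its second moment, which equals √n because the variables are independent with zero mean and unit variance. A small exact check, with hypothetical function names, is sketched below; it uses the fact that the number of +1 signs follows a Binomial(n, 1/2) distribution.

```python
import math


def expected_abs_rademacher_sum(n):
    """Exact E|sigma_1 + ... + sigma_n| for i.i.d. Rademacher signs.

    If k of the n signs equal +1, the sum is 2k - n, which happens for
    C(n, k) of the 2^n equally likely sign vectors.
    """
    return sum(math.comb(n, k) * abs(2 * k - n) for k in range(n + 1)) / 2 ** n


def jensen_bound(n):
    """sqrt(E[(sum of signs)^2]) = sqrt(n)."""
    return math.sqrt(n)
```

For instance, `expected_abs_rademacher_sum(4)` evaluates to 1.5, compared with `jensen_bound(4)` equal to 2.0, and the two coincide only for n = 1.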

Proof of Lemma 2.

Our broad goal is to peel off the layers of the neural network one at a time. Most of the work is done by Lemma 7. The proof is essentially the same as the Rademacher complexity bounds on neural networks of Golowich et al. (2018) until we reach the underlying linear classifier. We then bound the action of the adversary in an analogous manner to the linear case.

We write

Recalling the form of from equation (5), we can apply Lemma 7 successively times with for various in order to remove the layers of the neural network. Specifically, we use , , , up to , as we peel away the layers and retain the bounds on the matrix norms from the layers that we have removed. This implies

Note that the maxima over are accumulated from each application of Lemma 7. These maxima correspond to taking a worst-case path through the tree. To bound the first term, we apply the Cauchy-Schwarz inequality. To bound the second term, we use the inequality

Thus, we have