Deep learning systems are becoming ubiquitous in everyday life. From virtual assistants on phones to image search and translation, neural networks have vastly improved the performance of many computerized systems in a short amount of time (Goodfellow et al., 2016). However, neural networks have a variety of shortcomings: A peculiarity that has gained much attention over the past few years has been the apparent lack of robustness of neural network classifiers to adversarial perturbations. Szegedy et al. (2013) noticed that small perturbations to images could cause neural network classifiers to predict the wrong class. Further, these perturbations could be carefully chosen so as to be imperceptible to humans.
Such observations have instigated a deluge of research in finding adversarial attacks (Athalye et al., 2018; Goodfellow et al., 2014; Papernot et al., 2016; Szegedy et al., 2013), defenses against adversaries for neural networks (Madry et al., 2018; Raghunathan et al., 2018; Sinha et al., 2018; Wong and Kolter, 2018), evidence that adversarial examples are inevitable (Shafahi et al., 2018), and theory suggesting that constructing robust classifiers is computationally infeasible (Bubeck et al., 2018).
Attacks are usually constructed assuming a white-box framework, in which the adversary has access to the network, and adversarial examples are generated using a perturbation roughly in the direction of the gradient of the loss function with respect to a training data point. This idea generally produces adversarial examples that can break ad-hoc defenses in image classification.
Currently, strategies for creating robust classification algorithms are much more limited. One approach (Madry et al., 2018; Suggala et al., 2018) is to formalize the problem of robustifying the network as a novel optimization problem, where the objective function is the expected loss of a supremum over possible perturbations. However, Madry et al. (2018) note that the objective function is often not concave in the perturbation. Other authors (Raghunathan et al., 2018; Wong and Kolter, 2018) have leveraged convex relaxations to provide optimization-based certificates on the adversarial loss of the training data. However, how these training-data certificates generalize to unseen examples is still not understood.
The optimization community has long been interested in constructing robust solutions for various problems, such as portfolio management (Ben-Tal et al., 2009), and deriving theoretical guarantees. Robust optimization has been studied in the context of regression and classification (Trafalis and Gilbert, 2007; Xu et al., 2009a, b). More recently, a notion of robustness that attempts to minimize the risk with respect to the worst-case distribution close to the empirical distribution has been the subject of extensive work (Ben-Tal et al., 2013; Namkoong and Duchi, 2016, 2017). Researchers have also considered a formulation known as distributionally robust optimization, using the Wasserstein distance as a metric between distributions (Esfahani and Kuhn, 2015; Blanchet and Kang, 2017; Gao et al., 2017; Sinha et al., 2018). With the exception of Sinha et al. (2018), generalization bounds of a learning-theoretic nature are nonexistent, with most papers focusing on studying properties of a regularized reformulation of the problem. Sinha et al. (2018) provide bounds for Wasserstein distributionally robust generalization error based on covering numbers for sufficiently small perturbations. This is sufficient for ensuring a small amount of adversarial robustness and is quite general; but for classification using neural networks, known covering number bounds (Bartlett et al., 2017) are substantially weaker than Rademacher complexity bounds (Golowich et al., 2018).
Although neural networks are rightly the subject of attention due to their ubiquity and utility, the theory developed to explain the phenomena arising from adversarial examples is still far from complete. For example, Goodfellow et al. (2014) argue that non-robustness may be due to the linear nature of neural networks. However, attempts at understanding linear classifiers (Fawzi et al., 2018) argue against the linearity hypothesis, suggesting that the function classes involved must be more expressive than linear classifiers.
In this paper, we provide upper bounds for a notion of adversarial risk in the case of linear classifiers and neural networks. These bounds may be viewed as a sample-based guarantee on the risk of a trained classifier, even in the presence of adversarial perturbations on the inputs. The key step is to transform a classifier $f$ into an "adversarially-perturbed" classifier $\tilde f$ by modifying the loss function. The risk of the transformed function $\tilde f$ can then be analyzed in place of the adversarial risk of $f$; in particular, we can more easily provide bounds on the Rademacher complexities necessary for bounding the robust risk. Finally, our transformations suggest algorithms for minimizing the adversarially robust empirical risk. Thus, from the theory developed in this paper, we can show that adversarial perturbations have somewhat limited effects from the point of view of generalization error.
This paper is organized as follows: We introduce the precise mathematical framework in Section 2. In Section 3, we discuss our main results. In Section 4, we provide results on optimizing the adversarial risk bounds. In Section 5, we prove our key theoretical contributions. Finally, we conclude with a discussion of future avenues of research in Section 6.
Notation: For a matrix $A$, we write $\|A\|_{\mathrm{op}}$ to denote the operator norm and $\|A\|_F$ to denote the Frobenius norm. For a vector $v$, we write $\|v\|_p$ to denote the $\ell_p$-norm.
We consider a standard statistical learning setup. Let $\mathcal{X} \subseteq \mathbb{R}^d$ be a space of covariates, and define the space of labels to be $\mathcal{Y} = \{+1, -1\}$. Suppose we have observations $\{(x_i, y_i)\}_{i=1}^n$, drawn i.i.d. according to some unknown distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y}$.
A classifier corresponds to a function $f: \mathcal{X} \to \mathbb{R}$, where the predicted label is $\operatorname{sgn}(f(x))$. Thus, the function may express uncertainty in its decision; e.g., a prediction in $[-1, +1]$ allows the classifier to express an expected outcome rather than a hard decision.
2.1 Risk and Losses
Given a loss function $\ell$, our goal is to minimize the adversarially robust risk, defined by
$$R_{\mathrm{adv}}(f) := \mathbb{E}\Big[\sup_{\|\delta\|_p \le \epsilon} \ell\big(f(x+\delta), y\big)\Big],$$
where $\delta$ is an adversarially chosen perturbation in the $\ell_p$-ball of radius $\epsilon$. For simplicity, we write $x' = x + \delta$, so the input is perturbed by a vector in the $\ell_p$-ball of radius $\epsilon$, but is still classified according to $f$. Usually in the literature, $p$ is taken to be $1$, $2$, or $\infty$; the case $p = \infty$ has received particular interest. Also note that if $\epsilon = 0$, the adversarial risk reduces to the usual statistical risk, for which upper bounds based on the empirical risk are known as generalization error bounds. For some discussion of the relationship between the adversarial risk and the distributionally robust risk, see Appendix E.
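To make the definition concrete, the following Python sketch (an illustration of our own; the classifier, data, and the choices $p = \infty$, $\epsilon = 0.1$ are hypothetical) computes the empirical adversarial 0-1 risk of a linear classifier two ways: by enumerating the corners of the perturbation hypercube, where the worst case of a linear score must occur, and by the closed-form worst-case margin.

```python
import itertools
import numpy as np

def adv_zero_one_risk(w, b, X, y, eps):
    """Empirical adversarial 0-1 risk of the linear classifier
    sign(<w, x> + b) under l_inf perturbations of radius eps.
    For a linear score the inner supremum is attained at a corner
    of the hypercube, so we enumerate sign patterns of delta."""
    n, d = X.shape
    errors = 0
    for xi, yi in zip(X, y):
        worst_margin = min(
            yi * (w @ (xi + eps * np.array(s)) + b)
            for s in itertools.product([-1.0, 1.0], repeat=d)
        )
        errors += worst_margin <= 0
    return errors / n

def adv_zero_one_risk_closed(w, b, X, y, eps):
    """Same quantity via the closed-form worst-case margin
    y(<w, x> + b) - eps * ||w||_1."""
    margins = y * (X @ w + b) - eps * np.abs(w).sum()
    return float(np.mean(margins <= 0))
```

Both computations agree because the inner supremum of a linear function over the $\ell_\infty$-ball has the dual-norm value $\epsilon\|w\|_1$.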
We now define a few specific loss functions. The indicator loss
$$\ell_{01}\big(f(x), y\big) = \mathbf{1}\{y f(x) \le 0\}$$
is of primary interest in classification; in both the linear classifier and neural network classification settings, we will primarily be interested in bounding the adversarial risk with respect to the indicator loss. As is standard in linear classification, we also define the hinge loss
$$\ell_{\mathrm{hinge}}\big(f(x), y\big) = \max\big\{0,\; 1 - y f(x)\big\},$$
which is a convex surrogate for the indicator loss, and will appear in some of our bounds. We also introduce the indicator of whether the hinge loss is positive, defined by
$$\mathbf{1}\{y f(x) < 1\}.$$
For analyzing neural networks, we will also employ the cross-entropy loss, defined by
$$\ell_{\mathrm{xe}}\big(f(x), y\big) = -\log s\big(y f(x)\big),$$
where $s$ is the softmax function, which in the binary case reduces to the sigmoid:
$$s(z) = \frac{1}{1 + e^{-z}}.$$
Note that in all of the cases above, we can write the loss as $\ell(f(x), y) = \varphi(y f(x))$ for an appropriately defined function $\varphi$ of the margin. Furthermore, the hinge and cross-entropy losses are 1-Lipschitz in this argument.
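For reference, these losses can be written directly as functions of the margin $z = y f(x)$. The sketch below is our own illustration (the logistic form of the cross-entropy is our binary-case reading), and it also checks numerically by finite differences that the hinge and cross-entropy losses are 1-Lipschitz in the margin.

```python
import numpy as np

# Each loss viewed as a function of the margin z = y * f(x).
indicator = lambda z: (z <= 0).astype(float)    # 0-1 loss
hinge = lambda z: np.maximum(0.0, 1.0 - z)      # convex surrogate
cross_entropy = lambda z: np.log1p(np.exp(-z))  # -log s(z), s = sigmoid

# Finite-difference check that hinge and cross-entropy are 1-Lipschitz:
z = np.linspace(-5.0, 5.0, 2001)
for loss in (hinge, cross_entropy):
    slopes = np.abs(np.diff(loss(z)) / np.diff(z))
    assert slopes.max() <= 1.0 + 1e-9
```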
2.2 Function Classes and Rademacher Complexity
We are particularly interested in two function classes: linear classifiers and neural networks. An element of the class of linear classifiers, parametrized by a weight vector $w \in \mathbb{R}^d$ and offset $b \in \mathbb{R}$, takes the form
$$f(x) = \langle w, x \rangle + b.$$
An element of the class of neural networks of depth $L$, parametrized by weight matrices $W_1, \dots, W_L$ and activations $\sigma_1, \dots, \sigma_{L-1}$, takes the form
$$f(x) = W_L\, \sigma_{L-1}\big(W_{L-1}\, \sigma_{L-2}(\cdots \sigma_1(W_1 x) \cdots)\big),$$
where each $W_i$ is a matrix and each $\sigma_i$ is a monotonically increasing 1-Lipschitz activation function applied elementwise to vectors, such that $\sigma_i(0) = 0$. For example, we might have $\sigma_i(z) = \max\{0, z\}$, which is the ReLU function. The matrix $W_i$ is of dimension $d_i \times d_{i-1}$, where $d_0 = d$ and $d_L = 1$. We use $w_j^{(i)}$ to denote the $j$th row of $W_i$, with $k$th entry $w_{j,k}^{(i)}$. Also, when discussing indices, we write $[k]$ as shorthand for $\{1, \dots, k\}$.
A standard measure of the complexity of a class of functions is the Rademacher complexity. The empirical Rademacher complexity of a function class $\mathcal{F}$ and a sample $S = \{x_1, \dots, x_n\}$ is
$$\mathfrak{R}_S(\mathcal{F}) = \mathbb{E}_\sigma\Big[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n \sigma_i f(x_i)\Big],$$
where the $\sigma_i$'s are i.i.d. Rademacher random variables; i.e., the $\sigma_i$'s are random variables taking the values $+1$ and $-1$, each with probability $\frac{1}{2}$. Note that $\mathbb{E}_\sigma$ denotes the expectation with respect to the $\sigma_i$'s. Finally, we note that the standard Rademacher complexity is obtained by taking an expectation over the data: $\mathfrak{R}_n(\mathcal{F}) = \mathbb{E}_S\big[\mathfrak{R}_S(\mathcal{F})\big]$.
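The definition can be estimated directly by Monte Carlo over the Rademacher draws. The sketch below is our own illustration (the $\ell_2$-bounded linear class, the radius `W`, and the sample are hypothetical); it uses the closed-form inner supremum given by the Cauchy-Schwarz inequality, so only the expectation over $\sigma$ is simulated.

```python
import numpy as np

def empirical_rademacher_linear(X, W, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    {x -> <w, x> : ||w||_2 <= W}.  By Cauchy-Schwarz, the inner supremum
    over w equals (W / n) * || sum_i sigma_i x_i ||_2 for each draw."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))
    sups = (W / n) * np.linalg.norm(sigma @ X, axis=1)
    return float(sups.mean())
```

By Jensen's inequality, the estimate should land below the familiar upper bound $W\sqrt{\sum_i \|x_i\|_2^2}\,/\,n$, up to Monte Carlo error.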
3 Main Results
We introduce our main results in this section. The trick is to push the supremum through the loss and incorporate it into the function $f$, yielding a transformed function $\tilde f$. We require this transformation to satisfy
$$\sup_{\|\delta\|_p \le \epsilon} \ell\big(f(x+\delta), y\big) \;\le\; \ell\big(\tilde f(x, y)\big),$$
where, with slight abuse of notation, the loss on the right-hand side is evaluated on $\tilde f(x, y)$ in place of the margin $y f(x)$; an upper bound on the transformed risk then leads to an upper bound on the adversarial risk. We call the proposed functions the supremum transformation and tree transformation in the cases of linear classifiers and neural networks, respectively.
In both cases, we have to make a minor assumption on the loss: namely, that $\ell(f(x), y)$ is monotonically decreasing in the margin $y f(x)$. Specifically, $\ell(f(x), +1)$ is decreasing in $f(x)$, and $\ell(f(x), -1)$ is increasing in $f(x)$. This is not a stringent assumption, and it is satisfied by all of the loss functions mentioned earlier.
One technicality is that the transformed function needs to be a function of both $x$ and $y$; i.e., we have $\tilde f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$. Thus, the loss of a transformed function is evaluated directly on $\tilde f(x, y)$ in place of the margin $y f(x)$. We now define the essential transformations studied in our paper.
The supremum (sup) transform is defined by
$$\tilde f_{\sup}(x, y) := \inf_{\|\delta\|_p \le \epsilon} y\, f(x + \delta).$$
Additionally, for a function class $\mathcal{F}$, we define $\tilde{\mathcal{F}}_{\sup} := \{\tilde f_{\sup} : f \in \mathcal{F}\}$ to be the transformed function class.
We now have the following result:
Let $\ell$ be a loss function that is monotonically decreasing in the margin $y f(x)$. Then
$$\sup_{\|\delta\|_p \le \epsilon} \ell\big(f(x+\delta), y\big) \;\le\; \ell\big(\tilde f_{\sup}(x, y)\big),$$
where, as above, the loss on the right-hand side is evaluated on the transformed value in place of the margin.
The consequence of the supremum transformation can be seen by taking the expectation:
$$\mathbb{E}\Big[\sup_{\|\delta\|_p \le \epsilon} \ell\big(f(x+\delta), y\big)\Big] \;\le\; \mathbb{E}\big[\ell\big(\tilde f_{\sup}(x, y)\big)\big].$$
Thus, we can bound the adversarial risk of a function $f$ by bounding the usual risk of $\tilde f_{\sup}$ via Rademacher complexities. For linear classifiers, we shall see momentarily that the supremum transformation can be calculated exactly.
3.1 The Supremum Transformation and Linear Classification
We start with an explicit formula for the supremum transform.
Let $f(x) = \langle w, x \rangle + b$. Then the supremum transformation takes the explicit form
$$\tilde f_{\sup}(x, y) = y\big(\langle w, x\rangle + b\big) - \epsilon\,\|w\|_q,$$
where $q$ satisfies $1/p + 1/q = 1$.
The proof is contained in Section 5.
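The explicit form can be sanity-checked numerically. The sketch below is our own illustration, assuming $\ell_p$ perturbations with dual exponent $q$ satisfying $1/p + 1/q = 1$: an explicit worst-case perturbation achieves the transformed value exactly for $p \in \{1, 2, \infty\}$.

```python
import numpy as np

def sup_transform_linear(w, b, x, y, eps, p):
    """Closed form of the sup transform for f(x) = <w, x> + b:
    tilde_f(x, y) = y(<w, x> + b) - eps * ||w||_q,  1/p + 1/q = 1."""
    q = {1: np.inf, 2: 2, np.inf: 1}[p]
    return y * (w @ x + b) - eps * np.linalg.norm(w, ord=q)

def worst_case_delta(w, y, eps, p):
    """An explicit minimizer of y * <w, delta> over the l_p ball."""
    if p == np.inf:                      # attack every coordinate
        return -y * eps * np.sign(w)
    if p == 2:                           # attack along the direction of w
        return -y * eps * w / np.linalg.norm(w)
    j = np.argmax(np.abs(w))             # p == 1: attack the largest coordinate
    d = np.zeros_like(w)
    d[j] = -y * eps * np.sign(w[j])
    return d
```

Plugging `worst_case_delta` into $y f(x + \delta)$ recovers `sup_transform_linear` exactly, which is the dual-norm identity behind the proposition.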
Next, the key ingredient of a generalization bound is an upper bound on the Rademacher complexity of the transformed class.
Let be a compact linear function class such that and for all , where . Suppose for all . Then we have
This leads to the following upper bound on adversarial risk, proved in Appendix C:
Let be a collection of linear classifiers such that, for any classifier in , we have and . Let be a constant such that for all . Then for any , we have
with probability at least .
As seen in the proof of Corollary 1, the loss involved in defining the adversarial risk could be replaced by another loss, which would then need to be upper-bounded by a Lipschitz loss function (in this case, the hinge loss). The empirical version of the latter loss would then appear on the right-hand side of the bounds.
An immediate question is how our adversarial risk bounds compare with the case when perturbations are absent. Plugging $\epsilon = 0$ into the equations above yields the usual generalization bounds, so the effect of an adversarial perturbation is essentially to introduce an additional term, as well as an additional contribution to the empirical risk that depends linearly on $\epsilon$. The additional empirical risk term vanishes if the classifier classifies adversarially perturbed points correctly.
Clearly, we could further upper-bound the regularization term in equation (3) by . This is essentially the bound obtained for the empirical risk for Wasserstein distributionally robust linear classification (Gao et al., 2017). However, this bound is loose when a good robust linear classifier exists, i.e., when is small relative to . Thus, when good robust classifiers exist, distributional robustness is relatively conservative for solving the adversarially robust problem (cf. Appendix E).
3.2 The Tree Transformation and Neural Networks
In this section, we consider adversarial risk bounds for neural networks. We begin by introducing the tree transformation, which, in a sense, unravels the neural network into a tree.
Let be a neural network given by
Define the terms and by
Then the tree transform is defined by
Intuitively, the tree transform (5) can be thought of as a new neural network classifier in which the adversary can select a different worst-case perturbation for each path through the neural network from the input to the output. This leads to distinct paths through the network for a given input and label, and if these paths were laid out, they would form a tree (see Section 3.3).
Next, we show that the risk of the tree transform upper-bounds the adversarial risk of the original neural network.
Let $\ell$ be monotonically decreasing in the margin $y f(x)$. Then we have the inequality
$$\sup_{\|\delta\|_p \le \epsilon} \ell\big(f(x+\delta), y\big) \;\le\; \ell\big(\tilde f_T(x, y)\big),$$
where $\tilde f_T$ denotes the tree transform (5) and the loss on the right-hand side is evaluated on the transformed value in place of the margin.
As an immediate corollary, taking expectations yields
$$\mathbb{E}\Big[\sup_{\|\delta\|_p \le \epsilon} \ell\big(f(x+\delta), y\big)\Big] \;\le\; \mathbb{E}\big[\ell\big(\tilde f_T(x, y)\big)\big],$$
so it suffices to bound this latter expectation. We have the following bound on the Rademacher complexity of the tree-transformed class:
Let be a class of neural networks of depth satisfying and , for each , and let . Additionally, suppose and for all . Then we have the bound
Finally, we have our adversarial risk bounds for neural networks. The proof is contained in Appendix C.
Let be a class of neural networks of depth . Let . Under the same assumptions as Lemma 2, for any , we have the upper bounds
with probability at least .
As in the linear case, we can essentially recover pre-existing non-adversarial risk bounds by setting $\epsilon = 0$ (Bartlett et al., 2017; Golowich et al., 2018). Again, the effect of adversarial perturbations on the adversarial risk is the addition of an extra term on top of the empirical risk bounds for the unperturbed loss. Finally, the bound (6) includes an extra perturbation term that is linear in $\epsilon$, with a coefficient reflecting the Lipschitz constant of the neural network, as well as a term that decreases as the network improves as a classifier. A similar term appears in the bound (3).
3.3 A Visualization of the Tree Transform
In this section, we provide a few pictures to illustrate the tree transform. Consider the following two-layer network with two hidden units per layer:
We begin by visualizing the original network in Figure 1.
The corresponding transformed network is shown in Figure 2.
Finally, we examine the entire tree transform; the result, shown in Figure 3, is a tree-structured network.
In particular, the visualization of the network now reveals a tree, which is the reason the transformation is called the tree transform.
4 Optimization of Risk Bounds
In practice, our sample-based upper bounds on adversarial risk suggest optimizing the bounds in the corollaries, rather than simply the empirical risk, to achieve robustness of the trained networks against adversarial perturbations. Accordingly, we provide two algorithms for optimizing the upper bounds appearing in Corollary 1. One idea is to optimize the first bound (2) directly. Recalling the form of the sup transform, this leads to the following optimization problem:
The second approach involves optimizing the second adversarial risk bound (3). Although this bound is generally looser than the bound (2), we comment on its optimization because regularization has been suggested as a way to encourage generalization. However, note that the regularization coefficient in the bound (3) depends on the classifier itself. Thus, we propose to perform a grid search over the value of the regularization parameter.
We then have the optimization problem
Note, however, that the objective is nonconvex, and its form as a function of the parameters is complicated. We propose to fix the regularization coefficient at each value of the grid in turn and solve
At the end, we simply pick the solution minimizing the objective function in equation (11) over all grid values. Note that this involves evaluating equation (10), but this is easy to do in the linear case. This method is summarized in Algorithm 2.
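As a concrete (hypothetical) instance of the first approach in the linear case, the sup-transformed hinge risk can be minimized directly by subgradient descent. The sketch below assumes $\ell_\infty$ perturbations, so the transform subtracts $\epsilon \|w\|_1$ from each margin; the step size and iteration count are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

def robust_hinge_risk(w, b, X, y, eps):
    """Empirical hinge risk of the sup-transformed linear classifier
    under l_inf perturbations: mean of max(0, 1 - y(<w,x>+b) + eps*||w||_1)."""
    margins = y * (X @ w + b) - eps * np.abs(w).sum()
    return float(np.maximum(0.0, 1.0 - margins).mean())

def train_robust(X, y, eps, lr=0.05, steps=500):
    """Plain subgradient descent on the transformed empirical hinge risk."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        margins = y * (X @ w + b) - eps * np.abs(w).sum()
        active = (margins < 1.0).astype(float)   # points with positive loss
        # Subgradient: -y_i x_i + eps * sign(w) on active points, averaged.
        gw = -(active * y) @ X / n + eps * active.mean() * np.sign(w)
        gb = -(active * y).mean()
        w -= lr * gw
        b -= lr * gb
    return w, b
```

The $\epsilon \|w\|_1$ term acts as a margin-dependent $\ell_1$ penalty, which matches the observation above that the adversarial bound behaves like a regularized empirical risk.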
We now present the proofs of our core theoretical results regarding the transformed functions.
Proof of Proposition 1.
We break our analysis into two cases. If $y = +1$, then $\ell(\cdot, +1)$ is decreasing in its argument. Thus, we have
$$\sup_{\|\delta\|_p \le \epsilon} \ell\big(f(x+\delta), +1\big) \;=\; \ell\Big(\inf_{\|\delta\|_p \le \epsilon} f(x+\delta),\; +1\Big) \;=\; \ell\big(\tilde f_{\sup}(x, +1),\; +1\big).$$
If instead $y = -1$, then $\ell(\cdot, -1)$ is increasing in its argument, so
$$\sup_{\|\delta\|_p \le \epsilon} \ell\big(f(x+\delta), -1\big) \;=\; \ell\Big(\sup_{\|\delta\|_p \le \epsilon} f(x+\delta),\; -1\Big) \;=\; \ell\big(-\tilde f_{\sup}(x, -1),\; -1\big).$$
This completes the proof. ∎
Proof of Proposition 2.
Using the definition of the sup transform, we have
$$\tilde f_{\sup}(x, y) = \inf_{\|\delta\|_p \le \epsilon} y\big(\langle w, x + \delta\rangle + b\big) = y\big(\langle w, x\rangle + b\big) + \inf_{\|\delta\|_p \le \epsilon} y\,\langle w, \delta\rangle = y\big(\langle w, x\rangle + b\big) - \epsilon\,\|w\|_q,$$
where the final equality comes from the variational definition of the $\ell_q$-norm ($1/p + 1/q = 1$). This completes the proof. ∎
Before we begin the proof of Proposition 3, we state, prove, and remark upon a helpful lemma, which we will apply iteratively to push the supremum inside the layers of the neural network.
Let $g$ be a function, and let $\sigma$ be a monotonically increasing function applied elementwise to vectors. Then we have the following inequality:
Denote the left-hand side of the desired inequality by $S$. First, we can push the supremum inside the sum to obtain
Next, note that
Since is monotonically increasing, we see that the map is monotonically increasing, as well. Thus, the supremum in equation (13) is obtained when is maximized. Hence, we obtain
which completes the proof. ∎
Note that if , where , this lemma yields
If we apply Lemma 3 again, we obtain
In particular, we note that the sign terms accumulate within the supremum, but when we take the supremum inside another layer, the sign terms remaining from the previous layers cancel out and are incorporated into the corresponding terms of the next layer.
Proof of Proposition 3.
First note that the assumption that is monotonically decreasing in is equivalent to being monotonically increasing in . As in the proof of Proposition 1, if , we want to show that ; if , we want to show that . Thus, it is our goal to establish the inequality
We define and show how to take the supremum inside each layer of the neural network to yield . To this end, we simply apply Lemma 3 and Remark 5 iteratively until the remaining function is linear. Thus, we see that
and simplifying gives
The final supremum clearly evaluates to . Recalling the definition (4) of , we then have
which proves the proposition. ∎
We have presented a method of transforming classifiers to obtain upper bounds on the adversarial risk. We have shown that bounding the generalization error of the transformed classifiers may be performed using machinery similar to that used for obtaining traditional generalization bounds, in the case of both linear classifiers and neural network classifiers. In particular, since the Rademacher complexity of the transformed neural network class carries only a small additional term due to adversarial perturbations, generalization even in the presence of adversarial perturbations should not be prohibitively difficult for binary classification.
We mention several future directions for research. First, one might be interested in extending the supremum transformation to other types of classifiers. The most interesting avenues would include calculating explicit representations as in the case of linear classifiers, suitable alternative transformations as in the case of neural networks, and bounds on the resulting Rademacher complexities.
A second direction is to understand the tree transformation better and develop algorithms for optimizing the resulting adversarial risk bounds. One view that we have taken in this paper is to bound the difference between the empirical risks of the transformed and original classifiers as a regularization term, but one could also optimize the empirical risk of the transformed classifier directly. An immediate idea would be to train a good transformed classifier and then use the corresponding original classifier, since the empirical risk of the former provides an upper bound on the adversarial risk of the latter. For computational reasons, this may not be practical for the tree transform, in which case one might need to explore alternative transformations.
- Athalye et al.  A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research. PMLR, July 2018.
- Bartlett et al.  P. L. Bartlett, D. J. Foster, and M. J. Telgarsky. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, pages 6240–6249, 2017.
- Ben-Tal et al.  A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust Optimization. Princeton University Press, 2009.
- Ben-Tal et al.  A. Ben-Tal, D. Den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341–357, 2013.
- Blanchet and Kang  J. Blanchet and Y. Kang. Semi-supervised learning based on distributionally robust optimization. arXiv preprint arXiv:1702.08848, 2017.
- Boucheron et al.  S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
- Bubeck et al.  S. Bubeck, E. Price, and I. Razenshteyn. Adversarial examples from computational constraints. arXiv preprint arXiv:1805.10204, 2018.
- Esfahani and Kuhn  P. M. Esfahani and D. Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, pages 1–52, 2015.
- Fawzi et al.  A. Fawzi, O. Fawzi, and P. Frossard. Analysis of classifiers’ robustness to adversarial perturbations. Machine Learning, 107(3):481–508, 2018.
- Gao et al.  R. Gao, X. Chen, and A. J. Kleywegt. Distributional robustness and regularization in statistical learning. arXiv preprint arXiv:1712.06050, 2017.
- Golowich et al.  N. Golowich, A. Rakhlin, and O. Shamir. Size-independent sample complexity of neural networks. In Conference On Learning Theory, pages 297–299, 2018.
- Goodfellow et al.  I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, Cambridge, 2016.
- Goodfellow et al.  I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- Ledoux and Talagrand  M. Ledoux and M. Talagrand. Comparison theorems, random geometry and some limit theorems for empirical processes. The Annals of Probability, pages 596–631, 1989.
- Ledoux and Talagrand  M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Springer Science & Business Media, 1991.
- Madry et al.  A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJzIBfZAb.
- Mohri et al.  M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning. MIT Press, 2012.
- Namkoong and Duchi  H. Namkoong and J. C. Duchi. Stochastic gradient methods for distributionally robust optimization with $f$-divergences. In Advances in Neural Information Processing Systems, pages 2208–2216, 2016.
- Namkoong and Duchi  H. Namkoong and J. C. Duchi. Variance-based regularization with convex objectives. In Advances in Neural Information Processing Systems, pages 2971–2980, 2017.
- Papernot et al.  N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pages 372–387. IEEE, 2016.
- Raghunathan et al.  A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Bys4ob-Rb.
- Shafahi et al.  A. Shafahi, W. R. Huang, C. Studer, S. Feizi, and T. Goldstein. Are adversarial examples inevitable? arXiv preprint arXiv:1809.02104, 2018.
- Sinha et al.  A. Sinha, H. Namkoong, and J. Duchi. Certifiable distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Hk6kPgZA-.
- Suggala et al.  A. S. Suggala, A. Prasad, V. Nagarajan, and P. Ravikumar. On adversarial risk and training. arXiv preprint arXiv:1806.02924, 2018.
- Szegedy et al.  C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Trafalis and Gilbert  T. B. Trafalis and R. C. Gilbert. Robust support vector machines for classification and computational issues. Optimisation Methods and Software, 22(1):187–198, 2007.
- Wong and Kolter  E. Wong and Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, pages 5286–5295. PMLR, July 2018.
- Xu et al. [2009a] H. Xu, C. Caramanis, and S. Mannor. Robust regression and Lasso. In Advances in Neural Information Processing Systems, pages 1801–1808, 2009a.
- Xu et al. [2009b] H. Xu, C. Caramanis, and S. Mannor. Robustness and regularization of support vector machines. Journal of Machine Learning Research, 10(Jul):1485–1510, 2009b.
Appendix A Rademacher Complexity Proofs
In this section, we prove Lemmas 1 and 2, which are the bounds on the empirical Rademacher complexities of the transformed function classes. The proofs are largely based on pre-existing proofs for bounding the empirical Rademacher complexities of the original classes, and this simplicity is part of what makes the transformations attractive.
Proof of Lemma 1.
Using Proposition 2, we have
By Lemma 10, the empirical Rademacher complexity of a linear function class is given by
Thus, it remains to analyze the second term in the upper bound.
If the sum of the $\sigma_i$'s is negative, the perturbation maximizing the supremum is the zero vector. Alternatively, if the sum is positive, we clearly have the stated upper bound. Thus, we have
where the second equality follows because the $\sigma_i$'s and the $-\sigma_i$'s have the same distribution, and the last inequality follows from Jensen's inequality. The last term may then be computed using the fact that the $\sigma_i$'s are independent, zero-mean, and unit-variance random variables. Putting everything together yields
which completes the proof. ∎
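The Jensen step at the end of this proof, namely that the expected absolute value of a sum of $n$ independent Rademacher variables is at most $\sqrt{n}$ because the sum has mean zero and variance $n$, can be checked by simulation (the sample sizes and draw counts below are arbitrary illustrative choices):

```python
import numpy as np

# Monte Carlo check of E|sum_i sigma_i| <= sqrt(n) for i.i.d. Rademacher
# sigma_i: the sum S_n has mean 0 and variance n, so Jensen's inequality
# gives E|S_n| <= sqrt(E S_n^2) = sqrt(n).
rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    sigma = rng.choice([-1.0, 1.0], size=(5000, n))
    mc_mean = np.abs(sigma.sum(axis=1)).mean()
    assert mc_mean <= np.sqrt(n)  # in fact E|S_n| is about sqrt(2n/pi)
```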
Proof of Lemma 2.
Our broad goal is to peel off the layers of the neural network one at a time. Most of the work is done by Lemma 7. The proof is essentially the same as the Rademacher complexity bounds on neural networks of Golowich et al. (2018) until we reach the underlying linear classifier, at which point we bound the action of the adversary in a manner analogous to the linear case.
Recalling the form of the tree transform from equation (5), we can apply Lemma 7 successively, once per layer, in order to remove the layers of the neural network, peeling away the layers while retaining the bounds on the matrix norms from the layers that we have removed. This implies
Note that the maxima are accumulated from each application of Lemma 7; these maxima correspond to taking a worst-case path through the tree. To bound the first term, we apply the Cauchy–Schwarz inequality. To bound the second term, we use the inequality
Thus, we have