Deep models, while being extremely versatile and accurate, are vulnerable to adversarial attacks: slight perturbations that are imperceptible to humans can completely flip the prediction of deep models. Many attack and defense mechanisms have been proposed, although a satisfying solution still largely remains elusive. In this work, we give strong evidence that during training, deep models maximize the minimum margin in order to achieve high accuracy, but at the same time decrease the average margin hence hurting robustness. Our empirical results highlight an intrinsic trade-off between accuracy and robustness for current deep model training. To further address this issue, we propose a new regularizer to explicitly promote average margin, and we verify through extensive experiments that it does lead to better robustness. Our regularized objective remains Fisher-consistent, hence asymptotically can still recover the Bayes optimal classifier.


## 1. Introduction

Traditionally, much of machine learning research has focused on training more accurate classifiers. More recently, as machine learning models start to penetrate into safety-critical applications, their robustness against random or even adversarial manipulations has drawn a lot of attention. The work of Szegedy et al. [1] demonstrated, surprisingly, that it is possible to craft minimal changes to an input image that (a) are not perceivable by humans but (b) can completely flip the predictions of state-of-the-art deep models. Such detrimental perturbations are called adversarial examples and have raised serious concerns about the safety of deep models.

Since the work of Szegedy et al. [1], a sequence of works has proposed new attack algorithms to generate adversarial examples [e.g. 2, 3, 4, 5, 6, 7, 8, 9], as well as defensive techniques [e.g. 10, 11, 12, 13, 14] to train more robust models. There even appears to be an arms race between attack and defense techniques: new defense techniques are shown to be nonrobust under stronger attacks shortly after being proposed [15]. Thus, a line of research focusing on provable certification of (non)robustness has emerged. For example, one can use the Lipschitz constant of the network to provide a lower bound on the amount of adversarial perturbation needed [e.g. 16, 17, 18, 19]. More recently, certification methods based on mixed integer programming (MIP) [20, 21] have been proposed to provide more accurate lower bounds. However, due to the NP-hardness of MIP, convex relaxations [22, 23, 24, 25, 26, 27] have been popular for scaling these provable certification methods to larger models.

In this paper, we study adversarial robustness from a margin perspective. The notion of margin distribution dates back to [28, 29, 30, 31], which provide generalization bounds based on the margin distribution. Later on, algorithms that control the margin distribution for better generalization performance have been proposed [32]. In this work, however, we connect the notion of margin with adversarial robustness, and we reveal a surprising trade-off between minimum and average margin in standard deep model training. In particular, the recent result in [33] implies that on a linearly separable dataset a linear classifier trained with (stochastic) gradient descent converges to the SVM solution, under mild conditions on the loss function. Since SVM explicitly maximizes the minimum margin, any such linear classifier would eventually achieve the same. However, our experiments reveal that the average margin is greatly reduced during training. In other words, models maximize the minimum margin at the expense of reducing the average margin, hence becoming more susceptible to adversarial attacks. This surprising phenomenon was also confirmed on linearly nonseparable datasets and a variety of nonlinear classifiers. Interestingly, a few defense methods based on margin maximization have been proposed [34, 35, 36], which, in light of our work, should be understood as maximizing the average margin, as opposed to the minimum margin.

To overcome the trade-off between minimum margin and average margin in standard training, we propose to augment the standard training objective with an average-margin promoting regularizer. We prove that the regularized objective remains Fisher consistent if the original loss is so. This means that, as the sample size grows, the classifier obtained through optimizing our regularized objective still approaches the Bayes optimal classifier, while potentially being much more robust than one obtained by standard training. Our regularizer can be easily extended to multiclass and nonlinear classifiers. We confirm the latter point through extensive experiments on two standard datasets and a variety of different models and settings.

To summarize, we make the following contributions in this work:

• We reveal the intrinsic trade-off between minimum margin and average margin;

• We propose a new regularizer that explicitly promotes average margin and retains Fisher consistency;

• We perform extensive experiments to confirm the effectiveness of our new regularizer.

## 2. Background

We consider the multi-category classification problem. Given a set of training instances $\{(x_i, y_i)\}_{i=1}^n \subseteq X \times Y$, where $X \subseteq \mathbb{R}^d$ is the feature domain and $Y = \{1, \ldots, c\}$ with $c$ the number of categories, we are interested in finding a classifier $\hat y : X \to Y$, or equivalently, a set partition $\{F_k\}_{k=1}^c$ of the domain such that $\bigcup_{k} F_k = X$ and for all $k \neq l$, $F_k \cap F_l = \emptyset$. Often in practice, the classifier $\hat y$ is learned through a vector-valued function $f : X \to \mathbb{R}^c$, with the argmax prediction rule employed to predict the label of a test sample $x$ as follows:

$$\hat y(x) = \operatorname*{argmax}_{k = 1, \ldots, c} f_k(x). \tag{1}$$

As is well-known, the two representations of a classifier, i.e. either through a set partition $\{F_k\}$ or through a vector-valued function $f$, are equivalent. We will use both representations interchangeably.

In their seminal work, Szegedy et al. [1] defined the robustness of a classifier $\hat y$ on a test sample $x$ as the minimum perturbation needed to flip its prediction (cf. (1)):

$$r(x) = \inf\{\|z\| : x + z \in X,\ \hat y(x + z) \neq \hat y(x)\}, \tag{2}$$

where $\|\cdot\|$ is an abstract norm (e.g. the familiar $\ell_p$ norm) for measuring the amount of perturbation. This notion of robustness turns out to have a very intuitive geometric meaning, as we detail below. Recall that we also represent a classifier $\hat y$ as a set partition $\{F_k\}$.

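For any $x \in X$, define $\hat y = \hat y(x)$ so that $x \in F_{\hat y}$, i.e. $\hat y$ is the predicted label of our classifier. We define the distance from a point $x$ to a set $F$ as:

$$d(x, F) := \inf\{\|x - z\| : z \in F\}. \tag{3}$$

Recall that the notations $\operatorname{cl} F$, $\operatorname{int} F$, $F^{\mathsf{c}}$ and $\operatorname{bd} F$ denote the closure, interior, complement and the boundary (of course, $\operatorname{bd} F = \operatorname{cl} F \setminus \operatorname{int} F$) of a point set $F$, respectively. We remark that for any set $F$:

$$d(x, F) = \begin{cases} 0, & \text{if } x \in \operatorname{cl} F \\ d(x, \operatorname{bd} F), & \text{otherwise,} \end{cases} \tag{4}$$

which will be important when we aim to optimize some function of the distance.

Note that $d(x, \operatorname{bd} F_{\hat y})$ is the distance to the decision boundary, which captures the definition (2) exactly: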

###### Proposition 1.

Let $f : X \to \mathbb{R}^c$ be a vector-valued function that, if augmented with the argmax prediction rule (1), leads to the following sets (classifier): for all $k = 1, \ldots, c$,

$$F_k := \{x \in X : f_k(x) = \max_{l = 1, \ldots, c} f_l(x)\}, \tag{5}$$

where we break ties arbitrarily. Then the robustness defined in Equation (2) equals $d(x, \operatorname{bd} F_{\hat y})$.

We are now ready to define the individual margin of a sample pair $(x, y)$ w.r.t. the classifier $\{F_k\}$ as:

$$m(x, y) := m(x, y; \{F_k\}) := \operatorname{sign}(\hat y(x), y) \cdot d(x, \operatorname{bd} F_{\hat y}), \tag{6}$$

where we define $\operatorname{sign}(\hat y, y) = 1$ if $\hat y = y$ and $-1$ otherwise. Namely, when predicting correctly, $m(x, y)$ is the distance to the decision boundary, i.e. the minimum perturbation needed to change the prediction; whereas when predicting wrongly, it is the negation of the distance to the decision boundary, i.e. the minimum perturbation needed to correct the prediction. We then define

• minimum margin: $\min_{i = 1, \ldots, n} m(x_i, y_i)$;

• average margin: $\frac{1}{n} \sum_{i=1}^n m(x_i, y_i)$, simply the average (signed) distance to the decision boundary.
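As a minimal sketch of the two statistics, assume a binary linear classifier through the origin, for which the signed distance to the decision boundary has a closed form. The helper names `individual_margins` and `margin_statistics` are hypothetical and only serve this illustration.

```python
import numpy as np

def individual_margins(w, X, y):
    """Signed Euclidean distance of each sample to the hyperplane {x : w.x = 0}.

    Positive when sign(w.x) agrees with the label y in {-1, +1},
    negative when the sample is misclassified.
    """
    w = np.asarray(w, dtype=float)
    return y * (X @ w) / np.linalg.norm(w)

def margin_statistics(w, X, y):
    """Return (minimum margin, average margin) over the dataset."""
    m = individual_margins(w, X, y)
    return m.min(), m.mean()

# Four points on a line; the last one is misclassified by w.
X = np.array([[3.0, 0.0], [1.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1, 1, -1, -1])
w = np.array([2.0, 0.0])

min_margin, avg_margin = margin_statistics(w, X, y)
print(min_margin, avg_margin)
```

The individual margins here are 3, 1, 2 and -0.5, so a single misclassified point determines the minimum margin while the average still reflects how far the bulk of the data sits from the boundary.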

Although the definition (3) applies to any norm, we will focus on the $\ell_2$ margin later, as some current results on implicit minimum margin maximization [33, 37, 38] only hold for the $\ell_2$ norm, as discussed in §3. From now on, we use $\|\cdot\|$ to denote the Euclidean norm.

## 3. Minimum vs. Average Margin: A Case Study

In this section, we study a simple example to demonstrate the trade-off between minimum and average margin. In statistical learning theory, it is well-known that we can bound the generalization error of a classifier using empirical margins [39]. In particular, the celebrated support vector machines (SVM) explicitly maximize the minimum margin (on any linearly separable dataset). In adversarial learning, however, we argue that average margin is more indicative of the robustness of a classifier.

Our main observations in this section are: (a) current machine learning models, especially when they become deep and powerful, implicitly maximize the minimum margin to achieve high accuracy; (b) there appears to be an inherent trade-off between minimum margin and average margin. In particular, by maximizing the minimum margin, the model also (unintentionally) minimizes the average margin, and hence becomes susceptible to adversarial attacks. Note that we do not claim that minimum margin always contradicts average margin. It does not, as can be easily shown through carefully constructed toy datasets (e.g. a highly symmetric dataset). Our empirical observation is that there does appear to be some tension between the two notions of margin on real datasets.

To begin with, we first demonstrate that minimum margin is maximized during standard training. In fact, this can be formally established through the implicit bias of optimization algorithms, such as (stochastic) gradient descent, which is ubiquitously used in training deep models. Recall that on a (linearly) separable dataset, (hard-margin) linear SVM explicitly maximizes the minimum margin. The following result, due to Soudry et al. [33], confirms the same for most models currently used in machine learning, including logistic regression.

###### Theorem 1 ([33]).

For almost all linearly separable binary datasets and any smooth decreasing loss with an exponential tail, gradient descent with small constant step size and any starting point converges in direction to the (unique) solution of hard-margin SVM, i.e.

$$\lim_{t \to \infty} \frac{w_t}{\|w_t\|} = \frac{w^\star}{\|w^\star\|}, \quad \text{where } w^\star = \operatorname*{argmin}_{w} \|w\|^2 \ \text{ s.t. } \ y_i w^\top x_i \geq 1 \ \text{ for all } i. \tag{7}$$

Note that the margin of a linear classifier $\hat y(x) = \operatorname{sign}(w^\top x)$, with decision boundary $\{x : w^\top x = 0\}$, can be computed in closed-form: $m(x, y) = y\, w^\top x / \|w\|$, which only depends on the direction of the weight vector $w$. Thus, Theorem 1 implies in particular that if we optimize logistic regression by gradient descent on a linearly separable dataset, then we implicitly maximize the minimum margin, just as in SVM. Besides linear classifiers, the same implicit bias towards maximizing the minimum margin has also been discovered for deep networks [37, 38].

It is then natural to ask the next question: How does average margin change during training? To answer this question, we train a binary logistic regression (LR) using gradient descent on MNIST to discriminate between two digit classes. Note that the subset of MNIST consisting of only these two digits is indeed linearly separable, as LR achieves zero training error (see Figure 3 in appendix). All conditions of Theorem 1 are thus satisfied, and we expect LR to maximize minimum margin. Indeed, Figure 0(a) confirms that the minimum margin continues to increase during training until it approaches that of hard-margin SVM, as predicted by Theorem 1. Meanwhile, the average margin decreases drastically during the first few epochs, and then keeps decreasing. Interestingly, the minimum margin continues to increase while the average margin continues to decrease even after the training error reaches zero, which means the decision boundary still changes after the training error vanishes.
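The measurement protocol above can be sketched on synthetic data. The following is only an instrumentation example, not a reproduction of the MNIST experiment: the two Gaussian blobs, the step size and the logging schedule are all assumptions, and whether the average margin moves as reported for MNIST depends on the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Two well-separated Gaussian blobs, separable by a hyperplane through the origin.
X = np.vstack([rng.normal(2.0, 0.5, size=(n, 2)),
               rng.normal(-2.0, 0.5, size=(n, 2))])
y = np.concatenate([np.ones(n), -np.ones(n)])

def margins(w):
    """Signed distances to the boundary {x : w.x = 0}."""
    return y * (X @ w) / np.linalg.norm(w)

w = np.zeros(2)
lr = 0.1
history = []
for step in range(1, 2001):
    z = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-z)) with respect to w.
    grad = -(X * (y / (1.0 + np.exp(z)))[:, None]).mean(axis=0)
    w -= lr * grad
    if step in (1, 10, 100, 1000, 2000):
        m = margins(w)
        history.append((step, m.min(), m.mean()))

for step, mn, av in history:
    print(f"step {step:5d}  min margin {mn:.4f}  avg margin {av:.4f}")
```

Logging both statistics on the same run is the point of the sketch: the minimum margin only tracks the hardest sample, so the two curves can diverge even after the training error reaches zero.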

To gain further insight, in Figure 0(b) we plot the histogram of margins at different training epochs. We observe that the margin distribution shifts towards the left during training and eventually approaches that of hard-margin SVM. In other words, during training the majority of data is pushed towards the decision boundary, leading LR to become more and more vulnerable to adversarial attacks. Indeed, Figure 0(c) confirms this by visualizing the adversarial examples constructed at different training epochs. Note that for a linear classifier $\hat y(x) = \operatorname{sign}(w^\top x)$ the adversarial example with minimum perturbation can be explicitly determined as

$$x_{\mathrm{adv}} = x - \frac{w^\top x}{\|w\|^2}\, w,$$

which is the point on the decision boundary that is closest to the training example $x$. As shown in Figure 0(c), the adversarial examples gradually become more and more imperceptible, indicating that LR becomes more and more vulnerable to adversarial attacks.
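The closest boundary point of a linear classifier is simply the orthogonal projection of the sample onto the hyperplane. A minimal sketch, assuming a classifier without bias term (the function name is hypothetical):

```python
import numpy as np

def minimal_adversarial_example(w, x):
    """Orthogonal projection of x onto the decision boundary {z : w.z = 0}."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    return x - (w @ x) / (w @ w) * w

w = np.array([3.0, 4.0])
x = np.array([1.0, 2.0])

x_adv = minimal_adversarial_example(w, x)
perturbation = np.linalg.norm(x - x_adv)

# The perturbation size equals the margin |w.x| / ||w|| of the sample.
print(x_adv, perturbation, abs(w @ x) / np.linalg.norm(w))
```

The projected point lies exactly on the boundary, so an arbitrarily small additional step along the same direction flips the prediction; the size of the required perturbation is therefore the margin of the sample.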

A few remarks are in order. First, we have carefully designed our experiment so that (a) the theoretical results in Theorem 1 apply; (b) the margins and adversarial examples can be explicitly computed. However, a similar phenomenon is also observed for deep models on different datasets where the conditions of Theorem 1 may be violated or the margins can only be approximately computed; see §5 for more of these experiments. Second, we remark that the decrease of average margin cannot be caused by overfitting. This is because the test accuracy continues to increase during training (see training curves in Figure 3 of the appendix). Moreover, we observe that the average margin decreases on both training and test sets. To summarize, we conclude that in practice, deep models try to maximize the minimum margin during training at the expense of the average margin, and hence become susceptible to adversarial attacks. To address this issue, in the next section we propose to explicitly optimize the average margin through appropriate regularization.

## 4. An Average Margin Regularizer

In this section, we propose a regularization function to explicitly promote average margin. We first handle linear classifiers in §4.1 through maximizing the average margin in the input space directly. For nonlinear classifiers, maximizing the margin in the input space directly is intractable. Instead, we maximize a lower bound of input space margins in §4.2, through simultaneously maximizing feature space margin and controlling the Lipschitz constant of the network.

The most straightforward way to improve robustness of a classifier is through explicit regularization. In particular, we consider the following regularized problem:

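$$\min_{f : X \to \mathbb{R}^c} \ \frac{1}{n} \sum_{i=1}^n \phi(y_i, f(x_i)) - \lambda \cdot \frac{1}{n} \sum_{i=1}^n 1_{y_i = \hat y_i} \cdot d_\tau(x_i, \operatorname{bd} F_{y_i}), \tag{8}$$

where $\phi$ is the loss function we use to measure the accuracy of our classifier $f$, $\{F_k\}$ are the sets induced by $f$ (using the argmax rule), $\hat y_i = \hat y(x_i)$ is the predicted label, $\lambda \geq 0$ is the regularization constant that balances the two objectives, and $d_\tau := \min\{d, \tau\}$ is the truncated distance. In particular, we only maximize the margin when the classifier makes a correct prediction, and the margin is capped at $\tau$ to avoid being dominated by some outliers. We show next how to solve (8) when $f$ is a binary linear classifier.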

### 4.1. Binary Linear Classifier

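In this section we assume there are two classes, i.e. $c = 2$, and we consider the linear classifier $\hat y(x) = \operatorname{sign}(w^\top x)$ (w.l.o.g. we omit the bias term). Correspondingly, let $y \in \{\pm 1\}$ and the distance $d(x, \operatorname{bd} F_y) = |w^\top x| / \|w\|$. The regularized problem (8) reduces to

$$\min_{\|w\| = 1} \ \sum_{i=1}^n \phi(y_i w^\top x_i) - \lambda \cdot \sum_{i=1}^n [y_i w^\top x_i]_0^\tau, \tag{9}$$

where $[t]_0^\tau := \min\{\max\{t, 0\}, \tau\}$. The second regularization term can be written as a difference of two convex functions. Indeed, define $H_\tau(t) := \max\{\tau - t, 0\}$, then

$$-[t]_0^\tau = H_\tau(t) - H_0(t) - \tau, \tag{10}$$

and, since the additive constant does not affect the minimizer, the objective function simplifies to

$$\min_{\|w\| = 1} \ \sum_{i=1}^n \phi(y_i w^\top x_i) + \lambda \cdot \sum_{i=1}^n [H_\tau - H_0](y_i w^\top x_i), \tag{11}$$

where $\lambda \geq 0$ and $\tau > 0$ are tuning hyperparameters.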
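The difference-of-convex decomposition can be checked numerically. The sketch below assumes the hinge form $H_s(t) = \max\{s - t, 0\}$, which is an assumption of this example; under that form the truncated margin equals $H_\tau - H_0$ up to the additive constant $-\tau$, which has no effect on the minimizer.

```python
import numpy as np

def hinge(t, s):
    """Shifted hinge H_s(t) = max(s - t, 0); convex in t (assumed form)."""
    return np.maximum(s - t, 0.0)

def truncated(t, tau):
    """Truncated margin [t]_0^tau = min(max(t, 0), tau)."""
    return np.clip(t, 0.0, tau)

tau = 0.7
t = np.linspace(-3.0, 3.0, 601)

lhs = -truncated(t, tau)
rhs = hinge(t, tau) - hinge(t, 0.0) - tau   # difference of two convex functions
max_gap = np.max(np.abs(lhs - rhs))
print(max_gap)
```

Writing the regularizer this way is what makes standard difference-of-convex techniques applicable: each of the two hinge terms is convex even though their difference is not.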

Quite pleasantly, our average margin regularizer not only promotes robustness, it also retains Fisher consistency (aka classification-calibrated), namely that the classifier we obtain by minimizing the regularized objective in (9) still approaches the Bayes optimal classifier, as sample size increases to infinity.

###### Definition 1 (Fisher consistency [40]).

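Suppose $\phi : \mathbb{R} \to \mathbb{R}$ is a loss function and $\eta(x) = \Pr(Y = 1 \mid X = x)$. For any $\eta \in [0, 1]$ and $\alpha \in \mathbb{R}$, define the conditional $\phi$-risk as $C^\phi_\eta(\alpha) := \eta\, \phi(\alpha) + (1 - \eta)\, \phi(-\alpha)$. We say a loss $\phi$ is Fisher consistent if for any $\eta \neq \tfrac{1}{2}$,

$$\inf_{\alpha :\, \alpha(2\eta - 1) \leq 0} C^\phi_\eta(\alpha) > \inf_{\alpha \in \mathbb{R}} C^\phi_\eta(\alpha). \tag{12}$$

For a Fisher consistent loss function $\phi$, (12) implies that to minimize the $\phi$-risk, $\alpha$ should satisfy $\alpha(2\eta - 1) > 0$, i.e. the decision of our classifier should match the decision of the Bayes classifier (which is optimal under the 0-1 loss). Fisher consistency is a necessary condition for any reasonable loss, if our goal is to approximate the Bayes optimal classifier. The following result confirms the Fisher consistency of our regularized objective (9) (see the proof in appendix).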
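Condition (12) can be probed numerically for one concrete choice. The sketch below assumes the logistic loss, the hinge form $H_s(t) = \max\{s - t, 0\}$ for the regularizer, and the hypothetical values $\lambda = \tau = 1$; it minimizes the conditional risk on a grid and inspects the sign of the minimizer. It illustrates the definition and is not a substitute for the proof.

```python
import numpy as np

def logistic(t):
    """Logistic loss log(1 + exp(-t)), computed stably."""
    return np.logaddexp(0.0, -t)

def regularized_loss(t, lam, tau):
    """phi(t) + lam * (H_tau(t) - H_0(t)) with H_s(t) = max(s - t, 0)."""
    return logistic(t) + lam * (np.maximum(tau - t, 0.0) - np.maximum(-t, 0.0))

def conditional_risk_minimizer(eta, lam=1.0, tau=1.0):
    """Grid minimizer of eta * loss(alpha) + (1 - eta) * loss(-alpha)."""
    alpha = np.linspace(-10.0, 10.0, 2001)
    risk = (eta * regularized_loss(alpha, lam, tau)
            + (1.0 - eta) * regularized_loss(-alpha, lam, tau))
    return alpha[np.argmin(risk)]

a_pos = conditional_risk_minimizer(0.7)   # Bayes decision is +1 since eta > 1/2
a_neg = conditional_risk_minimizer(0.3)   # Bayes decision is -1 since eta < 1/2
print(a_pos, a_neg)
```

In both cases the sign of the minimizer agrees with the sign of $2\eta - 1$, which is the behaviour that Fisher consistency requires.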

###### Theorem 2.

Suppose the loss function is decreasing, continuous, bounded below and . Let be our average margin regularizer. Then the regularized loss is Fisher consistent for any .

The above condition on the loss is reasonable: it basically guarantees itself to be Fisher consistent. Common loss functions, such as the logistic loss, exponential loss and hinge loss, all satisfy this condition. Thus, our average margin regularizer, when combined with these typical, Fisher consistent loss functions, remains Fisher consistent.

### 4.2. Extension to Multiclass Deep Models

For deep neural networks, even computing the margin in the input space is already NP-hard [41, 19], let alone optimizing it. However, the margin in the feature space provides a lower bound of the margin in the input space. For example, let $f$ be a deep neural network with $L$ layers, $f(x) = W_L\, \sigma(W_{L-1} \cdots \sigma(W_1 x))$, where $\sigma$ is the activation function. Let $\Phi$ be the output of the second last fully connected layer, i.e. $f(x) = W_L \Phi(x)$, then

$$\|\Phi(x_1) - \Phi(x_2)\| \leq \operatorname{Lip}(\Phi)\, \|x_1 - x_2\|, \tag{13}$$

where $\operatorname{Lip}(\Phi)$ is the Lipschitz constant of the feature map $\Phi$. The tightness of this bound is determined by the Lipschitz constant of the model. For example, if the feature map is $K$-Lipschitz with $K \leq 1$, then the margin in the input space is lower bounded by the margin in the feature space, since moving a feature vector to the decision boundary requires moving the input at least as far. The bound (13) motivates a natural way to optimize the margin in input space: we simultaneously fix the Lipschitz constant and maximize the margin in feature space. Note that this perspective allows us to treat any deep network as first performing a nonlinear feature transform through $\Phi$, and then applying a linear classifier on top.

The Lipschitz constant $\operatorname{Lip}(\Phi)$ can be bounded by the product of norms of the weight matrices, $\prod_l \|W_l\|$, where $W_l$ is the weight matrix in layer $l$, assuming the activation function is 1-Lipschitz, for instance ReLU [1]. To control the overall Lipschitz constant, ideally, we want to enforce the Lipschitz constant of each layer to be exactly 1, namely constraining the spectral norm of each weight matrix to be approximately 1. Inspired by [11, 42], we add an orthogonal penalty into our training objective, which encourages the weight matrices to be orthogonal, hence having spectral norm close to 1. For convolutional layers, we first flatten the convolutional filter and then apply the above penalty.
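The link between orthogonality and the Lipschitz bound can be illustrated as follows. The exact penalty is not spelled out here, so the squared Frobenius form $\|W^\top W - I\|_F^2$ used below is an assumption, as are the function names.

```python
import numpy as np

def orthogonality_penalty(W):
    """Squared Frobenius distance of W^T W from the identity (assumed penalty form)."""
    k = W.shape[1]
    return np.linalg.norm(W.T @ W - np.eye(k), ord="fro") ** 2

def lipschitz_upper_bound(weights):
    """Product of spectral norms; an upper bound when activations are 1-Lipschitz."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(6, 4)))   # 6x4 matrix with orthonormal columns
W_random = rng.normal(size=(6, 4))

pen_Q = orthogonality_penalty(Q)
pen_rand = orthogonality_penalty(W_random)
bound = lipschitz_upper_bound([Q, Q.T])        # two layers, each with spectral norm 1
print(pen_Q, pen_rand, bound)
```

A matrix with orthonormal columns has zero penalty and spectral norm 1, so driving the penalty towards zero in every layer keeps the product bound close to 1. A flattened convolutional filter can be passed to the same function.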

Once the Lipschitz constant is bounded, we directly maximize the average margin in the feature space, as a computationally efficient lower bound for promoting the average margin in the input space. Since our classifier is linear in the feature space, the distance to the decision boundary can be computed as in §4.1:

$$d(x, \operatorname{bd} F_{\hat y}) = \min_{k \neq \hat y}\, (w_{\hat y} - w_k)^\top \Phi(x), \tag{14}$$

where $w_k$ is the (normalized) weight vector in the final layer for the $k$-th class. Similar to the binary case, we only maximize the margin for correctly classified examples and truncate the margin if it exceeds the threshold $\tau$. In the end, our regularized objective is,
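The feature-space margin in (14) only needs the final-layer scores. A minimal sketch, assuming the rows of `W` are the already normalized class weight vectors and `phi` is the feature vector of one sample (both names are hypothetical):

```python
import numpy as np

def feature_space_margin(W, phi, tau):
    """Predicted class and truncated margin min_{k != y_hat} (w_y_hat - w_k)^T phi.

    W   : (c, p) array whose rows are the final-layer class weight vectors.
    phi : (p,) feature vector of a single sample.
    tau : truncation threshold; the margin is clipped to [0, tau].
    """
    scores = W @ phi
    y_hat = int(np.argmax(scores))
    gaps = scores[y_hat] - np.delete(scores, y_hat)   # score gap to every other class
    return y_hat, float(np.clip(gaps.min(), 0.0, tau))

W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
phi = np.array([2.0, 0.5])

y_hat, margin_capped = feature_space_margin(W, phi, tau=1.0)
_, margin_raw = feature_space_margin(W, phi, tau=5.0)
print(y_hat, margin_capped, margin_raw)
```

The class scores are 2, 0.5 and -2, so the runner-up class determines a raw margin of 1.5, which the threshold $\tau = 1$ caps at 1. Samples that are already far from the boundary therefore stop contributing once they pass the threshold.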

 (15)

where the first term is the standard training loss (such as cross-entropy), the third term is the orthogonal constraint for controlling the Lipschitz constant of the network, and the second term is the average margin penalty in the feature space, which also maximizes the input space average margin, provided that the Lipschitz constant of the network is indeed close to unity.
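The three terms described above can be sketched as a single function. This is an illustration under stated assumptions rather than the exact objective: cross-entropy as the training loss, a squared Frobenius orthogonality penalty, and hypothetical coefficient names `lam` and `gamma`.

```python
import numpy as np

def regularized_objective(features, labels, W_last, hidden_weights, lam, tau, gamma):
    """Cross-entropy - lam * average truncated feature margin + gamma * orthogonality penalty."""
    n = len(labels)
    idx = np.arange(n)
    logits = features @ W_last.T                              # (n, c) class scores

    # First term: cross-entropy via a numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    cross_entropy = -log_probs[idx, labels].mean()

    # Second term: truncated margin, counted only for correctly classified samples.
    others = logits.copy()
    others[idx, labels] = -np.inf
    gaps = logits[idx, labels] - others.max(axis=1)           # min_k (w_y - w_k)^T phi
    margin_term = ((gaps > 0) * np.clip(gaps, 0.0, tau)).mean()

    # Third term: orthogonality penalty on the hidden-layer weights (assumed form).
    ortho = sum(np.linalg.norm(W.T @ W - np.eye(W.shape[1])) ** 2 for W in hidden_weights)

    return cross_entropy - lam * margin_term + gamma * ortho

features = np.array([[2.0, 0.5], [-1.0, 3.0]])
labels = np.array([0, 1])
W_last = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
hidden = [np.eye(2)]

base = regularized_objective(features, labels, W_last, hidden, lam=0.0, tau=1.0, gamma=0.0)
reg = regularized_objective(features, labels, W_last, hidden, lam=0.5, tau=1.0, gamma=1.0)
print(base, reg)
```

Both samples are classified correctly with margins above the threshold, so the margin term equals $\tau = 1$ and the regularized value is lower than the plain cross-entropy by exactly `lam * tau`; the identity hidden weight contributes no orthogonality penalty.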

## 5. Experiments

In this section, we conduct experiments to (a) verify the minimum-average margin trade-off on a variety of (nonlinear) deep models; (b) demonstrate the effectiveness of our average margin regularizer. First, we train six models with various architectures on MNIST and CIFAR10 to verify the trade-off again. Then, we retrain these models with our average margin regularizer, and confirm that our regularizer indeed can effectively promote the average margin. Finally, we compare our average margin regularizer with other state-of-the-art defense techniques in the literature and find that our regularizer achieves comparable performance in terms of robustness and accuracy.

### 5.1. Approximating Distance

In general, exactly computing the distance to the decision boundary of deep models is intractable [41, 19]. Instead, we use the following Lipschitz lower bound as a reasonable approximation:

###### Theorem 3 ([16]).

Suppose is a multiclass classifier (with argmax prediction rule). Then, for any :

 (16)

where is the local Lipschitz constant of over the ball

The lower bound (16) is in fact exact when $f$ is linear, and for general nonlinear $f$, it is reasonably close to the true distance, as shown empirically in [17]. Here, our estimate for the Lipschitz constant is simply the maximum norm of the gradient (difference) over many random samples in the ball, as we found this simple strategy to be reasonably efficient and accurate. Due to space limits, we defer the experimental details on approximating this distance to the appendix.
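The sampling strategy can be sketched as follows. The bound is assumed here to take the form of the smallest score gap divided by a local Lipschitz estimate of the corresponding score difference, capped by the ball radius, in the spirit of [16]; the function names, the uniform sampling scheme and the default sample count are all assumptions of this example.

```python
import numpy as np

def lipschitz_margin_lower_bound(f, jac, x, radius, n_samples=256, seed=0):
    """Estimate a lower bound on the l2 distance from x to the decision boundary.

    f(x)   -> (c,) class scores;  jac(x) -> (c, d) Jacobian of the scores.
    The local Lipschitz constant of f_yhat - f_k is estimated as the largest
    gradient-difference norm over random samples in the ball of given radius.
    """
    rng = np.random.default_rng(seed)
    scores = f(x)
    y_hat = int(np.argmax(scores))
    d = x.size

    # Uniform samples in the l2 ball: random direction, radius scaled by u^(1/d).
    directions = rng.normal(size=(n_samples, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.random(n_samples) ** (1.0 / d)
    points = x + directions * radii[:, None]
    jacobians = [jac(p) for p in points]

    bound = np.inf
    for k in range(scores.size):
        if k == y_hat:
            continue
        lip_k = max(np.linalg.norm(J[y_hat] - J[k]) for J in jacobians)
        bound = min(bound, (scores[y_hat] - scores[k]) / lip_k)
    # The Lipschitz estimate is only local, so the bound cannot exceed the radius.
    return min(bound, radius)

# Sanity check on a linear model, where the bound is exact.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
x = np.array([2.0, 0.5])
lower_bound = lipschitz_margin_lower_bound(lambda z: W @ z, lambda z: W, x, radius=5.0)
print(lower_bound)
```

For the linear model the gradient difference is constant, so the estimate coincides with the exact distance $\min_k (f_{\hat y} - f_k) / \|w_{\hat y} - w_k\|$. For a nonlinear model the sampled maximum may underestimate the true local Lipschitz constant, so the returned value is an approximation rather than a certificate.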

### 5.2. Verifying the Margin Trade-off again on Deep Nonlinear Models

Due to space limits, here we only present the experimental results on the CIFAR10 dataset, while results on MNIST can be found in the appendix. For each model, we plot the average margin and minimum margin w.r.t. the number of training epochs in Figure 1. A phenomenon similar to that in §3 can clearly be observed: the minimum margin continues increasing while at the same time the average margin keeps decreasing. This again highlights an intrinsic trade-off between minimum margin and average margin. In addition, we consistently observed that the average margin continues to decrease even after the training/test error has saturated (see training curves in Figure 4 in the appendix).

To investigate how the margin distribution changes during training, we plot the histograms of margins in Figure 7 (appendix). As training proceeds, we observe again that the margin distribution shifts towards the left, meaning the majority of data points are pushed closer to the boundary. To some extent, this provides a compelling explanation of the non-robustness of deep models: although high accuracy is achieved by maximizing the minimum margin, the average margin, which is a better indicator of robustness, is put in serious jeopardy. Note that early stopping, while it helps prevent the average margin from decreasing unnecessarily, is not sufficient by itself to promote average margin. Instead, an explicit average margin regularizer is more effective, as we show next.

### 5.3. Average Margin Regularizer

We retrain the six models augmented with our average margin regularizer, and compare the average margin with standard training in Figure 2. It is clear that with our average margin regularizer, the regularized models no longer sacrifice the average margin during training. For all six models, adding the average margin regularizer significantly improves adversarial robustness. For example, the average margins of LR and MLP become several times larger than those of the models with standard training, and the average margin of CNN is increased as well (see Figure 2 for the exact factors).

Next, we compare our regularizer with Lipschitz constant regularization. In our regularizer, we use the orthogonal constraint to control the Lipschitz constant. The idea of using an orthogonal constraint as a Lipschitz constant regularizer has been proposed before (e.g. Parseval networks [11] and defensive quantization [42]). Here, we provide a comparison between Lipschitz constant regularization and our average margin regularizer. We evaluate models using clean accuracy, robust accuracy under a norm-bounded projected gradient descent (PGD) attack [2], and the average margin. We run multiple iterations of PGD to generate adversarial examples, whose success rate serves as a good approximation of robust accuracy. The results can be found in Table 1. As we can see, when $\epsilon$, the amount of allowed perturbation, is small, the accuracy of our average margin regularizer is comparable to that of the Lipschitz constant regularizer (LCR); but as $\epsilon$ becomes larger, the average margin regularizer consistently outperforms the latter. Interestingly, LCR alone can also improve adversarial robustness marginally [11, 42] when $\epsilon$ is small. This may be caused by the implicit max-margin effect of the loss function used in training. However, as $\epsilon$ becomes larger, the robust accuracy of LCR is inferior to our method, sometimes even worse than standard training, which is also confirmed by the experimental results in [11, 36]. This implies that controlling the Lipschitz constant alone may not be enough to train a robust model. Instead, one needs to maximize the average margin explicitly.
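To make the evaluation protocol concrete, here is a minimal PGD sketch. It assumes an $\ell_2$ perturbation budget, a binary linear classifier with logistic loss, and hypothetical step-size and iteration settings; none of these choices is taken from the experiments reported here.

```python
import numpy as np

def pgd_l2(w, x, y, eps, step_size, n_steps):
    """l2-bounded PGD against the logistic loss of the classifier sign(w.x), y in {-1, +1}."""
    x_adv = x.copy()
    for _ in range(n_steps):
        z = y * (w @ x_adv)
        grad = -y * w / (1.0 + np.exp(z))            # gradient of log(1 + exp(-z)) w.r.t. x
        g_norm = np.linalg.norm(grad)
        if g_norm == 0.0:
            break
        x_adv = x_adv + step_size * grad / g_norm    # normalized gradient ascent step
        delta = x_adv - x
        d_norm = np.linalg.norm(delta)
        if d_norm > eps:
            x_adv = x + delta * (eps / d_norm)       # project back onto the eps-ball
    return x_adv

w = np.array([3.0, 4.0])
x = np.array([1.0, 2.0])    # margin of this sample is |w.x| / ||w|| = 2.2
y = 1.0

x_small = pgd_l2(w, x, y, eps=1.0, step_size=0.25, n_steps=20)
x_large = pgd_l2(w, x, y, eps=3.0, step_size=0.25, n_steps=20)
print(w @ x_small, w @ x_large)
```

For a linear classifier the attack succeeds exactly when the budget exceeds the margin of the sample: the prediction survives at $\epsilon = 1$ and flips at $\epsilon = 3$. This is why a larger average margin translates into higher robust accuracy at a fixed $\epsilon$.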

We also compare our method with deep defense (DD) [36], PGD adversarial training (Adv) [12] and the maximum margin regularizer (MMR) [35], which maximizes the linear region of ReLU networks. The results are shown in Table 2. Compared with standard training, our method can significantly improve model robustness. On MNIST, our method achieves overall better robust accuracy. On CIFAR10, our method achieves results comparable to adversarial training and deep defense. In addition, our method consistently outperforms MMR on both datasets. Moreover, our method is more efficient than adversarial training and deep defense. In fact, adversarial training would increase the training time by a factor equal to the number of PGD iterations (typically tens or hundreds), and deep defense is even more time consuming, to a point where it can only be applied as fine-tuning [36], while the training time of our method is roughly the same as that of standard training.

## 6. Conclusion

In this work, we studied adversarial robustness from a margin perspective. We discovered the intrinsic trade-off between minimum and average margin, which appears across different models and datasets. We gave strong empirical evidence that deep models maximize the minimum margin during training to achieve high accuracy, but at the expense of decreasing the average margin significantly hence becoming more susceptible to adversarial attacks. To address the issue, we designed a new regularizer to explicitly maximize average margin and to retain Fisher consistency. Our extensive experiments confirmed the effectiveness of our regularizer. In the future, we will theoretically analyze the trade-off to provide further insights to the phenomenon. Moreover, we plan to go beyond just maximizing average margin, by controlling other margin distribution statistics such as variance.

## References

• Szegedy et al. [2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. In International Conference on Learning Representations (ICLR), 2014.
• Goodfellow et al. [2015] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
• Moosavi-Dezfooli et al. [2016] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
• Carlini and Wagner [2017] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks, 2017. arXiv:1608.
• Chen et al. [2017] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26, 2017.
• Hashemi et al. [2018] Mohammad Hashemi, Greg Cusack, and Eric Keller. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, pages 25–36, 2018.
• Du et al. [2018] Yali Du, Meng Fang, Jinfeng Yi, Jun Cheng, and Dacheng Tao. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, pages 13–24, 2018.
• Papernot et al. [2017] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519, 2017.
• Zantedeschi et al. [2017] Valentina Zantedeschi, Maria-Irina Nicolae, and Ambrish Rawat. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 39–49, 2017.
• Papernot et al. [2016] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, 2016.
• Cisse et al. [2017] Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning, pages 854–863, 2017.

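• Madry et al. [2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.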
• Goodfellow et al. [2018] Ian Goodfellow, Patrick McDaniel, and Nicolas Papernot. Communications of the ACM, 61(7):56–66, 2018.
• Guo et al. [2018] Yiwen Guo, Chao Zhang, Changshui Zhang, and Yurong Chen. In Advances in Neural Information Processing Systems 31, pages 242–251, 2018.
• Athalye et al. [2018] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, 2018.
• Hein and Andriushchenko [2017] Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems (NIPS), pages 2266–2276, 2017.
• Weng et al. [2018a] Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. In International Conference on Learning Representations (ICLR), 2018a.
• Zhang et al. [2018] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4944–4953, 2018.
• Weng et al. [2018b] Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S. Dhillon, and Luca Daniel. Towards fast computation of certified robustness for relu networks. In ICML, 2018b.
• Tjeng et al. [2019] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations, 2019.
• Singh et al. [2019] Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. Robustness certification with refinement. In International Conference on Learning Representations, 2019.
• Wong et al. [2018] Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J Zico Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems, pages 8400–8409, 2018.
• Wong and Kolter [2018] Eric Wong and J. Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML, 2018.
• Raghunathan et al. [2018a] Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pages 10900–10910, 2018a.
• Raghunathan et al. [2018b] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations, 2018b.
• Singh et al. [2018] Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems 31, pages 10802–10813, 2018.
• Gowal et al. [2018] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. In NeurIPS Workshop on Security in Machine Learning, 2018.
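• Breiman [1999] Leo Breiman. Prediction games and arcing algorithms. Neural Computation, 11(7):1493–1517, 1999.
• Schapire et al. [1998] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, 1998.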
• Garg et al. [2002] Ashutosh Garg, Sariel Har-Peled, and Dan Roth. On generalization bounds, projection profile, and margin distribution. In ICML, 2002.
• Garg and Roth [2003] Ashutosh Garg and Dan Roth. Margin distribution and learning algorithms. In ICML, 2003.
• Zhang and Zhou [2017] Teng Zhang and Zhi-Hua Zhou. Multi-class optimal margin distribution machine. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 4063–4071. JMLR.org, 2017.
• Soudry et al. [2018] Daniel Soudry, Elad Hoffer, and Nathan Srebro. The implicit bias of gradient descent on separable data. In International Conference on Learning Representations, 2018.
• Lee et al. [2019] Guang-He Lee, David Alvarez-Melis, and Tommi S. Jaakkola. Towards robust, locally linear deep networks. In International Conference on Learning Representations, 2019.
• Croce et al. [2019] Francesco Croce, Maksym Andriushchenko, and Matthias Hein. Provable robustness of relu networks via maximization of linear regions. In AISTATS, 2019.
• Yan et al. [2018] Ziang Yan, Yiwen Guo, and Changshui Zhang. Deep defense: Training dnns with improved adversarial robustness. In Advances in Neural Information Processing Systems, pages 419–428, 2018.
• Gunasekar et al. [2018] Suriya Gunasekar, Jason D Lee, Daniel Soudry, and Nati Srebro. Implicit bias of gradient descent on linear convolutional networks. In Advances in Neural Information Processing Systems, pages 9461–9471, 2018.
• Ji and Telgarsky [2019] Ziwei Ji and Matus Telgarsky. Gradient descent aligns the layers of deep linear networks. In International Conference on Learning Representations, 2019.
• Koltchinskii and Panchenko [2002] V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. The Annals of Statistics, 30(1):1–50, 2002.
• Bartlett et al. [2006] Peter L Bartlett, Michael I Jordan, and Jon D McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
• Katz et al. [2017] Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117, 2017.
• Lin et al. [2019] Ji Lin, Chuang Gan, and Song Han. Defensive quantization: When efficiency meets robustness. In International Conference on Learning Representations, 2019.

## Appendix A Proofs

###### Proof of Proposition 1.

Indeed, according to the definitions in (5) and (2) we have

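$$
\begin{aligned}
r(x) &= \inf\{\|z\| : x+z \notin F_{\hat{y}(x)},\ x+z \in \mathcal{X}\} && \text{(17)}\\
&= \inf\{\|(x+z)-x\| : x+z \notin F_{\hat{y}(x)},\ x+z \in \mathcal{X}\} && \text{(18)}\\
&= d\bigl(x, \overline{F}_{\hat{y}(x)}\bigr) = d\bigl(x, \operatorname{bd} F_{\hat{y}(x)}\bigr). && \text{(19)}
\end{aligned}
$$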
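As a concrete instance of Proposition 1, consider a binary linear classifier with the Euclidean norm and an unconstrained input space. The sketch below is only an illustration under those assumptions: the weights `w`, bias `b`, and input `x` are hypothetical, and the closed form `|w·x + b| / ||w||` is the standard point-to-hyperplane distance rather than anything specific to this paper. It checks that a perturbation slightly shorter than this distance keeps the prediction, while a slightly longer one flips it.

```python
import math

# Hypothetical binary linear classifier: predict sign(w.x + b).
w, b = [3.0, 4.0], -1.0
x = [2.0, 1.0]


def score(point):
    return sum(wi * pi for wi, pi in zip(w, point)) + b


def predict(point):
    return 1 if score(point) > 0 else -1


def linear_radius(point):
    """l2 distance from `point` to the decision boundary {u : w.u + b = 0}."""
    return abs(score(point)) / math.hypot(*w)


def move_toward_boundary(point, t):
    """Move `point` a distance t along the shortest path to the boundary."""
    norm_w = math.hypot(*w)
    direction = -1.0 if score(point) > 0 else 1.0
    return [pi + direction * t * wi / norm_w for pi, wi in zip(point, w)]


r = linear_radius(x)
# Just inside the radius the prediction is unchanged; just outside it flips.
same = predict(move_toward_boundary(x, 0.99 * r)) == predict(x)
flipped = predict(move_toward_boundary(x, 1.01 * r)) != predict(x)
```

For nonlinear models no such closed form is available, which is why the distance has to be approximated.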

###### Proof of Theorem 2.

Since both and are bounded below, is also bounded below. Denote . We need to show is strictly greater than for all . We first consider the case . For the case , the proof is similar.

The derivative of at zero is . Thus such that satisfies .

The right hand derivative of at zero is negative, thus there such that satisfies . Notice that is constant when . Thus satisfies .

Notice that . Combining above arguments, there exists a sufficiently small , such that , and . Splitting the interval into and , it is sufficient to show that both

$$\inf_{\alpha \le -\alpha_0} C^{\ell}_{\eta}(\alpha) > m \tag{20}$$

and

$$\inf_{-\alpha_0 \le \alpha \le 0} C^{\ell}_{\eta}(\alpha) > m \tag{21}$$

hold.

For $\alpha \le -\alpha_0$,

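$$
\begin{aligned}
C^{\ell}_{\eta}(\alpha) - C^{\ell}_{\eta}(-\alpha) &= (2\eta-1)\bigl(\ell(\alpha)-\ell(-\alpha)\bigr) && \text{(22)}\\
&\ge (2\eta-1)\bigl(\ell(0)-\ell(-\alpha)\bigr) && \text{(23)}\\
&\ge (2\eta-1)\bigl(\ell(0)-\ell(\alpha_0)\bigr) && \text{(24)}\\
&> 0. && \text{(25)}
\end{aligned}
$$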

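The inequalities (23) and (24) hold because $\ell$ is a decreasing function. The inequality (25) holds because the specific choice of $\alpha_0$ strictly decreases the value of $\ell$ relative to its value at zero, that is, $\ell(\alpha_0) < \ell(0)$. In other words, flipping the sign of $\alpha$ strictly decreases the value of $C^{\ell}_{\eta}$. Thus,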

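$$
\begin{aligned}
\inf_{\alpha \le -\alpha_0} C^{\ell}_{\eta}(\alpha) &\ge (2\eta-1)\bigl(\ell(0)-\ell(\alpha_0)\bigr) + C^{\ell}_{\eta}(-\alpha) && \text{(26)}\\
&\ge (2\eta-1)\bigl(\ell(0)-\ell(\alpha_0)\bigr) + m && \text{(27)}\\
&> m. && \text{(28)}
\end{aligned}
$$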

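For $-\alpha_0 \le \alpha \le 0$, by continuity there exists a minimizer $\alpha^*$ of $C^{\ell}_{\eta}$ on this compact set. If $\alpha^* < 0$, then we can again flip the sign of $\alpha^*$ to get a strictly smaller value. Thus $C^{\ell}_{\eta}(\alpha^*) > C^{\ell}_{\eta}(-\alpha^*) \ge m$. If $\alpha^* = 0$, then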

$$C^{\ell}_{\eta}(0) > C^{\ell}_{\eta}(\alpha_0) \ge m. \tag{29}$$

Hence the theorem holds. ∎
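The sign-flipping identity (22) and the conclusion of the theorem can be checked numerically. The sketch below is a sanity check under stated assumptions, not part of the proof: it takes the conditional risk to be the standard form $C^{\ell}_{\eta}(\alpha) = \eta\,\ell(\alpha) + (1-\eta)\,\ell(-\alpha)$, uses the plain logistic loss as a stand-in for the regularized loss $\ell$, and picks $\eta = 0.7$ and a grid on $[-10, 10]$ arbitrarily.

```python
import math


def logistic(a):
    """Logistic loss log(1 + exp(-a)), written in a numerically stable form."""
    return math.log1p(math.exp(-abs(a))) + max(-a, 0.0)


def cond_risk(eta, a, loss=logistic):
    """Assumed conditional risk: eta * loss(a) + (1 - eta) * loss(-a)."""
    return eta * loss(a) + (1.0 - eta) * loss(-a)


eta = 0.7  # illustrative value with eta > 1/2
grid = [i / 100.0 for i in range(-1000, 1001)]

# Identity (22): C(a) - C(-a) = (2*eta - 1) * (loss(a) - loss(-a)).
max_gap = max(
    abs(
        (cond_risk(eta, a) - cond_risk(eta, -a))
        - (2 * eta - 1) * (logistic(a) - logistic(-a))
    )
    for a in grid
)

# Calibration: restricting to the wrong sign (a <= 0) is strictly worse.
m = min(cond_risk(eta, a) for a in grid)
inf_wrong_sign = min(cond_risk(eta, a) for a in grid if a <= 0)
```

On this grid the restricted infimum is attained at $\alpha = 0$, matching the second case of the proof.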

## Appendix B Training Curves

Figure 3 shows the training loss, training error and test error of logistic regression in Section 3. As training proceeds, the error rate on both the training set and the test set never increases, so the logistic regression model is not overfitting even though it is trained for an excessive number of epochs. This again highlights that the trade-off between minimum and average margin cannot be caused by overfitting.

Figure 4 shows the training curves of the six models in Section 5 under standard training. Although there is a generalization gap between training and test errors, the test errors never increase, so the trade-off between minimum and average margin cannot be caused by overfitting.

## Appendix C Minimum-average Margin Trade-off for MNIST Models

Figure 5 shows the trade-off between minimum and average margin for MNIST models. (In the average-margin panel for MNIST-LR, top left, we shift the training-set curve by a small constant so that the two curves do not overlap; this is for illustration only.) The trade-off is similar to the one discussed in Sections 3 and 5: the minimum margin keeps increasing while the average margin keeps decreasing. The only exception is the average margin of MNIST-CNN, but the range of margin in that figure is very small (from to ), in which case the Lipschitz constant estimation in Theorem 3 could be inaccurate.

## Appendix D Margin Histograms of MNIST Models during Training

We plot margin histograms for three MNIST models in Figure 6.

## Appendix E Margin Histograms of CIFAR Models during Training

We plot margin histograms for three CIFAR models in Figure 7.

## Appendix F Detailed Experiment Setting

Model Architecture:

• MNIST-LR and CIFAR-LR: Multiclass logistic regression.

• MNIST-MLP and CIFAR-MLP: Neural network with one hidden layer. The hidden layer has neurons and ReLU activation.

• MNIST-CNN and CIFAR-CNN: Two AlexNet-like convolutional neural networks with a slight difference. They are exactly the same as the models used by Weng et al. [17] and Carlini and Wagner [4]. Dropout is used to alleviate overfitting on CIFAR10. No data augmentation is used.

• ConvNet: A convolutional neural network with convolutional layers, used in [36].

• LeNet: Consists of two convolutional layers and two fully connected layers.

• LeNetSmall: A convolutional neural network with a structure similar to LeNet, but with fewer filters. This model is the same as the one used in [35]. On CIFAR10, data augmentation (random crop and random flip) is used.

Standard Training: We use SGD (learning rate ) with momentum () and Nesterov acceleration to train all six models. The batch size is set to . Linear models (LR) are trained for epochs. Nonlinear models (MLP, CNN) are trained for epochs.
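For reference, one SGD update with Nesterov momentum can be sketched as follows. This is a minimal illustration, not the authors' training code: it assumes the formulation commonly used in deep learning frameworks (velocity `v = momentum * v + g`, then a step along `g + momentum * v`), and the learning rate, momentum, and quadratic toy objective are hypothetical placeholders, since the actual values are not recoverable from the text above.

```python
def sgd_nesterov_step(params, grads, velocity, lr, momentum):
    """One SGD step with Nesterov momentum on flat lists of floats."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [
        p - lr * (g + momentum * v)
        for p, g, v in zip(params, grads, new_velocity)
    ]
    return new_params, new_velocity


# Toy objective: 0.5 * sum(c_i * p_i^2), with hypothetical curvatures.
curvature = [1.0, 10.0]


def loss(p):
    return 0.5 * sum(c * pi * pi for c, pi in zip(curvature, p))


def grad(p):
    return [c * pi for c, pi in zip(curvature, p)]


lr, momentum = 0.1, 0.9  # hypothetical values, not taken from the paper
params, velocity = [1.0, -1.0], [0.0, 0.0]
initial_loss = loss(params)
for _ in range(200):
    params, velocity = sgd_nesterov_step(
        params, grad(params), velocity, lr, momentum
    )
final_loss = loss(params)
```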

Average Margin Regularizer: and are used for all models, while different ’s are used for different models. is tuned such that it is on the same order as the Lipschitz constant of the model. MNIST-LR and MNIST-MLP use . MNIST-CNN uses . CIFAR-LR uses . CIFAR-MLP and CIFAR-CNN use . The idea is to use a larger for deeper models: the spectral norm of the weights across layers may not be exactly , even after adding the orthogonality constraint, and the product of spectral norms grows as the model becomes deeper. Thus, one should use a larger truncation parameter, since the range of margins in feature space may be larger. Following the same idea, LeNet and LeNetSmall use ; ConvNet uses .

In addition, for fair comparison, we set the same to train models with Lipschitz constant regularization.

Adversarial Training: We use PGD attack to perform adversarial training. We set the number of iterations and step size . On MNIST, the perturbation budget is set to ; on CIFAR, is set to .

Attack Method: Throughout the experiments, we use the projected gradient descent (PGD) attack to evaluate robust accuracy. We set step size and number of iterations . Increasing the number of iterations does not change the robust accuracy much, thus iterations is sufficient to generate strong adversarial examples.
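The PGD loop itself can be sketched in a few lines. The example below is an illustration under several assumptions rather than the evaluation code used here: it attacks a hypothetical binary logistic-regression model instead of a deep network, measures perturbations in the $\ell_\infty$ norm, starts from the clean input without a random restart, clips inputs to $[0, 1]$, and uses placeholder values for the budget `eps`, `step_size`, and `num_iters`.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def loss_grad_x(w, b, x, y):
    """Input gradient of log(1 + exp(-y * score)) for a label y in {-1, +1}."""
    s = y * score(w, b, x)
    coeff = -y / (1.0 + math.exp(s))  # equals -y * sigmoid(-s)
    return [coeff * wi for wi in w]


def pgd_linf(w, b, x, y, eps, step_size, num_iters):
    """l_inf PGD: signed gradient ascent on the loss, then projection."""
    x_adv = list(x)
    for _ in range(num_iters):
        g = loss_grad_x(w, b, x_adv, y)
        x_adv = [xa + step_size * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back onto the eps-ball around x, then onto [0, 1].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


# Hypothetical model, input, and attack parameters.
w, b = [2.0, -1.0, 0.5], -0.2
x, y = [0.5, 0.4, 0.6], 1
eps, step_size, num_iters = 0.1, 0.03, 10

x_adv = pgd_linf(w, b, x, y, eps, step_size, num_iters)
margin_clean = y * score(w, b, x)
margin_adv = y * score(w, b, x_adv)
```

For a linear model the attack simply drives every coordinate to the edge of the budget, so the margin drops by `eps` times the $\ell_1$ norm of `w`; for deep models the gradient sign changes between iterations, which is why several steps are used.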

Approximating Distance: When using Theorem 3 to approximate the distance to the decision boundary, we need to estimate the Lipschitz constant of the network in a small neighbourhood around the input. Following [20], we simply take the maximum norm of the gradient in that neighbourhood. Throughout the experiments, we set and the sampling size equal to (except for linear classifiers, whose sampling size is ). For linear logistic regression, we perform the estimation in every epoch; for nonlinear classifiers, we only perform it at certain epochs: , , , , , , , , and , where we perform more estimations at the beginning, as distances may change drastically in the initial stages. For efficiency, the estimation is performed on a subset of size , which is randomly chosen (with a fixed random seed) from the original training and test sets.
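The estimation procedure described above can be sketched as follows. This is a rough illustration under assumptions that are not taken from the paper: the distance estimate is assumed to have the form margin divided by a local Lipschitz constant, the neighbourhood is taken to be an $\ell_2$ ball, gradients are computed by central finite differences, and the names `approx_distance`, `radius`, and `num_samples` are hypothetical.

```python
import math
import random


def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))


def numerical_grad(f, x, h=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


def sample_in_ball(x, radius, rng):
    """Draw a point uniformly from the l2 ball of given radius around x."""
    d = [rng.gauss(0.0, 1.0) for _ in x]
    n = l2_norm(d) or 1.0
    r = radius * rng.random() ** (1.0 / len(x))
    return [xi + r * di / n for xi, di in zip(x, d)]


def local_lipschitz(f, x, radius, num_samples, rng):
    """Largest gradient norm over x and random points in its neighbourhood."""
    points = [list(x)] + [sample_in_ball(x, radius, rng) for _ in range(num_samples)]
    return max(l2_norm(numerical_grad(f, p)) for p in points)


def approx_distance(margin_fn, x, radius, num_samples, seed=0):
    """Assumed estimate: |margin(x)| / (sampled local Lipschitz constant)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    lipschitz = local_lipschitz(margin_fn, x, radius, num_samples, rng)
    return abs(margin_fn(x)) / lipschitz


# Sanity check on a hypothetical linear margin, where the estimate is exact.
def linear_margin(p):
    return 3.0 * p[0] + 4.0 * p[1] - 1.0


estimate = approx_distance(linear_margin, [2.0, 1.0], radius=0.5, num_samples=20)
```

Because the maximum is taken over finitely many samples, it can underestimate the true local Lipschitz constant, so the resulting distance is an approximation rather than a certified lower bound.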