It has been discovered that deep neural networks (DNNs) are vulnerable to adversarial examples [29, 8, 18], and this phenomenon can prevent them from being deployed in security-sensitive applications. Among the most effective methods for mitigating the issue, adversarial training [29, 8, 18] is capable of resisting a series of malicious examples [18, 30] and yields adversarially robust DNN models in the sense of an norm. By injecting advanced adversarial examples (e.g., generated using BIM or PGD) into training as a form of augmentation, the obtained models learn to defend against these examples. In addition, the obtained models may also resist some other types of adversarial examples (generated using, for example, the fast gradient sign method). However, advanced adversarial examples are typically generated in an iterative manner by back-propagating through deep models multiple times, and thus the mechanism may demand a massive amount of computation.
Another thriving category of methods for hardening DNNs is regularization, which aims to trade off effectiveness and efficiency properly. Although most traditional regularization-based strategies (e.g., weight decay and dropout) do not operate properly in this respect, a variety of recent work [6, 12, 24, 13, 20] has shown that more dedicated and principled regularizations help to gain comparable or only slightly worse performance in improving DNN robustness. Instead of raising a perpetual “arms race”, these regularization-based strategies are in general attack-agnostic and of benefit to the generalization ability and interpretability of learning models. Moreover, the computational and memory complexities of these methods are acceptable even in very large models. It has also been shown that the methods can be combined with adversarial training to achieve even stronger DNN robustness.
While many regularizers have been developed for DNN robustness, there are as yet few comparative analyses of these choices, especially from a theoretical point of view. In this paper, we attempt to shed light on the intrinsic functionality of, and the theoretical connections between, several effective regularizers, even though their formulations may stem from different rationales. Concretely, it has been shown over the past few years that regularizing the Euclidean norm of an input-gradient [24, 17], the Frobenius norm of a Jacobian matrix [13, 27], the spectral norm of a Hessian matrix, and a cross-Lipschitz functional all significantly contribute to the adversarial robustness of DNNs. We analyze all these choices on DNNs with general rectified linear activations, which are ubiquitous in image classification and a host of other machine learning tasks.
Some of our key contributions and observations are:
We present, for the first time, an analytic expression for the norm of an approximately-optimal adversarial perturbation considered in very recent papers [20, 26], to demonstrate that local cross-Lipschitz constants and the prediction probability are its essential ingredients in binary classification cases. In addition to the norm-based results, we also show similar results for the robustness to norm-based attacks.
We unveil that most of the discussed regularizations advocate small local cross-Lipschitz constants in binary classification, except for the Jacobian regularization, which suggests small local Lipschitz constants; yet regularizing the two network properties can be equivalent.
We further demonstrate that critical discrepancies still exist between specific methods, mostly in regularizing the prediction probability/confidence.
We extend some analyses to multi-class classification and verify our findings with experiments.
2 Regularizations Improving Robustness
2.1 Adversarial Phenomenon in DNNs
Given an input instance , a DNN-based classifier offers its prediction along with a softmax-normalized probability for each class on top of a vector representation. Suppose that a set of labeled instances is provided; then a classifier is typically learned with the assistance of an objective function that evaluates the training prediction loss, i.e., the average discrepancy between a set of predictions and the ground truth.
Existing adversarial attacks can be roughly divided into two main categories, i.e., white-box attacks [29, 8] and black-box attacks [22, 5], according to how much information about the victim model is accessible to an adversary. Our study in this paper mainly focuses on white-box non-targeted attacks and defenses against them, in order to be consistent with prior theoretical work. Under such a threat model, substantial efforts have been made to demonstrate the adversarial vulnerability of DNNs [29, 8, 23, 19, 3, 4, 1, 18]. Most of these attacks are proposed within a framework that favors perturbations with the least norms that would still cause the DNNs to make incorrect predictions. That is, an adversary opts to solve
Utilizing the objective function , the task of mounting adversarial attacks can be formulated from a dual perspective, which attempts to maximize the loss with a presumed perturbation magnitude (in the context of norms). That is, given , one may resort to
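In generic notation, the two formulations referred to as (1) and (2) commonly take the following standard forms. The symbols used here (a perturbation $r$, a norm order $p$, a budget $\epsilon$, class probabilities $g_k$, and a loss $L$) are assumed for illustration and may differ from the notation used elsewhere in this paper.

```latex
% (1): the smallest perturbation, in a chosen l_p norm, that changes the prediction
\min_{r}\ \|r\|_p
\quad \text{s.t.} \quad
\arg\max_k\, g_k(x + r) \neq \arg\max_k\, g_k(x)

% (2): the worst-case loss under a fixed perturbation budget \epsilon
\max_{\|r\|_p \le \epsilon}\ L(x + r,\ y)
```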
Omitting box constraints on the image domain, many off-the-shelf attacks [8, 23, 19, 3, 18] can be considered as efficient approximations to either (1) or (2). Under certain circumstances, their solutions can be equivalent to the optimal solutions to (1) or (2). For instance, the fast gradient sign method (FGSM) achieves the optimum of (2) with some binary linear classifiers and . Also, taking any linear model together with , the DeepFool perturbation is theoretically optimal for (1). Training with an augmented set involving adversarial examples, i.e., adversarial training, has been proven to be very effective in improving DNN robustness, regardless of the computational burden.
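The two optimality statements above can be illustrated with a minimal sketch on a hypothetical binary linear classifier with score w·x + b. The names `w`, `b`, `deepfool_linear`, and `fgsm_linear` are illustrative assumptions, and the sketch assumes the Euclidean norm for the DeepFool case and the max norm for the FGSM case. For such a model, the smallest Euclidean perturbation that reaches the decision boundary is the orthogonal projection onto it, and the max-norm-bounded perturbation that changes the score the most is a signed step.

```python
import math

def score(w, b, x):
    """Signed score of a binary linear classifier; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def deepfool_linear(w, b, x):
    """Minimum-l2 perturbation that moves x onto the boundary w.x + b = 0."""
    s = score(w, b, x)
    w_sq = sum(wi * wi for wi in w)
    return [-s * wi / w_sq for wi in w]

def fgsm_linear(w, b, x, eps):
    """l_inf-bounded step of size eps that pushes the score toward the other class."""
    direction = -1.0 if score(w, b, x) >= 0 else 1.0
    return [direction * eps * math.copysign(1.0, wi) for wi in w]

# Hypothetical toy model and input.
w, b, x = [3.0, 4.0], 0.0, [1.0, 1.0]

r2 = deepfool_linear(w, b, x)
l2_norm = math.sqrt(sum(v * v for v in r2))            # equals |score| / ||w||_2
on_boundary = score(w, b, [xi + ri for xi, ri in zip(x, r2)])

rinf = fgsm_linear(w, b, x, eps=0.5)
after_fgsm = score(w, b, [xi + ri for xi, ri in zip(x, rinf)])  # drops by eps * ||w||_1
```

For nonlinear networks neither step is guaranteed to be optimal, which is why iterative attacks are used in practice.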
2.2 Regularizations and Important Notations
A recent study demonstrates the relationship between a classical regularization and adversarial training. It is conceivable that a principled regularization term involved in training suffices to yield DNN models with comparable robustness, and a whole series of methods has been developed on this basis. Unlike many traditional methods, which are normally data-independent (e.g., weight decay and dropout), recent progress conforms closely with theoretical guarantees and focuses mostly on regularizing the loss landscape [17, 27, 12, 24, 13, 20]. Before systematically studying their functionality and relationships in the following sections, we first introduce some important notations.
Given the objective function for classification, we will refer to 1) , as its gradient with respect to (w.r.t.) the input vector , 2) , as the Hessian matrix of , and 3) , as the Jacobian matrix of w.r.t. . It has been shown that training regularized using the Frobenius norm of (dubbed the Jacobian regularization [13, 27]), the Euclidean norm of (i.e., the input-gradient regularization [24, 17]), the spectral norm of (i.e., the curvature regularization), and a cross-Lipschitz functional, as will be elaborated later, all significantly improve the adversarial robustness of the obtained models. We focus on DNNs with general rectified linear units (general ReLUs) [21, 10, 25] as nonlinear activations and analyze them in binary classification and multi-class classification tasks separately in the following sections.
3 Binary Classification
With the background information introduced in the previous section, here we first discuss different regularizations in binary classification DNNs, and we will generalize some of our results to multi-class classifications in Section 4.
For simplicity of notations, let us first consider a multi-layer perceptron (MLP) parameterized by a series of weight matrices, where and in our theories. (We stress that, although a simple MLP is formulated here, our following discussions directly generalize to DNNs with convolutions, poolings, skip-connections , self-attentions , etc.) For a -layer MLP, we have
in which the general ReLU activation of our particular interest is piecewise linear and hence is also piecewise linear. Following prior work, we can define and , for , in which is an diagonal matrix whose main diagonal entries corresponding to nonzero activations within the -th parameterized layer take a value of , and others take a value of . Denoting by the two columns of matrix (i.e., ), we have the two entries of as: and . These two scalars estimate the probability of being sampled from the positive and negative classes, respectively. Since is piecewise linear as analyzed, there exists a polytope to which the input instance belongs and on which is linear, i.e., and
in which is a matrix with its columns .
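The local linearity described above can be checked numerically. The following minimal sketch assumes a bias-free MLP with a leaky-ReLU-style activation (slope 1 for positive pre-activations and an assumed slope of 0.1 otherwise); the function names and the toy weights are hypothetical. It builds the effective matrix by multiplying each weight matrix with the diagonal activation-pattern matrix of its layer and confirms that the network output equals this matrix applied to the input, both at the input itself and at a nearby point that shares the same activation pattern.

```python
def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec_T(W, x):
    """Return W^T x for W stored as a list of rows (shape: inputs x outputs)."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(len(W[0]))]

def general_relu(z, slope=0.1):
    return [v if v > 0 else slope * v for v in z]

def forward(weights, x, slope=0.1):
    """Bias-free MLP: hidden layers use the general ReLU, the last layer is linear."""
    h = x
    for W in weights[:-1]:
        h = general_relu(matvec_T(W, h), slope)
    return matvec_T(weights[-1], h)

def local_linear_map(weights, x, slope=0.1):
    """Matrix M with forward(x') == M^T x' for all x' sharing x's activation pattern."""
    n_in = len(x)
    M = [[1.0 if i == j else 0.0 for j in range(n_in)] for i in range(n_in)]
    h = x
    for W in weights[:-1]:
        z = matvec_T(W, h)
        d = [1.0 if v > 0 else slope for v in z]       # diagonal of the pattern matrix
        MW = matmul(M, W)
        M = [[MW[i][j] * d[j] for j in range(len(d))] for i in range(n_in)]
        h = [v * dj for v, dj in zip(z, d)]
    return matmul(M, weights[-1])

# Hypothetical two-layer network and inputs.
weights = [[[1.0, -1.0], [2.0, 1.0]], [[1.0, 0.0], [-1.0, 2.0]]]
x = [1.0, 0.5]
x_near = [1.01, 0.49]                                  # same activation pattern as x
M = local_linear_map(weights, x)
```

The map `M` is only valid inside the linear region containing `x`; once a pre-activation changes sign, the pattern matrices and hence `M` change.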
3.1 Robustness in Binary Classification
is a threshold for correct and incorrect classifications and its value solely depends on the choice of the loss function (e.g., if the cross-entropy loss is chosen, then ). It follows from DeepFool and others that we may well approximate the constraint with a Taylor series and get bounds for the () magnitude of , as presented in Lemma 3.1 below.
The above lemma establishes connections between the robustness of a DNN and the spectral norm of its Hessian matrix . Though enlightening, the variables , , and in Eq. (6) are heavily entangled so that it is difficult to reveal the functionality of concerned regularizations.
Fortunately, we show that the derived bounds are tight such that they collapse to the same expression in terms of and a local cross-Lipschitz constant  in binary classification with some common choices of the loss function (e.g., the cross-entropy loss and logistic loss). To be concrete, suppose that the cross-entropy loss is adopted, then with the matrix introduced in Eq. (5), we have the following lemma and theorem.
Lemma 3.2 (Simplified expressions for , , and ). Given an instance paired with its label , we have for the Jacobian , input-gradient , and Hessian :
Proposition 3.1 (An analytic expression for ). For the binary classifier with a locally linear and a correctly classified instance , we have
Proposition 3.1 is obtained on the basis of Lemmas 3.1 and 3.2. See our proofs in Appendices A and B, respectively. Similar results can be achieved with the logistic loss (as also demonstrated in the appendix). The decomposition of (i.e., the magnitude of ) in the derived Eq. (8) appears to be more obvious than in Eq. (6), and it can be concluded that , , and jointly affect the magnitude of . Seeing that the value of is determinate w.r.t. , the prediction probability and become the only dominating ingredients. Let us define which is in fact a local cross-Lipschitz constant of for better clarity. Even though might as well be influential to the prediction probability , we discuss them separately here, considering that the latter can still be optimized with any presumed value of the former. (Footnote 1: Within linear networks where all instances share the same (data-independent) and , we can still have different prediction probabilities for different input instances.)
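To make the roles of the prediction probability and the cross-Lipschitz constant concrete, the following minimal sketch works under stated assumptions: a two-class softmax over locally linear logits with weight vectors w1 and w2, trained with the cross-entropy loss. Writing p for the probability of the true class and c for the Euclidean norm of w1 − w2 (both symbols are assumed here), the standard softmax and cross-entropy calculus gives an input-gradient of norm (1 − p)·c and a curvature of p(1 − p)·c² along the direction w1 − w2. The code checks both against finite differences; it illustrates the kind of simplification discussed above rather than reproducing Eq. (7) or Eq. (8).

```python
import math

def probs(W, x):
    """Two-class softmax over logits z_k = w_k . x, with W = [w1, w2]."""
    z = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

def loss(W, x, y):
    """Cross-entropy loss of the instance x with label y (0 or 1)."""
    return -math.log(probs(W, x)[y])

def grad_closed_form(W, x, y):
    """Input-gradient of the loss: (1 - p_y) * (w_other - w_y)."""
    p = probs(W, x)[y]
    return [(1.0 - p) * (wo - wy) for wo, wy in zip(W[1 - y], W[y])]

def grad_finite_diff(W, x, y, h=1e-6):
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((loss(W, xp, y) - loss(W, xm, y)) / (2 * h))
    return g

def curvature_along(W, x, y, u, h=1e-3):
    """Second directional derivative of the loss along the unit vector u."""
    xp = [xi + h * ui for xi, ui in zip(x, u)]
    xm = [xi - h * ui for xi, ui in zip(x, u)]
    return (loss(W, xp, y) - 2 * loss(W, x, y) + loss(W, xm, y)) / (h * h)

# Hypothetical weights and instance.
W = [[1.0, 2.0], [-1.0, 0.5]]
x, y = [0.3, 0.2], 0
p = probs(W, x)[y]
c = math.sqrt(sum((a - b) ** 2 for a, b in zip(W[0], W[1])))   # ||w1 - w2||_2
g = grad_closed_form(W, x, y)
g_norm = math.sqrt(sum(v * v for v in g))                      # (1 - p) * c
u = [(a - b) / c for a, b in zip(W[0], W[1])]
curv = curvature_along(W, x, y, u)                             # about p * (1 - p) * c**2
```

In this toy setting the distance from `x` to the decision boundary is `math.log(p / (1 - p)) / c`, which shows in a simple way how a larger p and a smaller c both enlarge the perturbation needed to flip the prediction.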
It is easy to verify that holds for all , in a special case of . Yet, for , the general impact of the prediction probability in Eq. (8) is still obscure. To gain direct insights, we depict how varies with on the right panel of Figure 1, given specific values. We observe that, in general, a larger implies a larger and thus lower vulnerability of a classification model, provided that the magnitude of is a reasonable measure of the robustness and (or equivalently, ). See also the left panel of the figure for an illustration with approaching from above, being equal to , and being equal to .
Our theoretical result in Proposition 3.1 gives rise to a formal guarantee of the robustness for piecewise linear DNNs, without much concern about the accuracy of the Taylor approximation. Regarding the adversarial robustness to some other norm-based attacks, we have similar results in this paper. The case is of special interest, as it has been widely considered in practical attacks. Propositions 3.3 and 3.2 provide results from different viewpoints in correspondence to (2) and (1), i.e., by bounding the worst-case loss with any fixed and by providing an analytic expression for the norm of . (Footnote 2: Note that we probably have .)
Proposition 3.2 (An analytic expression for ). For the binary classifier with a locally linear and a correctly classified instance , we have
Proposition 3.3 (An upper bound of ). For the binary classifier with a locally linear and a correctly classified input instance , we have satisfying , it holds that
3.2 Regularizations in Binary Classification
Besides Proposition 3.1, some intriguing corollaries can also be derived from Lemma 3.2. First, the direction of the input-gradient vector is the same as that of the first eigenvector (i.e., the one corresponding to the largest eigenvalue) of matrix . Second, we can derive and , which means that we further have simple analytic expressions for the concerned regularizers as:
in which calculates the spectral norm (i.e., the matrix norm) of , is a hyper-parameter, and denotes which is apparently a local Lipschitz constant of . Third, it holds that (i.e., ), and thus we get a chained inequality of the regularizers. Without loss of generality, we write the regularizers in squared forms in Eq. (11) for direct comparison.
One might have noticed that and are also the only ingredients in two of the regularizers in Eq. (11). In the remainder of this subsection, we shall discuss and highlight that: (1) the input-gradient regularization and curvature regularization both enforce suppression of , which is in principle consistent with a cross-Lipschitz regularization ; (2) though the Jacobian regularization focuses on instead of , there probably exists an underlying equivalence between penalizing scaled and ; (3) critical discrepancies still exist amongst these regularizations, mostly about .
Cross-Lipschitz vs. Lipschitz: With clear expressions in Eq. (7) and (11), we know that the input-gradient regularization and curvature regularization are similar to a cross-Lipschitz regularization that penalizes (interested readers can refer to Section G for rigorous analyses), while the Jacobian regularization penalizes (with a local Lipschitz constant ) and it boils down to weight decay in single-layer perceptrons and linear classifiers. Although it seems as if the Jacobian regularization were different from the others, in light of the Parseval tight frame and Parseval networks, we conjecture nonetheless that there exists an equivalence between penalizing scaled (as with the cross-Lipschitz regularization, input-gradient regularization, and curvature regularization) and (as with the Jacobian regularization). To shed light on this, more discussions are given as follows.
First and foremost, it is self-evident that the inequality
holds, thus one might argue that adopting the Jacobian regularization also implies small in obtained models as with the cross-Lipschitz regularization. Second, for single-layer perceptrons, we can easily verify that the function is convex, and thus the Jacobian regularized training loss is strongly convex w.r.t. . Considering that the columns of can be processed simultaneously by adding/subtracting a vector whilst the classification decision and cross-entropy loss do not change, we have for the optimal and an equivalence is achieved between penalizing and through derivation. The result naturally generalizes to DNNs with locally linear (i.e., DNNs with general ReLU activations) of our interest, if only the final layer is to be optimized. The following proposition makes this formal, and the proof can be found in Appendix D.
Proposition (A derived equivalence). For a single-layer perceptron or a piecewise linear DNN in which only the final layer parameterized by is to be optimized, we have the equivalence:
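The shift argument behind this equivalence can be sketched numerically. The example below assumes a final layer with two weight vectors w1 and w2 (hypothetical values) and uses the elementary facts that subtracting a common vector from both leaves w1 − w2, and hence the two-class softmax output, unchanged, and that the squared Frobenius norm is minimized over such shifts when w1 = −w2, where it equals half of the squared norm of w1 − w2. It also checks the general bound that the squared norm of w1 − w2 never exceeds twice the squared Frobenius norm; this bound is stated here as a standard fact and is not claimed to be the exact form of the inequality referenced above.

```python
def frob_sq(W):
    """Squared Frobenius norm of W = [w1, w2]."""
    return sum(v * v for w in W for v in w)

def cross_sq(W):
    """Squared Euclidean norm of w1 - w2."""
    return sum((a - b) ** 2 for a, b in zip(W[0], W[1]))

def shift(W, v):
    """Subtract the same vector v from both weight vectors."""
    return [[a - b for a, b in zip(w, v)] for w in W]

def centered(W):
    """Shift by the mean of w1 and w2, so that the result satisfies w1 = -w2."""
    mean = [(a + b) / 2.0 for a, b in zip(W[0], W[1])]
    return shift(W, mean)

# Hypothetical final-layer weights.
W = [[3.0, 1.0], [1.0, -1.0]]
Wc = centered(W)

same_difference = cross_sq(Wc) == cross_sq(W)        # the shift keeps w1 - w2 intact
bound_holds = cross_sq(W) <= 2.0 * frob_sq(W)        # ||w1 - w2||^2 <= 2 ||W||_F^2
tight_at_center = frob_sq(Wc) == cross_sq(W) / 2.0   # equality once w1 = -w2
```

Under these assumptions, penalizing the squared Frobenius norm with a given weight acts at the centered optimum like penalizing the squared norm of w1 − w2 with half that weight.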
In addition to the above results, we further show that the two regularizations can lead to the same gradient flow in certain scenarios. One example in which this can be demonstrated is when the first feature is uncorrelated with the label and the other features are normally distributed with the mean value being proportional to (i.e., they are weakly correlated with the label). We let and approach the Bayes error rate. Under such circumstances, the two regularizations initialized from the Bayes classifier share the same gradient flow for their matrices, provided that a smaller penalty is applied to than to , as in Eq. (13).
To test whether the revealed equivalence generalizes to practical scenarios, we conducted an experiment on distinguishing the digit “7” from “1” using MNIST images. Our experimental settings and many more details are introduced in Appendix F. As suggested in [13, 20], we first trained baseline models from scratch without any explicit regularization, then fine-tuned the models using different regularization strategies and evaluated the obtained adversarial robustness to FGSM, PGD, DeepFool, and the C&W attack. We trained MLPs and convolutional networks with ReLU nonlinearity following the “LeNet-300-100” and “LeNet-5” architectures in prior work. Figure 2 compares the performance of regularizations incorporating and . With varying , it can be seen that the regularized models show similar robustness in almost all test cases. Similar results on CIFAR-10 with ResNets and VGG-like networks can be found in Appendix F.
“Confidence” in regularizations: Apart from suppressing the local (cross-)Lipschitz constants, the input-gradient regularizer and curvature regularizer both involve the prediction probability in Eq. (11), though with different objectives. By incorporating , the input-gradient regularization encourages model predictions with high confidence. If is fixed, then the -related term in the input-gradient regularizer acts as an additional prediction loss during training. It has larger penalties and slopes (in absolute value) for the training instances with relatively smaller , i.e., lower confidence. Similarly, we know that the curvature regularization involves and advocates large as well. However, as depicted by the green curve in Figure 3, the function exhibits a larger absolute value of slope at predictions with higher confidence, which is different from but consistent with the preference of as shown in the right panel of Figure 1. As for the cross-Lipschitz regularizer and Jacobian regularizer, no -related term is explicitly involved whatsoever. (Footnote 4: See Eq. (11); the “regularizer” means the regularization term itself in this paper. Note that the cross-entropy term of course involves the prediction probability.)
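The contrast between the two confidence-related terms can be illustrated with a small sketch. It assumes, for illustration only, that the probability-dependent factor of the squared input-gradient regularizer is (1 − p)² and that of the curvature regularizer is p(1 − p), where p denotes the predicted probability of the true class; these are the factors that arise for a two-class softmax with the cross-entropy loss, and they are not claimed to be the exact terms of Eq. (11). The helper names are hypothetical.

```python
def grad_factor(p):
    """Assumed confidence factor of the squared input-gradient regularizer."""
    return (1.0 - p) ** 2

def curv_factor(p):
    """Assumed confidence factor of the curvature regularizer."""
    return p * (1.0 - p)

def slope(f, p, h=1e-6):
    """Central finite-difference estimate of df/dp."""
    return (f(p + h) - f(p - h)) / (2 * h)

low_conf, high_conf = 0.6, 0.9   # two correctly classified instances

# Both factors decrease as p grows on (0.5, 1), so both reward confident predictions.
grad_slopes = (slope(grad_factor, low_conf), slope(grad_factor, high_conf))
curv_slopes = (slope(curv_factor, low_conf), slope(curv_factor, high_conf))
```

With these assumed factors, the input-gradient term pulls hardest on low-confidence instances (its slope is −2(1 − p)), whereas the curvature term pulls hardest on instances that are already confident (its slope is 1 − 2p).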
Although it is unclear which of the tactics would be the most suitable one in practice, different choices evidently perform differently; otherwise we would have obtained functional equivalence for all these contestants. In order to figure out the best one in practice, we compared the robustness achieved via input-gradient regularization and curvature regularization empirically with our results using the cross-Lipschitz regularization and Jacobian regularization. As shown in Figure 4, the recently developed curvature regularization surpasses all its competitors with reasonably large values, showing the superiority of its specific tactic of handling confident predictions. Notice that we retain the same numerical ranges of axes in Figure 4 as in Figure 2, so some newly drawn curves (for curvature regularization) in Figure 4 may extend beyond the plotted range.
4 Multi-class Classification
This section focuses on multi-class classification tasks. The notations are mostly the same as those in the binary classification. Suppose there are possible labels for an instance, i.e., and , then for the discussed general ReLU networks, we have . Similarly, there exists a polytope to which the input instance belongs and on which the network is linear, i.e.
in which is a matrix with its -th column . For the properties of DNNs that are considered in the regularization strategies, we have the following lemma.
Lemma 4.1 (Simplified expressions for , , and in multi-class classification). Given an input instance paired with the one-hot representation of its label , we have for , , and :
The above expressions for and seem more complex and different from those given in Lemma 3.2. In particular, the local cross-Lipschitz constant seems absent in (for the input-gradient regularizer). Furthermore, on account of the difficulty of decomposing the Hessian matrix , one might not have an analytic expression for its spectral norm and , in which the concerned adversarial perturbation is defined similarly as in binary classification in Section 3.1, involving a threshold value of rather than . To give insights as in the binary classification scenarios, we first derive bounds for the magnitude of the adversarial perturbation .
Proposition 4.1 (Lower bounds of in multi-class classification). For the multi-class classifier with a locally linear and a correctly classified instance , we have the bounds
Considering that the value of is determinate w.r.t. the prediction probability , we can conclude from Eq. (17) that the essential ingredients of such a lower bound are and (i.e., the Frobenius norm of an matrix ). Likewise, we can easily verify that is a local Lipschitz constant of . Somewhat unsurprisingly, a property considered in the cross-Lipschitz regularization defined as  is involved in the other derived lower bound as given in Eq. (16). The results show that the local (cross-)Lipschitz constants and prediction probability are possibly still the essential ingredients of . Apart from Proposition 4.1, we further know that the chained inequality holds by derivations from Lemma 4.1. More discussions similar to those made for binary classification in Section 3.2 will be given in Appendix E (right after the proof).
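For the multi-class setting of this section, the input-gradient of the cross-entropy loss can be checked numerically under stated assumptions: a softmax over locally linear logits with one weight vector w_k per class (hypothetical values below). The standard softmax calculus gives the gradient as the sum over classes of (p_k − y_k)·w_k, which can be rewritten as the sum over the wrong classes of p_k·(w_k − w_y), so pairwise differences of weight vectors still appear even though they are not visible in the first form. The sketch compares both forms with finite differences and is an illustration, not a reproduction of Lemma 4.1.

```python
import math

def softmax_probs(W, x):
    """Softmax over logits z_k = w_k . x, with W a list of class weight vectors."""
    z = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

def loss(W, x, y):
    return -math.log(softmax_probs(W, x)[y])

def grad_matrix_form(W, x, y):
    """Sum over all classes k of (p_k - 1[k == y]) * w_k."""
    p = softmax_probs(W, x)
    return [sum((p[k] - (1.0 if k == y else 0.0)) * W[k][i] for k in range(len(W)))
            for i in range(len(x))]

def grad_pairwise_form(W, x, y):
    """Sum over the wrong classes k != y of p_k * (w_k - w_y)."""
    p = softmax_probs(W, x)
    return [sum(p[k] * (W[k][i] - W[y][i]) for k in range(len(W)) if k != y)
            for i in range(len(x))]

def grad_finite_diff(W, x, y, h=1e-6):
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((loss(W, xp, y) - loss(W, xm, y)) / (2 * h))
    return g

# Hypothetical three-class weights and instance.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x, y = [0.5, -0.2], 0
g_matrix = grad_matrix_form(W, x, y)
g_pairwise = grad_pairwise_form(W, x, y)
g_numeric = grad_finite_diff(W, x, y)
```

The two closed forms agree because the class probabilities sum to one.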
As in binary classification, we aim to study possible connections between regularizations penalizing a squared local Lipschitz constant and . Experimental results are given to show an approximate equivalence. The same MLP and convolutional architectures were adopted. Similar to the binary classification experiments, we trained multiple baseline models for each considered architecture and fine-tuned them using different regularizations. The same training and test policies were also kept. We report the average results of obtained model robustness to FGSM, PGD, DeepFool, and the C&W attack in Figure 5. It can be seen that the Jacobian regularization and cross-Lipschitz regularization still perform similarly across all tested values, except for the ones that are too large to keep the models numerically stable. NaNs were produced in the Jacobian-regularized LeNet-5 if was further enlarged.
5 Conclusion
This paper aims at exploring and analyzing possible connections between recent network-property-based regularizations for improving the adversarial robustness of DNNs. While the empirical effectiveness of appropriate regularizations has been demonstrated in prior art [12, 24, 13, 20], a systematic understanding of their intrinsic functionality and connections is still lacking. We made some comparative analyses among these regularizations and our achievements include:
We have analyzed regularizations on DNNs with ReLU activations from a theoretical perspective.
We have presented analytic expressions for the and magnitudes of some approximately-optimal adversarial perturbations, and we have shown that the local cross-Lipschitz constants and prediction probability are their essential ingredients in binary classification.
We have demonstrated that the regularizations suggest either small Lipschitz constants or small cross-Lipschitz constants, and that regularizing them can be equivalent. Yet, critical discrepancies still exist between specific regularizations, mostly in handling the prediction probability.
We have verified that the curvature regularization considered in a very recent paper shows the most promising performance, and we have extended some of our analyses to multi-class classification and verified our findings with experiments.
References
[1] (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In ICML.
[2] (2015) Neural machine translation by jointly learning to align and translate. In ICLR.
[3] (2017) Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy.
[4] (2018) EAD: elastic-net attacks to deep neural networks via adversarial examples. In AAAI.
[5] (2017) ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26.
[6] (2017) Parseval networks: improving robustness to adversarial examples. In ICML.
[7] Double backpropagation increasing generalization performance. In IJCNN.
[8] (2015) Explaining and harnessing adversarial examples. In ICLR.
[9] (2018) Sparse DNNs with improved adversarial robustness. In NeurIPS.
[10] Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In CVPR.
[11] (2016) Deep residual learning for image recognition. In CVPR.
[12] (2017) Formal guarantees on the robustness of a classifier against adversarial manipulation. In NeurIPS.
[13] (2018) Improving DNN robustness to adversarial attacks using Jacobian regularization. In ECCV.
[14] (1992) A simple weight decay can improve generalization. In NeurIPS.
[15] (2017) Adversarial machine learning at scale. In ICLR.
[16] (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
[17] (2015) A unified gradient regularization family for adversarial examples. In ICDM.
[18] Towards deep learning models resistant to adversarial attacks. In ICLR.
[19] (2016) DeepFool: a simple and accurate method to fool deep neural networks. In CVPR.
[20] (2019) Robustness via curvature regularization, and vice versa. In CVPR.
[21] Rectified linear units improve restricted Boltzmann machines. In ICML.
[22] (2017) Practical black-box attacks against machine learning. In Proceedings of the Asia Conference on Computer and Communications Security.
[23] (2016) The limitations of deep learning in adversarial settings. In Proceedings of the IEEE European Symposium on Security and Privacy.
[24] (2018) Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In AAAI.
[25] Understanding and improving convolutional neural networks via concatenated rectified linear units. In ICML.
[26] (2019) Adversarial vulnerability of neural networks increases with input dimension. In ICML.
[27] (2017) Robust large margin deep neural networks. IEEE Transactions on Signal Processing 65 (16), pp. 4265–4280.
[28] (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958.
[29] (2014) Intriguing properties of neural networks. In ICLR.
[30] (2018) Ensemble adversarial training: attacks and defenses. In ICLR.
[31] Robustness may be at odds with accuracy. In ICLR.