Neural networks are vulnerable to adversarial attacks. These are small (imperceptible to the human eye) perturbations of an image which cause a network to misclassify the image (Biggio et al., 2013; Szegedy et al., 2013; Goodfellow et al., 2014). The threat posed by adversarial attacks must be addressed before these methods can be deployed in error-sensitive and security-based applications (Potember, 2017).
Building adversarially robust models is an optimization problem with two objectives: (i) maintain test accuracy on clean unperturbed images, and (ii) be robust to large adversarial perturbations. The present state-of-the-art method for adversarial defence, adversarial training (Szegedy et al., 2013; Goodfellow et al., 2014; Tramèr et al., 2018; Madry et al., 2017; Miyato et al., 2018), in which models are trained on perturbed images, offers robustness at the expense of test accuracy (Tsipras et al., 2018). It is not clear that multi-step adversarial training is scaleable to large datasets such as ImageNet-1k (Deng et al., 2009). Previous attempts (Kannan et al., 2018; Xie et al., 2018) used hundreds of GPUs and took nearly a week to train, although recent work by Shafahi et al. (2019) has offered a remedy.
Assessing the empirical effectiveness of an adversarial defence requires careful testing with multiple attacks (Goodfellow et al., 2018). Furthermore, existing defences are vulnerable to new, stronger attacks: Carlini and Wagner (2017a) and Athalye et al. (2018) advocate designing specialized attacks to circumvent prior defences, while Uesato et al. (2018) warn against using weak attacks to evaluate robustness. This has led the community to develop theoretical
tools to certify adversarial robustness. Several certification approaches have been proposed: through linear programming (Wong and Kolter, 2018; Wong et al., 2018) or mixed-integer linear programming (Xiao et al., 2018); semi-definite relaxation (Raghunathan et al., 2018a,b); randomized smoothing (Li et al., 2018; Cohen et al., 2019); or estimates of the local Lipschitz constant (Hein and Andriushchenko, 2017; Weng et al., 2018; Tsuzuku et al., 2018). The latter two approaches have scaled well to ImageNet-1k.
In practice, certifiably robust networks often perform worse than adversarially trained models, which lack theoretical guarantees. In this article, we work towards bridging the gap between theoretically robust networks and empirically effective training methods. Our approach relies on minimizing a loss regularized against large input gradients, measured in the norm dual to the one measuring adversarial attacks (for example, the ℓ1 norm for attacks measured in the ℓ∞ norm). Heuristically, making loss gradients small should make gradient-based attacks more challenging.
Drucker and LeCun (1991) implemented gradient regularization using ‘double backpropagation’, which has been shown to improve model generalization (Novak et al., 2018). It has been used to improve the stability of GANs (Roth et al., 2017; Nagarajan and Kolter, 2017) and to promote learning robust features with contractive auto-encoders (Rifai et al., 2011). While it has been proposed as a defence against adversarial attacks (Ross and Doshi-Velez, 2018; Roth et al., 2018; Hein and Andriushchenko, 2017; Jakubovitz and Giryes, 2018; Simon-Gabriel et al., 2018), experimental evidence has been mixed; in particular, input gradient regularization has so far not been competitive with multi-step adversarial training.
On non-smooth networks (such as those built of ReLUs) small gradients are no guarantee of adversarial robustness (Papernot et al., 2017), and so it is thought input gradient regularization should not be effective on non-smooth networks. This raises the question: how often is the lack of smoothness an issue, in practice? In other words, when do Taylor approximations of the loss fail to predict adversarial robustness, and is smoothness only needed theoretically? The fact that first-order gradient-based attacks on the loss (like PGD (Madry et al., 2017)) are usually effective indicates that in many scenarios, non-smoothness is not an issue. However, in a non-negligible minority of cases, attacks based on decision boundary information (Carlini and Wagner, 2017b; Brendel et al., 2018; Chen and Jordan, 2019; Finlay et al., 2019) outperform gradient-based attacks. This indicates the curvature near these points is large, and first-order information is not sufficient to guarantee robustness. We illustrate this point in Figure 1. In this work we overcome the limitation of gradient regularization for non-smooth networks by instead building networks of ‘smooth’ ReLUs. At the expense of a minor drop in test accuracy, we obtain tighter theoretical lower bounds on robustness, since we can better approximate the loss using local information.
Another drawback of input gradient regularization is that it is not presently tractable to update model weights using double backpropagation on large networks. We circumvent this limitation by differentiating the regularization term without double backpropagation.
Our main contributions are the following. First, we motivate using input gradient regularization of the loss by deriving new theoretical robustness bounds. These bounds show that small loss gradients and small curvature are sufficient conditions for adversarial robustness. Second, we empirically show that input gradient regularization is competitive with adversarial training, even on non-smooth networks, at a fraction of the training time. Finally, we scale input gradient regularization to ImageNet-1k by using finite differences to estimate the gradient regularization term, rather than double backpropagation. This allows us to train adversarially robust networks on ImageNet-1k in 33 hours on four consumer grade GPUs.
2 Adversarial robustness bounds from the loss
Much effort has been directed towards determining theoretical lower bounds on the minimum sized perturbation necessary to perturb an image so that it is misclassified by a model. One promising approach, proposed by Hein and Andriushchenko (2017) and Weng et al. (2018), and which has scaled well to ImageNet-1k, is to use the Lipschitz constant of the model. In this section, we build upon these ideas: we propose using the Lipschitz constant of a suitable loss, designed to measure classification errors. In addition, when the loss is twice continuously differentiable, we propose a second-order bound based on the maximum curvature of the loss.
Our notation is as follows. Write f(x; w) for a model which takes input vectors x to label probabilities, with parameters w. Let ℓ denote the loss, and write ℓ(x) for the loss of the model f at input x, suppressing the dependence on the correct label and on the parameters w.
Finding an adversarial perturbation is interpreted as a global minimization problem: find the closest image to a clean image, in some specified norm, that is also misclassified by the model.
However, (2) is a difficult and costly non-smooth, non-convex optimization problem. Instead, Goodfellow et al. (2014) proposed solving a surrogate problem: find a perturbation of a clean image that maximizes the loss, subject to the condition that the perturbation lie inside a norm-ball of radius ε around the clean image. The surrogate problem is written
The hard constraint forces perturbations to be inside the norm-ball of radius ε centred at the clean image x. Ideally, solutions of this surrogate problem (3) will closely align with solutions of the original, more difficult global minimization problem (2). However, the hard constraint in (3) forces a particular scale: it may miss attacks which would succeed with only a slightly larger norm. Additionally, the maximization problem (3) does not force misclassification; it only asks that the loss be increased.
The advantage of (3) is that it may be solved with gradient-based methods: present best practice is to use variants of projected gradient descent (PGD), such as the iterative fast gradient sign method (Kurakin et al., 2016; Madry et al., 2017) when attacks are measured in the ℓ∞ norm. However, gradient-based methods are not always effective: on non-smooth networks, such as those built of ReLU activation functions, a small gradient does not guarantee that the loss remains small locally. This deficiency was identified in (Papernot et al., 2016). See Figure 1: the loss may increase rapidly under a very small perturbation, even when local gradients are small. PGD methods will fail to locate these worst-case perturbations, and give a false impression of robustness. Carlini and Wagner (2017b) avoid this scenario by incorporating decision boundary information into the loss; others solve (2) directly (Brendel et al., 2018; Chen and Jordan, 2019; Finlay et al., 2019).
2.2 Derivation of lower bounds
This leads us to consider the following compromise between (2) and (3). Consider the following modification of the Carlini and Wagner (2017b) loss, ℓ(x) = max_{i≠c} f_i(x) − f_c(x), where c is the index of the correct label, and f_i(x) is the model output for the i-th label. This loss has the appealing property that the sign of the loss determines whether the classification is correct. Adversarial attacks are found by minimizing

  min_x ‖x − x₀‖ subject to ℓ(x) ≥ ℓ_adv. (4)
The constant ℓ_adv determines when classification is incorrect; for the modified Carlini–Wagner loss, ℓ_adv = 0. Problem (4) is closer to the true problem (2), and will always find an adversarial image. We use (4) to derive theoretical lower bounds on the minimum size of perturbation necessary to misclassify an image. Suppose the loss is L-Lipschitz with respect to the model input. Then we have the estimate

  |ℓ(x) − ℓ(x₀)| ≤ L ‖x − x₀‖. (5)
Now suppose x is adversarial, with minimum adversarial loss ℓ_adv. Then rearranging (5), we obtain the lower bound

  ‖x − x₀‖ ≥ (ℓ_adv − ℓ(x₀)) / L.
Unfortunately, the Lipschitz constant is a global quantity, and ignores local gradient information; see for example Huster et al. (2018). Thus this bound can be quite poor, even when networks have a small Lipschitz constant. On the other hand, if the model is twice continuously differentiable, then the loss landscape is smoother. This allows us to achieve a tighter bound, using local gradient information, as illustrated in Figure 1. Let

  C = max_x λ_max(∇²ℓ(x)) (6)

be an upper bound on the maximum positive eigenvalue of the Hessian of the loss over all inputs x. This value will be estimated empirically by maximizing over the dataset. The constant C is a measure of the largest positive curvature of the network. Using a Taylor approximation about x₀, we may upper bound the perturbed loss with

  ℓ(x) ≤ ℓ(x₀) + ∇ℓ(x₀) · (x − x₀) + (C/2) ‖x − x₀‖². (7)
These two bounds give us the following.
Proposition 2.1. Suppose the loss ℓ is Lipschitz continuous with respect to the model input x, with Lipschitz constant L. Let ℓ_adv be such that if ℓ(x) < ℓ_adv, the model is always correct. Then a lower bound on the minimum magnitude of perturbation necessary to adversarially perturb an image x₀ is

  ‖x − x₀‖ ≥ (ℓ_adv − ℓ(x₀)) / L. (L-bound)
Suppose in addition that the loss is twice differentiable, with maximum curvature C (defined as in (6)). Then

  ‖x − x₀‖ ≥ (1/C) (−‖∇ℓ(x₀)‖ + √(‖∇ℓ(x₀)‖² + 2C(ℓ_adv − ℓ(x₀)))). (C-bound)
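As a concrete illustration, both bounds can be evaluated directly from scalar statistics of the loss: the loss gap ℓ_adv − ℓ(x₀), the gradient norm at the clean image, and the constants L and C. The sketch below is ours (function names and notation are not from the paper):

```python
import math

def lipschitz_bound(loss_gap, L):
    """First-order bound: ||x - x0|| >= loss_gap / L,
    where loss_gap = ell_adv - ell(x0) and L is the Lipschitz
    constant of the loss with respect to the model input."""
    return loss_gap / L

def curvature_bound(grad_norm, loss_gap, C):
    """Second-order bound from the Taylor estimate:
    ||x - x0|| >= (-||grad|| + sqrt(||grad||^2 + 2*C*loss_gap)) / C,
    where C bounds the largest positive Hessian eigenvalue."""
    return (-grad_norm + math.sqrt(grad_norm ** 2 + 2 * C * loss_gap)) / C
```

Note that as the gradient norm tends to zero, the second-order bound tends to √(2 · loss_gap / C), so small curvature directly enlarges the certified radius.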
The second-order bound requires that the network and loss are smooth with respect to the input, but almost all image classification networks now use ReLUs, which are not smooth. We instead use a smoothed ReLU. This activation function is twice continuously differentiable, and avoids the vanishing gradient problem of smooth sigmoidal activation functions. Moreover, because it agrees with the ReLU outside of a small interval about the origin, it is fairly efficient during backpropagation. As for the loss, a smooth version of the Carlini–Wagner loss is available by using a soft maximum, rather than a strict maximum.
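The exact smoothing used in the text is not reproduced above, but one standard C² construction with the stated properties (our illustrative choice, not necessarily the paper's formula) replaces the ReLU on [−h, h] by the antiderivative of a smoothstep, so the function and its first two derivatives are continuous, and it matches max(x, 0) exactly outside the interval:

```python
def smooth_relu(x, h=0.5):
    """A twice continuously differentiable ReLU surrogate (illustrative).

    Agrees exactly with max(x, 0) for |x| >= h. On [-h, h] it is the
    antiderivative of the C^1 smoothstep s(t) = 3t^2 - 2t^3 applied to
    t = (x + h) / (2h), which makes the join at x = -h and x = h C^2:
    values, first, and second derivatives all match.
    """
    if x <= -h:
        return 0.0
    if x >= h:
        return x
    t = (x + h) / (2 * h)               # rescale [-h, h] to [0, 1]
    return 2 * h * (t ** 3 - t ** 4 / 2)  # integral of 3t^2 - 2t^3, rescaled
```

At x = h this evaluates to h with slope 1 and zero second derivative, matching the ReLU branch; at x = −h all three quantities vanish, matching the zero branch.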
Proposition 2.1 motivates the need for input gradient regularization. The Lipschitz constant L is the maximum gradient norm of the loss over all inputs. Therefore the (L-bound) says that a regularization term encouraging small gradients (and so reducing L) should increase the minimum adversarial distance. This aligns with Hein and Andriushchenko (2017), who proposed the cross-Lipschitz regularizer, penalizing networks with large Jacobians in order to shrink the Lipschitz constant of the network.
However, this is not enough: the loss gap ℓ_adv − ℓ(x₀) must be large as well. This explains one form of ‘gradient masking’ (Papernot et al., 2017): shrinking the magnitude of gradients while also closing the loss gap does nothing to improve adversarial robustness. For example, in defensive distillation, the magnitude of the model Jacobian is reduced by increasing the temperature of the final softmax layer of the network. However, this has the detrimental side effect of sending the model output to the uniform vector (1/K, …, 1/K), where K is the number of classes, which effectively shrinks the loss gap to zero. Thus with high distillation temperatures the lower bound provided by Proposition 2.1 approaches zero.
Moreover, even supposing the loss gradients are small and the gap is large, there may still be adversarially vulnerable images. For example, suppose we have two smooth networks, one with large curvature and another with small curvature, and suppose that there is an image with zero gradient on both networks, each with identically large loss gaps ℓ_adv − ℓ(x₀). The second-order (C-bound) says that the minimum adversarial distance here is bounded below by √(2(ℓ_adv − ℓ(x₀))/C). In other words, the network with smaller curvature is more robust.
Taken together, Proposition 2.1 provides three sufficient conditions for training robust networks: (i) the loss gap should be large; (ii) the gradients of the loss should be small; and (iii) the curvature of the loss should also be small. The first point will be satisfied by default when the loss is minimized. The second point will be satisfied by training with a loss regularized to penalize large input gradients. Experimentally the third point is satisfied with input gradient regularization. When these conditions are satisfied, local information is enough to guarantee robustness.
Our robustness bounds are most similar in spirit to Weng et al. (2018), who derive bounds using an estimate of the local Lipschitz constant of the model. Moosavi-Dezfooli et al. (2018) have also used a second order approximation to derive approximate robustness bounds for binary classification, but they neglected higher order error terms. Cohen et al. (2019)
derive bounds by training with normally distributed input noise, then averaging model predictions normally sampled about the input image. It is well known that training with normal noise is equivalent to squared-norm gradient regularization (Bishop, 1995); thus Cohen et al. (2019) achieve gradient regularization indirectly. Our bounds require at most one gradient and model evaluation per image once L and C have been estimated, whereas both Cohen et al. and Weng et al. require many hundreds of local model evaluations per image. Since L and C are globally estimated, our bounds could be improved by using these local sampling techniques to obtain local values of L and C, with more computational effort.
3 Squared norm gradient regularization
Proposition 2.1 provides strong motivation for input gradient regularization as a method for promoting adversarial robustness. However, it does not tell us what form the gradient regularization term should take. In this section, we show how squared-norm gradient regularization arises from a quadratic cost.
In adversarial training, solutions of (3) are used to generate images on which the network is trained. In effect, adversarial training seeks a solution of the minimax problem

  min_w E_{x∼P} max_δ [ℓ(x + δ) − ρ(δ)], (9)

where P is the distribution of images and ρ is a cost function. This is a robust optimization problem (Wald, 1945; Rousseeuw and Leroy, 1987). The cost function ρ penalizes perturbed images for straying too far from the original. When the cost function is the hard constraint from (3), perturbations must be inside a norm-ball of radius ε. This leads to adversarial training with PGD (Kurakin et al., 2016; Madry et al., 2017). However, this forces a particular scale: it is possible that no images are adversarial within radius ε, but that there are adversarial images at only a slightly larger distance. Instead of using a hard constraint, we can relax the cost function to the quadratic cost ρ(δ) = (γ/2)‖δ‖². The quadratic cost allows attacks of any size, but penalizes larger attacks more than smaller ones. With a quadratic cost, there is less danger that a local attack will be overlooked.
Solving (9) directly is expensive: on ImageNet-1k, both Kannan et al. (2018) and Xie et al. (2018) required large-scale distributed training with many dozens or hundreds of GPUs, and over a week of training time. Instead we take the view that (9) may be bounded above, and solved approximately. When the loss is smooth, the optimal value of δ using the bound (7) is δ* = ∇ℓ(x)/(γ − C), provided γ > C. This gives the following proposition.
Proposition 3.1. Suppose both the model and the loss are twice continuously differentiable, and suppose attacks are measured with the quadratic cost ρ(δ) = (γ/2)‖δ‖², with γ > C. Then the optimal value of (9) is bounded above by

  min_w E_{x∼P} [ℓ(x) + λ ‖∇ℓ(x)‖²], where λ = 1/(2(γ − C)). (10)
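To see where the bound comes from, note that under the Taylor estimate (7) the inner maximization over δ reduces to maximizing a concave quadratic, g·δ − ((γ − C)/2)‖δ‖² with g = ∇ℓ(x), whose maximizer is δ* = g/(γ − C) with value ‖g‖²/(2(γ − C)). A quick numerical check (our code, with made-up numbers):

```python
def inner_objective(g, delta, gamma, C):
    """Taylor-expanded attack objective under quadratic cost:
    g . delta + (C/2)||delta||^2 - (gamma/2)||delta||^2."""
    dot = sum(gi * di for gi, di in zip(g, delta))
    sq = sum(di * di for di in delta)
    return dot - 0.5 * (gamma - C) * sq

g, gamma, C = [3.0, 4.0], 2.0, 1.0            # gamma > C, so the problem is concave
delta_star = [gi / (gamma - C) for gi in g]   # closed-form maximizer
closed_form = sum(gi * gi for gi in g) / (2 * (gamma - C))  # ||g||^2 / (2(gamma - C))
```

Perturbing `delta_star` in any direction lowers the objective, confirming that the maximum over attacks is exactly the squared-gradient-norm term that appears in (10), with γ and C absorbed into the single coefficient λ.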
That is, we may bound the solution of the adversarial training problem (9) by solving the gradient regularization problem (10), when the cost function is quadratic. It is not necessary to know or compute γ or C; they are absorbed into the regularization strength λ. In the adversarial robustness literature, input gradient regularization using the squared ℓ2 norm was proposed by Ross and Doshi-Velez (2018). It was expanded by Roth et al. (2018) to use a Mahalanobis norm with the correlation matrix of adversarial attacks. When ρ is the hard constraint forcing attacks inside the ε norm-ball and ε is small, supposing the curvature term is negligible, we can estimate the maximum in (9) by ℓ(x) + ε‖∇ℓ(x)‖∗, using the dual norm for the gradient. This is norm gradient regularization (not squared), and was recently used for adversarial robustness on both CIFAR-10 (Simon-Gabriel et al., 2018) and MNIST (Seck et al., 2019).
3.1 Finite difference implementation
Squared-norm input gradient regularization has long been used as a regularizer in neural networks: Drucker and LeCun (1991) first showed its effectiveness for generalization. Drucker and LeCun implemented gradient regularization with ‘double backpropagation’ to compute the derivatives of the penalty term with respect to the model parameters w, which are needed to update the parameters during training. Double backpropagation involves two passes of automatic differentiation: one pass to compute the gradient of the loss with respect to the inputs x, and another pass on the output of the first to compute the gradient of the penalty term with respect to the model parameters w. In neural networks, double backpropagation is the standard technique for computing the parameter gradient of a regularized loss. However, it is not currently scaleable to large neural networks. Instead, we approximate the gradient regularization term with finite differences.
Proposition 3.2 (Finite difference approximation of squared gradient norm).
Let d = ∇ℓ(x)/‖∇ℓ(x)‖ be the normalized input gradient direction when the gradient is nonzero, and set d = 0 otherwise. Let h be the finite difference step size. Assume further that the loss is twice continuously differentiable. Then the squared gradient norm is approximated by

  ‖∇ℓ(x)‖² ≈ ((ℓ(x + h d) − ℓ(x)) / h)². (11)
The vector d is normalized to ensure the accuracy of the finite difference approximation, which is of order h, as can be seen by a Taylor approximation. The finite difference approximation (11) allows the gradient of the regularizer (with respect to the model parameters w) to be computed with only two regular passes of backpropagation, rather than with double backpropagation. On the first pass, the input gradient direction d is calculated. The second computes the gradient with respect to the model parameters by performing backpropagation on the right-hand side of (11). Double backpropagation is avoided by detaching d from the computational graph after the first pass. In practice, for large networks, we have found the finite difference approximation of the regularization term considerably more efficient than double backpropagation.
The proposed training algorithm, with squared Euclidean input gradient regularization, is presented in Algorithm 1 of the appendix. Other gradient penalty terms can be approximated as well. For example, when defending against attacks measured in the ℓ∞ norm, the squared ℓ1 norm penalty can be approximated by instead setting d = sign(∇ℓ(x)) when the gradient is nonzero.
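Proposition 3.2 is easy to check numerically. The sketch below (plain Python with a toy quadratic loss standing in for the network loss, and analytic gradients in place of the first backpropagation pass) verifies that the finite difference recovers the squared ℓ2 gradient norm with d = ∇ℓ/‖∇ℓ‖, and the squared ℓ1 norm with d = sign(∇ℓ):

```python
import math

def ell(x):
    """Toy stand-in for the network loss: a simple quadratic."""
    return 0.5 * (2 * x[0] ** 2 + x[1] ** 2)

def grad_ell(x):
    """Analytic input gradient of the toy loss (stands in for the
    first backpropagation pass, whose output is detached)."""
    return [2 * x[0], x[1]]

def fd_sq_grad_norm(x, d, h=1e-4):
    """Finite difference estimate ((ell(x + h d) - ell(x)) / h)^2."""
    xp = [xi + h * di for xi, di in zip(x, d)]
    return ((ell(xp) - ell(x)) / h) ** 2

x = [1.0, -2.0]
g = grad_ell(x)                                  # [2.0, -2.0]
norm2 = math.sqrt(sum(gi * gi for gi in g))
d2 = [gi / norm2 for gi in g]                    # normalized gradient direction
d1 = [math.copysign(1.0, gi) for gi in g]        # sign direction for the l1 variant

sq_l2 = fd_sq_grad_norm(x, d2)                   # approximates ||grad||_2^2 = 8
sq_l1 = fd_sq_grad_norm(x, d1)                   # approximates ||grad||_1^2 = 16
```

In a deep learning framework, `d` would be produced by one ordinary backward pass and detached from the computational graph, so that backpropagating through the right-hand side of (11) costs only one further ordinary pass.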
4 Experimental results
In this section we provide empirical evidence that input gradient regularization is an effective tool for promoting adversarial robustness, even on non-smooth networks built with standard ReLU activation functions.
We train networks on the CIFAR-10 dataset (Krizhevsky and Hinton, 2009) and ImageNet-1k (Deng et al., 2009). On the CIFAR datasets we use the ResNeXt architecture (Xie et al., 2017): ResNeXt34-2x32 on CIFAR-10 and ResNeXt34-2x64 on CIFAR-100. On ImageNet-1k we use a ResNet-50 (He et al., 2016). The CIFAR networks were trained with standard data augmentation and learning rate schedules on a single GeForce GTX 1080 Ti. On ImageNet-1k, we modified the training code of Shaw et al.'s submission to the DAWNBench competition (Coleman et al., 2018) and train with four GPUs. Training code and trained model weights are available on GitHub: https://github.com/cfinlay/tulip.
We train an undefended network as a baseline against which to compare the various types of regularization. On CIFAR-10, networks are trained with squared ℓ2 and squared ℓ1 gradient norm regularization. The former is appropriate for defending against attacks measured in ℓ2; the latter for attacks measured in ℓ∞. We set the regularization strength λ to either a smaller value or λ = 1, and fix the finite difference discretization h. We compare each network with the current state-of-the-art form of adversarial training, with models trained using the hyperparameters in Madry et al. (2017) (7 steps of FGSM with fixed step size, projected onto an ℓ∞ ball). On ImageNet-1k we only train adversarially robust models with squared ℓ2 regularization.
On each dataset, we attack 1000 randomly selected images. We perturb each image with attacks in both the Euclidean (ℓ2) and ℓ∞ norms, using a suite of current state-of-the-art attacks: the Carlini–Wagner attack (Carlini and Wagner, 2017b); the Boundary attack (Brendel et al., 2018); the LogBarrier attack (Finlay et al., 2019); and PGD (Madry et al., 2017), in both the ℓ2 and ℓ∞ norms. The former three attacks are effective at evading gradient-masking defences; the latter is very good at finding images close to the original when gradients are not close to zero. We record the best adversarial distance on a per-image basis, for each norm.
Adversarial robustness results for networks attacked in the ℓ∞ norm are presented in Table 1. These results are for networks built of standard ReLUs. Table 1 and Figure 2 demonstrate a clear trade-off between test accuracy and adversarial robustness as the strength of the regularization is increased. On CIFAR-10, the undefended network achieves test error of 4.36%, but is not robust to attacks even at small attack distances. However, with a strong regularization parameter, test error increases to 9.02% on clean images, with only 18.47% test error under attack. In contrast, the network trained with 7 steps of adversarial training appears to be over-regularized: on clean images, the adversarially trained network achieves 16.33% test error, and 22.86% error at the same attack distance. To be fair, at the commonly reported attack distance, the adversarially trained network outperforms the best gradient-regularized networks by about 12%, but at over twice the training time of the regularized networks. On ImageNet, we see a reduction of nearly 40% in error under attack.
It has been noted that adversarial robustness comes at a cost of degraded test error (Tsipras et al., 2018). This trade-off may be quantified. We measure the relative improvement in adversarial robustness against the cost of degraded test error with the following metric. Suppose an undefended network has test error E₀, and let a regularized network's test error be denoted E. Define the relative degradation in test error to be (E − E₀)/E₀. Similarly, with d₀ and d the mean adversarial distances of the undefended and regularized networks, define the relative improvement in robustness to be (d − d₀)/d₀. We define the adversarial improvement ratio to be the quotient of the relative improvement in robustness by the relative degradation in test error. This measures the improvement in adversarial robustness against the expense of poorer test error: high values mean the defended model is much more robust and has not lost significant test accuracy. Values close to zero imply the model is more robust but has much worse test accuracy relative to the undefended model. The improvement ratio is non-dimensional, so it allows for comparison between datasets.
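The metric can be computed as follows (the argument names are our labels for the quantities in the text: undefended and defended test errors, undefended and defended mean adversarial distances):

```python
def adversarial_improvement_ratio(err_base, err_reg, dist_base, dist_reg):
    """Ratio of relative robustness gain to relative test-error degradation.

    High values: much more robust at little accuracy cost.
    Near zero: robustness bought with a large accuracy drop.
    """
    rel_err_degradation = (err_reg - err_base) / err_base
    rel_robustness_gain = (dist_reg - dist_base) / dist_base
    return rel_robustness_gain / rel_err_degradation
```

For example, a defence that triples the mean adversarial distance while doubling the test error (from 5% to 10%) has a ratio of 2: the relative robustness gain is twice the relative accuracy loss.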
Measured with this metric, the trade-off between test accuracy and adversarial robustness is clear. On both ImageNet-1k and CIFAR-10, models with moderate gradient regularization offer the best trade-off between robustness and test error. If test accuracy is not of foremost concern, then stronger regularization parameters may be chosen. If neither training time nor test accuracy are important factors, then adversarial training is competitive with gradient regularization.
In Table 2 we report results on models trained for attacks in the ℓ2 norm. On CIFAR-10, the most robust model is trained with gradient regularization, and outperforms even the adversarially trained model. On ImageNet-1k, we see the same pattern: the gradient-regularized model offers the best protection against adversarial attacks. Due to the long training time, we were not able to train ImageNet-1k with multi-step adversarial training.
In Table 2 we also report our theoretical bounds on the minimum distance required to adversarially perturb an image, using the Carlini–Wagner loss (this loss can be modified for Top-k misclassification as well). Figures 4 and 5 of the appendix show these bounds on a per-image basis. The theoretical bounds require the constants L and C, which are not readily available. Instead, we estimate L as the maximum gradient norm over test images; for smooth models we estimate C as the maximum spectral norm of the Hessian, computed using the Lanczos algorithm (Golub and Van Loan, 2012, §10.1) on Hessian-vector products (obtained via automatic differentiation). These estimates are reported in Table 3 of the appendix. Gradient regularization reduces L and C by one to two orders of magnitude. Table 3 shows adversarial training also reduces these constants: effectively, adversarial training is a regularizer. Because L and C are estimated, and not exact, one would expect our bounds to sometimes fail. However, on CIFAR-10, the bounds reliably held on all attacked images. On ImageNet-1k, the bounds failed on about 9% of attacked test images, which indicates that L and C could be estimated more accurately, for example by estimating these constants locally as in Weng et al. (2018).
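Estimating the curvature constant requires only Hessian-vector products, never the full Hessian. The text uses the Lanczos algorithm; the sketch below substitutes plain power iteration on an explicit toy matrix (a simpler method for the same quantity, chosen for brevity), since it too needs only a matrix-vector product oracle:

```python
import math

def spectral_norm_power_iteration(hvp, dim, iters=200):
    """Estimate the largest eigenvalue magnitude of a symmetric matrix
    given only a matrix-vector (Hessian-vector) product oracle `hvp`.
    Power iteration stands in here for the Lanczos algorithm."""
    v = [1.0 / math.sqrt(dim)] * dim      # arbitrary unit starting vector
    lam = 0.0
    for _ in range(iters):
        w = hvp(v)
        lam = math.sqrt(sum(wi * wi for wi in w))  # ||H v||, converges to |lambda_max|
        v = [wi / lam for wi in w]
    return lam

# Toy symmetric "Hessian" with known eigenvalues {3, 1}
H = [[2.0, 1.0], [1.0, 2.0]]
hvp = lambda v: [sum(H[i][j] * v[j] for j in range(2)) for i in range(2)]
C_est = spectral_norm_power_iteration(hvp, 2)
```

In the setting of the text, `hvp` would be implemented with automatic differentiation on the loss, and the estimate maximized over test images to obtain the global constant C.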
We have provided motivation for training adversarially robust networks through input gradient regularization, by bounding the minimum adversarial distance with gradient and curvature statistics of the loss. We have shown empirically that gradient regularization is scaleable to ImageNet-1k, and provides adversarial robustness competitive with adversarial training. We gave theoretical per-image bounds on the minimum adversarial distance, for non-smooth models (using the Lipschitz constant of the loss), and augmented these bounds using smooth models with a second-order bound based on model curvature. These bounds were empirically validated against state-of-the-art attacks.
- Athalye et al.  Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 274–283, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR. URL http://proceedings.mlr.press/v80/athalye18a.html.
- Biggio et al.  Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, and Filip Železný, editors, Machine Learning and Knowledge Discovery in Databases, pages 387–402, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. ISBN 978-3-642-40994-3.
- Bishop  Christopher M. Bishop. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1):108–116, 1995. doi: 10.1162/neco.1995.7.1.108. URL https://doi.org/10.1162/neco.1995.7.1.108.
- Brendel et al.  Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018. URL https://openreview.net/forum?id=SyZI0GWCZ.
- Carlini and Wagner [2017a] Nicholas Carlini and David A. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec@CCS 2017, Dallas, TX, USA, November 3, 2017, pages 3–14, 2017a. doi: 10.1145/3128572.3140444. URL https://doi.org/10.1145/3128572.3140444.
- Carlini and Wagner [2017b] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pages 39–57, 2017b. URL https://doi.org/10.1109/SP.2017.49.
- Chen and Jordan  Jianbo Chen and Michael I. Jordan. Boundary attack++: Query-efficient decision-based adversarial attack. CoRR, abs/1904.02144, 2019. URL http://arxiv.org/abs/1904.02144.
- Cohen et al.  Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. CoRR, abs/1902.02918, 2019. URL http://arxiv.org/abs/1902.02918.
- Coleman et al.  Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Christopher Ré, and Matei Zaharia. Analysis of DAWNBench, a time-to-accuracy machine learning performance benchmark. CoRR, abs/1806.01427, 2018. URL http://arxiv.org/abs/1806.01427.
- Deng et al.  Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009. doi: 10.1109/CVPRW.2009.5206848. URL https://doi.org/10.1109/CVPRW.2009.5206848.
- Drucker and LeCun  Harris Drucker and Yann LeCun. Double backpropagation increasing generalization performance. In IJCNN-91-Seattle International Joint Conference on Neural Networks, volume 2, pages 145–150. IEEE, 1991.
- Finlay et al.  Chris Finlay, Aram-Alexandre Pooladian, and Adam M. Oberman. The LogBarrier adversarial attack: making effective use of decision boundary information. CoRR, abs/1903.10396, 2019. URL http://arxiv.org/abs/1903.10396.
- Golub and Van Loan  Gene H Golub and Charles F Van Loan. Matrix computations, volume 3. JHU press, 2012.
- Goodfellow et al.  Ian Goodfellow, Patrick McDaniel, and Nicolas Papernot. Making machine learning robust against adversarial inputs. Communications of the ACM, 61(7):56–66, June 2018. URL http://dl.acm.org/citation.cfm?doid=3234519.3134599.
- Goodfellow et al.  Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014. URL http://arxiv.org/abs/1412.6572.
- He et al.  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, pages 630–645, 2016. URL https://doi.org/10.1007/978-3-319-46493-0_38.
- Hein and Andriushchenko  Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 2263–2273, 2017. URL http://papers.nips.cc/paper/6821-formal-guarantees-on-the-robustness-of-a-classifier-against-adversarial-manipulation.
- Huster et al.  Todd Huster, Cho-Yu Jason Chiang, and Ritu Chadha. Limitations of the Lipschitz constant as a defense against adversarial examples. In ECML PKDD 2018 Workshops - Nemesis 2018, UrbReas 2018, SoGood 2018, IWAISe 2018, and Green Data Mining 2018, Dublin, Ireland, September 10-14, 2018, Proceedings, pages 16–29, 2018. doi: 10.1007/978-3-030-13453-2_2. URL https://doi.org/10.1007/978-3-030-13453-2_2.
- Jakubovitz and Giryes  Daniel Jakubovitz and Raja Giryes. Improving DNN robustness to adversarial attacks using jacobian regularization. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XII, pages 525–541, 2018. doi: 10.1007/978-3-030-01258-8_32. URL https://doi.org/10.1007/978-3-030-01258-8_32.
- Kannan et al.  Harini Kannan, Alexey Kurakin, and Ian J. Goodfellow. Adversarial logit pairing. CoRR, abs/1803.06373, 2018. URL http://arxiv.org/abs/1803.06373.
- Krizhevsky and Hinton  Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009. URL http://www.cs.toronto.edu/~kriz/cifar.html.
- Kurakin et al.  Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016. URL http://arxiv.org/abs/1607.02533.
- Li et al.  Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Second-order adversarial attack and certifiable robustness. CoRR, abs/1809.03113, 2018. URL http://arxiv.org/abs/1809.03113.
- Madry et al.  Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. CoRR, abs/1706.06083, 2017. URL http://arxiv.org/abs/1706.06083.
- Miyato et al.  Takeru Miyato, Shin-ichi Maeda, Shin Ishii, and Masanori Koyama. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
- Moosavi-Dezfooli et al.  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Jonathan Uesato, and Pascal Frossard. Robustness via curvature regularization, and vice versa. CoRR, abs/1811.09716, 2018. URL http://arxiv.org/abs/1811.09716.
- Nagarajan and Kolter  Vaishnavh Nagarajan and J. Zico Kolter. Gradient descent GAN optimization is locally stable. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 5591–5600, 2017. URL http://papers.nips.cc/paper/7142-gradient-descent-gan-optimization-is-locally-stable.
- Novak et al.  Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, and Jascha Sohl-Dickstein. Sensitivity and generalization in neural networks: an empirical study. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018. URL https://openreview.net/forum?id=HJC2SzZCW.
- Papernot et al.  Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy, EuroS&P 2016, Saarbrücken, Germany, March 21-24, 2016, pages 372–387, 2016. URL https://doi.org/10.1109/EuroSP.2016.36.
- Papernot et al.  Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, AsiaCCS 2017, Abu Dhabi, United Arab Emirates, April 2-6, 2017, pages 506–519, 2017. URL http://doi.acm.org/10.1145/3052973.3053009.
- Potember  Richard Potember. Perspectives on research in artificial intelligence and artificial general intelligence relevant to DoD. Technical report, The MITRE Corporation McLean United States, 2017. URL https://fas.org/irp/agency/dod/jason/ai-dod.pdf.
- Raghunathan et al. [2018a] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018a. URL https://openreview.net/forum?id=Bys4ob-Rb.
- Raghunathan et al. [2018b] Aditi Raghunathan, Jacob Steinhardt, and Percy S. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada., pages 10900–10910, 2018b. URL http://papers.nips.cc/paper/8285-semidefinite-relaxations-for-certifying-robustness-to-adversarial-examples.
- Rifai et al.  Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot, and Yoshua Bengio. Contractive auto-encoders: explicit invariance during feature extraction. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, pages 833–840, 2011. URL https://icml.cc/2011/papers/455_icmlpaper.pdf.
- Ross and Doshi-Velez  Andrew Slavin Ross and Finale Doshi-Velez. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 1660–1669, 2018. URL https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17337.
- Roth et al.  Kevin Roth, Aurélien Lucchi, Sebastian Nowozin, and Thomas Hofmann. Stabilizing training of generative adversarial networks through regularization. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 2015–2025, 2017. URL http://papers.nips.cc/paper/6797-stabilizing-training-of-generative-adversarial-networks-through-regularization.
- Roth et al.  Kevin Roth, Aurélien Lucchi, Sebastian Nowozin, and Thomas Hofmann. Adversarially robust training through structured gradient regularization. CoRR, abs/1805.08736, 2018. URL http://arxiv.org/abs/1805.08736.
- Rousseeuw and Leroy  Peter J Rousseeuw and Annick M Leroy. Robust regression and outlier detection, volume 1. Wiley Online Library, 1987.
- Seck et al.  Ismaïla Seck, Gaëlle Loosli, and Stephane Canu. L1-norm double backpropagation adversarial defense. arXiv preprint arXiv:1903.01715, 2019.
- Shafahi et al.  Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John P. Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! CoRR, abs/1904.12843, 2019. URL http://arxiv.org/abs/1904.12843.
- Shaw et al.  Andrew Shaw, Yaroslav Bulatov, and Jeremy Howard. ImageNet in 18 minutes. URL https://github.com/diux-dev/imagenet18.
- Simon-Gabriel et al.  Carl-Johann Simon-Gabriel, Yann Ollivier, Bernhard Schölkopf, Léon Bottou, and David Lopez-Paz. Adversarial vulnerability of neural networks increases with input dimension. CoRR, abs/1802.01421, 2018. URL http://arxiv.org/abs/1802.01421.
- Szegedy et al.  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013. URL http://arxiv.org/abs/1312.6199.
- Tramèr et al.  Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rkZvSe-RZ.
- Tsipras et al.  Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. CoRR, abs/1805.12152, 2018. URL http://arxiv.org/abs/1805.12152.
- Tsuzuku et al.  Yusuke Tsuzuku, Issei Sato, and Masashi Sugiyama. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada., pages 6542–6551, 2018. URL http://papers.nips.cc/paper/7889-lipschitz-margin-training-scalable-certification-of-perturbation-invariance-for-deep-neural-networks.
- Uesato et al.  Jonathan Uesato, Brendan O’Donoghue, Pushmeet Kohli, and Aäron van den Oord. Adversarial risk and the dangers of evaluating against weak attacks. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 5032–5041, 2018. URL http://proceedings.mlr.press/v80/uesato18a.html.
- Wald  Abraham Wald. Statistical decision functions which minimize the maximum risk. Annals of Mathematics, pages 265–280, 1945.
- Weng et al.  Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018. URL https://openreview.net/forum?id=BkUHlMZ0b.
- Wong and Kolter  Eric Wong and J. Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 5283–5292, 2018. URL http://proceedings.mlr.press/v80/wong18a.html.
- Wong et al.  Eric Wong, Frank R. Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada., pages 8410–8419, 2018. URL http://papers.nips.cc/paper/8060-scaling-provable-adversarial-defenses.
- Xiao et al.  Kai Y. Xiao, Vincent Tjeng, Nur Muhammad Shafiullah, and Aleksander Madry. Training for faster adversarial robustness verification via inducing relu stability. CoRR, abs/1809.03008, 2018. URL http://arxiv.org/abs/1809.03008.
- Xie et al.  Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan L. Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. CoRR, abs/1812.03411, 2018. URL http://arxiv.org/abs/1812.03411.
- Xie et al.  Saining Xie, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 5987–5995, 2017. URL https://doi.org/10.1109/CVPR.2017.634.
Appendix A Additional methods and results
| Madry et al. (7-step AT) | 0.40 | - | 2.52 | - |