Modern neural network classifiers achieve very high accuracy on image classification tasks but are sensitive to small, adversarially chosen perturbations of their inputs (Szegedy et al., 2013; Biggio et al., 2013). Given an image that is correctly classified by a neural network, a malicious attacker may find a small adversarial perturbation such that the perturbed image, though visually indistinguishable from the original, is assigned to a wrong class with high confidence by the network. Such vulnerability creates security concerns in many real-world applications.
Researchers have proposed a variety of defense methods to improve the robustness of neural networks. Most existing defenses are based on adversarial training (Szegedy et al., 2013; Madry et al., 2017; Goodfellow et al., 2015; Huang et al., 2015; Athalye et al., 2018). During training, these methods first generate on-the-fly adversarial examples of the inputs with multiple attack iterations and then update model parameters using these perturbed samples together with the original labels. However, such approaches depend on a particular (class of) attack method, and there is no formal guarantee that the resulting model is also robust against other attacks. Moreover, attack iterations are usually quite expensive, so adversarial training runs very slowly.
Another line of algorithms trains robust models by maximizing the certified radius provided by robust certification methods (Weng et al., 2018; Wong & Kolter, 2018; Zhang et al., 2018; Mirman et al., 2018; Wang et al., 2018; Gowal et al., 2018; Zhang et al., 2019c). Using linear or convex relaxations of fully connected ReLU networks, a robust certification method computes a "safe radius" $r$ for a classifier at a given input, such that the classifier is guaranteed to have unchanged predictions at every point within the radius-$r$ ball centered at the input. However, these certification methods are usually computationally expensive and can only handle shallow neural networks with ReLU activations, so the corresponding training algorithms have trouble scaling to modern networks.
In this work, we propose an attack-free and scalable method to train robust deep neural networks. We mainly leverage the recent randomized smoothing technique (Cohen et al., 2019). Given an arbitrary classifier $f$, its randomized smoothed counterpart $g$ is defined as $g(x) = \arg\max_{c}\, \mathbb{P}_{\eta \sim \mathcal{N}(0, \sigma^2 I)}\big(f(x + \eta) = c\big)$, in which $\sigma$ is the noise level. While Cohen et al. (2019) derived how to analytically compute the certified radius of the randomly smoothed classifier $g$, they did not show how to maximize that radius to make the classifier robust. Salman et al. (2019) proposed SmoothAdv to improve the robustness of $g$, but it still relies on expensive attack iterations. Instead of adversarial training, we propose to learn robust models by directly taking the certified radius into the objective. We outline a few challenging desiderata any practical instantiation of this idea would have to satisfy, and provide approaches to address each of them in turn. A discussion of these desiderata, as well as a detailed implementation of our approach, is provided in Section 4. As we show both theoretically and empirically, our method is numerically stable and accounts for both classification accuracy and robustness.
Our contributions are summarized as follows:
We propose an attack-free and scalable robust training algorithm by MAximizing the CErtified Radius (MACER). MACER has the following advantages compared to previous works:
Different from adversarial training, we train robust models by directly maximizing the certified radius without specifying any attack strategies, and the learned model can achieve provable robustness against any possible attack in the certified region. Additionally, by avoiding time-consuming attack iterations, our proposed algorithm runs much faster than adversarial training.
Different from other methods (Wong & Kolter, 2018) that maximize the certified radius but are not scalable to deep neural networks, our method can be applied to architectures of any size. This makes our algorithm more practical in real scenarios.
We empirically evaluate our proposed method through extensive experiments on Cifar-10, ImageNet, MNIST, and SVHN. On all tasks, MACER achieves better performance than state-of-the-art algorithms. MACER is also exceptionally fast. For example, on ImageNet, MACER uses 39% less training time than adversarial training but still performs better.
2 Related work
Neural networks trained by standard SGD are not robust – a small and human-imperceptible perturbation can easily change the prediction of a network. In the white-box setting, methods have been proposed to construct adversarial examples with small $\ell_2$ or $\ell_\infty$ perturbations (Goodfellow et al., 2015; Madry et al., 2017; Carlini & Wagner, 2016; Moosavi-Dezfooli et al., 2015). Furthermore, even in the black-box setting where the adversary does not have access to the model structure and parameters, adversarial examples can be found by either transfer attacks (Papernot et al., 2016) or optimization-based approaches (Chen et al., 2017; Rauber et al., 2017; Cheng et al., 2019). It is thus important to study how to improve the robustness of neural networks against adversarial examples.
Adversarial training So far, adversarial training has been the most successful robust training method according to many recent studies. Adversarial training was first proposed in Szegedy et al. (2013) and Goodfellow et al. (2015), where they showed that adding adversarial examples to the training set can improve the robustness against such attacks. More recently, Madry et al. (2017) showed that adversarial training can be formulated as a min-max optimization problem and demonstrated that adversarial training with PGD attack can lead to very robust models empirically. Zhang et al. (2019b) further proposed to decompose robust error as the sum of natural error and boundary error to achieve better performance. Although models obtained by adversarial training empirically achieve good performance, they do not have certified error guarantees.
Despite the popularity of PGD-based adversarial training, one major issue is that it is too slow. Some recent papers propose methods to accelerate adversarial training. For example, Free-m (Shafahi et al., 2019) replays an adversarial example several times in one iteration, YOPO-m-n (Zhang et al., 2019a) restricts back propagation in PGD to the first layer, and Qin et al. (2019) estimates the adversary with local linearization.
Robustness certification and provable defense Many defense algorithms proposed in the past few years were claimed to be effective, but Athalye et al. (2018) showed that most of them are based on "gradient masking" and can be bypassed by more carefully designed attacks. It is thus important to study how to measure the provable robustness of a network. A robustness certification algorithm takes a classifier $f$ and an input point $x$ as inputs, and outputs a "safe radius" $r$ such that $f(x') = f(x)$ for any $x'$ with $\|x' - x\| \le r$. Several algorithms have been proposed recently, including the convex polytope technique (Wong & Kolter, 2018), abstract interpretation methods (Singh et al., 2018; Gehr et al., 2018) and recursive propagation algorithms (Weng et al., 2018; Zhang et al., 2018). These methods provide attack-agnostic upper bounds on the robust error. Moreover, to achieve networks with nontrivial certified robust error, one can train a network by minimizing the certified robust error computed by the above-mentioned methods, and several such algorithms have been proposed in the past year (Wong & Kolter, 2018; Wong et al., 2018; Wang et al., 2018; Gowal et al., 2018; Zhang et al., 2019c; Mirman et al., 2018). Unfortunately, they can only be applied to shallow networks with limited types of activation functions and run very slowly.
More recently, researchers found a new class of certification methods based on randomized smoothing. The idea of randomization has been used for defense in several previous works (Xie et al., 2017; Liu et al., 2018), but without any certification. Later on, Lecuyer et al. (2018) first showed that if Gaussian random noise is added to the input or any intermediate layer, a certified guarantee against small perturbations can be computed via differential privacy. Li et al. (2018) and Cohen et al. (2019) then provided improved ways to compute the certified robust error for Gaussian smoothed models. In this paper, we propose a new algorithm that trains on these certified error bounds to significantly reduce the certified error and achieve better provable adversarial robustness.
3 Preliminaries
Consider a standard classification task with an underlying data distribution $p_{\text{data}}$ over pairs of examples $x \in \mathbb{R}^d$ and corresponding labels $y \in \mathcal{Y} = \{1, \ldots, K\}$. Usually $p_{\text{data}}$ is unknown and we can only access a training set $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ in which each $(x_i, y_i)$ is i.i.d. drawn from $p_{\text{data}}$. The empirical data distribution (the uniform distribution over $S$) is denoted by $\hat{p}_{\text{data}}$. Let $f$ be the classifier of interest that maps any $x \in \mathbb{R}^d$ to $\mathcal{Y}$. Usually $f$ is parameterized by a set of parameters $\theta$, so we also write it as $f_\theta$.
We call $x' = x + \delta$ an adversarial example of $x$ to classifier $f_\theta$ if $f_\theta$ correctly classifies $x$ but assigns a different label to $x'$. Following many previous works (Cohen et al., 2019; Salman et al., 2019), we focus on the setting where $\delta$ satisfies the $\ell_2$ norm constraint $\|\delta\|_2 \le \epsilon$. We say that the model $f_\theta$ is $\epsilon$-robust at $(x, y)$ if it correctly classifies $x$ as $y$ and, for any $\|\delta\|_2 \le \epsilon$, classifies $x + \delta$ as $y$. In the problem of robust classification, our ultimate goal is to find a model that is $\epsilon$-robust at $(x, y)$ with high probability over $(x, y) \sim p_{\text{data}}$ for a given $\epsilon > 0$.
In image classification we often use deep neural networks. Let $u_\theta: \mathbb{R}^d \to \mathbb{R}^K$ be a neural network, whose output at input $x$ is a vector $u_\theta(x) = (u^1_\theta(x), \ldots, u^K_\theta(x))$. The classifier induced by $u_\theta$ is $f_\theta(x) = \arg\max_{c \in \mathcal{Y}} u^c_\theta(x)$. In order to train $f_\theta$, we apply a softmax layer to $u_\theta$ to normalize its output into a probability distribution. The resulting network is $z_\theta = \mathrm{softmax}(\beta\, u_\theta): \mathbb{R}^d \to \Delta^{K-1}$ (where $\Delta^{K-1}$ is the probability simplex in $\mathbb{R}^K$), which is given by
$$ z^c_\theta(x) \;=\; \frac{\exp\big(\beta\, u^c_\theta(x)\big)}{\sum_{c'=1}^{K} \exp\big(\beta\, u^{c'}_\theta(x)\big)}, \qquad c = 1, \ldots, K, $$
where $\beta > 0$ is the inverse temperature. For simplicity, we will use $z_\theta(x)$ to refer to $\mathrm{softmax}(\beta\, u_\theta(x))$ when the meaning is clear from context. The vector $z_\theta(x)$ is commonly regarded as the "likelihood vector", and $z^c_\theta(x)$ measures how likely input $x$ belongs to class $c$.
By definition, the $\epsilon$-robustness of $f_\theta$ at a data point $(x, y)$ depends on the radius of the largest ball centered at $x$ in which $f_\theta$ does not change its prediction. This radius is called the robust radius.
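Under the $\ell_2$ threat model assumed throughout, this radius can be written as follows (this follows the standard definition used in the randomized smoothing literature, e.g. Cohen et al. (2019)):
$$ R(f_\theta; x, y) \;=\; \begin{cases} \inf\limits_{f_\theta(x') \neq f_\theta(x)} \|x' - x\|_2, & \text{if } f_\theta(x) = y,\\[4pt] 0, & \text{otherwise}. \end{cases} $$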
Recall that our ultimate goal is to train a classifier which is $\epsilon$-robust at $(x, y)$ with high probability over the sampling of $(x, y) \sim p_{\text{data}}$. Mathematically, the goal can be expressed as minimizing the expectation of the 0/1 robust classification error, i.e. the indicator that $f_\theta$ fails to be $\epsilon$-robust at $(x, y)$, over the population $p_{\text{data}}$.
It is thus quite natural to improve model robustness by maximizing the robust radius. Unfortunately, computing the robust radius (1) of a classifier induced by a deep neural network is very difficult. Weng et al. (2018) showed that computing the $\ell_1$ robust radius of a deep neural network is NP-hard. Although there is no corresponding result for the $\ell_2$ radius yet, it is very likely that computing the $\ell_2$ robust radius is also NP-hard.
Many previous works proposed certification methods that seek to derive a tight lower bound of $R(f_\theta; x, y)$ for neural networks (see Section 2 for related work). We call this lower bound the certified radius and denote it by $CR(f_\theta; x, y)$. The certified radius satisfies $0 \le CR(f_\theta; x, y) \le R(f_\theta; x, y)$ for any $(x, y)$.
The certified radius leads to a guaranteed upper bound on the 0/1 robust classification error, which is called the 0/1 certified robust error. The 0/1 certified robust error of classifier $f_\theta$ on a sample $(x, y)$ is the indicator that the certified radius fails to reach $\epsilon$; i.e. a sample is counted as correct only if the certified radius reaches $\epsilon$. The expectation of the certified robust error over $p_{\text{data}}$ serves as a performance metric of provable robustness.
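In symbols, with the notation above and the convention that a misclassified sample has certified radius 0:
$$ \ell^{\,\epsilon}_{0/1}(f_\theta; x, y) \;=\; \mathbf{1}\{CR(f_\theta; x, y) < \epsilon\}, \qquad \text{with metric} \;\; \mathbb{E}_{(x, y) \sim p_{\text{data}}}\big[\ell^{\,\epsilon}_{0/1}(f_\theta; x, y)\big]. $$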
Recall that $CR(f_\theta; x, y)$ is a lower bound of the true robust radius, which immediately implies that the 0/1 certified robust error is an upper bound of the 0/1 robust classification error. Therefore, a small 0/1 certified robust error leads to a small 0/1 robust classification error.
In this work, we use the recent randomized smoothing technique (Cohen et al., 2019), which is scalable to any architecture, to obtain the certified radius of smoothed deep neural networks. The key part of randomized smoothing is to use the smoothed version of $f_\theta$, denoted by $g_\theta$, to make predictions. $g_\theta$ is defined as follows.
For an arbitrary classifier $f_\theta$ and $\sigma > 0$, the smoothed classifier $g_\theta$ of $f_\theta$ is defined as
$$ g_\theta(x) \;=\; \arg\max_{c \in \mathcal{Y}}\; \mathbb{P}_{\eta \sim \mathcal{N}(0, \sigma^2 I)}\big(f_\theta(x + \eta) = c\big). $$
In short, the smoothed classifier $g_\theta$ returns the label most likely to be returned by the base classifier $f_\theta$ when its input is perturbed by Gaussian noise $\mathcal{N}(0, \sigma^2 I)$ centered at $x$. Cohen et al. (2019) prove the following theorem, which provides an analytic form of the certified radius:
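In the notation above, the result can be stated as follows (a restatement of Cohen et al. (2019), Theorem 1; $\Phi^{-1}$ denotes the inverse c.d.f. of the standard Gaussian): if $g_\theta(x) = y$, then $g_\theta$ is provably robust at $x$ with certified radius
$$ CR(g_\theta; x, y) \;=\; \frac{\sigma}{2}\Big(\Phi^{-1}\big(\mathbb{P}_\eta\big(f_\theta(x + \eta) = y\big)\big) \;-\; \Phi^{-1}\big(\max_{c \neq y}\, \mathbb{P}_\eta\big(f_\theta(x + \eta) = c\big)\big)\Big). $$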
4 Robust training via maximizing the certified radius
As we can see from Theorem 1, the value of the certified radius can be estimated by repeatedly sampling Gaussian noise. More importantly, it can be computed for any deep neural network. This motivates us to design a training method that maximizes the certified radius and learns robust models.
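As a concrete illustration, the following plain NumPy sketch computes this Monte Carlo estimate. The classifier handle `predict_fn` is hypothetical (any function returning hard labels for a batch of noisy copies), and this is a point estimate for illustration only, without the confidence corrections a rigorous certificate would require.

```python
import numpy as np
from scipy.stats import norm

def estimate_certified_radius(predict_fn, x, sigma=0.25, n_samples=1000, num_classes=10):
    """Monte Carlo point estimate of the certified radius in Theorem 1.

    predict_fn(batch) -> np.ndarray of predicted integer labels, one per row.
    x                 -> a single input (e.g. an image as a float array).
    Returns 0.0 if the two most frequent classes are equally likely.
    """
    noise = sigma * np.random.randn(n_samples, *x.shape)
    preds = predict_fn(x[None, ...] + noise)                 # shape: (n_samples,)
    counts = np.bincount(preds, minlength=num_classes)
    top2 = np.argsort(counts)[::-1][:2]                      # two most frequent classes
    p_a, p_b = counts[top2[0]] / n_samples, counts[top2[1]] / n_samples
    # Clip away from {0, 1} so the inverse Gaussian c.d.f. stays finite.
    p_a, p_b = np.clip(p_a, 1e-6, 1 - 1e-6), np.clip(p_b, 1e-6, 1 - 1e-6)
    if p_a <= p_b:
        return 0.0
    return 0.5 * sigma * (norm.ppf(p_a) - norm.ppf(p_b))
```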
To minimize the 0/1 robust classification error in (3) or the 0/1 certified robust error in (5), many previous works (Zhang et al., 2019b; Zhai et al., 2019) proposed to first decompose the error. Note that a classifier $g_\theta$ has a positive 0/1 certified robust error on a sample $(x, y)$ if and only if exactly one of the following two cases happens:
$g_\theta(x) \neq y$, i.e. the classifier misclassifies $x$.
$g_\theta(x) = y$ but $CR(g_\theta; x, y) < \epsilon$, i.e. the classifier is correct but not robust enough.
Thus, the 0/1 certified robust error can be decomposed as the sum of two error terms: a 0/1 classification error and a 0/1 robustness error.
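In symbols, again with the convention that a misclassified sample has certified radius 0:
$$ \mathbf{1}\{CR(g_\theta; x, y) < \epsilon\} \;=\; \underbrace{\mathbf{1}\{g_\theta(x) \neq y\}}_{\text{0/1 classification error}} \;+\; \underbrace{\mathbf{1}\{g_\theta(x) = y,\; CR(g_\theta; x, y) < \epsilon\}}_{\text{0/1 robustness error}}. $$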
4.1 Desiderata for objective functions
Minimizing the 0/1 error directly is intractable. A classic method is to minimize a surrogate loss instead. The surrogate loss for the 0/1 classification error is called the classification loss and denoted by $l_C$. The surrogate loss for the 0/1 robustness error is called the robustness loss and denoted by $l_R$. Our final objective function is the expectation of the total surrogate loss $l = l_C + l_R$ over the training data.
We would like our loss functions and to satisfy some favorable conditions. These conditions are summarized below as (C1) - (C3):
(C1) (Surrogate condition): A surrogate loss should be an upper bound of the original error function, i.e. $l_C$ and $l_R$ should be upper bounds of the 0/1 classification error and the 0/1 robustness error, respectively.
(C2) (Differentiability): $l_C$ and $l_R$ should be (sub-)differentiable with respect to $\theta$.
(C3) (Numerical stability): The computation of $l_C$ and $l_R$ and of their (sub-)gradients with respect to $\theta$ should be numerically stable.
The surrogate condition (C1) ensures that the total loss $l = l_C + l_R$ itself meets the surrogate condition, i.e. it is an upper bound of the 0/1 certified robust error.
Conditions (C2) and (C3) ensure that (10) can be stably minimized with first-order methods.
4.2 Surrogate losses (for Condition C1)
We next discuss choices of the surrogate losses that ensure condition (C1) is satisfied. The classification surrogate loss is relatively easy to design: there are many widely used loss functions from which we can choose, and in this work we choose the cross-entropy loss as the classification loss $l_C$.
For the robustness surrogate loss $l_R$, we choose a hinge loss on the certified radius, with a margin hyperparameter $\gamma > 0$. We use the hinge loss because not only does it satisfy the surrogate condition, but it is also numerically stable, as we will discuss in Section 4.4.
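One concrete way to instantiate this hinge loss (the margin $\gamma$ and the restriction to correctly classified samples are the key ingredients; the exact constants here are illustrative) is
$$ l_R(x, y; \theta) \;=\; \max\big\{0,\; \epsilon + \gamma - CR(g_\theta; x, y)\big\} \cdot \mathbf{1}\{g_\theta(x) = y\}. $$
For any $\gamma \ge 1$, this quantity upper-bounds the 0/1 robustness error, so the surrogate condition (C1) is met.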
4.3 Differentiable certified radius via soft randomized smoothing (for Condition C2)
The classification surrogate loss in (12) is differentiable with respect to $\theta$, but the differentiability of the robustness surrogate loss in (13) requires differentiability of the certified radius. In this section we show that the randomized smoothing certified radius in (8) does not meet condition (C2), and accordingly, we introduce soft randomized smoothing to solve this problem.
Whether the certified radius (8) is sub-differentiable with respect to $\theta$ boils down to the differentiability of the class probabilities $\mathbb{P}_{\eta}\big(f_\theta(x + \eta) = c\big)$. Theoretically, this expectation is indeed differentiable. However, from a practical point of view, the expectation needs to be estimated by Monte Carlo sampling $\frac{1}{k}\sum_{j=1}^{k}\mathbf{1}\{f_\theta(x + \eta_j) = c\}$, where $\eta_1, \ldots, \eta_k$ are i.i.d. Gaussian noise samples and $k$ is the number of samples. This estimate, which is a sum of indicator functions, is not differentiable. Hence, condition (C2) is still not met from the algorithmic perspective.
To tackle this problem, we leverage soft randomized smoothing (Soft-RS). In contrast to the original version of randomized smoothing (Hard-RS), Soft-RS is applied to a neural network whose last layer is softmax. The soft smoothed classifier is defined as follows.
For a neural network $z_\theta$ whose last layer is softmax and $\sigma > 0$, the soft smoothed classifier $g^s_\theta$ of $z_\theta$ is defined as follows.
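With $z_\theta$ the softmax network and $\sigma$ the noise level as above, this expectation-smoothed network is
$$ g^s_\theta(x) \;=\; \mathbb{E}_{\eta \sim \mathcal{N}(0, \sigma^2 I)}\big[\, z_\theta(x + \eta) \,\big] \;\in\; \Delta^{K-1}, $$
i.e. the expected likelihood vector under Gaussian perturbations of the input.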
Let the ground truth label of an input $x$ be $y$. If $g^s_\theta$ classifies $x$ correctly, i.e. $g^s_\theta(x)_y > \max_{c \neq y} g^s_\theta(x)_c$, then $g^s_\theta$ is provably robust at $x$, with the certified radius given by the expression below, where $\Phi$ is the c.d.f. of the standard Gaussian distribution.
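Under the assumptions of the theorem, the Soft-RS radius takes the form (mirroring the Hard-RS radius, with class probabilities replaced by expected softmax scores):
$$ CR(g^s_\theta; x, y) \;=\; \frac{\sigma}{2}\Big(\Phi^{-1}\big(g^s_\theta(x)_y\big) \;-\; \Phi^{-1}\big(\max_{c \neq y} g^s_\theta(x)_c\big)\Big). $$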
We note that in Salman et al. (2019) (see its Appendix B), a similar technique was introduced to overcome the non-differentiability in creating adversarial examples for a smoothed classifier. Different from their work, our method uses Soft-RS to obtain a certified radius that is differentiable in practice. The certified radius given by soft randomized smoothing meets condition (C2) in the algorithmic design: even if we use Monte Carlo sampling to estimate the expectation, (16) is still sub-differentiable with respect to $\theta$ as long as $z_\theta$ is sub-differentiable with respect to $\theta$.
Connection between Soft-RS and Hard-RS
We highlight two main properties of Soft-RS. Firstly, it is a differentiable approximation of the original Hard-RS. To see this, note that as $\beta \to \infty$, the softmax output $z_\theta(x + \eta)$ tends to the one-hot vector of the predicted class $f_\theta(x + \eta)$, so $g^s_\theta(x)_c$ converges to $\mathbb{P}_\eta\big(f_\theta(x + \eta) = c\big)$ almost everywhere. Consequently, the Soft-RS certified radius (16) converges to the Hard-RS certified radius (8) almost everywhere as $\beta$ goes to infinity. Secondly, Soft-RS itself provides an alternative way to get a provable robustness guarantee. In Appendix A, we provide Soft-RS certification procedures that certify with the Hoeffding bound or the empirical Bernstein bound.
4.4 Numerical Stability (for Condition C3)
In this section, we address the numerical stability condition (C3). While Soft-RS does provide us with a differentiable certified radius (16) that we could maximize with first-order optimization methods, directly optimizing (16) suffers from exploding gradients. The problem stems from the inverse cumulative distribution function $\Phi^{-1}(p)$, whose derivative is huge when $p$ is close to 0 or 1.
Fortunately, by minimizing the robustness loss (13) instead, we can maximize the robust radius without exploding gradients. The hinge loss restricts the samples that receive a non-zero robustness loss to those whose certified radius is below the hinge threshold, which is equivalent to an upper bound on the gap $\Phi^{-1}(p_y) - \Phi^{-1}(p_{\tilde{y}})$, where $p_y = g^s_\theta(x)_y$ is the smoothed score of the true class and $p_{\tilde{y}} = \max_{c \neq y} g^s_\theta(x)_c$ is the runner-up score. Under this restriction, the derivative of $\Phi^{-1}(p_y) - \Phi^{-1}(p_{\tilde{y}})$ with respect to $p_y$ and $p_{\tilde{y}}$ is always bounded, as shown in the following proposition. The proof can be found in Appendix B.
Suppose $(p, q)$ are the largest and second largest entries of a likelihood vector in $\Delta^{K-1}$ (so that $p \ge q$ and $p + q \le 1$), and suppose $\Phi^{-1}(p) - \Phi^{-1}(q) \le \hat{\gamma}$ for some constant $\hat{\gamma} > 0$. Then the derivative of $\Phi^{-1}(p) - \Phi^{-1}(q)$ with respect to $p$ and $q$ is bounded.
4.5 Complete implementation
We are now ready to present the complete MACER algorithm. Expectations over Gaussian noise are approximated with Monte Carlo sampling: let $\eta_1, \ldots, \eta_k$ be i.i.d. samples from $\mathcal{N}(0, \sigma^2 I)$. The final objective function replaces each expectation over $\eta$ in the classification and robustness losses with its empirical average over the $k$ samples, and during training we minimize this empirical objective over the training set. The detailed implementation is described in Algorithm 1. To simplify the implementation, we choose the hinge margin to be a hyperparameter in its own right. The inverse temperature $\beta$ of the softmax is also a hyperparameter.
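A minimal PyTorch sketch of this per-batch objective is given below, assuming the hyperparameter names used in the text (sigma: noise level, k: Gaussian samples per input, lam: trade-off factor, gamma: hinge margin, beta: inverse softmax temperature). It is an illustrative re-implementation of the idea, not the authors' released code, and the default values are placeholders rather than the paper's settings.

```python
import math
import torch
import torch.nn.functional as F

def gaussian_icdf(p):
    """Inverse c.d.f. of the standard Gaussian, Phi^{-1}(p)."""
    return math.sqrt(2.0) * torch.erfinv(2.0 * p - 1.0)

def macer_loss(model, x, y, sigma=0.25, k=16, lam=4.0, gamma=8.0, beta=16.0):
    """MACER-style loss: cross-entropy + hinge on the soft certified radius."""
    b = x.size(0)
    # k noisy copies of every input, shape (b * k, ...).
    x_rep = x.repeat_interleave(k, dim=0)
    noise = torch.randn_like(x_rep) * sigma
    logits = model(x_rep + noise).view(b, k, -1)

    # Soft smoothed classifier: average the temperature-scaled softmax over noise.
    probs = F.softmax(beta * logits, dim=2).mean(dim=1)        # (b, num_classes)

    # Classification term: cross-entropy on the smoothed likelihood vector.
    classification = F.nll_loss(torch.log(probs + 1e-12), y)

    # Robustness term: hinge on Phi^{-1}(p_y) - Phi^{-1}(p_runner_up), applied
    # only to samples the smoothed classifier currently classifies correctly.
    top2_probs, top2_idx = probs.topk(2, dim=1)
    correct = (top2_idx[:, 0] == y).float()
    p_y = probs[torch.arange(b, device=probs.device), y].clamp(1e-4, 1 - 1e-4)
    p_other = top2_probs[:, 1].clamp(1e-4, 1 - 1e-4)           # clamps keep Phi^{-1} finite
    xi = gaussian_icdf(p_y) - gaussian_icdf(p_other)           # (2 / sigma) * soft certified radius
    robustness = (F.relu(gamma - xi) * correct).sum() / b

    return classification + lam * (sigma / 2.0) * robustness
```

In a training loop this loss simply replaces the usual cross-entropy; certification at test time is still performed with the Hard-RS procedure of Cohen et al. (2019).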
Comparison to adversarial training
Adversarial training defines the problem as a mini-max game and solves it by optimizing the inner loop (attack generation) and the outer loop (model update) iteratively. In our method, we only have a single loop (model update). As a result, our proposed algorithm can run much faster than adversarial training because it does not require additional back propagations to generate adversarial examples.
Comparison to previous work
The overall objective function of our method, a linear combination of a classification loss and a robustness loss, is similar to those of adversarial logit pairing (ALP) (Kannan et al., 2018) and TRADES (Zhang et al., 2019b). In MACER, the coefficient $\lambda$ in the objective function (17) can also be viewed as a trade-off factor between accuracy and robustness. However, the robustness term of MACER does not depend on a particular adversarial example, which makes it substantially different from ALP and TRADES.
5 Experiments
In this section, we empirically evaluate our proposed MACER algorithm on a wide range of tasks. We also study the influence of different hyperparameters in MACER on the final model performance.
For Cifar-10, MNIST and SVHN, we train the models for 440 epochs using our proposed algorithm. The learning rate is initialized to 0.01 and is decayed by a factor of 0.1 at epochs 200 and 400. For all these models, the same values of the remaining MACER hyperparameters are used. The value of $\lambda$ trades off accuracy and robustness, and we find that different $\lambda$ leads to different robust accuracy when the model is injected with different levels of noise $\sigma$, so we use a different $\lambda$ for each $\sigma$. For ImageNet, we train the models for 120 epochs. The initial learning rate is set to 0.1 and is decayed by a factor of 0.1 at epochs 30, 60 and 90. For all models on ImageNet, one fixed set of the remaining hyperparameters is used. More details can be found in Appendix C.
We compare the performance of MACER with two previous works. The first (Cohen et al., 2019) trains smoothed networks by simply minimizing the cross-entropy loss. The second (Salman et al., 2019) uses adversarial training on smoothed networks to improve robustness. For both baselines, we use checkpoints provided by the authors and report their original numbers whenever available. In addition, we run Cohen et al. (2019)'s method on all tasks, as it is a special case of MACER obtained by turning off the robustness term (setting its weight $\lambda$ to zero) and using a single Gaussian sample per input.
Following previous works, we report the approximated certified test set accuracy, which is the fraction of the test set that can be certified to be robust at radius $r$. However, the approximated certified test set accuracy is a function of the radius $r$, so it is hard to compare two models unless one is uniformly better than the other for all $r$. Hence, we also use the average certified radius (ACR) as a metric: for each test example $(x, y)$ and model $g_\theta$, we estimate the certified radius $CR(g_\theta; x, y)$, and the average certified radius is defined as $\mathrm{ACR} = \frac{1}{|S_{\text{test}}|}\sum_{(x, y) \in S_{\text{test}}} CR(g_\theta; x, y)$, where $S_{\text{test}}$ is the test set. To estimate the certified radius of the test points, we use the source code provided by Cohen et al. (2019).
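For concreteness, the following NumPy sketch shows how both metrics can be computed from per-example certification outputs; the function name and interface are illustrative and are not part of the released certification code.

```python
import numpy as np

def certified_accuracy_and_acr(radii, correct, r_grid):
    """Summarize certification results.

    radii:   array of certified radii CR(g; x, y), one per test point
             (assumed 0.0 when the point is not certified).
    correct: boolean array, whether the smoothed prediction equals the label.
    r_grid:  radii r at which to report approximated certified accuracy.

    Returns the approximated certified test accuracy at each r and the ACR.
    """
    radii = np.where(correct, radii, 0.0)                 # misclassified points contribute 0
    cert_acc = np.array([(correct & (radii >= r)).mean() for r in r_grid])
    acr = radii.mean()
    return cert_acc, acr
```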
We report the results on Cifar-10 and ImageNet in the main body of the paper. Results on MNIST and SVHN can be found in Appendix C.2.
Table 3: Average training time per epoch (seconds), number of training epochs, total training hours, and ACR of the baseline models.

| Dataset  | Model                             | sec/epoch | Epochs | Total hrs | ACR   |
|----------|-----------------------------------|-----------|--------|-----------|-------|
| Cifar-10 | Cohen-0.25 (Cohen et al., 2019)   | 31.4      | 150    | 1.31      | 0.416 |
| Cifar-10 | Salman-0.25 (Salman et al., 2019) | 1990.1    | 150    | 82.92     | 0.538 |
| ImageNet | Cohen-0.25 (Cohen et al., 2019)   | 2154.5    | 90     | 53.86     | 0.470 |
| ImageNet | Salman-0.25 (Salman et al., 2019) | 7723.8    | 90     | 193.10    | 0.528 |
Performance The performance of different models on Cifar-10 is reported in Table 1, and Figure 1 displays the radius-accuracy curves. Note that the area under a radius-accuracy curve is equal to the ACR of the model. First, the plots show that our proposed method consistently achieves significantly higher approximated certified test set accuracy than Cohen et al. (2019). This shows that robust training via maximizing the certified radius is more effective than simply minimizing the cross-entropy classification loss. Second, the performance of our model differs from that of Salman et al. (2019) across radii: for a given $\sigma$, our model achieves higher accuracy than Salman et al. (2019)'s model at some radii and lower accuracy at others. In terms of the average certified radius, our models are better than Salman et al. (2019)'s models in all settings (Salman et al. (2019) release hundreds of models, and we select the model with the largest average certified radius for each noise level as our baseline). For example, for one noise level the ACR of our model is about 3% larger than that of Salman et al. (2019)'s. The gain of our model is relatively smaller at the largest $\sigma$; this is a very large noise level (Cohen et al., 2019) and both models perform poorly there. The ImageNet results are displayed in Table 2 and Figure 2, and the observations are similar. All experimental results show that our proposed algorithm is more effective than previous ones.
Training speed Since MACER does not require an adversarial attack during training, it learns a robust model much faster. Empirically, we compare MACER with Salman et al. (2019) in terms of the average training time per epoch and the total training hours, and list the statistics in Table 3. For a fair comparison, we use the code provided by the original authors (https://github.com/locuslab/smoothing and https://github.com/Hadisalman/smoothing-adversarial) and run all algorithms on the same machine. For Cifar-10 we use one NVIDIA P100 GPU and for ImageNet we use four NVIDIA P100 GPUs. According to our experiments, on ImageNet, MACER achieves ACR = 0.544 in 117.90 hours. In contrast, Salman et al. (2019) only achieves ACR = 0.528 and uses 193.10 hours, which clearly shows that our method is much more efficient.
One might question whether the higher performance of MACER simply comes from training for more epochs than previous methods. In Section C.3 we also run MACER for 150 epochs and compare it with the models in Table 3. The results show that when run for only 150 epochs, MACER still achieves performance comparable to SmoothAdv while being 4 times faster.
5.3 Effect of hyperparameters
In this section, we carefully examine the effect of different hyperparameters in MACER. All experiments are run on Cifar-10 with two different noise levels $\sigma$; Figure 3 shows the results for one of them. All details can be found in Appendix C.4.
Effect of $k$ We draw $k$ Gaussian samples for each input to estimate the expectation in (16). We can see from Figure 3(a) that using more Gaussian samples usually leads to better performance; for example, the radius-accuracy curve for the largest $k$ we tried lies uniformly above that for the smallest.
Effect of $\lambda$ The radius-accuracy curves in Figure 3(b) demonstrate the trade-off effect of $\lambda$. From the figure, we can see that as $\lambda$ increases, the clean accuracy drops while the certified accuracy at large radii increases.
Effect of $\gamma$ $\gamma$ is the margin hyperparameter in the hinge loss. From Figure 3(c) we can see that when $\gamma$ is small, the approximated certified test set accuracy at large radii is small, since the margin "truncates" large certified radii. As $\gamma$ increases, the robust accuracy improves. Thus $\gamma$ also acts as a trade-off between accuracy and robustness, but its effect is not as significant as that of $\lambda$.
Effect of $\beta$ Similar to the finding of Salman et al. (2019) (see its Appendix B), we observe that using a larger inverse temperature $\beta$ produces better results. While Salman et al. (2019) pointed out that a large $\beta$ may make training unstable, we find that if we only apply a large $\beta$ to the robustness loss, we can maintain training stability and achieve a larger average certified radius as well.
6 Conclusion and future work
In this work we propose MACER, an attack-free and scalable robust training method that directly maximizes the certified radius of a smoothed classifier. We discuss the desiderata such an algorithm has to satisfy and provide an approach to each of them. According to our extensive experiments, MACER performs better than previous provable $\ell_2$ defenses and trains faster. Our strong empirical results suggest that adversarial training is not a must for robust training, and that defense based on certification is a promising direction for future research. Moreover, several recent papers (Carmon et al., 2019; Zhai et al., 2019; Stanforth et al., 2019) suggest that using unlabeled data helps improve adversarially robust generalization. We will also extend MACER to the semi-supervised setting.
- Athalye et al. (2018) Anish Athalye, Nicholas Carlini, and David A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. CoRR, abs/1802.00420, 2018. URL http://arxiv.org/abs/1802.00420.
- Biggio et al. (2013) Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 387–402. Springer, 2013.
- Carlini & Wagner (2016) Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644, 2016. URL http://arxiv.org/abs/1608.04644.
- Carmon et al. (2019) Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C Duchi. Unlabeled data improves adversarial robustness. arXiv preprint arXiv:1905.13736, 2019.
- Chen et al. (2017) Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26. ACM, 2017.
- Cheng et al. (2019) Minhao Cheng, Thong Le, Pin-Yu Chen, Jinfeng Yi, Huan Zhang, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. 2019.
- Cohen et al. (2019) Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 1310–1320, Long Beach, California, USA, 09–15 Jun 2019. PMLR.
- Gehr et al. (2018) Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. Ai2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE, 2018.
- Goodfellow et al. (2015) Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015. URL http://arxiv.org/abs/1412.6572.
- Gowal et al. (2018) Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
- Huang et al. (2015) Ruitong Huang, Bing Xu, Dale Schuurmans, and Csaba Szepesvári. Learning with a strong adversary. arXiv preprint arXiv:1511.03034, 2015.
- Kannan et al. (2018) Harini Kannan, Alexey Kurakin, and Ian J. Goodfellow. Adversarial logit pairing. CoRR, abs/1803.06373, 2018. URL http://arxiv.org/abs/1803.06373.
- Lecuyer et al. (2018) Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018.
- Li et al. (2018) Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Second-order adversarial attack and certifiable robustness. CoRR, abs/1809.03113, 2018. URL http://arxiv.org/abs/1809.03113.
- Liu et al. (2018) Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 369–385, 2018.
- Madry et al. (2017) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
- Maurer & Pontil (2009) Andreas Maurer and Massimiliano Pontil. Empirical Bernstein Bounds and Sample Variance Penalization. arXiv e-prints, art. arXiv:0907.3740, Jul 2009.
- Mirman et al. (2018) Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pp. 3575–3583, 2018.
- Moosavi-Dezfooli et al. (2015) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. CoRR, abs/1511.04599, 2015. URL http://arxiv.org/abs/1511.04599.
- Papernot et al. (2016) Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814, 2016.
- Qin et al. (2019) Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Alhussein Fawzi, Soham De, Robert Stanforth, Pushmeet Kohli, et al. Adversarial robustness through local linearization. arXiv preprint arXiv:1907.02610, 2019.
- Rauber et al. (2017) Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox: A python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131, 2017. URL http://arxiv.org/abs/1707.04131.
- Salman et al. (2019) Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya P. Razenshteyn, and Sébastien Bubeck. Provably robust deep learning via adversarially trained smoothed classifiers. CoRR, abs/1906.04584, 2019. URL http://arxiv.org/abs/1906.04584.
- Shafahi et al. (2019) Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John P. Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! CoRR, abs/1904.12843, 2019. URL http://arxiv.org/abs/1904.12843.
- Singh et al. (2018) Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pp. 10802–10813, 2018.
- Stanforth et al. (2019) Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli, et al. Are labels required for improving adversarial robustness? arXiv preprint arXiv:1905.13725, 2019.
- Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013. URL http://arxiv.org/abs/1312.6199.
- Wang et al. (2018) Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. Mixtrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018.
- Weng et al. (2018) Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In Jennifer Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 5276–5285, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR.
- Wong & Kolter (2018) Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Jennifer Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 5286–5295, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR.
- Wong et al. (2018) Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling provable adversarial defenses. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems 31, pp. 8400–8409. Curran Associates, Inc., 2018.
- Xie et al. (2017) Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991, 2017.
- Zhai et al. (2019) Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John E. Hopcroft, and Liwei Wang. Adversarially robust generalization just requires more unlabeled data. CoRR, abs/1906.00555, 2019. URL http://arxiv.org/abs/1906.00555.
- Zhang et al. (2019a) Dinghuai Zhang, Tianyuan Zhang, Yiping Lu, Zhanxing Zhu, and Bin Dong. You only propagate once: Accelerating adversarial training via maximal principle. arXiv preprint arXiv:1905.00877, 2019a.
- Zhang et al. (2019b) Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 7472–7482, Long Beach, California, USA, 09–15 Jun 2019b. PMLR. URL http://proceedings.mlr.press/v97/zhang19p.html.
- Zhang et al. (2018) Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pp. 4939–4948, 2018.
- Zhang et al. (2019c) Huan Zhang, Hongge Chen, Chaowei Xiao, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards stable and efficient training of verifiably robust neural networks. arXiv preprint arXiv:1906.06316, 2019c.
Appendix A Soft randomized smoothing
In this section we provide theoretical analysis and certification procedures for Soft-RS.
A.1 Proof of Theorem 2
Our proof is based on the following lemma:
For any measurable function $h: \mathbb{R}^d \to [0, 1]$, define $\bar{h}(x) = \mathbb{E}_{\eta \sim \mathcal{N}(0, \sigma^2 I)}\, h(x + \eta)$; then the map $x \mapsto \Phi^{-1}\big(\bar{h}(x)\big)$ is $\frac{1}{\sigma}$-Lipschitz.
This lemma is the generalized version of Lemma 2 in Salman et al. (2019).
Proof of Theorem 2. Let $\tilde{y} = \arg\max_{c \neq y} g^s_\theta(x)_c$ denote the runner-up class. For any class $c$, define $H_c(x') = g^s_\theta(x')_c = \mathbb{E}_{\eta \sim \mathcal{N}(0, \sigma^2 I)}\, z^c_\theta(x' + \eta)$. Because $z^c_\theta$ takes values in $[0, 1]$, by Lemma 1 the map $x' \mapsto \Phi^{-1}(H_c(x'))$ is $\frac{1}{\sigma}$-Lipschitz. Thus, writing $CR = CR(g^s_\theta; x, y)$, for any $x'$ such that $\|x' - x\|_2 \le CR$ and any $c \neq y$:
$$ \Phi^{-1}(H_y(x')) \;\ge\; \Phi^{-1}(H_y(x)) - \frac{CR}{\sigma} \;=\; \frac{\Phi^{-1}(H_y(x)) + \Phi^{-1}(H_{\tilde{y}}(x))}{2} \;\ge\; \Phi^{-1}(H_c(x)) + \frac{CR}{\sigma} \;\ge\; \Phi^{-1}(H_c(x')). $$
Therefore, $\Phi^{-1}(H_y(x')) \ge \Phi^{-1}(H_c(x'))$ for all $c \neq y$. Due to the monotonicity of $\Phi$, we have $H_y(x') \ge H_c(x')$ for all $c \neq y$, which implies that $g^s_\theta$ still predicts $y$ at $x'$. ∎
A.2 Soft-RS certification procedure
Let $\hat{y} = g^s_\theta(x)$ be the predicted class and $p = g^s_\theta(x)_{\hat{y}} = \mathbb{E}_{\eta}\, z^{\hat{y}}_\theta(x + \eta)$ its smoothed score. If there exists $\underline{p} > \frac{1}{2}$ such that $p \ge \underline{p}$ with probability at least $1 - \alpha$, then with probability at least $1 - \alpha$, $g^s_\theta$ is certifiably robust at $x$. Meanwhile, $\max_{c \neq \hat{y}} g^s_\theta(x)_c \le 1 - p \le 1 - \underline{p}$, so we can take the certified radius to be $\frac{\sigma}{2}\big(\Phi^{-1}(\underline{p}) - \Phi^{-1}(1 - \underline{p})\big) = \sigma\, \Phi^{-1}(\underline{p})$. The problem thus reduces to finding a $1 - \alpha$ confidence lower bound $\underline{p}$ of $p$. Here we provide two bounds:
Hoeffding Bound
The random variable $Z = z^{\hat{y}}_\theta(x + \eta)$, $\eta \sim \mathcal{N}(0, \sigma^2 I)$, has mean $p$, and $Z_1, \ldots, Z_k$ are its i.i.d. observations. Because $Z \in [0, 1]$ for any $\eta$, we can use Hoeffding's inequality to obtain a lower confidence bound:
(Hoeffding’s Inequality) Let be independent random variables bounded by the interval . Let , then for any
Denote . By Hoeffding’s inequality we have
Hence, a confidence lower bound of is
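Explicitly, under the $[0, 1]$-boundedness assumption above, the resulting bound is
$$ \underline{p}_{\mathrm{Hoeffding}} \;=\; \bar{Z} \;-\; \sqrt{\frac{\ln(1/\alpha)}{2k}}. $$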
Empirical Bernstein Bound
Maurer & Pontil (2009) provide a tighter bound that takes the sample variance into account. Consequently, a $1 - \alpha$ confidence lower bound of $p$ can be obtained from their empirical Bernstein inequality.
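In its standard form for $[0, 1]$-valued observations, with $V_k$ denoting the sample variance of $Z_1, \ldots, Z_k$ (the constants below follow Maurer & Pontil (2009)):
$$ \underline{p}_{\mathrm{Bernstein}} \;=\; \bar{Z} \;-\; \sqrt{\frac{2\, V_k \ln(2/\alpha)}{k}} \;-\; \frac{7 \ln(2/\alpha)}{3(k - 1)}. $$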
The full certification procedure with the above two bounds is described in Algorithm 2.