MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

01/08/2020 · Runtian Zhai et al. · Microsoft, Carnegie Mellon University, Peking University

Adversarial training is one of the most popular ways to learn robust models, but it is usually attack-dependent and time-costly. In this paper, we propose the MACER algorithm, which learns robust models without using adversarial training but performs better than all existing provable $\ell_2$-defenses. Recent work shows that randomized smoothing can be used to provide a certified $\ell_2$ radius to smoothed classifiers, and our algorithm trains provably robust smoothed classifiers via MAximizing the CErtified Radius (MACER). The attack-free characteristic makes MACER faster to train and easier to optimize. In our experiments, we show that our method can be applied to modern deep neural networks on a wide range of datasets, including Cifar-10, ImageNet, MNIST, and SVHN. For all tasks, MACER spends less training time than state-of-the-art adversarial training algorithms, and the learned models achieve a larger average certified radius.


1 Introduction

Modern neural network classifiers are able to achieve very high accuracy on image classification tasks but are sensitive to small, adversarially chosen perturbations to the inputs (Szegedy et al., 2013; Biggio et al., 2013). Given an image $x$ that is correctly classified by a neural network, a malicious attacker may find a small adversarial perturbation $\delta$ such that the perturbed image $x + \delta$, though visually indistinguishable from the original image, is assigned to a wrong class with high confidence by the network. Such vulnerability creates security concerns in many real-world applications.

Researchers have proposed a variety of defense methods to improve the robustness of neural networks. Most of the existing defenses are based on adversarial training (Szegedy et al., 2013; Madry et al., 2017; Goodfellow et al., 2015; Huang et al., 2015; Athalye et al., 2018). During training, these methods first generate on-the-fly adversarial examples of the inputs with multiple attack iterations and then update model parameters using these perturbed samples together with the original labels. However, such approaches depend on a particular (class of) attack method, so there is no formal guarantee that the resulting model is also robust against other attacks. Moreover, attack iterations are usually quite expensive, and as a result adversarial training runs very slowly.

Another line of algorithms trains robust models by maximizing the certified radius provided by robust certification methods (Weng et al., 2018; Wong & Kolter, 2018; Zhang et al., 2018; Mirman et al., 2018; Wang et al., 2018; Gowal et al., 2018; Zhang et al., 2019c). Using linear or convex relaxations of fully connected ReLU networks, a robust certification method computes a "safe radius" $r$ for a classifier at a given input, such that at any point within the radius-$r$ ball around the input, the classifier is guaranteed to have unchanged predictions. However, the certification methods are usually computationally expensive and can only handle shallow neural networks with ReLU activations, so these training algorithms have trouble scaling to modern networks.

In this work, we propose an attack-free and scalable method to train robust deep neural networks. We mainly leverage the recent randomized smoothing technique (Cohen et al., 2019). A smoothed classifier $g$ for an arbitrary base classifier $f$ is defined as $g(x) = \arg\max_{c} \mathbb{P}_{\eta}(f(x+\eta) = c)$, in which $\eta \sim \mathcal{N}(0, \sigma^2 I)$. While Cohen et al. (2019) derived how to analytically compute the certified radius of the smoothed classifier $g$, they did not show how to maximize that radius to make the classifier robust. Salman et al. (2019) proposed SmoothAdv to improve the robustness of $g$, but it still relies on expensive attack iterations. Instead of adversarial training, we propose to learn robust models by directly taking the certified radius into the objective. We outline a few challenging desiderata any practical instantiation of this idea has to satisfy, and provide approaches to address each of them in turn. A discussion of these desiderata, as well as a detailed implementation of our approach, is provided in Section 4. As we show both theoretically and empirically, our method is numerically stable and accounts for both classification accuracy and robustness.

Our contributions are summarized as follows:

  • We propose an attack-free and scalable robust training algorithm by MAximizing the CErtified Radius (MACER). MACER has the following advantages compared to previous works:

    • Different from adversarial training, we train robust models by directly maximizing the certified radius without specifying any attack strategies, and the learned model can achieve provable robustness against any possible attack in the certified region. Additionally, by avoiding time-consuming attack iterations, our proposed algorithm runs much faster than adversarial training.

    • Different from other methods (Wong & Kolter, 2018) that maximize the certified radius but are not scalable to deep neural networks, our method can be applied to architectures of any size. This makes our algorithm more practical in real scenarios.

  • We empirically evaluate our proposed method through extensive experiments on Cifar-10, ImageNet, MNIST, and SVHN. On all tasks, MACER achieves better performance than state-of-the-art algorithms. MACER is also exceptionally fast. For example, on ImageNet, MACER uses 39% less training time than adversarial training but still performs better.

2 Related work

Neural networks trained by standard SGD are not robust – a small and human-imperceptible perturbation can easily change the prediction of a network. In the white-box setting, methods have been proposed to construct adversarial examples with small $\ell_\infty$ or $\ell_2$ perturbations (Goodfellow et al., 2015; Madry et al., 2017; Carlini & Wagner, 2016; Moosavi-Dezfooli et al., 2015). Furthermore, even in the black-box setting where the adversary does not have access to the model structure and parameters, adversarial examples can be found by either transfer attack (Papernot et al., 2016) or optimization-based approaches (Chen et al., 2017; Rauber et al., 2017; Cheng et al., 2019). It is thus important to study how to improve the robustness of neural networks against adversarial examples.

Adversarial training   So far, adversarial training has been the most successful robust training method according to many recent studies. Adversarial training was first proposed in Szegedy et al. (2013) and Goodfellow et al. (2015), where they showed that adding adversarial examples to the training set can improve the robustness against such attacks. More recently, Madry et al. (2017) showed that adversarial training can be formulated as a min-max optimization problem and demonstrated that adversarial training with PGD attack can lead to very robust models empirically. Zhang et al. (2019b) further proposed to decompose robust error as the sum of natural error and boundary error to achieve better performance. Although models obtained by adversarial training empirically achieve good performance, they do not have certified error guarantees.

Despite the popularity of PGD-based adversarial training, one major issue is that it is very slow. Some recent papers propose methods to accelerate adversarial training. For example, Free-m (Shafahi et al., 2019) replays an adversarial example several times in one iteration, YOPO-m-n (Zhang et al., 2019a) restricts back propagation in PGD within the first layer, and Qin et al. (2019) estimates the adversary with local linearization.

Robustness certification and provable defense   Many defense algorithms proposed in the past few years were claimed to be effective, but Athalye et al. (2018) showed that most of them are based on “gradient masking” and can be bypassed by more carefully designed attacks. It is thus important to study how to measure the provable robustness of a network. A robustness certification algorithm takes a classifier $f$ and an input point $x$ as inputs, and outputs a “safe radius” $r$ such that for any $x'$ subject to $\|x' - x\| \le r$, $f(x') = f(x)$. Several algorithms have been proposed recently, including the convex polytope technique (Wong & Kolter, 2018), abstract interpretation methods (Singh et al., 2018; Gehr et al., 2018) and recursive propagation algorithms (Weng et al., 2018; Zhang et al., 2018). These methods can provide attack-agnostic robust error lower bounds. Moreover, to achieve networks with nontrivial certified robust error, one can train a network by minimizing the certified robust error computed by the above-mentioned methods, and several such algorithms have been proposed in the past year (Wong & Kolter, 2018; Wong et al., 2018; Wang et al., 2018; Gowal et al., 2018; Zhang et al., 2019c; Mirman et al., 2018). Unfortunately, they can only be applied to shallow networks with limited activation functions and run very slowly.

More recently, researchers found a new class of certification methods called randomized smoothing. The idea of randomization has been used for defense in several previous works (Xie et al., 2017; Liu et al., 2018) but without any certification. Later on, Lecuyer et al. (2018) first showed that if Gaussian random noise is added to the input or any intermediate layer, a certified guarantee on small perturbations can be computed via differential privacy. Li et al. (2018) and Cohen et al. (2019) then provided improved ways to compute the certified robust error for Gaussian smoothed models. In this paper, we propose a new algorithm to train on these certified error bounds to significantly reduce the certified error and achieve better provable adversarial robustness.

3 Preliminaries

Problem setup

Consider a standard classification task with an underlying data distribution $p_{data}$ over pairs of examples $x \in \mathcal{X}$ and corresponding labels $y \in \mathcal{Y}$. Usually $p_{data}$ is unknown and we can only access a training set $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ in which each $(x_i, y_i)$ is i.i.d. drawn from $p_{data}$. The empirical data distribution (the uniform distribution over $S$) is denoted by $\hat{p}_{data}$. Let $f$ be the classifier of interest that maps any $x \in \mathcal{X}$ to $\mathcal{Y}$. Usually $f$ is parameterized by a set of parameters $\theta$, so we also write it as $f_\theta$.

We call $x'$ an adversarial example of $x$ to classifier $f_\theta$ if $f_\theta$ correctly classifies $x$ but assigns a different label to $x'$. Following many previous works (Cohen et al., 2019; Salman et al., 2019), we focus on the setting where $x'$ satisfies the $\ell_2$ norm constraint $\|x' - x\|_2 \le \epsilon$. We say that the model $f_\theta$ is $\ell_2$ $\epsilon$-robust at $(x, y)$ if it correctly classifies $x$ as $y$ and, for any $\|x' - x\|_2 \le \epsilon$, also classifies $x'$ as $y$. In the problem of robust classification, our ultimate goal is to find a model that is $\epsilon$-robust at $(x, y)$ with high probability over $(x, y) \sim p_{data}$ for a given $\epsilon > 0$.

Neural network

In image classification we often use deep neural networks. Let $u_\theta: \mathcal{X} \to \mathbb{R}^K$ be a neural network, whose output at input $x$ is a vector $u_\theta(x) = (u_\theta^1(x), \ldots, u_\theta^K(x))$, where $K = |\mathcal{Y}|$ is the number of classes. The classifier induced by $u_\theta$ is $f_\theta(x) = \arg\max_{c \in \mathcal{Y}} u_\theta^c(x)$.

In order to train $u_\theta$ by minimizing a loss function such as cross entropy, we always use a softmax layer on $u_\theta$ to normalize it into a probability distribution. The resulting network is $z_\theta: \mathcal{X} \to \Delta^{K-1}$ (where $\Delta^{K-1}$ is the probability simplex in $\mathbb{R}^K$), which is given by $z_\theta^c(x) = \exp(\beta u_\theta^c(x)) / \sum_{c'} \exp(\beta u_\theta^{c'}(x))$ for $c \in \mathcal{Y}$, where $\beta$ is the inverse temperature. For simplicity, we will use $z_\theta(x)$ to refer to $z_\theta(x; \beta)$ when the meaning is clear from context. The vector $z_\theta(x)$ is commonly regarded as the “likelihood vector”, and $z_\theta^c(x)$ measures how likely input $x$ belongs to class $c$.
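For concreteness, the following minimal sketch (our own; the function and variable names are not from the paper) computes $z_\theta(x)$ from the logits $u_\theta(x)$ with inverse temperature $\beta$:

```python
import numpy as np

def soft_output(logits: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax: z^c(x) = exp(beta*u^c(x)) / sum_c' exp(beta*u^c'(x))."""
    shifted = beta * logits - np.max(beta * logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()
```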

Robust radius

By definition, the $\ell_2$-robustness of $f_\theta$ at a data point $(x, y)$ depends on the radius of the largest $\ell_2$ ball centered at $x$ in which $f_\theta$ does not change its prediction. This radius is called the robust radius, which is formally defined as

$R(f_\theta; x, y) := \begin{cases} \inf_{f_\theta(x') \ne y} \|x' - x\|_2, & \text{if } f_\theta(x) = y \\ 0, & \text{otherwise} \end{cases}$   (1)

Recall that our ultimate goal is to train a classifier which is $\epsilon$-robust at $(x, y)$ with high probability over the sampling of $(x, y)$. Mathematically the goal can be expressed as minimizing the expectation of the 0/1 robust classification error. The error is defined as

$\ell_\epsilon^{0/1}(f_\theta; x, y) := \mathbf{1}\{R(f_\theta; x, y) < \epsilon\}$   (2)

and the goal is to minimize its expectation over the population

$\min_\theta \; \mathbb{E}_{(x, y) \sim p_{data}} \, \ell_\epsilon^{0/1}(f_\theta; x, y)$   (3)

It is thus quite natural to improve model robustness by maximizing the robust radius. Unfortunately, computing the robust radius (1) of a classifier induced by a deep neural network is very difficult. Weng et al. (2018) showed that computing the $\ell_1$ robust radius of a deep neural network is NP-hard. Although there is no such result for the $\ell_2$ radius yet, it is very likely that computing the $\ell_2$ robust radius is also NP-hard.

Certified radius

Many previous works proposed certification methods that seek to derive a tight lower bound of $R(f_\theta; x, y)$ for neural networks (see Section 2 for related work). We call this lower bound the certified radius and denote it by $CR(f_\theta; x, y)$. The certified radius satisfies $0 \le CR(f_\theta; x, y) \le R(f_\theta; x, y)$ for any $(x, y)$.

The certified radius leads to a guaranteed upper bound of the 0/1 robust classification error, which is called the 0/1 certified robust error. The 0/1 certified robust error of classifier $f_\theta$ on sample $(x, y)$ is defined as

$\hat{\ell}_\epsilon^{0/1}(f_\theta; x, y) := \mathbf{1}\{CR(f_\theta; x, y) < \epsilon\}$   (4)

i.e. a sample is counted as correct only if its certified radius reaches $\epsilon$. The expectation of the certified robust error over $(x, y) \sim p_{data}$ serves as a performance metric of provable robustness:

$\mathbb{E}_{(x, y) \sim p_{data}} \, \hat{\ell}_\epsilon^{0/1}(f_\theta; x, y)$   (5)

Recall that $CR(f_\theta; x, y)$ is a lower bound of the true robust radius, which immediately implies that $\hat{\ell}_\epsilon^{0/1}(f_\theta; x, y) \ge \ell_\epsilon^{0/1}(f_\theta; x, y)$. Therefore, a small 0/1 certified robust error leads to a small 0/1 robust classification error.

Randomized smoothing

In this work, we use the recent randomized smoothing technique (Cohen et al., 2019), which is scalable to any architecture, to obtain the certified radius of smoothed deep neural networks. The key part of randomized smoothing is to use the smoothed version of $f_\theta$, which is denoted by $g_\theta$, to make predictions. The smoothed classifier $g_\theta$ is defined as follows.

Definition 1.

For an arbitrary classifier $f_\theta$ and $\sigma > 0$, the smoothed classifier $g_\theta$ of $f_\theta$ is defined as

$g_\theta(x) := \arg\max_{c \in \mathcal{Y}} \; \mathbb{P}_{\eta \sim \mathcal{N}(0, \sigma^2 I)}\big(f_\theta(x + \eta) = c\big)$   (6)

In short, the smoothed classifier $g_\theta$ returns the label most likely to be returned by $f_\theta$ when its input is sampled from a Gaussian distribution $\mathcal{N}(x, \sigma^2 I)$ centered at $x$. Cohen et al. (2019) prove the following theorem, which provides an analytic form of the certified radius:

Theorem 1.

(Cohen et al., 2019) Let $f_\theta$ be any classifier, $\sigma > 0$, and $\eta \sim \mathcal{N}(0, \sigma^2 I)$. Let the smoothed classifier $g_\theta$ be defined as in (6). Let the ground truth of an input $x$ be $y$. If $g_\theta$ classifies $x$ correctly, i.e.

$\mathbb{P}_\eta\big(f_\theta(x + \eta) = y\big) \ge \max_{y' \ne y} \; \mathbb{P}_\eta\big(f_\theta(x + \eta) = y'\big)$,   (7)

then $g_\theta$ is provably robust at $x$, with the certified radius given by

$CR(g_\theta; x, y) = \frac{\sigma}{2}\Big[\Phi^{-1}\big(\mathbb{P}_\eta(f_\theta(x + \eta) = y)\big) - \Phi^{-1}\big(\max_{y' \ne y} \mathbb{P}_\eta(f_\theta(x + \eta) = y')\big)\Big]$   (8)

where $\Phi$ is the c.d.f. of the standard Gaussian distribution.
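As a rough numerical illustration of (8), the sketch below plugs Monte Carlo class frequencies straight into the formula; it is only an illustration and omits the confidence-bound correction used in Cohen et al. (2019)'s actual certification procedure (all names here are ours):

```python
import numpy as np
from scipy.stats import norm

def certified_radius_hard_rs(class_counts: np.ndarray, true_label: int, sigma: float) -> float:
    """Plug-in estimate of the radius in (8) from Monte Carlo counts of f(x + eta).

    class_counts[c] is the number of noisy samples classified as c. Returns 0 if the
    smoothed prediction is wrong. A real certification replaces the plug-in
    probabilities with high-confidence bounds (e.g. Clopper-Pearson intervals).
    """
    probs = class_counts / class_counts.sum()
    p_top = probs[true_label]
    p_runner_up = np.max(np.delete(probs, true_label))
    if p_top <= p_runner_up:
        return 0.0
    return sigma / 2.0 * (norm.ppf(p_top) - norm.ppf(p_runner_up))
```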

4 Robust training via maximizing the certified radius

As we can see from Theorem 1, the value of the certified radius can be estimated by repeatedly sampling Gaussian noise. More importantly, it can be computed for any deep neural network. This motivates us to design a training method that maximizes the certified radius and learns robust models.

To minimize the 0/1 robust classification error in (3) or the 0/1 certified robust error in (5), many previous works (Zhang et al., 2019b; Zhai et al., 2019) proposed to first decompose the error. Note that a classifier has a positive 0/1 certified robust error on sample $(x, y)$ if and only if exactly one of the following two cases happens:

  • $f_\theta(x) \ne y$, i.e. the classifier misclassifies $x$.

  • $f_\theta(x) = y$ but $CR(f_\theta; x, y) < \epsilon$, i.e. the classifier is correct but not robust enough.

Thus, the 0/1 certified robust error can be decomposed as the sum of two error terms: a 0/1 classification error and a 0/1 robustness error:

$\hat{\ell}_\epsilon^{0/1}(f_\theta; x, y) = \underbrace{\mathbf{1}\{f_\theta(x) \ne y\}}_{\text{0/1 classification error}} + \underbrace{\mathbf{1}\{f_\theta(x) = y,\; CR(f_\theta; x, y) < \epsilon\}}_{\text{0/1 robustness error}}$   (9)

4.1 Desiderata for objective functions

Minimizing the 0/1 error directly is intractable. A classic workaround is to minimize a surrogate loss instead. The surrogate loss for the 0/1 classification error is called the classification loss and denoted by $l_C$. The surrogate loss for the 0/1 robustness error is called the robustness loss and denoted by $l_R$. Our final objective function is

$l(\theta; x, y) := l_C(\theta; x, y) + l_R(\theta; x, y)$   (10)

We would like our loss functions and to satisfy some favorable conditions. These conditions are summarized below as (C1) - (C3):

  • (C1) (Surrogate condition): Each surrogate loss should be an upper bound of the original error function, i.e. $l_C$ and $l_R$ should be upper bounds of the 0/1 classification error and the 0/1 robustness error, respectively.

  • (C2) (Differentiability): $l_C$ and $l_R$ should be (sub-)differentiable with respect to $\theta$.

  • (C3) (Numerical stability): The computation of $l_C$ and $l_R$ and their (sub-)gradients with respect to $\theta$ should be numerically stable.

The surrogate condition (C1) ensures that $l$ itself meets the surrogate condition, i.e.

$\hat{\ell}_\epsilon^{0/1}(f_\theta; x, y) \le l(\theta; x, y)$   (11)

Conditions (C2) and (C3) ensure that (10) can be stably minimized with first order methods.

4.2 Surrogate losses (for Condition C1)

We next discuss choices of the surrogate losses that ensure we satisfy condition (C1). The classification surrogate loss is relatively easy to design. There are many widely used loss functions from which we can choose, and in this work we choose the cross-entropy loss of the smoothed output as the classification loss:

$l_C(\theta; x, y) := -\log \mathbb{E}_{\eta \sim \mathcal{N}(0, \sigma^2 I)} \, z_\theta^y(x + \eta)$   (12)

For the robustness surrogate loss, we choose the hinge loss on the certified radius:

$l_R(\theta; x, y) := \max\{0,\; \gamma - CR(g_\theta; x, y)\} \cdot \mathbf{1}\{g_\theta(x) = y\}$   (13)

where $\gamma > 0$ is a hinge factor and the indicator $\mathbf{1}\{g_\theta(x) = y\}$ restricts the loss to correctly classified samples. We use the hinge loss because not only does it satisfy the surrogate condition, but it is also numerically stable, which we will discuss in Section 4.4.

4.3 Differentiable certified radius via soft randomized smoothing (for Condition C2)

The classification surrogate loss in (12) is differentiable with respect to $\theta$, but the differentiability of the robustness surrogate loss in (13) requires the certified radius itself to be differentiable. In this section we show that the randomized smoothing certified radius in (8) does not meet condition (C2), and accordingly, we introduce soft randomized smoothing to solve this problem.

Whether the certified radius (8) is sub-differentiable with respect to $\theta$ boils down to the differentiability of $\mathbb{P}_\eta(f_\theta(x + \eta) = c)$. Theoretically, this expectation is indeed differentiable. However, from a practical point of view, the expectation needs to be estimated by Monte Carlo sampling $\frac{1}{k}\sum_{j=1}^{k}\mathbf{1}\{f_\theta(x + \eta_j) = c\}$, where $\eta_1, \ldots, \eta_k$ are i.i.d. Gaussian noise samples and $k$ is the number of samples. This estimate, which is a sum of indicator functions, is not differentiable. Hence, condition (C2) is still not met from the algorithmic perspective.

To tackle this problem, we leverage soft randomized smoothing (Soft-RS). In contrast to the original version of randomized smoothing (Hard-RS), Soft-RS is applied to the network $z_\theta$ whose last layer is softmax. The soft smoothed classifier is defined as follows.

Definition 2.

For a neural network $z_\theta$ whose last layer is softmax and $\sigma > 0$, the soft smoothed classifier $\tilde{g}_\theta$ of $z_\theta$ is defined as

$\tilde{g}_\theta(x) := \arg\max_{c \in \mathcal{Y}} \; \mathbb{E}_{\eta \sim \mathcal{N}(0, \sigma^2 I)} \, z_\theta^c(x + \eta)$   (14)

Using Lemma 2 in Salman et al. (2019), we prove the following theorem in Appendix A:

Theorem 2.

Let the ground truth of an input $x$ be $y$. If $\tilde{g}_\theta$ classifies $x$ correctly, i.e.

$\mathbb{E}_\eta \, z_\theta^y(x + \eta) \ge \max_{y' \ne y} \; \mathbb{E}_\eta \, z_\theta^{y'}(x + \eta)$,   (15)

then $\tilde{g}_\theta$ is provably robust at $x$, with the certified radius given by

$CR(\tilde{g}_\theta; x, y) = \frac{\sigma}{2}\Big[\Phi^{-1}\big(\mathbb{E}_\eta \, z_\theta^y(x + \eta)\big) - \Phi^{-1}\big(\max_{y' \ne y} \mathbb{E}_\eta \, z_\theta^{y'}(x + \eta)\big)\Big]$   (16)

where $\Phi$ is the c.d.f. of the standard Gaussian distribution.

We note that in Salman et al. (2019) (see its Appendix B), a similar technique was introduced to overcome non-differentiability when creating adversarial examples for a smoothed classifier. Different from their work, our method uses Soft-RS to obtain a certified radius that is differentiable in practice. The certified radius given by soft randomized smoothing meets condition (C2) in the algorithmic design: even if we use Monte Carlo sampling to estimate the expectation, (16) is still sub-differentiable with respect to $\theta$ as long as $z_\theta$ is sub-differentiable with respect to $\theta$.

Connection between Soft-RS and Hard-RS

We highlight two main properties of Soft-RS. First, it is a differentiable approximation of the original Hard-RS. To see this, note that as $\beta \to \infty$, $z_\theta(x)$ converges to a one-hot vector on the predicted class, so $\tilde{g}_\theta$ converges to $g_\theta$ almost everywhere. Consequently, the Soft-RS certified radius (16) converges to the Hard-RS certified radius (8) almost everywhere as $\beta$ goes to infinity. Second, Soft-RS itself provides an alternative way to get a provable robustness guarantee. In Appendix A, we provide Soft-RS certification procedures that certify $\tilde{g}_\theta$ with the Hoeffding bound or the empirical Bernstein bound.
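The following PyTorch-style sketch illustrates how (16) can be estimated differentiably from $k$ noisy forward passes. It is our own illustration, not the released implementation: `model` is a hypothetical network returning logits for a batch, `x` is a single example, `y` an integer label, and the probability clipping is purely for numerical safety.

```python
import torch
from torch.distributions import Normal

def soft_rs_radius(model, x, y, sigma, k, beta):
    """Monte Carlo estimate of the Soft-RS radius in (16); it is differentiable
    because the indicator of Hard-RS is replaced by the softmax output z_theta."""
    noise = torch.randn(k, *x.shape, device=x.device) * sigma
    z = torch.softmax(beta * model(x.unsqueeze(0) + noise), dim=-1)  # (k, num_classes)
    z_bar = z.mean(dim=0).clamp(1e-6, 1 - 1e-6)                      # ~ E_eta z_theta(x + eta)
    p_true = z_bar[y]
    others = z_bar.clone()
    others[y] = 0.0                                                  # mask the true class...
    p_runner_up = others.max()                                       # ...and take the runner-up
    icdf = Normal(0.0, 1.0).icdf                                     # Phi^{-1}
    return sigma / 2.0 * (icdf(p_true) - icdf(p_runner_up))
```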

4.4 Numerical Stability (for Condition C3)

In this section, we address the numerical stability condition (C3). While Soft-RS does provide us with a differentiable certified radius (16) which we could maximize with first-order optimization methods, directly optimizing (16) suffers from exploding gradients. The problem stems from the inverse cumulative density function $\Phi^{-1}$, whose derivative is huge when its argument is close to 0 or 1.

Fortunately, by minimizing the robustness loss (13) instead, we can maximize the robust radius free from exploding gradients. The hinge loss restricts that samples with non-zero robustness loss must satisfy $CR(\tilde{g}_\theta; x, y) < \gamma$, which is equivalent to $\Phi^{-1}(p_A) - \Phi^{-1}(p_B) < 2\gamma/\sigma$, where $p_A := \mathbb{E}_\eta z_\theta^y(x + \eta)$ and $p_B := \max_{y' \ne y} \mathbb{E}_\eta z_\theta^{y'}(x + \eta)$. Under this restriction, the derivative of $\Phi^{-1}(p_A) - \Phi^{-1}(p_B)$ is always bounded, as shown in the following proposition. The proof can be found in Appendix B.

Proposition 1.

Given any $p_A, p_B \in (0, 1)$ that satisfy the restriction above, and letting $\xi := \Phi^{-1}(p_A) - \Phi^{-1}(p_B)$, the derivative of $\xi$ with respect to $p_A$ and $p_B$ is bounded.

4.5 Complete implementation

We are now ready to present the complete MACER algorithm. Expectations over Gaussian noise are approximated with Monte Carlo sampling. Let $\eta_1, \ldots, \eta_k$ be i.i.d. samples from $\mathcal{N}(0, \sigma^2 I)$, and let $\hat{z}_\theta(x) := \frac{1}{k}\sum_{j=1}^{k} z_\theta(x + \eta_j)$ be the empirical expectation of $z_\theta(x + \eta)$. The final objective function is

$L(\theta) := \mathbb{E}_{(x, y) \sim \hat{p}_{data}} \big[\, l_C(\theta; x, y) + \lambda \, l_R(\theta; x, y) \,\big]$   (17)

where both losses are computed with $\hat{z}_\theta(x)$ in place of $\mathbb{E}_\eta z_\theta(x + \eta)$ and $\lambda > 0$ is a trade-off factor. During training we minimize $L(\theta)$. The detailed implementation is described in Algorithm 1. To simplify the implementation, we choose the hinge factor $\gamma$ to be a hyperparameter instead of tying it to a particular target radius $\epsilon$. The inverse temperature $\beta$ of the softmax is also a hyperparameter.

1:Input: Training set $S$, noise level $\sigma$, number of Gaussian samples $k$, trade-off factor $\lambda$, hinge factor $\gamma$, inverse temperature $\beta$, model parameters $\theta$
2:for each iteration do
3:     Sample a minibatch $(x_1, y_1), \ldots, (x_B, y_B)$ from $S$
4:     For each $x_i$, sample $k$ i.i.d. Gaussian samples $\eta_{i1}, \ldots, \eta_{ik} \sim \mathcal{N}(0, \sigma^2 I)$
5:     Compute the empirical expectations: $\hat{z}_\theta(x_i) = \frac{1}{k}\sum_{j=1}^{k} z_\theta(x_i + \eta_{ij})$ for each $i$
6:     Compute the classification loss $l_C(\theta; x_i, y_i)$ from $\hat{z}_\theta(x_i)$ as in (12)
7:     For each $(x_i, y_i)$, compute the certified radius $CR(\tilde{g}_\theta; x_i, y_i)$ from $\hat{z}_\theta(x_i)$ as in (16)
8:     For each $(x_i, y_i)$, compute the robustness loss $l_R(\theta; x_i, y_i)$ as in (13)
9:     Update $\theta$ with one step of any first-order optimization method to minimize the minibatch average of $l_C + \lambda l_R$
10:end for
Algorithm 1 MACER: robust training via MAximizing CErtified Radius
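Below is a hedged PyTorch-style sketch of one MACER minibatch step. It reflects our reading of Algorithm 1 and the losses (12), (13) and (16); `model` is assumed to return logits for a batch, all hyperparameters are passed in explicitly, and this is not the authors' released implementation.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Normal

def macer_step(model, optimizer, x, y, sigma, k, lam, gamma, beta):
    """One MACER minibatch update (our sketch): cross-entropy on the averaged
    softmax plus a hinge penalty on the Soft-RS radius, both from k noisy copies."""
    b = x.size(0)
    noise = torch.randn(k, *x.shape, device=x.device) * sigma
    noisy = (x.unsqueeze(0) + noise).reshape(k * b, *x.shape[1:])
    z = F.softmax(beta * model(noisy), dim=-1).reshape(k, b, -1).mean(dim=0)
    z = z.clamp(1e-6, 1 - 1e-6)                       # keep Phi^{-1} away from +/- infinity

    loss_cls = F.nll_loss(torch.log(z), y)            # classification loss (12)

    p_true = z.gather(1, y.unsqueeze(1)).squeeze(1)   # averaged probability of the true class
    p_other = z.scatter(1, y.unsqueeze(1), 0.0)       # zero out the true class...
    p_runner_up = p_other.max(dim=1).values           # ...then take the runner-up probability
    icdf = Normal(0.0, 1.0).icdf
    radius = sigma / 2.0 * (icdf(p_true) - icdf(p_runner_up))   # Soft-RS radius (16)

    correct = (z.argmax(dim=1) == y).float()          # hinge only on correctly classified inputs
    loss_rob = (correct * F.relu(gamma - radius)).mean()        # robustness loss (13)

    loss = loss_cls + lam * loss_rob                  # objective (17)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

Because the adversarial inner loop is gone, each step costs only the $k$ noisy forward passes plus one backward pass, which is where the speed-up over adversarial training comes from.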
Comparison to adversarial training

Adversarial training defines the problem as a mini-max game and solves it by optimizing the inner loop (attack generation) and the outer loop (model update) iteratively. In our method, we only have a single loop (model update). As a result, our proposed algorithm can run much faster than adversarial training because it does not require additional back propagations to generate adversarial examples.

Comparison to previous work

The overall objective function of our method, a linear combination of a classification loss and a robustness loss, is similar to those of adversarial logit pairing (ALP) (Kannan et al., 2018) and TRADES (Zhang et al., 2019b). In MACER, the trade-off factor $\lambda$ in the objective function (17) can also be viewed as balancing accuracy and robustness. However, the robustness term of MACER does not depend on any particular adversarial example, which makes it substantially different from ALP and TRADES.

5 Experiments

In this section, we empirically evaluate our proposed MACER algorithm on a wide range of tasks. We also study the influence of different hyperparameters in MACER on the final model performance.

5.1 Setup

To fairly compare with previous works, we follow Cohen et al. (2019) and Salman et al. (2019) and use LeNet for MNIST, ResNet-110 for Cifar-10 and SVHN, and ResNet-50 for ImageNet.

MACER Training

For Cifar-10, MNIST and SVHN, we train the models for 440 epochs using our proposed algorithm. The learning rate is initialized to 0.01 and is decayed by 0.1 at epochs 200 and 400. For all these models, we use fixed values of $k$, $\gamma$ and $\beta$. The value of $\lambda$ trades off accuracy against robustness, and we find that different $\lambda$ leads to different robust accuracy when the model is injected with different levels ($\sigma$) of noise, so we set $\lambda$ separately for each $\sigma$. For ImageNet, we train the models for 120 epochs. The initial learning rate is set to 0.1 and is decayed by 0.1 at epochs 30, 60 and 90. More details and the exact hyperparameter values can be found in Appendix C.

Baselines

We compare the performance of MACER with two previous works. The first (Cohen et al., 2019) trains smoothed networks by simply minimizing the cross-entropy loss. The second (Salman et al., 2019) uses adversarial training on smoothed networks to improve robustness. For both baselines, we use checkpoints provided by the authors and report their original numbers whenever available. In addition, we run Cohen et al. (2019)'s method on all tasks, as it is a special case of MACER obtained by setting $\lambda = 0$ and $k = 1$.

Certification

Following previous works, we report the approximated certified test set accuracy, which is the fraction of the test set that can be certified to be robust at radius $r$. However, the approximated certified test set accuracy is a function of $r$, and it is hard to compare two models unless one is uniformly better than the other for all $r$. Hence, we also use the average certified radius (ACR) as a metric: for each test example $(x, y)$ and model $g$, we estimate the certified radius $CR(g; x, y)$ (taken to be 0 if $x$ is misclassified). The average certified radius is defined as $\frac{1}{|S_{test}|}\sum_{(x, y) \in S_{test}} CR(g; x, y)$, where $S_{test}$ is the test set. To estimate the certified radius for data points, we use the source code provided by Cohen et al. (2019).
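A minimal sketch of the two metrics (our own code; it assumes an array of per-example certified radii, with 0 for misclassified or abstained points):

```python
import numpy as np

def certified_accuracy(radii: np.ndarray, r: float) -> float:
    """Approximated certified test accuracy at radius r: fraction of examples
    whose certified radius is at least r."""
    return float(np.mean(radii >= r))

def average_certified_radius(radii: np.ndarray) -> float:
    """ACR: mean certified radius over the test set; it equals the area under
    the radius-accuracy curve."""
    return float(np.mean(radii))
```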

5.2 Results

We report the results on Cifar-10 and ImageNet in the main body of the paper. Results on MNIST and SVHN can be found in Appendix C.2.

σ Model 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 2.25 ACR
0.25 Cohen-0.25 0.75 0.60 0.43 0.26 0 0 0 0 0 0 0.416
Salman-0.25 0.74 0.67 0.57 0.47 0 0 0 0 0 0 0.538
MACER-0.25 0.81 0.71 0.59 0.43 0 0 0 0 0 0 0.556
0.50 Cohen-0.50 0.65 0.54 0.41 0.32 0.23 0.15 0.09 0.04 0 0 0.491
Salman-0.50 0.50 0.46 0.44 0.40 0.38 0.33 0.29 0.23 0 0 0.709
MACER-0.50 0.66 0.60 0.53 0.46 0.38 0.29 0.19 0.12 0 0 0.726
1.00 Cohen-1.00 0.47 0.39 0.34 0.28 0.21 0.17 0.14 0.08 0.05 0.03 0.458
Salman-1.00 0.45 0.41 0.38 0.35 0.32 0.28 0.25 0.22 0.19 0.17 0.787
MACER-1.00 0.45 0.41 0.38 0.35 0.32 0.29 0.25 0.22 0.18 0.16 0.792

Table 1: Approximated certified test accuracy and ACR on Cifar-10. Each column is an $\ell_2$ radius.
Figure 1: Radius-accuracy curves of different Cifar-10 models.
σ Model 0.0 0.5 1.0 1.5 2.0 2.5 3.0 ACR
0.25 Cohen-0.25 0.67 0.49 0 0 0 0 0 0.470
Salman-0.25 0.65 0.56 0 0 0 0 0 0.528
MACER-0.25 0.68 0.57 0 0 0 0 0 0.544
0.50 Cohen-0.50 0.57 0.46 0.37 0.29 0 0 0 0.720
Salman-0.50 0.54 0.49 0.43 0.37 0 0 0 0.815
MACER-0.50 0.64 0.53 0.43 0.31 0 0 0 0.831
1.00 Cohen-1.00 0.44 0.38 0.33 0.26 0.19 0.15 0.12 0.863
Salman-1.00 0.40 0.37 0.34 0.30 0.27 0.25 0.20 1.003
MACER-1.00 0.48 0.43 0.36 0.30 0.25 0.18 0.14 1.008
Table 2: Approximated certified test accuracy and ACR on ImageNet. Each column is an $\ell_2$ radius.
Figure 2: Radius-accuracy curves of different ImageNet models.
Dataset Model sec/epoch Epochs Total hrs ACR
Cifar-10 Cohen-0.25 (Cohen et al., 2019) 31.4 150 1.31 0.416
Salman-0.25 (Salman et al., 2019) 1990.1 150 82.92 0.538
MACER-0.25 (ours) 504.0 440 61.60 0.556


ImageNet
Cohen-0.25 (Cohen et al., 2019) 2154.5 90 53.86 0.470
Salman-0.25 (Salman et al., 2019) 7723.8 90 193.10 0.528
MACER-0.25 (ours) 3537.1 120 117.90 0.544

Table 3: Training time and performance of models.
Figure 3: Effect of hyperparameters on Cifar-10: (a) effect of $k$, (b) effect of $\lambda$, (c) effect of $\gamma$, (d) effect of $\beta$.

Performance   The performance of different models on Cifar-10 is reported in Table 1, and in Figure 1 we display the radius-accuracy curves. Note that the area under a radius-accuracy curve is equal to the ACR of the model. First, the plots show that our proposed method consistently achieves significantly higher approximated certified test set accuracy than Cohen et al. (2019). This shows that robust training via maximizing the certified radius is more effective than simply minimizing the cross-entropy classification loss. Second, the performance of our model differs from that of Salman et al. (2019) at different radii. For example, for $\sigma = 0.50$, our model achieves higher accuracy than Salman et al. (2019)'s model at radii up to 1.00, but performs worse at larger radii. For the average certified radius, our models are better than Salman et al. (2019)'s models in all settings (Salman et al. (2019) release hundreds of models, and we select the model with the largest average certified radius for each setting as our baseline). For example, when $\sigma = 0.25$, the ACR of our model is about 3% larger than that of Salman et al. (2019)'s. The gain of our model is relatively smaller when $\sigma = 1.00$; this is because $\sigma = 1.00$ is a very large noise level (Cohen et al., 2019) and both models perform poorly. The ImageNet results are displayed in Table 2 and Figure 2, and the observations are similar. All experimental results show that our proposed algorithm is more effective than previous ones.

Training speed   Since MACER does not require adversarial attacks during training, it learns a robust model much faster. Empirically, we compare MACER with Salman et al. (2019) on the average training time per epoch and the total training hours, and list the statistics in Table 3. For a fair comparison, we use the code provided by the original authors (https://github.com/locuslab/smoothing and https://github.com/Hadisalman/smoothing-adversarial) and run all algorithms on the same machine. For Cifar-10 we use one NVIDIA P100 GPU and for ImageNet we use four NVIDIA P100 GPUs. According to our experiments, on ImageNet, MACER achieves ACR = 0.544 in 117.90 hours. In contrast, Salman et al. (2019) only achieves ACR = 0.528 and uses 193.10 hours, which clearly shows that our method is much more efficient.

One might question whether the higher performance of MACER comes from the fact that we train for more epochs than previous methods. In Section C.3 we also run MACER for 150 epochs and compare it with the models in Table 3. The results show that when run for only 150 epochs, MACER still achieves a performance comparable with SmoothAdv, and is 4 times faster at the same time.

5.3 Effect of hyperparameters

In this section, we carefully examine the effect of the different hyperparameters in MACER. All experiments are run on Cifar-10 with two noise levels $\sigma$; representative results are shown in Figure 3. All details can be found in Appendix C.4.

Effect of $k$   We sample $k$ Gaussian noise vectors for each input to estimate the expectation in (16). We can see from Figure 3(a) that using more Gaussian samples usually leads to better performance; for example, the radius-accuracy curve for the largest $k$ we tried is uniformly above that for the smallest.

Effect of $\lambda$   The radius-accuracy curves in Figure 3(b) demonstrate the trade-off effect of $\lambda$. From the figure, we can see that as $\lambda$ increases, the clean accuracy drops while the certified accuracy at large radii increases.

Effect of $\gamma$   $\gamma$ is the hinge factor in the robustness loss (13). From Figure 3(c) we can see that when $\gamma$ is small, the approximated certified test set accuracy at large radii is small, since a small $\gamma$ "truncates" the certified radius during training. As $\gamma$ increases, the robust accuracy improves. $\gamma$ thus also acts as a trade-off between accuracy and robustness, but its effect is not as significant as that of $\lambda$.

Effect of $\beta$   Similar to Salman et al. (2019)'s finding (see its Appendix B), we also observe that using a larger $\beta$ produces better results. While Salman et al. (2019) pointed out that a large $\beta$ may make training unstable, we find that if we only apply a large $\beta$ to the robustness loss, we can maintain training stability and achieve a larger average certified radius as well.

6 Conclusion and future work

In this work we propose MACER, an attack-free and scalable robust training method that directly maximizes the certified radius of a smoothed classifier. We discuss the desiderata such an algorithm has to satisfy, and provide an approach to each of them. According to our extensive experiments, MACER performs better than previous provable $\ell_2$-defenses and trains faster. Our strong empirical results suggest that adversarial training is not a must for robust training, and that defenses based on certification are a promising direction for future research. Moreover, several recent papers (Carmon et al., 2019; Zhai et al., 2019; Stanforth et al., 2019) suggest that using unlabeled data helps improve adversarially robust generalization. We plan to extend MACER to the semi-supervised setting.

References

Appendix A Soft randomized smoothing

In this section we provide theoretical analysis and certification procedures for Soft-RS.

A.1 Proof of Theorem 2

Our proof is based on the following lemma:

Lemma 1.

For any measurable function $h: \mathbb{R}^d \to [0, 1]$, define $\bar{h}(x) := \mathbb{E}_{\eta \sim \mathcal{N}(0, \sigma^2 I)} \, h(x + \eta)$; then the map $x \mapsto \Phi^{-1}(\bar{h}(x))$ is $\frac{1}{\sigma}$-Lipschitz.

This lemma is a generalized version of Lemma 2 in Salman et al. (2019).

Proof of Theorem 2. Write $\bar{z}^c(x') := \mathbb{E}_\eta \, z_\theta^c(x' + \eta)$ and let $y'$ be any class with $y' \ne y$. For any class $c$, define $h_c$ as:

$h_c(x') := \Phi^{-1}\big(\bar{z}^c(x')\big)$   (18)

Because $z_\theta^c$ takes values in $[0, 1]$, by Lemma 1 each $h_c$ is $\frac{1}{\sigma}$-Lipschitz. Thus, for any $x'$ such that $\|x' - x\|_2 \le CR(\tilde{g}_\theta; x, y)$:

$h_y(x') - h_{y'}(x') \;\ge\; h_y(x) - h_{y'}(x) - \tfrac{2}{\sigma}\|x' - x\|_2 \;\ge\; h_y(x) - \max_{c \ne y} h_c(x) - \tfrac{2}{\sigma}\, CR(\tilde{g}_\theta; x, y) \;=\; 0$   (19)

Therefore, $\Phi^{-1}(\bar{z}^y(x')) \ge \Phi^{-1}(\bar{z}^{y'}(x'))$. Due to the monotonicity of $\Phi^{-1}$, we have $\bar{z}^y(x') \ge \bar{z}^{y'}(x')$ for every $y' \ne y$, which implies that $\tilde{g}_\theta(x') = y$. ∎

A.2 Soft-RS certification procedure

Let $y := \tilde{g}_\theta(x)$ and $p := \mathbb{E}_\eta \, z_\theta^y(x + \eta)$. If we can find $\underline{p} \in (\frac{1}{2}, 1]$ such that $p \ge \underline{p}$ holds with probability at least $1 - \alpha$, then with the same probability $\tilde{g}_\theta$ is provably robust at $x$. Meanwhile, $\max_{y' \ne y} \mathbb{E}_\eta \, z_\theta^{y'}(x + \eta) \le 1 - p \le 1 - \underline{p}$, so we can take the certified radius to be

$\frac{\sigma}{2}\Big[\Phi^{-1}(\underline{p}) - \Phi^{-1}(1 - \underline{p})\Big] = \sigma \, \Phi^{-1}(\underline{p})$   (20)

It thus reduces to finding a $1 - \alpha$ confidence lower bound $\underline{p}$ of $p$. Here we provide two such bounds:

Hoeffding Bound

The random variable $z_\theta^y(x + \eta)$ has mean $p$, and $z_\theta^y(x + \eta_1), \ldots, z_\theta^y(x + \eta_k)$ are its observations. Because $z_\theta^y(x + \eta) \in [0, 1]$ for any $\eta$, we can use Hoeffding's inequality to obtain a lower confidence bound:

Lemma 2.

(Hoeffding's Inequality) Let $X_1, \ldots, X_k$ be independent random variables bounded by the interval $[0, 1]$. Let $\bar{X} := \frac{1}{k}\sum_{j=1}^{k} X_j$; then for any $t > 0$,

$\mathbb{P}\big(\bar{X} - \mathbb{E}\bar{X} \ge t\big) \le e^{-2kt^2}$   (21)

Denote $\bar{Z} := \frac{1}{k}\sum_{j=1}^{k} z_\theta^y(x + \eta_j)$. By Hoeffding's inequality we have

$\mathbb{P}\big(\bar{Z} - p \ge t\big) \le e^{-2kt^2}$   (22)

Hence, a $1 - \alpha$ confidence lower bound $\underline{p}$ of $p$ is

$\underline{p} = \bar{Z} - \sqrt{\frac{\ln(1/\alpha)}{2k}}$   (23)
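A small sketch of this lower confidence bound (assuming, as above, $[0,1]$-valued observations; the function name is ours):

```python
import math

def hoeffding_lower_bound(sample_mean: float, k: int, alpha: float) -> float:
    """One-sided lower confidence bound from (23): with probability at least 1 - alpha,
    the true mean of [0, 1]-valued i.i.d. variables is >= sample_mean - sqrt(ln(1/alpha) / (2k))."""
    return sample_mean - math.sqrt(math.log(1.0 / alpha) / (2.0 * k))
```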
Empirical Bernstein Bound

Maurer & Pontil (2009) provides us with a tighter bound:

Theorem 3.

(Theorem 4 in Maurer & Pontil (2009)) Under the conditions of Lemma 2, with probability at least $1 - \delta$,

$\mathbb{E}\bar{X} \;\ge\; \bar{X} - \sqrt{\frac{2 V_k \ln(2/\delta)}{k}} - \frac{7 \ln(2/\delta)}{3(k - 1)}$   (24)

where $V_k$ is the sample variance of $X_1, \ldots, X_k$, i.e.

$V_k := \frac{1}{k(k-1)} \sum_{1 \le i < j \le k} (X_i - X_j)^2$   (25)

Consequently, a $1 - \alpha$ confidence lower bound $\underline{p}$ of $p$ is

$\underline{p} = \bar{Z} - \sqrt{\frac{2 V_k \ln(2/\alpha)}{k}} - \frac{7 \ln(2/\alpha)}{3(k - 1)}$   (26)

where $V_k$ is now the sample variance of $z_\theta^y(x + \eta_1), \ldots, z_\theta^y(x + \eta_k)$.
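A corresponding sketch of the empirical Bernstein lower confidence bound, following Theorem 3 as stated above (again our own illustration):

```python
import math

def bernstein_lower_bound(samples, alpha: float) -> float:
    """Empirical-Bernstein lower confidence bound (26) for the mean of [0, 1]-valued
    i.i.d. observations; tighter than Hoeffding when the sample variance is small."""
    k = len(samples)
    mean = sum(samples) / k
    var = sum((s - mean) ** 2 for s in samples) / (k - 1)  # unbiased sample variance V_k
    log_term = math.log(2.0 / alpha)
    return mean - math.sqrt(2.0 * var * log_term / k) - 7.0 * log_term / (3.0 * (k - 1))
```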

The full certification procedure with the above two bounds is described in Algorithm 2.

1:# Certify the robustness of $\tilde{g}_\theta$ around an input $x$ with the Hoeffding bound
2:function CertifyHoeffding($z_\theta$, $\sigma$, $x$, $k$, $\alpha$)
3:     Sample $k$ i.i.d. Gaussian noise vectors $\eta_1, \ldots, \eta_k \sim \mathcal{N}(0, \sigma^2 I)$
4:     $\hat{y} \leftarrow \arg\max_c \frac{1}{k}\sum_{j} z_\theta^c(x + \eta_j)$
5:     $\bar{Z} \leftarrow \frac{1}{k}\sum_{j} z_\theta^{\hat{y}}(x + \eta_j)$
6:     $\underline{p} \leftarrow \bar{Z} - \sqrt{\ln(1/\alpha)/(2k)}$ as in (23)
7:     if $\underline{p} > \frac{1}{2}$ then return prediction $\hat{y}$ and radius $\sigma \Phi^{-1}(\underline{p})$
8:     else return ABSTAIN
9:end function 
10:# Certify with the empirical Bernstein bound
11:function CertifyBernstein($z_\theta$, $\sigma$, $x$, $k$, $\alpha$)
12:     Sample $k$ i.i.d. Gaussian noise vectors and compute $\hat{y}$, $\bar{Z}$ and the sample variance $V_k$ as above
13:     $\underline{p} \leftarrow \bar{Z} - \sqrt{2 V_k \ln(2/\alpha)/k} - 7\ln(2/\alpha)/(3(k-1))$ as in (26)
14: