Adversarial Boot Camp: label free certified robustness in one epoch

October 5, 2020 · Ryan Campbell, et al. · McGill University

Machine learning models are vulnerable to adversarial attacks. One approach to addressing this vulnerability is certification, which focuses on models that are guaranteed to be robust for a given perturbation size. A drawback of recent certified models is that they are stochastic: they require multiple computationally expensive model evaluations with random noise added to a given input. In our work, we present a deterministic certification approach which results in a certifiably robust model. This approach is based on an equivalence between training with a particular regularized loss, and the expected values of Gaussian averages. We achieve certified models on ImageNet-1k by retraining a model with this loss for one epoch without the use of label information.


1 Introduction

Neural networks are very accurate on image classification tasks, but they are vulnerable to adversarial perturbations, i.e. small changes to the model input leading to misclassification (Szegedy et al., 2014). Adversarial training (Madry et al., 2018) improves robustness, at the expense of a loss of accuracy on unperturbed images (Zhang et al., 2019). Model certification (Lécuyer et al., 2019, Raghunathan et al., 2018, Cohen et al., 2019) is a complementary approach to adversarial training, which provides a guarantee that a model prediction is invariant to perturbations up to a given norm.

Given an input $x$, the model $f$ is certified to $\ell_2$ norm $\varepsilon$ at $x$ if it gives the same classification on $x+\delta$ for all perturbations $\delta$ with norm up to $\varepsilon$,

$$\arg\max_i f_i(x+\delta) \;=\; \arg\max_i f_i(x), \qquad \text{for all } \|\delta\|_2 \le \varepsilon. \tag{1}$$

Cohen et al. (2019) and Salman et al. (2019) certify models by defining a “smoothed” model $\hat f$, which is the expected Gaussian average of the initial model $f$ at a given input example $x$,

$$\hat f(x) \;=\; \mathbb{E}_{\delta \sim \mathcal{N}(0,\,\sigma^2 I)}\big[f(x+\delta)\big], \tag{2}$$

where the perturbation $\delta$ is sampled from a Gaussian, $\delta \sim \mathcal{N}(0,\sigma^2 I)$. Cohen et al. (2019) used a probabilistic argument to show that models defined by (2) can be certified to a given $\ell_2$ radius by making a large number of stochastic model evaluations. Certified models can classify by first averaging the model (Salman et al., 2019), or by taking the mode, the most popular classification given by the ensemble (Cohen et al., 2019).
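To make the stochastic construction concrete, the following is a minimal sketch (our own illustration, not the authors' released code) of a Monte Carlo approximation to the smoothed model in (2), with classification either by averaging the logits or by taking the mode; `base_model`, `sigma`, and `n_samples` are illustrative names and values.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def smoothed_logits(base_model, x, sigma=0.25, n_samples=100):
    """Monte Carlo estimate of f_hat(x) = E[f(x + delta)], delta ~ N(0, sigma^2 I)."""
    total = torch.zeros_like(base_model(x))
    for _ in range(n_samples):
        total += base_model(x + sigma * torch.randn_like(x))
    return total / n_samples

@torch.no_grad()
def smoothed_predict_mode(base_model, x, sigma=0.25, n_samples=100):
    """Classify by the most popular class over the noisy ensemble (the 'mode')."""
    num_classes = base_model(x).shape[1]
    votes = torch.zeros(x.shape[0], num_classes, device=x.device)
    for _ in range(n_samples):
        preds = base_model(x + sigma * torch.randn_like(x)).argmax(dim=1)
        votes += F.one_hot(preds, num_classes=num_classes).float()
    return votes.argmax(dim=1)
```

Each prediction of this kind costs `n_samples` forward passes; this is the inference overhead our deterministic model avoids.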

Cohen et al. and Salman et al. approximate the smoothed model stochastically, using a Gaussian ensemble: the base model is evaluated multiple times on the image perturbed by noise. Like all ensemble models, these stochastic models require multiple inferences, which is more costly than a single forward pass. In addition, these stochastic models require training the base model from scratch, exposing it to Gaussian noise, in order to improve the accuracy of the smoothed model $\hat f$. Salman et al. (2019) additionally expose the model to adversarial attacks during training. In the case of certified models, there is a trade-off between certification and accuracy: certified models lose accuracy on unperturbed images.

In this work, we present a deterministic model given by (2). Unlike the stochastic models, we do not need to train a new model from scratch with Gaussian noise data augmentation; instead, we fine-tune an accurate baseline model for one epoch, using a loss designed to encourage (2) to hold. The result is a certified deterministic model which is faster to train and faster at inference. In addition, the certification radius for our model is improved, compared to previous work, on both the CIFAR-10 and ImageNet datasets; see Figure 1. Moreover, the accuracy on unperturbed images is improved on the ImageNet dataset (Deng et al., 2009), with a Top-5 accuracy of 86.1% for our deterministic model versus 85.3% for Cohen et al. (2019) and 78.8% for Salman et al. (2019).

To our knowledge, this is the first deterministic Gaussian smoothing certification technique. The main appeal of our approach is a large decrease in the number of function evaluations for computing certifiably robust models, and a corresponding decrease in compute time at inference. Rather than stochastically sampling many times from a Gaussian at inference time, our method is certifiably robust with only a single model query. This greater speed in model evaluation is demonstrated in Table 2. Moreover, the deterministic certified model can be obtained by retraining a model for one epoch, without using labels. The speed and flexibility of this method allow it to be used to make empirically robust models certifiably robust. To see this, we test our method on adversarially trained models from Madry et al. (2018), increasing the certified radius of the adversarially robust model; see Table 4.

Figure 1: Certified accuracy as a function of $\ell_2$ radius. (a) CIFAR-10 top-1 certified accuracy. (b) ImageNet top-5 certified accuracy.
Model | Can be obtained from any pretrained model | Evaluation in one forward pass | Is certified?
Deterministic Smoothing (ours) | ✓ | ✓ | ✓
Stochastic Smoothing | ✗ | ✗ | ✓
Adversarial Training | ✗ | ✓ | ✗
Table 1: A comparison of robust models. Stochastic smoothing refers to methods like the ones presented in Cohen et al. (2019) and Salman et al. (2019); adversarial training is from Madry et al. (2018).
Model | CIFAR-10 CPU | CIFAR-10 GPU | ImageNet-1k CPU | ImageNet-1k GPU
Deterministic (ours) | 0.0049 | 0.0080 | 0.0615 | 0.0113
Stochastic (Cohen et al., 2019) | 0.0480 | 0.0399 | 0.1631 | 0.0932
Table 2: Average classification inference time (seconds).

2 Related work

The issue of adversarial vulnerability arose in the works of Szegedy et al. (2014) and Goodfellow et al. (2015), and has spawned a vast body of research. The idea of training models to be robust to adversarial attacks was widely popularized by Madry et al. (2018). This method, known as adversarial training, trains a model on images corrupted by gradient-based adversarial attacks, resulting in robust models. In terms of certification, early work by Cheng et al. (2017) provided a method of computing maximum perturbation bounds for neural networks, which reduces to solving a mixed-integer optimization problem. Weng et al. (2018a) introduced non-trivial robustness bounds for fully connected networks, providing tight robustness bounds at low computational cost. Weng et al. (2018b) proposed a metric with theoretical grounding in the Lipschitz continuity of the classifier, which is scalable to state-of-the-art ImageNet neural network classifiers. Zhang et al. (2018) proposed a general framework to certify neural networks based on linear and quadratic bounding techniques on the activation functions, which is more flexible than its predecessors.

Training a neural network with Gaussian noise has been shown to be equivalent to gradient regularization (Bishop, 1995), which helps improve the robustness of models; more recently, additive noise has been used during training and evaluation for certification purposes. Lécuyer et al. (2019) first considered adding random Gaussian noise as a certifiable defense in a method called PixelDP. In their method, they take a known neural network architecture and add a layer of random noise to make the model's output random. The expected classification is in turn more robust to adversarial perturbations. Furthermore, their defense is a certified defense, meaning they provide a lower bound on the size of adversarial perturbation for which their defense will always work. In a following work, Li et al. (2018) provided a defense with improved certified robustness. The certification guarantees given in these two papers are loose, meaning the defended model is always more robust than the certification bound indicates.

In contrast, Cohen et al. (2019) provided a defense utilizing randomized Gaussian smoothing that leads to tight robustness guarantees under the $\ell_2$ norm. Moreover, Cohen et al. used Monte Carlo sampling to compute the radius in which a model's prediction is unchanged; we refer to this method as RandomizedSmoothing. In work building on Cohen et al., Salman et al. (2019) developed an adversarial training framework called SmoothAdv and derived a Lipschitz constant of averaged models. Yang et al. (2020) generalize previous randomized smoothing methods by providing robustness guarantees in the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms for smoothing with several non-Gaussian distributions.

3 Deterministic Smoothing

Suppose we are given a dataset consisting of paired samples $(x_i, y_i)$, where $x_i$ is an example with corresponding true classification $y_i$. The supervised learning approach trains a model $f$ which maps images to a vector whose length equals the number of classes. Suppose $f^{(0)}$ is the initial model, and let $\hat f$ be the averaged model given by Equation (2). Cohen et al. (2019) find a Gaussian smoothed classification model by sampling $\delta \sim \mathcal{N}(0,\sigma^2 I)$ independently $n$ times, performing $n$ classifications, and then computing the most popular classification. In the randomized smoothing method, the initial model is trained on data which is augmented with Gaussian noise, to improve accuracy on noisy images.

We take a different approach to Gaussian smoothing. Starting from an accurate pretrained model $f^{(0)}$, we discard the training labels and iteratively retrain a new model, using a quadratic loss between the previous model's and the new model's predictions, together with an additional gradient regularization term. We have found that discarding the original one-hot labels and instead using model predictions helps make the model smoother.

To be precise, our new model $v$ is the result of minimizing the loss which we call HeatSmoothing, stated here for a single smoothing step (the iterated version is given in Appendix A, Equation (15)),

$$\mathcal{L}[v] \;=\; \mathbb{E}_{x}\Big[\tfrac{1}{2}\big\|v(x)-f^{(0)}(x)\big\|_2^2 \;+\; \tfrac{\sigma^2}{2}\big\|\nabla_x v(x)\big\|_2^2\Big]. \tag{3}$$
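As a concrete illustration, the following is a minimal PyTorch sketch of a loss in the spirit of (3): a quadratic term pulling the new model toward the frozen initial model, plus a penalty on the input-gradient norm, estimated here with a single Gaussian projection (the cheaper approximation actually used is described in Section 3.2). The names `model`, `init_model`, and `sigma` are ours, not from the released code.

```python
import torch

def heatsmoothing_style_loss(model, init_model, x, sigma=0.25):
    """Sketch of Eq. (3): 0.5*||v(x) - f0(x)||^2 + 0.5*sigma^2*||grad_x v(x)||^2."""
    x = x.requires_grad_(True)
    out = model(x)                                  # new model v, being trained
    with torch.no_grad():
        target = init_model(x)                      # frozen initial model f^(0)
    quad = 0.5 * (out - target).pow(2).sum(dim=1).mean()

    # One-sample estimate of the squared Jacobian norm: E_w ||grad_x (w^T v(x))||^2.
    w = torch.randn_like(out)
    g, = torch.autograd.grad((out * w).sum(), x, create_graph=True)
    grad_pen = g.pow(2).flatten(1).sum(dim=1).mean()

    return quad + 0.5 * sigma ** 2 * grad_pen
```

Note that no labels appear anywhere in this loss; only the frozen initial model's predictions are used as targets.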

The smoothing achieved by the new models is illustrated schematically in Figure 4.

3.1 Related regularized losses

Gradient regularization is known to be equivalent to Gaussian smoothing (Bishop, 1995, LeCun et al., 1998). Our deterministic smoothed model arises from training with the HeatSmoothing loss (3), which is designed to ensure that (2) holds for our model. Our result is related to early results on regularized networks (Bishop, 1995, LeCun et al., 1998): full gradient regularization is equivalent to Gaussian smoothing. Formally, this is stated as follows.

Theorem 1.

(Bishop, 1995)

Training a feed-forward neural network model using the quadratic (mean-squared error) loss, with Gaussian noise of mean 0 and variance $\sigma^2$ added to the inputs, is equivalent to training with the regularized loss

$$\mathbb{E}_{(x,y)}\Big[\tfrac{1}{2}\big\|f(x)-y\big\|_2^2 \;+\; \tfrac{\sigma^2}{2}\big\|\nabla_x f(x)\big\|_2^2\Big], \tag{4}$$

up to higher-order terms.

The equivalence is normally used to go from models augmented with Gaussian noise to regularized models. Here, we use the result in the other direction; we train a regularized model in order to produce a model which is equivalent to evaluating with noise. In practice, this means that rather than adding noise to regularize models for certifiable robustness, we explicitly perform a type of gradient regularization, in order to produce a model which performs as if Gaussian noise was added. See Figure 3 in Appendix D for an illustration of the effect of this gradient regularization.

The gradient regularization term in the HeatSmoothing loss (3) is also related to adversarial training. Tikhonov regularization has been used to produce adversarially robust models (Finlay and Oberman, 2019). However, in adversarial training the gradient of the loss is used, rather than the gradient of the full model. Also, our loss does not use information from the true labels. These differences arise because we wish to have a model that approximates the Gaussian average of the initial model (see Appendix A). Furthermore, minimizing the gradient norm of the model output yields a model that is smooth in all directions, rather than one that is robust only to adversarial perturbations.

3.2 Algorithmic Details

We have found that, early in training, the distance-squared term $\|v(x)-f^{(0)}(x)\|_2^2$ may be far greater than the gradient-penalty term. We therefore apply a softmax to the vectors in the distance-squared term to reduce its overall magnitude. We perform the training minimization of (3) for one epoch. The pseudo-code for our neural network weight update is given in Algorithm 1.¹

¹ Code and links to trained models are publicly available at https://github.com/ryancampbell514/HeatSmoothing/

Note that the gradient-regularization term in (3) requires the computation of a Jacobian matrix norm. In high dimensions this is computationally expensive. To approximate this term, we make use of the Johnson-Lindenstrauss lemma (Johnson and Lindenstrauss, 1984, Vempala, 2005), followed by the finite-difference approximation from Finlay and Oberman (2019). We approximate $\|\nabla_x v(x)\|_2^2$ by averaging products of the Jacobian matrix with Gaussian noise vectors $w$. These Jacobian-vector products are easily computed via reverse-mode automatic differentiation, by moving the noise vector inside the gradient:

$$w^\top \nabla_x v(x) \;=\; \nabla_x\big(w^\top v(x)\big). \tag{5}$$

Computational expense is reduced further by using finite differences to approximate the norm of the gradient. Once the finite difference is computed, we detach this term from the automatic differentiation computation graph, further speeding training. More details of our implementation of these approximation techniques, and the definition of the normalized gradient direction used in (18), are presented in Appendix B.
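A hedged sketch of this approximation (our own rendering of the recipe above, not the released implementation) is given below: Gaussian projection vectors reduce the Jacobian norm to gradient norms of scalars $w^\top v(x)$ as in (5), each of which is then estimated with a finite difference along the detached, normalized gradient direction; `n_proj` and `h` are illustrative values.

```python
import torch

def grad_penalty_estimate(model, x, n_proj=2, h=0.1):
    """Random-projection / finite-difference estimate of ||grad_x v(x)||^2 (cf. Appendix B)."""
    x = x.requires_grad_(True)
    out = model(x)
    total = 0.0
    for _ in range(n_proj):
        w = torch.randn_like(out)                    # Gaussian projection vector
        scalar = (out * w).sum()                     # sum over the batch of w^T v(x)
        g, = torch.autograd.grad(scalar, x, retain_graph=True)
        # normalized gradient direction, detached from the computation graph
        norms = g.flatten(1).norm(dim=1).view(-1, *([1] * (g.dim() - 1)))
        d = (g / (norms + 1e-12)).detach()
        # finite-difference estimate of the directional derivative of w^T v at x
        diff = ((model(x + h * d) - out) * w).flatten(1).sum(dim=1) / h
        total = total + diff.pow(2).mean()
    return total / n_proj
```

The squared finite difference remains differentiable with respect to the model parameters, so it can be added to the quadratic term and backpropagated as in Algorithm 1.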

3.3 Theoretical details

Input : Minibatch of input examples $x_1,\dots,x_m$
        A model $v$ set to ``train'' mode
        The current model $v^{(k-1)}$ set to ``eval'' mode
        $\sigma$, standard deviation of Gaussian smoothing
        $n$, number of Gaussian noise replications
        $h$, finite-difference step size
Update : learning rate according to a pre-defined scheduler.
for $i = 1, \dots, m$ do
        Compute the outputs $v(x_i)$ and $v^{(k-1)}(x_i)$
        for $j = 1, \dots, n$ do
               Generate $w_j \sim \mathcal{N}(0, I)$
               Compute the normalized gradient $d_j$ via (18)
               Detach $d_j$ from the computation graph
               Accumulate the finite-difference estimate of the gradient penalty (16)
        end for
        Accumulate the loss (3) for example $x_i$, using the averaged gradient penalty
end for
Update the weights of $v$ by running backpropagation on the accumulated loss at the current learning rate.
Algorithm 1: HeatSmoothing neural network weight update

We appeal to the theory of partial differential equations (PDEs) to explain the equivalence between gradient regularization and Gaussian convolution (averaging) of the model.² The idea is that the gradient term which appears in the loss leads to a smoothing of the new function (model). The fact that the exact form of the smoothing corresponds to Gaussian convolution is a mathematical result which can be interpreted probabilistically or using techniques from analysis. Briefly, we detail the link as follows.

² We sometimes interchange the terms Gaussian averaging and Gaussian convolution; they are equivalent, as shown in Theorem 2.

Einstein (1906) showed that the function value of a model averaged under Brownian motion is related to the heat equation (a PDE); the theory of stochastic differential equations makes this rigorous (Karatzas and Shreve, 1998). Moreover, solutions of the heat equation are given by Gaussian convolution with the original model. Crucially, solutions of the heat equation can be interpreted as iterations of a regularized loss problem (called a variational energy) like that of Equation (3). The minimizer of this variational energy satisfies an equation which is formally equivalent to the heat equation (Gelfand et al., 2000). Thus, taking these facts together, we see that a few steps of the minimization of the loss in (3) yield a model which approximately satisfies the heat equation, and hence corresponds to a model smoothed by Gaussian convolution. See Figure 4 for an illustration of a few steps of the training procedure. This result is summarized in the following theorem.

Theorem 2.

(Strauss, 2007) Let $f:\mathbb{R}^d\to\mathbb{R}$ be a bounded function, $x\in\mathbb{R}^d$, and $\sigma>0$. Then the following definitions of $\hat f(x)$ are equivalent:

  1. $\hat f(x) = \mathbb{E}_{\delta\sim\mathcal{N}(0,\,\sigma^2 I)}\big[f(x+\delta)\big]$, the expected value of Gaussian averages of $f$ at $x$.

  2. $\hat f(x) = \big(f * G_{\sigma}\big)(x)$, the convolution of $f$ with the density $G_{\sigma}$ of the $\mathcal{N}(0,\,\sigma^2 I)$ distribution, evaluated at $x$.

  3. $\hat f(x) = u(x, \sigma^2)$, the solution of the heat equation,

    $$\partial_t u(x,t) \;=\; \tfrac{1}{2}\,\Delta u(x,t), \tag{6}$$

    at time $t=\sigma^2$, with initial condition $u(x,0) = f(x)$.
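As a quick numerical sanity check of the first two characterizations (our own illustration, not from the paper), the Monte Carlo Gaussian average and the Gaussian convolution agree for a simple one-dimensional function:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sign(np.sin(3.0 * x))       # a bounded, non-smooth test function
sigma, x0 = 0.5, 0.3

# (1) expected value of Gaussian averages, estimated by Monte Carlo
mc = f(x0 + sigma * rng.standard_normal(200_000)).mean()

# (2) convolution with the N(0, sigma^2) density, by a Riemann sum
t = np.linspace(-6 * sigma, 6 * sigma, 20_001)
kernel = np.exp(-t ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
conv = np.sum(f(x0 - t) * kernel) * (t[1] - t[0])

print(f"Monte Carlo average: {mc:.4f}   Gaussian convolution: {conv:.4f}")
```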

In Appendix A, we use Theorem 2 to show the equivalence of training with noise and iteratively training with the regularized loss (3).

To assess how well our model approximates the Gaussian average of the initial model, we compute the certified $\ell_2$ radius for averaged models introduced in Cohen et al. (2019). A larger radius implies a better approximation of the Gaussian average of the initial model. We compare our models with stochastically averaged models via certified accuracy: the fraction of the test set which a model correctly classifies at a given radius, ignoring abstained classifications. Throughout, we always use the same value of $\sigma$ for certification as for training. In conjunction with the certification technique of Cohen et al., we also provide the following theorem, which gives a bound based on the Lipschitz constant of a Gaussian-averaged model. We refer to this bound as the $\ell_2$-bound; it demonstrates the link between Gaussian averaging and adversarial robustness.
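For clarity, certified accuracy at a radius $r$ can be computed from per-example certified radii with a small helper like the following (a hypothetical function of our own, following the convention stated above of ignoring abstained examples):

```python
import numpy as np

def certified_accuracy(radii, correct, r, abstained=None):
    """Fraction of non-abstained examples correctly classified with certified radius >= r."""
    radii = np.asarray(radii, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    keep = ~np.asarray(abstained, dtype=bool) if abstained is not None else np.ones_like(correct)
    return float(np.mean(correct[keep] & (radii[keep] >= r)))
```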

Theorem 3 ($\ell_2$-bound).

Suppose $\hat f$ is the convolution (average) of $f$ with a Gaussian kernel of variance $\sigma^2$, $\hat f = f * G_\sigma$. Then any perturbation $\delta$ which results in a change of rank of the $i$-th component of $\hat f(x)$ must have $\ell_2$ norm bounded as follows:

$$\|\delta\|_2 \;\ge\; \sigma\,\sqrt{\tfrac{\pi}{2}}\,\Big(\hat f_{(1)}(x) - \hat f_i(x)\Big), \tag{7}$$

where $\hat f_{(1)}(x)$ is the largest value in the vector $\hat f(x)$.

See Appendix C for the proof. This bound is equally applicable to deterministic or stochastically averaged models. In stochastically averaged models, $\hat f$ is replaced by its stochastic (Monte Carlo) approximation.
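Evaluating the bound only requires the outputs of the averaged model and $\sigma$. A hedged sketch follows (our own helper, not the authors' code; it assumes the components of $\hat f$ lie in $[0,1]$, e.g. averaged probabilities, so that the Lipschitz constant invoked in Appendix C applies):

```python
import math
import torch

def l2_bound(f_hat, sigma):
    """Lower bound (7) on the l2 perturbation needed to change the top-ranked class.

    f_hat: (batch, classes) outputs of the averaged model, assumed to lie in [0, 1].
    """
    lipschitz = math.sqrt(2.0 / math.pi) / sigma     # Lipschitz constant of f_hat
    top2 = f_hat.topk(2, dim=1).values               # largest and runner-up components
    gap = top2[:, 0] - top2[:, 1]
    return gap / lipschitz                           # = sigma * sqrt(pi / 2) * gap
```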

3.4 Adversarial Attacks

To test how robust our model is to adversarial examples, we calculate the minimum adversarial $\ell_2$ distance via our $\ell_2$-bound, and we attack our models using the projected gradient descent (PGD) (Kurakin et al., 2017, Madry et al., 2018) and decoupled direction and norm (DDN) (Rony et al., 2019) methods. These attacks are chosen because there is a specific way they can be applied to stochastically averaged models (Salman et al., 2019). In both attacks, it is standard to take the step

$$\delta \;\leftarrow\; \mathcal{P}\Big(\delta + \alpha\,\nabla_\delta\, \ell\big(f(x+\delta),\, y\big)\Big) \tag{8}$$

in the iterative algorithm. Here, $x$ is an input example with corresponding true class $y$; $\delta$ denotes the adversarial perturbation at its current iteration; $\ell$ denotes the cross-entropy loss function (or KL divergence); $\mathcal{P}$ projects onto the ball of maximum allowed perturbation $\varepsilon$; and $\alpha$ is the step size. In the stochastically averaged model setting, the step is given by

$$\delta \;\leftarrow\; \mathcal{P}\Big(\delta + \alpha\,\nabla_\delta\, \tfrac{1}{n}\textstyle\sum_{j=1}^{n} \ell\big(f(x+\delta+\eta_j),\, y\big)\Big), \tag{9}$$

where $\eta_j \sim \mathcal{N}(0,\sigma^2 I)$. For our deterministically averaged models, we implement the update (8), since our models are deterministic and there is no need to sample noise at evaluation time. For stochastically averaged models (Cohen et al., 2019, Salman et al., 2019), we implement the update (9).
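The following is a minimal sketch (our notation, not the attack code used in the experiments) of one $\ell_2$ PGD step implementing (8), and its noise-averaged variant (9) for stochastically smoothed models; the normalized-gradient ascent and $\ell_2$-ball projection shown here are one common convention.

```python
import torch
import torch.nn.functional as F

def pgd_l2_step(model, x, y, delta, alpha, eps, sigma=None, n_noise=0):
    """One step of Eq. (8) (n_noise == 0) or Eq. (9) (n_noise > 0, Gaussian-averaged loss)."""
    delta = delta.clone().detach().requires_grad_(True)
    if n_noise == 0:                                  # deterministic model, Eq. (8)
        loss = F.cross_entropy(model(x + delta), y)
    else:                                             # stochastically smoothed model, Eq. (9)
        loss = 0.0
        for _ in range(n_noise):
            loss = loss + F.cross_entropy(model(x + delta + sigma * torch.randn_like(x)), y)
        loss = loss / n_noise
    grad, = torch.autograd.grad(loss, delta)

    # ascend along the normalized gradient, then project back onto the eps-ball
    shape = (-1, *([1] * (x.dim() - 1)))
    g = grad / (grad.flatten(1).norm(dim=1).view(shape) + 1e-12)
    new_delta = delta.detach() + alpha * g
    norms = new_delta.flatten(1).norm(dim=1).clamp(min=1e-12).view(shape)
    return new_delta * (eps / norms).clamp(max=1.0)
```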

Model | ℓ2-bound median | ℓ2-bound mean | PGD median | PGD mean | DDN median | DDN mean
HeatSmoothing | 0.240 | 0.190 | 2.7591 | 2.6255 | 1.0664 | 1.2261
SmoothAdv | 0.160 | 0.160 | 3.5643 | 3.0244 | 1.1537 | 1.2850
RandomizedSmoothing | 0.200 | 0.180 | 2.6787 | 2.5587 | 1.2114 | 1.3412
Undefended baseline | - | - | 1.0313 | 1.2832 | 0.8573 | 0.9864
Table 3: $\ell_2$ adversarial distance metrics on ImageNet-1k.

4 Experiments & Results

4.1 Comparison to Stochastic Methods

Figure 2: Attack curves: percentage of images successfully attacked as a function of adversarial distance. (a) ImageNet-1k top-5 PGD. (b) ImageNet-1k top-5 DDN.

We now apply our method to the ImageNet-1k dataset (Deng et al., 2009) with the ResNet-50 model architecture. The initial model $f^{(0)}$ was trained on clean images for 29 epochs with the cross-entropy loss function. Due to limited computing resources, we modified the training procedure (3) and Algorithm 1 to obtain our smoothed model. The modified procedure minimizes the loss function

$$\mathcal{L}[v] \;=\; \mathbb{E}_{x}\Big[\tfrac{1}{2}\big\|\mathrm{softmax}\big(v(x)\big)-\mathrm{softmax}\big(f^{(0)}(x)\big)\big\|_2^2 \;+\; \tfrac{\sigma^2}{2}\big\|\nabla_x v(x)\big\|_2^2\Big] \tag{10}$$

for only one epoch of the training set, using stochastic gradient descent at a fixed learning rate of 0.01 and a fixed value of $\sigma$. This modification is needed because the output vectors in the ImageNet setting have length 1,000: applying a softmax in the distance term prevents it from dominating the gradient-penalty term and the loss from blowing up. Furthermore, we add noise to half of the training images.

We compare our results to a pretrained RandomizedSmoothing ResNet-50 model provided by Cohen et al. (2019). We also compare to a pretrained SmoothAdv ResNet-50 model, trained with 1 step of PGD at a fixed maximum perturbation, provided by Salman et al. (2019). To assess certified accuracy, we run the Certify algorithm from Cohen et al. (2019) for the stochastically trained models, using as many noise samples as our computational resources could handle; we realize that this may not be an optimal number of noise samples. For the HeatSmoothing model, we run the same certification algorithm, but without running SamplingUnderNoise to compute the class probabilities. For completeness, we also certify the baseline model $f^{(0)}$. Certification results on 5,000 ImageNet test images are presented in Figure 1(b). Our model is indeed comparable to the stochastic methods presented in earlier papers, despite the fact that we only needed one training epoch. Next, we attack our four models using PGD and DDN. For the stochastic models, we average the loss over noise samples as in (9). We run both attacks until top-5 misclassification is achieved or until 20 steps are reached. Results on 1,000 ImageNet test images are presented in Table 3 and Figures 2(a) and 2(b). We see that our model is comparable to the stochastic models, but does not outperform them. In Figure 2(a), it is clear that the model presented in Salman et al. (2019) performs best, since this model was trained on PGD-corrupted images. Note that CIFAR-10 results are presented in Appendix E.

4.2 Certifying Robust Models

So far, we have shown that we can take a non-robust baseline model and make it certifiably robust by retraining for one epoch with the regularized loss (10). A natural question arises: can we use this method to make empirically robust models certifiably robust? To test this, we begin with an adversarially trained model (Madry et al., 2018). This pretrained model was downloaded from the authors' GitHub repository and was trained on images corrupted by the PGD attack. We certify this model by retraining it with (10) for one epoch using stochastic gradient descent with a fixed learning rate of 0.01. In Table 4, we report the certified radius from Cohen et al. (2019) for these models, computed on 1,000 ImageNet-1k test images. The certified radii for the model retrained with the loss function (10) are significantly higher than those of the adversarially trained model from Madry et al. (2018).

Model | radius median | radius mean | radius max.
Certified adversarially trained | 0.4226 | 0.4193 | 0.6158
Adversarially trained | 0.0790 | 0.1126 | 0.6158
Undefended baseline | 0.0 | 0.1446 | 0.6158
Table 4: $\ell_2$ certified radii summary statistics for robust models on ImageNet-1k.

5 Conclusion

Randomized smoothing is a well-known method for obtaining a Gaussian average of an initial neural network. This is desirable because it yields a guarantee that a model's predictions are unchanged under small perturbations of the input data. In this work, we used a regularized loss to obtain deterministic Gaussian-averaged models. By computing certified radii, we showed that our method is comparable to previously known stochastic methods. This is confirmed by attacking our models, which results in adversarial distances similar to those seen with stochastically smoothed models. We also developed a new lower bound on the perturbations necessary to change the prediction of averaged models, and used it as a measure of model robustness. Lastly, our method is less computationally expensive in terms of inference time (see Table 2).

References

  • C. M. Bishop (1995) Training with noise is equivalent to Tikhonov regularization. Neural Computation 7 (1), pp. 108–116. External Links: Document, Link Cited by: §2, §3.1, Theorem 1.
  • C. Cheng, G. Nührenberg, and H. Ruess (2017) Maximum resilience of artificial neural networks. In Automated Technology for Verification and Analysis - 15th International Symposium, ATVA 2017, Pune, India, October 3-6, 2017, Proceedings, D. D’Souza and K. N. Kumar (Eds.), Lecture Notes in Computer Science, Vol. 10482, pp. 251–268. External Links: Document, Link Cited by: §2.
  • J. M. Cohen, E. Rosenfeld, and J. Z. Kolter (2019) Certified adversarial robustness via randomized smoothing. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, pp. 1310–1320. External Links: Link Cited by: Appendix E, Table 1, Table 2, §1, §1, §1, §1, §2, §3.3, §3.4, §3, §4.1, §4.2.
  • J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Cited by: §1, §4.1.
  • A. Einstein (1906) On the theory of the Brownian movement. Ann. Phys 19 (4), pp. 371–381. Cited by: §3.3.
  • C. Finlay and A. M. Oberman (2019) Scaleable input gradient regularization for adversarial robustness. CoRR abs/1905.11468. External Links: 1905.11468, Link Cited by: Appendix C, §3.1, §3.2.
  • I. M. Gelfand, R. A. Silverman, et al. (2000) Calculus of variations. Courier Corporation. Cited by: §3.3.
  • I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun (Eds.), External Links: Link Cited by: §2.
  • W. B. Johnson and J. Lindenstrauss (1984) Extensions of Lipschitz mappings into a Hilbert space. Contemporary mathematics 26 (189-206), pp. 1. Cited by: Appendix B, §3.2.
  • I. Karatzas and S. E. Shreve (1998) Brownian motion. In Brownian Motion and Stochastic Calculus, pp. 47–127. Cited by: §3.3.
  • A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images. Cited by: Appendix E.
  • A. Kurakin, I. J. Goodfellow, and S. Bengio (2017) Adversarial machine learning at scale. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, External Links: Link Cited by: §3.4.
  • Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §3.1.
  • M. Lécuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana (2019) Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019, pp. 656–672. External Links: Document, Link Cited by: §1, §2.
  • B. Li, C. Chen, W. Wang, and L. Carin (2018) Second-order adversarial attack and certifiable robustness. CoRR abs/1809.03113. External Links: 1809.03113, Link Cited by: §2.
  • A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, External Links: Link Cited by: Table 1, §1, §1, §2, §3.4, §4.2.
  • A. Raghunathan, J. Steinhardt, and P. Liang (2018) Certified defenses against adversarial examples. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, External Links: Link Cited by: §1.
  • J. Rony, L. G. Hafemann, L. S. Oliveira, I. B. Ayed, R. Sabourin, and E. Granger (2019) Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 4322–4330. External Links: Document, Link Cited by: §3.4.
  • H. Salman, J. Li, I. P. Razenshteyn, P. Zhang, H. Zhang, S. Bubeck, and G. Yang (2019) Provably robust deep learning via adversarially trained smoothed classifiers. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett (Eds.), pp. 11289–11300. External Links: Link Cited by: Appendix C, Appendix E, Table 1, §1, §1, §1, §2, §3.4, §4.1.
  • W. A. Strauss (2007) Partial differential equations: an introduction. John Wiley & Sons. Cited by: Theorem 2.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Y. Bengio and Y. LeCun (Eds.), External Links: Link Cited by: §1, §2.
  • S. S. Vempala (2005) The random projection method. Vol. 65, American Mathematical Soc.. Cited by: Appendix B, §3.2.
  • T. Weng, H. Zhang, H. Chen, Z. Song, C. Hsieh, L. Daniel, D. S. Boning, and I. S. Dhillon (2018a) Towards fast computation of certified robustness for ReLU networks. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, J. G. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, pp. 5273–5282. External Links: Link Cited by: §2.
  • T. Weng, H. Zhang, P. Chen, J. Yi, D. Su, Y. Gao, C. Hsieh, and L. Daniel (2018b) Evaluating the robustness of neural networks: an extreme value theory approach. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, External Links: Link Cited by: §2.
  • G. Yang, T. Duan, J. E. Hu, H. Salman, I. P. Razenshteyn, and J. Li (2020) Randomized smoothing of all shapes and sizes. CoRR abs/2002.08118. External Links: 2002.08118, Link Cited by: §2.
  • H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan (2019) Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573. Cited by: §1.
  • H. Zhang, T. Weng, P. Chen, C. Hsieh, and L. Daniel (2018) Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 4944–4953. External Links: Link Cited by: §2.

Appendix A Solving the heat equation by training with a regularized loss function

Theorem 2 tells us that training a model with added Gaussian noise is equivalent to training a model to solve the heat equation. We can discretize the heat equation (6) to obtain

$$v^{(k)}(x) \;=\; v^{(k-1)}(x) + \frac{\sigma^2}{2K}\,\Delta v^{(k)}(x), \tag{11}$$

for $k = 1,\dots,K$, where $K$ is the fixed number of timesteps and $v^{(0)} = f^{(0)}$, our initial model. Using the Euler-Lagrange equation, we can express $v^{(k)}$ in (11) as the solution of the variational problem

$$v^{(k)} \;=\; \arg\min_{v}\ \int \Big[\tfrac{1}{2}\big(v(x)-v^{(k-1)}(x)\big)^2 + \tfrac{h}{2}\big\|\nabla v(x)\big\|_2^2\Big]\,\rho(x)\,dx, \tag{12}$$

where $\rho$ is the density from which our clean data comes. Therefore, this is equivalent to solving

$$v^{(k)} \;=\; \arg\min_{v}\ \mathbb{E}_{x\sim\rho}\Big[\tfrac{1}{2}\big(v(x)-v^{(k-1)}(x)\big)^2 + \tfrac{h}{2}\big\|\nabla v(x)\big\|_2^2\Big]. \tag{13}$$

Note that the minimizer of the objective of (12) satisfies

$$v^{(k)}(x) - v^{(k-1)}(x) \;=\; h\,\Delta v^{(k)}(x), \tag{14}$$

which matches (11) if we set $h = \sigma^2/(2K)$. In the derivation of (14), we take for granted the fact that, empirically, $\rho$ is approximately uniform and is therefore constant. In the end, we iteratively compute (13) and obtain models $v^{(1)},\dots,v^{(K)}$, setting $\hat f = v^{(K)}$, our smoothed model.

Note that our model outputs are vectors whose length corresponds to the total number of classes; therefore, the scalar objective in (13) is not directly suitable for the vector-valued outputs $v(x)$ and $v^{(k-1)}(x)$. We instead use the following update:

$$v^{(k)} \;=\; \arg\min_{v}\ \mathbb{E}_{x\sim\rho}\Big[\tfrac{1}{2}\big\|v(x)-v^{(k-1)}(x)\big\|_2^2 + \tfrac{h}{2}\big\|\nabla_x v(x)\big\|_2^2\Big]. \tag{15}$$
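As an illustration of the iteration (15), the following hedged sketch (our own code; `loader`, the hyperparameters, and the single-projection gradient penalty are illustrative, with the timestep chosen as in (14)) warm-starts each model $v^{(k)}$ from $v^{(k-1)}$ and trains it without labels:

```python
import copy
import torch

def iterate_heat_steps(init_model, loader, K=5, sigma=0.25, lr=0.01, epochs=1):
    """Iterate the update (15): v^(0) = f^(0), then K label-free smoothing steps."""
    h = sigma ** 2 / (2 * K)            # timestep rescaling from Eq. (14)
    prev = init_model
    for _ in range(K):
        model = copy.deepcopy(prev)     # warm-start v^(k) from v^(k-1)
        model.train()
        prev.eval()
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, _ in loader:         # labels are discarded
                x = x.requires_grad_(True)
                out = model(x)
                with torch.no_grad():
                    target = prev(x)
                quad = 0.5 * (out - target).pow(2).sum(dim=1).mean()
                # single-projection estimate of the Jacobian norm (Appendix B)
                w = torch.randn_like(out)
                g, = torch.autograd.grad((out * w).sum(), x, create_graph=True)
                pen = g.pow(2).flatten(1).sum(dim=1).mean()
                loss = quad + 0.5 * h * pen
                opt.zero_grad()
                loss.backward()
                opt.step()
        prev = model
    return prev
```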

Appendix B Approximating the gradient-norm regularization term

By the Johnson-Lindenstrauss lemma (Johnson and Lindenstrauss, 1984, Vempala, 2005), the squared Jacobian norm $\big\|\nabla_x v(x)\big\|_2^2$ has the following approximation,

$$\big\|\nabla_x v(x)\big\|_2^2 \;\approx\; \frac{1}{n}\sum_{j=1}^{n}\big\|\nabla_x\big(w_j^\top v(x)\big)\big\|_2^2 \;\approx\; \frac{1}{n}\sum_{j=1}^{n}\left(\frac{w_j^\top v(x+h\,d_j) - w_j^\top v(x)}{h}\right)^2, \tag{16}$$

where

$$w_j \sim \mathcal{N}(0, I_C), \qquad j = 1,\dots,n, \tag{17}$$

and the normalized gradient direction $d_j$ is given by

$$d_j \;=\; \frac{\nabla_x\big(w_j^\top v(x)\big)}{\big\|\nabla_x\big(w_j^\top v(x)\big)\big\|_2}. \tag{18}$$

In practice, we set $n$ and $h$ to fixed default values, and the dimension of the $w_j$ is $C$, the total number of classes.

Appendix C Proof of Theorem 3

Proof.

Suppose the loss function $\ell$ is Lipschitz continuous with respect to the model input $x$, with Lipschitz constant $L$. Let $\ell_{\max}$ be such that if $\ell(x) \le \ell_{\max}$, the model is always correct. Then, by Proposition 2.2 in Finlay and Oberman (2019), a lower bound on the minimum magnitude of perturbation necessary to adversarially perturb an image $x$ is given by

$$\|\delta\|_2 \;\ge\; \frac{\ell_{\max} - \ell(x)}{L}. \tag{19}$$

By Lemma 1 of Appendix A in Salman et al. (2019), our averaged model $\hat f$ has Lipschitz constant $\frac{1}{\sigma}\sqrt{2/\pi}$. Replacing $L$ in (19) with $\frac{1}{\sigma}\sqrt{2/\pi}$, and setting $\ell_{\max} - \ell(x) = \hat f_{(1)}(x) - \hat f_i(x)$, gives the result. ∎

Appendix D Illustration of regularized training

Figure 3: Illustration of gradient regularization (4) in the binary classification setting. The lighter line represents the classification boundary of the original model, which has large gradients; the darker line represents the classification boundary of the smoothed model. The symbols indicate the classification by the original model: a single red circle lies very close to many blue stars. The smoothed model has a smoother classification boundary, which flips the classification of the outlier.

Figure 4: Illustration of performing the iterative model update (15) for 8 timesteps in the binary classification setting. Panels (a)–(c) show the initial model, the model after one timestep, and the model after the final timestep. The dashed black line represents the decision boundary, the blue line the current classification model, and the blue stars and red circles the predicted classes under the current model iteration. For the outlier datapoint, the adversarial distance under the initial model $v^{(0)}$ is small; under the final model $v^{(K)}$, the adversarial distance is increased.

Appendix E Results on the CIFAR-10 dataset

Figure 5: CIFAR-10 attack curves: percentage of images successfully attacked as a function of adversarial distance. (a) CIFAR-10 top-1 PGD. (b) CIFAR-10 top-1 DDN.

We test our method on the CIFAR-10 dataset (Krizhevsky et al., 2009) with the ResNet-34 model architecture. The initial model $f^{(0)}$ was trained for 200 epochs with the cross-entropy loss function. Our smoothed model was computed by running Algorithm 1 to minimize the loss (15) for $K$ timesteps, training for 200 epochs per timestep. Training the smoothed model took 5 times longer than training the baseline model. We compare our results to a ResNet-34 model trained on noisy examples as a stochastically averaged model using RandomizedSmoothing (Cohen et al., 2019). We also trained a SmoothAdv model (Salman et al., 2019) using 4 steps of PGD. To assess certified accuracy, we run the Certify algorithm from Cohen et al. (2019) for the stochastically trained models. For the HeatSmoothing model, we run the same certification algorithm, but without running SamplingUnderNoise to compute the class probabilities. For completeness, we also certify the baseline model $f^{(0)}$. Certification plots are presented in Figure 1(a), where we see that our model's certified accuracy outperforms the stochastic models. Next, we attack our four models using PGD and DDN. For the stochastic models, we average the loss over noise samples as in (9). We run both attacks for up to 20 steps to force top-1 misclassification. Results are presented in Table 5 and Figures 5(a) and 5(b). In Table 5, we see that HeatSmoothing outperforms the stochastic models in terms of robustness; the only exception is robustness to mean PGD perturbations, as shown in Figure 5(a). Our model performs well up to a PGD perturbation of $\ell_2$ norm just above 1.0.

Model | ℓ2-bound median | ℓ2-bound mean | PGD median | PGD mean | DDN median | DDN mean
HeatSmoothing | 0.094 | 0.085 | 0.7736 | 0.9023 | 0.5358 | 0.6361
SmoothAdv | 0.090 | 0.078 | 0.7697 | 1.3241 | 0.4812 | 0.6208
RandomizedSmoothing | 0.087 | 0.081 | 0.7425 | 1.2677 | 0.4546 | 0.5558
Undefended baseline | - | - | 0.7088 | 0.8390 | 0.4911 | 0.5713
Table 5: $\ell_2$ adversarial distance metrics on CIFAR-10. A larger distance implies a more robust model.