Towards Assessment of Randomized Mechanisms for Certifying Adversarial Robustness

05/15/2020 · by Tianhang Zheng, et al.

As a certified defensive technique, randomized smoothing has received considerable attention due to its scalability to large datasets and neural networks. However, several important questions remain unanswered, such as (i) whether the Gaussian mechanism is an appropriate option for certifying ℓ_2-norm robustness, and (ii) whether there is an appropriate randomized mechanism to certify ℓ_∞-norm robustness on high-dimensional datasets. To shed light on these questions, we introduce a generic framework that connects the existing frameworks for assessing randomized mechanisms. Under our framework, we define the magnitude of the noise required by a mechanism to certify a certain extent of robustness as the metric for assessing the appropriateness of the mechanism. We also derive lower bounds on this metric as the criteria for assessment. The assessment of the Gaussian and Exponential mechanisms is achieved by comparing the magnitude of the noise needed by these mechanisms with the criteria, and we conclude that the Gaussian mechanism is an appropriate option for certifying both ℓ_2-norm and ℓ_∞-norm robustness. The veracity of our framework is verified by evaluations on CIFAR-10 and ImageNet.


1 Introduction

The past decade has witnessed tremendous success of deep learning in handling various learning tasks such as image classification Krizhevsky et al. (2012), natural language processing Cho et al. (2014), and game playing Silver et al. (2016). Nevertheless, a major unresolved issue of deep learning is its vulnerability to adversarial samples, which are almost indistinguishable from natural samples to humans but can mislead deep neural networks (DNNs) into making wrong predictions with high confidence Szegedy et al. (2013); Goodfellow et al. (2014). This phenomenon, referred to as adversarial attack, is considered one of the biggest threats to the deployment of many deep learning systems, and thus a great deal of effort has been devoted to developing defensive techniques against it. However, the majority of existing defenses are heuristic in nature (i.e., they come without theoretical guarantees), implying that they may be ineffective against stronger attacks. Recent work He et al. (2017); Athalye et al. (2018); Uesato et al. (2018) has confirmed this concern by showing that most of these heuristic defenses actually fail against strong adaptive attacks. This shifts our attention to certifiable defenses, which can classify all samples in a predefined neighborhood of a natural sample with a theoretically guaranteed error bound. Among the existing certifiable defensive techniques, randomized smoothing has become increasingly popular due to its scalability to large datasets and arbitrary networks.

Lecuyer et al. (2018) first relates adversarial robustness to differential privacy, and proves that adding noise is a certifiable defense against adversarial perturbation. Li et al. (2019) connects adversarial robustness with the concept of Rényi divergence, and improves the estimate of the lower bounds on the robust radii. Recently, Cohen et al. (2019) successfully certifies 49% accuracy on the original ImageNet dataset under adversarial perturbations with ℓ_2 norm less than 0.5.

Despite these successes, several questions about randomized smoothing mechanisms remain unanswered. One such question is why we should use the Gaussian mechanism for randomized smoothing to certify ℓ_2-norm robustness, or whether there is any mechanism more appropriate than the Gaussian mechanism. Another important question concerns the ability of this method to certify ℓ_∞-norm robustness: if randomized smoothing can be used to certify ℓ_∞-norm robustness, what mechanism is an appropriate choice? These questions motivate us to develop a framework to assess the appropriateness of a randomized smoothing mechanism for certifying ℓ_p-norm robustness.

In this paper, we take a promising step towards answering the above questions by proposing a generic and self-contained framework for assessing randomized mechanisms, which applies to different norms and connects the existing frameworks in Lecuyer et al. (2018); Li et al. (2019). Our framework employs the Maximal Relative Rényi (MR) divergence as the probability distance measure, and the definition of robustness under this measure is accordingly named MR robustness. Under our framework, we define the magnitude of the noise required by a mechanism to certify a certain extent of robustness as the metric for assessing the appropriateness of the mechanism. To be specific, a more “appropriate” randomized mechanism under this definition refers to a mechanism that can certify a certain extent of robustness with a smaller amount of noise. Given this definition, it is natural to define the assessment criteria as the lower bounds on the magnitude of the noise required to certify robustness, since we can judge whether a mechanism is an appropriate option based on the gap between the magnitude of the noise needed by the mechanism and the lower bounds.

Inspired by theories on the sample complexity of DP algorithms, we derive lower bounds on the noise required for certifying ℓ_2-norm or ℓ_∞-norm robustness. We demonstrate that the Gaussian mechanism is an appropriate option by showing that the gap between the required Gaussian noise and the lower bounds is only O(√(log d)) in the ℓ_2-norm case (and O(log d) in the ℓ_∞-norm case), where d is the dimensionality of the data; this gap is small for datasets like CIFAR-10 and ImageNet. We also show that the Exponential mechanism is not an appropriate option, since its gap scales on the order of √d. All in all, our contribution is three-fold:

  • We present a generic and self-contained framework for the assessment of randomized smoothing mechanisms, induced by a new definition of robustness, namely MR robustness, which connects existing frameworks such as Lecuyer et al. (2018) and Li et al. (2019).

  • We define a metric for assessing randomized mechanisms, i.e., the magnitude of the noise required to certify robustness, and we derive lower bounds on the magnitude of the noise required to certify ℓ_2-norm and ℓ_∞-norm robustness as the criteria for the assessment.

  • We assess the Gaussian mechanism and the Exponential mechanism based on the metric and the lower bounds (criteria). Specifically, we compare the magnitude of the noise used by the Gaussian and Exponential mechanisms with our lower bounds to justify that the Gaussian mechanism is an appropriate option for certifying both ℓ_2-norm and ℓ_∞-norm robustness.

2 Related Work

There are three approaches to certifying adversarial robustness that have stood out recently. The first approach formulates the task of adversarial verification as a non-convex optimization problem and solves it with tools like convex relaxations and duality Dvijotham et al. (2018); Raghunathan et al. (2018); Wong and Kolter (2018). Given a convex set (usually an ℓ_p ball) as input, the second approach maintains a convex outer approximation of all the possible outputs at each layer by various techniques, such as interval bound propagation, hybrid zonotope, and abstract interpretations Mirman et al. (2018); Wang et al. (2018); Gowal et al. (2018); Zhang et al. (2019); Balunovic and Vechev (2020). The third approach uses randomized smoothing to certify robustness, which is the main focus of this paper. Randomized smoothing for certifying robustness has become increasingly popular due to its strong scalability to large datasets and arbitrary networks Lecuyer et al. (2018); Li et al. (2018); Cohen et al. (2019); Dvijotham et al. (2018); Salman et al. (2019). For this approach, Lecuyer et al. (2018) first proves that randomized smoothing can certify ℓ_1- and ℓ_2-norm robustness using differential privacy theory. Li et al. (2018) derives a tighter lower bound on the ℓ_2-norm robust radius based on a lemma on Rényi divergence. Cohen et al. (2019) further obtains a tight guarantee on ℓ_2-norm robustness using the Neyman-Pearson lemma. Dvijotham et al. (2020) proposes a new framework based on f-divergences that applies to different measures. Salman et al. (2019) combines Cohen et al. (2019) with adversarial training, and Jia et al. (2019) extends the method in Cohen et al. (2019) to the top-k classification setting. We note that the method in Cohen et al. (2019) is only applicable to the Gaussian mechanism in the ℓ_2-norm case since it requires isotropy, whereas the frameworks proposed in Lecuyer et al. (2018); Li et al. (2019) are more general. In the following, we briefly review the basic definitions and theorems in the frameworks of Lecuyer et al. (2018); Li et al. (2019), which helps us demonstrate the connections between our framework and those two frameworks.

Our review begins with several definitions and notations. In general, we denote a randomized mechanism by M, which outputs a random variable depending on its input. We denote by f any deterministic classifier that outputs a prediction label. A commonly-used randomized classifier can then be constructed as f̃(x) = f(M(x)). We denote a data sample and its ground-truth label by x and y, respectively. An ℓ_p-norm ball centered at x with radius r is represented by B_p(x, r), and we say a data sample x' is in B_p(x, r) iff ‖x' − x‖_p ≤ r. Next, we detail the frameworks in Lecuyer et al. (2018) and Li et al. (2019), i.e., PixelDP and the Rényi-divergence-based bound.
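To make this construction concrete, here is a minimal sketch (our own illustration, not code from the paper) of a smoothed classifier f̃(x) = f(M(x)) whose output distribution is estimated by Monte Carlo sampling; `base_classifier`, `sigma`, and `n_samples` are illustrative names:

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000, num_classes=10):
    """Estimate the output distribution of f~(x) = f(x + z), z ~ N(0, sigma^2 I),
    and return the majority-vote label together with the empirical class probabilities.

    base_classifier: callable mapping a single input to a predicted label in [0, num_classes).
    """
    counts = np.zeros(num_classes, dtype=np.int64)
    for _ in range(n_samples):
        z = np.random.normal(0.0, sigma, size=x.shape)  # the Gaussian mechanism M(x) = x + z
        counts[base_classifier(x + z)] += 1
    return int(np.argmax(counts)), counts / n_samples
```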

PixelDP

PixelDP Lecuyer et al. (2018) is the first framework to prove that randomized smoothing is a certified defense by connecting the concepts of adversarial robustness and differential privacy. The definition of adversarial robustness in the framework of PixelDP can be stated as follows:

Definition 1 (PixelDP Lecuyer et al. (2018))

For any x and any x' in B_p(x, r), if a randomized mechanism M satisfies

P(M(x) ∈ S) ≤ e^ε · P(M(x') ∈ S) + δ, ∀S ⊆ O,   (1)

where O denotes the output space of M, then we say M is (ε, δ)-PixelDP.

Lecuyer et al. (2018) connects PixelDP with adversarial robustness by the following lemma.

Lemma 1 (Robustness Condition Lecuyer et al. (2018))

Suppose M̃ is a randomized K-class classifier, with score vector (M̃_1(x), ..., M̃_K(x)), that satisfies (ε, δ)-PixelDP (in B_p(x, r)). For the ground-truth class y, if

𝔼(M̃_y(x)) > e^{2ε} · max_{k ≠ y} 𝔼(M̃_k(x)) + (1 + e^ε)δ,   (2)

then the classification result is robust in B_p(x, r), i.e., argmax_k 𝔼(M̃_k(x')) = y for all x' ∈ B_p(x, r).

Note that the definition of the randomized classifier M̃ is a little different from the definition of f̃, since the output of f̃ is a scalar (a prediction label) rather than a vector. f̃ is more popular in follow-up works such as Li et al. (2019); Cohen et al. (2019). Lecuyer et al. (2018) mainly utilizes two mechanisms, i.e., the Laplace mechanism and the Gaussian mechanism, to guarantee PixelDP. Specifically, adding Laplace noise (i.e., M(x) = x + Lap(r/ε)) certifies (ε, 0)-PixelDP in B_1(x, r) for any ε > 0, and adding Gaussian noise (i.e., M(x) = x + N(0, σ²I) with σ = r√(2 ln(1.25/δ))/ε) certifies (ε, δ)-PixelDP in B_2(x, r) for any ε, δ ∈ (0, 1).
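As a worked example of this noise calibration (a sketch based on the standard Laplace/Gaussian-mechanism formulas quoted above, not on code from Lecuyer et al. (2018)):

```python
import math

def laplace_scale(r, eps):
    """Laplace scale b such that M(x) = x + Lap(b) is (eps, 0)-PixelDP in B_1(x, r)."""
    return r / eps

def gaussian_sigma(r, eps, delta):
    """Gaussian std sigma such that M(x) = x + N(0, sigma^2 I) is
    (eps, delta)-PixelDP in B_2(x, r), via the standard calibration."""
    return r * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

# Example: certify radius 0.5 with eps = 1 and delta = 1e-5.
print(laplace_scale(0.5, 1.0), gaussian_sigma(0.5, 1.0, 1e-5))  # 0.5, ~2.42
```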

Rényi Divergence-based Bound

Li et al. (2019) proves a tighter estimate (compared with Lecuyer et al. (2018)) on the lower bound of robust radii based on the following lemma.

Lemma 2 (Rényi Divergence Lemma Li et al. (2019))

Let P = (p_1, ..., p_K) and Q = (q_1, ..., q_K) be two multinomial distributions. If the indices of their largest probabilities do not match, then the Rényi divergence between P and Q satisfies

D_α(P‖Q) ≥ −log(1 − p_(1) − p_(2) + 2((p_(1)^{1−α} + p_(2)^{1−α})/2)^{1/(1−α)}),

where p_(1) and p_(2) refer to the largest and the second largest probabilities in P, respectively.

If the Gaussian mechanism is applied to certify ℓ_2-norm robustness, the estimate of the lower bound on the robust radius is given by the following lemma.

Lemma 3

Let f be any deterministic classifier and f̃(x) = f(x + z) be its corresponding randomized classifier for samples x, where z ∼ N(0, σ²I). Then f̃(x') = f̃(x) for all x' ∈ B_2(x, L), i.e., f̃ is robust in B_2(x, L), and the robust radius L that can be certified is given by

L = sup_{α>1} σ √( (2/α) · (−log(1 − p_(1) − p_(2) + 2((p_(1)^{1−α} + p_(2)^{1−α})/2)^{1/(1−α)})) ),   (3)

where p_(1) and p_(2) refer to the largest and the second largest probabilities in (p_1, ..., p_K), and p_k is the probability that f̃(x) returns the k-th class, i.e., p_k = P(f̃(x) = k).
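The radius in Eq. 3 is easy to evaluate numerically. Below is an illustrative sketch (the grid over α and the example values of p_(1), p_(2), σ are our own choices):

```python
import numpy as np

def certified_l2_radius(p1, p2, sigma):
    """Certified l2 radius from Eq. 3: maximize the bound over the Renyi order alpha."""
    alphas = np.linspace(1.001, 200.0, 4000)
    with np.errstate(over="ignore", invalid="ignore"):
        inner = 1 - p1 - p2 + 2 * ((p1 ** (1 - alphas) + p2 ** (1 - alphas)) / 2) ** (1 / (1 - alphas))
    inner = np.where(inner > 0, inner, 1.0)  # orders where the bound is vacuous contribute radius 0
    return float(np.max(sigma * np.sqrt(-2.0 * np.log(inner) / alphas)))

print(certified_l2_radius(p1=0.8, p2=0.1, sigma=0.5))  # ~0.46
```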

3 Overview of Our Framework

In this section, we present a generic framework, based on Definitions 2, 3, and 4, for assessing randomized mechanisms. According to Definition 3, our framework applies to all ℓ_p-norms (in fact, to all measures). Moreover, we show via Theorems 3.1 & 3.2 that our proposed framework connects the existing general frameworks in Lecuyer et al. (2018); Li et al. (2019). Also, we note that it is difficult to incorporate the framework in Cohen et al. (2019), since Cohen et al. (2019) requires the additive noise to be isotropic, like the Gaussian noise.

3.1 Main Definitions

Under our framework, the definition of adversarial robustness is induced by the maximal relative Rényi divergence (MR divergence) and is accordingly named MR robustness, so we start by introducing the definition of the MR divergence.

Definition 2 (Maximal Relative Rényi Divergence)

The maximal relative Rényi divergence of two distributions P and Q is defined as

D_MR(P‖Q) = sup_{α>1} D_α(P‖Q)/α,   (4)

where D_α(P‖Q) denotes the Rényi divergence of order α between P and Q.

Using D_MR as the probability distance measure, we can define adversarial robustness as follows:

Definition 3 (MR Robustness)

A randomized smoothing mechanism M is an (r_p, ε)-MR-robust mechanism if

D_MR(M(x)‖M(x')) ≤ ε, ∀x' ∈ B_p(x, r_p).   (5)

If a randomized smoothing classifier f̃ satisfies the above condition (with M replaced by f̃), we say it is an (r_p, ε)-MR-robust classifier, or that it certifies (r_p, ε)-MR robustness.

A property of MR robustness that we use throughout this paper is its postprocessing property, which can be stated as follows:

Corollary 1 (Postprocessing Property)

Let f̃(x) = f(M(x)) be a randomized classifier, where f is any deterministic function (classifier). Then f̃ is (r_p, ε)-MR-robust if M is (r_p, ε)-MR-robust.

This postprocessing property follows easily from the data-processing inequality for Rényi divergence Van Erven and Harremos (2014). It allows us to concentrate only on the randomized smoothing mechanism M, without considering the specific form of the deterministic classifier f, and therefore makes the framework applicable to arbitrary neural networks.
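As a worked example of Definition 2 (our own derivation, using the standard closed form of the Rényi divergence between isotropic Gaussians of equal variance):

```latex
% D_alpha between equal-variance isotropic Gaussians:
%   D_alpha(N(x, sigma^2 I) || N(x', sigma^2 I)) = alpha ||x - x'||_2^2 / (2 sigma^2).
% Dividing by alpha removes the dependence on alpha, so by Definition 2:
D_{\mathrm{MR}}\big(\mathcal{N}(x,\sigma^2 I)\,\|\,\mathcal{N}(x',\sigma^2 I)\big)
  = \sup_{\alpha>1}\frac{1}{\alpha}\cdot\frac{\alpha\,\lVert x-x'\rVert_2^2}{2\sigma^2}
  = \frac{\lVert x-x'\rVert_2^2}{2\sigma^2}.
```

This computation is what drives Theorem 4.1 below: bounding ‖x − x'‖_2 by r_2 and choosing σ = r_2/√(2ε) makes the right-hand side exactly ε.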

3.2 Connections between MR robustness and the existing frameworks

The framework defined by Definitions 2 & 3 is generic, since it is closely connected with the existing ones Lecuyer et al. (2018); Li et al. (2019). Here we demonstrate the connections via the following two theorems.

Theorem 3.1 (MR Robustness & PixelDP)

If M is (r_p, ε)-MR-robust, then M is also (αε + log(1/δ)/(α − 1), δ)-PixelDP in B_p(x, r_p), for any α > 1 and any δ ∈ (0, 1).

We note that the converse of Theorem 3.1 holds only when δ = 0, which indicates that our framework is a relaxed version of the PixelDP framework. But this should not be a surprise, since most of the subsequent frameworks Li et al. (2019); Cohen et al. (2019); Dvijotham et al. (2020) can in some sense be considered more relaxed than the PixelDP framework, and thus yield tighter certified bounds. Similarly, our framework can provide the same bound on the robust radius as in Li et al. (2019), which is tighter than the bound in Lecuyer et al. (2018) (Theorem 3.2).

Theorem 3.2 (MR Robustness & Rényi-Divergence-based Bound)

If a randomized classifier f̃ is (r_p, ε)-MR-robust, then f̃(x') = f̃(x) for all x' ∈ B_p(x, r_p), as long as

ε ≤ sup_{α>1} (1/α) · (−log(1 − p_(1) − p_(2) + 2((p_(1)^{1−α} + p_(2)^{1−α})/2)^{1/(1−α)})),   (6)

where p_(1) and p_(2) again refer to the largest and the second largest probabilities in (p_1, ..., p_K), and p_k is the probability that f̃(x) returns the k-th class, i.e., p_k = P(f̃(x) = k). Based on the above theorem, we can derive the same ℓ_2-norm robust radius as in Lemma 3 Li et al. (2019). We will detail how to derive the robust radius in Section 4.

All the omitted proofs are provided in the Appendix. An interpretation of Theorems 3.1 and 3.2 is that, as long as we can use a randomized mechanism with a certain amount of noise to certify MR robustness, we can use the same mechanism with the same amount of noise to certify PixelDP and the Rényi-divergence-based bound. Thus, Theorems 3.1 and 3.2 indicate that the assessment results based on the metric defined in Section 3.3 are very likely to generalize to the other frameworks.

3.3 Assessment of Randomized Mechanisms

Since there are infinitely many randomized mechanisms, a natural problem is to determine whether a given randomized mechanism is an appropriate option for certifying adversarial robustness. However, we note that previous work Li et al. (2019); Cohen et al. (2019); Salman et al. (2019) overlooks this problem and assumes the Gaussian mechanism to be an appropriate mechanism for certifying ℓ_2-norm robustness without sufficient assessment. In this paper, we attempt to provide a solution to this problem under our proposed framework. Specifically, we define a metric to assess randomized mechanisms as follows:

Definition 4

Given an ℓ_p-norm, a robust radius r_p, and a budget ε, the magnitude (expected ℓ_p-norm) of the noise required by a randomized mechanism M to certify (r_p, ε)-MR robustness is defined as the metric for assessing the appropriateness of M.

We define this metric for assessing randomized mechanisms because the accuracy of neural networks tends to decrease as the magnitude of the noise added to the inputs increases. Note that if the magnitude of the noise required by a randomized classifier is too large, the accuracy of its predictions on clean samples may be very low, and the certified robustness then becomes useless. Given the above metric, we also need criteria to assess the (relative) appropriateness of a randomized mechanism. In this paper, we employ the lower bounds on the magnitude of the noise required to certify a certain level of MR robustness as the criteria. In the following, we provide the lower bounds for p = 2 (ℓ_2-norm) and p = ∞ (ℓ_∞-norm), i.e., the two most popular norms, and assess the appropriateness of the Gaussian and Exponential mechanisms for certifying ℓ_2-norm and ℓ_∞-norm robustness.

4 Assessing Mechanisms for Certifying ℓ_2-norm Robustness

In this section, we first elaborate on how the Gaussian mechanism certifies MR robustness, and then provide the lower bound on the magnitude of the noise required by any randomized mechanism to certify ℓ_2-norm robustness. By comparing the magnitude of the noise required by the Gaussian mechanism with the lower bound, we conclude that the Gaussian mechanism is an appropriate option for certifying ℓ_2-norm robustness.

Theorem 4.1 (Gaussian Mechanism for Certifying ℓ_2-norm Robustness)

Let f be any deterministic classifier and f̃(x) = f(x + z) be its corresponding randomized classifier for samples x, where z ∼ N(0, σ²I) with σ = r_2/√(2ε). Then f̃ is (r_2, ε)-MR-robust.

According to Theorem 3.2, if we substitute ε with r_2²/(2σ²) (the MR divergence of the Gaussian mechanism over B_2(x, r_2)), the certified radius can be given by r_2 = sup_{α>1} σ√((2/α) · (−log(1 − p_(1) − p_(2) + 2((p_(1)^{1−α} + p_(2)^{1−α})/2)^{1/(1−α)}))), which is the same as the bound in Li et al. (2019) (Lemma 3). To provide a criterion for the assessment of the Gaussian mechanism, we prove a lower bound on the magnitude of the noise required by any randomized smoothing mechanism to ensure that M (as well as f̃) is (r_2, ε)-MR-robust. If the magnitude of the Gaussian noise is close to the lower bound, then the Gaussian mechanism is considered an appropriate option. The lower bound is given by the following theorem.

Theorem 4.2 (ℓ_2-norm Criterion for Assessment)

For any ε = O(1), if there is an (r_2, ε)-MR-robust randomized smoothing mechanism M such that

𝔼‖M(x) − x‖_2 ≤ B,   (7)

for some B > 0, then it must be true that B = Ω(r_2√(d/log d)). In other words, Ω(r_2√(d/log d)) is the lower bound on the expected magnitude of the random noise, i.e., the criterion.

Theorem 4.2 indicates that the expected magnitude of the additive noise should be at least Ω(r_2√(d/log d)) to certify (r_2, ε)-MR robustness. For the Gaussian mechanism, the expected magnitude is 𝔼‖z‖_2 = σ√2 · Γ((d+1)/2)/Γ(d/2) ≈ σ√d according to Orabona and Pál (2015), which is O(r_2√(d/ε)) to guarantee (r_2, ε)-MR robustness according to Theorem 4.1. This means that, for constant ε, the gap between the Gaussian mechanism and the optimal mechanism is bounded by O(√(log d)).

Remark 1

We say the Gaussian mechanism is an appropriate option because √(log d) is small for most commonly-used datasets. For instance, for CIFAR-10 (d = 3072), √(log d) ≈ 2.8, and for ImageNet (d = 150528), √(log d) ≈ 3.5.
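The quantities in Remark 1 can be checked in a few lines (a sketch; the dimensionalities d = 32·32·3 and d = 224·224·3 are the standard input sizes and are our assumption):

```python
import math

def expected_l2_norm(sigma, d):
    """Exact mean of ||z||_2 for z ~ N(0, sigma^2 I_d) (a chi distribution):
    sigma * sqrt(2) * Gamma((d+1)/2) / Gamma(d/2), computed in log space for stability."""
    return sigma * math.sqrt(2.0) * math.exp(math.lgamma((d + 1) / 2) - math.lgamma(d / 2))

for name, d in [("CIFAR-10", 32 * 32 * 3), ("ImageNet", 224 * 224 * 3)]:
    print(name,
          expected_l2_norm(1.0, d) / math.sqrt(d),  # ~1, i.e., E||z||_2 ~ sigma * sqrt(d)
          math.sqrt(math.log(d)))                   # the sqrt(log d) gap factor
```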

Equivalently, if we fix the expected ℓ_2-norm of the added noise as B, the largest radius that can be certified by any (r_2, ε)-MR-robust randomized smoothing mechanism is upper bounded by O(B√(log d / d)), and the robust radius certified by the Gaussian mechanism is approximately B√(2ε/d). The upper bound can be derived simply by rearranging the inequality in Theorem 4.2. For the Gaussian mechanism, since B ≈ σ√d and r_2 = σ√(2ε) according to Theorem 4.1, we have r_2 ≈ B√(2ε/d).

5 Assessing Mechanisms for Certifying ℓ_∞-norm Robustness

In this section, we first discuss the possibility of using the Exponential mechanism, an analogue of the Gaussian mechanism for the ℓ_∞-norm case, to certify ℓ_∞-norm robustness. Then, we prove the lower bound on the magnitude of the noise required by any randomized mechanism to certify ℓ_∞-norm robustness. By comparing the magnitude of the noise required by the Exponential mechanism with the lower bound, we conclude that the Exponential mechanism is not an appropriate option for certifying ℓ_∞-norm robustness. Surprisingly, we find that the Gaussian mechanism is a more appropriate option than the Exponential mechanism for certifying ℓ_∞-norm robustness.

We first recall the form of the density function of Gaussian noise: p(z) ∝ exp(−‖z‖_2²/(2σ²)). Based on this, we conjecture that, to certify ℓ_∞-norm robustness, we can sample the noise from the Exponential mechanism, an analogue of the Gaussian mechanism for the ℓ_∞-norm case:

p(z) ∝ exp(−‖z‖_∞/λ).   (8)

We show in the following theorem that randomized smoothing using the Exponential mechanism can certify (r_∞, ε)-MR robustness, which seemingly extends the ℓ_2-norm case. However, by Corollary 2 below, the certified radius is only Θ(𝔼‖z‖_∞/d) for fixed ε, which implies that the mechanism is unscalable to high-dimensional data, i.e., the Exponential mechanism should not be an appropriate mechanism for certifying ℓ_∞-norm robustness. This conclusion is further verified by our assessment method, which will be detailed later.

Theorem 5.1 (Exponential Mechanism for Certifying ℓ_∞-norm Robustness)

Let f be any deterministic classifier and f̃(x) = f(x + z) be its corresponding randomized classifier for samples x, where the noise z is sampled from the Exponential mechanism in Eq. 8 with λ = r_∞/ε. Then f̃ is (r_∞, ε)-MR-robust and also (r_∞, ε²/2)-MR-robust.

According to Theorem 3.2, if we substitute ε with r_∞/λ or (r_∞/λ)²/2, then the certified radius r_∞ can be given by λ · sup_{α>1}(M_α/α) or λ√(2 · sup_{α>1}(M_α/α)), respectively, where M_α denotes the Rényi-divergence bound in Lemma 2. Comparing this result with Theorem 4.1, we can see that randomized smoothing via the Exponential mechanism certifies a region with (almost) the same radius as that certified by the Gaussian mechanism in the ℓ_2-norm case (with λ playing the role of σ), indicating similarity in their robustness guarantees. However, the following corollary shows that the magnitude of the noise required by the Exponential mechanism is much larger than that of the Gaussian mechanism in the ℓ_2-norm case.

Corollary 2

For the Exponential mechanism that guarantees Theorem 5.1, the following holds:

𝔼‖z‖_∞ = dλ = d · r_∞/ε.   (9)

Corollary 2 indicates that the Exponential mechanism requires noise of magnitude Θ(d · r_∞/ε) to certify ℓ_∞-norm robustness, which is far from appropriate for high-dimensional datasets like ImageNet (d > 10⁵). The following theorem further verifies that there is indeed a huge gap between the noise required by the Exponential mechanism and the lower bound.
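For intuition, the Exponential mechanism in Eq. 8 can be sampled via the radial decomposition used in the proof of Corollary 2 (in the Appendix). The sketch below is our own construction:

```python
import numpy as np

def sample_exponential_mechanism(d, lam, rng=None):
    """Sample z with density p(z) proportional to exp(-||z||_inf / lam):
    draw R = ||z||_inf ~ Gamma(shape=d, scale=lam), then place z uniformly on the
    boundary of the l_inf ball of radius R (random face, other coordinates uniform)."""
    if rng is None:
        rng = np.random.default_rng()
    R = rng.gamma(shape=d, scale=lam)
    z = rng.uniform(-R, R, size=d)
    i = rng.integers(d)
    z[i] = R if rng.random() < 0.5 else -R  # pin one coordinate to a random face
    return z

z = sample_exponential_mechanism(d=3072, lam=0.1)
print(np.abs(z).max())  # E[||z||_inf] = d * lam = 307.2, illustrating Corollary 2
```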

Theorem 5.2 (ℓ_∞-norm Criterion for Assessment)

For any (r_∞, ε)-MR-robust mechanism M that satisfies

𝔼‖M(x) − x‖_∞ ≤ B,

it must be true that B = Ω(r_∞√(d/log d)). In other words, Ω(r_∞√(d/log d)) is the lower bound on the expected magnitude of the required noise, i.e., the criterion.

From Corollary 2 and Theorem 5.2, we can see that the gap between the noise required by the Exponential mechanism and the lower bound is on the order of √d (ignoring logarithmic factors), which can be very large for high-dimensional datasets. Therefore, we can conclude that the Exponential mechanism is not an appropriate mechanism for certifying ℓ_∞-norm robustness. Surprisingly, the following theorem shows that the Gaussian mechanism is an appropriate choice for certifying (r_∞, ε)-MR robustness.

Theorem 5.3 (Gaussian Mechanism for Certifying ℓ_∞-norm Robustness)

Let r_∞ be some fixed number and z ∼ N(0, σ²I) with σ = r_∞√(d/(2ε)). Then f̃(x) = f(x + z) is (r_∞, ε)-MR-robust, and 𝔼‖z‖_∞ is upper bounded by σ√(2 log(2d)).

From Theorems 5.2 and 5.3, we can see that the gap between the magnitude of the noise required by the Gaussian mechanism to certify (r_∞, ε)-MR robustness and the lower bound is also small, namely at most O(log d) for constant ε. Thus, we can say the Gaussian mechanism is a more appropriate option for certifying ℓ_∞-norm robustness (see Remark 1).
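The reasoning behind Theorem 5.3 can be summarized in one line (our sketch, under the Gaussian D_MR computation given after Corollary 1): an ℓ_∞ ball is contained in an ℓ_2 ball inflated by √d, so the ℓ_2 guarantee of the Gaussian mechanism transfers at the cost of a √d factor in σ.

```latex
\lVert x'-x\rVert_\infty \le r_\infty
\;\Longrightarrow\; \lVert x'-x\rVert_2 \le r_\infty\sqrt{d}
\;\Longrightarrow\;
D_{\mathrm{MR}} = \frac{\lVert x'-x\rVert_2^2}{2\sigma^2}
  \le \frac{d\,r_\infty^2}{2\sigma^2} = \varepsilon
\quad \text{for } \sigma = r_\infty\sqrt{d/(2\varepsilon)} .
```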

6 Experiments

Datasets and Models

Our theories and analysis are verified on two widely-used datasets, i.e., CIFAR-10 and ImageNet (pixel values are in the range [0, 1]). We use a 110-layer residual network and the classical ResNet-50 as the base models for CIFAR-10 and ImageNet, respectively. Note that it is difficult for the models to classify noisy images without seeing any noisy samples during training. Thus, we train all the models by adding appropriate Gaussian noise to the training images. The certified accuracy at radius r is defined as the fraction of the test set whose certified radii are larger than r and whose predictions are correct. We provide more details about the numerical methods in the supplementary material. We note that the lower bounds are not verifiable by experiments, since we are still not sure whether there exists any mechanism that achieves those lower bounds. So in the experiments, we only verify the theoretical results regarding the Gaussian mechanism and the Exponential mechanism.
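As a sketch of the evaluation protocol just described (our own helper; the names and example numbers are illustrative):

```python
import numpy as np

def certified_accuracy(radii, correct, r):
    """Fraction of the test set whose certified radius exceeds r AND whose
    smoothed prediction is correct (the definition used in this section).

    radii: per-example certified radii (e.g., 0 when the classifier abstains).
    correct: per-example booleans, True if the smoothed prediction matches the label.
    """
    radii, correct = np.asarray(radii), np.asarray(correct)
    return float(np.mean((radii > r) & correct))

print(certified_accuracy([0.3, 0.9, 1.2, 0.0], [True, True, False, True], r=0.5))  # 0.25
```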

Empirical Results

In the following, we verify our framework by comparing our theoretical results on the scales of the robust radii with the radii at which the Gaussian/Exponential mechanism can certify a fixed target accuracy. Note that in the previous literature, such a level of model accuracy on adversarial examples is considered fairly good performance Madry et al. (2017); Cohen et al. (2019). Besides, selecting another target accuracy does not affect the verification results much, because our theoretical results characterize the asymptotic behavior rather than the exact values of the robust radii. In Fig. 1 & 2, we demonstrate the certified results of the Gaussian mechanism. In the ℓ_2-norm case, Fig. 1 shows that, for a given σ, the Gaussian mechanism certifies the target accuracy at radii of the same order on CIFAR-10 and ImageNet, which verifies that the robust radius certified by the Gaussian mechanism is Θ(σ√ε), independent of the dimensionality d; this is also the scale of the largest certified radius achieved in the previous literature. In the ℓ_∞-norm case, Fig. 2 shows that the radius at which the Gaussian mechanism certifies the target accuracy shrinks with the dimensionality roughly as 1/√d, which verifies that the ℓ_∞-norm radius certified by the Gaussian mechanism is Θ(σ√(ε/d)). It is worth noting that the performance of the Gaussian mechanism can be even better with the tighter bound proved in Cohen et al. (2019), which is comparable to, if not better than, the other approaches introduced in Section 2.

Figure 1: Certified accuracy for CIFAR-10 (top) and ImageNet (bottom) in ℓ_2 norm. The Gaussian mechanism with different σ is used to certify robustness.
Figure 2: Certified accuracy (lower bound) for CIFAR-10 (top) and ImageNet (bottom) in ℓ_∞ norm. The Gaussian mechanism with different σ is used to certify robustness.

In Fig. 3, we demonstrate the performance of the Exponential mechanism. As shown in Fig. 3, the radii at which the Exponential mechanism certifies the target accuracy shrink with the dimensionality roughly as 1/d, which verifies that the robust radius certified by the Exponential mechanism is Θ(𝔼‖z‖_∞ · ε/d). Also, comparing Fig. 2 and Fig. 3, we can see that the Gaussian mechanism is clearly more appropriate than the Exponential mechanism for certifying ℓ_∞-norm robustness.

Figure 3: Certified accuracy for CIFAR-10 (top) and ImageNet (bottom) in ℓ_∞ norm. The Exponential mechanism is used to certify robustness. Note that since the dimensionality of CIFAR-10 is different from that of ImageNet, we apply different λ to CIFAR-10 and ImageNet, according to Theorem 5.1.

7 Conclusion

In this paper, we present a generic and self-contained framework for assessing randomized mechanisms, which connects existing frameworks such as Lecuyer et al. (2018); Li et al. (2019). Under our framework, we define the magnitude of the noise required by a randomized mechanism to certify a certain extent of robustness as the metric for assessing this mechanism. We also provide lower bounds on the magnitudes of the required noise as the assessment criteria. Comparing the noise required by the Gaussian and Exponential mechanisms with the criteria, we conclude that (i) the Gaussian mechanism is an appropriate option to certify both ℓ_2-norm and ℓ_∞-norm robustness; and (ii) the Exponential mechanism is not an appropriate mechanism to certify ℓ_∞-norm robustness, although it seems an analogue of the Gaussian mechanism for the ℓ_∞-norm case.

References

  • [1] A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420. Cited by: §1.
  • [2] M. Balunovic and M. Vechev (2020) Adversarial training and provable defenses: bridging the gap. In International Conference on Learning Representations, Cited by: §2.
  • [3] M. Bun and T. Steinke (2016) Concentrated differential privacy: simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pp. 635–658. Cited by: Lemma 6.
  • [4] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. Cited by: §1.
  • [5] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter (2019) Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918. Cited by: §1, §2, §2, §3.2, §3.3, §3, §6.
  • [6] K. Dvijotham, J. Hayes, B. Balle, Z. Kolter, C. Qin, A. Gyorgy, K. Xiao, S. Gowal, and P. Kohli (2020) A framework for robustness certification of smoothed classifiers using f-divergences. In International Conference on Learning Representations, Cited by: §2, §3.2.
  • [7] K. Dvijotham, R. Stanforth, S. Gowal, T. A. Mann, and P. Kohli (2018) A dual approach to scalable verification of deep networks.. In UAI, pp. 550–559. Cited by: §2.
  • [8] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §1.
  • [9] S. Gowal, K. Dvijotham, R. Stanforth, R. Bunel, C. Qin, J. Uesato, T. Mann, and P. Kohli (2018) On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715. Cited by: §2.
  • [10] W. He, J. Wei, X. Chen, N. Carlini, and D. Song (2017) Adversarial example defense: ensembles of weak defenses are not strong. In 11th USENIX Workshop on Offensive Technologies (WOOT 17), Cited by: §1.
  • [11] J. Jia, X. Cao, B. Wang, and N. Z. Gong (2019) Certified robustness for top-k predictions against adversarial perturbations via randomized smoothing. arXiv preprint arXiv:1912.09899. Cited by: §2.
  • [12] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105. Cited by: §1.
  • [13] M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana (2018) Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471. Cited by: Towards Assessment of Randomized Mechanisms for Certifying Adversarial Robustness, 1st item, §1, §1, §2, §2, §2, §2, §2, §2, §3.2, §3.2, §3, §7, Definition 1, Lemma 1.
  • [14] B. Li, C. Chen, W. Wang, and L. Carin (2018) Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113. Cited by: §2.
  • [15] B. Li, C. Chen, W. Wang, and L. Carin (2019) Certified adversarial robustness with additive noise. In Advances in Neural Information Processing Systems, pp. 9459–9469. Cited by: Towards Assessment of Randomized Mechanisms for Certifying Adversarial Robustness, 1st item, §1, §1, §2, §2, §2, §2, §3.2, §3.2, §3.2, §3.3, §3, §4, §7, Lemma 2.
  • [16] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: §6.
  • [17] M. Mirman, T. Gehr, and M. Vechev (2018) Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pp. 3575–3583. Cited by: §2.
  • [18] I. Mironov (2017) Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pp. 263–275. Cited by: Lemma 4.
  • [19] F. Orabona and D. Pál (2015) Optimal non-asymptotic lower bound on the minimax regret of learning with expert advice. arXiv preprint arXiv:1511.02176. Cited by: Appendix A, §4.
  • [20] A. Raghunathan, J. Steinhardt, and P. Liang (2018) Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344. Cited by: §2.
  • [21] H. Salman, G. Yang, J. Li, P. Zhang, H. Zhang, I. Razenshteyn, and S. Bubeck (2019) Provably robust deep learning via adversarially trained smoothed classifiers. arXiv preprint arXiv:1906.04584. Cited by: §2, §3.3.
  • [22] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. (2016) Mastering the game of go with deep neural networks and tree search. nature 529 (7587), pp. 484–489. Cited by: §1.
  • [23] T. Steinke and J. Ullman (2015) Between pure and approximate differential privacy. arXiv preprint arXiv:1501.06095. Cited by: Appendix B.
  • [24] T. Steinke and J. Ullman (2016) Between pure and approximate differential privacy. Journal of Privacy and Confidentiality 7 (2). Cited by: Lemma 5.
  • [25] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §1.
  • [26] J. Uesato, B. O’Donoghue, P. Kohli, and A. Oord (2018) Adversarial risk and the dangers of evaluating against weak attacks. In International Conference on Machine Learning, pp. 5032–5041. Cited by: §1.
  • [27] T. Van Erven and P. Harremos (2014) Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory 60 (7), pp. 3797–3820. Cited by: §3.1.
  • [28] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana (2018) Efficient formal safety analysis of neural networks. In Advances in Neural Information Processing Systems, pp. 6367–6377. Cited by: §2.
  • [29] E. Wong and Z. Kolter (2018) Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, pp. 5283–5292. Cited by: §2.
  • [30] H. Zhang, H. Chen, C. Xiao, B. Li, D. Boning, and C. Hsieh (2019) Towards stable and efficient training of verifiably robust neural networks. arXiv preprint arXiv:1906.06316. Cited by: §2.

Appendix A Omitted Proofs

Proof [Proof of Theorem 3.1] According to Definition 3, we have D_α(M(x)‖M(x')) ≤ αε for all α > 1. Therefore, M satisfies (α, αε)-RDP. According to the following lemma, i.e.,

Lemma 4 ([18])

If a randomized mechanism is (α, ε)-RDP, then it is (ε + log(1/δ)/(α − 1), δ)-DP for all 0 < δ < 1,

M is (αε + log(1/δ)/(α − 1), δ)-DP for all 0 < δ < 1. Since this holds for every x' ∈ B_p(x, r_p), M is (αε + log(1/δ)/(α − 1), δ)-PixelDP (DP).

Proof [Proof of Theorem 3.2] Lemma 2 indicates that f̃(x') = f̃(x) as long as D_α(f̃(x)‖f̃(x')) is smaller than the right-hand side of the bound in Lemma 2 for some α > 1. Since f̃ is (r_p, ε)-MR-robust, D_α(f̃(x)‖f̃(x')) ≤ αε for all α > 1. Thus, the above condition can be restated as Eq. 6.

Proof [Proof of Theorem 4.2] We consider a special case: let the dataset consist of a single data point. Since D_MR(M(x)‖M(x')) ≤ ε for all x' ∈ B_2(x, r_2), M is (r_2, ε)-MR-robust on B_2(x, r_2). According to Theorem 3.1, M is also (ε', δ)-PixelDP (DP) for appropriate (ε', δ).

Let us first connect the lower bound for one-way marginals (i.e., mean estimation) with the lower bound studied in Theorem 4.2. For an n-size dataset D = (x_1, ..., x_n), the one-way marginal is D̄ = (1/n)∑_{i=1}^n x_i, where x_i is the i-th row of D. In particular, when n = 1, the one-way marginal is just the data point itself, and thus the condition in Theorem 4.2 can be rewritten as

𝔼‖M(D) − D̄‖_2 ≤ B.   (10)

Based on this connection, we first prove the case where r_2 = 1, and then generalize it to any r_2. For r_2 = 1, the conclusion reduces to B = Ω(√(d/log d)). To prove this, we employ the following lemma, which provides a lower bound on one-way marginal estimation for all DP mechanisms.

Lemma 5 (Theorem 1.1 in [24])

For every ε ≤ O(1), every 2^{−O(n)} ≤ δ ≤ 1/n^{1+Ω(1)}, and every α ≤ 1/10, if a mechanism M : ({0, 1}^d)^n → [0, 1]^d is (ε, δ)-DP and answers the one-way marginal with expected error at most α, then n ≥ Ω(√(d log(1/δ))/(αε)).

Setting n = 1 in Lemma 5, we can see that if M is (ε', δ)-PixelDP (DP), then its expected error on the one-way marginal must exceed the largest α permitted by the lemma, which yields B = Ω(√(d/log d)); the log d factor is due to the choice of δ in the conversion of Theorem 3.1, since δ can be taken to be d^{−Θ(1)}. Therefore, we have the following theorem,

Theorem A.1

If a (1, ε)-MR-robust randomized smoothing mechanism M satisfies, for all x,

𝔼‖M(x) − x‖_2 ≤ B,   (11)

then B = Ω(√(d/log d)).

Now we come back to the proof for any r_2. If M satisfies D_MR(M(x)‖M(x')) ≤ ε for all x' ∈ B_2(x, r_2), then the rescaled mechanism M'(x) = M(r_2 x)/r_2 is (1, ε)-MR-robust. Since M is (ε', δ)-PixelDP (DP) on B_2(x, r_2), M' is (ε', δ)-PixelDP (DP) on B_2(x, 1). By Theorem A.1 applied to M', we have

𝔼‖M(x) − x‖_2 = r_2 · 𝔼‖M'(x/r_2) − x/r_2‖_2 = Ω(r_2√(d/log d)).   (12)

 

Proof [Proof of Theorem 5.1] We first prove that D_∞(M(x)‖M(x')) ≤ ε for all x' ∈ B_∞(x, r_∞). Since the density in Eq. 8 is p(z) ∝ exp(−‖z‖_∞/λ) with λ = r_∞/ε, for any noise value z,

p(z − (x' − x))/p(z) = exp((‖z‖_∞ − ‖z − (x' − x)‖_∞)/λ) ≤ exp(‖x' − x‖_∞/λ) ≤ exp(r_∞/λ) = e^ε.

Since D_α ≤ D_∞ ≤ ε for all α > 1, D_MR ≤ ε, so M is (r_∞, ε)-MR-robust. Also, based on the following lemma,

Lemma 6 ([3])

Let P and Q be two probability distributions satisfying D_∞(P‖Q) ≤ ε and D_∞(Q‖P) ≤ ε. Then D_α(P‖Q) ≤ αε²/2 for all α > 1,

we have D_MR(M(x)‖M(x')) = sup_{α>1} D_α(M(x)‖M(x'))/α ≤ ε²/2, i.e., M is also (r_∞, ε²/2)-MR-robust.  

Proof [Proof of Corollary 2] Define the distribution P_R on [0, ∞) to be the law of R = ‖z‖_∞, where z is drawn from the density p defined in Eq. 8. The probability density function of R is given by

p_R(r) ∝ r^{d−1} exp(−r/λ),

which is obtained by integrating the probability density function in Eq. 8 over the ℓ_∞ sphere of radius r, whose surface area is proportional to r^{d−1}. Hence P_R is the Gamma distribution with shape d and mean dλ, and thus 𝔼‖z‖_∞ = dλ.  

Proof [Proof of Theorem 5.2] The proof is almost the same as that of Theorem 4.2. Assume that we have a set of data points in B_∞(x, r_∞). Since D_MR(M(x)‖M(x')) ≤ ε for all x' ∈ B_∞(x, r_∞), M is (r_∞, ε)-MR-robust on B_∞(x, r_∞). According to Theorem 3.1, M is also (ε', δ)-PixelDP (DP). Thus, if

𝔼‖M(x) − x‖_∞ ≤ B,

then M answers the one-way marginal with expected ℓ_∞ error at most B. This means that M is an (ε', δ)-PixelDP (DP) mechanism to which Lemma 5 applies. Then, setting n = 1 in Lemma 5, we can get B = Ω(r_∞√(d/log d)) by a similar argument to that of Theorem 4.2.  

Proof [Proof of Theorem 5.3] Since B_∞(x, r_∞) ⊆ B_2(x, r_∞√d), Theorem 4.1 implies that M(x) = x + z with z ∼ N(0, σ²I) and σ = r_∞√(d/(2ε)) is (r_∞, ε)-MR-robust. The bound on 𝔼‖z‖_∞ can be easily proved by substituting the variance in the maximal inequality of ([19]) with σ².  

Appendix B Additional Details

We first detail the numerical methods for the experiments. The certification algorithm is given in Alg. 1.

0: Input: x, a classifier f, noise parameter σ (or λ), number of samples n for estimating the confidence interval.
1: Sample n noise samples from the Gaussian/Exponential mechanism.
2: Output f̃(x) on the n noisy copies, and estimate the distribution of f̃(x), i.e., (p_1, ..., p_K).
3: if the Gaussian mechanism is chosen then
4:    Compute the robust radius by Eq. 3.
5: else if the Exponential mechanism is chosen then
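A runnable sketch of the Gaussian branch of Alg. 1 (our own reconstruction: the radius is the Lemma 3 / Eq. 3 bound, a simple plug-in estimate is used in place of a confidence interval, and `base_classifier` is an illustrative name):

```python
import numpy as np

def certify_gaussian(base_classifier, x, sigma, n=1000, num_classes=10):
    """Alg. 1, Gaussian branch: estimate the smoothed output distribution from n
    noisy samples, then compute the certified l2 radius from the two largest
    class probabilities via the Renyi-divergence bound (Eq. 3)."""
    counts = np.zeros(num_classes)
    for _ in range(n):
        counts[base_classifier(x + np.random.normal(0.0, sigma, size=x.shape))] += 1
    p = np.sort(counts / n)[::-1]
    p1, p2 = p[0], max(p[1], 1e-12)  # avoid 0 raised to a negative power below
    alphas = np.linspace(1.001, 200.0, 4000)
    with np.errstate(over="ignore", invalid="ignore"):
        inner = 1 - p1 - p2 + 2 * ((p1 ** (1 - alphas) + p2 ** (1 - alphas)) / 2) ** (1 / (1 - alphas))
    inner = np.where(inner > 0, inner, 1.0)  # vacuous orders contribute radius 0
    radius = float(np.max(sigma * np.sqrt(-2.0 * np.log(inner) / alphas)))
    return int(np.argmax(counts)), radius
```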