A Stratified Approach to Robustness for Randomly Smoothed Classifiers

06/12/2019 · Guang-He Lee, et al. · MIT · IBM

Strong theoretical guarantees of robustness can be given for ensembles of classifiers generated by input randomization. Specifically, an ℓ_2 bounded adversary cannot alter the ensemble prediction generated by an isotropic Gaussian perturbation, where the radius for the adversary depends on both the variance of the perturbation and the ensemble margin at the point of interest. We build on and considerably expand this work across broad classes of perturbations. In particular, we offer guarantees and develop algorithms for the discrete case where the adversary is ℓ_0 bounded. Moreover, we exemplify how the guarantees can be tightened with specific assumptions about the function class of the classifier, such as a decision tree. We empirically illustrate these results with and without functional restrictions across image and molecule datasets.


1 Introduction

Many powerful classifiers lack robustness in the sense that a slight, potentially unnoticeable manipulation of the input features, e.g., by an adversary, can cause the classifier to change its prediction goodfellow14 . The effect is clearly undesirable in decision critical applications. Indeed, considerable recent work has gone into analyzing such failures and providing certificates of robustness.

Robustness can be defined with respect to a variety of metrics that bound the magnitude or the type of adversarial manipulation. The most common approach to searching for violations is by finding an adversarial example within a small neighborhood of the example in question, e.g., using gradient-based algorithms finlay2019logbarrier ; goodfellow14 ; madry2017towards . The downside of such approaches is that failure to discover an adversarial example does not mean that another technique could not find one. For this reason, a recent line of work has instead focused on certificates of robustness, i.e., guarantees that ensure, for specific classes of methods, that no adversarial examples exist within a certified region. Unfortunately, obtaining exact guarantees can be computationally intractable katz2017reluplex ; lomuscio2017approach ; tjeng2017evaluating , and guarantees that scale to realistic architectures have remained somewhat conservative croce2018provable ; mirman2018differentiable ; weng2018towards ; wong2017provable ; zhang2018efficient .

Ensemble classifiers have recently been shown to yield strong guarantees of robustness cohen2019certified . The ensembles, in this case, are simply induced by randomly perturbing the input to a base classifier. The guarantees state that, given an isotropic Gaussian perturbation of the input example, an adversary cannot alter the prediction of the corresponding ensemble within an ℓ_2 radius that depends on the noise variance as well as the ensemble margin at the given point cohen2019certified .

In this work, we substantially extend robustness certificates for such noise-induced ensembles. We provide guarantees for alternative metrics and noise distributions (e.g., uniform), and develop a stratified likelihood ratio analysis that allows us to provide tight certificates over discrete spaces with respect to ℓ_0 distance. We also introduce scalable algorithms for computing the certificates. The guarantees can be further tightened by introducing additional assumptions about the family of classifiers; we illustrate this in the context of ensembles derived from decision trees. Empirically, our ensemble classifiers yield stronger certified guarantees with respect to ℓ_0 bounded adversaries across image and molecule datasets in comparison to previous methods adapted from continuous spaces.

2 Related Work

In a classification setting, the role of a robustness certificate is to guarantee a constant prediction within a local region; a certificate is always sufficient to claim robustness. When a certificate is both sufficient and necessary, it is called an exact certificate. For example, the exact certificate of a linear classifier at a given point is the distance from that point to the decision hyperplane. Below we focus the discussion on recent developments in robustness guarantees for deep networks.

Most of the exact methods are derived for piecewise linear networks, defined as network architectures with piecewise linear activation functions. This class of networks admits a mixed integer-linear representation lee2018towards , which allows mixed integer-linear programming cheng2017maximum ; dutta2018output ; fischetti2018deep ; lomuscio2017approach ; tjeng2017evaluating or satisfiability modulo theories carlini2017provably ; ehlers2017formal ; katz2017reluplex ; scheibler2015towards to be used to find the exact adversarial example within a given radius. However, exact verification is in general NP-complete, and thus does not scale to large problems tjeng2017evaluating .

A certificate that provides only a sufficient condition is conservative but can be more scalable than exact methods. Such guarantees may be derived through relaxation as a linear program wong2017provable ; wong2018scaling , a semidefinite program raghunathan2018certified ; raghunathan2018semidefinite , or a dual optimization problem dvijotham2018training ; dvijotham2018dual . Alternative approaches conduct layer-wise relaxations of feasible neuron values to derive certificates gowal2018effectiveness ; mirman2018differentiable ; singh2018fast ; weng2018towards ; zhang2018efficient . Unfortunately, there is no empirical evidence that these methods yield effective certificates on large scale problems. This does not entail that the certificates are not tight enough in practice; it may also be attributed to the difficulty of obtaining a robust network in a large scale setting.

Recent works propose a new modeling scheme that builds an ensemble classifier by input randomization cao2017mitigating ; liu2018towards , mostly via an isotropic Gaussian perturbation. Lecuyer et al. lecuyer2018certified first propose a certificate based on differential privacy, which Li et al. li2018second improve using Rényi divergence. Cohen et al. cohen2019certified then prove the certificate that is tight with respect to all measurable classifiers based on the Neyman-Pearson lemma neyman1933ix , which yields the state-of-the-art provably robust classifier. However, their tight certificate is tailored to an isotropic Gaussian perturbation and ℓ_2 robustness, while we generalize the result across broad classes of perturbations and metrics. In addition, we show that such a tight guarantee can be further tightened with assumptions about the classifier.

3 Robustness Certificates of Randomly Smoothed Classifiers

Given an input, a random perturbation assigns a probability mass or density to each perturbed outcome. We can define a probabilistic classifier either by specifying the associated conditional class distribution for each perturbed input, or by viewing it as a random function whose output randomness is independent across inputs. Composing the perturbation with a classifier yields a randomly smoothed classifier; we refer to the probability that the smoothed classifier outputs a given class at a given input as its prediction probability.

Under this setting, we develop tight robustness guarantees on the prediction probability in this section, exemplify the framework in §4, and illustrate how the guarantees can be refined with further assumptions in §4.4. We defer all proofs to Appendix A.

3.1 Point-wise Robustness Certificates

Given a classifier smoothed by a perturbation, we refer to the prediction probability for an input and a label as its probability score, whenever the classifier and the perturbation are clear from the context. Given the probability score at a point of interest, we first identify a tight lower bound on the probability score at another (neighboring) point. Without any additional assumptions on the classifier beyond measurability with respect to the perturbation, the tight bound can be found by the minimization problem:

(1)

Intuitively, we can cast the optimization problem (1) as an assignment problem: starting from a classifier that never predicts the class of interest, we assign the class to parts of the input space until the constraint at the original point is met. A greedy strategy that makes these assignments in order of decreasing likelihood ratio achieves the minimum, since the objective and the constraint can each be rewritten as the measure, under the corresponding perturbation, of the set where the class is predicted.

However, the above argument implicitly assumes that the input space is countable and that the optimal classifier is deterministic. Below we formalize the idea to solve Eq. (1) without these assumptions. For each point, we define the likelihood ratio between the two perturbation measures (where the denominator vanishes, the ratio can be defined arbitrarily without affecting the solution in Lemma 1). If we can partition the input space into regions such that the likelihood ratio within each region is a constant, then we can sort the regions by their likelihood ratios. Note that each region can still be uncountable (see the example in §4.1). Then,

Lemma 1.

Given the prediction probability at the original point, any classifier satisfying Eq. (2) is a minimizer of Eq. (1),

(2)

and the corresponding minimum gives the tight lower bound on the prediction probability at the neighboring point.

The minimization problem can be interpreted as likelihood ratio testing neyman1933ix , by casting the two perturbation measures as the likelihoods of two hypotheses and the prediction probability at the original point as the significance level. Cohen et al. cohen2019certified exploit the Neyman-Pearson lemma neyman1933ix to construct the bound for an isotropic Gaussian perturbation, but we show in Remark 2 that our result is necessary to solve Eq. (1) in some cases.

Remark 2.

The standard Neyman-Pearson lemma neyman1933ix does not always suffice to solve Eq. (1), since it can only assign a deterministic prediction across an entire region with the same likelihood ratio. Assuming a countable input space, as in the extension to randomized tests tocher1950extension , also does not always suffice.

Remark 3.

The lower bound in Lemma 1 is an increasing continuous function of the prediction probability at the original point; if the likelihood ratios are positive, it is strictly increasing; if they are positive and finite, it is a bijection on the unit interval.

Remark 3 will be used in §4.2 to derive an efficient algorithm to compute robustness certificates.
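The following minimal sketch (Python, with hypothetical names of our own) illustrates the greedy construction behind Lemma 1, assuming the input space has already been partitioned into constant-likelihood-ratio regions whose probability masses under the two perturbation measures are known; it is an illustration of the idea rather than the paper's implementation.

def certified_lower_bound(regions, p_x):
    """Greedy solution of the minimization problem (1), following Lemma 1.

    regions: iterable of (mass_x, mass_xp) pairs -- the probability mass of each
             constant-likelihood-ratio region under the perturbation centered
             at the original point x and at the neighboring point x'.
    p_x:     prediction probability of the class of interest at x.

    Returns a lower bound on the prediction probability at x'.
    """
    # Assign the class first to regions where mass at x' is cheapest relative
    # to mass at x, i.e., in increasing order of the ratio mass_xp / mass_x.
    ordered = sorted(regions, key=lambda r: r[1] / r[0] if r[0] > 0 else float("inf"))
    remaining, rho = p_x, 0
    for mass_x, mass_xp in ordered:
        if remaining > mass_x:
            remaining -= mass_x
            rho += mass_xp                       # whole region assigned to the class
        else:
            rho += remaining * mass_xp / mass_x  # fractional (randomized) assignment
            break
    return rho

The same routine works with exact rational masses (see §4.2), since it only relies on comparisons, additions, and one division per call.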

3.2 Regional Robustness Certificates

We can extend the point-wise certificate to a regional certificate by examining the worst case over a neighboring region around the point of interest. Formally, given a metric, the neighborhood with a given radius around a point is the set of points within that distance. Assuming the prediction probability of the predicted class at the point exceeds one half, a robustness certificate on the radius can be found by

(3)

Note that here we assume the prediction probability of the predicted class exceeds one half and ignore ties. By definition, the radius is tight for binary classification, and provides a reasonable sufficient condition to guarantee robustness in the multi-class case. The tight multi-class guarantee would involve the maximum prediction probability over all the remaining classes (see Theorem 1 of cohen2019certified ). However, when the prediction probability is intractable to compute and must be statistically estimated for each class (e.g., when the base classifier is a deep network), the tight guarantee is statistically challenging to obtain. The actual algorithm used by Cohen et al. cohen2019certified is also a special case of Eq. (3).

4 Examples

In this section, we provide examples with efficient solutions to the optimization problem of §3 and thus the resulting certificate. Specifically, we consider the case where the random perturbation acts i.i.d. on each coordinate via a randomized function:

(4)

We will start with a uniform perturbation in continuous space as a warm-up example, and proceed to a more complex scenario that yields a robustness certificate for ℓ_0 distance in a discrete space.

4.1 Warm-up: the Uniform Perturbation

Figure 1: Uniform perturbations.

We consider a per-coordinate uniform perturbation with a fixed half-width around the input point:

(5)

Given a point of interest and its prediction probability for a class, the worst-case prediction probability at another point has an analytical solution via Lemma 1, since there are only three possible likelihood ratios in the space: zero, one, and infinity (for points where both densities vanish, the ratio can be defined arbitrarily). The corresponding likelihood ratio regions are the part of the original support outside the neighboring support, the overlap of the two supports, and the part of the neighboring support outside the original one, as illustrated in Figure 1. Using Lemma 1, we have

where the remaining term is a constant. Hence, the worst-case neighboring points are simply those that minimize the overlap of the two perturbation supports. Accordingly,

Proposition 4.

If the perturbation is defined as in Eq. (5), the worst-case prediction probability and the resulting certified radius admit closed-form expressions in terms of the overlap volume of the two perturbation supports.
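To make the warm-up concrete, the following is a minimal sketch (our own illustration, with hypothetical names) of the computation suggested above: under the assumption that the perturbation draws each coordinate uniformly from an interval of half-width a around the input, the probability mass supported only around the original point is spent first and contributes nothing at the neighboring point, while the remainder falls in the overlap of the two supports.

import numpy as np

def uniform_worst_case(x, x_adv, p_x, a):
    """Worst-case prediction probability at x_adv for a per-coordinate uniform
    perturbation of half-width a, following the greedy argument of Lemma 1.
    (Sketch; assumes the perturbation is uniform over the hypercube [x-a, x+a].)"""
    x, x_adv = np.asarray(x, dtype=float), np.asarray(x_adv, dtype=float)
    # Fraction of the perturbation volume shared by the two hypercubes.
    overlap = np.prod(np.clip(2 * a - np.abs(x - x_adv), 0.0, None) / (2 * a))
    # Mass spent outside the overlap contributes nothing at x_adv.
    return max(0.0, p_x - (1.0 - overlap))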

4.2 A Discrete Perturbation for ℓ_0 Robustness

We consider ℓ_0 robustness guarantees in a discrete space where each coordinate takes one of finitely many values; we define the following perturbation, parameterized by a constant controlling the probability of keeping each coordinate:

(6)

Here the perturbation can be regarded as the composition of a Bernoulli distribution (deciding whether to resample a coordinate) and a uniform distribution (over the resampled value). Due to the symmetry of the perturbation with respect to all configurations of the neighboring point at the same ℓ_0 distance, we have the following lemma on the equivalence of the resulting bounds:

Lemma 5.

If the perturbation is defined as in Eq. (6), then for a given ℓ_0 distance we may define a canonical pair of vectors differing in exactly that many coordinates; for every neighboring point at the same ℓ_0 distance from the original point, the lower bound of Lemma 1 coincides with the one evaluated at the canonical pair.

Figure 2: Illustration for Eq. (7)

Based on Lemma 5, to certify a prediction it suffices to find the maximum ℓ_0 radius at which the worst-case prediction probability still exceeds one half. Since the likelihood ratio is always positive and finite, the inverse of the bound in Lemma 1 exists (due to Remark 3), which allows us to pre-compute, for each radius, the minimum prediction probability that certifies it, and simply compare the observed probability against this threshold, instead of recomputing the bound for every input. The certified radius is then the maximum radius whose threshold is met. Below we discuss how to compute this quantity in a scalable way. Our first step is to identify a set of likelihood ratio regions whose probability masses under the two perturbations, as used in Lemma 1, can be computed efficiently. Note that, due to Lemma 5, it suffices to consider the canonical pair of points throughout the derivation.

For an ℓ_0 radius between the canonical pair of points, we construct regions indexed by the number of coordinates “flipped” relative to each point:

(7)

each region contains the points that can be obtained by flipping a given number of coordinates of the original point or of the neighboring point. See Figure 2 for an illustration, where different colors represent different types of coordinates: orange means the coordinate is flipped relative to both points and the two points initially agreed on it; red means it is flipped relative to both and they initially disagreed; green means it is flipped only relative to the original point; and blue means it is flipped only relative to the neighboring point. Denoting the numbers of these coordinate types accordingly, we have the following formula for the cardinality of each region.

Lemma 6.

For any admissible combination of coordinate types, the cardinality of the corresponding region in Eq. (7) is given by a closed-form product of binomial coefficients.

Therefore, for a fixed radius, all the cardinalities can be computed efficiently. Since each region has a constant likelihood ratio and the regions partition the space, we can apply them to evaluate the bound of Lemma 1. Under this representation, the number of nonempty likelihood ratio regions grows only polynomially with the radius, and the probability mass of a region under either perturbation is simply its cardinality times the corresponding per-point probability. Based on Lemma 1 and Lemma 6, we may use a for-loop that walks through the sorted regions, accumulating mass until the target worst-case probability is reached, and return the accumulated mass at the original point as the minimum prediction probability required to certify the radius. The procedure is illustrated in Algorithm 1.

1:  sort the regions R_1, …, R_n by likelihood ratio
2:  t ← 1/2;  p ← 0
3:  for i = 1, …, n do
4:     c_i ← |R_i| (via Lemma 6)
5:     u_i ← c_i · (per-point probability of R_i under the perturbation at the original point)
6:     v_i ← c_i · (per-point probability of R_i under the perturbation at the neighboring point)
7:     if t > v_i then
8:        t ← t − v_i
9:        p ← p + u_i
10:     else
11:        p ← p + t · u_i / v_i
12:        return p
13:     end if
14:  end for
Algorithm 1 Computing the minimum prediction probability required to certify a given ℓ_0 radius

Scalable implementation. In practice, Algorithm 1 can be challenging to implement: the probability values involved can be extremely small and cannot be faithfully represented with floating point numbers. If we set the keep-probability to a rational number, the per-point probabilities can be represented as fractions, and thus all the corresponding probability values can be represented by pairs of (large) integers; we also observe that computing the (large) cardinalities is feasible with modern large-integer arithmetic in practice (e.g., in Python), which motivates us to adapt the computation in Algorithm 1 to large integers.

For simplicity, we assume the keep-probability is a rational number. We may then implement Algorithm 1 in terms of non-normalized integer quantities: we replace the region masses and the constant target with their integer numerators over a common denominator. All the computations in Algorithm 1 can then be adapted trivially except the division in line 11. Since the division is bounded by the corresponding region mass (see the comparison between line 9 and line 11), we can implement it by a binary search over integers, which yields an upper bound whose error is bounded in the original (normalized) space. Finally, to map the computed, unnormalized quantity back to the original space, we find an upper bound of the final division up to a chosen precision via binary search, and report the result as the required probability with a correspondingly bounded total error. Note that an upper bound of the required probability is still a valid (conservative) certificate.

As a side note, simply computing the probabilities in the log-domain leads to uncontrollable approximation errors due to floating point arithmetic; using large integers to ensure a verifiable approximation error in Algorithm 1 is necessary for a computationally accurate certificate.
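Putting Lemma 5, Lemma 6, and Algorithm 1 together, the following is a minimal sketch of how the region masses for the discrete certificate might be computed with exact rational arithmetic (Python's fractions module), in the spirit of the large-integer implementation above. The per-coordinate probabilities, the reduction to a canonical pair of points at a given ℓ_0 distance, and all names are our assumptions about Eq. (6) rather than the paper's exact formulation; the greedy sketch certified_lower_bound from §3.1 can then be reused on the resulting masses.

from fractions import Fraction
from math import comb

def discrete_region_masses(r, alpha, K):
    """Probability masses of the constant-likelihood-ratio regions for the
    discrete perturbation, reduced (in the spirit of Lemma 5) to a canonical
    pair of points at l0 distance r.  alpha is the assumed probability of
    keeping a coordinate, K the alphabet size; exact rational arithmetic
    avoids the floating-point issues discussed above.  Coordinates on which
    the two points agree contribute identically to both measures and
    marginalize out, so only the r differing coordinates matter."""
    alpha = Fraction(alpha)
    p_same = alpha + (1 - alpha) / K   # perturbed coordinate keeps its value
    p_diff = (1 - alpha) / K           # perturbed coordinate takes one specific other value
    regions = []
    for u in range(r + 1):             # differing coords set to the original point's value
        for v in range(r - u + 1):     # differing coords set to the neighboring point's value
            w = r - u - v              # differing coords set to neither (requires K > 2)
            if w > 0 and K == 2:
                continue
            count = comb(r, u) * comb(r - u, v) * (K - 2) ** w
            mass_x  = count * p_same ** u * p_diff ** (r - u)   # mass under the perturbation at x
            mass_xp = count * p_same ** v * p_diff ** (r - v)   # mass under the perturbation at x'
            regions.append((mass_x, mass_xp))
    return regions

# Example: certify l0 radius 3 for a binary input with keep-probability 4/5,
# given a lower bound of 0.99 on the prediction probability at the original point.
# regions = discrete_region_masses(r=3, alpha=Fraction(4, 5), K=2)
# rho = certified_lower_bound(regions, Fraction(99, 100))   # certified if rho > 1/2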

4.3 Connection Between the Discrete Perturbation and an Isotropic Gaussian Perturbation

When the inputs are binary vectors, one may still apply the prior work cohen2019certified using an isotropic Gaussian perturbation to obtain an ℓ_0 certificate, since there is a bijection between ℓ_0 and ℓ_2 distances on binary vectors. If one uses a denoising function that projects each perturbed coordinate back to the binary space using the (likelihood ratio testing) rule of rounding to the nearer binary value, then the composition is equivalent to our discrete perturbation, with the flip probability determined by the Gaussian CDF evaluated at the decision threshold.

If one applies a classifier on top of this composition (or, equivalently, on top of the discrete perturbation), then the certificate obtained via the discrete perturbation is always at least as tight as the one obtained via the Gaussian perturbation. Concretely, restricting the minimization in Eq. (1) to measurable functions of the Gaussian perturbation that factor through the denoising step yields a bound that dominates the unrestricted one: the former corresponds to the certificate derived from the discrete perturbation (i.e., applying the denoiser and the classifier to an isotropic Gaussian), and the latter corresponds to the certificate from the Gaussian perturbation.
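For intuition, the following sketch (our own derivation, not taken from the paper) computes the per-coordinate flip probability induced by Gaussian noise followed by rounding each coordinate to the nearer binary value, and the keep-probability of a matching binary discrete perturbation; the threshold-at-one-half denoising rule and the parameterization of the discrete perturbation are assumptions.

from statistics import NormalDist

def equivalent_discrete_parameters(sigma):
    """Flip probability of (Gaussian noise + rounding back to {0, 1}) per
    coordinate, and the keep-probability alpha of a matching binary discrete
    perturbation in which a coordinate is flipped with probability (1 - alpha) / 2."""
    # A coordinate flips iff the noise pushes it past the midpoint 1/2.
    flip = NormalDist().cdf(-0.5 / sigma)
    alpha = 1.0 - 2.0 * flip
    return flip, alpha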

4.4 A Certificate with Additional Assumptions

In the previous analyses, we assume nothing but the measurability of the classifier. If we further make assumptions about the functional class of the classifier, we can obtain a tighter certificate than the ones derived in §3. Assuming an extra denoising step in the classifier over a Gaussian perturbation as illustrated in §4.3 is one example.

Here we illustrate the idea with another example. We assume that the inputs are binary vectors, the outputs are binary labels, and that the classifier is a decision tree in which each input coordinate is used at most once in the entire tree. Under the discrete perturbation, the prediction probability can be computed by tree recursion, since a decision tree over the discrete perturbation can be interpreted as assigning, at each decision node, a probability of visiting the left child and the right child. To elaborate, each decision node carries a split feature index together with a left child and a right child. Without loss of generality, we assume that each decision node routes its input to the right branch if the split feature is one. Then the prediction probability can be found by the recursion

(8)

where the boundary condition is the output of the leaf nodes. Effectively, we recursively aggregate the partial solutions found in the left and right subtrees rooted at each node, and the value at the root is the final prediction probability. Note that changing one input coordinate is equivalent to changing the recursion only at the unique node (if it exists) that splits on that coordinate, which gives

In addition, changes in the left subtree do not affect the partial solution found in the right subtree, and vice versa. Hence, we may use dynamic programming to find the exact adversary under each ℓ_0 radius by aggregating the worst-case changes found in the left and right subtrees rooted at each node. See Appendix B.1 for details.
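A minimal sketch of the recursion in Eq. (8) follows; the node fields and the routing probabilities below are our assumptions (the binary discrete perturbation keeps a coordinate with probability alpha and otherwise resamples it uniformly from {0, 1}).

def smoothed_tree_probability(node, x, alpha):
    """Prediction probability of a decision tree under a binary discrete
    perturbation, via the recursion of Eq. (8).  Each internal node is assumed
    to expose .feature, .left, .right; leaves expose .is_leaf and .value."""
    if node.is_leaf:
        return node.value
    # Probability that the perturbed coordinate equals 1 (and thus routes right):
    # it keeps its value with probability alpha and is resampled uniformly otherwise.
    p_right = alpha * x[node.feature] + (1 - alpha) * 0.5
    return (p_right * smoothed_tree_probability(node.right, x, alpha)
            + (1 - p_right) * smoothed_tree_probability(node.left, x, alpha))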

4.5 Learning and Prediction in Practice

Since we focus on the development of certificates, here we only briefly discuss how we train the classifiers and compute the prediction probability in practice.

Deep networks: We follow the approach proposed in prior work lecuyer2018certified : training is conducted on samples drawn from the input perturbation via a cross entropy loss. The prediction probability is estimated by the lower bound of the Clopper-Pearson Bernoulli confidence interval clopper1934use with 100K samples drawn from the perturbation at a fixed confidence level. Since the certified bound is an increasing function of the prediction probability (Remark 3), a lower bound on the latter entails a valid certificate.
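A minimal sketch of the estimation step, assuming a one-sided Clopper-Pearson lower bound at confidence level 1 − delta (the value of delta below is purely illustrative, not the paper's setting):

from scipy.stats import beta

def clopper_pearson_lower(successes, trials, delta=0.001):
    """One-sided Clopper-Pearson lower confidence bound on a Bernoulli parameter:
    with probability at least 1 - delta, the true probability exceeds the bound."""
    if successes == 0:
        return 0.0
    return beta.ppf(delta, successes, trials - successes + 1)

# Example: count how often the base classifier predicts the label over 100K
# perturbed copies of the input, then lower-bound the prediction probability.
# p_lower = clopper_pearson_lower(successes=99_120, trials=100_000)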

Decision trees: We train the decision tree greedily in a breadth-first order with a depth limit; for each split, we only search coordinates not used before, to enforce the functional constraint in §4.4, and optimize a weighted Gini index, which weights each training example by the probability that the discrete perturbation routes it to the node. The details of the training algorithm are in Appendix B.2. The prediction probability is computed by Eq. (8).

5 Experiment

In this section, we validate the robustness certificates of the proposed discrete perturbation in the ℓ_0 norm. We compare to the state-of-the-art isotropic Gaussian perturbation cohen2019certified , since an ℓ_0 certificate on binary inputs can be obtained from an ℓ_2 certificate (an ℓ_0 radius r corresponds to an ℓ_2 radius of √r). Note that the certificate derived from the Gaussian perturbation is still tight with respect to all measurable classifiers (see Theorem 1 in cohen2019certified ). We consider the following evaluation measures:

  • Average certified radius: the average certified ℓ_0 radius (with respect to the true labels) across the testing set.

  • ACC@r: the certified accuracy at radius r, i.e., the fraction of the testing set that is both correctly classified and certified at ℓ_0 radius r (a minimal sketch of both measures follows this list).
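The sketch below uses hypothetical inputs; the convention that misclassified examples contribute a radius of zero and never count as certified is our assumption.

import numpy as np

def certified_metrics(radii, correct, r_values):
    """Average certified l0 radius and certified accuracy ACC@r.

    radii:   per-example certified radii with respect to the true label
    correct: per-example booleans (smoothed prediction equals the true label)
    """
    radii = np.asarray(radii, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    radii = np.where(correct, radii, 0.0)     # misclassified examples get radius 0
    avg_radius = float(radii.mean())
    acc_at = {r: float(np.mean(correct & (radii >= r))) for r in r_values}
    return avg_radius, acc_at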

5.1 Binarized MNIST

Certificate | Avg. certified radius | ACC@r for increasing radii r
Ours (discrete perturbation) | 3.456 | 0.921 0.774 0.539 0.524 0.357 0.202 0.097
Same model, Gaussian certificate cohen2019certified | 1.799 | 0.830 0.557 0.272 0.119 0.021 0.000 0.000
Gaussian-trained model cohen2019certified | 2.378 | 0.884 0.701 0.464 0.252 0.078 0.000 0.000
Table 1: Randomly smoothed CNN models on the MNIST dataset. The first two rows refer to the same model with certificates computed via different methods (see details in §4.3).

We use a training/validation/testing split of the MNIST dataset. For each data point in the dataset, we binarize each coordinate by thresholding. Experiments are conducted on randomly smoothed CNN models, and the implementation details are in Appendix C.1.

The results are shown in Table 1. For the same randomly smoothed CNN model (the first and second rows in Table 1), our certificates are consistently better than the ones derived from the Gaussian perturbation (see §4.3): the average certified radius is substantially larger (3.456 vs. 1.799 in ℓ_0 distance), and the gap in certified accuracy exceeds 0.4 at an intermediate radius. Compared to the model trained with Gaussian noise (the third row in Table 1), our model is also consistently better in terms of both measures.

Since the above comparison between our certificates and the Gaussian-based certificates is relative, we also conduct an exhaustive search over all possible adversaries within small radii to study the tightness of our certificate against the exact one. The comparison suggests that our certificate is reasonably tight at the smallest radius but still too pessimistic at the larger one. This is expected, since the certificate holds over all measurable classifiers for the discrete perturbation; a tighter certificate requires additional assumptions on the classifier, such as the example in §4.4.

5.2 ImageNet

Certificate | ACC@r for increasing radii r
Ours (discrete perturbation) | 0.538 0.394 0.338 0.274 0.234 0.190 0.176
Gaussian perturbation cohen2019certified | 0.372 0.292 0.226 0.194 0.170 0.154 0.138
Table 2: The guaranteed accuracy of randomly smoothed ResNet50 models on ImageNet.

We conduct experiments on ImageNet deng2009imagenet , a large scale image dataset with 1,000 labels. Following common practice, we scale the images to a bounded input space. We use the same ResNet50 classifier he2016deep and learning procedure as Cohen et al. cohen2019certified , with the only modification being the perturbation distribution. The details and visualizations can be found in Appendix C.2. For comparison, we report the best guaranteed accuracy of each method at each radius in Table 2. Our model outperforms the competitor by a large margin at the smaller radii (e.g., 0.538 vs. 0.372), and consistently outperforms the baseline across radii.

Figure 3: Analysis of the proposed method on the ImageNet dataset: (a) the number of nonempty likelihood ratio regions, (b) the quantity computed by Algorithm 1, and (c) the certified accuracy.
Figure 4: The guaranteed AUC on the Bace dataset across different radii and ratios of testing data that the adversary can manipulate.

Analysis. We analyze our method on ImageNet in terms of 1) the number of nonempty likelihood ratio regions in Algorithm 1, 2) the quantity computed by Algorithm 1, and 3) the certified accuracy at each radius. The results are in Figure 3. 1) The number of nonempty likelihood ratio regions is much smaller than the worst-case bound for small radii. 2) The computed value approaches its limit more rapidly for a higher keep-probability than for a lower one; note that, due to Remark 3, it reaches the limit only in an extreme case. Computing the value with large-integer arithmetic is time-consuming, taking on the order of days for each setting of the keep-probability and radius, but it is computed only once and can be parallelized across settings. (As a side note, the corresponding computation on MNIST takes less than a second per setting.) 3) The certified accuracy behaves nonlinearly across radii; relatively, a higher keep-probability exhibits higher certified accuracy at small radii and lower certified accuracy at large radii, and vice versa.

5.3 Chemical Property Prediction

The experiment is conducted on the Bace dataset subramanian2016computational , a binary classification dataset for biophysical property prediction on molecules. We use Morgan fingerprints rogers2010extended to represent molecules; these are commonly used binary features wu2018moleculenet indicating the presence of various chemical substructures, yielding a fixed-dimensional binary feature vector per molecule. Here we focus on an ablation study comparing the proposed randomly smoothed decision tree with a vanilla decision tree, where the adversary is found by the dynamic programming of §4.4 (thus the exact worst case) and by a greedy search, respectively. More details can be found in Appendix C.3.
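For concreteness, a fingerprint can be computed along the following lines (RDKit is one common toolkit for this, not necessarily the one used in the experiments; the bit length and radius below are illustrative defaults):

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def morgan_fingerprint(smiles, n_bits=2048, radius=2):
    """Binary Morgan fingerprint of a molecule given as a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(list(fp), dtype=np.int8)   # one binary feature per bit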

Since chemical property prediction is typically evaluated via AUC wu2018moleculenet , we define a robust version of AUC that takes into account the radius of the adversary as well as the ratio of testing data that can be manipulated. Note that to maximally decrease the AUC via a positive (negative) example, the adversary only has to maximally decrease (increase) its prediction probability, regardless of the scores of the other examples. Hence, given an ℓ_0 radius and a ratio of testing data, we first compute the adversary for each testing example, and then find the combination of adversarial and clean examples under the ratio constraint that leads to the worst AUC score. See details in Appendix C.4.
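The basic primitive of this evaluation can be sketched as follows (hypothetical names; the search for the worst subset under the ratio constraint is the combinatorial step deferred to Appendix C.4):

import numpy as np
from sklearn.metrics import roc_auc_score

def manipulated_auc(y, clean_scores, worst_scores, manipulate_idx):
    """AUC when the examples in manipulate_idx are replaced by their certified
    worst-case scores (lower bound for positives, upper bound for negatives),
    while all remaining examples keep their clean scores."""
    scores = np.asarray(clean_scores, dtype=float).copy()
    scores[manipulate_idx] = np.asarray(worst_scores, dtype=float)[manipulate_idx]
    return roc_auc_score(y, scores)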

The results are in Figure 4. Empirically, the adversary of the vanilla decision tree already drives the prediction probability of a positive (negative) example to its extreme at the smallest radius; hence, the plots for the vanilla decision tree are constant across radii. The randomly smoothed decision tree is consistently more robust than the vanilla decision tree. We also compare the exact certificate of the prediction probability with the one derived from Lemma 1; the average difference across the training data is nonzero for both perturbation settings we consider, which encourages the development of classifier-aware guarantees that are tighter than the classifier-agnostic one.

6 Conclusion

We present a stratified approach to certifying the robustness of randomly smoothed classifiers, where the robustness guarantees can be obtained in various resolutions and perspectives, ranging from a pointwise certificate to a regional certificate and from general results to specific examples. The hierarchical investigation opens up many avenues for future extensions at different levels.

References

  • [1] G. W. Bemis and M. A. Murcko. The properties of known drugs. 1. molecular frameworks. Journal of medicinal chemistry, 39(15):2887–2893, 1996.
  • [2] X. Cao and N. Z. Gong. Mitigating evasion attacks to deep neural networks via region-based classification. In Proceedings of the 33rd Annual Computer Security Applications Conference, pages 278–287. ACM, 2017.
  • [3] N. Carlini, G. Katz, C. Barrett, and D. L. Dill. Provably minimally-distorted adversarial examples. arXiv preprint arXiv:1709.10207, 2017.
  • [4] C.-H. Cheng, G. Nührenberg, and H. Ruess. Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 251–268. Springer, 2017.
  • [5] C. J. Clopper and E. S. Pearson. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4):404–413, 1934.
  • [6] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.
  • [7] F. Croce, M. Andriushchenko, and M. Hein. Provable robustness of relu networks via maximization of linear regions. In the 22nd International Conference on Artificial Intelligence and Statistics, 2018.
  • [8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
  • [9] S. Dutta, S. Jha, S. Sankaranarayanan, and A. Tiwari. Output range analysis for deep feedforward neural networks. In NASA Formal Methods Symposium, pages 121–138. Springer, 2018.
  • [10] K. Dvijotham, S. Gowal, R. Stanforth, R. Arandjelovic, B. O’Donoghue, J. Uesato, and P. Kohli. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018.
  • [11] K. Dvijotham, R. Stanforth, S. Gowal, T. Mann, and P. Kohli. A dual approach to scalable verification of deep networks. the 34th Annual Conference on Uncertainty in Artificial Intelligence, 2018.
  • [12] R. Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 269–286. Springer, 2017.
  • [13] C. Finlay, A.-A. Pooladian, and A. M. Oberman. The logbarrier adversarial attack: making effective use of decision boundary information. arXiv preprint arXiv:1903.10396, 2019.
  • [14] M. Fischetti and J. Jo. Deep neural networks and mixed integer linear optimization. Constraints, 23:296–309, 2018.
  • [15] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
  • [16] S. Gowal, K. Dvijotham, R. Stanforth, R. Bunel, C. Qin, J. Uesato, T. Mann, and P. Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
  • [17] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • [18] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017.
  • [19] M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana. Certified robustness to adversarial examples with differential privacy. IEEE Symposium on Security and Privacy (SP), 2019.
  • [20] G.-H. Lee, D. Alvarez-Melis, and T. S. Jaakkola. Towards robust, locally linear deep networks. In International Conference on Learning Representations, 2019.
  • [21] B. Li, C. Chen, W. Wang, and L. Carin. Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113, 2018.
  • [22] X. Liu, M. Cheng, H. Zhang, and C.-J. Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pages 369–385, 2018.
  • [23] A. Lomuscio and L. Maganti. An approach to reachability analysis for feed-forward relu neural networks. arXiv preprint arXiv:1706.07351, 2017.
  • [24] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
  • [25] M. Mirman, T. Gehr, and M. Vechev. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pages 3575–3583, 2018.
  • [26] J. Neyman and E. S. Pearson. Ix. on the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 231(694-706):289–337, 1933.
  • [27] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. 2017.
  • [28] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. International Conference on Learning Representations, 2018.
  • [29] A. Raghunathan, J. Steinhardt, and P. S. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pages 10877–10887, 2018.
  • [30] D. Rogers and M. Hahn. Extended-connectivity fingerprints. Journal of chemical information and modeling, 50(5):742–754, 2010.
  • [31] K. Scheibler, L. Winterer, R. Wimmer, and B. Becker. Towards verification of artificial neural networks. In MBMV, pages 30–40, 2015.
  • [32] G. Singh, T. Gehr, M. Mirman, M. Püschel, and M. Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pages 10802–10813, 2018.
  • [33] G. Subramanian, B. Ramsundar, V. Pande, and R. A. Denny. Computational modeling of β-secretase 1 (BACE-1) inhibitors using ligand based approaches. Journal of chemical information and modeling, 56(10):1936–1949, 2016.
  • [34] V. Tjeng, K. Xiao, and R. Tedrake. Evaluating robustness of neural networks with mixed integer programming. International Conference on Learning Representations, 2017.
  • [35] K. Tocher. Extension of the neyman-pearson theory of tests to discontinuous variates. Biometrika, 37(1/2):130–144, 1950.
  • [36] T.-W. Weng, H. Zhang, H. Chen, Z. Song, C.-J. Hsieh, D. Boning, I. S. Dhillon, and L. Daniel. Towards fast computation of certified robustness for relu networks. the 35th International Conference on Machine Learning, 2018.
  • [37] E. Wong and J. Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. the 35th International Conference on Machine Learning, 2018.
  • [38] E. Wong, F. Schmidt, J. H. Metzen, and J. Z. Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems, pages 8400–8409, 2018.
  • [39] Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande. Moleculenet: a benchmark for molecular machine learning. Chemical science, 9(2):513–530, 2018.
  • [40] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4939–4948, 2018.

Appendix A Proofs

To simplify exposition, we use [n] to denote the set {1, …, n}.

A.1 The proof of Lemma 1

Proof.

We may rewrite the probabilities in integral form:

Note that for any feasible classifier, we can re-assign the function output within each likelihood ratio region to be a constant without affecting the objective or the constraint. Concretely, we define the region-averaged classifier as

then we have

Since the likelihood ratio is constant within each region, we also have

Therefore,

Hence, it suffices to consider the following program

where the optimum is equivalent to the program

and each solution of the latter corresponds to a solution of the former. For example, the minimizer in the statement of the lemma corresponds to the solution defined as:

(9)

We may simplify the program as

Clearly, if , all the optimal will assign ; our solution satisfies this property since