CLEVER (Cross-Lipschitz Extreme Value for nEtwork Robustness) is a robustness metric for deep neural networks
CLEVER (Cross-Lipschitz Extreme Value for nEtwork Robustness) is an Extreme Value Theory (EVT) based robustness score for large-scale deep neural networks (DNNs). In this paper, we propose two extensions of this robustness score. First, we provide a new formal robustness guarantee for classifier functions that are twice differentiable. We apply extreme value theory to the new formal robustness guarantee, and the estimated robustness is called the second-order CLEVER score. Second, we discuss how to handle gradient masking, a common defensive technique, using CLEVER with the Backward Pass Differentiable Approximation (BPDA). With BPDA applied, CLEVER can evaluate the intrinsic robustness of a broader class of neural networks: networks with non-differentiable input transformations. We demonstrate the effectiveness of CLEVER with BPDA in experiments on a 121-layer DenseNet model trained on the ImageNet dataset.
It is well known that deep neural networks (DNNs) are vulnerable to adversarial examples: a small perturbation added to an input can mislead the network into classifying it as any desired class. There have been significant efforts in developing verification techniques that prove no adversarial perturbation exists within a given radius of an input x_0 for a classifier function f. However, the verification problem is hard and generally intractable, because a general neural network classifier is highly non-convex and non-smooth.
Alternatively, instead of verifying the exact robustness, one idea is to provide a lower bound β_L on the minimum adversarial distortion, which guarantees that no adversarial examples exist within an ℓ_p ball of radius β_L. We call β_L the robustness lower bound of the input image x_0 on classifier function f. CLEVER (Cross-Lipschitz Extreme Value for nEtwork Robustness) is the first attack-agnostic robustness score to estimate this robustness lower bound for large-scale DNNs, e.g., modern ImageNet networks such as ResNet, Inception, etc. It is based on a theoretical analysis of a formal robustness guarantee under a Lipschitz continuity assumption. The CLEVER authors propose a sampling-based approach with Extreme Value Theory to estimate the local Lipschitz constant, and empirically this estimation aligns well with other robustness evaluation metrics, for example, the distortion of adversarial perturbations found by strong attacks.
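To make the sampling idea concrete, the following is a minimal NumPy sketch of estimating a local Lipschitz constant from gradient-norm samples. It is an illustrative simplification, not the paper's implementation: the helper `sample_local_lipschitz` and the toy quadratic margin function are our own choices, and we return the empirical maximum of the per-batch maxima instead of fitting a reverse Weibull distribution to them.

```python
import numpy as np

def sample_local_lipschitz(grad, x0, R=0.5, n_batches=50, batch_size=100, seed=0):
    """Sample gradient norms in the L2 ball B(x0, R) and collect per-batch
    maxima. The full CLEVER method fits a reverse Weibull distribution to
    these maxima by maximum likelihood and reads off its location parameter;
    here we return the overall empirical maximum as a simplified stand-in."""
    rng = np.random.default_rng(seed)
    d = x0.shape[0]
    batch_maxima = []
    for _ in range(n_batches):
        # draw points uniformly from the L2 ball of radius R around x0
        v = rng.normal(size=(batch_size, d))
        v /= np.linalg.norm(v, axis=1, keepdims=True)
        r = R * rng.random(batch_size) ** (1.0 / d)
        xs = x0 + r[:, None] * v
        batch_maxima.append(max(np.linalg.norm(grad(x)) for x in xs))
    return max(batch_maxima)

# toy margin function g(x) = x^T A x with gradient 2 A x; its gradient norm
# inside the ball is bounded above by 2 ||A||_2 (||x0||_2 + R)
A = np.diag([1.0, 3.0])
x0 = np.array([1.0, 1.0])
L_hat = sample_local_lipschitz(lambda x: 2 * A @ x, x0)
```

The Weibull fit matters in the full method because the empirical maximum alone tends to underestimate the true extreme value; this sketch only shows the sampling geometry.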
In this work, we provide two extensions of CLEVER. First, we derive a new robustness guarantee for classifier functions that are twice differentiable, and we estimate the theoretical bounds via extreme value theory. Second, we extend CLEVER to be capable of evaluating the robustness of networks with non-differentiable input transformations, making it available for a wider class of neural networks deployed with gradient masking based defense.
Evaluating the robustness of a neural network can be done by crafting adversarial examples with a specific attack algorithm [2, 3, 4, 5]. However, this methodology has a major drawback: a network's resilience to existing attacks is not guaranteed to extend to subsequent attacks. In fact, many defensive methods have been shown to be either partially or completely broken after stronger and adaptive attacks are proposed [6, 7, 8, 9]. Thus, it is of great importance to provide an attack-agnostic robustness evaluation metric.
On the other hand, existing formal verification methods that solve for the exact minimum adversarial distortion (which is independent of any attack algorithm) are quite expensive: verifying a small network with only a few hundred neurons on one input example can take a few hours, and in fact even finding a non-trivial lower bound can be hard; so far only results on CIFAR and MNIST networks are available [11, 12]. CLEVER presents a framework to estimate the local Lipschitz constant using extreme value theory, and then obtains an attack-agnostic robustness score based on a first-order Lipschitz continuity condition. CLEVER can scale to ImageNet networks.
Recently, Goodfellow raised concerns about CLEVER in the case of networks with gradient masking, a defensive technique that obfuscates model gradients to prevent gradient-based attacks. One of the main objectives of this work is to show that such concerns can be safely eliminated with the Backward Pass Differentiable Approximation (BPDA) technique. Moreover, we also experimentally show how CLEVER can successfully handle networks with non-differentiable input transformations, including the staircase function raised as an example in those concerns.
Let x_0 be the input of a K-class classifier f : R^d -> R^K; the predicted class of x_0 is c = argmax_{1<=i<=K} f_i(x_0). Given x_0 and f, we say x_0 + δ is an adversarial example if there exists a perturbation δ that makes argmax_i f_i(x_0 + δ) ≠ c while ||δ||_p is small. A successful untargeted attack finds a δ such that argmax_i f_i(x_0 + δ) ≠ c, while a successful targeted attack finds a δ such that argmax_i f_i(x_0 + δ) = t, given a target class t ≠ c. On the other hand, the definition of ℓ_p norm-bounded robustness is the following: given a target class t, β_L is the targeted robustness of x_0 if

  f_c(x_0 + δ) > f_t(x_0 + δ)  for all  ||δ||_p <= β_L,   (1)

where c = argmax_i f_i(x_0). Similarly, β_L is the untargeted robustness if (1) holds for all classes t ≠ c.
In the CLEVER paper, the authors have shown that if the classifier function f has continuously differentiable components f_i, the targeted robustness is

  β_L = min{ (f_c(x_0) - f_t(x_0)) / L_q^t, R },   (2)

where L_q^t is the local Lipschitz constant of the function g(x) = f_c(x) - f_t(x) within a local region B_p(x_0, R) := { x : ||x - x_0||_p <= R }, and 1/p + 1/q = 1. A simple proof of this guarantee is based on the mean value theorem applied to the first-order expansion of g: for some s in [0, 1],

  g(x_0 + δ) = g(x_0) + ∇g(x_0 + sδ)^T δ.   (3)

With Hölder's inequality, |∇g(x_0 + sδ)^T δ| <= ||∇g(x_0 + sδ)||_q ||δ||_p <= L_q^t ||δ||_p, so g(x_0 + δ) > 0 whenever ||δ||_p < (f_c(x_0) - f_t(x_0)) / L_q^t.
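As a concrete instance, the following sketch evaluates bound (2) for a toy linear classifier, where the margin g(x) = f_c(x) - f_t(x) has a constant gradient, so the local Lipschitz constant is known exactly and the bound is tight. All names and numbers here are illustrative, not from the paper.

```python
import numpy as np

# toy linear classifier f(x) = W x + b; the margin g(x) = f_c(x) - f_t(x)
# has the constant gradient w_c - w_t, so L_q is exact and bound (2) is tight
W = np.array([[2.0, 0.0],
              [0.0, 1.0]])
b = np.array([0.0, 0.5])
x0 = np.array([1.0, 0.2])
R = 10.0  # radius of the local region

f0 = W @ x0 + b
c = int(np.argmax(f0))            # predicted class
t = 1 - c                         # attack target: the other class
g0 = f0[c] - f0[t]                # margin at x0
L = np.linalg.norm(W[c] - W[t])   # local Lipschitz constant, q = 2
beta = min(g0 / L, R)             # targeted robustness bound, Eq. (2)

# sanity check: moving distance beta along the worst-case direction
# drives the margin exactly to zero
delta = -beta * (W[c] - W[t]) / L
f1 = W @ (x0 + delta) + b
```

For a real network the gradient is not constant, which is why the bound needs the worst-case Lipschitz constant over the whole ball rather than the gradient at x_0 alone.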
The CLEVER authors further extend their analysis to neural networks with ReLU activations, which are a special case of non-differentiable functions.
In this work, we provide formal robustness guarantees when classifier functions are twice differentiable, for example, neural networks with twice differentiable activations such as tanh, sigmoid, softplus, etc. For a twice-differentiable function g, there exists an s in [0, 1] such that

  g(x_0 + δ) = g(x_0) + ∇g(x_0)^T δ + (1/2) δ^T H(x_0 + sδ) δ,   (4)

where H(x) is the Hessian of g at x. This is analogous to the mean value theorem in the first-order case, but extended with a second-order term. This expansion of g can be used to derive the targeted robustness of x_0 in the following theorem:

Theorem 3.1. Given an input x_0 and a K-class classifier f, the targeted robustness of x_0 is

  β_L = min{ (-b + (b^2 + 2 M g(x_0))^{1/2}) / M, R },   (5)

where g(x) = f_c(x) - f_t(x), b = ||∇g(x_0)||_q, and M = max_{x in B_p(x_0, R)} ||H(x)||.
Theorem 3.1 needs the value M, the maximum subordinate norm of the Hessian matrix within B_p(x_0, R). When p = 2, this becomes the well-known spectral norm, which can be evaluated efficiently at a single point using power iteration or the Lanczos method. Under the framework of CLEVER, we apply extreme value theory to estimate M by sampling different points x and running power iteration at each sampled point. In this paper, we focus on the case of p = 2 only (ℓ_2 robustness). After we obtain an estimate of M, a second-order robustness lower bound can be estimated at x_0 using (5). The estimated bound of (2) is named 1st-order CLEVER, while the estimated bound of (5) is called 2nd-order CLEVER.
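A minimal sketch of this procedure at a single point, assuming an explicit 2x2 Hessian for a toy margin function (in practice M would be estimated over many sampled points, and the Hessian accessed via Hessian-vector products rather than formed explicitly): power iteration estimates the spectral norm M, and the bound of (5) follows by solving the quadratic (M/2) r^2 + b r - g(x_0) = 0. All numbers are illustrative.

```python
import numpy as np

def spectral_norm(H, iters=100, seed=0):
    """Power iteration for the spectral norm (largest absolute
    eigenvalue) of a symmetric matrix H."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=H.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = H @ v
        v = w / np.linalg.norm(w)
    return abs(v @ H @ v)  # Rayleigh quotient at convergence

# illustrative ingredients of bound (5) at a single point
g0 = 1.0                      # margin g(x0) = f_c(x0) - f_t(x0)
grad = np.array([0.5, 0.2])   # gradient of g at x0, so b = ||grad||_2
H = np.array([[2.0, 0.4],
              [0.4, 1.0]])    # stand-in for the worst-case Hessian in the ball
R = 2.0                       # radius of the sampling ball

M = spectral_norm(H)
b = np.linalg.norm(grad)
# positive root of (M/2) r^2 + b r - g0 = 0, capped at R as in (5)
beta2 = min((-b + np.sqrt(b**2 + 2 * M * g0)) / M, R)
```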
Gradient masking is a popular method of defending against adversarial examples in which the model does not provide useful gradients for generating adversarial examples. Typical gradient masking techniques include adding non-differentiable layers (bit-depth reduction, JPEG compression, etc.) to the network, numerically making the gradient vanish (Defensive Distillation), and modifying the optimization landscape of the loss function in a local region of each data point. These methods typically prevent gradient-based adversarial attacks by providing non-informative gradients. However, many gradient masking techniques have been shown to be ineffective as defenses. Notably, Defensive Distillation can be bypassed by attacking the logit (unnormalized probability) layer values to avoid the saturated softmax function; many non-differentiable transformation functions can be bypassed using the Backward Pass Differentiable Approximation (BPDA); and modifications to the local landscape of the loss function can be escaped by adding a small random noise when performing the attack.
When CLEVER is evaluated, we always use the logit layer values, so we are not subject to softmax saturation. Additionally, during the sampling process we evaluate gradients at a large number of randomly perturbed images, so CLEVER is likely to escape regions of masked gradients in the local loss landscape. The remaining concern is thus whether CLEVER can be evaluated on networks that use a non-differentiable layer as a defense. For example, if the input image is quantized via bit-depth reduction, a staircase function is applied before the network, and its gradient cannot be computed via automatic differentiation. We formally discuss this situation in the next section.
For a neural network classifier f, one can apply a non-differentiable transformation T to the input and then feed the transformed data into f. The composed function f(T(x)) thus becomes non-differentiable, and gradient-based adversarial attacks fail to find successful adversarial examples. An example of T is a staircase function, as suggested in the aforementioned concerns. This transformation also hinders the direct use of CLEVER to evaluate the robustness of f(T(x)).
To handle non-differentiable transformations, we use the Backward Pass Differentiable Approximation (BPDA) technique. The intuition behind BPDA is that although T is non-differentiable (e.g., bit-depth reduction, JPEG compression, etc.), it usually holds that T(x) ≈ x. Thus, in backpropagation, we can assume that ∂T(x)/∂x ≈ I and approximate the gradient of the composed function as ∇_x f(T(x)) ≈ ∇_x f(x) evaluated at x = T(x).
To evaluate CLEVER for a network with an input transformation T (for example, a staircase function), each point x_i is sampled within a ball B_p(x_0, R) around x_0. Then the transformation is applied to obtain y_i = T(x_i), and the backpropagation procedure computes ∇f(y_i). We simply collect ∇f(y_i) as the gradient sample and compute its norm as a sample for Lipschitz constant estimation.
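The sampling-with-BPDA step can be sketched as follows, using bit-depth reduction as the transformation T and a toy differentiable function with a closed-form gradient in place of a real network. The helpers `bitdepth_reduce` and `bpda_gradient` are our own illustrative names, not from the paper's code.

```python
import numpy as np

def bitdepth_reduce(x, bits=3):
    """Non-differentiable staircase transformation T: quantize each
    value in [0, 1] down to 2**bits levels (piecewise constant)."""
    levels = 2 ** bits - 1
    return np.floor(x * levels) / levels

# toy differentiable "network" with a closed-form gradient:
# f(x) = w^T x + 0.5 ||x||^2, so grad f(x) = w + x
w = np.array([0.3, -0.7, 0.5])
grad_f = lambda x: w + x

def bpda_gradient(x):
    """BPDA: run the real transformation T in the forward pass, but
    treat it as the identity in the backward pass, i.e. collect
    grad f evaluated at T(x) as the gradient sample."""
    y = bitdepth_reduce(x)  # forward pass uses the true staircase T
    return grad_f(y)        # backward pass assumes dT/dx = I

x0 = np.array([0.2, 0.6, 0.9])
g = bpda_gradient(x0)  # a usable gradient sample for Lipschitz estimation
```

In a real pipeline, `grad_f` would be computed by automatic differentiation through the network alone, with T applied only in the forward pass.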
CLEVER is intended to be a tool for network designers to evaluate network robustness in the “white-box” setting, in which we know how a (defended) neural network processes the input. In this case, we can deal with the non-differentiable transformation using BPDA and evaluate the intrinsic robustness of the model, without the “False Sense of Security” provided by gradient masking.
In the black-box attack setting, the gradient of f(T(x)) must be estimated via finite differences, so a non-differentiable T prevents gradient-based attacks: the estimated gradient becomes uninformative, since the value of f(T(x)) is unlikely to change when x is changed by a small amount. Goodfellow raises concerns about the effectiveness of CLEVER in this setting, but this setting is different from our intended usage of CLEVER. Most importantly, CLEVER computes gradients using backpropagation via automatic differentiation in the white-box setting, rather than using finite differences. Despite the limited numerical precision of digital computers, CLEVER is not subject to the same numerical issues as the black-box attack setting. Unless backpropagation fails, CLEVER can estimate a reasonable robustness score reflecting the intrinsic model robustness.
We compute the targeted robustness bounds for a 7-layer CNN model with tanh activations (which are twice differentiable) on the CIFAR dataset, with a validation accuracy of 72.6%. We calculate both Eq. (2) and Eq. (5) via sampling with extreme value theory, and we denote the estimated scores as the “1st order” and “2nd order” CLEVER scores, respectively, in the tables. In particular, we follow the sampling procedure proposed in the CLEVER paper to estimate the Lipschitz constant, fitting the samples by maximum likelihood estimation on a reverse Weibull distribution and calculating the estimated robustness scores of (2). For the “2nd order” bound (5), we also use sampling and extreme value theory to calculate the estimated bounds, as described in Section 3.4. For a fair comparison, we use the same number of samples for both estimated bounds, and we compare their averages as well as the percentage of image examples on which one score is larger than the other. For each image, we select three attack target classes: least likely, random, and runner-up. The results are summarized in Tables 1, 2, and 3. We observe that the 1st-order and 2nd-order average CLEVER scores usually stay close, indicating that the two scores agree with each other.
Since CLEVER is an estimated lower bound, we desire a score that is not trivially small, yet still smaller than the upper bound found by adversarial attacks (in our case, the CW attack). As shown in Tables 1, 2, and 3, all CLEVER scores are less than the CW distortion. Second-order CLEVER can sometimes give a better (larger) result than its first-order counterpart, indicating that the second-order approximation is probably more accurate on those examples. The “avg. % of increase on the score” rows in the tables report the improvement of the score when one method is better than the other; for example, for the runner-up target, second-order CLEVER increases the score for 82% of the examples, and the average improvement over first-order CLEVER is 58%.
Table 1: Least-likely target
                                    1st order    2nd order
  % of images with larger score        54           46
  avg. % of increase on the score      47%          44%

Table 2: Runner-up target
                                    1st order    2nd order
  % of images with larger score        18           82
  avg. % of increase on the score      77%          58%

Table 3: Random target
                                    1st order    2nd order
  % of images with larger score        76           24
  avg. % of increase on the score      55%          68%
We conduct experiments on a 121-layer DenseNet network pretrained on the ImageNet dataset (model available at https://github.com/pudae/tensorflow-densenet). We employ two non-differentiable input transformations that mask gradients: bit-depth reduction (reducing each color channel from 8 bits to 3 bits, setting all lower bits to 0) and JPEG compression (quality set to 75%). We compute (first-order) CLEVER scores for the network with and without input transformations, using the same CLEVER sampling parameters in both settings. We randomly choose 100 images from the ImageNet validation set and select three attack target classes for each image (least likely, random, and runner-up). Misclassified images are skipped.
Table 4 compares the CLEVER scores for the three target classes, for the original model and for models with bit-depth reduction or JPEG compression as input transformations. BPDA is used to compute CLEVER when an input transformation is applied. Not surprisingly, the CLEVER scores for networks with an input transformation as a gradient masking method do not noticeably increase, indicating that these transformations do not increase the model's intrinsic robustness; in other words, with BPDA applied, we can still obtain gradients similar to those of the original model, so it is expected that the CLEVER scores do not change much in this situation.
CLEVER is a robustness score based on a first-order approximation. We move one step further and give a second-order formal guarantee for DNN robustness. We show that it improves the estimated robustness lower bound on some examples, and that in many cases the first- and second-order CLEVER scores are coherent. Additionally, we successfully apply the Backward Pass Differentiable Approximation (BPDA) to compute CLEVER scores for networks with non-differentiable input transformations, including staircase functions. Our discussion and results remedy the concerns raised by Goodfellow.
Tsui-Wei Weng and Luca Daniel acknowledge partial support of MIT IBM Watson AI Lab.