1 Introduction
A key problem in research on adversarial examples is that vulnerability to adversarial examples is usually measured by running attack algorithms (Szegedy et al., 2014). Because the attack algorithms are not optimal, they are prone to overestimating the size of the perturbation needed to fool the target model. In other words, the attack-based methodology provides an upper bound on the size of a perturbation that will fool the model, but security guarantees require a lower bound. CLEVER (Weng et al., 2018) is a proposed scoring method to estimate a lower bound. In this report, we show that gradient masking, a common problem that causes attack methodologies to provide only a very loose upper bound, also causes CLEVER to overestimate the size of the perturbation needed to fool the model. In other words, CLEVER does not resolve the key problem with the attack-based methodology, because it fails to provide a lower bound.
2 CLEVER
CLEVER is based on estimating the local Lipschitz constant of a model represented by a function $f$ applied to an input $x_0$. The estimate is formed by observing $\ell_q$-norms of the gradient, $\|\nabla f(x)\|_q$, at randomly sampled points $x$. These observations are then used to form a statistical estimate of the local Lipschitz constant.
CLEVER is only intended to be applied to Lipschitz-continuous functions.
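The sampling step described above can be sketched in a few lines. This is a simplified illustration, not the CLEVER implementation: it replaces the statistical estimate with a plain maximum over the observed gradient norms, it approximates gradients by finite differences, and it assumes points are drawn uniformly from an L2 ball around the input. The names `sample_in_ball`, `numerical_gradient`, and `max_gradient_norm` are hypothetical.

```python
import math
import random

def sample_in_ball(x0, radius, rng):
    """Draw a point uniformly from the L2 ball of the given radius around x0."""
    direction = [rng.gauss(0.0, 1.0) for _ in x0]
    norm = math.sqrt(sum(d * d for d in direction))
    # Scaling by u**(1/d) makes the draw uniform in volume, not just in radius.
    scale = radius * rng.random() ** (1.0 / len(x0)) / norm
    return [xi + scale * di for xi, di in zip(x0, direction)]

def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

def max_gradient_norm(f, x0, radius, num_samples, q=2.0, seed=0):
    """Largest observed q-norm of the gradient over random points near x0.

    Assumption: a plain maximum stands in for the statistical estimate.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(num_samples):
        x = sample_in_ball(x0, radius, rng)
        g = numerical_gradient(f, x)
        best = max(best, sum(abs(gi) ** q for gi in g) ** (1.0 / q))
    return best
```

For a linear function such as `lambda x: 3.0 * x[0] + 4.0 * x[1]`, every sample observes the same gradient, so the estimate matches the true Lipschitz constant. The sections that follow concern functions for which the sampled gradients say nothing about the true sensitivity.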
3 Counterexample allowing for infinite precision
First, we show that CLEVER can underestimate the local Lipschitz constant, even in a theoretical setting, where we assume CLEVER is able to represent real numbers with unlimited precision.
Suppose that we have some function $f$ which has a large local Lipschitz constant at some input $x_0$. For example, $f$ could be a typical neural network with no defenses against adversarial examples.
Define $g(x) = f(s(x))$, where $s$ is a staircase function, rounding the input to a set of quantized values. One example of such a staircase function is $s(x) = \mathrm{round}(n x) / n$, where $n$ is a hyperparameter determining how many levels to quantize each unit interval into.
Because the composed function $g$ has zero gradient “almost everywhere”, our samples of $\nabla g$ will all observe zero gradient of $g$ “almost surely”. Here “almost everywhere” and “almost surely” are used in the measure-theoretic sense, meaning that they hold except for a set of measure zero.
Having observed no gradient of $g$, the CLEVER score will regard $g$ as constant, even though by construction (we assumed $f$ is highly sensitive to its input) it is not. This shows that CLEVER will conclude the model is highly robust, even though it is not.
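A small one-dimensional sketch makes the effect concrete. Everything here is an illustrative assumption rather than part of the original construction: the sensitive model is stood in for by `sin(50 x)`, the staircase rounds to the nearest multiple of `1/n` with `n = 10`, and gradients are approximated by central finite differences at uniformly random points.

```python
import math
import random

def f(x):
    """Hypothetical stand-in for a sensitive model: its slope reaches 50."""
    return math.sin(50.0 * x)

def staircase(x, n=10):
    """Quantize x to the nearest multiple of 1/n (n levels per unit interval)."""
    return math.floor(n * x + 0.5) / n

def g(x):
    """The masked model: f applied to the quantized input."""
    return f(staircase(x))

def finite_difference(fn, x, h=1e-9):
    """Central finite-difference estimate of the derivative of fn at x."""
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

def max_sampled_slope(fn, num_samples=200, seed=0):
    """Largest |derivative| observed at uniformly random points in [0, 1]."""
    rng = random.Random(seed)
    return max(abs(finite_difference(fn, rng.random()))
               for _ in range(num_samples))

if __name__ == "__main__":
    print("max sampled slope of f:", max_sampled_slope(f))
    print("max sampled slope of g:", max_sampled_slope(g))
    # The output of g still changes a great deal between nearby inputs.
    print("g(0.0) =", g(0.0), " g(0.1) =", g(0.1))
```

A sample reports a non-zero slope for `g` only if its finite-difference window happens to straddle one of the ten jumps, which is very unlikely at this step size, so the sampled slopes of `g` are expected to be zero even though `g(0.0)` and `g(0.1)` differ by almost the full range of `f`.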
There is one problem with this example though: the function $g$ is not Lipschitz continuous. CLEVER is intended only for Lipschitz continuous functions. This difficulty can be resolved by approximating the staircase function $s$ with a Lipschitz continuous function $\tilde{s}$. For example, instead of a staircase function that increases with discontinuous jumps, use a staircase-like function that increases with linear ramps of width $\epsilon$, as illustrated in Figure 1. As $\epsilon \to 0$, the probability of CLEVER observing a non-zero gradient with a finite number of samples becomes arbitrarily small.
4 Implementation on Digital Computers
The problems in the previous section are applicable even if CLEVER is able to represent real numbers with unlimited precision.
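The ramp argument from the previous section can be quantified with a short sketch. It assumes one particular ramp layout (each cell of width `1/n` is flat except for a linear ramp over its last `eps`) and assumes sample points are independent and uniform over the input range; the function names are hypothetical. Under those assumptions a fraction `n * eps` of the domain lies on a ramp, so the chance that `N` samples all miss the ramps is `(1 - n * eps) ** N`.

```python
import math

def ramp_staircase(x, n=10, eps=1e-3):
    """Flat steps of height 1/n joined by linear ramps of width eps (eps < 1/n)."""
    cell = math.floor(n * x)
    offset = x - cell / n                 # position inside the current cell
    flat_width = 1.0 / n - eps            # the ramp occupies the last eps of a cell
    rise = max(0.0, offset - flat_width) / eps   # 0 on the flat part, 0..1 on the ramp
    return (cell + rise) / n

def ramp_slope(n, eps):
    """Slope on each ramp, which is also the Lipschitz constant of the staircase."""
    return 1.0 / (n * eps)

def miss_probability(n, eps, num_samples):
    """Chance that num_samples uniform draws all land on flat segments."""
    return (1.0 - n * eps) ** num_samples

if __name__ == "__main__":
    for eps in (1e-3, 1e-6, 1e-9):
        print(eps, ramp_slope(10, eps), miss_probability(10, eps, 1000))
```

Shrinking `eps` raises the true Lipschitz constant `1 / (n * eps)` and at the same time pushes the miss probability toward one, so the function becomes steeper exactly as sampling becomes less likely to notice.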
In practice, because CLEVER is a metric to be used in experiments, CLEVER must be regarded as a form of software that executes on a digital computer to evaluate finite-precision machine learning models, rather than as an abstract function applied to abstract real-valued variables.
In a digital computer, every function is either constant or not Lipschitz continuous, because rounding to a finite number of bits means that any function that increases or decreases does so in discrete jumps. Because of this, it is not possible to satisfy the conditions of the theory for CLEVER in actual usage.
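The discreteness is easy to observe directly. The following sketch uses Python's double-precision floats and `math.nextafter` (available from Python 3.9); the helper name `smallest_step` is hypothetical. It shows that inputs move in steps of fixed size and that a perturbation smaller than half a step is rounded away entirely.

```python
import math

def smallest_step(x):
    """Gap between x and the next representable float above it."""
    return math.nextafter(x, math.inf) - x

x = 1.0
step = smallest_step(x)

# A perturbation well below half the gap leaves the stored value unchanged,
# so no function evaluated on it can respond to that perturbation.
unchanged = (x + step / 4.0) == x

if __name__ == "__main__":
    print("gap above 1.0:", step)
    print("x + step/4 == x:", unchanged)
```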
When CLEVER is used on a digital computer, it can be prone to various difficult-to-predict failures. For example, sigmoid units in the network may saturate to the point that the gradient through them is numerically rounded to machine zero. This would cause an effect similar to the staircase example above, with the sharp sigmoid function appearing not to be Lipschitz continuous due to numerical error, and the function appearing to CLEVER to be extremely robust due to the lack of observed gradient. Numerical saturation of sigmoid units has already been observed to interfere with attack-based benchmarking of defenses against adversarial examples (Brendel & Bethge, 2017), and CLEVER does not resolve this problem.
5 Silent failures
CLEVER is not intended for use on functions that are not Lipschitz continuous, but numerical error in digital computers can cause functions that are Lipschitz continuous in theory to be far from Lipschitz continuous in practice. CLEVER does not offer a mechanism to detect when this happens. It may be possible to get reasonable estimates from CLEVER if the user of CLEVER is aware of the problems causing loss of gradient and can mitigate them (e.g., by removing the staircase function proposed in Section 3). Unfortunately, inaccuracies resulting from numerical error are difficult to characterize, anticipate, or detect, so a user of CLEVER who obtains a good score will not know whether the model is robust or whether CLEVER has returned an inaccurate estimate without raising a warning that the method is not applicable to the current model.
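The sigmoid saturation mentioned in Section 4 is one concrete instance of such a silent failure. The sketch below, in double precision with hypothetical function names, computes the sigmoid derivative the usual way from the forward output and compares it with an algebraically identical form that avoids the cancellation. The usual form returns exactly zero for a large input and raises no warning.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad_naive(x):
    """Derivative computed from the forward output: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def sigmoid_grad_stable(x):
    """Same derivative written as e / (1 + e)^2 with e = exp(-|x|).

    This form never subtracts two nearly equal numbers, so it keeps the
    tiny but non-zero value that the naive form rounds away.
    """
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2

if __name__ == "__main__":
    for x in (0.0, 10.0, 40.0):
        print(x, sigmoid_grad_naive(x), sigmoid_grad_stable(x))
```

At `x = 40.0` the forward output rounds to exactly `1.0`, so the naive derivative is exactly `0.0` while the true derivative is small but positive. A gradient-sampling procedure that receives the naive value has no way to tell this apart from a genuinely flat function.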
6 Conclusion
The use of the staircase function described in Section 3 and the various numerical difficulties described in Section 4 are all examples of gradient masking (Papernot et al., 2017). Gradient masking is any defense against adversarial examples that works by breaking attack algorithms by making the gradient useless (small or pointed in the wrong direction, too noisy, combined with a poorly conditioned Hessian, etc.). A key flaw in the methodology of evaluating defenses against adversarial examples by testing them against attacks is that the attacks can fail when the defender (intentionally or unintentionally, knowingly or unknowingly) uses gradient masking. This report shows that CLEVER suffers from the same flaw as the attack-based methodology, and does not actually offer a lower bound on the size of perturbation required to fool the model.
References
- Brendel & Bethge (2017) Wieland Brendel and Matthias Bethge. Comment on “biologically inspired protection of deep networks from adversarial attacks”. arXiv preprint arXiv:1704.01547, 2017.
- Papernot et al. (2017) Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519. ACM, 2017.
- Szegedy et al. (2014) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014. URL http://arxiv.org/abs/1312.6199.
- Weng et al. (2018) Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=BkUHlMZ0b.