Gradient Masking Causes CLEVER to Overestimate Adversarial Perturbation Size

04/21/2018 ∙ by Ian Goodfellow, et al. ∙ Google

A key problem in research on adversarial examples is that vulnerability to adversarial examples is usually measured by running attack algorithms. Because the attack algorithms are not optimal, the attack algorithms are prone to overestimating the size of perturbation needed to fool the target model. In other words, the attack-based methodology provides an upper bound on the size of a perturbation that will fool the model, but security guarantees require a lower bound. CLEVER is a proposed scoring method to estimate a lower bound. Unfortunately, an estimate of a bound is not a bound. In this report, we show that gradient masking, a common problem that causes attack methodologies to provide only a very loose upper bound, causes CLEVER to overestimate the size of perturbation needed to fool the model. In other words, CLEVER does not resolve the key problem with the attack-based methodology, because it fails to provide a lower bound.


1 Introduction

A key problem in research on adversarial examples is that vulnerability to adversarial examples is usually measured by running attack algorithms (Szegedy et al., 2014). Because the attack algorithms are not optimal, the attack algorithms are prone to overestimating the size of perturbation needed to fool the target model. In other words, the attack-based methodology provides an upper bound on the size of a perturbation that will fool the model, but security guarantees require a lower bound. CLEVER (Weng et al., 2018) is a proposed scoring method to estimate a lower bound. In this report, we show that gradient masking, a common problem that causes attack methodologies to provide only a very loose upper bound, causes CLEVER to overestimate the size of perturbation needed to fool the model. In other words, CLEVER does not resolve the key problem with the attack-based methodology, because it fails to provide a lower bound.

2 CLEVER

CLEVER is based on estimating the local Lipschitz constant of a model represented by a function $f$ applied to an input $x$. The estimate is formed by observing $\ell_p$-norms of the gradient, $\|\nabla_x f(x)\|_p$, at randomly sampled points near $x$. These observations are then used to form a statistical estimate of the local Lipschitz constant.

CLEVER is only intended to be applied to Lipschitz continuous functions.
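To make the sampling procedure concrete, the following is a minimal numpy sketch of the gradient-sampling step. The function name, ball radius, and sample count below are illustrative rather than taken from the CLEVER paper, and the published method additionally fits a reverse Weibull distribution to batch maxima of the sampled norms via extreme value theory rather than taking a plain maximum; the argument in the following sections does not depend on that difference.

    # Minimal sketch of the gradient-sampling step (illustrative names and
    # defaults; CLEVER itself fits an extreme value distribution to batch
    # maxima of the sampled norms rather than taking a plain maximum).
    import numpy as np

    def sampled_lipschitz_estimate(grad_f, x, radius=0.1, n_samples=500, p=2, seed=0):
        """Estimate the local Lipschitz constant of a scalar-valued model near x
        by sampling p-norms of its (backpropagated) gradient at random points
        in a ball of the given radius around x."""
        rng = np.random.default_rng(seed)
        norms = []
        for _ in range(n_samples):
            x_sample = x + rng.uniform(-radius, radius, size=x.shape)
            norms.append(np.linalg.norm(grad_f(x_sample), ord=p))
        return max(norms)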

3 Counterexample allowing for infinite precision

First, we show that CLEVER can underestimate the local Lipschitz constant, even in a theoretical setting, where we assume CLEVER is able to represent real numbers with unlimited precision.

Suppose that we have some function $f$ which has a large local Lipschitz constant at some input $x$. For example, $f$ could be a typical neural network with no defenses against adversarial examples.

Define $g(x) = f(s(x))$, where $s$ is a staircase function, rounding the input to a set of quantized values. One example of such a staircase function is $s(x) = \lfloor kx \rfloor / k$, where $k$ is a hyperparameter determining how many levels to quantize each unit interval into.
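As an entirely hypothetical toy instance of this construction, the staircase can be wrapped around a stand-in sensitive model as follows; the model f, its weight vector W, and the helper names are invented for illustration and are not part of the original construction beyond the staircase itself.

    # Hypothetical toy instance: f is a stand-in "sensitive" model (a steep
    # linear score), s(x) = floor(k*x)/k is the staircase, g = f(s(x)).
    import numpy as np

    K = 256                              # quantization levels per unit interval
    W = np.linspace(1.0, 100.0, 10)      # weights of the stand-in model

    def staircase(x, k=K):
        """Round each coordinate down to k quantized levels per unit interval."""
        return np.floor(k * x) / k

    def f(x):
        """Undefended stand-in model with a large local Lipschitz constant."""
        return float(W @ x)

    def g(x):
        """Wrapped model f(s(x)): numerically close to f, but the staircase
        has zero derivative almost everywhere."""
        return f(staircase(x))

    def grad_f(x):
        """Exact gradient of f (constant, equal to the weight vector)."""
        return W

    def grad_g(x):
        """What backpropagation reports for g: the chain rule multiplies
        grad_f by the staircase's derivative, which is zero almost everywhere."""
        return np.zeros_like(x)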

Because the function $s$ has zero gradient “almost everywhere,” our samples of $\nabla_x g(x)$ will all observe zero gradient of $g$ “almost surely”. Here “almost everywhere” and “almost surely” are used in the measure theoretic sense, meaning that they hold except for a set of measure zero.

Having observed no gradient of $g$, the CLEVER score will regard $g$ as constant, even though by construction (we assumed $f$ is highly sensitive to its input) it is not. This shows that CLEVER will conclude the model is highly robust, even though it is not.
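Continuing the hypothetical sketch from above, feeding the backpropagated gradients of $g$ into the sampling estimator from Section 2 reports a local Lipschitz constant of zero, while the unwrapped $f$ is reported as highly sensitive:

    # Hypothetical demo combining the two sketches above.
    x0 = np.full(10, 0.5)
    print(sampled_lipschitz_estimate(grad_f, x0))  # the norm of W, a large value
    print(sampled_lipschitz_estimate(grad_g, x0))  # 0.0, so g appears perfectly robust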

There is one problem with this example though: the function $g$ is not Lipschitz continuous. CLEVER is intended only for Lipschitz continuous functions. This difficulty can be resolved by approximating the staircase function $s$ with a Lipschitz continuous function $\tilde{s}$. For example, instead of a staircase function that increases with discrete jumps, use a staircase-like function that increases with linear ramps of width $\delta$, as illustrated in Figure 1. As $\delta \rightarrow 0$, the probability of CLEVER observing a non-zero gradient with a finite number of samples becomes arbitrarily small.

Figure 1: The staircase function $s$ with $k$ steps per unit interval, and the Lipschitz continuous approximation $\tilde{s}$ using linear ramps of width $\delta$ set to smaller than the step width. In the upper, zoomed out view, both $s$ and $\tilde{s}$ resemble the identity function. In the lower, zoomed in view, we see that $s$ in fact has a derivative of zero almost everywhere. The Lipschitz continuous approximation $\tilde{s}$ of $s$ has a derivative of zero with probability $1 - k\delta$ at points sampled uniformly at random. By shrinking $\delta$ further we can make the probability of observing nonzero derivatives arbitrarily small. Inserting this function anywhere in a model will not interfere with the operation of the model (beyond reducing its precision to $b$ bits for $k = 2^b$) but will prevent CLEVER from observing nonzero gradient with arbitrarily high probability.
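One possible way to implement such a ramp-based approximation is sketched below; the parameter names k and delta are illustrative, matching the construction above.

    # Lipschitz continuous approximation: replace each jump of the staircase
    # with a linear ramp of width delta, where delta << 1/k (the step width).
    import numpy as np

    def ramp_staircase(x, k=256, delta=1e-6):
        """Staircase-like function that rises via a linear ramp of width delta
        at the end of each step. It is Lipschitz continuous with constant
        1/(k*delta), yet its derivative is zero except on a set of measure
        k*delta per unit interval."""
        base = np.floor(k * x) / k            # bottom of the current step
        frac = k * x - np.floor(k * x)        # position within the step, in [0, 1)
        # The ramp occupies the last fraction k*delta of each step; outside it
        # the output is flat, inside it rises linearly to the next step.
        ramp = np.clip((frac - (1.0 - k * delta)) / (k * delta), 0.0, 1.0)
        return base + ramp / k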

4 Implementation on Digital Computers

The problems in the previous section are applicable even if CLEVER is able to represent real numbers with unlimited precision.

In practice, because CLEVER is a metric to be used in experiments, CLEVER must be regarded as a form of software that executes on a digital computer to evaluate finite-precision machine learning models, rather than as an abstract function applied to abstract real-valued variables.

In a digital computer, every function is either constant or it is not Lipschitz continuous, because the rounding to a finite number of bits means that any function that increases or decreases does so in discrete jumps. Because of this, it is not possible to satisfy the conditions of the theory for CLEVER in actual usage.
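As a small illustration of this point, assuming float32 arithmetic, the spacing between adjacent representable values bounds how finely any output can change:

    # Illustration: outputs near 1.0 can only change in multiples of the gap
    # to the next representable float32, so an increasing function must move
    # in discrete jumps.
    import numpy as np

    ulp = np.spacing(np.float32(1.0))
    print(ulp)   # about 1.19e-07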

When CLEVER is used on a digital computer, it can be prone to various difficult-to-predict failures. For example, sigmoid units in the network may saturate to the point that the gradient through them is numerically rounded to machine zero. This would cause an effect similar to the staircase example above, with the sharp sigmoid function appearing not to be Lipschitz continuous due to numerical error, and the function appearing to CLEVER to be extremely robust due to the lack of observed gradient. Numerical saturation of sigmoid units has already been observed to interfere with attack-based benchmarking of defenses against adversarial examples (Brendel & Bethge, 2017), and CLEVER does not resolve this problem.

5 Silent failures

CLEVER is not intended for use on functions that are not Lipschitz continuous, but numerical error in digital computers can cause functions that are Lipschitz continuous in theory to be far from Lipschitz continuous in practice. CLEVER does not offer a mechanism to detect when this happens. It may be possible to get reasonable estimates from CLEVER if the user of CLEVER is aware of the problems causing loss of gradient and can mitigate them (e.g., by removing the staircase function proposed in Section 3). Unfortunately, inaccuracies resulting from numerical error are difficult to characterize, anticipate, or detect, so a user of CLEVER who obtains a good score will not know whether the model is robust or whether CLEVER has returned an inaccurate estimate without raising a warning that the method is not applicable to the current model.

6 Conclusion

The use of the staircase function described in Section 3 and the various numerical difficulties described in Section 4 are all examples of gradient masking (Papernot et al., 2017). Gradient masking is any defense against adversarial examples that works by breaking attack algorithms by making the gradient useless (small or pointed in the wrong direction, too noisy, combined with a poorly conditioned Hessian, etc.). A key flaw in the methodology of evaluating defenses against adversarial examples by testing them against attacks is that the attacks can fail when the defender (intentionally or unintentionally, knowingly or unknowingly) uses gradient masking. This report shows that CLEVER suffers from the same flaw as the attack-based methodology, and does not actually offer a lower bound on the size of perturbation required to fool the model.

References
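Brendel, W. and Bethge, M. Comment on “Biologically inspired protection of deep networks from adversarial attacks”. arXiv preprint arXiv:1704.01547, 2017.

Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security, 2017.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.

Weng, T.-W., Zhang, H., Chen, P.-Y., Yi, J., Su, D., Gao, Y., Hsieh, C.-J., and Daniel, L. Evaluating the robustness of neural networks: An extreme value theory approach. In International Conference on Learning Representations, 2018.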