Network Moments: Extensions and Sparse-Smooth Attacks

06/21/2020 ∙ by Modar Alfadly, et al. ∙ King Abdullah University of Science and Technology ∙ Université de Montréal

The impressive performance of deep neural networks (DNNs) has immensely strengthened the line of research that aims at theoretically analyzing their effectiveness. This has incited research on the reaction of DNNs to noisy input, namely developing adversarial input attacks and strategies that make DNNs robust to such attacks. To that end, in this paper, we derive exact analytic expressions for the first and second moments (mean and variance) of a small piecewise linear (PL) network (Affine, ReLU, Affine) subject to Gaussian input. In particular, we generalize the second-moment expression of Bibi et al. to arbitrary input Gaussian distributions, dropping the zero-mean assumption. We show that the new variance expression can be efficiently approximated, leading to much tighter variance estimates than the preliminary results of Bibi et al. Moreover, we experimentally show that these expressions are tight under simple linearizations of deeper PL-DNNs, where we investigate the effect of the linearization sensitivity on the accuracy of the moment estimates. Lastly, we show that the derived expressions can be used to construct sparse and smooth Gaussian adversarial attacks (targeted and non-targeted) that tend to lead to perceptually feasible input attacks.


1 Introduction

Deep neural networks (DNNs) have revolutionized not only the computer vision and machine learning communities but also several other fields throughout science and engineering, such as natural language processing, bioinformatics and medicine [lecun2015yoshua]. While major advances in areas such as object classification [krizhevsky2012imagenet] and speech recognition [hinton2012deep] have been attributed to DNNs, a rigorous theoretical understanding of their effectiveness remains elusive. For instance, while DNNs have shown impressive performance on visual recognition tasks, they still exhibit erratic behaviour when subjected to carefully tailored inputs [szegedy2013intriguing]. Many prior works show that it is rather easy, through simple routines, to craft imperceptible input perturbations, referred to as adversarial attacks. Such attacks can have a drastic negative effect on the classification performance of many popular deep models [goodfellow2014explaining, moosavi2016deepfool, szegedy2013intriguing]. Even more surprisingly, one can design such adversarial perturbations to be agnostic to both the input image and the network architecture [moosavi2016universal]; these are referred to as universal perturbations. Unfortunately, less progress has been made towards systematically addressing and understanding this challenge. One of the early and naive approaches towards addressing this nuisance is simply to augment the training dataset with data corrupted by adversaries. While this has been shown to improve network robustness against such adversaries [goodfellow2014explaining, moosavi2016deepfool], it is a brute-force approach that does not provide insights into the reasons behind such behaviour. Moreover, it does not scale to high-dimensional inputs, as the amount of corresponding augmentation has to be prohibitively large to capture the variation in input space. This effectively renders the augmentation approach infeasible in high dimensions.

Fig. 1: Two-stage linearization of an arbitrarily deep network. Any PL-DNN can be linearized before and after a given ReLU through a two-stage linearization, casting it into an (Affine, ReLU, Affine) network whose first and second moments can be derived analytically when it is exposed to Gaussian input noise. We show that these moments are helpful in predicting how PL-DNNs react to noise and in constructing adversarial Gaussian input attacks.

In this paper, we derive expressions for the first and second moments (the mean and consequently the variance), referred to as Network Moments, of a small piecewise linear (PL) network in the form (Affine, ReLU, Affine) subject to a general Gaussian input. The preliminary version of these Network Moments was derived and analyzed in [bibi2018analytic]. Beyond these preliminary results, we derive in this paper a new variance expression that makes no assumptions on the mean or the covariance of the input Gaussian. This generalizes the previous result in [bibi2018analytic], which only holds under a zero-mean input assumption. These expressions provide a powerful tool for analyzing deeper PL-DNNs by means of two-stage linearization (as shown in Figure 1) with a plethora of applications. For instance, it has been shown that such expressions can be quite useful in training robust networks very efficiently [alfadly2019train], avoiding any need for noisy data augmentation. In particular, empirical evidence in [alfadly2019train] indicates that simple regularizers based on the mean and variance expressions can boost network robustness by two orders of magnitude, not only against Gaussian attacks but also against other popular adversarial attacks (e.g. PGD, LBFGS [szegedy2013intriguing], FGSM [goodfellow2014explaining] and DF2 [moosavi2016deepfool]). In this paper, we show that network moments can be used to systematically design Gaussian distributions that serve as input adversaries. In particular, we conduct several experiments on the MNIST [lecun1998mnist] and Facial Emotion Recognition [goodfellow2015] datasets to demonstrate that these expressions can be used to craft sparse and smooth Gaussian attacks that are structured and perceptually feasible, i.e. they exhibit interesting semantic information aligned with human perception.

Contributions. (i) We provide a fresh perspective on analyzing PL-DNNs by deriving closed-form expressions for the output mean and variance of a network in the form (Affine, ReLU, Affine) in the presence of general Gaussian input noise. In particular, we generalize the results of [bibi2018analytic] and derive a closed-form expression for the second moment under no assumptions on either the mean or the covariance of the input Gaussian. Through network linearization, extensive experiments show that the new expression for the output variance can be efficiently approximated, leading to much tighter second-moment estimates than those of [bibi2018analytic]. (ii) We formalize a new objective as a function of the derived output mean and variance to construct sparse and smooth Gaussian adversarial attacks. We conduct extensive experiments on both the MNIST and Facial Emotion datasets, demonstrating that the constructed adversaries are perceptually feasible.

2 Related Work

Despite the impressive performance of deep neural networks on visual recognition tasks, their performance can still be drastically degraded in the presence of small imperceptible adversarial noise [goodfellow2014explaining, moosavi2016deepfool, szegedy2013intriguing]. Alarmingly, such adversaries are abundant and easy to construct; in some scenarios, constructing an adversary is as simple as performing a single gradient ascent step of some loss function with respect to the input [szegedy2013intriguing]. More surprisingly, there exist deterministic perturbations, agnostic to both the input and the network architecture, that can cause a severe reduction in network performance [moosavi2016universal]. Moreover, in some extreme cases, perturbing a single input pixel can be sufficient to cause high misclassification rates on popular benchmarks [su2017one].

This nuisance is serious and has to be addressed, particularly since DNNs are now deployed in sensitive real-world applications (e.g. self-driving cars). Consequently, there have been several research directions towards understanding and circumventing it. Early works aimed at analyzing the behaviour of DNNs in the general presence of input noise. For instance, Fawzi et al. [fawzi2016measuring] proposed a generic probabilistic framework for analyzing the robustness of a classifier under different nuisance factors. Another seminal work particularly assessed the robustness of a classifier undergoing geometric transformations [fawzi2015manitest]. On the other hand, there have been several other works on the design and training of networks that are robust against adversarial attacks. One of the earliest approaches was the direct augmentation of the training data with adversarial samples, which has been shown to indeed lead to more robust networks [goodfellow2014explaining, moosavi2016deepfool]. Later, the work of [madry2017towards] adopted a similar strategy but incorporated the adversarial augmentation into the iterative training process. In particular, it was shown that one can achieve significant boosts in network robustness against first-order adversarial attacks, i.e. attacks that depend only on gradient information, by minimizing the worst adversarial loss over all energy-bounded perturbations (often measured in an $\ell_p$ norm) around a given input.

Since then, there has been a surge in the literature studying verification approaches for DNNs. In this line of work, the aim is to design networks that are accurate and provably robust against all bounded input attacks. In general, verification approaches can be coarsely categorized as exact or relaxed verifiers. The former try to find the exact largest adversarial loss over all possible bounded inputs. Such verifiers often require piecewise linear networks and rely on either Mixed Integer Solvers (MIS) [cheng2017maximum, lomuscio2017approach] or on Satisfiability Modulo Theories (SMT) solvers [scheibler2015towards, katz2017reluplex]. These verifiers are too expensive for DNNs due to their NP-complete nature. Relaxed verifiers, on the other hand, scale better, since they only find an upper bound on the worst adversarial loss [zhang2018efficient, wong2017provable]. There have also been several new directions that aim at addressing the verification problem by constructing networks with smoothed decision boundaries [lecuyer2019certified, cohen_randomized_1].

In this paper, we are not concerned with such techniques; we only focus on analyzing the behaviour of networks in the presence of input noise. We focus our analysis on PL-DNNs with ReLU activations. Unlike previous work, we study how the probabilistic moments of the output of a PL-DNN with a Gaussian input can be computed analytically. A similar work to ours is [gast2018lightweight], where the probabilistic output mean and variance of a deep network are estimated by propagating moment estimates through the layers, under the assumption that the joint distribution after each affine layer is still Gaussian (through the central limit theorem). In contrast, we derive the exact first and second moments of a simple two-layer (Affine, ReLU, Affine) network. We extrapolate these expressions to deeper PL-DNNs by employing a simple two-stage linearization step that locally approximates them with an (Affine, ReLU, Affine) network. Since these expressions are a function of the noise parameters, they are particularly useful in analyzing and inferring the behaviour of the original PL-DNN without having to probe the network with inputs sampled from the noise distribution, as is regularly done in previous work [goodfellow2014explaining, moosavi2016deepfool].

3 Network Moments (all proofs are deferred to the Appendix)

We start by analyzing a particularly shaped network in the form of (Affine, ReLU, Affine) in the presence of Gaussian input noise. The functional form of the network of interest is $g(\mathbf{x}) = \mathbf{B}\max(\mathbf{A}\mathbf{x} + \mathbf{c}_1, \mathbf{0}) + \mathbf{c}_2$, where $\max(\cdot, \mathbf{0})$ is an element-wise operator. The affine mappings can be of any size, and we assume throughout the paper that $\mathbf{A} \in \mathbb{R}^{p \times n}$ and $\mathbf{B} \in \mathbb{R}^{d \times p}$, where $d$ is the number of output logits. Note that $\mathbf{A}$ and $\mathbf{B}$ can be of any structure (circulant or Toeplitz), generalizing both fully connected and convolutional layers.

In this section, we analyze $g(\mathbf{x})$ when $\mathbf{x}$ is a Gaussian random vector, i.e. $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}_x, \boldsymbol{\Sigma}_x)$. Seeking the probability density function (PDF) of $g(\mathbf{x})$ through this nonlinear random variable mapping is possible in special cases but much more difficult in general. Thus, we instead focus on deriving the probabilistic moments of the unknown distribution of $g(\mathbf{x})$. For ease of notation, we refer to this (Affine, ReLU, Affine) network simply as $g$ throughout. At first, and for completeness, we present the results of our preliminary work [bibi2018analytic], where the first moment (mean) expression is derived for a general Gaussian input distribution, while the second moment is derived under a zero input mean assumption, i.e. with $\boldsymbol{\mu}_x = \mathbf{0}$. We then derive and generalize the expression for the second moment of $g(\mathbf{x})$ for a generic Gaussian distribution under no assumptions in Lemma 4.
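To make the setup above concrete, the following minimal sketch (in NumPy; all shapes and values are illustrative assumptions, not taken from the paper) builds a toy (Affine, ReLU, Affine) network and estimates its output mean and variance by Monte Carlo sampling, which is the reference that the analytic expressions derived below are later compared against.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy (Affine, ReLU, Affine) network g(x) = B max(Ax + c1, 0) + c2.
# Shapes and values are illustrative; A and B could equally be structured
# (e.g. Toeplitz) maps coming from convolutional layers.
n, p, d = 8, 16, 10
A, c1 = rng.standard_normal((p, n)), rng.standard_normal(p)
B, c2 = rng.standard_normal((d, p)), rng.standard_normal(d)

def g(x):
    """x has shape (..., n); returns logits of shape (..., d)."""
    return np.maximum(x @ A.T + c1, 0.0) @ B.T + c2

# Gaussian input x ~ N(mu_x, Sigma_x) with an arbitrary (non-zero) mean.
mu_x = rng.standard_normal(n)
L = 0.1 * rng.standard_normal((n, n))
Sigma_x = L @ L.T + 0.05 * np.eye(n)

# Monte Carlo reference for the output mean and variance: the quantities the
# analytic expressions of this section are later validated against.
samples = rng.multivariate_normal(mu_x, Sigma_x, size=200_000)
out = g(samples)
mc_mean, mc_var = out.mean(axis=0), out.var(axis=0)
print(mc_mean.round(3), mc_var.round(3))
```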

3.1 Deriving the First Output Moment: $\mathbb{E}[g(\mathbf{x})]$

To derive the first moment of $g(\mathbf{x})$, we first consider the scalar function $\max(x, 0)$ acting on a single Gaussian random variable $x \sim \mathcal{N}(\mu_x, \sigma_x^2)$.

Remark 1.

The PDF of $y = \max(x, 0)$, where $x \sim \mathcal{N}(\mu_x, \sigma_x^2)$, is:

$p_y(y) = Q\!\left(\frac{\mu_x}{\sigma_x}\right)\delta(y) + p_x(y)\,u(y),$

where $Q(\cdot)$ is the Gaussian Q-function, $\delta(\cdot)$ is the Dirac delta function, $p_x(\cdot)$ is the Gaussian PDF of $x$, and $u(\cdot)$ is the unit step function. It follows directly that $\mathbb{E}[y] = \sigma_x/\sqrt{2\pi}$ when $\mu_x = 0$.

Now, we present the first moment of $g(\mathbf{x})$.

Theorem 1.

For any function in the form $g(\mathbf{x}) = \mathbf{B}\max(\mathbf{A}\mathbf{x} + \mathbf{c}_1, \mathbf{0}) + \mathbf{c}_2$, where $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}_x, \boldsymbol{\Sigma}_x)$, we have:

$\mathbb{E}[g(\mathbf{x})] = \mathbf{B}\left(\frac{\boldsymbol{\sigma}_v}{\sqrt{2\pi}} \odot e^{-\boldsymbol{\mu}_v^2/(2\boldsymbol{\sigma}_v^2)} + \frac{\boldsymbol{\mu}_v}{2} \odot \left(\mathbf{1} + \operatorname{erf}\!\left(\frac{\boldsymbol{\mu}_v}{\sqrt{2}\,\boldsymbol{\sigma}_v}\right)\right)\right) + \mathbf{c}_2,$

where $\boldsymbol{\mu}_v = \mathbf{A}\boldsymbol{\mu}_x + \mathbf{c}_1$, $\boldsymbol{\sigma}_v = \sqrt{\operatorname{diag}(\mathbf{A}\boldsymbol{\Sigma}_x\mathbf{A}^\top)}$, all vector operations are element-wise, and $\operatorname{erf}(\cdot)$ is the error function.
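As a sanity check of the first-moment expression, the sketch below implements the rectified-Gaussian mean reconstructed above (written with the error function) and can be compared against the Monte Carlo reference from the previous sketch; the function names are ours, and the exact notation of Theorem 1 in the paper may differ.

```python
import numpy as np
from scipy.special import erf

def relu_gaussian_mean(mu_v, sigma_v):
    """E[max(v, 0)] for v ~ N(mu_v, sigma_v^2), applied element-wise:
    sigma * phi(mu / sigma) + mu * Phi(mu / sigma), written with erf."""
    return (sigma_v / np.sqrt(2.0 * np.pi)) * np.exp(-mu_v**2 / (2.0 * sigma_v**2)) \
        + 0.5 * mu_v * (1.0 + erf(mu_v / (np.sqrt(2.0) * sigma_v)))

def analytic_mean(A, c1, B, c2, mu_x, Sigma_x):
    """Output mean of g(x) = B max(Ax + c1, 0) + c2 for x ~ N(mu_x, Sigma_x)."""
    mu_v = A @ mu_x + c1                              # mean of the pre-activation
    sigma_v = np.sqrt(np.diag(A @ Sigma_x @ A.T))     # its per-coordinate std
    return B @ relu_gaussian_mean(mu_v, sigma_v) + c2

# With the toy network and Gaussian of the previous sketch:
# print(np.abs(analytic_mean(A, c1, B, c2, mu_x, Sigma_x) - mc_mean).max())
```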

3.2 Deriving the Second Output Moment: $\mathbb{E}[g^2(\mathbf{x})]$

Here, we need three pre-requisite lemmas: one that characterizes the PDF of a squared ReLU (Lemma 1), another that extends Price’s Theorem [price1958useful] (Lemma 2), and one that derives the first moment of the product of two ReLU functions (Lemma 3).

Lemma 1.

The PDF of $y = \max(x, 0)^2$, where $x \sim \mathcal{N}(0, \sigma_x^2)$, is:

$p_y(y) = \frac{1}{2}\delta(y) + \frac{1}{2\sigma_x\sqrt{2\pi y}}\, e^{-y/(2\sigma_x^2)}\, u(y),$

and its first moment is $\mathbb{E}[y] = \sigma_x^2/2$.

Lemma 2.

Let $\mathbf{x} \in \mathbb{R}^p$ be a Gaussian random vector for any even $p$. Under mild assumptions on the nonlinear map $f: \mathbb{R}^p \to \mathbb{R}$, the derivatives of $\mathbb{E}[f(\mathbf{x})]$ with respect to the cross-covariances $\sigma_{i,i+1}$ of the input covariance equal the expectation of the corresponding mixed partial derivatives of $f$ with respect to the coordinates of $\mathbf{x}$.

Lemma 2 relates the mean of the gradients/subgradients of any nonlinear function to the gradients/subgradients of the mean of that function. This lemma has Price’s theorem [price1958useful] as a special case when the function has the separable structure $f(\mathbf{x}) = \prod_i f_i(x_i)$. It is worthwhile to note that there is an extension to Price’s theorem [mcmahon1964extension] where such assumptions are dropped; however, it only holds for the bivariate case, i.e. $p = 2$, and thus is also a special case of Lemma 2.

Lemma 3.

For any zero-mean bivariate Gaussian random variable $(x_1, x_2)^\top \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_x)$, the following holds:

$\mathbb{E}\left[\max(x_1, 0)\max(x_2, 0)\right] = \frac{\sigma_1\sigma_2}{2\pi}\left(\sqrt{1-\rho^2} + \rho\left(\frac{\pi}{2} + \arcsin\rho\right)\right),$

where $\sigma_1^2$ and $\sigma_2^2$ are the diagonal elements of $\boldsymbol{\Sigma}_x$ and $\rho$ is the correlation coefficient.
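The zero-mean bivariate cross-moment above can be checked numerically. The sketch below uses the standard closed form for $\mathbb{E}[\max(x_1,0)\max(x_2,0)]$ under a centered bivariate Gaussian (the quantity the reconstructed statement above expresses) and compares it against Monte Carlo; note that at $\rho = 1$ with $\sigma_1 = \sigma_2 = \sigma$ it recovers the $\sigma^2/2$ first moment of Lemma 1.

```python
import numpy as np

def relu_cross_moment_zero_mean(s1, s2, rho):
    """E[max(x1, 0) * max(x2, 0)] for a centered bivariate Gaussian with
    standard deviations s1, s2 and correlation rho (standard closed form)."""
    return (s1 * s2 / (2.0 * np.pi)) * (np.sqrt(1.0 - rho**2)
                                        + rho * (np.pi / 2.0 + np.arcsin(rho)))

# Monte Carlo check for one parameterization.
rng = np.random.default_rng(1)
s1, s2, rho = 1.5, 0.7, -0.4
cov = np.array([[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000)
mc = np.mean(np.maximum(x[:, 0], 0) * np.maximum(x[:, 1], 0))
print(mc, relu_cross_moment_zero_mean(s1, s2, rho))

# Consistency with Lemma 1: rho -> 1 and s1 = s2 = s gives s^2 / 2.
print(relu_cross_moment_zero_mean(2.0, 2.0, 1.0), 2.0**2 / 2)
```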

Theorem 2.

For any function in the form $g(\mathbf{x}) = \mathbf{B}\max(\mathbf{A}\mathbf{x}, \mathbf{0}) + \mathbf{c}_2$, where $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_x)$ and $\mathbf{c}_1 = \mathbf{0}$, the element-wise second moment $\mathbb{E}[g^2(\mathbf{x})]$ admits a closed form obtained by applying Lemma 3 to every pair of coordinates of $\max(\mathbf{A}\mathbf{x}, \mathbf{0})$ and propagating the result through $\mathbf{B}$ and $\mathbf{c}_2$.

Lastly, the variance of $g(\mathbf{x})$ follows directly as $\operatorname{var}[g(\mathbf{x})] = \mathbb{E}[g^2(\mathbf{x})] - \left(\mathbb{E}[g(\mathbf{x})]\right)^2$ (element-wise). While the previous expression assumes a zero-mean Gaussian input and a bias-free first affine layer, i.e. $\boldsymbol{\mu}_x = \mathbf{0}$ and $\mathbf{c}_1 = \mathbf{0}$, we next extend these results to arbitrary Gaussian distributions without assumptions on $\boldsymbol{\mu}_x$ or $\mathbf{c}_1$. The key element here is to extend the result of Lemma 3.
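For the zero-mean, bias-free case, the variance can be assembled exactly as described: apply Lemma 3 to every pair of pre-activation coordinates and propagate through the second affine map. The sketch below is our illustrative assembly of that computation; it is mathematically equivalent to, but not necessarily written in the same form as, Theorem 2.

```python
import numpy as np

def analytic_variance_zero_mean(A, B, Sigma_x):
    """Element-wise output variance of g(x) = B max(Ax, 0) + c2 for
    x ~ N(0, Sigma_x), assembled from the pairwise cross-moments of Lemma 3
    (zero-mean, c1 = 0 case); the additive bias c2 does not affect the variance."""
    Sigma_v = A @ Sigma_x @ A.T                           # covariance of v = Ax
    s = np.sqrt(np.diag(Sigma_v))
    rho = np.clip(Sigma_v / np.outer(s, s), -1.0, 1.0)
    # Q[j, k] = E[max(v_j, 0) max(v_k, 0)] via the zero-mean closed form (Lemma 3).
    Q = (np.outer(s, s) / (2.0 * np.pi)) * (np.sqrt(1.0 - rho**2)
                                            + rho * (np.pi / 2.0 + np.arcsin(rho)))
    m = s / np.sqrt(2.0 * np.pi)                          # E[max(v_j, 0)], zero-mean case
    cov_relu = Q - np.outer(m, m)                         # covariance of max(v, 0)
    return np.diag(B @ cov_relu @ B.T)                    # variance of each output logit
```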

Lemma 4.

For any bivariate Gaussian random variable $(x_1, x_2)^\top \sim \mathcal{N}(\boldsymbol{\mu}_x, \boldsymbol{\Sigma}_x)$, where $\boldsymbol{\mu}_x$ and $\boldsymbol{\Sigma}_x$ are arbitrary, we have that

(1)

where

(2)

and where

(3)

Note that $\mathbf{e}_1$ and $\mathbf{e}_2$ are the two-dimensional canonical vectors. Moreover, $\operatorname{Diag}(\cdot)$ rearranges the elements of a vector into a diagonal matrix and $\det(\cdot)$ denotes the matrix determinant. The remaining constants in Equations (2) and (3) are defined in the Appendix. Lastly, the expression involves the Hermite polynomial, the normalized incomplete Gamma function and the standard Gamma function.

Proof.

This is a sketch of the proof.

(4)

where the three densities involved are the joint bivariate, the conditional, and the marginal Gaussian distributions. By integration by parts, Leibniz’s rule, and several identities and substitutions, Equation (4) reduces to:

(5)

where the first term is given by Equation (2). As for the remaining integral, we exploit identities (2.1) and (2.2) in [fayed2014evaluation], which state that it admits a closed-form solution given in Equation (3). Thus, one can represent the integral in Equation (5) through Equation (3) under a suitable substitution. Now note that the corresponding infinite series in Equation (3) converges only over part of the range of the correlation coefficient, which proves the first case in Equation (1). As for the complementary case, by integrating the integral in Equation (5) by parts, we have

(6)

Note that the series obtained from the same identity now converges over the complementary range of the correlation coefficient. Thus, substituting this result back into Equation (5) yields the second case of Equation (1), completing the proof. ∎

Following Theorem 2, a closed-form expression for the second moment of $g(\mathbf{x})$ under generic Gaussian distributions can be derived by substituting the result of Lemma 4 (in lieu of Lemma 3) in the proof of Theorem 2, yielding an expression for the variance of $g(\mathbf{x})$. Moreover, we show in the Appendix that Equation (1) recovers Lemma 3 when $\boldsymbol{\mu}_x = \mathbf{0}$.

3.3 Extension to Deeper PL-DNNs

To extend the previous results to deeper DNNs that are not in the form (Affine, ReLU, Affine), we first denote the larger DNN as $\mathcal{N}: \mathbb{R}^n \to \mathbb{R}^d$ (e.g. a mapping from the input to the logits of $d$ classes). By choosing the $\ell^{\text{th}}$ ReLU layer, any $\mathcal{N}$ can be decomposed into $\mathcal{N} = f_2 \circ \text{ReLU} \circ f_1$. In this paper, we employ a simple two-stage linearization based on a Taylor series approximation to cast $\mathcal{N}$ into the form (Affine, ReLU, Affine). For example, we can linearize $f_1$ around a point $\mathbf{x}_0$ and $f_2$ around $\mathbf{z}_0 = \text{ReLU}(f_1(\mathbf{x}_0))$, such that $f_1(\mathbf{x}) \approx \mathbf{A}\mathbf{x} + \mathbf{c}_1$ and $f_2(\mathbf{z}) \approx \mathbf{B}\mathbf{z} + \mathbf{c}_2$. The resulting function after linearization is $\tilde{\mathcal{N}}(\mathbf{x}) = \mathbf{B}\max(\mathbf{A}\mathbf{x} + \mathbf{c}_1, \mathbf{0}) + \mathbf{c}_2$. Figure 1 shows this two-stage linearization. Details regarding the selection of the layer of linearization $\ell$ and the points of linearization are discussed thoroughly next.
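A minimal sketch of this two-stage linearization is given below, using automatic differentiation to obtain the two Jacobians; the network split, layer sizes and function names are illustrative assumptions (PyTorch).

```python
import torch
from torch.autograd.functional import jacobian

# Illustrative deep ReLU network split at the chosen ReLU: N = f2 ∘ ReLU ∘ f1.
f1 = torch.nn.Sequential(torch.nn.Linear(784, 256), torch.nn.ReLU(),
                         torch.nn.Linear(256, 64))
f2 = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(),
                         torch.nn.Linear(32, 10))

def two_stage_linearize(f1, f2, x0):
    """Return (A, c1, B, c2) such that B max(Ax + c1, 0) + c2 approximates
    f2(relu(f1(x))) around the linearization point x0."""
    z0 = torch.relu(f1(x0))
    A = jacobian(f1, x0)          # Jacobian of the pre-ReLU part at x0
    c1 = f1(x0) - A @ x0          # first-order Taylor offset
    B = jacobian(f2, z0)          # Jacobian of the post-ReLU part at relu(f1(x0))
    c2 = f2(z0) - B @ z0
    return A, c1, B, c2

x0 = torch.randn(784)
A, c1, B, c2 = two_stage_linearize(f1, f2, x0)
# The linearized network matches the original exactly at the linearization point.
print(torch.allclose(B @ torch.relu(A @ x0 + c1) + c2,
                     f2(torch.relu(f1(x0))), atol=1e-5))
```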

4 Experiments

In this section, we discuss a variety of experiments to provide the following insights. (i) Although the derived output variance of the (Affine, ReLU, Affine) network based on Equation (1) involves an infinite series, the sum can be accurately approximated with as few as 20 terms, leading to an efficient computation. (ii) We conduct several controlled experiments to investigate the choice of the linearization layer $\ell$, at which two-stage linearization is performed. We also validate the tightness of both the first and second moment expressions for deeper networks under different linearization points, and show that the new variance expression based on Lemma 4 is much tighter than the one based on Lemma 3 for general input Gaussian distributions. (iii) Lastly, extensive experiments on the MNIST and Emotion datasets validate that our derived expressions can be used to construct targeted and non-targeted adversarial Gaussian attacks. In particular, and following the recent successes of sparse pixel attacks [modas019sparsefool], we demonstrate that our expressions can indeed be utilized to design sparse and smooth Gaussian perturbations, leading to perceptually feasible input attacks.

4.1 On the Efficacy of Approximating Equation (1)

Computing the variance of the (Affine, ReLU, Affine) network under a general Gaussian input, as per Equation (1) in Lemma 4, requires the evaluation of Equation (3), which is impractical as it involves an infinite series. We show here that the series can be sufficiently well approximated with as few as 20 terms. To demonstrate this, along with the sensitivity of Equation (1) to the two means, the two standard deviations, and the correlation of the bivariate Gaussian, we report the maximum absolute error between the Monte Carlo estimates and truncated versions of the sum in Equation (1) with an increasing number of terms, over a grid of all combinations of the five arguments. In particular, the means and standard deviations are sampled uniformly from fixed grids, the correlation is sampled uniformly over its admissible range, and a few special parameter values are also included. Figure 2 reports the maximum absolute error over all possible combinations of the aforementioned parameters in log-scale with an increasing number of terms of Equation (3). We observe from Figure 2 that, with as few as 20 terms, the maximum absolute error between the Monte Carlo estimates and the truncated version of Equation (1) is small. This holds regardless of the choice of the means and standard deviations, and even when the correlation is close to the disjunction in Equation (1). Recall that the disjunction occurs at these values of the correlation, since the infinite series diverges in such cases. On the other hand, the maximum absolute error decreases rapidly so long as the correlation is away from the disjunction. Now that Equation (1) can be reliably and efficiently approximated with a small number of terms, its closed-form expression can be used to compute the output variance of the network for various applications. Throughout all remaining experiments, we use only 5 terms, since the absolute error is very small for all parameter choices except for the two improbable singular values of the correlation.
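The sketch below outlines the protocol of this experiment: sweep a small parameter grid of the bivariate Gaussian, build a Monte Carlo reference for the cross-moment that Lemma 4 expresses, and record the worst-case error for each truncation level. The `truncated_series` function is a hypothetical placeholder standing in for the sum of Equation (3), which is not reproduced here; the grids shown are illustrative and not the ones used in the paper.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

def mc_relu_cross_moment(mu1, mu2, s1, s2, rho, n_samples=200_000):
    """Monte Carlo reference for E[max(x1, 0) max(x2, 0)] under a general
    (non-centered) bivariate Gaussian -- the quantity expressed by Lemma 4."""
    cov = np.array([[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]])
    x = rng.multivariate_normal([mu1, mu2], cov, size=n_samples)
    return np.mean(np.maximum(x[:, 0], 0) * np.maximum(x[:, 1], 0))

def truncated_series(mu1, mu2, s1, s2, rho, n_terms):
    """Hypothetical stand-in for the truncated sum of Equation (3);
    the actual series of Lemma 4 is not reproduced here."""
    raise NotImplementedError

# Sweep a small illustrative grid and record the worst-case error per truncation level.
grid_mu, grid_s, grid_rho = [-1.0, 0.0, 1.0], [0.5, 1.0], [-0.8, 0.0, 0.8]
for n_terms in (1, 5, 10, 20):
    errs = []
    for mu1, mu2, s1, s2, rho in itertools.product(grid_mu, grid_mu,
                                                   grid_s, grid_s, grid_rho):
        ref = mc_relu_cross_moment(mu1, mu2, s1, s2, rho)
        # Uncomment once Equation (3) is implemented:
        # errs.append(abs(ref - truncated_series(mu1, mu2, s1, s2, rho, n_terms)))
    # print(n_terms, max(errs))
```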

Fig. 2: Approximating Equation (1). The maximum absolute error between the Monte Carlo estimates and the truncated Equation (1) for different parameterizations of the bivariate Gaussian decreases rapidly as the number of terms in the truncated sum increases. The two plots show that the error decreases very quickly, regardless of all the other parameters, when the correlation is not close to the disjunction of Equation (1).

4.2 Tightness of Network Moments

Choice of the Two-Stage Linearization Layer $\ell$. The derived expressions for the first and second moments are for a small network in the form (Affine, ReLU, Affine). As detailed in Subsection 3.3, such results can be extended and applied to deeper networks through the proposed two-stage linearization. However, it is not clear how to choose the layer of linearization $\ell$. This subsection addresses this design choice by conducting an ablation to study the impact of varying $\ell$. In particular, we show that there is an intrinsic trade-off between memory efficiency and linearization error in the choice of the layer $\ell$, around which two-stage linearization is performed. To illustrate this, consider a network $\mathcal{N} = f_2 \circ \text{ReLU} \circ f_1$ with input dimension $n$, output dimension $d$, and $n_\ell$ activations at the $\ell^{\text{th}}$ ReLU layer. Performing two-stage linearization requires storing the two Jacobians $\mathbf{A} \in \mathbb{R}^{n_\ell \times n}$ and $\mathbf{B} \in \mathbb{R}^{d \times n_\ell}$, which is a total of $n_\ell(n + d)$ elements. When $\ell$ is chosen to be small (early convolutional layers), the value $n_\ell$ is usually very large, as it is the total number of pixels across all feature maps. Meanwhile, when $\ell$ is large, $n_\ell$ is usually only the number of nodes in a fully connected layer. However, the choice of a large $\ell$ in general leads to a larger linearization error. To demonstrate this, we conduct experiments on the LeNet architecture [lecun1999object] pretrained on the MNIST digit dataset [lecun1998mnist]. Note that LeNet has a total of four layers, two of which are convolutional with max pooling and the other two are fully connected. We perform two-stage linearization on LeNet with a varying choice of $\ell$, where we compare the difference between the prediction scores of LeNet and the two-stage linearized version, with the point of linearization taken to be a noisy version of a random image from the MNIST validation set. Table I demonstrates that the choice of a smaller $\ell$ is best for the two-stage linearization across all the various noise levels applied to the input. This implies a trade-off between memory efficiency (better memory complexity with larger $\ell$) and accuracy (smaller linearization error with smaller $\ell$). Therefore, and due to memory constraints, $\ell$ is chosen to be the fully-connected layer just before the last ReLU activation in all experiments, unless stated otherwise.

Noise level    0.5      0.75     1        1.5      2
               0.0241   0.0362   0.0485   0.0730   0.0977
               0.0330   0.0497   0.0663   0.0996   0.1330
               0.0329   0.0495   0.0661   0.0993   0.1327
(rows correspond to increasing choices of the linearization layer ℓ, top to bottom)
TABLE I: Varying the layer of linearization ℓ. The average approximation error, on a randomly sampled MNIST image corrupted with additive Gaussian noise, between LeNet and the two-stage linearized version increases as the layer of linearization ℓ increases.

Tightness of Moment Expressions on LeNet. It is conceivable that the two-stage linearization might impact the tightness of the derived moment expressions when applied to deeper real PL-DNNs. Here, we empirically study their tightness by comparing them against Monte Carlo estimates over sampled inputs on LeNet. Using the MNIST dataset, the input to the network is a 28×28 image with 10 output classes (i.e. $d = 10$). In this case, following Section 3.3, the two-stage linearization is performed around the input image itself, with $\ell$ taken to be the last fully-connected layer before the final ReLU for memory efficiency, where the image is selected from the MNIST testing set. The input is then a Gaussian centered at the image, where we randomly generate a covariance matrix scaled to reasonable noise levels. Since the LeNet architecture has a non-zero first-layer bias (i.e. $\mathbf{c}_1 \neq \mathbf{0}$), we report the tightness of the analytic mean from Theorem 1, the variance from Theorem 2, and the new general variance expression based on Lemma 4. As for the metric, we report the average absolute relative difference of the analytic mean and variance expressions (Theorems 1 and 2) to their Monte Carlo counterparts. Similarly, we report the error of the Monte Carlo estimates to the new variance expression based on Lemma 4, where we find that the summation in Equation (3) can be truncated to only 5 terms without sacrificing much accuracy. We average the results over the complete MNIST test set. We report the tightness results across all classes in Table II, where the closer the errors are to zero the better. The absolute relative difference for the mean expression of Theorem 1 is close to zero; that is to say, the mean expression is tight even though two-stage linearization is being performed on a real network. In contrast, the variance expression of Theorem 2 is less accurate, which can be attributed to its assumptions not holding (zero-mean input Gaussian and $\mathbf{c}_1 = \mathbf{0}$). On the other hand, the new general expression for the output variance based on Lemma 4 is significantly tighter than the one from Theorem 2, as its errors compared to the Monte Carlo estimates are much closer to zero. This shows that our new variance expression is far tighter and less sensitive to two-stage linearization despite the truncation of the infinite series to as few as 5 terms. Furthermore, complementing the results in Table II and instead of reporting the absolute relative difference alone, we visualize the histogram of LeNet output variances for all testing MNIST images under varying noise levels in Table III for better interpretability of the results.

[Table II body: per-class (digits 0–9) and average tightness errors, reported for input noise levels 0.001, 0.010, 0.050 and 0.100; numeric entries omitted.]
TABLE II: Tightness results across all classes on MNIST. Using different values of input noise, the table shows that the mean expression of Theorem 1 is tight and insensitive under two-stage linearization. Moreover, the table demonstrates that the new general variance expression based on Lemma 4 is far tighter, despite the truncation of the infinite series, compared to the previous results from Theorem 2.
[Table III body: variance histograms at four input noise levels; figures omitted.]
TABLE III: LeNet variance histograms on MNIST. The table shows the histograms of LeNet output variances for all testing MNIST images under varying noise levels. We compare the estimation of the output variance through Monte Carlo sampling against the variance expressions in Theorem 2 (Old) and Lemma 4 (New). We also report in the legend the averaged absolute relative difference over the images.

Sensitivity to the Point of Linearization. In all previous tightness validation experiments of the moment expressions, the point at which two-stage linearization is performed was restricted to be the input image itself. Clearly, this strategy suffers from limited scalability, since analyzing the output moment expressions of deep networks over a large dataset requires performing the expensive two-stage linearization for every image in the dataset. To circumvent this difficulty, we study the sensitivity of the tightness of the expressions under two-stage linearization around only a small set of input images from the dataset. That is to say, we choose a set of representative input images, at which the two-stage linearization parameters $(\mathbf{A}, \mathbf{c}_1)$ and $(\mathbf{B}, \mathbf{c}_2)$ are computed only once and offline for each representative image. Now, to evaluate the network moments for an unseen input, we simply use the two-stage linearization parameters of the closest linearization point to this input.

In this experiment, we study the tightness of our expressions under this more relaxed linearization strategy using LeNet on the MNIST testing set. We cluster the images in the testing dataset using $k$-means on the image intensity space with different values of $k$. We use the cluster centers as the linearization points. Table IV summarizes the tightness of the expressions for $k \in \{250, 500, 1000, 2500, 5000, 10000\}$ and compares them against a weak baseline, where the linearization point is set to be the farthest image in each cluster from the cluster center (for $k = 250$ and $500$). It is clear that the new variance expression based on Lemma 4 remains very close to the Monte Carlo estimate across different numbers of linearization points $k$, even when $k$ is as low as 250, i.e. only 2.5% of the testing set. On the other hand, the analytic variance derived from Theorem 2 is less accurate but stays within an acceptable range. This indeed reaffirms that, even upon truncating the infinite series in Equation (3) to only 5 terms, the new variance expression is much tighter and more accurate under network linearization than the preliminary result of Theorem 2 in [bibi2018analytic]. As for the analytic mean, however, it is more sensitive to the point of linearization, but even in the worst case (the smallest $k$) the average error remains within an acceptable range. When compared with the baseline experiments, i.e. using the farthest point from the cluster center, the contrast becomes more obvious as the error grows noticeably.
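A sketch of this relaxed strategy is given below: cluster the test images with $k$-means, linearize the network once per cluster center offline, and reuse the nearest center's linearization for unseen inputs. It assumes scikit-learn, a flattened test set `X_test`, and a `linearize` callable such as the two-stage linearization sketch from Section 3.3; all names are ours.

```python
import numpy as np
from sklearn.cluster import KMeans

# X_test: (N, 784) flattened MNIST test images; linearize(x0) returns the
# two-stage linearization (A, c1, B, c2) around x0 (e.g. the earlier sketch).
def build_linearization_bank(X_test, linearize, k=250):
    """Cluster the test set and linearize the network once per cluster center, offline."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_test)
    bank = [linearize(center) for center in km.cluster_centers_]
    return km, bank

def lookup_linearization(km, bank, x):
    """For an unseen input x (a flat array), reuse the closest center's linearization."""
    idx = int(km.predict(np.asarray(x).reshape(1, -1))[0])
    return bank[idx]
```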

[Table IV body: tightness errors for k = 250, 500, 1000, 2500, 5000 and 10000 linearization points (and the baseline variants 250*, 500*), reported for input noise levels 0.001, 0.010, 0.050 and 0.100; numeric entries omitted.]
TABLE IV: Tightness results under a varying number of linearization points $k$. The table shows the tightness results using a varying number of linearization points (i.e. $k$-means cluster centers), averaged over all testing MNIST classes under different values of input noise.
*For the baseline experiments, the linearization points are set to be the farthest instances from the clusters’ centers.

4.3 Noise Construction

After establishing the tightness of our expressions compared to Monte Carlo estimates, we show more practical applications of these expressions, in which the output mean and variance expressions can be used to construct noise with certain properties. In particular, we are interested in showing that samples from a carefully crafted Gaussian distribution can act as an adversary. This goes against the common belief that Gaussian noise is too simple for such a task. In this section, we show insightful results on how to construct targeted and non-targeted Gaussian adversarial attacks. Moreover, we also show that such expressions can be leveraged to construct sparse and smooth Gaussian adversarial attacks that are perceptually feasible. It is to be noted here that this section is concerned with establishing the fact that Gaussian noise can act as an adversary while being perceptually feasible, and not with achieving state-of-the-art results on the task of adversarial attacks. The problem setup is as follows: given an image $\mathbf{x}$ whose predicted class is $c$, the task is to add noise $\boldsymbol{\eta}$ to $\mathbf{x}$ such that the class with the largest expected prediction score of the network at $\mathbf{x} + \boldsymbol{\eta}$ is no longer $c$. If such noise exists, we say the network is fooled in expectation. To keep the optimization and the number of variables manageable, we only consider the case of isotropic Gaussian noise, i.e. $\boldsymbol{\eta} \sim \mathcal{N}(\boldsymbol{\mu}_\eta, \sigma^2\mathbf{I})$. We use a shorthand for the noisy input in what follows to avoid text clutter. In the following experiments, the two-stage linearization is performed around the input image, with the linearization layer $\ell$ chosen separately for LeNet and for AlexNet.

Targeted Attacks.  On the MNIST dataset, we specify a target class $t$ and construct a noise distribution that can fool LeNet in expectation by solving the following optimization:

(7)
s.t.

Note that for any pair of noise parameters for which the previous objective is negative, the largest expected prediction among all classes occurs at the target class $t$. In this experiment, we solve problem (7) with an interior-point solver. Note that the pixel values of MNIST images lie in a fixed bounded range. Figure 3 shows examples of noisy versions of an image that fool LeNet in expectation for multiple target classes $t$. Not every target class is easily targeted with a small perturbation because of the distance in their prediction scores. We verify that the constructed noise actually fools the network by drawing 10 samples from the learned distribution, passing each noisy input through LeNet, and verifying that a sufficiently large fraction of the predicted class flips are from $c$ to the target class $t$.
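The sketch below illustrates the idea behind the targeted construction, using a differentiable re-implementation of the analytic output mean on the linearized network. The loss is an illustrative stand-in for problem (7) (a margin on the expected target logit plus an $\ell_2$ penalty on the mean shift), and it is optimized with plain gradient steps rather than the interior-point solver used in the paper; all names and hyperparameters are assumptions.

```python
import torch

def expected_logits(mu_in, sigma, A, c1, B, c2):
    """Analytic output mean of B max(Az + c1, 0) + c2 for z ~ N(mu_in, sigma^2 I),
    i.e. the rectified-Gaussian mean of each pre-activation pushed through B."""
    mu_v = A @ mu_in + c1
    sigma_v = sigma * torch.linalg.vector_norm(A, dim=1)   # std of each pre-activation
    r = mu_v / sigma_v
    relu_mean = sigma_v * torch.exp(-0.5 * r**2) / (2 * torch.pi) ** 0.5 \
        + 0.5 * mu_v * (1 + torch.erf(r / 2**0.5))
    return B @ relu_mean + c2

def targeted_gaussian_attack(x, t, A, c1, B, c2, sigma=0.1, lam=1e-2, steps=300):
    """Optimize the mean of an isotropic Gaussian perturbation so that the expected
    logit of the target class t dominates, with an l2 penalty on the mean shift."""
    A, c1, B, c2 = (M.detach() for M in (A, c1, B, c2))    # linearization held constant
    mu_eta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([mu_eta], lr=0.05)
    for _ in range(steps):
        logits = expected_logits(x + mu_eta, sigma, A, c1, B, c2)
        others = torch.cat([logits[:t], logits[t + 1:]])
        loss = others.max() - logits[t] + lam * mu_eta.pow(2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return mu_eta.detach(), sigma   # parameters of the adversarial Gaussian

# Usage (assuming x, A, c1, B, c2 come from a two-stage linearization around x):
# mu_eta, sigma = targeted_gaussian_attack(x, t=3, A=A, c1=c1, B=B, c2=c2)
```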

Fig. 3: Targeted attacks. The figure shows noisy images that fool LeNet. The images, from top-left to bottom-right, are the original MNIST image followed by noisy versions classified as their respective target classes.

Non-Targeted Attacks with Sparse Pixel Support.  Inspired by the findings of some recent work [su2017one], we demonstrate that we can construct additive noise that corrupts only a small fraction of the pixels in an input image, but still changes the class prediction. Here, we use LeNet on MNIST and AlexNet on ImageNet. In this case, we do not specify a target class, but rather optimize for the prediction score of the correct class to be less than the maximum prediction score. The underlying optimization is formulated as follows:

(8)
s.t.

The optimization variables indicate the sparse set of pixels (a small fraction of the total number of pixels) in the image that will be corrupted, while the noise on the remaining pixels is set to zero. The locations of the corrupted pixels are randomly chosen and fixed before solving the optimization. Two experiments are conducted on a few images, one on MNIST and the other on ImageNet. Figures 4 and 5 show examples of noisy images constructed by solving Equation (8) with