Minimum Uncertainty Based Detection of Adversaries in Deep Neural Networks

by   Fatemeh Sheikholeslami, et al.
University of Minnesota

Despite their unprecedented performance in various domains, utilization of Deep Neural Networks (DNNs) in safety-critical environments is severely limited in the presence of even small adversarial perturbations. The present work develops a randomized approach to detecting such perturbations based on minimum uncertainty metrics that rely on sampling at the hidden layers during the DNN inference stage. The sampling probabilities are designed for effective detection of the adversarially corrupted inputs. Being modular, the novel detector of adversaries can be conveniently employed by any pre-trained DNN at no extra training overhead. Selecting which units to sample per hidden layer entails quantifying the amount of DNN output uncertainty from the viewpoint of Bayesian neural networks, where the overall uncertainty is expressed in terms of its layer-wise components, which also promotes scalability. Sampling probabilities are then sought by minimizing uncertainty measures layer-by-layer, leading to a novel convex optimization problem that admits an exact solver with superlinear convergence rate. By simplifying the objective function, low-complexity approximate solvers are also developed. In addition to valuable insights, these approximations link the novel approach with state-of-the-art randomized adversarial detectors. The effectiveness of the novel detectors in the context of competing alternatives is highlighted through extensive tests for various types of adversarial attacks with variable levels of strength.


1 Introduction

Unprecedented learning capability offered by Deep Neural Networks (DNNs) has enabled state-of-the-art performance in diverse tasks such as object recognition and detection [1, 2, 3], speech recognition and language translation [4], voice synthesis [5], and many more, to reach or even surpass human-level accuracy. Despite their performance, however, recent studies have cast doubt on the reliability of DNNs, as highly accurate networks are shown to be extremely sensitive to carefully crafted inputs designed to fool them [6, 7, 8]. Such fragility can easily lead to sabotage once adversarial entities target critical environments such as autonomous cars [9], automatic speech recognition [10], and face detection [11, 12, 3]. The extreme brittleness of convolutional neural networks (CNNs) for image classification is highlighted since small adversarial perturbations on the clean image, although often imperceptible to the human eye, can lead the trained CNNs to classify the adversarial examples incorrectly with high confidence. In particular, the design of powerful adversarial perturbations in environments with different levels of complexity and knowledge about the target CNN, known as white, grey, and black-box attacks, has been investigated in several works [7, 13, 14, 15, 16, 17]. These considerations motivate well the need for designing robust and powerful attack detection mechanisms for reliable and safe utilization of DNNs [18].

Defense against adversarial perturbations has been mainly pursued in two broad directions: (i) attack detection, and (ii) attack recovery. Methods in the first category aim at detecting adversarial corruption in the input by classifying the input as clean or adversarial, based on tools as diverse as auto-encoders [19], detection sub-networks [20, 21], and dropout units [22]. On the other hand, methods in the second category are based on recovery schemes that robustify the classification by data pre-processing [23, 24], adversarial training [25, 26, 27], sparsification of the network [28, 29] and Lipschitz regularization [30, 31], to name just a few.

Furthermore, the so-termed over-confidence of DNNs in classifying “out-of-distribution,” meaning samples which lie in unexplored regions of the input domain, or even “misclassified” samples, has been unraveled in [32, 33]. This has motivated the need for uncertainty estimation as well as calibration of the networks for robust classification. Modern Bayesian neural networks target this issue by modeling the distribution of DNN weights as random [34], and estimating the DNN output uncertainty through predictive entropy, variance, or mutual information [35, 36, 22, 37]. The well-known dropout regularization technique is one such approximate Bayesian neural network, now widely used in training and testing of DNNs [38, 39].

Moreover, approaches relying on dropout units have shown promising performance in successfully detecting adversarial attacks, where other defense mechanisms fail [13]. In particular, [22] utilizes randomness of dropout units during the test phase as a defense mechanism, and approximates the classification uncertainty by Monte Carlo (MC) estimation of the output variance. Based on the latter, images with high classification uncertainty are declared as adversarial. Recently, dropout defense has been generalized to non-uniform sampling [40], where entries of the hidden-layers are randomly sampled, with probabilities proportional to the entry values. This heuristic sampling of units per layer is inspired by intuitive reasoning: activation units with large entries have more information and should be sampled more often [40]. However, analytical understanding and connections with the Bayesian framework have not been investigated.

The goal here is to further expand the understanding of uncertainty estimation in DNNs, and thereby improve the detection of adversarial inputs. The premise is that inherent distance of the adversarial perturbation from the natural-image manifold will cause the overall network uncertainty to exceed that of the clean image, and thus successful detection can be obtained.

To this end, and inspired by [40], we rely on random sampling of units per hidden layer of a pre-trained network to introduce randomness. Moreover, by leveraging the Bayesian approach to uncertainty estimation, the overall uncertainty of a given image is then quantified in terms of its hidden-layer components. We then formulate the task of adversary detection as uncertainty minimization by optimizing over the sampling probabilities to provide effective detection. Subsequently, we develop an exact solver with super-linear convergence rate as well as approximate low-complexity solvers for an efficient layer-by-layer uncertainty minimization scheme. Furthermore, we draw connections with uniform dropout [22] as well as stochastic activation pruning (SAP) [40], and provide an efficient implementation of the novel approach by interpreting it as a non-uniform dropout scheme. Extensive numerical tests on CIFAR10 and high-quality cats-and-dogs images in the presence of various attack schemes corroborate the importance of our designs of sampling probabilities, as well as the placement of sampling units per hidden layer for improved detection of adversarial inputs.

The rest of the paper is organized as follows. An overview on Bayesian inference and detection in neural networks is provided in Section 2. The proposed class of detectors is introduced in Section 3, and exact as well as low-complexity approximate solvers for the layer-by-layer uncertainty minimization are the subjects of Section 4. Implementation issues are dealt with in Section 5, numerical tests are provided in Section 6, and concluding remarks are discussed in Section 7.

2 Bayesian neural network Preliminaries

Bayesian inference is among the powerful tools utilized for analytically understanding and quantifying uncertainty in DNNs [41, 39]. In this section, we provide a short review on the basics of Bayesian neural networks, and move on to the inference phase for adversary detection in Section 2.2, which is of primary interest in this work.

Consider an -layer deep neural network, which maps the input to output . The weights are denoted by , and are modeled as random variables with prior probability density function (pdf)


Given training input and output data , it is assumed that the parameters only depend on these data. As a result, the predictive pdf for a new input can be obtained via marginalization as [38]


which requires knowing the conditional . The complexity of estimating motivates well the variational inference (VI) approach, where is replaced by a surrogate pdf that is parameterized by . For , it is desired to: (D1) approximate closely ; and, (D2) provide easy marginalization in (1) either in closed form or empirically. To meet (D1), the surrogate is chosen by minimizing the Kullback-Leibler (KL) divergence , which is subsequently approximated by the log evidence lower bound [42, p. 462]


Finding boils down to maximizing the log evidence lower bound, that is, . A common choice for to also satisfy (D2) is described next.
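For reference, the marginalization in (1) and the evidence lower bound in (2) can be written out in standard variational-inference notation. The symbols below (weights \omega, training data (X, Y), surrogate q(\omega;\theta), bound \mathcal{L}) are assumed for illustration and are not taken from the original typesetting. The last identity shows why maximizing the bound is equivalent to minimizing the KL divergence to the true posterior, since \log p(Y \mid X) does not depend on \theta.

```latex
% Assumed notation: \omega = network weights, (X, Y) = training data,
% q(\omega;\theta) = variational surrogate, \mathcal{L} = evidence lower bound.
\begin{align}
p(y \mid x, X, Y) &= \int p(y \mid x, \omega)\, p(\omega \mid X, Y)\, d\omega \\
\mathcal{L}(\theta) &= \mathbb{E}_{q(\omega;\theta)}\big[\log p(Y \mid X, \omega)\big]
  - \mathrm{KL}\big(q(\omega;\theta)\,\|\,p(\omega)\big) \\
\log p(Y \mid X) &= \mathcal{L}(\theta)
  + \mathrm{KL}\big(q(\omega;\theta)\,\|\,p(\omega \mid X, Y)\big)
\end{align}
```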

2.1 Variational inference

A simple yet effective choice for is a factored form modeling the weights as independent across layers, that is


where the -th layer with hidden units is modeled as


where is an deterministic weight matrix multiplied by a diagonal matrix formed by the binary random vector with entries drawn from a pmf parameterized by .

If the entries are i.i.d. Bernoulli with (identical) probability (w.p.) , they effect what is referred to as uniform (across layers and nodes) dropout, which is known to prevent overfitting [41]. Clearly, the parameter set fully characterizes . The dropout probability is preselected in practice, while can be obtained using the training data by maximizing the log evidence lower bound in (2). Nonetheless, integration in (2) over all the Bernoulli variables is analytically challenging, while sampling from the Bernoulli pmf is relatively cheap. This prompts approximate yet efficient integration using Monte Carlo estimation. A more detailed account of training Bayesian neural networks can be found in [38, 42, 43]. Moving on, the ensuing subsection deals with detection of adversarial inputs.

2.2 Bayesian detection of DNN adversaries

A Bayesian approach to detecting adversarial inputs during the testing phase proceeds by approximating the predictive pdf in (1) using the variational surrogate , as


Deciphering whether a given input is adversarial entails three steps: (S1) parametric modeling of ; (S2) estimating the DNN output uncertainty captured by ; and (S3) declaring as adversarial if the output uncertainty exceeds a certain threshold, and clean otherwise. These steps are elaborated next.

Step 1: Parametric modeling of . Recall that uniform dropout offers a popular special class of pdfs, and has been employed in adversary detection [22]. Here, we specify the richer model of in (3) and (4) that will turn out to markedly improve detection performance. Different from uniform dropout, we will allow for (possibly correlated) Bernoulli variables with carefully selected (possibly non-identical) parameters. If such general can be obtained, matrices are then found as follows.

Let be deterministic weight matrices obtained via non-Bayesian training that we denote as (TR), such as back propagation based on, e.g., a cross-entropy criterion. We will use to specify the mean of the random weight matrix in our Bayesian approach, meaning we choose , where is the output of the st layer for a given input passing through the DNN with deterministic weights . With available, we first design ; next, we find ; and then , as


where the pseudo-inverse means that inverse entries are replaced with zeros if .
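A minimal sketch of the rescaling step in (6), assuming the sampling matrix is diagonal with mean entries `q` (the per-unit pick probabilities). The names `rescaled_weights`, `W_tr`, and `q` are hypothetical. The point illustrated is that the mean of the random weight matrix matches the trained one on every unit that can be picked.

```python
import numpy as np

def rescaled_weights(W_tr, q):
    """Return M = W_tr @ pinv(diag(q)); 1/q_i is replaced by 0 where q_i == 0."""
    q = np.asarray(q, dtype=float)
    inv_q = np.zeros_like(q)
    np.divide(1.0, q, out=inv_q, where=q > 0)  # pseudo-inverse of the diagonal
    return W_tr * inv_q                        # scales column i by inv_q[i]

# Hypothetical trained weights and pick probabilities (illustration only).
W_tr = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
q = np.array([0.5, 0.0, 1.0])

M = rescaled_weights(W_tr, q)
# E[M diag(z)] = M diag(q): equals W_tr on columns with q_i > 0, zero elsewhere.
expected_W = M * q
```

Columns whose unit is never picked (q_i = 0) are zeroed out rather than divided by zero, which is the convention stated for the pseudo-inverse.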

Step 2: Quantifying the DNN output uncertainty. Since evaluation of in (5) is prohibitive, one can estimate it using MC sampling. In particular, one can readily obtain MC estimates of (conditional) moments of . For instance, its mean and variance can be estimated as



where is the output of the -th DNN realized through weights with input . The predictive variance is the trace of that we henceforth abbreviate as . Given , the latter has been used to quantify output uncertainty as  [22]. Additional measures of uncertainty will be presented in the next section.
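The MC estimates in Step 2 can be sketched on a toy two-layer network with uniform dropout on the hidden layer. Everything here is an assumption for illustration: the network sizes, the tanh nonlinearity, the keep probability, the biased (1/R) covariance estimator, and the function names.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def mc_predictive_moments(x, W1, W2, keep_prob=0.8, n_mc=200, seed=0):
    """MC mean and covariance of the softmax output under uniform dropout."""
    rng = np.random.default_rng(seed)
    h = np.tanh(W1 @ x)                      # deterministic hidden-unit outputs
    outputs = []
    for _ in range(n_mc):
        z = rng.random(h.shape[0]) < keep_prob        # Bernoulli keep mask
        outputs.append(softmax(W2 @ (h * z / keep_prob)))
    Y = np.stack(outputs)                    # shape (n_mc, n_classes)
    mean = Y.mean(axis=0)
    centered = Y - mean
    cov = centered.T @ centered / n_mc       # biased sample covariance
    return mean, cov

# Hypothetical toy weights and input.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(3, 16))
x = rng.normal(size=8)

mean, cov = mc_predictive_moments(x, W1, W2)
uncertainty = np.trace(cov)                  # predictive variance score
```

With `keep_prob=1.0` every realization coincides with the full network, so the variance score collapses to zero and carries no detection signal.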

Step 3: Detecting adversarial inputs. Given , detection of adversarial inputs is cast as testing the hypotheses


where the null suggests absence of adversarial perturbation (low variance/uncertainty below threshold ), while the alternative in effect raises a red flag for presence of adversarial input (high variance/uncertainty above threshold ).

We will now proceed to introduce our novel variational distribution model targeting improved detection of adversaries based on uncertainty minimization.

3 Minimum Uncertainty based Detection

To design , we will build on and formalize the sampling scheme in [40] that is employed to specify the joint pmf of the (generally correlated) binary variables per layer . To this end, we randomly pick one activation unit output of the hidden units per layer ; and repeat such a random draw times with replacement. Let denote per draw the vector variable where each entry is a binary random variable with and the vector with nonnegative entries summing up to specifies the Categorical pmf of .

With denoting element-wise binary OR operation on vectors , we define next the vector


Using as in (9) with to be selected, enables finding the expectation and then in (6). Deterministic matrix along with the variates provide the desired DNN realizations to estimate the uncertainty as in (7). In turn, this leads to our novel adversarial input detector (cf. (8))


where variational parameters are sought such that uncertainty is minimized under .
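The sampling-with-replacement construction behind (9) can be sketched as follows: draw unit indices i.i.d. from a Categorical pmf and take the element-wise OR of the resulting one-hot vectors. The function names and the example pmf are hypothetical. Because the draws are i.i.d., unit i is picked at least once with probability 1 - (1 - p_i)^R, which the simulation checks empirically.

```python
import numpy as np

def sample_mask(p, n_draws, rng):
    """OR of n_draws one-hot vectors drawn i.i.d. from Categorical(p)."""
    idx = rng.choice(len(p), size=n_draws, replace=True, p=p)
    z = np.zeros(len(p), dtype=bool)
    z[idx] = True              # a unit is kept if it was drawn at least once
    return z

def pick_probability(p, n_draws):
    """Marginal probability that each unit is picked at least once."""
    return 1.0 - (1.0 - np.asarray(p, dtype=float)) ** n_draws

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.15, 0.05])   # hypothetical Categorical pmf
R = 3                                  # hypothetical number of draws

masks = np.stack([sample_mask(p, R, rng) for _ in range(20000)])
empirical = masks.mean(axis=0)
analytic = pick_probability(p, R)
```

The entries of a single mask are correlated (at most R of them can be one), even though each entry is marginally Bernoulli.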

The rationale behind our detector in (10) is that given , minimizing the uncertainty (test statistic) under reduces the probability of false alarms. The probability of detection however, depends on test statistic pdf under , in which the adversarial perturbation is unknown in practice. The premise here is that due to network instability under , the sought probabilities will not reduce uncertainty under as effectively, thus the performance of (10) will be better than that of (8). To corroborate this, efficient solvers for the proposed minimization task, and extensive tests in lieu of analytical metrics, are in order.

3.1 Uncertainty measures

In order to carry out the hypothesis test in (10), one has options for other than the conditional variance. For DNNs designed for classification, mutual information has been recently proposed as a measure of uncertainty [35]


where superscript indexes the pass of input through the th DNN realization with corresponding random output in a -class classification task, and is the entropy function (entropy functions in (11) are also parameterized by , but we abbreviate them here as and )
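A minimal sketch of an MC mutual-information score, assuming the commonly used estimator "entropy of the averaged softmax output minus the average entropy of the individual outputs". The function names and the two toy sets of MC outputs are hypothetical.

```python
import numpy as np

def entropy(prob, eps=1e-12):
    """Shannon entropy (in nats) along the last axis."""
    prob = np.clip(prob, eps, 1.0)
    return -np.sum(prob * np.log(prob), axis=-1)

def mutual_information(mc_probs):
    """mc_probs: (n_mc, n_classes) softmax outputs of the MC realizations."""
    mc_probs = np.asarray(mc_probs, dtype=float)
    return entropy(mc_probs.mean(axis=0)) - entropy(mc_probs).mean()

# Realizations that agree give (near) zero MI ...
agree = np.tile([0.9, 0.05, 0.05], (4, 1))
# ... while confident but conflicting realizations give a large MI.
disagree = np.array([[0.98, 0.01, 0.01],
                     [0.01, 0.98, 0.01],
                     [0.01, 0.01, 0.98],
                     [0.98, 0.01, 0.01]])

mi_agree = mutual_information(agree)
mi_disagree = mutual_information(disagree)
```

The score is large only when individual realizations are confident yet disagree with each other, which is the behavior the detector relies on.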


The test statistic in (10) requires finding by solving


which is highly non-convex. However, using Taylor’s expansion of the logarithmic terms in (12), one can approximate the mutual information in (11) with the variance score in (10), where the conditioning on has been dropped for brevity [35]. As a result, the optimization in (13) is approximated as


To solve (14), one needs to express the objective in terms of the optimization variables for all layers explicitly. To this end, the following section studies a two-layer network, whose result will then be generalized to deeper models.

3.2 Simplification of the predictive variance

Aiming at a convenient expression for the cost in (14), consider first a two-layer network with input-output (I/O) relationship (derivations in this section carry over readily to a more general I/O with and deterministic)


where are random matrices corresponding to the weights of the two layers as in (6), while is the softmax memoryless nonlinearity with , and the inner in (15) models a general differentiable nonlinearity such as tanh. Although differentiability of the nonlinearities is needed for the derivations in this section, the general idea will be later tested on networks with non-differentiable nonlinearities (such as ReLU) in the experiments.

Given trained weights , and using (4) and (6), the random weight matrices are found as


where denotes the random sampling matrix with pseudo-inverse diagonal mean given by . Since , the mean of does not depend on , while its higher-order moments do.

Proposition 1. For the two-layer network in (15), the proposed minimization in (14) can be approximated by


where is a constant. The solution of (17) proceeds in two steps

where .

Proof. See Appendix 8.1.

Remark. The cost in (17) approximates that in (14) by casting the overall uncertainty minimization as a weighted sum of layer-wise variances. In particular, is the sampling probability vector that minimizes variance score of the first layer. It subsequently influences the regularization scalar in minimizing the second layer variance, which yields the pmf vector . This can be inductively generalized to layers. As increases however, so does the number of cross terms. For simplicity and scalability, we will further approximate the per-layer minimization by dropping the regularization term, which leads to separable optimization across layers. This is an intuitively pleasing relaxation, because layer-wise variance is minimized under , which also minimizes the regularization weight .

The resulting non-regularized approximant of step 2 is

generalizing to the -th layer in an -layer DNN as


where is the output of the st layer, regardless of pmf vectors of other layers .

3.3 Layer-wise variance minimization

Here we will solve the layer-wise variance minimization in (18). Using (16), the cost can be upper bounded by


where the last equality follows because the draws are iid with replacement, and the binary random variables reduce to Bernoulli ones with parameter ; hence, for it holds that and , which implies that .
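The per-unit variance behind this bound can be worked out directly. The notation below is assumed for illustration: R draws with replacement, per-draw pmf entries p_i, marginal pick probability q_i, and unit output x_i rescaled by 1/q_i as in the pseudo-inverse step.

```latex
% Assumed notation: z_i ~ Bernoulli(q_i) marginally, with q_i the probability
% that unit i is picked at least once in R i.i.d. draws.
\begin{align}
q_i &:= \Pr(z_i = 1) = 1 - (1 - p_i)^{R}, \\
\mathbb{E}\!\left[\frac{z_i}{q_i}\right] &= 1, \\
\operatorname{Var}\!\left(\frac{z_i}{q_i}\, x_i\right)
  &= x_i^2\, \frac{q_i (1 - q_i)}{q_i^2}
   = x_i^2 \left(\frac{1}{q_i} - 1\right).
\end{align}
```

Only the term x_i^2 / q_i depends on the sampling probabilities, so a unit with a large output contributes less variance when its pick probability is pushed toward one.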

Using (19), the optimization in (18) can be approximately solved by a majorized surrogate as


which is a convex problem that can be solved efficiently as elaborated next.

4 Solving layer-by-layer minimization

Consider rewriting the layer-wise variance minimization in (20) in a general form as


where for the -th layer. Over the feasible set of the probability simplex, the cost in (21) has a positive semi-definite Hessian; thus, it is convex, and can be solved by projected gradient descent iterations. However, lies in the probability simplex space of dimension , the number of hidden nodes in a given layer, and is typically very large. The large number of variables together with possible ill-conditioning can slow down the convergence rate.

To obtain a solver with quadratic convergence rate, we build on the fact that is usually very large, which implies that for the practical setting at hand. Using the inequality , the cost in (21) can then be tightly upper bounded, which leads to majorizing (21) as


The KKT conditions yield the optimal solution of the convex problem in (22), as summarized next.

Proposition 2. The optimization in (22) can be solved with quadratic convergence rate, and the optimum is given by


where is the solution to the following root-finding problem

Proof. See Appendix 8.2.
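A minimal sketch of a scalar root-finding solver, under the assumption that the majorized cost has the form sum_i alpha_i / (1 - exp(-R p_i)) with alpha_i = x_i^2 (obtained from the bound (1 - p)^R <= exp(-R p)). This form, the closed-form expression for p_i given the multiplier, and all names are assumptions for illustration, not the statement of Proposition 2. Plain bisection on the Lagrange multiplier is used here instead of a quadratically convergent iteration.

```python
import numpy as np

def pick_probs_from_multiplier(alpha, n_draws, lam):
    """Per-unit stationary point of alpha_i / (1 - exp(-R p_i)) + lam * p_i."""
    c = alpha * n_draws
    # u = exp(-R p) solves lam*u^2 - (2*lam + c)*u + lam = 0 with u in (0, 1];
    # written in a cancellation-free form. Units with alpha_i = 0 get p_i = 0.
    u = 2.0 * lam / (2.0 * lam + c + np.sqrt(c * c + 4.0 * lam * c))
    return -np.log(u) / n_draws

def vm_exact(x, n_draws, n_iter=200):
    """Bisection on the multiplier so that the probabilities sum to one.

    Assumes n_draws is moderate relative to the number of units; extremely
    large n_draws can underflow the multiplier.
    """
    alpha = np.asarray(x, dtype=float) ** 2
    if not np.any(alpha > 0):
        raise ValueError("at least one unit output must be non-zero")

    def total(lam):
        return pick_probs_from_multiplier(alpha, n_draws, lam).sum()

    lo, hi = 1.0, 1.0
    while total(lo) < 1.0:      # total() decreases in lam: shrink lo
        lo *= 0.5
    while total(hi) > 1.0:      # grow hi until the sum drops below one
        hi *= 2.0
    for _ in range(n_iter):
        mid = np.sqrt(lo * hi)  # bisection in the log domain
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    p = pick_probs_from_multiplier(alpha, n_draws, hi)
    return p / p.sum()          # remove the residual bisection error

x = np.array([3.0, 1.0, 0.5, 0.0])   # hypothetical hidden-unit outputs
p = vm_exact(x, n_draws=5)
```

The high-dimensional search over the simplex reduces to a one-dimensional search because each p_i is an explicit, monotone function of the single multiplier.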

4.1 Approximate variance minimization for small

For small values of , it holds that ; hence, the Bernoulli parameter can be approximated by its upper bound . With this we can approximate the cost in (20), as


Using the Lagrangian and the KKT conditions, we then find , which for the -th layer is expressible as


This approximation provides analytical justification for the heuristic approach in [40], where it is proposed to sample with probabilities proportional to the magnitude of the hidden unit outputs. However, there remains a subtle difference, which will be clarified in Section 6.
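A minimal sketch of the small-R (linear) approximation, assuming the pick probability 1 - (1 - p_i)^R is replaced by its first-order upper bound R p_i, so the cost becomes sum_i x_i^2 / (R p_i). Stationarity of the Lagrangian then gives p_i proportional to |x_i|. The function name `vm_lin` is hypothetical.

```python
import numpy as np

def vm_lin(x):
    """Sampling pmf proportional to the magnitude of the unit outputs."""
    mag = np.abs(np.asarray(x, dtype=float))
    total = mag.sum()
    if total == 0:
        raise ValueError("at least one unit output must be non-zero")
    return mag / total

x = np.array([2.0, -1.0, 0.0, 1.0])   # hypothetical hidden-unit outputs
p_lin = vm_lin(x)                     # -> [0.5, 0.25, 0.0, 0.25]
```

Note that the result does not depend on R, which is one reason the approximation can be loose when many draws are taken.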

Approximating (22) with (24) can be loose for large values of , which motivates our next approximation.

4.2 Approximate variance minimization for large

Building on the tight approximation in (22), one can further approximate the variance for large as

where we have used as a tight approximation for . This leads to the minimization

which again is a convex problem, whose solution can be obtained using the KKT conditions that lead to

where is the Lagrange multiplier. Under the simplex constraint on the , this leads to the optimal


with denoting the projection on the positive orthant, and the normalization constant having optimal value

Although the solution to the fixed point condition cannot be obtained in one shot, and may require a few iterations to converge, in practice we only perform it once and settle for the obtained approximate solution.
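A minimal sketch of the large-R (logarithmic) approximation, assuming the cost reduces to sum_i x_i^2 exp(-R p_i) over the simplex, whose KKT conditions give p_i = [ln(x_i^2) - beta]_+ / R. Unlike the single-pass approach described above, this sketch finds the normalization constant beta exactly by sorting, in the same way as a Euclidean projection onto a simplex. The assumed cost and the name `vm_log` are illustrative.

```python
import numpy as np

def vm_log(x, n_draws):
    """p_i = max(ln(x_i^2) - beta, 0) / R, with beta chosen so that sum(p) = 1."""
    x = np.asarray(x, dtype=float)
    p = np.zeros_like(x)
    support = np.flatnonzero(x)          # zero outputs receive zero probability
    if support.size == 0:
        raise ValueError("at least one unit output must be non-zero")
    a = 2.0 * np.log(np.abs(x[support]))             # ln(x_i^2)
    a_sorted = np.sort(a)[::-1]
    k = np.arange(1, a.size + 1)
    beta_k = (np.cumsum(a_sorted) - n_draws) / k     # candidate thresholds
    k_star = np.flatnonzero(a_sorted > beta_k)[-1]   # largest consistent active set
    beta = beta_k[k_star]
    p[support] = np.maximum(a - beta, 0.0) / n_draws
    return p

x = np.array([3.0, 1.0, 0.5, 0.0])   # hypothetical hidden-unit outputs
p_few = vm_log(x, n_draws=2)         # all mass on the largest unit
p_many = vm_log(x, n_draws=10)       # mass spread over all non-zero units
```

With few draws the projection concentrates all probability on the largest output, while more draws spread probability across the smaller non-zero units.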

5 Practical issues

The present section deals with efficient implementation of the proposed approach in practice, and establishes links with state-of-the-art Bayesian detection methods.

5.1 Efficient implementation via non-uniform dropout

The proposed defense builds on modeling the variational pdf using a sampling-with-replacement process. Performing the proposed process however, may incur overhead complexity during inference when compared to the inexpensive dropout alternative outlined in Sec. 2.1. To reduce this complexity, one can implement our approach using efficient approximations, while leveraging the sampling probabilities learned through our uncertainty minimization.

Reflecting on the binary variables that model the pickup of the hidden node in the overall sampling process in (9), one can approximate the joint pmf of as


where random variables are now viewed as approximately independent non-identical Bernoulli variables with parameters ; that is, Bernoulli for , where .
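The independent-Bernoulli approximation can be sketched as a non-uniform dropout layer: each unit is kept independently with probability q_i = 1 - (1 - p_i)^R and rescaled by 1/q_i so that the layer output is unbiased. The function name and the example values are hypothetical.

```python
import numpy as np

def nonuniform_dropout(x, p, n_draws, rng):
    """Keep unit i w.p. q_i = 1 - (1 - p_i)^R; rescale kept units by 1/q_i."""
    x = np.asarray(x, dtype=float)
    q = 1.0 - (1.0 - np.asarray(p, dtype=float)) ** n_draws
    keep = rng.random(x.shape) < q       # independent, non-identical Bernoullis
    out = np.zeros_like(x)
    np.divide(x, q, out=out, where=keep) # dropped units stay at zero
    return out

rng = np.random.default_rng(0)
x = np.array([2.0, -1.0, 0.5])           # hypothetical hidden-unit outputs
p = np.array([0.6, 0.3, 0.1])            # hypothetical sampling pmf
samples = np.stack([nonuniform_dropout(x, p, 4, rng) for _ in range(20000)])
sample_mean = samples.mean(axis=0)       # close to x, since the layer is unbiased
```

This costs one uniform random number per unit, the same as ordinary dropout, and avoids drawing R indices with replacement.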

Although (27) is an approximation, it provides not only insight but also an efficient implementation of the sampling process. In fact, the proposed optimization in (21) can now be viewed as an optimization over the non-uniform dropout probabilities, coupled implicitly through the hyper-parameter , whose selection guarantees a certain level of randomness. This is to be contrasted with finding optimal dropout probabilities, a task requiring grid search over an -dimensional space for layer , where can be hundreds of thousands to millions in CNNs classifying high-quality images. Interestingly, the proposed convex optimization simplifies the high-dimensional grid-search into a scalar root-finding task, whose solution can be efficiently found with super-linear (quadratic) convergence rate.

5.2 Placement and adjustment of the sampling units

It has been argued that CNN layers at different depths can provide extracted features with variable levels of expressiveness [44]. On a par with this, one can envision the defense potential at different depths by incorporating sampling units across say blocks of the network as listed in Tables II and III. In particular, the dropout defense has been mostly utilized at the last layer after flattening [35], whereas here we consider the potential of sampling at earlier layers that has gone mostly under-explored so far. This can in turn result in Bayesian DNN-based classifiers with robustness to adversarial attacks, as optimal sampling at the initial layers may be crucial for correct detection of the adversarial input. We henceforth refer to a DNN (or CNN) equipped with random sampling as the detection network, and the original one without the sampling units as the full network.

Similar to the pick-up probability in uniform dropout, the number of draws in our approach is a hyperparameter that controls the level of randomness present in the detection network. Qualitatively speaking, the fewer units are picked per layer (smaller ), the larger the ‘amount of randomness’ that emerges (the further is from ). This can lead to forward propagating features that are not as informative (under-sampled), meaning not representative of the clean image, and can thus cause unreliable detection. A large , on the other hand, increases the probability of picking up units per layer, which requires a large number of MC realizations for reliable detection; otherwise, small randomness will lead to missed detection. At the extreme, a very large renders the detection and full networks identical, thus leading to unsuccessful detection of adversarial inputs. In a nutshell, there is a trade-off in selecting , potentially different for the initial, middle, and final layers of a given CNN.

Fig. 1 categorizes existing and the novel randomization-based approaches to detecting adversarial inputs.

Bayesian approaches to detecting adversaries in DNNs
  Uniform dropout (Dropout)
  Minimum uncertainty (variance) based (non-uniform sampling)
    Fixed/Deterministic
      Exact (VM-exact)
      Linear apprx. (VM-lin)
      Logarithmic apprx. (VM-log)
    Dynamic
      Linear apprx. (SAP or DVM-lin)
      Logarithmic apprx. (DVM-log)

Fig. 1: Overview of Bayesian adversary detection schemes

Uniform dropout. In this method, units are independently dropped w.p. , and sampled (picked) w.p. .

Non-uniform dropout using variance minimization. Dropout here follows the scheme in subsection 5.1, for which we pursue the following two general cases with deterministic and dynamic probabilities.

(C1) Variance minimization with fixed probabilities. In this case, the image is first passed through the full network to obtain the values of the unit outputs per hidden layer. These are needed to determine the non-uniform dropout probabilities (thus and then the index of the units to sample) via exact, linear, or logarithmic approximations given in (23), (25) and (26), respectively, referred to as VM-exact, VM-lin, and VM-log; see Fig. 2-a.

Despite parallel MC passes in the proposed class of sampling with fixed probabilities (step 3 in Fig. 2-a), the first step still imposes a serial overhead in detection since the wanted probabilities must be obtained using a pass through the full network. Our approach to circumventing this overhead is through approximation using the following class of sampling with dynamic probabilities.

(C2) Variance minimization with dynamic probabilities. Rather than finding the sampling probabilities beforehand, are determined on-the-fly as the image is passed through the detection network with the units sampled per layer. As a result, the observed unit values are random (after passing through at least one unit sampled), and are different across realizations. In order to mitigate solving many optimization problems, variance minimization with dynamic probabilities is only implemented via linear and logarithmic approximations (25) and (26); and are referred to as DVM-lin and DVM-log, respectively; see Fig. 2-b.

Fig. 2: Schematic of the proposed detection schemes: a) detection via deterministic sampling probabilities; b) detection via dynamic sampling probabilities.

Algorithm 1: Adversary detection - fixed
Input : Test image , , and
1 Pass image through full network; find
2 Use to obtain via (23), (25) or (26)
3 for  do
4      Collect output class
5 end for
6 Estimate the mutual information (MI) of
Output : Declare adversary if MI exceeds threshold

Algorithm 2: Adversary detection - dynamic
Input : Test image , , and
1 for  do
2      Collect after passing through the detection network with units picked with dynamic probabilities obtained (exactly or approximately) using the observed values
3 end for
4 Estimate the mutual information (MI) of
Output : Declare adversary if MI exceeds threshold

It is interesting to note that DVM-lin corresponds to the proposed stochastic activation pruning (SAP) in [40], with

where is the output of the -th activation unit of the -th layer in the -th realization for input .

Figure 1 provides an overview of the sampling methods, while Algorithms 1 and 2 outline the two proposed variance minimization-based detection methods in pseudocode.
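The fixed-probability detector of Algorithm 1 can be sketched end to end on a toy two-layer ReLU network with a single sampling unit on the hidden layer. This is a sketch under stated assumptions: the network, the use of the linear approximation (probabilities proportional to output magnitudes), the threshold value, and all function names are hypothetical.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def entropy(prob, eps=1e-12):
    prob = np.clip(prob, eps, 1.0)
    return -np.sum(prob * np.log(prob), axis=-1)

def detect_fixed(x, W1, W2, n_draws, n_mc=100, threshold=0.05, seed=0):
    """Return (is_adversarial, mutual_information) for one input x."""
    rng = np.random.default_rng(seed)
    # Step 1: pass the input through the full network to get unit outputs.
    h = np.maximum(W1 @ x, 0.0)
    mag = np.abs(h)
    if mag.sum() == 0:                   # nothing to sample: no randomness
        return False, 0.0
    # Step 2: fixed sampling pmf (linear approximation) and pick probabilities.
    p = mag / mag.sum()
    q = 1.0 - (1.0 - p) ** n_draws
    inv_q = np.zeros_like(q)
    np.divide(1.0, q, out=inv_q, where=q > 0)
    # Step 3: MC passes through the detection network.
    probs = np.empty((n_mc, W2.shape[0]))
    for r in range(n_mc):
        idx = rng.choice(h.size, size=n_draws, replace=True, p=p)
        z = np.zeros(h.size)
        z[idx] = 1.0                     # OR of the one-hot draws
        probs[r] = softmax(W2 @ (h * z * inv_q))
    # Final step: mutual information of the collected outputs vs. threshold.
    mi = entropy(probs.mean(axis=0)) - entropy(probs).mean()
    return bool(mi > threshold), float(mi)

# Hypothetical toy weights and input.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(3, 16))
x = rng.normal(size=8)
flag, mi = detect_fixed(x, W1, W2, n_draws=8)
```

The dynamic variant would differ only in recomputing the pmf inside the MC loop from the sampled activations of the preceding layers, which matters once more than one sampling unit is present.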

6 Numerical tests

Algorithm 3: Layer-wise minimum variance solver
Input : 
Output : Nonuniform dropout pmf
1 Solve:
2 Using bisection and initialization , find the root for
3 Set

In this section, we test the effectiveness of the proposed Bayesian sampling method for detecting various adversarial attacks on CNNs used for image classification. In order to address the issue raised in [13], classification of the CIFAR10 image dataset using ResNet20 as well as the high-resolution cats-and-dogs images using ResNet34 networks [45] are tested. A short summary of the two networks and datasets can be found in Tables I, II and III. In order to investigate the issue around placement of the sampling units, we will place them after ReLU activation layers in different “blocks” () of the ResNet20 and ResNet34 networks, as listed in Tables II and III. Numerical tests are made available online.

Dataset | image size | # train | # val. | # test
CIFAR10 | 32 x 32 | 50,000 | 2,000 | 8,000
Cats-and-dogs | 224 x 224 | 10,000 | 2,000 | 13,000
TABLE I: CIFAR10 and cats-and-dogs image-classification datasets

name | output-size | 20 layers | #sampling units
Block1 | 32 x 32 | [3 x 3, 16] | 1
Block2 | 32 x 32 | | 6
Block3 | 16 x 16 | | 6
Block4 | 8 x 8 | | 6
Block5 | 1 x 1 | average pool, -d fully conn., | 1
TABLE II: ResNet20 architecture on CIFAR10 dataset

name | output-size | 34 layers | #sampling units
Block1 | 112 x 112 | [7 x 7, 64], 3x3 max-pool |
Block2 | 56 x 56 | | 6
Block3 | 28 x 28 | | 8
Block4 | 14 x 14 | | 12
Block5 | 7 x 7 | | 6
Block6 | 1 x 1 | average pool, -d fc, softmax | 1
TABLE III: ResNet34 architecture on cats-and-dogs dataset

6.1 CIFAR10 dataset

ResNet20 is trained using epochs with minibatches of size . Adversarial inputs are crafted on the corresponding MC network as in [35], using the fast gradient sign method (FGSM) [46], the basic iterative method (BIM) [47], the momentum iterative method (MIM) [48], and the Carlini-and-Wagner (C&W) [14] attacks. Parameters of the attacks as well as test accuracy of the MC network on clean and adversarial inputs are reported in Table IV.

Placement parameter and sampling parameters for variance minimization methods as well as the dropout probability for uniform dropout are selected by cross validation. To clarify the suboptimality gap between the exact and approximate variance minimization with deterministic sampling probabilities, we have cross-validated the parameters for VM-exact, and reused them for VM-lin and VM-log approximates.

The sampling parameter is selected as for the -th layer sampling unit, where denotes the number of non-zero entries, and is the sampling ratio varied in . This selection takes into account the fact that only non-zero samples will be dropped upon not being selected, while zero entries will remain unchanged regardless of the sampling outcome. Since the sampling procedure is modeled with replacement, the fraction may be selected greater than 100%. Probability in uniform dropout is also varied as , and the number of MC runs is .

In order to properly evaluate accuracy in detection of adversarial images, we only aim at detecting the test samples that are correctly classified by the full network, and misclassified after the adversarial perturbation. The detection performance is then reported in terms of the receiver operating characteristic (ROC) curve in Fig. 3, obtained by varying the threshold parameter . The exact area-under-curve values along with parameters are also reported in Tables V and VI, highlighting the improved detection via the proposed variance minimization approach.
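The ROC construction can be sketched as a sweep of the detection threshold over the observed uncertainty scores, with the area under the curve computed by the trapezoidal rule. The function name and the toy scores are hypothetical and are not results from the tests reported here.

```python
import numpy as np

def roc_curve_from_scores(clean_scores, adv_scores):
    """Sweep the threshold; an input is flagged when its score exceeds it."""
    clean = np.asarray(clean_scores, dtype=float)
    adv = np.asarray(adv_scores, dtype=float)
    levels = np.unique(np.concatenate((clean, adv)))[::-1]   # descending
    thresholds = np.concatenate(([np.inf], levels, [-np.inf]))
    fpr = np.array([(clean > t).mean() for t in thresholds])  # false alarms
    tpr = np.array([(adv > t).mean() for t in thresholds])    # detections
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)   # trapezoidal rule
    return fpr, tpr, auc

# Hypothetical uncertainty scores (illustration only).
clean_scores = [0.1, 0.2, 0.3]
adv_scores = [0.6, 0.7, 0.9]
fpr, tpr, auc = roc_curve_from_scores(clean_scores, adv_scores)
```

Perfectly separated scores give an area of one, while scores that cannot be told apart give one half.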

Fig. 3: ROC-curve of different attack-detection sampling schemes on CIFAR10 dataset against different attacks. Panels: a) FGSM attack; b) MIM attack; c) BIM attack; d) FGSM attack; e) MIM attack; f) BIM attack; g) C&W attack; h) Combination attack. Panels a)-c) and d)-f) correspond to two different attack settings.

Furthermore, in order to target more realistic scenarios, where attack generation is unknown and may indeed be crafted via various methods, we have also tested the performance against a “combination attack,” in which adversarial inputs crafted with all 7 attack settings are considered. This indeed corroborates that placement of the sampling units in the fourth block along with careful tuning of the sampling probabilities via VM-exact provides the highest curve against combination of attacks, while its approximations follow in performance, outperforming uniform dropout. For further discussion on sensitivity against parameter selection, see Appendix 8.3.

Attack parameters | norm: ; # iter: 20; : | norm: ; # iter: 20; : | # binary search: 10; # max iter: 20; learning rate: 0.1; initial const.: 10
Class. Acc. | 91.5% | 64.87% | 56.91% | 5.2% | 5.0% | 5.4% | 5.1% | 11.7%
TABLE IV: Attack parameters and test accuracy on clean and adversarial input in CIFAR10 dataset.
FGSM Attack MIM Attack

Method Parameters AUC Parameters AUC Parameters AUC Parameters AUC

81.9 88.7 74.4 81.0
VM-log 79.3 84.3 71.4 78.3
VM-linear 77.9 84.5 71.8 77.7
DVM-log 78.4 83.0 70.3 75.8

79.3 85.3 73.8 79.1
Dropout 77.0