1 Introduction
The unprecedented learning capability of deep neural networks (DNNs) has enabled state-of-the-art performance in diverse tasks such as object recognition and detection [1, 2, 3], speech recognition and language translation [4], and voice synthesis [5], reaching or even surpassing human-level accuracy. Despite this performance, however, recent studies have cast doubt on the reliability of DNNs, as highly accurate networks are shown to be extremely sensitive to carefully crafted inputs designed to fool them [6, 7, 8]. Such fragility can easily lead to sabotage once adversarial entities target critical environments such as autonomous cars [9], automatic speech recognition [10], and face detection [11, 12, 3]. The extreme brittleness of convolutional neural networks (CNNs) for image classification is particularly alarming, since small adversarial perturbations of a clean image, although often imperceptible to the human eye, can lead trained CNNs to classify the adversarial examples incorrectly with high confidence. In particular, the design of powerful adversarial perturbations in environments with different levels of complexity and knowledge about the target CNN, known as white-, grey-, and black-box attacks, has been investigated in several works [7, 13, 14, 15, 16, 17]. These considerations motivate the need for robust and powerful attack detection mechanisms for reliable and safe utilization of DNNs [18].

Defense against adversarial perturbations has been mainly pursued in two broad directions: (i) attack detection, and (ii) attack recovery. Methods in the first category aim at detecting adversarial corruption by classifying the input as clean or adversarial, based on tools as diverse as autoencoders [19], detection subnetworks [20, 21], and dropout units [22]. Methods in the second category, on the other hand, are based on recovery schemes that robustify classification through data preprocessing [23, 24], adversarial training [25, 26, 27], sparsification of the network [28, 29], and Lipschitz regularization [30, 31], to name just a few.
Furthermore, the so-termed overconfidence of DNNs in classifying "out-of-distribution" samples, meaning samples that lie in unexplored regions of the input domain, or even "misclassified" samples, has been unraveled in [32, 33]. This has motivated the need for uncertainty estimation as well as calibration of networks for robust classification. Modern Bayesian neural networks target this issue by modeling the DNN weights as random [34], and estimating the DNN output uncertainty through predictive entropy, variance, or mutual information [35, 36, 22, 37]. The well-known dropout regularization technique is one such approximate Bayesian neural network, now widely used in training and testing of DNNs [38, 39].

Moreover, approaches relying on dropout units have shown promising performance in successfully detecting adversarial attacks where other defense mechanisms fail [13]. In particular, [22] utilizes the randomness of dropout units during the test phase as a defense mechanism, and approximates the classification uncertainty by Monte Carlo (MC) estimation of the output variance. Based on the latter, images with high classification uncertainty are declared adversarial. Recently, the dropout defense has been generalized to nonuniform sampling [40], where entries of the hidden layers are randomly sampled, with probabilities proportional to the entry values. This heuristic sampling of units per layer is inspired by intuitive reasoning: activation units with large entries carry more information and should be sampled more often [40]. However, an analytical understanding and connections with the Bayesian framework have not been investigated.

The goal here is to further expand the understanding of uncertainty estimation in DNNs, and thereby improve the detection of adversarial inputs. The premise is that the inherent distance of the adversarial perturbation from the natural-image manifold will cause the overall network uncertainty to exceed that of the clean image, so that successful detection can be achieved.
To this end, and inspired by [40], we rely on random sampling of units per hidden layer of a pretrained network to introduce randomness. Moreover, by leveraging the Bayesian approach to uncertainty estimation, the overall uncertainty of a given image is quantified in terms of its hidden-layer components. We then formulate the task of adversary detection as uncertainty minimization by optimizing over the sampling probabilities to provide effective detection. Subsequently, we develop an exact solver with superlinear convergence rate, as well as approximate low-complexity solvers, for an efficient layer-by-layer uncertainty minimization scheme. Furthermore, we draw connections with uniform dropout [22] as well as stochastic activation pruning (SAP) [40], and provide an efficient implementation of the novel approach by interpreting it as a nonuniform dropout scheme. Extensive numerical tests on CIFAR-10 and high-quality cats-and-dogs images in the presence of various attack schemes corroborate the importance of our designs of the sampling probabilities, as well as of the placement of sampling units per hidden layer, for improved detection of adversarial inputs.
The rest of the paper is organized as follows. An overview of Bayesian inference and detection in neural networks is provided in Section 2. The proposed class of detectors is introduced in Section 3, and exact as well as low-complexity approximate solvers for the layer-by-layer uncertainty minimization are the subjects of Section 4. Implementation issues are dealt with in Section 5, numerical tests are provided in Section 6, and concluding remarks are discussed in Section 7.

2 Bayesian neural network preliminaries
Bayesian inference is among the powerful tools utilized for analytically understanding and quantifying uncertainty in DNNs [41, 39]. In this section, we provide a short review on the basics of Bayesian neural networks, and move on to the inference phase for adversary detection in Section 2.2, which is of primary interest in this work.
Consider an $L$-layer deep neural network, which maps the input $\mathbf{x}$ to the output $\mathbf{y}$. The weights $\mathcal{W} := \{\mathbf{W}_\ell\}_{\ell=1}^{L}$ are modeled as random variables with prior probability density function (pdf) $p(\mathcal{W})$.

Given training inputs $\mathbf{X}$ and outputs $\mathbf{Y}$, it is assumed that the parameters only depend on these data. As a result, the predictive pdf for a new input $\mathbf{x}^*$ can be obtained via marginalization as [38]

$$p(\mathbf{y}^* \mid \mathbf{x}^*, \mathbf{X}, \mathbf{Y}) = \int p(\mathbf{y}^* \mid \mathbf{x}^*, \mathcal{W})\, p(\mathcal{W} \mid \mathbf{X}, \mathbf{Y})\, d\mathcal{W} \qquad (1)$$

which requires knowing the conditional $p(\mathcal{W} \mid \mathbf{X}, \mathbf{Y})$. The complexity of estimating $p(\mathcal{W} \mid \mathbf{X}, \mathbf{Y})$ motivates the variational inference (VI) approach, where it is replaced by a surrogate pdf $q_{\theta}(\mathcal{W})$ parameterized by $\theta$. For $q_{\theta}(\mathcal{W})$, it is desired to: (D1) closely approximate $p(\mathcal{W} \mid \mathbf{X}, \mathbf{Y})$; and, (D2) provide easy marginalization in (1), either in closed form or empirically. To meet (D1), the surrogate is chosen by minimizing the Kullback-Leibler (KL) divergence $\mathrm{KL}\big(q_{\theta}(\mathcal{W}) \,\|\, p(\mathcal{W} \mid \mathbf{X}, \mathbf{Y})\big)$, which is subsequently approximated by the log-evidence lower bound [42, p. 462]

$$\mathcal{L}(\theta) := \int q_{\theta}(\mathcal{W}) \log p(\mathbf{Y} \mid \mathbf{X}, \mathcal{W})\, d\mathcal{W} - \mathrm{KL}\big(q_{\theta}(\mathcal{W}) \,\|\, p(\mathcal{W})\big) \qquad (2)$$

Finding $\theta$ thus boils down to maximizing the log-evidence lower bound, that is, $\hat{\theta} = \arg\max_{\theta} \mathcal{L}(\theta)$. A common choice for $q_{\theta}(\mathcal{W})$ that also satisfies (D2) is described next.
2.1 Variational inference
A simple yet effective choice for $q_{\theta}(\mathcal{W})$ is a factored form modeling the weights as independent across layers, that is,

$$q_{\theta}(\mathcal{W}) = \prod_{\ell=1}^{L} q_{\theta_\ell}(\mathbf{W}_\ell) \qquad (3)$$

where the $\ell$-th layer with $d_\ell$ hidden units is modeled as

$$\mathbf{W}_\ell = \mathbf{M}_\ell\, \mathrm{diag}(\mathbf{z}_\ell) \qquad (4)$$

in which $\mathbf{M}_\ell$ is a deterministic weight matrix multiplied by a diagonal matrix formed by the binary random vector $\mathbf{z}_\ell \in \{0,1\}^{d_\ell}$, whose entries are drawn from a pmf parameterized by $\theta_\ell$.

If the entries of $\mathbf{z}_\ell$ are i.i.d. Bernoulli with (identical) probability (w.p.) $p$, they effect what is referred to as uniform (across layers and nodes) dropout, which is known to prevent overfitting [41]. Clearly, the parameter set $\{\mathbf{M}_\ell, \theta_\ell\}_{\ell=1}^{L}$ fully characterizes $q_{\theta}(\mathcal{W})$. The dropout probability is preselected in practice, while $\{\mathbf{M}_\ell\}$ can be obtained using the training data by maximizing the log-evidence lower bound in (2). Nonetheless, integration in (2) over all the Bernoulli variables is analytically challenging, while sampling from the Bernoulli pmf is relatively cheap. This prompts approximate yet efficient integration using Monte Carlo estimation. A more detailed account of training Bayesian neural networks can be found in [38, 42, 43]. Moving on, the ensuing subsection deals with detection of adversarial inputs.
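As a concrete illustration of the dropout model in (4), the sketch below (our own, not from the paper) realizes one stochastic layer by masking each input unit with an i.i.d. Bernoulli variable; averaging many Monte Carlo realizations recovers the deterministic layer output in expectation. The inverted-dropout rescaling by $1/p$ is an implementation convention assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_layer(W, h, p, rng):
    # One realization of W_l = M_l diag(z_l): mask each input unit with an
    # i.i.d. Bernoulli(p) variable, rescaling by 1/p so that E[output] = W h.
    z = rng.binomial(1, p, size=W.shape[1])
    return W @ (z * h) / p

W = rng.standard_normal((4, 8))
h = rng.standard_normal(8)

# Monte Carlo average over masks approaches the deterministic output W h
mc = np.mean([dropout_layer(W, h, 0.5, rng) for _ in range(20000)], axis=0)
```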
2.2 Bayesian detection of DNN adversaries
A Bayesian approach to detecting adversarial inputs during the testing phase proceeds by approximating the predictive pdf in (1) using the variational surrogate $q_{\hat{\theta}}(\mathcal{W})$, as

$$p(\mathbf{y}^* \mid \mathbf{x}^*, \mathbf{X}, \mathbf{Y}) \approx \int p(\mathbf{y}^* \mid \mathbf{x}^*, \mathcal{W})\, q_{\hat{\theta}}(\mathcal{W})\, d\mathcal{W} \qquad (5)$$

Deciphering whether a given input $\mathbf{x}^*$ is adversarial entails three steps: (S1) parametric modeling of $q_{\theta}(\mathcal{W})$; (S2) estimating the DNN output uncertainty captured by (5); and (S3) declaring $\mathbf{x}^*$ adversarial if the output uncertainty exceeds a certain threshold, and clean otherwise. These steps are elaborated next.

Step 1: Parametric modeling of $q_{\theta}(\mathcal{W})$. Recall that uniform dropout offers a popular special class of pdfs, and has been employed in adversary detection [22]. Here, we specify the richer model of $q_{\theta}(\mathcal{W})$ in (3) and (4) that will turn out to markedly improve detection performance. Different from uniform dropout, we will allow for (possibly correlated) Bernoulli variables with carefully selected (possibly nonidentical) parameters. Once such a general pmf is obtained, the matrices $\{\mathbf{M}_\ell\}$ are found as follows.

Let $\{\bar{\mathbf{W}}_\ell\}$ be the deterministic weight matrices obtained via non-Bayesian training, which we denote as (TR) (such as backpropagation based on, e.g., a cross-entropy criterion). We will use $\{\bar{\mathbf{W}}_\ell\}$ to specify the mean of the random weight matrices in our Bayesian approach, so that in the mean each random layer matches the mapping applied to the output of the $(\ell-1)$-st layer of the DNN with deterministic weights $\{\bar{\mathbf{W}}_\ell\}$. With $\{\bar{\mathbf{W}}_\ell\}$ available, we first design the pmf of $\mathbf{z}_\ell$; next, we find $\mathbb{E}[\mathbf{z}_\ell]$; and then $\mathbf{M}_\ell$, as

$$\mathbf{M}_\ell = \bar{\mathbf{W}}_\ell\, \big(\mathrm{diag}(\mathbb{E}[\mathbf{z}_\ell])\big)^{+} \qquad (6)$$

where the pseudoinverse $(\cdot)^{+}$ means that inverse entries are replaced with zeros whenever $\mathbb{E}[z_{\ell,i}] = 0$.
Step 2: Quantifying the DNN output uncertainty. Since evaluation of the integral in (5) is prohibitive, one can estimate it using MC sampling. In particular, one can readily obtain MC estimates of (conditional) moments of $\mathbf{y}^*$. For instance, its mean and covariance can be estimated as

$$\hat{\boldsymbol{\mu}} = \frac{1}{T} \sum_{t=1}^{T} \mathbf{y}^{*(t)} \quad \text{and} \quad \hat{\boldsymbol{\Sigma}} = \frac{1}{T} \sum_{t=1}^{T} \big(\mathbf{y}^{*(t)} - \hat{\boldsymbol{\mu}}\big)\big(\mathbf{y}^{*(t)} - \hat{\boldsymbol{\mu}}\big)^{\top} \qquad (7)$$

where $\mathbf{y}^{*(t)}$ is the output of the $t$-th DNN realized through weights $\mathcal{W}^{(t)} \sim q_{\hat{\theta}}(\mathcal{W})$ with input $\mathbf{x}^*$. The predictive variance is the trace of $\hat{\boldsymbol{\Sigma}}$, which we henceforth abbreviate as $\mathcal{U}(\mathbf{x}^*)$. The latter has been used to quantify output uncertainty in [22]. Additional measures of uncertainty will be presented in the next section.
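The MC moment estimates in (7) and the trace-based uncertainty score can be sketched as follows; the toy stochastic forward pass stands in for a dropout-equipped DNN and is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def mc_uncertainty(forward, x, T=100):
    # T stochastic passes; predictive mean and variance score Tr(Sigma_hat) as in (7)
    Y = np.stack([forward(x) for _ in range(T)])
    mean = Y.mean(axis=0)
    var_score = np.trace(np.cov(Y, rowvar=False))
    return mean, var_score

def forward(x):
    # toy stochastic "network": dropout-like mask on the logits
    mask = rng.binomial(1, 0.5, size=x.size) / 0.5
    return softmax(x * mask)

mean, score = mc_uncertainty(forward, np.array([2.0, 0.5, -1.0]), T=500)
```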
Step 3: Detecting adversarial inputs. Given $\mathcal{U}(\mathbf{x}^*)$, detection of adversarial inputs is cast as testing the hypotheses

$$\mathcal{U}(\mathbf{x}^*) \overset{\mathcal{H}_1}{\underset{\mathcal{H}_0}{\gtrless}} \tau \qquad (8)$$

where the null $\mathcal{H}_0$ suggests the absence of adversarial perturbation (variance/uncertainty below the threshold $\tau$), while the alternative $\mathcal{H}_1$ in effect raises a red flag for the presence of an adversarial input (variance/uncertainty above the threshold $\tau$).
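The test in (8) is a simple threshold rule; sweeping the threshold traces out the ROC curves reported later in the experiments. A minimal sketch:

```python
import numpy as np

def detect(scores, tau):
    # Flag H1 (adversarial) whenever the uncertainty statistic exceeds tau
    return np.asarray(scores) > tau

flags = detect([0.2, 1.5, 0.7], tau=1.0)
```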
We will now proceed to introduce our novel variational distribution model targeting improved detection of adversaries based on uncertainty minimization.
3 Minimum Uncertainty based Detection
To design $q_{\theta}(\mathcal{W})$, we will build on and formalize the sampling scheme in [40], which is employed here to specify the joint pmf of the (generally correlated) binary variables $\{z_{\ell,i}\}_{i=1}^{d_\ell}$ per layer $\ell$. To this end, we randomly pick one activation unit out of the $d_\ell$ hidden units of layer $\ell$, and repeat such a random draw $k_\ell$ times with replacement. Let $\tilde{\mathbf{z}}_\ell^{(j)}$ denote the one-hot vector of the $j$-th draw, where each entry $\tilde{z}_{\ell,i}^{(j)}$ is a binary random variable with $\Pr\big(\tilde{z}_{\ell,i}^{(j)} = 1\big) = \pi_{\ell,i}$, and the vector $\boldsymbol{\pi}_\ell$ with nonnegative entries summing up to one specifies the categorical pmf of each draw.

With $\vee$ denoting the elementwise binary OR operation on vectors, we define next the vector

$$\mathbf{z}_\ell = \tilde{\mathbf{z}}_\ell^{(1)} \vee \tilde{\mathbf{z}}_\ell^{(2)} \vee \cdots \vee \tilde{\mathbf{z}}_\ell^{(k_\ell)} \qquad (9)$$

Using $\mathbf{z}_\ell$ as in (9), with $\boldsymbol{\pi}_\ell$ to be selected, enables finding the expectation $\mathbb{E}[\mathbf{z}_\ell]$ and then $\mathbf{M}_\ell$ in (6). The deterministic matrices $\{\mathbf{M}_\ell\}$ along with the variates $\{\mathbf{z}_\ell\}$ provide the desired DNN realizations to estimate the uncertainty as in (7). In turn, this leads to our novel adversarial input detector (cf. (8))

$$\min_{\{\boldsymbol{\pi}_\ell\}} \mathcal{U}\big(\mathbf{x}^*; \{\boldsymbol{\pi}_\ell\}\big) \overset{\mathcal{H}_1}{\underset{\mathcal{H}_0}{\gtrless}} \tau \qquad (10)$$

where the variational parameters $\{\boldsymbol{\pi}_\ell\}$ are sought such that the uncertainty is minimized under $\mathcal{H}_0$.
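The sampling process behind (9) can be simulated directly: draw $k_\ell$ unit indices with replacement from the categorical pmf and OR the one-hot draws. The sketch below (illustrative names) also checks empirically the induced marginal $\Pr(z_i = 1) = 1 - (1 - \pi_i)^{k}$ used later in Section 3.3.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_mask(pi, k, rng):
    # k categorical draws with replacement, OR-ed into a binary keep-mask z as in (9)
    idx = rng.choice(pi.size, size=k, replace=True, p=pi)
    z = np.zeros(pi.size, dtype=int)
    z[idx] = 1
    return z

pi = np.array([0.7, 0.2, 0.1])
z = sample_mask(pi, k=5, rng=rng)
```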
The rationale behind our detector in (10) is that, given $\mathbf{x}^*$, minimizing the uncertainty (test statistic) under $\mathcal{H}_0$ reduces the probability of false alarms. The probability of detection, however, depends on the pdf of the test statistic under $\mathcal{H}_1$, in which the adversarial perturbation is unknown in practice. The premise here is that, due to network instability under $\mathcal{H}_1$, the sought probabilities will not reduce the uncertainty under $\mathcal{H}_1$ as effectively; thus the performance of (10) will be better than that of (8). To corroborate this, efficient solvers for the proposed minimization task, and extensive tests in lieu of analytical metrics, are in order.

3.1 Uncertainty measures
In order to carry out the hypothesis test in (10), one has options for the uncertainty measure other than the conditional variance. For DNNs designed for classification, mutual information has recently been proposed as a measure of uncertainty [35]

$$\mathcal{I}\big(\mathbf{x}^*; \{\boldsymbol{\pi}_\ell\}\big) = H\Big(\frac{1}{T} \sum_{t=1}^{T} \mathbf{y}^{*(t)}\Big) - \frac{1}{T} \sum_{t=1}^{T} H\big(\mathbf{y}^{*(t)}\big) \qquad (11)$$

where the superscript $(t)$ indexes the pass of input $\mathbf{x}^*$ through the $t$-th DNN realization with corresponding random output $\mathbf{y}^{*(t)}$ in a $C$-class classification task, and $H(\cdot)$ is the entropy function (the entropy functions in (11) are also parameterized by $\{\boldsymbol{\pi}_\ell\}$, but we abbreviate this dependence here)

$$H(\mathbf{p}) = -\sum_{c=1}^{C} p_c \log p_c \qquad (12)$$
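With the MC softmax outputs stacked row-wise, the mutual information in (11) with the entropy in (12) takes a few lines: agreement across realizations gives zero MI, while disagreement raises it.

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)  # guard the log at zero probabilities
    return -np.sum(p * np.log(p), axis=-1)

def mutual_information(P):
    # MI as in (11): entropy of the mean prediction minus mean per-pass entropy
    return entropy(P.mean(axis=0)) - entropy(P).mean()

P_agree = np.array([[0.9, 0.1], [0.9, 0.1]])   # identical passes -> MI = 0
P_split = np.array([[0.9, 0.1], [0.1, 0.9]])   # disagreeing passes -> MI > 0
```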
The test statistic in (10) requires finding $\{\boldsymbol{\pi}_\ell\}$ by solving

$$\min_{\{\boldsymbol{\pi}_\ell\}} \mathcal{I}\big(\mathbf{x}^*; \{\boldsymbol{\pi}_\ell\}\big) \qquad (13)$$

which is highly nonconvex. However, using Taylor's expansion of the logarithmic terms in (12), one can approximate the mutual information in (11) by the variance score in (10), where the conditioning on $\mathbf{x}^*$ has been dropped for brevity [35]. As a result, the optimization in (13) is approximated as

$$\min_{\{\boldsymbol{\pi}_\ell\}} \mathrm{Tr}\big(\hat{\boldsymbol{\Sigma}}(\{\boldsymbol{\pi}_\ell\})\big) \qquad (14)$$
To solve (14), one needs to express the objective explicitly in terms of the optimization variables for all layers. To this end, the following section studies a two-layer network, whose result will then be generalized to deeper models.
3.2 Simplification of the predictive variance
Aiming at a convenient expression for the cost in (14), consider first a two-layer network with input-output (I/O) relationship (the derivations in this section carry over readily to more general I/O models with additional deterministic layers)

$$\mathbf{y} = \sigma\big(\mathbf{W}_2\, g(\mathbf{W}_1 \mathbf{x})\big) \qquad (15)$$

where $\mathbf{W}_1$ and $\mathbf{W}_2$ are random matrices corresponding to the weights of the two layers as in (6), $\sigma(\cdot)$ is the softmax memoryless nonlinearity with $[\sigma(\mathbf{a})]_c = e^{a_c} / \sum_{c'=1}^{C} e^{a_{c'}}$, and the inner $g(\cdot)$ in (15) models a general differentiable nonlinearity such as tanh. Although differentiability of the nonlinearities is needed for the derivations in this section, the general idea will later be tested on networks with nondifferentiable nonlinearities (such as ReLU) in the experiments.

Given trained weights $\{\bar{\mathbf{W}}_1, \bar{\mathbf{W}}_2\}$, and using (4) and (6), the random weight matrices are found as

$$\mathbf{W}_\ell = \bar{\mathbf{W}}_\ell\, \big(\mathrm{diag}(\mathbb{E}[\mathbf{z}_\ell])\big)^{+}\, \mathrm{diag}(\mathbf{z}_\ell), \quad \ell = 1, 2 \qquad (16)$$

where $\big(\mathrm{diag}(\mathbb{E}[\mathbf{z}_\ell])\big)^{+}\, \mathrm{diag}(\mathbf{z}_\ell)$ is the random sampling matrix with pseudoinverse diagonal mean. Since this sampling matrix has mean equal to the identity on the support of $\mathbb{E}[\mathbf{z}_\ell]$, the mean of $\mathbf{W}_\ell$ does not depend on $\boldsymbol{\pi}_\ell$, while its higher-order moments do.
Proposition 1. For the two-layer network in (15), the proposed minimization in (14) can be approximated by

(17)

where $c$ is a constant. The solution of (17) proceeds in two steps: the pmf vector $\boldsymbol{\pi}_1$ is first found by minimizing the variance score of the first layer; the pmf vector $\boldsymbol{\pi}_2$ is then found by minimizing the regularized variance of the second layer, where the regularization scalar depends on $\boldsymbol{\pi}_1$.

Proof. See Appendix 8.1.

Remark. The cost in (17) approximates that in (14) by casting the overall uncertainty minimization as a weighted sum of layerwise variances. In particular, $\boldsymbol{\pi}_1$ is the sampling probability vector that minimizes the variance score of the first layer. It subsequently influences the regularization scalar in the minimization of the second-layer variance, which yields the pmf vector $\boldsymbol{\pi}_2$. This can be inductively generalized to $L$ layers. As $L$ increases, however, so does the number of cross terms. For simplicity and scalability, we will further approximate the per-layer minimization by dropping the regularization term, which leads to a separable optimization across layers. This is an intuitively pleasing relaxation, because the layerwise variance is minimized under $\mathcal{H}_0$, which also minimizes the regularization weight.
The resulting non-regularized approximant of step 2, generalized to the $\ell$-th layer of an $L$-layer DNN, is

$$\min_{\boldsymbol{\pi}_\ell} \mathrm{Tr}\,\mathrm{Cov}\big(\mathbf{W}_\ell \mathbf{h}_{\ell-1}\big) \qquad (18)$$

where $\mathbf{h}_{\ell-1}$ is the output of the $(\ell-1)$-st layer, regardless of the pmf vectors of the other layers.
3.3 Layerwise variance minimization
Here we will solve the layerwise variance minimization in (18). Using (16), the cost can be upper bounded by

$$\mathrm{Tr}\,\mathrm{Cov}\big(\mathbf{W}_\ell \mathbf{h}_{\ell-1}\big) \le \sum_{i=1}^{d_\ell} a_{\ell,i}\, \frac{1 - p_{\ell,i}}{p_{\ell,i}} \qquad (19)$$

where $a_{\ell,i} \ge 0$ collects the squared activation and weight energy of the $i$-th unit, and the expression follows because the $k_\ell$ draws are i.i.d. with replacement, so the binary random variables $z_{\ell,i}$ reduce to Bernoulli ones with parameter $p_{\ell,i} = 1 - (1 - \pi_{\ell,i})^{k_\ell}$; hence, it holds that $\mathbb{E}[z_{\ell,i}] = p_{\ell,i}$ and $\mathbb{E}[z_{\ell,i}^2] = p_{\ell,i}$, which implies that $\mathrm{Var}(z_{\ell,i}) = p_{\ell,i}(1 - p_{\ell,i})$. Dropping the constant term, the layerwise variance minimization becomes

$$\min_{\boldsymbol{\pi}_\ell \in \Delta_{d_\ell}} \sum_{i=1}^{d_\ell} \frac{a_{\ell,i}}{1 - (1 - \pi_{\ell,i})^{k_\ell}} \qquad (20)$$

where $\Delta_{d_\ell}$ denotes the probability simplex.
4 Solving the layer-by-layer minimization
Consider rewriting the layerwise variance minimization in (20) in the general form

$$\min_{\boldsymbol{\pi} \in \Delta_d} \sum_{i=1}^{d} \frac{a_i}{1 - (1 - \pi_i)^{k}} \qquad (21)$$

where $a_i \ge 0$, $k$, and $d$ correspond to the $\ell$-th layer. Over the feasible set (the probability simplex), the cost in (21) has a positive semidefinite Hessian; thus, it is convex, and can be solved by projected gradient descent iterations. However, $\boldsymbol{\pi}$ lies in a probability simplex of dimension $d$, the number of hidden nodes in the given layer, which is typically very large. The large number of variables, together with possible ill-conditioning, can slow down the convergence rate.
To obtain a solver with quadratic convergence rate, we build on the fact that $d$ is usually very large, which implies that $\pi_i \ll 1$ in the practical setting at hand. Using the inequality $(1 - \pi_i)^k \le e^{-k \pi_i}$, the cost in (21) can then be tightly upper-bounded, which leads to majorizing (21) as

$$\min_{\boldsymbol{\pi} \in \Delta_d} \sum_{i=1}^{d} \frac{a_i}{1 - e^{-k \pi_i}} \qquad (22)$$
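A quick numerical check of the majorization behind (22): since $(1-\pi)^k \le e^{-k\pi}$, the surrogate never underestimates the original cost term, and over this range of $\pi$ the gap stays small (the constants below are illustrative, with $a_i = 1$).

```python
import numpy as np

pi = np.linspace(0.001, 0.5, 200)
k = 10
orig = 1.0 / (1.0 - (1.0 - pi) ** k)   # cost term in (21), with a_i = 1
surr = 1.0 / (1.0 - np.exp(-k * pi))   # majorized term in (22)
gap = surr - orig                      # nonnegative: surrogate upper-bounds the cost
```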
The KKT conditions yield the optimal solution of the convex problem in (22), as summarized next.
Proposition 2. The optimization in (22) can be solved with quadratic convergence rate, and the optimum is given by

(23)

where the Lagrange multiplier is the solution to a scalar root-finding problem that enforces the simplex constraint $\sum_{i=1}^{d} \pi_i = 1$.
Proof. See Appendix 8.2.
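The mechanism in Proposition 2, reducing a simplex-constrained convex program to scalar root-finding in the Lagrange multiplier, can be illustrated with the standard simplex-projection problem (the paper's exact objective differs; this sketch only shows the KKT-to-root-finding pattern, solved here by bisection rather than Newton's method).

```python
import numpy as np

def project_simplex(v, iters=60):
    # KKT for min ||p - v||^2 s.t. p >= 0, sum(p) = 1 gives p_i = max(v_i - nu, 0),
    # where the scalar nu solves sum_i max(v_i - nu, 0) = 1 (monotone decreasing in nu).
    lo, hi = v.min() - 1.0, v.max()
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if np.maximum(v - nu, 0.0).sum() > 1.0:
            lo = nu   # constraint sum exceeds 1: increase the multiplier
        else:
            hi = nu
    return np.maximum(v - 0.5 * (lo + hi), 0.0)

p = project_simplex(np.array([0.9, 0.6, 0.1]))
```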
4.1 Approximate variance minimization for small $k$

For small values of $k \pi_i$, it holds that $1 - (1 - \pi_i)^k \approx k \pi_i$; hence, the Bernoulli parameter can be approximated by its upper bound $k \pi_i$. With this, we can approximate the cost in (20) as

$$\min_{\boldsymbol{\pi} \in \Delta_d} \sum_{i=1}^{d} \frac{a_i}{k \pi_i} \qquad (24)$$

Using the Lagrangian and the KKT conditions, we then find the minimizer, which for the $\ell$-th layer is expressible as

$$\pi_{\ell,i} = \frac{\sqrt{a_{\ell,i}}}{\sum_{j=1}^{d_\ell} \sqrt{a_{\ell,j}}} \qquad (25)$$
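Under the small-$k$ approximation, the simplex-constrained surrogate $\sum_i a_i/(k\pi_i)$ admits the closed-form minimizer $\pi_i \propto \sqrt{a_i}$ (by Cauchy-Schwarz/KKT). The snippet below is a sketch with illustrative weights $a_i$; if $a_i$ is taken as a squared activation (an assumption here), the rule reduces to sampling proportionally to activation magnitude.

```python
import numpy as np

def vm_lin_probs(a):
    # Minimizer of sum_i a_i / pi_i over the probability simplex: pi_i ∝ sqrt(a_i)
    w = np.sqrt(np.asarray(a, dtype=float))
    return w / w.sum()

a = [9.0, 1.0, 0.0, 1.0]
pi = vm_lin_probs(a)
```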
4.2 Approximate variance minimization for large $k$

Building on the tight approximation in (22), one can further approximate the variance for large $k \pi_i$ as

$$\sum_{i=1}^{d} a_i \big(1 + e^{-k \pi_i}\big)$$

where we have used $1/(1 - \epsilon) \approx 1 + \epsilon$ as a tight approximation for small $\epsilon = e^{-k \pi_i}$. This leads to the minimization

$$\min_{\boldsymbol{\pi} \in \Delta_d} \sum_{i=1}^{d} a_i\, e^{-k \pi_i}$$

which again is a convex problem, whose solution can be obtained using the KKT conditions, which lead to

$$a_i\, k\, e^{-k \pi_i} = \nu \quad \text{whenever } \pi_i > 0$$

where $\nu$ is the Lagrange multiplier. Under the simplex constraint on $\boldsymbol{\pi}$, this leads to the optimal

$$\pi_i = \Big[\frac{1}{k} \ln \frac{a_i k}{\nu}\Big]_{+} \qquad (26)$$

with $[\cdot]_{+}$ denoting the projection onto the positive orthant, and the normalization constant $\nu$ having optimal value satisfying the fixed-point condition $\sum_{i=1}^{d} \big[\tfrac{1}{k} \ln \tfrac{a_i k}{\nu}\big]_{+} = 1$. Although the solution to this fixed-point condition cannot be obtained in one shot, and may require a few iterations to converge, in practice we only perform one iteration and settle for the obtained approximate solution.
5 Practical issues
The present section deals with efficient implementation of the proposed approach in practice, and establishes links with state-of-the-art Bayesian detection methods.
5.1 Efficient implementation via nonuniform dropout
The proposed defense builds on modeling the variational pdf using a sampling-with-replacement process. The proposed process, however, may incur overhead complexity during inference when compared to the inexpensive dropout alternative outlined in Section 2.1. To reduce this complexity, one can implement our approach using efficient approximations, while leveraging the sampling probabilities learned through our uncertainty minimization.
Reflecting on the binary variables $z_{\ell,i}$ that model the pick-up of the $i$-th hidden node in the overall sampling process in (9), one can approximate the joint pmf of $\mathbf{z}_\ell$ as

$$q(\mathbf{z}_\ell) \approx \prod_{i=1}^{d_\ell} q_i(z_{\ell,i}) \qquad (27)$$

where the random variables $z_{\ell,i}$ are now viewed as approximately independent, nonidentical Bernoulli variables; that is, $z_{\ell,i} \sim \mathrm{Bernoulli}(p_{\ell,i})$ with $p_{\ell,i} = 1 - (1 - \pi_{\ell,i})^{k_\ell}$.
Although (27) is an approximation, it provides insight as well as an efficient implementation of the sampling process. In fact, the proposed optimization in (21) can now be viewed as an optimization over nonuniform dropout probabilities, coupled implicitly through the hyperparameter $k$, whose selection guarantees a certain level of randomness. This is to be contrasted with finding optimal dropout probabilities directly, a task requiring grid search over a $d_\ell$-dimensional space for layer $\ell$, where $d_\ell$ can be hundreds of thousands to millions in CNNs classifying high-quality images. Interestingly, the proposed convex optimization simplifies this high-dimensional grid search into a scalar root-finding task, whose solution can be found efficiently with superlinear (quadratic) convergence rate.
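The independent-Bernoulli approximation in (27) yields the cheap implementation: draw one uniform variate per unit and keep it with probability $1-(1-\pi_i)^{k}$. A sketch with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(3)

def nonuniform_dropout(h, pi, k, rng):
    # Keep unit i w.p. q_i = 1 - (1 - pi_i)^k, approximating the k-draw scheme in (9)
    q = 1.0 - (1.0 - pi) ** k
    mask = rng.random(pi.size) < q
    return h * mask

h = np.ones(4)
pi = np.array([0.5, 0.3, 0.2, 0.0])
out = nonuniform_dropout(h, pi, k=3, rng=rng)
```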
5.2 Placement and adjustment of the sampling units
It has been argued that CNN layers at different depths provide extracted features with variable levels of expressiveness [44]. On a par with this, one can envision the defense potential at different depths by incorporating sampling units across, say, blocks of the network as listed in Tables II and III. In particular, the dropout defense has mostly been utilized at the last layer after flattening [35], whereas here we consider the potential of sampling at earlier layers, which has gone mostly underexplored so far. This can in turn result in Bayesian DNN-based classifiers with robustness to adversarial attacks, as optimal sampling at the initial layers may be crucial for correct detection of the adversarial input. We henceforth refer to a DNN (or CNN) equipped with random sampling as the detection network, and to the original one without the sampling units as the full network.

Similar to the pick-up probability in uniform dropout, the number of draws $k$ in our approach is a hyperparameter that controls the level of randomness present in the detection network. Qualitatively speaking, the smaller the number of units picked per layer (smaller $k$), the larger the amount of randomness that emerges. This can lead to forward-propagating less informative (undersampled) features, meaning features not representative of the clean image, and can thus cause unreliable detection. A large $k$, on the other hand, increases the probability of picking up each unit per layer, which requires a large number of MC realizations for reliable detection; otherwise, the small randomness will lead to missed detections. At the extreme, a very large $k$ renders the detection and full networks identical, thus leading to unsuccessful detection of adversarial inputs. In a nutshell, there is a tradeoff in selecting $k$, potentially different for the initial, middle, and final layers of a given CNN.
Fig. 1 categorizes the existing and the novel randomization-based approaches to detecting adversarial inputs.

Uniform dropout. In this method, units are independently dropped w.p. $1-p$ and sampled (picked) w.p. $p$.

Nonuniform dropout using variance minimization. Dropout here follows the scheme in subsection 5.1, for which we pursue the following two general cases with fixed and dynamic probabilities.

(C1) Variance minimization with fixed probabilities. In this case, the image is first passed through the full network to obtain the values of the unit outputs per hidden layer. These are needed to determine the nonuniform dropout probabilities (thus $\{p_{\ell,i}\}$, and then the indices of the units to sample) via the exact, linear, or logarithmic approximations given in (23), (25) and (26), referred to as VM-exact, VM-lin, and VM-log, respectively; see Fig. 2a.
Despite the parallel MC passes in the proposed class of sampling with fixed probabilities (step 3 in Fig. 2a), the first step still imposes a serial overhead in detection, since the wanted probabilities must be obtained through a pass over the full network. Our approach to circumventing this overhead is approximation via the following class of sampling with dynamic probabilities.

(C2) Variance minimization with dynamic probabilities. Rather than finding the sampling probabilities beforehand, they are determined on-the-fly as the image is passed through the detection network with the units sampled per layer. As a result, the observed unit values are random (after passing through at least one sampled unit), and differ across realizations. In order to avoid solving many optimization problems, variance minimization with dynamic probabilities is only implemented via the linear and logarithmic approximations (25) and (26), referred to as DVM-lin and DVM-log, respectively; see Fig. 2b.

It is interesting to note that DVM-lin corresponds to the stochastic activation pruning (SAP) proposed in [40], with

$$\pi_{\ell,i}^{(t)} = \frac{\big|h_{\ell,i}^{(t)}\big|}{\sum_{j=1}^{d_\ell} \big|h_{\ell,j}^{(t)}\big|}$$

where $h_{\ell,i}^{(t)}$ is the output of the $i$-th activation unit of the $\ell$-th layer in the $t$-th realization for input $\mathbf{x}$.
6 Numerical tests
In this section, we test the effectiveness of the proposed Bayesian sampling method for detecting various adversarial attacks on CNNs used for image classification. In order to address the issue raised in [13], we test classification of the CIFAR-10 image dataset using a ResNet-20 network, as well as of high-resolution cats-and-dogs images using a ResNet-34 network [45]. A short summary of the two networks and datasets can be found in Tables I, II and III. In order to investigate the placement of the sampling units, we place them after ReLU activation layers in different "blocks" of the ResNet-20 and ResNet-34 networks, as listed in Tables II and III. Numerical tests are made available online at https://github.com/FatemehSheikholeslami/varianceminimization.
Table I: Datasets.

Dataset        | image size | # train | # val. | # test
CIFAR-10       | 32 x 32    | 50,000  | 2,000  | 8,000
Cats-and-dogs  | 224 x 224  | 10,000  | 2,000  | 13,000
Table II: ResNet-20 blocks and number of sampling units.

name   | output size | 20 layers                          | # sampling units
Block1 | 32 x 32     | [3 x 3, 16]                        | 1
Block2 | 32 x 32     |                                    | 6
Block3 | 16 x 16     |                                    | 6
Block4 | 8 x 8       |                                    | 6
Block5 | 1 x 1       | average pool, fully conn., softmax | 1
Table III: ResNet-34 blocks and number of sampling units.

name   | output size | 34 layers                   | # sampling units
Block1 | 112 x 112   | [7 x 7, 64], 3 x 3 max pool | 2
Block2 | 56 x 56     |                             | 6
Block3 | 28 x 28     |                             | 8
Block4 | 14 x 14     |                             | 12
Block5 | 7 x 7       |                             | 6
Block6 | 1 x 1       | average pool, fc, softmax   | 1
6.1 CIFAR-10 dataset
ResNet-20 is trained with mini-batch gradient descent. Adversarial inputs are crafted against the corresponding MC network as in [35], using the fast gradient sign method (FGSM) [46], the basic iterative method (BIM) [47], the momentum iterative method (MIM) [48], and the Carlini-and-Wagner (C&W) [14] attacks. Parameters of the attacks, as well as the test accuracy of the MC network on clean and adversarial inputs, are reported in Table IV.
The placement parameter and the sampling parameters for the variance minimization methods, as well as the dropout probability for uniform dropout, are selected by cross-validation. To clarify the suboptimality gap between the exact and approximate variance minimization with deterministic sampling probabilities, we have cross-validated the parameters for VM-exact, and reused them for the VM-lin and VM-log approximations.

The sampling parameter for the $\ell$-th layer sampling unit is set proportional to the number of nonzero entries of the layer input, scaled by a sampling ratio; this choice accounts for the fact that only nonzero samples are dropped upon not being selected, while zero entries remain unchanged regardless of the sampling outcome. Since the sampling procedure is modeled with replacement, the sampling ratio may be selected greater than 100%. The probability in uniform dropout is likewise varied over a grid, and several MC runs are used.
In order to properly evaluate detection accuracy, we only aim at detecting the test samples that are correctly classified by the full network and misclassified after the adversarial perturbation. The detection performance is then reported in terms of the receiver operating characteristic (ROC) curve in Fig. 3, obtained by varying the threshold parameter $\tau$. The exact area-under-curve values along with the parameters are also reported in Tables V and VI, highlighting the improved detection achieved via the proposed variance minimization approach.
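The area-under-curve values reported in the tables can be computed without an explicit threshold sweep: the area under the ROC equals the probability that a random adversarial input receives a higher uncertainty score than a random clean one. A self-contained sketch:

```python
import numpy as np

def auc(clean_scores, adv_scores):
    # Rank-sum form of the ROC AUC: P(adv score > clean score), ties counted 1/2
    c = np.asarray(clean_scores)[:, None]
    a = np.asarray(adv_scores)[None, :]
    return float(np.mean((a > c) + 0.5 * (a == c)))

score = auc([0.1, 0.2, 0.3], [0.25, 0.4, 0.5])
```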
Furthermore, in order to target more realistic scenarios, where the attack generation mechanism is unknown and attacks may indeed be crafted via various methods, we have also tested the performance against a "combination attack," in which adversarial inputs crafted with all seven attack settings are considered. The results corroborate that placement of the sampling units in the fourth block, along with careful tuning of the sampling probabilities via VM-exact, provides the highest ROC curve against the combination of attacks, while its approximations follow in performance, outperforming uniform dropout. For further discussion on sensitivity to parameter selection, see Appendix 8.3.
Table IV: Attack parameters and classification accuracy of the MC network on clean and adversarial inputs.

Attack      | clean | FGSM              | BIM                           | MIM                           | C&W
parameters  | --    | two norm settings | two norm settings, # iter: 20 | two norm settings, # iter: 20 | # binary search: 10, # max iter: 20, learning rate: 0.1, initial const.: 10
Class. Acc. | 91.5% | 64.87% / 56.91%   | 5.2% / 5.0%                   | 5.4% / 5.1%                   | 11.7%
Table V: AUC (%) of the detectors under FGSM and MIM attacks. The two AUC columns per attack correspond to the two attack settings of Table IV.

Sampling method | FGSM AUC | FGSM AUC | MIM AUC | MIM AUC
VM-exact        | 81.9     | 88.7     | 74.4    | 81.0
VM-log          | 79.3     | 84.3     | 71.4    | 78.3
VM-linear       | 77.9     | 84.5     | 71.8    | 77.7
DVM-log         | 78.4     | 83.0     | 70.3    | 75.8
SAP             | 79.3     | 85.3     | 73.8    | 79.1
Dropout         | 77.0     |          |         |