
Theoretical evidence for adversarial robustness through randomization: the case of the Exponential family

by   Rafael Pinot, et al.

This paper investigates the theory of robustness against adversarial attacks. It focuses on the family of randomization techniques that consist in injecting noise in the network at inference time. These techniques have proven effective in many contexts, but lack theoretical arguments. We close this gap by presenting a theoretical analysis of these approaches, hence explaining why they perform well in practice. More precisely, we provide the first result relating the randomization rate to robustness to adversarial attacks. This result applies for the general family of exponential distributions, and thus extends and unifies the previous approaches. We support our theoretical claims with a set of experiments.





1 Introduction

Adversarial attacks are some of the most puzzling and burning issues in modern machine learning. An adversarial attack refers to a small, imperceptible change of an input maliciously designed to fool the result of a machine learning algorithm. Since the seminal work of (Szegedy et al., 2014) exhibiting this intriguing phenomenon in the context of deep learning, a wealth of results have been published on designing attacks (Goodfellow et al., 2015; Papernot et al., 2016a; Moosavi-Dezfooli et al., 2016; Kurakin et al., 2016; Carlini & Wagner, 2017; Moosavi-Dezfooli et al., 2017) and defenses (Goodfellow et al., 2015; Papernot et al., 2016b; Guo et al., 2018; Meng & Chen, 2017; Samangouei et al., 2018; Madry et al., 2018), or on trying to understand the very nature of this phenomenon (Fawzi et al., 2018b; Simon-Gabriel et al., 2018; Fawzi et al., 2018a, 2016).

Among the defense strategies, randomization has proven effective in many contexts. It consists in injecting random noise (both during training and inference phases) inside the network architecture, i.e. at a given layer of the network, including the input. The noise can be drawn either from Gaussian (Liu et al., 2018; Lecuyer et al., 2018; Rakin et al., 2018), Laplace (Lecuyer et al., 2018), Uniform with fixed parameter (Xie et al., 2018), or Multinomial with fixed parameter (Dhillon et al., 2018) distributions. All these works, despite coming with relevant ideas and showing good empirical results, lack theoretical justifications and guarantees. It is also worth noting that almost all the noise distributions used belong to the Exponential family. This raises the following questions: why does randomization work well in practice? What is fundamental about using the Exponential family? And to what extent does noise drawn from the Exponential family preserve robustness (in a sense to be defined) to adversarial attacks? Our work answers these questions. The following informal theorem (see Theorem 3 for a more formal version) summarizes our main result:

Theorem (Main result-informal).

Let us consider

a Neural Network with

layers. For any and for any perturbation we denote the Network truncated at the -th layer, and the sensitivity of to . If at prediction time, we inject noise drawn from an Exponential family with parameter , then the sensitivity of the network is in with a non-decreasing function. Therefore, for a small enough (in norm) parameter , the effect of the perturbation to the network can be made negligible.

This theorem provides a sensitivity bound on the output of the network for two close inputs. In that sense, it provides stability guarantees to the network: two close inputs will have close output distributions. To prove this theorem, several milestones must be reached. First, we need to introduce a definition of robustness to adversarial attacks that is suitable for the randomization defense mechanism. As this mechanism can be mathematically described as a non-deterministic querying process, called probabilistic mapping in the sequel, we propose a formal definition of robustness relying on a metric/divergence between probability measures. A key question then arises about the appropriate metric/divergence for our context. This requires tools for comparing divergences with respect to the introduced robustness definition. The Renyi divergence turns out to be a measure of choice, since it satisfies most of the desired properties.

The outline of the paper is as follows: Section 2 presents the related work on randomized defense mechanisms. Section 3 details our main results. It introduces a definition for measuring robustness to adversarial attacks in Subsection 3.2, discusses the choice of divergence measures in Subsection 3.3, and states our main theorem in Subsection 3.4. Finally, Section 4 presents experiments supporting our theoretical claims. For conciseness, proofs are deferred to the supplementary material.

2 Related work

Even though several explicit or implicit definitions can be found in the literature, e.g. (Szegedy et al., 2014; Fawzi et al., 2018a; Bubeck et al., 2018), there is no broadly accepted definition of robustness to adversarial attacks. Recently, (Diochnos et al., 2018) proposed general definitions and a taxonomy of these. The authors divide the definitions from the literature into three categories: error-region, prediction-change and corrupted-instance. In this paper we introduce a definition of robustness that generalizes the prediction-change one, in the sense that it relies on probabilistic mappings in arbitrary metric spaces, and is not restricted to classification tasks, as discussed in the sequel.

Noise injection into algorithms to enhance robustness has been used for ages in detection and signal processing tasks, for example with a physical phenomenon called “stochastic resonance” (Zozor & Amblard, 1999; Chapeau-Blondeau & Rousseau, 2004; Mitaim & Kosko, 1998). It has also been extensively studied in several machine learning and optimization fields, e.g. robust optimization (Ben-Tal et al., 2009) and data augmentation techniques (Perez & Wang, 2017). Recently, noise injection techniques have been adopted by the adversarial defense community, especially for neural networks, with very promising results. The first technique explicitly using randomization at inference time as a defense appeared in 2017 during the NIPS defense challenge (Xie et al., 2018). This method uniformly samples from over 12000 geometric transformations of the image to select a substitute image to feed the network. Then (Dhillon et al., 2018) proposed to use stochastic activation pruning based on a multinomial distribution for adversarial defense.

Gaussian noise injection has also been well investigated. Recent papers (Liu et al., 2018; Rakin et al., 2018) propose to inject Gaussian noise directly on the activation of selected layers, both at training and inference time. In (Lecuyer et al., 2018), the authors proposed a randomization method that exploits the link between differential privacy (Dwork et al., 2014) and adversarial robustness. Their framework, which inherits some theoretical results from the differential privacy literature, is based on injecting Laplace or Gaussian noise at training and inference time. In general, noise drawn from continuous distributions is used to alter the activation of one or more layers, whereas noise drawn from discrete distributions is used to alter either the image or the architecture of the network. Although efficient in practice, these methods lack theoretical arguments for every part of the procedure (when and where to inject noise, what noise to use, etc.).

Since the initial discovery of adversarial examples, a wealth of non-randomized defense approaches have been proposed, inspired by various machine learning domains such as image reconstruction (Meng & Chen, 2017; Samangouei et al., 2018) or robust learning (Goodfellow et al., 2015; Madry et al., 2018). Even if these methods have their own merits, they fall short of defending against universal attacks. We hypothesize that the randomization strategy is the principled one, hence motivating the current study.

3 Using Exponential family for adversarial robustness

In the remainder of the paper, we consider a measurable metric input space $(\mathcal{X}, d_{\mathcal{X}})$. We denote by $\|\cdot\|$ a norm on $\mathcal{X}$, and we suppose the inputs are samples from a distribution $\mathcal{D}$.

3.1 Adversarial attacks problem

Let us consider a classification task over $\mathcal{X}$ (note that the definition of robustness we provide generalizes to other tasks). A data point $x \in \mathcal{X}$ has a true label $y \in \mathcal{Y}$. Let $h: \mathcal{X} \to \mathcal{Y}$ be a trained classifier over $\mathcal{X}$. The problem of generating an adversarial example $x + \tau$ from an input $x$ writes:

$$\operatorname*{argmin}_{\tau \in \mathcal{X}} \left\{ \|\tau\| \ \text{s.t.}\ h(x + \tau) = y_t \right\} \qquad (1)$$

where $y_t$ is the target class (with $y_t \neq y$). Equation (1) presents a targeted attack model. For the untargeted attack problem, the condition is changed from $h(x + \tau) = y_t$ to $h(x + \tau) \neq y$. In this paper, we treat targeted and untargeted attacks alike. Figure 1 illustrates the principle of an adversarial attack on $h$: a small perturbation $\tau$ applied to an input $x$ fools the classifier. The dashed line represents the decision boundary between two classes. The dashed circle around $x$ represents the maximal amount of noise keeping the adversarial example perceptually close to $x$ (in the case of images). The inputs $x$ and $x + \tau$ look similar but they are classified with two different labels.

Figure 1: Illustration of an adversarial attack on a classifier $h$.

3.2 A general definition of robustness to adversarial attacks

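As we will inject noise in our algorithm in order to defend against adversarial attacks, we need to introduce the notion of “probabilistic mapping”. Let us consider $\mathcal{Y}$ the output space, and $\mathcal{F}_{\mathcal{Y}}$ a $\sigma$-algebra over $\mathcal{Y}$.

Definition 1 (Probabilistic mapping).

Let $(\mathcal{Y}, \mathcal{F}_{\mathcal{Y}})$ be a measurable space. For any space $\mathcal{X}$, a probabilistic mapping from $\mathcal{X}$ to $\mathcal{Y}$ is a mapping $M: \mathcal{X} \to \mathcal{P}(\mathcal{Y})$, where $\mathcal{P}(\mathcal{Y})$ is the set of probability measures over $(\mathcal{Y}, \mathcal{F}_{\mathcal{Y}})$. To obtain a numerical output out of this mechanism, one needs to sample $y \sim M(x)$.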

This definition does not depend on the nature of $\mathcal{Y}$ as long as $(\mathcal{Y}, \mathcal{F}_{\mathcal{Y}})$ is measurable. In that sense, $\mathcal{Y}$ could be either the label space or any intermediate space corresponding to the outputs of one hidden layer of a neural network. Moreover, any mapping can be considered as a probabilistic mapping, whether it explicitly injects noise (as in (Lecuyer et al., 2018; Rakin et al., 2018; Dhillon et al., 2018)) or not. In fact, any deterministic mapping can be considered as a probabilistic mapping, since it can be characterized by a Dirac measure. Accordingly, the definition of a probabilistic mapping is fully general and equally treats networks with or without noise injection. So far, there exists no definition of robustness against adversarial attacks that complies with the notion of probabilistic mappings. We settle that by generalizing the notion of prediction-change risk initially introduced in (Diochnos et al., 2018) for deterministic classifiers. Given a classifier $h$, it is defined as follows:

$$\text{PC-Risk}_{\alpha}(h) := \mathbb{P}_{x \sim \mathcal{D}}\left[\exists \tau \in B(\alpha) \ \text{s.t.}\ h(x + \tau) \neq h(x)\right]$$

where for any $\alpha \geq 0$, $B(\alpha) := \{\tau \in \mathcal{X} \ \text{s.t.}\ \|\tau\| \leq \alpha\}$.

In our case, as probabilistic mappings are considered, we need to generalize this notion to probability measures. This leads to the following definition.

Definition 2 (Adversarial robustness).

Let $D$ be a metric/divergence on $\mathcal{P}(\mathcal{Y})$. The probabilistic mapping $M$ is said to be $D$-$(\alpha, \epsilon, \gamma)$-robust if:

$$\mathbb{P}_{x \sim \mathcal{D}}\left[\exists \tau \ \text{with}\ \|\tau\| \leq \alpha \ \text{s.t.}\ D(M(x + \tau), M(x)) > \epsilon\right] \leq \gamma.$$

Finally, unlike previous work, our definition restricts neither the task (regression, classification, reinforcement learning, etc.) nor the type of distribution the perturbation is drawn from. As $\mathcal{Y}$ is an arbitrary space, this notion of robustness for probabilistic mappings is fully general, but, in this paper, our final goal remains to ensure robustness for a classification task.

One needs to be careful when considering adversarial robustness regarding this definition: a robust mapping does not necessarily ensure accuracy. In fact, if $\mathcal{Y}$ is the space of labels and the outputs $M(x)$ all follow the same uniform distribution over $\mathcal{Y}$, then for every metric/divergence $D$, one has $D(M(x + \tau), M(x)) = 0$, and the accuracy will be that of a random classifier. In the following, robust will mean robust in the sense of Definition 2. Our definition depends on the parameters $(\alpha, \epsilon, \gamma)$ and on the metric/divergence one chooses to consider between probability measures. Lemma 1 gives some natural insights on the monotonicity of the robustness with respect to the parameters and the probability metric at hand.

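Lemma 1.

Let $M$ be a probabilistic mapping, and let $D_1$ and $D_2$ be two metrics/divergences on $\mathcal{P}(\mathcal{Y})$. If there is a non-decreasing function $\phi: \mathbb{R} \to \mathbb{R}$ such that $\forall \mu_1, \mu_2 \in \mathcal{P}(\mathcal{Y})$, $D_1(\mu_1, \mu_2) \leq \phi(D_2(\mu_1, \mu_2))$, then the following assertion holds:

$$M \text{ is } D_2\text{-}(\alpha, \epsilon, \gamma)\text{-robust} \implies M \text{ is } D_1\text{-}(\alpha, \phi(\epsilon), \gamma)\text{-robust}.$$

The metric/divergence one chooses to consider in Definition 2 is intrinsically linked to the notion of robustness that will be preserved.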

3.3 On the choice of the metric/divergence

At this point, a natural question is which metric/divergence to choose in order to defend against adversarial attacks. The main notions that govern the selection of an appropriate metric/divergence are what we call coherence, strength, and computational tractability. A metric/divergence is said to be coherent if it corresponds to the task at hand (e.g. classification tasks are intrinsically linked to discrete/trivial metrics, unlike regression tasks). The strength of a metric/divergence refers to its ability to cover (dominate) a wide class of others in the sense of Lemma 1. In the following, we focus on both the total variation metric and the Renyi divergence, which we consider respectively the most coherent with the classification task using probabilistic mappings, and the strongest divergence we studied. We first discuss how the total variation metric is coherent with randomized classifiers but suffers from computational issues. Fortunately, the Renyi divergence provides good guarantees about adversarial robustness, enjoys nice computational properties, in particular when considering Exponential family distributions, and is strong enough to dominate a wide range of metrics/divergences including total variation.

Let $\mu_1$ and $\mu_2$ be two measures in $\mathcal{P}(\mathcal{Y})$, both dominated by a third measure $\nu$. The trivial distance is the simplest distance one can define between $\mu_1$ and $\mu_2$.

The Trivial distance:

$$d_T(\mu_1, \mu_2) := \mathbb{1}(\mu_1 \neq \mu_2)$$

In the deterministic case, it is straightforward to compute (since the numerical output of the algorithm characterizes its associated measure), but this is not the case in general. In fact, one might not have access to the true distribution of the mapping, but just to the numerical outputs. Therefore, one needs to consider more sophisticated metrics/divergences, such as the Total variation distance.

The Total variation distance:

$$d_{TV}(\mu_1, \mu_2) := \sup_{Y \in \mathcal{F}_{\mathcal{Y}}} |\mu_1(Y) - \mu_2(Y)|$$

The total variation distance is one of the most broadly used probability metrics. It admits several very simple interpretations, and is a very useful tool in many mathematical fields such as probability theory, Bayesian statistics, coupling or transportation theory. In transportation theory, it can be rewritten as the solution of the Monge-Kantorovich problem with the cost function $c(y_1, y_2) = \mathbb{1}(y_1 \neq y_2)$:

$$d_{TV}(\mu_1, \mu_2) = \inf_{\pi} \int_{\mathcal{Y} \times \mathcal{Y}} \mathbb{1}(y_1 \neq y_2) \, d\pi(y_1, y_2)$$

where the infimum is taken over all joint probability measures $\pi$ on $\mathcal{Y} \times \mathcal{Y}$ with marginals $\mu_1$ and $\mu_2$. According to this interpretation, it seems quite natural to consider the total variation distance as a relaxation of the trivial distance on $\mathcal{Y}$ (see (Villani, 2008) for details). In the deterministic case, the total variation and the trivial distance coincide. In general, the total variation allows a finer analysis of the probabilistic mappings than the trivial distance, but it suffers from a high computational complexity. In the remainder of the paper we show how to ensure robustness with respect to the TV distance.
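On a finite label space the supremum defining the total variation distance is attained, and the distance reduces to half the $\ell_1$ distance between the two probability vectors. The following Python sketch contrasts it with the trivial distance; the helper names and the example vectors are ours, chosen only for illustration.

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two distributions on the same finite set."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # On a finite set, sup_A |p(A) - q(A)| equals half the L1 distance.
    return 0.5 * np.abs(p - q).sum()

def trivial_distance(p, q):
    """Trivial distance: 0 if the two distributions coincide, 1 otherwise."""
    return float(not np.allclose(p, q))

# Outputs of two deterministic classifiers (Dirac measures on 3 labels).
dirac_a = [1.0, 0.0, 0.0]
dirac_b = [0.0, 1.0, 0.0]

# Outputs of a randomized classifier on two nearby inputs (illustrative values).
noisy_a = [0.7, 0.2, 0.1]
noisy_b = [0.6, 0.3, 0.1]

print(total_variation(dirac_a, dirac_b), trivial_distance(dirac_a, dirac_b))
print(total_variation(noisy_a, noisy_b), trivial_distance(noisy_a, noisy_b))
```

For the Dirac pair the two distances agree, while for the randomized pair the trivial distance saturates at 1 and the total variation distance reports how far apart the two output distributions actually are.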

Finally, denoting by $g_1$ and $g_2$ the respective probability densities of two measures $\mu_1$ and $\mu_2$ with respect to a dominating measure $\nu$, let us recall the Renyi divergence definition (Rényi, 1961):

The Renyi divergence of order $\lambda$:

$$d_{R,\lambda}(\mu_1, \mu_2) := \frac{1}{\lambda - 1} \log \int_{\mathcal{Y}} g_2(y) \left(\frac{g_1(y)}{g_2(y)}\right)^{\lambda} d\nu(y)$$

The Renyi divergence is a generalized measure defined on the interval $(1, \infty)$, where it equals the Kullback-Leibler divergence when $\lambda \to 1$ (that will be denoted $d_{KL}$), and the maximum divergence when $\lambda \to \infty$. It also has the very special property of being non-decreasing with respect to $\lambda$. This divergence is very common in machine learning, especially in its Kullback-Leibler form, as it is widely used as the loss function (cross entropy) of classification algorithms.
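The limiting behaviour and the monotonicity of the Renyi divergence in its order can be checked numerically on a small discrete example. The sketch below is a minimal illustration for finite distributions with full support; the function names and the two probability vectors are assumptions made for the example.

```python
import numpy as np

def renyi_divergence(p, q, order):
    """Renyi divergence of the given order (> 1) between discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # (1 / (order - 1)) * log sum_y q(y) * (p(y) / q(y))^order, assuming q > 0.
    return np.log(np.sum(q * (p / q) ** order)) / (order - 1.0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence, the limit of the Renyi divergence at order 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def max_divergence(p, q):
    """Maximum divergence, the limit of the Renyi divergence at infinite order."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.log(np.max(p / q))

p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]
orders = [1.001, 2.0, 5.0, 50.0]
values = [renyi_divergence(p, q, order) for order in orders]

print("KL :", kl_divergence(p, q))
for order, value in zip(orders, values):
    print(f"order {order}: {value}")
print("max:", max_divergence(p, q))
```

The printed values increase with the order, starting close to the Kullback-Leibler divergence and staying below the maximum divergence.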

The choice of the Renyi divergence is motivated by its good properties: it bounds the TV distance, it is easy to compute for Exponential family distributions, and it behaves well when it comes to ensuring robustness of a neural network.

In the following, we prove that Renyi-robustness implies TV-robustness.

Theorem 1 (Renyi-robustness implies TV-robustness).

Let be a probabilistic mapping, then :

for .
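For intuition, one standard way to pass from a Renyi bound to a total variation bound (not necessarily the route taken in the proof of Theorem 1) chains the monotonicity of the Renyi divergence in its order with Pinsker's inequality. Writing $d_{TV}$, $d_{KL}$ and $d_{R,\lambda}$ for the total variation distance, the Kullback-Leibler divergence and the Renyi divergence of order $\lambda > 1$ between two measures $\mu_1$ and $\mu_2$:

$$d_{TV}(\mu_1, \mu_2) \;\leq\; \sqrt{\tfrac{1}{2}\, d_{KL}(\mu_1, \mu_2)} \;\leq\; \sqrt{\tfrac{1}{2}\, d_{R,\lambda}(\mu_1, \mu_2)}.$$

Under this chain, a bound of $\epsilon$ on the Renyi divergence between two output distributions gives a bound of $\sqrt{\epsilon / 2}$ on their total variation distance.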

An important property about Renyi-robustness is what is called the Data processing inequality. It is a well-known inequality from information theory which states that “post-processing cannot increase information” (Cover & Thomas, 2012; Beaudry & Renner, 2012). In our case, if we consider a Renyi-robust probabilistic mapping, composing it with a deterministic mapping maintains Renyi-robustness with the same level.

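Theorem 2 (Data processing inequality).

Let us consider a probabilistic mapping $M: \mathcal{X} \to \mathcal{P}(\mathcal{Y})$. Let $\rho: \mathcal{Y} \to \mathcal{Y}'$ denote a deterministic mapping. If $U \sim M(x)$, then the probability measure $M'(x)$ such that $\rho(U) \sim M'(x)$ defines a probabilistic mapping $M': \mathcal{X} \to \mathcal{P}(\mathcal{Y}')$.

For any $\lambda > 1$, if $M$ is $d_{R,\lambda}$-$(\alpha, \epsilon, \gamma)$-robust then $M'$ is also $d_{R,\lambda}$-$(\alpha, \epsilon, \gamma)$-robust.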

Data processing inequality will allow us later to inject additive noise after in a neural network and to ensure Renyi-robustness.
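The data processing inequality can be illustrated on a finite example: pushing two distributions through the same deterministic mapping cannot increase their Renyi divergence. The sketch below uses hypothetical four-outcome distributions and a mapping that merges outcomes in pairs; all names are ours.

```python
import numpy as np

def renyi_divergence(p, q, order):
    """Renyi divergence of the given order (> 1) between discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.log(np.sum(q * (p / q) ** order)) / (order - 1.0)

def push_forward(p, mapping, n_outputs):
    """Distribution of mapping[U] when U is distributed according to p."""
    return np.bincount(mapping, weights=p, minlength=n_outputs)

# Two output distributions over four intermediate outcomes (illustrative values).
p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.25, 0.25, 0.25, 0.25])

# Deterministic post-processing: outcomes {0, 1} -> 0 and {2, 3} -> 1.
mapping = np.array([0, 0, 1, 1])

for order in (1.5, 2.0, 10.0):
    before = renyi_divergence(p, q, order)
    after = renyi_divergence(push_forward(p, mapping, 2),
                             push_forward(q, mapping, 2), order)
    print(f"order {order}: before={before:.4f}  after={after:.4f}")
```

In a network, the deterministic mapping plays the role of the layers that follow the noise injection, which is why a guarantee obtained at the noisy layer carries over to the final output.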

3.4 Our main result: Exponential family ensures Renyi-robustness

So far, the question of what class of noise to add has been treated ad hoc. We choose here to investigate one particular class of noise, namely Exponential family distributions, and demonstrate their interest. Let us first recall what the Exponential family is. Without loss of generality, we can restrict our study to an output space $\mathcal{Y} = \mathbb{R}^n$.

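Definition 3 (Exponential family).

Let $\Theta$ be an open convex set of $\mathbb{R}^n$, and $\theta \in \Theta$. Let $\nu$ be a measure dominated by $\mu$ (either the Lebesgue or the counting measure). It is said to be part of the Exponential family of parameter $\theta$ (denoted $E_F(\theta, t, k)$) if it has the following probability density function:

$$p_F(z, \theta) = \exp\left\{\langle t(z), \theta \rangle - F(\theta) + k(z)\right\}$$

where $F$ is the log-normalizer that makes the density integrate to one, and

  • $t(z)$ is a sufficient statistic

  • $k$ a carrier measure (either for Lebesgue or counting measure)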

Our main result is the following: by injecting noise from an exponential family distribution, we ensure Renyi-robustness up to a certain value.

Theorem 3 (Exponential family ensures robustness).

Let be an open convex subset of . Let be a mapping such that . Let

be a random variable. We denote

the probability measure of the random variable .

  • If where and have non-decreasing modulus of continuity and .

    Then for any , defines a probabilistic mapping that is - robust with .

  • If is a centered Gaussian random variable with a non degenerated matrix parameter . Then for any , defines a probabilistic mapping that is - robust with .

In simpler words, the previous theorem ensures stability of the neural network with respect to the distribution of its output. Intuitively, if two inputs are close with respect to the norm $\|\cdot\|$, the output distributions of the network will be close with respect to the Renyi divergence. So the predicted labels of two close inputs of the network are “more likely” to be the same.
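For the Gaussian case, a standard closed form helps see why a small change in the input yields a small Renyi divergence. For two Gaussian distributions with the same non-degenerate covariance matrix $\Sigma$ and means $m_1$ and $m_2$ (symbols introduced here for illustration):

$$d_{R,\lambda}\big(\mathcal{N}(m_1, \Sigma), \mathcal{N}(m_2, \Sigma)\big) = \frac{\lambda}{2} (m_1 - m_2)^\top \Sigma^{-1} (m_1 - m_2).$$

Taking $m_1$ and $m_2$ to be the noiseless outputs of the mapping for two nearby inputs, the divergence scales with the squared distance between these outputs and shrinks as the noise covariance grows. How exactly this expression enters the constant of Theorem 3 is an assumption on our part.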

Let us consider a deterministic feed-forward neural network $N$ with $\ell$ layers, where $N_i$ corresponds to the neural network truncated at layer $i$. To obtain a robust probabilistic classifier, for any input $x$, we add noise to layer $i$: $N_i(x) + X$, where $X$ is a random variable from the Exponential family. Then, according to Theorems 2 and 3, after adding the noise to the $i$-th layer, the whole network defines a probabilistic mapping satisfying Renyi-robustness. Figure 2 illustrates our noise injection defense mechanism. refers to the perturbed -th layer of networks each respecting -robustness, for small enough values of , and . represents the maximal amount of noise the probabilistic mapping is robust against. It defines a ball (with radius ) on which the outputs of the network are stable. Any example falling within the ball defined by will be mapped close to the mapped version of as shown for in Figure 2. Otherwise, any example out of this ball may be mapped farther (potentially crossing the decision boundary). Depending on the magnitude of , the mapping could be made more robust as shown for in Figure 2. To summarize, if the outputs of the mapping are stable on a ball that includes the set of visually imperceptible adversarial examples, the probabilistic mapping will be robust to adversarial examples.

Figure 2: Illustration of the robustness against adversarial examples for probabilistic mappings (that map images to distributions).
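The noise injection mechanism described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the architecture used in the experiments: the two-layer network, its random weights, the choice of Gaussian noise and every name below are hypothetical.

```python
import numpy as np

# Hypothetical toy network: one hidden layer (the truncated network) followed
# by a linear output layer (the deterministic post-processing).
weights_rng = np.random.default_rng(0)
W1 = weights_rng.normal(size=(8, 4))
W2 = weights_rng.normal(size=(3, 8))

def first_layer(x):
    """Network truncated at the layer where noise is injected."""
    return np.maximum(W1 @ x, 0.0)

def remaining_layers(activation):
    """Deterministic layers applied after the noisy layer."""
    return W2 @ activation

def noisy_predict(x, sigma, rng):
    """Sample one label: add Gaussian noise to the first-layer activation."""
    activation = first_layer(x)
    noisy = activation + rng.normal(scale=sigma, size=activation.shape)
    return int(np.argmax(remaining_layers(noisy)))

def output_distribution(x, sigma, rng, n_samples=2000):
    """Monte Carlo estimate of the label distribution of the probabilistic mapping."""
    labels = [noisy_predict(x, sigma, rng) for _ in range(n_samples)]
    return np.bincount(labels, minlength=3) / n_samples

x = np.array([0.5, -1.0, 0.3, 0.8])
tau = 0.01 * np.ones(4)  # a small perturbation of the input
rng = np.random.default_rng(1)
print(output_distribution(x, sigma=0.5, rng=rng))
print(output_distribution(x + tau, sigma=0.5, rng=rng))
```

Comparing the two estimated label distributions gives an empirical view of the stability that the theory describes: for a small perturbation they should be close.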

3.5 On the need for injecting noise in the training phase

So far, we have designed an algorithm for neural networks to ensure robustness at inference time. But simply injecting noise at inference time destroys the accuracy of the algorithm. Thus one also needs to inject noise during the training phase. The justification comes from the distribution shift (Sugiyama & Kawanabe, 2012). Distribution shift occurs when the training distribution differs from the test distribution. This implies that the hypothesis minimizing the empirical risk is not consistent, i.e. it does not converge to the true model as the training size increases. A way to circumvent this is to ensure that the training and test distributions match, using importance weighting (in the case of covariate shift) or noise injection in both training and test phases (in our case).

4 Experiments

Our main theoretical finding can be summarized as follows: if one adds random noise drawn from an Exponential family distribution at inference time, the network’s predictions are made locally stable to small changes in the input. Hence, small perturbations of the image result in small changes in the predictions. Accordingly, the accuracy of such a network on natural images should be close to its accuracy on their adversarial substitutes. Note that robustness is not a synonym of accuracy, and local stability does not fix a poorly performing network. In order to build a robust and accurate network, we use a state-of-the-art architecture (developed for natural images) and make it robust through randomized procedures. We hereafter present experiments that confirm the theoretical evidence on the effectiveness of noise injection as a defense method. From these experiments, we discuss the impact of noise injection on the classifiers’ accuracy, and what noise standard deviation would be a reasonable trade-off between robustness and accuracy.

4.1 Adversarial attacks

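As reference adversarial attacks, we consider the following:

Fast Gradient Method attack. The idea of the fast gradient method with $\ell_p$-norm ($p \in \{1, 2, \infty\}$) is to find a linear approximation of the following optimization problem:

$$\max_{\|\tau\|_p \leq \delta} L(x + \tau, y)$$

where $L$ is the loss function (usually the cross entropy), $y$ the label of $x$, and $\delta$ the bound on the perturbation norm. Assuming $\delta$ to be small, the previous problem can be relaxed as follows:

$$\max_{\|\tau\|_p \leq \delta} \tau^\top \nabla_x L(x, y)$$

The special case of $p = \infty$ is called Fast Gradient Sign Method (FGSM, (Goodfellow et al., 2015)).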
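Once the loss is linearized, the fast gradient method reduces to maximizing an inner product with the loss gradient under a norm constraint, which has a closed-form solution for each of the three norms. The sketch below computes that perturbation from a given gradient; the function name, the `budget` parameter and the example gradient are assumptions made for illustration, and computing the gradient itself is left to the learning framework.

```python
import numpy as np

def fgm_perturbation(grad, budget, p):
    """Maximizer of <tau, grad> subject to ||tau||_p <= budget, for p in {1, 2, inf}."""
    grad = np.asarray(grad, dtype=float)
    if p == np.inf:
        # Every coordinate moves by the full budget in the direction of the gradient sign.
        return budget * np.sign(grad)
    if p == 2:
        # Move along the normalized gradient.
        norm = np.linalg.norm(grad)
        return budget * grad / norm if norm > 0 else np.zeros_like(grad)
    if p == 1:
        # Spend the whole budget on the coordinate with the largest gradient magnitude.
        tau = np.zeros_like(grad)
        k = int(np.argmax(np.abs(grad)))
        tau[k] = budget * np.sign(grad[k])
        return tau
    raise ValueError("p must be 1, 2 or np.inf")

grad = np.array([0.5, -2.0, 1.0])  # illustrative loss gradient with respect to the input
for p in (1, 2, np.inf):
    print(p, fgm_perturbation(grad, budget=0.3, p=p))
```

The case `p = np.inf` is the Fast Gradient Sign Method: the perturbation only depends on the sign of each gradient coordinate.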

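Carlini & Wagner attack. The Carlini & Wagner attack (C&W) introduced in (Carlini & Wagner, 2017) writes as:

$$\min_{\tau} \; \|\tau\|_p + c \cdot g(x + \tau)$$

where $c$ is a constant and $g$ is a function such that $g(x + \tau) \leq 0$ iff $h(x + \tau) = y_t$, where $y_t$ is the target class. The authors listed some functions: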

where is the softmax function and a positive constant.

Instead of using box-constrained L-BFGS (Szegedy et al., 2014) like in the original attack, they use a new variable $w$ for $x + \tau$:

$$x + \tau = \frac{1}{2}\left(\tanh(w) + 1\right)$$

Then they use binary search to optimize the constant $c$, and Adam or SGD to compute the optimal solution.
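The change of variable removes the box constraint: whatever value the optimizer gives to the new variable, the resulting image stays in the valid pixel range. The sketch below assumes pixels in $[0, 1]$ and uses the tanh reparametrization of (Carlini & Wagner, 2017); the helper names and the clipping constant are ours.

```python
import numpy as np

def to_box(w):
    """Map an unconstrained variable w to an image with pixels in [0, 1]."""
    return 0.5 * (np.tanh(w) + 1.0)

def from_box(image, eps=1e-6):
    """Inverse map, used to initialize w from a natural image.

    Pixels are clipped away from 0 and 1 so that arctanh stays finite.
    """
    image = np.clip(image, eps, 1.0 - eps)
    return np.arctanh(2.0 * image - 1.0)

w = np.array([-50.0, -1.0, 0.0, 2.0, 50.0])
adversarial = to_box(w)
print(adversarial.min(), adversarial.max())

image = np.array([0.1, 0.5, 0.9])
print(to_box(from_box(image)))
```

Gradient steps can then be taken on `w` with any unconstrained optimizer, without projecting back onto the pixel range.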

4.2 Experimental setting

Dataset. We use the CIFAR-10 dataset, which is composed of 50,000 training images and 10,000 test images with resolution 32×32 and 3 channels.

Architecture. We use a ResNet architecture (He et al., 2016), as it is considered to be state of the art on image classification. More precisely, we use a wide residual network (Zagoruyko & Komodakis, 2016) with layers. It is more memory efficient than very deep residual networks and has comparably good performance. We train every network with a cross entropy loss.

Training procedure. Our inference procedure consists in adding noise to the network, which can be seen as modifying the test distribution, resulting in a distribution shift. This requires the training procedure to inject noise as well. During training, at each iteration we inject additive noise drawn from an Exponential family distribution on the activation of the first layer of the network. This method has the advantage of being independent of the training method. It only depends on the architecture of the network.

Inference procedure. For a given input $x$, the prediction procedure in a classical neural network consists in passing $x$ through the successive layers of the network and selecting the output label with the maximum a posteriori rule on the activation of the last layer. Our method only differs in one way: when feeding the network with $x$, we add random noise drawn from the same distribution to the same layer as in the training phase.

Type of noise. Our experimental framework makes use of noise drawn from continuous distributions (to alter the activations). Being more difficult to compare, the case where noise is drawn from a discrete distribution is left for future work. Indeed, its use heavily depends on the considered transformation (either of the architecture or of the image). Therefore, we investigate noise injections drawn from the following distributions:

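  • Gaussian distribution:
    For $\sigma > 0$, $p(z) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-z^2 / (2\sigma^2)}$.

  • Laplace distribution:
    For $b > 0$, $p(z) = \frac{1}{2b} \, e^{-|z| / b}$.

  • Exponential distribution:
    For $\lambda > 0$, $p(z) = \lambda \, e^{-\lambda z}$ for $z \geq 0$.

  • Transformed Weibull distribution:
    For , where .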

All these distributions belong to the Exponential family, and respect the hypotheses of Theorem 3.
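To compare noise distributions at the same intensity, it is convenient to parametrize each of them by its standard deviation. The sketch below does so with NumPy for the Gaussian, Laplace and exponential cases; the transformed Weibull case is left out because its transformation is not specified here. The function name and the matching of scale parameters to a target standard deviation are our assumptions, not a description of the experimental code.

```python
import numpy as np

def sample_noise(kind, std, shape, rng):
    """Draw noise of the given kind with (approximately) the requested standard deviation."""
    if kind == "gaussian":
        return rng.normal(loc=0.0, scale=std, size=shape)
    if kind == "laplace":
        # A Laplace variable with scale b has standard deviation b * sqrt(2).
        return rng.laplace(loc=0.0, scale=std / np.sqrt(2.0), size=shape)
    if kind == "exponential":
        # An exponential variable with scale s has standard deviation s (and mean s).
        return rng.exponential(scale=std, size=shape)
    raise ValueError(f"unknown noise kind: {kind}")

rng = np.random.default_rng(0)
for kind in ("gaussian", "laplace", "exponential"):
    noise = sample_noise(kind, std=0.3, shape=200_000, rng=rng)
    print(kind, round(float(noise.std()), 3), round(float(noise.mean()), 3))
```

Note that the exponential noise is not centered: at a given standard deviation it also shifts the activations by the same amount on average, unlike the two symmetric distributions.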

Evaluation protocol. For every type of noise, we evaluate the accuracy of the obtained classifier on natural images. Then we use an attack to construct an adversarial example for every image in the initial set, and re-evaluate the classifier on them. The obtained accuracy is called the accuracy under attack. To be able to make a global analysis, for every type of noise, we reproduce the above procedure for several attacks and several noise standard deviations. To benchmark the defense mechanisms, we trained a network without injecting noise, neither during training nor during inference. Note that the point of our experiment is not to present a state-of-the-art classifier but to investigate the use of Exponential family noise as a defense method. This is why we deliberately use classical and simple methods.
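The protocol above can be sketched as two small functions: one computing the accuracy on natural inputs, one computing the accuracy on their adversarial substitutes. The classifier and the attack below are hypothetical one-dimensional stand-ins, used only to make the sketch runnable.

```python
import numpy as np

def accuracy(predict, inputs, labels):
    """Fraction of inputs whose predicted label matches the true label."""
    predictions = np.array([predict(x) for x in inputs])
    return float(np.mean(predictions == np.asarray(labels)))

def accuracy_under_attack(predict, attack, inputs, labels):
    """Accuracy on the adversarial examples built from every input."""
    adversarial = [attack(x, y) for x, y in zip(inputs, labels)]
    return accuracy(predict, adversarial, labels)

# Toy stand-ins (hypothetical): a 1-D threshold classifier and an attack that
# pushes each input towards the decision boundary by a fixed budget.
def toy_predict(x):
    return int(x > 0.0)

def toy_attack(x, y, budget=0.3):
    return x - budget if y == 1 else x + budget

inputs = [-1.0, -0.1, 0.2, 1.5]
labels = [0, 0, 1, 1]

natural = accuracy(toy_predict, inputs, labels)
attacked = accuracy_under_attack(toy_predict, toy_attack, inputs, labels)
print(natural, attacked)
```

For a randomized classifier, `predict` would draw fresh noise at every call, so both quantities become random and are typically averaged over the whole test set.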

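| Noise at training | Noise at inference | No attack | FGM $\ell_1$ (0.3) | FGM $\ell_2$ (0.3) | FGM $\ell_\infty$ (0.03) | C&W |
|---|---|---|---|---|---|---|
| No | No | 0.904 | 0.893 | 0.601 | 0.347 | 0.029 |
| Yes | No | 0.855 | 0.854 | 0.778 | 0.517 | 0.030 |
| Yes | Yes | 0.853 | 0.855 | 0.807 | 0.596 | 0.396 |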
Table 1: Comparison on CIFAR-10 of the accuracy and accuracy under attack for a classical Resnet model compared to the same model when noise is injected at training or both at training and test phases. In this experiment, the noise is drawn from a centered Gaussian distribution with standard deviation .
Figure 7: Comparison, under various levels of noise, of the accuracy and accuracy under attack of the network for the defense method using noise drawn respectively from (a) a Laplace distribution, (b) a Gaussian distribution, (c) an Exponential distribution and (d) a Weibull distribution. The x-axis displays the level of noise and the y-axis displays the accuracy.

4.3 Implementation details


We train every model using stochastic gradient descent with a momentum of , and a Leaky-ReLU with slope during epochs. The gradient descent is performed using a staircase learning rate starting at . We also apply a Parseval normalization to every convolutional layer (Cisse et al., 2017) during training. Finally, we initialize every layer with a normal random distribution with , where is the layer’s width. We did not use any data augmentation technique nor any pre-training.

Adversarial attacks. To evaluate the robustness of the defense method, we apply attacks to both defended and undefended networks. First, we deploy Fast Gradient Methods with $\ell_p$ norm for $p \in \{1, 2, \infty\}$. This attack depends on a parameter bounding the perturbation norm. We choose to investigate perturbations that are small enough (respectively 0.3, 0.3 and 0.03) for the perception of the attacked image to hold. Second, we use the Carlini and Wagner attack (C&W). C&W represents the state of the art of adversarial attacks. It is a powerful iterative attack that does not have any such parameter, since it automatically finds the appropriate perturbation bound.

4.4 Results

For every type of noise, we compare the accuracy of the methods for several levels of noise intensity and several attacks. Tables 1 and 2 and Figure 7 present our results for every investigated noise on the CIFAR-10 dataset. In Figure 7, the y-axis represents the accuracy (natural image line) and the accuracy under attack (other lines), i.e. the proportion of correct responses of the network fed with natural/adversarial examples.

Baseline accuracies. As presented in Table 1, the network performs very well on natural images, with an accuracy of 0.904. The $\ell_1$ attack is not powerful enough to be significantly discussed. But adding adversarial noise from other attacks to the images decreases the accuracy of the trained network. Our theoretical work considers that robustness is obtained when noise is injected at inference time. Accordingly, adding noise only during training can give a good accuracy, but the accuracy under attack remains low, especially for powerful attacks such as C&W, which makes the accuracy drop from 0.855 to 0.030. Moreover, as discussed in Section 3.5, adding noise only at inference will lead to poor accuracy. Indeed, this technique causes a distribution shift, since noisy images and natural images are not drawn from the same distribution. Therefore, it is both theoretically and empirically justified to inject noise at both training and testing phases, in order to preserve both accuracy and robustness. This technique presents a significant increase over the others regarding the accuracy under attack, especially for the C&W attack, where mixing noise at training and inference is about 13 times more effective (0.396 against 0.030).

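| Noise st. dev. | No attack | FGM $\ell_1$ (0.3) | FGM $\ell_2$ (0.3) | FGM $\ell_\infty$ (0.03) | C&W |
|---|---|---|---|---|---|
| 0.01 | 0.890 | 0.883 | 0.722 | 0.414 | 0.515 |
| 0.06 | 0.878 | 0.874 | 0.798 | 0.473 | 0.473 |
| 0.21 | 0.841 | 0.835 | 0.800 | 0.555 | 0.437 |
| 0.32 | 0.832 | 0.826 | 0.808 | 0.647 | 0.439 |
| 0.52 | 0.806 | 0.800 | 0.795 | 0.706 | 0.444 |
| 0.72 | 0.780 | 0.774 | 0.766 | 0.701 | 0.430 |
| 0.93 | 0.770 | 0.763 | 0.758 | 0.713 | 0.407 |
| 1.13 | 0.747 | 0.739 | 0.736 | 0.692 | 0.388 |
| 1.34 | 0.726 | 0.724 | 0.713 | 0.687 | 0.389 |
| 1.54 | 0.704 | 0.697 | 0.690 | 0.664 | 0.362 |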
Table 2: Evaluation of the accuracy and accuracy under attack for the defense method based on exponential noise injection (for several attacks and noise intensity).

Robustness by noise injection. At first glance (see Figure 7), the effectiveness of noise injection as a defense method seems natural. In fact, regardless of the type of noise, the noise injection defenses largely outperform the undefended network. We observe the gap between the accuracy and the accuracy under attack, and use this gap to measure the robustness of the different noise injections under the proposed attack. The classifiers become increasingly robust as the amount of injected noise grows. Note also that when the amount of noise is large enough, the gap becomes constant.
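The gap between accuracy and accuracy under attack can be computed directly from the values reported in Table 2. The sketch below does so for the attack column whose header reads 0.03 (the fourth numeric column of each row); the variable names are ours and the numbers are copied from the table.

```python
import numpy as np

# Values copied from Table 2 (exponential noise injection).
noise_std = np.array([0.01, 0.06, 0.21, 0.32, 0.52, 0.72, 0.93, 1.13, 1.34, 1.54])
natural_acc = np.array([0.890, 0.878, 0.841, 0.832, 0.806, 0.780, 0.770, 0.747, 0.726, 0.704])
# Accuracy under the attack with perturbation bound 0.03.
attacked_acc = np.array([0.414, 0.473, 0.555, 0.647, 0.706, 0.701, 0.713, 0.692, 0.687, 0.664])

# Gap between accuracy and accuracy under attack, used as a robustness measure.
gap = natural_acc - attacked_acc
for std, value in zip(noise_std, gap):
    print(f"std={std:.2f}  gap={value:.3f}")
```

For this column the gap shrinks quickly as the standard deviation grows and then levels off, while the natural accuracy keeps decreasing.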

Robustness/Accuracy trade-off. When injecting noise as a defense mechanism, regardless of the distribution it is drawn from, we observe that accuracy (and accuracy under attack) decreases as the noise intensity grows. Note that, for a strong enough noise intensity, both accuracy and accuracy under attack (regardless of the attack) decrease with the same slope. This is evidence that, beyond some point, increasing the quantity of noise only degrades the accuracy of the method and no longer improves its robustness (robustness stays constant while accuracy decreases). In that sense, the noise needs to be calibrated to preserve both accuracy and robustness against adversarial attacks, i.e. it needs to be large enough to preserve robustness and small enough to preserve accuracy. Analyzing Figure 7, we find that, in practice, for the evaluated noise, small standard deviations already seem to offer good trade-offs between robustness and accuracy.
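The gap measure and the calibration of the noise level can be illustrated on the values of Table 2. The sketch below rests on two assumptions that are not stated in the text: that the first accuracy column of the table is the clean ("No attack") accuracy, and that a reasonable calibration rule is to maximise the worst accuracy under attack subject to a floor on clean accuracy. The function names and the 0.85 floor are hypothetical.

```python
import numpy as np

# Rows of Table 2: noise standard deviation followed by five accuracies.
# Assumption: the first accuracy is the clean ("No attack") accuracy and the
# other four are accuracies under attack, in the order of the table columns.
TABLE2 = np.array([
    [0.01, 0.890, 0.883, 0.722, 0.414, 0.515],
    [0.06, 0.878, 0.874, 0.798, 0.473, 0.473],
    [0.21, 0.841, 0.835, 0.800, 0.555, 0.437],
    [0.32, 0.832, 0.826, 0.808, 0.647, 0.439],
    [0.52, 0.806, 0.800, 0.795, 0.706, 0.444],
    [0.72, 0.780, 0.774, 0.766, 0.701, 0.430],
    [0.93, 0.770, 0.763, 0.758, 0.713, 0.407],
    [1.13, 0.747, 0.739, 0.736, 0.692, 0.388],
    [1.34, 0.726, 0.724, 0.713, 0.687, 0.389],
    [1.54, 0.704, 0.697, 0.690, 0.664, 0.362],
])

def accuracy_gap(table):
    """Gap between clean accuracy and accuracy under each attack, per noise level."""
    clean = table[:, 1]
    return clean[:, None] - table[:, 2:]

def calibrate(table, min_clean_acc):
    """Pick the noise level with the best worst-case accuracy under attack,
    among the levels whose clean accuracy is at least min_clean_acc."""
    std, clean = table[:, 0], table[:, 1]
    worst = table[:, 2:].min(axis=1)
    admissible = clean >= min_clean_acc
    if not admissible.any():
        raise ValueError("no noise level satisfies the clean-accuracy floor")
    idx = np.flatnonzero(admissible)[np.argmax(worst[admissible])]
    return std[idx], clean[idx], worst[idx]

gap = accuracy_gap(TABLE2)
best_std, best_clean, best_worst = calibrate(TABLE2, min_clean_acc=0.85)
print(gap.round(3))
print(best_std, best_clean, best_worst)
```

Other selection rules (for instance, targeting a single attack) would only change the `worst` line.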

Comparison of noise distributions. Overall, noise drawn from any distribution of the Exponential family constitutes an efficient defense against adversarial attacks. According to Figure (a), Laplace noise performs poorly in comparison to the other distributions: accuracy degrades too quickly with noise intensity to be satisfactory. Transformed Weibull (see Figure (d)) and Gaussian (see Figure (b)) distributions have similar performances. This is not surprising given how similar the shapes of the two distributions are, but it is still of interest: Gaussian noise is the most used in the literature, and transformed Weibull could constitute a good alternative to it. Finally, exponential noise (see Figure (c) and Table 2) is the one that best preserves the accuracy under attack. This result is quite surprising, since we expected symmetric distributions to better protect the accuracy during the defense mechanism.
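To compare distributions at equal noise intensity, each sampler has to be parameterised so that its standard deviation matches a common value. A minimal sketch with NumPy follows; the function name `sample_noise` is hypothetical, and the transformed Weibull is left out because its exact transformation is not specified in this section.

```python
import numpy as np

def sample_noise(name, sigma, size, rng):
    """Draw noise with standard deviation sigma from the named distribution."""
    if name == "gaussian":
        return rng.normal(0.0, sigma, size)
    if name == "laplace":
        # A Laplace variable with scale b has standard deviation b * sqrt(2).
        return rng.laplace(0.0, sigma / np.sqrt(2.0), size)
    if name == "exponential":
        # An exponential variable with scale s has standard deviation s
        # (and mean s, so this noise is not centered).
        return rng.exponential(sigma, size)
    raise ValueError(f"unknown noise distribution: {name}")

rng = np.random.default_rng(0)
empirical_std = {
    name: float(sample_noise(name, 0.3, 200_000, rng).std())
    for name in ("gaussian", "laplace", "exponential")
}
print(empirical_std)
```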

Note that we did not investigate noises from families other than the Exponential one, as it would not have brought any valuable insight. Indeed, our main theorem derives robustness from injection of noise from the Exponential family but it does not draw any conclusion for other families.

5 Conclusion and future work

In this work, we provide a theoretically well-grounded framework for understanding why previous methods based on noise injection were effective in practice against adversarial attacks. While this article is primarily a theoretical analysis, it also paves the way to novel defense mechanisms using noise from yet unexplored distributions (e.g. Weibull or exponential).

Our theoretical analysis mainly focused on the robustness of the methods, while our numerical experiments confirmed that accuracy is only slightly altered by noise injection. Hence, we demonstrated the practical applicability of the approach. Note also that we only used a vanilla ResNet21; with standard tricks of the trade for neural networks and for noise injection, the accuracy could be further improved.

In future work, we plan to investigate more complex architectures, other noise injection schemes, and the combination with other defenses (Madry et al., 2018; Goodfellow et al., 2015). More importantly, we aim to develop new families of noise (respecting the conditions of Theorem 3) that further limit the loss of accuracy.


References

  • Beaudry & Renner (2012) Beaudry, N. J. and Renner, R. An intuitive proof of the data processing inequality. Quantum Info. Comput., 12(5-6):432–441, May 2012. ISSN 1533-7146.
  • Ben-Tal et al. (2009) Ben-Tal, A., El Ghaoui, L., and Nemirovski, A. Robust optimization, volume 28. Princeton University Press, 2009.
  • Bubeck et al. (2018) Bubeck, S., Price, E., and Razenshteyn, I. Adversarial examples from computational constraints. arXiv preprint arXiv:1805.10204, 2018.
  • Carlini & Wagner (2017) Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.
  • Chapeau-Blondeau & Rousseau (2004) Chapeau-Blondeau, F. and Rousseau, D. Noise-enhanced performance for an optimal Bayesian estimator. IEEE Transactions on Signal Processing, 52(5):1327–1334, 2004.
  • Cisse et al. (2017) Cisse, M., Bojanowski, P., Grave, E., Dauphin, Y., and Usunier, N. Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, pp. 854–863, 2017.
  • Cover & Thomas (2012) Cover, T. M. and Thomas, J. A. Elements of information theory. John Wiley & Sons, 2012.
  • Dhillon et al. (2018) Dhillon, G. S., Azizzadenesheli, K., Bernstein, J. D., Kossaifi, J., Khanna, A., Lipton, Z. C., and Anandkumar, A. Stochastic activation pruning for robust adversarial defense. In International Conference on Learning Representations, 2018.
  • Diochnos et al. (2018) Diochnos, D., Mahloujifar, S., and Mahmoody, M. Adversarial risk and robustness: General definitions and implications for the uniform distribution. In Advances in Neural Information Processing Systems, pp. 10380–10389, 2018.
  • Dwork et al. (2014) Dwork, C., Roth, A., et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
  • Fawzi et al. (2016) Fawzi, A., Moosavi-Dezfooli, S.-M., and Frossard, P. Robustness of classifiers: from adversarial to random noise. In Advances in Neural Information Processing Systems, pp. 1632–1640, 2016.
  • Fawzi et al. (2018a) Fawzi, A., Fawzi, H., and Fawzi, O. Adversarial vulnerability for any classifier. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 31, pp. 1186–1195. Curran Associates, Inc., 2018a.
  • Fawzi et al. (2018b) Fawzi, A., Moosavi-Dezfooli, S.-M., Frossard, P., and Soatto, S. Empirical study of the topology and geometry of deep networks. In IEEE CVPR, 2018b.
  • Gibbs & Su (2002) Gibbs, A. L. and Su, F. E. On choosing and bounding probability metrics. International Statistical Review / Revue Internationale de Statistique, 70(3):419–435, 2002. ISSN 03067734, 17515823.
  • Goodfellow et al. (2015) Goodfellow, I., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
  • Guo et al. (2018) Guo, C., Rana, M., Cisse, M., and van der Maaten, L. Countering adversarial images using input transformations. In International Conference on Learning Representations, 2018.
  • He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
  • Huber (2011) Huber, P. J. Robust statistics. In International Encyclopedia of Statistical Science, pp. 1248–1251. Springer, 2011.
  • Kraft (1969) Kraft, O. A note on exponential bounds for binomial probabilities. Ann. Inst. Stat. Math., 21:219–220, 1969.
  • Kurakin et al. (2016) Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
  • Lecuyer et al. (2018) Lecuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., and Jana, S. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 727–743, 2018.
  • Liu et al. (2018) Liu, X., Cheng, M., Zhang, H., and Hsieh, C.-J. Towards robust neural networks via random self-ensemble. In European Conference on Computer Vision, pp. 381–397. Springer, 2018.
  • Madry et al. (2018) Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
  • Meng & Chen (2017) Meng, D. and Chen, H. Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 135–147. ACM, 2017.
  • Mitaim & Kosko (1998) Mitaim, S. and Kosko, B. Adaptive stochastic resonance. Proceedings of the IEEE, 86(11):2152–2183, 1998.
  • Moosavi-Dezfooli et al. (2016) Moosavi-Dezfooli, S.-M., Fawzi, A., and Frossard, P. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582, 2016.
  • Moosavi-Dezfooli et al. (2017) Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., and Frossard, P. Universal adversarial perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 86–94. IEEE, 2017.
  • Papernot et al. (2016a) Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., and Swami, A. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pp. 372–387. IEEE, 2016a.
  • Papernot et al. (2016b) Papernot, N., McDaniel, P., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. IEEE, 2016b.
  • Perez & Wang (2017) Perez, L. and Wang, J. The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621, 2017.
  • Rakin et al. (2018) Rakin, A. S., He, Z., and Fan, D. Parametric noise injection: Trainable randomness to improve deep neural network robustness against adversarial attack. arXiv preprint arXiv:1811.09310, 2018.
  • Rényi (1961) Rényi, A. On measures of entropy and information. Technical report, HUNGARIAN ACADEMY OF SCIENCES Budapest Hungary, 1961.
  • Samangouei et al. (2018) Samangouei, P., Kabkab, M., and Chellappa, R. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018.
  • Simon-Gabriel et al. (2018) Simon-Gabriel, C.-J., Ollivier, Y., Schölkopf, B., Bottou, L., and Lopez-Paz, D. Adversarial vulnerability of neural networks increases with input dimension. arXiv preprint arXiv:1802.01421, 2018.
  • Sugiyama & Kawanabe (2012) Sugiyama, M. and Kawanabe, M. Machine learning in non-stationary environments: Introduction to covariate shift adaptation. MIT press, 2012.
  • Szegedy et al. (2014) Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
  • Vajda (1970) Vajda, I. Note on discrimination information and variation. IEEE Trans. Inform. Theory, 16(6):771–773, Nov. 1970.
  • Villani (2008) Villani, C. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
  • Xie et al. (2018) Xie, C., Wang, J., Zhang, Z., Ren, Z., and Yuille, A. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018.
  • Zagoruyko & Komodakis (2016) Zagoruyko, S. and Komodakis, N. Wide residual networks. In Richard C. Wilson, E. R. H. and Smith, W. A. P. (eds.), Proceedings of the British Machine Vision Conference (BMVC), pp. 87.1–87.12. BMVA Press, September 2016. ISBN 1-901725-59-6. doi: 10.5244/C.30.87.
  • Zozor & Amblard (1999) Zozor, S. and Amblard, P.-O. Stochastic resonance in discrete time nonlinear AR(1) models. IEEE transactions on Signal Processing, 47(1):108–122, 1999.

1 Notations

Let us consider an output space , a σ-algebra over . We denote the set of probability measures over . Let be a second measurable space, and a measurable function from to . Finally, let us consider two measures on .

Dominated measure: is said to be dominated by (denoted ) if and only if for all , . If is dominated by , there is a measurable function such that for all , . is called the Radon-Nikodym derivative and is denoted .

Push-forward measure: the push-forward measure of by (denoted ) is the measure on such that .

Convolution product: the convolution of with , denoted , is the push-forward measure of by the addition on . Since the convolution between functions is defined accordingly, we use interchangeably for measures and simple functions.
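In standard measure-theoretic notation, the three definitions above can be written as follows. The symbol names ($\mu$, $\nu$, $\phi$, $\mathcal{Y}$, $\mathcal{Y}'$, $\mathcal{F}_{\mathcal{Y}}$) are assumptions chosen for this sketch, and the convolution is written for a space on which addition is defined (e.g. $\mathbb{R}^n$).

```latex
% Domination and Radon-Nikodym derivative
\mu \ll \nu \iff \forall Y \in \mathcal{F}_{\mathcal{Y}},\ \nu(Y) = 0 \Rightarrow \mu(Y) = 0,
\qquad
\mu(Y) = \int_Y \frac{d\mu}{d\nu}\, d\nu .

% Push-forward of \mu by a measurable map \phi : \mathcal{Y} \to \mathcal{Y}'
(\phi \# \mu)(Z) = \mu\big(\phi^{-1}(Z)\big)
\quad \text{for all } Z \in \mathcal{F}_{\mathcal{Y}'} .

% Convolution as the push-forward of the product measure by addition
\mu * \nu = (+) \# (\mu \otimes \nu),
\qquad
(\mu * \nu)(Y) = \int \mu(Y - y)\, d\nu(y) .
```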

2 Main proofs

Lemma 1.

Let u be the probabilistic mapping, and let , and be two metrics on . If there is a non decreasing function such that , , then the following assertion holds:


Let us consider a probabilistic mapping , , and , one has Hence . By inverting the inequality, one gets the expected result. ∎

Proposition 1 (Kraft, 1969).

Given two probability measures and on , one has

Proposition 2 (Vajda, 1970).

Given two probability measures and on , one has

Theorem 1 (Renyi-robustness implies TV-robustness).

Let be a probabilistic mapping, then :


Given two probability measures and on , and one wants to find a bound on as a functional of .

Using Proposition 1, one has

It suffices to solve a second-degree equation to get that

One thus finally gets:

Moreover, using Proposition 2, one gets:

For simplicity, since the second term on the right-hand side is non-increasing in , and since one gets:

Hence, one gets:

By combining the two results, one gets:

To conclude for it suffices to use Lemma 1 and the monotonicity of the Rényi divergence with respect to . ∎
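The two bounds combined in this proof can be worked out explicitly. The sketch below assumes the commonly quoted forms of the inequalities attributed to Kraft (1969) and Vajda (1970), with the total variation distance normalised to $[0,1]$. The shorthand $t = d_{TV}$ and $D = d_{KL}$, and the symbols $\epsilon$ and $\lambda$, are assumptions of this sketch rather than notation confirmed by the text.

```latex
% Assumed form of Proposition 1 (fourth-order refinement of Pinsker's inequality)
D \;\ge\; 2t^2 + \tfrac{4}{9}\, t^4 .
% With x = t^2 this reads \tfrac{4}{9}x^2 + 2x - D \le 0; the positive root gives
t^2 \;\le\; \tfrac{9}{4}\Big(\sqrt{1 + \tfrac{4D}{9}} - 1\Big)
\quad\Longrightarrow\quad
t \;\le\; \tfrac{3}{2}\Big(\sqrt{1 + \tfrac{4D}{9}} - 1\Big)^{1/2} .

% Assumed form of Proposition 2
D \;\ge\; \log\frac{1+t}{1-t} - \frac{2t}{1+t} .
% Since t \in [0,1], one has 2t/(1+t) \le 1, hence
\log\frac{1+t}{1-t} \;\le\; D + 1
\quad\Longrightarrow\quad
t \;\le\; \frac{e^{D+1} - 1}{e^{D+1} + 1} .

% Both bounds are non-decreasing in D, and D \le d_{R,\lambda} \le \epsilon for \lambda \ge 1, so
t \;\le\; \min\!\Big( \tfrac{3}{2}\big(\sqrt{1 + \tfrac{4\epsilon}{9}} - 1\big)^{1/2},\;
\frac{e^{\epsilon+1} - 1}{e^{\epsilon+1} + 1} \Big) .
```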

Theorem 2 (Data processing inequality).

Let us consider a probabilistic mapping . Let denote a deterministic algorithm. If then the probability measure defines a probabilistic mapping .

For any if is - robust then is also - robust.


Let us consider a - robust algorithm. Let us also take , and . Without loss of generality, we consider that , and are dominated by the same measure . Finally let us take a measurable mapping from to . For the sake of readability we denote and .

Since , one has . Hence, by the transfer theorem, the generalized Jensen’s inequality for conditional expectation, and the property of the conditional expectation with regard to the regular expectation, one has

Simply using the transfer theorem, one gets
Since one easily gets the following:
Finally, by using the Jensen inequality, and the property of the conditional expectation, one has

Theorem 3 (Exponential family ensures robustness).

Let be an open convex subset of . Let a mapping such that . Let be a random variable. We denote the probability measure of the random variable .

  • If where and have non-decreasing modulus of continuity and .

    Then for any , defines a probabilistic mapping that is - robust with .

  • If is a centered Gaussian random variable with a non-degenerate matrix parameter . Then for any , defines a probabilistic mapping that is - robust with .


Let us consider the probabilistic mapping constructed from noise injection respectively drawn from 1) an exponential family with non-decreasing modulus of continuity, and 2) a non-degenerate Gaussian. Let us also take , and . Without loss of generality, we consider that , and are dominated by the same measure . Let us also denote the Radon-Nikodym derivative of the noise drawn in 1) with respect to , the Radon-Nikodym derivative of the noise drawn in 2) with respect to and the Dirac function in , mapping any element to 1 if it equals and 0 otherwise.


2) Since is non-degenerate, the Gaussian measure admits a pdf with respect to the Lebesgue measure, hence for all s.t

3 Additional results on the strength of the Renyi divergence

Let us consider an output space , a σ-algebra over , and three measures on , with in the set of probability measures over denoted . One has and one denotes and the Radon-Nikodym derivatives with respect to .

The Separation distance:

The Hellinger distance:

The Prokhorov metric:

The Discrepancy metric:
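For reference, the four distances named above have the following standard forms in the survey of Gibbs & Su (2002), which this section cites. The notation ($\mu$, $\nu$ for the two probability measures, $f$ and $g$ for their densities with respect to a dominating measure $\lambda$, $d$ for the metric of the underlying space) is assumed for this sketch, and the definitions used in the present appendix may differ in detail, for instance by extending the separation distance beyond countable spaces.

```latex
% Separation distance (stated on a countable space)
d_S(\mu, \nu) = \max_{i} \Big( 1 - \frac{\mu(i)}{\nu(i)} \Big)

% Hellinger distance, with f = d\mu/d\lambda and g = d\nu/d\lambda
d_H(\mu, \nu) = \Big( \int \big( \sqrt{f} - \sqrt{g} \big)^2 \, d\lambda \Big)^{1/2}

% Prokhorov metric, with B^{\varepsilon} = \{ x : \inf_{y \in B} d(x, y) \le \varepsilon \}
d_P(\mu, \nu) = \inf \big\{ \varepsilon > 0 : \mu(B) \le \nu(B^{\varepsilon}) + \varepsilon
\ \text{for all Borel sets } B \big\}

% Discrepancy metric
d_D(\mu, \nu) = \sup_{\text{closed balls } B} \big| \mu(B) - \nu(B) \big|
```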

Lemma 2.

Given two probability measures and on the Separation metric and the Renyi divergence satisfy the following relation:


The function is negative on , therefore for any one has , hence

Proposition 3 ((Gibbs & Su, 2002)).

Given two probability measures and on , the Wasserstein metric and the Total Variation distance satisfy the following relation:
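In Gibbs & Su (2002), the relation between these two distances on a bounded metric space takes the form below. The symbols $d_W$, $d_{TV}$ and $\mathrm{diam}(\mathcal{Y})$ (the diameter of the underlying space) are assumed notation for this sketch.

```latex
d_W(\mu, \nu) \;\le\; \mathrm{diam}(\mathcal{Y}) \cdot d_{TV}(\mu, \nu),
\qquad
\mathrm{diam}(\mathcal{Y}) = \sup_{x, y \in \mathcal{Y}} d(x, y) .
```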