Statistically Robust Neural Network Classification

12/10/2019 ∙ by Benjie Wang, et al. ∙ University of Oxford

Recently there has been much interest in quantifying the robustness of neural network classifiers through adversarial risk metrics. However, for problems where test-time corruptions occur in a probabilistic manner, rather than being generated by an explicit adversary, adversarial metrics typically do not provide an accurate or reliable indicator of robustness. To address this, we introduce a statistically robust risk (SRR) framework which measures robustness in expectation over both network inputs and a corruption distribution. Unlike many adversarial risk metrics, which typically require separate applications on a point-by-point basis, the SRR can easily be directly estimated for an entire network and used as a training objective in a stochastic gradient scheme. Furthermore, we show both theoretically and empirically that it can scale to higher-dimensional networks by providing superior generalization performance compared with comparable adversarial risks.


1 Introduction

Since the discovery of the phenomenon of adversarial examples for neural networks (Szegedy et al., 2014; Goodfellow et al., 2015; Papernot et al., 2016), a variety of approaches for assessing and mitigating their practical impact on decision-making systems have been proposed (Gu and Rigazio, 2015; Moosavi-Dezfooli et al., 2016; Madry et al., 2018). Much of this work has focused on the formal verification of properties of neural network classifiers, such as the robustness of individual decisions under an $\ell_p$-norm perturbation set (Majumdar and Kuncak, 2017; Gehr et al., 2018; Wang et al., 2018), typically doing this in an input-specific manner.

However, in practice, one is usually concerned with the overall robustness of the network, that is, its robustness across the range of possible inputs that it will see at test-time, something which is typically far more difficult to characterize and calculate. This has motivated network-wide robustness definitions, such as the average minimal adversarial perturbation (Fawzi et al., 2018) and adversarial risk/accuracy (Madry et al., 2018), as well as training schemes aiming to produce networks which perform well under these metrics (Wong and Kolter, 2018; Madry et al., 2018).

Motivated by applications in which an explicit adversary is working against the network, these metrics generally focus on worst-case robustness; that is, they are based on the largest loss within the perturbed region. However, there are several reasons why this worst-case focus may not always be preferable.

Firstly, there are many applications where one is concerned about robustness to naturally occurring or random perturbations of the inputs, rather than an explicit test-time adversary. For example, in many safety-critical applications like self-driving cars, we may not have access to the exact inputs due to sensor imperfections and wish to ensure our predictions are robust to such variations. Here our classifier must account for these variations, but some level of risk will usually be acceptable: it will typically be neither feasible nor necessary to guarantee there are no possible adversarial inputs, but we instead wish to evaluate the probability of there being such an input.

Secondly, previous work has shown that worst-case robustness metrics can have very poor generalization from train to test time (Schmidt et al., 2018; Yin et al., 2019), both from a theoretical perspective and in practice for real networks, thereby substantially reducing their applicability.

Finally, whereas the motivation for requiring worst-case robustness for individual inputs is often clear, it is more difficult to motivate using worst-case robustness for the classifier as a whole. Specifically, because there will typically be some randomness in the exact inputs observed at test-time, our classifier can only be perfectly worst-case robust if it is robust to all possible perturbations of all possible inputs, something which will very rarely be achievable for practical problems.

To address these issues, we build on the statistical robustness work of Webb et al. (2019), which quantifies the expected robustness of an individual decision under an input perturbation distribution, to construct a framework for characterizing the overall robustness of a network in a statistical manner. Specifically, we introduce the notion of a statistically robust risk (SRR), a class of metrics that can be used to assess the overall robustness of a neural network classifier by averaging a loss function over both possible inputs and an input perturbation distribution.

Unlike worst-case robustness approaches, our SRR framework naturally applies at a network-wide level due to the law of total expectation. We demonstrate that the SRR can differ significantly from both the corresponding natural (i.e., non-robust) and adversarial (i.e., worst-case) risks, and as such provides a unique metric for both training and testing networks that helps ensure robustness to probabilistic input perturbations. We provide theoretical and empirical evidence that the SRR has superior generalization performance to the corresponding adversarial risk, particularly in high dimensions, with our bound on the SRR generalization error scaling with network size in the same way as for the natural risk, whereas the corresponding adversarial complexity carries an explicit dependence on the input dimension. This suggests that it may be possible to obtain statistically robust networks in a wide range of applications where adversarial robustness is still elusive or inappropriate.

2 Background

2.1 Adversarial Examples

Although the general concept of adversarial examples (a perturbed input data point that is classified poorly) is well understood, the precise definition is often left implicit in the literature, despite several distinct versions being in use. To formalize this, let $f_\theta$ represent the classifier (with parameters $\theta$) and $f^*$ a hypothetical ground-truth “reference” classifier. Let $x$ be the original input point and $x'$ the perturbed input point. Then at least three different definitions are commonly used (Diochnos et al., 2018):

  • Prediction change (PC): $f_\theta(x') \neq f_\theta(x)$;

  • Corrupted instance (CI): $f_\theta(x') \neq f^*(x)$;

  • Error region (ER): $f_\theta(x') \neq f^*(x')$.

For pointwise robustness metrics (Szegedy et al., 2014; Webb et al., 2019), we are usually concerned with cases where the original point was classified correctly and so the PC and CI definitions coincide. However, the distinction is important when we are working with overall metrics over multiple data points (some of which will be incorrectly classified). Unless otherwise explicitly stated, we will take the CI definition in this paper.
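The distinction can be made concrete with a small helper function; this is only an illustrative sketch, where `f` and `f_star` are assumed to return hard class labels.

```python
def is_adversarial(f, f_star, x, x_prime, definition="CI"):
    """Check whether x_prime is adversarial for classifier f under the PC/CI/ER definitions."""
    if definition == "PC":   # prediction change
        return f(x_prime) != f(x)
    if definition == "CI":   # corrupted instance (the default definition in this paper)
        return f(x_prime) != f_star(x)
    if definition == "ER":   # error region
        return f(x_prime) != f_star(x_prime)
    raise ValueError(f"Unknown definition: {definition}")
```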

2.2 Adversarial Risk

The risk of a classifier is a measure of its average performance with respect to the data distribution:

$R(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\mathcal{L}(x, y; \theta)\right]$   (1)

where $(x, y)$ is an input/target pair, $\mathcal{D}$ is the true data generating distribution, and $\mathcal{L}$ is some loss function. For non-robust risks, $\mathcal{L}$ can usually be written in the form $\mathcal{L}(x, y; \theta) = L(f_\theta(x), y)$ for a pointwise loss $L$; we term the resulting risk the natural risk. For example, the 0-1 and cross-entropy losses:

$L_{0\text{-}1}(f_\theta(x), y) = \mathbb{1}\{\arg\max_c f_\theta(x)_c \neq y\}$   (2)
$L_{\mathrm{CE}}(f_\theta(x), y) = -\log f_\theta(x)_y$   (3)

To model the effect of an adversary limited to additively perturbing inputs by a vector $\delta$ within a limited set $\Delta$ (e.g., an $\ell_\infty$-ball), the adversarial risk is defined as (Madry et al., 2018)

$R_{\mathrm{adv}}(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\max_{\delta \in \Delta} L(f_\theta(x + \delta), y)\right]$   (4)

which is in fact a form of risk (1) with loss function $\mathcal{L}(x, y; \theta) = \max_{\delta \in \Delta} L(f_\theta(x + \delta), y)$. When the 0-1 loss function is used, this is known as adversarial accuracy.

Empirical versions $\hat{R}(\theta)$ and $\hat{R}_{\mathrm{adv}}(\theta)$ of both definitions can be obtained by replacing the expectation over $\mathcal{D}$ with a Monte Carlo average over a training dataset $S = \{(x_i, y_i)\}_{i=1}^{n}$. The standard training of a classifier then corresponds to solving the optimization problem $\min_\theta \hat{R}(\theta)$. For adversarial risks, this corresponds to a class of well-known minimax problems called robust optimization (Ben-Tal et al., 2009).

Methods for solving this optimization problem typically include a subroutine for approximating the inner maximization before performing training on the neural network parameters $\theta$. Adversarial training (Goodfellow et al., 2015; Kurakin et al., 2018; Madry et al., 2018) provides a lower bound on the inner maximization by using gradient-based methods to generate maximally adversarial examples. Other methods instead upper bound the inner maximization by approximating the neural network using convex relaxations before performing the outer minimization on that structure (Wong and Kolter, 2018; Raghunathan et al., 2018).
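To make the inner maximization concrete, here is a minimal sketch of a PGD-style attack in PyTorch; the step size, number of steps, and the `model` and `loss_fn` objects are illustrative assumptions rather than the exact procedure used by any of the cited works.

```python
import torch

def pgd_perturb(model, loss_fn, x, y, eps=0.1, step_size=0.02, n_steps=7):
    """Approximate max over ||delta||_inf <= eps of loss_fn(model(x + delta), y)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(n_steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()  # gradient ascent step on the loss
            delta.clamp_(-eps, eps)                 # project back onto the l_inf ball
        delta.grad.zero_()
    return (x + delta).detach()
```

In full adversarial training, the model's parameter gradients accumulated during these attack steps would be zeroed before the outer minimization step on $\theta$.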

2.3 Statistical Robustness

Webb et al. (2019) recently introduced a statistical robustness metric that provides a statistical alternative to formal verification of neural network properties. They model inputs using an input distribution, $p(x)$, and define their metric to be the probability of a property violation under this distribution. This provides more information than standard verification about the network’s robustness: if no input can violate the property, then the probability is 0, whereas if violations exist, the metric indicates how likely the event of violation is.

Formally, they define their robustness metric using:

  • A property function $s(x; f_\theta, \phi)$ defined over the input space $\mathcal{X}$, where $f_\theta$ is the neural network and $\phi$ denotes problem-specific parameters (e.g., the true label). This is designed such that the event $s(x) \ge 0$ represents a violation of the property, which could, for example, represent the misclassification of the target.

  • An input distribution, $p(x)$, typically taking the form of a perturbation distribution $p(x' \mid x)$ around a nominal input $x$.

Their statistical robustness metric is then given by the integral (suppressing the dependence of $s$ on $f_\theta$ and $\phi$)

$\mathcal{I}[s, p] = \int \mathbb{1}\{s(x) \ge 0\}\, p(x)\, dx = \mathbb{P}_{x \sim p}\left(s(x) \ge 0\right)$   (5)

An important application of this method is to pointwise robustness for neural network classifiers. Standard verification schemes target the binary 0-1 metric on whether an adversarial example exists in a perturbation region around a point. On the other hand, the statistical robustness metric can determine the probability of drawing an adversarial example from some perturbation distribution.

Because $\{s(x) \ge 0\}$ often constitutes a (potentially very) rare event, numerically calculating (5) can be challenging. To address this, Webb et al. (2019) further introduce an estimation approach based around adaptive multi-level splitting (AMLS) (Guyader et al., 2011) and show that this can effectively estimate (5) even for large networks and high-dimensional input spaces.
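For intuition, the simplest possible estimator of (5) is plain Monte Carlo, sketched below under the assumption of user-supplied `property_fn` and `sample_perturbation` callables; it is exactly this naive estimator that AMLS improves upon when violations are rare.

```python
def statistical_robustness_mc(property_fn, sample_perturbation, n_samples=10_000):
    """Naive Monte Carlo estimate of (5): P(s(x') >= 0) under the perturbation distribution.

    property_fn(x') returns s(x'), with s(x') >= 0 indicating a property violation;
    sample_perturbation() draws x' from the input/perturbation distribution.
    """
    violations = sum(property_fn(sample_perturbation()) >= 0 for _ in range(n_samples))
    return violations / n_samples
```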

3 Statistically Robust Metrics

Despite the statistical robustness framework introduced in the last section having numerous attractive properties for quantifying neural network robustness, Webb et al. (2019) only consider using it for assessing pointwise robustness. We now show how it can be straightforwardly extended to characterize the robustness of a whole network, and, in turn, how this can be adapted to a notion of statistically robust risk, which, unlike the approach of Webb et al. (2019), is suitable for network training as well as evaluating existing networks.

3.1 The Total Statistical Robustness Metric

The statistical robustness metric can be straightforwardly extended to a metric for the whole network by simply averaging the pointwise metric over the data generating distribution. This allows us to define the total statistical robustness metric (TSRM) as

$\mathcal{I}_{\mathrm{total}} = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\int \mathbb{1}\{s(x') \ge 0\}\, p(x' \mid x)\, dx'\right]$   (6)

where $\mathcal{D}$ is the true data generating distribution, $p(x' \mid x)$ is the perturbation distribution around a point (which will typically be additive, i.e., $x' = x + \delta$ with $\delta$ drawn from some noise distribution), and the dependence of $s$ on the label $y$ is again suppressed. Intuitively, $\mathcal{I}_{\mathrm{total}}$ can be thought of as the probability that the property will be violated for an arbitrary input point and perturbation at test time, e.g., the probability the target will be misclassified.

Given the sophisticated Monte Carlo machinery required to accurately estimate the pointwise statistical robustness metric, it may at first seem that $\mathcal{I}_{\mathrm{total}}$ will be impractically difficult to estimate. Indeed, if one were to draw data points at random and separately run the estimation approach of Webb et al. (2019) for each, this would be the case. However, there are two key factors which mean that this can be avoided. Firstly, the law of total expectation means we can treat (6) as a single expectation over the joint distribution of $(x, y, x')$. Secondly, $\mathcal{I}_{\mathrm{total}}$ is typically dominated by a small subset of the inputs for which the pointwise statistical robustness is large. For these points, $\{s(x') \ge 0\}$ is typically not an especially rare event and can thus be estimated efficiently by simple Monte Carlo, thereby requiring substantially less computational effort. (Though we did also develop an AMLS method specifically adapted to estimating $\mathcal{I}_{\mathrm{total}}$, we found in practice that this was typically less efficient than the simple Monte Carlo estimator and therefore unnecessary.) Therefore, perhaps surprisingly, $\mathcal{I}_{\mathrm{total}}$ can typically be accurately estimated with comparable, and often noticeably less, computation than a single instance of the pointwise metric.
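A sketch of this simple Monte Carlo estimator of (6), treating it as a single expectation over data points and perturbations; the `dataset`, `perturb`, and `violated` callables are assumptions for illustration.

```python
def estimate_tsrm(dataset, perturb, violated, n_perturb=32):
    """Simple Monte Carlo estimate of the TSRM in (6).

    dataset yields (x, y) pairs drawn from the data distribution, perturb(x) draws
    x' ~ p(x'|x), and violated(x_prime, y) returns True when the property is violated
    (e.g. the target is misclassified).
    """
    total, count = 0.0, 0
    for x, y in dataset:
        for _ in range(n_perturb):
            total += float(violated(perturb(x), y))
            count += 1
    return total / count
```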

3.2 Statistically Robust Risk

The TSRM forms a useful metric for already trained networks, yet it is not suitable as an objective for training due to the difficulty of taking gradients through the indicator function. Furthermore, it only applies to cases where we wish to estimate a probability of failure, rather than a more general loss.

To address these issues, we note that the TSRM can be thought of as a specific risk and thus generalized to

$R_{\mathrm{SRR}}(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\mathbb{E}_{x' \sim p(x' \mid x)}\left[L(f_\theta(x'), y)\right]\right]$   (7)

where $L$ represents a natural, pointwise loss function as per Section 2.2 (in certain cases, we may further require $L$ to also take $x$ directly as an input; this potential dependency is not problematic and is omitted simply for notational clarity). We refer to $R_{\mathrm{SRR}}(\theta)$ as the statistically robust risk (SRR). The link to the TSRM is straightforward, with the TSRM constituting the special case of $L(f_\theta(x'), y) = \mathbb{1}\{s(x') \ge 0\}$. Note that the SRR corresponds to using the loss function

$\mathcal{L}_{\mathrm{SRR}}(x, y; \theta) = \mathbb{E}_{x' \sim p(x' \mid x)}\left[L(f_\theta(x'), y)\right]$   (8)

Critically, the SRR allows us to use differentiable losses, $L$, such that it can form an objective for robust neural network training using stochastic gradient descent. Namely, by exploiting the law of total expectation, we can draw data points $(x_i, y_i)$ from the training data, draw corresponding sample perturbations $x'_i \sim p(x' \mid x_i)$, and then update the network using the unbiased gradient updates

$\theta \leftarrow \theta - \eta\, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta L(f_\theta(x'_i), y_i),$   (9)

where $\eta$ is the learning rate,

noting that this is equivalent to conventional training but with the inputs randomly perturbed.
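A minimal PyTorch sketch of the resulting training loop, assuming (purely for illustration) an additive perturbation distribution that is uniform on an $\ell_\infty$-ball of radius `eps`; any other choice of $p(x' \mid x)$ could be substituted.

```python
import torch

def corruption_training_epoch(model, loader, optimizer, loss_fn, eps=0.1):
    """One epoch of SRR ("corruption") training: ordinary SGD on randomly perturbed inputs."""
    model.train()
    for x, y in loader:
        delta = (2 * torch.rand_like(x) - 1) * eps   # x' = x + delta, delta ~ Uniform[-eps, eps]^d
        optimizer.zero_grad()
        loss = loss_fn(model(x + delta), y)          # unbiased estimate of the SRR in (7)
        loss.backward()                              # so its gradient matches (9) in expectation
        optimizer.step()
```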

The SRR further provides a mechanism for linking statistical robustness back to the conventional notions of natural and adversarial risk, as well as a basis for theoretical analysis (see Section 4). For example, we see that the natural risk can be viewed as a special case of the SRR for which $p(x' \mid x)$ collapses to a Dirac delta measure about $x$. On the other hand, in the additive perturbation case, that is where $x' = x + \delta$, we see that when the support of the noise distribution is contained within the perturbation set $\Delta$, the SRR, $R_{\mathrm{SRR}}(\theta)$, is upper bounded by the adversarial risk, $R_{\mathrm{adv}}(\theta)$, since $\mathbb{E}_{\delta}\left[L(f_\theta(x + \delta), y)\right] \le \max_{\delta \in \Delta} L(f_\theta(x + \delta), y)$.

At a high-level, training using the SRR has the effect of “smoothing” the decision boundaries relative to using the corresponding natural risk. This can be useful when we want to be sensitive to certain classes or events, as it allows us to train our classifier to take conservative actions when the input is close to potentially problematic regions. For example, a self-driving car needs to ensure it avoids false negatives when predicting the presence of a pedestrian in its path.

4 Theoretical Generalization Analysis

Though adversarial training is effective for reducing the adversarial risk of neural networks on MNIST, success has been limited in scaling up to higher-dimensional datasets such as CIFAR-10. Schmidt et al. (2018) show that this is due to a generalization gap, whereby it is possible to achieve an adversarial accuracy of 97% on the training set, yet just 47% on the test set. This overfitting is in contrast to the natural case where well-tuned networks of sufficient capacity rarely overfit on CIFAR-10. We explore whether this gap still holds for our total statistical robustness metric through the lens of Rademacher complexity.

Let $R(\theta)$ be the risk of a classifier under a distribution $\mathcal{D}$ and $\hat{R}(\theta)$ the corresponding empirical risk for a dataset $S = \{(x_i, y_i)\}_{i=1}^{n}$. It is well known in statistical learning theory that we can probabilistically upper bound the generalization error $R(\theta) - \hat{R}(\theta)$ of a learning algorithm using notions of complexity on the admissible set of classifiers and loss functions (Shalev-Shwartz and Ben-David, 2014). Intuitively, if the admissible set of functions is less complex, then there is less capacity to overfit to the training data.

To be more precise, we define the empirical Rademacher complexity (ERC) for a function class $\mathcal{F}$ and a sample $S = \{z_1, \ldots, z_n\}$ of size $n$ with elements in $\mathcal{Z}$ to be (Shalev-Shwartz and Ben-David, 2014):

$\mathcal{R}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(z_i)\right],$   (10)

where $\sigma_1, \ldots, \sigma_n$ are independent Rademacher random variables, which take either the value $+1$ or $-1$, each with probability $1/2$. Intuitively, this measures the complexity of the class $\mathcal{F}$ by determining how many different ways functions in $\mathcal{F}$ can classify the sample $S$.
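To make (10) concrete, the toy sketch below estimates the ERC by Monte Carlo over the Rademacher variables for a small finite class of one-dimensional threshold classifiers; both the class and the sample are illustrative assumptions, since for real network classes the supremum must instead be bounded analytically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy function class: threshold classifiers f_t(z) = sign(z - t), outputting +/-1.
thresholds = np.linspace(-1.0, 1.0, 21)
sample = rng.uniform(-1.0, 1.0, size=50)                     # sample S of size n = 50
values = np.sign(sample[None, :] - thresholds[:, None])      # f(z_i) for every f and z_i

# Monte Carlo estimate of E_sigma[ sup_f (1/n) sum_i sigma_i f(z_i) ] from (10).
n_mc = 2000
n = sample.shape[0]
erc = np.mean([np.max(values @ rng.choice([-1.0, 1.0], size=n)) / n for _ in range(n_mc)])
print(f"Estimated empirical Rademacher complexity: {erc:.3f}")
```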

Using the ERC and defining the loss function class to be $\ell_{\mathcal{F}} = \{(x, y) \mapsto \mathcal{L}(x, y; \theta) : \theta \in \Theta\}$, we can now bound the generalization error using the following theorem from Yin et al. (2019):

Theorem 1.

Suppose $0 \le \mathcal{L}(x, y; \theta) \le B$ for all $(x, y)$ and $\theta$. Suppose further that the samples $\{(x_i, y_i)\}_{i=1}^{n}$ are i.i.d. from a distribution $\mathcal{D}$. Then for any $\delta \in (0, 1)$, with probability at least $1 - \delta$ over the draw of the samples, we have, for all $\theta$,

$R(\theta) \le \hat{R}(\theta) + 2\, \mathcal{R}_S(\ell_{\mathcal{F}}) + 3B\sqrt{\frac{\log(2/\delta)}{2n}},$   (11)

where $R(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\mathcal{L}(x, y; \theta)\right]$ is the risk and

$\hat{R}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}(x_i, y_i; \theta)$   (12)

is the empirical risk.

This bound is probabilistic, data-dependent, and uniform over all $\theta$. This means it holds for all $\theta$, including those trained on the dataset $S$. To take advantage of it, we need to be able to compute $\mathcal{R}_S(\ell_{\mathcal{F}})$. The Rademacher complexity of the set of neural network classifiers $\mathcal{F}$ can be upper bounded (Bartlett et al., 2017; Yin et al., 2019) by an expression which is $\mathcal{O}(\sqrt{\log \bar{d}})$, where $\bar{d}$ is the maximal number of nodes in a single layer. Thus we simply need to relate $\mathcal{R}_S(\ell_{\mathcal{F}})$ to $\mathcal{R}_S(\mathcal{F})$.

First, let us consider the natural case, for which $\mathcal{L}(x, y; \theta) = L(f_\theta(x), y)$. If $L$ is $C_L$-Lipschitz in its first argument, we can use the Talagrand contraction lemma (Ledoux and Talagrand, 2013), which gives that $\mathcal{R}_S(\ell_{\mathcal{F}}) \le C_L\, \mathcal{R}_S(\mathcal{F})$. Thus, substituting this inequality into (11), we have

$R_{\mathrm{nat}}(\theta) \le \hat{R}_{\mathrm{nat}}(\theta) + 2 C_L\, \mathcal{R}_S(\mathcal{F}) + 3B\sqrt{\frac{\log(2/\delta)}{2n}},$   (13)

such that our generalization error bound scales as $\mathcal{O}(\sqrt{\log \bar{d}})$ in the network size.

We now introduce an analogous result for the SRR.

Theorem 2.

For any $\delta \in (0, 1)$, with probability at least $1 - \delta$ and for all $\theta$,

$R_{\mathrm{SRR}}(\theta) \le \hat{R}_{\mathrm{SRR}}(\theta) + 2 C_L\, \mathcal{R}_S(\mathcal{F}) + 3B\sqrt{\frac{\log(2/\delta)}{2n}},$   (14)

where $C_L$ and $\mathcal{R}_S(\mathcal{F})$ are equal to their values in the natural risk case. In other words, the generalization error for the SRR is upper bounded by an expression with the same $\mathcal{O}(\sqrt{\log \bar{d}})$ dependence on the network size.

Proof.

Firstly, using the law of total expectation, we can rewrite the SRR in the form of a single expectation over $(x, y, x')$:

$R_{\mathrm{SRR}}(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D},\; x' \sim p(x' \mid x)}\left[L(f_\theta(x'), y)\right].$   (15)

Notice that this takes the form of a natural risk on an extended space, with natural loss function $L(f_\theta(x'), y)$. Since this loss does not depend explicitly on $x$, we can further rewrite it as an expectation over $(x', y)$:

$R_{\mathrm{SRR}}(\theta) = \mathbb{E}_{(x', y) \sim \mathcal{D}'}\left[L(f_\theta(x'), y)\right],$   (16)

where $\mathcal{D}'(x', y) = \int p(x' \mid x)\, \mathcal{D}(x, y)\, dx$. This is now precisely in the natural form with $\mathcal{D}'$ replacing $\mathcal{D}$. Moreover, as $C_L$ and the bound on $\mathcal{R}_S(\mathcal{F})$ depend only on the loss $L$ and the network class $\mathcal{F}$, not on the data distribution, their values are unchanged from the natural case and so we can directly invoke (13). ∎

In contrast, for the adversarial risk (in binary classification), where $\mathcal{L}(x, y; \theta) = \max_{\delta \in \Delta} L(f_\theta(x + \delta), y)$, the complexity $\mathcal{R}_S(\ell_{\mathcal{F}})$ is lower bounded by an expression containing an explicit $\sqrt{d}$ dependence, where $d$ is the dimension of the input layer to the NN (Yin et al., 2019). While this lower bound does not allow us to directly bound the generalization error using Equation (13), it does suggest that in high dimensions the adversarial generalization error can be much greater than the natural and statistically robust generalization errors.

This indicates it will typically be difficult to train networks that are adversarially robust at test time for high-dimensional datasets. On the other hand, our analysis shows that statistically robust networks may be easier to obtain.

5 Related Work

The concept of training neural networks with randomly perturbed inputs is, of course, not new (Plaut et al., 1986; Elman and Zipser, 1988; Sietsma and Dow, 1991; Holmstrom and Koistinen, 1992; Minnix, 1992; An, 1996). For example, Elman and Zipser (1988) show how the accuracy of neural networks trained to classify phonemes is improved by perturbing the inputs with uniform noise normalized by the scale of the inputs. They note that training with noise performs a form of data augmentation, smoothing the decision boundaries, and thus improving the model’s generalization to new inputs. Webb (1994) and Bishop (1995), meanwhile, make explicit the connection between training with perturbed inputs and regularization, proving that for the sum-of-squares loss function, training with noise is approximately equal to training with Tikhonov regularization. Other work has investigated training neural networks by perturbing other components of the neural network such as its weights (An, 1996; Jim et al., 1996; Graves et al., 2013), targets (Szegedy et al., 2016; Vaswani et al., 2017), and gradients (Neelakantan et al., 2015), with a similar motivation to input perturbation. Modern widely used techniques such as dropout (Srivastava et al., 2014; Wager et al., 2013) can also be interpreted as regularization by the addition of noise.

Our work differs from these previous approaches in that training with noise emerges from a principled risk minimization framework, rather than being taken as the starting point of algorithmic development with a justification ex post facto. Moreover, we use input perturbations not only during training but also as a means of evaluating the robustness of the classifier at test-time. We have also drawn novel connections and comparisons between existing adversarial/robustness methods and probabilistic input perturbations, providing conceptual, theoretical, and (later) empirical arguments for why the latter is an important component in the greater arsenal of robust classification approaches. We further highlight that our framework is more general than the simple addition of Gaussian or uniform noise to the inputs of a standard loss function, in that both the perturbation distribution and loss function can be crafted to incorporate prior knowledge about the task at hand.

6 Experiments

To empirically investigate our SRR framework, we now present a series of experiments comparing it with natural and adversarial approaches. For training using the SRR, we follow the approach from Section 3.2, generating perturbations to points in the training dataset and then using a mini-batch version of the gradient update in (9).

Unless otherwise stated, we train using the cross entropy loss,

$L_{\mathrm{CE}}(f_\theta(x'), y) = -\log f_\theta(x')_y,$   (17)

referring to training on the resulting SRR as corruption training. To maintain consistency with natural training settings, we then evaluate using the 0-1 loss function

$L_{0\text{-}1}(f_\theta(x'), y) = \mathbb{1}\{\arg\max_c f_\theta(x')_c \neq y\}.$   (18)

We refer to the resulting SRR as Accuracy-TSRM (A-TSRM), since this is a version of (6) where the violation event corresponds to misclassification, i.e., $\arg\max_c f_\theta(x')_c \neq y$.
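A sketch of how the A-TSRM could be evaluated on a test set, instantiating (6) with the 0-1 loss (18); the `perturb` callable (e.g., the uniform-ball corruption below) and the number of perturbation draws are illustrative assumptions, not the exact evaluation code used for the experiments.

```python
import torch

@torch.no_grad()
def a_tsrm(model, loader, perturb, n_perturb=16):
    """A-TSRM: fraction of randomly perturbed test inputs that are misclassified."""
    errors, total = 0, 0
    model.eval()
    for x, y in loader:
        for _ in range(n_perturb):
            preds = model(perturb(x)).argmax(dim=1)
            errors += (preds != y).sum().item()
            total += y.numel()
    return errors / total
```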

Figure 1: A-TSRM computed on the MNIST test set over a range of values of the perturbation radius $\epsilon$. Each line represents a different network, trained either naturally or with corruptions.

6.1 Comparison to Natural Accuracy

In this experiment, we show that naturally trained networks are vulnerable under the A-TSRM metric and that corruption training can alleviate this vulnerability. Specifically, we examine the A-TSRM for naturally trained and corruption trained neural network classifiers on MNIST, using corruption models given by the uniform distribution on $\ell_\infty$-balls of radius $\epsilon$.

We use a dense ReLU network with an input layer of size 784, a hidden layer of size 256, and an output layer of size 10. We train using 5 different methods: natural training (i.e., $\epsilon = 0$) and corruption training with four different values of $\epsilon$. We then evaluate on A-TSRM using a range of values of $\epsilon$, from values below the level of discretization in the pixel values (and thus invisible) to values so large that the images become difficult for even humans to classify.
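As a concrete illustration of this setup, the architecture and corruption model might be written as follows; the clamping of corrupted pixels to [0, 1] is an assumption for the sketch, not a detail stated above.

```python
import torch
import torch.nn as nn

# Dense ReLU classifier with the 784-256-10 structure described above.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

def uniform_ball_corruption(x, eps):
    """Draw x' uniformly from the l_inf ball of radius eps around x."""
    delta = (2 * torch.rand_like(x) - 1) * eps
    return (x + delta).clamp(0.0, 1.0)   # clamp to the valid pixel range (assumption)
```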

The results, shown in Figure 1, provide several interesting insights. Most notably, corruption training led to reduced values for the A-TSRM compared to natural training for all values of $\epsilon$ equal to or larger than the value used in training (with one exception, where a corruption-trained network did slightly worse at one particular evaluation $\epsilon$). These gains are sometimes more than an order of magnitude in size, confirming that, not only is the A-TSRM highly distinct from natural accuracy, but training using an appropriate SRR can provide significant A-TSRM benefits at test-time. Though this comes at the cost of reduced test-time accuracy for this example (as highlighted by evaluations at small values of $\epsilon$), we will later find that this is not always the case.

Evaluation Metric        Natural      Corruption (ε₁)   Corruption (ε₂)   PGD (ε₁)
Natural                  98.7/92.9    –                 –                 –
A-TSRM, ε₁               –            98.1/92.4         –                 –
A-TSRM, ε₂               –            –                 –/89.9            –
Adversarial, ε₁          –            –                 –                 –/40.1
Table 1: Train/test set evaluations of different networks on CIFAR-10, scores given in % and averaged over 5 runs. The best test set performance for each evaluation metric is highlighted in bold.

6.2 Empirical Generalization Error

As previously noted, it has proved challenging to train networks to achieve high test-time adversarial accuracy on higher-dimensional datasets such as CIFAR-10 due to poor generalization from training. By contrast, our analysis in Section 4 suggests that the generalization gap for SRR approaches will be closer to that of natural accuracy. We thus investigate the generalization gap experimentally for A-TSRM on CIFAR-10. Additionally, we compare corruption training with the PGD adversarial training method of Madry et al. (2018), which is designed to maximize adversarial accuracy.

We use a wide residual network architecture (Zagoruyko and Komodakis, 2016) with depth 28, widening factor 10, and dropout rate 0.3. We train using four different methods: natural training, corruption training with radii $\epsilon_1$ and $\epsilon_2$, and PGD adversarial training (7 gradient steps) with $\epsilon_1$. Correspondingly, we then evaluate these networks using natural accuracy, A-TSRM with $\epsilon_1$ and $\epsilon_2$, and adversarial accuracy with $\epsilon_1$ (computed using 7-step PGD). Here $\epsilon_1$ corresponds roughly to 8/255 in pixel values, which is used as the corruption set by Madry et al. (2018) for adversarial training, while $\epsilon_2$ represents a more extreme corruption model.

We report the train/test scores after 30 epochs (averaged over 5 runs) in Table 1. Though we run each training process for the same number of epochs, adversarial training took significantly longer than natural or corruption training (several times longer overall).

On all of the natural and A-TSRM evaluation metrics and all training methods, we observe a fairly small generalization gap, in line with the natural/natural generalization gap. This compares very favorably to adversarial accuracy, for which we see a much larger gap for the PGD trained network (and where Schmidt et al. (2018) eventually found an even larger generalization gap when running for more epochs on the same architecture). This provides support for our theoretical analysis suggesting that A-TSRM (statistical robustness) generalizes better than adversarial accuracy for higher-dimensional datasets.

For the natural and A-TSRM metrics, we also notice that the best test-set performance was achieved using the corresponding natural/corruption training method. The drop in performance from training using larger values of $\epsilon$ than those used in the evaluation was far less than in the reciprocal case. For example, corruption training with $\epsilon_2$ had a test natural accuracy only slightly below that of natural training, but natural training had a far worse A-TSRM than corruption training.

Interestingly, PGD (with $\epsilon_1$) was also found to be fairly effective in improving the A-TSRM, recording consistently good test-set values comparable to corruption training. However, as previously mentioned, PGD incurs a large computational cost, due both to the gradient-step computations and to the relatively slow training (with respect to epochs) caused by the harder task of improving adversarial accuracy. Thus, if we are concerned with optimizing the A-TSRM metric, corruption training is an efficient and effective training scheme, while adversarial methods can also be effective but may be much slower.

Figure 2: Learning curves using the weighted cross-entropy loss on MNIST with the perturbation distribution described in Section 6.3. Columns: natural training (left), SRR training (right); rows: natural evaluation (top), SRR evaluation (bottom). For the bottom left plot, note the different y-axis scaling, and that the training set SRR almost coincides with the test set SRR and thus is not visible.

6.3 Tailored Loss Functions

In risk frameworks, we often wish to tailor the loss function to better represent a particular problem. In particular, the cost associated with misclassifications may vary for different true labels and different predictions. For example, a self-driving car predicting the road is clear when there is actually a pedestrian will be far more damaging than predicting there is a pedestrian when the road is clear. The SRR can be particularly useful in such situations, as networks need to be robust to noise in their inputs to fully incorporate all uncertainty present in the decision making process.

To provide a concrete example, we consider training and evaluating using a weighted cross-entropy loss

$L_{\mathrm{WCE}}(f_\theta(x'), y) = -w_y \log f_\theta(x')_y,$   (19)

where the weights $w_y$ reflect the relative importance of making accurate predictions for different values of the true label $y$. For example, by taking $w_c \gg 1$ for a particular problem class $c$ and $w_y = 1$ for the others, the classifier will be heavily penalized if it fails to correctly identify with high confidence all occurrences of $c$. In turn, this heavy penalty can increase the sensitivity to perturbations in the inputs: we do not want our classifier to confidently predict that $y \neq c$ if our input is close to points for which $y = c$, as this risks incurring the heavy penalty if our inputs are noisy or our classifier is imperfect.

To assess the efficacy of the SRR compared with the natural risk in this setting, we consider training on MNIST using the same architecture as in Experiment 6.1, but with a weighted CE loss where $w_y$ is large for one particular digit class and $w_y = 1$ otherwise, i.e., penalizing classifiers which fail to confidently identify images of that figure. We also change the perturbation distribution from those used earlier, taking an isotropic Gaussian $p(x' \mid x) = \mathcal{N}(x'; x, \sigma^2 I)$ with a fixed standard deviation $\sigma$.
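A sketch of a weighted cross-entropy loss of the form (19) in PyTorch; the particular heavily weighted class and weight value below are placeholders for illustration, since the exact settings are not reproduced here.

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, heavy_class=8, heavy_weight=100.0, n_classes=10):
    """Per-batch mean of -w_y * log softmax(logits)_y, as in (19)."""
    weights = torch.ones(n_classes, device=logits.device)
    weights[heavy_class] = heavy_weight                         # heavily penalise missing this class
    per_example = F.cross_entropy(logits, targets, weight=weights, reduction="none")
    return per_example.mean()
```

Used in place of `loss_fn` in the corruption training loop sketched in Section 3.2, this directly yields SRR training for the weighted loss.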

The results, shown in Figure 2, exhibit several interesting traits. Firstly, we see in the bottom left plot that natural training is extremely vulnerable to noisy input perturbations for this problem, producing SRR values at both train and test time that are multiple orders of magnitude worse than those achieved with corruption training (bottom right). This highlights both the importance of considering the SRR at test time and its ability to provide effective robust training.

Secondly, we see that while training with natural risk very quickly overfits (top left), corruption training with the SRR provides far better generalization (bottom right). In fact, the final test SRR of the corruption trained network is far lower than the final test natural risk on the natural risk trained network, a powerful result given that the former is a corrupted version of the latter. Moreover, evaluating the corruption trained network, we achieve lower SRR risk (top right) than natural risk (top left). Thus even when the inputs are not corrupted, we can achieve better weighted cross entropy loss by artificially adding noise to both the training procedure and at test-time. This indicates that for the weighted-cross entropy loss, the SRR can provide robustness not only by accounting for potential input noise but also by better accounting for the imperfect nature of the network to avoid overconfidently dismissing the potential for a test-time datapoint to belong to the problem class.

7 Conclusions

Motivated by applications where test-time corruptions are not generated adversarially but probabilistically, we introduced a statistically robust risk (SRR) framework, providing a class of metrics for evaluating robust performance under probabilistic input perturbations that are amenable to efficient training. We showed that the SRR can differ significantly from both natural and adversarial risk, and that networks with low test-time SRRs can be achieved through training with corrupted inputs. Unlike adversarial risk, our results suggest statistically robust risk generalizes from the training data similarly to, and potentially even better than, natural risk, meaning that it has more general practical applicability to high-dimensional datasets and complex architectures. Thus, for probabilistic corruption threat settings, robust neural networks may be within reach for a wide range of applications by using SRRs.

Acknowledgements

TR gratefully acknowledges funding from Tencent AI Labs and a Junior Research Fellowship supported by Christ Church, Oxford.

Bibliography

  • An (1996) Guozhong An. The effects of adding noise during backpropagation training on a generalization performance. Neural Computation, 8(3):643–674, 1996.
  • Bartlett et al. (2017) Peter L. Bartlett, Dylan J. Foster, and Matus Telgarsky. Spectrally-normalized margin bounds for neural networks. In Proceedings of 31st International Conference on Neural Information Processing Systems (NeurIPS), pages 6241–6250, 2017.
  • Ben-Tal et al. (2009) A. Ben-Tal, L. El Ghaoui, and A.S. Nemirovski. Robust Optimization. Princeton Series in Applied Mathematics. Princeton University Press, October 2009.
  • Bishop (1995) Chris M Bishop. Training with noise is equivalent to tikhonov regularization. Neural computation, 7(1):108–116, 1995.
  • Diochnos et al. (2018) Dimitrios I. Diochnos, Saeed Mahloujifar, and Mohammad Mahmoody. Adversarial risk and robustness: General definitions and implications for the uniform distribution. In Proceedings of 32nd International Conference on Neural Information Processing Systems (NeurIPS), pages 10380–10389, 2018.
  • Elman and Zipser (1988) Jeffrey L Elman and David Zipser. Learning the hidden structure of speech. The Journal of the Acoustical Society of America, 83(4):1615–1626, 1988.
  • Fawzi et al. (2018) Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Analysis of classifiers’ robustness to adversarial perturbations. Machine Learning, 107(3):481–508, Mar 2018.
  • Gehr et al. (2018) Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. AI: Safety and robustness certification of neural networks with abstract interpretation. In Security and Privacy (SP), 2018 IEEE Symposium on, 2018.
  • Goodfellow et al. (2015) Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In Proceedings of 3rd International Conference on Learning Representations (ICLR), 2015.
  • Graves et al. (2013) Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6645–6649. IEEE, 2013.
  • Gu and Rigazio (2015) Shixiang Gu and Luca Rigazio. Towards deep neural network architectures robust to adversarial examples. In Proceedings of 3rd International Conference on Learning Representations (ICLR), 2015.
  • Guyader et al. (2011) Arnaud Guyader, Nicolas Hengartner, and Eric Matzner-Løber. Simulation and estimation of extreme quantiles and extreme probabilities. Applied Mathematics & Optimization, 64(2):171–196, 2011.
  • Holmstrom and Koistinen (1992) Lasse Holmstrom and Petri Koistinen. Using additive noise in back-propagation training. IEEE transactions on neural networks, 3(1):24–38, 1992.
  • Jim et al. (1996) Kam-Chuen Jim, C Lee Giles, and Bill G Horne. An analysis of noise in recurrent neural networks: convergence and generalization. IEEE Transactions on neural networks, 7(6):1424–1438, 1996.
  • Kurakin et al. (2018) Alex Kurakin, Dan Boneh, Florian Tramèr, Ian Goodfellow, Nicolas Papernot, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In Proceedings of 6th International Conference on Learning Representations (ICLR), 2018.
  • Ledoux and Talagrand (2013) Michel Ledoux and Michel Talagrand. Probability in Banach Spaces: isoperimetry and processes. Springer Science & Business Media, 2013.
  • Madry et al. (2018) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In Proceedings of 6th International Conference on Learning Representations (ICLR), 2018.
  • Majumdar and Kuncak (2017) Rupak Majumdar and Viktor Kuncak, editors. Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, volume 10426 of Lecture Notes in Computer Science. Springer, 2017.
  • Minnix (1992) Jay I Minnix. Fault tolerance of the backpropagation neural network trained on noisy inputs. In Proceedings of the IJCNN International Joint Conference on Neural Networks, volume 1, pages 847–852. IEEE, 1992.
  • Moosavi-Dezfooli et al. (2016) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2574–2582, 2016.
  • Neelakantan et al. (2015) Arvind Neelakantan, Luke Vilnis, Quoc V Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, and James Martens. Adding gradient noise improves learning for very deep networks. arXiv preprint arXiv:1511.06807, 2015.
  • Papernot et al. (2016) Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Proceedings of the IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.
  • Plaut et al. (1986) David C Plaut et al. Experiments on learning by back propagation. 1986.
  • Raghunathan et al. (2018) Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In Proceedings of 6th International Conference on Learning Representations (ICLR), 2018.
  • Schmidt et al. (2018) Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. In Proceedings of 32nd International Conference on Neural Information Processing Systems (NeurIPS), pages 5019–5031, 2018.
  • Shalev-Shwartz and Ben-David (2014) Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, New York, NY, USA, 2014. ISBN 1107057132, 9781107057135.
  • Sietsma and Dow (1991) Jocelyn Sietsma and Robert JF Dow. Creating artificial neural networks that generalize. Neural networks, 4(1):67–79, 1991.
  • Srivastava et al. (2014) Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research (JMLR), 15(1):1929–1958, 2014.
  • Szegedy et al. (2014) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In Proceedings of 2nd International Conference on Learning Representations (ICLR), 2014.
  • Szegedy et al. (2016) Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.
  • Wager et al. (2013) Stefan Wager, Sida Wang, and Percy S Liang. Dropout training as adaptive regularization. In Proceedings of 27th International Conference on Neural Information Processing Systems (NeurIPS), pages 351–359, 2013.
  • Wang et al. (2018) Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Efficient formal safety analysis of neural networks. In Proceedings of 32nd International Conference on Neural Information Processing Systems (NeurIPS), pages 6369–6379, 2018.
  • Webb (1994) Andrew R Webb. Functional approximation by feed-forward networks: a least-squares approach to generalization. IEEE Transactions on Neural Networks, 5(3):363–371, 1994.
  • Webb et al. (2019) Stefan Webb, Tom Rainforth, Yee Whye Teh, and M. Pawan Kumar. A Statistical Approach to Assessing Neural Network Robustness. In Proceedings of 7th International Conference on Learning Representations (ICLR), 2019.
  • Wong and Kolter (2018) Eric Wong and J. Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of 35th International Conference on Machine Learning (ICML), pages 5283–5292, 2018.
  • Yin et al. (2019) Dong Yin, Ramchandran Kannan, and Peter Bartlett. Rademacher complexity for adversarially robust generalization. In Proceedings of 36th International Conference on Machine Learning (ICML), volume 97, pages 7085–7094, 2019.
  • Zagoruyko and Komodakis (2016) Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC), 2016.
