Improved robustness to adversarial examples using Lipschitz regularization of the loss

by Chris Finlay, et al.
McGill University

Adversarial training is an effective method for improving robustness to adversarial attacks. We show that adversarial training using the Fast Signed Gradient Method can be interpreted as a form of regularization. We implemented a more effective form of adversarial training, which in turn can be interpreted as regularization of the loss in the 2-norm, $\|\nabla_x \ell(x)\|_2$. We obtained further improvements to adversarial robustness, as well as provable robustness guarantees, by augmenting adversarial training with Lipschitz regularization.




1 Introduction

1.1 Contributions of this work

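Adversarial training is an effective method for improving robustness to adversarial attacks. We show that adversarial training using the Fast Signed Gradient Method (Goodfellow et al., 2014) can be interpreted as regularization by the average of the 1-norm of the gradient of the loss over the data,

$$\mathbb{E}_{x}\left[\ell(x) + \varepsilon\, \|\nabla_x \ell(x)\|_1\right].$$

The choice of norm for the adversarial perturbation can lead to different interpretations: using the 2-norm for adversarial training corresponds to

$$\mathbb{E}_{x}\left[\ell(x) + \varepsilon\, \|\nabla_x \ell(x)\|_2\right].$$

Here $\ell(x)$ is the loss of the model at the input $x$ and $\varepsilon$ is the size of the adversarial perturbation. We present theoretical justification and empirical evidence that training with the 2-norm regularizer is more adversarially robust than training with the 1-norm regularizer.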

We consider Lipschitz regularization in §3. Write $\mathrm{Lip}(\ell)$ for the Lipschitz constant of the loss of the model. We found existing methods of Lipschitz regularization based on norms of weight matrices (Bartlett, 1996; Szegedy et al., 2013) to be ineffective. As an alternative, we consider a tractable Lipschitz regularization of the loss of the model, by taking the maximum over the data of the norm of the gradient of the loss of the model.

Moreover, we show in §3.3 that $\mathrm{Lip}(\ell)$ controls the adversarial robustness of the model. Thus we interpret adversarial training (in the 2-norm) augmented with Lipschitz regularization as minimization of the objective function

$$\mathbb{E}_{x}\left[\ell(x) + \varepsilon\, \|\nabla_x \ell(x)\|_2\right] + \lambda\, \mathrm{Lip}(\ell),$$

which we refer to as tulip. In practice, tulip outperforms both the 1-norm and the 2-norm forms of adversarial training used alone. For example on CIFAR-10, for a ResNeXt model, adversarial training alone reduced adversarial error by 29 percentage points over an undefended model, measured at a fixed adversarial distance in $\ell_2$. (Apologies for overloading '$\ell$' for both the loss and for norms: we hope the meaning is clear from context.) In contrast, adversarial training with Lipschitz regularization reduces adversarial error by 42 percentage points over baseline. See Table 1. We trained with fixed values of the hyperparameters $\varepsilon$ and $\lambda$. Other values of $\varepsilon$ and $\lambda$ may work better; we did not tune these hyperparameters. See §4 for empirical results.

1.2 Background on adversarial examples and adversarial training

Improving robustness to adversarial samples is a first step towards model verification (Szegedy et al., 2013; Goodfellow et al., 2018). However robustness guarantees to adversarial samples are difficult to obtain, since in practice it is only possible to generate suboptimal adversarial attacks.

Adversarial samples are unlikely to occur randomly. Rather, they are generated by an adversary. Adversarial attacks are classified according to the amount of information available to the attacker. White box attacks occur when the attacker has full access to the loss, model, and label. Typically white box attacks are generated using loss gradients: these attacks include L-BFGS (Szegedy et al. (2013)), Fast Signed Gradient (Goodfellow et al. (2014)), Jacobian Saliency (Papernot et al. (2016a)), and Projected Gradient Descent (Madry et al. (2017)). Black box attacks rely on less information, using model outputs rather than model gradients. Black box attacks require more effort (Papernot et al. (2017)) to implement, but their brute force approach may make them more effective at evading adversarial defences (Brendel et al. (2018)).

The recent review Goodfellow et al. (2018) discusses defences against adversarial attacks and their limitations. The earliest and most successful defense is adversarial training (Szegedy et al. (2013); Goodfellow et al. (2014); Tramèr et al. (2018); Madry et al. (2017)). Top entries in a recent adversarial defence competition (Kurakin et al. (2017)) used Ensemble Adversarial Training (Tramèr et al. (2018)), where a model is adversarially trained with inputs generated by an ensemble of other models.

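In adversarial training, the model is trained to solve the minimax problem

$$\min_{w}\; \mathbb{E}_{x}\left[\max_{\|v\| \le \varepsilon} \ell(x + v;\, w)\right], \tag{1}$$

where $w$ denotes the model parameters and $v$ is a perturbation of the input $x$. However in practice this problem is not computationally feasible. Instead, (1) is approximated. A popular and effective approximation is the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014), which also defines an attack.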

Other forms of defences against gradient based attacks besides adversarial training include (Papernot et al., 2017, 2016b) as well as adding stochastic noise to the model, using a non-differentiable classifier (Lu et al. (2017)), or defense distillation (Hinton et al. (2015); Papernot et al. (2016c)). Gradient based methods may be less successful against black box attacks (Brendel et al. (2018)).

Other possible defences discussed in Goodfellow et al. (2018) include input validation and preprocessing, which would potentially allow adversarial samples to be recognized before being input to the model, and architecture modifications designed to improve robustness to adversarial samples. For more information we refer to the review (Goodfellow et al. (2018)) and the discussion of attack methods in (Brendel et al. (2018)).

1.3 Background on Lipschitz Regularization of the model

A form of robustness guarantee for a network is provided by the global Lipschitz constant of the model. Weng et al. (2018) show that the Lipschitz constant of the model gives a certifiable minimum adversarial distance: a successful attack on an image $x$ will have adversarial distance at least


where $L$ is the Lipschitz constant of the model $f$, and $c$ is the correct label of $x$. Thus training models to have small Lipschitz constant could improve adversarial robustness (Hein & Andriushchenko (2017); Tsuzuku et al. (2018)). Oberman & Calder (2018) recently showed that Lipschitz regularization leads to a proof of generalization. The Lipschitz constant of a model may be estimated using only the product of the norms of model weight matrices (Bartlett (1996); Szegedy et al. (2013)), which is independent of the data. Models have been trained using this estimate as a regularization term in (Cissé et al., 2017; Gouk et al., 2018; Miyato et al., 2018a; Tsuzuku et al., 2018).

For deep neural networks, we argue that there is a large gap between the empirical Lipschitz constant of a model on the data and the estimate of the model Lipschitz constant provided using the model weights (Bartlett, 1996), see §4.

2 Adversarial training and regularization

Definition 2.1 (Adversarial attacks).

Write $c$ for the correct label of the input $x$ and $f$ for the classifier. An adversarial attack $v$ is a perturbation of the input which leads to incorrect classification,

$$f(x + v) \neq c.$$

Adversarial attacks seek to find the minimum norm attack vector, which is an intractable problem (Athalye et al., 2018). An alternative, which permits loss gradients to be used, is to consider the attack vector of a given norm $\varepsilon$ which most increases the loss $\ell$,

$$\max_{\|v\| \le \varepsilon} \ell(x + v). \tag{3}$$
2.1 Derivation of attack directions

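The solution of (3) can be approximated using the dual norm (Boyd & Vandenberghe, 2004, A.1.6). If the $\infty$-norm is used, we recover the Signed Gradient (Goodfellow et al., 2014). However a different attack vector is obtained if we measure attacks in the 2-norm.

Theorem 2.2.

The optimal attack vector defined by (3) in a generic norm $\|\cdot\|$ can be approximated to $O(\varepsilon^2)$ with the vector $\varepsilon a^*$, where $a^*$ is the solution of

$$a^* \cdot \nabla \ell(x) = \|\nabla \ell(x)\|_*, \qquad \|a^*\| = 1, \tag{4}$$

and $\|\cdot\|_*$ is the dual norm. In particular $a^*$ is given by

$$a^* = \begin{cases} \operatorname{sign}(\nabla \ell(x)) & \text{for the } \infty\text{-norm}, \\ \nabla \ell(x) / \|\nabla \ell(x)\|_2 & \text{for the 2-norm}. \end{cases} \tag{5}$$

Proof. Write $v = \varepsilon a$ with $\|a\| \le 1$ and use the Taylor expansion of $\ell$,

$$\ell(x + \varepsilon a) = \ell(x) + \varepsilon\, a \cdot \nabla \ell(x) + O(\varepsilon^2).$$

Then we can approximate (3) by solving

$$\max_{\|a\| \le 1} a \cdot \nabla \ell(x). \tag{6}$$

The value of the solution of (6) is given by the dual norm (Boyd & Vandenberghe, 2004, A.1.6) of the gradient, $\|\nabla \ell(x)\|_*$, and the optimal vector is then given by the $\varepsilon$-scaled solution of (4). In the case of the $\infty$-norm, the dual norm is the 1-norm, and the solution is given by the Signed Gradient vector $\operatorname{sign}(\nabla \ell(x))$. In the case of the 2-norm the dual norm is itself the 2-norm and the solution of (6) is given by $\nabla \ell(x) / \|\nabla \ell(x)\|_2$. ∎

The 2-norm attack vector points in the direction of the gradient of the loss, while the signed gradient attack vector points in the direction of the optimal dual vector.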
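The two attack directions can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the function names `attack_direction` and `dual_norm` and the example gradient are illustrative assumptions. It verifies that each direction attains the dual norm of the gradient, as the theorem states.

```python
import numpy as np

def attack_direction(grad, norm):
    """Unit-norm direction a maximizing a . grad, for attacks measured in `norm`."""
    if norm == "inf":
        return np.sign(grad)                  # signed gradient
    if norm == "2":
        return grad / np.linalg.norm(grad)    # normalized gradient
    raise ValueError("norm must be 'inf' or '2'")

def dual_norm(grad, norm):
    """Dual norm of the gradient: 1-norm for 'inf', 2-norm for '2'."""
    if norm == "inf":
        return np.abs(grad).sum()
    return np.linalg.norm(grad)

grad = np.array([3.0, -4.0, 0.5])             # hypothetical loss gradient
for norm in ("inf", "2"):
    a = attack_direction(grad, norm)
    # The inner product with the gradient equals the dual norm.
    assert np.isclose(a @ grad, dual_norm(grad, norm))
```

An attack of size $\varepsilon$ is then `eps * attack_direction(grad, norm)`.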

2.2 Interpretation of adversarial training

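Adversarial training can be interpreted as minimizing

$$\mathbb{E}_{x}\left[\ell(x + \varepsilon a^*)\right], \tag{7}$$

where $a^*$ is the unit-norm attack direction of Theorem 2.2.

Theorem 2.3.

Adversarial training using the attack vector (5) can be interpreted as augmenting the loss function with the regularization

$$\varepsilon\, \|\nabla \ell(x)\|_*. \tag{8}$$

Proof. The adversarial vector given by (5) combined with the Taylor expansion gives

$$\ell(x + \varepsilon a^*) = \ell(x) + \varepsilon\, a^* \cdot \nabla \ell(x) + O(\varepsilon^2) = \ell(x) + \varepsilon\, \|\nabla \ell(x)\|_* + O(\varepsilon^2).$$

Substitute the last equation into the adversarial training equation (7) to obtain

$$\mathbb{E}_{x}\left[\ell(x) + \varepsilon\, \|\nabla \ell(x)\|_*\right] + O(\varepsilon^2),$$

which, up to $O(\varepsilon^2)$, gives the regularization term (8). ∎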
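The first-order equivalence between the adversarial loss and the gradient-norm regularizer can be illustrated on a toy problem. The sketch below assumes a softplus loss of a linear score with hypothetical weights `w`; it is not the loss used in the paper. It compares the loss at the attacked point with the regularized loss and shows the gap shrinking roughly quadratically in the attack size.

```python
import numpy as np

w = np.array([1.5, -2.0, 0.5])               # hypothetical model weights

def loss(x):
    return np.log1p(np.exp(w @ x))           # softplus(w . x), a smooth toy loss

def grad(x):
    return w / (1.0 + np.exp(-(w @ x)))      # sigmoid(w . x) * w

def adversarial_loss(x, eps):
    g = grad(x)
    a = g / np.linalg.norm(g)                # 2-norm attack direction
    return loss(x + eps * a)

def regularized_loss(x, eps):
    return loss(x) + eps * np.linalg.norm(grad(x))   # loss plus eps times dual norm

x = np.array([0.2, -0.1, 0.4])
gaps = {eps: abs(adversarial_loss(x, eps) - regularized_loss(x, eps))
        for eps in (0.1, 0.01)}
# Shrinking eps by 10x shrinks the gap by roughly 100x, consistent with O(eps^2).
```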

2.3 Iterative attacks based on gradient norms

Iterative attacks based on gradient ascent such as iterative FGSM (Madry et al., 2017) should be performed using the 2-norm direction $\nabla \ell(x) / \|\nabla \ell(x)\|_2$, since this follows the gradient ascent curve; see Figure 1(b).

Figure 1: Left: Comparison of attack methods using error curves for undefended ResNeXt-34, on the CIFAR-10 test set. The data stability curve is black. A higher curve means more probability of error, so the projected gradient method is the most effective attack. Right: Comparison of Iterative FGSM and gradient descent on a quadratic function in two dimensions. The search direction chosen by FGSM is not the direction of steepest ascent.
Euclidean distance distance

max test statistics

Dataset defense method median % Err at median % Err at
distance distance
CIFAR-10   (baseline) 0.09 53.98 99.92 3.33 85.85
  (AT, FGSM) 0.18 24.63 96.06 7.04 48.29
  (AT, ) 0.30 13.54 84.76 3.94 33.19
& 0.56 12.12 51.64 1.62 9.54
CIFAR-100   (baseline) 74.18 99.61 3.54 43.28
  (AT, FGSM) 56.34 98.46 7.50 56.62
  (AT, ) 53.77 98.03 7.78 44.9
& 0.136 42.58 93.73 3.75 21.03
Table 1: Adversarial statistics with ResNeXt-34. The columns and report the maximum observed norm on the test data.

The angle $\theta$ between $\operatorname{sign}(\nabla \ell(x))$ and $\nabla \ell(x)$ is given by

$$\cos\theta = \frac{\|\nabla \ell(x)\|_1}{\sqrt{n}\, \|\nabla \ell(x)\|_2},$$

where $n$ is the input dimension. Because $\|\nabla \ell(x)\|_1 \le \sqrt{n}\, \|\nabla \ell(x)\|_2$, this ratio is always between zero and one. On the networks we studied, the ratio above could be as small as 0.32. To illustrate, Figure 1(b) shows the angle between iterative FGSM and the iterative gradient ascent on a toy loss (convex quadratic) in two dimensions. In practice we find iterative attacks using the steepest ascent direction are more effective than iterative FGSM based attacks, see Section 4.2.
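As a quick numerical check of this ratio, the sketch below computes the cosine directly and through the norm ratio. It uses a random Gaussian vector as a stand-in for a loss gradient (an assumption; real gradients can give much smaller values). The two expressions agree whenever the gradient has no zero components.

```python
import numpy as np

def cos_sign_vs_gradient(g):
    """Cosine of the angle between sign(g) and g, computed directly."""
    s = np.sign(g)
    return (s @ g) / (np.linalg.norm(s) * np.linalg.norm(g))

def norm_ratio(g):
    """The same quantity as ||g||_1 / (sqrt(n) ||g||_2); assumes no zero entries."""
    return np.abs(g).sum() / (np.sqrt(g.size) * np.linalg.norm(g))

rng = np.random.default_rng(0)
g = rng.standard_normal(3 * 32 * 32)    # stand-in gradient at CIFAR input dimension
cosine = cos_sign_vs_gradient(g)
ratio = norm_ratio(g)
```

For a Gaussian vector the ratio concentrates near $\sqrt{2/\pi} \approx 0.8$; gradients dominated by a few large components give smaller values.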

3 Lipschitz Regularization

3.1 Evaluating the Lipschitz constant of a model

Definition 3.1.

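The Lipschitz constant of a function $f: X \to Y$ is given by

$$\mathrm{Lip}(f) = \sup_{x_1 \neq x_2} \frac{\|f(x_1) - f(x_2)\|_Y}{\|x_1 - x_2\|_X}. \tag{9}$$

When $f$ is differentiable on a closed, bounded domain, $\Omega$, then

$$\mathrm{Lip}(f) = \max_{x \in \Omega} \|\nabla f(x)\|. \tag{10}$$

Here for vector valued functions, $f: X \to Y$, the induced matrix norm must be used, based on the norms for $X$ and $Y$ (Horn et al., 1990, Chapter 5.6.4). The result is standard in analysis: it follows from the Mean Value Theorem and the definition of the derivative. Using (10), we can approximate the Lipschitz constant by testing on the data $x_1, \dots, x_m$,

$$\mathrm{Lip}(f) \approx \max_{1 \le i \le m} \|\nabla f(x_i)\|. \tag{11}$$

Because the loss is a scalar, Lipschitz regularization of the loss is implemented by taking $f = \ell$ and minimizing the regularized loss function

$$\mathbb{E}_{x}\left[\ell(x)\right] + \lambda \max_{1 \le i \le m} \|\nabla \ell(x_i)\|.$$

The first term in this objective is the expected loss, and the second term is the approximation of the Lipschitz constant of the loss coming from (11). During training with Stochastic Gradient Descent, both terms are evaluated over mini-batches.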
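A mini-batch version of this objective can be sketched as follows. The example assumes a logistic-regression model with binary cross-entropy loss, so that the input gradient has a closed form; the function names and the penalty weight `lam` are illustrative, not taken from the paper. For a deep network the input gradient would instead come from automatic differentiation.

```python
import numpy as np

def loss_and_input_grad(X, y, w, b):
    """Per-example binary cross-entropy and its gradient with respect to the input."""
    z = X @ w + b                               # logits, shape (batch,)
    p = 1.0 / (1.0 + np.exp(-z))                # predicted probabilities
    loss = np.logaddexp(0.0, z) - y * z         # log(1 + e^z) - y z
    grad_x = (p - y)[:, None] * w[None, :]      # d loss / d x, shape (batch, dim)
    return loss, grad_x

def lipschitz_regularized_loss(X, y, w, b, lam):
    """Mean loss over the batch plus lam times the largest input-gradient 2-norm."""
    loss, grad_x = loss_and_input_grad(X, y, w, b)
    grad_norms = np.linalg.norm(grad_x, axis=1)
    return loss.mean() + lam * grad_norms.max()

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 5))                # one hypothetical mini-batch
y = (rng.random(32) < 0.5).astype(float)
w, b = rng.standard_normal(5), 0.0
plain = lipschitz_regularized_loss(X, y, w, b, lam=0.0)
penalized = lipschitz_regularized_loss(X, y, w, b, lam=0.1)
```

Taking the maximum rather than the mean of the gradient norms is what makes the second term an estimate of a Lipschitz constant instead of an average-case penalty.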

3.2 Lipschitz constant of data and optimal extensions

Define the Lipschitz constant of the data $(x_i, y_i)$ to be

$$\max_{i \neq j} \frac{\|y_i - y_j\|_Y}{\|x_i - x_j\|_X}. \tag{12}$$

Table 2 lists the Lipschitz constant of the training data for common datasets, which are all small: all but one of the values in the first row are below 1.

The Lipschitz extension theorem (Valentine, 1945) says that given function values $y_i = f(x_i)$, there exists an extension which perfectly fits the data and has the same Lipschitz constant, provided the appropriate norms are used on the $X$ and $Y$ spaces. This can be done using, for example, the 2-norm for $X$ and a suitable norm on the label space. In other norms, we can also make an extension, but the Lipschitz constant may increase (Johnson & Lindenstrauss, 1984). Of course, such a function may not be consistent with a given architecture.
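The Lipschitz constant of a labelled dataset can be computed by brute force over pairs. The sketch below is an illustration under stated assumptions: one-hot labels, the 2-norm on inputs and the $\infty$-norm on labels as defaults, and a tiny made-up dataset. Pairs with identical inputs are skipped, mirroring the removal of duplicated images mentioned in the caption of Table 2.

```python
import numpy as np

def data_lipschitz_constant(X, Y, x_ord=2, y_ord=np.inf):
    """Largest ratio ||y_i - y_j|| / ||x_i - x_j|| over pairs with different labels."""
    best = 0.0
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            dy = np.linalg.norm(Y[i] - Y[j], ord=y_ord)
            dx = np.linalg.norm(X[i] - X[j], ord=x_ord)
            if dy == 0.0 or dx == 0.0:
                continue    # same label, or duplicated input
            best = max(best, dy / dx)
    return best

# Three points in the plane with one-hot labels (classes 0, 1, 0).
X = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
lip = data_lipschitz_constant(X, Y)
```

The quadratic pair loop is only practical for small samples; on full training sets a nearest-neighbour search restricted to points with different labels would be needed.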

3.3 Robustness guarantees from the Lipschitz constant

The following Lemma shows that the Lipschitz constant of the loss function gives a robustness guarantee for the loss incurred by an adversarial perturbation of norm $\varepsilon$. An analogous formula gives the corresponding robustness result using the Lipschitz constant of the model (2).

Lemma 3.2 (Stability of network).

Suppose the composed loss function $\ell$ is $L$-Lipschitz continuous. Let $v$ be an adversarial perturbation of norm $\varepsilon$. Then

$$\ell(x + v) \le \ell(x) + L\varepsilon. \tag{13}$$

Proof. By Lipschitz continuity of $\ell$,

$$|\ell(x + v) - \ell(x)| \le L \|v\| = L\varepsilon.$$

There are two cases for the left-hand side, depending on the sign. In both cases we obtain (13). ∎
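The bound in the Lemma can be checked on a toy loss whose Lipschitz constant is known exactly. The sketch below assumes a softplus loss of a linear score, which is Lipschitz in the 2-norm with constant equal to the 2-norm of the hypothetical weight vector `w`. It samples random perturbations of norm `eps` and confirms that the loss never increases by more than `L * eps`.

```python
import numpy as np

w = np.array([1.5, -2.0, 0.5])        # hypothetical weights
L = np.linalg.norm(w)                 # Lipschitz constant of the toy loss (2-norm)

def loss(x):
    return np.logaddexp(0.0, w @ x)   # softplus(w . x); gradient norm is at most ||w||_2

def worst_sampled_increase(x, eps, trials=1000, seed=0):
    """Largest loss increase over random perturbations of 2-norm exactly eps."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((trials, x.size))
    v = eps * v / np.linalg.norm(v, axis=1, keepdims=True)
    return max(loss(x + vi) - loss(x) for vi in v)

x = np.array([0.2, -0.1, 0.4])
eps = 0.1
increase = worst_sampled_increase(x, eps)
```

Random sampling only probes the bound; a gradient-based attack would come closer to the worst case, but it can never exceed `L * eps`.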

3.4 Regularization of the model versus the loss of the model

If the goal is adversarial robustness, then regularization of the loss is just as effective (empirically) as regularizing the model, at a much lower cost. Since the loss is a scalar, regularizing by the Lipschitz constant of the loss corresponds to regularization of the model in one direction. By the chain rule,

$$\nabla_x\, \ell(f(x)) = \nabla_x f(x)^{\top}\, \nabla_f\, \ell(f(x)).$$
For example, when is the KL divergence, and when then

Thus, in this case, regularizing corresponds to regularization of in the direction .

3.5 Upper bounds on the Lipschitz constant

The estimate (11) is a lower bound of the Lipschitz constant of the loss. It is well known that data independent upper bounds on the Lipschitz constant of the model are available (Bartlett, 1996) using the product of the norms of the weight matrices. See also Szegedy et al. (2013); Cissé et al. (2017); Gouk et al. (2018); Miyato et al. (2018b, a); Tsuzuku et al. (2018). Other estimates are also available, for example Weng et al. (2018) used extreme value theory to estimate the local Lipschitz constant of a model.

Let $W^{(i)}$ be the weight matrix of the $i$-th layer of a network composed of $N$ layers, and suppose all nonlinearities of the network are at most 1-Lipschitz. Then via the chain rule and properties of induced matrix norms, it can be shown


with and . Certain conditions on the ’s must be met. For a proof with see Tsuzuku et al. (2018). A similar bound is available with and . However, for deep models, we found that this bound was much too large: on the models we considered, this estimate was on the order of to .
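The gap between the weight-based upper bound and the gradient norms seen on data can be reproduced on a small example. The sketch below assumes a random fully connected ReLU network and spectral (2-norm) weight norms; the layer sizes, initialization, and sample data are all illustrative, not the networks studied here. It compares the product of layer norms with the largest Jacobian norm over a sample of inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 4]                           # hypothetical layer widths
weights = [rng.standard_normal((m, n)) / np.sqrt(n)
           for n, m in zip(sizes[:-1], sizes[1:])]

def jacobian(x):
    """Jacobian of the ReLU network (no biases) at x, built layer by layer."""
    J = np.eye(x.size)
    h = x
    for k, W in enumerate(weights):
        z = W @ h
        J = W @ J
        if k < len(weights) - 1:                 # ReLU on hidden layers only
            J = (z > 0).astype(float)[:, None] * J
            h = np.maximum(z, 0.0)
        else:
            h = z
    return J

# Data-independent upper bound: product of spectral norms of the weights.
upper_bound = np.prod([np.linalg.norm(W, 2) for W in weights])
# Data-dependent lower bound: largest Jacobian spectral norm on sampled inputs.
data = rng.standard_normal((200, sizes[0]))
empirical = max(np.linalg.norm(jacobian(x), 2) for x in data)
```

Even with three layers the product bound exceeds the empirical value; each additional layer multiplies in another norm, which is why the bound becomes uninformative for deep models.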

4 Empirical results

We considered two toy problems, using image classification on the CIFAR-10 and CIFAR-100 datasets (Krizhevsky & Hinton (2009)). We tested our methods on three networks, chosen to represent a broad range of architectures: AllCNN-C (Springenberg et al. (2014)), a 34 layer ResNet (He et al. (2016)), and a 34 layer ResNeXt (Xie et al. (2017)). Training and model details are provided in Appendix A.

4.1 Error curves for models and robustness metrics

We define the error curve of a model given an attack. The curve provides information about the robustness of a model to attacks of different norms.

Definition 4.1.

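The error curve of a model for a given attack is the probability, over the test data, that an attack of size at most $\varepsilon$ leads to a misclassification,

$$C(\varepsilon) = \mathbb{P}_{x}\left[\, d(x) \le \varepsilon \,\right],$$

where $d(x)$ is the adversarial distance the attack reports for the test image $x$.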
See Figure 1(a) for error curves for a given model over a range of attacks. We also plot the data stability curve, which is the probability that a perturbation can move one data point in the direction of another data point with a different label (which can be interpreted as a very weak attack).

We can compare how two different models perform against various attacks using the model error curve. In Table 1 we report the error curve at two attack sizes, using the 2-norm. These values correspond to the test error (an attack of size zero), and to noise which is slightly smaller than a human perceptible perturbation (see Figure 3). We also report the median distance, which corresponds to the $x$-intercept of 50% error on the curve.
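Given the adversarial distance that an attack reports for each test image, the error curve and the median distance are simple to compute. The sketch below is an illustration with made-up distances; the function names are assumptions, and a distance of zero stands for an image that is already misclassified without any perturbation.

```python
import numpy as np

def error_curve(adv_dist, eps):
    """Fraction of test images whose adversarial distance is at most eps."""
    adv_dist = np.asarray(adv_dist, dtype=float)
    eps = np.atleast_1d(np.asarray(eps, dtype=float))
    return (adv_dist[None, :] <= eps[:, None]).mean(axis=1)

def median_adversarial_distance(adv_dist):
    """Attack size at which the error curve crosses 50%."""
    return float(np.median(adv_dist))

# Hypothetical per-image adversarial distances for five test images.
distances = np.array([0.0, 0.05, 0.1, 0.2, 0.4])
curve = error_curve(distances, [0.0, 0.1])
median = median_adversarial_distance(distances)
```

Evaluating the curve at zero recovers the ordinary test error, and evaluating it at a fixed small distance gives the kind of adversarial error reported in Table 1.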

Dataset MNIST FashionMNIST CIFAR-10 CIFAR-100
0.417 0.626 0.364 1.245
3.45 63.75 10.20 9.107
Table 2: Lipschitz constants of common training sets. CIFAR-100 has several duplicated images with different labels, these were removed from the calculation.
Figure 2: Comparison of adversarial regularization types for ResNeXt networks on CIFAR-10 and CIFAR-100. Panels: (a) CIFAR-10; (b) CIFAR-100; (c) CIFAR-10; (d) CIFAR-100; each plotted against adversarial distance.

4.2 Attack evaluation

We attacked each model on the test/validation set using six untargeted attack methods: gradient attack; projected gradient descent (with a norm constraint); the Fast Gradient Sign Method (FGSM) (Goodfellow et al. (2014)); Iterative FGSM (I-FGSM) (Kurakin et al. (2016)); DeepFool (Moosavi-Dezfooli et al. (2016)); and Boundary attack (Brendel et al., 2018). The first five methods are white-box attacks, while the last is a black-box attack. I-FGSM and the projected gradient attack are iterative methods, whereas FGSM and the gradient attack are single step. All attacks were implemented with Foolbox 1.3.1 (Rauber et al. (2017)). Hyperparameters were set to Foolbox defaults, except for the Boundary attack. (The Boundary attack is computationally demanding, so due to resource constraints we ran it for only 500 iterations per test image; it should perform better with more iterations.) For each image and attack method, each attack reports an adversarial distance.

On each model, dataset, and regularization method, we tested all six attack methods on the entire test/validation set. We compared attack methods using the attack error curve. For example see Figure 1(a), where we plot attack error curves for each attack method on an undefended model. The attack error curve plots the percent of misclassified test images as a function of adversarial distance. We report against Euclidean distance, $\ell_2$, because it is an often reported measure of adversarial robustness, although other choices (for example MSE, or distance in another norm) are equally valid. Common adversarial metrics are easily read off the attack error curve. For example, median Euclidean adversarial distance occurs at the $x$-intercept of 50% error. The percent error at a given maximum adversarial distance is also readily available.

On all models and for all defence methods studied, norm-constrained projected gradient descent consistently outperformed the other attack methods: it had the smallest mean adversarial distance and the highest attack error curve. See Figure 1(a) for an illustration. A close second was I-FGSM, the other iterative attack method tested. The next two best attack methods were gradient attack, followed by FGSM. The Boundary attack outperformed DeepFool. We observed the same ranking of attacks on all models and defences studied. For this reason, in the following section we only report model statistics using projected gradient descent, which can be regarded as the strongest of the attacks listed.

4.3 Evaluation of defence methods

Each model was trained with combinations of up to three adversarial defences. The methods were: (i) the baseline undefended model; (ii) adversarial training with FGSM; (iii) adversarial training with the 2-norm. Each of these can be augmented with Lipschitz regularization, which in the last case we call tulip. We also considered adding a final sigmoid layer to the network, prior to the softmax.

The sigmoid we chose is based on $\tanh$, and is inspired by (but not equivalent to) tanh-estimators used in classical statistics as a robust estimator (Hampel et al., 2011, Chapter 2). The intuition behind this choice is to normalize the logit scores of the model, which we believe should improve robustness to outliers. Outside of deep learning, tanh-estimators have been successfully used to normalize scores and improve robustness, for example in machine learning biometrics (Jain et al. (2005)). See Appendix A.1 for layer details.

Model robustness is evaluated on the entire test/validation set using the median adversarial distance and the percent misclassified at a fixed adversarial distance. We chose this distance because at this magnitude attacks are still imperceptible to the human eye. We argue it is reasonable to ask that models classify images with imperceptible perturbations correctly. At a somewhat larger distance attacks are perceptible, albeit only slightly. See Figure 3. We also plot the attack error curve for each model. These statistics were generated with the projected gradient attack. Table 1 and Figure 2 present results for ResNeXt-34. The best statistics are in bold.

Figure 3: Adversarial perturbations of CIFAR-10 with increasing magnitudes of attack, measured in .

Here we summarize our results for ResNeXt-34, the model studied with the greatest capacity, and defer results for the other models to Appendix B. Without adversarial perturbations, all ResNeXt-34 models achieve roughly 4% test error on CIFAR-10. However, the undefended baseline model has 54% test error at the adversarial distance used in Table 1. Adversarial training via FGSM reduces test error to 24.6%, whereas adversarial training in the 2-norm reduces test error to 13.5%. A combination of all defenses (Lipschitz-regularized 2-norm adversarial training with the final sigmoid layer) further reduces test error to 12.1%. The models are ranked in the same order when instead measured with median adversarial distance. The model with all defenses has median adversarial distance six times that of the undefended model. FGSM only doubles the median adversarial distance relative to the baseline undefended model. Figure 2(a) illustrates that this ranking of defenses holds over all distances of adversarial perturbations.

We observe a similar ranking on CIFAR-100. See Figure 2(b). Unperturbed, all models achieve between 21% and 22% test error. Without adversarial defenses, ResNeXt-34 (4x32d) has a test error of 74% at the same adversarial distance. Adversarial training alone brings the test error down to 56.3% with FGSM and to 53.8% with 2-norm adversarial training. A combination of all defenses further reduces test error to 42.6%. Median adversarial distance increased from 0.05 on the undefended model to 0.14 on the model with all defenses.

In Table 1 we also report statistics measuring the model’s Lipschitz constant. The last two columns give the maximum of the reported gradient norms over the test/validation set. The norm of the product of weights is independent of test data, and is an upper bound on the global Lipschitz constant of the model. Employing all defenses dramatically decreases the norm of the model Jacobian on the test data, and hence improves model robustness. On CIFAR-10 the model with all defenses has Jacobian norm nearly 10 times smaller than the undefended model, whereas adversarial training only improves the Jacobian norm by a factor of three at most. On CIFAR-100, adversarial training alone does not appear to improve the norm of the Jacobian significantly. However a combination of all defenses decreases the norm of the model Jacobian by a factor of two.

In Appendix B we report results for all models and combinations of defense methods. Of the individual defenses by themselves, adversarial training (with FGSM or in the 2-norm) improves model robustness the most. We find adversarial training in the 2-norm to be more effective than FGSM. We observe the same ranking of defense methods for AllCNN and ResNet-34. Adversarial training improves model robustness. However model robustness is further improved by adding Lipschitz regularization, which empirically decreases the Jacobian norm of the model on the test data.

Both adversarial training and Lipschitz regularization increase training time by a factor of no more than four. In contrast, adding a final layer to normalize the logits is nearly free, and consistently improves model robustness by itself.

Rather than using the maximum gradient norm on the data as the Lipschitz penalty, we also tried training models with direct estimates of the Lipschitz constant. We tried both the product of layer weight norms and a tighter weight-based estimate. However, neither of these direct estimates was effective as a regularizer. The gap between the empirical Lipschitz constant on the data (the modulus of continuity on the data) and the estimated Lipschitz constant is too large. See for example Table 1, where we report the maximum Jacobian norm; it differs from the tighter weight-based estimate by at least four orders of magnitude. The product of layer weight norms is worse, and is numerically infeasible for models with more than a few layers: on the two 34-layer networks we studied, it was many orders of magnitude larger still. Another estimate of the local Lipschitz constant is available using a statistic from extreme value theory (Weng et al. (2018)). However this estimate requires at a minimum many tens of model evaluations for each image, and so is not tractable as a Lipschitz estimate during training.


The authors thank Bill Tubbs, Alex Iannantuono and Aram Pooladian for their assistance designing the experimental pipeline. The authors acknowledge the support of a Google gift which was used to support Bilal Abbasi during a collaboration at Google Brain Montréal. Adam Oberman was partially supported by AFOSR grant FA9550-18-1-0167.


Appendix A Model and training details

We used standard data augmentation for the CIFAR datasets, consisting of horizontal flips and random crops of padded images, four pixels per side. We used square cutout (Devries & Taylor (2017)) of width 16 on CIFAR-10, and width 8 on CIFAR-100, but no dropout. Batch normalization was used after every convolution layer. We used SGD with an initial learning rate of 0.1, momentum set to 0.9, and a batch size of 128. CIFAR-10 was trained for 200 epochs, dropping the learning rate by a factor of five after epochs 60, 120, and 180. On CIFAR-100, networks were trained for 300 epochs, and the learning rate was dropped by a factor of 10 after epochs 150 and 225. Weight decay (Tikhonov/$\ell_2$ regularization) was set separately for CIFAR-10 and CIFAR-100.

For networks with Lipschitz regularization, the Lagrange multiplier $\lambda$ of the excess Lipschitz term was held fixed. Adversarially trained models were trained with images perturbed to a fixed adversarial distance $\varepsilon$. We did not tune either of these hyperparameters.

For CIFAR-10, the ResNeXt architecture we used had a depth of 34 layers, cardinality 2 and width 32, with a basic residual block rather than a bottleneck. The branches (convolution groups) of the blocks were aggregated via a mean, rather than using a fully connected layer. For CIFAR-100 the architecture was the same, but had cardinality 4.

A.1 Pre-softmax sigmoid layer

Prior to the final softmax layer, we found that inserting a sigmoid activation function improved model robustness. In this case, the sigmoid layer consisted of batch normalization (without learnable parameters), followed by the sigmoid activation function, which has a single learnable parameter common across all layer inputs.

Appendix B Further experimental results

Figure 4: Comparison of results for ResNeXt networks on CIFAR-10 and CIFAR-100. Panels: (a) CIFAR-10; (b) CIFAR-100; (c) CIFAR-10; (d) CIFAR-100; each plotted against adversarial distance.

Here we present complete results for all regularization types, on all models and datasets considered. Because adversarial training in the 2-norm outperforms FGSM, we only report results for the former.

Euclidean distance distance
Model defense method % Err at median % Err at median % Err at
distance distance
AllCNN 6.01 0.13 38.11 88.32
6.26 0.17 29.27 87.58
& 5.41 0.19 32.61 72.06
& 5.45 0.21 25.04 74.88
5.90 0.29 17.09 72.43
5.84 0.29 16.86 74.06
& 5.10 0.38 16.19 60.22
& 5.27 0.35 15.00 63.82
ResNet34 6.00 0.09 56.00 100
5.43 0.17 27.08 99.39
& 5.54 0.20 34.44 86.62
& 6.14 0.21 28.66 81.34
5.57 0.25 18.19 90.62
5.65 0.28 16.74 89.1
& 5.52 0.46 17.45 57.75
& 5.81 0.40 15.84 62.74
4.07 0.09 53.98 99.92
4.28 0.21 19.13 98.16
& 4.05 0.34 23.97 70.49
& 4.18 0.33 19.64 63.73
3.58 0.30 13.54 84.76
4.13 0.31 12.52 81.65
& 3.80 0.61 12.71 47.08
& 4.08 0.56 12.12 51.64
Table 3: CIFAR-10 adversarial statistics
Model defense method
AllCNN 0.31 16.75
0.18 9.80
& 0.40 13.64
& 0.16 7.15
1.03 6.38
0.84 6.26
& 1.44 9.72
& 1.04 8.20
ResNet34 3.17 72.14
0.81 20.43
& 2.28 41.76
& 0.38 8.45
2.51 18.62
1.88 12.40
& 1.97 15.66
& 1.33 9.23
3.33 85.85
0.79 20.91
& 2.30 50.70
& 0.55 8.23
3.94 33.19
2.30 13.65
& 0.76 18.91
& 1.62 9.54
Table 4: CIFAR-10 stability statistics. The columns and report the maximum observed norm on the test data.
Euclidean distance distance
Model defense method % Err at median % Err at median % Err at
distance distance
AllCNN 25.25 63.58 99.86
25.89 56.45 99.52
& 26.06 64.77 99.22
& 26.23 56.26 97.98
25.64 53.85 99.13
25.60 50.71 98.74
& 26.27 54.81 98.44
& 26.05 53.24 98.55
ResNet34 27.42 90.41 99.97
28.18 70.94 99.89
& 40.72 81.19 98.24
& 38.61 68.34 97.74
28.21 66.12 99.66
28.21 66.12 99.43
& 29.19 64.51 98.27
& 28.01 58.40 98.01
21.24 74.18 99.61
21.97 47.64 94.48
& 21.05 52.28 98.23
& 21.05 52.28 92.42
21.57 53.77 98.03
21.73 46.79 96.95
& 21.01 50.33 96.51
& 21.47 42.58 93.73
Table 5: CIFAR-100 adversarial statistics
Model defense method
AllCNN 4.18 27.57
3.08 20.95
& 3.41 10.69
& 2.35 6.42
3.21 20.92
2.62 16.30
& 3.65 23.32
& 3.04 20.70
ResNet34 17.78 90.83
5.41 27.72
& 11.36 31.66
& 3.66 7.8
5.13 31.15
3.55 19.19
& 4.48 27.99
& 3.44 16.88
3.54 43.28
0.65 23.27
& 14.03 53.43
& 4.74 10.09
7.78 44.9
3.26 27.67
& 7.90 41.73
& 3.75 21.03
Table 6: CIFAR-100 stability statistics. The columns and report the maximum observed norm on the test data.