Invariance-inducing regularization using worst-case transformations suffices to boost accuracy and spatial robustness

06/26/2019 · by Fanny Yang, et al.

This work provides theoretical and empirical evidence that invariance-inducing regularizers can increase predictive accuracy under worst-case spatial transformations (spatial robustness). Evaluated on these adversarially transformed examples, we demonstrate that adding regularization on top of standard or adversarial training reduces the relative error by 20% for CIFAR-10 without increasing the computational cost. This outperforms handcrafted networks that were explicitly designed to be spatially equivariant. Furthermore, we observe for SVHN, which is known to have inherent variance in orientation, that robust training also improves standard accuracy on the test set. We prove that this no-trade-off phenomenon holds for adversarial examples from transformation groups in the infinite data limit.


1 Introduction

As the deployment of machine learning systems in the real world has steadily increased over recent years, the trustworthiness of these systems is of crucial importance. This is particularly the case for safety-critical applications. For example, the vision system in a self-driving car should correctly classify an obstacle or human irrespective of their orientation. Besides being relevant from a security perspective, a measure of spatial invariance also helps to gauge the interpretability and reliability of a model. If an image of a child that is slightly rotated is classified as a trash can, can we really trust the system in the wild?

As neural networks have been shown to be expressive both theoretically [18, 4, 15] and empirically [48], in this work we study to what extent standard neural network predictors can be made invariant to small rotations and translations. In contrast to enforcing conventional invariance on entire group orbits, we weaken the goal to invariance on smaller so-called transformation sets. This requirement reflects the aim to be invariant to transformations that do not affect the labeling by a human. At test time we assess transformation set invariance by computing the prediction accuracy on the worst-case (adversarial) transformation in the (small) transformation set of each image in the test data. The higher this worst-case prediction accuracy of a model is, the more spatially robust we say it is. Importantly, we use the same terminology as in the very active field of adversarially robust learning [40, 29, 23, 33, 6, 26, 37, 39, 35, 44, 28], but we consider adversarial examples with respect to spatial transformations instead of ℓp-perturbations of an image.

Recently, it was observed in [11, 13, 34, 20, 14, 2] that worst-case prediction performance drops dramatically for neural network classifiers obtained using standard training, even for rather small transformation sets. In this context, we examine the effectiveness of regularization that explicitly encourages the predictor to be constant for transformed versions of the same image, which we refer to as being invariant on the transformation sets. Broadly speaking, there are two approaches to encourage invariance of neural network predictors. On the one hand, the relative simplicity of the mathematical model for rotations and translations has led to carefully hand-engineered architectures that incorporate spatial invariance directly [19, 24, 8, 27, 45, 43, 12, 41]. On the other hand, augmentation-based methods [3, 47] constitute an alternative approach to encourage desired invariances on transformation sets. Specifically, the idea is to augment the training data by a random or smartly chosen transformation of every image for which the predictor output is enforced to be close to the output of the original image. This invariance-inducing regularization term is then added to the cross entropy loss for back-propagation.
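To make this concrete, the following minimal sketch (our own PyTorch-style illustration, not the authors' TensorFlow implementation; the function names and the weight lam are placeholders) shows how such a penalty can be added to the cross-entropy loss for a transformed copy of each image:

```python
import torch
import torch.nn.functional as F

def invariance_regularized_loss(model, x, y, transform, lam=1.0):
    """Cross-entropy on the original batch plus a KL penalty that pushes the
    predictions on a transformed copy towards those on the original images.

    `transform` is any callable that returns a slightly rotated/translated
    copy of the batch; `lam` is the regularization weight. All names and the
    default value of `lam` are placeholders, not the paper's settings."""
    logits = model(x)                    # predictions on the original images
    logits_t = model(transform(x))       # predictions on the transformed copies
    ce = F.cross_entropy(logits, y)      # standard classification loss
    # invariance penalty: KL divergence between the two softmax outputs
    kl = F.kl_div(F.log_softmax(logits_t, dim=1),
                  F.softmax(logits, dim=1),
                  reduction="batchmean")
    return ce + lam * kl
```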

While augmentation-based methods can be used out of the box whenever it is possible to generate samples in the transformation set of interest, it is unclear how they compare to architectures that are tuned for the particular type of transformation using prior knowledge. Studying robustness against spatial transformations in particular allows us to compare the robust performance of these two approaches, as spatially equivariant networks have been somewhat successful in enforcing invariance. In contrast, this cannot be claimed for higher-dimensional ℓp-type perturbations. In the empirical sections of this paper, we hence explore the following questions:

  1. To what extent can augmentation and regularization based methods improve spatial robustness of common deep neural networks?

  2. How does augmentation-based invariance-inducing regularization perform in the case of small spatial transformations compared to representative specialized architectures designed to achieve invariance against entire transformation groups?

As a justification for employing this form of invariance-inducing regularization, we prove in Section 2 that when perturbations come from transformation groups, predictors that optimize the robust loss are in fact invariant on the set of transformed images. Although recent works show a fundamental trade-off between robust and standard accuracy in specifically constructed perturbation settings [42, 49, 36], we additionally show that the situation is fundamentally different for spatial transformations due to their group structure.

For the empirical study, we implemented various augmentation-based training methods as described in Sec. 3. In Sec. 4, we compare the spatial robustness of augmentation-based methods and specialized neural network architectures on CIFAR-10 and SVHN. Although group invariance should automatically imply robust predictions against all transformations in the group, group-equivariant networks have mostly been evaluated on random rather than adversarially chosen transformations. In experiments on CIFAR-10 and SVHN, we find that regularized methods achieve substantial relative reductions of the adversarial error compared to previously proposed augmentation-based methods (including adversarial training) without requiring additional computational resources. Furthermore, they even outperform representative handcrafted networks that were explicitly designed for invariance.

2 Theoretical results for invariance-inducing regularization

In this section, we first introduce our notion of transformation sets and formalize robustness against a small range of translations and rotations. We then prove that, on a population level, constraining or regularizing for transformation set invariance yields models that minimize the robust loss. Moreover, when the label distribution is constant on each transformation set, we show that the set of robust minimizers not only minimizes the natural loss but, under mild conditions on the distribution over the transformations, is even equivalent to the set of natural minimizers.

Although the framework can be applied to general problems and transformation groups, we consider image classification for concreteness. In the following, x denotes an observed image, y is the one-hot vector of the multiclass label, and both are random variables from a joint distribution P. A function f in a function space F (e.g. a deep neural network in our experiments) maps the input image to a logit vector that is then used for prediction via a softmax layer. We write ℓ(f(x), y) for the loss of f on the example (x, y).

2.1 Transformation sets

Invariance with respect to spatial transformations is often thought of in terms of group equivariance of the representation and prediction. Instead of invariance with respect to all spatial transformations in a group, we impose a weaker requirement, namely invariance on transformation sets, defined as follows. We denote by S(x) a compact subset of images in the support of P that can be obtained by transformations of an image x; S(x) is called a transformation set. For example, in the case of rotations, the transformation set corresponds to the set of observed images in a dataset that are different versions of the same image x, obtainable from one another by small rotations.

Under the technical assumption on the space of real images that the sampling operator is bijective, the mapping from an image x to its transformation set S(x) is well defined. We can hence define the collection of transformation sets induced by a given transformation group. Importantly, the bijectivity assumption also implies that the transformation sets of two different images either coincide or are disjoint, so that the (distribution-dependent) definition partitions the support of the distribution. More details on the aforementioned concepts and definitions can be found in Sec. A.1 in the Appendix.

We say that a function f is (transformation-)invariant if f(x') = f(x) for all x' ∈ S(x) and all x in the support of P, and we denote the class of all such functions by F_inv. Using this notation, fitting a model with high accuracy under worst-case "small" transformations of the input can be mathematically captured by the robust optimization formulation [5] of minimizing the robust loss

L_rob(f) := E_(x,y)~P [ max_{x' ∈ S(x)} ℓ(f(x'), y) ]    (1)

in some function space F. We call the solution of this problem the (spatially) robust minimizer. While adversarial training aims to optimize the empirical version of Eq. (1), the converged predictor might be far from the global population minimum, in particular in the nonconvex optimization landscapes encountered when training neural networks. Furthermore, we show in the following section that for robustness over transformation sets, constraining the model class to invariant functions leads to the same minimizer of the robust loss. These facts motivate invariance-inducing regularization, which we then show to exhibit improved robust test accuracy in practice.

2.2 Regularization to encourage invariance

For any regularizer R, we define the corresponding constrained set of functions F_R as the set of functions f ∈ F for which R vanishes on the support of P. When R is built from a semimetric d on the model outputs (the weaker notion of a semimetric satisfies almost all conditions for a metric without having to satisfy the triangle inequality), comparing f(x') and f(x) over the transformation set of each x, we have F_R = F_inv. We now consider constrained optimization problems of the form

min_{f ∈ F_R}  E_(x,y)~P [ ℓ(f(x), y) ]                           (O1)
min_{f ∈ F_R}  E_(x,y)~P [ max_{x' ∈ S(x)} ℓ(f(x'), y) ]          (O2)

The following theorem shows that (O1) and (O2) are equivalent to (1) if the set of all invariant functions is a subset of the function space F.

Theorem 1.

If F_inv ⊆ F, all minimizers of the adversarial loss (1) are in F_inv. If furthermore F_R = F_inv, any solution of the optimization problems (O1), (O2) minimizes the adversarial loss.

The proof of Theorem 1 can be found in the Appendix in Sec. A.2. Since exact projection onto the constrained set is in general not achievable for neural networks, an alternative way to induce invariance is to relax the constraint by only requiring the regularizer to be small rather than exactly zero. Using Lagrangian duality, (O1) and (O2) can then be rewritten in penalized form for some scalar λ > 0 as

min_{f ∈ F}  E_(x,y)~P [ ℓ(f(x), y) ]  +  λ R(f)                          (2)
min_{f ∈ F}  E_(x,y)~P [ max_{x' ∈ S(x)} ℓ(f(x'), y) ]  +  λ R(f)         (3)

In Sec. 2.4 we discuss how ordinary adversarial training, and modified variants that have been proposed thereafter, can be viewed as special cases of Eqs. (2) and (3). On the other hand, the constrained regularization formulation corresponds to restricting the function space and is hence comparable with hand-crafted network architecture design as described in Sec. 3.1.

2.3 Trade-off between natural and robust accuracy

Even though high robust accuracy (1) might be the main goal in some applications, one might wonder whether the robust minimizer exhibits lower accuracy on untransformed images (natural accuracy), i.e. accuracy with respect to the unperturbed data distribution, as studied in [42, 49]. In this section we address this question and identify the conditions on transformation set perturbations under which minimizing the robust loss does not lead to decreased natural accuracy. Notably, natural accuracy even increases under mild additional assumptions.

One reason why adversarial examples have attracted a lot of interest is that the prediction of a given classifier can change within a perturbation set in which all images appear the same to the human eye. Mathematically, in the case of transformation sets, the latter can be modeled by a property of the true distribution: the conditional label distribution P(y | x) is constant for all x belonging to the same transformation set. In other words, y is conditionally independent of x given the transformation set S(x). Under this assumption the next theorem shows that there is no trade-off in natural accuracy for the transformation robust minimizer.
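Written out with the notation introduced above (a compact restatement of the assumption, not necessarily the paper's exact formulation):

```latex
P(y \mid x) = P\bigl(y \mid S(x)\bigr) \quad \text{for all } x \text{ in the support of } P,
\qquad \text{i.e.} \qquad y \perp x \mid S(x).
```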

Theorem 2 (Trade-off natural vs. robust accuracy).

Under the assumption of Theorem 1 and if y is conditionally independent of x given S(x), the adversarial minimizer also minimizes the natural loss. If, moreover, the conditional distribution of x given S(x) has full support on every transformation set and the loss is injective, then every minimizer of the natural loss also has to be invariant.

As a consequence, minimizing the constrained optimization problem (O1) could potentially help in finding the optimal solution that minimizes standard test error. Practically, the assumption on the distribution over the transformation sets corresponds to assuming non-zero inherent transformation variance in the natural distribution of the dataset. Indeed, in Sec. 4 we observe a boost in natural accuracy for robust invariance-inducing methods on SVHN, a commonly used benchmark dataset for spatially equivariant networks for this very reason.

One might wonder how this result relates to several recent publications such as [42, 49] that present toy examples for which the robust solution must have higher natural loss than the Bayes optimal solution even in the infinite data limit. On a fundamental level, ℓp perturbation sets are of a different nature than transformation sets on generic distributions. In the distributions considered in [42, 49], there is no unique mapping from an image to a perturbation set and thus the conditional independence property does not hold in general.

2.4 Different regularizers and practical implementation

In order to improve robustness against spatial transformations we consider different choices of the regularizer R in the regularized objectives (2) and (3), which we then compare empirically in Sec. 4. This allows us to view a number of variants of adversarial training in a unified framework. Broadly speaking, each approach listed below consists of first searching for an adversarial example according to some mechanism; this example is then included in a regularizing function, often some weak notion of distance between the prediction at x and the prediction at the new example. The following choices of regularizers involve the maximization of a regularizing function d(f(x), f(x')) over the transformation set, where d is, for instance, the KL divergence on the softmax of the (logit) vectors or the ℓ2 distance between the logits. In all cases we refer to the maximizer x' as an adversarial example that is found using the defense mechanisms discussed in Section 3.3. Note that for the KL and ℓ2 regularizers the assumption in Theorem 1 is satisfied.

Instead of performing a maximization of the regularizing function to find the adversarial example x', we can also choose x' in alternative ways. The following variants are explored in the paper, two of which are reminiscent of previous work: x' can be found by maximizing the classification loss itself (as in adversarial training [26] and adversarial logit pairing [21]), or it can simply be a randomly drawn transformation of x (regularized data augmentation).

The last regularizer suggests using an additive penalty on top of data augmentation, with either one or even multiple random draws, where the penalty can be any of the above semimetrics between f(x) and f(x'), such as the KL divergence or the ℓ2 distance. Albeit suboptimal, the experimental results in Section 4 suggest that simply adding the additive regularization penalty on top of randomly drawn augmented data matches general adversarial training in terms of robust prediction at a fraction of the computational cost. In addition, Theorem 2 suggests that even when the goal is to improve standard accuracy and one expects inherent variance of nuisance factors in the data distribution, it is likely helpful to use regularized data augmentation instead of vanilla data augmentation. Empirically we observe this on the SVHN dataset in Section 4.
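As a rough sketch of such an additive penalty (our own illustration with hypothetical names, not the paper's code), the ℓ2 variant simply pairs the logits of the original and augmented images and can be averaged over one or several random draws:

```python
import torch

def l2_logit_penalty(logits, logits_aug):
    # squared l2 distance between the two logit vectors (a logit-pairing term)
    return ((logits - logits_aug) ** 2).sum(dim=1).mean()

def augmentation_penalty(model, x, random_transform, num_draws=1):
    # average the pairing term over one or several randomly drawn
    # transformations, i.e. regularized data augmentation (defense "rnd")
    logits = model(x)
    penalties = [l2_logit_penalty(logits, model(random_transform(x)))
                 for _ in range(num_draws)]
    return torch.stack(penalties).mean()
```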

Adversarial examples for spatial transformation sets. Since the transformation set is not a closed group and we do not even know whether the observation x lies on the boundary or in the interior of its transformation set S(x), we cannot solve the maximization constrained to S(x) in practice. However, for an appropriate choice of a transformation search set T, we can instead minimize an upper bound of (1), which reads

E_(x,y) [ max_{t ∈ T} ℓ(f(x_t), y) ]  ≥  E_(x,y) [ max_{x' ∈ S(x)} ℓ(f(x'), y) ]    (4)

where T is the set of transformations that we search over and x_t denotes the image x transformed by t (see Sec. A.1 in the Appendix for an explicit construction of the transformation search set T). The left-hand side of (4) is hence what we aim to minimize in practice, with the expectation taken over the empirical joint distribution of (x, y). The same relaxation from the transformation set S(x) to a range of transformations of x is also used for the maximization within the regularizers.

Figure 1 shows one pair of example images: the original image (panel (a)) is depicted along with a transformed version (panel (b)), together with the respective predictions of a standard neural network classifier.

3 Experimental setup

In our experiments, we compare invariance-inducing regularization, incorporated via the various augmentation-based methods described in Section 2.4 and applied to standard networks, with representative spatially equivariant networks trained using standard optimization procedures.

3.1 Spatial equivariant networks

We compare the robust prediction accuracies of networks trained with the regularizers with three specialized architectures designed to be equivariant against rotations and translations: (a) G-ResNet44 (GRN) [8], using p4m convolutional layers (90-degree rotations, translations and mirror reflections), on CIFAR-10; (b) Equivariant Transformer Networks (ETN) [41], a generalization of Polar Transformer Networks (PTN) [12], on SVHN; and (c) Spatial Transformer Networks (STN) [19] on SVHN. A more comprehensive discussion of the literature on equivariant networks can be found in Sec. 5. We choose the architectures listed above based on the availability of reproducible code and previously reported state-of-the-art standard accuracies on SVHN and CIFAR-10. We train GRN, STN and ETN using standard augmentation as described in Sec. 3.4 (std) and additionally with random rotations (std*). Out of curiosity we also trained a "two-stage" STN where we train the localization network separately in a supervised fashion. Specifically, we use a randomly transformed version of the training data, treating the transformation parameters as prediction targets. Details about the implementation and results can be found in Sec. B in the Appendix.

3.2 Transformations

The transformations that we consider in Sec. 4 are small rotations (of up to 30 degrees) and translations in two dimensions of up to 3 px, corresponding to approx. 10% of the image size. For augmentation-based methods we need to generate such small transformations of a given test image. Although the definition of a transformation in the theoretical section, using the corresponding continuous image functions, is clean, we do not have access to the continuous function in practice since the sampling mapping is in general not bijective. Instead, we use bilinear interpolation, as implemented in TensorFlow and in a differentiable version of a transformer [19] for first-order attack and defense methods.

Figure 1: Example images and classifications by the Standard model. (a) An image that is correctly classified for most of the rotations in the considered grid. (b) One rotation for which the image shown in (a) is misclassified as "airplane".

On top of interpolation, rotation also creates edge artifacts at the boundaries, since the image is only sampled on a bounded set. The empty space that results from translating and rotating an image is filled with black pixels (constant padding) unless noted otherwise; Fig. 1 (b) shows an example. [11] additionally analyze a "black canvas" setting where the images are padded with zeros prior to applying the transformation, ensuring that no information is lost due to cropping. Their experiments show that the reduced accuracy of the models cannot be attributed to this effect. Since both versions yield similar results, we report results on the first version of pad-and-crop choices, with input images of the same size as the original.
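For illustration, a standalone (non-differentiable) stand-in for this transformation step could look as follows; it is a sketch assuming SciPy's bilinear interpolation rather than the differentiable TensorFlow transformer used in the paper, and the function name is our own:

```python
import numpy as np
from scipy import ndimage

def spatial_transform(image, angle_deg, tx, ty):
    """Rotate an HxWxC image by `angle_deg` degrees and translate it by
    (tx, ty) pixels, using bilinear interpolation (order=1) and black
    constant padding, as described above."""
    rotated = ndimage.rotate(image, angle_deg, axes=(1, 0), reshape=False,
                             order=1, mode="constant", cval=0.0)
    return ndimage.shift(rotated, shift=(ty, tx, 0),
                         order=1, mode="constant", cval=0.0)
```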

3.3 Attacks and defenses

The attacks and defenses we choose essentially follow the setup in [11]. The defense refers to the procedure at training time which aims to make the resulting model robust to adversarial examples. It generally differs from the (extensive) attack mechanism performed at evaluation time to assess the model’s robustness due to computational constraints.

Considered attacks. First-order methods such as projected gradient descent, which have proven most effective for ℓp perturbations, are not optimal for finding adversarial examples with respect to rotations and translations. In particular, our experiments confirm the observation reported in [11] that the strongest adversarial examples are found through a grid search. For the grid-search attack, the compact perturbation set is discretized to find the transformation resulting in a misclassification with the largest loss. In contrast to the case of ℓp-adversarial examples, this is computationally feasible for the 3-dimensional spatial parameters. We consider a default grid of 5 values per translation direction and 31 values for rotation, yielding 775 transformed examples that are evaluated for each image. We refer to the accuracy attained under this attack as grid accuracy. To check that this number of transformations is sufficient, we also evaluated a finer grid of 7,500 transformations on a subset of the experiments (summarized in Table 11); it showed only minor reductions in accuracy compared to the coarser grid, so we chose the latter for computational reasons.
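A sketch of such a grid-search attack is given below (our own illustration; loss_fn and the helper spatial_transform from the previous section are assumptions, and the grid boundaries of ±3 px and ±30 degrees follow the transformation ranges described in Sec. 3.2):

```python
import itertools
import numpy as np

def grid_attack(loss_fn, image, label,
                max_trans=3.0, max_rot=30.0, n_trans=5, n_rot=31):
    """Exhaustive grid search over translations and rotations; returns the
    transformed image with the largest loss (5 * 5 * 31 = 775 candidates
    with the default grid)."""
    txs = np.linspace(-max_trans, max_trans, n_trans)
    tys = np.linspace(-max_trans, max_trans, n_trans)
    rots = np.linspace(-max_rot, max_rot, n_rot)
    worst_loss, worst_image = -np.inf, image
    for tx, ty, rot in itertools.product(txs, tys, rots):
        candidate = spatial_transform(image, rot, tx, ty)
        loss = loss_fn(candidate, label)   # e.g. cross-entropy of the model
        if loss > worst_loss:
            worst_loss, worst_image = loss, candidate
    return worst_image, worst_loss
```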

Considered defenses. To find the adversarial example that maximizes either the loss or the regularization function during training, we use the following defense mechanisms:

  • worst-of-k: At every iteration, we sample k different perturbations for each image in the batch. The one resulting in the highest function value is used as the maximizer. Most of our experiments are conducted with k = 10, consistent with [11], as a higher k only improved performance minimally (see Table 6). A minimal sketch of this procedure is given after this list.

  • Spatial PGD: In analogy to common practice for adversarial training, e.g. [40, 26], the S-PGD mechanism uses projected gradient descent with respect to the translation and rotation parameters, with projection onto the constrained set of transformations. We consider 5 steps of PGD, starting from a random initialization, with step sizes for horizontal translation, vertical translation and rotation chosen following [11]. A discussion of the discrepancy between S-PGD as a defense and as an attack mechanism can be found in Section C.2.

  • Random: Data augmentation with a distinct random perturbation per image and iteration. This can be seen as the most naive "adversarial" example, as it corresponds to worst-of-k with k = 1.
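As referenced above, a minimal sketch of the worst-of-k defense (our own illustration; loss_fn and spatial_transform are assumptions, as in the attack sketch):

```python
import numpy as np

def worst_of_k(loss_fn, image, label, k=10, max_trans=3.0, max_rot=30.0):
    """Sample k random rotation/translation parameters and keep the one with
    the highest loss; k=1 reduces to plain random data augmentation."""
    worst_loss, worst_image = -np.inf, image
    for _ in range(k):
        tx, ty = np.random.uniform(-max_trans, max_trans, size=2)
        rot = np.random.uniform(-max_rot, max_rot)
        candidate = spatial_transform(image, rot, tx, ty)
        loss = loss_fn(candidate, label)
        if loss > worst_loss:
            worst_loss, worst_image = loss, candidate
    return worst_image
```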

3.4 Training details

The experiments are conducted with deep neural networks as the function space F, and ℓ is the cross-entropy loss. In the main paper we consider the datasets SVHN [32] and CIFAR-10 [22]. For the non-specialized architectures, we train a ResNet-32 [16], implemented in TensorFlow [1]. For the transformer networks STN and ETN we use a 3-layer CNN as the localization network, according to the default settings in the code provided for both networks for SVHN and rot-MNIST. For a subset of the experiments we also report results for CIFAR-100 [22] in the Appendix.

We train the baseline models with standard data augmentation: random left-right flips and small random translations, followed by normalization. Below we refer to the models trained in this fashion as "std". For the models trained with one of the defenses described in Sec. 3.3, we only apply random left-right flipping, since translations are part of the adversarial search. The special case of data augmentation (with translations and rotations, i.e. the defense "random") without regularization is referred to as std*.

For optimization of the empirical training loss, we run standard minibatch SGD with momentum and weight decay. We use an initial learning rate that is divided by a constant factor after one half and after three quarters of the training steps. Independent of the defense method, we fix the number of iterations for SVHN and CIFAR-10, and separately for CIFAR-100. For comparability across all methods, the number of unique original images in each iteration is the same in all cases. For the baselines and adversarial training, we additionally trained with a conventional batch size and report the higher accuracy of the two versions. For the regularized methods, the value of λ is chosen based on the test grid accuracy. All models are trained using a single GPU on a node equipped with an NVIDIA GeForce GTX 1080 Ti and two 10-core Xeon E5-2630v4 processors.
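For concreteness, the optimization setup described above can be sketched as follows; every numerical value in this snippet is a placeholder of our choosing, not the paper's setting:

```python
import torch
import torch.nn as nn

# A dummy model stands in for the ResNet-32; the momentum, weight decay,
# learning rate, decay factor and iteration budget are placeholders.
model = nn.Linear(32 * 32 * 3, 10)
num_iterations = 80_000

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[num_iterations // 2, 3 * num_iterations // 4],
    gamma=0.1)  # learning-rate drops at the half and three-quarter points
# scheduler.step() would then be called once per training iteration.
```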

Figure 2: Mean runtime for different methods on CIFAR-10. The connected points correspond to Wo-k defenses for different values of k.

4 Empirical Results

We now compare the natural test accuracy (standard accuracy on the test set, abbreviated as nat) and the test grid accuracy (as defined in Sec. 3.3, abbreviated as rob) achieved by standard and regularized (adversarial) training techniques as well as by the specialized spatially equivariant architectures described in Sec. 3.1. For clarity of presentation, the naming convention we use in the rest of the paper consists of the following components: (a) Reg: the regularizer that was used (AT, ALP, ℓ2, KL, or KL-C as defined in Section 2.4); (b) batch: indicates whether the gradient of the loss is taken with respect to the adversarial examples (rob), the natural examples (nat) or both (mix); and (c) def: the mechanism used to find the adversarial example, namely random (rnd), worst-of-k (Wo-k) or spatial PGD (S-PGD) as described in Sec. 3.3. Thus, Reg(batch, def) corresponds to using Reg as the regularization function, the examples defined by batch in the gradient of the loss, and the defense mechanism def to find the augmented or adversarial examples.

In Table 1, we report results for a subset of the Reg(batch, def) combinations to facilitate comparisons. Tables with many more combinations can be found in Tables 4–9 in the Appendix. We report averages (standard errors are contained in Tables 4–9) computed over five training runs with identical hyperparameter settings. We compare all methods by computing absolute and relative error reductions (the relative error reduction of method B with respect to method A is defined as (err_A − err_B)/err_A). It is insightful to present both numbers since the absolute error values vary drastically between datasets.
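For example, using the CIFAR-10 grid accuracies in Table 1, going from adversarial training (71.17, i.e. an error of 28.83) to the KL-regularized variant (77.47, i.e. an error of 22.53) corresponds to an absolute error reduction of 6.30 points and a relative error reduction of (28.83 − 22.53)/28.83 ≈ 22%.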


Method               SVHN (nat)  SVHN (rob)  CIFAR-10 (nat)  CIFAR-10 (rob)
std                  95.48       18.85       92.11            9.52
std*                 93.97       82.60       89.93           58.29
AT (rob, Wo-10)      96.03       90.35       91.76           71.17
KL (rob, Wo-10)      96.13       92.71       90.41           77.47
ℓ2 (rob, Wo-10)      96.53       92.55       90.53           77.06
ALP (rob, Wo-10)     96.30       92.04       90.11           75.90
KL-C (mix, S-PGD)    96.14       92.42       89.98           78.93
ALP (rob, S-PGD)     96.11       92.32       89.85           77.80

Network              SVHN (nat)  SVHN (rob)  CIFAR-10 (nat)  CIFAR-10 (rob)
GRN (std)            96.07       25.12       93.39           16.85
GRN (std*)           95.05       84.90       93.08           71.64
ETN (std)            95.53       13.15       --              --
ETN (std*)           95.57       84.21       --              --
STN (std)            95.61       36.68       --              --
STN (std*)           95.55       79.28       --              --
Table 1: Mean accuracies of models trained with various forms of regularized adversarial training as well as standard augmentation techniques (top) and spatially equivariant networks (bottom). std* denotes standard augmentation plus random rotations.

Method               SVHN (nat)  SVHN (rob)  CIFAR-10 (nat)  CIFAR-10 (rob)
KL (nat, Wo-10)      96.00       92.27       90.83           77.34
ℓ2 (nat, Wo-10)      96.05       92.16       88.32           75.64
ALP (nat, Wo-10)     96.39       91.98       88.78           75.43
std*                 93.97       82.60       89.93           58.29
ℓ2 (nat, rnd)        96.34       90.51       87.80           71.60
KL (nat, rnd)        96.16       90.69       89.33           73.50
ℓ2 (rob, rnd)        96.09       90.48       88.75           71.49
KL (rob, rnd)        96.23       90.92       89.47           73.22
Table 2: Mean accuracies of models trained with various forms of regularized adversarial training. The first three rows use adversarial examples found via Wo-10; the remaining rows compare unregularized (std*) and regularized data augmentation.

Effectiveness of augmentation-based invariance-inducing regularization. In Table 1 (top), the first three entries (std, std* and unregularized adversarial training) all perform worse in grid accuracy than the regularized methods, and the last two entries use adversarial examples with respect to the classification cross-entropy loss found via S-PGD. When considering the three regularizers (KL, ℓ2, ALP) with the same batch and def (here chosen to be "rob" and Wo-10), regularized adversarial training improves the grid accuracy over plain adversarial training from 71.17 to up to 77.47 on CIFAR-10 and from 90.35 to up to 92.71 on SVHN, corresponding to relative error reductions of roughly 22% and 24%, respectively. The same can be observed when comparing data augmentation and its regularized variants in Table 2. Together with Tables 5 and 6, S-PGD appears to be the more efficient defense mechanism compared to worst-of-k, even when k is raised to 20, with comparable computation time.

Computational considerations. In Figure 2, we plot the grid accuracy vs. the runtime (in hours) for a subset of regularizers and defense mechanisms on CIFAR-10, for clarity of presentation. How much overhead is needed to obtain the reported gains? Comparing AT(rob, Wo-10) (green line) and ALP(rob, Wo-10) (red line) shows that significant improvements in grid accuracy can be achieved by regularization with only a small computational overhead. What if we make the defense stronger? While the leap in robust accuracy from Wo-1 (also referred to as rnd) to Wo-10 is quite large, increasing k to 20 only gives diminishing returns while requiring more training time. This observation is summarized exemplarily for both the KL and the ALP regularizer on CIFAR-10 in Table 7. Furthermore, for any fixed training time, regularized methods exhibit higher robust accuracies, where the gap varies with the particular choice of regularizer and defense mechanism.

Comparison with spatially equivariant networks. Although the rotation-augmented G-ResNet44 obtains higher grid and natural accuracies than the rotation-augmented ResNet-32 on both SVHN and CIFAR-10 (see Table 1), regularizing standard data augmentation (i.e. regularizers with "rnd", see Table 2) using both the ℓ2 distance and the KL divergence matches the G-ResNet44 on CIFAR-10 and surpasses it on SVHN in both grid and natural accuracy. The same phenomenon is observed for the augmented ETN and STN on SVHN.³ In conclusion, regularized augmentation-based methods match or outperform representative end-to-end networks handcrafted to be equivariant to spatial transformations.

³We had difficulties training both ETN and STN to a competitive natural accuracy on CIFAR-10 even after an extensive learning rate and schedule search, so we do not report those numbers here.

Trade-off natural vs. adversarial accuracy. SVHN is one of the main datasets (without artificial augmentation as in rot-MNIST [25]) for which spatially equivariant networks have reported improvements in natural accuracy. This is due to the inherent orientation variance in the data. In our mathematical framework, this corresponds to the assumption in Theorem 2 that the distribution on the transformation sets has full support. Furthermore, as all numbers in SVHN have the same label irrespective of small rotations of at most 30 degrees, the first assumption of Theorem 2 is also fulfilled. Tables 1 and 2 confirm the statement of the theorem that improving robust accuracy need not hurt natural accuracy and may even improve it: for SVHN, adding regularization to samples obtained either via Wo-k adversarial search or via random transformations (rnd) consistently helps not only robust but also standard accuracy.

Comparing the effects of different regularization parameters on test grid accuracy. We study Tables 1 and 2 and attempt to disentangle the effects by varying only one parameter at a time. For example, we can observe that, computational cost aside, fixing the defense of any regularizer to Wo-10, the robust regularized loss Reg(rob, Wo-10) (i.e., Eq. (3)) does better than, or not statistically significantly worse than, Reg(nat, Wo-10) (i.e., Eq. (2)). Furthermore, the KL regularizer generally performs better than the ℓ2 regularizer in a large number of settings. A possible explanation for the latter could be that the KL divergence upper bounds the squared ℓ2 distance on the probability simplex (up to a constant, by Pinsker's inequality) and is hence more restrictive.

Choice of λ. The different regularization methods peak at different values of λ in terms of grid accuracy. However, they outperform unregularized methods over a large range of λ values, suggesting that well-performing values of λ are not difficult to find in practice. This can be seen in Figures 4 and 5 in the Appendix.

We have conducted many more experiments for subsets of the defenses and datasets, illustrating different phenomena that we observe. For example, we have analyzed a finer grid for the grid-search attack and evaluated S-PGD as an attack mechanism. A detailed discussion of these experiments can be found in Sec. C.2.

5 Related work

Group equivariant networks. There are in general two types of approaches to incorporate spatial invariance into a network. In one of the earlier works of the neural network era, Spatial Transformer Networks were introduced [19], which include a localization module that predicts transformation parameters, followed by a transformer module. Later on, one line of work proposed multiple filters that are discrete group transformations of each other [24, 27, 8, 51, 45]. For continuous transformations, steerability [43, 9] and coordinate transformation [12, 41] based approaches have been suggested. Although these approaches have resulted in improved standard accuracies, it has not been rigorously studied whether, or by how much, they improve upon regular networks with respect to robust test accuracy.

Regularized training. Using penalty regularization to encourage robustness and invariance when training neural networks has been studied in different contexts: for distributional robustness [17], domain generalization [30], adversarial training [31, 21, 49], robustness against simple transformations [7] and semi-supervised learning [50, 46]. These approaches are based on augmenting the training data either statically [17, 30, 7, 46], i.e. before fitting the model, or adaptively in the sense of adversarial training, with different augmented examples per training image generated in every iteration [21, 31, 49].

Robustness against simple transformations. Approaches targeting adversarial accuracy for simple transformations have used attacks and defenses in the spirit of PGD (either on the transformation space [11] or on the input space with projection onto the transformation manifold [20]) and simple random or grid search [11, 34]. Recent work [10] has also evaluated some rotation-equivariant networks, albeit with different training and attack settings, which reduces direct comparability with e.g. adversarial-training-based defenses [11].

6 Conclusion

In this work, we have explored how regularized augmentation-based methods compare against specialized spatially equivariant networks in terms of robustness against small translations and rotations. Strikingly, even though augmentation can be applied to encourage any desired invariance, the regularized methods adapt well and perform similarly to or better than specialized networks. Furthermore, we have introduced a theoretical framework that incorporates many forms of regularization techniques proposed in the literature. Both theoretically and empirically, we showed that for transformation invariances, and under certain practical assumptions on the distribution, there is no trade-off between natural and adversarial accuracy, which stands in contrast to the debate around ℓp-perturbation sets. In summary, it is advantageous to replace unregularized with regularized training for both augmentation and adversarial defense methods. With regard to the choice of the regularization parameter, we have seen that improvements can be obtained over a large range of values, indicating that this additional hyperparameter is not difficult to tune in practice. In future work, we aim to explore whether specialized architectures can be combined with regularized adversarial training to improve upon the best results reported in this work.

7 Acknowledgements

We thank Ludwig Schmidt for helpful discussion, Nicolai Meinshausen for valuable feedback on the manuscript and Luzius Brogli for initial experiments. FY was supported by the Institute for Theoretical Studies ETH Zurich, the Dr. Max Rössler and Walter Haefner Foundation and the Office of Naval Research Young Investigator Award N00014-19-1-2288.

References

  • [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
  • [2] Michael A Alcorn, Qi Li, Zhitao Gong, Chengfei Wang, Long Mai, Wei-Shinn Ku, and Anh Nguyen. Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects. arXiv preprint arXiv:1811.11553, 2018.
  • [3] Henry S Baird. Document image defect models. In Structured Document Image Analysis, pages 546–556. Springer, 1992.
  • [4] Andrew R Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Info. Theory, 39(3):930–945, 1993.
  • [5] Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski. Robust optimization, volume 28. Princeton University Press, 2009.
  • [6] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
  • [7] Gong Cheng, Junwei Han, Peicheng Zhou, and Dong Xu. Learning rotation-invariant and Fisher discriminative convolutional neural networks for object detection. IEEE Transactions on Image Processing, 28(1):265–278, 2019.
  • [8] Taco Cohen and Max Welling. Group equivariant convolutional networks. In Proceedings of the International Conference on Machine Learning, pages 2990–2999, 2016.
  • [9] Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. In Proceedings of the International Conference on Learning Representations, 2018.
  • [10] Beranger Dumont, Simona Maggio, and Pablo Montalvo. Robustness of rotation-equivariant networks to adversarial perturbations. arXiv preprint arXiv:1802.06627, 2018.
  • [11] Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. Exploring the landscape of spatial robustness. In Proceedings of the International Conference on Machine Learning, 2019.
  • [12] Carlos Esteves, Christine Allen-Blanchette, Xiaowei Zhou, and Kostas Daniilidis. Polar transformer networks. In Proceedings of the International Conference on Learning Representations, 2018.
  • [13] A. Fawzi and P. Frossard. Manitest: Are classifiers really invariant? In British Machine Vision Conference (BMVC), 2015.
  • [14] Robert Geirhos, Carlos RM Temme, Jonas Rauber, Heiko H Schütt, Matthias Bethge, and Felix A Wichmann. Generalisation in humans and deep neural networks. In Advances in Neural Information Processing Systems, pages 7549–7561, 2018.
  • [15] Boris Hanin. Universal function approximation by deep neural nets with bounded width and relu activations. arXiv preprint arXiv:1708.02691, 2017.
  • [16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • [17] Christina Heinze-Deml and Nicolai Meinshausen. Conditional variance penalties and domain shift robustness. arXiv preprint arXiv:1710.11469, 2017.
  • [18] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359–366, 1989.
  • [19] Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. Spatial Transformer Networks. In Advances in Neural Information Processing Systems, pages 2017–2025, 2015.
  • [20] Can Kanbak, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Geometric robustness of deep networks: analysis and improvement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4441–4449, 2018.
  • [21] Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial Logit Pairing. arXiv preprint arXiv:1803.06373, 2018.
  • [22] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical Report 4, University of Toronto, 2009.
  • [23] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
  • [24] Dmitry Laptev, Nikolay Savinov, Joachim M Buhmann, and Marc Pollefeys. TI-POOLING: Transformation-invariant pooling for feature learning in convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 289–297, 2016.
  • [25] Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning, pages 473–480, 2007.
  • [26] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In Proceedings of the International Conference on Learning Representations, 2018.
  • [27] Diego Marcos, Michele Volpi, Nikos Komodakis, and Devis Tuia. Rotation equivariant vector field networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 5058–5067, 2017.
  • [28] Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks. In Proceedings of the International Conference on Machine Learning, pages 3575–3583, 2018.
  • [29] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
  • [30] Saeid Motiian, Marco Piccirilli, Donald A Adjeroh, and Gianfranco Doretto. Unified deep supervised domain adaptation and generalization. In Proceedings of the IEEE International Conference on Computer Vision, volume 2, page 3, 2017.
  • [31] Taesik Na, Jong Hwan Ko, and Saibal Mukhopadhyay. Cascade adversarial machine learning regularized with a unified embedding. In Proceedings of the International Conference on Learning Representations, 2018.
  • [32] Y. Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on Deep Learning and Unsupervised Feature Learning, page 5, 2011.
  • [33] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the ACM Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.
  • [34] Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. Towards practical verification of machine learning: The case of computer vision systems. arXiv preprint arXiv:1712.01785, 2017.
  • [35] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In Proceedings of the International Conference on Learning Representations, 2018.
  • [36] Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C. Duchi, and Percy Liang. Adversarial training can hurt generalization. arXiv preprint arXiv:1906.06032, 2019.
  • [37] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In Proceedings of the International Conference on Learning Representations, 2018.
  • [38] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations, 2015.
  • [39] Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. In Proceedings of the International Conference on Learning Representations, 2018.
  • [40] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In Proceedings of the International Conference on Learning Representations, 2014.
  • [41] Kai Sheng Tai, Peter Bailis, and Gregory Valiant. Equivariant Transformer Networks. In Proceedings of the International Conference on Machine Learning, 2019.
  • [42] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. In Proceedings of the International Conference on Learning Representations, 2019.
  • [43] Maurice Weiler, Fred A Hamprecht, and Martin Storath. Learning steerable filters for rotation equivariant CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
  • [44] Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, pages 5283–5292, 2018.
  • [45] Daniel E Worrall, Stephan J Garbin, Daniyar Turmukhambetov, and Gabriel J Brostow. Harmonic networks: Deep translation and rotation equivariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5028–5037, 2017.
  • [46] Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, and Quoc V. Le. Unsupervised data augmentation. arXiv preprint arXiv:1904.12848, 2019.
  • [47] Larry S. Yaeger, Richard F. Lyon, and Brandyn J. Webb. Effective training of a neural network character classifier for word recognition. In Advances in Neural Information Processing Systems, pages 807–816, 1997.
  • [48] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. In Proceedings of the International Conference on Learning Representations, 2015.
  • [49] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. Theoretically principled trade-off between robustness and accuracy. In Proceedings of the International Conference on Machine Learning, 2019.
  • [50] Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. Improving the robustness of deep neural networks via stability training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4480–4488, 2016.
  • [51] Yanzhao Zhou, Qixiang Ye, Qiang Qiu, and Jianbin Jiao. Oriented response networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 519–528, 2017.

Appendix A Appendix

A.1 Rigorous definition of transformation sets and choice of the transformation search set

In the following we introduce the concepts needed to rigorously define transformation sets as subsets of the finite-dimensional (sampled) image space. In particular, because rotations by arbitrary angles are not well defined for sampled images, we need to introduce the space of continuous image functions, whose elements map Euclidean coordinates in the plane to the RGB intensities of an image. The observed finite-dimensional vector is then a sampled version of such an image function. Here we assume that the sampling operator is bijective; rigorous definitions follow later in this section.

Next we define subsets in the continuous function space and then transfer the concept back to the finite-dimensional image space. Let us define the group of all rotations and horizontal and vertical translations acting on the plane. We denote the elements of the group by g, each uniquely parameterized by a rotation angle and two translation components and representable by a coordinate transform matrix, see e.g. [8]. Two of the three parameters represent the values for the translations and the third represents the rotation.

The transformed image (function) can be expressed by composing the image function with the coordinate transform matrix associated with g, as in [8]. For each image function, its group orbit is the set of all images obtained by applying group elements to it. By definition, the group orbits partition the space, and every image function belongs to a unique orbit.

Subsets of orbits

In our setting, requiring invariance on the entire orbit (i.e. with respect to all translations and rotations) is too restrictive. First of all, large transformations rarely occur in nature because of physical laws and common human perspectives (an upside-down tower, for example). Secondly, in image classification, robustness is usually only required against adversarial attacks that would not fool humans, i.e. lead them to mislabel the image. If the transformation set is too large, this requirement is no longer fulfilled. For this purpose we consider a closed subset of each group orbit. It follows from the group orbit definition that every image function either belongs to one unique such subset or to none.

As described in the paragraph around Equation (4), when observing a (sampled) image in the training set, we do not know where in its corresponding subset it lies. At the same time, for our augmentation-based methods, we do not want the set of transformations that we search over (the transformation search set T for short) to be image dependent. Instead, in this construction we aim to find the smallest set of transformations T such that (4) is satisfied. For this purpose, it suffices that the effective search set of images for any image x covers the corresponding orbit subset of x.

Here we give an explicit construction of T using, for each subset, the maximal transformation needed to map one image of the subset to another. In particular, we define the maximal transformation vector as the element-wise maximum, over all subsets, of these maximal transformations. Although the subsets themselves are not known for each image, using prior knowledge in each application one can usually estimate the largest possible range of transformations against which robustness is desired or required. For example, for images one could use experiments with humans to determine for which range of angles their reaction time to correctly label each image stays approximately constant. The maximal transformation vector can then be used to determine the minimal set of transformations T. A simplified illustration for the case where the support consists of just one orbit (corresponding, for example, to one image function and all its rotated variants) can be found in Figure 3.

Figure 3: Illustration of an example where one group orbit is the entire space of images and x is an arbitrary image in the orbit. We depict one subset of the orbit and the effective search sets for different instantiations of x, defined by the transformation search set T: (a) x on the left boundary of the subset, (b) x in the interior, and (c) x on the right boundary. The effective search sets are centered around each instantiation of x. The necessity of symmetry of the minimal set of transformations arises from the requirement to cover the subset from both boundary points, and the maximal transformation vector that defines T is determined by the maximal transformation within the subset (in blue).

Sampling issues

In reality, the observed image is not a function on the plane but a finite-dimensional vector obtained by sampling an image function. The space of observed finite-dimensional images is then the range of the sampling operator. In order to counter the problem that the sampling operator is in general not injective, we add the constraint that it be bijective, so that the continuous image underlying an observed image is well defined. That is, for a finite-dimensional image there exists exactly one possible continuous image. As a consequence, if an image and a transformed version of it both exist in the observed image space, the group element relating them is well defined. This is a rather technical assumption that is typically fulfilled in practice. In the main text, we also refer to the sampled version of the continuous image transformed by a group element g as the image transformed by g.

We can now define the transformation sets as the intersections of the (sampled) orbit subsets with the support of the marginal distribution on images. By this definition and by the bijectivity of the sampling operator, there is an injective mapping from any observed image to its transformation set.

A.2 Proof of Theorem 1

Please refer to Section A.1 for the notation needed in this section.

We prove the first statement of the theorem by contradiction. Let f* be a minimizer of the adversarial loss (1) and assume that f* is not invariant; for simplicity of presentation, assume that the marginal distribution over transformation sets is discrete.

Assume there is at least one transformation set, of non-zero probability, on which f* is not constant; collect the distinct values that f* takes on this set (there are at least two, since f* is not constant there) and denote the induced distribution over these values accordingly. Since there is a unique mapping that maps each image to a unique transformation (see Section A.1), we can lower bound the robust loss as follows:

(5)

where the inequality follows from

The right-hand side is minimized with respect to this set of values by concentrating on a single value, since this choice leads to equality in equation (5). Moreover, by assumption, choosing this constant value on the entire transformation set yields a strictly smaller robust loss, which contradicts the optimality of f* and thus proves the first statement of the theorem.

For the second statement let us rewrite

By the first statement we know that the set of invariant functions that minimize the robust loss

is non-empty. For all , it holds by definition of that .

Since , the minimizers of (O1) satisfy for all . But because in we have and it directly follows that . The same argument goes through for (O2) since for all , we have . This concludes the proof of the theorem.

A.3 Proof of Theorem 2

On a high level, similarly to the proof of Theorem 1, we can construct an invariant minimizer of the natural loss under the conditional-independence assumption. Since the natural and robust losses coincide on invariant functions, together with Theorem 1 this shows that the robust minimizer also minimizes the unconstrained natural loss.

Assume minimizes , and in particular, it is constant on all transformation sets except for some . Again by existence of a mapping and by assumption we can write for any

(6)

We then obtain

(7)

when setting for all and otherwise. Together with equation (6), we thus have that for all by definition of .

If additionally the support of is equal to and is injective, the inequality (7) becomes a strict inequality for and hence we have which contradicts the definition of being the minimizer of the natural loss.

Appendix B Two-stage STN

Since STNs are known to be sensitive to hyperparameter settings and thus difficult to train end-to-end [41], we apply the following two-stage procedure to simulate its functionality: (1) we first train a ResNet-32 as a localization regression network (LocNet) to predict the attack perturbation separately by learning from a training set, which contains perturbed images and uses the transformations as the prediction targets; (2) at the same time we train a ResNet-32 classifier with data augmentation, namely random translations and rotations; (3) during the test phase, the output of the LocNet is used by a spatial transformer module that transforms the image before entering the pretrained classifier. We refer to this two-stage STN as STN+.

LocNet and Classifier

For the classifiers, we take the two models trained on CIFAR-10 and SVHN using standard data augmentation and random rotations from our previous experiments. Since we do not expect the regressors (or LocNets) to be perfect in terms of prediction capability, there will still be some transformation left after the regression stage. Thus, the classifiers should effectively see a smaller range of transformations than without the inclusion of a LocNet and transformer module. The training procedure used to train the classifiers is described in Section 3.4.

Effect of rendering edges on LocNet

The LocNet is trained on zero-padded inputs (suffix "(c)", for constant padding) as well as on reflect-padded inputs (suffix "(r)") for comparison. The former possibly yields an unfair advantage of this approach compared to other methods, as the neural network can exploit the edges (induced by the zero padding) to learn the transformation parameters. Therefore, we also consider reflection padding to assess the effect of the different paddings on final performance. Nonetheless, zero padding is consistent with the augmentation setting for the end-to-end trained networks and regularized methods and was also the choice considered by [11]. For completeness we also show results when using reflection padding for training the LocNet, although it lacks comparability with the other methods since the attacks would then need to be reflection-padded as well.

Minimizing loss of information in the prediction transformation process

In the spatial transformer module we compare two variants of handling the transformation parameters predicted by the LocNet. We can either back-transform the transformed image using the negated predicted parameters, which, assuming the regressor has successfully learned the object orientations, turns the image back but potentially introduces extra padding space before the image is fed into the classifier. Alternatively, we can subtract the predicted transformation from the attack transformation and use the remaining transformation as the new “attack transformation”. The latter results in much smaller padding areas if the LocNet performs well. In the experiments we indeed see a large drop in accuracy if we naively transform the images twice. We denote the former method as “naive” and the latter as “trick”; a sketch of the latter is given below.
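A minimal sketch of the “trick” evaluation, assuming access to the clean image and the known grid-attack parameters (which holds in our evaluation setting); `spatial_transform` is the same hypothetical warping helper as in the STN+ sketch above, and the parameter-wise subtraction is a simplification of how the transformations compose.

```python
import torch

def evaluate_trick(locnet, classifier, spatial_transform, x_clean, attack_params):
    """Apply only the residual transformation (attack minus prediction).

    `attack_params` has assumed shape (N, 3) with (angle, tx, ty) per image;
    the clean image is warped exactly once, so no double interpolation occurs.
    """
    with torch.no_grad():
        x_adv = spatial_transform(x_clean, *attack_params.unbind(dim=1))
        predicted = locnet(x_adv)               # LocNet's estimate of the attack
        residual = attack_params - predicted    # remaining "attack transformation"
        x_residual = spatial_transform(x_clean, *residual.unbind(dim=1))
        return classifier(x_residual)
```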

Observed results

For CIFAR-10, this two-stage classifier achieved relatively high grid accuracies. However, the obtained accuracies are still lower than expected, given that the LocNet is allowed to learn the rotations with a separately trained regressor on the transformed training set. For SVHN we also see a gain compared to adversarial training without a regularizer. However, the performance still lags behind the accuracies obtained with the regularizers. The results are summarized in Table 3.

Dataset STN+(c) trick STN+(r) trick STN+(c) naive STN+(r) naive
SVHN (nat) 94.92 95.51 94.92 95.51
   (rob) 90.95 90.28 64.91 59.68
CIFAR10 (nat) 91.29 90.99 91.29 90.99
   (rob) 83.05 84.31 44.88 42.84
Table 3: Accuracies of the two-stage STN (STN+) under different settings, where (c)/(r) denote zero- and reflect-padded inputs and “trick”/“naive” the two back-transformation variants. Details are provided in Appendix B.

Appendix C More experimental results

In this section we discuss additional experimental results that we collected and analyzed.

C.1 Stability to selection of regularization parameter

Figure 4: Test grid accuracy (first row) and test natural accuracy (second row) as a function of the regularization parameter for the SVHN (first column) and CIFAR-10 (second column) datasets with data augmentation (“rnd”). The test grid accuracy is relatively robust over a large range of values, while the natural test accuracy decreases for larger values of the regularization parameter.
Figure 5: Test grid accuracy (first row) and test natural accuracy (second row) as a function of the regularization parameter for the SVHN (first column) and CIFAR-10 (second column) datasets and the Wo- defenses. The test grid accuracy is relatively robust over a large range of values, while the natural test accuracy decreases for larger values of the regularization parameter.

Figures 4 and 5 show the test grid and test natural accuracy as a function of the regularization parameter. We observe that the regularized methods outperform the unregularized ones in terms of grid accuracy over a large range of regularization values.
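As a schematic of where the regularization parameter enters, the sketch below combines a cross-entropy term with a KL-based invariance penalty between the model's outputs on the natural and the transformed image. The variable name `lam`, the batching, and the placement of the cross-entropy term (nat/rob/mix) are simplifying assumptions of this sketch, not the verbatim training code.

```python
import torch.nn.functional as F

def regularized_loss(model, x_nat, x_adv, y, lam):
    """Cross-entropy plus an invariance-inducing KL term, weighted by `lam`.

    `x_adv` is a transformed version of `x_nat` (e.g. random, worst-of-10,
    or S-PGD); which logits enter the cross-entropy term distinguishes the
    nat/rob/mix variants reported in the tables.
    """
    logits_nat = model(x_nat)
    logits_adv = model(x_adv)
    ce = F.cross_entropy(logits_nat, y)                   # "nat" variant shown here
    kl = F.kl_div(F.log_softmax(logits_adv, dim=1),       # divergence between the
                  F.softmax(logits_nat, dim=1),           # natural and transformed
                  reduction="batchmean")                  # predictive distributions
    return ce + lam * kl
```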

C.2 Additional experimental results

std std* AT(rob, Wo-10) AT(mix, Wo-)
SVHN (nat) 95.48 (0.15) 93.97 (0.09) 96.03 (0.03) 96.56 (0.07)
   (rob) 18.85 (1.27) 82.60 (0.23) 90.35 (0.27) 88.83 (0.10)
CIFAR-10 (nat) 92.11 (0.18) 89.93 (0.18) 91.76 (0.23) 93.44 (0.19)
    (rob) 9.52 (0.66) 58.29 (0.60) 71.17 (0.26) 68.14 (0.48)
CIFAR-100 (nat) 70.23 (0.18) 66.62 (0.37) 68.79 (0.34) 73.03 (0.13)
    (rob) 5.09 (0.25) 28.53 (0.25) 38.21 (0.10) 35.93 (0.24)

Table 4: Mean accuracies of models trained without an invariance-inducing regularizer (standard and adversarial training). Standard errors are shown in parentheses.

Mixed batch experiments

In addition to the results reported in the main text, in this section we report further experiments that use the “mixed batch” setting, meaning that the gradient of the loss is taken with respect to both the adversarial and the natural examples. This is common practice in the adversarial example literature [21] and we denote this approach by “mix”. As can be seen in Table 4, for adversarial training a mixed batch improves natural accuracy at the expense of test grid performance. For the regularization methods, the effect of the batch type is much smaller and not consistent, as can be seen in Table 6. For example, comparing ALP(rob, ) with ALP(mix, ) shows that the performance differences are mostly not significant.
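A minimal sketch of the “mix” setting is given below; the equal weighting of the two cross-entropy terms is an assumption for illustration and may differ from the weighting used in our experiments.

```python
import torch.nn.functional as F

def mixed_batch_loss(model, x_nat, x_adv, y):
    """'mix': the classification loss is computed on both the natural and
    the adversarially transformed examples of the same batch."""
    loss_nat = F.cross_entropy(model(x_nat), y)
    loss_adv = F.cross_entropy(model(x_adv), y)
    return 0.5 * (loss_nat + loss_adv)
```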

                KL(nat, rnd)   KL(nat, Wo-10)  KL(rob, Wo-10)  KL-C(mix, S-PGD)  KL(nat, S-PGD)
SVHN (nat)      96.16 (0.10)   96.00 (0.02)    96.13 (0.07)    96.14 (0.04)      96.54 (0.01)
     (rob)      90.69 (0.05)   92.27 (0.09)    92.71 (0.09)    92.42 (0.03)      92.62 (0.03)
CIFAR-10 (nat)  89.33 (0.16)   90.83 (0.18)    90.41 (0.05)    89.98 (0.21)      89.82 (0.13)
         (rob)  73.50 (0.19)   77.34 (0.19)    77.47 (0.28)    78.93 (0.23)      78.89 (0.07)

Table 5: Mean accuracies of models trained with various forms of regularized adversarial training, using the KL regularization function. Standard errors are shown in parentheses.
                 (nat, Wo-)    (rob, Wo-10)  ALP(mix, Wo-10)  ALP(rob, Wo-10)  ALP(mix, Wo-20)  ALP(rob, S-PGD)  ALP(mix, S-PGD)
SVHN (nat)       96.05 (0.04)  96.53 (0.03)  96.41 (0.07)     96.3 (0.09)      96.39 (0.04)     96.11 (0.08)     96.30 (0.09)
     (rob)       92.16 (0.05)  92.55 (0.08)  92.17 (0.11)     92.04 (0.19)     92.48 (0.05)     92.32 (0.17)     92.42 (0.20)
CIFAR-10 (nat)   88.32 (0.13)  90.53 (0.16)  91.13 (0.13)     90.11 (0.25)     90.67 (0.12)     89.85 (0.27)     89.70 (0.10)
         (rob)   75.46 (0.25)  77.06 (0.16)  75.89 (0.23)     75.90 (0.31)     76.72 (0.21)     77.80 (0.17)     77.72 (0.35)
CIFAR-100 (nat)  -             -             68.54 (0.27)     -                68.04 (0.27)     89.82 (0.13)     68.44 (0.39)
          (rob)  -             -             49.30 (0.33)     -                49.98 (0.31)     78.89 (0.07)     52.58 (0.20)

Table 6: Mean accuracies of models trained with various forms of regularized adversarial training, using the   and ALP regularization functions. Standard errors are shown in parentheses.
                KL(nat, Wo-1)  KL(nat, Wo-10)  KL(nat, Wo-20)  ALP(rob, Wo-1)  ALP(rob, Wo-10)  ALP(rob, Wo-20)
CIFAR-10 (nat)  89.34 (0.16)   90.83 (0.18)    89.33 (0.22)    89.47 (0.04)    90.11 (0.25)     90.62 (0.07)
         (rob)  73.40 (0.19)   77.34 (0.19)    77.52 (0.16)    73.22 (0.14)    75.90 (0.31)     76.78 (0.15)

Table 7: Mean standard and grid (rob) accuracies of models trained with various forms of regularized adversarial training, using the rnd (equivalent to Wo-1), Wo-10 and Wo-20 defense mechanisms for KL (left) and ALP (right). Standard errors are shown in parentheses.
std* (nat, rnd) KL(nat, rnd) ALP(rob, rnd) KL(rob, rnd) ALP(mix, rnd)
SVHN (nat) 93.97 (0.09) 96.34 (0.08) 96.16 (0.10) 96.09 (0.06) 96.23 (0.08) 96.19 (0.07)
   (rob) 82.60 (0.23) 90.51 (0.15) 90.69 (0.05) 90.48 (0.16) 90.92 (0.17) 90.48 (0.15)
CIFAR-10 (nat) 89.93 (0.18) 87.80 (0.11) 89.34 (0.16) 88.75 (0.18) 89.47 (0.04) 89.43 (0.28)
    (rob) 58.29 (0.60) 71.60 (0.27) 73.50 (0.19) 71.49 (0.30) 73.22 (0.14) 71.97 (0.11)
Table 8: Mean accuracies of models trained with various forms of augmented training, i.e. unregularized and regularized data augmentation. Standard errors are shown in parentheses.
ALP(nat, Wo-) (nat, Wo-) KL-C(nat, Wo-) KL(nat, Wo-)
SVHN (nat) 96.39 (0.03) 96.05 (0.04) 96.18 (0.06) 96.00 (0.02)
   (rob) 91.98 (0.13) 92.16 (0.05) 91.99 (0.12) 92.27 (0.09)
CIFAR10 (nat) 88.78 (0.11) 88.32 (0.13) 89.61 (0.09) 90.83 (0.18)
    (rob) 75.43 (0.13) 75.46 (0.25) 76.15 (0.23) 77.34 (0.19)
Table 9: Mean accuracies of models trained with various forms of regularized adversarial training. Standard errors are shown in parentheses.

Weakness of the first-order attack

Table 10 shows the accuracies of various models trained with S-PGD defenses and evaluated against the S-PGD and the grid search attack on all datasets. We observe that S-PGD constitutes a very weak attack, since the associated accuracies are much larger than those obtained under the grid search attack. In other words, the S-PGD attack only yields a very loose upper bound on the adversarial accuracy. This stands in stark contrast to attacks and was first noted and discussed in [11]. Interestingly, using the first-order method as a defense mechanism proves to be very effective in terms of grid accuracy. When used in combination with a regularizer, this defense yields the largest overall accuracies, as shown and discussed in Section 4. Recall that, for computational reasons, grid search cannot be used as a defense mechanism. Therefore, the strongest computationally feasible defense does not use the same mechanism as the strongest attack in our setting.
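For illustration, a stripped-down version of the exhaustive grid search evaluation is sketched below; an image counts towards grid accuracy only if no transformation in the grid flips the prediction, which is why the grid attack is at least as strong as any single first-order attack over the same set. The use of torchvision.transforms.functional.affine and the padding and interpolation conventions are assumptions of this sketch and may differ from our actual evaluation code.

```python
import itertools
import torch
import torchvision.transforms.functional as TF

def grid_attack_correct(model, x, y, angles, shifts):
    """Return True iff `model` classifies every transformed version of the
    single image `x` (shape (1, C, H, W)) correctly, i.e. the image counts
    towards grid accuracy; `angles` (degrees) and `shifts` (pixels) define
    the attack grid."""
    with torch.no_grad():
        for angle, tx, ty in itertools.product(angles, shifts, shifts):
            x_t = TF.affine(x, angle=float(angle),
                            translate=[int(tx), int(ty)],
                            scale=1.0, shear=[0.0])
            if model(x_t).argmax(dim=1).item() != int(y):
                return False          # a worst-case transformation was found
    return True
```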

 SVHN (nat) 96.27 (0.00) 96.06 (0.10) 96.30 (0.09)
    (grid) 84.81 (0.01) 87.29 (0.09) 92.42 (0.20)
   (S-PGD) 95.26 (0.04) 95.46 (0.10) 95.92 (0.13)
CIFAR-10 (nat) 92.19 (0.23) 91.83 (0.19) 89.70 (0.10)
    (grid) 64.26 (0.25) 69.74 (0.27) 77.72 (0.35)
   (S-PGD) 88.84 (0.27) 89.87 (0.10) 88.15 (0.21)
CIFAR-100 (nat) 71.11 (0.37) 68.87 (0.19) 68.44 (0.39)
    (grid) 33.40 (0.21) 37.87 (0.12) 52.58 (0.20)
   (S-PGD) 65.01 (0.32) 65.56 (0.12) 66.04 (0.40)
Table 10: Mean accuracies of different models trained with S-PGD defenses and evaluated on the natural test set, against the S-PGD attack and against the grid search attack on the SVHN, CIFAR-10 and CIFAR-100 datasets. While the test accuracy for the S-PGD attack is only slightly lower than the natural accuracy in most cases, the grid accuracy is significantly smaller.

Stronger grid search attack

To evaluate how much the grid accuracy changes with a finer discretization of the perturbation set, we compare the default grid to a finer one for a subset of experiments, summarized in Table 11. Specifically, “(grid-775)” shows the test grid accuracy using the default grid containing 5 values per translation direction and 31 values for rotation, yielding a total of 775 transformed examples that are evaluated for each test image. “(grid-7500)” shows the test grid accuracy on a much finer grid with 10 values per translation direction and 75 values for rotation, resulting in 7500 transformed examples. We observe that the test grid accuracy only decreases slightly for the finer grid, and the reduction in accuracy is smaller for ALP than for AT. For computational reasons we use the grid containing 775 values for all other experiments.
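The two grids differ only in their resolution; the snippet below reproduces the counts, where the concrete ranges (±3 pixels of translation and ±30 degrees of rotation) are assumptions for illustration and not necessarily the budget used in our experiments.

```python
import numpy as np

# Default attack grid: 5 x 5 translations and 31 rotations per image.
grid_775 = [(tx, ty, r) for tx in np.linspace(-3, 3, 5)
                        for ty in np.linspace(-3, 3, 5)
                        for r in np.linspace(-30, 30, 31)]

# Finer grid: 10 x 10 translations and 75 rotations per image.
grid_7500 = [(tx, ty, r) for tx in np.linspace(-3, 3, 10)
                         for ty in np.linspace(-3, 3, 10)
                         for r in np.linspace(-30, 30, 75)]

assert len(grid_775) == 5 * 5 * 31 == 775
assert len(grid_7500) == 10 * 10 * 75 == 7500
```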

 SVHN (grid-775) 88.83 (0.10) 89.75 (0.17) 92.17 (0.11)
    (grid-7500) 88.02 (0.12) 89.29 (0.15) 91.79 (0.12)
CIFAR-10 (grid-775) 68.14 (0.48) 70.35 (0.16) 75.89 (0.23)
    (grid-7500) 65.69 (0.28) 68.28 (0.16) 74.58 (0.16)
CIFAR-100 (grid-775) 35.93 (0.24) 38.21 (0.10) 49.30 (0.33)
    (grid-7500) 33.62 (0.23) 36.04 (0.21) 47.95 (0.23)
Table 11: Mean accuracies for different models evaluated against two different grid search attacks. grid-775 represents test grid accuracy using the default grid with 775 transformed examples, grid-7500 shows test grid accuracy on a much finer grid with 7500 transformed examples. Test grid accuracy only decreases slightly for the finer grid and the reduction in accuracy is smaller for ALP than for AT.

C.3 Regularization effect on range of incorrect angles

Figure 6: For 100 randomly chosen examples from the CIFAR-10 dataset, we show which rotations lead to a misclassification by various models (one panel per model, including the standard model). Each row corresponds to one example and each column to one angle in the considered rotation interval. A dark red square indicates that the corresponding example was misclassified after being rotated by the corresponding angle. The misclassification patterns are visibly more fragmented for some models than for others.