Exploring Model Robustness with Adaptive Networks and Improved Adversarial Training

05/30/2020 · Zheng Xu, et al. · University of Maryland

Adversarial training has proven to be effective in hardening networks against adversarial examples. However, the gained robustness is limited by network capacity and the number of training samples. Consequently, to build more robust models, it is common practice to train on widened networks with more parameters. To boost robustness, we propose a conditional normalization module that adapts the network to each input sample. Our adaptive networks, once adversarially trained, can outperform their non-adaptive counterparts on both clean validation accuracy and robustness. Our method is objective-agnostic and consistently improves both the conventional adversarial training objective and the TRADES objective. Our adaptive networks also outperform larger widened non-adaptive architectures that have 1.5 times more parameters. We further introduce several practical “tricks” in adversarial training to improve robustness and empirically verify their efficiency.


1 Introduction

Deep neural networks have achieved impressive performance on many machine learning tasks, which has led to growing interest in deploying these models in practical applications. However, recent studies have revealed that models trained on benign examples are susceptible to adversarial examples: examples crafted by an adversary to control model behavior at test time [4, 32, 11]. The adversarial perturbation overlaid on top of a benign example is often small enough to be imperceptible to humans, yet can cause the model to misclassify the image.

The existence of adversarial examples has raised security concerns for many high-stakes real-world applications such as street sign detection for autonomous vehicles. While initial works argued that digital adversarial examples built for sign detection may not be a real threat, since the camera can view the objects from different distances and angles [22], more recent attacks produce stronger adversarial examples that are invariant to various transformations by optimizing over the expected value of a set of pre-defined transformations [3]. In fact, this security concern has turned into an actual threat after a recent study showed that adversarial stickers are able to fool real-world self-driving cars [13]. These security concerns and threats have motivated researchers to create models that are both accurate in prediction and robust to attacks.

Various methods have been proposed for defending against adversarial examples. One popular approach is to detect and reject adversarial examples [23, 25, 40], which can be ineffective when the adversary is aware of the detection method and adapts accordingly [5]. Another approach is to introduce regularization for training robust models [7, 18], but the increase in robustness from such methods is limited. [2] showed that many proposed defenses give a false sense of security by obfuscating gradients, since meaningful gradient information is necessary for optimization-based attacks; [2] broke these defenses with attacks that build good approximations of the gradients. Among various defense methods, adversarial training [24, 19, 39, 30] is one of the most common methods for training robust models. In adversarial training, a robust model is trained on adversarial examples that are generated on the fly, which is effective but also makes adversarial training expensive.

Robust models have some interesting properties that have been revealed in recent studies. First, it is argued that there exist trade-offs between accuracy and robustness [34, 42, 31]: it is difficult to make a model robust to all samples while maintaining the same level of accuracy. Second, it is difficult to adversarially train robust models that generalize, since adversarially robust generalization requires more data [28] and models with more capacity [24]. Training high-capacity models on large datasets increases the cost of adversarially training robust models. Third, while adversarial training is expensive, adversarially trained models learn feature representations that align well with human perception [34]. These feature embeddings can produce clean inter-class interpolations similar to those of generative models such as Generative Adversarial Networks (GANs) [12]. These properties have inspired us to explore model capacity and sample efficiency.

Recently, conditional normalization, built upon instance normalization [35] or batch normalization [17], has been successful in generative models [20] and style transfer [16]. Conditional normalization can be seen as an adaptive network that shifts the statistics of a layer’s activations by applying network parameters conditioned on latent factors such as style and class [9, 10]. Inspired by these studies, we propose to exploit adaptive networks for robustness in the adversarial training framework.

Contributions

We propose building hardened networks by adversarially training adaptive networks. To build adaptive networks, we introduce a normalization module conditioned on inputs, which allows the network to “adapt” itself to different samples. The conditional normalization module includes a meta-convolutional network that changes the scale and bias parameters of the normalization based on the input sample. Conditional normalization is a powerful module that enlarges the representative capacity of networks. Our adversarially trained adaptive nets can potentially be more robust than conventional non-adaptive nets, as they can adapt the network to be robust to adversarial attacks on a specific sample instead of all samples. Furthermore, adaptive normalization adds far fewer parameters than other methods for increasing expressiveness and robustness (e.g., wide ResNets).

Our experiments on the CIFAR-10 and CIFAR-100 benchmarks empirically show that our proposed adaptive networks are better than their non-adaptive counterparts. The adaptive networks even outperform larger networks with more parameters in terms of both (clean) validation accuracy and robustness. Moreover, we have made several key observations that not only help our understanding but also significantly boost the performance of adversarial training. Such “tricks”, like using a larger step-size and initializing from a naturally trained model, can be widely used in adversarial training, and help us build stronger baselines for non-adaptive networks. Our adaptive network outperforms previously reported results by about 4%, and the strong baselines we achieved by about 1%, in robust accuracy.

The proposed adaptive network can be combined with various other methods to improve robustness against adversarial examples. Besides extensive experiments with our improved fast adversarial training, we show that adaptive networks can be combined with the stronger TRADES [42] objective, which is very effective on the CIFAR benchmarks; this suggests that our method is objective-agnostic and can help improve many well-established baselines. Finally, we introduce a variant of single-step adversarial training that, when combined with an adaptive network, can approach the robust accuracy of a non-adaptive network trained with multi-step adversarial training. Though our single-step variant performs slightly worse than our improved fast adversarial training, it complements recent interest in accelerating adversarial training and showcases why conventional single-step methods did not result in robustness against iterative attacks.

2 Related work

Here we provide a brief overview of robustness and of normalization layers, which are closely related to our proposed adaptive networks. We also provide an overview of adversarial training, which plays a critical role in our method.

Robustness, in the white-box threat model, is commonly measured by computing the accuracy of the model on adversarial examples constructed by gradient-based optimization methods starting from validation samples. This evaluation method provides an upper-bound on robustness as there is no theoretical guarantee (at least for all classes of problems) that adversarial examples crafted using first-order gradient information are optimal. From a theoretical point of view, finding optimal adversarial examples is difficult. Some recent works have proposed finding the optimal solution by modeling neural networks as Mixed Integer Programs (MIPs) and solving those MIPs using commercial solvers [33]. However, finding the optimal solution of an MIP is generally NP-hard. Although recent advancements have been made in their formulations by enforcing some properties on the network [38], finding the optimal solution is only feasible for small networks and is very time consuming. That is why certified methods in practice provide lower-bounds on the size of perturbations needed for causing misclassification by solving a relaxed version of the problem.

[27] propose certified defenses by including a differentiable certificate as a regularizer. Many studies follow this line of work and propose certified defenses [36, 37, 8]. While certified defenses are interesting from a theoretical point of view, in practice, adversarial training is still the most popular method for hardening networks – leaders of various computer vision defense competitions and benchmarks utilize adversarial training in their approach [39, 42, 24].

Adversarial training, in its general form, corresponds to training on the following loss function,

$$\min_\theta \; \sum_i \Big[\, \alpha\, L\big(f_\theta(x_i), y_i\big) \;+\; (1-\alpha) \max_{\|\delta_i\|_\infty \le \epsilon} L\big(f_\theta(x_i + \delta_i), y_i\big) \Big], \qquad (1)$$

where $L$ is a differentiable surrogate loss used for training the neural network, such as the cross-entropy loss, $(x_i, y_i)$ is a data point and its correct label, $f_\theta$ is the network with trainable parameters $\theta$, $\alpha$ is a hyper-parameter that controls how much weight should be given to training on natural examples, and $\delta_i$ is the adversarial perturbation for the $i$-th sample. To keep the perturbation unrecognizable to humans, the norm of $\delta_i$ is bounded. Throughout this paper, we use the common $\ell_\infty$-norm bound on $\delta$. Note that our adversarial training loss merges information from both natural and adversarial examples.
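
As a concrete illustration, the combined loss above can be written in a few lines of PyTorch-style code. This is a minimal sketch under our notation; the function name and the default weight are ours, not from any released implementation, and the perturbation is assumed to be precomputed.

```python
import torch
import torch.nn.functional as F

def adversarial_training_loss(model, x, y, delta, alpha=0.5):
    """Sketch of the combined loss in equation (1): a convex combination of the
    loss on natural examples and the loss on adversarially perturbed examples.
    `delta` is assumed to be a precomputed perturbation with ||delta||_inf <= eps."""
    loss_nat = F.cross_entropy(model(x), y)          # loss on clean samples
    loss_adv = F.cross_entropy(model(x + delta), y)  # loss on adversarial samples
    return alpha * loss_nat + (1.0 - alpha) * loss_adv
```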

Early adversarial example generation methods required many iterations since their goal was to help an attacker build an adversarial example using minimal perturbations [32, 26, 6]. However, from a defender’s perspective, the goal is to train on fast and bounded adversarial examples. With speed in mind, [11] proposed training on a single-step attack called the Fast Gradient Sign Method (FGSM). FGSM computes the gradient of the loss with respect to the input, $\nabla_x L(f_\theta(x), y)$, and sets $\delta = \epsilon \cdot \mathrm{sign}\big(\nabla_x L(f_\theta(x), y)\big)$, where $\epsilon$ is the perturbation bound. Later, it was shown that stronger attacks, such as BIM [21], completely break FGSM adversarially trained models. The BIM attack can be seen as an iterative version of FGSM where, during each iteration, the perturbation is updated using an FGSM-type step but with a step-size $\epsilon_s$ which is usually smaller than $\epsilon$,

$$\delta^{(k+1)} = \delta^{(k)} + \epsilon_s \cdot \mathrm{sign}\Big(\nabla_x L\big(f_\theta(x + \delta^{(k)}), y\big)\Big), \qquad (2)$$

where $\delta^{(k)}$ is the perturbation at iteration $k$ of the BIM attack. After every iteration of the BIM attack (equation (2)), $\delta$ is clipped such that $\|\delta\|_\infty \le \epsilon$.
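
For reference, a minimal PyTorch-style sketch of FGSM and the iterative BIM update in equation (2) might look as follows; the function names are ours, and for brevity the sketch omits clamping the perturbed image to the valid pixel range.

```python
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, x, y, eps):
    """Single-step FGSM: delta = eps * sign(grad_x L)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return eps * grad.sign()

def bim_perturbation(model, x, y, eps, step_size, num_steps):
    """Iterative FGSM (BIM), equation (2): repeated signed-gradient steps,
    clipping delta back into the eps-ball after each step."""
    delta = torch.zeros_like(x)
    for _ in range(num_steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step_size * grad.sign()).detach()
        delta = delta.clamp(-eps, eps)  # keep ||delta||_inf <= eps
    return delta
```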

Adversarial training started blooming when [24] proposed training on adversarial examples generated using the PGD attack, which is a variant of BIM with a random initialization and projection back onto the $\ell_\infty$-norm ball. Through experiments, they showed that the PGD attack is the strongest first-order adversary, which was later verified by [2]. Consequently, almost all successful adversarially trained robust models use the PGD algorithm to generate adversarial examples.

Training on adversarial examples generated using PGD increases the cost of training by a factor proportional to $K$, where $K$ is the number of iterations of the PGD attack (i.e., the number of times we update $\delta$ using equation (2)). While we will use PGD-$K$ attacks for evaluating the robustness of all our models, due to the high computation cost associated with PGD adversarial training, we perform most of our adversarial training by modifying a recently proposed algorithm for speeding up adversarial training [29]. A recent study [1] suggested that well-tuned single-step adversarial training can defend against strong PGD adversarial examples. However, the method in [1] heavily depends on a domain-specific cyclic learning rate schedule, an FGSM step-size that is greater than the perturbation bound $\epsilon$, and early stopping based on frequent examination of validation robustness. Also, they only justify their results empirically, without providing intuition on why this rather unconventional setup (step-size larger than $\epsilon$) is needed.

Normalization layers such as batch normalization [17] and instance normalization [35] have become important modules in modern neural networks. Normalization layers standardize their input to have zero mean and unit variance, and then shift these statistics using scaling and bias parameters. [43] suggest that the scaling and bias parameters can be even more important than the standardization itself. Conditional normalization, where the scaling and bias are adaptively determined by latent factors, has been shown to be powerful in many computer vision tasks including style transfer [16, 10] and generative adversarial networks [20].
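
As a small illustration of what such a layer computes, the following sketch (assuming PyTorch; the helper name is ours) standardizes each channel using batch statistics and then applies the learned scale and bias:

```python
import torch

def channel_norm(x, gamma, beta, eps=1e-5):
    """Standardize each channel to zero mean / unit variance, then apply the
    learned scale (gamma) and bias (beta), as in batch normalization.
    x: (N, C, H, W); gamma, beta: (C,)."""
    mean = x.mean(dim=(0, 2, 3), keepdim=True)                    # per-channel mean
    var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)      # per-channel variance
    x_hat = (x - mean) / torch.sqrt(var + eps)                    # standardization
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)
```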

3 Adaptive Networks

We introduce adaptive networks with conditional normalization modules in this section. Our motivation for adding conditional normalization modules is two-fold. First, by introducing adaptive layers conditioned on inputs, we can “adapt” a trained network to be more robust to an individual input sample without requiring any information about its class label – a useful trait for robustness evaluation.

Second, conditional normalization can increase the expressiveness and effective capacity of the network, which has been shown to have a positive effect on model robustness. Adversarially trained models with more expressive capacity are more robust than their less expressive alternatives [24, 29]. At a high level, these conditional normalization modules can be considered as adding multi-branch structures to a network, which is known to be effective in improving accuracy on validation examples [15]. As we will see in the experiments, our normalization module does improve clean validation accuracy and is, in practice, more effective than simply widening or concatenating features (the adversarially trained adaptive nets have higher validation accuracy and robustness than networks with more trainable parameters).

Below, we show how to create an adaptive network by adding conditional normalization modules to the wide residual network (WRN) [41] architecture.

Figure 1: Network architecture with adaptive layers.

3.1 Network architecture

Let $F \in \mathbb{R}^{N \times C \times H \times W}$ represent the feature maps of a convolutional layer for a minibatch of samples, where $N$ is the batch size, $C$ is the width of the layer (number of channels), and $H$ and $W$ are the feature map’s height and width. If $F_{n,c,h,w}$ denotes the element at height $h$ and width $w$ of the $c$-th channel of the $n$-th sample, the conditional normalization module transforms the feature maps as,

$$\tilde{F}_{n,c,h,w} = \gamma_c(z) \cdot \hat{F}_{n,c,h,w} + \beta_c(z), \qquad (3)$$

where $\hat{F}$ denotes the standardized feature maps and $\gamma(z), \beta(z)$ are the scale and bias parameters of the normalization module. The network with conditional normalization becomes adaptive to the latent factor $z$ because $\gamma$ and $\beta$ are outputs of convolutional networks with trainable parameters. Equation (3) represents normalization in a general form: when the latent factor $z$ is a style image and $F$ is normalized by its mean and variance, equation (3) becomes adaptive instance normalization for image style transfer [16]; when the latent factor $z$ is a latent code such as random noise, equation (3) becomes the building module for the generator in StyleGAN [20]. We provide details on how we use the input sample as the latent factor below.
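
A minimal sketch of equation (3) as a module, assuming PyTorch and using a single linear layer (our simplification) to map a generic latent factor z to per-channel scale and bias:

```python
import torch
import torch.nn as nn

class ConditionalNorm(nn.Module):
    """Sketch of the conditional normalization in equation (3): the scale and
    bias are not fixed parameters but are predicted from a latent factor z by a
    small trainable network (here a single linear layer, for illustration)."""

    def __init__(self, num_channels, latent_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)  # standardization
        self.to_scale_bias = nn.Linear(latent_dim, 2 * num_channels)

    def forward(self, feats, z):
        gamma, beta = self.to_scale_bias(z).chunk(2, dim=1)  # (N, C) each
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)            # (N, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * self.norm(feats) + beta               # equation (3)
```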

In our experiments, we add our conditional normalization module to wide residual networks (WRNs) [41] to create adaptive networks for classification. WRNs are a derivative of ResNets [14], and are one of the state-of-the-art architectures used for image classification. A WRN is a stack of residual blocks (fig. 1 (a)). To specify WRNs, we follow [41] and denote the architecture as WRN-$d$-$k$, where $d$ represents the depth and $k$ represents the widening factor of the network.

The WRN architecture we use for the CIFAR-10 and CIFAR-100 datasets consists of a stack of three groups of residual blocks. There is a downsampling layer between two consecutive groups, and the number of channels (the width of a convolutional layer) is doubled after downsampling. In the three groups, the widths of the convolutional layers are $16k$, $32k$, and $64k$, respectively. Each group contains $n$ residual blocks, and each residual block contains two $3\times 3$ convolutional layers equipped with ReLU activation and batch normalization. There is a $3\times 3$ convolutional layer with $16$ channels before the three groups of residual blocks, and a global average pooling layer, a fully-connected layer, and a softmax layer after the three groups. The depth of the WRN is $d = 6n + 4$.

We add conditional normalization to the first residual block of each of the three groups. The normalization module is applied between the two convolutional layers in a block, as shown in fig. 1 (b). The inputs to the conditional normalization module are the feature maps produced by the first convolutional layer. Our conditional normalization module consists of a three-layer convolutional network: two convolutional layers followed by one convolutional layer that matches the channel dimension of the corresponding residual block. We use average pooling as the last layer to obtain $\gamma(z)$ and $\beta(z)$ for equation (3). Our adaptive network is only slightly larger than the original WRN, and becomes more robust when adversarially trained, as shown in section 5.
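
The sketch below illustrates this design, assuming PyTorch; the hidden width of the meta-network and the exact placement of activations are illustrative choices of ours and may differ from the paper's implementation.

```python
import torch
import torch.nn as nn

class AdaptiveConditionalNorm(nn.Module):
    """Input-conditioned normalization: a small convolutional meta-network takes
    the feature maps of the first convolution, average-pools them, and predicts
    per-channel scale and bias. Layer sizes are illustrative assumptions."""

    def __init__(self, channels, hidden=64):
        super().__init__()
        self.meta = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 2 * channels, 1),   # match the block's channel dimension
            nn.AdaptiveAvgPool2d(1),              # global average pooling
        )
        self.norm = nn.BatchNorm2d(channels, affine=False)

    def forward(self, feats):
        gamma, beta = self.meta(feats).chunk(2, dim=1)  # (N, C, 1, 1) each
        return gamma * self.norm(feats) + beta          # equation (3)


class AdaptiveBasicBlock(nn.Module):
    """WRN-style residual block with conditional normalization applied between
    its two convolutions (fig. 1 (b)); a minimal sketch, not the exact WRN code."""

    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.cond_norm = AdaptiveConditionalNorm(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.cond_norm(out)                 # adapt to this sample's features
        out = self.conv2(self.relu(self.bn2(out)))
        return out + x
```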

4 Adversarial training

We briefly review the adversarial training algorithms we use to make our adaptive networks robust, and discuss the “tricks” we found useful in improving these algorithms. We then introduce a variant of single-step adversarial training that can be coupled with a standard training schedule without extra tuning, and shed light on why this single-step method works and why conventional single-step methods fail to become robust against PGD attacks. Finally, we discuss an alternative objective function for adversarial training, as adaptive networks can complement other active research directions in improving the practical robustness of networks.

PGD adversarial training. Well-known robust networks for MNIST and CIFAR-10 were adversarially trained by [24] by setting $\alpha = 0$ in equation (1) and training only on adversarial examples. Training only on adversarial examples is justified from a robust optimization perspective, and is modeled as a two-player constant-sum game between the adversary, which is in charge of the perturbation $\delta$, and the classifier with network parameters $\theta$. Formally, we consider adversarial training based on the following minimax formulation,

$$\min_\theta \; \sum_i \; \max_{\|\delta_i\|_\infty \le \epsilon} L\big(f_\theta(x_i + \delta_i), y_i\big). \qquad (4)$$

[24] solved the optimization problem in equation (4) in an alternating fashion. Before each minimization step on the network parameters $\theta$, they compute $\delta$ using a PGD-$K$ attack on the fly. Every perturbation update step of the PGD-$K$ attack (equation (2)) requires computing $\nabla_x L\big(f_{\theta_j}(x_j + \delta_j^{(k)}), y_j\big)$, where $\delta_j^{(k)}$ is the adversarial perturbation of the $j$-th mini-batch after the previous $k$ update steps, and $\theta_j$ represents the network parameters at the $j$-th minimization iteration. Computing this gradient, which is required for every step of PGD-$K$, needs a complete forward and backward pass on the network. As a result, every iteration of PGD adversarial training is roughly $K$ times more expensive than an iteration of natural training. A typical value used for $K$ is 7 to train a robust model for the CIFAR-10 benchmark [24].
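
One iteration of this alternating procedure can be sketched as follows (PyTorch-style pseudocode of ours; real implementations also clamp x + delta to the valid pixel range):

```python
import torch
import torch.nn.functional as F

def pgd_adversarial_training_step(model, optimizer, x, y, eps, step_size, K):
    """One iteration of PGD-K adversarial training for the minimax problem in
    equation (4): the inner maximization is approximated by K signed-gradient
    steps from a random start, then the outer minimization takes one SGD step."""
    # Inner maximization: PGD-K attack from a random start inside the eps-ball.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(K):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step_size * grad.sign()).detach().clamp(-eps, eps)

    # Outer minimization: train on the adversarial examples only (alpha = 0).
    optimizer.zero_grad()
    F.cross_entropy(model(x + delta), y).backward()
    optimizer.step()
```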

Fast adversarial training. To speed up training of robust models, we adopt a fast adversarial training algorithm recently proposed by [29]. [29] showed that they can achieve comparable robustness to PGD adversarial training [24] on the datasets of our interest (CIFAR-10 and CIFAR-100) with roughly the same cost as standard (non-robust) training.

The fast algorithm (Free-$m$) maintains a perturbation parameter $\delta$ of the same shape as the input mini-batch, which is updated once during every minimization iteration. To accelerate robust training, Free-$m$ applies simultaneous updates for the network parameters $\theta$ and the perturbation $\delta$, which makes its computation cost almost the same as natural training. In the $j$-th minimization iteration, both $\nabla_\theta L$ and $\nabla_x L$ are computed for the current mini-batch, perturbation $\delta_j$, and network parameters $\theta_j$. Then $\theta$ and $\delta$ are updated as,

$$\theta_{j+1} = \theta_j - \eta\, \nabla_\theta L\big(f_{\theta_j}(x_j + \delta_j), y_j\big), \qquad \delta_{j+1} = \Pi_{\|\delta\|_\infty \le \epsilon}\Big( \delta_j + \epsilon_s \cdot \mathrm{sign}\big(\nabla_x L\big(f_{\theta_j}(x_j + \delta_j), y_j\big)\big) \Big),$$

where $\eta$ is the learning rate, $\epsilon_s$ is the perturbation step-size, and $\Pi$ denotes projection onto the $\ell_\infty$-ball of radius $\epsilon$. In Free-$m$, each mini-batch is replayed $m$ times. For example, if $m = 2$, we move on to the next mini-batch every other step, and therefore the data for the first two iterations would be the same (i.e., $x_{j+1} = x_j$). Since we train on the same mini-batch $m$ times in a row, the hyper-parameter $m$ is more-or-less analogous to the number of iterations $K$ of the PGD training algorithm. We use the same number of minibatch updates for Free-$m$ adversarial training as for natural training on clean images, i.e., we train Free-$m$ for $1/m$ times the number of natural-training epochs in total. Free-$m$ can achieve robust accuracy similar to PGD adversarially trained models. In our modification, we apply two “tricks” which we found to be particularly effective when combined with free adversarial training: initializing with a naturally trained model and applying a larger step-size for updating the perturbations. We built stronger baselines with these techniques, which can be further boosted with our adaptive networks.
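
A minimal sketch of the Free-m replay loop with simultaneous updates, assuming PyTorch; the structure follows the description above, while step-size handling and data-loading details are simplified assumptions of ours.

```python
import torch
import torch.nn.functional as F

def free_adversarial_training(model, optimizer, loader, eps, step_size, m, epochs):
    """Sketch of Free-m adversarial training: each mini-batch is replayed m
    times, and a single forward/backward pass yields both the parameter gradient
    and the input gradient, so theta and delta are updated simultaneously.
    The persistent delta carries over between replays and mini-batches."""
    delta = None
    for _ in range(epochs):            # roughly (natural-training epochs) / m
        for x, y in loader:
            if delta is None or delta.shape != x.shape:
                delta = torch.zeros_like(x)
            for _ in range(m):         # replay the same mini-batch m times
                delta.requires_grad_(True)
                loss = F.cross_entropy(model(x + delta), y)
                optimizer.zero_grad()
                loss.backward()        # gradients w.r.t. theta and delta together
                optimizer.step()       # descent step on theta
                with torch.no_grad():  # ascent step on delta, projected to the eps-ball
                    delta = (delta + step_size * delta.grad.sign()).clamp(-eps, eps)
                delta = delta.detach()
    return model
```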

(a) Classical RFGSM
(b) Proposed RFGSM
Figure 2: Loss surface plots for points surrounding the first two CIFAR-10 validation images for models adversarially trained with (a) the classical RFGSM and (b) the proposed RFGSM. The proposed method generates a smoother surface, even for a misclassified example (No. 1).

Single-step adversarial training. The Fast Gradient Sign Method (FGSM) [11] is one of the most popular single-step methods for generating adversarial examples. With a random initialization of the perturbation, Random FGSM (RFGSM) is similar to performing one step of the PGD algorithm. [24] showed that models adversarially trained with FGSM and RFGSM have almost zero robust accuracy under PGD attacks. A more recent preprint [1] suggested that RFGSM-based training can be used to defend against PGD attacks when combined with cyclic learning rates and early stopping based on examining the robust accuracy on the validation dataset. The RFGSM method in [1] provides an alternative way to train robust models besides PGD adversarial training [24] and fast adversarial training [29] on benchmark datasets such as the CIFARs. However, it may be difficult to generalize to problems without special learning rate schedules and to problems where we cannot perform online validation for early stopping.

We introduce a variant of RFGSM that works well even with a normal training schedule. We make two key modifications to the classical RFGSM. First, instead of initializing the perturbation from a uniform random value between $-\epsilon$ and $\epsilon$, we initialize it from a normal distribution with zero mean and variance $\sigma^2$. We find the results to be rather insensitive to $\sigma$ within a reasonable range, and use a fixed $\sigma$ in all experiments. Second, we do not clip the perturbation after the FGSM update. Note that the perturbation is still bounded to some extent, since the step-size of the FGSM update is $\epsilon$. In the proposed variant, the initialization noise can be viewed as augmenting the training samples rather than as part of the FGSM update.
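
A sketch of the proposed single-step variant, assuming PyTorch; the function name is ours, and sigma stands for the (unspecified here) standard deviation of the Gaussian initialization.

```python
import torch
import torch.nn.functional as F

def proposed_rfgsm_perturbation(model, x, y, eps, sigma):
    """Sketch of the RFGSM variant described above: (i) initialize the perturbation
    from a zero-mean Gaussian instead of a uniform distribution on [-eps, eps], and
    (ii) take one FGSM step of size eps without clipping the result back into the
    eps-ball."""
    delta = sigma * torch.randn_like(x)           # Gaussian initialization
    delta.requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad, = torch.autograd.grad(loss, delta)
    return (delta + eps * grad.sign()).detach()   # no projection / clipping
```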

The classical RFGSM may fail because adversarial examples generated by FGSM during training are likely to fall on the boundary of the bounded ball. After training on those adversarial examples, the loss surface becomes smooth at the boundary, but the cross-entropy loss may take on large values within the ball, which can be exploited by multi-step methods like PGD. As shown in fig. 2, the proposed RFGSM makes the loss surface smoother and hence harder to attack. Even for a difficult sample (validation example id 1), for which adversarial examples exist for models trained with both the classical RFGSM and the proposed RFGSM, the loss surface of our proposed RFGSM is smoother.

TRADES objective. The proposed adaptive network is complementary to the choice of objective in adversarial training. Besides the minimax problem in equation (4), we can also train adaptive networks with the TRADES objective proposed in [42]. TRADES achieves impressive robust accuracy on the CIFAR-10 dataset by combining supervised training and virtual adversarial training as,

$$\min_\theta \; \sum_i \Big[ L\big(f_\theta(x_i), y_i\big) + \frac{1}{\lambda} \max_{\|\delta_i\|_\infty \le \epsilon} \mathrm{KL}\big(f_\theta(x_i) \,\|\, f_\theta(x_i + \delta_i)\big) \Big], \qquad (5)$$

where $1/\lambda$ controls the trade-off between robustness and natural accuracy. We follow [42] for the training algorithm and parameter settings in our experiments.
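
A sketch of a TRADES-style loss consistent with equation (5), assuming PyTorch; the random-start magnitude and the use of K PGD steps for the inner KL maximization follow common practice and may differ in detail from [42].

```python
import torch
import torch.nn.functional as F

def trades_style_loss(model, x, y, eps, step_size, K, inv_lambda):
    """Cross-entropy on the natural example plus a KL term that pushes predictions
    on the perturbed example toward predictions on the natural example; the inner
    maximization of the KL term is approximated with K PGD steps."""
    p_natural = F.softmax(model(x), dim=1).detach()

    # Inner maximization of the KL divergence, starting from a small random delta.
    delta = 0.001 * torch.randn_like(x)
    for _ in range(K):
        delta.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x + delta), dim=1),
                      p_natural, reduction="batchmean")
        grad, = torch.autograd.grad(kl, delta)
        delta = (delta + step_size * grad.sign()).detach().clamp(-eps, eps)

    loss_natural = F.cross_entropy(model(x), y)
    loss_robust = F.kl_div(F.log_softmax(model(x + delta), dim=1),
                           F.softmax(model(x), dim=1), reduction="batchmean")
    return loss_natural + inv_lambda * loss_robust
```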

Row # | (Robust) model   | Natural | PGD-20 | PGD-100 | #Parameters (million)
1     | Natural          | 94.10%  | 0.00%  | 0.00%   | 5.85
2     | PGD-7 [24]       | 83.84%  | 40.03% | 39.38%  | 5.85
3     | Free-10 [29]     | 81.04%  | 40.56% | 40.03%  | 5.85
4     | Free-10-adaptive | 85.00%  | 43.16% | 42.68%  | 6.05
5     | Free-10-lstep    | 77.75%  | 45.10% | 44.77%  | 5.85
6     | Free-10-WRN-28-5 | 77.81%  | 45.99% | 45.77%  | 9.13
7     | Free-10-init     | 80.60%  | 46.88% | 46.67%  | 5.85
8     | Free-10-adaptive | 80.99%  | 48.09% | 47.87%  | 6.05
Table 1: Performance of (robust) CIFAR-10 models. We inject adaptive layers in WRN-28-4, and compare with WRN-28-4 and a WRN-28-5 with more parameters. Rows 5-7 provide stronger baselines built with our adversarial training “tricks”.

5 Experiments

In this section, we train robust models on CIFAR-10 and CIFAR-100. In all experiments, we train WRNs without dropout for 120 epochs with minibatch size 256. We start with learning rate 0.1 and decrease the learning rate by a factor of 10 at epochs 60 and 90. We use weight decay 1e-4 and momentum 0.9. To evaluate the robustness of the models, we attack them with PGD-$K$ attacks: we fix the $\ell_\infty$ perturbation bound $\epsilon$ and the attack step-size, and vary the number of attack iterations $K$.
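
A sketch of the PGD-$K$ robustness evaluation described above, assuming PyTorch; a full evaluation would also clamp x + delta to the valid image range.

```python
import torch
import torch.nn.functional as F

def evaluate_pgd_robustness(model, loader, eps, step_size, K, device="cuda"):
    """Attack each validation batch with a PGD-K attack (random start, signed-gradient
    steps, projection onto the eps-ball) and report accuracy on the perturbed inputs."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        delta = torch.empty_like(x).uniform_(-eps, eps)
        for _ in range(K):
            delta.requires_grad_(True)
            loss = F.cross_entropy(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta + step_size * grad.sign()).detach().clamp(-eps, eps)
        with torch.no_grad():
            pred = model(x + delta).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```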

5.1 Quantitative evaluation and ablation study

We summarize our quantitative evaluation on CIFAR-10 and CIFAR-100 in tables 1-5. In tables 1-4, unless otherwise explicitly specified through the name of the model, the architecture used for producing these results is WRN-28-4. We report validation accuracy on natural images and on adversarial images generated using PGD attacks with $K = 20$ and $K = 100$ iterations. We also compare our method with adversarially trained robust models following [24] and [29]. Note that the PGD-7 adversarially trained model [24] requires substantially more training time than natural training on clean images, while the Free-10 models [29] have a computation cost similar to natural training. Models with the suffix “small” are adversarially trained using a smaller perturbation step-size; the adversarially trained models without that suffix are trained with a larger step-size.

Row # | (Robust) model   | Natural | PGD-20 | PGD-100 | #Parameters (million)
1     | Natural          | 74.84%  | 0.00%  | 0.00%   | 5.87
2     | PGD-7 [24]       | 57.18%  | 18.38% | 18.13%  | 5.87
3     | Free-10 [29]     | 54.18%  | 19.21% | 18.98%  | 5.87
4     | Free-10-adaptive | 61.19%  | 21.95% | 21.68%  | 6.07
5     | Free-10-lstep    | 50.52%  | 23.08% | 23.02%  | 5.87
6     | Free-10-WRN-28-5 | 51.02%  | 23.12% | 23.03%  | 9.16
7     | Free-10-init     | 55.93%  | 24.86% | 24.61%  | 5.87
8     | Free-10-adaptive | 57.26%  | 25.86% | 25.69%  | 6.07
Table 2: Performance of (robust) CIFAR-100 models. The adaptive network trained with our adversarial training “tricks” (row 8) has a 7% robust accuracy improvement over PGD-7 [24] (row 2).

Advantage of adaptive networks. We first evaluate robust models trained with the smaller perturbation step-size following [24] (rows 2-4 in tables 1 and 2). A robust WRN-28-4 trained with PGD-7 [24] achieves about 40% accuracy under strong PGD attacks. Our alternative adversarial training mechanism, Free-10 [29], achieves slightly better robust accuracy under PGD attacks with a drop in natural accuracy on clean validation images. Since Free-10 is significantly faster than PGD adversarial training, we also use it to adversarially train our adaptive networks. Our adaptive network with conditional normalization built on WRN-28-4 (Free-10-adaptive, row 4) outperforms the PGD adversarially trained WRN-28-4 (PGD-7, row 2) and Free-10 (row 3) in both natural accuracy and robust accuracy, illustrating the advantage of our adaptive networks.

Strong baselines and the effectiveness of our “tricks” in adversarial training. We explore “tricks” to improve the performance of adversarial training. As shown in tables 1 and 2, comparing Free-10 (row 3) and Free-10-lstep (row 5) shows that the larger step-size used for training does improve the robustness of free training, but again at the cost of decreased natural validation accuracy.

Note that our Free-10-adaptive model has slightly more parameters compared to the adversarially trained PGD-7 and Free-10 models. For this reason, we compare to higher capacity models to ensure that the superiority of our adaptive network is not solely due to having a (slightly) larger number of parameters. To create strong, high-capacity baselines we adversarially train a larger model WRN-28-5 (row 6), and WRN-28-4 with a naturally trained model as initialization (row 7). Our adaptive network is slightly larger than the non-adaptive WRN-28-4, and is much smaller than WRN-28-5. A good initialization surprisingly helps both natural accuracy and robust accuracy. Our adaptive network outperforms the best strong baseline for both natural accuracy and robust accuracy.

(Robust) model | Natural | PGD-20 | PGD-100
Natural        | 94.90%  | 0.00%  | 0.00%
Non-adaptive   | 84.44%  | 53.74% | 53.18%
Adaptive       | 84.79%  | 54.98% | 54.76%
Table 3: Performance of (robust) CIFAR-10 WRN-28-4 models with TRADES training [42].
(Robust) model | Natural | PGD-20 | PGD-100
Natural        | 94.10%  | 0.00%  | 0.00%
PGD-7          | 83.84%  | 40.03% | 39.38%
RFGSM          | 85.81%  | 0.11%  | 0.00%
Our RFGSM      | 84.03%  | 38.71% | 37.99%
Adaptive       | 84.87%  | 39.95% | 38.92%
Table 4: Performance of (robust) CIFAR-10 WRN-28-4 models with RFGSM training.

TRADES objective and higher robustness. In table 3, we combine the proposed method with the TRADES objective [42] since our adaptive network is complementary to the objective design of adversarial training. We can achieve better robust accuracy on the CIFAR-10 dataset with the TRADES objective, and our adaptive network performs better than the non-adaptive network. Note that the TRADES method applies PGD-10 to generate adversarial examples in adversarial training, which is slower than PGD-7 in [24], and much slower than the fast algorithm we used.

RFGSM and the proposed variant. We present experimental results on RFGSM adversarial training in table 4. We halved the number of training epochs so that RFGSM training completes in a similar time as natural training and free adversarial training. Classical RFGSM with uniform sampling and norm clipping cannot provide robustness against strong PGD attacks. The proposed RFGSM variant can defend against PGD attacks, and achieves robust accuracy comparable to PGD adversarial training when combined with our adaptive network. Though these RFGSM results are worse than our best robust accuracy in table 1, where we use fast adversarial training with “tricks” to train the adaptive network, the proposed variant works well with standard training of ResNets on CIFAR, which is complementary to the recent interest in replacing PGD with RFGSM training.

(Robust) model   | Natural | PGD-20 | PGD-100 | #Parameters (million)
Natural          | 94.76%  | 0.00%  | 0.00%   | 46.16
PGD-7 [24]       | 87.3%   | 45.8%  | 45.3%   | 45.90
Free-8 [29]      | 85.96%  | 46.82% | 46.19%  | 45.90
Free-10          | 79.45%  | 48.03% | 47.9%   | 46.16
Free-10-init     | 84.03%  | 50.23% | 49.93%  | 46.16
Free-10-adaptive | 84.39%  | 50.93% | 50.68%  | 47.28
Table 5: Performance of (robust) CIFAR-10 WRN-34-10 models. We directly compare with previously reported results in [24, 29] and with our strong baselines.

5.2 Larger network and previous benchmark

In table 5, we report results on a larger network, WRN-34-10, which is widely used for the CIFAR-10 benchmark. We first directly compare with the accuracy values reported in [24] and [29] by training on the objective in equation (4). Our adaptive network achieves better robust accuracy, with more than a 3% improvement. Moreover, our adaptive network outperforms the strong baselines we achieved with “tricks” (Free-10 and Free-10-init) on both natural accuracy and robust accuracy.

Figure 3: Training curves for robust models on (top, panels (a)-(c)) CIFAR-10 and (bottom, panels (d)-(f)) CIFAR-100: (a, d) accuracy on adversarial training samples; (b, e) accuracy on clean validation samples; (c, f) accuracy on PGD-3 validation samples.
Figure 4: Visualization of adversarial examples generated for natural and robust WRN-34-10 models for CIFAR-10 with a large perturbation bound, following [34]. The large adversarial examples generated for robust models align well with human perception.

5.3 Training curves and qualitative analysis

We plot the training and validation accuracy of the Free-10, Free-10-adaptive, and PGD-7 adversarially trained models after each epoch in fig. 3. The training accuracy of a robust model is computed using the adversarial examples seen during training, and does not correspond to natural training accuracy; it can be thought of as robustness on training examples. In figs. 3(d) and 3(a), the PGD-7 model fits the adversarial examples built for the training samples to a rather high accuracy, while Free-10 seems to never overfit to the adversarial training samples. The training accuracy of Free-10 [29] is quite close to its final adversarial validation accuracy in figs. 3(f) and 3(c). The natural validation accuracy of PGD-7 increases faster than that of Free-10 at the beginning, while the accuracies at the end of training become close, as shown in figs. 3(e) and 3(b). Free-10 consistently improves robust accuracy against adversarial validation samples, while PGD-7 seems to saturate after a fast increase at the beginning (see figs. 3(f) and 3(c)). Our adaptive network (blue curve) always has higher natural and robust validation accuracy than the non-adaptive WRN-28-4 models, except for a short range around epoch 60 in figs. 3(c) and 3(b), where the accuracy of the adaptive network decreases. Tuning the learning rate could potentially prevent this decrease and further boost the performance of adaptive networks.

[34] presented an interesting side effect of robust models: largely perturbed adversarial examples for adversarially robust models align with human perception. That is, they “look” like the class to which they are misclassified. We use PGD-50 to generate adversarial images with a large perturbation bound. The generated images for our adversarially trained adaptive nets have characteristics that align well with human perception (fig. 4).

6 Conclusion

Inspired by recent research in conditional normalization [16, 20] and the properties of robust models [24, 34, 28], we introduced an adaptive normalization module conditioned on inputs for boosting the robustness of networks. Our adaptive networks, combined with a fast adversarial training algorithm, can effectively train robust models that outperform their non-adaptive counterparts as well as non-adaptive networks with more parameters. Our study of adversarial training presents several “tricks” that can be widely used to improve the training of robust models. We also introduce a variant of single-step adversarial training that can achieve competitive robustness against multi-step attacks. We verify the effectiveness and efficiency of adaptive networks and of our adversarial training with experiments on the CIFAR-10 and CIFAR-100 benchmarks using WRN architectures.

References

  • [1] Anonymous (2019) Fast is better than free: revisiting adversarial training. OpenReview preprint.
  • [2] A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. ICML.
  • [3] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok (2017) Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397.
  • [4] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli (2013) Evasion attacks against machine learning at test time. In ECML-PKDD, pp. 387–402.
  • [5] N. Carlini and D. Wagner (2017) Adversarial examples are not easily detected: bypassing ten detection methods. In ACM Workshop on Artificial Intelligence and Security, pp. 3–14.
  • [6] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57.
  • [7] M. Cissé, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier (2017) Parseval networks: improving robustness to adversarial examples. In ICML, pp. 854–863.
  • [8] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter (2019) Certified adversarial robustness via randomized smoothing. ICML.
  • [9] H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, and A. C. Courville (2017) Modulating early visual processing by language. In Advances in Neural Information Processing Systems, pp. 6594–6604.
  • [10] V. Dumoulin, J. Shlens, and M. Kudlur (2017) A learned representation for artistic style. ICLR.
  • [11] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. ICLR.
  • [12] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In NIPS.
  • [13] K. Hao (2019-04-01) Hackers trick a Tesla into veering into the wrong lane. MIT Technology Review.
  • [14] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In CVPR. arXiv:1512.03385.
  • [15] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In CVPR.
  • [16] X. Huang and S. Belongie (2017) Arbitrary style transfer in real-time with adaptive instance normalization. In ICCV, pp. 1501–1510.
  • [17] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In ICML, pp. 448–456.
  • [18] D. Jakubovitz and R. Giryes (2018) Improving DNN robustness to adversarial attacks using Jacobian regularization. In ECCV, pp. 514–529.
  • [19] H. Kannan, A. Kurakin, and I. Goodfellow (2018) Adversarial logit pairing. arXiv preprint arXiv:1803.06373.
  • [20] T. Karras, S. Laine, and T. Aila (2019) A style-based generator architecture for generative adversarial networks. CVPR.
  • [21] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
  • [22] J. Lu, H. Sibai, E. Fabry, and D. Forsyth (2017) No need to worry about adversarial examples in object detection in autonomous vehicles. arXiv preprint arXiv:1707.03501.
  • [23] X. Ma, B. Li, Y. Wang, S. M. Erfani, S. Wijewickrema, G. Schoenebeck, D. Song, M. E. Houle, and J. Bailey (2018) Characterizing adversarial subspaces using local intrinsic dimensionality. arXiv preprint arXiv:1801.02613.
  • [24] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. ICLR.
  • [25] D. Meng and H. Chen (2017) MagNet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 135–147.
  • [26] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) DeepFool: a simple and accurate method to fool deep neural networks. In CVPR, pp. 2574–2582.
  • [27] A. Raghunathan, J. Steinhardt, and P. Liang (2018) Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344.
  • [28] L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, and A. Madry (2018) Adversarially robust generalization requires more data. In NeurIPS, pp. 5014–5026.
  • [29] A. Shafahi, M. Najibi, A. Ghiasi, Z. Xu, J. Dickerson, C. Studer, L. S. Davis, G. Taylor, and T. Goldstein (2019) Adversarial training for free! NeurIPS.
  • [30] A. Shafahi, M. Najibi, Z. Xu, J. Dickerson, L. S. Davis, and T. Goldstein (2018) Universal adversarial training. arXiv preprint arXiv:1811.11304.
  • [31] D. Su, H. Zhang, H. Chen, J. Yi, P. Chen, and Y. Gao (2018) Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. In ECCV, pp. 631–648.
  • [32] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. ICLR.
  • [33] V. Tjeng, K. Xiao, and R. Tedrake (2017) Evaluating robustness of neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356.
  • [34] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry (2019) Robustness may be at odds with accuracy. ICLR.
  • [35] D. Ulyanov, A. Vedaldi, and V. S. Lempitsky (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022.
  • [36] S. Wang, Y. Chen, A. Abdou, and S. Jana (2018) MixTrain: scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625.
  • [37] E. Wong, F. Schmidt, J. H. Metzen, and J. Z. Kolter (2018) Scaling provable adversarial defenses. In NeurIPS, pp. 8400–8409.
  • [38] K. Xiao, V. Tjeng, N. M. Shafiullah, and A. Madry (2019) Training for faster adversarial robustness verification via inducing ReLU stability. ICLR.
  • [39] C. Xie, Y. Wu, L. van der Maaten, A. Yuille, and K. He (2019) Feature denoising for improving adversarial robustness. CVPR.
  • [40] W. Xu, D. Evans, and Y. Qi (2017) Feature squeezing: detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155.
  • [41] S. Zagoruyko and N. Komodakis (2016) Wide residual networks. arXiv preprint arXiv:1605.07146.
  • [42] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan (2019) Theoretically principled trade-off between robustness and accuracy. ICML.
  • [43] H. Zhang, Y. N. Dauphin, and T. Ma (2019) Fixup initialization: residual learning without normalization. ICLR.