Adversarial Training for Free!

04/29/2019 ∙ by Ali Shafahi et al. ∙ University of Maryland, United States Naval Academy, Cornell University

Adversarial training, in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters. Our "free" adversarial training algorithm achieves state-of-the-art robustness on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods. Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train a robust model for the large-scale ImageNet classification task that maintains 40% accuracy against PGD attacks.


1 Introduction

Deep learning has been widely applied to various computer vision tasks with excellent performance. Prior to the realization of the adversarial example phenomenon by Biggio et al. [2013] and Szegedy et al. [2013], model performance on clean examples was the main evaluation criterion. However, in security-critical applications, robustness to adversarial attacks has emerged as a critical factor.

A robust classifier is one that correctly labels adversarially perturbed images. Alternatively, robustness may be achieved by detecting and rejecting adversarial examples [Ma et al., 2018, Meng and Chen, 2017, Xu et al., 2017]. Recently, Athalye et al. [2018] broke a complete suite of allegedly robust defenses, leaving adversarial training, in which the defender augments each minibatch of training data with adversarial examples [Madry et al., 2017], among the few that remain resistant to attacks. Adversarial training is time-consuming: in addition to the gradient computation needed to update the network parameters, each stochastic gradient descent (SGD) iteration requires multiple gradient computations to produce adversarial images. In fact, it takes 3-30 times longer to form a robust network with adversarial training than to form a non-robust equivalent; the exact slowdown factor depends on the number of gradient steps used for adversarial example generation.

The high cost of adversarial training has motivated a number of alternatives. Some recent works replace the perturbation generation in adversarial training with a parameterized generator network [Baluja and Fischer, 2018, Poursaeed et al., 2018, Xiao et al., 2018]. This approach is slower than standard training, and problematic on complex datasets, such as ImageNet, for which it is hard to produce highly expressive GANs that cover the entire image space. Another popular defense strategy is to regularize the training loss using label smoothing, logit squeezing, or a Jacobian regularizer [Shafahi et al., 2018a, Mosbach et al., 2018, Ross and Doshi-Velez, 2018, Hein and Andriushchenko, 2017, Jakubovitz and Giryes, 2018, Yu et al., 2018]. These methods have not yet been applied to large-scale problems, such as ImageNet, and can be applied in parallel to adversarial training.

Recently, there has been a surge of certified defenses [Wong and Kolter, 2017, Wong et al., 2018, Raghunathan et al., 2018a,b]. These methods were mostly demonstrated for small networks, low-resolution datasets, and relatively small perturbation budgets. Cohen et al. [2019] propose randomized smoothing as a certified defense method suitable for ImageNet. They claim to achieve 12% robustness against non-targeted ℓ2 attacks within a radius of 3, where the radius is computed after scaling pixels to [0, 1]; this is roughly equivalent to an ℓ∞ budget of about 2 when pixels lie in [0, 255].

Adversarial training remains among the most trusted defenses, but it is nearly intractable on large-scale problems. Adversarial training on high-resolution datasets, including ImageNet, has only been within reach for research labs having hundreds of GPUs (for example, Xie et al. [2019] use 128 V100s and Kannan et al. [2018] use 53 P100s to perform targeted adversarial training on ImageNet). Even on reasonably sized datasets, such as CIFAR-10 and CIFAR-100, adversarial training is time-consuming and can take multiple days on a single GPU.

Contributions

We propose a fast adversarial training algorithm that produces robust models with almost no extra cost relative to natural training. The key idea is to update both the model parameters and image perturbations using one simultaneous backward pass, rather than using separate gradient computations for each update. Our proposed method has the same computational cost as conventional natural training, and can be 3-30 times faster than previous adversarial training methods [Madry et al., 2017, Xie et al., 2019]. Our robust models trained on CIFAR-10 and CIFAR-100 achieve state-of-the-art accuracy when defending against strong PGD attacks.

We can apply our algorithm to the large-scale ImageNet classification task on a single workstation with four P100 GPUs in about two days, achieving 40% accuracy against non-targeted PGD attacks. To the best of our knowledge, our method is the first to successfully train a robust model for ImageNet based on the non-targeted formulation and achieves results competitive with previous (significantly more complex) methods [Kannan et al., 2018, Xie et al., 2019].

2 Non-targeted adversarial examples

Adversarial examples come in two flavors: non-targeted and targeted. Given a fixed classifier with parameters θ, an image x with true label y, and classification proxy loss l(x, y, θ), a bounded non-targeted attack sneaks an example out of its natural class and into another. This is done by solving

    max_δ l(x + δ, y, θ)   subject to   ‖δ‖_p ≤ ε,        (1)

where δ is the adversarial perturbation, ‖·‖_p is some ℓp-norm distance metric, and ε is the adversarial manipulation budget. In contrast to non-targeted attacks, a targeted attack scooches an image into a specific class of the attacker’s choice.

In what follows, we will use non-targeted adversarial examples both for evaluating the robustness of our models and for adversarial training. We briefly review some of the closely related methods for generating adversarial examples. In the context of ℓ∞-bounded attacks, the Fast Gradient Sign Method (FGSM) by Goodfellow et al. [2015] is one of the most popular non-targeted methods, and uses the sign of the gradient to construct an adversarial example in one iteration:

    x_adv = x + ε · sign(∇_x l(x, y, θ)).        (2)

The Basic Iterative Method (BIM) by Kurakin et al. [2016a] is an iterative version of FGSM. The PGD attack is a variant of BIM with uniform random noise as initialization, which is recognized by Athalye et al. [2018] to be one of the most powerful first-order attacks. The initial random noise was first studied by Tramèr et al. [2017] to enable FGSM to attack models that rely on “gradient masking.” In the PGD attack algorithm, the number of iterations K plays an important role in the strength of the attack, and also in the computation time for generating adversarial examples. In each iteration, a complete forward and backward pass is needed to compute the gradient of the loss with respect to the image. Throughout this paper we will refer to a K-step PGD attack as PGD-K.
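To make the attack above concrete, the following is a minimal PyTorch-style sketch of an ℓ∞ PGD-K attack (our own illustration, not the authors' released code); FGSM corresponds to a single step of size ε with no random start. The names `model`, `eps`, `step_size`, and `k` are ours, and inputs are assumed to lie in [0, 1].

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step_size=2/255, k=20):
    """Build adversarial examples with a K-step l_inf PGD attack (random start)."""
    # Random initialization inside the eps-ball is what distinguishes PGD from BIM.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(k):
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        # Ascent step on the sign of the gradient, then project back onto the eps-ball.
        delta = torch.clamp(delta + step_size * grad.sign(), -eps, eps).detach().requires_grad_(True)
    return torch.clamp(x + delta, 0, 1).detach()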

Require: training samples X, perturbation bound ε, attack step size ε_s, maximization iterations per minimization step K, and minimization learning rate τ
1:  Initialize θ
2:  for epoch = 1 … N_ep do
3:      for minibatch B ⊂ X do
4:          Build x_adv for x ∈ B with PGD:
5:              Assign a random perturbation: δ ← Uniform(−ε, ε)
6:              x_adv ← x + δ
7:              for k = 1 … K do
8:                  g_adv ← ∇_x l(x_adv, y, θ)
9:                  x_adv ← x_adv + ε_s · sign(g_adv)
10:                 x_adv ← clip(x_adv, x − ε, x + ε)
11:             end for
12:         Update θ with stochastic gradient descent:
13:             g_θ ← E_{(x,y)∈B}[∇_θ l(x_adv, y, θ)]
14:             θ ← θ − τ · g_θ
15:     end for
16: end for
Algorithm 1 Standard Adversarial Training (K-PGD)

3 Adversarial training

Adversarial training can be traced back to [Goodfellow et al., 2015], in which models were hardened by producing adversarial examples and injecting them into training data. The robustness achieved by adversarial training depends on the strength of the adversarial examples used. Training on fast non-iterative attacks such as FGSM and Rand+FGSM only results in robustness against non-iterative attacks, and not against PGD attacks [Kurakin et al., 2016b, Madry et al., 2017]. Consequently, Madry et al. [2017] propose training on multi-step PGD adversaries, achieving state-of-the-art robustness levels against attacks on MNIST and CIFAR-10 datasets.

While many defenses were broken by Athalye et al. [2018], PGD-based adversarial training was among the few that withstood strong attacks. Many other defenses build on PGD adversarial training or leverage PGD adversarial generation during training. Examples include Adversarial Logit Pairing (ALP) [Kannan et al., 2018], Feature Denoising [Xie et al., 2019], Defensive Quantization [Lin et al., 2019], Thermometer Encoding [Buckman et al., 2018], PixelDefend [Song et al., 2017], Robust Manifold Defense [Ilyas et al., 2017], L2-nonexpansive nets [Qian and Wegman, 2018], Jacobian Regularization [Jakubovitz and Giryes, 2018], Universal Perturbation [Shafahi et al., 2018b], and Stochastic Activation Pruning [Dhillon et al., 2018].

We focus on the min-max formulation of adversarial training [Madry et al., 2017], which has been theoretically and empirically justified. This widely used K-PGD adversarial training algorithm is summarized in alg. 1. The inner loop constructs adversarial examples by K-PGD, while the outer loop updates the model using minibatch SGD on the generated examples. In the inner loop, the gradient ∇_x l needed to update the adversarial examples requires a forward-backward pass of the entire network, which has a computation cost similar to calculating the gradient ∇_θ l for updating the network parameters. Compared to natural training, which only requires ∇_θ l and has no inner loop, K-PGD adversarial training needs roughly K + 1 times more computation.
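For reference, a minimal sketch of this K-PGD training loop might look as follows (our own illustration under the same assumptions as the hypothetical `pgd_attack` helper above, not the authors' code); the K attack passes inside `pgd_attack` plus the single parameter-gradient pass make the roughly (K + 1)× cost explicit.

import torch
import torch.nn.functional as F

def train_kpgd(model, loader, epochs, k=7, eps=8/255, step_size=2/255, lr=0.1):
    """Standard K-PGD adversarial training (alg. 1): K+1 gradient passes per SGD step."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            # Inner maximization: K forward/backward passes just to build x_adv.
            x_adv = pgd_attack(model, x, y, eps=eps, step_size=step_size, k=k)
            # Outer minimization: one more forward/backward pass to update theta.
            opt.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()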

4 “Free” adversarial training

K-PGD adversarial training [Madry et al., 2017] is generally slow and requires roughly K + 1 times more computation than natural training. For example, the 7-PGD training of a WideResNet on CIFAR-10 in Madry et al. [2017] takes about four days on a Titan X GPU. To scale the algorithm to ImageNet, Xie et al. [2019] and Kannan et al. [2018] had to deploy large GPU clusters at a scale far beyond the reach of most organizations.

Here, we propose free adversarial training, which has a negligible complexity overhead compared to natural training. Our free adversarial training algorithm (alg. 2) computes the ascent step by re-using the backward pass needed for the descent step. To update the network parameters, the current training minibatch is passed forward through the network. Then, the gradient with respect to the network parameters is computed on the backward pass. When the “free” method is used, the gradient of the loss with respect to the input image is also computed on this same backward pass.

Unfortunately, this approach does not allow for multiple adversarial updates to be made to the same image without performing multiple backward passes. To overcome this restriction, we propose a minor yet nontrivial modification to training: train on the same minibatch m times in a row. Note that in this case we decrease the number of epochs so that the overall number of training iterations remains constant. This strategy provides multiple adversarial updates to each training image, thus yielding strong, iteratively crafted adversarial examples.

Finally, when a new minibatch is formed, the perturbation generated on the previous minibatch is used to warm-start the perturbation for the new minibatch.

Require: training samples X, perturbation bound ε, learning rate τ, hop steps m
1:  Initialize θ
2:  δ ← 0
3:  for epoch = 1 … N_ep / m do
4:      for minibatch B ⊂ X do
5:          for i = 1 … m do
6:              Update θ with stochastic gradient descent:
7:                  g_θ ← E_{(x,y)∈B}[∇_θ l(x + δ, y, θ)]
8:                  g_adv ← ∇_x l(x + δ, y, θ)
9:                  θ ← θ − τ · g_θ
10:             Use the gradients calculated for the minimization step to update δ:
11:                 δ ← δ + ε · sign(g_adv)
12:                 δ ← clip(δ, −ε, ε)
13:         end for
14:     end for
15: end for
Algorithm 2 “Free” Adversarial Training (Free-m)
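A rough PyTorch rendering of alg. 2 is sketched below (our own illustration, not the authors' released implementation, which is linked in section 5); the names `loader`, `natural_epochs`, and `batch_shape` are assumptions. The key points are that a single backward pass produces both g_θ (via `loss.backward()`) and g_adv (via `d.grad`), that δ is warm-started across minibatches, and that the epoch count is divided by m so the total number of iterations matches natural training.

import torch
import torch.nn.functional as F

def train_free(model, loader, natural_epochs, m=4, eps=8/255, lr=0.1,
               batch_shape=(128, 3, 32, 32)):
    """'Free' adversarial training (alg. 2): one backward pass per parameter update."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    device = next(model.parameters()).device
    delta = torch.zeros(batch_shape, device=device)   # perturbation warm-started across minibatches
    for _ in range(natural_epochs // m):               # fewer epochs keep total iterations constant
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            for _ in range(m):                         # replay the same minibatch m times
                d = delta[: x.size(0)].detach().requires_grad_(True)
                loss = F.cross_entropy(model(torch.clamp(x + d, 0, 1)), y)
                opt.zero_grad()
                loss.backward()                        # this single pass yields BOTH g_theta and g_adv
                opt.step()                             # descent step on theta
                # "Free" ascent step on delta, reusing the gradient from the same backward pass.
                delta[: x.size(0)] = torch.clamp(d + eps * d.grad.sign(), -eps, eps).detach()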

The effect of mini-batch replay on natural training

While the hope for alg. 2 is to build robust models, we still want those models to perform well on natural examples. As we increase m in alg. 2, there is a risk of increasing generalization error. Furthermore, catastrophic forgetting may become possible. Consider the worst case where all of the “informative” images of one class are in the first few minibatches. In this extreme case, we do not see useful examples of that class for most of the epoch, and forgetting may occur. Consequently, a natural question is: how much does mini-batch replay hurt generalization?

To answer this question, we naturally train Wide ResNet 32-10 models on CIFAR-10 and CIFAR-100 using different levels of replay. Fig. 1 plots clean validation accuracy as a function of the replay parameter m.

We see only a small dropoff in accuracy for modest values of m. Note that a small compromise in accuracy is acceptable given a large increase in robustness, due to the fundamental tradeoffs between robustness and generalization [Tsipras et al., 2018, Zhang et al., 2019, Shafahi et al., 2019]. As a reference, CIFAR-10 and CIFAR-100 models that are 7-PGD adversarially trained using the standard (non-free) method have natural accuracies of 87.25% and 59.87%, respectively. These accuracies are exceeded by mini-batch replay natural training at the moderate replay values we use. We see in section 5 that good robustness can be achieved using “free” adversarial training with just such small replay values.

(a) CIFAR-10 sensitivity to m
(b) CIFAR-100 sensitivity to m
Figure 1: Natural validation accuracy of Wide ResNet 32-10 models for varied mini-batch replay parameters m. Note that m = 1 corresponds to conventional training. For large m values, the validation accuracy drops drastically; however, small m's have little effect. For reference, CIFAR-10 and CIFAR-100 models that are 7-PGD adversarially trained have natural accuracies of 87.25% and 59.87%, respectively.

5 Robust models on CIFAR-10 and 100

In this section, we train robust models on CIFAR-10 and CIFAR-100 using the proposed “free” adversarial training (alg. 2) and compare them to PGD-based adversarial training (alg. 1). We find that free training achieves state-of-the-art robustness on CIFAR-10 without the overhead of standard PGD training. Our free training code is available at https://github.com/ashafahi/free_adv_train.

CIFAR-10

We train various CIFAR-10 models using the Wide ResNet 32-10 architecture and the standard hyper-parameters of Madry et al. [2017]. In the proposed method (alg. 2), we repeat (i.e. replay) each minibatch m times before switching to the next minibatch. We present the experimental results for various choices of m in table 1. Training each of these models costs roughly the same as natural training since we preserve the same number of iterations. We compare with the 7-PGD adversarially trained model from Madry et al. [2017] (results based on the “adv_trained” model in Madry's CIFAR-10 challenge repo), whose training requires roughly 7× more time than all of our free training variations. We attack all models using PGD attacks with the cross-entropy loss (PGD-K for a K-iteration attack) and the Carlini-Wagner loss (CW-K) [Carlini and Wagner, 2017]. We evaluate with PGD-20 following Madry et al. [2017], and also increase the number of attack iterations to 100 and employ random restarts to verify robustness under stronger attacks. Note that gradient-free attacks such as SPSA give weaker results against adversarially trained models than optimization-based attacks such as PGD, as noted by Uesato et al. [2018]; gradient-free attacks are superior in settings where the defense works by masking or obfuscating gradients.
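For the restart evaluations, an image counts as robust only if it is classified correctly under every restarted attack. The following is a small sketch of this protocol (ours, reusing the hypothetical `pgd_attack` helper from section 2, not the evaluation code behind the tables):

import torch

def robust_accuracy(model, loader, restarts=10, **attack_kwargs):
    """Accuracy under a PGD attack with several random restarts."""
    correct, total = 0, 0
    for x, y in loader:
        survived = torch.ones_like(y, dtype=torch.bool)
        for _ in range(restarts):
            # Each restart draws a fresh random initial perturbation inside the eps-ball.
            x_adv = pgd_attack(model, x, y, **attack_kwargs)
            with torch.no_grad():
                survived &= model(x_adv).argmax(dim=1).eq(y)
        correct += int(survived.sum())
        total += y.numel()
    return correct / total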

Training                      | Natural Images | PGD-20 | PGD-100 | CW-100 | 10-restart PGD-20 | Training time (min)
Natural                       | 95.01%         | 0.00%  | 0.00%   | 0.00%  | 0.00%             | 780
Free                          | 91.45%         | 33.92% | 33.20%  | 34.57% | 33.41%            | 816
Free                          | 87.83%         | 41.15% | 40.35%  | 41.96% | 40.73%            | 800
Free                          | 85.96%         | 46.82% | 46.19%  | 46.60% | 46.33%            | 785
Free                          | 83.94%         | 46.31% | 45.79%  | 45.86% | 45.94%            | 785
Madry et al. (7-PGD trained)  | 87.25%         | 45.84% | 45.29%  | 46.52% | 45.53%            | 5418
Table 1: Validation accuracy and robustness of CIFAR-10 models trained with various methods. The “Free” rows are ordered by increasing replay parameter m.
Training                      | Natural Images | PGD-20 | PGD-100 | Training time (min)
Natural                       | 78.84%         | 0.00%  | 0.00%   | 811
Free                          | 69.20%         | 15.37% | 14.86%  | 816
Free                          | 65.28%         | 20.64% | 20.15%  | 767
Free                          | 64.87%         | 23.68% | 23.18%  | 791
Free                          | 62.13%         | 25.88% | 25.58%  | 780
Free                          | 59.27%         | 25.15% | 24.88%  | 776
Madry et al. (2-PGD trained)  | 67.94%         | 17.08% | 16.50%  | 2053
Madry et al. (7-PGD trained)  | 59.87%         | 22.76% | 22.52%  | 5157
Table 2: Validation accuracy and robustness of CIFAR-100 models trained with various methods. The “Free” rows are ordered by increasing replay parameter m.

Our “free training” algorithm successfully reaches robustness levels comparable to a 7-PGD adversarially trained model. As we increase m, robustness increases at the cost of validation accuracy on natural images. Additionally, note that we achieve reasonable robustness over a wide range of choices of the main hyper-parameter of our method, m, and the proposed method is significantly faster than K-PGD adversarial training.

CIFAR-100

We also study the robustness of “free training” on CIFAR-100, a more difficult dataset with more classes. As discussed in sec. 4, training with large m values on this dataset hurts natural validation accuracy more than it does on CIFAR-10. This dataset is less studied in the adversarial machine learning community, so for comparison purposes we adversarially train our own Wide ResNet 32-10 models for CIFAR-100. We train two robust models by varying K in the K-PGD adversarial training algorithm (alg. 1): one is trained on PGD-2, with a computational cost roughly 2.5× that of free training, and the other is trained on PGD-7, with a computational cost roughly 6.5× that of free training. We adopt the adversarial training code from Madry et al. [2017], which produces state-of-the-art robust models on CIFAR-10. We summarize the results in table 2.

We see that “free training” exceeds the accuracy of traditional adversarial training on both natural and adversarial images. Similar to the effect of increasing m, increasing K in K-PGD adversarial training increases robustness at the cost of clean validation accuracy. However, unlike the proposed “free training,” where increasing m has no extra cost, increasing K for standard K-PGD substantially increases training time.

6 Does “free” training behave like standard adversarial training?

Here, we analyze two properties associated with PGD adversarially trained models: the interpretability of their gradients and the flatness of their loss surface. We find that “free” training enjoys these benefits as well.

Figure 2: Attack images built for adversarially trained models look like the class into which they get misclassified. (Top row) clean images; (middle row) adversarial images for a 7-PGD adversarially trained CIFAR-10 model; (bottom row) adversarial examples for the “free” adversarially trained model. To avoid cherry picking, we display the last 9 images of the validation set. (Panel labels, left to right: plane, cat, dog, cat, ship, cat, dog, car, horse; car, dog, cat, deer, cat, frog, bird, frog, dog; car, dog, cat, deer, cat, horse, cat, plane, car.)

Generative behavior for largely perturbed examples

Tsipras et al. [2018] observed that hardened classifiers have interpretable gradients; adversarial examples built for PGD trained models often look like the class into which they get misclassified.

Fig. 2 shows “weakly bounded” adversarial examples for the CIFAR-10 7-PGD adversarially trained model [Madry et al., 2017] and our free adversarially trained model. Both models were trained to resist ℓ∞ attacks. The examples are made using a 50-iteration BIM attack with a perturbation bound much larger than the one used during training. “Free training” maintains the generative properties of adversarial training, as our model's adversarial examples resemble the target class.

Smooth and flattened loss surface

Another property of PGD adversarial training is that it flattens and smooths the loss landscape. In contrast, some defenses work by “masking” the gradients, i.e., making it difficult to identify adversarial examples with gradient methods even though adversarial examples remain present. Engstrom et al. [2018] argue that gradient masking adds little security. Fig. 3(a) shows that free training does not operate by masking gradients with a rough loss surface. In fig. 3 we plot the cross-entropy loss projected along two directions in image space for the first few validation examples of CIFAR-10 [Li et al., 2018]. In addition to the loss of the free model, we plot the loss of the 7-PGD adversarially trained model for comparison.
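The surfaces in fig. 3 can be reproduced in spirit by evaluating the loss on a grid x + a·d_adv + b·d_rand, where d_adv is the signed loss gradient and d_rand is a random Rademacher (±1) direction. The sketch below is our own illustration under those assumptions, not the plotting code used for the figure.

import torch
import torch.nn.functional as F

def loss_surface(model, x, y, radius=8/255, steps=21):
    """Cross-entropy loss of `model` around image x, projected onto an adversarial
    direction and a random Rademacher direction (cf. fig. 3)."""
    x = x.unsqueeze(0)                                       # single image -> batch of one
    x_req = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_req), y.view(1))
    d_adv = torch.autograd.grad(loss, x_req)[0].sign()       # adversarial direction
    d_rand = torch.randint(0, 2, x.shape).float() * 2 - 1    # Rademacher direction
    coords = torch.linspace(-radius, radius, steps)
    surface = torch.zeros(steps, steps)
    with torch.no_grad():
        for i, a in enumerate(coords):
            for j, b in enumerate(coords):
                x_pert = torch.clamp(x + a * d_adv + b * d_rand, 0, 1)
                surface[i, j] = F.cross_entropy(model(x_pert), y.view(1))
    return surface   # e.g. visualize with matplotlib's plot_surface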

(a) Free
(b) 7-PGD adv trained
(c) Free, both Rademacher
(d) 7-PGD adv trained, both Rademacher
Figure 3: The loss surface of a 7-PGD adversarially trained model and our free trained model for CIFAR-10 on the first two validation images. In (a) and (b) we display the cross-entropy loss projected onto one random (Rademacher) direction and one adversarial direction. In (c) and (d) we display the cross-entropy loss projected along two random directions. Both training methods behave similarly and do not operate by masking the gradients, as the adversarial direction is indeed the direction where the cross-entropy loss changes the most.

7 Robust ImageNet classifiers

ImageNet is a large image classification dataset of over 1 million high-resolution images and 1000 classes [Russakovsky et al., 2015]. Due to the high computational cost of ImageNet training, only a few research teams have been able to afford building robust models for this problem. Kurakin et al. [2016b] were the first to harden ImageNet classifiers by adversarial training with non-iterative attacks (training with a non-iterative attack such as FGSM only roughly doubles the training cost). Their adversarial training used a targeted FGSM attack. They found that while their model became robust against targeted non-iterative attacks, the targeted BIM attack completely broke it.

Later, Kannan et al. [2018] attempted to train a robust model that withstands targeted PGD attacks. They trained against 10-step targeted PGD attacks (a process that costs roughly 11 times more than natural training) to build a baseline model, and also generated targeted PGD attacks to train their adversarial logit pairing (ALP) ImageNet model. Their baseline attains modest top-1 accuracy against targeted PGD-20 attacks.

Very recently, Xie et al. [2019] trained a robust ImageNet model against targeted PGD-30 attacks, at a cost roughly 30 times that of natural training. Training this model required a distributed implementation with 128 GPUs and a batch size of 4096. Their robust ResNet-101 model achieves strong top-1 accuracy against targeted PGD attacks with many iterations.

Free training results

Our alg. 2 is designed for non-targeted adversarial training. As Athalye et al. [2018] state, defending against non-targeted attacks is both important and more difficult than defending against targeted attacks, and for this reason smaller ε values are typically used in the non-targeted setting.

Even for the smallest ε we consider defending against, a PGD-50 non-targeted attack reduces a naturally trained model to nearly 0% top-1 accuracy. To put things further in perspective, Uesato et al. [2018] broke three defenses against non-targeted attacks on ImageNet [Guo et al., 2017, Liao et al., 2018, Xie et al., 2017], degrading their performance to below 1%.

Our free training algorithm is able to achieve 43% robustness against PGD attacks bounded by ε = 4. Furthermore, we ran each experiment on a single workstation with four P100 GPUs; even with this modest setup, training time for each ResNet-50 experiment is below 50 hours.

We summarize our results for various m's and ε's in table 3 and fig. 4. To craft attacks, we used a step size of 1 and the same ε used during training. In all experiments, the training batch size was 256.

Training | Natural Images | PGD-10  | PGD-50  | PGD-100
Natural  | 76.038%        | 0.166%  | 0.052%  | 0.036%
Free     | 71.210%        | 37.012% | 36.340% | 36.250%
Free     | 64.446%        | 43.522% | 43.392% | 43.404%
Free     | 60.642%        | 41.996% | 41.900% | 41.892%
Free     | 58.116%        | 40.044% | 40.008% | 39.996%
Table 3: ImageNet validation accuracy and robustness of ResNet-50 models trained with various replay parameters m and ε = 4. The “Free” rows are ordered by increasing m.

Table 3 shows the robustness of ResNet-50 on ImageNet. The validation accuracy on natural images decreases as we increase the minibatch replay parameter m, just as it did for CIFAR in section 5.

The naturally trained model is vulnerable to PGD attacks (first row of table 3), while the proposed method produces robust models that achieve over 40% accuracy against PGD attacks for several replay values in table 3. Attacking the models using PGD-100 does not result in a meaningful drop in accuracy compared to PGD-50, so we did not experiment with increasing the number of PGD iterations further.

Fig. 4 summarizes experimental results for robust models trained and tested under different perturbation bounds ε. Each curve represents one training method (natural training or free training) with a given replay parameter m, and each point on a curve is the validation accuracy of an ε-bounded robust model. These results are also provided as tables in the appendix. The proposed method consistently improves robust accuracy under PGD attacks across the ε values considered, with an intermediate replay value performing best. It is difficult to train robust models when ε is large, which is consistent with previous studies showing that PGD-based adversarial training has limited robustness for ImageNet [Kannan et al., 2018].

(a) Clean
(b) PGD-10
(c) PGD-50
(d) PGD-100
Figure 4: The effect of the perturbation bound ε and the mini-batch replay hyper-parameter m on the robustness achieved by free training.

Comparison with PGD-trained models

We compare “free” training to a more costly method that trains on 2-PGD adversarial examples. We run alg. 1 with K = 2; all other hyper-parameters were identical to those used for training our “free” models. Note that in our experiments we do not use label smoothing or other common tricks for improving robustness, since we want a fair comparison between PGD training and our “free” training. These extra regularizers could likely improve results for both approaches.

We compare our “free trained” ResNet-50 model and the 2-PGD trained ResNet-50 model in table 4. 2-PGD adversarial training takes roughly 3.5× longer than “free training” and only achieves slightly better results (about 4.5% higher robust accuracy). This gap shrinks to less than 0.5% if we free train a higher-capacity model (ResNet-152, see below).

Model & Training          | Natural Images | PGD-10  | PGD-50  | PGD-100 | Train time (min)
ResNet-50 – Free          | 60.206%        | 32.768% | 31.878% | 31.816% | 3,016
ResNet-50 – 2-PGD trained | 64.134%        | 37.172% | 36.352% | 36.316% | 10,435
Table 4: Validation accuracy and robustness of a free trained ResNet-50 and a 2-PGD trained ResNet-50, both trained to resist the same ℓ∞ attacks. Note that 2-PGD training takes roughly 3.5× as long as “free” training.

Free training on models with more capacity

It is believed that increased network capacity leads to greater robustness from adversarial training [Madry et al., 2017, Kurakin et al., 2016b]. We verify that this is the case by “free training” ResNet-101 and ResNet-152 with the same m and ε as the ResNet-50 model of table 4. The comparison between ResNet-152, ResNet-101, and ResNet-50 is summarized in table 5. Free training the deeper ResNet-101 and ResNet-152 models takes proportionally more time than ResNet-50 on the same machine. The higher-capacity ResNet-152 model enjoys a roughly 4% boost in both clean accuracy and robustness.

Architecture | Natural Images | PGD-10  | PGD-50  | PGD-100
ResNet-50    | 60.206%        | 32.768% | 31.878% | 31.816%
ResNet-101   | 63.340%        | 35.388% | 34.402% | 34.328%
ResNet-152   | 64.446%        | 36.992% | 36.044% | 35.994%
Table 5: Validation accuracy and robustness of free trained ResNet-50, ResNet-101, and ResNet-152 models trained with the same m and ε as in table 4.

8 Conclusions

Adversarial training is a well-studied method that boosts the robustness and interpretability of neural networks. While it remains one of the few effective ways to harden a network to attacks, few can afford to adopt it because of its high computation cost. We present a “free” version of adversarial training with cost nearly equal to natural training. Free training can be further combined with other defenses to produce robust models without a slowdown. We hope that this approach can put adversarial training within reach for organizations with modest compute resources.

Acknowledgements: Goldstein and his students were supported by DARPA's Lifelong Learning Machines and YFA programs, the Office of Naval Research, the AFOSR MURI program, and the Sloan Foundation. Davis and his students were supported by the Office of the Director of National Intelligence (ODNI) and IARPA (2014-14071600012). Studer was supported by Xilinx, Inc. and the US NSF under grants ECCS-1408006, CCF-1535897, CCF-1652065, CNS-1717559, and ECCS-1824379. Taylor was supported by the Office of Naval Research (N0001418WX01582) and the Department of Defense High Performance Computing Modernization Program. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

References

Appendix A Complete ImageNet results

Here we provide the results used for generating fig. 4 in tables 6–10; each table corresponds to one of the perturbation bounds ε shown in fig. 4.

Training | Natural Images | PGD-10  | PGD-50  | PGD-100
Natural  | 76.038%        | 0.078%  | 0.024%  | 0.014%
Free     | 69.634%        | 29.652% | 28.094% | 27.952%
Free     | 61.968%        | 37.332% | 37.108% | 37.096%
Free     | 58.096%        | 35.388% | 35.172% | 35.202%
Free     | 55.938%        | 33.150% | 32.922% | 32.906%
Table 6: Validation accuracy and robustness of ResNet-50 models trained at a fixed perturbation bound ε (cf. fig. 4).

Training | Natural Images | PGD-10  | PGD-50  | PGD-100
Natural  | 76.038%        | 0.072%  | 0.014%  | 0.010%
Free     | 68.126%        | 23.902% | 21.224% | 20.978%
Free     | 60.206%        | 32.768% | 31.878% | 31.816%
Free     | 55.988%        | 30.804% | 30.282% | 30.250%
Free     | 52.190%        | 29.004% | 28.624% | 28.608%
Table 7: Validation accuracy and robustness of ResNet-50 models trained at a fixed perturbation bound ε (cf. fig. 4).

Training | Natural Images | PGD-10  | PGD-50  | PGD-100
Natural  | 76.038%        | 0.058%  | 0.012%  | 0.006%
Free     | 67.536%        | 20.810% | 16.652% | 16.240%
Free     | 59.052%        | 28.000% | 26.342% | 26.262%
Free     | 53.326%        | 26.746% | 25.670% | 25.670%
Free     | 50.570%        | 25.854% | 25.086% | 25.080%
Table 8: Validation accuracy and robustness of ResNet-50 models trained at a fixed perturbation bound ε (cf. fig. 4).

Training | Natural Images | PGD-10  | PGD-50  | PGD-100
Natural  | 76.038%        | 0.052%  | 0.010%  | 0.008%
Free     | 63.628%        | 14.216% | 9.038%  | 8.612%
Free     | 56.808%        | 24.912% | 21.728% | 21.506%
Free     | 49.972%        | 23.874% | 21.872% | 21.828%
Free     | 47.882%        | 23.122% | 21.266% | 21.228%
Table 9: Validation accuracy and robustness of ResNet-50 models trained at a fixed perturbation bound ε (cf. fig. 4).

Training | Natural Images | PGD-10  | PGD-50  | PGD-100
Natural  | 76.038%        | 0.046%  | 0.012%  | 0.006%
Free     | 64.256%        | 0.084%  | 0.028%  | 0.018%
Free     | 53.824%        | 22.168% | 16.654% | 16.297%
Free     | 47.388%        | 13.232% | 7.508%  | 6.576%
Free     | 44.314%        | 13.954% | 9.390%  | 8.828%
Table 10: Validation accuracy and robustness of ResNet-50 models trained at a fixed perturbation bound ε (cf. fig. 4).

Appendix B The effect of batch-size

Our free training algorithm produces state-of-the-art results on CIFAR-10 and CIFAR-100 and produces robust models on ImageNet. We find that the ImageNet results are more sensitive to the replay parameter m: our best CIFAR results use a larger m than our best ImageNet results. We believe this is due to the ratio of the number of classes to the batch-size. Our batch-size in the CIFAR experiments was 128, while, since we ran our ImageNet experiments on a single node with four GPUs, we were only able to use a batch-size of 256. If the number of classes is large and m is large, the probability that some class goes unseen for many consecutive iterations becomes large, which can result in catastrophically forgetting that class. To see the effect of batch-size in practice, we experimented with varying the batch-size for CIFAR-100 at a fixed replay parameter m, adjusting the learning rate with the linear learning-rate scaling rule whenever we changed the batch-size. The results, which are consistent with our conjecture, are summarized in fig. 5.

Figure 5: If the number of classes is large, a larger batch-size can result in better robustness and generalization, especially for larger values of m. In this experiment, we use the replay parameter m that yields the best result for CIFAR-100 and vary the batch-size.