1 Introduction
†Work in progress. Corresponding author: vvikash@princeton.edu
†A short version is published at ICLR 2021 Workshop on Security and Safety in Machine Learning Systems.
To achieve robustness against adversarial examples, adversarial training remains the most effective technique (madry2017towards; zhang2019tradeoff; pang2021bagoftricks). However, it is largely used with the limited training samples available in current image datasets (for example, only 50,000 training images in the CIFAR10 dataset), where it suffers from multiple challenges, such as poor generalization on the test set (madry2017towards) and an accuracy vs robustness tradeoff (Raghunathan2020advtradeoff). Recent works have demonstrated that more training data can improve the performance of adversarial training (schmidt2018AdvMoreData; alayrac2019unsupadv; deng2020AdvExtraOutDomain). However, this approach runs into the challenge of curating a large set of real-world images for training. We circumvent this challenge using proxy distributions, i.e., distributions that closely approximate the underlying distribution of the original training dataset. Note that the proxy distributions are modeled using only the training images available in current datasets. Proxy distributions, when modeled with deep neural networks such as generative adversarial networks (GANs), allow us to generate a potentially unlimited number of high-fidelity images (goodfellow2014GAN; gui2020ganreview; ho2020denoisingdiffusion). Our key goal is to improve the adversarial robustness of deep neural networks by taking advantage of samples from such proxy distributions.
Note that proxy distributions are only an approximation of the underlying distribution of the training data. This raises the question of whether robustness achieved on samples from the proxy distribution will transfer to the original dataset. A more generic version of this question is: how much does the robustness of a classifier transfer between data distributions? We prove that the difference in the robustness of a classifier on two distributions is tightly upper bounded by the conditional Wasserstein distance between them, where conditional Wasserstein distance refers to Wasserstein distance conditioned on class labels. This result confirms the intuition that samples from a proxy distribution that closely approximates the training data distribution should be able to improve robustness on the latter. We also validate this claim empirically: adversarial training only on samples from the proxy distribution indeed achieves nontrivial robustness on samples from the original distribution.
To improve adversarial robustness on current datasets, we aim to leverage samples from a proxy distribution that closely approximates the underlying distribution of these datasets. We propose to train simultaneously on both the original training set and a set of additional images sampled from the proxy distribution. We use state-of-the-art generative models to model the proxy distribution, since they can closely approximate the data distribution from only a limited number of training samples (karras2020styleganAda; ho2020denoisingdiffusion; gui2020ganreview). Our experimental results demonstrate that the use of samples from the proxy distribution improves robust accuracy by up to % and % in and threat models, respectively, over baselines not using proxy distributions on the CIFAR10 dataset. In the category of not using any extra real-world data, our models achieve the first rank on RobustBench (croce2020robustbench), a standardized benchmark for adversarial training. We also improve the certified robust accuracy (cohen2019certified) by up to % with randomized smoothing on the CIFAR10 dataset. Intriguingly, we achieve better certified robust accuracy with proxy distribution samples than with an additional set of K curated real-world images (carmon2019unlabeled).
Since access to proxy distributions gives us the ability to generate a large number of images, we can delve deeper into the relationship between adversarial robustness and the number of training samples at the scale of image classification with deep neural networks. Here we work solely with a proxy distribution: we train multiple networks with an increasing number of training images (we use KM training images) and test each on another fixed set of images from it. Our goal is to analyze multiple intriguing properties of adversarial training w.r.t. the number of training samples. We first analyze the sample complexity of adversarial training. Earlier works (wei2019allLayerMargin; bhagoji2019lowerBounds; schmidt2018AdvMoreData) have derived sample complexity bounds in simplified settings such as classifying a mixture of Gaussian distributions. At the scale of deep neural networks for image classification, we empirically demonstrate that more data continues to improve adversarial robustness. Next, we analyze the accuracy vs robustness tradeoff: earlier works showed that robustness in adversarial training is achieved at the cost of the clean accuracy of deep neural networks (Raghunathan2020advtradeoff; javanmard2020preciseTradeoff). We demonstrate that an increase in the number of training samples significantly reduces this tradeoff for deep neural networks. To the best of our knowledge, this is the first empirical investigation into adversarial training of deep neural networks on millions of images at the scale of the CIFAR10 dataset.
Contributions. We make the following key contributions.

We provide a theoretical understanding as well as an empirical validation of the transfer of adversarial robustness between data distributions. In particular, we provide a tight upper bound on the difference between the robustness of a classifier on two data distributions.

We propose to combine adversarial training with proxy distributions. By leveraging additional images sampled from proxy distributions, we improve robust accuracy by up to % and % in and threat models, respectively, and certified robust accuracy by % on the CIFAR10 dataset.

We provide the first large-scale empirical investigation of the accuracy vs robustness tradeoff and the sample complexity of adversarial training by training deep neural networks on K to M images.
2 Integrating proxy distributions in adversarial training
We first provide a brief overview of adversarial training in deep neural networks. Before integrating proxy distributions in the adversarial training, we first delve deeper into the question of whether adversarial robustness will transfer from the proxy distribution to the original training data distribution. After answering it affirmatively, we integrate proxy distributions in both adversarial training and randomized smoothing, where the latter is used to achieve certified adversarial robustness.
Notation. We represent the input space by and corresponding label space as . We assume that the data is sampled from a joint distribution, i.e., . We refer to the underlying distribution of current image datasets, such as CIFAR10, as . While is unknown, we assume that a limited set of training and test images are available from this distribution. We denote the proxy distribution as . Unlike , is known as it is modeled using a generative model. We denote the neural network for classification by , parameterized by , which maps input images to output probability vectors (). We represent the cross-entropy loss function, which is used to train the classifier, as , where . For a set sampled from a distribution , we use to denote the empirical distribution with respect to set .
Formulation of adversarial training. The key objective in adversarial training is to minimize the training loss on adversarial examples obtained with adversarial attacks, such as projected gradient descent (PGD) (madry2017towards) based attacks, under the following formulation.
where is the threat model which includes the magnitude of adversarial perturbations (), number of attack steps, and the size of each step.
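To make the three components of the threat model concrete (perturbation magnitude, number of attack steps, step size), the following is a minimal sketch of a PGD-style attack. It is an illustration under stated assumptions, not the setup used in this paper: the classifier is a toy linear binary model with a closed-form input gradient, the perturbation set is assumed to be a max-norm ball around the input, the attack starts from the clean input without a random restart, and all names and numeric values (`pgd_linf`, `eps`, `step_size`, `steps`, the weights) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(w, b, x, y):
    """Cross-entropy loss of a linear binary classifier on one example (y in {0, 1})."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x (closed form for this toy model)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_linf(w, b, x, y, eps, step_size, steps):
    """Signed-gradient ascent on the loss, projected back into the eps-ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        grad = input_gradient(w, b, x_adv, y)
        # Ascent step: move each coordinate in the direction that increases the loss.
        x_adv = [xa + step_size * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: clip each coordinate to [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Hypothetical toy model and example.
w, b = [1.5, -2.0], 0.1
x, y = [0.4, -0.3], 1
x_adv = pgd_linf(w, b, x, y, eps=0.1, step_size=0.03, steps=10)
clean_loss = logistic_loss(w, b, x, y)
adv_loss = logistic_loss(w, b, x_adv, y)
```

Adversarial training would then minimize the loss at `x_adv` rather than at `x`. For a deep network the input gradient would come from automatic differentiation instead of a closed form.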
2.1 Understanding transfer of adversarial robustness between data distributions
As explained above, our goal is to use samples from a proxy distribution instead of the actual distribution . We consider two distributions and supported on which is the space of labeled examples. We use a classifier to predict class label of an input sample. We first define the average robustness of a classifier on a distribution followed by the definition of conditional Wasserstein distance, a measure of the distance between two labeled distributions.
Definition 1 (Average Robustness).
We define average robustness for a classifier on a distribution according to a distance metric as follows:
This definition refers to the expectation of the distance to the closest adversarial example for each sample. In contrast to robust accuracy, which measures whether an adversarial example exists within a given distance, we calculate the distance to the closest adversarial example. Having this definition, we can now explain our goal using the following decomposition of adversarial error. Let be a learning algorithm (e.g. adversarial training). In robust learning from a proxy distribution, we are interested in bounding the average robustness of the classifier obtained by , on distribution , when the training set is a set of labeled examples sampled from a proxy distribution . In particular we want to provide a lower bound on the following quantity
In order to understand this quantity better, suppose is a classifier trained on a set that is sampled from , using algorithm . We decompose into three quantities
Using this decomposition, by linearity of expectation and triangle inequality we can bound from below by
As the above inequality suggests, in order to bound the average robustness, we need to bound both the generalization penalty and the distribution-shift penalty. Indeed, if and were identical, we would be in the standard robust learning setting and would only have to deal with the generalization penalty. The generalization penalty has been studied in multiple works (cullina2018pac; montasser2019vc; cchmidt2018robustgeneralization), which provide distribution-independent bounds on the robust generalization of adversarial training on VC classes. Hence, we mostly focus on bounding the distribution-shift penalty. Our goal is to provide a bound on the distribution-shift penalty that is independent of the classifier at hand and depends only on the properties of the distributions. With this goal, we define a notion of distance between two distributions.
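The displayed formulas for Definition 1 and for the decomposition are not legible in this copy, so the following is a hedged reconstruction from the surrounding prose rather than a verbatim restatement. All symbols are assumptions: $h$ is the classifier, $D$ the original distribution, $\tilde{D}$ the proxy distribution, $\hat{S}$ the empirical distribution of a training set $S$ sampled from $\tilde{D}$, and $d$ the distance metric.

```latex
% Average robustness: expected distance to the closest adversarial example.
\mathrm{Rob}_d(h, D) \;=\; \mathbb{E}_{(x,y)\sim D}\Big[\inf_{x' :\, h(x') \neq y} d(x', x)\Big]

% Decomposition into three quantities (the terms telescope).
\mathrm{Rob}_d(h, D) \;=\;
    \underbrace{\mathrm{Rob}_d(h, \hat{S})}_{\text{empirical robustness}}
  + \underbrace{\big(\mathrm{Rob}_d(h, \tilde{D}) - \mathrm{Rob}_d(h, \hat{S})\big)}_{\text{generalization penalty}}
  + \underbrace{\big(\mathrm{Rob}_d(h, D) - \mathrm{Rob}_d(h, \tilde{D})\big)}_{\text{distribution-shift penalty}}

% Lower bound obtained by replacing each penalty with minus its absolute value.
\mathrm{Rob}_d(h, D) \;\geq\;
    \mathrm{Rob}_d(h, \hat{S})
  - \big|\mathrm{Rob}_d(h, \tilde{D}) - \mathrm{Rob}_d(h, \hat{S})\big|
  - \big|\mathrm{Rob}_d(h, D) - \mathrm{Rob}_d(h, \tilde{D})\big|
```

Under these assumptions, taking the expectation over the draw of $S$ (and any randomness of the learning algorithm) on both sides gives the bound in expectation, by linearity of expectation.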
Definition 2 (Conditional Wasserstein distance).
For two labeled distributions and supported on , we define conditional Wasserstein distance according to a distance metric as follows:
where is the set of joint distributions whose marginals are identical to and .
Conditional Wasserstein distance between the two distributions is simply the expectation of Wasserstein distance between conditional distributions for each class. Note that Wasserstein distance is indeed used as a metric to measure the quality of generative models (Heusel2017FID). Now, we are ready to state our main theorem that bounds the distribution shift penalty for any learning algorithm based only on the Wasserstein distance of the two distributions.
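Since the displayed formula of Definition 2 is not legible in this copy, here is one way to write it that is consistent with the description above (expectation over class labels of the Wasserstein distance between class-conditional distributions). The symbols are assumptions: $D$ and $\tilde{D}$ are the two labeled distributions, $D|y$ is the distribution of inputs conditioned on label $y$, and $\mathcal{J}(\cdot,\cdot)$ is the set of joint distributions with the given marginals.

```latex
W^{c}_{d}(D, \tilde{D}) \;=\;
  \mathbb{E}_{(\cdot,\, y)\sim D}\Big[
    \inf_{J \in \mathcal{J}(D|y,\; \tilde{D}|y)}
    \mathbb{E}_{(x, x')\sim J}\big[\, d(x, x') \,\big]
  \Big]
```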
Theorem 1 (Bounding distribution-shift penalty).
Let and be two labeled distributions supported on with identical label distributions, i.e., . Then for any classifier
Theorem 1 shows how one can bound the distribution-shift penalty. Importantly, it gives a method of measuring the quality of a proxy distribution for robust training. Note that, unlike the generalization penalty, the distribution-shift penalty does not decrease when more data is provided. Hypothetically, we can have many samples from the proxy distribution, which reduces the effect of the generalization penalty and makes the distribution-shift penalty the dominant factor in reducing robustness. Although there are interesting theoretical results showing the effect of sample complexity on the generalization penalty, these bounds do not apply to neural networks, so we take an empirical approach to show that the generalization penalty indeed approaches zero when more samples from the proxy distribution are provided (see Section 3.4). This shows the importance of Theorem 1 in understanding the robustness we get by training on proxy distributions. In other words, this theorem enables us to switch our attention from robust generalization to creating high-quality generative models whose underlying distribution is close to the original distribution.
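As a numerical sanity check of the relation stated in the introduction (the gap in average robustness between two distributions is at most their conditional Wasserstein distance), the sketch below works through a one-dimensional toy case. Everything in it is an assumption made for illustration: inputs are scalars, the classifier is a fixed threshold, the distance is the absolute difference, both distributions are empirical with the same number of samples per class, and the Gaussian class means are hypothetical.

```python
import random


def avg_robustness(samples, threshold):
    """Mean distance to the closest misclassified point; zero if already misclassified."""
    total = 0.0
    for x, y in samples:
        pred = 1 if x > threshold else 0
        total += abs(x - threshold) if pred == y else 0.0
    return total / len(samples)


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1D samples (sorted matching)."""
    assert len(a) == len(b)
    return sum(abs(u - v) for u, v in zip(sorted(a), sorted(b))) / len(a)


def conditional_wasserstein(p, q):
    """Label-weighted average of per-class Wasserstein-1 distances."""
    n = len(p)
    total = 0.0
    for label in {y for _, y in p}:
        a = [x for x, y in p if y == label]
        b = [x for x, y in q if y == label]
        total += (len(a) / n) * wasserstein_1d(a, b)
    return total


rng = random.Random(0)


def sample(mean0, mean1, n_per_class):
    """Two-class 1D Gaussian data with identical label proportions."""
    return ([(rng.gauss(mean0, 1.0), 0) for _ in range(n_per_class)]
            + [(rng.gauss(mean1, 1.0), 1) for _ in range(n_per_class)])


original = sample(-2.0, 2.0, 500)   # stands in for the original distribution
proxy = sample(-1.7, 2.3, 500)      # stands in for a slightly shifted proxy
gap = abs(avg_robustness(original, 0.0) - avg_robustness(proxy, 0.0))
cwd = conditional_wasserstein(original, proxy)
print(gap, cwd)
```

In this toy setting the inequality holds because the distance to the decision boundary is 1-Lipschitz in the input for a fixed label, so matching samples within each class cannot change the average by more than the average matching cost.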
A natural question is what happens when we combine the original distribution with the proxy distribution. For example, one might have access to a generative model but they want to combine the samples from the generative model with some samples from the original distribution and train a robust classifier on the aggregated dataset. The following corollary answers this question.
Corollary 2.
Let and be two labeled distributions supported on with identical label distributions and let be the weighted mixture of and . Then for any classifier
Note that the value of is usually very small, as the number of samples from the proxy distribution is usually much higher than the number from the original distribution. This shows that including (or not including) the data from the original distribution should not have a large effect on the obtained bound on the distribution-shift penalty.
Finally, we show that our bound on the distribution-shift penalty is tight. The following theorem shows that one cannot obtain a bound on the distribution-shift penalty for a specific classifier that is always better than our bound.
Theorem 3 (Tightness of Theorem 1).
For any distribution supported on , any classifier , any homogeneous distance and any , there is a labeled distribution such that
Note that Theorem 3 only shows the tightness of Theorem 1 for a specific classifier. But there might exist a learning algorithm that incurs a much better bound in the expectation. Namely, there might exist such that for any two distributions and we have
We leave finding such an algorithm as an open question.
2.2 Improving adversarial robustness using proxy distributions
Now we focus on improving robustness on original training data distribution (). As Theorem 1 states, robust training on a close proxy distribution () can generalize to the training data distribution. Therefore, to improve robustness on , we propose to augment original training set with samples from . In particular, we use the following adversarial training formulation.
We approach it as an empirical risk minimization problem, where we estimate the loss on the original training data distribution using the available samples in its training set. Similarly, we estimate the loss on the proxy distribution using a set of synthetic images sampled from it. Since increasing the number of synthetic samples helps (Section 3.4), we use a significantly larger number of synthetic samples than the training samples available from D.
Next, we aim to also improve certified robustness on the training data distribution (). We use randomized smoothing (cohen2019certified) to certify robustness as it provides better performance and scalability than alternative techniques (Wong2018ScalingProve; Zhang2020crownIbp). Similar to our modification in adversarial training, we propose to train on samples from both proxy distribution () and original training data distribution ().
where , and is the KL-divergence.
We use the combination of both the cross-entropy loss over unperturbed images and the KL-divergence loss on images perturbed with Gaussian noise. This loss function, originally proposed for stability training (zheng2016stabilityTrain), performs better than using as (carmon2019unlabeled), where is the cross-entropy loss function.
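The per-example loss described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the function operates on already-computed probability vectors for the clean image and its Gaussian-noise-perturbed copy, the KL term is taken from the clean prediction to the noisy prediction (the direction is an assumption), and the weighting coefficient `beta` is a hypothetical name and value.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of a probability vector against an integer class label."""
    return -math.log(max(probs[label], 1e-12))


def kl_divergence(p, q):
    """KL(p || q) for two discrete probability vectors of equal length."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)


def stability_loss(probs_clean, probs_noisy, label, beta):
    """Cross-entropy on the clean prediction plus a weighted KL consistency term."""
    return cross_entropy(probs_clean, label) + beta * kl_divergence(probs_clean, probs_noisy)


# Hypothetical predictions for one three-class example.
clean = [0.7, 0.2, 0.1]
noisy = [0.5, 0.3, 0.2]
loss_consistent = stability_loss(clean, clean, label=0, beta=6.0)
loss_inconsistent = stability_loss(clean, noisy, label=0, beta=6.0)
```

When the noisy prediction matches the clean one the KL term vanishes and only the cross-entropy remains, so the extra term penalizes only sensitivity to the Gaussian perturbation.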
3 Experimental results
We first describe our experimental setup, followed by the choice of generative model for the proxy distribution. Next, we demonstrate the gains in both clean and robust accuracy from using samples from the proxy distribution. We also demonstrate that filtering poor-quality synthetic images further improves the gains. Finally, we explore different intriguing properties of adversarial training with an increasing number of training samples.
3.1 Common experimental setup across experiments
We use network architectures from the ResNet family, namely ResNet18 (he2016resnet) and variants of WideResNet (zagoruyko2016wrn). We primarily work with the CIFAR10 dataset and its two-class subset, i.e., an easier problem of binary classification between class1 (automobile) and class9 (truck). We refer to the latter as CIFAR2. We consider both and threat models. We use a perturbation budget () of and for the former and latter threat model, respectively. We perform adversarial training using a 10-step projected gradient descent attack (PGD10) and benchmark test set robustness with the much stronger AutoAttack (croce2020autoattack). (We don't report numbers with PGD attacks, as AutoAttack already captures them while also making it easier to compare with other works (croce2020robustbench).) When using proxy distributions in training, we set in favor of simplicity. We train each network using stochastic gradient descent and cosine learning rate with weight decay of and epochs. We use two key metrics to evaluate the performance of trained models: clean accuracy and robust accuracy. While the former refers to the accuracy on unmodified test set images, the latter refers to the accuracy on adversarial examples generated from test set images. We also measure certified robust accuracy when we use randomized smoothing. We use , , samples for selection, and samples for estimation in randomized smoothing, as described in cohen2019certified.
Model^{1}  FID  Inception score  On synthetic data  On CIFAR10  
Clean  Robust  Clean  Robust  
StyleGAN (karras2020styleganAda)  2.92  10.24  94.1  
DDPM (ho2020denoisingdiffusion)  79.2  85.4  73.6 
3.2 Choice of generative model for the proxy distribution
We work with two state-of-the-art generative models, namely StyleGAN (karras2020styleganAda) and DDPM (ho2020denoisingdiffusion). While the former is a generative adversarial network (GAN), the latter is a probabilistic model based on the diffusion process. We sample M labeled images from the conditional StyleGAN and another set of M images from the DDPM model. (The latter is a presampled set of images made available by nakkiran2021deepbootstrap.) We use a smaller number of samples from the latter as the cost of generating each sample from it is significantly higher than from the former. Note that these samples are drawn from an unconditional DDPM model, thus class labels aren't available. We label these images using the LaNet (wang2019lanet) network, which achieves % clean accuracy on the CIFAR10 dataset without using any additional data. We discuss the effect of different labeling strategies later in this section. To avoid any leakage of the test set into generated samples, both generative models are trained only on the training set of the CIFAR10 dataset.
StyleGAN vs DDPM. Our key question is: which generative model is more useful in improving robustness on the CIFAR10 dataset? We judge this by adversarially training on only synthetic samples (M in total) from the generative model and evaluating the performance on the test set of the CIFAR10 dataset. We use the PGD4 attack both in training and in evaluating robustness.
We note that existing metrics to evaluate the quality of synthetic samples, such as Fréchet Inception Distance (FID) (Heusel2017FID) and Inception Score (IS) (Salimans2016InceptionScore), fail to answer this question. While samples from the StyleGAN model achieve better FID and Inception score, adversarial training on them achieves lower performance on the CIFAR10 test set than training on samples from the DDPM model (Table 3). For example, training on samples from the latter model achieves % robust accuracy, while training on samples from the former achieves only % robust accuracy on the CIFAR10 test set. We hypothesize that the DDPM model generates more hard-to-learn instances, which help most in learning but are less photorealistic and thus score worse on FID and Inception score than StyleGAN images. Given their better generalization to the CIFAR10 dataset, we will use samples from the DDPM model to improve robustness in the next section.
3.3 Improving adversarial robustness using proxy distributions
Now we demonstrate that integrating synthetic samples, i.e., samples from the proxy distribution, can lead to state-of-the-art adversarial robustness. First we describe our experimental setup. Next we demonstrate the benefit of filtering poor-quality synthetic samples. Finally, using this filtered set of samples, we significantly improve the adversarial robustness of deep neural networks.
Setup. We use M synthetic images sampled from the unconditional DDPM model which are later labeled using the LaNet model (wang2019lanet). We combine real and synthetic images in a : ratio in each batch. We use the same number of training steps as in the baseline setup which doesn’t use synthetic samples. Therefore our computational cost, despite using millions of synthetic samples, is only of baseline and equal to earlier works that have incorporated extra samples (carmon2019unlabeled; gowal2020uncoveringadv).
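One way to combine real and synthetic images at a fixed ratio in each batch is sketched below. It is illustrative only: the actual ratio is not legible in this copy, so `real_fraction` is a hypothetical parameter with a placeholder value, and the datasets are stand-in lists of tagged indices rather than images.

```python
import random


def mixed_batch(real, synthetic, batch_size, real_fraction, rng):
    """Draw one batch with a fixed share of real examples and the rest synthetic."""
    n_real = round(batch_size * real_fraction)
    n_synthetic = batch_size - n_real
    batch = rng.sample(real, n_real) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(batch)  # avoid a fixed real-then-synthetic ordering inside the batch
    return batch


rng = random.Random(0)
real = [("real", i) for i in range(100)]             # stand-in for the original training set
synthetic = [("synthetic", i) for i in range(1000)]  # stand-in for proxy-distribution samples
batch = mixed_batch(real, synthetic, batch_size=16, real_fraction=0.5, rng=rng)
```

Because the batch size and the number of training steps stay fixed, the number of gradient updates does not grow with the size of the synthetic pool, which is why a larger pool need not increase the training cost.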
3.3.1 Filtering poor quality images from the synthetic data
We explore different classifiers to label the unlabeled synthetic data generated from the DDPM model. In particular, we use BiT (Kolesnikov2020BiT), SplitNet (zhao2020SplitNet), and LaNet (wang2019lanet) where they achieve %, %, and % clean accuracy, respectively, on the CIFAR10 dataset. We find that labels generated from different classifiers achieve slightly different downstream performance when used with adversarial training in the proposed approach (Table 2). We measure both clean accuracy () and robust accuracy with AutoAttack () when performing adversarial (adv.) training with both synthetic and original training images. We find that only up to % of synthetic images are labeled differently by these networks, which causes these differences. On manual inspection, we find that some of these images are of poor quality, i.e., images that aren’t photorealistic or wrongly labeled and remain hard to classify, even for a human labeler. Since filtering millions of images with a human in the loop is extremely costly, we use two deep neural networks, namely LaNet (wang2019lanet) and SplitNet (zhao2020SplitNet), to solve this task.
Network
BiT
SplitNet
LaNet
84.4  54.8
We avoid using labels from BiT as it requires transfer learning from the ImageNet (deng2009imagenet) dataset, whereas our goal is to avoid any dependency on extra real-world data. We discard an image when the predicted classes of the two networks do not match and it is classified with less than 90% confidence by both networks. While the former condition flags images that are potentially hard to classify, the latter ensures that we do not discard images where at least one network is highly confident in its prediction. We also try the % and % confidence thresholds but find that 90% gives the best downstream results. In this process, we discarded K images from M synthetic images. We display some of these discarded images in Figure 4(d) in the Appendix. Our experimental results show that the filtering step slightly increases the performance gains from synthetic images (Table 2). We use this filtered set of synthetic images in further experiments.
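The discard rule described above can be written as a small predicate. This is a sketch assuming each network returns a probability vector per image; the function name `keep_image` and the example probabilities are hypothetical, and only the 90% threshold comes from the text.

```python
def keep_image(probs_a, probs_b, threshold=0.9):
    """Return False only when the two networks disagree AND both are below the threshold."""
    pred_a = max(range(len(probs_a)), key=probs_a.__getitem__)
    pred_b = max(range(len(probs_b)), key=probs_b.__getitem__)
    disagree = pred_a != pred_b
    both_unsure = probs_a[pred_a] < threshold and probs_b[pred_b] < threshold
    return not (disagree and both_unsure)


# Networks agree: kept regardless of confidence.
agree = keep_image([0.6, 0.4], [0.55, 0.45])
# Networks disagree and neither is confident: discarded.
disagree_unsure = keep_image([0.6, 0.4], [0.3, 0.7])
# Networks disagree but one is highly confident: kept.
disagree_confident = keep_image([0.95, 0.05], [0.3, 0.7])
```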


3.3.2 Integrating the filtered set of synthetic samples in adversarial training
State-of-the-art robust accuracy. We observe that incorporating the filtered set of synthetic images in adversarial training leads to state-of-the-art robust accuracy. In the threat model and using a Wrn3410 network, robust accuracy improves to %, an improvement of % over previous work using the same network. Similarly, we observe an improvement of up to % for attacks. Note that clean accuracy also improves simultaneously. In the category of not using any extra real-world data, we achieve first rank on RobustBench (croce2020robustbench), a standardized benchmark for adversarial robustness, across both threat models, where we outperform the previous state-of-the-art from gowal2020uncoveringadv. Note that in comparison to gowal2020uncoveringadv, which uses a Wrn7016 network with M parameters, we use a smaller Wrn3410 network with only M parameters.
Proxy distribution offsets increase in network parameters. We find that gains from using synthetic samples are equivalent to the ones obtained by scaling network size by an order of magnitude (Table 3). For example, a ResNet18 network with synthetic data achieves higher robust accuracy () than a Wrn3420 trained without it, while having fewer parameters than the latter. A similar trend holds for Wrn3410 networks when compared with a much larger Wrn7016 network. This trend holds for both and threat models (Table 3(a), 3(b)).
Simultaneous improvement in clean accuracy. Due to the accuracy vs robustness tradeoff in adversarial training, improvement in robust accuracy often comes at the cost of clean accuracy. However, synthetic samples provide a boost in both clean and robust accuracy simultaneously. We observe an improvement in clean accuracy of up to % and % across and threat models, respectively.
3.3.3 Improving certified robustness using the filtered set of synthetic samples
We provide results on certified adversarial robustness in Figure 2 and Table 4. We first compare the performance of our proposed approach with the baseline technique, i.e., RST (carmon2019unlabeled). We achieve significantly higher certified robust accuracy than the baseline approach at all perturbation budgets for both ResNet18 and Wrn2810 network architectures. Additionally, the robustness of our approach decays at a smaller rate than the baseline. At perturbations of , equivalent to perturbation of , our approach achieves % higher certified robust accuracy than RST. We also significantly outperform other certified robustness techniques which aren't based on randomized smoothing (Zhang2020crownIbp; Wong2018ScalingProve; balunovic2019advPlusProve). Along with better certified robust accuracy, our approach also achieves better clean accuracy than previous approaches.
Synthetic images outperform real-world images. Using only synthetic samples, we also outperform RST when it uses an additional curated set of K real-world images (RST). While the latter achieves % certified robust accuracy, we improve it to %. We also achieve % higher clean accuracy than RST.
3.4 Investigating adversarial robustness with increasing number of training samples
Given the ability to sample an unlimited amount of synthetic images from a proxy distribution, now we investigate the performance of adversarial training with increasing number of training samples. We train the network only on synthetic images and measure its performance on another held out set of synthetic images. We also measure how much the performance generalizes on the CIFAR10 test set.
Setup. We robustly train a ResNet18 network on K to M synthetic images from the StyleGAN model, in both class and class setup. We opt for StyleGAN over the DDPM model as sampling images from the former is much faster, thus we were able to generate up to M synthetic images from it. Note that the cost of adversarial training increases almost linearly with the number of attack steps and training images. Thus, to achieve a manageable computational cost when training on millions of images, we opt for using only a 4-step PGD attack (PGD4) in training. Since robustness achieved with this considerably weak attack may not hold against a strong attack, such as AutoAttack, we opt for evaluating with the PGD4 attack itself. We also perform natural training, i.e., training on unmodified images, in some experiments. We test each network on a fixed set of K images from the StyleGAN and K images from the CIFAR10 test set.
Accuracy vs robustness tradeoff. We compare the clean accuracy achieved with both natural and adversarial training in Figure 4. Indeed with a very small number of samples, clean accuracy in adversarial training is traded to achieve robustness. This is evident from the gap between the clean accuracy of natural and adversarial training. However, with the increasing number of training samples, this gap keeps decreasing for both CIFAR2 and CIFAR10 datasets. Most interestingly, this tradeoff almost vanishes when we use a sufficiently high number of training samples for the CIFAR2 classification.
On sample complexity of adversarial training. We report both clean and robust accuracy with adversarial training in Figure 4. We find that both clean and robust accuracy continue to improve with the number of training samples. We also observe nontrivial generalization to test images from the CIFAR10 dataset, which also improves with the number of training samples. Both of these results suggest that even with a small capacity network, such as ResNet18, adversarial robustness can continue to benefit from an increase in the number of training samples.
4 Related work
4.1 Robust Machine learning
Adversarial examples. Adversarial examples, crafted using an adversarial attack, aim to evade the classifier at the test time (biggio2013evasion; Szegedy2013IntrigueAdv; biggio2018wildPatterns). While we focus on perturbation based adversarial examples (Goodfellow2014ExplainAdv), there exist multiple other threat models (wong2019wassersteinAdv; sehwag2019analyzing; hosseini2018semanticAdv; Laidlaw2019functionalAdv; kang2019unforseenAdv). Adversarial examples have been successful against almost all applications of machine learning (xu2020AdvImageGraphText; xu2016autoEvadePdfAdv; rahman2019mockingbird).
On use of threat model. We consider perturbation based adversarial examples since it's the most widely studied threat model (croce2020robustbench). Additionally, robustness achieved in this threat model has also led to further benefits in explainability (chalasani2020advExplain), image recognition (xie2020advImproveCV; salman2020Unadversarial), transfer learning (Salman2020AdvTransfer; utrera2021AdvTransfer), self-supervised learning (Ho2020SSLAdvExamples), and image synthesis (Santurkar2019AdvImageSynthesis).
Adversarial training. Adversarial training (Goodfellow2014ExplainAdv; madry2017towards) still remains the most effective defense against adversarial examples. The performance of the baseline technique (madry2017towards) is further improved using adaptive loss functions (zhang2019tradeoff), larger networks (gowal2020uncoveringadv), pretraining (hendrycks2019AdvPretrain), smooth activation functions (xie2020smooth), and weight perturbations (wu2020adversarial). The fundamental min-max robust optimization behind adversarial training has also been successful in robustifying other model architectures, such as decision trees (chen2019robustTree) and graph neural networks (feng2019AdvTraingraph), while also extending to other domains like reinforcement learning (Gleave2020AdvRL) and natural language processing (liu2020advTrainNLP).
Progress on improving adversarial robustness. Improving adversarial robustness in deep neural networks remains a challenging problem and its progress has been slow, as tracked by RobustBench (croce2020robustbench). Following the initial success of baseline adversarial training in madry2017towards, robust accuracy has improved by only % in the setting where the network architecture is fixed to Wrn2810 and no extra real-world images are used (from % in madry2017towards to % in wu2020adversarial). Additionally, there have been more than twenty failed attempts at improving it over the baseline (croce2020robustbench). To put our work in context, we further improve robust accuracy on the CIFAR10 dataset to %, without using additional techniques like smooth activations or weight perturbations.
Certified robustness. Here the goal is to certify the robustness of each example (Wong2018ScalingProve; cohen2019certified; Zhang2020crownIbp). Certified robustness provides a lower bound on the adversarial robustness against all attacks in the threat model. We use randomized smoothing (cohen2019certified) to achieve certified robustness, since it achieves much better performance than other methods in this domain (cohen2019certified; carmon2019unlabeled). Further improvements in randomized smoothing include integration with adversarial training (salman2019AdvSmooth), using subnetworks (sehwag2020Hydra), and using sample-specific smoothing (alfarra2020dataDependSmooth).
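To make the certification step concrete, the following is a minimal sketch of how randomized smoothing turns a probability estimate into a certified L2 radius. It assumes a Gaussian noise standard deviation sigma and a lower confidence bound on the probability that the base classifier returns the top class under that noise. The function name and parameter names are hypothetical, chosen for illustration, and the sampling procedure that produces the lower bound is not shown.

```python
from statistics import NormalDist


def certified_radius(p_a_lower: float, sigma: float) -> float:
    """Return the L2 radius certified for one example (illustrative sketch).

    p_a_lower: assumed lower confidence bound on the probability that the
               base classifier predicts the top class under Gaussian noise.
    sigma:     standard deviation of the Gaussian smoothing noise.
    """
    if not 0.0 < p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    # With no majority for the top class, nothing can be certified.
    if p_a_lower <= 0.5:
        return 0.0
    # Radius = sigma * inverse standard normal CDF of the lower bound.
    return sigma * NormalDist().inv_cdf(p_a_lower)
```

A larger lower bound or a larger sigma yields a larger certified radius, which is why the accuracy of the base classifier under noise matters for certified robust accuracy.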
4.2 Intriguing properties of adversarial training
Sample complexity of adversarial training. Multiple earlier works provide theoretical results studying the effect of the number of training samples on adversarial robustness (wei2019allLayerMargin; bhagoji2019lowerBounds; schmidt2018AdvMoreData; chen2020dataHurtsAdv; min2020datahurts). chen2020dataHurtsAdv and min2020datahurts further suggest that more data may hurt generalization in adversarial training. We explore this direction empirically by adversarially training deep neural networks on an increasing number of training images.
Tradeoff in adversarial training. There also exists an accuracy vs robustness tradeoff, where improvement in robust accuracy comes at the cost of clean accuracy in adversarial training (Raghunathan2020advtradeoff; balaji2019instAdaptAdvtrain; javanmard2020preciseTradeoff). We empirically demonstrate that an increase in the number of training samples can significantly reduce this tradeoff.
Transfer of adversarial robustness. This line of work focuses on the transfer of adversarial robustness, i.e., correct classification even under adversarial perturbations, when testing the model on different data distributions (shafahi2020AdvRobustTransfer; sehwag2019analyzing). Note that this is different from just achieving correct classification on unmodified images across different distributions (taori2020NatShiftRobust; hendrycks2019CommonCorrupt). Here we provide a theoretical analysis of the transfer of adversarial robustness between data distributions.
Using extra curated real-world data. Prior works (zhai2019advJustmoreData; carmon2019unlabeled; alayrac2019unsupadv; najafi2019advIncompleteData; deng2020AdvExtraOutDomain) have argued for using more training data in adversarial training and often resort to curating additional real-world samples. In contrast, we do not require additional real-world samples, as we model the proxy distribution from the limited training images available and sample additional synthetic images from this distribution.
4.3 Using generative models for proxy distributions
Generative models for proxy distributions. State-of-the-art generative models are capable of modeling the distribution of current large-scale image datasets. In particular, generative adversarial networks (GANs) have excelled at this task (goodfellow2014GAN; karras2020styleganAda; gui2020ganreview). Though GANs generate images with high fidelity, they often lack high diversity (ravuri2019CAS). However, samples from recently proposed diffusion-process-based models achieve both high diversity and fidelity (ho2020denoisingdiffusion; nichol2021improvedDdpm).
Evaluating quality of samples from a generative model. Fréchet Inception Distance (FID) (Heusel2017FID) and Inception Score (IS) (Heusel2017FID) are two key metrics for evaluating the quality of samples from generative models. While IS considers only generated images, FID considers both generated and real-world images. We find that a better FID or Inception Score may not translate to higher robustness with synthetic data. Another line of work evaluates generative models by training a classifier on their generated samples and testing it on real-world data (ravuri2019CAS; semeniuta2018accGanEval).
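For reference, the standard form of FID compares Gaussian fits to Inception features of real and generated images. The symbols below are assumed notation introduced here, not taken from the text above: mu_r and Sigma_r are the mean and covariance of features of real-world images, and mu_g and Sigma_g are those of generated images.

```latex
% Assumed notation: (\mu_r, \Sigma_r) for real images, (\mu_g, \Sigma_g) for generated images.
\[
\mathrm{FID}
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
\]
```

The formula depends on both feature distributions, which is the sense in which FID considers generated and real-world images together, while IS is computed from generated images alone.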
Using generative models to improve adversarial robustness. Earlier works have used generative models to learn the training data manifold and then use it to map input samples to the data manifold (samangouei2018defenseGan; jalal2017robustmanifold; Xu2018featsqueeze). However, most of these techniques are broken by adaptive attacks (athalye2018obfuscatedGrad; Tramer2020AdaptiveAttack). We instead use generative models to sample additional training samples, which leads to further improvement in adversarial robustness.
Additional applications of proxy distributions. While we use proxy distributions to improve robustness, synthetic samples from proxy distributions are also useful in privacy-preserving healthcare (jordon2018pateGan), autonomous driving (mayer2016synDriving), crowd counting (wang2019countCrowd), text recognition (jaderberg2014synTextReco; ye2018AnotherCaptcha), and natural language processing (marzoev2020unnaturalLangProcess; puri2020syntheticQA). Earlier works have also created dedicated synthetic datasets for some of these applications (gaidon2016virtualkitti; mayer2018viewSynthetic).
Comparison with rebuffi2021fixingaugmentation. A concurrent work by rebuffi2021fixingaugmentation also uses samples from generative models to improve adversarial robustness. While it broadly focuses on the effect of different data augmentations, with synthetic samples from a proxy distribution being one of them, our goal is to delve deeper into the integration of proxy distributions in adversarial training. This includes providing tight bounds on the transfer of adversarial robustness from the proxy distribution, followed by empirical analysis of the transfer of robustness, the accuracy vs robustness tradeoff, and the sample complexity of adversarial training. We also demonstrate an improvement in certified robust accuracy using proxy distributions. Despite these differences, the similar benefits of using generative models observed in two independent works further underscore the importance of this research direction.
5 Discussion and Future work
Using synthetic data has been a compelling solution in many applications, such as healthcare (jordon2018pateGan) and autonomous driving (mayer2016synDriving), since it makes collecting a large amount of data feasible. In a similar spirit, we use synthetic data to make deep neural networks more robust to adversarial attacks. However, synthetic data is sampled from a proxy distribution, i.e., a distribution only approximating the underlying data distribution of the training data. Thus, the first key question is whether synthetic data will help at all in improving robustness. We study the transfer of robustness from the proxy to the original training data distribution and provide a tight upper bound on it. This result validates the intuition that a proxy distribution that closely approximates the training data distribution should be able to improve robustness.
When selecting a generative model for the proxy distribution, we argue that an inflection point exists in the progress of generative models, after which they sufficiently capture the modes of the data and thus generate a set of samples that is both photorealistic and diverse. On the CIFAR-10 dataset, we find that both StyleGAN and DDPM models are past this inflection point, as samples from both improve performance. However, on the ImageNet (deng2009imagenet) dataset, we did not observe any performance improvement with the state-of-the-art BigGAN-deep (brock2018bigGandeep) model. This suggests that we still remain far from the aforementioned inflection point on the ImageNet dataset.
On the CIFAR-10 dataset, we improve adversarial robust accuracy by up to % and certified robust accuracy by % using synthetic samples from the DDPM model. Since the main goal of the paper was to show the effectiveness of synthetic data, we did not use additional techniques to boost robustness. Future work can incorporate these techniques, such as weight perturbations, weight averaging, longer training schedules, and larger networks, along with synthetic data to further boost robustness. While a major push in recent works is to use larger networks to improve robustness (gowal2020uncoveringadv), we show that similar gains can be obtained by expanding the training data to include synthetic samples. Motivated by these findings, we encourage the community to further innovate on the training data distribution itself to improve adversarial robustness.
References
Appendix A Appendix
A.1 Proofs
Theorem 1.
Let and be two distributions supported on . Then for any classifier
Proof.
We first sketch the steps of the proof informally and then formalize them. Consider to be the distribution that is the outcome of the following process: First sample from , then find the closest such that and output . By definition, the conditional Wasserstein distance between and is equal to . Now consider a similar distribution corresponding to , so that we have . By the triangle inequality for the Wasserstein distance, we have
(1) 
Also, by the way the distributions and are defined we have
(2) 
Roughly, the reason behind this is that all examples sampled from could be seen as adversarial examples for all elements of with the label . Since consists of optimal adversarial examples for , the optimal transport between and should be smaller than the optimal transport between and . Combining Inequalities 1 and 2, we have
(3) 
With a similar argument, because of symmetry of and , we can also prove
(4) 
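The chain of inequalities used in this proof can be written out as follows. This is a sketch in assumed notation, since the symbols here are chosen for illustration: D_1 and D_2 are the two labeled distributions, D_1' and D_2' are the corresponding distributions of closest misclassified points, h is the classifier, d is the distance, Rob_d is the average distance to the nearest adversarial example, and cwd_d is the conditional Wasserstein distance.

```latex
% Assumed notation (illustrative, not the original typesetting).
\begin{align*}
\mathrm{Rob}_d(h, D) &= \mathbb{E}_{(x,y)\sim D}\Big[\inf_{x' :\, h(x') \neq y} d(x, x')\Big], \\
\mathrm{cwd}_d(D_1, D_2) &= \mathbb{E}_{y}\big[\, W_d(D_1 \mid y,\; D_2 \mid y) \,\big].
\end{align*}

% Steps (1)-(4) of the argument, followed by the resulting bound.
\begin{align}
\mathrm{cwd}_d(D_1, D_2')
  &\le \mathrm{cwd}_d(D_1, D_2) + \mathrm{cwd}_d(D_2, D_2')
   = \mathrm{cwd}_d(D_1, D_2) + \mathrm{Rob}_d(h, D_2) \tag{1} \\
\mathrm{Rob}_d(h, D_1) = \mathrm{cwd}_d(D_1, D_1')
  &\le \mathrm{cwd}_d(D_1, D_2') \tag{2} \\
\mathrm{Rob}_d(h, D_1)
  &\le \mathrm{cwd}_d(D_1, D_2) + \mathrm{Rob}_d(h, D_2) \tag{3} \\
\mathrm{Rob}_d(h, D_2)
  &\le \mathrm{cwd}_d(D_1, D_2) + \mathrm{Rob}_d(h, D_1) \tag{4}
\end{align}

\[
\big|\, \mathrm{Rob}_d(h, D_1) - \mathrm{Rob}_d(h, D_2) \,\big| \le \mathrm{cwd}_d(D_1, D_2)
\]
```

Inequalities (3) and (4) together give the two-sided bound in the last line, which is the statement that the difference in robustness is at most the conditional Wasserstein distance.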
Corollary 2.
Let and be two labeled distributions supported on with identical label distributions and let be the weighted mixture of and . Then for any classifier
Proof.
We just need to show that Note that since the label distributions are equal, we have
. Now let be the optimal transport between and . Now construct a joint distribution . Notice that is a joint distribution with marginals equal to and . Therefore, is a transport between and , and we can calculate its cost. We have
Therefore, we have
∎
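The coupling argument in this proof can be sketched as follows. The notation is assumed: D_p = p D_1 + (1 - p) D_2 denotes the mixture, and which component carries the weight p is a choice made here for illustration. For each label y, gamma_y^* denotes an optimal transport between D_1 | y and D_2 | y, and (id, id)_# denotes the coupling that leaves every point in place.

```latex
% Assumed notation: D_p = p D_1 + (1 - p) D_2 with p in [0, 1].
% Identical label distributions give D_p | y = p (D_1 | y) + (1 - p) (D_2 | y).
\begin{align*}
\gamma_y
  &= p\,(\mathrm{id}, \mathrm{id})_{\#}(D_1 \mid y) + (1 - p)\,\gamma_y^{*}, \\
\mathbb{E}_{(x, x') \sim \gamma_y}\big[ d(x, x') \big]
  &= p \cdot 0 + (1 - p)\, W_d(D_1 \mid y,\; D_2 \mid y), \\
\mathrm{cwd}_d(D_1, D_p)
  &\le (1 - p)\, \mathrm{cwd}_d(D_1, D_2), \\
\big|\, \mathrm{Rob}_d(h, D_1) - \mathrm{Rob}_d(h, D_p) \,\big|
  &\le (1 - p)\, \mathrm{cwd}_d(D_1, D_2).
\end{align*}
```

The first marginal of gamma_y is D_1 | y and the second is D_p | y, so gamma_y is a valid transport whose cost upper bounds the Wasserstein distance. The last line then follows by applying Theorem 1 to D_1 and D_p.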
Theorem 3 (Tightness of Theorem 1).
For any distribution supported on , any classifier , any homogeneous distance and any , there is a labeled distribution such that
Proof.
For , let be the distribution of the following process: First sample from , then find the closest such that and output . By definition, the conditional Wasserstein distance between and is equal to . We also have .
Observe that for any classifier we have because if is an adversarial example for , then is also an adversarial example for with distance . On the other hand we have because any adversarial example for with distance is also an adversarial example for with distance at most and since is the optimal adversarial example for then must be at least . Therefore, we have Putting everything together and setting we have
∎
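The bound of Theorem 1 can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not part of the proofs: inputs are one-dimensional, the distance is the absolute difference, the classifier is a fixed threshold, and both samples contain the same number of points per class so that their label distributions are identical. All function names are hypothetical.

```python
import random


def robustness(samples, threshold):
    """Average distance to the nearest point that the classifier mislabels.

    The classifier predicts 1 if x >= threshold, else 0. A point that is
    already misclassified contributes a distance of 0.
    """
    total = 0.0
    for x, y in samples:
        pred = 1 if x >= threshold else 0
        total += abs(x - threshold) if pred == y else 0.0
    return total / len(samples)


def conditional_w1(a, b):
    """Conditional 1-Wasserstein distance between two labeled 1D samples.

    Assumes equal per-class counts, so the optimal transport within each
    class matches sorted points one to one.
    """
    total = 0.0
    for label in sorted({y for _, y in a}):
        xa = sorted(x for x, y in a if y == label)
        xb = sorted(x for x, y in b if y == label)
        assert len(xa) == len(xb), "per-class counts must match"
        total += sum(abs(u - v) for u, v in zip(xa, xb))
    return total / len(a)


def make_samples(n_per_class, shift, rng):
    """Two Gaussian classes centered at -1 and +1, both moved by `shift`."""
    data = []
    for label, mean in ((0, -1.0), (1, 1.0)):
        data += [(rng.gauss(mean + shift, 1.0), label) for _ in range(n_per_class)]
    return data


rng = random.Random(0)
original = make_samples(500, 0.0, rng)  # stands in for the original data
proxy = make_samples(500, 0.3, rng)     # stands in for a proxy distribution

gap = abs(robustness(original, 0.0) - robustness(proxy, 0.0))
bound = conditional_w1(original, proxy)
print(f"robustness gap = {gap:.4f}, conditional W1 = {bound:.4f}")
```

In this setting the per-example distance to the decision boundary is 1-Lipschitz in the input, so the gap in robustness between the two samples cannot exceed their conditional Wasserstein distance.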