Improving Adversarial Robustness Using Proxy Distributions

04/19/2021 · Vikash Sehwag et al., Princeton University

We focus on the use of proxy distributions, i.e., approximations of the underlying distribution of the training dataset, in both understanding and improving adversarial robustness in image classification. While additional training data helps in adversarial training, curating a very large number of real-world images is challenging. In contrast, proxy distributions enable us to sample a potentially unlimited number of images and improve adversarial robustness using these samples. We first ask the question: when does adversarial robustness benefit from incorporating additional samples from the proxy distribution in the training stage? We prove that the difference between the robustness of a classifier on the proxy distribution and on the original training data distribution is upper bounded by the conditional Wasserstein distance between them. Our result confirms the intuition that samples from a proxy distribution that closely approximates the training data distribution should be able to boost adversarial robustness. Motivated by this finding, we leverage samples from state-of-the-art generative models, which can closely approximate the training data distribution, to improve robustness. In particular, we improve robust accuracy by up to 6.1% and certified robust accuracy by 6.7% on the CIFAR-10 dataset. Since we can sample an unlimited number of images from a proxy distribution, this also allows us to investigate the effect of an increasing number of training samples on adversarial robustness. Here we provide the first large-scale empirical investigation of the accuracy vs. robustness trade-off and the sample complexity of adversarial training, by training deep neural networks on 2K to 10M images.


1 Introduction

Work in progress. Corresponding author: vvikash@princeton.edu

A short version is published at ICLR 2021 Workshop on Security and Safety in Machine Learning Systems.

To achieve robustness against adversarial examples, adversarial training remains the most effective technique (madry2017towards; zhang2019tradeoff; pang2021bagoftricks). However, it is largely used with the limited training samples available in current image datasets (such as only 50,000 training images in the CIFAR-10 dataset), where it suffers from multiple challenges, such as poor generalization on the test set (madry2017towards) and an accuracy vs. robustness trade-off (Raghunathan2020advtradeoff). Recent works have demonstrated that more training data can improve the performance of adversarial training (schmidt2018AdvMoreData; alayrac2019unsupadv; deng2020AdvExtraOutDomain). However, this approach runs into the challenge of curating a large set of real-world images for training. We circumvent this challenge using proxy distributions, i.e., distributions that closely approximate the underlying distribution of the original training dataset. Note that the proxy distributions are modeled using only the training images available in current datasets. Proxy distributions, when modeled with deep neural networks such as generative adversarial networks (GANs), allow us to generate a potentially unlimited number of high-fidelity images (goodfellow2014GAN; gui2020ganreview; ho2020denoisingdiffusion). Our key goal is to improve the adversarial robustness of deep neural networks by taking advantage of samples from such proxy distributions.

Note that proxy distributions are only an approximation of the underlying distribution of the training data. This begs the question of whether robustness achieved on samples from the proxy distribution will transfer to the original dataset. A more general version of this question is: how much of the robustness of a classifier transfers between data distributions? We prove that the difference in the robustness of a classifier on two distributions is tightly upper bounded by the conditional Wasserstein distance between them, where we refer to the Wasserstein distance conditioned on class labels as the conditional Wasserstein distance. This result confirms the intuition that samples from a proxy distribution that closely approximates the training data distribution should be able to improve robustness on the latter. We also empirically validate this claim by showing that adversarial training only on samples from the proxy distribution indeed achieves non-trivial robustness on samples from the original distribution.

To improve adversarial robustness on current datasets, we aim to leverage samples from a proxy distribution that closely approximates the underlying distribution of these datasets. Here we propose to simultaneously train on both the original training set and a set of additional images sampled from the proxy distribution. We use state-of-the-art generative models to model the proxy distribution, since they can closely approximate the data distribution from only a limited number of training samples (karras2020styleganAda; ho2020denoisingdiffusion; gui2020ganreview). Our experimental results demonstrate that the use of samples from the proxy distribution improves robust accuracy in both the ℓ∞ and ℓ2 threat models over baselines not using proxy distributions on the CIFAR-10 dataset. In the category of not using any extra real-world data, our models achieve the first rank on RobustBench (croce2020robustbench), a standardized benchmark for adversarial training. We also improve the certified robust accuracy (cohen2019certified) achieved with randomized smoothing on the CIFAR-10 dataset. Intriguingly, we achieve better certified robust accuracy with proxy distribution samples than with an additional curated set of 500K real-world images (carmon2019unlabeled).

Since access to proxy distributions gives us the ability to generate a large number of images, we can delve deeper into the relationship between adversarial robustness and the number of training samples at the scale of image classification with deep neural networks. Next, we work solely with a proxy distribution, where we train multiple networks with an increasing amount of training images (we use 2K to 10M training images) and test each on another fixed set of images from the same distribution. Our goal is to analyze multiple intriguing properties of adversarial training with respect to the number of training samples. We first analyze the sample complexity of adversarial training. Earlier works (wei2019allLayerMargin; bhagoji2019lowerBounds; schmidt2018AdvMoreData) have derived sample complexity bounds in simplified settings, such as classifying a mixture of Gaussian distributions. At the scale of deep neural networks for image classification, we empirically demonstrate that more data continues to improve adversarial robustness. Next, we analyze the accuracy vs. robustness trade-off: earlier works showed that robustness in adversarial training is achieved at a cost in the clean accuracy of deep neural networks (Raghunathan2020advtradeoff; javanmard2020preciseTradeoff). We demonstrate that increasing the number of training samples significantly reduces this trade-off for deep neural networks. To the best of our knowledge, this is the first empirical investigation into adversarial training of deep neural networks on millions of images at the scale of the CIFAR-10 dataset.

Contributions. We make the following key contributions.

  • We provide a theoretical understanding as well as an empirical validation of the transfer of adversarial robustness between data distributions. In particular, we provide a tight upper bound on the difference between the robustness of a classifier on two data distributions.

  • We propose to combine adversarial training with proxy distributions. By leveraging additional images sampled from proxy distributions, we improve robust accuracy in both the ℓ∞ and ℓ2 threat models, as well as certified robust accuracy, on the CIFAR-10 dataset.

  • We provide the first large-scale empirical investigation of the accuracy vs. robustness trade-off and the sample complexity of adversarial training by training deep neural networks on 2K to 10M images.

2 Integrating proxy distributions in adversarial training

We first provide a brief overview of adversarial training of deep neural networks. Before integrating proxy distributions into adversarial training, we delve deeper into the question of whether adversarial robustness transfers from the proxy distribution to the original training data distribution. After answering it affirmatively, we integrate proxy distributions into both adversarial training and randomized smoothing, where the latter is used to achieve certified adversarial robustness.

Notation. We represent the input space by $\mathcal{X}$ and the corresponding label space by $\mathcal{Y}$. We assume that the data is sampled from a joint distribution over $\mathcal{X} \times \mathcal{Y}$. We refer to the underlying distribution of current image datasets, such as CIFAR-10, as $D$. While $D$ is unknown, we assume that a limited set of training and test images is available from this distribution. We denote the proxy distribution by $\tilde{D}$. Unlike $D$, $\tilde{D}$ is known since it is modeled using a generative model. We denote the neural network for classification by $f_{\theta}$, parameterized by $\theta$, which maps input images to output probability vectors ($f_{\theta} \colon \mathcal{X} \to [0,1]^{|\mathcal{Y}|}$). We represent the cross-entropy loss function, which is used to train the classifier, as $\mathcal{L}(f_{\theta}(x), y)$, where $(x, y) \in \mathcal{X} \times \mathcal{Y}$. For a set $S$ sampled from a distribution $D$, we use $\hat{S}$ to denote the empirical distribution with respect to the set $S$.

Formulation of adversarial training. The key objective in adversarial training is to minimize the training loss on adversarial examples obtained with adversarial attacks, such as projected gradient descent (PGD) based attacks (madry2017towards), under the following formulation:
$$\min_{\theta} \; \mathbb{E}_{(x, y) \sim D} \Big[ \max_{\delta \in \Delta} \; \mathcal{L}\big(f_{\theta}(x + \delta), y\big) \Big],$$
where $\Delta$ is the threat model, which includes the magnitude of adversarial perturbations ($\epsilon$), the number of attack steps, and the size of each step.
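The following is a minimal PyTorch sketch of this min-max objective, assuming an ℓ∞ threat model; the attack hyperparameters (ε = 8/255, 10 steps, step size 2/255) are common defaults rather than necessarily the paper's exact configuration, and `model`, `loader`, and `optimizer` are placeholders.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, steps=10, step_size=2/255):
    """Inner maximization: multi-step projected gradient ascent in an l_inf ball."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
        delta = ((x + delta).clamp(0, 1) - x)     # keep perturbed images in [0, 1]
        delta = delta.detach().requires_grad_(True)
    return delta.detach()

def adv_train_epoch(model, loader, optimizer):
    """Outer minimization: one epoch of adversarial training."""
    model.train()
    for x, y in loader:
        delta = pgd_linf(model, x, y)
        loss = F.cross_entropy(model(x + delta), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```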

2.1 Understanding transfer of adversarial robustness between data distributions

As explained above, our goal is to use samples from a proxy distribution $\tilde{D}$ instead of the actual distribution $D$. We consider two distributions $D_1$ and $D_2$ supported on $\mathcal{X} \times \mathcal{Y}$, the space of labeled examples. We use a classifier $h$ to predict the class label of an input sample. We first define the average robustness of a classifier on a distribution, followed by the definition of conditional Wasserstein distance, a measure of the distance between two labeled distributions.

Definition 1 (Average Robustness).

We define the average robustness of a classifier $h$ on a distribution $D_1$ according to a distance metric $d$ as follows:
$$R_d(h, D_1) = \mathbb{E}_{(x, y) \sim D_1} \Big[ \inf_{x' \,:\, h(x') \neq y} d(x, x') \Big].$$

This definition is the expectation of the distance to the closest adversarial example for each sample. In contrast to robust accuracy, which measures whether an adversarial example exists within a given distance, we calculate the distance to the closest adversarial example. With this definition in hand, we can explain our goal using the following decomposition of average robustness. Let $\mathcal{A}$ be a learning algorithm (e.g., adversarial training). In robust learning from a proxy distribution, we are interested in bounding the average robustness on distribution $D$ of the classifier obtained by $\mathcal{A}$ when the training set is a set $\tilde{S}$ of labeled examples sampled from a proxy distribution $\tilde{D}$. In particular, we want to provide a lower bound on the quantity $\mathbb{E}_{\tilde{S} \sim \tilde{D}}\big[ R_d(\mathcal{A}(\tilde{S}), D) \big]$.

In order to understand this quantity better, suppose $h = \mathcal{A}(\tilde{S})$ is a classifier trained on a set $\tilde{S}$ that is sampled from $\tilde{D}$, using algorithm $\mathcal{A}$. We decompose $R_d(h, D)$ into three quantities:
$$R_d(h, D) = R_d(h, \hat{\tilde{S}}) \;+\; \underbrace{\big( R_d(h, \tilde{D}) - R_d(h, \hat{\tilde{S}}) \big)}_{\text{generalization penalty}} \;+\; \underbrace{\big( R_d(h, D) - R_d(h, \tilde{D}) \big)}_{\text{distribution-shift penalty}}.$$

Using this decomposition, by linearity of expectation and the triangle inequality, we can bound $R_d(h, D)$ from below by
$$R_d(h, D) \;\ge\; R_d(h, \hat{\tilde{S}}) \;-\; \big| R_d(h, \tilde{D}) - R_d(h, \hat{\tilde{S}}) \big| \;-\; \big| R_d(h, D) - R_d(h, \tilde{D}) \big|.$$

As the above inequality suggests, in order to bound the average robustness, we need to bound both the generalization penalty and the distribution-shift penalty. Indeed, if $D$ and $\tilde{D}$ were identical, we would be in the standard robust learning setting and would only have to deal with the generalization penalty. The generalization penalty has been studied before in multiple works (cullina2018pac; montasser2019vc; schmidt2018robustgeneralization), where distribution-independent bounds on the robust generalization of adversarial training on VC classes are provided. Hence, we mostly focus on bounding the distribution-shift penalty. Our goal is to provide a bound on the distribution-shift penalty that is independent of the classifier at hand and is related only to the properties of the distributions. With this goal, we define a notion of distance between two distributions.

Definition 2 (Conditional Wasserstein distance).

For two labeled distributions $D_1$ and $D_2$ supported on $\mathcal{X} \times \mathcal{Y}$, we define the conditional Wasserstein distance according to a distance metric $d$ as follows:
$$\mathsf{W}_d(D_1, D_2) = \mathbb{E}_{y \sim D_{1,\mathcal{Y}}} \Big[ \inf_{J \in \mathcal{J}(D_1 \mid y,\; D_2 \mid y)} \; \mathbb{E}_{(x_1, x_2) \sim J} \big[ d(x_1, x_2) \big] \Big],$$
where $\mathcal{J}(D_1 \mid y, D_2 \mid y)$ is the set of joint distributions whose marginals are identical to $D_1 \mid y$ and $D_2 \mid y$.

The conditional Wasserstein distance between two distributions is simply the expectation, over class labels, of the Wasserstein distance between the corresponding class-conditional distributions. Note that the Wasserstein distance is already used as a metric to measure the quality of generative models (Heusel2017FID). Now, we are ready to state our main theorem, which bounds the distribution-shift penalty for any learning algorithm based only on the conditional Wasserstein distance between the two distributions.
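As a rough illustration of this definition, the sketch below estimates a conditional Wasserstein distance from finite samples using random 1-D projections (a sliced-Wasserstein approximation on flattened features). The paper does not prescribe a particular estimator; the projection count, the subsampling to equal class sizes, and the assumption that both sample sets contain every class are choices made here for simplicity.

```python
import numpy as np

def w1_1d(u, v):
    """Exact 1-D Wasserstein-1 distance between two equal-size empirical samples."""
    return np.abs(np.sort(u) - np.sort(v)).mean()

def sliced_w1(x1, x2, n_proj=50, rng=None):
    """Average 1-D W1 over random projections; x1 and x2 must have equal length."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=x1.shape[1])
        theta /= np.linalg.norm(theta)
        total += w1_1d(x1 @ theta, x2 @ theta)
    return total / n_proj

def conditional_w1(x1, y1, x2, y2, n_proj=50):
    """Class-weighted average of per-class (sliced) W1 distances.
    Assumes both sample sets contain examples of every class in y1."""
    classes, counts = np.unique(y1, return_counts=True)
    weights = counts / counts.sum()
    dists = []
    for c in classes:
        a, b = x1[y1 == c], x2[y2 == c]
        n = min(len(a), len(b))            # subsample to equal sizes per class
        dists.append(sliced_w1(a[:n], b[:n], n_proj=n_proj))
    return float(np.dot(weights, dists))
```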

Theorem 1 (Bounding distribution-shift penalty).

Let $D_1$ and $D_2$ be two labeled distributions supported on $\mathcal{X} \times \mathcal{Y}$ with identical label distributions, i.e., $D_{1,\mathcal{Y}} = D_{2,\mathcal{Y}}$. Then for any classifier $h$,
$$\big| R_d(h, D_1) - R_d(h, D_2) \big| \;\le\; \mathsf{W}_d(D_1, D_2).$$

Theorem 1 shows how one can bound the distribution-shift penalty. Importantly, it gives a method of measuring the quality of a proxy distribution for robust training. Note that, unlike the generalization penalty, the distribution-shift penalty does not decrease when more data is provided. Hypothetically, we can have many samples from the proxy distribution, which reduces the effect of the generalization penalty and makes the distribution-shift penalty the dominant factor limiting robustness. Although there are interesting theoretical results showing the effect of sample complexity on the generalization penalty, these bounds do not apply to neural networks, so we take an empirical approach and show that the generalization penalty indeed approaches zero when more samples from the proxy distribution are provided (see Section 3.4). This highlights the importance of Theorem 1 in understanding the robustness obtained by training on proxy distributions. In other words, this theorem enables us to switch our attention from robust generalization to creating high-quality generative models whose underlying distribution is close to the original distribution.

A natural question is what happens when we combine the original distribution with the proxy distribution. For example, one might have access to a generative model but they want to combine the samples from the generative model with some samples from the original distribution and train a robust classifier on the aggregated dataset. The following corollary answers this question.

Corollary 2.

Let $D_1$ and $D_2$ be two labeled distributions supported on $\mathcal{X} \times \mathcal{Y}$ with identical label distributions, and let $D_{\alpha} = \alpha D_1 + (1 - \alpha) D_2$ be the weighted mixture of $D_1$ and $D_2$. Then for any classifier $h$,
$$\big| R_d(h, D_1) - R_d(h, D_{\alpha}) \big| \;\le\; (1 - \alpha)\, \mathsf{W}_d(D_1, D_2).$$

Note that the value of $\alpha$ is usually very small, as the amount of data from the proxy distribution is usually much larger than from the original distribution. This shows that including (or not including) the data from the original distribution should not have a large effect on the obtained bound on the distribution-shift penalty.

Finally, we show that our bound on the distribution-shift penalty is tight. The following theorem shows that one cannot obtain a bound on the distribution-shift penalty for a specific classifier that is always better than our bound.

Theorem 3 (Tightness of Theorem 1).

For any distribution $D_1$ supported on $\mathcal{X} \times \mathcal{Y}$, any classifier $h$, any homogeneous distance metric $d$, and any $\alpha \in [0, 1]$, there is a labeled distribution $D_2$ such that
$$\big| R_d(h, D_1) - R_d(h, D_2) \big| \;=\; \mathsf{W}_d(D_1, D_2) \;=\; \alpha \cdot R_d(h, D_1).$$

Note that Theorem 3 only shows the tightness of Theorem 1 for a specific classifier. There might still exist a learning algorithm that incurs a much better bound in expectation. Namely, there might exist an algorithm $\mathcal{A}$ whose expected distribution-shift penalty is strictly smaller than $\mathsf{W}_d(D_1, D_2)$ for any pair of distributions $D_1$ and $D_2$.

We leave finding such an algorithm as an open question.

2.2 Improving adversarial robustness using proxy distributions

Now we focus on improving robustness on the original training data distribution ($D$). As Theorem 1 states, robust training on a close proxy distribution ($\tilde{D}$) can generalize to the training data distribution. Therefore, to improve robustness on $D$, we propose to augment the original training set with samples from $\tilde{D}$. In particular, we use the following adversarial training formulation:
$$\min_{\theta} \; \Big[ (1 - \gamma)\, \mathbb{E}_{(x, y) \sim D} \max_{\delta \in \Delta} \mathcal{L}\big(f_{\theta}(x + \delta), y\big) \;+\; \gamma\, \mathbb{E}_{(x, y) \sim \tilde{D}} \max_{\delta \in \Delta} \mathcal{L}\big(f_{\theta}(x + \delta), y\big) \Big],$$
where $\gamma \in [0, 1]$ controls the relative weight of the proxy-distribution samples.

We approach this as an empirical risk minimization problem, where we estimate the loss on the original training data distribution using the available samples in its training set. Similarly, we estimate the loss on the proxy distribution using a set of synthetic images sampled from it. Since increasing the number of synthetic samples helps (Section 3.4), we use a significantly larger number of synthetic samples than the available training samples from $D$.
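A possible training step for this combined objective is sketched below, reusing the `pgd_linf` attack from the earlier sketch; the weight `gamma` and the way batches are drawn from the two sources are placeholders rather than the paper's exact settings.

```python
import torch.nn.functional as F

def mixed_adv_step(model, optimizer, real_batch, synth_batch, gamma=0.5):
    """One optimization step on a real batch (from D) and a synthetic batch (from the proxy)."""
    (xr, yr), (xs, ys) = real_batch, synth_batch
    dr = pgd_linf(model, xr, yr)           # inner maximization, as sketched in Section 2
    ds = pgd_linf(model, xs, ys)
    loss = (1 - gamma) * F.cross_entropy(model(xr + dr), yr) \
         + gamma * F.cross_entropy(model(xs + ds), ys)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```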

Next, we aim to also improve certified robustness on the training data distribution ($D$). We use randomized smoothing (cohen2019certified) to certify robustness, as it provides better performance and scalability than alternative techniques (Wong2018ScalingProve; Zhang2020crownIbp). Similar to our modification of adversarial training, we propose to train on samples from both the proxy distribution ($\tilde{D}$) and the original training data distribution ($D$), using the following objective:
$$\min_{\theta} \; \Big[ (1 - \gamma)\, \mathbb{E}_{(x, y) \sim D} \,\ell_{\mathrm{stab}}(x, y; \theta) \;+\; \gamma\, \mathbb{E}_{(x, y) \sim \tilde{D}} \,\ell_{\mathrm{stab}}(x, y; \theta) \Big], \qquad
\ell_{\mathrm{stab}}(x, y; \theta) = \mathcal{L}\big(f_{\theta}(x), y\big) + \lambda\, D_{\mathrm{KL}}\big(f_{\theta}(x) \,\|\, f_{\theta}(x + \delta)\big),$$
where $\delta \sim \mathcal{N}(0, \sigma^2 I)$ is Gaussian noise, $\lambda$ weights the consistency term, and $D_{\mathrm{KL}}$ is the KL-divergence.

We use the combination of the cross-entropy loss on unperturbed images and the KL-divergence loss on images perturbed with Gaussian noise. This loss function, originally proposed for stability training (zheng2016stabilityTrain), performs better than directly minimizing the cross-entropy loss $\mathcal{L}\big(f_{\theta}(x + \delta), y\big)$ on noisy images, as done in carmon2019unlabeled.
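A minimal PyTorch version of this stability-style loss is sketched below; the noise level `sigma`, the weight `lam`, and the direction of the KL term are illustrative choices, not necessarily those used in the paper.

```python
import torch
import torch.nn.functional as F

def stability_loss(model, x, y, sigma=0.25, lam=1.0):
    """Cross-entropy on the clean image plus KL(p_clean || p_noisy) on a Gaussian-noised copy."""
    noise = sigma * torch.randn_like(x)
    logits_clean = model(x)
    logits_noisy = model(x + noise)
    ce = F.cross_entropy(logits_clean, y)
    kl = F.kl_div(F.log_softmax(logits_noisy, dim=1),
                  F.softmax(logits_clean, dim=1),
                  reduction="batchmean")
    return ce + lam * kl
```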

Figure 1: Overview of our approach. Sketch of our proposed approach to improve adversarial robustness using proxy distributions. Using only training images from a given dataset, we first train a generative model and sample a large number of synthetic images from it. Next, we filter out poor-quality synthetic images. Finally, we robustly train a classifier on the combined set of filtered synthetic samples and original training samples.

3 Experimental results

We first describe our experimental setup, followed by the choice of generative model for the proxy distribution. Next, we demonstrate the gains in both clean and robust accuracy from using samples from the proxy distribution. We also demonstrate that filtering poor-quality synthetic images further improves the gains. In the end, we explore several intriguing properties of adversarial training with an increasing number of training samples.

3.1 Common experimental setup across experiments

We use network architectures from the ResNet family, namely ResNet-18 (he2016resnet) and variants of WideResNet (zagoruyko2016wrn). We primarily work with the CIFAR-10 dataset and its two-class subset, i.e., the easier problem of binary classification between class 1 (automobile) and class 9 (truck), which we refer to as CIFAR-2. We consider both the ℓ∞ and ℓ2 threat models, with the standard perturbation budgets (ε) of 8/255 and 0.5, respectively. We perform adversarial training using a 10-step projected gradient descent attack (PGD-10) and benchmark test set robustness with the much stronger AutoAttack (croce2020autoattack). (We don't report numbers with PGD attacks, as AutoAttack already captures them while also making it easier to compare with other works (croce2020robustbench).) When using proxy distributions in training, we use a fixed weighting between the losses on original and synthetic samples, in favor of simplicity. We train each network using stochastic gradient descent with a cosine learning rate schedule and weight decay. We use two key metrics to evaluate the performance of trained models: clean accuracy and robust accuracy. While the former refers to accuracy on unmodified test set images, the latter refers to accuracy on adversarial examples generated from test set images. We also measure certified robust accuracy when we use randomized smoothing, following the selection and estimation procedure described in cohen2019certified.
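For reference, the sketch below outlines prediction and certification with randomized smoothing in the spirit of cohen2019certified; the noise level, sample counts, and confidence level are placeholders, and the Clopper-Pearson bound is one standard way to obtain the required lower confidence bound.

```python
import torch
from scipy.stats import beta, norm

def certify(model, x, sigma=0.25, n0=100, n=1000, alpha=0.001, num_classes=10):
    """Return (predicted class, certified l2 radius), or (None, 0.0) to abstain."""
    def counts(num_samples):
        with torch.no_grad():
            noisy = x.unsqueeze(0) + sigma * torch.randn(num_samples, *x.shape)
            preds = model(noisy).argmax(dim=1)
        return torch.bincount(preds, minlength=num_classes)

    c_hat = counts(n0).argmax().item()              # selection step
    k = counts(n)[c_hat].item()                     # estimation step
    # Clopper-Pearson lower confidence bound on P(f(x + noise) = c_hat)
    p_lower = beta.ppf(alpha, k, n - k + 1) if k > 0 else 0.0
    if p_lower <= 0.5:
        return None, 0.0                            # abstain
    return c_hat, sigma * norm.ppf(p_lower)         # certified l2 radius
```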

Model | FID | Inception score | Synthetic data (Clean) | Synthetic data (Robust) | CIFAR-10 (Clean) | CIFAR-10 (Robust)
StyleGAN (karras2020styleganAda) | 2.92 | 10.24 | 94.1 | – | – | –
DDPM (ho2020denoisingdiffusion) | – | – | 79.2 | – | 85.4 | 73.6
Table 1: Comparing different generative models. Comparing the quality of synthetic samples generated from the StyleGAN and DDPM models. We report FID and Inception Score, which are standard metrics to evaluate the quality of synthetic samples (we quote the scores reported in the original publication of each method). We also measure generalization to a held-out test set of synthetic images and to the CIFAR-10 test set when a ResNet-18 network is adversarially trained only on synthetic images from the respective model.

3.2 Choice of generative model for the proxy distribution

We work with two state-of-the-art generative models, namely StyleGAN (karras2020styleganAda) and DDPM (ho2020denoisingdiffusion). While the former is a generative adversarial network (GAN), the latter is a probabilistic model based on a diffusion process. We sample 10M labeled images from the class-conditional StyleGAN and a smaller set of images from the DDPM model (a pre-sampled set of images made available by nakkiran2021deepbootstrap). We use fewer samples from the latter because the cost of generating each sample is significantly higher than for the former. Note that these samples are drawn from an unconditional DDPM model, so class labels aren't available. We label these images using the LaNet (wang2019lanet) network, which achieves high clean accuracy on the CIFAR-10 dataset without using any additional data. We discuss the effect of different labeling strategies later in this section. To avoid any leakage of the test set into the generated samples, both generative models are trained only on the training set of the CIFAR-10 dataset.

StyleGAN vs. DDPM. Our key question is: which generative model is more useful for improving robustness on the CIFAR-10 dataset? We judge this by adversarially training only on synthetic samples from each generative model and evaluating the performance on the test set of the CIFAR-10 dataset. We use a PGD-4 attack both in training and when evaluating robustness.

We note that existing metrics for evaluating the quality of synthetic samples, such as Fréchet Inception Distance (FID) (Heusel2017FID) and Inception Score (IS) (Salimans2016InceptionScore), fail to answer this question. While samples from the StyleGAN model achieve better FID and Inception scores, adversarial training on them achieves lower performance on the CIFAR-10 test set than training on samples from the DDPM model (Table 1). In particular, training on samples from the DDPM model achieves notably higher robust accuracy on the CIFAR-10 test set than training on StyleGAN samples. We hypothesize that the DDPM model generates more hard-to-learn instances, which help most in learning but are less photorealistic and thus score worse on FID and Inception Score than StyleGAN images. Given their better generalization to the CIFAR-10 dataset, we use samples from the DDPM model to improve robustness in the next section.
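For completeness, the following sketch computes FID from precomputed Inception features (feature extraction itself is omitted); it is purely illustrative, and the FID/IS values in Table 1 are quoted from the original publications rather than recomputed with this code.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    """FID between two sets of features, each of shape (num_samples, feature_dim)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):        # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```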

3.3 Improving adversarial robustness using proxy distributions

Now we demonstrate that integrating synthetic samples, i.e., samples from the proxy distribution, can lead to state-of-the-art adversarial robustness. First we describe our experimental setup. Next we demonstrate the benefit of filtering poor-quality synthetic samples. Finally, using this filtered set of samples, we significantly improve the adversarial robustness of deep neural networks.

Setup. We use synthetic images sampled from the unconditional DDPM model, which are labeled using the LaNet model (wang2019lanet). We combine real and synthetic images at a fixed ratio in each batch. We use the same number of training steps as in the baseline setup, which doesn't use synthetic samples. Therefore our computational cost, despite using millions of synthetic samples, is the same as the baseline and equal to earlier works that have incorporated extra samples (carmon2019unlabeled; gowal2020uncoveringadv).
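One simple way to form such mixed batches is to iterate the synthetic set and cycle through the (much smaller) original training set in parallel, as sketched below; the dataset objects and batch sizes are placeholders, and the per-batch ratio here is determined by the two batch sizes rather than the paper's exact choice.

```python
import torch
from torch.utils.data import DataLoader

real_loader  = DataLoader(cifar10_train, batch_size=64, shuffle=True)    # placeholder datasets
synth_loader = DataLoader(synthetic_set, batch_size=64, shuffle=True)

def mixed_batches():
    """Yield batches containing a fixed mix of real and synthetic images."""
    real_iter = iter(real_loader)
    for xs, ys in synth_loader:              # the large synthetic set drives the epoch
        try:
            xr, yr = next(real_iter)
        except StopIteration:                # restart the smaller real loader as needed
            real_iter = iter(real_loader)
            xr, yr = next(real_iter)
        yield torch.cat([xr, xs]), torch.cat([yr, ys])
```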

3.3.1 Filtering poor quality images from the synthetic data

We explore different classifiers to label the unlabeled synthetic data generated from the DDPM model. In particular, we use BiT (Kolesnikov2020BiT), SplitNet (zhao2020SplitNet), and LaNet (wang2019lanet), all of which achieve high clean accuracy on the CIFAR-10 dataset. We find that labels generated by different classifiers lead to slightly different downstream performance when used with adversarial training in the proposed approach (Table 2). We measure both clean accuracy and robust accuracy with AutoAttack when performing adversarial (adv.) training on both synthetic and original training images. We find that only a small fraction of synthetic images are labeled differently by these networks, which causes these differences. On manual inspection, we find that some of these images are of poor quality, i.e., images that aren't photorealistic or are wrongly labeled and remain hard to classify, even for a human labeler. Since filtering millions of images with a human in the loop is extremely costly, we use two deep neural networks, namely LaNet (wang2019lanet) and SplitNet (zhao2020SplitNet), for this task.

Labeling network | CIFAR-10 (Clean) | Adv. training: Clean | Adv. training: Auto
BiT | – | – | –
SplitNet | – | – | –
LaNet | – | – | –
LaNet (filtered) | – | 84.4 | 54.8
Table 2: Filtering synthetic images. Different labeling networks for the synthetic data achieve slightly different downstream performance. Filtering poor-quality images based on these labels further improves performance.

We avoid using labels from BiT as it requires transfer learning from the ImageNet (deng2009imagenet) dataset, whereas our goal is to avoid any dependency on extra real-world data. We discard an image when the predicted classes of the two networks don't match and the image is classified with less than 90% confidence by both networks. While the former condition flags images that are potentially hard to classify, the latter ensures that we do not discard images for which at least one network is highly confident in its prediction. We also tried other confidence thresholds but found that 90% gives the best downstream results. In this process, we discarded a small fraction of the synthetic images; we display some of these discarded images in Figure 5(d) in the Appendix. Our experimental results show that the filtering step slightly increases the performance gains from synthetic images (Table 2). We use this filtered set of synthetic images in further experiments.
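The filtering rule described above can be expressed as a small predicate over the two networks' softmax outputs, as in the sketch below; `probs_a` and `probs_b` are assumed to be arrays of per-image class probabilities from the two labeling networks.

```python
import numpy as np

def keep_mask(probs_a, probs_b, threshold=0.9):
    """True for images to keep; discard only if the two networks disagree and both are unsure."""
    pred_a, pred_b = probs_a.argmax(axis=1), probs_b.argmax(axis=1)
    conf_a, conf_b = probs_a.max(axis=1), probs_b.max(axis=1)
    agree = pred_a == pred_b
    confident = (conf_a >= threshold) | (conf_b >= threshold)
    return agree | confident
```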

Method Architecture Parameters (M) Clean Auto
zhang2019tradeoff ResNet-18 11.2 82.0 48.7
madry2017towards ResNet-50 23.5 87.0 49.0
zhang2019tradeoff Wrn-34-10 46.2 84.9 53.1
rice2020overfitadv Wrn-34-20 184.5 85.3 53.4
gowal2020uncoveringadv Wrn-70-16 266.8 85.3 57.2
Ours ResNet-18 11.2 84.4 54.8
Ours Wrn-34-10 46.2 85.8 59.1
(a) ℓ∞ threat model.
Method Architecture Parameters (M) Clean Auto
rice2020overfitadv ResNet-18 11.2 88.7 67.7
madry2017towards ResNet-50 23.5 90.8 69.2
wu2020adversarial Wrn-34-10 46.2 88.5 73.7
gowal2020uncoveringadv Wrn-70-16 266.8 90.9 74.5
Ours ResNet-18 11.2 89.5 73.4
Ours Wrn-34-10 46.2 90.3 76.1
(b) ℓ2 threat model.
Table 3: State-of-the-art adversarial robustness. Experimental results with adversarial training on the CIFAR-10 dataset for both the ℓ∞ and ℓ2 threat models. Using additional synthetic data brings a large gain in adversarial robustness across network architectures and threat models. Clean and Auto refer to clean accuracy and robust accuracy measured with AutoAttack, respectively.

3.3.2 Integrating the filtered set of synthetic samples in adversarial training

State-of-the-art robust accuracy. We observe that incorporating the filtered set of synthetic images in adversarial training leads to state-of-the-art robust accuracy. In the ℓ∞ threat model with a Wrn-34-10 network, it improves robust accuracy to 59.1%, a gain of 6.0% over the best previous result with the same network (Table 3(a)). We observe similar gains in the ℓ2 threat model (Table 3(b)). Note that clean accuracy also improves simultaneously. In the category of not using any extra real-world data, we achieve first rank on RobustBench (croce2020robustbench), a standardized benchmark for adversarial robustness, across both threat models, where we outperform the previous state-of-the-art from gowal2020uncoveringadv. Note that in comparison to gowal2020uncoveringadv, which uses a Wrn-70-16 network with 266.8M parameters, we use a smaller Wrn-34-10 network with only 46.2M parameters.

Proxy distribution offsets increase in network parameters. We find that the gains from using synthetic samples are equivalent to those obtained by scaling the network size by an order of magnitude (Table 3). For example, a ResNet-18 network trained with synthetic data achieves higher robust accuracy (54.8% vs. 53.4% in ℓ∞) than a Wrn-34-20 trained without it, while having roughly 16× fewer parameters. A similar trend holds for Wrn-34-10 networks when compared with the much larger Wrn-70-16 network. This trend holds for both the ℓ∞ and ℓ2 threat models (Tables 3(a) and 3(b)).

Simultaneous improvement in clean accuracy. Due to the accuracy vs. robustness trade-off in adversarial training, an improvement in robust accuracy often comes at the cost of clean accuracy. However, synthetic samples boost both clean and robust accuracy simultaneously: clean accuracy improves across both the ℓ∞ and ℓ2 threat models (Table 3).

Figure 2: Certified robustness. Certified robust accuracy of the baseline randomized smoothing technique, i.e., RST (carmon2019unlabeled), and our approach with two different models.

Method | Clean | Certified
Wong2018ScalingProve (single) | – | –
Wong2018ScalingProve (ensemble) | – | –
CROWN-IBP (Zhang2020crownIbp) | – | –
balunovic2019advPlusProve | – | –
RST (carmon2019unlabeled) | – | –
RST with extra real-world images (carmon2019unlabeled) | – | –
Ours (ResNet-18) | 81.1 | 62.5
Ours (Wrn-28-10) | 83.5 | 65.3
Table 4: Detailed comparison of certified robustness. Comparing both clean accuracy (Clean) and certified robust accuracy (Certified) of our approach with earlier approaches at a fixed ℓ2 perturbation budget.

3.3.3 Improving certified robustness using the filtered set of synthetic samples

We provide results on certified adversarial robustness in Figure 2 and Table 4. We first compare the performance of our proposed approach with the baseline technique, i.e., RST (carmon2019unlabeled). We achieve significantly higher certified robust accuracy than the baseline approach at all perturbation budgets, for both the ResNet-18 and Wrn-28-10 network architectures. Additionally, the robustness of our approach decays at a smaller rate than the baseline as the perturbation budget grows, and at a fixed ℓ2 perturbation our approach achieves notably higher certified robust accuracy than RST. We also significantly outperform other certified robustness techniques that aren't based on randomized smoothing (Zhang2020crownIbp; Wong2018ScalingProve; balunovic2019advPlusProve). Along with better certified robust accuracy, our approach simultaneously achieves better clean accuracy than previous approaches.

Synthetic images outperform real-world images. Using only synthetic samples, we also outperform RST when the latter uses an additional curated set of 500K real-world images: our approach achieves both higher certified robust accuracy and higher clean accuracy (Table 4).

Figure 3: Reduction in the accuracy vs. robustness trade-off (panels: (a) CIFAR-2, (b) CIFAR-10). Accuracy vs. robustness trade-off when training on an increasing amount of synthetic images from the StyleGAN model. The drop in clean accuracy from adversarial training decreases as the number of training samples increases.
Figure 4: Sample complexity of adversarial training (panels: (a) CIFAR-2, (b) CIFAR-10). Clean and robust accuracy on the held-out synthetic test set when training on an increasing number of synthetic samples from the StyleGAN model. The performance of adversarial training continues to benefit from an increase in the number of training samples. We also measure generalization to the CIFAR-10 dataset, which likewise improves with the number of training samples.

3.4 Investigating adversarial robustness with increasing number of training samples

Given the ability to sample an unlimited amount of synthetic images from a proxy distribution, we now investigate the performance of adversarial training with an increasing number of training samples. We train the network only on synthetic images and measure its performance on another held-out set of synthetic images. We also measure how well the performance generalizes to the CIFAR-10 test set.

Setup. We robustly train a ResNet-18 network on 2K to 10M synthetic images from the StyleGAN model, in both the 2-class (CIFAR-2) and 10-class (CIFAR-10) setups. We opt for StyleGAN over the DDPM model because sampling images from the former is much faster, which allowed us to generate up to 10M synthetic images. Note that the cost of adversarial training increases almost linearly with the number of attack steps and training images. Thus, to keep the computational cost manageable when training on millions of images, we use only a 4-step PGD attack (PGD-4) in training. Since robustness achieved with this considerably weaker attack may not hold against a strong attack such as AutoAttack, we also evaluate with the PGD-4 attack itself. We additionally perform natural training, i.e., training on unmodified images, in some experiments. We test each network on a fixed held-out set of images sampled from the StyleGAN model and on the 10K images of the CIFAR-10 test set.
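The overall experimental loop can be outlined as below, reusing `adv_train_epoch` from the earlier sketch; the data tensors, the `evaluate` helper, and the optimizer settings are hypothetical placeholders rather than the paper's exact configuration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

sizes = [2_000, 10_000, 100_000, 1_000_000, 10_000_000]   # 2K to 10M synthetic images
results = {}
for n in sizes:
    model = resnet18(num_classes=10)                       # stem may be adapted for 32x32 inputs
    subset = TensorDataset(synthetic_images[:n], synthetic_labels[:n])
    loader = DataLoader(subset, batch_size=128, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    for _ in range(100):                                   # epoch count is a placeholder
        adv_train_epoch(model, loader, optimizer)          # PGD-based step, sketched earlier
    results[n] = {"synthetic_test": evaluate(model, synthetic_test_loader),
                  "cifar10_test": evaluate(model, cifar10_test_loader)}
```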

Accuracy vs. robustness trade-off. We compare the clean accuracy achieved with natural and adversarial training in Figure 3. With a very small number of samples, clean accuracy in adversarial training is indeed traded off to achieve robustness, as is evident from the gap between the clean accuracy of natural and adversarial training. However, with an increasing number of training samples, this gap keeps decreasing for both the CIFAR-2 and CIFAR-10 setups. Most interestingly, the trade-off almost vanishes when we use a sufficiently large number of training samples for CIFAR-2 classification.

On the sample complexity of adversarial training. We report both clean and robust accuracy with adversarial training in Figure 4. We find that both clean and robust accuracy continue to improve with the number of training samples. We also observe non-trivial generalization to test images from the CIFAR-10 dataset, which likewise improves with the number of training samples. Both of these results suggest that even with a small-capacity network, such as ResNet-18, adversarial robustness can continue to benefit from an increase in the number of training samples.

4 Related work

4.1 Robust Machine learning

Adversarial examples. Adversarial examples, crafted using an adversarial attack, aim to evade the classifier at test time (biggio2013evasion; Szegedy2013IntrigueAdv; biggio2018wildPatterns). While we focus on ℓp perturbation-based adversarial examples (Goodfellow2014ExplainAdv), there exist multiple other threat models (wong2019wassersteinAdv; sehwag2019analyzing; hosseini2018semanticAdv; Laidlaw2019functionalAdv; kang2019unforseenAdv). Adversarial examples have been successful against almost all applications of machine learning (xu2020AdvImageGraphText; xu2016autoEvadePdfAdv; rahman2019mockingbird).

On the use of the ℓp threat model. We consider ℓp perturbation-based adversarial examples since this is the most widely studied threat model (croce2020robustbench). Additionally, robustness achieved in this threat model has also led to further benefits in explainability (chalasani2020advExplain), image recognition (xie2020advImproveCV; salman2020Unadversarial), transfer learning (Salman2020AdvTransfer; utrera2021AdvTransfer), self-supervised learning (Ho2020SSLAdvExamples), and image synthesis (Santurkar2019AdvImageSynthesis).

Adversarial training. Adversarial training (Goodfellow2014ExplainAdv; madry2017towards) still remains the most effective defense against adversarial examples. The performance of the baseline technique (madry2017towards) has been further improved using adaptive loss functions (zhang2019tradeoff), larger networks (gowal2020uncoveringadv), pretraining (hendrycks2019AdvPretrain), smooth activation functions (xie2020smooth), and weight perturbations (wu2020adversarial). The fundamental min-max robust optimization behind adversarial training has also been successful in robustifying other model architectures, such as decision trees (chen2019robustTree) and graph neural networks (feng2019AdvTraingraph), while also extending to other domains like reinforcement learning (Gleave2020AdvRL) and natural language processing (liu2020advTrainNLP).

Progress on improving adversarial robustness. Improving adversarial robustness in deep neural networks remains a challenging problem, and progress has been slow, as tracked by RobustBench (croce2020robustbench). Following the initial success of baseline adversarial training in madry2017towards, robust accuracy has improved only modestly in the setting where the network architecture is fixed to Wrn-28-10 and no extra real-world images are used (from madry2017towards to wu2020adversarial). Additionally, there have been more than twenty failed attempts at improving over the baseline (croce2020robustbench). To put our work in context, we further improve robust accuracy on the CIFAR-10 dataset (Table 3), without using additional techniques like smooth activations or weight perturbations.

Certified robustness. Here the goal is to certify the robustness of each example (Wong2018ScalingProve; cohen2019certified; Zhang2020crownIbp). Certified robustness provides a lower bound on the adversarial robustness against all attacks in the threat model. We use randomized smoothing (cohen2019certified) to achieve certified robustness, since it achieves much better performance than other methods in this domain (cohen2019certified; carmon2019unlabeled). Further improvements to randomized smoothing include integration with adversarial training (salman2019AdvSmooth), using sub-networks (sehwag2020Hydra), and using sample-specific smoothing (alfarra2020dataDependSmooth).

4.2 Intriguing properties of adversarial training

Sample complexity of adversarial training. Multiple earlier works provide theoretical results studying the effect of the number of training samples on adversarial robustness (wei2019allLayerMargin; bhagoji2019lowerBounds; schmidt2018AdvMoreData; chen2020dataHurtsAdv; min2020datahurts). chen2020dataHurtsAdv and min2020datahurts further suggest that more data may hurt generalization in adversarial training. We explore this direction empirically by adversarially training deep neural networks on an increasing number of training images.

Trade-off in adversarial training. There also exists an accuracy vs. robustness trade-off, where improvement in robust accuracy comes at the cost of clean accuracy in adversarial training (Raghunathan2020advtradeoff; balaji2019instAdaptAdvtrain; javanmard2020preciseTradeoff). We empirically demonstrate that an increase in the number of training samples can significantly reduce this trade-off.

Transfer of adversarial robustness. This line of work focuses on the transfer of adversarial robustness, i.e., correct classification even under adversarial perturbations, when testing the model on different data distributions (shafahi2020AdvRobustTransfer; sehwag2019analyzing). Note that this is different from just achieving correct classification on unmodified images across different distributions (taori2020NatShiftRobust; hendrycks2019CommonCorrupt). Here we provide a theoretical analysis of the transfer of adversarial robustness between data distributions.

Using extra curated real-world data. Prior works (zhai2019advJustmoreData; carmon2019unlabeled; alayrac2019unsupadv; najafi2019advIncompleteData; deng2020AdvExtraOutDomain) have argued for using more training data in adversarial training and often resort to curating additional real-world samples. In contrast, we don’t require additional real-world samples, as we model the proxy distribution from the limited training images available and sample additional synthetic images from this distribution.

4.3 Using generative models for proxy distributions

Generative models for proxy distributions. State-of-the-art generative models are capable of modeling the distribution of current large-scale image datasets. In particular, generative adversarial networks (GANs) have excelled at this task (goodfellow2014GAN; karras2020styleganAda; gui2020ganreview). Though GANs generate images with high fidelity, they often lack high diversity (ravuri2019CAS). However, samples from recently proposed diffusion process based models achieve both high diversity and fidelity (ho2020denoisingdiffusion; nichol2021improvedDdpm).

Evaluating quality of samples from a generative model. Fréchet Inception Distance (FID) (Heusel2017FID) and Inception Score (IS) (Salimans2016InceptionScore) are two key metrics for evaluating the quality of samples from generative models. While IS considers only generated images, FID considers both generated and real-world images. We find that a better FID or Inception Score may not translate to higher robustness when training on synthetic data. Another line of work evaluates generative models by training a classifier on generated samples and testing it on real-world data (ravuri2019CAS; semeniuta2018accGanEval).

Using generative models to improve adversarial robustness. Earlier works have used generative models to learn the training data manifold and then used it to project input samples onto this manifold (samangouei2018defenseGan; jalal2017robustmanifold; Xu2018featsqueeze). However, most of these techniques are broken by adaptive attacks (athalye2018obfuscatedGrad; Tramer2020AdaptiveAttack). In contrast, we use generative models to obtain additional training samples, which leads to further improvements in adversarial robustness.

Additional applications of proxy distributions. While we use proxy distributions to improve robustness, synthetic samples from proxy distributions are also useful in privacy-preserving healthcare (jordon2018pateGan), autonomous driving (mayer2016synDriving), crowd-counting (wang2019countCrowd), text recognition (jaderberg2014synTextReco; ye2018AnotherCaptcha), and natural language processing (marzoev2020unnaturalLangProcess; puri2020syntheticQA). Earlier works have also created dedicated synthetic datasets for some of these applications (gaidon2016virtualkitti; mayer2018viewSynthetic).

Comparison with rebuffi2021fixingaugmentation. A concurrent work by rebuffi2021fixingaugmentation also uses samples from generative models to improve adversarial robustness. While it focuses broadly on the effect of different data augmentations, with synthetic samples from a proxy distribution being one of them, our goal is to delve deeper into the integration of proxy distributions into adversarial training. This includes providing tight bounds on the transfer of adversarial robustness from the proxy distribution, followed by an empirical analysis of the transfer of robustness, the accuracy vs. robustness trade-off, and the sample complexity of adversarial training. We also demonstrate an improvement in certified robust accuracy using proxy distributions. Despite the differences, the similar benefits of using generative models observed in two independent works further ascertain the importance of this research direction.

5 Discussion and Future work

Using synthetic data has been a compelling solution in many applications, such as healthcare (jordon2018pateGan) and autonomous driving (mayer2016synDriving), since it makes collecting a large amount of data feasible. In a similar spirit, we use synthetic data to make deep neural networks more robust to adversarial attacks. However synthetic data is sampled from a proxy distribution, i.e., a distribution only approximating the underlying data distribution of the training data. Thus the first key question is whether synthetic data will help at all in improving robustness. We study the transfer of robustness from proxy to original training data distribution and provide a tight upper bound on it. This result validates the intuition that a proxy distribution, which closely approximates the training data distribution, should be able to improve robustness.

When selecting a generative model for the proxy distribution, we argue that an inflection point exists in their progress, past which generative models sufficiently capture the modes of the data and thus generate both photorealistic and diverse sets of samples. On the CIFAR-10 dataset, we find that both the StyleGAN and DDPM models are past this inflection point, as samples from both improve performance. However, on the ImageNet (deng2009imagenet) dataset, we didn't observe any performance improvement with the state-of-the-art BigGAN-deep (brock2018bigGandeep) model. This suggests that we still remain far from the aforementioned inflection point on the ImageNet dataset.

On the CIFAR-10 dataset, we improve both adversarial robust accuracy and certified robust accuracy using synthetic samples from the DDPM model. Since the main goal of this paper is to show the effectiveness of synthetic data, we didn't use additional techniques to boost robustness. Future work can incorporate such techniques, e.g., weight perturbations, weight averaging, longer training schedules, and larger networks, along with synthetic data to further boost robustness. While a major push in recent works is to use larger networks to improve robustness (gowal2020uncoveringadv), we show that similar gains can be obtained by expanding the training data to include synthetic samples. Motivated by these findings, we encourage the community to further innovate on the training data distribution itself to improve adversarial robustness.

References

Appendix A Appendix

a.1 Proofs

Theorem 1.

Let $D_1$ and $D_2$ be two labeled distributions supported on $\mathcal{X} \times \mathcal{Y}$ with identical label distributions. Then for any classifier $h$,
$$\big| R_d(h, D_1) - R_d(h, D_2) \big| \;\le\; \mathsf{W}_d(D_1, D_2).$$

Proof.

We first sketch the steps of the proof informally and then formalize them. Consider $D_1^*$ to be the distribution that is the outcome of the following process: first sample $(x, y)$ from $D_1$, then find the closest $x'$ such that $h(x') \neq y$ and output $(x', y)$. By definition, the conditional Wasserstein distance between $D_1$ and $D_1^*$ is equal to $R_d(h, D_1)$. Now consider a similar distribution $D_2^*$ corresponding to $D_2$, so that we have $\mathsf{W}_d(D_2, D_2^*) = R_d(h, D_2)$. By the triangle inequality for the Wasserstein distance we have

$$\mathsf{W}_d(D_1, D_2^*) \;\le\; \mathsf{W}_d(D_1, D_2) + \mathsf{W}_d(D_2, D_2^*). \tag{1}$$

Also, by the way the distributions $D_1^*$ and $D_2^*$ are defined, we have

$$\mathsf{W}_d(D_1, D_1^*) \;\le\; \mathsf{W}_d(D_1, D_2^*). \tag{2}$$

Roughly, the reason is that every example sampled from $D_2^*$ can be seen as an adversarial example for any element of $D_1$ with the same label $y$, and $D_1^*$ consists of the optimal adversarial examples for $D_1$; therefore, the optimal transport between $D_1$ and $D_1^*$ must be smaller than the optimal transport between $D_1$ and $D_2^*$. Now combining Inequalities 1 and 2 we have

$$R_d(h, D_1) \;\le\; \mathsf{W}_d(D_1, D_2) + R_d(h, D_2). \tag{3}$$

With a similar argument, by the symmetry of $D_1$ and $D_2$, we can also prove

$$R_d(h, D_2) \;\le\; \mathsf{W}_d(D_1, D_2) + R_d(h, D_1). \tag{4}$$

Combining Inequalities 3 and 4 finishes the proof. To formalize the steps above, one works with the optimal transport between the conditional distributions $D_1 \mid y$ and $D_2 \mid y$ for each class $y$ and takes the expectation over $y$.

Corollary 2.

Let $D_1$ and $D_2$ be two labeled distributions supported on $\mathcal{X} \times \mathcal{Y}$ with identical label distributions, and let $D_{\alpha} = \alpha D_1 + (1 - \alpha) D_2$ be the weighted mixture of $D_1$ and $D_2$. Then for any classifier $h$,
$$\big| R_d(h, D_1) - R_d(h, D_{\alpha}) \big| \;\le\; (1 - \alpha)\, \mathsf{W}_d(D_1, D_2).$$

Proof.

By Theorem 1, we just need to show that $\mathsf{W}_d(D_1, D_{\alpha}) \le (1 - \alpha)\, \mathsf{W}_d(D_1, D_2)$. Note that since the label distributions are equal, the label distribution of $D_{\alpha}$ matches that of $D_1$. Now let $J$ be the optimal transport between $D_1$ and $D_2$ (conditioned on each label), and construct the joint distribution $J_{\alpha} = \alpha I + (1 - \alpha) J$, where $I$ is the identity coupling that maps each point of $D_1$ to itself. Notice that $J_{\alpha}$ is a joint distribution with marginals equal to $D_1$ and $D_{\alpha}$. Therefore $J_{\alpha}$ is a transport between $D_1$ and $D_{\alpha}$, and we can calculate its cost: the identity component contributes zero, and the $J$ component contributes $(1 - \alpha)\, \mathsf{W}_d(D_1, D_2)$. Therefore, we have
$$\mathsf{W}_d(D_1, D_{\alpha}) \;\le\; (1 - \alpha)\, \mathsf{W}_d(D_1, D_2),$$
which completes the proof.

Theorem 3 (Tightness of Theorem 1).

For any distribution $D_1$ supported on $\mathcal{X} \times \mathcal{Y}$, any classifier $h$, any homogeneous distance metric $d$, and any $\alpha \in [0, 1]$, there is a labeled distribution $D_2$ such that
$$\big| R_d(h, D_1) - R_d(h, D_2) \big| \;=\; \mathsf{W}_d(D_1, D_2) \;=\; \alpha \cdot R_d(h, D_1).$$

Proof.

For $\alpha \in [0, 1]$, let $D_2$ be the distribution of the following process: first sample $(x, y)$ from $D_1$, then find the closest $x'$ such that $h(x') \neq y$ and output $(x + \alpha(x' - x),\, y)$. By definition, the conditional Wasserstein distance between $D_1$ and $D_2$ is equal to $\alpha \cdot R_d(h, D_1)$, since each sample is moved by $\alpha\, d(x, x')$, using homogeneity of $d$.

Observe that for any classifier $h$ we have $R_d(h, D_2) \le (1 - \alpha)\, R_d(h, D_1)$, because if $x'$ is an adversarial example for $x$, then $x'$ is also an adversarial example for $x + \alpha(x' - x)$ with distance $(1 - \alpha)\, d(x, x')$. On the other hand, we have $R_d(h, D_2) \ge (1 - \alpha)\, R_d(h, D_1)$, because any adversarial example at distance $\beta$ from $x + \alpha(x' - x)$ is also an adversarial example for $x$ with distance at most $\alpha\, d(x, x') + \beta$, and since $x'$ is the optimal adversarial example for $x$, this quantity must be at least $d(x, x')$, i.e., $\beta \ge (1 - \alpha)\, d(x, x')$. Therefore, we have $R_d(h, D_2) = (1 - \alpha)\, R_d(h, D_1)$. Putting everything together, we get
$$R_d(h, D_1) - R_d(h, D_2) \;=\; \alpha \cdot R_d(h, D_1) \;=\; \mathsf{W}_d(D_1, D_2),$$
which completes the proof.

Figure 5: Visualizing images from different sets (panels: (a) CIFAR-10, (b) StyleGAN, (c) DDPM, (d) discarded images from the DDPM model). Randomly selected images from the CIFAR-10 dataset and synthetic images from the StyleGAN (karras2020styleganAda) and DDPM (ho2020denoisingdiffusion) models. Panel (d) shows some of the images discarded from the set generated by the DDPM model. Rows in each panel correspond to the following classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.