# Robust GANs against Dishonest Adversaries

Robustness of deep learning models is a property that has recently gained increasing attention. We formally define a notion of robustness for generative adversarial models, and show that, perhaps surprisingly, the GAN in its original form is not robust. Indeed, the discriminator in GANs may be viewed as merely offering "teaching feedback". Our notion of robustness relies on a dishonest discriminator, or noisy, adversarial interference with its feedback. We explore, theoretically and empirically, the effect of model and training properties on this robustness. In particular, we show theoretical conditions for robustness that are supported by empirical evidence. We also test the effect of regularization. Our results suggest variations of GANs that are indeed more robust to noisy attacks, and have overall more stable training behavior.


## 1 Introduction

In recent years, the adversarial training of generative models (GANs) [12] has received much attention and found numerous applications, including realistic image generation, text to image synthesis, 3D object generation, and video prediction [29, 38, 37]. Despite their success, GANs have known training instabilities [11], and many recent works address this issue by either modifying the objective function, the network architecture or training dynamics [28, 25, 11, 30, 2, 15, 1, 18, 39, 22, 20, 8, 4].

In general, as we will also observe in this work, empirical instability may be closely related to notions of robustness. Robustness has emerged as an important (but often lacking) property of deep learning models, spurred by a somewhat different sense of “adversarial” – the identification of adversarial examples that fool in particular supervised, discriminative models [33, 13]. In settings like classification, robustness is often defined as the model being stable to small perturbations of the data points. “Adversarial training” in this sense leads to a minimax problem that is known to induce regularization and thereby generalization [5, 14, 10, 31, 23], and has recently gained attention.

Despite the terminology generative adversarial networks, the discriminator in GAN models may be viewed as taking on a cooperative, “teaching” role, and sharing useful feedback with the generator part: “the discriminator is more like a teacher instructing the generator in how to improve than an adversary. So far, this cooperative view has not led to any particular change in the development of the mathematics” [11]. Truly non-cooperative adversaries that lead to meaningful notions of robustness require new frameworks, and, to the best of our knowledge, have not yet been studied in the context of GANs.

In this paper, we start closing this gap. We introduce a new notion of adversarial robustness pertinent to GANs, and study model and training properties that affect this robustness. In particular, as opposed to discriminative models, in GANs, the generator receives information about the training data indirectly via the feedback signal of the discriminator. Hence, as opposed to perturbations of the data itself, we study perturbations of the discriminator signal. This may be viewed as a dishonest discriminator (instead of a mere “teacher”), or as an adversary interfering with the channel between discriminator and generator, as illustrated in Figure 1. This new viewpoint leads to questions we study in this paper, such as: Does this perspective lead to a consistent measure of robustness that we can study systematically? What properties of the GAN affect this robustness?

In short, we make the following contributions: (1) We formulate a new notion of robustness for GANs that arises from training with dishonest discriminators. (2) We show that, perhaps surprisingly, the original GAN is not robust in this sense, even with small perturbations. (3) We establish general conditions on the model (objective function) that induce robustness, both theoretically and empirically. (4) We empirically explore the “robustifying” effect of regularizing modifications during training at the example of clipping weights. Overall, our experiments confirm our theoretical results. The criteria developed in this work may serve as general guidelines, and open avenues for further research.

##### Further related work

Most existing work aims to improve GANs along three directions: formulations (objective functions), network architectures, and training dynamics. For example, one could derive other objective functions based on different divergence measures between probability distributions [25, 2]; include multiple discriminators or generators in the hope that multiple sources could lead to more stable behavior [8, 34, 24]; or try to devise algorithms that better regularize the training process [30, 15, 36, 20]. Our work mostly studies the first direction, but takes a different perspective. In addition, robustness in general has recently gained much attention in machine learning, notably due to the presence of adversarial examples [33, 13]. There is a growing body of work in understanding attack and defense mechanisms [9, 26, 6, 7, 17, 35]. Several works [13, 7, 21] that perturb the data during training have an underlying connection to robust optimization [3]. Distributionally robust optimization that perturbs the data-generating distribution has also been applied in adversarial training and obtained promising results [32].

## 2 Failure of GANs for a Simple Adversary

We begin with an illustrative example of our robust GAN framework: the perhaps surprising observation that the standard GAN can fail even with a rare perturbation of the discriminator signal. Recall the standard GAN’s objective function:

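$$\min_G \max_D V(G, D) = \min_G \max_D \left\{ \mathbb{E}_{x \sim P_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))] \right\}, \tag{2.1}$$

where $D$ is a discriminator that maps a sample $x$ to the probability that it comes from the true data distribution $P_{\mathrm{data}}$, and $G$ is the generator that maps a noise vector $z$, drawn from a simple distribution $P_z$, to the data space. This in turn defines an implicit distribution $P_G$ for $G$'s generated data.

It can be shown that, when fixing $G$, the optimal discriminator is given by $D^*_G(x) = \frac{P_{\mathrm{data}}(x)}{P_{\mathrm{data}}(x) + P_G(x)}$ [12]. Substituting $D^*_G$ into the objective, the generator then essentially seeks to minimize

$$V(D^*_G, G) = -\log(4) + 2 \times \mathrm{JSD}(P_{\mathrm{data}} \,\|\, P_G),$$

and the optimal generator would give $P_G = P_{\mathrm{data}}$.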
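
The identity between the value at the optimal discriminator and the Jensen-Shannon divergence can be checked numerically. The following is a minimal sketch for discrete distributions; the helper names (`kl`, `jsd`, `gan_value_at_optimal_d`) and the two example distributions are illustrative assumptions, not part of the original work.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists (inf if q lacks support)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def jsd(p, q):
    """Jensen-Shannon divergence with natural logarithms."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def gan_value_at_optimal_d(p_data, p_g):
    """V evaluated at D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    value = 0.0
    for pd, pg in zip(p_data, p_g):
        if pd + pg == 0:
            continue  # point carries no mass under either distribution
        d = pd / (pd + pg)
        if pd > 0:
            value += pd * math.log(d)
        if pg > 0:
            value += pg * math.log(1 - d)
    return value

# Hypothetical example distributions over three outcomes.
p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.2, 0.6]
lhs = gan_value_at_optimal_d(p_data, p_g)
rhs = -math.log(4) + 2 * jsd(p_data, p_g)
print(lhs, rhs)  # the two numbers should agree up to rounding
```

When `p_g` equals `p_data`, the same function returns $-\log(4)$, the minimum of the honest objective.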

Existing work implicitly assumes that during training, the discriminator is honest, i.e., it always gives “true feedback” to the generator about how likely it deems the generated sample to come from $P_{\mathrm{data}}$. In an adversarial setting, this assumption may no longer hold true, e.g., there could be channel contamination, adversarial interventions, or constraints such as privacy that prevent the discriminator from releasing precise feedback. Formally, we treat such dishonest feedback as applying a transformation $\Phi$ to the discriminator’s outputs so that the generator receives $\Phi(D(x))$. In other words, one may view $\Phi$ as an adversary that encodes how a contaminated channel or privacy constraint affects what the generator actually receives. Not knowing about the existence of such an adversary, the generator regards $\Phi(D(x))$ as honest feedback.

Ideally, we desire a robust GAN model: if $\Phi$ does not alter the original outputs too much, the model should still be able to learn the data distribution. Is this true for the GAN? Let us consider a simple flipping adversary defined as follows:

$$\Phi(D(x)) = \begin{cases} 1 - D(x) & \text{with probability } p, \\ D(x) & \text{otherwise.} \end{cases} \tag{2.2}$$

That is, with error probability $p$, the feedback $D(x)$ is flipped to be $1 - D(x)$. Note that we assume the signal from $G$ to $D$ to be always correct, i.e., $D$ always receives the original real and generated data. As such, the optimal discriminator is still $D^*_G$. With the flipping adversary, the minimization problem for $G$ then becomes

$$\min_G \; p \left\{ \mathbb{E}_{x \sim P_{\mathrm{data}}}[\log(1 - D^*_G(x))] + \mathbb{E}_{x \sim P_G}[\log D^*_G(x)] \right\} + (1 - p) \left\{ \mathbb{E}_{x \sim P_{\mathrm{data}}}[\log D^*_G(x)] + \mathbb{E}_{x \sim P_G}[\log(1 - D^*_G(x))] \right\}. \tag{2.3}$$

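
The flipping adversary (2.2) is simple to simulate. The sketch below is only an illustration: the function name `flipping_adversary`, the seed, and the example feedback value 0.9 are assumptions made for this example.

```python
import random

def flipping_adversary(d_out, p, rng=random):
    """Return 1 - d_out with probability p, and d_out otherwise (Eq. 2.2)."""
    return 1.0 - d_out if rng.random() < p else d_out

# Hypothetical usage: perturb a confident discriminator output many times.
rng = random.Random(0)
feedback = [flipping_adversary(0.9, p=0.1, rng=rng) for _ in range(10_000)]
flip_rate = sum(abs(f - 0.1) < 1e-9 for f in feedback) / len(feedback)
print(flip_rate)  # empirical fraction of flipped signals, close to p
```

The generator only ever sees the returned value and has no way of telling which signals were flipped.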
###### Lemma 1.

Given the optimal discriminator $D^*_G$, the minimization of the objective (2.3) becomes

$$\min_G \; 2 \times \mathrm{JSD}(P_{\mathrm{data}} \,\|\, P_G) - \log(4) - p \left\{ \mathrm{KL}(P_{\mathrm{data}} \,\|\, P_G) + \mathrm{KL}(P_G \,\|\, P_{\mathrm{data}}) \right\}. \tag{2.4}$$

Furthermore, for every $p > 0$, the optimal $P_G$ can be arbitrarily far from $P_{\mathrm{data}}$ in terms of KL-divergence.

To build an intuitive understanding, note that for any $p$, if $P_G = P_{\mathrm{data}}$, then all divergence terms in (2.4) vanish and the objective equals $-\log(4)$. However, because of the presence of the term $-p\{\mathrm{KL}(P_{\mathrm{data}} \,\|\, P_G) + \mathrm{KL}(P_G \,\|\, P_{\mathrm{data}})\}$, the objective function can be made much smaller. In fact, it can be $-\infty$. To see this, note that the Jensen-Shannon divergence is bounded, but the KL-divergence is not. Any $P_G$ that has a disjoint support from $P_{\mathrm{data}}$ drives the above term to $-\infty$. As a concrete example, a learned distribution that only concentrates on a particular mode of $P_{\mathrm{data}}$ with no coverage of the other modes is optimal, achieving $-\infty$ for the objective function in (2.4). Such a behavior is highly undesirable for the generator.
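
To make the effect of the extra KL terms concrete, the objective in (2.4) can be evaluated on small discrete distributions. This is a sketch under stated assumptions: the four-mode uniform `p_data`, the value $p = 0.05$, and all helper names are hypothetical choices for illustration.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions (inf if q lacks support)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def flipped_objective(p_data, p_g, p):
    """Objective (2.4): 2*JSD - log(4) - p*(KL(data||G) + KL(G||data))."""
    return (2 * jsd(p_data, p_g) - math.log(4)
            - p * (kl(p_data, p_g) + kl(p_g, p_data)))

p_data = [0.25, 0.25, 0.25, 0.25]         # four equally likely modes
eps = 1e-12
collapsed = [1 - 3 * eps, eps, eps, eps]  # nearly all mass on one mode
single_mode = [1.0, 0.0, 0.0, 0.0]        # no coverage of the other modes

at_truth = flipped_objective(p_data, p_data, p=0.05)
at_collapse = flipped_objective(p_data, collapsed, p=0.05)
at_single = flipped_objective(p_data, single_mode, p=0.05)
print(at_truth, at_collapse, at_single)
```

In this example the nearly collapsed generator already scores lower than the true distribution, and the fully collapsed one reaches negative infinity, which is the failure the lemma describes.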

Essentially, Lemma 1 establishes that even for very small perturbations, the GAN is not robust: even if the discriminator is almost always honest, it fails to extract sufficient information from the data. This observation raises the question whether it is possible to construct a robust GAN, and, if so, what types of dishonest adversaries other than the flipping adversary it can defend against. Next, we formally define families of adversarial attacks, and corresponding conditions for robustness.

## 3 GAN with Dishonest Adversaries

Motivated by the simple flipping adversary, we generalize the notion of an adversary to obtain a more powerful framework and notion of robustness.

We formalize dishonest discriminator feedback as post-processing the original outputs by an adversary. Since $D(x)$ is typically viewed as the probability of the data coming from the true distribution, the transformed feedback should still lie in the range $[0, 1]$. More explicitly, we refer to a differentiable transformation function $\psi: [0, 1] \to [0, 1]$ as a dishonest function. The flipping adversary (2.2) then consists of two dishonest functions: $\psi_1(\theta) = \theta$ and $\psi_2(\theta) = 1 - \theta$. We define an adversary to combine possibly several such transformations into a more complex attack:

###### Definition 2.

Let $\Psi$ be a set of dishonest functions. An adversary $\Phi$ with respect to $\Psi$ is a probability distribution over finitely many dishonest functions, $\psi_1, \ldots, \psi_L \in \Psi$. Denote by $p_i$ the probability the adversary assigns to the $i$th dishonest function $\psi_i$. Given input $D(x)$, the adversary outputs $\psi_i(D(x))$ with probability $p_i$.

Definition 2 generalizes the flipping adversary defined in (2.2) to more powerful and flexible attacks. In general, we do not expect to be able to construct GANs that are robust against all possible adversaries – imagine the adversary always replaces the signal with random noise. Instead, we will assume that most of the time the feedback is honest.

###### Definition 3 (Mostly Honest Adversary).

An adversary is mostly honest if the probability it assigns to the identity function $\psi(\theta) = \theta$ is larger than 0.5.
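
The two definitions above translate directly into a small data structure. The following is a minimal sketch, assuming a hypothetical `Adversary` class and module-level `identity` and `flip` functions; none of these names come from the original work.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

DishonestFn = Callable[[float], float]

def identity(theta: float) -> float:
    """Honest feedback: pass the discriminator output through unchanged."""
    return theta

def flip(theta: float) -> float:
    """Flipped feedback: report 1 - theta."""
    return 1.0 - theta

@dataclass
class Adversary:
    """A distribution over finitely many dishonest functions."""
    functions: List[DishonestFn]
    probs: List[float]

    def __post_init__(self) -> None:
        if len(self.functions) != len(self.probs):
            raise ValueError("need one probability per dishonest function")
        if min(self.probs) < 0 or abs(sum(self.probs) - 1.0) > 1e-9:
            raise ValueError("probabilities must be non-negative and sum to 1")

    def __call__(self, d_out: float, rng: random.Random) -> float:
        # Draw psi_i with probability p_i and apply it to the feedback.
        psi = rng.choices(self.functions, weights=self.probs, k=1)[0]
        return psi(d_out)

    def is_mostly_honest(self) -> bool:
        honest = sum(p for f, p in zip(self.functions, self.probs) if f is identity)
        return honest > 0.5

flipping = Adversary([identity, flip], [0.9, 0.1])
print(flipping.is_mostly_honest())
```

Identifying the honest component by object identity (`f is identity`) is a simplification of this sketch; a function that merely behaves like the identity would not be counted.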

We will refer to a GAN as robust if it learns the data distribution with a mostly honest adversary. Intuitively, it seems reasonable that in principle, a mostly honest adversary should retain sufficient signal to learn, if the learning is not too sensitive to perturbations. Yet, with this definition of robustness, the standard GAN is still not robust.

### 3.2 GAN Formulation with Adversaries

Before adding an adversary, we revisit the GAN objective in Eq. (2.1). The function $\log$ in the objective was suggested because of its nice information-theoretic interpretation. Recent variants such as the Wasserstein GAN [2] replace the $\log$ with other functions. In a unified framework, one could think of the GAN objectives as:

$$\max_D \; \mathbb{E}_{x \sim P_{\mathrm{data}}}[f_D(D(x))] + \mathbb{E}_{z \sim P_z}[f_D(1 - D(G(z)))], \tag{3.1}$$

$$\min_G \; \mathbb{E}_{x \sim P_{\mathrm{data}}}[f_G(D(x))] + \mathbb{E}_{z \sim P_z}[f_G(1 - D(G(z)))]. \tag{3.2}$$

We obtain the standard GAN with $f_D = f_G = \log$.

In the presence of an adversary $\Phi$, the generator receives transformed feedback $\Phi(D(\cdot))$ instead of $D(\cdot)$, as shown in Figure 1. Without knowing the existence of such an adversary, the generator treats $\Phi(D(\cdot))$ as if it is $D(\cdot)$. The generator’s objective then becomes

$$\min_G \; \mathbb{E}_{x \sim P_{\mathrm{data}}}[f_G(\Phi(D(x)))] + \mathbb{E}_{z \sim P_z}[f_G(1 - \Phi(D(G(z))))] \tag{3.3}$$

$$\equiv \min_G \; \sum_{i=1}^{L} p_i \left( \mathbb{E}_{x \sim P_{\mathrm{data}}}[f_G(\psi_i(D(x)))] + \mathbb{E}_{z \sim P_z}[f_G(1 - \psi_i(D(G(z))))] \right). \tag{3.4}$$

In summary, with an adversary $\Phi$, the discriminator’s objective (3.1) remains unchanged, because the adversary does not affect what the discriminator receives. In contrast, the generator’s objective now becomes Eq. (3.4).
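
For discrete distributions, the expectation in (3.4) can be written out exactly rather than sampled. The sketch below assumes the adversary is given as a list of `(psi, probability)` pairs and that the discriminator is a list of per-outcome outputs; these representations and all names are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Fn = Callable[[float], float]

def generator_objective(
    p_data: Sequence[float],
    p_g: Sequence[float],
    d: Sequence[float],
    f_g: Fn,
    adversary: List[Tuple[Fn, float]],
) -> float:
    """Eq. (3.4): sum_i p_i * (E_data[f_G(psi_i(D))] + E_G[f_G(1 - psi_i(D))])."""
    total = 0.0
    for psi, prob in adversary:
        term = 0.0
        for pd, pg, dx in zip(p_data, p_g, d):
            out = psi(dx)  # feedback after the dishonest function
            term += pd * f_g(out) + pg * f_g(1.0 - out)
        total += prob * term
    return total

# Hypothetical example: standard GAN (f_G = log) under the flipping adversary.
p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.2, 0.6]
d_star = [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]
flipping = [(lambda t: t, 0.9), (lambda t: 1.0 - t, 0.1)]
value = generator_objective(p_data, p_g, d_star, math.log, flipping)
print(value)
```

Setting the adversary to `[(lambda t: t, 1.0)]` recovers the honest objective (3.2).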

## 4 Robustness against Dishonest Adversaries

Next, we study conditions on $f_D$ and $f_G$ that imply robustness. When designing these functions, we need to keep three aspects in mind: (1) The objective (3.1) of the discriminator aims to maximize the probability that the discriminator can distinguish true data from fake data; (2) The objective (3.4) of the generator aims to minimize the probability that the discriminator recognizes the generated data as fake; (3) When there is no adversary or the adversary is mostly honest, the optimal generator should be able to learn the true data distribution, i.e., $P_G = P_{\mathrm{data}}$.

The first two criteria are easily met by choosing $f_D$ and $f_G$ to be monotonically increasing. For robustness, we already saw that the function $\log$ is not suitable. To construct robust models, we will need the class of functions:

$$\mathcal{H} \triangleq \left\{ f(\theta) : f(\theta) \text{ is strictly increasing and differentiable in } [0, 1], \text{ and } f(\theta) = -f(1 - \theta), \ \forall \theta \in [0, 1] \right\}.$$
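
Membership in this class is easy to probe numerically. The candidates below (`linear`, `tanh_like`) are hypothetical examples chosen for illustration and are not necessarily the functions used in the experiments; the check is a grid-based heuristic, not a proof, and it does not test differentiability.

```python
import math

def linear(theta):
    return theta - 0.5

def tanh_like(theta):
    # Steeper around the midpoint, flatter near 0 and 1.
    return math.tanh(4.0 * (theta - 0.5))

def log_feedback(theta):
    # The standard GAN choice, clamped so that theta = 0 stays finite.
    return math.log(max(theta, 1e-12))

def looks_like_member_of_H(f, n=1001, tol=1e-12):
    """Check strict monotonicity and f(theta) = -f(1 - theta) on a grid."""
    grid = [i / (n - 1) for i in range(n)]
    values = [f(t) for t in grid]
    increasing = all(a < b for a, b in zip(values, values[1:]))
    antisymmetric = all(abs(f(t) + f(1.0 - t)) <= tol for t in grid)
    return increasing and antisymmetric

for name, f in [("linear", linear), ("tanh_like", tanh_like), ("log", log_feedback)]:
    print(name, looks_like_member_of_H(f))
```

The logarithm is increasing but fails the antisymmetry condition, which is the property the robustness arguments rely on.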

The following lemma characterizes the optimal discriminator when $f_D \in \mathcal{H}$.

###### Lemma 4.

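Suppose that $f_D \in \mathcal{H}$. Then, for a fixed $G$, the optimal $D$ that maximizes the objective (3.1) is

$$D^*_G(x) = \begin{cases} 1 & \text{if } P_{\mathrm{data}}(x) > P_G(x), \\ 0 & \text{if } P_{\mathrm{data}}(x) < P_G(x), \\ \text{any value in } [0, 1] & \text{if } P_{\mathrm{data}}(x) = P_G(x). \end{cases} \tag{4.1}$$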
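
The shape of this optimal discriminator follows from antisymmetry: with $f_D(1 - \theta) = -f_D(\theta)$, the objective (3.1) reduces pointwise to $(P_{\mathrm{data}}(x) - P_G(x)) f_D(D(x))$, so an increasing $f_D$ is maximized by pushing $D(x)$ to 1 where the data has more mass and to 0 where the generator has more mass. The sketch below checks this by brute force on a small discrete example; the function names, the tie value 0.5, and the example distributions are assumptions for illustration.

```python
import itertools

def f_linear(theta):
    return theta - 0.5  # one hypothetical member of the class H

def optimal_d(p_data, p_g, tie_value=0.5):
    """Threshold discriminator: 1 where data has more mass, 0 where G has more."""
    out = []
    for pd, pg in zip(p_data, p_g):
        if pd > pg:
            out.append(1.0)
        elif pd < pg:
            out.append(0.0)
        else:
            out.append(tie_value)  # any value in [0, 1] gives the same objective
    return out

def discriminator_objective(p_data, p_g, d, f_d):
    """Eq. (3.1) for discrete distributions."""
    return sum(pd * f_d(dx) + pg * f_d(1.0 - dx)
               for pd, pg, dx in zip(p_data, p_g, d))

p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.3, 0.5]
d_star = optimal_d(p_data, p_g)
best = discriminator_objective(p_data, p_g, d_star, f_linear)

# Brute-force search over a coarse grid of discriminator outputs.
grid = [i / 10 for i in range(11)]
brute = max(discriminator_objective(p_data, p_g, d, f_linear)
            for d in itertools.product(grid, repeat=3))
print(d_star, best, brute)
```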

We will construct two GAN frameworks that are robust under mostly honest adversaries. The first framework retains the $\log$ function for the discriminator and chooses a function from $\mathcal{H}$ for the generator:

##### Framework 1:

$f_D = \log$ and $f_G \in \mathcal{H}$.

Theorem 5 establishes the robustness of Framework 1 under mild conditions on the dishonest functions $\psi$:

###### Theorem 5.

Suppose that $f_D = \log$ and $f_G \in \mathcal{H}$. Let $\Psi$ be the set of dishonest functions $\psi$ that satisfy either one of the following:

1. is non-decreasing in and ;

2. is non-increasing in , , and

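   $$\begin{cases} \psi(\theta) + \theta \ge 1, & \text{for } \theta \in (\tfrac{1}{2}, 1], \\ \psi(\theta) + \theta \le 1, & \text{for } \theta \in [0, \tfrac{1}{2}). \end{cases}$$

Then, for any mostly honest adversary $\Phi$ with respect to this set $\Psi$, given the optimal discriminator $D^*_G$, the optimal generator satisfies $P_G = P_{\mathrm{data}}$.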

Unlike Framework 1, the second framework we present uses functions from $\mathcal{H}$ for both the discriminator and the generator. It turns out that such a choice leads to a stronger robustness guarantee against mostly honest adversaries, without conditions on the dishonest functions.

##### Framework 2:

$f_D \in \mathcal{H}$ and $f_G \in \mathcal{H}$.

###### Theorem 6.

Suppose that $f_D \in \mathcal{H}$ and $f_G \in \mathcal{H}$. Let $\Psi$ be the set of all possible dishonest functions. Then, for any mostly honest adversary $\Phi$ with respect to $\Psi$, given the optimal discriminator $D^*_G$, the optimal generator satisfies $P_G = P_{\mathrm{data}}$.

Theorems 5 and 6 show that the flipping adversary is just a special case that our GAN frameworks can defend against. In fact, the theorems provide a stronger robustness guarantee that holds across a variety of mostly honest adversaries.
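
The guarantee for Framework 2 can be illustrated numerically, though a finite check is of course no substitute for the proof. The sketch below assumes a linear $f_D = f_G$, a threshold-type optimal discriminator for functions in $\mathcal{H}$, and a hypothetical mostly honest adversary mixing four dishonest functions; it evaluates the generator objective (3.4) for many randomly drawn generator distributions.

```python
import math
import random

def f(theta):
    return theta - 0.5  # hypothetical choice of f_D = f_G in the class H

# Hypothetical mostly honest adversary: identity has probability 0.6 > 0.5.
ADVERSARY = [
    (lambda t: t, 0.6),
    (lambda t: 1.0 - t, 0.2),
    (math.sqrt, 0.1),
    (lambda t: t * t, 0.1),
]

def optimal_d(p_data, p_g):
    """Threshold discriminator for f_D in H (ties resolved to 0.5)."""
    return [1.0 if pd > pg else 0.0 if pd < pg else 0.5
            for pd, pg in zip(p_data, p_g)]

def generator_objective(p_data, p_g):
    """Eq. (3.4) evaluated at the optimal discriminator for this p_g."""
    d = optimal_d(p_data, p_g)
    total = 0.0
    for psi, prob in ADVERSARY:
        total += prob * sum(pd * f(psi(dx)) + pg * f(1.0 - psi(dx))
                            for pd, pg, dx in zip(p_data, p_g, d))
    return total

def random_distribution(k, rng):
    weights = [rng.random() for _ in range(k)]
    s = sum(weights)
    return [w / s for w in weights]

rng = random.Random(0)
p_data = [0.4, 0.3, 0.2, 0.1]
at_truth = generator_objective(p_data, p_data)
others = [generator_objective(p_data, random_distribution(4, rng))
          for _ in range(1000)]
print(at_truth, min(others))
```

In this example the objective is zero at $P_G = P_{\mathrm{data}}$ and strictly positive for every sampled alternative, consistent with the true distribution being the minimizer.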

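Proof of Theorems 5 and 6 (Sketch): Since the adversary is mostly honest, the probability it assigns to the identity function $\psi(\theta) = \theta$ is larger than 0.5. Without loss of generality, denote this function by $\psi_1$, i.e., $\psi_1(\theta) = \theta$. Both frameworks use a function from the class $\mathcal{H}$ for the generator. By the properties of $\mathcal{H}$, in particular $f_G(1 - \theta) = -f_G(\theta)$, one can show, by rearranging the terms, that the generator’s objective (3.4) can be rewritten as:

$$\min_G \; V_1 + V_2,$$

where

$$V_1 \triangleq \left( p_1 - \sum_{i=2}^{L} p_i \right) \left( \mathbb{E}_{x \sim P_{\mathrm{data}}}[f_G(D(x))] - \mathbb{E}_{x \sim P_G}[f_G(D(x))] \right),$$

$$V_2 \triangleq \sum_{i=2}^{L} p_i \left( \mathbb{E}_{x \sim P_{\mathrm{data}}}[f_G(D(x)) + f_G(\psi_i(D(x)))] - \mathbb{E}_{x \sim P_G}[f_G(D(x)) + f_G(\psi_i(D(x)))] \right).$$

It is immediate that if $P_G = P_{\mathrm{data}}$, then $V_1 + V_2 = 0$. Now, if we can show that $V_1 + V_2$ is greater than $0$ for any $P_G \neq P_{\mathrm{data}}$, the two theorems will be established. This amounts to showing that for both frameworks, the following two claims hold:

1. If $P_G \neq P_{\mathrm{data}}$, then $V_1 > 0$.

2. If $P_G \neq P_{\mathrm{data}}$, then $V_2 \geq 0$.

The two claims can be proved by considering the different optimal discriminator $D^*_G$ for each framework. Note that the fact that the adversary is mostly honest guarantees that the term $p_1 - \sum_{i=2}^{L} p_i$ in $V_1$ is positive. Hence, to establish the first claim, we only need to show that the second term in $V_1$ is positive if $P_G \neq P_{\mathrm{data}}$. For the second claim, the terms in $V_2$ involve different dishonest functions $\psi_i$. This is why Theorem 5 requires some mild conditions on $\psi$. Under Framework 1, one can show that the second claim holds if those conditions are satisfied. However, for Framework 2, the second claim can be proved without additional conditions. See Appendix A for the details.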

###### Remark 7.

The second framework is significantly stronger than the first. In particular, Framework 2 requires no additional conditions on the dishonest functions. As long as the adversary is mostly honest, robustness is guaranteed. Hence, for robust GANs, it is desirable to use functions in $\mathcal{H}$ instead of the logarithm.

###### Remark 8.

With , the resulting model not only belongs to Framework 2, it also corresponds to the well-known Wasserstein GAN. This observation may give further support for its empirical performance, besides the interpretation of using Wasserstein distance.

### 4.1 Regularization and other Factors

Apart from the objective function, factors such as the training algorithm and data influence the outcome of learning. For example, clipping large weights or, more generally, regularizing the Lipschitz constant of the discriminator [27, 36, 15, 22], appears to stabilize the overall training process. In particular, any modification that results in more averaging and slower adoption of information from single training data points would be expected to make the GAN more robust. As a representative of such regularizing mechanisms, in our experiments, we also test the effect of clipping large weights during training, and its interplay with the objective function. (Indeed, clipping at smaller values leads to faster learning, see Figure 13 in the appendix.)
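
Weight clipping itself is a one-line projection applied to the discriminator's parameters after each update. The following is a framework-free sketch on nested lists; the function name `clip_weights`, the example weights, and the threshold are hypothetical, and a real implementation would apply the same clamp to each parameter tensor of the network.

```python
def clip_weights(weights, threshold):
    """Clamp every entry of a weight matrix to [-threshold, threshold]."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return [[max(-threshold, min(threshold, w)) for w in row] for row in weights]

# Hypothetical discriminator layer after a gradient step.
layer = [[0.30, -0.02], [-0.40, 0.01]]
clipped = clip_weights(layer, threshold=0.05)
print(clipped)
```

Entries already inside the interval are left unchanged, so smaller thresholds restrict the discriminator's capacity more strongly.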

## 5 Empirical Results

To probe our theoretical results in practice, we empirically evaluate the robustness of the models in Section 4. In addition, we explore how one representative of regularization, clipping large weights, affects robustness. Following the convention in most of the GAN literature, we use a zero-sum game formulation, i.e., $f_D = f_G$ (Framework 2). Figure 3 displays the functions in $\mathcal{H}$ that we investigate. These functions are chosen to have different gradients in different locations, e.g., constant, relatively smaller, or larger gradients around the midpoint. We will refer to the GANs with $f_D = f_G \in \mathcal{H}$ as robust GANs, and to the standard GAN with $f_D = f_G = \log$ simply as GAN. Details and additional plots for all experiments may be found in the appendix.

### 5.1 Synthetic Data: Mixture of Gaussians

We begin with the common illustrative toy problem of a mixture of eight two-dimensional Gaussians evenly spaced on a circle. Both $G$ and $D$ are fully connected networks. We train $G$ and $D$ alternately, and clip the weights of $D$ to a fixed maximum absolute value. Here, we apply the simple flipping adversary (2.2) with different error probabilities $p$. Figure 3 shows some typical results for one of the robust models and the GAN at two values of $p$. Indeed, as opposed to the GAN, the robust GAN reliably learns all modes, even with a fairly high $p$. The figure illustrates an additional effect: with higher noise $p$, learning indeed becomes more challenging, and the generator needs more iterations to learn. Figure 13 in the appendix confirms that this is generally the case.
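
A dataset of this kind can be generated in a few lines. This is a minimal sketch: the radius, the standard deviation, and the function name `sample_ring_mixture` are assumptions for illustration, since the exact values used in the experiments are not stated here.

```python
import math
import random

def sample_ring_mixture(n, rng, num_modes=8, radius=2.0, std=0.02):
    """Draw n points from a mixture of isotropic Gaussians spaced evenly on a circle."""
    centers = [
        (radius * math.cos(2 * math.pi * k / num_modes),
         radius * math.sin(2 * math.pi * k / num_modes))
        for k in range(num_modes)
    ]
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centers)  # pick a mode uniformly at random
        points.append((rng.gauss(cx, std), rng.gauss(cy, std)))
    return centers, points

rng = random.Random(0)
centers, points = sample_ring_mixture(1000, rng)
print(len(centers), len(points))
```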

##### Clipping and its interaction with the model.

For any adversarial network to reasonably learn a distribution, both the model (i.e., objective functions) and the learning algorithm are crucial. Hence, we next include effects of clipping large weights as one example. Figure 4 shows the success rate for different clipping thresholds, averaged over 10 runs. Here, a success is defined as correctly learning all 8 modes (the average number of modes is shown in the appendix). To better visualize the effect of clipping, $D$ is intentionally made more powerful by having significantly more hidden neurons (4x more hidden neurons for each layer).
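
One plausible way to operationalize "learning all 8 modes" is to assign each generated point to its nearest mode center and require a minimum number of nearby samples per mode. The criterion below is an assumption made for illustration (including the distance cutoff `max_dist` and the count `min_hits`), not the definition used in the experiments.

```python
import math

def ring_centers(num_modes=8, radius=2.0):
    return [(radius * math.cos(2 * math.pi * k / num_modes),
             radius * math.sin(2 * math.pi * k / num_modes))
            for k in range(num_modes)]

def covered_modes(points, centers, max_dist=0.2, min_hits=10):
    """Count modes that receive at least min_hits samples within max_dist."""
    hits = [0] * len(centers)
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        k = min(range(len(centers)), key=dists.__getitem__)  # nearest mode
        if dists[k] <= max_dist:
            hits[k] += 1
    return sum(h >= min_hits for h in hits)

def is_success(points, centers, **kwargs):
    return covered_modes(points, centers, **kwargs) == len(centers)

centers = ring_centers()
all_modes = [c for c in centers for _ in range(10)]  # idealized generator
one_mode = [centers[0]] * 80                         # mode-collapsed generator
print(is_success(all_modes, centers), is_success(one_mode, centers))
```

A success rate as in Figure 4 would then be the fraction of independent runs for which `is_success` holds.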

Figure 4 offers several observations:
(1) With large clipping thresholds, the discriminator is too powerful and this generally impairs GAN learning [2, 22]. Very small thresholds hinder learning by limiting the capacity of $D$ too much.
(2) If a robust GAN can learn the distribution without noise ($p = 0$), then it also learns the distribution with an adversary, confirming its robustness.
(3) In general, the robust GANs work across a wider range of clipping thresholds, i.e., they are also less sensitive to parameter choice of clipping.
(4) In some cases, clipping increases the robustness of the standard GAN, in particular for the threshold 0.05, and helps it perform closer to the robust ones (albeit still a bit worse).

Observation (4) is curious and warrants further investigation. How does it relate to Lemma 1? Rather than a contradiction, it points to further interesting theoretical questions. Recall that in the theory, we assume that the discriminator is optimal. In reality, we are faced with a non-convex learning problem, and clipping reduces the capacity of $D$. To investigate this point, Figure 5 shows the probabilities $D$ assigns to the true and generated data (i.e., $D$’s output before applying any dishonest adversary) for two clipping thresholds, a smaller (harsher) one and a larger one.

Initially, we expect the generated data to be easily distinguishable from the true data, and $D$ to assign high probabilities to the true data (blue) and low probabilities to the generated data (red). As the generator learns, $D$’s outputs for true and fake data should become increasingly close and merge. Recall that if $P_G = P_{\mathrm{data}}$, the actual output value of the optimal discriminator in Lemma 4 can be any value in $[0, 1]$. Indeed, we observe this behavior for the robust model for both clipping thresholds, with and without the attack.

The standard GAN follows this pattern only for the harsher clipping threshold, but not for the more powerful discriminator at the larger threshold. When clipping at the larger threshold, the attack seems to affect the GAN too: while it never manages to fool $D$ completely (have the red and blue curves merge), it partially fools $D$ without the attack, but makes significantly less progress with adversarial noise.

For the harsher clipping threshold, the discriminator’s outputs are generally closer to 0.5. The reason is that we have restricted $D$’s weights to be so small that the unnormalized logits concentrate around 0 (i.e., 0.5 after passing through the sigmoid layer). With such limitations in capacity, $D$ is most likely not near-optimal, and hence this case is outside our current theoretical analysis. It is an interesting open question why this restricted $D$ enables better robustness and learning. A conjecture may be that, with output values close to 0.5, the perturbations too are less severe.
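
The intuition behind this conjecture can be illustrated for the flipping adversary: flipping changes the feedback by $|1 - 2D(x)|$, which shrinks to zero as $D(x)$ approaches 0.5. The sketch below only illustrates this arithmetic and does not test the conjecture itself; the logit values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def flip_gap(d):
    """Size of the change caused by flipping: |(1 - d) - d|."""
    return abs((1.0 - d) - d)

small_logit = sigmoid(0.1)  # heavily clipped discriminator, logit near 0
large_logit = sigmoid(4.0)  # confident discriminator
print(flip_gap(small_logit), flip_gap(large_logit))
```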

In summary, our robust frameworks and regularizing techniques such as clipping both improve the robustness of GANs, but by different mechanisms. These observations support our theoretical analysis in Sections 2 and 4.

### 5.2 MNIST

Next, we perform a similar analysis with the MNIST data, using a CNN for both $G$ and $D$. Without clipping, both the GAN and the robust GANs learn the distribution well, and generate all digits with the same probability. Here, we call a learning experiment a success if $G$ learns to generate all digits with the same probability (see the appendix for a plot of the total variation distance between the distribution of the learned digits and the uniform distribution). To further explore our theoretical frameworks, we apply a more sophisticated adversary as follows:

$$\Phi(D(x)) = \begin{cases} 1 - D(x) & \text{with probability } 0.1, \\ \sqrt{D(x)} & \text{with probability } 0.1, \\ (D(x))^2 & \text{with probability } 0.1, \\ D(x) & \text{otherwise.} \end{cases}$$

By Theorem 6, all the robust models we explore here should be robust against such a complex adversary.
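
This adversary is mostly honest, since the identity branch carries the remaining probability $1 - 0.3 = 0.7 > 0.5$. A minimal sketch of the adversary as a weighted mixture follows; the table layout and the names `MNIST_ADVERSARY`, `honest_mass`, and `perturb` are assumptions made for this example.

```python
import math
import random

# (label, dishonest function, probability); "identity" is the honest branch.
MNIST_ADVERSARY = [
    ("flip", lambda d: 1.0 - d, 0.1),
    ("sqrt", math.sqrt, 0.1),
    ("square", lambda d: d * d, 0.1),
    ("identity", lambda d: d, 0.7),
]

def honest_mass(adversary):
    """Probability assigned to the identity function."""
    return sum(p for name, _, p in adversary if name == "identity")

def perturb(d_out, adversary, rng):
    """Apply one dishonest function drawn according to its probability."""
    fns = [fn for _, fn, _ in adversary]
    weights = [p for _, _, p in adversary]
    return rng.choices(fns, weights=weights, k=1)[0](d_out)

rng = random.Random(0)
print(honest_mass(MNIST_ADVERSARY) > 0.5)
print(perturb(0.25, MNIST_ADVERSARY, rng))
```

All four branches map $[0, 1]$ into $[0, 1]$, so the perturbed value can still be read as a probability by the generator.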

Figure 6 visualizes the output of the GAN and the “piecewise linear” robust GAN with and without an attack. Clearly, the GAN is heavily affected by the attack – it appears to learn a point mass that maximizes the KL-divergence between $P_{\mathrm{data}}$ and $P_G$, well in line with the theoretical analysis. The robust GAN, as expected, still performs well.

##### Clipping.

As for the Gaussians, Figure 7 shows the success rate over 10 independent runs for various clipping thresholds, with and without attack. As above, the robust models succeed over a wide range of clipping thresholds, both with and without attack. The GAN is sensitive to attacks and fails completely for thresholds above 0.1 and below 0.01. In a limited range, clipping indeed makes the GAN robust.

Figure 8 shows the discriminator’s outputs for the true and generated images, without clipping and with clipping at . The figure indicates a similar phenomenon as for the Gaussians.

Without clipping and without attack, both the GAN and robust GAN learn, bringing the red and blue curves closer together and generating desired digits as shown in Figure 6. This remains true for the robust model with noise attack, but not for the GAN, which fails completely, enabling perfect discrimination by $D$. Without clipping, $D$ is probably close to optimal, and Lemma 1 applies. A small enough clipping threshold significantly limits $D$’s capacity, the curves remain close to 0.5 for both models, and both models are able to learn. These observations further support the robustifying effect of the “robust” objective functions, and, via a different mechanism, of clipping.

### 5.3 CIFAR10

Finally, we perform our empirical analysis on the CIFAR10 data. Here, our adversary is

$$\Phi(D(x)) = \begin{cases} 1 - D(x) & \text{with probability } 0.1, \\ \sqrt{D(x)} & \text{with probability } 0.1, \\ (D(x))^2 & \text{with probability } 0.2, \\ D(x) & \text{otherwise.} \end{cases}$$

For this set of experiments, recall our assumptions in Section 4: (1) access to an optimal discriminator $D$; and (2) with access to an optimal $D$, the model is able to learn the corresponding optimal generator, leading e.g. to successful learning without an adversary. While these assumptions are reasonable for relatively easy tasks (and, above, lead to the expected results), on more complex tasks factors such as the optimization algorithm and data properties may play a role too.

Figure 9 displays example images generated by the standard GAN and one of the robust models. Surprisingly, from a human viewpoint, it seems that with or without the adversary, both models learn to generate images that resemble CIFAR10 data. However, the plots of $D$’s outputs in Figure 10 suggest a different story that also differs from the previous experiments. Even without the adversary, $D$ tends to output extreme values: the CIFAR10 data are assigned probabilities close to 1 while the generated data are assigned negligible probabilities, indicating perfect distinction. Although the plots seem to suggest there is some learning in the beginning, $D$ starts to produce extreme values before a similar distribution has been learned by $G$. The extreme behavior of $D$ contradicts any strong evidence that the models approximately learn the data distribution. One conjecture to reconcile the strong discriminator with the visual results is that $D$ focuses on low-level perturbations off the image manifold that are less perceptible by the human eye [19, 16].

Good performance on high-dimensional images often requires some stabilizing modifications of the training algorithm [15, 22, 20]. We conjecture that those modifications, just like clipping, improve the robustness of the models. These methods also probably change the space of functions parameterized by the network. Incorporating the effect of these modifications, and the data properties, into a full theoretical analysis is an interesting avenue of future work.

## 6 Conclusion

Since the advent of GANs, much effort has been devoted to improving the original formulation. In this work, we take a novel perspective, and probe the robustness of GANs to internal noise, or attacks of dishonest feedbacks from the discriminator. This leads to a formal notion of robustness, and opens avenues for theoretical analysis of how model parameters affect this robustness. In particular, we show in theory and practice how certain conditions on the objective function induce robustness, and also probe the effect of regularization in the form of clipping. Indeed, if the assumptions are approximately satisfied, the empirical results closely follow our theoretical analysis.

As a first step, we provide a fairly general class of robust models. Our study opens several interesting questions for future research, such as integrating effects of regularization (and the induced function class), the optimization procedure, model and data, as well as a graded analysis of how robustness decays with the strength (error probability) or other properties of an adversary. In conclusion, this work initiates but only scratches the surface of robustness in the field of GANs, opening up many fruitful research avenues in this direction.

## References

• Arjovsky and Bottou [2017] M. Arjovsky and L. Bottou. Towards principled methods for training generative adversarial networks. In International Conference on Learning Representations (ICLR), 2017.
• Arjovsky et al. [2017] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. Int. Conference on Machine Learning (ICML), 2017.
• Ben-Tal et al. [2009] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust optimization. Princeton University Press, 2009.
• Bora et al. [2018] A. Bora, E. Price, and A. G. Dimakis. Ambientgan: Generative models from lossy measurements. International Conference on Learning Representations (ICLR), 2018. accepted.
• Caramanis et al. [2011] C. Caramanis, S. Mannor, and H. Xu. Optimization for Machine Learning, chapter Robust Optimization in Machine Learning. MIT Press, 2011.
• Carlini and Wagner [2017a] N. Carlini and D. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, 2017a.
• Carlini and Wagner [2017b] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, 2017b.
• Durugkar et al. [2017] I. Durugkar, I. Gemp, and S. Mahadevan. Generative multi-adversarial networks. International Conference on Learning Representations (ICLR), 2017.
• Fawzi et al. [2015] A. Fawzi, O. Fawzi, and P. Frossard. Analysis of classifiers’ robustness to adversarial perturbations. Machine Learning, 2015.
• Gao et al. [2017] R. Gao, X. Chen, and A. J. Kleywegt. Wasserstein distributional robustness and regularization in statistical learning. ArXiv e-prints, 2017.
• Goodfellow [2016] I. Goodfellow. Nips 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.
• Goodfellow et al. [2014] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.
• Goodfellow et al. [2015] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. International Conference on Learning Representations (ICLR), 2015.
• Gotoh et al. [2015] J. Gotoh, M. Kim, and A. Lim. Robust empirical optimization is almost the same as mean-variance optimization, 2015.
• Gulrajani et al. [2017] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems (NIPS), 2017.
• Guo et al. [2018] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. In International Conference on Learning Representations (ICLR), 2018.
• He et al. [2017] W. He, J. Wei, X. Chen, N. Carlini, and D. Song. Adversarial example defenses: Ensembles of weak defenses are not strong. arXiv preprint arXiv:1706.04701, 2017.
• Huang et al. [2017] X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, and S. Belongie. Stacked generative adversarial networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
• Jo and Bengio [2017] J. Jo and Y. Bengio. Measuring the tendency of CNNs to learn surface statistical regularities. CoRR, abs/1711.11561, 2017.
• Karras et al. [2018] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of gans for improved quality, stability, and variation. International Conference on Learning Representations (ICLR), 2018. accepted.
• Madry et al. [2018] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations (ICLR), 2018.
• Miyato et al. [2018] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. International Conference on Learning Representations (ICLR), 2018. accepted.
• Namkoong and Duchi [2017] H. Namkoong and J. C. Duchi. Variance-based regularization with convex objectives. In Advances in Neural Information Processing Systems (NIPS), 2017.
• Nguyen et al. [2017] T. Nguyen, T. Le, H. Vu, and D. Phung. Dual discriminator generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), 2017.
• Nowozin et al. [2016] S. Nowozin, B. Cseke, and R. Tomioka. f-gan: Training generative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems (NIPS), 2016.
• Papernot et al. [2016] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, 2016.
• Qi [2017] G.-J. Qi. Loss-sensitive generative adversarial networks on lipschitz densities. arXiv preprint arXiv:1701.06264, 2017.
• Radford et al. [2015] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
• Reed et al. [2016] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text to image synthesis. In Int. Conference on Machine Learning (ICML), 2016.
• Salimans et al. [2016] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training gans. Advances in Neural Information Processing Systems (NIPS), 2016.
• Distributionally robust logistic regression. In Advances in Neural Information Processing Systems (NIPS), 2015.
• Sinha et al. [2018] A. Sinha, H. Namkoong, and J. Duchi. Certifiable distributional robustness with principled adversarial training. International Conference on Learning Representations (ICLR), 2018.
• Szegedy et al. [2013] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
• Tolstikhin et al. [2017] I. Tolstikhin, S. Gelly, O. Bousquet, C.-J. Simon-Gabriel, and B. Schölkopf. Adagan: Boosting generative models. Advances in Neural Information Processing Systems (NIPS), 2017.
• Tramer et al. [2018] F. Tramer, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. International Conference on Learning Representations (ICLR), 2018.
• Uehara et al. [2016] M. Uehara, I. Sato, M. Suzuki, K. Nakayama, and Y. Matsuo. Generative adversarial nets from a density ratio estimation perspective. NIPS Workshop on Adversarial Training, 2016.
• Vondrick et al. [2016] C. Vondrick, H. Pirsiavash, and A. Torralba. Generating videos with scene dynamics. In Advances in Neural Information Processing Systems (NIPS), 2016.
• Wu et al. [2016] J. Wu, C. Zhang, T. Xue, W. T. Freeman, and J. B. Tenenbaum. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Advances in Neural Information Processing Systems (NIPS), 2016.
• Zhao et al. [2017] J. Zhao, M. Mathieu, and Y. LeCun. Energy-based generative adversarial network. In International Conference on Learning Representations (ICLR), 2017.

## Appendix A Proofs

This section provides the proofs omitted from the main paper. For convenience, we also restate the lemmas and theorems here.

### A.1 Proof of Lemma 1

Lemma 1. Given the optimal discriminator , the minimization of the objective (2.3) becomes

 minG

Furthermore, for every , the optimal can be arbitrarily far from in terms of KL-divergence.

###### Proof.

Given the optimal discriminator , the generator’s objective (i.e., Eq. (2.3)) becomes

 (A.1)

From the last equality, note that the Jensen-Shannon divergence is bounded, and hence the minimum value is whenever the error probability , which can be achieved, for example, by any that concentrates on a particular point. In contrast, if , then the objective achieves value . Therefore, the generator can learn a distribution that is significantly different from . ∎

### A.2 Proof of Lemma 4

Lemma 4. Suppose that , then for a fixed , the optimal that maximizes Eq. (3.1) is

$$
D^*_G(x) =
\begin{cases}
1, & \text{if } P_{\text{data}}(x) > P_G(x), \\
0, & \text{if } P_{\text{data}}(x) < P_G(x).
\end{cases}
$$

###### Proof.

Since , we have

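$$
\begin{aligned}
\mathbb{E}_{x \sim P_{\text{data}}}\big[f_D(D(x))\big] + \mathbb{E}_{z \sim P_Z}\big[f_D(1 - D(G(z)))\big]
&= \int \Big( f_D(D(x))\, P_{\text{data}}(x) + f_D(1 - D(x))\, P_G(x) \Big)\, dx \\
&= \int f_D(D(x)) \big( P_{\text{data}}(x) - P_G(x) \big)\, dx.
\end{aligned}
\tag{A.2}
$$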

Note that is strictly increasing in and . Therefore, when , is maximized at ; when , is maximized at . This shows that the integration (A.2) is maximized by the discriminator given in Lemma 4. ∎
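As an illustrative sanity check (not part of the original proof), the discrete analogue of (A.2) can be maximized by brute force and compared with the threshold discriminator of Lemma 4. The choice f(θ) = θ − 1/2, the three-point distributions, and the grid of candidate discriminator values are assumptions made only for this sketch; ties are set to 1/2, which does not affect the objective.

```python
import itertools

def f(theta):
    # Assumed example function: strictly increasing with f(1 - t) = -f(t).
    return theta - 0.5

def objective(d_values, p_data, p_g):
    # Discrete analogue of (A.2): sum over x of f(D(x)) * (P_data(x) - P_G(x)).
    return sum(f(d) * (pd - pg) for d, pd, pg in zip(d_values, p_data, p_g))

def threshold_discriminator(p_data, p_g):
    # Discriminator from Lemma 4; the value at ties is arbitrary (0.5 here).
    return [1.0 if pd > pg else 0.0 if pd < pg else 0.5
            for pd, pg in zip(p_data, p_g)]

# Hypothetical distributions on three points.
p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.3, 0.5]

# Brute-force search over all discriminators taking values on a grid in [0, 1].
grid = [i / 10 for i in range(11)]
best = max(objective(d, p_data, p_g)
           for d in itertools.product(grid, repeat=len(p_data)))

d_star = threshold_discriminator(p_data, p_g)
print(d_star, objective(d_star, p_data, p_g), best)
```

On this toy instance the threshold discriminator attains the brute-force maximum, as the lemma predicts.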

### A.3 Proof of Theorem 5

Theorem 5. Suppose that and . Let be the set of dishonest functions that satisfy either one of the following:

1. is non-decreasing in and ;

2. is non-increasing in , , and

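$$
\begin{cases}
\psi(\theta) + \theta \ge 1, & \text{for } \theta \in (\tfrac{1}{2}, 1], \\
\psi(\theta) + \theta \le 1, & \text{for } \theta \in [0, \tfrac{1}{2}).
\end{cases}
$$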

Then, for any mostly honest adversary with respect to , given the optimal discriminator, the optimal generator satisfies .

###### Proof.

Fix a mostly honest attack with respect to the set defined in Theorem 5. By definition, a mostly honest adversary assigns more than 0.5 probability on the function . Without loss of generality, denote by the previous function, i.e., . Then, with our notation in Definition 2, .

Since , we can rewrite the generator’s objective function (i.e., Eq.(3.4)) as follows:

 (A.3)

When , it is obvious that . In what follows, we prove the following two facts:

1. If , then .

2. If , then .

Combining the two facts, it is clear that in order to minimize , the optimal generator must satisfy , and this completes the proof of Theorem 5.

Proof of Fact 1: Note that since , the optimal discriminator for a fixed generator is given by [12]. With this optimal discriminator, we then have

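$$
\begin{aligned}
\hat{V}_1 &\triangleq \mathbb{E}_{x \sim P_{\text{data}}}\big[f_G(D(x))\big] - \mathbb{E}_{x \sim P_G}\big[f_G(D(x))\big] \\
&= \int \Big\{ f_G(D(x))\, P_{\text{data}}(x) - f_G(D(x))\, P_G(x) \Big\}\, dx \\
&= \int f_G\!\left( \frac{P_{\text{data}}(x)}{P_{\text{data}}(x) + P_G(x)} \right) \big( P_{\text{data}}(x) - P_G(x) \big)\, dx.
\end{aligned}
$$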

To show that implies , we note that implies that

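$$
\begin{cases}
f(\theta) > 0, & \text{for } \theta \in (\tfrac{1}{2}, 1], \\
f(\tfrac{1}{2}) = 0, \\
f(\theta) < 0, & \text{for } \theta \in [0, \tfrac{1}{2}).
\end{cases}
$$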

Therefore, for any such that , we have

$$
f\!\left( \frac{P_{\text{data}}(x)}{P_{\text{data}}(x) + P_G(x)} \right) \big( P_{\text{data}}(x) - P_G(x) \big) > 0.
$$

This means that if . By assumption, and hence . This completes the proof. Therefore, if .
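The sign argument of Fact 1 can be illustrated numerically on a discrete example. This is a hedged sketch rather than part of the proof: the function f_G(θ) = θ − 1/2 and the toy distributions are assumptions, and the discriminator is the ratio P_data / (P_data + P_G) appearing in the integrand above.

```python
def f_g(theta):
    # Assumed example: positive above 1/2, zero at 1/2, negative below 1/2.
    return theta - 0.5

def v1_hat(p_data, p_g):
    # Discrete analogue of V1-hat with D(x) = P_data(x) / (P_data(x) + P_G(x)).
    total = 0.0
    for pd, pg in zip(p_data, p_g):
        if pd + pg == 0.0:
            continue  # neither distribution puts mass here
        d_opt = pd / (pd + pg)
        total += f_g(d_opt) * (pd - pg)
    return total

# Hypothetical distributions on three points.
p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.3, 0.5]

print(v1_hat(p_data, p_g))     # strictly positive when P_G differs from P_data
print(v1_hat(p_data, p_data))  # zero when P_G equals P_data
```

Every term in the sum is non-negative because f_G(D(x)) and P_data(x) − P_G(x) always share the same sign, which is exactly the observation used in the proof.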

Proof of Fact 2: Note that

For any such that , the optimal discriminator . We then claim that . To see why this must hold, consider first the case where satisfies the first condition in Theorem 5. Then,

$$
f_G(\psi_i(D(x))) + f_G(D(x)) > f(\tfrac{1}{2}) + f(\tfrac{1}{2}) = 0,
$$

where the first inequality holds because is strictly increasing and is non-decreasing. For the case where satisfies the second condition in Theorem 5, since , we then have . Therefore,

$$
f_G(\psi_i(D(x))) + f_G(D(x)) \ge f_G(1 - D(x)) + f_G(D(x)) = -f_G(D(x)) + f_G(D(x)) = 0,
$$

where we have used the property that .

Similarly, for any such that , the optimal discriminator and we claim that . Consider first the case where satisfies the first condition in Theorem 5. Then,

$$
f_G(\psi_i(D(x))) + f_G(D(x)) < f(\tfrac{1}{2}) + f(\tfrac{1}{2}) = 0.
$$

For the case where satisfies the second condition in Theorem 5, since , we then have and hence

$$
f_G(\psi_i(D(x))) + f_G(D(x)) \le f_G(1 - D(x)) + f_G(D(x)) = 0.
$$

In conclusion, for any such that . Therefore, and this completes the proof of Fact 2. ∎

### A.4 Proof of Theorem 6

Theorem 6. Suppose that and . Let be the set of all possible dishonest functions. Then, for any mostly honest adversary with respect to , given the optimal discriminator, the optimal generator satisfies .

###### Proof.

The proof is quite similar to the proof of Theorem 5. Since , we can again rewrite the generator’s objective function (i.e., Eq. (3.4)) as (i.e., Eq. (A.3)). When , it is obvious that . Note that since , the optimal discriminator for a fixed generator is now given by Lemma 4. In the sequel, we follow the proof of Theorem 5 to show the two facts below when given the optimal discriminator:

1. If , then .

2. If , then .

The desired result in Theorem 6 then immediately follows.

Proof of Fact 1: Let

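$$
\hat{V}_1 \triangleq \mathbb{E}_{x \sim P_{\text{data}}}\big[f_G(D(x))\big] - \mathbb{E}_{x \sim P_G}\big[f_G(D(x))\big] = \int f_G(D(x)) \big( P_{\text{data}}(x) - P_G(x) \big)\, dx.
$$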

Substituting the optimal discriminator in Lemma 4 into , it can be readily observed that whenever . Specifically, for any such that , we have ; for any such that , we have . Hence, for any such that . This implies, together with the assumption that , that if .

Proof of Fact 2: We now shift gears to . Note that

For any such that , the optimal discriminator in Lemma 4 gives . Hence,

$$
f_G(\psi_i(D(x))) + f_G(D(x)) = f_G(\psi_i(1)) + f_G(1) = f_G(\psi_i(1)) - f_G(0) \ge 0,
\tag{A.4}
$$

where the second equality follows from the property that and the last inequality holds because for every and is strictly increasing.

Similarly, for any such that , the optimal discriminator gives and we then have

$$
f(\psi_i(D(x))) + f(D(x)) = f(\psi_i(0)) + f(0) = f(\psi_i(0)) - f(1) \le 0.
\tag{A.5}
$$

In summary, we have for any such that . Therefore, and this completes the proof of Fact 2. ∎
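The pointwise inequalities (A.4) and (A.5) can be checked on randomly drawn dishonest functions. In this sketch, f_G(θ) = θ − 1/2 and the assumption that every dishonest function takes values in [0, 1] are illustrative choices; `random_dishonest_function` is a hypothetical helper, not something defined in the paper.

```python
import random

def f_g(theta):
    # Assumed example: strictly increasing with f(1 - t) = -f(t).
    return theta - 0.5

def pointwise_term(psi, d):
    # The quantity f_G(psi(D(x))) + f_G(D(x)) from (A.4) and (A.5).
    return f_g(psi(d)) + f_g(d)

def random_dishonest_function(rng):
    # Hypothetical dishonest function with arbitrary values in [0, 1).
    a, b = rng.random(), rng.random()
    return lambda theta: a if theta >= 0.5 else b

rng = random.Random(0)
for _ in range(1000):
    psi = random_dishonest_function(rng)
    assert pointwise_term(psi, 1.0) >= 0.0  # (A.4): D(x) = 1 where P_data > P_G
    assert pointwise_term(psi, 0.0) <= 0.0  # (A.5): D(x) = 0 where P_data < P_G

# With a non-binary discriminator output the sign is no longer guaranteed.
print(pointwise_term(lambda theta: 0.0, 0.7))  # negative in this example
```

The last line suggests why the binary discriminator of Lemma 4 matters here: once D(x) takes only the values 0 and 1, no function with values in [0, 1] can reverse the sign of the term.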

## Appendix B Experimental Details

In this section, we give the details of our experiments as well as the figures omitted from the main text. While the theory assumes the optimal discriminator, practical training relies on gradient-based algorithms. In our experiments, in the presence of an adversary, each step of training the generator consists of a forward pass, where the generated images pass through the discriminator and then the adversary to produce the signal, and a backward pass, where the gradients the generator receives are computed by backpropagating through the adversary first and then the discriminator. The presence of an adversary affects both the signals and the training gradients the generator receives. From the viewpoint of the generator, the discriminator and the adversary together can be viewed as a “dishonest discriminator” that, upon receiving the generated images, produces a noisy signal and the corresponding gradients for the generator. For all the experiments, we train the models by alternating between updating the generator and the discriminator. If the experiment involves clipping, then a full update step consists of first clipping the weights of the discriminator, and then updating the discriminator and the generator once.
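The forward and backward passes described above can be sketched for a scalar toy model. The sigmoid discriminator, its weight `w`, and the flipping adversary ψ(θ) = 1 − θ applied with probability `error_prob` are assumptions made for illustration; the actual experiments use neural networks and automatic differentiation.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def discriminator(x, w):
    # Toy scalar discriminator D(x) = sigmoid(w * x) and its derivative dD/dx.
    d = sigmoid(w * x)
    return d, w * d * (1.0 - d)

# Hypothetical adversary components: (psi, derivative of psi).
HONEST = (lambda d: d, lambda d: 1.0)
FLIP = (lambda d: 1.0 - d, lambda d: -1.0)

def dishonest_feedback(x, w, error_prob, rng):
    # Forward pass: the sample goes through the discriminator, then the adversary.
    d, dd_dx = discriminator(x, w)
    psi, dpsi = FLIP if rng.random() < error_prob else HONEST
    signal = psi(d)
    # Backward pass: chain rule through the adversary first, then the discriminator.
    grad = dpsi(d) * dd_dx
    return signal, grad

rng = random.Random(0)
print(dishonest_feedback(0.3, 2.0, error_prob=0.0, rng=rng))  # honest signal
print(dishonest_feedback(0.3, 2.0, error_prob=1.0, rng=rng))  # flipped signal
```

In this toy setting a flipped signal also flips the sign of the gradient that reaches the generator, which is one way to see how the adversary perturbs training and not just the reported score.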

### B.1 Mixture of Gaussians

The synthetic data is generated from a mixture of 8 two-dimensional Gaussians with equal variance but different means evenly spaced on a circle. We fix the network architectures and hyper-parameters throughout the experiments. While it is possible to boost individual performance by adapting the hyper-parameters to different models and error probabilities, our focus in this section is to establish a fair comparison among different models and dishonest adversaries.
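A minimal sketch of such a data source is given below. The radius, the standard deviation, and the uniform mixture weights are assumptions, since the text only specifies 8 equal-variance components with means evenly spaced on a circle.

```python
import math
import random

def mode_centers(num_modes=8, radius=2.0):
    # Means evenly spaced on a circle; the radius is an assumed value.
    return [(radius * math.cos(2.0 * math.pi * k / num_modes),
             radius * math.sin(2.0 * math.pi * k / num_modes))
            for k in range(num_modes)]

def sample_mixture(n, std=0.02, radius=2.0, num_modes=8, seed=0):
    # Draw n points from an equally weighted mixture of isotropic 2D Gaussians.
    rng = random.Random(seed)
    centers = mode_centers(num_modes, radius)
    samples = []
    for _ in range(n):
        cx, cy = rng.choice(centers)  # pick a component uniformly (assumption)
        samples.append((rng.gauss(cx, std), rng.gauss(cy, std)))
    return samples

batch = sample_mixture(512)  # one minibatch of the size used in the experiments
print(len(batch), batch[0])
```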

The generator consists of a fully connected network with 3 hidden layers, each of size 64 with ReLU activations. The output layer contains two neurons that linearly project the input to 2 dimensions. The discriminator consists of a fully connected network with 3 hidden layers, each of size 256 with ReLU activations, followed by a sigmoid output layer. The latent vectors are sampled from a 256-dimensional multivariate Gaussian distribution with 0 mean and identity covariance matrix. For training algorithms, we use Adam with a learning rate of

and

for the generator and RMSprop with a learning rate of

for the discriminator. The size of each minibatch is fixed to 512. Finally, all the models are trained for 50k steps, 100k steps, and 180k steps when the error probabilities are 0, 0.2, and 0.4, respectively.

Here, we collect all the results for the robust models that are omitted in Figure 3. Those models consistently learn the mixture distribution with or without an adversary.

#### B.1.2 Averaged Number of Learned Modes

Figure 12 supplements the results presented in Figure 4. Recall that for each model and each parameter setting, we run 10 experiments. Figure 12 shows the averaged number of modes learned by each model. The results are consistent with what we presented in the main text (cf., Figure 4 and the corresponding discussion), namely, the robust models tend to perform better under various settings.
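One possible way to count learned modes from generated samples is sketched below. The criterion is an assumption, not the paper's stated procedure: a sample is assigned to its nearest mean if it lies within `max_dist`, and a mode counts as learned when it receives at least a `min_fraction` share of the samples. The radius and both thresholds are hypothetical values.

```python
import math

def circle_centers(num_modes=8, radius=2.0):
    # Means evenly spaced on a circle; the radius is an assumed value.
    return [(radius * math.cos(2.0 * math.pi * k / num_modes),
             radius * math.sin(2.0 * math.pi * k / num_modes))
            for k in range(num_modes)]

def count_learned_modes(samples, centers, max_dist=0.3, min_fraction=0.01):
    # Count how many samples fall close to each mode.
    counts = [0] * len(centers)
    for x, y in samples:
        dists = [math.hypot(x - cx, y - cy) for cx, cy in centers]
        k = min(range(len(centers)), key=dists.__getitem__)
        if dists[k] <= max_dist:
            counts[k] += 1
    # A mode needs at least one sample and at least min_fraction of all samples.
    threshold = max(1.0, min_fraction * len(samples))
    return sum(1 for c in counts if c >= threshold)

centers = circle_centers()
collapsed = [centers[0]] * 100                     # generator stuck on one mode
covered = [c for c in centers for _ in range(20)]  # generator covering all modes
print(count_learned_modes(collapsed, centers), count_learned_modes(covered, centers))
```

Averaging this count over the 10 runs of each setting would give a statistic of the kind reported in Figure 12, under the assumed criterion.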

#### B.1.3 Averaged Number of Steps for a Successful Learning

For each experiment, if the model successfully learns all the 8 modes, we count the number of steps needed and report the average steps in Figure 13. This confirms our intuition: the larger the error probability is, the more steps the robust models will need to average out the noise and extract the right signal to help the overall learning.

### B.2 MNIST

We fix the network architectures and hyper-parameters throughout the experiments. The network is adapted from a publicly available CNN model. In particular, we remove the Batch Normalization (BN) layers in the generator. The reason for this is to minimize the effect of architectures on robustness so that we can control as many factors as possible and fairly evaluate how the model itself affects the overall robustness. On the other hand, BN is kept for the discriminator. Our theory relies on the idealized assumption of an optimal discriminator. Hence, to verify the theory, it is beneficial to have a strong discriminator that can distinguish the true and generated data and provide useful signals.

We alternate between updating the generator and the discriminator with a minibatch of size 100. The latent vectors are sampled from the uniform distribution on . The generator is trained using Adam with a learning rate of and , while the discriminator is trained by RMSprop with a learning rate of

. Each model is trained for 50 epochs.

We show results for those robust models that are not presented in Figure 6. Again, the robust models are able to defend against the adversary and learn to generate the desired digits with no apparent mode collapse.

#### B.2.2 TV Distance to the Uniform Distribution

Figure 15 supplements the plots of success rates in Figure 7. For each experiment in which the model learns to generate digits, we use an auxiliary classifier to classify the generated data and compute the total variation distance between the learned distribution and the uniform distribution over the 10 digits. Recall that for each model and each parameter setting, we run the experiment 10 times independently. Figure 15 shows the resulting total variation distance, averaged over successful runs. Note that the distance is uniformly small, implying that there is no mode collapse in the successful runs. Consequently, we can focus on the plot of success rates in Figure 7 and draw conclusions accordingly.
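The total variation distance to the uniform distribution can be computed from predicted class labels as half the L1 distance between the empirical class frequencies and 1/10. The sketch below assumes the labels come from some auxiliary classifier (not shown) and are integers in 0..9; labels outside that range are ignored.

```python
from collections import Counter

def tv_distance_to_uniform(predicted_labels, num_classes=10):
    # TV(p, u) = 0.5 * sum_c |p(c) - 1/num_classes| for empirical frequencies p.
    n = len(predicted_labels)
    if n == 0:
        raise ValueError("need at least one predicted label")
    counts = Counter(predicted_labels)
    uniform = 1.0 / num_classes
    return 0.5 * sum(abs(counts.get(c, 0) / n - uniform)
                     for c in range(num_classes))

balanced = list(range(10)) * 50  # every digit generated equally often
collapsed = [3] * 500            # generator producing a single digit
print(tv_distance_to_uniform(balanced), tv_distance_to_uniform(collapsed))
```

A value near 0 indicates that all digits are generated about equally often, while a generator collapsed onto one digit gives a distance of 0.9.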

### B.3 CIFAR10

We adapt a publicly available network architecture that was used in [15]. Following the suggestions mentioned in the code, we halve the number of feature maps for each convolution layer. For the same reason discussed in B.2, we remove the BN layers in the generator but keep them in the discriminator. We alternate the updates of the generator and the discriminator. As the data become more complicated, it is difficult to have a universal setting for all the models. Consequently, we use the default parameters for the vanilla GAN: for both the generator and the discriminator, Adam with a learning rate of and is used. For the robust models, we do a grid search on the following parameters: (1) optimizer: {Adam, RMSprop}; (2) learning rate: . As a result, for all the robust models we explored, we train the generator by using RMSprop with a learning rate of and the discriminator by employing Adam with a learning rate of . Each model is trained for 120k steps with minibatches of size 64.