
Consistency Regularization for Generative Adversarial Networks

by   Han Zhang, et al.

Generative Adversarial Networks (GANs) are known to be difficult to train, despite considerable research effort. Several regularization techniques for stabilizing training have been proposed, but they introduce non-trivial computational overheads and interact poorly with existing techniques like spectral normalization. In this work, we propose a simple, effective training stabilizer based on the notion of consistency regularization, a popular technique in the semi-supervised learning literature. In particular, we augment data passing into the GAN discriminator and penalize the sensitivity of the discriminator to these augmentations. We conduct a series of experiments to demonstrate that consistency regularization works effectively with spectral normalization and various GAN architectures, loss functions and optimizer settings. Our method achieves the best FID scores for unconditional image generation compared to other regularization methods on CIFAR-10 and CelebA. Moreover, our consistency regularized GAN (CR-GAN) improves state-of-the-art FID scores for conditional generation from 14.73 to 11.67 on CIFAR-10 and from 8.73 to 6.66 on ImageNet-2012.





1 Introduction

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have recently demonstrated impressive results on image-synthesis benchmarks (Radford et al., 2016; Zhang et al., 2017; Miyato and Koyama, 2018; Zhang et al., 2018; Brock et al., 2018; Karras et al., 2019). In the original setting, GANs are composed of two neural networks trained with competing goals: the generator is trained to synthesize realistic samples to fool the discriminator and the discriminator is trained to distinguish real samples from fake ones produced by the generator.

One major problem with GANs is the instability of the training procedure and the general sensitivity of the results to various hyperparameters (Salimans et al., 2016). Because GAN training implicitly requires finding the Nash equilibrium of a non-convex game in a continuous and high dimensional parameter space, it is substantially more complicated than standard neural network training. In fact, formally characterizing the convergence properties of the GAN training procedure is mostly an open problem (Odena, 2019).

Previous work (Miyato et al., 2018a; Odena et al., 2017) has shown that interventions focused on the discriminator can mitigate stability issues. Most successful interventions fall into two categories: normalization and regularization. Spectral normalization is the most effective normalization method, in which weight matrices in the discriminator are divided by an approximation of their largest singular value. For regularization, Gulrajani et al. (2017) penalize the gradient norm along straight lines between real data and generated data. Roth et al. (2017) propose to directly regularize the squared gradient norm for both the training data and the generated data. DRAGAN (Kodali et al., 2017) introduces another form of gradient penalty where the gradients at Gaussian perturbations of training data are penalized. One may anticipate that simultaneous regularization and normalization could improve sample quality. However, most of these gradient-based regularization methods either provide marginal gains or fail to introduce any improvement when normalization is used (Kurach et al., 2019), which is also observed in our experiments. These regularization methods and spectral normalization are both motivated by controlling the Lipschitz constant of the discriminator. We suspect this might be the reason that applying both does not lead to additive gains.
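To make the spectral normalization step concrete, the following is a minimal sketch of dividing a weight matrix by an approximation of its largest singular value, obtained by power iteration. It is an illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, the matrix is a plain list of lists, and many power iterations are run from scratch for clarity, whereas practical implementations typically run a single iteration per training step and reuse the vector between steps.

```python
import math
import random


def largest_singular_value(w, n_iters=50, seed=0):
    """Approximate the largest singular value of matrix w by power iteration."""
    rng = random.Random(seed)
    rows, cols = len(w), len(w[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    u = [0.0] * rows
    for _ in range(n_iters):
        # u <- W v, then normalize
        u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        u_norm = math.sqrt(sum(a * a for a in u)) or 1.0
        u = [a / u_norm for a in u]
        # v <- W^T u, then normalize
        v = [sum(w[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        v_norm = math.sqrt(sum(a * a for a in v)) or 1.0
        v = [a / v_norm for a in v]
    # sigma is approximated by u^T W v
    wv = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return sum(u[i] * wv[i] for i in range(rows))


def spectral_normalize(w, n_iters=50):
    """Return w divided by its (approximate) largest singular value."""
    sigma = largest_singular_value(w, n_iters)
    return [[x / sigma for x in row] for row in w]
```

After normalization the matrix has spectral norm close to 1, which bounds the Lipschitz constant of the corresponding linear layer. The sketch does not guard against an all-zero matrix.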

In this paper, we examine a technique called consistency regularization (Sajjadi et al., 2016; Laine and Aila, 2016; Zhai et al., 2019; Xie et al., 2019) in contrast to gradient-based regularizers. Consistency regularization is widely used in semi-supervised learning to ensure that the classifier output remains unaffected for an unlabeled example even if it is augmented in semantic-preserving ways. In light of this intuition, we hypothesize that a well-trained discriminator should also be regularized to have the consistency property, which forces the discriminator output to be unchanged by arbitrary semantic-preserving perturbations and to focus more on semantic and structural changes between real and fake data. Therefore, we propose a simple regularizer for the GAN discriminator: we augment images with semantic-preserving augmentations before they are fed into the discriminator and penalize the sensitivity of the discriminator to those augmentations.

Figure 1: An illustration of consistency regularization for GANs. Before consistency regularization, the zoomed-in dog and the zoomed-in cat (bottom left) can be closer to each other than they are to their original images in the feature space induced by the GAN discriminator. This is illustrated in the upper right (the semantic feature space), where the purple dot is closer to the blue dot than to the red dot, and so forth. After we enforce consistency regularization based on the implicit assumption that image augmentation preserves the semantics we care about, the purple dot is pulled closer to the red dot.

This technique is simple to use and surprisingly effective. It is also less computationally expensive than prior techniques. More importantly, in our experiments, consistency regularization always further improves model performance when spectral normalization is used, whereas the performance gains of previous regularization methods diminish in that case. In extensive ablation studies, we show that it works across a large range of GAN variants and datasets. We also show that simply applying this technique on top of existing GAN models leads to new state-of-the-art results as measured by Fréchet Inception Distance (Heusel et al., 2017).

In summary, our contributions are as follows:

  • We propose consistency regularization for GAN discriminators to yield a simple, effective regularizer with lower computational cost than gradient-based regularization methods.

  • We conduct extensive experiments with different GAN variants to demonstrate that our technique interacts effectively with spectral normalization. Our consistency regularized GAN (CR-GAN) achieves the best FID scores for unconditional image generation on both CIFAR-10 and CelebA.

  • We show that simply applying the proposed technique can further boost the performance of state-of-the-art GAN models. We improve FID scores for conditional image generation from 14.73 to 11.67 on CIFAR-10 and from 8.73 to 6.66 on ImageNet-2012.

2 Method

2.1 GANs

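A GAN consists of a generator network and a discriminator network. The generator $G$ takes a latent variable $z \sim p(z)$ sampled from a prior distribution and maps it to the observation space $\mathcal{X}$. The discriminator $D$ takes an observation $x \in \mathcal{X}$ and produces a decision output over possible observation sources (either from $G$ or from the empirical data distribution). In the standard GAN training procedure the generator $G$ and the discriminator $D$ are trained by minimizing the following objectives in an alternating fashion:

$$L_D = -\mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] - \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D(G(z))\right)\right], \qquad L_G = -\mathbb{E}_{z \sim p(z)}\left[\log D(G(z))\right],$$

where $p(z)$ is usually a standard normal distribution. This formulation was originally proposed by Goodfellow et al. (2014) as the non-saturating (NS) GAN. A significant amount of research has been done on modifying this formulation in order to improve the training process. A notable example is the hinge-loss version of the adversarial loss (Lim and Ye, 2017; Tran et al., 2017):

$$L_D = -\mathbb{E}_{x \sim p_{\text{data}}}\left[\min\left(0, -1 + D(x)\right)\right] - \mathbb{E}_{z \sim p(z)}\left[\min\left(0, -1 - D(G(z))\right)\right], \qquad L_G = -\mathbb{E}_{z \sim p(z)}\left[D(G(z))\right].$$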

Another commonly adopted GAN formulation is the Wasserstein GAN (WGAN) (Arjovsky et al., 2017), in which the authors propose clipping the weights of the discriminator in an attempt to enforce that the GAN training procedure implicitly optimizes a bound on the Wasserstein distance between the target distribution and the distribution given by the generator. The loss function of WGAN can be written as

$$L_D = -\mathbb{E}_{x \sim p_{\text{data}}}\left[D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[D(G(z))\right], \qquad L_G = -\mathbb{E}_{z \sim p(z)}\left[D(G(z))\right].$$

Subsequent work has refined this technique in several ways (Gulrajani et al., 2017; Miyato et al., 2018a; Zhang et al., 2019), and the current widely-used practice is to enforce spectral normalization (Miyato et al., 2018a) on both the generator and the discriminator.
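The three adversarial loss pairs discussed in this section (non-saturating, hinge and Wasserstein) can be sketched in a few lines. This is a minimal illustration, assuming the discriminator returns one real-valued score per sample and, for the non-saturating loss, that the score is a logit passed through a sigmoid; the function names are hypothetical and the expectations are replaced by batch means.

```python
import math


def mean(values):
    return sum(values) / len(values)


def log_sigmoid(t):
    """Numerically stable log(sigmoid(t))."""
    return -math.log1p(math.exp(-t)) if t >= 0 else t - math.log1p(math.exp(t))


def ns_losses(d_real, d_fake):
    """Non-saturating GAN losses; inputs are discriminator logits."""
    # log(1 - sigmoid(t)) equals log_sigmoid(-t)
    loss_d = -mean([log_sigmoid(t) for t in d_real]) - mean([log_sigmoid(-t) for t in d_fake])
    loss_g = -mean([log_sigmoid(t) for t in d_fake])
    return loss_d, loss_g


def hinge_losses(d_real, d_fake):
    """Hinge version of the adversarial loss."""
    loss_d = -mean([min(0.0, -1.0 + t) for t in d_real]) - mean([min(0.0, -1.0 - t) for t in d_fake])
    loss_g = -mean(d_fake)
    return loss_d, loss_g


def wasserstein_losses(d_real, d_fake):
    """Wasserstein (WGAN) losses; the discriminator acts as a critic."""
    loss_d = -mean(d_real) + mean(d_fake)
    loss_g = -mean(d_fake)
    return loss_d, loss_g
```

In each case `d_real` holds discriminator outputs on real samples and `d_fake` holds outputs on generated samples; the generator loss only depends on `d_fake`.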

2.2 Consistency Regularization

Consistency regularization has emerged as a gold-standard technique (Sajjadi et al., 2016; Laine and Aila, 2016; Zhai et al., 2019; Xie et al., 2019; Oliver et al., 2018; Berthelot et al., 2019) for semi-supervised learning on image data. The basic idea is simple: an input image is perturbed in some semantics-preserving ways and the sensitivity of the classifier to that perturbation is penalized. The perturbation can take many forms: it can be image flipping, or cropping, or adversarial attacks. The regularization form is either the mean-squared-error (Sajjadi et al., 2016; Laine and Aila, 2016) between the model’s output for a perturbed and non-perturbed input or the KL divergence (Xie et al., 2019; Miyato et al., 2018b) between the distribution over classes implied by the output logits.
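The two regularization forms mentioned above can be sketched as follows. This is a minimal illustration for a single example, assuming the classifier output is a list of logits; the function names are hypothetical, and the direction of the KL divergence (clean distribution first) is an assumption rather than something the text specifies.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution over classes."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse_consistency(out_clean, out_perturbed):
    """Mean-squared error between outputs for clean and perturbed inputs."""
    return sum((a - b) ** 2 for a, b in zip(out_clean, out_perturbed)) / len(out_clean)


def kl_consistency(logits_clean, logits_perturbed):
    """KL(p_clean || p_perturbed) between the implied class distributions."""
    p = softmax(logits_clean)
    q = softmax(logits_perturbed)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

Both penalties are zero when the classifier produces identical outputs for the clean and perturbed input, and grow as the outputs diverge.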

2.3 Consistency Regularization for GANs

The goal of the discriminator in GANs is to distinguish real data from fake ones produced by the generator. The decision should be invariant to any valid domain-specific data augmentations. For example, in the image domain, the image being real or not should not change if we flip the image horizontally or translate the image by a few pixels. However, the discriminator in GANs does not guarantee this property explicitly.

To resolve this, we propose a consistency regularization on the GAN discriminator during training. In practice, we randomly augment training images as they are passed to the discriminator and penalize the sensitivity of the discriminator to those augmentations.

We use $D_j(x)$ to denote the output vector before activation of the $j$th layer of the discriminator given input $x$. $T(x)$ denotes a stochastic data augmentation function. This function can be linear or nonlinear, but aims to preserve the semantics of the input. Our proposed regularization is given by

$$\min_D L_{cr} = \min_D \sum_{j=m}^{n} \lambda_j \left\| D_j(x) - D_j(T(x)) \right\|^2,$$

where $j$ indexes the layers, $m$ is the starting layer and $n$ is the ending layer at which consistency is enforced. $\lambda_j$ is the weight coefficient for the $j$th layer and $\|\cdot\|$ denotes the $L^2$ norm of a given vector. This consistency regularization encourages the discriminator to produce the same output for a data point under various data augmentations.

In our experiments, we find that consistency regularization on the last layer of the discriminator before the activation function is sufficient.

In this case, $L_{cr}$ can be rewritten as

$$L_{cr} = \left\| D(x) - D(T(x)) \right\|^2,$$

where from now on we will drop the layer index for brevity. This cost is added to the discriminator loss (weighted by a hyper-parameter $\lambda$) when updating the discriminator parameters. The generator update remains unchanged. Thus, the overall consistency regularized GAN (CR-GAN) objective is written as

$$L_D^{cr} = L_D + \lambda L_{cr}, \qquad L_G^{cr} = L_G.$$

Our design of $L_{cr}$ is general-purpose and can therefore work with any valid adversarial losses $L_G$ and $L_D$ for GANs (see Section 2.1 for examples). Algorithm 1 illustrates the details of CR-GAN with the Wasserstein loss as an example. In contrast to previous regularizers, our method adds little overhead. The only extra computational cost comes from feeding an additional (third) image through the discriminator forward and backward when updating the discriminator parameters.
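As a small illustration of the regularizer described in this section, the sketch below computes the squared distance between the discriminator output on an input and on its augmented version, and adds it to an adversarial discriminator loss with a weight. The toy linear discriminator, the `flip` augmentation on a 1-D list and all function names are hypothetical stand-ins chosen so the example runs without a deep learning library; the default weight of 10 is an assumption for the sketch.

```python
def squared_l2(a, b):
    """Squared L2 distance between two output vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def consistency_loss(discriminator, x, augment):
    """Penalty on the discriminator's sensitivity to the augmentation."""
    return squared_l2(discriminator(x), discriminator(augment(x)))


def cr_discriminator_loss(adv_loss_d, cr_loss, lam=10.0):
    """Adversarial discriminator loss plus the weighted consistency term."""
    return adv_loss_d + lam * cr_loss


def make_linear_discriminator(weights):
    """Toy discriminator (assumption): one pre-activation output, a weighted sum."""
    return lambda x: [sum(w * v for w, v in zip(weights, x))]


def flip(x):
    """Toy augmentation (assumption): reverse a 1-D 'image'."""
    return x[::-1]
```

A discriminator whose weights are symmetric under the flip incurs no penalty, while an asymmetric one does; the generator loss is left untouched in either case.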

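Input: generator and discriminator parameters $\theta_G$, $\theta_D$, consistency regularization coefficient $\lambda$, Adam hyperparameters $\alpha$, $\beta_1$, $\beta_2$, batch size $M$, number of discriminator iterations per generator iteration $N_D$
for number of training iterations do
     for $t = 1, \ldots, N_D$ do
         for $i = 1, \ldots, M$ do
              Sample $z \sim p(z)$, $x \sim p_{\text{data}}(x)$
              Augment $x$ to get $T(x)$
              $L_{cr}^{(i)} \leftarrow \left\| D(x) - D(T(x)) \right\|^2$
              $L_D^{(i)} \leftarrow D(G(z)) - D(x)$
         end for
         $\theta_D \leftarrow \text{Adam}\left(\frac{1}{M} \sum_{i=1}^{M} \left(L_D^{(i)} + \lambda L_{cr}^{(i)}\right), \alpha, \beta_1, \beta_2\right)$
     end for
     Sample a batch of $M$ latent variables $\{z^{(i)}\}_{i=1}^{M} \sim p(z)$
     $\theta_G \leftarrow \text{Adam}\left(\frac{1}{M} \sum_{i=1}^{M} \left(-D(G(z^{(i)}))\right), \alpha, \beta_1, \beta_2\right)$
end for
Algorithm 1 Consistency Regularized GAN (CR-GAN). We use $\lambda = 10$ by default.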
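The discriminator update of Algorithm 1 can be illustrated on a deliberately tiny problem. The sketch below assumes a linear scalar discriminator D(x) = w · x so that the gradient of the batch objective, the mean of D(G(z)) − D(x) + λ (D(x) − D(T(x)))², can be derived by hand, and it takes a plain gradient-descent step instead of an Adam step. The generated samples and augmented samples are passed in precomputed. All names are hypothetical, and the example is a toy, not the training code used in the experiments.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def batch_objective(w, reals, fakes, augmented, lam):
    """Mean over the batch of D(G(z)) - D(x) + lam * (D(x) - D(T(x)))**2."""
    total = 0.0
    for x, g, t in zip(reals, fakes, augmented):
        total += dot(w, g) - dot(w, x) + lam * (dot(w, x) - dot(w, t)) ** 2
    return total / len(reals)


def batch_gradient(w, reals, fakes, augmented, lam):
    """Hand-derived gradient of batch_objective with respect to w."""
    grad = [0.0] * len(w)
    for x, g, t in zip(reals, fakes, augmented):
        diff = [xi - ti for xi, ti in zip(x, t)]
        scale = 2.0 * lam * dot(w, diff)
        for k in range(len(w)):
            # Wasserstein term contributes g - x; consistency term contributes scale * diff
            grad[k] += g[k] - x[k] + scale * diff[k]
    return [gk / len(reals) for gk in grad]


def discriminator_step(w, reals, fakes, augmented, lam=10.0, lr=0.01):
    """One gradient-descent step on the discriminator weights (SGD, not Adam)."""
    grad = batch_gradient(w, reals, fakes, augmented, lam)
    return [wk - lr * gk for wk, gk in zip(w, grad)]
```

The generator step is unchanged by the regularizer, so it is omitted here; only the discriminator objective gains the extra consistency term.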

3 Experiments

This section validates our proposed CR-GAN method. First, we conduct a large-scale study to compare consistency regularization to existing GAN regularization techniques (Kodali et al., 2017; Gulrajani et al., 2017; Roth et al., 2017) for several GAN architectures, loss functions and other hyper-parameter settings. We then apply consistency regularization to a state-of-the-art GAN model (Brock et al., 2018) and demonstrate performance improvement. Finally, we conduct ablation studies to investigate the importance of various design choices and hyper-parameters. All our experiments are based on the open-source code from Compare GAN (Kurach et al., 2019).

3.1 Datasets and Evaluation Metrics

We validate our proposed method on three datasets: CIFAR-10 (Krizhevsky, 2009), CELEBA-HQ-128 (Karras et al., 2018), and ImageNet-2012 (Russakovsky et al., 2015). We follow the procedure in Kurach et al. (2019) to prepare datasets. CIFAR-10 consists of 60K images in 10 classes: 50K for training and 10K for testing. CELEBA-HQ-128 (CelebA) contains 30K images of faces at a resolution of $128 \times 128$. We use 3K images for testing and the rest for training. ImageNet-2012 contains roughly 1.2 million images in 1000 distinct categories, and we down-sample the images to $128 \times 128$ in our experiments.

We adopt the Fréchet Inception distance (FID) (Heusel et al., 2017) as the primary metric for quantitative evaluation, as FID has proved to be more consistent with human evaluation. In our experiments the FID is calculated on the test dataset. In particular, we use 10K generated images vs. 10K test images on CIFAR-10, 3K vs. 3K on CelebA and 50K vs. 50K on ImageNet. We also provide the Inception Score (Salimans et al., 2016) for different methods in Appendix F as supplementary results. By default, the augmentation used in consistency regularization is a combination of randomly shifting the image by a few pixels and randomly flipping the image horizontally. The shift size is 4 pixels for CIFAR-10 and CelebA and 16 for ImageNet.
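For intuition about the metric, FID is the Fréchet distance between two Gaussians fitted to Inception features of generated and real images. The sketch below shows the one-dimensional special case, where the distance reduces to (μ₁ − μ₂)² + (σ₁ − σ₂)². It assumes scalar features and sample variances, and the function name is hypothetical; the real metric operates on high-dimensional Inception activations with full covariance matrices and a matrix square root.

```python
import math
import statistics


def frechet_distance_1d(features_a, features_b):
    """Frechet distance between Gaussians fitted to two sets of scalar features."""
    mu_a = statistics.mean(features_a)
    mu_b = statistics.mean(features_b)
    var_a = statistics.variance(features_a)  # sample variance (assumption)
    var_b = statistics.variance(features_b)
    # (mu_a - mu_b)^2 + var_a + var_b - 2 * sqrt(var_a * var_b)
    return (mu_a - mu_b) ** 2 + var_a + var_b - 2.0 * math.sqrt(var_a * var_b)
```

Lower values mean the two feature distributions are closer, which is why a lower FID is better throughout the experiments.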

3.2 Comparison with other GAN regularization methods

In this section, we compare our methods with three GAN regularization techniques, Gradient Penalty (GP) (Gulrajani et al., 2017), DRAGAN Regularizer (DR) (Kodali et al., 2017) and JS-Regularizer (JSR) (Roth et al., 2017) on CIFAR-10 and CelebA.

Following the procedures from (Kurach et al., 2019; Lucic et al., 2018), we evaluate these methods across different optimizer parameters, loss functions, regularization coefficient and neural architectures. For optimization, we use the Adam optimizer with batch size of 64 for all our experiments. We stop training after 200k generator update steps for CIFAR-10 and 100k steps for CelebA. By default, spectral normalization (SN) (Miyato et al., 2018a) is used in the discriminator, as this is the most effective normalization method for GANs (Kurach et al., 2019) and is becoming the standard for ‘modern’ GANs (Zhang et al., 2019; Brock et al., 2018). Results without spectral normalization can be seen in the Appendix B.

3.2.1 Impact of Loss function

Figure 2: Comparison of our method with existing regularization techniques under different GAN losses. Techniques include no regularization (W/O), Gradient Penalty (GP) (Gulrajani et al., 2017), DRAGAN (DR) (Kodali et al., 2017) and JS-Regularizer (JSR) (Roth et al., 2017). Results (a-c) are for CIFAR-10 and results (d-f) are for CelebA.

In this section, we discuss how each regularization method performs when the loss function is changed. Specifically, we evaluate regularization methods using three loss functions: the non-saturating loss (NS) (Goodfellow et al., 2014), the Wasserstein loss (WAS) (Arjovsky et al., 2017), and the hinge loss (Hinge) (Lim and Ye, 2017; Tran et al., 2017). For each loss function, we evaluate over 7 hyper-parameter settings of the Adam optimizer (more details in Section A of the appendix). For each configuration, we run each model 3 times with different random seeds. For the regularization coefficient, we use the best value reported in the corresponding paper. Specifically, $\lambda$ is set to 10 for GP, DR and our method and 0.1 for JSR. In this experiment, we use the SNDCGAN network architecture (Miyato et al., 2018a) for simplicity. In the end, similarly to Kurach et al. (2019), we aggregate all runs and report the FID distribution of the top 15% of trained models.
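The aggregation step at the end of the protocol above can be sketched as follows: pool the FID scores of all runs for one method and keep the best 15%. This is a minimal illustration; the function name is hypothetical, and rounding the cut-off up (so that, for example, 7 settings × 3 seeds = 21 runs keep 4 models) is an assumption, since the text does not say how fractional counts are handled.

```python
import math


def top_fraction(fid_scores, fraction=0.15):
    """Return the lowest (best) FID scores making up the given fraction of runs."""
    k = max(1, math.ceil(fraction * len(fid_scores)))  # round up (assumption)
    return sorted(fid_scores)[:k]
```

The returned list is what a box plot of the "top 15% of trained models" would summarize for each regularizer.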

The results are shown in Figure 2. The consistency regularization improves the baseline across all different loss functions and both datasets. Other techniques have more mixed results: For example, GP and DR can marginally improve the performance for settings (d) and (e) but lead to worse results for settings (a) and (b) (which is consistent with findings from Kurach et al. (2019)). In all cases, our consistency-regularized GAN models have the lowest (best) FID.

This finding is especially encouraging, considering that consistency regularization has lower computational cost (and is simpler to implement) than the other techniques. In our experiments, consistency regularization trains faster than the gradient-based regularization techniques (DR, GP and JSR), which need to compute the gradient of the gradient norm. Please see Table C1 in the appendix for the actual training speed.

| Setting | W/O | GP | DR | JSR | Ours (CR-GAN) |
|---|---|---|---|---|---|
| CIFAR-10 (SNDCGAN) | 24.73 | 25.83 | 25.08 | 25.17 | 18.72 |
| CIFAR-10 (ResNet) | 19.00 | 19.74 | 18.94 | 19.59 | 14.56 |
| CelebA (SNDCGAN) | 25.95 | 22.57 | 21.91 | 22.17 | 16.97 |

Table 1: Best FID scores for unconditional image generation on CIFAR-10 and CelebA.

3.2.2 Impact of the regularization coefficient

Figure 3: Comparison of FID scores with different values of the regularization coefficient on CIFAR-10 and CelebA. The dotted line is a model without regularization.

Here we study the sensitivity of GAN regularization techniques to the regularization coefficient $\lambda$. We train SNDCGANs with non-saturating losses and fix the other hyper-parameters. $\lambda$ is chosen among {0.1, 1, 10, 100}. The results are shown in Figure 3. From this figure, we can see that consistency regularization is more robust to changes in $\lambda$ than other GAN regularization techniques (it also has the best FID for both datasets). The results indicate that consistency regularization can be used as a plug-and-play technique to improve GAN performance in different settings without much hyper-parameter tuning.

3.2.3 Impact of Neural Architectures

Figure 4: Comparison of FID scores with ResNet structure on different loss settings on CIFAR-10.

To validate whether the above findings hold across different neural architectures, we conduct experiments on CIFAR-10 using a ResNet (He et al., 2016; Gulrajani et al., 2017) architecture instead of an SNDCGAN. All other experimental settings are the same as in Section 3.2.1. The FID values are presented in Figure 4. By comparing results in Figure 4 and Figure 2, we can see that results on SNDCGAN and results on ResNet are comparable, though consistency regularization fares even better in this case: in sub-plot (c) of Figure 4, consistency regularization is the only regularization method that can generate satisfactory samples with a reasonable FID score (the FID scores for other methods are above 100). Please see Figure D3 for the actual generated samples in this setting. As in Section 3.2.1, consistency regularization has the best FID for each setting.

In Table 1, we show FID scores for the best-case settings from this section. Consistency regularization improves on the baseline by a large margin and achieves the best results across different network architectures and datasets. In particular, it achieves an FID of 14.56 on CIFAR-10 and 16.97 on CelebA. In fact, our FID score of 14.56 on CIFAR-10 for unconditional image generation is even lower than the 14.73 reported in Brock et al. (2018) for class-conditional image synthesis with a much larger network architecture and much bigger batch size.

3.3 Comparison with state-of-the-art GAN models

In this section, we add consistency regularization to the state-of-the-art BigGAN model (Brock et al., 2018) and perform class-conditional image synthesis on CIFAR-10 and ImageNet. Our model has exactly the same architecture and is trained under the same settings as BigGAN*, the open-source implementation of BigGAN from Kurach et al. (2019). The only difference is that our model uses consistency regularization. In Table 2, we report the original FID scores without noise truncation. Consistency regularization improves the FID score of BigGAN* on CIFAR-10 from 20.42 to 11.67. In addition, the FID on ImageNet is improved from 7.75 to 6.66.

Generated samples for CIFAR-10 and ImageNet with consistency regularized models and baseline models are shown in Figures E1, E2 and E3 in the appendix.

| Dataset | SNGAN | SAGAN | BigGAN | BigGAN* | CR-BigGAN* |
|---|---|---|---|---|---|
| CIFAR-10 | 17.5 | / | 14.73 | 20.42 | 11.67 |
| ImageNet | 27.62 | 18.65 | 8.73 | 7.75 | 6.66 |

Table 2: Comparison of our technique with state-of-the-art GAN models including SNGAN (Miyato and Koyama, 2018), SAGAN (Zhang et al., 2019) and BigGAN (Brock et al., 2018) for class-conditional image generation on CIFAR-10 and ImageNet in terms of FID. BigGAN* is the BigGAN implementation of Kurach et al. (2019). CR-BigGAN* has exactly the same architecture as BigGAN* and is trained with the same settings. The only difference is that CR-BigGAN* adds consistency regularization.

Figure 5: A study of how much data augmentation matters by itself. Three GANs were trained on CIFAR-10: one baseline GAN, one GAN with data augmentation only, and one GAN with consistency regularization. (Left) Accuracy of the GAN discriminator on the held-out test set. The accuracy is low for the baseline GAN, which indicates it suffered from over-fitting. The accuracies of the other two are basically indistinguishable from each other. This suggests that augmentation by itself is enough to reduce discriminator over-fitting, and that consistency regularization by itself does little to address over-fitting. (Right) FID scores of the three settings. The score for the GAN with only augmentation is not any better than the score for the baseline, even though its discriminator is not over-fitting. The score for the GAN with consistency regularization is better than both of the others, suggesting that consistency regularization acts on the score through some mechanism other than reducing discriminator over-fitting.

4 Ablation Studies and Discussion

4.1 How Much Does Augmentation Matter by Itself?

Our consistency regularization technique actually has two parts: we perform data augmentation on inputs from the training data, and then consistency is enforced between the augmented data and the original data. We are interested in whether the performance gains shown in Section 3 are merely due to data augmentation, since data augmentation reduces the over-fitting of the discriminator to the input data. Therefore, we have designed an experiment to answer this question.

First, we train three GANs:

  • A GAN trained with consistency regularization, as in Algorithm 1.

  • A baseline GAN trained without augmentation or consistency regularization.

  • A GAN trained with only data augmentation, and no consistency regularization.

We then plot (Figure 5) both their FID and the accuracy of their discriminator on a held-out test set. The FID tells us how ‘good’ the resulting GAN is, and the discriminator accuracy tells us how much the GAN discriminator over-fits.

Interestingly, we find that these two measures are not well correlated in this case. The model trained with only data augmentation over-fits substantially less than the baseline GAN, but has almost the same FID. The model trained with consistency regularization has the same amount of over-fitting as the model trained with just data augmentation, but a much lower FID. This suggests an interesting hypothesis: the mechanism by which consistency regularization improves GANs is not simply reducing discriminator over-fitting.

4.2 How does the Type of Augmentation Affect Results?

To analyze how different types of data augmentation affect our results, we conduct an ablation study on the CIFAR-10 dataset comparing the results of using four different types of image augmentation:

  • Adding Gaussian noise to the image in pixel-space.

  • Randomly shifting the image by a few pixels and randomly flipping it horizontally.

  • Applying cutout (DeVries and Taylor, 2017) transformations to the image.

  • Cutout and random shifting and flipping.
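The augmentation types listed above can be sketched on a single-channel image stored as a list of rows. This is a minimal illustration with several assumptions: pixels shifted in from outside the image are filled with zeros, the cutout square is always placed fully inside the image, and the noise level and function names are hypothetical. The fourth variant is the composition of `cutout` with `random_shift_flip`.

```python
import random


def gaussian_noise(image, rng, sigma=0.1):
    """Add independent Gaussian noise to every pixel."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in image]


def random_shift_flip(image, rng, max_shift=4):
    """Randomly flip horizontally, then shift by up to max_shift pixels (zero fill)."""
    h, w = len(image), len(image[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    flip = rng.random() < 0.5
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source position before the shift
            if 0 <= sy < h and 0 <= sx < w:
                src_x = w - 1 - sx if flip else sx
                out[y][x] = image[sy][src_x]
    return out


def cutout(image, rng, size=2):
    """Zero out a random size x size square inside the image."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [
        [0.0 if top <= y < top + size and left <= x < left + size else image[y][x]
         for x in range(w)]
        for y in range(h)
    ]
```

Passing an explicit `random.Random` instance keeps the augmentations reproducible, which is convenient when comparing augmentation types under a fixed seed.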

| Metric | Gaussian Noise | Random shift & flip | Cutout | Cutout w/ random shift & flip |
|---|---|---|---|---|
| FID | 21.91 | 16.04 | 17.10 | 19.46 |

Table 3: FID scores on CIFAR-10 for different types of image augmentation. Gaussian noise is the worst, and random shift and flip is the best, consistent with general consensus on the best way to perform image augmentation on CIFAR-10 (Zagoruyko and Komodakis, 2016). Interestingly, the most substantial augmentation does not yield the best performance.

As shown in Table 3, random flipping and shifting without cutout gives the best results (FID 16.04) among all four methods. Adding Gaussian noise in pixel-space gives the worst results. This result empirically suggests that adding Gaussian noise is not a good semantic preserving transformation in the image manifold.

It is also noteworthy that the most extensive augmentation (random flipping and shifting with cutout) did not perform the best. This suggests that it can be useful to start with simple augmentations and gradually increase their complexity until results plateau. Notably, random flipping and shifting has been adopted as the de facto standard data augmentation policy on the CIFAR-10 dataset (Zagoruyko and Komodakis, 2016), which is consistent with our results.

5 Conclusion

In this paper, we propose a simple, effective, and computationally cheap method – consistency regularization – to improve the performance of GANs. Consistency regularization is compatible with spectral normalization and results in improvements in all of the many contexts in which we evaluated it. Moreover, we have demonstrated consistency regularization is more effective than other regularization methods under different loss functions, neural architectures and optimizer hyper-parameter settings. We have also shown simply applying consistency regularization on top of state-of-the-art GAN models can further greatly boost the performance. Finally, we have conducted a thorough study on the design choices and hyper-parameters of consistency regularization. We hope that the proposed GAN regularizer will become a favorable add-on of advanced GANs and we also encourage future study on applying consistency regularization on other types of generative models.


We thank Colin Raffel for feedback on drafts of this article. We also thank Marvin Ritter, Michael Tschannen and Mario Lucic for answering our questions about using the Compare GAN codebase for large-scale GAN evaluation.


  • M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein GAN. arXiv:1701.07875. Cited by: §2.1, §3.2.1.
  • D. Berthelot, N. Carlini, I. J. Goodfellow, N. Papernot, A. Oliver, and C. Raffel (2019) MixMatch: A holistic approach to semi-supervised learning. arXiv:1905.02249. Cited by: §2.2.
  • A. Brock, J. Donahue, and K. Simonyan (2018) Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096. Cited by: §1, §3.2.3, §3.2, §3.3, Table 2, §3.
  • T. DeVries and G. W. Taylor (2017) Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552. Cited by: §4.2.
  • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio (2014) Generative adversarial nets. In NeurIPS, Cited by: §1, §2.1, §3.2.1.
  • I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of wasserstein GANs. In NeurIPS, Cited by: Appendix A, §1, §2.1, Figure 2, §3.2.3, §3.2, §3.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In CVPR, Cited by: §3.2.3.
  • M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter (2017) Gans trained by a two time-scale update rule converge to a local nash equilibrium. In NeurIPS, Cited by: §1, §3.1.
  • T. Karras, T. Aila, S. Laine, and J. Lehtinen (2018) Progressive growing of GANs for improved quality, stability, and variation. In ICLR, Cited by: §3.1.
  • T. Karras, S. Laine, and T. Aila (2019) A style-based generator architecture for generative adversarial networks. In CVPR, Cited by: §1.
  • N. Kodali, J. Abernethy, J. Hays, and Z. Kira (2017) On convergence and stability of gans. arXiv preprint arXiv:1705.07215. Cited by: §1, Figure 2, §3.2, §3.
  • A. Krizhevsky (2009) Learning multiple layers of features from tiny images. Technical report Cited by: §3.1.
  • K. Kurach, M. Lucic, X. Zhai, M. Michalski, and S. Gelly (2019) A large-scale study on regularization and normalization in gans. In ICML, Cited by: Appendix A, §1, §3.1, §3.2.1, §3.2.1, §3.2, §3.3, Table 2, §3.
  • S. Laine and T. Aila (2016) Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242. Cited by: §1, §2.2.
  • J. H. Lim and J. C. Ye (2017) Geometric GAN. arXiv:1705.02894. Cited by: §2.1, §3.2.1.
  • M. Lucic, K. Kurach, M. Michalski, S. Gelly, and O. Bousquet (2018) Are gans created equal? A large-scale study. In NeurIPS, Cited by: §3.2.
  • T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida (2018a) Spectral normalization for generative adversarial networks. In ICLR, Cited by: Appendix A, §1, §2.1, §3.2.1, §3.2.
  • T. Miyato and M. Koyama (2018) cGANs with projection discriminator. In ICLR, Cited by: §1, Table 2.
  • T. Miyato, S. Maeda, S. Ishii, and M. Koyama (2018b) Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence. Cited by: §2.2.
  • A. Odena, C. Olah, and J. Shlens (2017) Conditional image synthesis with auxiliary classifier gans. In ICML, Cited by: §1.
  • A. Odena (2019) Open questions about generative adversarial networks. Distill. Cited by: §1.
  • A. Oliver, A. Odena, C. A. Raffel, E. D. Cubuk, and I. Goodfellow (2018) Realistic evaluation of deep semi-supervised learning algorithms. In NeurIPS, pp. 3235–3246. Cited by: §2.2.
  • A. Radford, L. Metz, and S. Chintala (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, Cited by: Appendix A, §1.
  • K. Roth, A. Lucchi, S. Nowozin, and T. Hofmann (2017) Stabilizing training of generative adversarial networks through regularization. In NeurIPS, Cited by: §1, Figure 2, §3.2, §3.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei (2015) ImageNet large scale visual recognition challenge. IJCV 115 (3), pp. 211–252. Cited by: §3.1.
  • M. Sajjadi, M. Javanmardi, and T. Tasdizen (2016) Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In NeurIPS, Cited by: §1, §2.2.
  • T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen (2016) Improved techniques for training GANs. In NeurIPS, Cited by: Appendix F, §1, §3.1.
  • D. Tran, R. Ranganath, and D. M. Blei (2017) Deep and hierarchical implicit models. arXiv preprint arXiv:1702.08896. Cited by: §2.1, §3.2.1.
  • Q. Xie, Z. Dai, E. Hovy, M. Luong, and Q. V. Le (2019) Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848. Cited by: §1, §2.2.
  • S. Zagoruyko and N. Komodakis (2016) Wide residual networks. In BMVC, Cited by: §4.2, Table 3.
  • X. Zhai, A. Oliver, A. Kolesnikov, and L. Beyer (2019) S4L: self-supervised semi-supervised learning. arXiv preprint arXiv:1905.03670. Cited by: §1, §2.2.
  • H. Zhang, I. J. Goodfellow, D. N. Metaxas, and A. Odena (2019) Self-attention generative adversarial networks. In ICML, Cited by: §2.1, §3.2, Table 2.
  • H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas (2017) StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV, Cited by: §1.
  • Z. Zhang, Y. Xie, and L. Yang (2018) Photographic text-to-image synthesis with a hierarchically-nested adversarial network. In CVPR, Cited by: §1.


Appendix A Hyperparameter settings of optimizer

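Setting lr β1 β2 N_dis
A 0.0001 0.5 0.9 5
B 0.0001 0.5 0.999 1
C 0.0002 0.5 0.999 1
D 0.0002 0.5 0.999 5
E 0.001 0.5 0.9 5
F 0.001 0.5 0.999 5
G 0.001 0.9 0.999 5
Table A1: Hyper-parameters of the optimizer used in our experiments. Here lr is the learning rate, β1 and β2 are the first and second order momentum parameters of Adam, and N_dis is the number of discriminator updates per generator update.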

Here, similar to the experiments in Miyato et al. (2018a) and Kurach et al. (2019), we evaluate all regularization methods across 7 different hyperparameter settings for (1) the learning rate, (2) the first and second order momentum parameters of Adam (β1 and β2), and (3) the number of discriminator updates per generator update. The details of all the settings are shown in Table A1. Among these 7 settings, A-D are the "good" hyperparameters used in previous publications (Radford et al., 2016; Gulrajani et al., 2017; Kurach et al., 2019); E, F and G are the "aggressive" hyperparameter settings introduced by Miyato et al. (2018a) to test model performance under a noticeably large learning rate or a disruptively high momentum. In practice, we find that setting C generally works best for SNDCGAN and that setting D is the optimal setting for ResNet. These two settings are also the default settings in the Compare GAN codebase for the corresponding network architectures.
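For readers who want to sweep these settings programmatically, the rows of Table A1 can be written down as a small configuration table. This is a minimal sketch, not code from the Compare GAN codebase: the names `OptimizerSetting`, `SETTINGS`, `AGGRESSIVE` and `is_aggressive` are hypothetical, and the column order (learning rate, β1, β2, discriminator updates per generator update) is assumed from the order in which the text lists the hyperparameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizerSetting:
    """One row of Table A1 (field names are illustrative assumptions)."""
    learning_rate: float  # Adam learning rate
    beta1: float          # Adam first order momentum parameter
    beta2: float          # Adam second order momentum parameter
    n_dis: int            # discriminator updates per generator update


# Values copied from Table A1; the column order is an assumption (see above).
SETTINGS = {
    "A": OptimizerSetting(0.0001, 0.5, 0.9, 5),
    "B": OptimizerSetting(0.0001, 0.5, 0.999, 1),
    "C": OptimizerSetting(0.0002, 0.5, 0.999, 1),
    "D": OptimizerSetting(0.0002, 0.5, 0.999, 5),
    "E": OptimizerSetting(0.001, 0.5, 0.9, 5),
    "F": OptimizerSetting(0.001, 0.5, 0.999, 5),
    "G": OptimizerSetting(0.001, 0.9, 0.999, 5),
}

# The text describes E, F and G as the "aggressive" settings.
AGGRESSIVE = ("E", "F", "G")


def is_aggressive(name: str) -> bool:
    """Return True for the settings the text calls aggressive."""
    return name in AGGRESSIVE


if __name__ == "__main__":
    for name, s in SETTINGS.items():
        tag = "aggressive" if is_aggressive(name) else "good"
        print(f"{name} ({tag}): lr={s.learning_rate}, "
              f"betas=({s.beta1}, {s.beta2}), n_dis={s.n_dis}")
```

A training script could then iterate over `SETTINGS` and pass each entry to whichever Adam implementation it uses.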

Figure A1: Comparison of FID scores with different optimizer settings.

Figure A1 displays the FID scores of all methods under the 7 settings A-G. Consistency regularization is fairly robust, even under some of the aggressive hyperparameter settings. In general, the proposed consistency regularization generates better samples across optimizer settings than the other regularization methods.

Appendix B Comparison of different regularization methods when spectral normalization is not used

Figure B1: Comparison of FID scores when SN is not used.

Here, we compare different regularization methods when spectral normalization (SN) is not used. As shown in Figure B1, our consistency regularization always improves on the baseline model (W/O). It also achieves the best FID scores in most cases, which demonstrates that consistency regularization does not depend on spectral normalization. Comparing with the results in Figure 2 and Figure 4, we find that adding spectral normalization further boosts the results. More importantly, consistency regularization is the only method that improves on top of spectral normalization without exception. The other regularization methods do not have this property.

Appendix C Training Speed

Here we show the actual training speed of discriminator updates for SNDCGAN on CIFAR-10 with an NVIDIA Tesla V100. Consistency regularization is around 1.7 times faster than gradient based regularization techniques.

Method W/O GP DR JSR Ours (CR-GAN)
Speed (step/s) 66.3 29.7 29.8 29.2 51.7
Table C1: Training speed of discriminator updates for SNDCGAN on CIFAR-10.
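The relative speeds can be read directly off Table C1 by dividing step rates. The short sketch below does that arithmetic; the dictionary and function names are illustrative assumptions, and the numbers are the steps per second reported in the table.

```python
# Discriminator update speed in steps per second, copied from Table C1.
STEPS_PER_SECOND = {
    "W/O": 66.3,
    "GP": 29.7,
    "DR": 29.8,
    "JSR": 29.2,
    "CR-GAN": 51.7,
}


def speedup(method: str, reference: str) -> float:
    """How many times faster `method` steps than `reference`."""
    return STEPS_PER_SECOND[method] / STEPS_PER_SECOND[reference]


def slowdown_vs_baseline(method: str, baseline: str = "W/O") -> float:
    """Time per step of `method` as a multiple of the unregularized baseline."""
    return STEPS_PER_SECOND[baseline] / STEPS_PER_SECOND[method]


if __name__ == "__main__":
    for ref in ("GP", "DR", "JSR"):
        print(f"CR-GAN vs {ref}: {speedup('CR-GAN', ref):.2f}x faster")
    for method in ("GP", "DR", "JSR", "CR-GAN"):
        print(f"{method}: {slowdown_vs_baseline(method):.2f}x "
              f"the baseline time per step")
```

With the table's numbers, CR-GAN comes out roughly 1.73 to 1.77 times faster than GP, DR and JSR, and costs about 1.28 times the baseline time per step, against roughly 2.2 to 2.3 times for the gradient based methods.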

Appendix D Generated samples for unconditional image generation

Figure D1: Comparison of generated samples of CelebA.
Figure D2: Comparison of generated samples for unconditional image generation on CIFAR-10 with a ResNet architecture.
Figure D3: Comparison of unconditional generated samples on CIFAR-10 with a ResNet architecture, Wasserstein loss and spectral normalization. This is a hard hyperparameter setting where the baseline and previous regularization methods fail to generate reasonable samples. Consistency Regularization is the only regularization method that can generate satisfactory samples in this setting. FID scores are shown in sub-plot (c) of Figure 4.

Appendix E Generated samples for conditional image generation

Figure E1: Comparison of generated samples for conditional image generation on CIFAR-10. Each row shows the generated samples of one class.
Figure E2: Comparison of conditionally generated samples of BigGAN* and CR-BigGAN* on ImageNet. (Left) Generated samples of CR-BigGAN*. (Right) Generated samples of BigGAN*.
Figure E3: More results for conditionally generated samples of BigGAN* and CR-BigGAN* on ImageNet. (Left) Generated samples of CR-BigGAN*. (Right) Generated samples of BigGAN*.

Appendix F Comparison with inception score

Inception Score (IS) is another GAN evaluation metric introduced by Salimans et al. (2016). Here, we compare the Inception Score of the unconditional generated samples on CIFAR-10. As shown in Table F1, Figure F1 and Figure F2, consistency regularization achieves the best IS result with both SNDCGAN and ResNet architectures.

Setting W/O GP DR JSR Ours (CR-GAN)
CIFAR-10 (SNDCGAN) 7.54 7.54 7.54 7.52 7.93
CIFAR-10 (ResNet) 8.20 8.04 8.09 8.03 8.40
Table F1: Best Inception Score for unconditional image generation on CIFAR-10.
Figure F1: Comparison of IS with an SNDCGAN architecture on different loss settings. Models are trained on CIFAR-10.
Figure F2: Comparison of IS with a ResNet architecture on different loss settings. Models are trained on CIFAR-10.
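For reference, the Inception Score is commonly defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is the class distribution a pretrained Inception classifier predicts for a generated image x and p(y) is the average of those predictions over all samples. The sketch below is a minimal pure-Python illustration of that formula, not the evaluation code used for Table F1: it assumes the class probabilities have already been computed, it omits the averaging over several splits that many implementations perform, and the function name `inception_score` is illustrative.

```python
import math


def inception_score(probs):
    """Inception Score from per-sample class probabilities.

    `probs` is a list of rows, one per generated sample; each row is a
    probability distribution p(y|x) over the classifier's classes.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y): the mean of p(y|x) over samples.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    # Mean over samples of KL(p(y|x) || p(y)); zero-probability terms
    # contribute nothing to the sum and are skipped.
    kl_sum = 0.0
    for row in probs:
        kl_sum += sum(
            p * (math.log(p) - math.log(marginal[j]))
            for j, p in enumerate(row)
            if p > 0.0
        )
    return math.exp(kl_sum / n)


if __name__ == "__main__":
    # Uninformative predictions: every sample looks like every class.
    uniform = [[0.25] * 4 for _ in range(8)]
    # Confident and diverse predictions: one-hot rows spread over 4 classes.
    one_hot = [[1.0 if j == i % 4 else 0.0 for j in range(4)] for i in range(8)]
    print(inception_score(uniform))
    print(inception_score(one_hot))
```

The two toy inputs mark the extremes of the metric: uniform predictions give a score of 1, while confident predictions spread evenly over K classes give a score of K, which is why higher IS is read as better.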