cGANs with Auxiliary Discriminative Classifier

07/21/2021 ∙ by Liang Hou, et al. ∙ Institute of Computing Technology, Chinese Academy of Sciences

Conditional generative models aim to learn the underlying joint distribution of data and labels and thereby realize conditional generation. Among them, the auxiliary classifier generative adversarial network (AC-GAN) has been widely used, but it suffers from low intra-class diversity in its generated samples. In this paper, we point out that the fundamental reason is that the classifier of AC-GAN is generator-agnostic and therefore cannot provide informative guidance for the generator to approximate the target joint distribution, leading to a conditional-entropy minimization that decreases intra-class diversity. Based on this finding, we propose a novel cGAN with an auxiliary discriminative classifier (ADC-GAN) to address the issue of AC-GAN. Specifically, the auxiliary discriminative classifier becomes generator-aware by distinguishing between real and fake data while recognizing their labels. We then optimize the generator based on the auxiliary classifier, along with the original discriminator, to match the joint and marginal distributions of the generated samples with those of the real samples. We provide theoretical analysis and empirical evidence on synthetic and real-world datasets to demonstrate the superiority of the proposed ADC-GAN over competitive cGANs.


1 Introduction

Generative adversarial networks (GANs) Goodfellow et al. (2014) have made great progress in learning high-dimensional, complex data distributions Karras et al. (2019, 2020, 2020). Standard GANs consist of a generator network that aims to reproduce the real data distribution by generating novel data and a discriminator network that attempts to distinguish between real data and generated data. The generator is trained in an adversarial game against the discriminator, which guarantees that it replicates the data distribution at the optimum. However, the training of GANs is notoriously unstable, and the generator is prone to mode collapse Salimans et al. (2016); Lin et al. (2018); Chen et al. (2019). Besides, in practical applications, one often needs to control the properties of the generated samples Yan et al. (2015); Tan et al. (2020). A key solution to both concerns is conditioning, which leads to conditional GANs (cGANs) Mirza and Osindero (2014).

Conditional GANs are a family of GAN variants that leverage side information from annotated labels to implement and train a conditional generator, thereby achieving conditional image generation from class labels Odena et al. (2017); Miyato and Koyama (2018); Brock et al. (2019) or text Reed et al. (2016); Xu et al. (2018); Zhu et al. (2019). To implement the conditional generator, a common technique nowadays injects the conditional information via conditional batch normalization de Vries et al. (2017). To train the conditional generator, much research focuses on effectively injecting the conditional information into the discriminator Odena (2016); Miyato and Koyama (2018); Zhou et al. (2018); Kavalerov et al. (2021); Kang and Park (2020); Zhou et al. (2020). Among these methods, the auxiliary classifier generative adversarial network (AC-GAN) Odena et al. (2017) has been widely used due to its simplicity and extensibility. Specifically, AC-GAN utilizes an auxiliary classifier that first attempts to recognize the labels of real data and then teaches the generator to produce classifiable data. However, it has been reported that AC-GAN suffers from a low intra-class diversity problem in its generated samples, especially when trained on datasets with a large number of classes Odena et al. (2017); Shu et al. (2017); Gong et al. (2019).

In this paper, we show that the fundamental reason for the low intra-class diversity issue of AC-GAN is that its classifier is agnostic to the generated data distribution. Based on this finding, we propose a novel conditional GAN framework with an auxiliary discriminative classifier, namely ADC-GAN, which solves the issue of AC-GAN by making the classifier aware of the generated data distribution. To this end, the discriminative classifier is trained to distinguish between real and fake data while recognizing their labels. The discriminative property enables the classifier to provide the discrepancy between the generated and real data distributions, analogously to the discriminator, and the classification property allows it to capture the dependencies between data and labels. We show in theory that the generator of the proposed method can recover the joint distribution under the guidance of the optimal discriminative classifier, even without the discriminator. We also discuss the differences between the proposed ADC-GAN and the two most related works (TAC-GAN Gong et al. (2019) and PD-GAN Miyato and Koyama (2018)) and demonstrate the superiority and rationality of ADC-GAN compared to them. The advantages over competitive baselines in experiments conducted on both synthetic and real-world datasets verify the effectiveness of the proposed ADC-GAN in conditional generative modeling.

2 Preliminaries

2.1 Generative Adversarial Networks

Generative adversarial networks (GANs) Goodfellow et al. (2014) consist of two types of neural networks: a generator G that maps a latent code z, drawn from an easily sampled distribution, to a data point x, and a discriminator D that distinguishes real data sampled from the real data distribution p(x) from fake data sampled from the generated data distribution q(x) implied by the generator. The goal of the generator is to confuse the discriminator by producing data that is as realistic as possible. Formally, the objective functions for the discriminator and the generator are defined as follows:

\max_D \; \mathbb{E}_{x \sim p(x)}[\log D(x)] + \mathbb{E}_{x \sim q(x)}[\log(1 - D(x))], \quad \min_G \; \mathbb{E}_{x \sim q(x)}[\log(1 - D(x))].   (1)

Theoretically, the globally optimal discriminator estimates the Jensen-Shannon (JS) divergence between the real and generated data distributions. Therefore, the learning of the generator under an optimal discriminator can be considered as minimizing the JS divergence, i.e., \min_G 2\,\mathrm{JS}(p(x) \,\|\, q(x)) - \log 4. This ideally enables the generator to recover the real data distribution at the global optimum. However, the training of GANs usually suffers from instability in learning high-dimensional, complex data manifolds, e.g., natural images, especially when lacking additional supervision from conditional information such as annotated labels of images. Moreover, the content of the images generated by GANs cannot be specified in advance.
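The standard analysis above can be checked numerically on a discrete support. The sketch below verifies that the optimal discriminator D*(x) = p(x)/(p(x)+q(x)) attains a value of 2·JS(p‖q) − log 4; the two toy distributions are illustrative, not from the paper.

```python
import numpy as np

# Toy check of the classic GAN analysis on a three-point discrete support.
p = np.array([0.5, 0.3, 0.2])   # "real" distribution
q = np.array([0.2, 0.3, 0.5])   # "generated" distribution

d_star = p / (p + q)            # optimal discriminator on each point

# Value of the inner maximization: E_p[log D*] + E_q[log(1 - D*)]
value = np.sum(p * np.log(d_star)) + np.sum(q * np.log(1.0 - d_star))

# Jensen-Shannon divergence computed directly from its definition
m = 0.5 * (p + q)
kl = lambda a, b: np.sum(a * np.log(a / b))
js = 0.5 * kl(p, m) + 0.5 * kl(q, m)

assert np.isclose(value, 2.0 * js - np.log(4.0))
```

The same cancellation holds for any pair of distributions on a common support, which is why minimizing the generator loss under the optimal discriminator reduces to minimizing the JS divergence.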

2.2 AC-GAN

Learning GANs with conditional information from the class labels of images can not only improve the training stability and generation quality of unconditional GANs but also achieve conditional generation, which has more practical value than unconditional generation.

One of the most representative conditional GANs is AC-GAN Odena et al. (2017), which utilizes an auxiliary classifier to learn the relationship between real data and labels and to enforce the generator, which accepts the conditional specification via conditional batch normalization de Vries et al. (2017), to synthesize data that is as classifiable as possible. The objective functions for the discriminator D, the auxiliary classifier C, and the generator G of AC-GAN are defined as follows (we follow the common practice in the literature and adopt the stable version instead of the original one):

\max_D \; \mathbb{E}_{x \sim p(x)}[\log D(x)] + \mathbb{E}_{x \sim q(x)}[\log(1 - D(x))], \quad \max_C \; \mathbb{E}_{(x,y) \sim p(x,y)}[\log C(y|x)], \quad \min_G \; \mathbb{E}_{x \sim q(x)}[\log(1 - D(x))] - \lambda\, \mathbb{E}_{(x,y) \sim q(x,y)}[\log C(y|x)],   (2)

where \lambda indicates the weight hyper-parameter of the classifier, and q(x,y) denotes the joint distribution of generated data and labels implied by the generator.

Proposition 1.

For a fixed generator, the optimal classifier of AC-GAN outputs:

C^*(y|x) = p(x,y)/p(x) = p(y|x).   (3)
Theorem 1.

Given the optimal classifier, at the equilibrium point, optimizing the classification task for the generator of AC-GAN is equivalent to:

\min_G \; \mathrm{KL}(q(x,y) \,\|\, p(x,y)) - \mathrm{KL}(q(x) \,\|\, p(x)) + H_q(y|x),   (4)

where H_q(y|x) denotes the conditional entropy of the labels of generated data given the data.

The proofs of Proposition 1 and Theorem 1 are given in Appendix A.1 and A.2, respectively. Theorem 1 exposes two shortcomings of AC-GAN. First, maximization of the KL divergence KL(q(x) ‖ p(x)) between the marginal generator distribution and the marginal data distribution contradicts the goal of conditional generative modeling, which matches q(x,y) with p(x,y). Although this issue can be mitigated to some extent by the adversarial training objective between the discriminator and the generator, which minimizes the JS divergence between the two marginal distributions, we find that it still has a negative impact on training stability (see Section 4.1). Second, minimization of the conditional entropy H_q(y|x) of labels given data with respect to the generated distribution requires that the label of generated data be completely determined by the data itself. It therefore forces the generated data of each class away from the classification hyperplane, which explains the low intra-class diversity of the samples generated by AC-GAN, especially when the distributions of different classes have non-negligible overlap; such overlap is evidenced by the fact that state-of-the-art classifiers cannot achieve 100% accuracy on real-world datasets.
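The decomposition behind Theorem 1 can be verified numerically. The sketch below checks, on arbitrary toy joint distributions (our own illustrative example, not the paper's data), that the generator's loss under the optimal AC-GAN classifier, −E_{q(x,y)}[log p(y|x)], equals KL(q(x,y)‖p(x,y)) − KL(q(x)‖p(x)) + H_q(y|x):

```python
import numpy as np

# Numerical check of Theorem 1's decomposition: with the optimal AC-GAN
# classifier C*(y|x) = p(y|x), the generator's classification loss
#   -E_{q(x,y)}[log p(y|x)]
# equals KL(q(x,y)||p(x,y)) - KL(q(x)||p(x)) + H_q(y|x).
rng = np.random.default_rng(0)
p_xy = rng.random((4, 3)); p_xy /= p_xy.sum()   # real joint: 4 x-values, 3 labels
q_xy = rng.random((4, 3)); q_xy /= q_xy.sum()   # generated joint

p_x, q_x = p_xy.sum(axis=1), q_xy.sum(axis=1)   # marginals over x

lhs = -np.sum(q_xy * np.log(p_xy / p_x[:, None]))       # generator's loss

kl_joint = np.sum(q_xy * np.log(q_xy / p_xy))            # KL(q(x,y) || p(x,y))
kl_marg = np.sum(q_x * np.log(q_x / p_x))                # KL(q(x) || p(x))
h_cond = -np.sum(q_xy * np.log(q_xy / q_x[:, None]))     # H_q(y|x)

assert np.isclose(lhs, kl_joint - kl_marg + h_cond)
```

The −KL(q(x)‖p(x)) term makes the marginal divergence grow when the loss is minimized, and the H_q(y|x) term is exactly the conditional-entropy minimization discussed above.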

3 Methods

The goal of conditional generative modeling is to faithfully approximate the underlying joint distribution of real data and labels regardless of the shape of the target joint distribution (i.e., whether the distributions of different classes overlap). Note that the learning of the generator in AC-GAN is affected by the classifier. In other words, the consequence of Theorem 1 stems from Proposition 1, which indicates that the optimal classifier of AC-GAN is agnostic to the density of the generated data distribution. The classifier therefore cannot know the discrepancy between the real and generated data distributions, resulting in a biased learning objective for the generator. Recall that the optimal discriminator is aware of both the real and generated data densities Goodfellow et al. (2014), and can thus provide the discrepancy between the real and generated data distributions to optimize the generator without bias. Intuitively, this density awareness arises because the discriminator attempts to distinguish between real and fake samples. Motivated by this understanding, we propose to make the classifier distinguish between real and fake samples, establishing a discriminative classifier that recognizes the labels of real and fake samples discriminatively. We then optimize the generator based on the discriminator and the discriminative classifier, which together are expected to provide the discrepancy between the target joint distribution and the generated joint distribution. Formally, the objective functions for the discriminator D, the discriminative classifier C, and the generator G of the proposed ADC-GAN are defined as follows:

\max_D \; \mathbb{E}_{x \sim p(x)}[\log D(x)] + \mathbb{E}_{x \sim q(x)}[\log(1 - D(x))], \quad \max_C \; \mathbb{E}_{(x,y) \sim p(x,y)}[\log C(y^+|x)] + \mathbb{E}_{(x,y) \sim q(x,y)}[\log C(y^-|x)], \quad \min_G \; \mathbb{E}_{x \sim q(x)}[\log(1 - D(x))] - \lambda\, \mathbb{E}_{(x,y) \sim q(x,y)}[\log C(y^+|x) - \log C(y^-|x)].   (5)
Figure 1: Illustration of the proposed ADC-GAN. The generator generates fake samples conditioned on the label y. The discriminator distinguishes between real and fake samples. The discriminative classifier is trained to simultaneously recognize the classes and the realness of samples.
Proposition 2.

For a fixed generator, the optimal classifier of ADC-GAN outputs:

C^*(y^+|x) = p(x,y)/(p(x) + q(x)), \quad C^*(y^-|x) = q(x,y)/(p(x) + q(x)).   (6)

The proof is given in Appendix A.3. Proposition 2 confirms that the discriminative classifier is aware of the densities of both the real and generated joint distributions, and can therefore provide the discrepancy needed to optimize the generator without bias.

Theorem 2.

Given the optimal classifier, at the equilibrium point, optimizing the classification task for the generator of ADC-GAN is equivalent to:

\min_G \; \mathrm{KL}(q(x,y) \,\|\, p(x,y)).   (7)

The proof is given in Appendix A.4. Theorem 2 suggests that the classifier by itself can, in theory, guarantee that the generator learns the joint distribution regardless of its shape; however, the optimal classifier is not easy to obtain in practice. We therefore retain the discriminator for training the generator, as illustrated in Figure 1 and Equation 5. Coupled with the adversarial training against the discriminator, the generator of the proposed ADC-GAN, under the optimal discriminator and classifier, can be regarded as minimizing the following divergences:

\min_G \; 2\,\mathrm{JS}(p(x) \,\|\, q(x)) + \lambda\,\mathrm{KL}(q(x,y) \,\|\, p(x,y)) - \log 4.   (8)

Since the optimal solution of conditional generative modeling is also the optimal solution of unconditional generative modeling, i.e., q(x,y) = p(x,y) implies q(x) = p(x), the additional training signal from the discriminator does not change the convergence point of the generator, which approximates the joint distribution of target data and labels at the optimum. Besides, the hyper-parameter \lambda provides the flexibility to adjust the weight of conditional generative modeling.
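Proposition 2 and Theorem 2 can also be checked numerically. The sketch below, on illustrative toy joint distributions of our own, verifies that under the optimal discriminative classifier the generator's objective E_{q(x,y)}[log C*(y⁺|x) − log C*(y⁻|x)] equals −KL(q(x,y)‖p(x,y)), so maximizing it drives q(x,y) toward p(x,y):

```python
import numpy as np

# Numerical check of Proposition 2 / Theorem 2 with the optimal classifier
# C*(y+|x) = p(x,y)/(p(x)+q(x)) and C*(y-|x) = q(x,y)/(p(x)+q(x)).
rng = np.random.default_rng(1)
p_xy = rng.random((5, 3)); p_xy /= p_xy.sum()   # real joint distribution
q_xy = rng.random((5, 3)); q_xy /= q_xy.sum()   # generated joint distribution
p_x, q_x = p_xy.sum(axis=1), q_xy.sum(axis=1)

norm = (p_x + q_x)[:, None]
c_real = p_xy / norm     # C*(y+|x): scores for "real class y"
c_fake = q_xy / norm     # C*(y-|x): scores for "fake class y"

# Generator objective under the optimal classifier; the shared
# normalizer (p(x)+q(x)) cancels inside the log-ratio.
gen_obj = np.sum(q_xy * (np.log(c_real) - np.log(c_fake)))
kl_joint = np.sum(q_xy * np.log(q_xy / p_xy))

assert np.isclose(gen_obj, -kl_joint)
assert gen_obj < 0   # strictly negative whenever q(x,y) != p(x,y)
```

Note how the shared normalizer cancels in the log-ratio: the discriminative classifier never needs to model a partition function, which is the property that PD-GAN loses (Section 4.2).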

4 Discussion

We acknowledge that we are not the only ones who have noticed and attempted to solve the low intra-class diversity problem of AC-GAN. In this section, we discuss the differences between the proposed ADC-GAN and the two most related works, TAC-GAN Gong et al. (2019) and PD-GAN Miyato and Koyama (2018), to demonstrate the superiority and rationality of the proposed method compared to them. Before going into the details, we summarize in Table 1 the theoretical learning objective of the generator under the optimal discriminator and classifier for the proposed method and existing cGANs.

4.1 TAC-GAN

TAC-GAN Gong et al. (2019) proposes to resolve the low intra-class diversity issue of AC-GAN by eliminating the conditional entropy with respect to the generated data distribution via learning of the generator with a second classifier C^{mi}. The objective functions for the discriminator D, the twin classifiers C and C^{mi}, and the generator G of TAC-GAN are defined as follows:

\max_D \; \mathbb{E}_{x \sim p(x)}[\log D(x)] + \mathbb{E}_{x \sim q(x)}[\log(1 - D(x))], \quad \max_C \; \mathbb{E}_{(x,y) \sim p(x,y)}[\log C(y|x)], \quad \max_{C^{mi}} \; \mathbb{E}_{(x,y) \sim q(x,y)}[\log C^{mi}(y|x)], \quad \min_G \; \mathbb{E}_{x \sim q(x)}[\log(1 - D(x))] - \lambda\, \mathbb{E}_{(x,y) \sim q(x,y)}[\log C(y|x) - \log C^{mi}(y|x)].   (9)
Method | Objective of the generator under the optimal discriminator and classifier
AC-GAN Odena et al. (2017) | \mathbb{D}(p(x) \,\|\, q(x)) + \lambda\,[\mathrm{KL}(q(x,y) \,\|\, p(x,y)) - \mathrm{KL}(q(x) \,\|\, p(x)) + H_q(y|x)]
TAC-GAN Gong et al. (2019) | \mathbb{D}(p(x) \,\|\, q(x)) + \lambda\,[\mathrm{KL}(q(x,y) \,\|\, p(x,y)) - \mathrm{KL}(q(x) \,\|\, p(x))]
ADC-GAN (ours) | \mathbb{D}(p(x) \,\|\, q(x)) + \lambda\,\mathrm{KL}(q(x,y) \,\|\, p(x,y))
PD-GAN Miyato and Koyama (2018) | \mathbb{D}(p(x,y) \,\|\, q(x,y))
Table 1: Summary of the learning objectives of cGANs. \mathbb{D} denotes any divergence or distance.
Proposition 3.

For a fixed generator, the two optimal classifiers of TAC-GAN output:

C^*(y|x) = p(y|x), \quad C^{mi*}(y|x) = q(y|x).   (10)
Proof.

The proof is similar to that of Proposition 1 in Appendix A.1, treating C and C^{mi} as two independent classifiers with respect to the distributions p(x,y) and q(x,y), respectively. ∎

Theorem 3.

Given the two optimal classifiers, at the equilibrium point, optimizing the classification tasks for the generator of TAC-GAN is equivalent to:

\min_G \; \mathrm{KL}(q(x,y) \,\|\, p(x,y)) - \mathrm{KL}(q(x) \,\|\, p(x)).   (11)

The proof is given in Appendix A.5. Theorem 3 reveals that the learning of the generator of TAC-GAN, given the optimal classifiers, can be considered as minimizing contradictory divergences: minimization between the joint distributions but maximization between the marginal distributions. Although in theory the JS divergence or other divergences Nowozin et al. (2016); Arjovsky et al. (2017) introduced through the adversarial training between the discriminator and the generator may remedy this issue, the optimal discriminator and classifier are difficult to obtain in practical optimization, so the contradiction cannot be guaranteed to be eliminated. We argue that the training instability of TAC-GAN reported in the literature Kocaoglu et al. (2018); Han et al. (2020) and found in our experiments can be explained by this analysis. The root of the issue is that the classifiers of TAC-GAN do not discriminate between real and fake samples and therefore cannot provide a true and reliable discrepancy between the real and generated joint distributions.
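Theorem 3 can be verified numerically as well. On illustrative toy joint distributions, the sketch below shows that the generator objective under TAC-GAN's optimal twin classifiers equals −[KL(q(x,y)‖p(x,y)) − KL(q(x)‖p(x))]: maximizing it minimizes the joint KL but maximizes the marginal KL, which is the contradiction discussed above.

```python
import numpy as np

# Numerical check of Theorem 3 with TAC-GAN's optimal twin classifiers
# C*(y|x) = p(y|x) (trained on real pairs) and C^mi*(y|x) = q(y|x)
# (trained on generated pairs).
rng = np.random.default_rng(2)
p_xy = rng.random((4, 3)); p_xy /= p_xy.sum()
q_xy = rng.random((4, 3)); q_xy /= q_xy.sum()
p_x, q_x = p_xy.sum(axis=1), q_xy.sum(axis=1)

c = p_xy / p_x[:, None]        # C*(y|x)
c_mi = q_xy / q_x[:, None]     # C^mi*(y|x)

gen_obj = np.sum(q_xy * (np.log(c) - np.log(c_mi)))

kl_joint = np.sum(q_xy * np.log(q_xy / p_xy))   # KL(q(x,y) || p(x,y))
kl_marg = np.sum(q_x * np.log(q_x / p_x))       # KL(q(x) || p(x))

# The marginal KL enters with the wrong sign: a contradictory objective.
assert np.isclose(gen_obj, -(kl_joint - kl_marg))
```

Contrast this with the ADC-GAN check above Equation 7, where the marginal term is absent and only the joint KL remains.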

4.2 PD-GAN

PD-GAN Miyato and Koyama (2018) injects the conditional information into the discriminator via an inner product between the embedding of the label and the feature of the data to obtain a joint discriminative score on the data-label pair. In this way, PD-GAN could ideally inherit the convergence property of GAN and avoid the low intra-class diversity issue of AC-GAN, as it inherits the loss function of GAN. The objective functions for the discriminator D and the generator G of PD-GAN are defined as follows:

\max_D \; \mathbb{E}_{(x,y) \sim p(x,y)}[\log D(x,y)] + \mathbb{E}_{(x,y) \sim q(x,y)}[\log(1 - D(x,y))], \quad \min_G \; \mathbb{E}_{(x,y) \sim q(x,y)}[\log(1 - D(x,y))].   (12)

Based on this minimax game, the optimal discriminator Goodfellow et al. (2014) has the following form:

D^*(x,y) = p(x,y) / (p(x,y) + q(x,y)),   (13)

with D(x,y) = \mathrm{sigmoid}(f(x,y)) and hence f^*(x,y) = \log(p(x,y)/q(x,y)). They accordingly decompose

f^*(x,y) = \log\frac{p(y|x)}{q(y|x)} + \log\frac{p(x)}{q(x)}.   (14)

However, PD-GAN actually ignores the log-partition term in Equation 14. (The authors argue that this term can be merged into \psi(\phi(x)); however, \psi does not consider any label information, which should instead be modeled by the partition term, so it is unreasonable to merge the two.) PD-GAN then proposes to construct the logit of the discriminator in the form of:

f(x,y) = v_y^\top \phi(x) + \psi(\phi(x)),   (15)

where v_y is the difference between the two learnable embeddings of label y defined in the two implicit conditional probabilities p(y|x) and q(y|x), \phi is the feature extractor of the data, and \psi outputs a scalar based on the extracted feature. Discarding the partition term means that PD-GAN no longer belongs to the probability models that model the conditional probabilities p(y|x) and q(y|x), so PD-GAN loses the complete dependencies between data and labels. By construction, PD-GAN injects the conditional information along with the data into the feedforward computation of the discriminator to obtain the joint discriminative score. However, a discriminator constructed according to the optimal form of the minimax GAN lacks theoretical guarantees when applied to other loss functions such as the hinge loss Lim and Ye (2017); Tran et al. (2017), which PD-GAN actually uses, and may even limit the function space of the discriminator. The proposed ADC-GAN can be flexibly applied with any version of the loss function, as we do not restrict the specific form of the discriminator.
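The projection logit of Equation 15 can be sketched concretely. The shapes and the random stand-in "networks" below are illustrative assumptions, not the paper's architecture; the point is the structure f(x,y) = v_yᵀφ(x) + ψ(φ(x)) and the fact that nothing normalizes the per-class scores into a proper conditional distribution once the partition term is dropped.

```python
import numpy as np

# Minimal sketch of the projection-discriminator logit of Equation 15.
rng = np.random.default_rng(3)
num_classes, feat_dim, data_dim = 10, 16, 32

W_phi = rng.normal(size=(feat_dim, data_dim)) / np.sqrt(data_dim)  # stand-in for phi
V = rng.normal(size=(num_classes, feat_dim))                       # label embeddings v_y
w_psi = rng.normal(size=feat_dim)                                  # stand-in scalar head psi

def pd_logit(x, y):
    feat = np.tanh(W_phi @ x)           # phi(x): extracted feature
    return V[y] @ feat + w_psi @ feat   # inner product + unconditional score

x = rng.normal(size=data_dim)
logits = np.array([pd_logit(x, y) for y in range(num_classes)])

# One scalar score per (x, y) pair; without the log-partition term these
# per-class logits need not correspond to a normalized conditional over y.
assert logits.shape == (num_classes,)
```

The unconditional term ψ(φ(x)) is shared across all labels, so it cannot absorb the label-dependent log-partition ratio, which is exactly the criticism raised above.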

5 Experiments

In this section, we conduct extensive experiments on both synthetic and real-world datasets to validate the superiority of the proposed method on conditional generative modeling and representation learning compared to competitive cGANs.

Figure 2: Distribution learning results on one-dimensional synthetic data.

5.1 Synthetic Data

We first experiment on a one-dimensional synthetic mixture of Gaussians to validate the distribution learning ability of different methods. As shown in the top-left of Figure 2, the real data distribution consists of three classes with non-negligible overlap between them. We implement both the generator and the discriminator as multi-layer perceptrons. In particular, we investigate three different settings of the GAN loss while keeping the learning of the generator with the classifier (if one exists) fixed. The first row, except for the data panel, shows the learned distributions, estimated by kernel density estimation Parzen (1962) on the generated data, of AC-GAN, TAC-GAN, and ADC-GAN without the original GAN loss. The second and third rows plot the results of these methods trained with the log loss Goodfellow et al. (2014) and the hinge loss Lim and Ye (2017); Tran et al. (2017), respectively. The poor performance of PD-GAN with the hinge loss confirms that it is sensitive to the loss function. AC-GAN tends to generate classifiable data and thus decreases intra-class diversity in all cases. TAC-GAN without the GAN loss cannot accurately reproduce the data distribution, verifying Theorem 3. The worse performance of TAC-GAN with the hinge loss compared with ADC-GAN confirms that the contradiction stated in Theorem 3 is not easily eliminated by the discriminator. As expected, the proposed ADC-GAN accurately replicates the data distribution even without the original GAN loss, verifying Theorem 2. For quantitative results, please refer to Appendix B.1.
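A setup of this kind can be sketched as follows: a one-dimensional, three-class Gaussian mixture with overlapping supports, plus a Parzen-window (kernel density) estimate like the one used to visualize the learned distributions in Figure 2. The means, widths, sample counts, and bandwidth below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Three overlapping Gaussian classes on the real line.
rng = np.random.default_rng(4)
means = [-2.0, 0.0, 2.0]  # one Gaussian per class; neighbors overlap substantially
samples = np.concatenate([rng.normal(m, 1.0, size=500) for m in means])

def parzen_kde(data, grid, bandwidth=0.3):
    """Gaussian-kernel density estimate of `data`, evaluated on `grid`."""
    diffs = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

grid = np.linspace(-6.0, 6.0, 241)
density = parzen_kde(samples, grid)

# Sanity checks: the estimate is non-negative and integrates to ~1.
assert np.all(density >= 0.0)
assert abs(np.sum(density) * (grid[1] - grid[0]) - 1.0) < 2e-2
```

Because the class components overlap, a method that pushes samples away from class boundaries (as AC-GAN's conditional-entropy term does) will visibly carve gaps out of a density like this one.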

5.2 Overlapping MNIST

Figure 3: Hyper-parameter robustness results on overlapping MNIST.

In this subsection, we experiment on a constructed dataset to investigate the hyper-parameter robustness of the proposed ADC-GAN compared to the existing classifier-based methods, i.e., AC-GAN and TAC-GAN. We follow the practice of Gong et al. (2019) to construct a two-class handwritten digit dataset from MNIST. The first class contains an equal number of digits '0' and '1', and the second class contains an equal number of digits '0' and '2'. In other words, the support regions of the two classes overlap on digit '0'. We vary the weight \lambda of the classifier over a multiplicative grid of values. As shown in Figure 3, AC-GAN wrongly generates digit '2' in the first class at small values of \lambda and discards digit '0' in the first class as \lambda increases; at the largest \lambda, digit '0' disappears entirely from the generated data of AC-GAN, indicating that its classifier encourages the generator to avoid learning the data in the overlapping region. In general, AC-GAN shows a significant decrease in intra-class diversity. TAC-GAN encounters mode collapse at certain values of \lambda, while ADC-GAN faithfully replicates the real data distribution regardless of the value of \lambda. These results suggest that the proposed method has excellent hyper-parameter robustness compared to the existing classifier-based methods, since the guidance the generator receives from the discriminator and the classifier is harmonious.
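The dataset construction described above can be sketched directly on label arrays. The stand-in `digits` array and the per-digit sample count below are illustrative assumptions; with real MNIST, `digits` would be the dataset's label vector and the indices would select images.

```python
import numpy as np

# Build the overlapping two-class split: class 0 = {digits '0', '1'},
# class 1 = {digits '0', '2'}, so the classes overlap on digit '0'.
rng = np.random.default_rng(5)
digits = rng.integers(0, 10, size=10_000)   # stand-in for MNIST digit labels

def make_overlapping_split(digits, rng, per_digit=500):
    xs, ys = [], []
    for new_class, members in enumerate([(0, 1), (0, 2)]):
        for d in members:
            idx = np.flatnonzero(digits == d)          # all examples of digit d
            idx = rng.choice(idx, size=per_digit, replace=False)
            xs.append(idx)                             # indices into the image array
            ys.append(np.full(per_digit, new_class))   # relabeled class
    return np.concatenate(xs), np.concatenate(ys)

idx, labels = make_overlapping_split(digits, rng)

# Both classes are balanced, and digit '0' appears in each of them.
assert np.bincount(labels).tolist() == [1000, 1000]
assert set(digits[idx[labels == 0]]) == {0, 1}
assert set(digits[idx[labels == 1]]) == {0, 2}
```

The overlap on digit '0' is what makes this split diagnostic: any method whose classifier term penalizes ambiguous samples will distort or drop the shared digit.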

5.3 CIFAR-10, CIFAR-100, and Tiny-ImageNet

Subsequently, we experiment on three real-world benchmark datasets: CIFAR-10, CIFAR-100, and Tiny-ImageNet. CIFAR-10 consists of 50k training images at a resolution of 32x32, divided into 10 classes. CIFAR-100 contains similar images to CIFAR-10 but has 100 classes rather than 10. Tiny-ImageNet contains 200 classes, where each class contains 500 images (100k images in total).

5.3.1 Image Generation

We implement all methods and conduct most experiments based on the TAC-GAN Gong et al. (2019) repository on GitHub (https://github.com/batmanlab/twin-auxiliary-classifiers-gan). However, we found that the repository does not clear the gradients between the multiple discriminator update steps; after fixing this bug, we could not reproduce the results reported in their paper. The backbone of all methods is BigGAN Brock et al. (2019). The optimizer is Adam. We use a single learning rate for both the generator and discriminator on CIFAR-10 and CIFAR-100, and separate learning rates for the generator and the discriminator on Tiny-ImageNet. All methods are trained with the same number of iterations and the same batch size. The discriminator and classifier are updated twice per generator update step. We follow the practice of Mescheder et al. (2018); Brock et al. (2019) to employ an exponential moving average (EMA) of the generator weights for evaluation, starting partway through training. The hyper-parameter \lambda is set to the same value for all methods that use it. We follow the practice of Miyato and Koyama (2018); Gong et al. (2019) to adopt the hinge loss Lim and Ye (2017); Tran et al. (2017) as an alternative to the original adversarial training objective. Optimizing the hinge loss can be viewed, for the generator under the optimal discriminator, as minimizing the total variation (TV) distance between the generator distribution and the data distribution Tan et al. (2019). We do not add the self-attention layer Zhang et al. (2019) for simplicity and to speed up training. In our implementation, the auxiliary discriminative classifier shares all layers except the head with the discriminator, so the proposed method adds only negligible parameters compared to unconditional GANs.
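The EMA evaluation trick mentioned above can be sketched in a few lines: an averaged copy of the generator parameters is blended toward the live parameters after every training step and used at test time. The decay value and the scalar "parameters" here are illustrative, not the paper's settings.

```python
import numpy as np

def ema_update(ema_params, params, decay=0.999):
    """Blend the current parameters into the running average, in place."""
    for k in params:
        ema_params[k] = decay * ema_params[k] + (1.0 - decay) * params[k]

params = {"w": np.zeros(3)}          # stand-in for generator weights
ema = {"w": params["w"].copy()}      # averaged copy used for evaluation

for step in range(1000):
    params["w"] += 0.01              # stand-in for an optimizer step
    ema_update(ema, params, decay=0.99)

# The EMA copy lags the raw weights but tracks them smoothly.
assert np.all(ema["w"] < params["w"])
assert np.all(ema["w"] > 0)
```

The averaged weights smooth out the oscillations of adversarial training, which is why EMA generators typically score better FID than the raw ones.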

Datasets Metrics PD-GAN AC-GAN TAC-GAN ADC-GAN
CIFAR-10 FID (↓)
IS (↑)
Accuracy (↑)
CIFAR-100 FID (↓)
IS (↑)
Accuracy (↑)
Tiny-ImageNet FID (↓)
IS (↑)
Accuracy (↑)
Table 2: Overall FID, IS, and Top-1 accuracy on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively. We report the FID and IS of AC-GAN and TAC-GAN before mode collapse.
(a) CIFAR-10
(b) CIFAR-100
(c) Tiny-ImageNet
Figure 4: FID curves of each method on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively.

Table 2 reports the overall FID and IS scores of the proposed ADC-GAN and competitive cGANs trained on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively. AC-GAN obtains considerably worse results as it lacks intra-class diversity (see Appendix B.3 for generated images). TAC-GAN and AC-GAN encounter mode collapse on CIFAR-100 and Tiny-ImageNet toward the end of training, so we report their best FID and IS scores here. For intra-class FID, i.e., the FID score calculated on each class, please refer to Appendix B.2. In general, ADC-GAN achieves comparable or even better performance than PD-GAN, validating the effectiveness of ADC-GAN in conditional generative modeling.

We show the training FID curves of the different methods in Figure 4. AC-GAN quickly falls into mode collapse on all datasets, and TAC-GAN achieves a relatively stable FID curve only on CIFAR-10, encountering mode collapse on the other datasets. These results reflect the inherent shortcoming of existing classifier-based cGANs: they minimize contradictory divergences, as concluded in Theorem 3. The proposed ADC-GAN achieves almost the same training stability as the state-of-the-art PD-GAN in terms of FID curves and does not encounter mode collapse.

5.3.2 Image Classification

To check whether the discriminator/classifier learns meaningful representations, we conduct image classification experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet based on the learned representations extracted from each residual block of the discriminator/classifier. Specifically, we train a linear classifier on top of the frozen features for a fixed number of epochs. The optimizer is Adam with an initial learning rate that is decayed at fixed epochs, as done in Chen et al. (2019). Arguably, if the discriminator/classifier effectively learns the complete and correct relationship between data and labels, then the representations extracted from it will obtain higher accuracy in this experiment Hou et al. (2021).

The Top-1 accuracy results are reported in Table 2 (indicated by the metric Accuracy). Due to space limitations, we report only the best accuracy across all residual blocks here; for the accuracy of every residual block, please refer to Appendix B.4. ADC-GAN significantly outperforms the baselines on all datasets, indicating that the discriminator/classifier of ADC-GAN learns more meaningful features of the data. The reason is that the discriminator/classifier needs to distinguish between real and fake data while simultaneously recognizing the labels of samples, which forces it to develop more powerful representation learning capabilities. Arguably, the more meaningful features learned by the discriminator/classifier can provide more informative and valuable guidance for learning the generator, helping it reach a better convergence point.

5.4 Stanford Dogs

In this section, we compare the proposed ADC-GAN with the state-of-the-art PD-GAN on the Stanford Dogs dataset, which contains 20,580 images of 120 classes of dogs. This dataset is more fine-grained, with smaller intra-class diversity, than Tiny-ImageNet. We use random horizontal flipping as data augmentation. Other experimental settings, including the optimizers and network architectures, are the same as those in the experiments on Tiny-ImageNet, except for the hyper-parameter \lambda of ADC-GAN and the number of discriminator update steps per generator update step. We report the qualitative and quantitative results in Figure 5. ADC-GAN surpasses PD-GAN in terms of FID as well as visual quality, showing promising conditional generative modeling ability on more fine-grained image datasets. The reason is that ADC-GAN is able to model the dependencies between data x and labels y, while PD-GAN cannot, which gives ADC-GAN more accurate control over conditional generation.

(a) PD-GAN
(b) ADC-GAN
Figure 5: Samples and FID of PD-GAN and ADC-GAN on the Stanford Dogs dataset.

6 Conclusions

In this paper, we present a novel conditional generative adversarial network (cGAN) framework with an auxiliary discriminative classifier to achieve accurate conditional generation. The discriminative classifier provides the generator with the discrepancy between the joint distribution of real data and labels and that of generated data and labels by discriminatively predicting the labels of real and fake data. Therefore, in theory, the generator faithfully learns the joint distribution of real data and labels under the optimal classifier along with the optimal discriminator. We also discuss the differences between the proposed method and related approaches and point out their potential issues and limitations. Extensive experimental results clearly demonstrate the superiority of the proposed method compared to existing cGANs. In the future, we will explore the scalability of the proposed method to large-scale datasets and extend it to general conditional generation tasks such as text-to-image synthesis.

References

  • [1] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 70, pp. 214–223. Cited by: §4.1.
  • [2] A. Brock, J. Donahue, and K. Simonyan (2019) Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations. Cited by: §1, §5.3.1.
  • [3] T. Chen, X. Zhai, M. Ritter, M. Lucic, and N. Houlsby (2019) Self-supervised GANs via auxiliary rotation loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: §1, §5.3.2.
  • [4] H. de Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, and A. C. Courville (2017) Modulating early visual processing by language. In Advances in Neural Information Processing Systems, Vol. 30. Cited by: §1, §2.2.
  • [5] M. Gong, Y. Xu, C. Li, K. Zhang, and K. Batmanghelich (2019) Twin auxilary classifiers GAN. In Advances in Neural Information Processing Systems, Vol. 32. Cited by: §1, §4.1, Table 1, §4, §5.2, §5.3.1.
  • [6] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, Vol. 27. Cited by: §1, §2.1, §3, §4.2, §5.1.
  • [7] L. Han, A. Stathopoulos, T. Xue, and D. Metaxas (2020) Unbiased auxiliary classifier GANs with MINE. arXiv preprint arXiv:2006.07567. Cited by: §4.1.
  • [8] L. Hou, H. Shen, Q. Cao, and X. Cheng (2021) Self-supervised GANs with label augmentation. arXiv preprint arXiv:2106.08601. Cited by: §5.3.2.
  • [9] M. Kang and J. Park (2020) ContraGAN: contrastive learning for conditional image generation. In Advances in Neural Information Processing Systems, Vol. 33, pp. 21357–21369. Cited by: §1.
  • [10] T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, and T. Aila (2020) Training generative adversarial networks with limited data. In Advances in Neural Information Processing Systems, Vol. 33, pp. 12104–12114. Cited by: §1.
  • [11] T. Karras, S. Laine, and T. Aila (2019) A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: §1.
  • [12] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila (2020) Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: §1.
  • [13] I. Kavalerov, W. Czaja, and R. Chellappa (2021) A multi-class hinge loss for conditional GANs. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1290–1299. Cited by: §1.
  • [14] M. Kocaoglu, C. Snyder, A. G. Dimakis, and S. Vishwanath (2018) CausalGAN: learning causal implicit generative models with adversarial training. In International Conference on Learning Representations. Cited by: §4.1.
  • [15] J. H. Lim and J. C. Ye (2017) Geometric GAN. arXiv preprint arXiv:1705.02894. Cited by: §4.2, §5.1, §5.3.1.
  • [16] Z. Lin, A. Khetan, G. Fanti, and S. Oh (2018) PacGAN: the power of two samples in generative adversarial networks. In Advances in Neural Information Processing Systems, Vol. 31. Cited by: §1.
  • [17] L. Mescheder, A. Geiger, and S. Nowozin (2018) Which training methods for GANs do actually converge? In Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 80, pp. 3481–3490. Cited by: §5.3.1.
  • [18] M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784. Cited by: §1.
  • [19] T. Miyato and M. Koyama (2018) cGANs with projection discriminator. In International Conference on Learning Representations. Cited by: §1, §4.2, Table 1, §4, §5.3.1.
  • [20] S. Nowozin, B. Cseke, and R. Tomioka (2016) f-GAN: training generative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems, Vol. 29. Cited by: §4.1.
  • [21] A. Odena, C. Olah, and J. Shlens (2017-06–11 Aug) Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y. W. Teh (Eds.), Proceedings of Machine Learning Research, Vol. 70, pp. 2642–2651. External Links: Link Cited by: §1, §2.2, Table 1.
  • [22] A. Odena (2016) Semi-supervised learning with generative adversarial networks. arXiv preprint arXiv:1606.01583. Cited by: §1.
  • [23] E. Parzen (1962)

    On estimation of a probability density function and mode

    .
    The annals of mathematical statistics 33 (3), pp. 1065–1076. Cited by: §5.1.
  • [24] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee (2016-20–22 Jun) Generative adversarial text to image synthesis. In Proceedings of The 33rd International Conference on Machine Learning, M. F. Balcan and K. Q. Weinberger (Eds.), Proceedings of Machine Learning Research, Vol. 48, New York, New York, USA, pp. 1060–1069. External Links: Link Cited by: §1.
  • [25] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, and X. Chen (2016) Improved techniques for training gans. In Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (Eds.), Vol. 29, pp. . External Links: Link Cited by: §1.
  • [26] R. Shu, H. Bui, and S. Ermon (2017) Ac-gan learns a biased distribution. In

    NIPS Workshop on Bayesian Deep Learning

    ,
    Vol. 8. Cited by: §1.
  • [27] Z. Tan, M. Chai, D. Chen, J. Liao, Q. Chu, L. Yuan, S. Tulyakov, and N. Yu (2020) MichiGAN: multi-input-conditioned hair image generation for portrait editing. arXiv preprint arXiv:2010.16417. Cited by: §1.
  • [28] Z. Tan, Y. Song, and Z. Ou (2019) Calibrated adversarial algorithms for generative modelling. Stat 8 (1), pp. e224. Cited by: §5.3.1.
  • [29] D. Tran, R. Ranganath, and D. Blei (2017) Hierarchical implicit models and likelihood-free variational inference. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30, pp. . External Links: Link Cited by: §4.2, §5.1, §5.3.1.
  • [30] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He (2018-06) AttnGAN: fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §1.
  • [31] X. Yan, J. Yang, K. Sohn, and H. Lee (2015) Attribute2Image: conditional image generation from visual attributes. arXiv preprint arXiv:1512.00570. Cited by: §1.
  • [32] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena (2019-09–15 Jun) Self-attention generative adversarial networks. In Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, pp. 7354–7363. External Links: Link Cited by: §5.3.1.
  • [33] P. Zhou, L. Xie, B. Ni, C. Geng, and Q. Tian (2020) Omni-gan: on the secrets of cgans and beyond. arXiv preprint arXiv:2011.13074. Cited by: §1.
  • [34] Z. Zhou, H. Cai, S. Rong, Y. Song, K. Ren, W. Zhang, J. Wang, and Y. Yu (2018) Activation maximization generative adversarial nets. In International Conference on Learning Representations, External Links: Link Cited by: §1.
  • [35] M. Zhu, P. Pan, W. Chen, and Y. Yang (2019-06) DM-gan: dynamic memory generative adversarial networks for text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §1.

Appendix A Proofs

A.1 Proof of Proposition 1

See Proposition 1.

Proof.

A.2 Proof of Theorem 1

See Theorem 1.

Proof.

A.3 Proof of Proposition 2

See Proposition 2.

Proof.

with and .

Therefore, the optimal classifier of ADC-GAN has the stated form, which concludes the proof.

A.4 Proof of Theorem 2

See Theorem 2.

Proof.

A.5 Proof of Theorem 3

See Theorem 3.

Proof.

Appendix B More Results

B.1 Synthetic Data

In this section, we report more results on experiments conducted on the one-dimensional synthetic data and on new two-dimensional synthetic data. The one-dimensional data consists of three Gaussian components with and , and the two-dimensional data is constructed similarly. To implement the generator, discriminator, and classifier, we use three-layer multi-layer perceptrons with hidden size of and the Tanh non-linearity. The optimizer is Adam with learning rate and betas . We train all methods for epochs with batch size of . Table 3 reports the quantitative maximum mean discrepancy (MMD) results on the one-dimensional synthetic data from Section 5.1; lower MMD indicates a better fit to the data distribution. Figure 7 and Table 4 show the qualitative and quantitative results, respectively, on the two-dimensional synthetic Gaussian data. In general, the proposed ADC-GAN consistently learns the data distribution under different loss-function settings.
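As a minimal sketch of the evaluation metric, the squared MMD with an RBF kernel can be computed as below; the kernel bandwidth, mixture means, and sample sizes here are illustrative assumptions, not the exact experimental settings.

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x (n, d) and y (m, d)
    under the RBF kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    def gram(a, b):
        sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-sq / (2 * sigma**2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

rng = np.random.default_rng(0)
# A three-component 1-D Gaussian mixture standing in for the real data.
real = np.concatenate([rng.normal(mu, 0.5, 500) for mu in (-3.0, 0.0, 3.0)])[:, None]
good = np.concatenate([rng.normal(mu, 0.5, 500) for mu in (-3.0, 0.0, 3.0)])[:, None]
bad = rng.normal(0.0, 0.5, 1500)[:, None]  # mode-collapsed generator output

mmd_good = rbf_mmd2(real, good)  # small: all three modes covered
mmd_bad = rbf_mmd2(real, bad)    # large: two of three modes missing
```

A generator that covers all modes yields a much smaller MMD than a mode-collapsed one, which is exactly the behavior Tables 3 and 4 quantify.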

Figure 6: Distribution learning results on the one-dimensional synthetic data.
GAN Loss   Class      PD-GAN   AC-GAN   TAC-GAN   ADC-GAN
No         Class0     -
           Class1     -
           Class2     -
           Marginal   -
Log        Class0
           Class1
           Class2
           Marginal
Hinge      Class0
           Class1
           Class2
           Marginal
Table 3: MMD () results of each method on the one-dimensional synthetic data.
Figure 7: Distribution learning results on the two-dimensional synthetic data.
GAN Loss   Class      PD-GAN   AC-GAN   TAC-GAN   ADC-GAN
No         Class0     -
           Class1     -
           Class2     -
           Marginal   -
Log        Class0
           Class1
           Class2
           Marginal
Hinge      Class0
           Class1
           Class2
           Marginal
Table 4: MMD () results of each method on the two-dimensional synthetic data.

B.2 Intra-class FID on CIFAR-10, CIFAR-100, and Tiny-ImageNet

We calculate the FID of each class and report the results on CIFAR-10 in Figure 8 and Table 5. ADC-GAN achieves a better average rank and average FID than PD-GAN and TAC-GAN. In Figure 9, we compare the proposed ADC-GAN with PD-GAN on the per-class FID scores on CIFAR-100 and Tiny-ImageNet. The abscissa and ordinate values of a point represent the FID scores of PD-GAN and ADC-GAN, respectively, for one class. Accordingly, a point below the dotted diagonal line indicates that ADC-GAN outperforms PD-GAN on that class, and vice versa. Note that we adopt the multi-class hinge loss as an alternative to the cross-entropy loss in Equation 5 when training on Tiny-ImageNet. Overall, the proposed ADC-GAN achieves performance comparable to PD-GAN on CIFAR-100 and Tiny-ImageNet.
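For reference, per-class (intra-class) FID can be sketched as follows. In practice the features come from a pretrained Inception network; the helper names, random placeholder features, and the eigenvalue-based matrix square root below are illustrative assumptions.

```python
import numpy as np

def fid(feat_a, feat_b):
    """Frechet distance between Gaussians fitted to two feature sets (n, d):
    ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})."""
    mu_a, mu_b = feat_a.mean(0), feat_b.mean(0)
    cov_a = np.cov(feat_a, rowvar=False)
    cov_b = np.cov(feat_b, rowvar=False)
    # Tr((S_a S_b)^{1/2}) equals the sum of square roots of the eigenvalues
    # of S_a S_b, which are real and non-negative for PSD factors.
    eigvals = np.linalg.eigvals(cov_a @ cov_b).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt

def intra_class_fid(real_feats, real_labels, fake_feats, fake_labels, num_classes):
    """Per-class FID: compare real and generated features one label at a time."""
    return [fid(real_feats[real_labels == c], fake_feats[fake_labels == c])
            for c in range(num_classes)]

# Placeholder features for two classes (400 samples, 8 dims).
rng = np.random.default_rng(1)
real_feats = rng.normal(0.0, 1.0, (400, 8))
fake_feats = rng.normal(0.5, 1.0, (400, 8))
labels = np.repeat(np.arange(2), 200)
per_class = intra_class_fid(real_feats, labels, fake_feats, labels, num_classes=2)
```

Scatter-plotting one method's per-class scores against another's (as in Figure 9) then reduces to one point per entry of `per_class`.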

Table 5: Averaged rank and FID on CIFAR-10.

Methods    Rank-avg   FID-avg
PD-GAN
TAC-GAN
ADC-GAN

Figure 8: FID scores of PD-GAN, TAC-GAN, and ADC-GAN for each class on CIFAR-10.
(a) CIFAR-100
(b) Tiny-ImageNet
Figure 9: Intra-class FID comparison between ADC-GAN and PD-GAN on CIFAR-100 and Tiny-ImageNet. The abscissa and ordinate values of a point represent the FID scores of PD-GAN and ADC-GAN for one class, respectively. The diagonal dotted line is drawn to ease visual comparison.

B.3 Generated Images on CIFAR-100 and Tiny-ImageNet

(a) AC-GAN
(b) TAC-GAN
(c) ADC-GAN
(d) PD-GAN
Figure 10: Generated samples of each method on CIFAR-100.
(a) AC-GAN
(b) TAC-GAN
(c) ADC-GAN
(d) PD-GAN
Figure 11: Generated samples of each method on Tiny-ImageNet.

B.4 Representation Learning on CIFAR-10, CIFAR-100, and Tiny-ImageNet

In this section, we report all the results of the representation learning experiment. As shown in Table 6, PD-GAN achieves higher accuracy on features from the shallow layers, AC-GAN and TAC-GAN perform better in the middle layers, and the proposed ADC-GAN achieves the highest classification accuracy on features extracted from the deep layers. These results indicate that PD-GAN retains only low-level features and loses high-level ones, whereas AC-GAN and TAC-GAN capture middle-level features well, which we attribute to their classifiers being trained with supervision from the annotated labels. The significant advantage of ADC-GAN over the baselines on deep layers demonstrates that its discriminator/classifier has a strong capability to learn high-level features. We believe the reason is that the classifier of ADC-GAN must distinguish between real and fake data while simultaneously recognizing their labels, which encourages it to learn more meaningful representations.
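The linear-evaluation protocol behind Table 6 can be sketched as follows; the synthetic features stand in for activations from one frozen discriminator residual block, the plain-NumPy logistic-regression probe is an illustrative assumption, and the train/test split is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder features: stand-ins for activations taken from a frozen
# residual block of a trained discriminator (600 samples, 16 dims, 3 classes).
num_classes, dim, n = 3, 16, 600
labels = rng.integers(0, num_classes, n)
centers = rng.normal(0.0, 3.0, (num_classes, dim))
feats = centers[labels] + rng.normal(0.0, 1.0, (n, dim))

def train_linear_probe(x, y, num_classes, lr=0.1, epochs=200):
    """Multinomial logistic regression trained by full-batch gradient descent;
    the feature extractor itself stays frozen."""
    w = np.zeros((x.shape[1], num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(1, keepdims=True)
        grad = (probs - onehot) / len(x)         # softmax cross-entropy gradient
        w -= lr * x.T @ grad
        b -= lr * grad.sum(0)
    return w, b

w, b = train_linear_probe(feats, labels, num_classes)
top1 = np.mean((feats @ w + b).argmax(1) == labels)  # top-1 accuracy, as in Table 6
```

Running the same probe on features from each residual block, one block at a time, yields the per-block accuracies compared in Table 6: the more linearly separable the frozen features, the higher the probe's accuracy.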

Dataset         Res-Block   PD-GAN   AC-GAN   TAC-GAN   ADC-GAN
CIFAR-10        Block-1
                Block-2
                Block-3
                Block-4
                Best
CIFAR-100       Block-1
                Block-2
                Block-3
                Block-4
                Best
Tiny-ImageNet   Block-1
                Block-2
                Block-3
                Block-4
                Block-5
                Best
Table 6: Top-1 accuracy results of a linear classifier based on representations from every residual block of the discriminator of each method on CIFAR-10, CIFAR-100, and Tiny-ImageNet.