1 Introduction
Generative Adversarial Networks (GANs) (Goodfellow et al., 2014)
learn a function that can sample from an approximated probability distribution. Due to enormous interest, GANs have been improved substantially over the past few years
(Radford et al., 2015; Gulrajani et al., 2017; Miyato et al., 2018; Karras et al., 2017; Mescheder, 2018). GANs are designed to learn a single distribution, though multiple distributions can be modeled by treating them separately. However, this naive approach does not consider relationships between the distributions. An interesting question is how to model multiple distributions efficiently and discover their common and unique aspects. We explain this situation by means of Venn diagrams. Figure 1 depicts several cases of interactions between 3 sets, where each set represents a distribution: in some cases, each set has its own unique part and intersections with the other sets, whereas in others, some sets are supersets of others. Each case can be useful in a different scenario; the superset case, for instance, applies when one distribution is a subset of another, such as a specific dog breed within the distribution of many different dog breeds.
In this paper, we propose Venn GAN, which models multiple distributions efficiently and discovers their interactions and uniqueness. Each data distribution is modeled with a mixture of generator distributions. As the generators are partially shared between the modeling of different true data distributions, shared ones capture the commonality of the distributions, while non-shared ones capture their unique aspects. Our contributions are the following:

Introducing a novel and interesting problem setting where multiple distributions exist in various configurations (see Figure 1).

Proposing a new method that captures commonalities and particularities of various distributions with a high success rate.

Thoroughly evaluating the method on various datasets, namely MNIST, Fashion-MNIST, CIFAR-10, Omniglot and CelebA, with compelling results.
2 Related work
Multi-generator/discriminator GANs: There have been attempts to use multiple generators or discriminators to address various issues with GANs. Arora et al. (2017); Hoang et al. (2017); Ghosh et al. (2017)
modeled a single distribution with multiple generators to capture different modes of the distribution; in order to guide the generators toward different modes, they utilized a classifier which separates the generators from one another.
Durugkar et al. (2016); Neyshabur et al. (2017); Juefei-Xu et al. (2017) utilized multiple discriminators to address mode collapse and optimization stability. Similarly, Doan et al. (2018) used multiple discriminators with learned importance to ease the training of GANs. Tolstikhin et al. (2017) used a meta-learning algorithm analogous to AdaBoost to improve the coverage of modes with multiple generators.
Mixture of Distributions with GANs: Some earlier works considered multiple generators as a mixture of distributions to model a single distribution (Arora et al., 2017; Hoang et al., 2017; Ghosh et al., 2017). Our model is different, as we model multiple data distributions and share the generator distributions as components of each data distribution.
Conditional GANs: This type of GAN uses a condition, alongside noise, to generate data (Mirza & Osindero, 2014); the conditions are intended to correlate with the generated data. Conditioning has been used for image-to-image translation (Isola et al., 2016; Hoang et al., 2018; Yi et al., 2017), text-to-image synthesis (Reed et al., 2016) and image super-resolution (Ledig et al., 2016). The way GANs are conditioned is still an active research field; we focus on conditioning of the generator. The most common way to include conditions in the generator is to provide them as input (Mirza & Osindero, 2014; Reed et al., 2016; Odena et al., 2016). Recently, Miyato & Koyama (2018) used conditional BatchNorm (de Vries et al., 2017; Dumoulin et al., 2017) to include conditions in the generator.
Other Related Works: The concurrent work of Kaneko et al. (2018) is perhaps the most similar to ours, though their motivation, method and experiments differ. They are motivated by ambiguous class labels due to label noise and propose a model to discover class-distinct and class-mutual parts. Their method utilizes a modified version of AC-GAN and redesigns its input to achieve the objective, while our work scales the GAN objective to multiple distributions and models each distribution as a mixture of generator distributions.
3 Method
3.1 Background
A GAN is a two-player zero-sum game between a discriminator and a generator:
(1)  V(D, G) = \mathbb{E}_{x \sim p_d}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

(2)  \min_G \max_D V(D, G)
It utilizes a discriminator to assess a pseudo-divergence between the true data distribution, p_d, and the generator's distribution, p_g. The discriminator maximizes the divergence, while the generator minimizes it; in this way, the generator learns to mimic the data distribution implicitly. Goodfellow et al. (2014) show that, under certain assumptions, for a fixed optimal D, minimizing Eq. 2 for G leads to p_g = p_d.
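The classic optimality result above can be checked numerically on a small discrete support. The sketch below (with made-up distributions, not anything from the paper) verifies that D*(x) = p_d(x) / (p_d(x) + p_g(x)) maximizes the value function for a fixed generator:

```python
import numpy as np

# Toy check of the classic GAN result on a discrete 3-point support:
# for fixed G, V(D, G) = E_{x~p_d}[log D(x)] + E_{x~p_g}[log(1 - D(x))]
# is maximized pointwise by D*(x) = p_d(x) / (p_d(x) + p_g(x)).
# The two distributions below are hypothetical, chosen for illustration.

p_d = np.array([0.5, 0.3, 0.2])   # "true" data distribution
p_g = np.array([0.2, 0.5, 0.3])   # generator distribution

def value(D):
    """Value function V(D, G) on the discrete support."""
    return np.sum(p_d * np.log(D) + p_g * np.log(1.0 - D))

D_star = p_d / (p_d + p_g)

# Perturbing D* in any direction should never increase V.
rng = np.random.default_rng(0)
for _ in range(100):
    eps = rng.uniform(-0.01, 0.01, size=3)
    D = np.clip(D_star + eps, 1e-6, 1 - 1e-6)
    assert value(D) <= value(D_star) + 1e-12
```

Because V is strictly concave in each D(x), the pointwise optimum is unique, which is what the perturbation loop confirms empirically.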
3.2 Multi-distribution GAN
The value function in Eq. 1 can be scaled to K distributions trivially as follows:
(3)  V(\{D_i\}, \{G_i\}) = \sum_{i=1}^{K} \Big( \mathbb{E}_{x \sim p_{d_i}}[\log D_i(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D_i(G_i(z)))] \Big)

(4)  \min_{\{G_i\}} \max_{\{D_i\}} \sum_{i=1}^{K} \Big( \mathbb{E}_{x \sim p_{d_i}}[\log D_i(x)] + \mathbb{E}_{x \sim p_{g_i}}[\log(1 - D_i(x))] \Big)
where p_{d_i} is the i-th true data distribution and p_{g_i} is the i-th generator's distribution, which are independent from one another (Eq. 4 does not explicitly show G_i but p_{g_i}, the distribution of the i-th generator). Note that p_{d_i} and p_{g_j} interact with one another only when i = j. This makes learning one distribution independent from the others. By following the proof from Goodfellow et al. (2014), we can show that, at equilibrium, p_{g_i} = p_{d_i} for every i.
However, this objective does not consider possible overlaps between the data distributions. Incorporating them can make the model more efficient and lead to interesting discoveries, e.g. commonalities and particularities of the distributions. In order to achieve this, we have reformulated the way we construct the generator distributions, p_{g_i}: each is no longer the distribution of the i-th generator, but a mixture of generator distributions q_j,

(5)  p_{g_i} = \sum_{j} \pi_{ij} q_j, \qquad \sum_{j} \pi_{ij} = 1

In this way, each data distribution is modeled as a mixture of generators' distributions. As the q_j are shared across the data distributions, some of them cover common parts and others unique ones. Each generator learns only a subpart of the distributions, and these subparts are combined in different proportions to form the data distributions.
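The mixture construction can be sketched on a discrete support. The generator distributions and the weight matrix `pi` below are made-up illustrative values, not the paper's learned quantities:

```python
import numpy as np

# Sketch: two data distributions modeled as mixtures of three shared
# generator distributions q_j with mixture weights pi[i, j]. The shared
# component q_2 supplies the common part of both data distributions.

q = np.array([
    [1.0, 0.0, 0.0, 0.0],   # q_0: unique to data distribution 1
    [0.0, 1.0, 0.0, 0.0],   # q_1: unique to data distribution 2
    [0.0, 0.0, 0.5, 0.5],   # q_2: shared component
])

pi = np.array([             # rows: data distributions, columns: generators
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
])

p_model = pi @ q            # each row is one modeled data distribution

assert np.allclose(p_model.sum(axis=1), 1.0)      # valid distributions
assert p_model[0, 2] > 0 and p_model[1, 2] > 0    # q_2 feeds both
```

The same matrix-product view scales to any number of distributions and generators; only the sparsity pattern of `pi` changes with the Venn configuration.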
3.3 Conceptual Explanation: Relation to Venn Diagrams
The method in the previous section can be explained by using Venn diagrams, where each set represents a distribution. We deal with a situation where multiple distributions exist. Each distribution might have a unique part and commonalities with other distributions, or one distribution's support might cover another's (Figure 1). Our proposed method models each region of a Venn diagram as a probability distribution. Each set should capture its corresponding data distribution, and each set can be represented by the union of its regions; conversely, each region can be expressed with set operations over the sets. Set configurations can take different forms, as depicted in Figure 1.
For example, a fully overlapping two-set diagram can be represented by:

(6)  p_{d_1} = \tfrac{1}{2}(q_{10} + q_{11}), \qquad p_{d_2} = \tfrac{1}{2}(q_{01} + q_{11})

Similarly, a superset-type diagram, where the second set covers the first, can be represented by:

(7)  p_{d_1} = q_{11}, \qquad p_{d_2} = \tfrac{1}{2}(q_{01} + q_{11})

where each subscript bit indicates whether the region belongs to the corresponding set.
In both cases we assume that each region contributes equally. Learning mixture weights is left for future study.
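The equal-contribution assumption can be made concrete by constructing the mixture weight matrix for a fully overlapping 3-set diagram (7 regions, 4 per set). This is a sketch; the bitmask encoding of regions is our own choice for illustration, not necessarily the paper's:

```python
import numpy as np

n_sets = 3
regions = list(range(1, 2 ** n_sets))   # bitmasks 1..7: all nonempty subsets

# W[i, k] > 0 iff set i contains region k; rows normalized so that each
# region of a set contributes equally (the assumption stated above).
W = np.zeros((n_sets, len(regions)))
for i in range(n_sets):
    for k, r in enumerate(regions):
        if r & (1 << i):                 # set i participates in region r
            W[i, k] = 1.0
W /= W.sum(axis=1, keepdims=True)

assert W.shape == (3, 7)
assert np.allclose(W.sum(axis=1), 1.0)                    # rows are mixtures
assert all(int((W[i] > 0).sum()) == 4 for i in range(3))  # 4 regions per set
```

Other Venn configurations correspond to zeroing out rows' entries for regions that do not exist in that configuration and renormalizing.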
3.4 Implementation Details
Generator side: We can use two approaches to model the region generators. The first is independent generators, one per region: region j is modeled by G_j(z; θ_j), where G_j is a generative network, z is input noise, and θ_j are the parameters of the j-th network. The second approach is a single generator with conditions: region j is modeled by G(z, c_j; θ), where c_j is the condition used to generate region j and θ are the network parameters. The former approach can be expensive when there are many regions to model, but it has its own advantages, as we will show in the experiments. The conditional generator is more efficient, as the number of regions grows exponentially with the number of distributions, e.g. K distributions contain up to 2^K − 1 regions. Also, sharing weights across regions regularizes the model and makes training easier. Besides, this type of generator has certain effects on modeling, namely that different conditions with the same noise produce semantically related samples, as detailed in the CelebA experiments. We use both types and discuss their advantages and disadvantages in more detail in the experiment section.
Discriminator side: There should be K discriminators for a K-distribution game. As we have changed each generator distribution into a mixture of distributions, each discriminator takes input from every incoming generator that has a non-zero mixture weight. Figure 2 illustrates these connections for one diagram type; other types can be constructed in a similar way by following the connection pattern from the weight matrix π. When sampling from the generators to feed into discriminator D_i, the number of samples from each generator should be proportional to the i-th row of π. The union sign in the diagram corresponds to the union operation over the incoming regions; in practice it is concatenation over the batch dimension. As each set should represent a true data distribution, the union of the regions belonging to a set should match its data distribution. In order to satisfy this, each discriminator D_i compares a specific true data distribution p_{d_i} with the union of the regions belonging to the corresponding set. As certain regions are fed into more than one discriminator, those regions are forced to represent common parts of the distributions. For example, the region shared by all three sets will suffer a loss if its modeling does not satisfy the 3-way intersection of the distributions; in other words, it receives negative feedback from any discriminator it fails to satisfy. Similar reasoning applies to the 2-way intersections, whereas regions belonging to a single set are only used by a single discriminator and are therefore inclined to model the unique part of their corresponding distribution. Sharing regions between discriminators that receive different true data distributions is the core dynamic for learning commonalities between those distributions. We make the assumption that all the regions in a distribution have equal weights.
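The batch-assembly rule described above can be sketched as follows, with stand-in generators that merely tag samples with their region index; we assume the proportions divide the batch size evenly:

```python
import numpy as np

def fake_batch(w_row, batch_size):
    """Assemble one discriminator's fake batch: draw from each generator in
    proportion to its mixture weight, then concatenate (the 'union')."""
    counts = (w_row * batch_size).astype(int)          # proportional counts
    parts = [np.full((c, 1), j, dtype=float)           # generator j's "samples"
             for j, c in enumerate(counts) if c > 0]
    return np.concatenate(parts, axis=0)               # concat over batch dim

# A row of the weight matrix for a set covering 4 of 7 regions equally.
w_row = np.array([0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0])
batch = fake_batch(w_row, 64)

assert batch.shape == (64, 1)                          # 16 samples x 4 regions
assert set(np.unique(batch)) == {0.0, 1.0, 2.0, 3.0}   # only connected regions
```

In a real implementation each `np.full(...)` would be replaced by a forward pass through the corresponding generator, but the proportional-count-then-concatenate logic is the same.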
The objective of the model is the minimax game with K discriminators for K distributions stated in Eq. 3 and Eq. 4. Eq. 5 shows how p_{g_i} can be represented; from the Venn diagram perspective, it can also be written as:

(8)  p_{g_i} = \frac{1}{N_i} \sum_{r \in R_i} q_r

where R_i is the set of regions in set i and N_i = |R_i| is the number of regions in set i.
In practice, we observe some amount of leakage between regions. In order to alleviate this issue, we include an additional objective, which aims to separate the regions of the generators from one another:

(9)  L_c = \mathbb{E}_{j,\, z \sim p_z}\big[ -\log C_{y_j}(G_j(z)) \big]

where y_j is the category (region label) for G_j and C is a classifier which outputs a probability distribution over the regions. With this objective, the classifier tries to separate the regions, and the generator tries to satisfy the classifier by increasing the differences between the regions. Similar losses have been used previously (Hoang et al., 2018). The combined objective becomes:
(10)  V_{total} = V + \lambda L_c

where λ is a balancing hyperparameter between the two terms.
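A minimal numeric sketch of such a combined objective, using a standard softmax cross-entropy as the classifier term; the logits, labels, weight, and the placeholder GAN term are all made-up illustrative values:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # stable softmax
    return e / e.sum(axis=1, keepdims=True)

def region_ce(logits, region_labels):
    """Cross-entropy of classifier outputs against region labels."""
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(region_labels)), region_labels]))

# Two samples, three regions; each sample confidently in its own region.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.0, 4.0, 0.0]])
labels = np.array([0, 1])

gan_loss = 1.386          # placeholder for the GAN value term
lam = 0.1                 # balancing hyperparameter (the lambda above)
total = gan_loss + lam * region_ce(logits, labels)

# Well-separated regions incur less penalty than mixed-up ones.
assert region_ce(logits, labels) < region_ce(logits, labels[::-1])
```

The generator lowers this term by making each region's samples recognizable to the classifier, which is exactly the anti-leakage pressure described above.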
4 Experiments
Network Architecture: Discriminator and generator architectures are similar to DCGAN (Radford et al., 2015) for MNIST, Fashion-MNIST, Omniglot and CIFAR-10, while CelebA uses a ResNet-type architecture; detailed specifications are given in the Appendix. The classifier architecture is the same as the discriminator's except for the last layer, whose output dimension equals the number of regions. Exponential Moving Average (EMA) (Karras et al., 2017; Yazıcı et al., 2018) has been applied to the generator parameters outside the training loop. Conditioning of the generator is similar to that of Miyato & Koyama (2018); de Vries et al. (2017); Dumoulin et al. (2017), except that there is no normalization, only scaling and addition.
Objective Details: A zero-centered gradient penalty (Mescheder, 2018) has been applied on the true data distributions for each discriminator in every case but the illustrative examples. We found that this improves the quality of generation, especially on CelebA.
Optimization & Hyperparameters: We have used the ADAM optimizer (Kingma & Ba, 2014). The optimization of the discriminators and generators follows an alternating update rule with a single discriminator update per generator update. The model has been trained for 100k iterations for CelebA, 50k for CIFAR-10, and 20k for MNIST, Fashion-MNIST and Omniglot. For each region we use a fixed batch size, with a smaller one for the illustrative example. The batch size of real data depends on the number of regions fed to each discriminator: the union over regions corresponds to a correspondingly larger batch. λ is selected by searching over a range of values with the quantitative score explained below, over various scenarios. The classifier's optimization is the same as the discriminators'. For the conditional generator, we have used the same noise for different conditions during training. The illustrative example does not use a classifier.

Quantification of Results: For artificial datasets, we can quantify the rate of correct generation (accuracy) for different regions. In order to achieve this, we have trained a separate classifier on MNIST, Fashion-MNIST (Xiao et al., 2017) and CIFAR-10 (Krizhevsky et al.) using their training splits. This model is used to assess whether the generated images from each region belong to the correct classes. We use 10k generated samples from each region to compute the metric, and report the classifier's accuracy on each region. Details about the architecture, optimization, etc. of this classifier can be found in the Appendix. The accuracies of the classifier on the test sets of MNIST, Fashion-MNIST and CIFAR-10 are 99.12, 91.20 and 84.20, respectively. During Venn GAN training, we evaluate the model every 2k iterations and report the best average results.
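The region-accuracy metric described above can be sketched as follows; the predicted labels are made-up stand-ins for a pretrained classifier's outputs:

```python
import numpy as np

def region_accuracy(predicted_classes, allowed_classes):
    """Fraction of a region's generated samples whose predicted class is
    among the classes that region is supposed to contain."""
    allowed = np.isin(predicted_classes, list(allowed_classes))
    return allowed.mean()

# e.g. a region meant to contain classes {0, 1}; one sample leaks class 7.
preds = np.array([0, 1, 1, 0, 7, 0, 1, 1, 0, 1])
acc = region_accuracy(preds, {0, 1})

assert abs(acc - 0.9) < 1e-12   # 9 of 10 predictions fall in {0, 1}
```

Averaging this score over all regions gives the per-case averages reported in the results table.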
4.1 Illustrative Examples
We use a mixture-of-Gaussians example to show that the method works as anticipated. The nature of the dataset and its low dimensionality make it easy to spot subtle behaviours of the method. For this experiment, we generate 3 different data distributions, where each data distribution equally mixes 4 out of 7 Gaussians, as in Figure 3.
In order to model these distributions, we have used the fully overlapping 3-set type. The experiment is conducted with independent generators for 5k iterations; further details about the training, architecture, etc. are in the Appendix. Figure 4 shows the results. All the regions are generated at the correct positions, e.g. the pink samples are generated by the generator shared by all three sets, which is the common mode of all the distributions. We have conducted this experiment multiple times with no notable differences, which shows the stability of the method.
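A sketch of how such a dataset could be constructed; the component means, noise scale, and component-to-distribution assignments are made-up illustrative choices, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# 7 Gaussian component means placed on the unit circle.
means = np.stack([[np.cos(t), np.sin(t)] for t in
                  np.linspace(0, 2 * np.pi, 7, endpoint=False)])

# Which components each of the 3 data distributions mixes (4 of 7 each):
# component 0 is shared by all three (3-way intersection), components
# 1, 3, 5 are pairwise intersections, and 2, 4, 6 are unique.
assignment = [(0, 1, 2, 3), (0, 3, 4, 5), (0, 5, 6, 1)]

def sample(dist_idx, n):
    """Draw n points from data distribution dist_idx, mixing equally."""
    comps = rng.choice(assignment[dist_idx], size=n)
    return means[comps] + 0.05 * rng.standard_normal((n, 2))

x = sample(0, 1000)
assert x.shape == (1000, 2)
```

This layout reproduces the qualitative structure of the experiment: one mode common to all three distributions, three pairwise-shared modes, and three unique ones.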
4.2 Main Experiments
We have designed experiments on multiple artificial as well as natural datasets to investigate the working dynamics of the method. For the artificial datasets, MNIST, Fashion-MNIST (Xiao et al., 2017) and CIFAR-10 (Krizhevsky et al.) have been used. Using these datasets, we have designed 2- and 3-distribution games with different Venn diagram types. The distributions are constructed by using the class information from the datasets as per Table 2. For all types, each distribution contains 2000 samples from the classes it includes. We never use the same sample in different distributions, as that could lead to trivial solutions.
Case  Venn Type  Distributions  Sets 

A  2  ,  
B  3  , ,  
C  3  , , 
Fashion-MNIST  T-shirt/top  Trouser  Pullover  Dress  Coat  Sandal  Shirt  Sneaker  Bag  Ankle boot 

CIFAR-10  Airplane  Automobile  Bird  Cat  Deer  Dog  Frog  Horse  Ship  Truck 
Figures 5, 7 and 6 show the results for cases A, B and C, respectively. In case A of MNIST, all regions are correctly modeled, and similarly for Fashion-MNIST. CIFAR-10 image quality is not as good as the others, so it is not easy to make a judgment; however, among the recognizable classes we see that "Automobile", "Horse", "Ship" and "Truck" appear in the right regions. In case B of MNIST and Fashion-MNIST, the vast majority of the objects appear in the right regions with good image quality. For CIFAR-10, the results are decent for "Airplane", "Automobile" and "Deer"; for the other regions the quality is not satisfactory and there seems to be some amount of leaking. In case C, we see near-perfect performance for MNIST and Fashion-MNIST: objects are placed in the right regions and image quality is good enough to recognize them. Image quality on CIFAR-10 is again not very good, but the objects seem to be placed in the right regions; for example, one set only includes "Airplane", "Automobile" and "Bird", while another only includes "Frog", "Horse", "Ship" and "Truck".
Dataset  Case  Classifier  IG  Per-region accuracy  Avg  
MNIST  A  Yes  Yes  99.76  99.11  81.72  n/a  n/a  n/a  n/a  93.53 
MNIST  A  Yes  No  99.69  98.86  83.40  n/a  n/a  n/a  n/a  93.98 
FMNIST  A  Yes  Yes  91.37  87.75  80.17  n/a  n/a  n/a  n/a  86.43 
FMNIST  A  Yes  No  90.15  86.48  80.92  n/a  n/a  n/a  n/a  85.85 
CIFAR10  A  Yes  Yes  78.03  75.19  58.07  n/a  n/a  n/a  n/a  70.42 
CIFAR10  A  Yes  No  72.23  71.65  52.78  n/a  n/a  n/a  n/a  65.55 
MNIST  B  Yes  Yes  99.33  100.0  95.67  98.44  98.22  99.64  99.58  98.70 
MNIST  B  Yes  No  99.32  100.0  96.05  98.75  98.14  99.56  99.36  98.74 
FMNIST  B  Yes  Yes  73.03  97.36  70.43  68.33  91.02  92.09  18.59  72.97 
FMNIST  B  Yes  No  71.86  98.07  68.45  71.04  93.17  91.54  18.02  73.16 
CIFAR10  B  Yes  Yes  83.57  58.71  10.63  53.14  2.81  51.93  28.28  41.30 
CIFAR10  B  Yes  No  88.3  52.84  11.43  51.99  2.98  52.78  35.29  42.23 
MNIST  C  Yes  Yes  99.5  n/a  n/a  n/a  n/a  93.85  94.19  95.85 
MNIST  C  Yes  No  99.12  n/a  n/a  n/a  n/a  93.08  93.64  95.28 
FMNIST  C  Yes  Yes  94.88  n/a  n/a  n/a  n/a  85.25  67.49  82.54 
FMNIST  C  Yes  No  94.41  n/a  n/a  n/a  n/a  83.17  67.5  81.69 
CIFAR10  C  Yes  Yes  85.63  n/a  n/a  n/a  n/a  70.57  63.83  73.34 
CIFAR10  C  Yes  No  77.39  n/a  n/a  n/a  n/a  66.82  61.85  68.69 
MNIST  A  No  No  99.54  98.88  81.45  n/a  n/a  n/a  n/a  93.29 
FMNIST  A  No  No  90.48  86.59  80.12  n/a  n/a  n/a  n/a  85.73 
CIFAR10  A  No  Yes  76.4  73.32  60.93  n/a  n/a  n/a  n/a  70.22 
MNIST  B  No  No  98.72  99.99  95.28  99.08  97.40  99.29  99.29  98.43 
FMNIST  B  No  No  67.89  97.81  63.79  68.18  88.98  91.91  15.68  70.61 
CIFAR10  B  No  Yes  85.17  51.64  9.27  51.44  2.89  46.48  22.97  38.55 
MNIST  C  No  No  98.49  n/a  n/a  n/a  n/a  92.88  93.71  95.03 
FMNIST  C  No  No  92.57  n/a  n/a  n/a  n/a  84.04  67.24  81.28 
CIFAR10  C  No  Yes  86.14  n/a  n/a  n/a  n/a  71.85  61.77  73.25 
Table 3 lists the quantitative results for the experiments above. Interestingly, MNIST performs best in case B, while the same case is the hardest for Fashion-MNIST and CIFAR-10. We believe this is due to the clear separation between the classes in MNIST, while Fashion-MNIST has a few hard-to-distinguish classes such as "Pullover", "Coat" and "Shirt". As expected, average accuracy drops as the dataset becomes harder (MNIST, then Fashion-MNIST, then CIFAR-10).
Conditional Generator vs. Independent Generators: For MNIST and Fashion-MNIST, the conditional generator produces comparable or slightly better results, while independent generators are better for CIFAR-10. We postulate that for simple datasets, a single conditional generator has sufficient capacity to match the quality of multiple generators; moreover, sharing most of the weights across regions regularizes the training, as there are many common features between regions. For CIFAR-10, however, sharing weights may burden the representation of the different regions rather than regularize it.
Effect of the Classifier: As explained in the method section, we utilize a classifier to alleviate leakage between regions. In this section, we evaluate its effectiveness on various datasets. In order to reduce the number of settings, we use conditional generators for MNIST and Fashion-MNIST and independent generators for CIFAR-10, for the reasons explained in the previous section. The bottom section of Table 3 corresponds to 9 settings without the classifier term in the objective. Across all settings, the classifier brings slight but consistent improvements; for CIFAR-10, the improvements are more significant than for the other datasets.
Omniglot (Lake et al., 2015) contains letters from many alphabets. Each alphabet contains a certain number of letters, with only a small number of samples per letter, which makes this dataset hard to model. We have selected the "Cyrillic", "Greek" and "Latin" alphabets as 3 different distributions. As these alphabets include both unique and common letters, we aim to model them with the fully overlapping type to discover both.
In Figure 8, the first three regions correspond to letters unique to "Cyrillic", "Greek" and "Latin", in that order. The majority of the letters in each of these regions belong to their own alphabet and not to the others. For the other regions there are more mistakes, such as the letter "o" appearing in multiple regions.
CelebA (Liu et al., 2015): For this dataset, we use both the overlapping and the superset type with two distributions. In the overlapping case, the first distribution contains only male faces while the second contains only female faces. In the superset case, the first distribution contains only female faces while the second contains both genders. Our aim is to see whether semantic commonalities and differences between the distributions can be captured successfully. In the overlapping setting there should be no overlap in genders, but we are interested in what kind of commonalities our method can find. We have used the conditional generator for this experiment to see the semantic relations between the regions more clearly.
In the overlapping CelebA setting (Figure 9), one unique region depicts stereotypically masculine faces with short hair, whereas the other exhibits predominantly feminine features such as long hair. The shared region, on the other hand, features faces which are neither predominantly male nor female. As the images in different regions are generated with the same noise, the pose and background of an image remain similar across regions while the facial attributes change. Similarly, the superset CelebA setting (Figure 10) shows that the model captures the commonality of the distributions well, while the difference region contains faces of the gender unique to the superset distribution, as it should. Again, due to the same noise, generations from different regions can be compared. Both experiments show that Venn GAN can capture high-level semantic commonality between high-dimensional complex distributions.
5 Discussion & Conclusion
In this paper, we have used prior knowledge to choose the Venn type, i.e. the weight matrix π. When we know that the distributions have intersections and unique parts, an overlapping type has been used; if one distribution is a subset of another, we have utilized the superset type. We note that certain distribution configurations may not fall under either of those two types. If we have prior knowledge about the relation between the distributions, the method can be applied easily; when no such prior knowledge is available, the ideal solution would be to learn it, which we leave for future work.
The main limitation of the method is that it takes the union over regions with equal probability, which is a strong assumption in many cases. Ideally, the mixture weights should be optimized end-to-end with the model parameters. One challenge is that the mixture weights are discrete, as in practice we use the number of samples to approximate them; this could be handled with a reinforcement learning algorithm. A bigger challenge is to find a meaningful reward signal for training the weights, one that correlates negatively with "leaks" between the regions. We think this is an important future research direction.

In conclusion, we have proposed a novel multi-distribution GAN method which can discover particularities and commonalities between distributions. Our method models each data distribution with a mixture of generator distributions. As the generators are partially shared between the modeling of different true data distributions, shared ones capture the commonality of the distributions, while non-shared ones capture their unique aspects. We have successfully trained it on various datasets to show its effectiveness. We believe this method has good potential for new applications and better data modeling.
Acknowledgments
Yasin Yazıcı was supported by a SINGA scholarship from the Agency for Science, Technology and Research (A*STAR). Georgios Piliouras would like to acknowledge SUTD grant SRG ESD 2015 097, MOE AcRF Tier 2 Grant 2016T21170 and NRF 2018 Fellowship NRFNRFF201807. This research is partially supported by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funds (Project No.A1892b0026). This research was carried out at Advanced Digital Sciences Center (ADSC), Institute for Infocomm Research (I2R) and at the RapidRich Object Search (ROSE) Lab at the Nanyang Technological University, Singapore. The ROSE Lab is supported by the National Research Foundation, Singapore, and the Infocomm Media Development Authority, Singapore. Research at I2R was partially supported by A*STAR SERC Strategic Funding (A1718g0045). The computational work for this article was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg).
References
 Arora et al. (2017) Arora, S., Ge, R., Liang, Y., Ma, T., and Zhang, Y. Generalization and equilibrium in generative adversarial nets (GANs). CoRR, abs/1703.00573, 2017. URL http://arxiv.org/abs/1703.00573.
 de Vries et al. (2017) de Vries, H., Strub, F., Mary, J., Larochelle, H., Pietquin, O., and Courville, A. Modulating early visual processing by language. In Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 6597–6607, December 2017. arxiv: 1707.00683.
 Doan et al. (2018) Doan, T., Monteiro, J., Albuquerque, I., Mazoure, B., Durand, A., Pineau, J., and Devon Hjelm, R. Online adaptative curriculum learning for GANs. ArXiv e-prints, July 2018.
 Dumoulin et al. (2017) Dumoulin, V., Shlens, J., and Kudlur, M. A learned representation for artistic style. In International Conference on Learning Representations 2017 (Conference Track), 2017. URL https://openreview.net/forum?id=BJOBuT1g.
 Durugkar et al. (2016) Durugkar, I. P., Gemp, I., and Mahadevan, S. Generative multiadversarial networks. CoRR, abs/1611.01673, 2016. URL http://arxiv.org/abs/1611.01673.
 Ghosh et al. (2017) Ghosh, A., Kulharia, V., Namboodiri, V. P., Torr, P. H. S., and Dokania, P. K. Multiagent diverse generative adversarial networks. CoRR, abs/1704.02906, 2017. URL http://arxiv.org/abs/1704.02906.
 Goodfellow et al. (2014) Goodfellow, I., PougetAbadie, J., Mirza, M., Xu, B., WardeFarley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems 27, pp. 2672–2680. 2014. URL http://papers.nips.cc/paper/5423generativeadversarialnets.pdf.
 Gulrajani et al. (2017) Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. Improved training of Wasserstein GANs. pp. 5769–5779, December 2017. arxiv: 1704.00028.
 Hoang et al. (2017) Hoang, Q., Nguyen, T. D., Le, T., and Phung, D. Q. Multigenerator generative adversarial nets. CoRR, abs/1708.02556, 2017. URL http://arxiv.org/abs/1708.02556.
 Hoang et al. (2018) Hoang, Q., Nguyen, T. D., Le, T., and Phung, D. MGAN: Training generative adversarial nets with multiple generators. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rkmu5b0a.
 Isola et al. (2016) Isola, P., Zhu, J., Zhou, T., and Efros, A. A. Imagetoimage translation with conditional adversarial networks. CoRR, abs/1611.07004, 2016. URL http://arxiv.org/abs/1611.07004.
 Juefei-Xu et al. (2017) Juefei-Xu, F., Boddeti, V. N., and Savvides, M. Gang of GANs: Generative adversarial networks with maximum margin ranking. CoRR, abs/1704.04865, 2017. URL http://arxiv.org/abs/1704.04865.
 Kaneko et al. (2018) Kaneko, T., Ushiku, Y., and Harada, T. Classdistinct and classmutual image generation with gans. CoRR, abs/1811.11163, 2018. URL http://arxiv.org/abs/1811.11163.
 Karras et al. (2017) Karras, T., Aila, T., Laine, S., and Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. CoRR, abs/1710.10196, 2017. URL http://arxiv.org/abs/1710.10196.
 Kingma & Ba (2014) Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014. URL http://arxiv.org/abs/1412.6980.
 Krizhevsky et al. Krizhevsky, A., Nair, V., and Hinton, G. CIFAR-10 (Canadian Institute for Advanced Research). URL http://www.cs.toronto.edu/~kriz/cifar.html.
 Lake et al. (2015) Lake, B. M., Salakhutdinov, R., and Tenenbaum, J. B. Humanlevel concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015. doi: 10.1126/science.aab3050.
 Ledig et al. (2016) Ledig, C., Theis, L., Huszar, F., Caballero, J., Aitken, A. P., Tejani, A., Totz, J., Wang, Z., and Shi, W. Photo-realistic single image super-resolution using a generative adversarial network. CoRR, abs/1609.04802, 2016. URL http://arxiv.org/abs/1609.04802.

 Liu et al. (2015) Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proc. International Conference on Computer Vision (ICCV), 2015.
 Mescheder (2018) Mescheder, L. M. On the convergence properties of GAN training. CoRR, abs/1801.04406, 2018. URL http://arxiv.org/abs/1801.04406.
 Mirza & Osindero (2014) Mirza, M. and Osindero, S. Conditional generative adversarial nets. CoRR, abs/1411.1784, 2014. URL http://arxiv.org/abs/1411.1784.
 Miyato & Koyama (2018) Miyato, T. and Koyama, M. cGANs with projection discriminator. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=ByS1VpgRZ.
 Miyato et al. (2018) Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral normalization for generative adversarial networks. CoRR, abs/1802.05957, 2018. URL http://arxiv.org/abs/1802.05957.
 Neyshabur et al. (2017) Neyshabur, B., Bhojanapalli, S., and Chakrabarti, A. Stabilizing GAN training with multiple random projections. CoRR, abs/1705.07831, 2017. URL http://arxiv.org/abs/1705.07831.
 Odena et al. (2016) Odena, A., Olah, C., and Shlens, J. Conditional image synthesis with auxiliary classifier GANs. ArXiv e-prints, October 2016.
 Radford et al. (2015) Radford, A., Metz, L., and Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. CoRR, abs/1511.06434, 2015. URL https://arxiv.org/abs/1511.06434.
 Reed et al. (2016) Reed, S. E., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., and Lee, H. Generative adversarial text to image synthesis. CoRR, abs/1605.05396, 2016. URL http://arxiv.org/abs/1605.05396.
 Tolstikhin et al. (2017) Tolstikhin, I., Gelly, S., Bousquet, O., Simon-Gabriel, C.-J., and Schölkopf, B. AdaGAN: Boosting generative models. ArXiv e-prints, January 2017.
 Xiao et al. (2017) Xiao, H., Rasul, K., and Vollgraf, R. FashionMNIST: a novel image dataset for benchmarking machine learning algorithms. CoRR, abs/1708.07747, 2017. URL https://arxiv.org/abs/1708.07747.
 Yazıcı et al. (2018) Yazıcı, Y., Foo, C.S., Winkler, S., Yap, K.H., Piliouras, G., and Chandrasekhar, V. The unusual effectiveness of averaging in GAN training. CoRR, abs/1806.04498, June 2018. URL https://arxiv.org/abs/1806.04498.
 Yi et al. (2017) Yi, Z., Zhang, H., Tan, P., and Gong, M. Dualgan: Unsupervised dual learning for imagetoimage translation. CoRR, abs/1704.02510, 2017. URL http://arxiv.org/abs/1704.02510.
Appendix A Network Architectures
Prior distribution for the generator(s) is a 128-dimensional isotropic Gaussian. If not mentioned otherwise, the convolutions use the default stride and padding. "Cond" denotes conditioning, a linear scaling and addition for each feature channel; it is not used when multiple generators are utilized. "LReLU" is LeakyReLU.

Layers  Act.  Output Shape 

Latent vector 
  128 x 1 x 1 
Conv 4 x 4, pad=3  Cond  LReLU  128 x 4 x 4 
Conv 4 x 4, pad=3  Cond  LReLU  128 x 7 x 7 
Upsample    128 x 14 x 14 
Conv 3 x 3, pad=1  Cond  LReLU  64 x 14 x 14 
Upsample    64 x 28 x 28 
Conv 3 x 3, pad=1  Cond  LReLU  32 x 28 x 28 
Conv 3 x 3, pad=1  Tanh  1 x 28 x 28 
Layers  Act.  Output Shape 

Input image    3 x 28 x 28 
Conv 4 x 4, st=3  LReLU  64 x 14 x 14 
Conv 4 x 4, st=3  LReLU  128 x 7 x 7 
Conv 4 x 4, st=3  LReLU  256 x 3 x 3 
Conv 3 x 3, st=1, pad=0  Squeeze  1 
Layers  Act.  Output Shape 

Latent vector    128 x 1 x 1 
Conv 4 x 4, pad=3  Cond  LReLU  512 x 4 x 4 
Upsample    512 x 8 x 8 
Conv 3 x 3  Cond  LReLU  256 x 8 x 8 
Upsample    256 x 16 x 16 
Conv 3 x 3  Cond  LReLU  128 x 16 x 16 
Upsample    128 x 32 x 32 
Conv 3 x 3  Cond  LReLU  64 x 32 x 32 
Conv 3 x 3  Tanh  3 x 32 x 32 
Layers  Act.  Output Shape 

Input image    3 x 32 x 32 
Conv 3 x 3  LReLU  64 x 32 x 32 
Conv 3 x 3  LReLU  128 x 32 x 32 
Downsample    128 x 16 x 16 
Conv 3 x 3  LReLU  128 x 16 x 16 
Conv 3 x 3  LReLU  256 x 16 x 16 
Downsample    256 x 8 x 8 
Conv 3 x 3  LReLU  256 x 8 x 8 
Conv 3 x 3  LReLU  512 x 8 x 8 
Downsample    512 x 4 x 4 
Conv 4 x 4, st=1, pad=0  Squeeze  1 
Layers  Act.  Output Shape 

Latent vector    128 x 1 x 1 
Conv 4 x 4, pad=3  Cond  512 x 4 x 4 
ResBlock    512 x 4 x 4 
Upsample  Cond  512 x 8 x 8 
ResBlock    512 x 8 x 8 
Upsample  Cond  512 x 16 x 16 
ResBlock    256 x 16 x 16 
Upsample  Cond  256 x 32 x 32 
ResBlock    128 x 32 x 32 
Upsample  Cond  128 x 64 x 64 
ResBlock  LReLU  Cond  64 x 64 x 64 
Conv 3 x 3  Tanh  3 x 64 x 64 
Layers  Act.  Output Shape 

Input image    3 x 64 x 64 
Conv 3 x 3    64 x 64 x 64 
ResBlock    64 x 64 x 64 
Downsample    64 x 32 x 32 
ResBlock    128 x 32 x 32 
Downsample    128 x 16 x 16 
ResBlock    256 x 16 x 16 
Downsample    256 x 8 x 8 
ResBlock    512 x 8 x 8 
Downsample    512 x 4 x 4 
ResBlock  LReLU  512 x 4 x 4 
Conv 4 x 4, st=1, pad=0  Squeeze  1 
Appendix B Training of the classifiers for Quantification
For MNIST, Fashion-MNIST and CIFAR-10, we have trained 3 separate classifiers to assess the quality of the method. For each dataset, the architecture is the same as that of the discriminator used for that dataset, except the last layer, which outputs class logits instead of a single value. We have used the ADAM optimizer. Each model has been trained for 50k iterations. The accuracies of the classifiers on the test sets of MNIST, Fashion-MNIST and CIFAR-10 are 99.12, 91.20 and 84.20, respectively.
Appendix C Illustrative Examples
For these experiments, we have used independent generators and discriminators. Each generator consists of 4 fully connected layers, each followed by LeakyReLU except the last, which is linear; the discriminators are constructed the same way. In both networks, each hidden layer has 256 units. The prior distribution for the generators is a 128-dimensional isotropic Gaussian. We have used the ADAM (Kingma & Ba, 2014) optimizer. The optimization of discriminators and generators follows an alternating update rule with a single discriminator update per generator update. The model has been trained for 5k iterations, with a fixed batch size for each region (generator).