1 Introduction
Generative adversarial networks (GANs) [2] are a recent, popular technique for learning generative models for high-dimensional unstructured data (typically images). GANs employ two networks: a generator G that is tasked with producing samples from the data distribution, and a discriminator D that aims to distinguish real samples from the samples produced by G. The two networks alternately try to best each other, ultimately resulting in the generator G converging to the true data distribution.
While most of the research on GANs is focused on the unsupervised setting, where the data is comprised of unlabeled images, there has also been research on conditional GANs [1], where the goal is to learn a conditional model of the data, i.e., a model that can generate images given a particular attribute setting. In one approach [1], both the generator and discriminator are fed attributes as side information so as to enable the generator to generate images conditioned on attributes. In an alternative approach proposed in [5], the authors build auxiliary classifier GANs (ACGANs) where side information is reconstructed by the discriminator instead. Irrespective of the specific approach, this line of research focuses on the supervised setting where it is assumed that all the images have attribute tags.
Given that labels are expensive, it is of interest to explore semi-supervised settings where only a small fraction of the images have attribute tags, while a majority of the images are unlabeled. There has been some work on using GANs in the semi-supervised setting. [7] and [8] use GANs to perform semi-supervised classification by using a generator-discriminator pair to learn an unconditional model of the data and then fine-tuning the discriminator for prediction using the small amount of labeled data. However, we are not aware of work on building conditional models in the semi-supervised setting (see 2.1 for details). The closest work we found was ACGANs, which can be extended to the semi-supervised setting in a straightforward manner (as was alluded to briefly by the authors in their paper).
In the proposed semi-supervised GAN (SSGAN) approach, we take a different route. We instead supply the side attribute information to the discriminator, as is the case with supervised GANs. We partition the discriminator’s task of evaluating if the joint samples of images and attributes are real or fake into two separate tasks: (i) evaluating if the images are real or fake, and (ii) evaluating if the attributes given an image are real or fake. We subsequently use all the labeled and unlabeled data to assist the discriminator with the first task, and only the labeled images for the second task. The intuition behind this approach is that the marginal distribution of the images is much harder to model than the conditional distribution of the attributes given an image, and by separately evaluating the marginal and conditional samples, we can exploit the larger unlabeled pool to accurately estimate the marginal distribution.
Our main contributions in this work are as follows:

1. We present the first extensive discussion of the semi-supervised conditional generation problem using GANs.

2. Related to (1), we apply the ACGAN approach to the semi-supervised setting and present experimental results.

3. Finally, our main contribution is a new model called SSGAN to effectively address the semi-supervised conditional generative modeling problem, which outperforms existing approaches including ACGANs for this problem.
The rest of this paper is organized as follows: In Section 2, we describe existing work on GANs, including details about the unsupervised, supervised and semi-supervised settings. Next, in Section 3, we describe the proposed SSGAN model, and contrast it against existing semi-supervised GAN solutions. We present experimental results in Section 4, and finally, we give our conclusions in Section 5.
2 Existing GANs
2.1 Framework
We assume that our dataset is comprised of $n + m$ images
$X = \{X_1, \ldots, X_n, X_{n+1}, \ldots, X_{n+m}\}$,
where the first $n$ images are accompanied by attributes
$Y = \{Y_1, \ldots, Y_n\}$.
Each $X_i$ is assumed to be of dimension $p_x \times p_y \times p_c$, where $p_c$ is the number of channels. The attribute tags $Y_i$ are assumed to be discrete variables taking values in $\{0, 1, \ldots, K-1\}^d$, i.e., each attribute is assumed to be $d$-dimensional and each individual dimension of an attribute tag can belong to one of $K$ different classes. Observe that this accommodates class variables ($d = 1$, $K > 2$), and binary attributes ($d > 1$, $K = 2$). Finally, denote the joint distribution of images and attributes by $p(x, y)$, the marginal distribution of images by $p(x)$, and the conditional distribution of attributes given images by $p(y|x)$. Our goal is to learn a generative model $G(z, y)$ that can sample from $p(x|y)$ for a given $y$ by exploiting information from both the labeled and unlabeled sets.
2.2 Unsupervised GANs
In the unsupervised setting ($n = 0$), the goal is to learn a generative model $G(z)$ that samples from the marginal image distribution $p(x)$ by transforming vectors of noise $z$ as $x = G(z)$. In order for $G$ to learn this marginal distribution, a discriminator $D(x)$ is trained jointly [2]. The unsupervised loss functions for the generator and discriminator are as follows:
(1) 
and
(2) 
The above equations are optimized alternately with respect to the generator and the discriminator. The unsupervised GAN model is illustrated in Figure 2.
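As a minimal sketch of the alternating objective, the snippet below assumes the standard binary cross-entropy formulation of [2] with a non-saturating generator term; this is an assumption, and the exact form of the losses used in this paper may differ. The function names and the toy discriminator outputs are hypothetical.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Mean of -[log D(x) + log(1 - D(G(z)))] over paired real/fake scores."""
    assert len(d_real) == len(d_fake)
    total = 0.0
    for p_real, p_fake in zip(d_real, d_fake):
        total += -(math.log(p_real) + math.log(1.0 - p_fake))
    return total / len(d_real)

def generator_loss(d_fake):
    """Non-saturating generator loss: mean of -log D(G(z))."""
    return sum(-math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability that a sample is real).
d_real = [0.9, 0.8]   # scores on real images
d_fake = [0.1, 0.3]   # scores on generated images
loss_d = discriminator_loss(d_real, d_fake)
loss_g = generator_loss(d_fake)
```

In training, one would alternate a gradient step on `discriminator_loss` with a gradient step on `generator_loss`; the sketch only evaluates the two quantities.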
2.3 Supervised GANs
In the supervised setting (i.e., $m = 0$), the goal is to learn a generative model $G(z, y)$ that samples from the conditional image distribution $p(x|y)$ by transforming vectors of noise $z$ as $x = G(z, y)$. There are two proposed approaches for solving this problem:
2.3.1 Conditional GANs
In order for $G$ to learn this conditional distribution, a discriminator $D(x, y)$ is trained jointly. The goal of the discriminator is to distinguish whether the joint samples $(x, y)$ are samples from the data or from the generator. The supervised loss functions for the generator and discriminator for conditional GAN (CGAN) are as follows:
(3) 
and
(4) 
The above equations are optimized alternately with respect to the generator and the discriminator. The conditional GAN model is illustrated in Figure 3.
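One common way to feed attributes as side information is to concatenate an encoding of the attribute with the other input of the network. The sketch below shows this for a single class label and the generator input; the one-hot encoding, the helper names and the vector sizes are illustrative assumptions rather than the exact construction used here.

```python
def one_hot(label, num_classes):
    """Encode a class label in {0, ..., num_classes - 1} as a one-hot list."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def generator_input(z, label, num_classes):
    """Concatenate the noise vector z with the one-hot attribute encoding."""
    return list(z) + one_hot(label, num_classes)

# Hypothetical 2-dimensional noise vector conditioned on class 2 of 4.
conditioned = generator_input([0.5, -0.5], label=2, num_classes=4)
```

The same concatenation idea applies to the discriminator, where the attribute encoding is appended to the image (or to features computed from it).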
2.3.2 Auxiliary-classifier GANs
An alternative approach [5] to supervised conditional generation is to only supply the images $x$ to the discriminator, and ask the discriminator to additionally recover the true attribute information. In particular, the discriminator produces two outputs: (i) $D(x)$ and (ii) $D_a(y|x)$, where the first output is the probability that $x$ is real, and the second output is the estimated conditional probability of $y$ given $x$. In addition to the unsupervised loss functions, the generator and discriminator are jointly trained to recover the true attributes for any given image $x$. In particular, define the attribute loss function as
(5) 
The loss function for the discriminator is given by
(6) 
and for the generator is given by
(7) 
2.3.3 Comparison between CGAN and ACGAN
The key difference between CGAN and ACGAN is that, rather than asking the discriminator to estimate the probability distribution of the attribute given the image as is the case in ACGAN, CGAN supplies the discriminator $D$ with both $x$ and $y$ and asks it to estimate the probability that $(x, y)$ is consistent with the true joint distribution $p(x, y)$. While both models are designed to learn a conditional generative model, we did not find extensive comparisons between the two approaches in the literature. To this end, we compared the performance of the two architectures using a suite of qualitative and quantitative experiments on a collection of datasets, and through our analysis (see Section 4), determined that CGAN typically outperforms ACGAN.
2.4 Semi-supervised GANs
We now consider the semi-supervised setting where $m > 0$, and typically $n \ll m$. In this case, both CGAN and ACGAN can be applied to the problem. Because CGAN requires the attribute information to be fed to the discriminator, it can be applied only trivially, by training it on the labeled data alone and throwing away the unlabeled data. We will call this model SCGAN.
On the other hand, ACGAN can be applied to this semi-supervised setting in a far more useful manner, as alluded to by the authors in [10]. In particular, the adversarial loss terms of the discriminator and the generator are evaluated over all the images in $X$, while the attribute estimation loss term is evaluated over only the $n$ real images with attributes. We will call this model SAGAN. This model is illustrated in Figure 4.
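The restriction of the attribute loss to labeled images can be sketched as a simple mask. The snippet below assumes a cross-entropy attribute loss for a single class label and marks unlabeled images with `None`; both choices, and the function names, are illustrative assumptions.

```python
import math

def cross_entropy(class_probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(class_probs[label])

def semi_supervised_attribute_loss(batch_probs, batch_labels):
    """Average the attribute loss over labeled images only.

    batch_labels[i] is None when image i has no attribute tag.
    """
    losses = [
        cross_entropy(probs, label)
        for probs, label in zip(batch_probs, batch_labels)
        if label is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical classifier outputs for three images; the second is unlabeled.
batch_probs = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
batch_labels = [0, None, 1]
attr_loss = semi_supervised_attribute_loss(batch_probs, batch_labels)
```

The adversarial terms would still be computed on every image in the batch; only the attribute term is masked.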
3 Proposed Semi-supervised GAN
We now propose a new model for learning conditional generator models in a semi-supervised setting. This model aims to extend the CGAN architecture to the semi-supervised setting so that, unlike SCGAN, it can exploit the unlabeled data, by overcoming the difficulty of having to provide side information to the discriminator. By extending the CGAN architecture, we aim to enjoy the same performance advantages over SAGAN that CGAN enjoys over ACGAN.
In particular, we consider a stacked discriminator architecture comprising a pair of discriminators $D_u$ and $D_s$, with $D_u$ tasked with distinguishing real and fake images $x$, and $D_s$ tasked with distinguishing real and fake (image, attribute) pairs $(x, y)$. Unlike in CGAN, $D_u$ will separately estimate the probability that $x$ is real using both the labeled and unlabeled instances, and $D_s$ will separately estimate the probability that $y$ given $x$ is real using only the labeled instances. The intuition behind this approach is that the marginal distribution $p(x)$ is much harder to model than the conditional distribution $p(y|x)$, and by separately evaluating the marginal and conditional samples, we can exploit the larger unlabeled pool to accurately estimate the marginal distribution.
3.1 Model description
Let $D$ denote the discriminator, which is comprised of two stacked discriminators: (i) $D_u(x)$ outputs the probability that the marginal image $x$ is real, and (ii) $D_s(x, y)$ outputs the probability that the conditional attribute $y$ given the image $x$ is real. The generator $G(z, y)$ is identical to the generator in CGAN and ACGAN. The loss functions for the generator and the pair of discriminators are defined below:
(8) 
(9) 
and
(10) 
where $\alpha$ controls the effect of the conditional term relative to the unsupervised term.
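The structure of these three losses can be sketched as follows, assuming binary cross-entropy adversarial terms, a non-saturating generator term, and a weight (called `alpha` here) that scales the conditional term in the generator loss. All three are assumptions made for illustration, as are the function names; the unsupervised discriminator is scored on all images and the supervised discriminator only on labeled pairs.

```python
import math

def real_fake_bce(scores_real, scores_fake):
    """Binary cross-entropy with real scores pushed to 1 and fake scores to 0."""
    real_term = sum(-math.log(p) for p in scores_real) / len(scores_real)
    fake_term = sum(-math.log(1.0 - p) for p in scores_fake) / len(scores_fake)
    return real_term + fake_term

def unsupervised_discriminator_loss(du_real_all, du_fake):
    """Marginal term: real scores come from all labeled and unlabeled images."""
    return real_fake_bce(du_real_all, du_fake)

def supervised_discriminator_loss(ds_real_labeled, ds_fake):
    """Conditional term: real scores come from labeled (image, attribute) pairs."""
    return real_fake_bce(ds_real_labeled, ds_fake)

def ssgan_generator_loss(du_fake, ds_fake, alpha):
    """Generator tries to fool both discriminators; alpha weights the conditional term."""
    unsup = sum(-math.log(p) for p in du_fake) / len(du_fake)
    sup = sum(-math.log(p) for p in ds_fake) / len(ds_fake)
    return unsup + alpha * sup

# Hypothetical scores: four real images (only two labeled) and two generated samples.
loss_du = unsupervised_discriminator_loss([0.9, 0.8, 0.7, 0.9], [0.2, 0.1])
loss_ds = supervised_discriminator_loss([0.8, 0.6], [0.3, 0.4])
loss_g = ssgan_generator_loss([0.2, 0.1], [0.3, 0.4], alpha=1.0)
```

Setting `alpha` to zero reduces the generator objective to the purely unsupervised term.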
Model architecture:
We design the model so that $D_u(x)$ depends only on the $x$ argument, and produces an intermediate output $h(x)$ (the last but one layer of the unsupervised discriminator), to which the argument $y$ is subsequently appended and fed to the supervised discriminator to produce the probability $D_s(x, y)$ that the joint samples $(x, y)$ are real. The specific architecture is shown in Figure 5.
The advantage of this proposed model, which supplies $x$ to $D_s$ via the features learned by $D_u$ rather than directly providing the $x$ argument to $D_s$, is that $D_s$ cannot overfit to the few labeled examples, and instead must rely on the features general to the whole population in order to uncover the dependency between $x$ and $y$.
For illustration, consider the problem of conditional face generation where one of the attributes of interest is eyeglasses. Also, assume that in the limited set of labeled images, only one style of eyeglasses (e.g., glasses with thick rims) is encountered. If so, then the conditional discriminator can learn features specific to rims to detect glasses if the entire image is available to the supervised discriminator. On the other hand, the features learned by the unsupervised discriminator would have to generalize over all kinds of eyeglasses and not just rimmed eyeglasses specifically. In our stacked model, by restricting the supervised discriminator to access the image only through the features learned by the unsupervised discriminator, we ensure that the supervised discriminator generalizes to all different types of eyeglasses when assessing the conditional fit of the glasses attribute.
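The data flow of the stacked discriminator can be sketched with tiny fully connected layers. The layer sizes, the weights and the helper names below are hypothetical and stand in for the convolutional layers of an actual model; the point is only that the supervised head sees the image through the shared features, with the attribute vector appended.

```python
import math

def dense(weights, bias, inputs):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def stacked_discriminator(x, y, params):
    """Return (marginal score, conditional score) for image x and attributes y."""
    # Penultimate features of the unsupervised discriminator; they depend on x only.
    h = relu(dense(params["w_h"], params["b_h"], x))
    # Unsupervised head: probability that the image is real.
    d_u = sigmoid(dense(params["w_u"], params["b_u"], h)[0])
    # Supervised head: sees the features with the attribute vector appended.
    d_s = sigmoid(dense(params["w_s"], params["b_s"], h + list(y))[0])
    return d_u, d_s

# Hypothetical parameters: 3-dim image, 2-dim features, 2-dim attribute vector.
params = {
    "w_h": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], "b_h": [0.0, 0.1],
    "w_u": [[1.0, -1.0]], "b_u": [0.0],
    "w_s": [[0.7, 0.2, 1.5, -1.5]], "b_s": [0.0],
}
x = [1.0, 0.5, -0.5]
du_a, ds_a = stacked_discriminator(x, [1.0, 0.0], params)
du_b, ds_b = stacked_discriminator(x, [0.0, 1.0], params)
```

Changing the attribute vector leaves the marginal score untouched and only moves the conditional score, which mirrors the split of the two tasks described above.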
3.2 Convergence analysis of model
Denote the distribution of the samples provided by the generator as $p'(x, y)$. Provided that the discriminator $D_u$ has sufficient modeling power, following Section 4.2 in [2], it follows that if we have sufficient data ($n + m$ large), and if the discriminator $D_u$ is trained to convergence, $D_u(x)$ will converge to $p(x)/(p(x) + p'(x))$, and consequently, the generator will adapt its output so that $p'(x)$ will converge to $p(x)$.
Because $n$ is finite and typically small, we are not similarly guaranteed that $D_s(x, y)$ will converge to $p(x, y)/(p(x, y) + p'(x, y))$, and that consequently, the generator will adapt its output so that $p'(x, y)$ will converge to $p(x, y)$. However, we make the key observation that because $p'(x)$ converges to $p(x)$ through the use of $D_u$, $D_s(x, y)$ will equivalently look to converge to $p(y|x)/(p(y|x) + p'(y|x))$, and given that these distributions are discrete, plus the fact that the supervised discriminator operates on $x$ via the low-dimensional embedding $h(x)$, we hypothesize that $D_s$ will successfully learn to closely approximate $p(y|x)/(p(y|x) + p'(y|x))$ even when $n$ is small. The joint use of $D_u$ and $D_s$ will therefore ensure that the joint distribution $p'(x, y)$ of the samples produced by the generator will converge to the true distribution $p(x, y)$.
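The convergence argument rests on the result in [2] that, for a fixed generator, the optimal discriminator equals the data density divided by the sum of the data and generator densities. A small numerical illustration on a discrete toy distribution is sketched below; the distributions and the function name are hypothetical, and both distributions are assumed to share the same support.

```python
def optimal_discriminator(p_data, p_gen):
    """Pointwise optimum p_data / (p_data + p_gen) for a fixed generator."""
    return {k: p_data[k] / (p_data[k] + p_gen[k]) for k in p_data}

# Toy distributions over two outcomes.
p_data = {"a": 0.5, "b": 0.5}
p_gen_bad = {"a": 0.8, "b": 0.2}      # generator over-produces outcome "a"
p_gen_matched = {"a": 0.5, "b": 0.5}  # generator matches the data

d_bad = optimal_discriminator(p_data, p_gen_bad)
d_matched = optimal_discriminator(p_data, p_gen_matched)
```

When the generator matches the data distribution, the optimal discriminator outputs one half everywhere and can no longer tell real from fake; where the generator over-produces an outcome, the score for that outcome drops below one half.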
4 Experimental results
We propose a number of different experiments to illustrate the performance of the proposed SSGAN over existing GAN approaches.
4.1 Models and datasets
We compare the results of the proposed SSGAN model against four other models:

Standard GAN model applied to the full dataset (called CGAN)

Standard GAN model applied to only the labeled dataset (called SCGAN)

Supervised ACGAN model applied to the full dataset (called ACGAN)

Semisupervised ACGAN model (called SAGAN)
We illustrate our results on 3 different datasets: (i) MNIST, (ii) celebA, and (iii) CIFAR10.
In all our experiments, we use the DCGAN architecture proposed in [6], with slight modifications to the generator and discriminator to accommodate the different variants described in the paper. These modifications primarily take the form of (i) concatenating the inputs $(z, y)$ and $(x, y)$ for the supervised generator and discriminator respectively, (ii) adding an additional output layer to the discriminator in the case of ACGAN, and (iii) connecting the last but one layer of $D_u$ to $D_s$ in the proposed SSGAN. In particular, we use the same DCGAN architecture as in [6] for MNIST and celebA, and a slightly modified version of the celebA architecture to accommodate the smaller 32x32 resolution of the CIFAR10 dataset. The stacked DCGAN discriminator model for the celebA faces dataset is shown in Figure 6.
4.2 Evaluation criteria
We use a variety of different evaluation criteria to contrast SSGAN against the models CGAN, ACGAN, SCGAN and SAGAN listed earlier.

Visual inspection of samples: We visually display a large collection of samples from each of the models and highlight differences in samples from the different models.

Reconstruction error: We optimize the inputs to the generator to reconstruct the original samples in the dataset (see Section 5.2 in [10]) with respect to squared reconstruction error. Given the drawbacks of reconstruction loss, we also compute the structural similarity metric (SSIM) [9] in addition to the reconstruction error.

Attribute/class prediction from pretrained classifier (for generator): We pretrain an attribute/class predictor from the entire training dataset, apply this predictor to the samples generated from the different models, and report the accuracy (RMSE for attribute prediction, 0-1 loss for class prediction).

Supervised learning error (for discriminator): We use the features from the discriminator and build classifiers on these features to predict attributes, and report the accuracy.

Sample diversity: To ensure that the samples being produced are representative of the entire population, and not just the labeled samples, we first train a classifier that can distinguish between the labeled samples (class label 0) and the unlabeled samples (class label 1). We then apply this classifier to the samples generated by each of the generators, and compute the mean probability of the samples belonging to class 0. The closer this number is to 0, the better the unlabeled samples are represented.
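Two of these criteria reduce to simple averages once the relevant classifier outputs are available. The sketch below assumes those outputs are already computed; the function names and the toy numbers are hypothetical.

```python
def zero_one_error(predicted, target):
    """Fraction of generated samples whose predicted class differs from the requested class."""
    wrong = sum(1 for p, t in zip(predicted, target) if p != t)
    return wrong / len(target)

def sample_diversity(prob_class0):
    """Mean probability that a generated sample belongs to the labeled pool (class 0)."""
    return sum(prob_class0) / len(prob_class0)

# Hypothetical outputs for four generated samples.
class_error = zero_one_error(predicted=[3, 3, 7, 1], target=[3, 8, 7, 1])
diversity = sample_diversity([0.5, 0.0, 0.25, 0.25])
```

A diversity value near 1 indicates that the generator mostly reproduces the labeled images, while a value near 0 indicates that its samples look like the much larger unlabeled pool.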
4.3 MNIST
The MNIST dataset contains 60,000 labeled images of digits. We perform semi-supervised training with a small randomly picked fraction of these, considering setups with 10, 20, and 40 labeled examples. We ensure that each setup has a balanced number of examples from each class. The remaining training images are provided without labels.
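A balanced labeled subset of this kind can be drawn by sampling the same number of indices from every class. The following is a minimal sketch under that assumption; the function name, the seed handling and the toy label list are hypothetical and not taken from the experiments.

```python
import random
from collections import defaultdict

def balanced_labeled_subset(labels, n_labeled, num_classes=10, seed=0):
    """Pick n_labeled indices with an equal number of examples per class."""
    if n_labeled % num_classes != 0:
        raise ValueError("n_labeled must be a multiple of num_classes")
    per_class = n_labeled // num_classes
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    chosen = []
    for c in range(num_classes):
        # random.sample raises ValueError if a class has too few examples.
        chosen.extend(rng.sample(by_class[c], per_class))
    return sorted(chosen)

# Toy stand-in for the MNIST label vector: 100 labels cycling through 0..9.
toy_labels = [i % 10 for i in range(100)]
labeled_idx = balanced_labeled_subset(toy_labels, n_labeled=20)
```

All indices not returned by the function would be treated as unlabeled during training.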
4.3.1 Visual sample inspection
In Figure 7, we show representative samples from the 5 different models for the case with $n = 20$ labeled examples. In addition, in Figures 9, 10, 11, 12 and 13, we show more detailed results for this case with 20 labeled examples (two examples per digit). In these detailed results, each row corresponds to a particular digit. Both CGAN and ACGAN successfully learn to model both the digits and the association between the digits and their class label. From the results, it is clear that SCGAN learns to predict only the digit styles of each digit made available in the labeled set. While SAGAN produces a greater diversity of samples, it struggles to produce the correct digit for each label. SSGAN, on the other hand, produces diverse digits while also being accurate. In particular, its performance closely matches the performance of the fully supervised CGAN and ACGAN models. This is additionally borne out by the quantitative results shown in Tables 1, 2 and 3 for the cases $n = 10$, $n = 20$ and $n = 40$ respectively, as shown below.
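Table 1: Quantitative results on MNIST with $n = 10$ labeled examples.
| Samples source | Class pred. error | Recon. error | Sample diversity | Discrim. error |
|---|---|---|---|---|
| True samples | 0.0327 | N/A | 0.992 | N/A |
| Fake samples | N/A | N/A | 1.14e-05 | N/A |
| CGAN | 0.0153 | 0.0144 | 1.42e-06 | 0.1015 |
| ACGAN | 0.0380 | 0.0149 | 1.49e-06 | 0.1140 |
| SCGAN | 0.0001 | 0.1084 | 0.999 | 0.095 |
| SAGAN | 0.3091 | 0.0308 | 8.62e-06 | 0.1062 |
| SSGAN | 0.1084 | 0.0320 | 0.0833 | 0.1024 |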
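Table 2: Quantitative results on MNIST with $n = 20$ labeled examples.
| Samples source | Class pred. error | Recon. error | Sample diversity | Discrim. error |
|---|---|---|---|---|
| True samples | 0.0390 | N/A | 0.994 | N/A |
| Fake samples | N/A | N/A | 2.86e-05 | N/A |
| CGAN | 0.0148 | 0.01289 | 8.74e-06 | 0.1031 |
| ACGAN | 0.0189 | 0.01398 | 9.10e-06 | 0.1031 |
| SCGAN | 0.0131 | 0.0889 | 0.998 | 0.1080 |
| SAGAN | 0.2398 | 0.02487 | 2.18e-05 | 0.1010 |
| SSGAN | 0.1044 | 0.0160 | 2.14e-05 | 0.1014 |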
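Table 3: Quantitative results on MNIST with $n = 40$ labeled examples.
| Samples source | Class pred. error | Recon. error | Sample diversity | Discrim. error |
|---|---|---|---|---|
| True samples | 0.0390 | N/A | 0.993 | N/A |
| Fake samples | N/A | N/A | 1.63e-05 | N/A |
| CGAN | 0.0186 | 0.0131 | 1.36e-05 | 0.1023 |
| ACGAN | 0.0141 | 0.0139 | 6.84e-06 | 0.1054 |
| SCGAN | 0.0228 | 0.080 | 0.976 | 0.1100 |
| SAGAN | 0.1141 | 0.00175 | 1.389e-05 | 0.1076 |
| SSGAN | 0.0492 | 0.0135 | 3.54e-05 | 0.1054 |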
4.3.2 Discussion of quantitative results
The fraction of incorrectly classified points for each source, the reconstruction error, the sample diversity metric and the discriminator error are shown in Tables 1, 2 and 3. SSGAN comfortably outperforms SAGAN with respect to classification accuracy, and comfortably beats SCGAN with respect to reconstruction error (due to the limited sample diversity of SCGAN). The sample diversity metric for SSGAN is slightly worse than that of SAGAN, but significantly better than that of SCGAN. Taken together, and in conjunction with the visual analysis of the samples, these results demonstrate that SSGAN is superior to SAGAN and SCGAN in the semi-supervised setting.
From the three sets of results for the different labeled sample sizes ($n = 10$, $n = 20$ and $n = 40$), we see that the performance of all the models improves smoothly with increasing sample size, with SSGAN still outperforming the other two semi-supervised models for each setting of the number of labeled samples.
4.3.3 Semi-supervised learning error
For MNIST, we run an additional experiment, where we draw samples from the various generators, train a classifier using each set of samples, and record the test error of this classifier. Table 4 shows the error of classifiers trained on samples generated by the different models for the case of 20 labeled examples.
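Table 4: Test error of classifiers trained on generated samples (MNIST, 20 labeled examples).
| Samples source | 10-fold 0-1 error |
|---|---|
| CGAN | 5.1 |
| ACGAN | 5.2 |
| SCGAN | 12.9 |
| SAGAN | 24.3 |
| SSGAN | 5.4 |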
From the results in Table 4, we see that our model SSGAN performs close to the supervised models. In particular, we note that these results are the state-of-the-art for MNIST given just 20 labeled examples (please see [7] for comparison). However, the performance remains fairly stationary as the number of labeled examples increases, and furthermore the approach is not very effective for more complex datasets such as CIFAR10 and celebA, indicating that this approach of using samples from GANs to train classifiers should be restricted to very low sample settings for simpler datasets like MNIST.
4.4 celebA dataset results
CelebFaces Attributes Dataset (CelebA) [4] is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. Of the 40 attributes, we subselect the following 18 attributes: 0: ’Bald’, 1: ’Bangs’, 2: ’Black Hair’, 3: ’Blond Hair’, 4: ’Brown Hair’, 5: ’Bushy Eyebrows’, 6: ’Eyeglasses’, 7: ’Gray Hair’, 8: ’Heavy Makeup’, 9: ’Male’, 10: ’Mouth Slightly Open’, 11: ’Mustache’, 12: ’Pale Skin’, 13: ’Receding Hairline’, 14: ’Smiling’, 15: ’Straight Hair’, 16: ’Wavy Hair’, 17: ’Wearing Hat’.
4.4.1 Visual sample inspection
In Figure 8, we show representative samples from the 5 different models for the case with labeled examples for the celebA dataset. Each row corresponds to an individual model, and each column corresponds to one of the 18 different attributes listed above. In addition, we show more detailed samples generated by the 5 different models in Figures 15, 14, 16, 17, and 18. In each of these figures, each row corresponds to a particular attribute type while all the other attributes are set to 0. From the generated samples, we can once again see that the visual samples produced by SSGAN are close to the quality of the samples generated by the fully supervised models CGAN and ACGAN. SCGAN, when applied to the labeled subset of the data, produces very poor results (significant mode collapse and poor quality of the generated images), while SAGAN is worse than SSGAN. For instance, SAGAN produces images with incorrect attributes for attributes 0 (faces turned to a side instead of bald), 7 (faces with hats instead of gray hair), and 12 (generic faces instead of faces with pale skin).
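Table 5: Quantitative results on celebA.
| Samples source | Attribute RMSE | Recon. error | SSIM | Sample diversity | Disc. error |
|---|---|---|---|---|---|
| True samples | 0.04 | N/A | N/A | 0.99 | N/A |
| Fake samples | N/A | N/A | N/A | 0.001 | N/A |
| CGAN | 0.25 | 0.036 | 0.497 | 0.002 | 0.07 |
| ACGAN | 0.29 | 0.047 | 0.076 | 0.005 | 0.06 |
| SCGAN | 0.26 | 0.343 | 0.143 | 0.454 | 0.01 |
| SAGAN | 0.36 | 0.042 | 0.167 | 0.006 | 0.07 |
| SSGAN | 0.31 | 0.040 | 0.217 | 0.004 | 0.03 |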
4.4.2 Discussion of quantitative results
The quantitative metrics (the attribute prediction error, the reconstruction error, SSIM, the sample diversity metric and the discriminator error) are shown in Table 5.
SSGAN comfortably outperforms SAGAN and achieves results close to the fully supervised models for the attribute prediction error metric. It is interesting to note that SCGAN produces better attribute prediction error numbers than the SAGAN model, while producing notably worse samples. We also find that with respect to reconstruction error and the SSIM metric, SSGAN marginally outperforms SAGAN while coming close to the performance of the supervised CGAN and ACGAN models. As expected, SCGAN performs poorly in this case. We also find that SSGAN has a fairly low sample diversity score, marginally higher than CGAN, but better than SAGAN, and better even than the fully supervised ACGAN. Finally, SSGAN comfortably outperforms SAGAN and achieves results close to the fully supervised model with respect to the discriminator feature error.
4.5 CIFAR10 dataset
The CIFAR10 dataset [3] consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. The following are the 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
4.5.1 Visual sample inspection
From the generated samples in Figures 19, 20, 21, 22 and 23, we can see that the visual samples produced by SSGAN are close to the quality of the samples generated by CGAN. The other three models, ACGAN, SAGAN, and SCGAN, all suffer from significant mode collapse. We found the poor results of ACGAN in the fully supervised case particularly surprising, given the good performance of CGAN on CIFAR10, and the good performance of ACGAN on the MNIST and celebA datasets.
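Table 6: Quantitative results on CIFAR10.
| Samples source | Class pred. error | Recon. error | SSIM | Sample diversity | Disc. error |
|---|---|---|---|---|---|
| True samples | 0.098 | N/A | N/A | 1.00 | N/A |
| Fake samples | N/A | N/A | N/A | 1.21e-07 | N/A |
| CGAN | 0.198 | 0.041 | 0.501 | 1.39e-07 | 0.874 |
| ACGAN | 0.391 | 0.204 | 0.024 | 1.41e-06 | 0.872 |
| SCGAN | 0.355 | 0.213 | 0.026 | 0.999 | 0.870 |
| SAGAN | 0.468 | 0.173 | 0.021 | 2.30e-06 | 0.874 |
| SSGAN | 0.299 | 0.061 | 0.042 | 6.54e-06 | 0.891 |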
4.5.2 Discussion of quantitative results
The different quantitative metrics computed on the CIFAR10 dataset are shown in Table 6. In our experiments, we find that the samples generated by SSGAN are correctly classified 70 percent of the time, which is second best after CGAN and about 20 percentage points below the true samples (a class prediction error of 0.299 versus 0.098 in Table 6). We also find that the reconstruction error for SSGAN comes close to the performance of CGAN and comfortably outperforms the other three models. This result is consistent with the visual inspection of the samples. The sample diversity metric for SSGAN is significantly better than SCGAN, and comparable to the other three models.
5 Conclusion and discussion
We proposed a new GAN-based framework for learning conditional models in a semi-supervised setting. Compared to the only existing semi-supervised GAN approaches (i.e., SCGAN and SAGAN), our approach shows a marked improvement in performance over several datasets including MNIST, celebA and CIFAR10 with respect to visual quality of the samples as well as several other quantitative metrics. In addition, the proposed technique comes with theoretical convergence properties even in the semi-supervised case where the number of labeled samples is finite.
From our results on all three of these datasets, we can conclude that the proposed SSGAN performs almost as well as the fully supervised CGAN and ACGAN models, even when provided with a very low number of labeled samples (down to the extreme limit of just one sample per class in the case of MNIST). In particular, it comfortably outperforms the semi-supervised variants of CGAN and ACGAN (SCGAN and SAGAN respectively). While the superior performance over SCGAN is clearly explained by the fact that SCGAN is only trained on the labeled dataset, the performance advantage of SSGAN over SAGAN is not readily apparent. We explicitly discuss the reasons for this below:
5.1 Why does SSGAN work better than SAGAN?

Unlike ACGAN where the discriminator is tasked with recovering the attributes, in CGAN, the discriminator is asked to estimate if the pair $(x, y)$ is real or fake. This use of an adversarial loss that classifies pairs $(x, y)$ as real or fake, over the cross-entropy loss that asks the discriminator to recover $y$ from $x$, seems to work far better, as demonstrated by our experimental results. Our proposed SSGAN model learns the association between $x$ and $y$ using an adversarial loss as is the case with CGAN, while SAGAN uses the cross-entropy loss over the labeled samples.

The stacked architecture in SSGAN, where the intermediate features of $D_u$ are fed to $D_s$, ensures that $D_s$, and in turn the generator, does not overfit to the labeled samples. In particular, $D_s$ is forced to learn discriminative features that characterize the association between $x$ and $y$ based on the features over the entire unlabeled set learned by $D_u$, which ensures generalization to the complete set of images.
References

[1] Jon Gauthier. Conditional generative adversarial nets for convolutional face generation. Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester, 2014(5):2, 2014.
[2] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[3] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.
[4] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), 2015.
[5] Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier GANs. arXiv preprint arXiv:1610.09585, 2016.
[6] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[7] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, 2016.
[8] Jost Tobias Springenberg. Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv:1511.06390, 2015.
[9] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[10] S. Xiang and H. Li. On the effects of batch and weight normalization in generative adversarial networks. ArXiv e-prints, April 2017.