Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs

10/09/2018 · Yogesh Balaji et al.

Building on the success of deep learning, two modern approaches to learn a probability model of the observed data are Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs). VAEs consider an explicit probability model for the data and compute a generative distribution by maximizing a variational lower-bound on the log-likelihood function. GANs, however, compute a generative model by minimizing a distance between observed and generated probability distributions without considering an explicit model for the observed data. The lack of an explicit probability model in GANs prohibits the computation of sample likelihoods within their framework and limits their use in statistical inference problems. In this work, we show that an optimal transport GAN with entropy regularization can be viewed as a generative model that maximizes a lower bound on average sample likelihoods, the approach that VAEs are based on. In particular, our proof constructs an explicit probability model for GANs that can be used to compute likelihood statistics within GAN's framework. Our numerical results on several datasets demonstrate trends consistent with the proposed theory.


1 Introduction

Learning generative models is becoming an increasingly important problem in machine learning and statistics, with a wide range of applications in self-driving cars (Santana & Hotz, 2016), robotics (Hirose et al., 2017), natural language processing (Lee & Tsao, 2018), domain transfer (Sankaranarayanan et al., 2018), computational biology (Ghahramani et al., 2018), etc. Two modern approaches to deal with this problem are Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) and Variational AutoEncoders (VAEs) (Makhzani et al., 2015; Rosca et al., 2017; Tolstikhin et al., 2017; Mescheder et al., 2017b).

VAEs compute a generative model by maximizing a variational lower-bound on average sample likelihoods using an explicit probability distribution for the data. GANs, on the other hand, learn a generative model by minimizing a distance between the observed and generated distributions without considering an explicit probability model for the data. Empirically, GANs have been shown to produce higher-quality generative samples than VAEs (Karras et al., 2017). However, since GANs do not consider an explicit probability model for the data, we are unable to compute sample likelihoods using their generative models. Computation of sample likelihoods and posterior distributions of latent variables is critical in several statistical inference tasks. The inability to obtain such statistics within the GAN framework severely limits its applicability to such statistical inference problems.

Figure 1: A statistical framework for GANs. By training a GAN architecture, we first compute the optimal generator function G* and the optimal coupling P*_{X,Z} between the observed variable X and the latent variable Z. The likelihood of a test sample x can then be lower-bounded using a combination of three terms: (1) the expected distance of x to the distribution learnt by the generative model, (2) the entropy of the coupled latent variable given x, and (3) the likelihood of the coupled latent variable under the latent prior f_Z.

In this paper, we resolve these issues for a general formulation of GANs by providing a theoretically-justified approach to compute sample likelihoods using GAN's generative model. Our results can open new directions for using GANs in massive-data applications such as model selection, sample selection, hypothesis testing, etc. (see Section 5 for more details). Below, we state our main results informally, without going into technical conditions; precise statements are presented in Section 2.

Let X and Y represent the observed (i.e. real) and generative (i.e. fake or synthetic) variables, respectively. Z (i.e. the latent variable) is the randomness used as the input to the generator G, so that Y = G(Z). Consider the following explicit probability model of the data given a latent sample z:

f_{Y|Z}(y|z) = C \exp\big( -\ell(y, G(z)) / \lambda \big),   (1.1)

where \ell(\cdot,\cdot) is a loss function, \lambda > 0 is a scalar that will play the role of the entropy regularization parameter below, and C is a normalization constant. Under this explicit probability model, we show that minimizing the objective of an optimal transport GAN (e.g. Wasserstein GAN, Arjovsky et al. (2017)) with the cost function \ell and an entropy regularization (Cuturi, 2013; Seguy et al., 2017) maximizes a variational lower-bound on average sample likelihoods. I.e.

\text{ave. sample log-likelihoods} \;\gtrsim\; -\frac{1}{\lambda}\,\big(\text{entropic GAN objective}\big) \;+\; \text{constant}.   (1.2)

If \ell(x, y) = \|x - y\|_2, the optimal transport (OT) GAN simplifies to WGAN (Arjovsky et al., 2017), while if \ell(x, y) = \|x - y\|_2^2, the OT GAN simplifies to the quadratic GAN (or, W2GAN) (Feizi et al., 2017). The precise statement of this result can be found in Theorem 1. This result provides a statistical justification for GAN's optimization and puts it on par with VAEs, whose goal is to maximize a lower bound on sample likelihoods. We note that the entropy regularization has been proposed primarily to improve computational aspects of GANs (Cuturi, 2013). Our results provide an additional statistical justification for this regularization term. Moreover, using GAN's training, we obtain a coupling between the observed variable X and the latent variable Z. This coupling provides the conditional distribution of the latent variable Z given an observed sample x. The explicit model of equation 1.1 acts similarly to the decoder in the VAE framework, while the coupling computed using GANs acts as an encoder.

Connections between GANs and VAEs have been investigated in several recent works as well (Hu et al., 2018; Mescheder et al., 2017a). In Hu et al. (2018), GANs are interpreted as methods performing variational inference on a generative model in the label space. In their framework, observed data samples are treated as latent variables, while the generative variable is the indicator of whether the data is real or fake. The method in Mescheder et al. (2017a), on the other hand, uses an auxiliary discriminator network to rephrase the maximum-likelihood objective of a VAE as a two-player game similar to the objective of a GAN. Our method differs from both of these approaches: we consider an explicit probability model for the data and show that the entropic GAN objective maximizes a variational lower bound under this probability model, thus allowing sample likelihood computation in GANs similar to VAEs.

Another key question that we address here is how to estimate the likelihood of a new sample x_{test} given a generative model trained using GANs. For instance, if we train a GAN on stop-sign images, upon receiving a new image, one may wish to compute the likelihood of that sample according to the trained generative model. In standard GAN formulations, the support of the generative distribution lies on the range of the optimal generator function. Thus, if the observed sample does not lie on that range (which is very likely in practice), there is no way to assign a sensible likelihood score to that sample. Below, we show that using the explicit probability model of equation 1.1, we can lower-bound the likelihood of this sample. This is similar to the variational lower-bound on sample likelihoods used in VAEs. Our numerical results show that this lower bound reflects the expected trends of the true sample likelihoods well.

Let G* and P*_{X,Z} be the optimal generator function and the optimal coupling between the real and latent variables, respectively. The optimal coupling can be computed efficiently for entropic GANs, as we explain in Section 3. For other GAN architectures, one may approximate such optimal couplings as we explain in Section 4. Let x_{test} be a new test sample. We can lower-bound the log-likelihood of this sample as

\log f_Y(x_{test}) \;\gtrsim\; -\frac{1}{\lambda}\, E\big[\ell(x_{test}, G^*(Z))\big] \;+\; H\big(P^*_{Z|X=x_{test}}\big) \;+\; E\big[\log f_Z(Z)\big],   (1.3)

where the expectations are over the coupled latent variable Z \sim P^*_{Z|X=x_{test}} and additive constants are omitted.

We present the precise statement of this result in Corollary 2. This result combines three components in order to approximate the likelihood of a sample given a trained generative model:

  • The distance between x_{test} and the generative model. If this distance is large, the likelihood of observing x_{test} from the generative model is small.

  • The entropy of the coupled latent variable given x_{test}. If this entropy term is large, the coupled latent variable has large randomness, which contributes positively to the sample likelihood.

  • The likelihood of the coupled latent variable under the latent prior f_Z. If the coupled latent samples have large likelihoods, the likelihood of the observed test sample will be large as well.

Figure 1 provides a pictorial illustration of these components.
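To make these three components concrete, the following minimal numpy sketch (an illustration, not the implementation used in the paper) evaluates the surrogate lower bound for a single test sample. It assumes we are given latent samples z_1, ..., z_k together with probabilities p_1, ..., p_k that approximate the coupled conditional distribution of Z given x_{test}, a generator G, a per-sample loss ell, the regularization parameter lam, and a standard-normal latent prior; all of these inputs are assumptions of the sketch.

```python
import numpy as np

def surrogate_log_likelihood(x, z, p, G, ell, lam):
    """Three-term lower bound of equation 1.3 (up to an additive constant).

    x   : (d,) test sample
    z   : (k, r) latent samples
    p   : (k,) probabilities approximating the coupled conditional of Z given x
    G   : callable mapping (k, r) latent codes to (k, d) generated samples
    ell : callable loss, ell(x, Y) -> (k,) per-sample distances
    lam : entropy regularization parameter
    """
    y = G(z)
    distance_term = -np.dot(p, ell(x, y)) / lam        # expected distance of x to the model
    entropy_term = -np.dot(p, np.log(p + 1e-12))       # entropy of the coupled latent variable
    r = z.shape[1]
    log_prior = -0.5 * (z ** 2).sum(axis=1) - 0.5 * r * np.log(2.0 * np.pi)
    prior_term = np.dot(p, log_prior)                  # likelihood of the coupled latent variable
    return distance_term + entropy_term + prior_term
```

For entropic GANs, the probabilities p can be obtained from the optimal coupling of Lemma 1 below; for unregularized GANs, the heuristic of Section 4.3 can be used instead.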

In what follows, we explain the technical ingredients of our main results. In Section 3, we present computational methods for GANs and entropic GANs, while in Section 4, we provide numerical experiments on the MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky, 2009) datasets.

2 Main Results

Let X represent the real-data random variable with a probability density function f_X. GAN's goal is to find a generator function G: \mathbb{R}^r \to \mathbb{R}^d such that Y = G(Z) has a distribution similar to that of X. Here, Z is an r-dimensional random variable with a fixed probability density function f_Z, and we assume f_Z is the density of a normal distribution. In practice, we observe m samples \{x_1, \dots, x_m\} from X and generate n samples from Y, i.e., \{y_1, \dots, y_n\} where y_j = G(z_j) for 1 \le j \le n. We represent these empirical distributions by \hat{P}_X and \hat{P}_Y, respectively. Note that the number of generative samples n can be arbitrarily large. Finally, we assume that the generator function G is injective, which is often the case in practice since G maps from an r-dimensional space to a d-dimensional one with r \le d.

GAN computes the optimal generator G^* by minimizing a distance between the observed distribution \hat{P}_X and the generative one \hat{P}_Y. Common distance measures include optimal transport measures (e.g. Wasserstein GAN (Arjovsky et al., 2017), WGAN+Gradient Penalty (Gulrajani et al., 2017), GAN+Spectral Normalization (Miyato et al., 2018), WGAN+Truncated Gradient Penalty (Petzka et al., 2017), relaxed WGAN (Guo et al., 2017)) and divergence measures (e.g. the original GAN formulation (Goodfellow et al., 2014), f-GAN (Nowozin et al., 2016)).

In this paper, we focus on GANs based on optimal transport (OT) distances (Villani, 2008; Arjovsky et al., 2017), defined for a general loss function \ell(\cdot,\cdot) as follows:

W_\ell(P_X, P_Y) := \min_{P_{X,Y}} \; E\big[\ell(X, Y)\big],   (2.1)

where the minimum is over joint distributions P_{X,Y} whose marginal distributions are equal to P_X and P_Y, respectively. If \ell(x, y) = \|x - y\|_2, this distance is called the first-order Wasserstein distance and is referred to by W_1, while if \ell(x, y) = \|x - y\|_2^2, this measure is referred to by W_2^2, where W_2 is the second-order Wasserstein distance (Villani, 2008).

The optimal transport (OT) GAN is formulated using the following optimization (Arjovsky et al., 2017; Villani, 2008):

\min_{G \in \mathcal{G}} \; W_\ell(\hat{P}_X, \hat{P}_Y),   (2.2)

where \mathcal{G} is the set of generator functions. Examples of the OT GAN are WGAN (Arjovsky et al., 2017), corresponding to the first-order Wasserstein distance W_1, and the quadratic GAN (or, the W2GAN) (Feizi et al., 2017), corresponding to the second-order Wasserstein distance W_2.

Note that optimization 2.2 is a min-min optimization. Its objective is not smooth in G and it is often computationally expensive to obtain a solution (Sanjabi et al., 2018). One approach to improve the computational properties of this optimization is to add a regularization term that makes its objective strictly convex (Cuturi, 2013; Seguy et al., 2017). A common strictly-convex regularization term is the negative Shannon entropy, where the entropy of a coupling is defined as H(P_{X,Y}) := -E\big[\log P_{X,Y}(X, Y)\big]. This leads to the following optimal transport GAN formulation with the entropy regularization, or for simplicity, the entropic GAN formulation:

\min_{G \in \mathcal{G}} \; \min_{P_{X,Y}} \; E\big[\ell(X, Y)\big] - \lambda H\big(P_{X,Y}\big),   (2.3)

where the inner minimum is over couplings of \hat{P}_X and \hat{P}_Y, and \lambda > 0 is the regularization parameter.

There are two approaches to solve optimization problem 2.3. The first approach uses an iterative method to solve the min-min formulation (Genevay et al., 2017). Another approach is to solve an equivalent min-max formulation by writing the dual of the inner minimization (Seguy et al., 2017; Sanjabi et al., 2018). The latter is often referred to as a GAN formulation since the min-max optimization is over a set of generator functions and a set of discriminator functions. The details of this approach are further explained in Section 3.
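As a concrete illustration of the inner minimization in the entropic formulation 2.3, the following numpy sketch (not the paper's implementation) computes the entropy-regularized transport plan between two small empirical point clouds using the Sinkhorn iterations of Cuturi (2013); the quadratic cost and the iteration count are arbitrary choices made for illustration.

```python
import numpy as np

def entropic_ot_plan(X, Y, lam=0.1, n_iter=500):
    """Sinkhorn iterations for the entropy-regularized OT plan between the
    empirical distributions of the rows of X (m, d) and Y (n, d)."""
    m, n = len(X), len(Y)
    a, b = np.full(m, 1.0 / m), np.full(n, 1.0 / n)       # uniform empirical marginals
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)    # quadratic cost matrix
    K = np.exp(-C / lam)                                  # Gibbs kernel
    u, v = np.ones(m), np.ones(n)
    for _ in range(n_iter):                               # alternating marginal scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                       # plan: rows sum to a, columns to b
    transport_cost = (P * C).sum()                        # E_P[ell(X, Y)]
    return P, transport_cost

# example usage with real samples X_real and generator outputs Y_fake = G(z):
# P, cost = entropic_ot_plan(X_real, Y_fake, lam=0.05)
```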

In the following, we present an explicit probability model for entropic GANs under which their objective can be viewed as maximizing a lower bound on average sample likelihoods.

Theorem 1

Let the loss function be shift invariant, i.e., \ell(x, y) = h(x - y) for some function h(\cdot). Let

f_{Y|Z}(y|z) = C \exp\big( -\ell(y, G(z)) / \lambda \big)   (2.4)

be an explicit probability model for Y given Z = z, with a well-defined normalization constant

C^{-1} = \int \exp\big( -\ell(y, G(z)) / \lambda \big)\, dy.   (2.5)

Then, we have

E_{\hat{P}_X}\big[\log f_Y(X)\big] \;\ge\; -\frac{1}{\lambda}\Big( E_{P_{X,Z}}\big[\ell(X, G(Z))\big] - \lambda H\big(P_{X,Z}\big) \Big) + \text{constant},   (2.6)

for every coupling P_{X,Z} whose marginals are \hat{P}_X and \hat{P}_Z, where the constant does not depend on G or on the coupling. In words, the entropic GAN maximizes a lower bound on sample likelihoods according to the explicit probability model of equation 2.4.

The proof of this theorem is presented in Section A. This result has a similar flavor to that of VAEs (Makhzani et al., 2015; Rosca et al., 2017; Tolstikhin et al., 2017; Mescheder et al., 2017b) where a generative model is computed by maximizing a lower bound on sample likelihoods.

Having a shift-invariant loss function is critical for Theorem 1 as this makes the normalization term C independent of z and G (to see this, one can apply the change of variables \tilde{y} = y - G(z) in equation 2.5). The most standard OT GAN loss functions, such as the norm loss \ell(x, y) = \|x - y\|_2 for WGAN (Arjovsky et al., 2017) and the quadratic loss \ell(x, y) = \|x - y\|_2^2 for W2GAN (Feizi et al., 2017), satisfy this property.

One can further simplify this result by considering specific loss functions. For example, we have the following result for the entropic GAN with the quadratic loss function.

Corollary 1

Let \ell(x, y) = \|x - y\|_2^2. Then f_{Y|Z} of equation 2.4 corresponds to the multivariate Gaussian density function \mathcal{N}\big(G(z), (\lambda/2) I_d\big) and C = (\pi\lambda)^{-d/2}. In this case, the contribution of \log C to the constant term in equation 2.6 is -(d/2)\log(\pi\lambda).
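For completeness, the normalization constant in Corollary 1 follows from a standard Gaussian integral applied to the model of equation 2.4 with the quadratic loss:

C^{-1} \;=\; \int_{\mathbb{R}^d} \exp\!\Big( -\frac{\|y - G(z)\|_2^2}{\lambda} \Big)\, dy
\;=\; \Big( \int_{\mathbb{R}} e^{-u^2/\lambda}\, du \Big)^{d}
\;=\; (\pi\lambda)^{d/2},
\qquad \text{so} \quad f_{Y|Z}(\cdot \mid z) = \mathcal{N}\!\big( G(z), \tfrac{\lambda}{2} I_d \big).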

Let G^* and P^*_{X,Z} be optimal solutions of the entropic GAN optimization 2.3 (note that the optimal coupling can be computed efficiently for the entropic GAN using equation 3.7). Let x_{test} be a newly observed sample. An important question is what the likelihood of this sample is given the trained generative model. Using the explicit probability model of equation 2.4 and the result of Theorem 1, we can (approximately) compute sample likelihoods using the trained generative model. We explain this result in the following corollary.

Corollary 2

Let G^* and P^*_{X,Z} (or, alternatively, P^*_{X,Y}) be optimal solutions of the entropic GAN optimization 2.3. Let x_{test} be a new observed sample. We have

\log f_Y(x_{test}) \;\ge\; -\frac{1}{\lambda}\, E\big[\ell(x_{test}, G^*(Z))\big] \;+\; H\big(P^*_{Z|X=x_{test}}\big) \;+\; E\big[\log f_Z(Z)\big] \;+\; \log C,   (2.7)

where the expectations are over Z \sim P^*_{Z|X=x_{test}}. The inequality becomes tight iff KL\big( P^*_{Z|X=x_{test}} \,\|\, f_{Z|Y}(\cdot|x_{test}) \big) = 0, where KL(\cdot\|\cdot) is the Kullback–Leibler divergence between two distributions.
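The tightness condition can be read off the standard variational identity that underlies the proof of Theorem 1, stated here for a generic conditional density q over the latent space:

\log f_Y(x) \;=\; E_{z \sim q}\big[ \log f_{Y|Z}(x|z) + \log f_Z(z) - \log q(z) \big] \;+\; KL\big( q \,\|\, f_{Z|Y}(\cdot|x) \big).

Choosing q = P^*_{Z|X=x_{test}} and dropping the non-negative KL term yields equation 2.7, with equality exactly when that KL term vanishes.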

3 GAN’s Dual Formulation

In this section, we discuss dual formulations for OT GAN (equation 2.2) and entropic GAN (equation 2.3) optimizations. These dual formulations are min-max optimizations over two function classes, namely the generator and the discriminator. Often local search methods such as alternating gradient descent (GD) are used to compute a solution for these min-max optimizations.

First, we discuss the dual formulation of the OT GAN optimization 2.2. Using the duality of the inner minimization, which is a linear program, we can re-write optimization 2.2 as follows (Villani, 2008):

\min_{G \in \mathcal{G}} \; \max_{D_1, D_2} \; E_{\hat{P}_X}\big[D_1(X)\big] + E_{\hat{P}_Y}\big[D_2(Y)\big],   (3.1)

where D_1(x) + D_2(y) \le \ell(x, y) for all x, y. The maximization is over two sets of functions D_1 and D_2 which are coupled through the loss function. Using the Kantorovich duality (Villani, 2008), we can further simplify this optimization as follows:

\min_{G \in \mathcal{G}} \; \max_{D} \; E_{\hat{P}_X}\big[D(X)\big] + E_{\hat{P}_Y}\big[D^{\ell}(Y)\big],   (3.2)

where D^{\ell} is the \ell-conjugate function of D and D is restricted to \ell-convex functions (Villani, 2008). The above optimization provides a general formulation for OT GANs. If the loss function is \ell(x, y) = \|x - y\|_2, then the optimal transport distance is referred to as the first-order Wasserstein distance. In this case, the min-max optimization 3.2 simplifies to the following optimization (Arjovsky et al., 2017):

\min_{G \in \mathcal{G}} \; \max_{D:\ \text{1-Lipschitz}} \; E_{\hat{P}_X}\big[D(X)\big] - E_{\hat{P}_Y}\big[D(Y)\big].   (3.3)

This is often referred to as the Wasserstein GAN, or simply WGAN (Arjovsky et al., 2017). If the loss function is quadratic, then the OT GAN is referred to as the quadratic GAN (or, W2GAN) (Feizi et al., 2017).
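The \ell-conjugate used in optimization 3.2 is the standard c-transform; written out in the notation above:

D^{\ell}(y) \;:=\; \inf_{x} \big[ \ell(x, y) - D(x) \big].

For \ell(x, y) = \|x - y\|_2 and a 1-Lipschitz D, one has D^{\ell} = -D, which is how optimization 3.2 reduces to the WGAN objective 3.3.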

Similarly, the dual formulation of the entropic GAN optimization 2.3 can be written as the following optimization (Cuturi, 2013; Seguy et al., 2017)¹:

\min_{G \in \mathcal{G}} \; \max_{D_1, D_2} \; E_{\hat{P}_X}\big[D_1(X)\big] + E_{\hat{P}_Y}\big[D_2(Y)\big] - \lambda\, E_{\hat{P}_X \otimes \hat{P}_Y}\big[ V(X, Y) \big],   (3.4)

where

V(x, y) \;:=\; \exp\Big( \big( D_1(x) + D_2(y) - \ell(x, y) \big) / \lambda \Big).   (3.5)

¹Note that optimization 3.4 is the dual of optimization 2.3 when the marginal entropy terms \lambda H(\hat{P}_X) and \lambda H(\hat{P}_Y) have been added to its objective. Since we have assumed that G is injective, these terms are constants and thus can be ignored from the optimization objective without loss of generality.

Note that the hard constraint of optimization 3.1 is replaced by a soft constraint in optimization 3.4. In this case, optimal primal variables can be computed according to the following lemma (Seguy et al., 2017):

Lemma 1

Let D^*_1 and D^*_2 be the optimal discriminator functions for a given generator function G according to optimization 3.4. Let

v^*(x, y) \;:=\; D^*_1(x) + D^*_2(y) - \ell(x, y).   (3.6)

Then,

P^*_{X,Y}(x, y) \;=\; \exp\big( v^*(x, y) / \lambda \big)\, \hat{P}_X(x)\, \hat{P}_Y(y).   (3.7)

This lemma is important for our results since it provides an efficient way to compute the optimal coupling between real and generative variables (i.e. P^*_{X,Y}) using the optimal generator (G^*) and discriminators (D^*_1 and D^*_2) of optimization 3.4. It is worth noting that without the entropy regularization term, computing the optimal coupling in the OT GAN using the optimal generator and discriminator functions is not straightforward in general (except in some special cases such as the W2GAN (Villani, 2008; Feizi et al., 2017)). This is an additional computational benefit of using the entropic GAN. We use the algorithm presented in Sanjabi et al. (2018) to solve optimization 3.4.

4 Experimental Results

In this section, we supplement our theoretical results with experimental validations. One of the main objectives of our work is to provide a framework to compute sample likelihoods in GANs. Such likelihood statistics can then be used in several statistical inference applications that we discuss in Section 5. With a trained entropic WGAN, the likelihood of a test sample can be lower-bounded using Corollary 2. As shown in Lemma 1, the WGAN with entropy regularization provides a closed-form solution for the conditional density of the latent variable. From equation 3.7, we have

P^*_{Y|X}(y|x) \;\propto\; \exp\Big( \big( D^*_1(x) + D^*_2(y) - \ell(x, y) \big) / \lambda \Big).

By the change of variables y = G(z) (and under the assumption that the generator is injective), we have

P^*_{Z|X}(z|x) \;\propto\; \exp\Big( \big( D^*_1(x) + D^*_2(G(z)) - \ell(x, G(z)) \big) / \lambda \Big).   (4.1)

In order to compute our proposed surrogate likelihood of Corollary 2, we need to draw samples from the conditional distribution P^*_{Z|X=x}. One approach is to use a Markov chain Monte Carlo (MCMC) method to sample from this distribution. In our experiments, however, we found that MCMC demonstrates poor performance owing to the high-dimensional nature of the latent space. A similar issue with MCMC has been reported for VAEs in Kingma & Welling (2013). Thus, we use a different estimator to compute the likelihood surrogate, one which provides a better exploration of the latent space. We present our sampling procedure in Alg. 1.

1: Sample k latent points z_1, \dots, z_k from the latent prior f_Z (or another proposal distribution)
2: Compute unnormalized weights w_j = \exp\big( ( D^*_1(x) + D^*_2(G^*(z_j)) - \ell(x, G^*(z_j)) ) / \lambda \big) as in equation 4.1
3: Normalize the weights to get probabilities p_j = w_j / \sum_i w_i
4: Compute the lower bound of Corollary 2 using the weighted latent samples \{(z_j, p_j)\}
5: Return the computed lower bound as the surrogate log-likelihood of x
Algorithm 1 Estimating sample likelihoods in GANs
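A compact Python sketch of this procedure is given below. It is our reading of Algorithm 1 rather than the authors' code: the standard-normal latent prior, the quadratic loss, and the interfaces of the generator G and the two dual discriminators D1 (on data space) and D2 (on generated samples) are illustrative assumptions.

```python
import numpy as np

def estimate_log_likelihood(x, G, D1, D2, lam, k=1000, r=100, rng=None):
    """Surrogate log-likelihood lower bound for a single test sample x.

    G  : callable, (k, r) latent codes -> (k, d) generated samples
    D1 : callable, (d,) sample -> scalar dual potential on data space
    D2 : callable, (k, d) generated samples -> (k,) dual potentials
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal((k, r))                  # step 1: latent samples from the prior
    y = G(z)
    cost = ((y - x[None, :]) ** 2).sum(axis=1)       # ell(x, G(z_j)), quadratic loss
    log_w = (D1(x) + D2(y) - cost) / lam             # step 2: unnormalized log-weights (eq. 4.1)
    log_w -= log_w.max()                             # numerical stability
    p = np.exp(log_w)
    p /= p.sum()                                     # step 3: normalize to probabilities
    log_prior = -0.5 * (z ** 2).sum(axis=1) - 0.5 * r * np.log(2.0 * np.pi)
    bound = (-(p * cost).sum() / lam                 # step 4: three terms of Corollary 2
             - (p * np.log(p + 1e-12)).sum()
             + (p * log_prior).sum())
    return bound                                     # step 5: surrogate log-likelihood of x
```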
Figure 2: (a) Distributions of surrogate sample likelihoods at different iterations of entropic WGAN's training using the MNIST dataset. (b) Distributions of surrogate sample likelihoods of the MNIST, MNIST-1 and SVHN datasets using a GAN trained on MNIST-1.

4.1 Likelihood Evolution in GAN’s Training

In the experiments of this section, we study how sample likelihoods vary during GAN's training. An entropic WGAN is first trained on the MNIST dataset. Then, we randomly choose samples from the MNIST test set and compute their surrogate likelihoods using Algorithm 1 at different training iterations. We expect sample likelihoods to increase over training iterations as the quality of the generative model improves. A proper surrogate likelihood function should capture this trend.

Fig. 2(a) demonstrates the evolution of sample likelihood distributions at different training iterations of the entropic WGAN. In early iterations, surrogate likelihood values are very low as the GAN's generated images are merely random noise. The likelihood distribution shifts towards higher values during training and saturates beyond a point. Details of this experiment are presented in Appendix D.

4.2 Likelihood Comparison Across Different Datasets

In this section, we perform experiments across different datasets. An entropic WGAN is first trained on a subset of the MNIST dataset containing only the digit 1 (which we call the MNIST-1 dataset). With this trained model, likelihood estimates are computed for (1) samples from the entire MNIST dataset, and (2) samples from the Street View House Numbers (SVHN) dataset (Netzer et al., 2011) (Fig. 2(b)). In each experiment, the likelihood estimates are computed for the same number of samples. We note that the highest likelihood estimates are obtained for samples from the MNIST-1 dataset, the same dataset on which the GAN was trained. The likelihood distribution for the MNIST dataset is bimodal, with one mode peaking in line with the MNIST-1 mode. Samples from this mode correspond to the digit 1 in the MNIST dataset. The other mode, which is the dominant one, contains the rest of the digits and has relatively low likelihood estimates. The SVHN dataset, on the other hand, has much smaller likelihoods as its distribution is significantly different from that of MNIST. Furthermore, we observe that the likelihood distribution of SVHN samples has a large spread (variance). This is because samples of the SVHN dataset are more diverse, with varying backgrounds and styles, than samples from MNIST. We note that SVHN samples with high likelihood estimates correspond to images that are similar to MNIST digits, while samples with low scores look different from MNIST samples. Details of this experiment are presented in Appendix D.

4.3 Approximate Likelihood Computation in Un-regularized GANs

Most standard GAN architectures do not have the entropy regularization. The likelihood lower bounds of Theorem 1 and Corollary 2 hold even for those GANs as long as we obtain the optimal coupling, in addition to the optimal generator, from GAN's training. Computation of the optimal coupling from the dual formulation of the OT GAN can be done when the loss function is quadratic (Feizi et al., 2017). In this case, the gradient of the optimal discriminator provides the optimal coupling between X and Y (Villani, 2008) (see Lemma 2 in Appendix B).

For a general GAN architecture, however, the exact computation of the optimal coupling may be difficult. One sensible approximation is to couple x with a single latent sample z (i.e., we assume the conditional distribution P_{Z|X=x} is an impulse function). To compute the z corresponding to a given x, we sample k latent samples \{z_1, \dots, z_k\} from f_Z and select the z_j whose generated sample G(z_j) is closest to x. This heuristic takes into account both the likelihood of the latent variable and the distance between G(z_j) and x (similarly to equation 3.7). We can then use Corollary 2 to approximate sample likelihoods for various GAN architectures.
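The following short numpy sketch illustrates this heuristic (an illustration only; the quadratic loss, the standard-normal prior, and the number of candidates are assumptions):

```python
import numpy as np

def heuristic_latent_match(x, G, k=10000, r=100, rng=None):
    """Approximate the coupled latent code of a test sample x by a single draw.

    Candidates are sampled from the latent prior, so likely codes are better
    represented, and the candidate whose generated image is closest to x is
    kept; this mirrors the trade-off in equation 3.7.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal((k, r))                 # candidate latent samples from f_Z
    cost = ((G(z) - x[None, :]) ** 2).sum(axis=1)   # ell(x, G(z_j)), quadratic loss
    return z[int(np.argmin(cost))]                  # closest candidate in data space
```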

We use this approach to compute likelihood estimates for the CIFAR-10 (Krizhevsky, 2009) and LSUN-Bedrooms (Yu et al., 2015) datasets. For CIFAR-10, we train a DCGAN, while for LSUN, we train a WGAN (details of these experiments can be found in Appendix D).

Fig. 3(a) demonstrates sample likelihood estimates of different datasets using a GAN trained on CIFAR-10. Likelihoods assigned to samples from the MNIST and Office datasets are lower than those of the CIFAR dataset. Samples from the Office dataset, however, are assigned higher likelihood values than MNIST samples. We note that the Office dataset is indeed more similar to the CIFAR dataset than MNIST is. A similar experiment has been repeated for the LSUN-Bedrooms (Yu et al., 2015) dataset. We observe similar trends in this experiment (Fig. 3(b)).

Figure 3: (a) Sample likelihood estimates of the MNIST, Office and CIFAR datasets using a GAN trained on the CIFAR dataset. (b) Sample likelihood estimates of the MNIST, Office and LSUN datasets using a GAN trained on the LSUN dataset.

5 Conclusion

In this paper, we have provided a statistical framework for a family of GANs. Our main result shows that the entropic GAN optimization can be viewed as the maximization of a variational lower bound on average log-likelihoods, the approach that VAEs are based upon. This result makes a connection between two of the most popular generative models, namely GANs and VAEs. More importantly, our result constructs an explicit probability model for GANs that can be used to compute a lower bound on sample likelihoods. Our experimental results on various datasets demonstrate that this likelihood surrogate can be a good approximation of the true likelihood function. Although in this paper we mainly focus on understanding the behavior of the sample likelihood surrogate on different datasets, the proposed statistical framework of GANs can be used in various statistical inference applications. For example, our proposed likelihood surrogate can be used as a quantitative measure to evaluate the performance of different GAN architectures, to quantify domain shifts, to select a proper generator class by balancing bias against variance, to detect outlier samples, and to perform statistical tests such as hypothesis testing. We leave exploring these directions for future work.

References

Appendix A Proof of Theorem 1

Using the Baye’s rule, one can compute the -likelihood of an observed sample as follows:

(A.1)

where the second step follows from equation 2.4.

Consider a joint density function such that its marginal distributions match and . Note that the equation A.1 is true for every . Thus, we can take the expectation of both sides with respect to a distribution . This leads to the following equation:

(A.2)

where is the Shannon-entropy function.

Next we take the expectation of both sides with respect to \hat{P}_X:

E_{\hat{P}_X}\big[\log f_Y(X)\big] = -\frac{1}{\lambda} E_{P_{X,Z}}\big[\ell(X, G(Z))\big] + \log C + E_{\hat{P}_Z}\big[\log f_Z(Z)\big] + H(Z|X) + E_{\hat{P}_X}\big[ KL\big( P_{Z|X} \,\|\, f_{Z|Y} \big) \big].   (A.3)

Here, we replaced the expectation over f_Z with the expectation over \hat{P}_Z since one can generate an arbitrarily large number of samples from the generator. Since the KL divergence is always non-negative, we have

E_{\hat{P}_X}\big[\log f_Y(X)\big] \;\ge\; -\frac{1}{\lambda} E_{P_{X,Z}}\big[\ell(X, G(Z))\big] + H(Z|X) + E_{\hat{P}_Z}\big[\log f_Z(Z)\big] + \log C.   (A.4)

This inequality is true for every coupling P_{X,Z} satisfying the marginal conditions. Thus, similar to VAEs, we can pick P_{X,Z} to maximize the lower bound on average sample log-likelihoods. Since H(Z|X) = H(P_{X,Z}) - H(\hat{P}_X) and H(\hat{P}_X) = \log m is a constant, maximizing this lower bound over G and P_{X,Z} is equivalent (after multiplying by \lambda) to minimizing E_{P_{X,Z}}[\ell(X, G(Z))] - \lambda H(P_{X,Z}). This leads to the entropic GAN optimization 2.3.

Appendix B Optimal Coupling for W2GAN

Optimal coupling for the W2GAN (quadratic GAN (Feizi et al., 2017)) can be computed using the gradient of the optimal discriminator (Villani, 2008) as follows.

Lemma 2

Let \hat{P}_X be absolutely continuous with support contained in a convex set in \mathbb{R}^d. Let D^* be the optimal discriminator for a given generator G in the W2GAN. This solution is unique. Moreover, we have

Y \;\overset{d}{=}\; X - \nabla D^*(X),   (B.1)

where \overset{d}{=} means matching distributions, i.e., the pair (X, X - \nabla D^*(X)) is distributed according to the optimal coupling between X and Y.

Appendix C Sinkhorn Loss

In practice, it has been observed that a slightly modified version of the entropic GAN demonstrates improved computational properties (Genevay et al., 2017; Sanjabi et al., 2018). We explain this modification in this section. Let

\bar{W}_\lambda(\hat{P}_X, \hat{P}_Y) := \min_{P_{X,Y}} \; E\big[\ell(X, Y)\big] + \lambda\, KL\big( P_{X,Y} \,\|\, \hat{P}_X \otimes \hat{P}_Y \big),   (C.1)

where the minimum is over couplings of \hat{P}_X and \hat{P}_Y, and KL(\cdot\|\cdot) is the Kullback–Leibler divergence. Note that the objective of this optimization differs from that of the entropic GAN optimization 2.3 by a constant term. The Sinkhorn loss function is then defined as (Genevay et al., 2017):

S_\lambda(\hat{P}_X, \hat{P}_Y) := \bar{W}_\lambda(\hat{P}_X, \hat{P}_Y) - \frac{1}{2}\Big( \bar{W}_\lambda(\hat{P}_X, \hat{P}_X) + \bar{W}_\lambda(\hat{P}_Y, \hat{P}_Y) \Big).   (C.2)

Genevay et al. (2017) have shown that, as \lambda \to 0, S_\lambda approaches the unregularized OT distance W_\ell. For a general \lambda, we have the following upper and lower bounds:

Lemma 3

For a given \lambda, we have

\bar{W}_\lambda(\hat{P}_X, \hat{P}_Y) - \frac{\lambda}{2}\big( \log m + \log n \big) \;\le\; S_\lambda(\hat{P}_X, \hat{P}_Y) \;\le\; \bar{W}_\lambda(\hat{P}_X, \hat{P}_Y).   (C.3)

Proof. From the definition in equation C.2 and the non-negativity of \bar{W}_\lambda, we have S_\lambda(\hat{P}_X, \hat{P}_Y) \le \bar{W}_\lambda(\hat{P}_X, \hat{P}_Y). Moreover, since \bar{W}_\lambda(\hat{P}_X, \hat{P}_X) \le \lambda \log m (this can be seen by using the identity coupling as a feasible solution for optimization C.1 and noting that \ell(x, x) = 0) and similarly \bar{W}_\lambda(\hat{P}_Y, \hat{P}_Y) \le \lambda \log n, the lower bound follows.

Since these correction terms are constant in our setup, optimizing the GAN with the Sinkhorn loss is equivalent to optimizing the entropic GAN. So, our likelihood estimation framework can be used with models trained using the Sinkhorn loss as well. This is particularly important from a practical standpoint as training models with the Sinkhorn loss tends to be more stable in practice.
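A compact numpy sketch of the quantities in equations C.1–C.2 is given below (the quadratic cost and the fixed iteration count are arbitrary illustration choices, not the implementation used in the paper):

```python
import numpy as np

def entropic_cost(X, Y, lam=0.1, n_iter=500):
    """Entropy-regularized OT cost of eq. C.1 between the empirical
    distributions of the rows of X (m, d) and Y (n, d), via Sinkhorn."""
    m, n = len(X), len(Y)
    a, b = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # quadratic cost
    K = np.exp(-C / lam)
    u, v = np.ones(m), np.ones(n)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                      # regularized transport plan
    kl = (P * (np.log(P) - np.log(np.outer(a, b)))).sum()
    return (P * C).sum() + lam * kl

def sinkhorn_loss(X, Y, lam=0.1):
    """Debiased Sinkhorn loss of eq. C.2."""
    return entropic_cost(X, Y, lam) - 0.5 * (entropic_cost(X, X, lam)
                                             + entropic_cost(Y, Y, lam))
```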

Appendix D Training Entropic GANs

In this section, we discuss how WGANs with entropic regularization are trained. As discussed in Section 3, the dual of the entropic GAN formulation can be written as the min-max problem of optimization 3.4, where the hard constraint of the unregularized dual is replaced by the soft exponential penalty of equation 3.5. We can optimize this min-max problem using alternating optimization. A better approach is to take into account the smoothness introduced in the problem by the entropic regularizer and solve the generator problem to stationarity using first-order methods; please refer to Sanjabi et al. (2018) for more details. In all our experiments, we use Algorithm 1 of Sanjabi et al. (2018) to train our GAN models.
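For concreteness, a schematic PyTorch training round is sketched below. It is an illustration of a plain alternating scheme under the dual objective of 3.4/3.5 (following the entropic-OT dual of Seguy et al. (2017)); it is not the algorithm of Sanjabi et al. (2018), and the loss callable, network interfaces, and step counts are assumptions.

```python
import torch

def entropic_gan_round(G, D1, D2, opt_g, opt_d, x_real, z, lam, ell, d_steps=5):
    """One schematic round: several ascent steps on (D1, D2), one descent step on G.

    The dual objective used here is
        E[D1(X)] + E[D2(Y)] - lam * E_{X x Y}[exp((D1(X) + D2(Y) - ell(X, Y)) / lam)],
    with Y = G(z) and the last expectation over independent pairs of samples.
    """
    def dual_objective():
        y_fake = G(z)                                           # (n, d) generated samples
        d1 = D1(x_real)                                         # (m, 1) potentials on real data
        d2 = D2(y_fake)                                         # (n, 1) potentials on generated data
        cost = ell(x_real.unsqueeze(1), y_fake.unsqueeze(0))    # (m, n) pairwise losses
        penalty = torch.exp((d1 + d2.t() - cost) / lam).mean()  # soft constraint of eq. 3.5
        return d1.mean() + d2.mean() - lam * penalty

    for _ in range(d_steps):              # inner maximization over the two discriminators
        opt_d.zero_grad()
        (-dual_objective()).backward()
        opt_d.step()

    opt_g.zero_grad()                     # outer minimization over the generator
    dual_objective().backward()
    opt_g.step()

# example loss: quadratic, ell = lambda a, b: ((a - b) ** 2).sum(-1)
```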

d.1 GAN’s Training on MNIST

The MNIST dataset contains 28×28 grayscale images. As a pre-processing step, all images were rescaled so that pixel values lie in a fixed range. The discriminator and generator architectures used in our experiments are given in Tables 1 and 2. Note that the dual formulation of the GAN employs two discriminators, D_1 and D_2, and we use the same architecture for both. The hyperparameter details are given in Table 3. Some sample generations are shown in Fig. 4.

Layer Output size Filters
Input -
Fully connected
Reshape -

BatchNorm+ReLU -
Deconv2d (, str )
BatchNorm+ReLU -
Remove border row and col. -
Deconv2d (, str )
BatchNorm+ReLU -
Deconv2d (, str )
Sigmoid -
Table 1: Generator architecture
Layer Output size Filters
Input -
Conv2D(, str )
LeakyReLU() -
Conv2D(, str )
LeakyReLU() -
Conv2d (, str )
LeakyRelU() -
Reshape -
Fully connected
Table 2: Discriminator architecture
Parameter Config
Generator learning rate
Discriminator learning rate
Batch size
Optimizer Adam
Optimizer params ,
Number of critic iters / gen iter 5
Number of training iterations 10000
Table 3: Hyper-parameter details for MNIST experiment
Figure 4: Samples generated by Entropic GAN trained on MNIST
Figure 5: Samples generated by Entropic GAN trained on MNIST-1 dataset

d.2 GAN’s Training on CIFAR

We trained a DCGAN model on the CIFAR-10 dataset using the discriminator and generator architectures of Radford et al. (2015). The hyperparameter details are given in Table 4. Some sample generations are provided in Figure 6.

d.3 GAN’s Training on LSUN-Bedrooms dataset

We trained a WGAN model on the LSUN-Bedrooms dataset with DCGAN architectures for the generator and discriminator networks (Arjovsky et al., 2017). The hyperparameter details are given in Table 5, and some sample generations are provided in Fig. 7.

Parameter Config
Generator learning rate
Discriminator learning rate
Batch size
Optimizer Adam
Optimizer params ,
Number of training epochs 100
Table 4: Hyper-parameter details for CIFAR-10 experiment
Figure 6: Samples generated by DCGAN model trained on CIFAR dataset
Parameter Config
Generator learning rate
Discriminator learning rate
Clipping parameter 0.01
Number of critic iters per gen iter 5
Batch size
Optimizer RMSProp
Number of training iterations 70000
Table 5: Hyper-parameter details for LSUN-Bedrooms experiment
Figure 7: Samples generated by WGAN model trained on LSUN-Bedrooms dataset