
LatentGAN Autoencoder: Learning Disentangled Latent Distribution

by Sanket Kalwar, et al.

In an autoencoder, the encoder generally approximates the latent distribution over the dataset, and the decoder generates samples from this learned latent distribution. There is very little control over the latent vector, as using a random latent vector for generation will lead to trivial outputs. This work addresses this issue by using a LatentGAN generator to directly learn to approximate the latent distribution of the autoencoder, and shows meaningful results on the MNIST, 3D Chair, and CelebA datasets. An additional information-theoretic constraint is used, which successfully learns to control the autoencoder latent distribution. With this, our model also achieves an error rate of 2.38 on MNIST unsupervised image classification, which is better than that of InfoGAN and AAE.






Introduction

Generative models like GANs Goodfellow et al. (2014) and VAEs Kingma and Welling (2014) have shown remarkable progress in recent years. Generative adversarial networks have achieved state-of-the-art performance in a variety of tasks such as image-to-image translation Isola et al. (2018), video prediction Liang et al. (2017), text-to-image translation Zhang et al. (2017), drug discovery Hong et al. (2019), and privacy preservation Shi et al. (2018). VAEs have achieved state-of-the-art performance in tasks such as image generation Gregor et al. (2015), semi-supervised learning Maaløe et al. (2016), and interpolating between sentences Bowman et al. (2016). A VAE approximates its latent distribution by the ELBO method and uses a Gaussian or uniform distribution as the marginal latent distribution, which leads to a large reconstruction error. The adversarial autoencoder (AAE) Makhzani et al. (2016) uses adversarial training to train the autoencoder, and LatentGAN is directly inspired by the AAE training strategy. β-VAE Burgess et al. (2018) can be used for learning disentangled representations, but it requires a hyperparameter search, and a large β would lead to a large reconstruction error. InfoGAN Chen et al. (2016) has been shown to learn disentangled representations in a GAN, by maximizing the mutual information between a subset of the latent code and the generated sample, estimated with a recognition network. Learning disentangled representations and understanding generative factors might help in a variety of tasks and domains Bengio et al. (2013); Burgess et al. (2018). A disentangled representation can be defined as one in which single latent units are sensitive to changes in single generative factors, while being relatively invariant to changes in other factors. Disentangled representations could boost the performance of state-of-the-art AI approaches in situations where they still struggle but where humans excel Lake et al. (2016).

In this paper, we address the issue of learning disentangled control by learning the latent distribution of the autoencoder directly using the LatentGAN generator, together with a LatentGAN discriminator that tries to discriminate whether a sample belongs to the real or the fake latent distribution. Additionally, we use a mutual-information objective directly inspired by InfoGAN to learn disentangled representations.

Related Work

In this work, we present a new way to learn control over the autoencoder latent distribution, with the help of AAE Makhzani et al. (2016), which approximates the posterior of the latent distribution of the autoencoder using an arbitrary prior distribution, and of Chen et al. (2016) for learning disentangled representations. Previous work by Wang et al. (2019) used a similar method of learning the latent prior using an AAE, along with a perceptual loss and an information-maximization regularizer to train the decoder with the help of an extra discriminator.


In this work, we are trying to learn control over the autoencoder latent distribution, and in doing so our contributions are the following:

  • We have shown that it is possible to approximate the autoencoder latent distribution directly using the LatentGAN generator, without using an extra discriminator as in Wang et al. (2019).

  • We are also able to learn disentangled control over the autoencoder latent distribution by using the same LatentGAN discriminator, and we show results in the experiments section.

In doing so, we also obtain a 2.38 error rate on MNIST unsupervised classification, which is far lower than that of InfoGAN and the adversarial autoencoder.


Method

We train the autoencoder (encoder E and decoder De) and the LatentGAN generator G and discriminator D simultaneously; this helps the generator compete with the discriminator. The autoencoder training objective is as follows:

L_AE = E_{x ~ p_data(x)}[ ||x - De(E(x))||_2^2 ]
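As a concrete illustration, the reconstruction objective can be sketched in NumPy (a minimal sketch; here x and the reconstruction De(E(x)) are assumed to be given as flattened arrays):

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Mean squared reconstruction error, E_x[ ||x - De(E(x))||_2^2 ],
    with images flattened to rows of a 2-D array."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))
```

In practice this quantity is minimized with respect to both the encoder and decoder parameters.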

LatentGAN tries to model the latent distribution q(z), where z is the latent embedding of the autoencoder, and z_g = G(z_n, c) denotes the generated latent, with the generator input formed by concatenating Gaussian noise z_n and the latent code c, as shown in Figure 1. The LatentGAN training objective is as follows:

min_G max_D V(D, G) = E_{z ~ q(z)}[log D(z)] + E_{z_n, c}[log(1 - D(G(z_n, c)))]
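In practice this minimax objective reduces to binary cross-entropy terms for D and G; a minimal NumPy sketch (the non-saturating generator loss is a common substitute, used here as an assumption since the exact variant is not specified):

```python
import numpy as np

def latentgan_losses(d_real, d_fake, eps=1e-8):
    """d_real: D(z) on encoder latents; d_fake: D(G(z_n, c)) on generated
    latents. Returns the discriminator loss and the (non-saturating)
    generator loss."""
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss
```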


For learning control over the latent distribution, an additional mutual-information-based objective, inspired by Chen et al. (2016), is used:

I(c; G(z_n, c)) = H(c) - H(c | G(z_n, c))

By defining a variational lower bound over the second term of the equation above, using an auxiliary distribution Q(c | z) to approximate the posterior, it can be further written as:

L_I(G, Q) = E_{c ~ p(c), z_n}[log Q(c | G(z_n, c))] + H(c) <= I(c; G(z_n, c))

The full objective is min_{G, Q} max_D V(D, G) - λ L_I(G, Q), where the λ term ensures that the GAN loss and the differential-entropy (mutual-information) loss are on the same scale.
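For a categorical code, the variational term E[log Q(c | G(z_n, c))] is a negative softmax cross-entropy, and H(c) is a constant log K for a uniform prior; a small NumPy sketch (the function name is illustrative):

```python
import numpy as np

def mi_lower_bound_categorical(c_onehot, q_logits, eps=1e-8):
    """Variational lower bound L_I for a uniform categorical code:
    E[log Q(c | G(z_n, c))] + H(c), with H(c) = log K."""
    shifted = q_logits - q_logits.max(axis=1, keepdims=True)
    q = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    log_q_c = np.sum(c_onehot * np.log(q + eps), axis=1)
    return float(np.mean(log_q_c) + np.log(c_onehot.shape[1]))
```

A near-perfect recognition network drives the bound toward its maximum H(c) = log K.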

Figure 1: LatentGAN Autoencoder model

Implementation Details

We train an autoencoder consisting of an encoder E, which models q(z | x), where z is the latent variable and x is the input image, and a decoder De, which models p(x | z), trained with the autoencoder objective from the Method section. The LatentGAN generator G and discriminator D, as shown in Figure 1, can be implemented using linear layers with non-linearities. The recognition network Q also consists of linear layers and shares parameters with D. Then G, D, and Q are trained using the following process:

  • 1) Random Gaussian noise z_n is sampled from N(0, I), along with a latent code c, which can be continuous uniform or categorical discrete with K categories.

  • 2) The latent code c and the Gaussian noise z_n are passed through G, which generates the latent sample z_g = G(z_n, c), where c can be a continuous or a discrete code.

  • 3) D gets input from z_g and the encoder latent z, and outputs whether the sample belongs to the real latent distribution q(z) or not.

  • 4) Q receives input from G, and outputs a code c_hat, which is used for optimizing the mutual-information objective from the Method section.
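Step 1 of the process above can be sketched as follows (the noise and code dimensions are illustrative assumptions):

```python
import numpy as np

def sample_generator_input(batch, noise_dim, k, rng):
    """Step 1: sample Gaussian noise z_n ~ N(0, I) and a one-hot
    categorical code c with K categories, then concatenate them
    as the input to the generator G."""
    z_n = rng.standard_normal((batch, noise_dim))
    idx = rng.integers(0, k, size=batch)   # random category per sample
    c = np.eye(k)[idx]                     # one-hot encode the code
    return np.concatenate([z_n, c], axis=1)
```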

It has been observed that using the layer initialization from DCGAN Radford et al. (2016) hurts the LatentGAN training process, but the suggestion of using noisy labels during training does help.
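The noisy-label trick can be sketched as follows (the exact sampling ranges are an assumption; the idea is to replace hard 0/1 discriminator targets with values drawn near 0 and 1):

```python
import numpy as np

def noisy_labels(n, real, rng):
    """Noisy discriminator targets: real targets are drawn near 1 and
    fake targets near 0, instead of hard 1/0 labels (ranges assumed)."""
    return rng.uniform(0.8, 1.0, n) if real else rng.uniform(0.0, 0.2, n)
```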


MNIST Dataset

Figure 2: Varying the discrete code c_1 changes the digit category. Every row corresponds to a different c_1, and every column has the same c_1 but different Gaussian noise z_n.
Figure 3: Varying the continuous code c_2 changes the thickness of the digit. Every row has a different c_1 and Gaussian noise z_n, and every column has a different c_2 but the same Gaussian noise z_n and c_1.
Figure 4: Varying the continuous code c_3 rotates the digit. Every row has a different c_1 and Gaussian noise z_n, and every column has a different c_3 but the same Gaussian noise z_n and c_1.

On the MNIST LeCun and Cortes (2010) dataset, we use one categorical discrete variable c_1 with K = 10 and two continuous uniform variables c_2 and c_3 as the input latent code to G. After training, random samples are generated by choosing z_n, c_1, c_2, and c_3, and the output is shown in Figure 2. This shows that the LatentGAN generator is able to directly approximate the latent distribution of the autoencoder.

Method  | K  | Test error (↓)
InfoGAN | 10 | 5 ± 0.01
AAE     | 32 | 4.10 ± 1.13
Ours    | 10 | 2.38 ± 0.38
Table 1: MNIST unsupervised classification test error.

We also show the learned disentangled rotation control in Figure 4 and the disentangled thickness control in Figure 3. This shows that we can directly learn control over the latent distribution of the autoencoder. We can also use the discriminator for unsupervised classification, and Table 1 shows that our method achieves a better test error than InfoGAN and AAE, where (↓) denotes that a lower score means better performance.
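The unsupervised classification error in Table 1 is typically computed by mapping each discrete code to a ground-truth class; a minimal sketch using majority-vote matching (an assumption, since the exact assignment procedure is not specified in the text):

```python
import numpy as np

def cluster_test_error(pred_codes, labels, k):
    """Map each predicted discrete code to the majority ground-truth
    label among samples assigned to it, then report the error rate."""
    mapped = np.empty_like(labels)
    for c in range(k):
        mask = pred_codes == c
        if mask.any():
            mapped[mask] = np.bincount(labels[mask]).argmax()
    return float(np.mean(mapped != labels))
```

The Hungarian (optimal assignment) matching is another common choice for this evaluation.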

Encoder        | Decoder                 | Discriminator            | Generator
Input(1x32x32) | Input                   | Input                    | Input
c4-o16-s2-r    | r-u4-c3-o128-p1-bn128-r | fc-o1000-r               | fc-o1000-r
c4-o32-s2-r    | u2-c3-o64-p1-bn64-r     | fc-o1000-r               | fc-o1000-r
c4-o64-s2-r    | u2-c3-o32-p1-bn32-r     | fc-o512-r                | fc-o1000-r
c4-o128-s2-r   | u2-c3-o3-p1-tanh        | fc-o1-sig, fc-o2, fc-o10 | fc-o64
c4-o128-s2-r   | -                       | -                        | -
fc-o64         | -                       | -                        | -
Table 2: MNIST Network Architecture

For training the LatentGAN autoencoder on the MNIST dataset, we used the architecture in Table 2 with a batch size of 128, a learning rate of 0.0002, and the Adam optimizer with beta1 = 0.5 and beta2 = 0.9 for both the autoencoder and the LatentGAN. Table 2 uses the following conventions: c is convolution, o is output channels, s is stride, u is bilinear upsampling, p is padding, r is ReLU, sig is sigmoid, and bn is batchnorm. In column 3 of the table, the final output has 3 sections: the 1st section belongs to the discriminator, and the other sections belong to the recognition network Q. λ_1 and λ_2 are set to 1 and 0.1, respectively.
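The layer shorthand used in Tables 2–4 can be parsed mechanically; a small hypothetical helper following the conventions listed above:

```python
def parse_layer_spec(spec):
    """Parse shorthand like 'c4-o16-s2-r' from the architecture tables:
    c = conv kernel size, o = output channels, s = stride,
    u = upsampling factor, p = padding, r/sig/tanh = activation,
    bn = batchnorm channels, fc = linear layer."""
    layer = {}
    for token in spec.split("-"):
        if token in ("r", "sig", "tanh"):
            layer["activation"] = {"r": "relu", "sig": "sigmoid", "tanh": "tanh"}[token]
        elif token.startswith("bn"):
            layer["batchnorm"] = int(token[2:])
        elif token.startswith("fc"):
            layer["type"] = "linear"
        elif token[0] in "cosup":
            key = {"c": "kernel", "o": "out_channels", "s": "stride",
                   "u": "upsample", "p": "padding"}[token[0]]
            layer[key] = int(token[1:])
    return layer
```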

3D Chair Dataset

Figure 5: 3D Chair generated samples.
Figure 6: Varying the continuous code rotates the chair.

On the 3D Chair Aubry et al. (2014) dataset, we use 3 discrete categorical variables, each with K = 20, and 1 continuous uniform variable as input to G. We are able to generate meaningful samples, as shown in Figure 5, which means G is able to approximate the autoencoder latent distribution. We are also able to learn disentangled rotational control on the chair dataset, as shown in Figure 6. The hyperparameter setting for the 3D Chair dataset is the same as for the MNIST dataset, except that λ_1 and λ_2 are set to 1 and 10, respectively.

Encoder        | Decoder                 | Discriminator                  | Generator
Input(1x64x64) | Input                   | Input                          | Input
c4-o64-s2-r    | r-u4-c3-o512-p1-bn128-r | fc-o3000-r                     | fc-o3000-r
c4-o128-s2-r   | u2-c3-o256-p1-bn64-r    | fc-o3000-r                     | fc-o3000-r
c4-o256-s2-r   | u2-c3-o128-p1-bn32-r    | fc-o3000-r                     | fc-o3000-r
c4-o512-s2-r   | u2-c3-o64-p1-bn32-r     | fc-o512-r                      | fc-o3000-r
c4-o1024-s2-r  | u2-c3-o1-p1-tanh        | fc-o1-sig, (fc-o2), 3x(fc-o20) | fc-o128
c4-o128-s2-r   | -                       | -                              | -
fc-o128        | -                       | -                              | -
Table 3: 3D Chair Network Architecture

For training the LatentGAN autoencoder on the 3D Chair dataset, we used the architecture mentioned in Table 3.

CelebA Dataset

Figure 7: CelebA Generated samples

On the CelebA Liu et al. (2015) dataset, we use only 10 discrete categorical variables, each with K = 10, as input to G. After training, the generator G and the decoder are able to generate meaningful samples, as shown in Figure 7.

Figure 8: Varying one of the categorical codes changes the smile.

We are also able to learn control over the smile feature on the CelebA dataset, as shown in Figure 8.

Encoder        | Decoder                 | Discriminator          | Generator
Input(3x32x32) | Input                   | Input                  | Input
c4-o64-s2-r    | r-u4-c3-o512-p1-bn128-r | fc-o3000-r             | fc-o3000-r
c4-o128-s2-r   | u2-c3-o256-p1-bn64-r    | fc-o3000-r             | fc-o3000-r
c4-o256-s2-r   | u2-c3-o128-p1-bn32-r    | fc-o3000-r             | fc-o3000-r
c4-o512-s2-r   | u2-c3-o3-p1-tanh        | fc-o512-r              | fc-o3000-r
c4-o128-s2-r   | -                       | fc-o1-sig, 10x(fc-o10) | fc-o128
fc-o128        | -                       | -                      | -
Table 4: CelebA Network Architecture

For training the LatentGAN autoencoder on the CelebA dataset, we used the architecture mentioned in Table 4. The hyperparameter setting for the CelebA dataset is the same as for the MNIST dataset, except that λ_2 is set to 1.


Conclusion

We proposed the LatentGAN autoencoder, which can learn direct control over the autoencoder latent distribution and, in doing so, is able to generate meaningful samples. Experimentally, we verify that the LatentGAN autoencoder can be used to learn meaningful disentangled representations over the latent distribution, and that it performs better than InfoGAN and AAE on the unsupervised MNIST classification task. This further suggests that rather than making the generator learn the image distribution, which may be challenging, we can approximate the latent distribution, which is easier for the generator to learn.


Acknowledgements

We thank Wobot Intelligence Inc. for providing the NVIDIA GPU hardware resources for conducting this research, which significantly sped up our experimentation cycle.


References

  • M. Aubry, D. Maturana, A. Efros, B. Russell, and J. Sivic (2014) Seeing 3d chairs: exemplar part-based 2d-3d alignment using a large dataset of cad models. In CVPR, Cited by: 3D Chair Dataset.
  • Y. Bengio, A. Courville, and P. Vincent (2013) Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8), pp. 1798–1828. External Links: Document Cited by: Introduction.
  • S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz, and S. Bengio (2016) Generating sentences from a continuous space. External Links: 1511.06349 Cited by: Introduction.
  • C. P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, and A. Lerchner (2018) Understanding disentangling in β-VAE. External Links: 1804.03599 Cited by: Introduction.
  • X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. External Links: 1606.03657 Cited by: Introduction, Related Work.
  • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial networks. External Links: 1406.2661 Cited by: Introduction.
  • K. Gregor, I. Danihelka, A. Graves, D. J. Rezende, and D. Wierstra (2015) DRAW: a recurrent neural network for image generation. External Links: 1502.04623 Cited by: Introduction.
  • S. H. Hong, J. Lim, S. Ryu, and W. Y. Kim (2019) Molecular generative model based on adversarially regularized autoencoder. External Links: 1912.05617 Cited by: Introduction.
  • P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2018) Image-to-image translation with conditional adversarial networks. External Links: 1611.07004 Cited by: Introduction.
  • D. P. Kingma and M. Welling (2014) Auto-encoding variational bayes. External Links: 1312.6114 Cited by: Introduction.
  • B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman (2016) Building machines that learn and think like people. External Links: 1604.00289 Cited by: Introduction.
  • Y. LeCun and C. Cortes (2010) MNIST handwritten digit database. External Links: Link Cited by: MNIST Dataset.
  • X. Liang, L. Lee, W. Dai, and E. P. Xing (2017) Dual motion gan for future-flow embedded video prediction. External Links: 1708.00284 Cited by: Introduction.
  • Z. Liu, P. Luo, X. Wang, and X. Tang (2015) Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV). Cited by: CelebA Dataset.
  • L. Maaløe, C. K. Sønderby, S. K. Sønderby, and O. Winther (2016) Auxiliary deep generative models. External Links: 1602.05473 Cited by: Introduction.
  • A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey (2016) Adversarial autoencoders. External Links: 1511.05644 Cited by: Introduction, Related Work.
  • A. Radford, L. Metz, and S. Chintala (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. External Links: 1511.06434 Cited by: Implementation Details.
  • H. Shi, J. Dong, W. Wang, Y. Qian, and X. Zhang (2018) SSGAN: secure steganography based on generative adversarial networks. External Links: 1707.01613 Cited by: Introduction.
  • H. Wang, W. Peng, and W. Ko (2019) Learning priors for adversarial autoencoders. External Links: 1909.04443 Cited by: Related Work, 1st item.
  • H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas (2017) StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. External Links: 1612.03242 Cited by: Introduction.