Pairwise Augmented GANs with Adversarial Reconstruction Loss

10/11/2018 ∙ by Aibek Alanov, et al. ∙ HSE University ∙ Moscow Institute of Physics and Technology

We propose a novel autoencoding model called Pairwise Augmented GANs. We train a generator and an encoder jointly and in an adversarial manner. The generator network learns to sample realistic objects. In turn, the encoder network at the same time is trained to map the true data distribution to the prior in latent space. To ensure good reconstructions, we introduce an augmented adversarial reconstruction loss. Here we train a discriminator to distinguish two types of pairs: an object with its augmentation and the one with its reconstruction. We show that such adversarial loss compares objects based on the content rather than on the exact match. We experimentally demonstrate that our model generates samples and reconstructions of quality competitive with state-of-the-art on datasets MNIST, CIFAR10, CelebA and achieves good quantitative results on CIFAR10.




1 Introduction

Deep generative models are a powerful tool for sampling complex high-dimensional objects from a low-dimensional manifold. The dominant approaches for learning such generative models are variational autoencoders (VAEs) (Kingma & Welling, 2014; Rezende et al., 2014) and generative adversarial networks (GANs) (Goodfellow et al., 2014). VAEs make it possible not only to generate samples from the data distribution but also to encode objects into the latent space. However, VAE-like models require a careful choice of likelihood; misspecifying it may lead to undesirable effects in samples and reconstructions (e.g., blurry images). In contrast, GANs do not rely on an explicit likelihood and use a more complex loss function provided by a discriminator. As a result, they produce higher-quality images. However, the original formulation of GANs (Goodfellow et al., 2014) lacks an encoding property that is important for many practical applications. For example, an encoder is used in semi-supervised learning (Kingma et al., 2014), in manipulating object properties via a low-dimensional manifold (Creswell et al., 2017), and in optimization that exploits the known structure of embeddings (Gómez-Bombarelli et al., 2018).

VAE-GAN hybrids are of great interest due to their potential ability to learn latent representations like VAEs while generating high-quality objects like GANs. In such generative models with a bidirectional mapping between the data space and the latent space, one of the desired properties is good reconstructions ($x \approx G(E(x))$). In many hybrid approaches (Rosca et al., 2017; Ulyanov et al., 2018; Zhu et al., 2017; Brock et al., 2017; Tolstikhin et al., 2017) as well as in VAE-like methods, this is achieved by minimizing the pixel-wise $L_1$ or $L_2$ norm between $x$ and $G(E(x))$. However, the main drawback of these standard reconstruction losses is that they force the generative model to recover too many unnecessary details of the source object $x$. For example, to reconstruct a bird picture we do not need the exact position of the bird in the image, but a pixel-wise loss heavily penalizes shifted reconstructions. Recently, Li et al. (2017) improved the ALI model (Dumoulin et al., 2017; Donahue et al., 2017) by introducing a reconstruction loss in the form of a discriminator which classifies pairs $(x, x)$ and $(x, G(E(x)))$. However, in such an approach the discriminator tends to detect the fake pair just by checking whether $x$ and $G(E(x))$ are identical, which leads to vanishing gradients.

In this paper, we propose a novel autoencoding model which matches the distributions in the data space and in the latent space independently, as in Zhu et al. (2017). To ensure good reconstructions, we introduce an augmented adversarial reconstruction loss in the form of a discriminator which classifies pairs $(x, a(x))$ and $(x, G(E(x)))$, where $a(\cdot)$ is a stochastic augmentation function. This forces the discriminator to take into account content that is invariant to the augmentation, thus making training more robust. We call this approach Pairwise Augmented Generative Adversarial Networks (PAGANs). Measuring the reconstruction quality of autoencoding models is challenging: the standard reconstruction metric, RMSE, does not perform a content-based comparison. To deal with this problem we propose a novel metric, Reconstruction Inception Dissimilarity (RID), which is robust to content-preserving transformations (e.g., small shifts of an image). We show qualitative results on common datasets such as MNIST (LeCun & Cortes, 2010), CIFAR10 (Krizhevsky et al., 2009) and CelebA (Liu et al., 2015). On the CIFAR10 dataset, PAGANs outperform existing VAE-GAN hybrids in Inception Score (Salimans et al., 2016) and Fréchet Inception Distance (Heusel et al., 2017), except for the recently announced method PD-WGAN (Gemici et al., 2018).

Figure 1: The PAGAN model.

2 Preliminaries

Let us consider an adversarial learning framework where our goal is to match the true distribution $p^*(x)$ to the model distribution $p_\theta(x)$. As proposed in the original paper of Goodfellow et al. (2014), the model distribution $p_\theta(x)$ is induced by the generator $G_\theta: z \to x$, where $z$ is sampled from a prior $p(z)$. To match the distributions $p^*(x)$ and $p_\theta(x)$ in an adversarial manner, we introduce a discriminator $D_\psi$. It takes an object $x$ and predicts the probability that this object is sampled from the true distribution $p^*(x)$. The training procedure of GANs (Goodfellow et al., 2014) is based on the minimax game of two players: the generator $G_\theta$ and the discriminator $D_\psi$. This game is defined as follows:

$$\min_\theta \max_\psi V(\theta, \psi) = \mathbb{E}_{p^*(x)} \log D_\psi(x) + \mathbb{E}_{p(z)} \log\big(1 - D_\psi(G_\theta(z))\big),$$

where $V(\theta, \psi)$ is the value function for this game.

The optimal discriminator $D_{\psi^*}$ given a fixed generator $G_\theta$ is

$$D_{\psi^*}(x) = \frac{p^*(x)}{p^*(x) + p_\theta(x)},$$

and then the value function for the generator given the optimal discriminator is equivalent to the Jensen-Shannon divergence between the model distribution $p_\theta(x)$ and the true distribution $p^*(x)$, i.e.

$$V(\theta, \psi^*) = -\log 4 + 2\,\mathrm{JSD}\big(p^*(x) \,\|\, p_\theta(x)\big).$$

However, in practice, the gradient of the value function $V(\theta, \psi)$ with respect to the generator's parameters $\theta$ tends to vanish. Therefore, Goodfellow et al. (2014) proposed to train the generator by minimizing $-\mathbb{E}_{p(z)} \log D_\psi(G_\theta(z))$ instead of $\mathbb{E}_{p(z)} \log\big(1 - D_\psi(G_\theta(z))\big)$. This loss for the generator provides much more stable gradients and has the same fixed point as the minimax game of $G_\theta$ and $D_\psi$.
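The identity between the value function at the optimal discriminator and the Jensen-Shannon divergence can be checked numerically on a toy example. The sketch below assumes two small discrete distributions (the numbers are illustrative, not taken from the paper) and uses only the standard library.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL to the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def value_at_optimal_discriminator(p_true, p_model):
    """E_{p*} log D*(x) + E_{p_model} log(1 - D*(x)) with D* = p* / (p* + p_model)."""
    value = 0.0
    for pt, pm in zip(p_true, p_model):
        d_star = pt / (pt + pm)
        if pt > 0:
            value += pt * math.log(d_star)
        if pm > 0:
            value += pm * math.log(1.0 - d_star)
    return value

# Illustrative toy distributions over three outcomes.
p_true = [0.5, 0.3, 0.2]
p_model = [0.2, 0.3, 0.5]

lhs = value_at_optimal_discriminator(p_true, p_model)
rhs = -math.log(4) + 2 * jsd(p_true, p_model)
print(lhs, rhs)  # the two numbers agree up to floating-point error
```

When the two distributions coincide, the divergence is zero and the value function reaches its minimum of $-\log 4$.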

3 Pairwise Augmented Generative Adversarial Networks

In the PAGANs model our aim is not only to learn how to generate realistic objects with the generator $G_\theta(z)$, where $z$ is sampled from the prior $p(z)$, but at the same time to learn an inverse mapping (encoder) $E_\varphi: x \to z$. Additionally, we use a third stochastic transformation $a(x)$ without parameters, which is called the augmenter. It produces the augmentation $y = a(x)$ of the source object $x$.

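Let us consider the distributions which are induced by these three mappings:

  • $p_\theta(x \mid z)$ - the conditional distribution of outputs of the generator $G_\theta(z)$ given $z$;

  • $q_\varphi(z \mid x)$ - the conditional distribution of outputs of the encoder $E_\varphi(x)$ given $x$;

  • $r(y \mid x)$ - the conditional distribution over the augmentations $y = a(x)$ given a source object $x$.

Within the PAGANs model our goal is to find optimal parameters $\theta^*$ and $\varphi^*$ that ensure

  1. generator matching: $p_{\theta^*}(x) = p^*(x)$ where $p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz$, i.e. the generator samples objects from the true distribution $p^*(x)$;

  2. encoder matching: $q_{\varphi^*}(z) = p(z)$ where $q_\varphi(z) = \int q_\varphi(z \mid x)\, p^*(x)\, dx$, i.e. the encoder generates embeddings distributed as the prior $p(z)$;

  3. reconstruction matching: $p_{\theta^*, \varphi^*}(y \mid x) = r(y \mid x)$ where

    $$p_{\theta, \varphi}(y \mid x) = \int p_\theta(y \mid z)\, q_\varphi(z \mid x)\, dz,$$

    i.e. reconstructions are distributed as augmentations of the source object $x$.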

3.1 Generator & Encoder Matching

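To solve the generator and encoder matching problems we can use the framework of vanilla GANs (Goodfellow et al., 2014). We introduce two discriminators $D_{\psi_x}(x)$ and $D_{\psi_z}(z)$ for two minimax games:

  • generator matching: $\min_\theta \max_{\psi_x} V_x(\theta, \psi_x) = \mathbb{E}_{p^*(x)} \log D_{\psi_x}(x) + \mathbb{E}_{p_\theta(x)} \log\big(1 - D_{\psi_x}(x)\big)$;

  • encoder matching: $\min_\varphi \max_{\psi_z} V_z(\varphi, \psi_z) = \mathbb{E}_{p(z)} \log D_{\psi_z}(z) + \mathbb{E}_{q_\varphi(z)} \log\big(1 - D_{\psi_z}(z)\big)$.

Then the value functions $V_x$ and $V_z$ given the optimal discriminators $D_{\psi_x^*}$ and $D_{\psi_z^*}$ are equivalent to the Jensen-Shannon divergence:

$$V_x(\theta, \psi_x^*) = -\log 4 + 2\,\mathrm{JSD}\big(p^*(x) \,\|\, p_\theta(x)\big), \qquad V_z(\varphi, \psi_z^*) = -\log 4 + 2\,\mathrm{JSD}\big(p(z) \,\|\, q_\varphi(z)\big).$$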
3.2 Reconstruction Matching: Augmented Adversarial Reconstruction Loss

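The solution of the reconstruction matching problem ensures that reconstructions correspond to the source object $x$ up to the defined random augmentations $a(x)$. In the PAGANs model we introduce a minimax game for training the adversarial distance between the reconstructions and the augmentations of the source object $x$. We consider the discriminator $D_\psi(x, y)$ which takes a pair $(x, y)$ and classifies it into one of the following classes:

  • the real class: pairs $(x, y)$ from the distribution $p^*(x)\, r(y \mid x)$, i.e. the object $x$ is taken from the true distribution $p^*(x)$ and the second element $y$ is obtained from $x$ by the random augmentation $a(x)$;

  • the fake class: pairs $(x, y)$ from the distribution

    $$p^*(x)\, p_{\theta, \varphi}(y \mid x) = p^*(x) \int p_\theta(y \mid z)\, q_\varphi(z \mid x)\, dz,$$

    i.e. $x$ is sampled from $p^*(x)$, then $z$ is generated from the conditional distribution $q_\varphi(z \mid x)$ by the encoder, and $y$ is produced by the generator from the conditional model distribution $p_\theta(y \mid z)$.

Then the minimax problem is

$$\min_{\theta, \varphi} \max_\psi V(\theta, \varphi, \psi) = \mathbb{E}_{p^*(x)\, r(y \mid x)} \log D_\psi(x, y) + \mathbb{E}_{p^*(x)\, p_{\theta, \varphi}(y \mid x)} \log\big(1 - D_\psi(x, y)\big).$$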

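Let us prove that this minimax game matches the distributions $r(y \mid x)$ and $p_{\theta, \varphi}(y \mid x)$. First, we find the optimal discriminator:

Proposition 1.

Given a fixed generator $G_\theta$ and a fixed encoder $E_\varphi$, the optimal discriminator is

$$D_{\psi^*}(x, y) = \frac{r(y \mid x)}{r(y \mid x) + p_{\theta, \varphi}(y \mid x)}.$$

Proof. Given in Appendix A.1. ∎

Then we can prove that, given an optimal discriminator, the value function $V(\theta, \varphi, \psi^*)$ is equivalent to the expected Jensen-Shannon divergence between the distributions $r(y \mid x)$ and $p_{\theta, \varphi}(y \mid x)$.

Proposition 2.

The minimization of the value function under an optimal discriminator $D_{\psi^*}$ is equivalent to the minimization of the expected Jensen-Shannon divergence between $r(y \mid x)$ and $p_{\theta, \varphi}(y \mid x)$, i.e.

$$V(\theta, \varphi, \psi^*) = -\log 4 + 2\,\mathbb{E}_{p^*(x)}\, \mathrm{JSD}\big(r(y \mid x) \,\|\, p_{\theta, \varphi}(y \mid x)\big).$$

Proof. Given in Appendix A.2. ∎

If $r(y \mid x) = \delta(y - x)$, then the optimal discriminator will learn the indicator $\mathbb{I}\{x = y\}$, as was proved in Li et al. (2017). As a consequence, the objectives of the generator and the encoder are very unstable and have vanishing gradients in practice. In contrast, if the distribution $r(y \mid x)$ is non-degenerate, as in our model, then the value function is well-behaved and much more stable, which we observed in practice.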
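A toy illustration of why a degenerate augmentation distribution gives an uninformative signal is sketched below. It is not the paper's experiment: it assumes objects are positions on a one-dimensional grid, models a reconstruction as a shifted copy of the target distribution, and compares the Jensen-Shannon divergence for a point mass with that for a uniform window of hypothetical half-width 2.

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def window(center, half_width, size):
    """Uniform distribution over grid positions within half_width of center."""
    support = [i for i in range(size) if abs(i - center) <= half_width]
    return [1.0 / len(support) if i in support else 0.0 for i in range(size)]

SIZE, CENTER = 21, 10
shifts = [0, 1, 2, 3]

# Degenerate case: both target and reconstruction are point masses.
degenerate = [jsd(window(CENTER, 0, SIZE), window(CENTER + s, 0, SIZE)) for s in shifts]
# Non-degenerate case: both are spread over a window of 5 positions.
augmented = [jsd(window(CENTER, 2, SIZE), window(CENTER + s, 2, SIZE)) for s in shifts]

print(degenerate)  # 0 for an exact match, log 2 for every non-zero shift
print(augmented)   # grows gradually with the shift
```

With point masses the divergence jumps to $\log 2$ for any mismatch, so it carries no information about how far the reconstruction is from the source; with overlapping windows it decreases smoothly as the shift shrinks.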

3.3 Training Objectives

It follows that for the generator and the encoder we should optimize the sum of two value functions:

  • the generator's objective: $\min_\theta \big[ V_x(\theta, \psi_x) + V(\theta, \varphi, \psi) \big]$;

  • the encoder's objective: $\min_\varphi \big[ V_z(\varphi, \psi_z) + V(\theta, \varphi, \psi) \big]$.

In practice, in order to speed up the training, we follow Goodfellow et al. (2014) and use more stable objectives, replacing $\log(1 - D(\cdot))$ with $-\log D(\cdot)$. See Figure 1 for the description of our model and Algorithm 1 for an algorithmic illustration of the training procedure.
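The effect of this replacement can be seen from the gradients with respect to the discriminator logit. The sketch below assumes a sigmoid discriminator $D = \sigma(s)$ with logit $s$ (an illustrative parameterization) and compares the derivative of the saturating term $\log(1 - D)$ with that of the non-saturating term $-\log D$.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def grad_saturating(s):
    """d/ds log(1 - sigmoid(s)) = -sigmoid(s)."""
    return -sigmoid(s)

def grad_non_saturating(s):
    """d/ds [-log sigmoid(s)] = -(1 - sigmoid(s))."""
    return -(1.0 - sigmoid(s))

# A very negative logit means the discriminator confidently rejects the sample,
# which is typical early in training.
for logit in (-8.0, -4.0, 0.0):
    print(logit, grad_saturating(logit), grad_non_saturating(logit))
```

When the discriminator confidently rejects a generated sample, the saturating gradient is close to zero while the non-saturating one stays close to $-1$, which is the practical motivation for the replacement.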

We can straightforwardly extend the definition of the PAGANs model to $f$-PAGANs, which minimize the $f$-divergence, and to WPAGANs, which optimize the Wasserstein-1 distance. A more detailed analysis of these models is given in Appendix B.

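repeat
      Draw samples from the dataset and the prior: $x \sim p^*(x)$, $z \sim p(z)$
      Sample from the conditionals: $\hat{x} \sim p_\theta(x \mid z)$, $\hat{z} \sim q_\varphi(z \mid x)$, $y \sim r(y \mid x)$, $\tilde{x} \sim p_\theta(x \mid \hat{z})$
      Compute discriminator loss
      Compute generator loss
      Compute encoder loss
      Gradient update on discriminator networks
      Gradient update on generator-encoder networks
until convergence
Algorithm 1 The PAGAN training algorithm.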
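The three losses computed inside the loop can be sketched as follows, assuming the non-saturating variant described in Section 3.3. The function and argument names are hypothetical; each argument is a list of discriminator outputs in $(0, 1)$ for one mini-batch, and the networks themselves are omitted.

```python
import math

def mean(values):
    return sum(values) / len(values)

def neg_log(probs):
    """Average of -log D over a batch of discriminator outputs."""
    return mean([-math.log(d) for d in probs])

def neg_log_one_minus(probs):
    """Average of -log(1 - D) over a batch of discriminator outputs."""
    return mean([-math.log(1.0 - d) for d in probs])

def pagan_losses(d_x_real, d_x_fake, d_z_prior, d_z_enc, d_pair_aug, d_pair_rec):
    """Return (discriminator, generator, encoder) losses.

    d_x_real / d_x_fake:     data-space discriminator on real / generated objects
    d_z_prior / d_z_enc:     latent discriminator on prior samples / encoder outputs
    d_pair_aug / d_pair_rec: pair discriminator on (x, augmentation) / (x, reconstruction)
    """
    disc = (neg_log(d_x_real) + neg_log_one_minus(d_x_fake)
            + neg_log(d_z_prior) + neg_log_one_minus(d_z_enc)
            + neg_log(d_pair_aug) + neg_log_one_minus(d_pair_rec))
    # Non-saturating objectives: the generator and encoder maximize log D on their outputs.
    gen = neg_log(d_x_fake) + neg_log(d_pair_rec)
    enc = neg_log(d_z_enc) + neg_log(d_pair_rec)
    return disc, gen, enc

# Example: discriminators that cannot tell the classes apart output 0.5 everywhere.
batch = [0.5, 0.5, 0.5, 0.5]
disc_loss, gen_loss, enc_loss = pagan_losses(batch, batch, batch, batch, batch, batch)
print(disc_loss, gen_loss, enc_loss)
```

The reconstruction term appears in both the generator and the encoder loss because the fake pair depends on both networks.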

4 Related Work

Recent papers on VAE-GAN hybrids explore different ways to build a generative model with an encoder part. One direction is to apply adversarial training in the VAE framework to match the variational posterior distribution $q_\varphi(z \mid x)$ and the prior distribution $p(z)$ (Mescheder et al., 2017) or to match the marginal $q_\varphi(z)$ and $p(z)$ (Makhzani et al., 2016; Tolstikhin et al., 2017). Another way within the VAE model is to introduce the discriminator as a part of the data likelihood (Larsen et al., 2015; Brock et al., 2017). Within the GANs framework, a common technique is to regularize the model with a reconstruction loss term (Che et al., 2017; Rosca et al., 2017; Ulyanov et al., 2018).

Another principal approach is to train the generator and the encoder (Donahue et al., 2017; Dumoulin et al., 2017; Li et al., 2017) simultaneously in a fully adversarial way. These methods match the joint distributions $p^*(x)\, q_\varphi(z \mid x)$ and $p(z)\, p_\theta(x \mid z)$ by training a discriminator which classifies the pairs $(x, z)$. The ALICE model (Li et al., 2017) introduces an additional entropy loss to deal with the non-identifiability issues of the ALI model. Li et al. (2017) approximated the entropy loss with a cycle-consistency term which is equivalent to the adversarial reconstruction loss. The model of Pu et al. (2017a) puts ALI into the VAE framework, where the same joint distributions are matched in an adversarial manner. As an alternative, Ulyanov et al. (2018) train the generator and the encoder by optimizing a minimax game without a discriminator. The optimal transport approach has also been explored: Gemici et al. (2018) introduce an algorithm based on primal and dual formulations of an optimal transport problem.

In the PAGANs model the marginal distributions in the data space, $p^*(x)$ and $p_\theta(x)$, and in the latent space, $p(z)$ and $q_\varphi(z)$, are matched independently as in Zhu et al. (2017). Additionally, the augmented adversarial reconstruction loss is minimized by fooling the discriminator which classifies the pairs $(x, a(x))$ and $(x, G_\theta(E_\varphi(x)))$.

5 Experiments

In this section, we validate our model experimentally. First, we compare PAGAN with other similar methods that allow both inference and generation, using Inception Score and Fréchet Inception Distance. Second, to measure reconstruction quality, we introduce Reconstruction Inception Dissimilarity (RID) and demonstrate its usefulness. In the last two experiments we show the importance of the adversarial loss and of augmentations.

For the architecture, we used the deterministic DCGAN generator and discriminator networks provided by pfnet-research (the DCGAN architecture is a common choice for GANs, and other works use a similar architecture). The encoder network has the same architecture as the discriminator except for the output dimension. The encoder's output is a factorized normal distribution, thus $q_\varphi(z \mid x) = \mathcal{N}\big(z \mid \mu_\varphi(x), \sigma_\varphi^2(x) I\big)$, where $\mu_\varphi(x), \sigma_\varphi(x)$ are outputs of the encoder network. The discriminator $D_{\psi_z}(z)$ architecture is chosen to be a 2-layer MLP with 512 and 256 hidden units. We also used the same default hyperparameters as provided in the repository and applied spectral normalization following Miyato et al. (2018). For the augmentation $a(x)$ defined in Section 3 we used a combination of reflective padding by 10% and a random crop back to the original image size. The prior distribution $p(z)$ is chosen to be the standard normal distribution $\mathcal{N}(0, I)$. To evaluate Inception Score and Fréchet Inception Distance we used the official implementation provided in TensorFlow 1.10.1 (Abadi et al., 2015).
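A minimal sketch of such an augmentation is given below for a single-channel image stored as a list of rows. It assumes that the reflection does not repeat the border pixel and that the padding size is the rounded 10% of each side; both are assumptions, since the text does not specify these details, and the function names are hypothetical.

```python
import random

def reflect_index(i, n):
    """Mirror an out-of-range index back into 0..n-1 without repeating the edge (needs n >= 2)."""
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i

def shifted_crop(image, dy, dx):
    """Crop an H x W window shifted by (dy, dx) from the reflect-padded image."""
    h, w = len(image), len(image[0])
    return [[image[reflect_index(r + dy, h)][reflect_index(c + dx, w)]
             for c in range(w)] for r in range(h)]

def augment(image, pad_fraction=0.1, rng=random):
    """Reflect-pad by pad_fraction of each side, then take a random crop of the original size."""
    h, w = len(image), len(image[0])
    pad_h = max(1, round(h * pad_fraction))
    pad_w = max(1, round(w * pad_fraction))
    # A random crop of the padded image equals a random shift within the padding.
    dy = rng.randint(-pad_h, pad_h)
    dx = rng.randint(-pad_w, pad_w)
    return shifted_crop(image, dy, dx)

# Toy 10 x 10 image whose pixel value encodes its position.
image = [[10 * r + c for c in range(10)] for r in range(10)]
augmented = augment(image, rng=random.Random(0))
print(augmented[0])
```

Padding followed by a crop to the original size amounts to a small random translation with mirrored borders, which is why the augmentation preserves the image content.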

To optimize objectives (16), (14), we need a discriminator that works on pairs $(x, y)$. This can be done using special network architectures such as siamese networks (Bromley et al., 1993) or via image concatenation. The latter approach can be implemented in two alternative ways: concatenating channel-wise or width-wise. Empirically we found that the siamese architecture does not lead to a significant improvement and that width-wise concatenation is the most stable. We use this configuration in all the experiments.
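The two concatenation options differ only in the axis along which the pair is joined. A small sketch with nested lists in height x width x channels layout (an assumed layout, with hypothetical helper names) makes the resulting shapes explicit.

```python
def concat_width(x, y):
    """Join two H x W x C images side by side into an H x 2W x C image."""
    return [row_x + row_y for row_x, row_y in zip(x, y)]

def concat_channels(x, y):
    """Stack two H x W x C images along channels into an H x W x 2C image."""
    return [[px + py for px, py in zip(row_x, row_y)]
            for row_x, row_y in zip(x, y)]

def shape(image):
    return (len(image), len(image[0]), len(image[0][0]))

def constant_image(value, height=32, width=32, channels=3):
    return [[[value] * channels for _ in range(width)] for _ in range(height)]

source = constant_image(0.0)
partner = constant_image(1.0)  # stands in for an augmentation or a reconstruction

print(shape(concat_width(source, partner)))     # (32, 64, 3)
print(shape(concat_channels(source, partner)))  # (32, 32, 6)
```

With width-wise concatenation the discriminator sees both images as one wider picture, so an architecture designed for 3-channel inputs can be reused with only the spatial size changed.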

Sampling Quality
To see whether our method provides good-quality samples from the prior, we compared our model to related works that allow an inverse mapping. We performed our evaluations on the CIFAR10 dataset since quantitative metrics are available there. In terms of Fréchet Inception Distance (FID), our model outperforms all other methods. In terms of Inception Score, PAGANs perform significantly better than the others except for the recently announced PD-WGAN. Quantitative results are given in Table 1. Plots with samples and reconstructions for the CIFAR10 dataset are provided in Figure 2. Additional visual results for more datasets can be found in Appendix D.3.

| Model | FID | Inception Score |
|---|---|---|
| WAE-GAN (Tolstikhin et al., 2017) | 87.7 | 4.18 ± 0.04 |
| ALI (Dumoulin et al., 2017) | - | 5.34 ± 0.04 |
| AGE (Ulyanov et al., 2018) | 39.51 | 5.9 ± 0.04 |
| ALICE (Li et al., 2017) | - | 6.02 ± 0.03 |
| $\alpha$-GANs (Rosca et al., 2017) | - | 6.2 |
| AS-VAE (Pu et al., 2017b) | - | 6.3 |
| PD-WGAN (Gemici et al., 2018) | 33.0 | 6.70 ± 0.09 |
| PAGAN (ours) | 32.84 | 6.56 ± 0.06 |

Table 1: Inception Score and Fréchet Inception Distance for different methods. IS and FID for other methods were taken from the literature (where available). For AGE we obtained FID using a pretrained model.
Figure 2: Evaluation of Generator and Encoder on the CIFAR10 dataset: (a) PAGAN samples, (b) AGE samples, (c) PAGAN reconstructions, (d) AGE reconstructions. On plots (c), (d) odd columns denote original images, even columns show the corresponding reconstructions on the test partition.

Reconstruction Inception Dissimilarity

The traditional approach to estimating reconstruction quality is to compute the RMSE distance between source images and reconstructed ones. However, this metric focuses on exact reconstruction and is not content-aware. RMSE penalizes content-preserving transformations while allowing undesirable effects such as blurriness, which degrades visual quality significantly. We propose a novel metric, Reconstruction Inception Dissimilarity (RID), which is based on a pre-trained classification network and is defined as follows:

$$\mathrm{RID} = \exp\Big\{ \mathbb{E}_{x \sim \mathcal{D}}\, \mathrm{KL}\big( p(c \mid x) \,\|\, p(c \mid G(E(x))) \big) \Big\},$$

where $p(c \mid x)$ is a pre-trained classifier that estimates the distribution over class labels $c$ given an image $x$, and $\mathcal{D}$ is the evaluation set. Similar to Salimans et al. (2016), we use a pre-trained Inception Network (Szegedy et al., 2016) to calculate softmax outputs.
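A sketch of how such a score could be computed from classifier outputs is shown below. It assumes that RID is the exponent of the average KL divergence between the label distributions of a source image and its reconstruction, by analogy with Inception Score; the softmax vectors are toy values and the `eps` smoothing constant is a hypothetical choice for numerical safety.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) between two label distributions, smoothed by eps to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def rid(probs_source, probs_recon):
    """exp of the mean KL between classifier outputs on sources and on reconstructions."""
    divergences = [kl(p, q) for p, q in zip(probs_source, probs_recon)]
    return math.exp(sum(divergences) / len(divergences))

# Toy softmax outputs for two images over three classes.
source = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]
content_kept = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]     # classifier sees the same content
content_changed = [[0.1, 0.8, 0.1], [0.6, 0.2, 0.2]]  # classifier sees different content

print(rid(source, content_kept))     # equals 1 when the predictions are identical
print(rid(source, content_changed))  # larger than 1
```

Under this form the score is bounded below by 1, which is attained only when the classifier assigns identical label distributions to every source and its reconstruction.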

| Model | RMSE | RID |
|---|---|---|
| AUG | 8.89 | 1.57 ± 0.02 |
| VAE | 5.85 | 44.33 ± 2.27 |
| AGE | 6.675 | 19.02 ± 0.84 |
| PAGANs | 8.12 | 13.01 ± 0.82 |

Table 2: Evaluation of RMSE and RID metrics on the CIFAR10 dataset.

Low RID indicates that the content did not change after reconstruction. To calculate standard deviations, we use the same approach as for IS and split the test set into 10 equal parts (the split is done sequentially without shuffling). Moreover, RID is robust to augmentations that do not change the visual content and in this sense is much better than RMSE. To compare the new metric with RMSE, we train a vanilla VAE with a ResNet-like architecture on CIFAR10. We compute RID for its reconstructions and for real images with the augmentation (mirror 10% pad + random crop). In Table 2 we show that RMSE for the VAE is better than for augmented images (AUG), although its reconstructions are not satisfactory (see Figure 8 in Appendix D.4); Figure 3 provides even more convincing results. RID allows a fair comparison: for the VAE it is dramatically higher (44.33) than for AUG (1.57). The value 1.57 for AUG means that the KL divergence is close to zero and thus the content is almost unchanged. We also provide estimated RID and RMSE for the pretrained AGE model that was publicly available. From Table 2 we see that PAGANs outperform AGE, which reflects that our model has better reconstruction quality.
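The mean and standard deviation over sequential splits can be sketched as follows. The input is assumed to be a list of per-image KL divergences in test-set order, the score of each part is assumed to be the exponent of the average divergence, and the population standard deviation is used; these conventions and the function names are assumptions rather than details stated in the text.

```python
import math

def sequential_splits(values, n_parts=10):
    """Split a list into n_parts consecutive chunks of equal size (any remainder is dropped)."""
    size = len(values) // n_parts
    return [values[i * size:(i + 1) * size] for i in range(n_parts)]

def score_mean_std(kl_per_image, n_parts=10):
    """Mean and population standard deviation of exp(mean KL) across sequential parts."""
    scores = [math.exp(sum(part) / len(part))
              for part in sequential_splits(kl_per_image, n_parts)]
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    return mean, math.sqrt(variance)

# Toy input: 100 images, the second half reconstructed slightly worse than the first.
kl_values = [0.1] * 50 + [0.3] * 50
mean_score, std_score = score_mean_std(kl_values)
print(mean_score, std_score)
```

Because the split is sequential, any ordering in the test set (for example by class) shows up directly in the reported standard deviation.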

Figure 3: Reconstruction Inception Dissimilarity compared to RMSE. Unlike RMSE, RID captures distortions in image content much better. At the same RMSE, augmentation has a much lower RID than the other methods.

Importance of adversarial loss
To demonstrate the importance of the adversarial loss, we replaced it with the standard pixel-wise $L_1$ distance between source images and the corresponding reconstructions and compared the FID, IS and RID metrics. Using an augmentation in this setting is ambiguous, so we did not use any augmentation when training the modified model. Quantitative results for the experiment are provided in Table 3. IS and FID results suggest that our model without the adversarial loss performs worse in generation. Reconstruction quality measured by RID also drops significantly. Visual results in Appendix D.1 confirm our quantitative findings.

| Model | FID | Inception Score | RID |
|---|---|---|---|
| PAGAN | 32.84 | 6.56 ± 0.06 | 13.01 ± 0.82 |
| PAGAN-L1 | 76.73 | 4.46 ± 0.03 | 30.94 ± 1.58 |
| PAGAN-NOAUG | 111.151 | 4.23 ± 0.06 | 50.15 ± 2.71 |

Table 3: Reconstruction Inception Dissimilarity, Inception Score and Fréchet Inception Distance calculated for three setups: 1) the proposed model, PAGAN; 2) PAGAN with $L_1$ for the reconstruction loss; 3) PAGAN with the augmentation removed. The models without the adversarial loss or without augmentation performed worse in both generation and reconstruction tasks.

Importance of augmentation
In the ALICE model (Li et al., 2017) an adversarial reconstruction loss was implemented without an augmentation. As we discussed in Section 1, its absence leads to undesirable effects. Here we run an experiment to show that our model without augmentation performs worse. Quantitative results provided in Table 3 illustrate that our model without an augmentation fails to achieve both good reconstruction and good generation properties. Visual comparisons can be found in Appendix D.2. From the results of the last two experiments we conclude that the adversarial reconstruction loss works significantly better with augmentation.

6 Conclusions

In this paper, we proposed a novel framework with an augmented adversarial reconstruction loss. We introduced RID to estimate the reconstruction quality for images. It was empirically shown that this metric can perform a content-based comparison of reconstructed images. Using RID, we demonstrated the value of augmentation in our experiments. We showed that the augmented adversarial loss in this framework plays a key role in obtaining not only good reconstructions but also good generated images.

Some open questions are left for future work. More complex architectures may be used to achieve better IS and RID. The random shift augmentation may not be the only possible choice, and other choices remain unexplored.


References

  • Abadi et al. (2015) Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.
  • Ali & Silvey (1966) Syed Mumtaz Ali and Samuel D Silvey. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society. Series B (Methodological), pp. 131–142, 1966.
  • Arjovsky et al. (2017) Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In Doina Precup and Yee Whye Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 214–223, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
  • Brock et al. (2017) Andrew Brock, Theodore Lim, James M Ritchie, and Nick Weston. Neural photo editing with introspective adversarial networks. ICLR, 2017.
  • Bromley et al. (1993) Jane Bromley, Isabelle Guyon, Yann LeCun, Eduard Säckinger, and Roopak Shah. Signature verification using a "siamese" time delay neural network. In Proceedings of the 6th International Conference on Neural Information Processing Systems, NIPS'93, pp. 737–744, San Francisco, CA, USA, 1993. Morgan Kaufmann Publishers Inc.
  • Che et al. (2017) Tong Che, Yanran Li, Athul Paul Jacob, Yoshua Bengio, and Wenjie Li. Mode regularized generative adversarial networks. ICLR, 2017.
  • Creswell et al. (2017) Antonia Creswell, Anil A Bharath, and Biswa Sengupta. Conditional autoencoders with adversarial information factorization. arXiv preprint arXiv:1711.05175, 2017.
  • Donahue et al. (2017) Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. ICLR, 2017.
  • Dumoulin et al. (2017) Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, and Aaron Courville. Adversarially learned inference. ICLR, 2017.
  • Gemici et al. (2018) Mevlana Gemici, Zeynep Akata, and Max Welling. Primal-dual wasserstein gan. arXiv preprint arXiv:1805.09575, 2018.
  • Gómez-Bombarelli et al. (2018) Rafael Gómez-Bombarelli, Jennifer N Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D Hirzel, Ryan P Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS central science, 4(2):268–276, 2018.
  • Goodfellow et al. (2014) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014.
  • Gulrajani et al. (2017) Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of Wasserstein GANs. Mar 2017.
  • Heusel et al. (2017) Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems 30 (NIPS 2017), Jun 2017.
  • Kingma & Welling (2014) Diederik P Kingma and Max Welling. Auto-encoding variational bayes. ICLR, 2014.
  • Kingma et al. (2014) Diederik P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, pp. 3581–3589, 2014.
  • Krizhevsky et al. (2009) Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR-10 (Canadian Institute for Advanced Research). 2009.
  • Larsen et al. (2015) Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. Autoencoding beyond pixels using a learned similarity metric. CoRR, 2015.
  • LeCun & Cortes (2010) Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010.
  • Li et al. (2017) Chunyuan Li, Hao Liu, Changyou Chen, Yuchen Pu, Liqun Chen, Ricardo Henao, and Lawrence Carin. ALICE: Towards understanding adversarial learning for joint distribution matching. In Advances in Neural Information Processing Systems, pp. 5495–5503, 2017.
  • Liu et al. (2015) Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
  • Makhzani et al. (2016) Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. Adversarial autoencoders. ICLR, 2016.
  • Mescheder et al. (2017) Lars Mescheder, Sebastian Nowozin, and Andreas Geiger. Adversarial variational Bayes: Unifying variational autoencoders and generative adversarial networks. In Doina Precup and Yee Whye Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 2391–2400, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
  • Miyato et al. (2018) Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. ICLR, 2018.
  • Nguyen et al. (2008) XuanLong Nguyen, Martin J Wainwright, and Michael I Jordan. Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization. In Advances in neural information processing systems, pp. 1089–1096, 2008.
  • Nowozin et al. (2016) Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. f-gan: Training generative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems, pp. 271–279, 2016.
  • Pu et al. (2017a) Yuchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li, and Lawrence Carin. Adversarial symmetric variational autoencoder. In Advances in Neural Information Processing Systems, pp. 4330–4339, 2017a.
  • Pu et al. (2017b) Yuchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li, and Lawrence Carin. Adversarial symmetric variational autoencoder. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems 30, pp. 4330–4339. Curran Associates, Inc., 2017b.
  • Rezende et al. (2014) Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. ICML, 2014.
  • Rosca et al. (2017) Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, and Shakir Mohamed. Variational approaches for auto-encoding generative adversarial networks. arXiv preprint arXiv:1706.04987, 2017.
  • Salimans et al. (2016) Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In Advances in Neural Information Processing Systems, pp. 2234–2242, 2016.
  • Szegedy et al. (2016) Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826, 2016. doi: 10.1109/CVPR.2016.308.
  • Tolstikhin et al. (2017) Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein auto-encoders. Nov 2017.
  • Ulyanov et al. (2018) Dmitry Ulyanov, Andrea Vedaldi, and Victor S. Lempitsky. It takes (only) two: Adversarial generator-encoder networks. In AAAI. AAAI Press, 2018.
  • Villani (2008) Cédric Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
  • Zhu et al. (2017) Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, 2017.

Appendix A Proofs

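A.1 Proof of Proposition 1 (optimal discriminator)

Proposition 1. Given a fixed generator $G_\theta$ and a fixed encoder $E_\varphi$, the optimal discriminator is

$$D_{\psi^*}(x, y) = \frac{r(y \mid x)}{r(y \mid x) + p_{\theta, \varphi}(y \mid x)}.$$

Proof. For a fixed generator and encoder, the value function with respect to the discriminator is

$$V(\psi) = \mathbb{E}_{p^*(x)\, r(y \mid x)} \log D_\psi(x, y) + \mathbb{E}_{p^*(x)\, p_{\theta, \varphi}(y \mid x)} \log\big(1 - D_\psi(x, y)\big).$$

Let us introduce new variables and notations

$$t = (x, y), \qquad p_1(t) = p^*(x)\, r(y \mid x), \qquad p_2(t) = p^*(x)\, p_{\theta, \varphi}(y \mid x),$$

so that

$$V(\psi) = \mathbb{E}_{p_1(t)} \log D_\psi(t) + \mathbb{E}_{p_2(t)} \log\big(1 - D_\psi(t)\big).$$

Using the results of Goodfellow et al. (2014) we obtain

$$D_{\psi^*}(t) = \frac{p_1(t)}{p_1(t) + p_2(t)} = \frac{r(y \mid x)}{r(y \mid x) + p_{\theta, \varphi}(y \mid x)}. \qquad ∎$$

A.2 Proof of Proposition 2

Proposition 2. The minimization of the value function under an optimal discriminator $D_{\psi^*}$ is equivalent to the minimization of the expected Jensen-Shannon divergence between $r(y \mid x)$ and $p_{\theta, \varphi}(y \mid x)$.

Proof. As in Goodfellow et al. (2014), we rewrite the value function for the optimal discriminator as follows:

$$V(\theta, \varphi, \psi^*) = \mathbb{E}_{p^*(x)} \Big[ \mathbb{E}_{r(y \mid x)} \log \frac{r(y \mid x)}{r(y \mid x) + p_{\theta, \varphi}(y \mid x)} + \mathbb{E}_{p_{\theta, \varphi}(y \mid x)} \log \frac{p_{\theta, \varphi}(y \mid x)}{r(y \mid x) + p_{\theta, \varphi}(y \mid x)} \Big]$$

$$= -\log 4 + \mathbb{E}_{p^*(x)} \Big[ \mathrm{KL}\Big( r \,\Big\|\, \frac{r + p_{\theta, \varphi}}{2} \Big) + \mathrm{KL}\Big( p_{\theta, \varphi} \,\Big\|\, \frac{r + p_{\theta, \varphi}}{2} \Big) \Big] = -\log 4 + 2\,\mathbb{E}_{p^*(x)}\, \mathrm{JSD}\big( r(y \mid x) \,\|\, p_{\theta, \varphi}(y \mid x) \big). \qquad ∎$$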
Appendix B Extending PAGANs

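B.1 $f$-divergence PAGANs

$f$-GANs (Nowozin et al., 2016) are a generalization of the GAN approach. Nowozin et al. (2016) introduce a model which minimizes the $f$-divergence $D_f$ (Ali & Silvey, 1966) between the true distribution $p^*(x)$ and the model distribution $p_\theta(x)$, i.e. it solves the optimization problem

$$\min_\theta D_f\big(p^* \,\|\, p_\theta\big) = \min_\theta \int p_\theta(x)\, f\Big( \frac{p^*(x)}{p_\theta(x)} \Big)\, dx,$$

where $f: \mathbb{R}_+ \to \mathbb{R}$ is a convex, lower-semicontinuous function satisfying $f(1) = 0$.

The minimax game for $f$-GANs is defined as

$$\min_\theta \max_\psi V(\theta, \psi) = \mathbb{E}_{p^*(x)}\, T_\psi(x) - \mathbb{E}_{p_\theta(x)}\, f^*\big(T_\psi(x)\big),$$

where $V(\theta, \psi)$ is a value function and $f^*$ is the Fenchel conjugate of $f$ (Nguyen et al., 2008). For fixed parameters $\theta$, the optimal $T_{\psi^*}$ is $f'\big(p^*(x) / p_\theta(x)\big)$. Then the value function for the optimal parameters $\psi^*$ equals the $f$-divergence between the distributions $p^*(x)$ and $p_\theta(x)$ (Nguyen et al., 2008), i.e.

$$V(\theta, \psi^*) = D_f\big(p^* \,\|\, p_\theta\big).$$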
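The variational identity can be verified numerically for one concrete choice of $f$. The sketch below assumes $f(u) = u \log u$, which yields the KL divergence, with Fenchel conjugate $f^*(t) = \exp(t - 1)$ and optimal critic $T^* = f'(p^*/p_\theta) = 1 + \log(p^*/p_\theta)$; the discrete distributions are toy values.

```python
import math

def f(u):
    """Generator function of the KL divergence."""
    return u * math.log(u)

def f_prime(u):
    return math.log(u) + 1.0

def f_conjugate(t):
    """Fenchel conjugate of f(u) = u log u."""
    return math.exp(t - 1.0)

def f_divergence(p, q):
    """D_f(p || q) = sum_x q(x) f(p(x) / q(x))."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def variational_value(p, q, critic):
    """E_p[T] - E_q[f*(T)] for a critic given by its values on each outcome."""
    return (sum(pi * ti for pi, ti in zip(p, critic))
            - sum(qi * f_conjugate(ti) for qi, ti in zip(q, critic)))

p_true = [0.5, 0.3, 0.2]
p_model = [0.2, 0.3, 0.5]

optimal_critic = [f_prime(pi / qi) for pi, qi in zip(p_true, p_model)]
shifted_critic = [t + 0.3 for t in optimal_critic]

divergence = f_divergence(p_true, p_model)
value_optimal = variational_value(p_true, p_model, optimal_critic)
value_shifted = variational_value(p_true, p_model, shifted_critic)
print(divergence, value_optimal, value_shifted)
```

The value at the optimal critic matches the divergence, while any other critic gives a smaller value, which is why the inner maximization yields a lower bound that is tight at the optimum.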


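We can straightforwardly extend the definition of the PAGANs model to $f$-PAGANs. For each matching problem we simply introduce the $f$-GAN value function, i.e.

  • generator matching: $\min_\theta \max_{\psi_x} \mathbb{E}_{p^*(x)}\, T_{\psi_x}(x) - \mathbb{E}_{p_\theta(x)}\, f^*\big(T_{\psi_x}(x)\big)$;

  • encoder matching: $\min_\varphi \max_{\psi_z} \mathbb{E}_{p(z)}\, T_{\psi_z}(z) - \mathbb{E}_{q_\varphi(z)}\, f^*\big(T_{\psi_z}(z)\big)$;

  • reconstruction matching: $\min_{\theta, \varphi} \max_\psi \mathbb{E}_{p^*(x)\, r(y \mid x)}\, T_\psi(x, y) - \mathbb{E}_{p^*(x)\, p_{\theta, \varphi}(y \mid x)}\, f^*\big(T_\psi(x, y)\big)$.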

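B.2 Wasserstein PAGANs

Arjovsky et al. (2017) proposed the WGAN model for minimizing the Wasserstein-1 distance between the distributions $p^*(x)$ and $p_\theta(x)$, i.e.

$$W_1\big(p^*, p_\theta\big) = \inf_{\gamma \in \Pi(p^*, p_\theta)} \mathbb{E}_{(x, x') \sim \gamma}\, \|x - x'\|,$$

where $\Pi(p^*, p_\theta)$ is the set of joint distributions with marginals $p^*$ and $p_\theta$. Because this distance is intractable, they consider solving the Kantorovich-Rubinstein dual problem (Villani, 2008)

$$W_1\big(p^*, p_\theta\big) = \sup_{\|f\|_L \le 1} \mathbb{E}_{p^*(x)} f(x) - \mathbb{E}_{p_\theta(x)} f(x),$$

where the supremum is taken over 1-Lipschitz functions $f$.

As in Section B.1, we can easily extend the PAGANs model to WPAGANs. In each matching problem the corresponding distance between distributions is the Wasserstein-1 distance.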
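For intuition, the primal and dual forms of the Wasserstein-1 distance can be compared on a one-dimensional grid, where the primal distance has a closed form as the summed absolute difference of the cumulative distribution functions. The sketch below assumes unit grid spacing and toy distributions, and evaluates the Kantorovich-Rubinstein objective for a few hand-picked 1-Lipschitz critics.

```python
def wasserstein_1(p, q):
    """W1 between distributions on the grid 0, 1, ..., n-1: sum of |CDF_p - CDF_q|."""
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        total += abs(cdf_p - cdf_q)
    return total

def dual_value(p, q, critic):
    """E_p[f] - E_q[f] for a critic f evaluated at the grid positions."""
    return (sum(pi * critic(i) for i, pi in enumerate(p))
            - sum(qi * critic(i) for i, qi in enumerate(q)))

# The model distribution is the true one shifted two positions to the right.
p_true = [0.5, 0.5, 0.0, 0.0]
p_model = [0.0, 0.0, 0.5, 0.5]

primal = wasserstein_1(p_true, p_model)
best_dual = dual_value(p_true, p_model, lambda x: -x)        # 1-Lipschitz, attains the supremum here
weak_dual = dual_value(p_true, p_model, lambda x: -0.5 * x)  # 1-Lipschitz but suboptimal
print(primal, best_dual, weak_dual)
```

Every 1-Lipschitz critic gives a lower bound on the distance, and for a pure shift the linear critic attains it, which mirrors the role of the learned critic in WGANs.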

Appendix C Other models and experiment details

C.1 Training Wasserstein PAGAN

As an alternative approach to matching implicit distributions we can use the Wasserstein distance. Recent empirical works showed promising results (Gulrajani et al., 2017; Gemici et al., 2018) and thus are interesting to compare with. As mentioned above, we still need a critic that works on pairs of images. Unlike in the standard GAN framework, it is desirable to have a strong critic. A channel-wise concatenation for pairs worked best in terms of visual quality and training stability. As a default choice to improve the Wasserstein distance optimization we applied the gradient penalty proposed in Gulrajani et al. (2017). To apply the gradient penalty to a critic on pairs we have to interpolate between a real pair $(x, y)$ and a fake pair $(x', y')$. There are two choices:

  • shared alpha: $\big(\alpha x + (1 - \alpha) x', \; \alpha y + (1 - \alpha) y'\big)$ with a single $\alpha \sim U[0, 1]$;

  • independent alpha for each part: $\big(\alpha_1 x + (1 - \alpha_1) x', \; \alpha_2 y + (1 - \alpha_2) y'\big)$ with $\alpha_1, \alpha_2 \sim U[0, 1]$.

Empirically we found no difference in results and used the shared alpha as the default choice in further experiments. The gradient penalty strength parameter was set to 10, as recommended by Gulrajani et al. (2017). We used 10 discriminator steps per generator/encoder step for WPAGAN to slightly improve quality in this setting; other parameters were unchanged. In Table 4 we present results for the Wasserstein loss used instead of the standard GAN objective in the PAGAN model. While producing good reconstructions, this type of loss failed to achieve good generation results.
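The two interpolation schemes and the penalty term can be sketched as follows. Images are represented as flat lists of floats, the pair layout and function names are hypothetical, and the gradient of the critic is passed in as a plain vector because computing it requires an automatic differentiation library that is outside the scope of this sketch.

```python
import math
import random

def lerp(a, b, alpha):
    """Element-wise alpha * a + (1 - alpha) * b."""
    return [alpha * ai + (1.0 - alpha) * bi for ai, bi in zip(a, b)]

def interpolate_shared(real_pair, fake_pair, rng=random):
    """Use one alpha for both elements of the pair."""
    alpha = rng.random()
    return tuple(lerp(r, f, alpha) for r, f in zip(real_pair, fake_pair))

def interpolate_independent(real_pair, fake_pair, rng=random):
    """Draw a separate alpha for each element of the pair."""
    return tuple(lerp(r, f, rng.random()) for r, f in zip(real_pair, fake_pair))

def gradient_penalty(critic_gradient, strength=10.0):
    """strength * (||grad|| - 1)^2, with the gradient taken at the interpolated pair."""
    norm = math.sqrt(sum(g * g for g in critic_gradient))
    return strength * (norm - 1.0) ** 2

real_pair = ([1.0, 1.0], [2.0, 2.0])  # (source, augmentation), toy values
fake_pair = ([0.0, 0.0], [0.0, 0.0])  # (source, reconstruction), toy values

x_shared, y_shared = interpolate_shared(real_pair, fake_pair, rng=random.Random(0))
x_indep, y_indep = interpolate_independent(real_pair, fake_pair, rng=random.Random(0))
print(x_shared, y_shared)
print(x_indep, y_indep)
print(gradient_penalty([3.0, 4.0]))  # gradient norm 5 gives 10 * (5 - 1)^2 = 160
```

With a shared alpha the interpolated pair lies on the straight line between the real and the fake pair in the joint space, whereas independent alphas sample from the rectangle spanned by the two elements.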

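| Model | FID | Inception Score | RID |
|---|---|---|---|
| WPAGAN | 52.29 | 5.62 ± 0.09 | 13.44 ± 0.44 |

Table 4: Fréchet Inception Distance, Inception Score and Reconstruction Inception Dissimilarity for Wasserstein PAGAN.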

Appendix D Images

D.1 PAGAN-L1 Visual Results

Figure 4: Evaluation of Generator and Encoder trained on the CIFAR10 dataset with the adversarial loss replaced with the $L_1$ loss: (a) CIFAR10 samples from PAGAN-L1, (b) CIFAR10 reconstructions from PAGAN-L1. On plot (b) odd columns denote original images, even columns show the corresponding reconstructions on the test partition.

D.2 PAGAN-NOAUG Visual Results

Figure 5: Evaluation of Generator and Encoder trained on the CIFAR10 dataset with the augmentation removed: (a) CIFAR10 samples from PAGAN-NOAUG, (b) CIFAR10 reconstructions from PAGAN-NOAUG. On plot (b) odd columns denote original images, even columns show the corresponding reconstructions on the test partition.

D.3 PAGAN Visual Results

Figure 6: Evaluation of Generator and Encoder trained on the MNIST dataset: (a) MNIST samples from PAGAN, (b) MNIST reconstructions from PAGAN. On plot (b) odd columns denote original images, even columns show the corresponding reconstructions on the test partition.

Figure 7: Evaluation of Generator and Encoder trained on the CelebA dataset: (a) CelebA samples from PAGAN, (b) CelebA reconstructions from PAGAN. On plot (b) odd columns denote original images, even columns show the corresponding reconstructions on the test partition.

D.4 VAE for Reconstruction Inception Dissimilarity

Figure 8: Reconstructions from the VAE used to compute RID.