1 Introduction
The variational autoencoder (VAE) [22, 35]
is a powerful method for generative modeling and unsupervised learning. It consists of a generator model that transforms a noise vector into a signal such as an image via a top-down convolutional network (also called a deconvolutional network due to its top-down nature). It also consists of an inference model that infers the latent vector from the image via a bottom-up network. The VAE has seen many applications in image and video synthesis [12, 3] and unsupervised and semi-supervised learning [37, 23].
Despite its success, the VAE suffers from relatively weak synthesis quality compared to methods such as generative adversarial nets (GANs) [11, 34] that are based on adversarial learning. While combining the VAE objective function with the GAN objective function can improve the synthesis quality, such a combination is rather ad hoc. In this paper, we pursue a more systematic integration of variational learning and adversarial learning. Specifically, instead of employing a discriminator as in GANs, we recruit a latent energy-based model (EBM) to mesh with the VAE seamlessly in a joint training scheme.
The generator model in VAE is a directed model, with a known prior distribution on the latent vector, such as the Gaussian white noise distribution, and a conditional distribution of the image given the latent vector. The advantage of such a model is that it can generate synthesized examples by direct ancestral sampling. The generator model defines a joint probability density of the latent vector and the image in a top-down scheme. We may call this joint density the generator density.
The VAE also has an inference model, which defines the conditional distribution of the latent vector given the image. Together with the data distribution that generates the observed images, it defines a joint probability density of the latent vector and the image in a bottom-up scheme. We may call this joint density the data density, where the latent vector may be considered the missing data in the language of the EM algorithm [6].
As we shall explain later, the VAE amounts to joint minimization of the Kullback-Leibler divergence from the data density to the generator density, where the joint minimization is over the parameters of both the generator model and the inference model. In this minimization, the generator density seeks to cover the modes of the data density, and as a result it can become overly dispersed. This may partially explain the VAE's lack of synthesis quality.
Unlike the generator network, the latent EBM is an undirected model. It defines an unnormalized joint density on the latent vector and the image via a joint energy function. This undirected form enables the latent EBM to better approximate the data density than the generator network. However, maximum likelihood learning of the latent EBM requires (1) inference sampling: sampling from the conditional density of the latent vector given the observed example, and (2) synthesis sampling: sampling from the joint density of the latent vector and the image. Both inference sampling and synthesis sampling require time-consuming Markov chain Monte Carlo (MCMC).
In this paper, we propose to jointly train the VAE and the latent EBM so that the two models can borrow strength from each other. The objective function of the joint training method consists of the Kullback-Leibler divergences between three joint densities of the latent vector and the image, namely the data density, the generator density, and the latent EBM density. The three Kullback-Leibler divergences form an elegant symmetric and anti-symmetric divergence triangle that integrates variational learning and adversarial learning seamlessly.
The joint training is beneficial to both the VAE and the latent EBM. The latent EBM has a more flexible form and can approximate the data density better than the generator model. It serves as a critic of the generator model by judging it against the data density. To the generator model, the latent EBM serves as a surrogate of the data density and a target density for the generator model to approximate. The generator model and the associated inference model, in return, serve as approximate synthesis and inference samplers for the latent EBM, thus relieving the latent EBM of the burden of MCMC sampling.
Our experiments show that the joint training method can learn a generator model with strong synthesis ability. It can also learn an energy function that is capable of anomaly detection.
2 Contributions and related work
The following are the contributions of our work. (1) We propose a joint training method to learn both the VAE and the latent EBM. The objective function is a symmetric and anti-symmetric divergence triangle. (2) The proposed method integrates variational and adversarial learning. (3) The proposed method integrates the research themes initiated by the Boltzmann machine and the Helmholtz machine.
The following are the themes that are related to our work.
(1) Variational and adversarial learning. Over the past few years, there has been an explosion of research on variational learning and adversarial learning, inspired by VAE [22, 35, 37, 12] and GAN [11, 34, 2, 41] respectively. One aim of our work is to find a natural integration of variational and adversarial learning. We also compare with some prominent methods along this theme in our experiments. Notably, adversarially learned inference (ALI) [9, 7] combines the learning of the generator model and the inference model in an adversarial framework. It can be improved by adding conditional entropy regularization, as in the more recent methods ALICE [26] and SVAE [4]. Though these methods are trained using a joint discriminator on the image and the latent vector, such a discriminator is not a probability density, and thus it is not a latent EBM.
(2) Helmholtz machine and Boltzmann machine. Before VAE and GAN took over, the Boltzmann machine [1, 17, 36] and the Helmholtz machine [16] were two classes of models for generative modeling and unsupervised learning. The Boltzmann machine is the most prominent example of a latent EBM. Its learning algorithm consists of two phases. The positive phase samples from the conditional distribution of the latent variables given the observed example. The negative phase samples from the joint distribution of the latent variables and the image. The parameters are updated based on the differences between the statistical properties of the positive and negative phases. The Helmholtz machine can be considered a precursor of VAE. It consists of a top-down generation model and a bottom-up recognition model. Its learning algorithm also consists of two phases. The wake phase infers the latent variables based on the recognition model and updates the parameters of the generation model. The sleep phase generates synthesized data from the generation model and updates the parameters of the recognition model. Our work seeks to integrate these two classes of models.
(3) Divergence triangle for joint training of generator network and energy-based model. The generator model and the energy-based model can be trained separately using maximum likelihood criteria as in [13, 33], and they can also be trained jointly as recently explored by [20, 38, 14, 24]. In particular, [14] proposes a divergence triangle criterion for joint training. Our training criterion is also in the form of a divergence triangle. However, the EBM in these papers is defined only on the image; there is no latent vector in the EBM. In our work, we employ a latent EBM which defines a joint density on the latent vector and the image. This undirected joint density is a more natural match to the generator density and the data density, both of which are directed joint densities of the latent vector and the image.
3 Models and learning
3.1 Generator network
Let z be the d-dimensional latent vector. Let x be the D-dimensional signal, such as an image. VAE consists of a generator model, which defines a joint probability density

p_θ(z, x) = p(z) p_θ(x|z)   (1)

in a top-down direction, where p(z) is the known prior distribution of the latent vector z, such as the uniform distribution or the Gaussian white noise distribution, i.e., z ∼ N(0, I_d), where I_d is the d-dimensional identity matrix. p_θ(x|z) is the conditional distribution of x given z. A typical form of p_θ(x|z) is such that x = g_θ(z) + ε, where g_θ is parametrized by a top-down convolutional network (also called a deconvolutional network due to the top-down direction), with θ collecting all the weight and bias terms of the network. ε is the residual noise, and usually it is assumed that ε ∼ N(0, σ²I_D). The generator network is a directed model. We call p_θ(z, x) the generator density. p_θ(z, x) can be sampled directly by first sampling z ∼ p(z) and then sampling x ∼ p_θ(x|z) given z. This is sometimes called ancestral sampling in the literature [30].
The marginal distribution p_θ(x) = ∫ p_θ(z, x) dz is not in closed form. Thus the generator network is sometimes called an implicit generative model in the literature.
The inference of z can be based on the posterior distribution of z given x, i.e., p_θ(z|x) = p_θ(z, x)/p_θ(x), which is also not in closed form.
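Ancestral sampling from the generator density can be sketched in a few lines. The following is a minimal numpy illustration; the random two-layer map standing in for g_θ and all dimensions are arbitrary choices for this sketch, not the networks used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma = 8, 64, 0.1  # latent dim, image dim, residual noise std (illustrative)

# Toy stand-in for the top-down network g_theta: a fixed random two-layer map.
W1, W2 = rng.normal(size=(32, d)), rng.normal(size=(D, 32))

def g_theta(z):
    return W2 @ np.tanh(W1 @ z)

def ancestral_sample():
    z = rng.normal(size=d)                       # z ~ N(0, I_d), the known prior
    x = g_theta(z) + sigma * rng.normal(size=D)  # x = g_theta(z) + eps, eps ~ N(0, sigma^2 I_D)
    return z, x

z, x = ancestral_sample()
```

Sampling is a single top-down pass, which is what makes the generator density directly sampleable, in contrast to the latent EBM introduced later.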
3.2 Inference model
VAE assumes an inference model q_φ(z|x) with a separate set of parameters φ. One example of q_φ(z|x) is N(μ_φ(x), diag(σ²_φ(x))), where μ_φ(x) is the d-dimensional mean vector, and diag(σ²_φ(x)) is the d-dimensional diagonal variance-covariance matrix. Both μ_φ(x) and σ²_φ(x) can be parametrized by bottom-up convolutional networks, whose parameters are denoted by φ. The inference model is a closed-form approximation to the true posterior p_θ(z|x).
3.3 Data density
Let q_data(x) be the distribution that generates the observed images. In practice, expectation with respect to q_data(x) can be approximated by the average over the observed training examples.
The reason we use the notation q to denote the data distribution is that q_data(x) can be naturally combined with the inference model q_φ(z|x), so that we have the joint density

q_φ(z, x) = q_data(x) q_φ(z|x).   (2)

The above is also a directed density in that it can be factorized in a bottom-up direction. We may call the joint density q_φ(z, x) the data density, where in the terminology of the EM algorithm, we may consider z as the missing data, and q_φ(z|x) as the imputation model of the missing data.
3.4 VAE
The top-down generator density p_θ(z, x) and the bottom-up data density q_φ(z, x) form a natural pair. As noted by [14], VAE can be viewed as the following joint minimization

min_θ min_φ KL(q_φ(z, x) || p_θ(z, x)),   (3)

where for two densities q and p in general, KL(q || p) = E_q[log(q/p)] is the Kullback-Leibler divergence between q and p.
To connect the above joint minimization to the usual form of VAE,

KL(q_φ(z, x) || p_θ(z, x)) = KL(q_data(x) || p_θ(x)) + KL(q_φ(z|x) || p_θ(z|x))   (4)
= −H(q_data) − E_{q_data(x)}[E_{q_φ(z|x)}[log p_θ(z, x)] + H(q_φ(z|x))],   (5)

where for two joint densities q(z, x) and p(z, x), we define KL(q(z|x) || p(z|x)) = E_{q(z,x)}[log(q(z|x)/p(z|x))], and H denotes the entropy.
Since the entropy H(q_data) is fixed, the joint minimization problem is equivalent to the joint maximization of

L(θ, φ) = E_{q_data(x)}[E_{q_φ(z|x)}[log p_θ(x|z)] − KL(q_φ(z|x) || p(z))],   (6)

which is the lower bound of the log-likelihood used in VAE [22].
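For a Gaussian inference model and a Gaussian generator, the lower bound in (6) can be estimated with a closed-form KL term and a single reparameterized sample. The following is a minimal numpy sketch of the per-example negative ELBO; the function names and shapes are illustrative, not the paper's implementation:

```python
import numpy as np

def neg_elbo(x, mu, log_var, g_theta, sigma=1.0, rng=np.random.default_rng(0)):
    """Single-sample estimate of the negative ELBO for one image x.
    mu, log_var: outputs of the inference network q_phi(z|x), assumed Gaussian.
    g_theta: generator mean function; p_theta(x|z) = N(g_theta(z), sigma^2 I)."""
    eps = rng.normal(size=mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps  # reparameterized sample z ~ q_phi(z|x)
    # Reconstruction term: -log p_theta(x|z) up to an additive constant.
    recon = np.sum((x - g_theta(z)) ** 2) / (2 * sigma ** 2)
    # KL(N(mu, diag(exp(log_var))) || N(0, I)) in closed form.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return recon + kl
```

When q_φ(z|x) equals the prior N(0, I), the KL term vanishes and only the reconstruction term remains.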
It is worth noting that the wake-sleep algorithm [16] for training the Helmholtz machine consists of (1) the wake phase: min_θ KL(q_φ(z, x) || p_θ(z, x)), and (2) the sleep phase: min_φ KL(p_θ(z, x) || q_φ(z, x)). The sleep phase reverses the order of the KL-divergence.
3.5 Latent EBM
Unlike the directed joint densities p_θ(z, x) of the generator network and q_φ(z, x) of the data density, the latent EBM defines an undirected joint density, albeit an unnormalized one:

π_α(z, x) = (1/Z(α)) exp(−U_α(z, x)),   (7)

where U_α(z, x) is the energy function (a term originating from statistical physics) defined on the image x and the latent vector z, and Z(α) = ∫∫ exp(−U_α(z, x)) dz dx is the normalizing constant. Z(α) is usually intractable, and exp(−U_α(z, x)) is an unnormalized density. The most prominent example of a latent EBM is the Boltzmann machine [1, 17], where U_α consists of pairwise potentials. In our work, we first encode x into a vector, concatenate this vector with the vector z, and then obtain U_α(z, x) by a network defined on the concatenated vector.
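A minimal sketch of such an energy function follows. This is a hypothetical stand-in: the convolutional encoder of x is replaced by one dense layer, and all sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, h = 8, 64, 16  # latent dim, image dim, encoder width (illustrative)

# Toy parameters alpha: an encoder for x and a head on the concatenated vector.
W_enc = rng.normal(size=(h, D)) / np.sqrt(D)
W_head = rng.normal(size=(1, h + d)) / np.sqrt(h + d)

def U_alpha(z, x):
    """Energy on (z, x): encode x into a vector, concatenate with z, map to a scalar."""
    v = np.tanh(W_enc @ x)             # encode the image into a vector
    u = np.concatenate([v, z])         # concatenate with the latent vector
    return float(W_head @ np.tanh(u))  # scalar energy

def unnormalized_density(z, x):
    return np.exp(-U_alpha(z, x))      # pi_alpha up to the intractable Z(alpha)
```

Note that only U_α is ever evaluated during learning; Z(α) is never computed, which is why the joint training scheme below is arranged so that Z(α) cancels.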
3.6 Inference and synthesis sampling
Let π_α(x) = ∫ π_α(z, x) dz be the marginal density of the latent EBM. The maximum likelihood learning of α is based on min_α KL(q_data(x) || π_α(x)), because minimizing KL(q_data(x) || π_α(x)) is equivalent to maximizing the log-likelihood E_{q_data}[log π_α(x)]. The learning gradient is

(∂/∂α) E_{q_data}[log π_α(x)] = −E_{q_data(x) π_α(z|x)}[(∂/∂α) U_α(z, x)] + E_{π_α(z, x)}[(∂/∂α) U_α(z, x)].   (8)

This is a well-known result in latent EBM [1, 25]. On the right-hand side of the above equation, the two expectations can be approximated by Monte Carlo sampling. For each observed image, sampling from π_α(z|x) is to infer z from x. We call this inference sampling. In the literature, it is called the positive phase [1, 17]. It is also called clamped sampling, where x is an observed image and is fixed. Sampling from π_α(z, x) is to generate synthesized examples from the model. We call this synthesis sampling. In the literature, it is called the negative phase. It is also called unclamped sampling, where x is also generated from the model.
4 Joint training
4.1 Objective function of joint training
We have the following three joint densities:
(1) The generator density p_θ(z, x) = p(z) p_θ(x|z).
(2) The data density q_φ(z, x) = q_data(x) q_φ(z|x).
(3) The latent EBM density π_α(z, x) ∝ exp(−U_α(z, x)).
We propose to learn the generator model parametrized by θ, the inference model parametrized by φ, and the latent EBM parametrized by α by the following divergence triangle:

max_α min_θ min_φ D(α, θ, φ),   (9)
D = KL(q_φ || p_θ) + KL(p_θ || π_α) − KL(q_φ || π_α),   (10)

where all the densities q_φ, p_θ, and π_α are joint densities of (z, x).
The above objective function is in a symmetric and anti-symmetric form. The anti-symmetry is caused by the negative sign in front of KL(q_φ || π_α) and the maximization over α.
4.2 Learning of latent EBM
For learning the latent EBM, max_α D(α, θ, φ) is equivalent to minimizing

KL(q_φ(z, x) || π_α(z, x)) − KL(p_θ(z, x) || π_α(z, x)).   (11)

In the above minimization, π_α seeks to get close to the data density q_φ and get away from p_θ. Thus π_α serves as a critic of p_θ by comparing p_θ against q_φ. Because of the undirected form of π_α, it can be more flexible than the directed p_θ in approximating q_φ.
The gradient of the above minimization is

(∂/∂α) [KL(q_φ || π_α) − KL(p_θ || π_α)] = E_{q_φ(z, x)}[(∂/∂α) U_α(z, x)] − E_{p_θ(z, x)}[(∂/∂α) U_α(z, x)],   (12)

where the intractable log Z(α) terms cancel. Comparing Eqn. (12) to Eqn. (8), we replace π_α(z|x) by q_φ(z|x) in the inference sampling of the positive phase, and we replace π_α(z, x) by p_θ(z, x) in the synthesis sampling of the negative phase. Both q_φ(z|x) and p_θ(z, x) can be sampled directly. Thus the joint training enables the latent EBM to avoid MCMC in both inference sampling and synthesis sampling. In other words, the inference model serves as an approximate inference sampler for the latent EBM, and the generator network serves as an approximate synthesis sampler for the latent EBM.
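Because the normalizing constant is common to both KL-divergences in (11) and cancels, the EBM objective reduces to a difference of two energy averages: one over inference samples and one over synthesis samples. A hedged sketch of the resulting Monte Carlo loss, where U_alpha and the sample lists are placeholders to be supplied by the surrounding training loop:

```python
import numpy as np

def ebm_loss(U_alpha, inference_samples, synthesis_samples):
    """Monte Carlo estimate of E_{q_phi}[U] - E_{p_theta}[U], i.e. Eqn. (11)
    up to the log Z(alpha) terms, which cancel in the difference.
    inference_samples: list of (z, x) with x from data and z ~ q_phi(z|x).
    synthesis_samples: list of (z, x) with z ~ p(z) and x = g_theta(z)."""
    pos = np.mean([U_alpha(z, x) for z, x in inference_samples])  # positive phase
    neg = np.mean([U_alpha(z, x) for z, x in synthesis_samples])  # negative phase
    # Descending this loss lowers the energy of data pairs and raises
    # the energy of synthesized pairs.
    return pos - neg
```

Differentiating this loss with respect to the EBM parameters recovers the two-term gradient of (12), with the model samplers replacing MCMC.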
4.3 Learning of generator network
For learning the generator network, min_θ D(α, θ, φ) is equivalent to minimizing

KL(q_φ(z, x) || p_θ(z, x)) + KL(p_θ(z, x) || π_α(z, x)),   (13)

where the gradient can be computed as

(∂/∂θ) [KL(q_φ || p_θ) + KL(p_θ || π_α)] = −E_{q_φ(z, x)}[(∂/∂θ) log p_θ(x|z)] + (∂/∂θ) E_{p(z), ε}[U_α(z, g_θ(z) + ε)],   (14)

where the entropy of p_θ(z, x) drops out because it does not depend on θ.
In KL(q_φ || p_θ), p_θ appears on the right-hand side of the KL-divergence. Minimizing this KL-divergence with respect to θ requires p_θ to cover all the major modes of q_φ [10, 31]. If p_θ is not flexible enough, it will strain itself to cover all the major modes, and as a result it will become overdispersed relative to q_φ. This may be the reason that VAE tends to suffer in synthesis quality.
However, in the second term, KL(p_θ || π_α), p_θ appears on the left-hand side of the KL-divergence, and π_α, which seeks to get close to q_φ and get away from p_θ in its dynamics, serves as a surrogate for the data density q_φ and a target for p_θ. Because p_θ appears on the left-hand side of the KL-divergence, it has mode-chasing behavior; i.e., it may chase some major modes of π_α (the surrogate of q_φ), while it does not need to cover all the modes. Also note that in KL(p_θ || π_α), we do not need to know Z(α), because it is a constant as far as θ is concerned.
Combining the above two KL-divergences, we approximately minimize a symmetrized version of the KL-divergence (assuming π_α is close to q_φ). This corrects the overdispersion of VAE and improves its synthesis quality.
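The mode-covering versus mode-chasing asymmetry of the two KL directions can be checked numerically: fitting a single Gaussian to a bimodal target by forward KL yields an overdispersed fit, while reverse KL locks onto one mode. A self-contained illustration; the 1-D target, the grid search, and the numerical integration are all made up for this sketch:

```python
import numpy as np

xs = np.linspace(-8, 8, 4001)
dx = xs[1] - xs[0]

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Bimodal "data" density: mixture of two well-separated Gaussians.
target = 0.5 * gauss(xs, -2.0, 0.5) + 0.5 * gauss(xs, 2.0, 0.5)

def kl(p, q):
    mask = p > 1e-12  # ignore regions where the first density is negligible
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx

best = {}
for name, div in [("forward", lambda q: kl(target, q)),   # KL(data || model): mode-covering
                  ("reverse", lambda q: kl(q, target))]:  # KL(model || data): mode-chasing
    scores = [(div(gauss(xs, mu, s)), mu, s)
              for mu in np.arange(-3, 3.1, 0.25)
              for s in np.arange(0.25, 4.0, 0.25)]
    best[name] = min(scores)[1:]  # (mu, s) minimizing the divergence
```

The forward-KL fit centers between the modes with a large standard deviation (overdispersed), while the reverse-KL fit sits on one mode with a small standard deviation.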
4.4 An adversarial chasing game
The dynamics of π_α is that it seeks to get close to the data density q_φ and get away from p_θ. But the dynamics of p_θ is that it seeks to get close to π_α (and at the same time also get close to the data density q_φ). This defines an adversarial chasing game: π_α runs toward q_φ and away from p_θ, while p_θ chases π_α. As a result, π_α leads p_θ toward q_φ. p_θ and π_α form an actor-critic pair.
4.5 Learning of inference model
The learning of the inference model can be based on min_φ D(α, θ, φ), which is equivalent to minimizing

KL(q_φ(z, x) || p_θ(z, x)) − KL(q_φ(z, x) || π_α(z, x)).   (15)

q_φ seeks to be close to p_θ relative to π_α. That is, q_φ seeks to be the inference model for p_θ. Meanwhile, p_θ seeks to be close to π_α. This is also a chasing game: p_θ leads q_φ to be close to π_α.
The gradient of φ in Eqn. (15) can be readily computed as

(∂/∂φ) [KL(q_φ || p_θ) − KL(q_φ || π_α)] = (∂/∂φ) E_{q_data(x)} E_{q_φ(z|x)}[−U_α(z, x) − log p_θ(z, x)],   (16)

where the entropy terms of q_φ and the constant log Z(α) cancel, and the expectation over q_φ(z|x) is computed via the reparameterization z = μ_φ(x) + σ_φ(x) ⊙ e, e ∼ N(0, I_d).
We may also learn φ by minimizing

KL(q_φ(z, x) || p_θ(z, x)) + KL(q_φ(z, x) || π_α(z, x)),   (17)

where we let q_φ be close to both p_θ and π_α in variational approximation.
4.6 Algorithm
The latent EBM, generator, and inference model can be jointly trained using stochastic gradient descent based on Eqns. (12), (14), and (16). In practice, we use the sample average to approximate the expectation.
Synthesis and inference sampling. The expectations for gradient computation are based on the generator density p_θ(z, x) and the data density q_φ(z, x). To approximate the expectation under the generator density, we perform synthesis sampling through z̃_i ∼ p(z) to get samples (z̃_i, x̃_i). To approximate the expectation under the data density, we perform inference sampling through x_i ∼ q_data(x) to get samples (z_i, x_i). Both p_θ(x|z) and q_φ(z|x) are assumed to be Gaussian; therefore we have

x̃_i = g_θ(z̃_i), z̃_i ∼ N(0, I_d);   z_i = μ_φ(x_i) + σ_φ(x_i) ⊙ e_i, e_i ∼ N(0, I_d),   (18)

where g_θ is the top-down deconvolutional network of the generator model (see Sec. 3.1), and μ_φ and σ_φ are the bottom-up convolutional networks for the mean vector and the diagonal variance-covariance matrix of the inference model (see Sec. 3.2). We follow the common practice [11] of obtaining x̃_i directly from the generator network, i.e., x̃_i = g_θ(z̃_i), without adding the residual noise. Note that the synthesis sample x̃_i and the inference sample z_i are functions of the generator parameter θ and the inference parameter φ respectively, which ensures gradient backpropagation.
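In code, the two sampling steps are plain differentiable function evaluations, which is what allows gradients to flow back through the samples. A numpy sketch with toy stand-ins for g_θ, μ_φ, and σ_φ; the real models are convolutional networks and these linear maps are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 8, 64  # latent dim, image dim (illustrative)

# Toy stand-ins for the networks.
A = rng.normal(size=(D, d))
g_theta = lambda z: np.tanh(A @ z)    # generator mean g_theta(z)
mu_phi = lambda x: 0.1 * (A.T @ x)    # inference mean mu_phi(x)
sig_phi = lambda x: np.full(d, 0.5)   # inference std sigma_phi(x) (diagonal)

def synthesis_samples(n):
    """x_tilde_i = g_theta(z_tilde_i), with z_tilde_i ~ N(0, I_d)."""
    zs = rng.normal(size=(n, d))
    return zs, np.stack([g_theta(z) for z in zs])

def inference_samples(xs):
    """z_i = mu_phi(x_i) + sig_phi(x_i) * e_i, e_i ~ N(0, I_d): reparameterization."""
    es = rng.normal(size=(len(xs), d))
    return np.stack([mu_phi(x) + sig_phi(x) * e for x, e in zip(xs, es)])
```

Because the randomness enters only through z̃_i and e_i, the samples are deterministic functions of the parameters given the noise, which is exactly what Eqn. (18) expresses.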
Model learning. The obtained synthesis samples and inference samples can be used to approximate the expectations in model learning. Specifically, for latent EBM learning, the gradient in Eqn. (12) can be approximated by

(1/n) Σ_{i=1}^{n} (∂/∂α) U_α(z_i, x_i) − (1/n) Σ_{i=1}^{n} (∂/∂α) U_α(z̃_i, x̃_i).   (19)

For the inference model, the gradient in Eqn. (16) can be approximated by

(1/n) Σ_{i=1}^{n} (∂/∂φ) [−U_α(z_i, x_i) − log p_θ(z_i, x_i)].   (20)

For the generator model, the gradient in Eqn. (14) can be approximated by

−(1/n) Σ_{i=1}^{n} (∂/∂θ) log p_θ(x_i|z_i) + (1/n) Σ_{i=1}^{n} (∂/∂θ) U_α(z̃_i, x̃_i).   (21)

Notice that the gradients in Eqns. (20) and (21) flow through the inference samples z_i and the synthesis samples x̃_i, and can be easily backpropagated using Eqn. (18). The detailed training procedure is presented in Algorithm 1.
5 Experiments
In this section, we evaluate the proposed model on four tasks: image generation, test image reconstruction, out-of-distribution generalization, and anomaly detection. The learning of the inference model is based on Eqn. (15), and we also test the alternative way of training the inference model using Eqn. (17) for generation and reconstruction. We mainly consider four datasets: CIFAR-10, CelebA [27], the Large-scale Scene Understanding (LSUN) dataset [39], and MNIST. We describe the datasets in more detail in the relevant subsections below. All training images are resized and scaled with no further preprocessing. All network parameters are initialized with a zero-mean Gaussian with standard deviation 0.02 and optimized using Adam [21]. We adopt a deconvolutional network structure similar to that in [34] for the generator model and the "mirror" convolutional structure for the inference model. Both structures involve batch normalization [19]. For the joint energy model, we use multiple layers of convolution to transform the observation x and the latent factor z, then concatenate them at the higher layer, which shares similarity with [26]. Spectral normalization is used as suggested in [29]. We refer to our implementation (https://hthth0801.github.io/jointLearning/) for details.
5.1 Image generation
In this experiment, we evaluate the visual quality of the generated samples. A well-learned generator network should generate samples that are realistic and share visual similarities with the training data. We mainly consider three commonly used datasets, CIFAR-10, CelebA [27], and LSUN [39], for generation and reconstruction evaluation. CIFAR-10 contains 60,000 color images of size 32 × 32, of which 50,000 are for training and 10,000 are for testing. For the CelebA dataset, we resize the images and randomly select 10,000 of them, of which 9,000 are for training and 1,000 are for testing. For the LSUN dataset, we select the bedroom category, which contains roughly 3 million images, and resize them. We separate 10,000 images for testing and use the rest for training. The qualitative results are shown in Figure 1.
We further evaluate our model quantitatively using the Frechet Inception Distance (FID) [28] in Table 1. We compare with baseline models including VAE [22], DCGAN [34], WGAN [2], CoopNet [38], ALICE [26], SVAE [4], and SNGAN [29]. The FID scores are taken from the relevant papers; for missing evaluations, we re-evaluate using the released code, or re-implement using similar structures and the optimal parameters indicated in the papers. From Table 1, our model achieves competitive generation performance compared to the listed baseline models. Further, compared to [14], which reports a 7.23 Inception Score (IS) on CIFAR-10 and 31.9 FID on CelebA, our model obtains 7.17 IS and 24.7 FID respectively. The joint training greatly improves the synthesis quality compared to VAE alone. Note that SNGAN [29] obtains better generation on CIFAR-10, which has a relatively small resolution, while on the other datasets, which have relatively high resolution and diverse patterns, our model obtains more favorable results and more stable training.
Model  VAE  DCGAN  WGAN  CoopNet  ALICE  SVAE  SNGAN  Ours(+)  Ours 
CIFAR-10  109.5  37.7  40.2  33.6  48.6  43.5  29.3  33.3  30.1
CelebA  99.09  38.4  36.4  56.6  46.1  40.7  50.4  29.5  24.7
LSUN  175.2  70.4  67.7  35.4  72.0  –  67.8  31.4  27.3
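For reference, FID is the Frechet distance between two Gaussians fitted to Inception features of real and generated images. A simplified sketch, assuming diagonal covariances so that the matrix square root reduces to an elementwise one (the real metric uses full feature covariances):

```python
import numpy as np

def fid_diagonal(mu1, var1, mu2, var2):
    """Frechet distance between N(mu1, diag(var1)) and N(mu2, diag(var2)):
    ||mu1 - mu2||^2 + sum(var1 + var2 - 2*sqrt(var1*var2)).
    Diagonal covariances are an assumption of this sketch; full FID
    computes Tr(S1 + S2 - 2*(S1 S2)^{1/2}) with full covariances."""
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(mean_term + cov_term)
```

Lower is better: identical feature statistics give a distance of zero, and the scores in Table 1 grow as the generated distribution drifts from the data distribution.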
5.2 Testing image reconstruction
In this experiment, we evaluate the accuracy of the learned inference model by reconstructing testing images. A well-trained inference model should not only help to learn the latent EBM but also learn to match the true posterior of the generator model. In practice, a well-learned inference model can thus be balanced to render both realistic generation, as shown in the previous section, and faithful reconstruction of testing images.
We evaluate the model on the held-out testing sets of CIFAR-10, CelebA, and LSUN-bedroom. Specifically, we use the 10,000 testing images of CIFAR-10, and 1,000 and 10,000 held-out testing images for CelebA and LSUN-bedroom respectively. The testing images and the corresponding reconstructions are shown in Figure 2. We also quantitatively compare with baseline models (ALI [9], ALICE [26], SVAE [4]) using the Root Mean Square Error (RMSE). Note that for this experiment, we only compare with relevant baseline models that contain a joint discriminator on the image and the latent vector and achieve decent generation quality. Besides, we do not consider the GANs and their variants, because they have no inference model and are thus infeasible for image reconstruction. Table 2 shows the results. VAE is naturally integrated into our probabilistic model for joint learning; however, using VAE alone can be extremely ineffective on complex datasets. Our model instead achieves both high generation quality and accurate reconstruction.
Model  CIFAR-10  CelebA  LSUN-bedroom

VAE  0.192  0.197  0.164 
ALI  0.558  0.720  –
ALICE  0.185  0.214  0.181 
SVAE  0.258  0.209  –
Ours(+)  0.184  0.208  0.169 
Ours  0.177  0.190  0.169 
5.3 Outofdistribution generalization
In this experiment, we evaluate out-of-distribution (OOD) detection using the learned latent EBM. If the energy model is well-learned, then a training image, together with its inferred latent factor, should form a local energy minimum. Unseen images from distributions other than the training one should be assigned relatively higher energies. This is closely related to the model of associative memory, as observed by Hopfield [18].
We learn the proposed model on CIFAR-10 training images, then use the learned energy function to separate the CIFAR-10 testing images from other OOD images based on the energy value (i.e., the negative log-likelihood up to an additive constant). We use the area under the ROC curve (AUROC) as our OOD metric, following [15], and we use Textures [5], uniform noise, SVHN [32], and CelebA images as OOD distributions (Figure 4 provides CIFAR-10 test images and examples of OOD images). We compare with ALICE [26], SVAE [4], and the recent EBM [8] as our baseline models. The CIFAR-10 training of ALICE and SVAE follows their optimal networks and hyper-parameters, and the scores for EBM are taken directly from [8]. Table 3 shows the AUROC scores. We also provide histograms of relative likelihoods for the OOD distributions in Figure 3, which further verify that images from OOD distributions are assigned relatively low log-likelihood (i.e., high energy) compared to the training distribution. Our latent EBM learns to assign low energy to the training distribution and high energy to data from OOD distributions.
Model  SVHN  Uniform  Texture  CelebA
EBM  0.63  1.0  0.48   
ALICE  0.29  0.0  0.40  0.48 
SVAE  0.42  0.29  0.5  0.52 
Ours  0.68  1.0  0.56  0.56 
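The AUROC reported above can be computed directly from the two sets of energies by ranking: it equals the probability that a randomly drawn OOD example receives a higher energy than a randomly drawn in-distribution example. A self-contained sketch; the function name and inputs are hypothetical:

```python
import numpy as np

def auroc(id_energies, ood_energies):
    """AUROC for detecting OOD by high energy: the fraction of
    (OOD, in-distribution) pairs where the OOD example has strictly
    higher energy, with ties counted as half."""
    id_e = np.asarray(id_energies, dtype=float)
    ood_e = np.asarray(ood_energies, dtype=float)
    wins = (ood_e[:, None] > id_e[None, :]).sum()
    ties = (ood_e[:, None] == id_e[None, :]).sum()
    return (wins + 0.5 * ties) / (len(id_e) * len(ood_e))
```

A score of 1.0 means the energies separate the two sets perfectly (as for uniform noise in Table 3), while 0.5 means the energies carry no information about the distinction.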
5.4 Anomaly detection
In this experiment, we take a closer and more general view of the learned latent EBM, with application to anomaly detection. Unsupervised anomaly detection is one of the most important problems in machine learning and offers great potential in many areas, including cyber-security, medical analysis, and surveillance. It is similar to the out-of-distribution detection discussed above, but can be more challenging in practice, because the anomalous data may come from a distribution that is similar to, rather than entirely apart from, the training distribution. We evaluate our model on the MNIST benchmark dataset.
MNIST. The dataset contains 60,000 grayscale images of size 28 × 28 depicting handwritten digits. Following the same experimental setting as [24, 40], we make each digit class an anomaly and consider the remaining nine digits as normal examples. Our model is trained with only normal data and tested with both normal and anomalous data. We use the energy function as our decision function and compare with the BiGAN-based anomaly detection model [40], the recent MEG [24], and the VAE model, using the area under the precision-recall curve (AUPRC) as in [40]. Table 4 shows the results.
Holdout  VAE  MEG  BiGAN  Ours 

1  0.063  0.281 ± 0.035  0.287 ± 0.023  0.297 ± 0.033
4  0.337  0.401 ± 0.061  0.443 ± 0.029  0.723 ± 0.042
5  0.325  0.402 ± 0.062  0.514 ± 0.029  0.676 ± 0.041
7  0.148  0.290 ± 0.040  0.347 ± 0.017  0.490 ± 0.041
9  0.104  0.342 ± 0.034  0.307 ± 0.028  0.383 ± 0.025
Results for our model are averaged over the last 10 epochs to account for variance.
6 Conclusion
This paper proposes a joint training method to learn both the VAE and the latent EBM simultaneously, where the VAE serves as an actor and the latent EBM serves as a critic. The objective function is a simple and compact divergence triangle that involves three KL-divergences between three joint densities of the latent vector and the image. This objective function integrates variational learning and adversarial learning. Our experiments show that the joint training improves the synthesis quality of the VAE, and that it learns a reasonable energy function that is capable of anomaly detection.
Learning a well-formed energy landscape remains a challenging problem, and our experience suggests that the learned energy function can be sensitive to the setting of hyper-parameters within the training algorithm. In future work, we shall further improve the learning of the energy function. We shall also explore the joint training of models with multiple layers of latent variables in the styles of the Helmholtz machine and the Boltzmann machine.
Acknowledgment
The work is supported by DARPA XAI project N66001-17-2-4029; ARO project W911NF-18-1-0296; ONR MURI project N00014-16-1-2007; and Extreme Science and Engineering Discovery Environment (XSEDE) grant ASC170063.
References
 [1] (1985) A learning algorithm for Boltzmann machines. Cognitive Science 9 (1), pp. 147–169. Cited by: §2, §3.5, §3.6.
 [2] (2017) Wasserstein generative adversarial networks. In International Conference on Machine Learning, pp. 214–223. Cited by: §2, §5.1.
 [3] (2017) Stochastic variational video prediction. arXiv preprint arXiv:1710.11252. Cited by: §1.

 [4] (2018) Symmetric variational autoencoder and connections to adversarial learning. In International Conference on Artificial Intelligence and Statistics, pp. 661–669. Cited by: §2, §5.1, §5.2, §5.3.
 [5] (2014) Describing textures in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3606–3613. Cited by: §5.3.
 [6] (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pp. 1–38. Cited by: §1.
 [7] (2016) Adversarial feature learning. arXiv preprint arXiv:1605.09782. Cited by: §2.
 [8] (2019) Implicit generation and generalization in energy-based models. arXiv preprint arXiv:1903.08689. Cited by: §5.3.
 [9] (2016) Adversarially learned inference. arXiv preprint arXiv:1606.00704. Cited by: §2, §5.2.
 [10] (2016) Deep learning. MIT press. Cited by: §4.3.
 [11] (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §1, §2, §4.6.

 [12] (2015) DRAW: a recurrent neural network for image generation. arXiv preprint arXiv:1502.04623. Cited by: §1, §2.
 [13] (2017) Alternating back-propagation for generator network. In AAAI, Vol. 3, pp. 13. Cited by: §2.
 [14] (2019) Divergence triangle for joint training of generator model, energy-based model, and inferential model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8670–8679. Cited by: §2, §3.4, §5.1.
 [15] (2016) A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136. Cited by: §5.3.
 [16] (1995) The "wake-sleep" algorithm for unsupervised neural networks. Science 268 (5214), pp. 1158–1161. Cited by: §2, §3.4.
 [17] (2006) A fast learning algorithm for deep belief nets. Neural computation 18 (7), pp. 1527–1554. Cited by: §2, §3.5, §3.6.
 [18] (1982) Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79 (8), pp. 2554–2558. Cited by: §5.3.
 [19] (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. Cited by: §5.
 [20] (2016) Deep directed generative models with energy-based probability estimation. arXiv preprint arXiv:1606.03439. Cited by: §2.
 [21] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §5.
 [22] (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114. Cited by: §1, §2, §3.4, §5.1.
 [23] (2014) Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, pp. 3581–3589. Cited by: §1.
 [24] (2019) Maximum entropy generators for energy-based models. arXiv preprint arXiv:1901.08508. Cited by: §2, §5.4, Table 4.
 [25] (2006) A tutorial on energy-based learning. Predicting Structured Data 1 (0). Cited by: §3.6.
 [26] (2017) ALICE: towards understanding adversarial learning for joint distribution matching. In Advances in Neural Information Processing Systems, pp. 5495–5503. Cited by: §2, §5.1, §5.2, §5.3, §5.
 [27] (2015) Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV). Cited by: §5.1, §5.
 [28] (2017) Are GANs created equal? A large-scale study. arXiv preprint arXiv:1711.10337. Cited by: §5.1.
 [29] (2018) Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957. Cited by: §5.1, §5.
 [30] (2016) Learning in implicit generative models. arXiv preprint arXiv:1610.03483. Cited by: §3.1.
 [31] (2012) Machine learning: a probabilistic perspective. MIT press. Cited by: §4.3.
 [32] (2011) Reading digits in natural images with unsupervised feature learning. Cited by: §5.3.
 [33] (2019) On the anatomy of MCMC-based maximum likelihood learning of energy-based models. arXiv preprint arXiv:1903.12370. Cited by: §2.
 [34] (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. Cited by: §1, §2, §5.1, §5.

 [35] (2014) Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082. Cited by: §1, §2.
 [36] (2010) Efficient learning of deep Boltzmann machines. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 693–700. Cited by: §2.
 [37] (2016) Ladder variational autoencoders. In Advances in neural information processing systems, pp. 3738–3746. Cited by: §1, §2.
 [38] (2016) Cooperative training of descriptor and generator networks. arXiv preprint arXiv:1609.09408. Cited by: §2, §5.1.
 [39] (2015) LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365. Cited by: §5.1, §5.
 [40] (2018) Efficient GAN-based anomaly detection. arXiv preprint arXiv:1802.06222. Cited by: §5.4.
 [41] (2016) Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126. Cited by: §2.