1 Introduction
AutoEncoders (AEs) have demonstrated the capability of learning a subspace for dimensionality reduction [22]. However, AEs are not generative: mathematically speaking, there exist zones where a latent code is not in the support of the latent representation of the input [36]. To address this problem, Variational AutoEncoders (VAEs) [16] enforce the latent-space data distribution to be close to a simple distribution, e.g., a unit Gaussian, so that a randomly sampled latent code lies in the support of the latent representation of the given dataset. In practice, VAEs minimize the KL-divergence between the latent-space data distribution and a unit Gaussian [17]. For a similar purpose, instead of measuring the KL-divergence, the Adversarial AutoEncoder (AAE) [25] adopts adversarial training in the latent space to enforce the latent-space data distribution to be a unit Gaussian [3]. The Wasserstein AutoEncoder (WAE) [36] with a GAN penalty (WAE-GAN) generalizes the AAE by allowing the reconstruction cost to be any cost function; when the cost function is quadratic, WAE-GAN reproduces the AAE.
Existing VAE-based methods [26, 18], the AAE and the WAE transform the distribution of the latent representations of data into a simple distribution such as an isotropic Gaussian. However, many real-world datasets, such as facial images, lie on lower-dimensional manifolds that are quite different from a simple isotropic Gaussian [9, 1]. Forcing the important manifold structure of the data into a simple distribution, e.g., a unit Gaussian, obliterates the structure of the latent-space data distribution, causes the posterior collapse problem [37] and leads to generating unrealistic images.
In contrast to VAE-based methods, in which the distribution of the latent representation of data is transformed into an isotropic Gaussian implicitly using the KL-divergence, Generative Adversarial Nets (GANs) [7] can, in theory, transform any given distribution into another distribution. GAN-based methods are receiving increased attention in various applications [34, 29, 39, 21, 11, 10] as well as in methodology improvements [32, 2, 8, 28, 27]. A GAN model consists of a generator and a discriminator (or critic): the generator synthesizes data from a simple distribution to fool the discriminator, while the discriminator tries to distinguish between real and synthetic data. However, training a GAN means solving a min-max optimization problem [31], which is difficult and unstable in practice [26]. Furthermore, the balance between the discriminator and the generator is difficult to control [3].
In this paper, we address the posterior collapse problem in VAEs by learning a transformation from a simple distribution (e.g., a unit Gaussian) to the latent-space data distribution, which preserves the structure of the data in the latent space. Traditionally, a GAN with two networks (a generator and a discriminator) is used to achieve this goal, but the adversarial training process is not well understood theoretically. In contrast, our proposed method computes the Optimal Transport (OT) map directly, based on the theoretical analysis presented in [20]. We only train a discriminator in the latent space, and the OT map, which transforms a sample from a simple distribution to the latent-space data distribution, is explicitly derived from the discriminator output. As we use an AutoEncoder (AE) to find the latent space and use OT to perform the distribution transformation in that latent space, we name our method AE-OT. OT has a well-understood theory, so we can transform a distribution with a transparent theoretical model. Figure 1 shows the workflow of AE-OT.
In contrast to the Wasserstein AutoEncoder (WAE), in which the Wasserstein Distance (WD) is defined in the original image space, in AE-OT the WD is defined in the latent space. Both the objective and the training protocol are different: WAE requires solving a difficult min-max optimization problem, while AE-OT only needs to solve a minimization problem in the latent space, so training AE-OT is much easier than training WAE. It is worth noting that several OT methods have been proposed in the literature. [35, 6] compute discrete OT from the primal OT formulation and are thus not suitable for generative models. [30] learns the OT mapping using kernel methods, whose parameters are difficult to choose. [33] needs to train three networks to learn the transport mapping, which is harder than AE-OT.
The contributions of this paper are the following:
1) We propose a novel generative AutoEncoder. Different from existing VAE-based methods, which map the latent-space data distribution to a simple distribution, the proposed generative model transforms a simple distribution into the latent-space data distribution. In this way, the intrinsic data structure in the latent space is preserved, and the posterior collapse problem [37] is thus addressed.
2) We show that if the cost function is quadratic, then once the optimal discriminator is obtained, the generator can be explicitly derived from the discriminator output. AE-OT can achieve the same goal as a GAN in the latent space, but it only needs to solve a minimization problem in the latent space rather than the difficult min-max optimization problem of GANs.
3) Experiments on an eight-Gaussian toy dataset demonstrate that the computed OT can model multi-cluster distributions. Qualitative and quantitative results on the MNIST [19] dataset show that AE-OT performs better than VAE and WAE. Images generated on the CelebA [24] dataset show that AE-OT generates much better facial images than VAE and WAE.
In the remainder of this paper, we first review optimal transport, then introduce our proposed generative model, followed by experimental results.
2 Optimal Transport
Since our method is based on Optimal Transport (OT), we first introduce the background of OT. OT is a powerful tool for handling transformations between probability measures; for details, one may refer to [20, 38].
2.1 Optimal Transport Theory
In this subsection, we introduce basic concepts and theorems of classic optimal transport theory, focusing on the Kantorovich potential and Brenier's approach to solving the Monge problem defined in Problem 1.
Let $X$, $Y$ be two subsets of the $d$-dimensional Euclidean space $\mathbb{R}^d$, and let $\mu$ and $\nu$ be probability measures defined on $X$ and $Y$, respectively. We also require that they have equal total measure, i.e., $\mu(X) = \nu(Y)$.
Definition 1 (Measure-Preserving Map)
A map $T: X \to Y$ is measure preserving if for any measurable set $B \subset Y$, the set $T^{-1}(B)$ is measurable and
$$\mu(T^{-1}(B)) = \nu(B). \qquad (1)$$
The problem of optimal transport arises from minimizing the total cost of moving all particles from one place (the source) to another (the target), given the cost of moving each unit of mass. Formally, we define a cost function $c: X \times Y \to \mathbb{R}_{\geq 0}$, such that $c(x, y) \geq 0$ for every $x \in X$ and $y \in Y$. The total transport cost of moving particles with density $\mu$ at $x$ to density $\nu$ at $y = T(x)$ is defined to be
$$\mathcal{C}(T) = \int_X c(x, T(x))\, d\mu(x). \qquad (2)$$
Eq. (2) can also be stated under the constraint $T_\# \mu = \nu$, where $T_\# \mu$ is the push-forward measure induced by $T$. Now we can define Monge's problem of optimal transport.
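The push-forward condition above can be checked mechanically in the discrete case. The following sketch (with illustrative points, weights and map) computes $T_\# \mu$ by summing, for each target point, the mass of its preimage:

```python
# Hypothetical example: a discrete source measure mu on points
# {0, 1, 2, 3} and a map T. T is measure preserving w.r.t. (mu, nu)
# iff nu(B) = mu(T^{-1}(B)) for every subset B of the target.

def pushforward(points, weights, T):
    """Return the push-forward measure T_# mu as a dict target -> mass."""
    out = {}
    for p, w in zip(points, weights):
        q = T(p)
        out[q] = out.get(q, 0.0) + w
    return out

points = [0, 1, 2, 3]
mu = [0.25, 0.25, 0.25, 0.25]
T = lambda p: p % 2          # maps 0, 2 -> 0 and 1, 3 -> 1

nu = pushforward(points, mu, T)
# Each target point receives the total mass of its preimage:
# nu = {0: 0.5, 1: 0.5}
```

Here T is measure preserving between the uniform measure on {0, 1, 2, 3} and the uniform measure on {0, 1}, since each target point has a preimage of total mass 0.5.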
Problem 1 (Monge's Optimal Transport [4])
Given a transport cost function $c: X \times Y \to \mathbb{R}_{\geq 0}$, find a measure-preserving map $T: X \to Y$ that minimizes the total transport cost:
$$(MP) \quad \min_{T_\# \mu = \nu} \int_X c(x, T(x))\, d\mu(x). \qquad (3)$$
Note that Monge's problem requires a map $T$, whose existence is not guaranteed for an arbitrary pair of measures $\mu$ and $\nu$ — for example, when $\mu$ is a Dirac measure and $\nu$ is an arbitrary measure absolutely continuous with respect to the Lebesgue measure in $\mathbb{R}^d$. This means that, given such $\mu$ and $\nu$, the feasible solution set of Monge's problem might be empty. Therefore, Kantorovich introduced a relaxation of Monge's problem [13] in the 1940s: instead of a transport map, he considered the set of transport plans. Mathematically, given a source measure $\mu$ on $X$ and a target measure $\nu$ on $Y$, a transport plan is a joint distribution $\pi$ on $X \times Y$ such that
$$\pi(A \times Y) = \mu(A), \quad \pi(X \times B) = \nu(B), \qquad (4)$$
for all measurable $A \subset X$ and $B \subset Y$.
Intuitively, if $\pi(A \times B) > 0$ for some sets $A \subset X$ and $B \subset Y$, mass is moved from $A$ to $B$ in plan $\pi$. The total cost of a transport plan $\pi$ is
$$\mathcal{C}(\pi) = \int_{X \times Y} c(x, y)\, d\pi(x, y). \qquad (5)$$
The Monge-Kantorovich problem is then defined as
$$(MK) \quad \min_{\pi} \int_{X \times Y} c(x, y)\, d\pi(x, y) \qquad (6)$$
over all transport plans $\pi$ satisfying $(\pi_X)_\# \pi = \mu$ and $(\pi_Y)_\# \pi = \nu$, where $\pi_X$ and $\pi_Y$ are the projection maps from $X \times Y$ onto $X$ and $Y$, respectively.
To solve the $(MK)$ problem, we consider its dual form, known as the Kantorovich problem [38],
$$(DP) \quad \max_{\varphi, \psi} \int_X \varphi(x)\, d\mu(x) + \int_Y \psi(y)\, d\nu(y), \quad \text{s.t. } \varphi(x) + \psi(y) \leq c(x, y), \qquad (7)$$
where $\varphi$ and $\psi$ are real functions defined on $X$ and $Y$, respectively. One of the key observations for solving $(DP)$ is based on the concept of the $c$-transform.
Definition 2 ($c$-transform)
Given a real function $\varphi: X \to \mathbb{R}$, the $c$-transform of $\varphi$ is defined by
$$\varphi^c(y) = \inf_{x \in X} \left( c(x, y) - \varphi(x) \right).$$
It can be shown [38] that by replacing $\psi$ with $\varphi^c$ in (7), the value of the energy to be maximized does not decrease. Therefore, we can search only for the optimal $\varphi$ to solve the Kantorovich problem:
$$\max_{\varphi} \int_X \varphi(x)\, d\mu(x) + \int_Y \varphi^c(y)\, d\nu(y). \qquad (8)$$
Here $\varphi$ is called the Kantorovich potential.
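On finite point sets, the infimum in the $c$-transform becomes a minimum, so it can be computed directly. The following sketch (with illustrative point sets, quadratic cost and an arbitrary potential) computes $\varphi^c$ and checks that the pair $(\varphi, \varphi^c)$ is dual-feasible:

```python
import numpy as np

# Minimal sketch of the c-transform on finite point sets.
# X, Y, the cost c and the potential phi are all illustrative choices.
X = np.linspace(-1.0, 1.0, 5)               # source points
Y = np.linspace(-1.0, 1.0, 5)               # target points
phi = 0.5 * X**2                            # an arbitrary potential on X
cost = 0.5 * (X[:, None] - Y[None, :])**2   # quadratic cost c(x, y)

# phi^c(y) = min_x [ c(x, y) - phi(x) ]  (inf becomes min on a finite set)
phi_c = np.min(cost - phi[:, None], axis=0)

# The pair (phi, phi^c) is always dual-feasible:
# phi(x) + phi^c(y) <= c(x, y) for all x, y.
feasible = np.all(phi[:, None] + phi_c[None, :] <= cost + 1e-12)
```

Dual feasibility holds by construction: for every $y$, $\varphi^c(y)$ is the tightest value such that $\varphi(x) + \varphi^c(y) \leq c(x, y)$ for all $x$.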
Formula (8) can be rewritten in simpler forms if we narrow down the choice of cost function. For example, if we choose the cost function $c(x, y) = |x - y|$, i.e., the $L^1$ distance, then the $c$-transform satisfies $\varphi^c = -\varphi$, given that $\varphi$ is 1-Lipschitz [38], and (8) becomes
$$\max_{\|\varphi\|_{L} \leq 1} \int_X \varphi(x)\, d\mu(x) - \int_Y \varphi(y)\, d\nu(y). \qquad (9)$$
However, the Kantorovich potential is usually parameterized by a Deep Neural Network (DNN), and restricting a DNN to be 1-Lipschitz is very difficult [2]. Our method therefore adopts the $L^2$ distance as the cost function, because in this case, once the optimal Kantorovich potential is computed, the transport map can be written down in explicit form [20]. Since the Brenier potential yields the transport map (corresponding to the generator in GANs), we introduce Brenier's theorem [5] below. Suppose $u$ is a twice continuously differentiable convex function; its gradient map is defined as $x \mapsto \nabla u(x)$.
Theorem 1 (Brenier [5])
Suppose $X$ and $Y$ are the Euclidean space $\mathbb{R}^d$ and the transport cost is the quadratic Euclidean distance $c(x, y) = \frac{1}{2}\|x - y\|^2$. If $\mu$ is absolutely continuous and $\mu$ and $\nu$ have finite second-order moments, then there exists a convex function $u: X \to \mathbb{R}$, called the Brenier potential, whose gradient map $\nabla u$ gives the solution to Monge's problem. Furthermore, the optimal transport map is unique.
In GANs, the discriminator serves as the Kantorovich potential and the generator serves as the Brenier potential. The following theorem [20] establishes the relationship between the Brenier potential and the Kantorovich potential:
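In one dimension with Gaussian marginals, the Brenier map is known in closed form, which makes the theorem easy to illustrate. The sketch below (with illustrative parameters) pushes samples of $N(m_1, s_1^2)$ through the monotone affine map $T(x) = m_2 + \frac{s_2}{s_1}(x - m_1)$, which is the gradient of the convex Brenier potential $u(x) = m_2 x + \frac{s_2}{2 s_1}(x - m_1)^2$, and the result is (approximately) $N(m_2, s_2^2)$:

```python
import numpy as np

# Illustration of Brenier's theorem in 1D with Gaussian marginals
# (m1, s1, m2, s2 are arbitrary choices for this sketch).
m1, s1, m2, s2 = 0.0, 1.0, 3.0, 0.5
T = lambda x: m2 + (s2 / s1) * (x - m1)   # gradient of a convex potential

rng = np.random.default_rng(0)
x = rng.normal(m1, s1, size=200_000)
y = T(x)                                   # push samples through the map

# y.mean() is close to m2 = 3.0 and y.std() is close to s2 = 0.5
```

Since $T$ is increasing, it is the gradient of a convex function, so by uniqueness in Theorem 1 it is the optimal map for the quadratic cost.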
Theorem 2
Given $\mu$ and $\nu$ on a compact domain $\Omega \subset \mathbb{R}^d$, there exists an optimal transport plan $\pi$ for the cost $c(x, y) = h(x - y)$ with $h$ strictly convex. It is unique and of the form $(id, T)_\# \mu$, provided $\mu$ is absolutely continuous and $\partial \Omega$ is negligible. Moreover, there exists a Kantorovich potential $\varphi$, and $T$ can be represented as
$$T(x) = x - (\nabla h)^{-1}(\nabla \varphi(x)).$$
In particular, if we choose $h(z) = \frac{1}{2}\|z\|^2$, then
$$T(x) = x - \nabla \varphi(x). \qquad (10)$$
The above theorem shows that when the discriminator is optimal, the generator can be directly computed from the discriminator. We will employ this important property in our generative model.
3 The Proposed Generative Model
Existing VAE-based methods try to transform the latent data distribution into a simple distribution. However, this changes the intrinsic data structure in the latent space and ultimately leads to the posterior collapse problem [37]. In our method, we preserve the intrinsic data structure in the latent space and instead transform a simple distribution into the latent data distribution.
Different from existing methods, we first pretrain an AE; we then fix the AE and compute the OT map that transforms a simple distribution into the latent-space data distribution, so that the intrinsic data structure is preserved. We name our method AE-OT. AE-OT solves the optimal transport problem with the $L^2$ distance.
AE-OT has a strong geometric motivation. A manifold can be arbitrarily complex, so learning to transform a simple distribution into a distribution supported on a complex manifold can be very difficult. However, every neighborhood of a manifold admits a one-to-one mapping to a Euclidean space, which is typically of low dimension; one can get an intuition from Figure 1 of [20]. Thus, we propose to apply an AE to find the low-dimensional space and then perform the distribution transformation in the latent space, which is much easier than transforming distributions in the original input space.
3.1 Learning the Optimal Transport in Latent Space
In this part, we propose to transform the latent-space distribution by training only a discriminator. According to Brenier's theorem (Theorem 1), the transport map, i.e., the generator, can be explicitly expressed using the gradient of the optimal discriminator. Training a discriminator is a minimization problem, which is much easier than solving a min-max optimization problem. Since in real applications we are given an empirical distribution, we describe the discrete case of optimal transport below.
3.1.1 Discrete Case of Optimal Transport
A generative model can be defined once we find the optimal transport map from a simple distribution to the distribution of real data. To carry out the computational tasks, we introduce the basic ideas when the probability measures $\mu$ and $\nu$ are defined on discrete sets.
Let $I$ and $J$ denote two disjoint sets of indices. Suppose $X = \{x_i\}_{i \in I}$ and $Y = \{y_j\}_{j \in J}$ are discrete subsets of $\mathbb{R}^d$, and the cost function is defined by $c_{ij} = c(x_i, y_j)$, where the $c_{ij}$ are positive real numbers. Suppose the source measure is $\mu = \sum_{i \in I} \mu_i \delta_{x_i}$ and the target measure is $\nu = \sum_{j \in J} \nu_j \delta_{y_j}$. A transport plan is a real-valued function taking values $\pi_{ij} \geq 0$ on $I \times J$ such that $\sum_{j \in J} \pi_{ij} = \mu_i$ and $\sum_{i \in I} \pi_{ij} = \nu_j$. We rewrite the total transport cost (5) as
$$\mathcal{C}(\pi) = \sum_{i \in I} \sum_{j \in J} c_{ij} \pi_{ij}. \qquad (11)$$
The Monge-Kantorovich problem can then be rewritten as
$$(MK) \quad \min_{\pi} \sum_{i \in I} \sum_{j \in J} c_{ij} \pi_{ij}, \quad \text{s.t. } \pi_{ij} \geq 0,\ \sum_{j \in J} \pi_{ij} = \mu_i,\ \sum_{i \in I} \pi_{ij} = \nu_j. \qquad (12)$$
The Monge-Kantorovich dual problem, which in this case is simply the dual form of (12), is
$$(DP) \quad \max_{\varphi, \psi} \sum_{i \in I} \varphi_i \mu_i + \sum_{j \in J} \psi_j \nu_j, \quad \text{s.t. } \varphi_i + \psi_j \leq c_{ij}. \qquad (13)$$
Both $(MK)$ and $(DP)$ are linear programming problems, and thus can be solved with generic linear programming methods [14], such as the dual simplex method. Next, we introduce how the optimal transport is learned in the latent space.
3.1.2 Training Phase
Denote by $\{x_k\}_{k \in K}$ all the data in the given dataset, where $K$ is the index set of training samples, and denote by $f$ and $g$ the encoder and decoder pretrained on all the given images. First, we use the encoder to obtain the latent codes $y_k = f(x_k)$ of the data. Then, we learn an OT map in the latent space from a simple distribution, a unit Gaussian for example, to the empirical distribution formed by $\{y_k\}$. Since the OT map can be computed from the Kantorovich potential, we learn the Kantorovich potential in the two-step manner proposed in [23]. In the first step, we solve Eq. (13) as the following linear programming problem:
$$\max_{\varphi, \psi} \sum_{j} \varphi_j \mu_j + \sum_{i} \psi_i \nu_i, \quad \text{s.t. } \varphi_j + \psi_i \leq c_{ji}, \qquad (14)$$
where the $z_j$ are sampled from a simple distribution and the $y_i$ are latent codes of real data, with $n$ being the batch size; $c_{ji} = \frac{1}{2}\|z_j - y_i\|^2$, $\mu_j = 1/n$, and $\nu_i = 1/n$.
In the second step, we employ a deep neural network $\varphi_\xi$, parameterized by $\xi$, to regress the dual variables provided by (14):
$$\min_{\xi} \sum_{j} \left( \varphi_\xi(z_j) - \varphi_j \right)^2. \qquad (15)$$
However, when the dimensionality of the latent space is high, the data become sparse in the high-dimensional space, and Eq. (10) does not necessarily hold for a Kantorovich potential computed from such sparse data. Since Eq. (10) is the first-order optimality condition of the $c$-transform, each $z_j$ is mapped to the latent code attaining the minimum. Empirically, we can approximate the mapping from $z_j$ to the latent codes by the following ordering function:
$$\sigma(j) = \arg\min_{i} \left( \tfrac{1}{2}\|z_j - y_i\|^2 - \psi_i \right). \qquad (16)$$
In the latent space, we compute the OT matching from the random samples $z_j$ to the latent representations $y_i$ of the given data using the ordering function
$$\sigma(j) = \arg\min_{i \in K} \left( \tfrac{1}{2}\|z_j - y_i\|^2 - \psi_i \right). \qquad (17)$$
Instead of optimizing Eq. (15), we optimize the following regularized regression problem:
$$\min_{\xi} \sum_{j} \left( \varphi_\xi(z_j) - \varphi_j \right)^2 + \lambda \sum_{j} \left\| \nabla \varphi_\xi(z_j) - \left( z_j - y_{\sigma(j)} \right) \right\|^2, \qquad (18)$$
where $\lambda$ is a trade-off parameter. The second term regularizes the behavior of $\varphi_\xi$'s gradient with respect to its input. The total loss ensures that $\varphi_\xi$ approximates the Kantorovich potential well in both its values and its first-order derivatives.
3.1.3 Generating Phase
After solving (18), $\varphi_\xi$ is, intuitively, a smooth approximation of the Kantorovich potential. The OT map for a noise sample $z$ is then given by Eq. (10):
$$\hat{z} = z - \nabla \varphi_\xi(z). \qquad (19)$$
After obtaining the mapped latent code $\hat{z}$, we employ the pretrained decoder to produce a generated image.
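The generating step $\hat{z} = z - \nabla \varphi(z)$ can be illustrated with an analytic stand-in for the trained potential. With the hypothetical choice $\varphi(z) = \tfrac{1}{2}\|z - b\|^2$, the gradient is $z - b$, so every noise sample is transported to $b$; a trained network would instead spread the mapped codes over the latent data distribution:

```python
import numpy as np

# Sketch of the generating step, z_hat = z - grad(phi)(z), using an
# analytic stand-in for the trained Kantorovich potential.
b = np.array([1.0, -2.0])      # illustrative constant

def grad_phi(z):
    # gradient of phi(z) = 0.5 * ||z - b||^2
    return z - b

z = np.array([[0.0, 0.0], [3.0, 5.0]])   # two noise samples
z_hat = z - grad_phi(z)                   # Eq. (19); both rows map to b
# z_hat would then be fed to the pretrained decoder to produce images.
```

In AE-OT the gradient is taken through the trained discriminator network $\varphi_\xi$, e.g. by automatic differentiation.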
Figure 1 shows the workflow of AE-OT. The Encoder and Decoder are trained first and then fixed. In the training phase, the latent representation of a sample is treated as real data for the discriminator, while a noise vector sampled from a simple distribution is treated as fake data. We train the discriminator using the two-step computation described in this section. In the generating phase, we sample a noise vector, map it with the learned OT and feed the result into the Decoder to produce a generated sample. Algorithm 1 and Algorithm 2 present the training and generating phases of AE-OT, respectively.
4 Experiments


To demonstrate the effectiveness of the proposed method, we 1) evaluate AE-OT on an eight-Gaussian toy dataset, and 2) compare AE-OT against VAE [16] and WAE [36] for generative modeling on the MNIST [19] and CelebA [24] datasets. AE-OT has a strong geometric motivation and is therefore well suited to data with strong manifold structure, such as handwritten digits and human faces. For WAE, we use the GAN penalty proposed in [36]. In the AE-OT implementation, the trade-off parameter is set separately for the eight-Gaussian toy dataset and for the MNIST and CelebA datasets. We use Adam [15] for optimization with the same momentum parameters in all experiments; the learning rate for the eight-Gaussian experiment differs from that used for the MNIST and CelebA experiments.






Network Architecture: For the AutoEncoder in AE-OT, we use the vanilla AutoEncoder [12]. For the discriminator of AE-OT on the eight-Gaussian dataset, we use the network architecture of WGAN-GP [8]: a four-layer Multi-Layer Perceptron (MLP) with 512 nodes in each hidden layer and 1 node in the output layer, using ReLU as the nonlinear activation in all layers. For the discriminator of AE-OT on the MNIST and CelebA datasets, we use a six-layer MLP with 512 nodes in each hidden layer and 1 node in the output layer, using LeakyReLU (slope 0.2) as the nonlinear activation in all layers.
4.1 Results on the Eight-Gaussian Toy Dataset
Dataset Description: Following previous work [8], we generate a toy dataset consisting of eight 2D Gaussian distributions as the real data distribution. The eight Gaussians are centered at eight evenly spaced points on a circle, each with a small standard deviation. From each Gaussian we sample 32 data points, so the dataset consists of 256 2D points representing the real data distribution. The synthetic data are sampled from a single Gaussian distribution centered at the origin, from which we draw 256 synthetic data points.
On this 2D toy dataset, we do not use an AE for dimensionality reduction. We train a discriminator using our proposed OT and use the discriminator to map the synthetic data points to a set of new data points; we then evaluate whether the transformed points form a distribution similar to the real data distribution. The surface values of the discriminator are plotted in Figure 2 (a) and (b) after 5 and 10,000 discriminator iterations, respectively. The blue points in Figure 2 (a) and (b) are synthetic data points, the green points are real data points, and the red points are computed from the synthetic data points using the proposed OT. In Figure 2 (a), the generated empirical distribution (red points) is still very close to the synthetic distribution, since the discriminator has been updated only 5 times. After 10,000 discriminator iterations, the red points form a distribution analogous to the real data distribution. This shows that even without a generator network, the source distribution can be transformed into the target distribution using only the discriminator. Also, we do not use the regularization term in this experiment: computing the Kantorovich potential only on the synthetic data points already gives an accurate transport map from the synthetic data to the real data. From this experiment, we can see that the proposed method can model multi-cluster distributions, which is considered a difficult task for generative models.
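A toy dataset of this kind can be sketched as follows; the circle radius and the standard deviation are illustrative values, since the exact constants are not fixed here:

```python
import numpy as np

# Sketch of the eight-Gaussian toy dataset: 32 samples from each of
# eight Gaussians whose centers lie evenly spaced on a circle.
# radius and sigma are illustrative constants (assumptions).
rng = np.random.default_rng(0)
radius, sigma, n_per_mode = 2.0, 0.05, 32
angles = np.arange(8) * (2 * np.pi / 8)
centers = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)

real = np.concatenate(
    [c + sigma * rng.normal(size=(n_per_mode, 2)) for c in centers]
)  # 256 x 2 real data points

synthetic = rng.normal(size=(256, 2))  # noise from a centered Gaussian
```

The `real` array plays the role of the green points and `synthetic` the blue points in Figure 2; the learned OT then maps `synthetic` onto the eight modes.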
4.2 Results on the MNIST dataset
Table 1: Inception Scores on the MNIST dataset.

Method | Inception Score
VAE    | 1.76 ± 0.11
WAE    | 1.64 ± 0.09
AE-OT  | 1.78 ± 0.13
In this subsection, we compare our method against VAE and WAE on the MNIST dataset. The images are resized before training. For VAE, we use the vanilla AutoEncoder architecture. The dimensionality of the latent space is set to 10 for all methods. For VAE, WAE and the AE in AE-OT, we train for 1000 epochs on the MNIST dataset; for the OT in AE-OT, we perform 200K iterations.
For all methods, we randomly sample 64 noise vectors and feed them into the respective generative models. The images generated by the different methods are shown in Figure 3. From Figure 3 (a), we can see that the digits generated by VAE are dimmer than those of AE-OT, and some digits are unclear or incomplete. Figure 3 (b) shows digits generated by WAE; many images are blurry, mainly because WAE trains three networks simultaneously and thus cannot learn a good reconstruction network. Digits produced by AE-OT are visually better than those of VAE and WAE. In addition, we list the Inception Scores (IS) [32] of the different methods in Table 1, which shows that the highest IS is achieved by AE-OT. This experiment shows that AE-OT outperforms VAE and WAE, mainly because AE-OT preserves the manifold structure of the data in the latent space.
4.3 Results on the CelebA dataset
We compare our method against VAE and WAE on the CelebA dataset. The images are cropped to 128×128. The dimensionality of the latent space is set to 100 for all methods. We train VAE and the AE in AE-OT for 300 epochs on this dataset; for the OT in AE-OT, we perform 200K iterations. Our WAE experiment on this dataset crashed during training; the WAE results shown are generated just before the crash.
We sample 64 random noise vectors in the latent space to generate images for all methods. The generated faces are shown in Figure 4. Many faces generated by VAE are distorted: VAE tends to mix the face with the background, because it forces the latent distribution of the face manifold to be a unit Gaussian, which distorts the intrinsic representation of the face manifold. The images generated by WAE, shown in Figure 4 (b), are very blurry and many faces are incomplete; WAE crashes because it jointly trains the encoder, the decoder and the discriminator in the latent space, and the competition among the three networks makes training unstable. In contrast, the faces produced by AE-OT (Figure 4 (c)) are visually much better than those of VAE and WAE: the generated faces are clear, complete and recognizable, and the faces and the background are well separated, thanks to the property of AE-OT that preserves the structure of the latent representation of the face manifold.
To verify that the latent space learned by AE-OT is smooth, we randomly sample two noise vectors and map them into the latent space using the OT map. We then interpolate between the mapped vectors and forward the interpolations through the decoder to generate faces. Figure 5 shows the interpolated faces generated by AE-OT: the interpolation preserves clear facial structure and a smooth transition of facial appearance, which shows that the face manifold in the latent space is well preserved by AE-OT.
5 Conclusion
In this work, we propose a novel generative model named AE-OT. Instead of forcing the distribution of the data in the latent space toward a simple distribution, as in VAE, which leads to the posterior collapse problem, AE-OT transforms a simple distribution into the data distribution in the latent space. In this way, the manifold structure of the data in the latent space is preserved, and the posterior collapse problem is addressed. Moreover, to avoid the min-max optimization problem of GANs, we derive the OT map, and hence the generated data, from a well-trained discriminator. AE-OT computes the optimal transport map directly in the latent space with an explicit theoretical interpretation. Results on the eight-Gaussian dataset show that the learned OT can handle multi-cluster distributions. Qualitative and quantitative results on the MNIST dataset show that AE-OT generates better digits than VAE and WAE, and results on the CelebA dataset show that AE-OT generates much better faces than VAE and WAE while preserving the manifold structure in the latent space.
In future work, we will try to solve the optimal transport more accurately.
References
 [1] O. Arandjelović. Unfolding a face: from singular to manifold. In ACCV, pages 203–213. Springer, 2009.
 [2] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In ICML, pages 214–223, 2017.
 [3] D. Berthelot, T. Schumm, and L. Metz. Began: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717, 2017.
 [4] N. Bonnotte. From knothe’s rearrangement to brenier’s optimal transport map. SIAM Journal on Mathematical Analysis, 45(1):64–87, 2013.
 [5] Y. Brenier. Polar factorization and monotone rearrangement of vectorvalued functions. Communications on pure and applied mathematics, 44(4):375–417, 1991.
 [6] S. Ferradans, N. Papadakis, G. Peyré, and J.F. Aujol. Regularized discrete optimal transport. SIAM Journal on Imaging Sciences, 7(3):1853–1882, 2014.
 [7] I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
 [8] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of wasserstein gans. In NIPS, pages 5769–5779, 2017.
 [9] X. He, S. Yan, Y. Hu, P. Niyogi, and H.J. Zhang. Face recognition using laplacianfaces. IEEE transactions on pattern analysis and machine intelligence, 27(3):328–340, 2005.
 [10] L. Hou, A. Agarwal, D. Samaras, T. M. Kurc, R. R. Gupta, and J. H. Saltz. Unsupervised histopathology image synthesis. arXiv preprint arXiv:1712.05021, 2017.
 [11] S. Iizuka, E. SimoSerra, and H. Ishikawa. Globally and locally consistent image completion. ACM Transactions on Graphics (TOG), 36(4):107, 2017.

 [12] U. Jain, Z. Zhang, and A. G. Schwing. Creativity: Generating diverse questions using variational autoencoders. In CVPR, pages 5415–5424, 2017.
 [13] L. V. Kantorovich. On a problem of monge. Journal of Mathematical Sciences, 133(4):1383–1383, 2006.
 [14] N. Karmarkar. A new polynomialtime algorithm for linear programming. In STOC, pages 302–311. ACM, 1984.
 [15] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [16] D. P. Kingma and M. Welling. Autoencoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
 [17] S. Kullback and R. A. Leibler. On information and sufficiency. The annals of mathematical statistics, 22(1):79–86, 1951.
 [18] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.
 [19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradientbased learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 [20] N. Lei, K. Su, L. Cui, S.T. Yau, and D. X. Gu. A geometric view of optimal transportation and generative model. arXiv preprint arXiv:1710.05488, 2017.
 [21] Y. Li, S. Liu, J. Yang, and M.H. Yang. Generative face completion. In CVPR, volume 1, page 6, 2017.
 [22] C.Y. Liou, J.C. Huang, and W.C. Yang. Modeling word perception using the elman network. Neurocomputing, 71(1618):3150–3157, 2008.
 [23] H. Liu, G. Xianfeng, and D. Samaras. A twostep computation of the exact gan wasserstein distance. In ICML, pages 3165–3174, 2018.
 [24] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In ICCV, 2015.
 [25] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
 [26] L. Mescheder, S. Nowozin, and A. Geiger. Adversarial variational bayes: Unifying variational autoencoders and generative adversarial networks. arXiv preprint arXiv:1701.04722, 2017.
 [27] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
 [28] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. In ICLR, 2018.
 [29] V. Nguyen, T. F. Y. Vicente, M. Zhao, M. Hoai, and D. Samaras. Shadow detection with conditional generative adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, pages 4520–4528. IEEE, 2017.

 [30] M. Perrot, N. Courty, R. Flamary, and A. Habrard. Mapping estimation for discrete optimal transport. In NIPS, pages 4197–4205, 2016.
 [31] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016.
 [32] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training gans. In NIPS, pages 2234–2242, 2016.
 [33] V. Seguy, B. B. Damodaran, R. Flamary, N. Courty, A. Rolet, and M. Blondel. Largescale optimal transport and mapping estimation. arXiv preprint arXiv:1711.02283, 2017.

 [34] Z. Shu, E. Yumer, S. Hadap, K. Sunkavalli, E. Shechtman, and D. Samaras. Neural face editing with intrinsic image disentangling. In CVPR, pages 5444–5453. IEEE, 2017.
 [35] F. Stavropoulou and J. Müller. Parametrization of random vectors in polynomial chaos expansions via optimal transportation. SIAM Journal on Scientific Computing, 37(6):A2535–A2557, 2015.
 [36] I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schoelkopf. Wasserstein autoencoders. In ICLR, 2018.
 [37] A. van den Oord, O. Vinyals, et al. Neural discrete representation learning. In NIPS, pages 6306–6315, 2017.
 [38] C. Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
 [39] J.Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired imagetoimage translation using cycleconsistent adversarial networks. arXiv preprint arXiv:1703.10593, 2017.