Deep generative models such as the Generative Adversarial Network (GAN) (Goodfellow et al. 2014) and the Variational Auto-Encoder (VAE) (Kingma and Welling 2013) show remarkable performance in learning the manifold structure of data. Among other advantages, knowledge of the manifold structure allows interpolating data points along the manifold, which gives semantically plausible interpolation results.
Toward geometrically grounded interpolations on the manifold, some recent works (Shao, Kumar, and Fletcher 2017; Chen et al. 2018) proposed a geodesic interpolation method. Unlike previous methods, which naïvely followed linear paths in the latent space, geodesic interpolation methods find the shortest paths on the manifold based on the metric induced from the ambient data space. This shares its philosophy with classic algorithms like Isomap (Tenenbaum, De Silva, and Langford 2000), and is well grounded in Riemannian geometry. However, when applied to real-world data, it often gives poor results containing unrealistic interpolation points.
We argue that the problem stems from a topological mismatch between the model and the data. Deep generative models define a simply-connected manifold because the latent space is endowed with a probability distribution of simply-connected support, and the smooth generative mapping (smoothness of the generative mapping is a necessary condition for the geodesic interpolation methods) preserves the latent-space topology in the data space. On the other hand, real-world datasets generally contain disconnected regions or holes, which make them non-simply-connected. As this topological difference is a fundamental matter, deep generative models inevitably include the holes in the dataset as valid regions in their manifold representation. In consequence, the geodesic curve often finds a shortcut that passes through the holes, and the interpolation points become unrealistic.
To tackle this problem, we propose to add a density regularization term to the path-energy loss. The idea comes from an observation that even though deep generative models cannot be matched to the dataset topology, maximum-likelihood training forces the models to assign low probability densities where the holes exist. Therefore, if a low-density-penalizing term is added to the path-energy loss, minimizing the loss will find a path close to the geodesic while going around the holes indicated by low densities. Our experiments demonstrate that this method is effective and yields semantically better interpolation results.
Geometric Interpolations with Density Regularizer
We denote the latent space as $\mathcal{Z}$, the input data space as $\mathcal{X}$, and the coordinates of each space as $z$ and $x$, respectively. We do not use bold letters for vectors or matrices, but brackets are used when explicit indication of a matrix is needed.
Gradient Descent Method to Compute the Geodesic Interpolation
In this section, we introduce a path-energy loss function and corresponding gradient descent method to compute the geodesic curve between two data points.
As the manifold of a deep generative model is defined via the latent space and the generative mapping, curves on the manifold can be parameterized in the latent space. The length is measured using the Riemannian metric $g_{ij}$, which is induced from the Euclidean metric of the ambient data space $\mathcal{X}$. The mathematical relationship can be written as $g_{ij} = \delta_{ab}\,\frac{\partial x^a}{\partial z^i}\,\frac{\partial x^b}{\partial z^j}$ using Einstein notation and the Kronecker delta $\delta_{ab}$; details can be found in (Chen et al. 2018; Shao, Kumar, and Fletcher 2017).
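As a concrete illustration, the pulled-back metric can be computed numerically from the Jacobian of the generative mapping. The sketch below uses a hypothetical toy mapping `g` (not one of the trained models in this paper) and a finite-difference Jacobian:

```python
import numpy as np

# Toy smooth generative mapping g: R^2 (latent) -> R^3 (data).
# A stand-in for a trained decoder; any differentiable map works.
def g(z):
    return np.array([z[0], z[1], z[0] ** 2 + z[1] ** 2])

def jacobian(f, z, eps=1e-6):
    """Finite-difference Jacobian J[a, i] = d f^a / d z^i."""
    f0 = f(z)
    J = np.zeros((f0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

def pullback_metric(f, z):
    """Induced Riemannian metric G = J^T J at latent point z."""
    J = jacobian(f, z)
    return J.T @ J

# At z = (1, 0) the analytic Jacobian is [[1,0],[0,1],[2,0]],
# so the induced metric is approximately [[5,0],[0,1]].
G = pullback_metric(g, np.array([1.0, 0.0]))
```

For a trained decoder, the Jacobian would instead be obtained by automatic differentiation; the finite-difference version here only illustrates the relation $G = J^\top J$.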
Now, let $\gamma(t)$, $t \in [0, 1]$, be a curve in the latent space with boundary conditions $\gamma(0) = z_0$ and $\gamma(1) = z_1$. The length of $\gamma$ under the metric $g$ is given as $L[\gamma] = \int_0^1 \sqrt{g_{ij}\dot\gamma^i\dot\gamma^j}\,dt$. Instead of directly minimizing the length $L$, minimizing the energy $E[\gamma] = \frac{1}{2}\int_0^1 g_{ij}\dot\gamma^i\dot\gamma^j\,dt$ gives the same result in an algebraically simpler form, by noting that $L[\gamma]^2 \le 2E[\gamma]$ with equality if and only if the speed $\|\dot\gamma\|$ is constant (this can be derived using the Cauchy–Schwarz inequality). Thus the formulation to obtain the geodesic reduces to $\gamma^* = \operatorname{arg\,min}_\gamma E[\gamma]$.
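The inequality between length and energy follows from a one-line application of the Cauchy–Schwarz inequality to the constant function $1$ and the speed:

```latex
L[\gamma]^2
  = \left( \int_0^1 1 \cdot \sqrt{g_{ij}\dot\gamma^i\dot\gamma^j} \, dt \right)^2
  \le \int_0^1 1^2 \, dt \;\cdot\; \int_0^1 g_{ij}\dot\gamma^i\dot\gamma^j \, dt
  = 2E[\gamma],
```

with equality exactly when the speed $\sqrt{g_{ij}\dot\gamma^i\dot\gamma^j}$ is constant in $t$, i.e., when the curve is parameterized proportionally to arc length.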
To minimize the energy $E$, we compute the gradient with respect to a small change to the curve ($\gamma \to \gamma + \epsilon\eta$ with $\eta(0) = \eta(1) = 0$). This is given as the Gâteaux derivative $dE[\gamma;\eta] = \lim_{\epsilon\to 0} \frac{E[\gamma+\epsilon\eta] - E[\gamma]}{\epsilon}$, which is the functional analogue of the directional derivative.
Note that the Euler–Lagrange equation $\ddot\gamma^k + \Gamma^k_{ij}\dot\gamma^i\dot\gamma^j = 0$ is derived in the process of finding the curve where $dE[\gamma;\eta] = 0$ regardless of $\eta$, but it is intractable to solve since the metric described by the neural network is too complicated. We instead seek the steepest gradient change, which is derived using the Cauchy–Schwarz inequality: $\eta^k \propto \ddot\gamma^k + \Gamma^k_{ij}\dot\gamma^i\dot\gamma^j$, where $\Gamma^k_{ij}$ is the Christoffel symbol. Note that $\Gamma^k_{ij}$ is purely related to the local change of the coordinate bases, so it becomes zero in input-space coordinates. Also, the metric reduces to the Kronecker delta in input-space coordinates, so the equation simplifies to $\eta^a \propto \ddot{x}^a$, where $\eta^a$ is a small change to the curve described in the input-space coordinates.
To summarize, for a curve $\gamma$ in $\mathcal{Z}$, we transform every point on the curve to the input space using the generative mapping, from which the corresponding curve $x(t)$ in $\mathcal{X}$ is obtained. Then, we compute the acceleration $\ddot{x}$ to get the steepest gradient-descent change toward minimizing the path length of the curve. By iteratively updating $x \leftarrow x + \alpha\ddot{x}$ ($\alpha$ is a learning rate), the curve will converge to the shortest path. Finally, pulling $x(t)$ back to the latent space gives the geodesic in the latent space. A more detailed algorithm can be found in (Shao, Kumar, and Fletcher 2017); the extended version with our proposed regularizer is described in the next section.
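The curve-shortening behavior of the update $x \leftarrow x + \alpha\ddot{x}$ can be checked in a flat toy setting, where the discrete acceleration is the second finite difference and a fixed-endpoint curve should flow to a straight segment. The polyline, step size, and iteration count below are illustrative choices, not values from the paper:

```python
import numpy as np

# Initial polyline in R^2 from (0, 0) to (1, 0), bent upward.
T = 21
t = np.linspace(0.0, 1.0, T)
curve = np.stack([t, np.sin(np.pi * t)], axis=1)  # endpoints stay at y = 0

alpha = 0.3  # learning rate; explicit scheme is stable for alpha < 0.5
for _ in range(2000):
    # Discrete acceleration x_{t-1} - 2 x_t + x_{t+1} at interior points.
    accel = curve[:-2] - 2.0 * curve[1:-1] + curve[2:]
    curve[1:-1] += alpha * accel  # steepest-descent step on path energy

# The curve flows toward the straight segment joining the endpoints,
# so the maximum deviation from y = 0 shrinks toward zero.
max_dev = np.abs(curve[:, 1]).max()
```

In the flat Euclidean case this iteration is exactly discrete curve-shortening (heat) flow, whose fixed points are straight lines; on the induced metric, the same update is applied to the curve mapped into the input space.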
As introduced earlier, the topological difference between the deep generative model and the dataset brings a serious problem to interpolation. For example, consider a circular dataset and the geodesic between two points in the dataset (Fig. 3). It is natural to think that the geodesic should follow a circular path between the two points and pass through dense regions. However, as the hole in the center is a valid region for deep generative models, the geodesic becomes a linear path and passes through the hole (red dotted curve in Fig. 3 (d)).
In order to prevent the geodesic from finding this kind of shortcut, we propose to add a density regularizer term to the energy functional: $E_{\mathrm{reg}}[\gamma] = E[\gamma] - \lambda \int_0^1 \log p(\gamma(t))\,dt$, where $\lambda$ is a regularization weight. The regularization term is in fact the integrated negative log-probability-density along the curve, so it keeps the curve from passing through low-density regions.
In Fig. 3 (c) it can be seen that the density is low in the central area of the latent space, which is mapped to the hole of the input data space. Introducing the regularizer results in an interpolation that avoids these low-density regions while keeping its path length as short as possible (green dash-dotted curve in Fig. 3 (d)).
Numerical Algorithm for Geodesic Interpolation
As explained in the previous section, the geodesic curve can be computed by iteratively updating every point on the curve toward the steepest variational direction $\eta$. In practice, we approximate the curve with a set of points $\{z_t\}_{t=0}^{T}$. As for $\eta$, we first compute the acceleration $\ddot{x}_t$ using the finite-difference method, then pull the vectors back to the latent space using the pseudo-inverse of the Jacobian matrix $\partial x / \partial z$. The contribution from the gradient of the density regularizer is then added to the obtained $\eta$.
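A minimal sketch of one such iteration, assuming a hypothetical toy decoder `g` and a standard-normal latent log-density `log_p` as stand-ins for a trained model; `update_curve`, `alpha`, and `lam` are illustrative names and values, not the paper's implementation:

```python
import numpy as np

def g(z):  # toy generative mapping R^2 -> R^3 (stand-in for a decoder)
    return np.array([z[0], z[1], z[0] ** 2 + z[1] ** 2])

def log_p(z):  # stand-in latent log-density (standard normal, up to a constant)
    return -0.5 * np.sum(z ** 2)

def jacobian(f, z, eps=1e-6):
    """Finite-difference Jacobian of the decoder at z."""
    f0 = f(z)
    J = np.zeros((f0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

def grad(f, z, eps=1e-6):
    """Central-difference gradient of a scalar function."""
    return np.array([(f(z + eps * e) - f(z - eps * e)) / (2 * eps)
                     for e in np.eye(z.size)])

def update_curve(zs, alpha=0.1, lam=0.05):
    """One gradient step on the density-regularized path energy."""
    xs = np.stack([g(z) for z in zs])           # map curve to input space
    accel = xs[:-2] - 2.0 * xs[1:-1] + xs[2:]   # finite-difference acceleration
    zs_new = zs.copy()
    for t in range(1, len(zs) - 1):             # endpoints stay fixed
        J = jacobian(g, zs[t])
        dz = np.linalg.pinv(J) @ accel[t - 1]   # pull back to latent space
        dz += lam * grad(log_p, zs[t])          # density-regularizer gradient
        zs_new[t] = zs[t] + alpha * dz
    return zs_new

# Discretize an initial straight-line curve in the latent space and step once.
zs = np.stack([np.linspace(-1.0, 1.0, 15), np.zeros(15)], axis=1)
zs = update_curve(zs)
```

Running `update_curve` repeatedly plays the role of the gradient-descent loop: the pseudo-inverse pulls the input-space acceleration back to the latent space, and the density term pushes interior points toward high-probability regions while the endpoints remain fixed.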
Datasets, Models and Hyperparameters
In the experiments, we use two different image datasets: 1) MNIST (LeCun et al. 1998), and 2) Yale (Belhumeur, Hespanha, and Kriegman 1997). The Yale dataset originally consists of face images of 28 individuals, but only the images of a single person (under 64 different illumination conditions) are used to examine the illumination manifold.
We use three dataset–model pairs: 1) MNIST with GAN (Goodfellow et al. 2014), 2) MNIST with VAE (Kingma and Welling 2013), and 3) Yale with DCGAN (Deep Convolutional GAN) (Radford, Metz, and Chintala 2015).
When running the numerical algorithm to compute the geodesic, the number of approximation points is fixed at 35. We observed that too many approximation points sometimes break up the smoothness of the curve entirely due to approximation noise. The regularizer weight $\lambda$ is determined empirically, ranging from 0.003 to 0.1 depending on the setting.
Interpolations and the Effect of the Regularizer
In Fig. 3, we show the results of the geodesic interpolation method, with and without the density regularizer, as well as the results of a naïve method that follows a straight line in $\mathcal{Z}$. Hereafter, we denote these methods as GeodReg, Geod, and StraightZ, respectively.
In the MNIST results, it can be clearly seen that GeodReg outperforms the others in terms of quality. GeodReg shows smooth transitions along realistic samples, whereas Geod and StraightZ contain some mottled, non-digit-like images. Geod and StraightZ do not consider the probability density and thus cannot avoid low-density regions, as shown in the log-density plot at the bottom. Geod in particular passes through much lower-density regions, but in exchange has the shortest length of the three. Note that these results agree with the toy example shown in Fig. 3 (d).
Lambertian Image Manifold
According to (Basri and Jacobs 2003), object images taken under various illumination conditions form a 9-D linear manifold if the object has Lambertian reflectance. The Yale face dataset matches the Lambertian condition well in practice (Lee, Ho, and Kriegman 2005), so we examine whether our interpolation methods find linear paths on the Yale dataset.
In the left pane of Fig. 4, it can be seen that the length obtained from Geod is the shortest and thus closest to the true length. In the right pane, it can be seen that the path found by Geod is the straightest in the input space, as it has the smallest cosine dissimilarity to the true linear path. GeodReg deviates from the true path while seeking high density, but remains closer to it than StraightZ, as seen in the right pane.
In this paper, we have proposed a density-regularized geometric interpolation method for deep generative models trained on non-simply-connected datasets. The density regularizer plays a crucial role in compensating for the topological difference between the model and the data. Our method has shown superior interpolation quality over previous linear and geodesic interpolation methods.

This work was partly supported by the Korea government (2015-0-00310, 2017-0-01772, 2018-0-00622, KEIT-10060086).
- [Basri and Jacobs2003] Basri, R., and Jacobs, D. W. 2003. Lambertian reflectance and linear subspaces. IEEE transactions on pattern analysis and machine intelligence 25(2):218–233.
- [Belhumeur, Hespanha, and Kriegman1997] Belhumeur, P. N.; Hespanha, J. P.; and Kriegman, D. J. 1997. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on pattern analysis and machine intelligence 19(7):711–720.
- [Chen et al.2018] Chen, N.; Klushyn, A.; Kurle, R.; Jiang, X.; Bayer, J.; and Smagt, P. 2018. Metrics for deep generative models. In International Conference on Artificial Intelligence and Statistics, 1540–1550.
- [Goodfellow et al.2014] Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In Advances in neural information processing systems, 2672–2680.
- [Kingma and Welling2013] Kingma, D. P., and Welling, M. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
- [LeCun et al.1998] LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324.
- [Lee, Ho, and Kriegman2005] Lee, K.-C.; Ho, J.; and Kriegman, D. J. 2005. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on pattern analysis and machine intelligence 27(5):684–698.
- [Radford, Metz, and Chintala2015] Radford, A.; Metz, L.; and Chintala, S. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
- [Shao, Kumar, and Fletcher2017] Shao, H.; Kumar, A.; and Fletcher, P. T. 2017. The riemannian geometry of deep generative models. arXiv preprint arXiv:1711.08014.
- [Tenenbaum, De Silva, and Langford2000] Tenenbaum, J. B.; De Silva, V.; and Langford, J. C. 2000. A global geometric framework for nonlinear dimensionality reduction. science 290(5500):2319–2323.