Spherical Latent Spaces for Stable Variational Autoencoders

08/31/2018
by Jiacheng Xu, et al.

A hallmark of variational autoencoders (VAEs) for text processing is their combination of powerful encoder-decoder models, such as LSTMs, with simple latent distributions, typically multivariate Gaussians. These models pose a difficult optimization problem: there is an especially bad local optimum where the variational posterior always equals the prior and the model does not use the latent variable at all, a kind of "collapse" which is encouraged by the KL divergence term of the objective. In this work, we experiment with another choice of latent distribution, namely the von Mises-Fisher (vMF) distribution, which places mass on the surface of the unit hypersphere. With this choice of prior and posterior, the KL divergence term now only depends on the variance of the vMF distribution, giving us the ability to treat it as a fixed hyperparameter. We show that doing so not only averts the KL collapse, but consistently gives better likelihoods than Gaussians across a range of modeling conditions, including recurrent language modeling and bag-of-words document modeling. An analysis of the properties of our vMF representations shows that they learn richer and more nuanced structures in their latent representations than their Gaussian counterparts.
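To make the fixed-KL property concrete, here is a minimal sketch, assuming NumPy/SciPy, of vMF sampling via the rejection scheme of Wood (1994) and the closed-form KL divergence against the uniform prior on the hypersphere. The function names (sample_vmf, vmf_kl, _sample_w) are illustrative, not the paper's released implementation; mu is assumed to be a unit-norm mean direction produced by the encoder.

```python
import numpy as np
from scipy.special import gammaln, ive


def _sample_w(kappa, dim, rng):
    """Rejection-sample w = <mu, z>, the component of z along the mean
    direction, following Wood (1994)."""
    b = (dim - 1) / (2 * kappa + np.sqrt(4 * kappa**2 + (dim - 1) ** 2))
    x0 = (1 - b) / (1 + b)
    c = kappa * x0 + (dim - 1) * np.log(1 - x0**2)
    while True:
        z = rng.beta((dim - 1) / 2, (dim - 1) / 2)
        w = (1 - (1 + b) * z) / (1 - (1 - b) * z)
        if kappa * w + (dim - 1) * np.log(1 - x0 * w) - c >= np.log(rng.uniform()):
            return w


def sample_vmf(mu, kappa, rng=None):
    """Draw one sample z ~ vMF(mu, kappa) on the unit sphere S^{dim-1}.
    mu is assumed to have unit norm."""
    rng = rng if rng is not None else np.random.default_rng()
    dim = mu.shape[0]
    w = _sample_w(kappa, dim, rng)
    # Uniform unit tangent v orthogonal to mu.
    v = rng.normal(size=dim)
    v -= v.dot(mu) * mu
    v /= np.linalg.norm(v)
    return w * mu + np.sqrt(1.0 - w**2) * v


def vmf_kl(kappa, dim):
    """KL( vMF(mu, kappa) || Uniform(S^{dim-1}) ).

    Depends only on kappa and dim, never on mu, so with kappa fixed the
    KL term of the VAE objective is a constant."""
    # E_q[<mu, z>] = I_{d/2}(kappa) / I_{d/2-1}(kappa); `ive` is the
    # exponentially scaled Bessel function, so the e^kappa factors cancel.
    ratio = ive(dim / 2, kappa) / ive(dim / 2 - 1, kappa)
    # Log normalizer of the vMF density; log I_v(k) = log ive(v, k) + k.
    log_c = ((dim / 2 - 1) * np.log(kappa)
             - (dim / 2) * np.log(2 * np.pi)
             - (np.log(ive(dim / 2 - 1, kappa)) + kappa))
    # Log surface area of the unit sphere S^{dim-1}.
    log_area = np.log(2) + (dim / 2) * np.log(np.pi) - gammaln(dim / 2)
    return kappa * ratio + log_c + log_area
```

Because vmf_kl depends only on kappa and the dimensionality, fixing kappa turns the KL term of the ELBO into a constant: the optimizer cannot reduce the objective by collapsing the posterior onto the prior, which is exactly how the degenerate optimum of the Gaussian VAE arises.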

Related research

04/04/2019
Riemannian Normalizing Flow on Variational Wasserstein Autoencoder for Text Modeling
Recurrent Variational Autoencoder has been widely used for language mode...

09/30/2019
On the Importance of the Kullback-Leibler Divergence Term in Variational Autoencoders for Text Generation
Variational Autoencoders (VAEs) are known to suffer from learning uninfo...

11/01/2022
Improving Variational Autoencoders with Density Gap-based Regularization
Variational autoencoders (VAEs) are one of the powerful unsupervised lea...

06/08/2020
The Power Spherical distribution
There is a growing interest in probabilistic models defined in hyper-sph...

12/11/2020
A Log-likelihood Regularized KL Divergence for Video Prediction with A 3D Convolutional Variational Recurrent Network
The use of latent variable models has shown to be a powerful tool for mo...

03/25/2019
Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing
Variational autoencoders (VAEs) with an auto-regressive decoder have bee...

06/23/2020
Simple and Effective VAE Training with Calibrated Decoders
Variational autoencoders (VAEs) provide an effective and simple method f...
