InfoVAE: Information Maximizing Variational Autoencoders

by Shengjia Zhao, et al.

It has been previously observed that variational autoencoders tend to ignore the latent code when combined with a decoding distribution that is too flexible. This undermines the purpose of unsupervised representation learning. In this paper, we additionally show that existing training criteria can lead to extremely poor amortized inference distributions and overestimation of the posterior variance, even when trained to optimality. We identify the reason for both shortcomings in the regularization term used in the ELBO criterion to match the variational posterior to the latent prior distribution. We propose a class of training criteria termed InfoVAE that solves the two problems. We show that these models maximize the mutual information between input and latent features, make effective use of the latent features regardless of the flexibility of the decoding distribution, and avoid the variance overestimation problem. Through extensive qualitative and quantitative analyses, we demonstrate that our models do not suffer from these problems, and outperform models trained with ELBO on multiple metrics of performance.
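
As a minimal sketch of the regularization term the abstract refers to, the standard per-example ELBO can be written as a reconstruction log-likelihood minus a KL divergence from the variational posterior to the prior. The snippet below assumes a diagonal Gaussian posterior and a standard normal prior, which are common choices but are not stated in the abstract; the function name `elbo_terms` and its arguments are hypothetical. It illustrates only the plain ELBO, not the InfoVAE objective.

```python
import math


def elbo_terms(recon_log_lik, mu, log_var):
    """Return (elbo, kl) for a single data point.

    recon_log_lik: estimate of E_q[log p(x|z)] (assumed to be given).
    mu, log_var:   mean and log-variance of an assumed diagonal Gaussian
                   posterior q(z|x); the prior is assumed to be N(0, I).
    """
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
    # ELBO = reconstruction term minus the KL regularizer.
    return recon_log_lik - kl, kl


elbo, kl = elbo_terms(-10.0, mu=[1.0], log_var=[0.0])
print(elbo, kl)
```

The KL value is zero exactly when the posterior equals the prior, so a model can reduce this term by making the posterior independent of the input, which is the latent-code-ignoring behavior the abstract describes.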


Forget-me-not! Contrastive Critics for Mitigating Posterior Collapse

Variational autoencoders (VAEs) suffer from posterior collapse, where th...

Preventing posterior collapse in variational autoencoders for text generation via decoder regularization

Variational autoencoders trained to minimize the reconstruction error ar...

High Mutual Information in Representation Learning with Symmetric Variational Inference

We introduce the Mutual Information Machine (MIM), a novel formulation o...

Mutual Information Constraints for Monte-Carlo Objectives

A common failure mode of density models trained as variational autoencod...

Deterministic Decoding for Discrete Data in Variational Autoencoders

Variational autoencoders are prominent generative models for modeling di...

Multi-Facet Clustering Variational Autoencoders

Work in deep clustering focuses on finding a single partition of data. H...

Associative Compression Networks for Representation Learning

This paper introduces Associative Compression Networks (ACNs), a new fra...