An Information-Theoretic Analysis of Deep Latent-Variable Models

by Alexander A. Alemi et al.

We present an information-theoretic framework for understanding trade-offs in unsupervised learning of deep latent-variable models using variational inference. This framework emphasizes the need to consider latent-variable models along two dimensions: the ability to reconstruct inputs (distortion) and the communication cost (rate). We derive the optimal frontier of generative models in the two-dimensional rate-distortion plane, and show how the standard evidence lower bound objective is insufficient to select between points along this frontier. However, by performing targeted optimization to learn generative models with different rates, we are able to learn many models that achieve similar generative performance but make vastly different trade-offs in their usage of the latent variable. Through experiments on MNIST and Omniglot with a variety of architectures, we show how our framework sheds light on many recently proposed extensions to the variational autoencoder family.
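The rate and distortion terms described in the abstract can be sketched for a standard Gaussian-prior VAE with a Bernoulli decoder. This is a minimal illustration, not code from the paper: the function names are ours, and the β-weighted objective is one simple way to target different points on the rate-distortion frontier (β = 1 recovers the usual negative ELBO).

```python
import numpy as np

def kl_diag_gaussian(mu, logvar):
    # Rate: KL( q(z|x) = N(mu, diag(exp(logvar))) || p(z) = N(0, I) ),
    # in nats, summed over latent dimensions (closed form for Gaussians).
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def bernoulli_nll(x, logits):
    # Distortion: -log p(x|z) for a factorized Bernoulli decoder,
    # computed in a numerically stable logits form.
    return np.sum(np.maximum(logits, 0) - logits * x
                  + np.log1p(np.exp(-np.abs(logits))), axis=-1)

def beta_elbo_loss(x, logits, mu, logvar, beta=1.0):
    # Minimizing distortion + beta * rate for different beta values
    # yields models at different points along the rate-distortion frontier.
    distortion = bernoulli_nll(x, logits)
    rate = kl_diag_gaussian(mu, logvar)
    return distortion + beta * rate, rate, distortion
```

With β < 1 the optimizer tolerates a higher rate (more informative latents); with β > 1 it drives the rate toward zero, which can collapse the latent code while leaving the marginal likelihood nearly unchanged.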
