An Information-Theoretic Analysis of Deep Latent-Variable Models

11/01/2017 ∙ by Alexander A. Alemi, et al.

We present an information-theoretic framework for understanding trade-offs in unsupervised learning of deep latent-variable models using variational inference. This framework emphasizes the need to consider latent-variable models along two dimensions: the ability to reconstruct inputs (distortion) and the communication cost (rate). We derive the optimal frontier of generative models in the two-dimensional rate-distortion plane, and show how the standard evidence lower bound objective is insufficient to select between points along this frontier. However, by performing targeted optimization to learn generative models with different rates, we are able to learn many models that can achieve similar generative performance but make vastly different trade-offs in terms of the usage of the latent variable. Through experiments on MNIST and Omniglot with a variety of architectures, we show how our framework sheds light on many recently proposed extensions to the variational autoencoder family.
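The point about the evidence lower bound can be made concrete with a minimal sketch. It rests on assumptions the abstract does not state: a diagonal-Gaussian encoder with a standard-normal marginal for the rate, a Bernoulli decoder for the distortion, single-sample estimates, and a weighted objective of the form D + beta * R for the targeted optimization. The function names and the two (rate, distortion) pairs are hypothetical and chosen only for illustration.

```python
import math


def gaussian_rate(mu, log_var):
    """Rate in nats for one input: KL(N(mu, diag(exp(log_var))) || N(0, I)).

    Assumes a diagonal-Gaussian encoder and a standard-normal marginal.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def bernoulli_distortion(x, probs, eps=1e-12):
    """Distortion in nats: negative log-likelihood of binary x under the
    decoder probabilities computed from a single sampled latent."""
    return -sum(
        xi * math.log(p + eps) + (1 - xi) * math.log(1.0 - p + eps)
        for xi, p in zip(x, probs)
    )


def elbo(rate, distortion):
    """With these definitions the ELBO depends only on the sum D + R."""
    return -(distortion + rate)


def beta_objective(rate, distortion, beta):
    """Assumed weighted objective to minimize; beta = 1 gives the negative ELBO."""
    return distortion + beta * rate


# Two hypothetical models (values in nats, invented for illustration).
ignores_latent = {"rate": 0.0, "distortion": 100.0}
uses_latent = {"rate": 20.0, "distortion": 80.0}

# Both models have the same ELBO, so the ELBO alone cannot tell them apart.
print(elbo(**ignores_latent), elbo(**uses_latent))

# Changing beta breaks the tie in one direction or the other.
for beta in (0.5, 2.0):
    print(beta,
          beta_objective(beta=beta, **ignores_latent),
          beta_objective(beta=beta, **uses_latent))
```

In this sketch, a weight below 1 favors the model that sends more information through the latent variable, and a weight above 1 favors the model that ignores it, even though the two are indistinguishable by ELBO.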




