
Glow: Generative Flow with Invertible 1x1 Convolutions
Flow-based generative models (Dinh et al., 2014) are conceptually attractive due to tractability of the exact log-likelihood, tractability of exact latent-variable inference, and parallelizability of both training and synthesis. In this paper we propose <i>Glow</i>, a simple type of generative flow using an invertible 1x1 convolution. Using our method we demonstrate a significant improvement in log-likelihood on standard benchmarks. Perhaps most strikingly, we demonstrate that a generative model optimized towards the plain log-likelihood objective is capable of efficient realistic-looking synthesis and manipulation of large images. The code for our model is available at <a href="https://github.com/openai/glow">https://github.com/openai/glow</a>.
07/09/2018 ∙ by Diederik P. Kingma, et al.
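The invertible 1x1 convolution at the heart of Glow is a learned c×c matrix applied identically at every spatial position, so its log-determinant contribution is cheap to compute. A minimal NumPy sketch (function names are mine, not from the released code):

```python
import numpy as np

def invertible_1x1_conv(x, W):
    """Apply a 1x1 convolution with weight matrix W (c x c) to an
    NHWC tensor x; return the output and the log-determinant term
    the change-of-variables formula contributes to the objective."""
    n, h, w, c = x.shape
    z = x @ W  # a per-pixel linear map over channels == a 1x1 conv
    # log|det dz/dx| = h * w * log|det W|, once per spatial position
    _, logabsdet = np.linalg.slogdet(W)
    return z, h * w * logabsdet

def inverse_1x1_conv(z, W):
    """Invert the 1x1 convolution by applying W^{-1} per pixel."""
    return z @ np.linalg.inv(W)
```

Initializing W as a random rotation matrix (e.g. via a QR decomposition) keeps it invertible with log-determinant zero at the start of training.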

Variational Autoencoders and Nonlinear ICA: A Unifying Framework
The framework of variational autoencoders allows us to efficiently learn deep latent-variable models, such that the model's marginal distribution over observed variables fits the data. Often, we're interested in going a step further, and want to approximate the true joint distribution over observed and latent variables, including the true prior and posterior distributions over latent variables. This is known to be generally impossible due to unidentifiability of the model. We address this issue by showing that for a broad family of deep latent-variable models, identification of the true joint distribution over observed and latent variables is actually possible up to a simple transformation, thus achieving a principled and powerful form of disentanglement. Our result requires a factorized prior distribution over the latent variables that is conditioned on an additionally observed variable, such as a class label or almost any other observation. We build on recent developments in nonlinear ICA, which we extend to the case with noisy, undercomplete or discrete observations, integrated in a maximum likelihood framework. The result also trivially contains identifiable flow-based generative models as a special case.
07/10/2019 ∙ by Ilyes Khemakhem, et al.

PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at https://github.com/openai/pixelcnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure. 3) We use downsampling to efficiently capture structure at multiple resolutions. 4) We introduce additional shortcut connections to further speed up optimization. 5) We regularize the model using dropout. Finally, we present state-of-the-art log-likelihood results on CIFAR-10 to demonstrate the usefulness of these modifications.
01/19/2017 ∙ by Tim Salimans, et al.
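The discretized logistic likelihood assigns each of the 256 pixel bins the probability mass a continuous logistic distribution places on that bin, with the edge bins absorbing the tails. A simplified single-component NumPy sketch (the paper mixes several components; variable names are mine):

```python
import numpy as np

def discretized_logistic_logpmf(x, mean, log_scale):
    """Log-probability of an integer pixel value x in {0, ..., 255}
    under a logistic distribution discretized onto the 256 pixel bins.
    Pixels are rescaled to [-1, 1], and each bin spans 2/255."""
    x = x / 127.5 - 1.0
    inv_s = np.exp(-log_scale)
    # logistic CDF evaluated half a bin above and below x
    cdf_plus = 1.0 / (1.0 + np.exp(-inv_s * (x + 1.0 / 255 - mean)))
    cdf_minus = 1.0 / (1.0 + np.exp(-inv_s * (x - 1.0 / 255 - mean)))
    if x < -0.999:              # leftmost bin absorbs all mass below -1
        return np.log(cdf_plus)
    if x > 0.999:               # rightmost bin absorbs all mass above 1
        return np.log(1.0 - cdf_minus)
    return np.log(cdf_plus - cdf_minus)
```

By construction the probabilities over all 256 values sum to one, which is what makes the resulting log-likelihoods directly comparable to softmax-based models.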

Variational Lossy Autoencoder
Representation learning seeks to expose certain aspects of observed data in a learned representation that's amenable to downstream tasks like classification. For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture. In this paper, we present a simple but principled method to learn such global representations by combining Variational Autoencoder (VAE) with neural autoregressive models such as RNN, MADE and PixelRNN/CNN. Our proposed VAE model allows us to have control over what the global latent code can learn and, by designing the architecture accordingly, we can force the global latent code to discard irrelevant information such as texture in 2D images, and hence the VAE only "autoencodes" data in a lossy fashion. In addition, by leveraging autoregressive models as both prior distribution p(z) and decoding distribution p(x|z), we can greatly improve generative modeling performance of VAEs, achieving new state-of-the-art results on MNIST, OMNIGLOT and Caltech-101 Silhouettes density estimation tasks.
11/08/2016 ∙ by Xi Chen, et al.

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speedup of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time. We demonstrate the usefulness of our method on applications in supervised image recognition, generative modelling, and deep reinforcement learning.
02/25/2016 ∙ by Tim Salimans, et al.
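The reparameterization itself is a one-liner per layer: each weight vector is written as w = g · v / ||v||, with the scalar norm g and direction v trained separately. A minimal NumPy sketch for a fully connected layer (names are mine):

```python
import numpy as np

def weight_norm(V, g):
    """Weight normalization: w_i = g_i * v_i / ||v_i|| for each output
    unit i, decoupling the norm (g) from the direction (V).
    V: (out, in) direction parameters; g: (out,) per-unit scales."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return (g[:, None] / norms) * V
```

Gradients with respect to g and V are then obtained by ordinary backpropagation through this expression, which is what improves the conditioning relative to optimizing W directly.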

Improving Variational Inference with Inverse Autoregressive Flow
The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis.
06/15/2016 ∙ by Diederik P. Kingma, et al.
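One IAF transformation can be sketched with a strictly triangular masked linear layer standing in for the autoregressive network (a toy stand-in for the MADE-style networks the paper uses; all names and the single-matrix network are my simplification). Because the shift and gate for dimension i depend only on z with index < i, the Jacobian is triangular and its log-determinant is a simple sum:

```python
import numpy as np

def iaf_step(z, Wm, Ws, bm, bs):
    """One inverse autoregressive flow step on a vector z.
    Wm, Ws are masked to be strictly lower triangular, so (m_i, s_i)
    depend only on z_{<i}; the Jacobian of z -> z_new is then lower
    triangular with diagonal sigma, giving log|det| = sum(log sigma)."""
    d = z.shape[-1]
    mask = np.tril(np.ones((d, d)), k=-1)   # strictly lower triangular
    m = z @ (Wm * mask).T + bm              # autoregressive shift
    s = z @ (Ws * mask).T + bs              # autoregressive gate logits
    sigma = 1.0 / (1.0 + np.exp(-s))        # gating update, as in IAF
    z_new = sigma * z + (1.0 - sigma) * m
    log_det = np.sum(np.log(sigma), axis=-1)
    return z_new, log_det
```

Stacking several such steps (with the dimension ordering reversed between steps) yields the chain of transformations the abstract describes.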

Variational Dropout and the Local Reparameterization Trick
We investigate a local reparameterization technique for greatly reducing the variance of stochastic gradients for variational Bayesian inference (SGVB) of a posterior over model parameters, while retaining parallelizability. This local reparameterization translates uncertainty about global parameters into local noise that is independent across datapoints in the minibatch. Such parameterizations can be trivially parallelized and have variance that is inversely proportional to the minibatch size, generally leading to much faster convergence. Additionally, we explore a connection with dropout: Gaussian dropout objectives correspond to SGVB with local reparameterization, a scale-invariant prior and proportionally fixed posterior variance. Our method allows inference of more flexibly parameterized posteriors; specifically, we propose variational dropout, a generalization of Gaussian dropout where the dropout rates are learned, often leading to better models. The method is demonstrated through several experiments.
06/08/2015 ∙ by Diederik P. Kingma, et al.
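For a fully connected layer with a factorized Gaussian posterior over weights, the local reparameterization samples the pre-activations directly rather than the weight matrix, using the fact that a linear function of independent Gaussians is itself Gaussian. A minimal NumPy sketch (names are mine):

```python
import numpy as np

def local_reparam_layer(x, W_mu, W_logvar, rng):
    """Local reparameterization for B = x @ W with a factorized
    Gaussian posterior over W: sample the Gaussian pre-activations
    directly, with one independent noise draw per datapoint."""
    gamma = x @ W_mu                        # mean of the pre-activations
    delta = (x ** 2) @ np.exp(W_logvar)     # variance of the pre-activations
    eps = rng.standard_normal(gamma.shape)  # local noise, per datapoint
    return gamma + np.sqrt(delta) * eps
```

Because the noise is drawn per datapoint rather than shared through one weight sample, the gradient variance shrinks with the minibatch size, as the abstract describes.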

Markov Chain Monte Carlo and Variational Inference: Bridging the Gap
Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation. By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy. We describe the theoretical foundations that make this possible and show some promising first results.
10/23/2014 ∙ by Tim Salimans, et al.

Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets
Hierarchical Bayesian networks and neural networks with stochastic hidden units are commonly perceived as two separate types of models. We show that either type of model can often be transformed into an instance of the other by switching between centered and differentiable non-centered parameterizations of the latent variables. The choice of parameterization greatly influences the efficiency of gradient-based posterior inference; we show that the two forms are often complementary, clarify when each parameterization is preferred, and show how inference can be made robust. In the non-centered form, a simple Monte Carlo estimator of the marginal likelihood can be used for learning the parameters. Theoretical results are supported by experiments.
02/03/2014 ∙ by Diederik P. Kingma, et al.

Auto-Encoding Variational Bayes
How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is twofold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
12/20/2013 ∙ by Diederik P. Kingma, et al.
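The reparameterization of a diagonal-Gaussian posterior, together with its analytic KL term against a standard-normal prior, can be sketched as follows (a minimal NumPy illustration, not the paper's code):

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """The reparameterization trick: z = mu + sigma * eps with
    eps ~ N(0, I), so z is a differentiable function of mu and logvar
    and standard stochastic gradient methods apply."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian q,
    the analytic regularization term of the variational lower bound."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=-1)
```

The lower bound per datapoint is then the expected reconstruction log-likelihood under z = reparameterize(mu, logvar, rng) minus this KL term, with the recognition model producing mu and logvar from x.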

Fast Gradient-Based Inference with Continuous Latent Variable Models in Auxiliary Form
We propose a technique for increasing the efficiency of gradient-based inference and learning in Bayesian networks with multiple layers of continuous latent variables. We show that, in many cases, it is possible to express such models in an auxiliary form, where continuous latent variables are conditionally deterministic given their parents and a set of independent auxiliary variables. Variables of models in this auxiliary form have much larger Markov blankets, leading to significant speedups in gradient-based inference, e.g. rapidly mixing Hybrid Monte Carlo and efficient gradient-based optimization. The relative efficiency is confirmed in experiments.
06/04/2013 ∙ by Diederik P. Kingma, et al.