
Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online. During search, the simulation policy is adapted to explore the most promising lines of play. MCTS has been used by state-of-the-art programs for many problems; however, a disadvantage of MCTS is that it estimates the values of states with Monte Carlo averages, stored in a search tree, which does not scale to games with very high branching factors. We propose an alternative simulation-based search method, Policy Gradient Search (PGS), which adapts a neural network simulation policy online via policy gradient updates, avoiding the need for a search tree. In Hex, PGS achieves comparable performance to MCTS, and an agent trained using Expert Iteration with PGS was able to defeat MoHex 2.0, the strongest open-source Hex agent, in 9x9 Hex.
04/07/2019 ∙ by Thomas Anthony, et al.
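
A minimal sketch of the kind of online update PGS performs, with a linear softmax policy standing in for the paper's neural network; the feature vectors, shapes and learning rate below are illustrative assumptions, not the paper's Hex setup.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def pgs_update(theta, trajectory, outcome, lr=0.01):
    """One online REINFORCE update of the simulation policy after a single
    simulated game; this plays the role that backing up Monte Carlo values
    into a search tree plays in MCTS.

    theta:      (n_features, n_actions) weights of a softmax policy
    trajectory: list of (feature_vector, action) pairs visited in the simulation
    outcome:    scalar return of the simulation, e.g. +1 win / -1 loss
    """
    for phi, a in trajectory:
        probs = softmax(theta.T @ phi)             # pi(. | s)
        grad_log = -np.outer(phi, probs)           # d log pi(a|s) / d theta
        grad_log[:, a] += phi
        theta = theta + lr * outcome * grad_log    # ascend expected outcome
    return theta
```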

Evolution Strategies as a Scalable Alternative to Reinforcement Learning
We explore the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: by using a novel communication strategy based on common random numbers, our ES implementation only needs to communicate scalars, making it possible to scale to over a thousand parallel workers. This allows us to solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training. In addition, we highlight several advantages of ES as a black-box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
03/10/2017 ∙ by Tim Salimans, et al.
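
A rough sketch of one ES update with mirrored (antithetic) sampling; the hyperparameters and toy objective are illustrative. In the distributed setting described above, every worker can regenerate the same perturbations from shared random seeds (common random numbers), so only the scalar returns need to be communicated.

```python
import numpy as np

def es_step(theta, fitness, sigma=0.1, lr=0.02, n_pop=50, rng=None):
    """One ES update with mirrored sampling: perturb the parameters with
    Gaussian noise, evaluate the black-box fitness, and move theta along
    the return-weighted average of the perturbations."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((n_pop, theta.size))
    eps = np.concatenate([eps, -eps])                     # antithetic pairs
    returns = np.array([fitness(theta + sigma * e) for e in eps])
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = (returns[:, None] * eps).mean(axis=0) / sigma  # gradient estimate
    return theta + lr * grad

# toy usage: maximize -||theta - 3||^2
theta = np.zeros(5)
for _ in range(300):
    theta = es_step(theta, lambda t: -np.sum((t - 3.0) ** 2))
```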

PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs, which we make available at https://github.com/openai/pixelcnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure. 3) We use downsampling to efficiently capture structure at multiple resolutions. 4) We introduce additional shortcut connections to further speed up optimization. 5) We regularize the model using dropout. Finally, we present state-of-the-art log-likelihood results on CIFAR-10 to demonstrate the usefulness of these modifications.
01/19/2017 ∙ by Tim Salimans, et al.
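
A simplified sketch of modification 1), the discretized logistic mixture likelihood, for a single-channel pixel; the actual implementation also handles the edge bins at 0 and 255 exactly, which is omitted here.

```python
import numpy as np

def discretized_logistic_mixture_logp(x, means, log_scales, logit_probs):
    """Log-probability of one integer pixel value x in {0, ..., 255} under a
    mixture of discretized logistics (single channel, edge bins approximated).

    means, log_scales, logit_probs: arrays of shape (n_mix,) for this pixel.
    """
    xs = x / 127.5 - 1.0                                # rescale to [-1, 1]
    inv_s = np.exp(-log_scales)
    cdf_plus = 1.0 / (1.0 + np.exp(-inv_s * (xs + 1.0 / 255 - means)))
    cdf_minus = 1.0 / (1.0 + np.exp(-inv_s * (xs - 1.0 / 255 - means)))
    bin_prob = np.clip(cdf_plus - cdf_minus, 1e-12, 1.0)
    log_pi = logit_probs - np.log(np.sum(np.exp(logit_probs)))   # log softmax
    return np.log(np.sum(np.exp(log_pi + np.log(bin_prob))))     # logsumexp
```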

Variational Lossy Autoencoder
Representation learning seeks to expose certain aspects of observed data in a learned representation that is amenable to downstream tasks like classification. For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture. In this paper, we present a simple but principled method to learn such global representations by combining a Variational Autoencoder (VAE) with neural autoregressive models such as RNN, MADE and PixelRNN/CNN. Our proposed VAE model allows us to control what the global latent code can learn and, by designing the architecture accordingly, we can force the global latent code to discard irrelevant information such as texture in 2D images, so that the VAE only "autoencodes" data in a lossy fashion. In addition, by leveraging autoregressive models as both the prior distribution p(z) and the decoding distribution p(x|z), we can greatly improve the generative modeling performance of VAEs, achieving new state-of-the-art results on the MNIST, OMNIGLOT and Caltech-101 Silhouettes density estimation tasks.
11/08/2016 ∙ by Xi Chen, et al.
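
For reference, the objective being manipulated is the standard variational lower bound, written here with an autoregressive decoder; how much of the local context x_{<i} the decoder is allowed to see then controls what information the global code z has to retain (notation is the usual VAE one, not taken verbatim from the paper).

```latex
\log p(x) \;\ge\; \mathbb{E}_{q(z \mid x)}\!\left[\log p(x \mid z)\right]
  \;-\; D_{\mathrm{KL}}\!\left(q(z \mid x)\,\|\,p(z)\right),
\qquad
p(x \mid z) \;=\; \prod_i p\!\left(x_i \mid x_{<i},\, z\right).
```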

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and speed up the convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time. We demonstrate the usefulness of our method on applications in supervised image recognition, generative modelling, and deep reinforcement learning.
02/25/2016 ∙ by Tim Salimans, et al.
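
The reparameterization itself is a one-liner; below is a sketch of a dense layer using it, with a per-output-unit scale g (shapes and names are illustrative assumptions).

```python
import numpy as np

def weightnorm_dense(x, v, g, b):
    """Dense layer with weight normalization: each output unit k uses the
    weight vector w_k = g_k * v_k / ||v_k||, so the length (g_k) and the
    direction (v_k) of the weights are learned as separate parameters.

    x: (batch, d_in), v: (d_out, d_in), g: (d_out,), b: (d_out,)
    """
    w = g[:, None] * v / np.linalg.norm(v, axis=1, keepdims=True)
    return x @ w.T + b
```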

Improving Variational Inference with Inverse Autoregressive Flow
The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis.
06/15/2016 ∙ by Diederik P. Kingma, et al.
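
A minimal sketch of a single IAF transformation, assuming mu and sigma have already been produced by an autoregressive network such as a MADE (so element i depends only on z[:i]); the variable names are illustrative.

```python
import numpy as np

def iaf_step(z, mu, sigma):
    """One inverse autoregressive flow transformation.

    Because mu[i] and sigma[i] depend only on z[:i], the forward map is
    fully parallel and its Jacobian is triangular, so the log-determinant
    is just the sum of log sigma.
    """
    z_new = sigma * z + mu
    log_det = np.sum(np.log(sigma))     # log |det d z_new / d z|
    return z_new, log_det

# chaining transformations: log q(z_T) = log q(z_0) - sum of the log_det terms
```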

A Structured Variational Autoencoder for Learning Deep Hierarchies of Sparse Features
In this note we present a generative model of natural images consisting of a deep hierarchy of layers of latent random variables, each of which follows a new type of distribution that we call rectified Gaussian. These rectified Gaussian units allow spike-and-slab type sparsity, while retaining the differentiability necessary for efficient stochastic gradient variational inference. To learn the parameters of the new model, we approximate the posterior of the latent variables with a variational autoencoder. Rather than making the usual mean-field assumption, however, the encoder parameterizes a new type of structured variational approximation that retains the prior dependencies of the generative model. Using this structured posterior approximation, we are able to perform joint training of deep models with many layers of latent random variables, without having to resort to stacking or other layer-wise training procedures.
02/28/2016 ∙ by Tim Salimans, et al.
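
A hedged sketch of sampling from a rectified Gaussian unit via the reparameterization trick; the paper's exact parameterization of the units may differ.

```python
import numpy as np

def sample_rectified_gaussian(mu, log_sigma, rng=None):
    """Reparameterized draw from a rectified Gaussian unit: Gaussian noise
    pushed through max(0, .), giving spike-and-slab style sparsity (exact
    zeros with finite probability) while keeping the sample differentiable
    in mu and sigma almost everywhere."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(np.shape(mu))
    return np.maximum(0.0, mu + np.exp(log_sigma) * eps)
```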

Variational Dropout and the Local Reparameterization Trick
We investigate a local reparameterization technique for greatly reducing the variance of stochastic gradients for variational Bayesian inference (SGVB) of a posterior over model parameters, while retaining parallelizability. This local reparameterization translates uncertainty about global parameters into local noise that is independent across datapoints in the minibatch. Such parameterizations can be trivially parallelized and have variance that is inversely proportional to the minibatch size, generally leading to much faster convergence. Additionally, we explore a connection with dropout: Gaussian dropout objectives correspond to SGVB with local reparameterization, a scale-invariant prior and a proportionally fixed posterior variance. Our method allows inference of more flexibly parameterized posteriors; specifically, we propose variational dropout, a generalization of Gaussian dropout where the dropout rates are learned, often leading to better models. The method is demonstrated through several experiments.
06/08/2015 ∙ by Diederik P. Kingma, et al.
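
A small sketch of the local reparameterization for a dense layer with a factorized Gaussian posterior over weights: sampling the pre-activations directly, instead of the weights, makes the noise independent across datapoints in the minibatch, as described above. Shapes and variable names are illustrative.

```python
import numpy as np

def local_reparam_dense(x, w_mu, w_logvar, rng=None):
    """Dense layer under a factorized Gaussian weight posterior, sampled via
    the local reparameterization trick: draw the pre-activations from their
    implied Gaussian rather than drawing a weight matrix per example.

    x: (batch, d_in), w_mu and w_logvar: (d_in, d_out)
    """
    rng = rng or np.random.default_rng()
    act_mu = x @ w_mu                           # E[x W]
    act_var = (x ** 2) @ np.exp(w_logvar)       # Var[x W]
    eps = rng.standard_normal(act_mu.shape)
    return act_mu + np.sqrt(act_var + 1e-8) * eps
```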

Improved Techniques for Training GANs
We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, nor do we require the model to be able to learn well without using any labels. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: our model generates MNIST samples that humans cannot distinguish from real data, and CIFAR-10 samples that yield a human error rate of 21.3%. Finally, we present ImageNet samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes.
06/10/2016 ∙ by Tim Salimans, et al.

Markov Chain Monte Carlo and Variational Inference: Bridging the Gap
Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation. By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy. We describe the theoretical foundations that make this possible and show some promising first results.
10/23/2014 ∙ by Tim Salimans, et al.

Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression
We propose a general algorithm for approximating non-standard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any distribution in the exponential family or any mixture of such distributions, which means that it can be made arbitrarily precise. Several examples illustrate the speed and accuracy of our approximation method in practice.
06/28/2012 ∙ by Tim Salimans, et al.
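
A toy one-dimensional sketch of this kind of regression-based fit under simple assumptions: a Gaussian approximation is obtained by stochastically regressing the unnormalized log posterior onto the sufficient statistics (1, x, x^2) and reading the natural parameters off the regression coefficients. The constants, update schedule and safeguards below are illustrative, not the paper's algorithm.

```python
import numpy as np

def fit_gaussian_by_regression(log_p, n_iter=3000, step=0.05, rng=None):
    """Fit a 1-D Gaussian approximation q to an unnormalized log posterior
    log_p: draw a sample from the current q, update running estimates of
    E[t t^T] and E[t * log_p(x)] for t = (1, x, x^2), and solve the linear
    regression to recover the natural parameters of q."""
    rng = rng or np.random.default_rng(0)
    mu, var = 0.0, 1.0                        # initial approximation
    C = np.eye(3)                             # running estimate of E[t t^T]
    g = np.zeros(3)                           # running estimate of E[t log_p(x)]
    for _ in range(n_iter):
        x = mu + np.sqrt(var) * rng.standard_normal()
        t = np.array([1.0, x, x * x])         # sufficient statistics
        C = (1 - step) * C + step * np.outer(t, t)
        g = (1 - step) * g + step * t * log_p(x)
        coeff = np.linalg.solve(C, g)         # regression coefficients
        eta1, eta2 = coeff[1], coeff[2]       # natural parameters of q
        if eta2 < -1e-8:                      # keep the Gaussian well defined
            var = min(-1.0 / (2.0 * eta2), 25.0)   # crude safeguard for the toy
            mu = eta1 * var
    return mu, var

# toy usage: the exact posterior is N(2, 0.5^2), which the fit should recover
mu, var = fit_gaussian_by_regression(lambda x: -0.5 * (x - 2.0) ** 2 / 0.25)
```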