The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

11/02/2016
by Chris J. Maddison, et al.

The reparameterization trick enables optimizing large-scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with a fixed distribution. After refactoring, the gradients of the loss propagated by the chain rule through the graph are low-variance unbiased estimators of the gradients of the expected loss. While many continuous random variables have such reparameterizations, discrete random variables lack useful reparameterizations due to the discontinuous nature of discrete states. In this work we introduce Concrete random variables---continuous relaxations of discrete random variables. The Concrete distribution is a new family of distributions with closed-form densities and a simple reparameterization. Whenever a discrete stochastic node of a computation graph can be refactored into a one-hot bit representation that is treated continuously, Concrete stochastic nodes can be used with automatic differentiation to produce low-variance biased gradients of objectives (including objectives that depend on the log-probability of latent stochastic nodes) on the corresponding discrete graph. We demonstrate the effectiveness of Concrete relaxations on density estimation and structured prediction tasks using neural networks.
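To make the reparameterization concrete, the sketch below samples a relaxed one-hot vector via Gumbel noise and a tempered softmax, which is the standard Concrete/Gumbel-Softmax construction described in the abstract. It uses NumPy for illustration; the function name and the specific logits are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def sample_concrete(logits, temperature, rng):
    """Draw a Concrete (Gumbel-Softmax) sample: a point on the probability
    simplex that relaxes a one-hot sample from Categorical(softmax(logits)).

    The randomness comes from a fixed distribution (Uniform -> Gumbel), so the
    sample is a differentiable function of `logits` and `temperature`."""
    u = rng.uniform(size=logits.shape)       # U_k ~ Uniform(0, 1)
    gumbel = -np.log(-np.log(u))             # G_k ~ Gumbel(0, 1)
    z = (logits + gumbel) / temperature      # reparameterized, tempered logits
    z = z - z.max()                          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()                       # softmax -> relaxed one-hot vector

rng = np.random.default_rng(0)
y = sample_concrete(np.log(np.array([0.7, 0.2, 0.1])), temperature=0.5, rng=rng)
# y lies on the simplex (entries positive, summing to 1); as the temperature
# approaches 0, samples concentrate near the vertices, i.e. one-hot vectors.
```

Because the Gumbel noise has a fixed distribution, gradients with respect to the logits can flow through this sampling step by automatic differentiation, which is what makes the relaxation usable in stochastic computation graphs.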

Related research:

- Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables (03/04/2020)
- GO Gradient for Expectation-Based Objectives (01/17/2019)
- Implicit Reparameterization Gradients (05/22/2018)
- Efficient Learning of Discrete-Continuous Computation Graphs (07/26/2023)
- Testing for the Important Components of Posterior Predictive Variance (09/01/2022)
- Sparse Communication via Mixed Distributions (08/05/2021)
- Improved Mean and Variance Approximations for Belief Net Responses via Network Doubling (05/09/2012)
