Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators

10/07/2021
by Alexander Shekhovtsov, et al.

Discrete and especially binary random variables occur in many machine learning models, notably in variational autoencoders with binary latent states and in stochastic binary networks. When learning such models, a key tool is an estimator of the gradient of the expected loss with respect to the probabilities of the binary variables. The straight-through (ST) estimator gained popularity due to its simplicity and efficiency, in particular in deep networks, where unbiased estimators are impractical. Several techniques have been proposed to improve on ST while keeping the same low computational complexity: Gumbel-Softmax, ST-Gumbel-Softmax, BayesBiNN, and FouST. We conduct a theoretical analysis of the bias and variance of these methods in order to understand their tradeoffs and to verify their originally claimed properties. The presented theoretical results allow for a better understanding of these methods and in some cases reveal serious issues.
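
As a concrete companion to the abstract, the following minimal PyTorch sketch shows the straight-through (ST) estimator for a Bernoulli variable. This is an illustrative sketch, not the paper's code: the function name st_bernoulli_sample and the toy loss are assumptions, and the b.detach() + p - p.detach() idiom is one standard way to realise the identity backward pass.

    import torch

    def st_bernoulli_sample(p):
        # Forward pass: draw a hard sample b ~ Bernoulli(p), so b is 0 or 1.
        b = torch.bernoulli(p)
        # Backward pass: treat db/dp as the identity. b.detach() carries the
        # discrete sample without gradient; adding p - p.detach() makes the
        # backward pass behave as if b were simply p. This shortcut is the
        # source of the ST estimator's bias.
        return b.detach() + p - p.detach()

    # Single-sample ST estimate of d E[loss(b)] / dp on a toy linear loss.
    p = torch.full((4,), 0.5, requires_grad=True)
    b = st_bernoulli_sample(p)
    loss = (b * torch.tensor([1.0, -1.0, 2.0, 0.5])).sum()
    loss.backward()
    print(p.grad)  # biased but low-variance single-sample estimate

For contrast, a similarly hedged sketch of the Gumbel-Softmax relaxation in the binary case (the Binary Concrete distribution): the hard sample is replaced by a tempered sigmoid of logit(p) plus logistic noise, so gradients flow through the reparameterisation. The temperature tau is an assumed hyperparameter; lowering it makes samples nearly discrete but increases gradient variance, which is one of the bias-variance tradeoffs the paper analyses.

    def binary_gumbel_softmax_sample(p, tau=1.0):
        # Logistic(0, 1) noise via the inverse CDF of a uniform sample.
        u = torch.rand_like(p).clamp(1e-6, 1 - 1e-6)
        noise = torch.log(u) - torch.log1p(-u)
        logits = torch.log(p) - torch.log1p(-p)  # logit of p
        # Continuous sample in (0, 1); as tau -> 0 it approaches a hard
        # Bernoulli(p) sample, but the gradient variance grows.
        return torch.sigmoid((logits + noise) / tau)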

Related research

11/09/2021 · Double Control Variates for Gradient Estimation in Discrete Latent Variable Models
Stochastic gradient-based optimisation for discrete latent variable mode...

02/14/2020 · Estimating Gradients for Discrete Random Variables by Sampling without Replacement
We derive an unbiased estimator for expectations over discrete random va...

02/19/2022 · Gradient Estimation with Discrete Stein Operators
Gradient estimation – approximating the gradient of an expectation with ...

09/29/2018 · Improved Gradient-Based Optimization Over Discrete Distributions
In many applications we seek to maximize an expectation with respect to ...

06/04/2020 · Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks
In networks with binary activations and/or binary weights the training b...

10/26/2021 · CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator
Accurately backpropagating the gradient through categorical variables is...

06/15/2020 · Gradient Estimation with Stochastic Softmax Tricks
The Gumbel-Max trick is the basis of many relaxed gradient estimators. T...
