Straight-Through Estimator as Projected Wasserstein Gradient Flow

10/05/2019
by Pengyu Cheng, et al.

The Straight-Through (ST) estimator is a widely used technique for back-propagating gradients through discrete random variables. However, this effective method lacks theoretical justification. In this paper, we show that ST can be interpreted as the simulation of the projected Wasserstein gradient flow (pWGF). Based on this understanding, a theoretical foundation is established to justify the convergence properties of ST. Further, another pWGF estimator variant is proposed, which exhibits superior performance on distributions with infinite support, e.g., Poisson distributions. Empirically, we show that ST and our proposed estimator, when applied to different types of discrete structures (including both Bernoulli and Poisson latent variables), exhibit comparable or even better performance relative to other state-of-the-art methods. Our results uncover the origin of the widespread adoption of the ST estimator and represent a helpful step towards exploring alternative gradient estimators for discrete variables.
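For readers unfamiliar with the ST estimator mentioned above, the following is a minimal sketch of one common form of it for a single Bernoulli latent variable. It is not the paper's pWGF construction. The function names (`st_gradient`, `exact_gradient`), the sigmoid parameterization, and the test function are illustrative assumptions, not taken from the paper. The sketch samples z ~ Bernoulli(sigmoid(theta)) in the forward pass and, in the backward pass, treats the sampling step as the identity, so the gradient of f at the sample is passed straight through to the probability.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real-valued logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def st_gradient(theta, f_grad, n_samples=1000, seed=0):
    """Monte Carlo straight-through estimate of d/dtheta E[f(z)],
    where z ~ Bernoulli(sigmoid(theta)).

    f_grad is the derivative of f with respect to its (relaxed) input.
    All names here are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    p = sigmoid(theta)
    dp_dtheta = p * (1.0 - p)  # derivative of the sigmoid
    total = 0.0
    for _ in range(n_samples):
        # Forward pass: draw a hard 0/1 sample.
        z = 1.0 if rng.random() < p else 0.0
        # Backward pass: pretend dz/dp = 1 (the "straight-through" step).
        total += f_grad(z) * 1.0 * dp_dtheta
    return total / n_samples


def exact_gradient(theta, f):
    """Exact gradient for comparison: E[f(z)] = p*f(1) + (1-p)*f(0)."""
    p = sigmoid(theta)
    return (f(1.0) - f(0.0)) * p * (1.0 - p)


if __name__ == "__main__":
    # Quadratic test function (an assumption for illustration only).
    f = lambda z: (z - 0.45) ** 2
    f_grad = lambda z: 2.0 * (z - 0.45)
    for theta in (-1.0, 0.0, 1.0):
        print(theta, st_gradient(theta, f_grad), exact_gradient(theta, f))
```

In this sketch the ST estimate matches the exact gradient whenever f is linear in z, and is in general biased for nonlinear f; comparing the two printed columns across several values of `theta` gives a rough feel for that bias.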

research
06/15/2021

Coupled Gradient Estimators for Discrete Latent Variables

Training models with discrete latent variables is challenging due to the...
research
02/14/2020

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

We derive an unbiased estimator for expectations over discrete random va...
research
03/04/2020

Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables

Estimating the gradients of stochastic nodes is one of the crucial resea...
research
07/24/2019

Notes on Latent Structure Models and SPIGOT

These notes aim to shed light on the recently proposed structured projec...
research
07/30/2018

ARM: Augment-REINFORCE-Merge Gradient for Discrete Latent Variable Models

To backpropagate the gradients through discrete stochastic layers, we en...
research
10/10/2018

Rao-Blackwellized Stochastic Gradients for Discrete Distributions

We wish to compute the gradient of an expectation over a finite or count...
research
06/29/2022

Discrete Langevin Sampler via Wasserstein Gradient Flow

Recently, a family of locally balanced (LB) samplers has demonstrated ex...
