
Gradient Estimation with Stochastic Softmax Tricks

06/15/2020
by Max B. Paulus, et al.
Google
University of Toronto
ETH Zurich

The Gumbel-Max trick is the basis of many relaxed gradient estimators. These estimators are easy to implement and have low variance, but scaling them to large combinatorial distributions remains an open problem. Working within the perturbation model framework, we introduce stochastic softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial spaces. Our framework offers a unified perspective on existing relaxed estimators for perturbation models, and it contains many novel relaxations. We design structured relaxations for subset selection, spanning trees, arborescences, and others. Compared to less structured baselines, we find that stochastic softmax tricks can be used to train latent variable models that perform better and discover more latent structure.
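To make the starting point concrete: the Gumbel-Max trick samples a category by adding independent Gumbel noise to the logits and taking an argmax, and the Gumbel-Softmax trick relaxes that argmax into a tempered softmax so gradients can flow. A minimal NumPy sketch of the basic (non-structured) trick, with all names chosen here for illustration:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed one-hot sample from a categorical with the given logits.

    As tau -> 0 the output approaches the one-hot vector produced by the
    Gumbel-Max trick: argmax_i (logits_i + g_i).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Standard Gumbel noise: g = -log(-log(U)), U ~ Uniform(0, 1)
    u = rng.uniform(size=logits.shape)
    g = -np.log(-np.log(u))
    # Tempered softmax over the perturbed logits (numerically stabilized)
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    return y / y.sum()

sample = gumbel_softmax(np.log(np.array([0.1, 0.3, 0.6])), tau=0.5)
```

The paper's stochastic softmax tricks replace this simple softmax with structured relaxations (e.g. over subsets or spanning trees); the sketch above only shows the categorical special case they generalize.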


Related research

10/09/2020
Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator
Gradient estimation in models with discrete latent variables is a challe...

07/03/2020
Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity
Training neural network models with discrete (categorical or structured)...

05/09/2022
ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence
The Gumbel-softmax distribution, or Concrete distribution, is often used...

02/23/2018
Learning Latent Permutations with Gumbel-Sinkhorn Networks
Permutations and matchings are core building blocks in a variety of late...

10/28/2021
Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces
Structured latent variables allow incorporating meaningful prior knowled...

09/17/2019
Relaxed Softmax for learning from Positive and Unlabeled data
In recent years, the softmax model and its fast approximations have beco...

10/05/2020
Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning
Latent structure models are a powerful tool for modeling language data: ...