
Oops I Took A Gradient: Scalable Sampling for Discrete Distributions

by Will Grathwohl, et al.

We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.
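The core idea can be sketched in a few lines of NumPy: for a binary model with log-probability f(x), a first-order Taylor expansion estimates the change in f from flipping each bit, those estimates define a softmax proposal over which bit to flip, and a standard Metropolis-Hastings correction keeps the chain exact. The toy quadratic (Ising-like) energy below, the `gwg_step` function name, and all constants are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary quadratic model (Ising-like): f(x) = x^T W x + b^T x, x in {0,1}^n.
# W, b are arbitrary illustrative parameters, not from the paper.
n = 8
W = rng.normal(scale=0.3, size=(n, n))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
b = rng.normal(scale=0.5, size=n)

def f(x):
    """Unnormalized log-probability."""
    return x @ W @ x + b @ x

def grad_f(x):
    """Gradient of f, treating the binary x as continuous."""
    return 2.0 * (W @ x) + b

def flip_logits(x):
    # Taylor estimate of f(x with bit i flipped) - f(x): -(2 x_i - 1) * grad_i f(x)
    return -(2.0 * x - 1.0) * grad_f(x)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def gwg_step(x):
    """One gradient-informed Metropolis-Hastings step: propose a bit flip,
    then accept or reject so the chain targets exp(f(x))."""
    q_fwd = softmax(flip_logits(x) / 2.0)
    i = rng.choice(len(x), p=q_fwd)
    x_new = x.copy()
    x_new[i] = 1.0 - x_new[i]
    q_rev = softmax(flip_logits(x_new) / 2.0)
    log_accept = f(x_new) - f(x) + np.log(q_rev[i]) - np.log(q_fwd[i])
    if np.log(rng.random()) < log_accept:
        return x_new
    return x

x = rng.integers(0, 2, size=n).astype(float)
for _ in range(2000):
    x = gwg_step(x)
print(x)
```

Because the gradient scores all n candidate flips in one pass, the proposal costs roughly one gradient evaluation per step rather than n likelihood evaluations, which is what makes the approach scale to high-dimensional discrete models.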



A Langevin-like Sampler for Discrete Distributions

We propose discrete Langevin proposal (DLP), a simple and scalable gradi...

Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration

Discrete structures play an important role in applications like program ...

Boltzmann machines and energy-based models

We review Boltzmann machines and energy-based models. A Boltzmann machin...

Restricted Collapsed Draw: Accurate Sampling for Hierarchical Chinese Restaurant Process Hidden Markov Models

We propose a restricted collapsed draw (RCD) sampler, a general Markov c...

Learning Equivariant Energy Based Models with Equivariant Stein Variational Gradient Descent

We focus on the problem of efficient sampling and learning of probabilit...

Discrete Langevin Sampler via Wasserstein Gradient Flow

Recently, a family of locally balanced (LB) samplers has demonstrated ex...

Training Restricted Boltzmann Machines on Word Observations

The restricted Boltzmann machine (RBM) is a flexible tool for modeling c...