Markov Chain Score Ascent: A Unifying Framework of Variational Inference with Markovian Gradients

06/13/2022
by Kyurae Kim, et al.

Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior. Recently, multiple methods have been proposed to run SGD with biased gradient estimates obtained from a Markov chain. This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance. To do this, we demonstrate that these methods, which we collectively refer to as Markov chain score ascent (MCSA) methods, can be cast as special cases of the Markov chain gradient descent framework. Furthermore, by leveraging this new understanding, we develop a novel MCSA scheme, parallel MCSA (pMCSA), that achieves a tighter bound on the gradient variance. We demonstrate that this improved theoretical result translates to superior empirical performance.
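
To make the idea in the abstract concrete: the gradient of the inclusive KL divergence KL(pi || q) with respect to the variational parameters is the negative expected score of q under the posterior pi, so a state drawn from a pi-invariant Markov chain gives a (biased) estimate of the ascent direction. The following is a minimal toy sketch of that loop, not the paper's pMCSA scheme or any of the specific methods it analyzes. Everything in it is an assumption chosen for illustration: the one-dimensional Gaussian target, the Gaussian variational family, the independent Metropolis-Hastings kernel, the step-size schedule, and all function names.

```python
import numpy as np

def log_target(z):
    # Unnormalized log density of a hypothetical target "posterior": N(3, 1.5^2).
    return -0.5 * ((z - 3.0) / 1.5) ** 2

def log_q(z, mu, log_sigma):
    # Log density (up to a constant) of the variational family N(mu, sigma^2).
    sigma = np.exp(log_sigma)
    return -0.5 * ((z - mu) / sigma) ** 2 - log_sigma

def score(z, mu, log_sigma):
    # Gradient of log q(z) with respect to (mu, log_sigma).
    sigma = np.exp(log_sigma)
    d = (z - mu) / sigma
    return np.array([d / sigma, d ** 2 - 1.0])

def mcsa_toy(n_steps=20000, step0=0.1, seed=0):
    rng = np.random.default_rng(seed)
    mu, log_sigma = 0.0, 0.0   # initial variational parameters
    z = 0.0                    # initial Markov chain state
    for t in range(1, n_steps + 1):
        # One step of an independent Metropolis-Hastings kernel that uses
        # the current q as its proposal; the kernel leaves the target invariant.
        z_prop = mu + np.exp(log_sigma) * rng.standard_normal()
        log_w_prop = log_target(z_prop) - log_q(z_prop, mu, log_sigma)
        log_w_cur = log_target(z) - log_q(z, mu, log_sigma)
        if np.log(rng.uniform()) < log_w_prop - log_w_cur:
            z = z_prop
        # Score ascent: move the parameters along the score at the chain state.
        g = score(z, mu, log_sigma)
        step = step0 / np.sqrt(t)  # assumed decreasing step-size schedule
        mu += step * g[0]
        log_sigma += step * g[1]
    return mu, np.exp(log_sigma)

if __name__ == "__main__":
    mu, sigma = mcsa_toy()
    print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```

Because the score has zero expectation under the target only when q matches the target's mean and variance, the parameters in this toy setting should drift toward a mean near 3 and a standard deviation near 1.5. Consecutive chain states are correlated, which is why the gradient estimates are biased and why the mixing rate of the kernel matters for the analysis the abstract describes.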

research
02/28/2023

Stochastic Gradient Descent under Markovian Sampling Schemes

We study a variation of vanilla stochastic gradient descent where the op...
research
09/23/2019

Decentralized Markov Chain Gradient Descent

Decentralized stochastic gradient method emerges as a promising solution...
research
10/31/2019

Mixing of Stochastic Accelerated Gradient Descent

We study the mixing properties for stochastic accelerated gradient desce...
research
02/07/2022

Grassmann Stein Variational Gradient Descent

Stein variational gradient descent (SVGD) is a deterministic particle in...
research
09/12/2018

On Markov Chain Gradient Descent

Stochastic gradient methods are the workhorse (algorithms) of large-scal...
research
10/25/2017

A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

This work provides a simplified proof of the statistical minimax optimal...
research
11/22/2018

Markov Chain Block Coordinate Descent

The method of block coordinate gradient descent (BCD) has been a powerfu...
