High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models

12/23/2015
by Chunyuan Li, et al.

Learning in deep models using Bayesian methods has generated significant attention recently. This is largely because modern Bayesian methods can yield scalable learning and inference while maintaining a measure of uncertainty in the model parameters. Stochastic gradient MCMC (SG-MCMC) algorithms are a family of diffusion-based sampling methods for large-scale Bayesian learning. Among them, multivariate stochastic gradient thermostats (mSGNHT) augment each parameter of interest with a momentum variable and a thermostat variable, so that the stationary distribution of the dynamics matches the target posterior. As the number of variables in the continuous-time diffusion increases, its numerical approximation error becomes a practical bottleneck, making a more accurate numerical integrator desirable. To this end, we propose using an efficient symmetric splitting integrator in mSGNHT, in place of the traditional Euler integrator. We demonstrate that the proposed scheme is more accurate, more robust, and converges faster, properties that are desirable in Bayesian deep learning. Extensive experiments on two canonical models and their deep extensions show that the proposed scheme improves general Bayesian posterior sampling, particularly for deep models.
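To make the integrator comparison concrete, below is a minimal sketch of an mSGNHT sampler using a symmetric (ABOBA-style) splitting, in the spirit of the scheme described in the abstract. The target `U`, the step size, the diffusion constant `D`, and the splitting order are illustrative assumptions, not the paper's exact algorithm; `grad_U` adds Gaussian noise to mimic a stochastic minibatch gradient.

```python
import numpy as np

def grad_U(theta, rng, noise_std=1.0):
    # Gradient of U(theta) = theta^2 / 2 (standard-normal target),
    # perturbed with noise to mimic a stochastic minibatch gradient.
    return theta + noise_std * rng.standard_normal(theta.shape)

def msgnht_splitting(n_steps, h, D=1.0, seed=0):
    """mSGNHT with a symmetric splitting integrator (illustrative sketch).

    The diffusion is split into three sub-flows:
      A: update parameter theta and per-parameter thermostat xi,
      B: damp the momentum p by the thermostat (solved exactly),
      O: inject the stochastic gradient and diffusion noise,
    composed symmetrically as A(h/2) B(h/2) O(h) B(h/2) A(h/2)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(1)
    p = np.zeros(1)
    xi = np.ones(1) * D          # one thermostat per parameter
    samples = []
    for _ in range(n_steps):
        # A half-step: theta and thermostat
        theta += p * h / 2
        xi += (p * p - 1.0) * h / 2
        # B half-step: exact solution of dp = -xi * p dt
        p *= np.exp(-xi * h / 2)
        # O full step: stochastic gradient plus injected noise
        p += -grad_U(theta, rng) * h \
             + np.sqrt(2.0 * D * h) * rng.standard_normal(1)
        # Mirror: B half-step, then A half-step
        p *= np.exp(-xi * h / 2)
        xi += (p * p - 1.0) * h / 2
        theta += p * h / 2
        samples.append(theta.copy())
    return np.concatenate(samples)

samples = msgnht_splitting(n_steps=20000, h=0.1)
```

A plain Euler integrator would instead apply all three updates once per step with full step size `h`; the symmetric composition above is what yields the higher-order local error that the paper exploits. For the standard-normal target here, the post-burn-in sample mean and variance should be close to 0 and 1, respectively.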


