Non-convex Learning via Replica Exchange Stochastic Gradient MCMC

08/12/2020
by Wei Deng, et al.

Replica exchange Monte Carlo (reMC), also known as parallel tempering, is an important technique for accelerating the convergence of conventional Markov chain Monte Carlo (MCMC) algorithms. However, the method requires evaluating the energy function on the full dataset and is therefore not scalable to big data. A naïve implementation of reMC in mini-batch settings introduces large biases, so the method cannot be directly extended to stochastic gradient MCMC (SGMCMC), the standard sampling framework for deep neural networks (DNNs). In this paper, we propose an adaptive replica exchange SGMCMC (reSGMCMC) that automatically corrects the bias, and we study its theoretical properties. The analysis reveals an acceleration-accuracy trade-off in the numerical discretization of a Markov jump process in a stochastic environment. Empirically, we test the algorithm through extensive experiments on various setups and obtain state-of-the-art results on CIFAR10, CIFAR100, and SVHN in both supervised and semi-supervised learning tasks.
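To make the idea concrete, the following is a minimal sketch of replica exchange Langevin dynamics on a toy double-well potential, with noisy energy estimates standing in for mini-batch losses. The variance-based correction subtracted inside the swap test is the key ingredient the abstract refers to; the potential `U`, the noise level `noise_std`, and all hyperparameter values here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def U(x):
    # Toy non-convex energy: a double-well potential standing in for a DNN loss.
    return (x**2 - 1.0)**2

def gradU(x):
    return 4.0 * x * (x**2 - 1.0)

def resgld(steps=20000, lr=1e-3, tau_low=0.1, tau_high=1.0,
           noise_std=0.5, swap_every=10):
    """Two Langevin chains at different temperatures with bias-corrected swaps.

    Gaussian noise is injected into gradients and energies to mimic the
    stochasticity of mini-batch estimates.
    """
    x_lo, x_hi = -1.0, 1.0
    dT = 1.0 / tau_low - 1.0 / tau_high      # inverse-temperature gap
    samples = np.empty(steps)
    for t in range(steps):
        # SGLD-style updates: noisy gradient step plus temperature-scaled noise.
        g_lo = gradU(x_lo) + noise_std * rng.standard_normal()
        g_hi = gradU(x_hi) + noise_std * rng.standard_normal()
        x_lo += -lr * g_lo + np.sqrt(2.0 * lr * tau_low) * rng.standard_normal()
        x_hi += -lr * g_hi + np.sqrt(2.0 * lr * tau_high) * rng.standard_normal()
        if t % swap_every == 0:
            # Noisy energy estimates, as one would get from a mini-batch.
            e_lo = U(x_lo) + noise_std * rng.standard_normal()
            e_hi = U(x_hi) + noise_std * rng.standard_normal()
            # Swap test with the variance correction that removes the
            # bias a naive mini-batch swap would introduce.
            log_swap = dT * (e_lo - e_hi - dT * noise_std**2 / 2.0)
            if np.log(rng.random()) < log_swap:
                x_lo, x_hi = x_hi, x_lo
        samples[t] = x_lo
    return samples
```

Without the `dT * noise_std**2 / 2` term, the stochastic energies would inflate the swap rate and bias the low-temperature chain; subtracting it restores (in expectation) the correct swap probability while keeping the acceleration from the high-temperature explorer.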


