Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction

10/02/2020
∙
by Wei Deng, et al.

Replica exchange stochastic gradient Langevin dynamics (reSGLD) has shown promise in accelerating convergence in non-convex learning; however, the excessively large correction needed to avoid bias from noisy energy estimators has limited the potential of the acceleration. To address this issue, we study variance reduction for noisy energy estimators, which promotes much more effective swaps. Theoretically, we provide a non-asymptotic analysis of the exponential acceleration for the underlying continuous-time Markov jump process; moreover, we consider a generalized Girsanov theorem that includes the change of Poisson measure to overcome the crude discretization based on Grönwall's inequality, which yields a much tighter error in the 2-Wasserstein (𝒲_2) distance. Numerically, we conduct extensive experiments and obtain state-of-the-art results in optimization and uncertainty estimation on synthetic experiments and image data.
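
As a rough illustration of why variance reduction helps the swaps, the sketch below compares a plain minibatch energy estimator with a control-variate estimator anchored at a snapshot parameter, and shows how a variance-proportional correction shrinks the swap probability. Everything in it is an assumption made for illustration and is not taken from the paper: the toy quadratic energy, the function names, the snapshot scheme, and the Gaussian-noise form of the correction (based on the identity E[exp(cX)] = exp(cμ + c²v/2) for X ~ N(μ, v)).

```python
import numpy as np

def per_example_energy(theta, x):
    # Hypothetical toy energy: half the squared distance to each data point.
    return 0.5 * (x - theta) ** 2

def naive_energy(theta, x, idx):
    # Plain minibatch estimator: rescale the minibatch sum to the full data size.
    return len(x) / len(idx) * per_example_energy(theta, x[idx]).sum()

def vr_energy(theta, snapshot, full_at_snapshot, x, idx):
    # Control-variate estimator: rescaled minibatch difference to a snapshot,
    # plus the exact full-data energy evaluated once at that snapshot.
    diff = per_example_energy(theta, x[idx]) - per_example_energy(snapshot, x[idx])
    return len(x) / len(idx) * diff.sum() + full_at_snapshot

def swap_probability(e_low, e_high, tau_low, tau_high, var_diff):
    # Assumed form: var_diff is the variance of the noisy estimate of
    # (e_low - e_high); subtracting c * var_diff / 2 removes the bias of
    # exp(c * noisy difference) under Gaussian noise.
    c = 1.0 / tau_low - 1.0 / tau_high
    log_ratio = c * (e_low - e_high - c * var_diff / 2.0)
    # Clamp at 0 before exponentiating so that min(1, exp(.)) cannot overflow.
    return float(np.exp(min(log_ratio, 0.0)))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
theta, snapshot = 0.30, 0.25
full_at_snapshot = per_example_energy(snapshot, x).sum()
batches = [rng.choice(len(x), size=50, replace=False) for _ in range(200)]
naive = [naive_energy(theta, x, b) for b in batches]
reduced = [vr_energy(theta, snapshot, full_at_snapshot, x, b) for b in batches]
print("naive variance:", np.var(naive))
print("variance-reduced:", np.var(reduced))
```

In this toy setting the control variate works because the current parameter stays close to the snapshot, so the per-example differences fluctuate far less than the energies themselves. A smaller `var_diff` then means a smaller correction inside `swap_probability`, so fewer useful swaps are rejected.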

∙ 08/12/2020

Non-convex Learning via Replica Exchange Stochastic Gradient MCMC

Replica exchange Monte Carlo (reMC), also known as parallel tempering, i...
∙ 02/13/2017

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of Sto...
∙ 03/18/2016

Katyusha: The First Direct Acceleration of Stochastic Gradient Methods

Nesterov's momentum trick is famously known for accelerating gradient de...
∙ 11/20/2018

Variance Reduction in Stochastic Particle-Optimization Sampling

Stochastic particle-optimization sampling (SPOS) is a recently-developed...
∙ 07/04/2020

Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion

Langevin diffusion is a powerful method for nonconvex optimization, whic...
∙ 10/16/2015

SGD with Variance Reduction beyond Empirical Risk Minimization

We introduce a doubly stochastic proximal gradient algorithm for optimiz...
∙ 08/18/2021

Geometry-informed irreversible perturbations for accelerated convergence of Langevin dynamics

We introduce a novel geometry-informed irreversible perturbation that ac...