Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo

05/30/2023
by Wei Deng, et al.

The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification, which boils down to a non-convex Bayesian learning problem. A standard tool for this problem is Langevin Monte Carlo, which approximates the posterior distribution with theoretical guarantees. In this thesis, we start with replica exchange Langevin Monte Carlo (also known as parallel tempering), which swaps between an exploration chain and an exploitation chain to accelerate convergence. However, the naïve extension of swaps to big data problems leads to a large bias, so bias-corrected swaps are required. The correction, in turn, leaves few effective swaps and yields only insignificant acceleration. To alleviate this issue, we first propose a control variates method that reduces the variance of the noisy energy estimators and show its potential to accelerate the exponential convergence. We also present a population-chain replica exchange method based on non-reversibility and obtain an optimal round-trip rate for deep learning. In the second part of the thesis, we study scalable dynamic importance sampling algorithms based on stochastic approximation. Traditional dynamic importance sampling algorithms have achieved success; however, their lack of scalability has greatly limited their extension to big data. To handle this scalability issue, we resolve the vanishing gradient problem and propose two dynamic importance sampling algorithms. Theoretically, we establish the stability condition for the underlying ordinary differential equation (ODE) system and guarantee the asymptotic convergence of the latent variable to the desired fixed point. Interestingly, this result still holds for non-convex energy landscapes.
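
To make the swap mechanism in the abstract concrete, here is a minimal sketch of replica exchange Langevin Monte Carlo on a toy problem. It is an illustration under stated assumptions, not the thesis's algorithm: the one-dimensional double-well energy, the two temperatures, the step size, and all function names are hypothetical choices made for this example. It uses exact energies and gradients, so the bias correction and variance reduction needed for mini-batch (big data) settings do not arise, and it attempts a swap at every iteration for simplicity.

```python
import numpy as np


def energy(x):
    # Hypothetical non-convex double-well energy with minima at x = -1 and x = +1.
    return (x**2 - 1.0) ** 2


def grad_energy(x):
    # Exact gradient of the double-well energy above.
    return 4.0 * x * (x**2 - 1.0)


def langevin_step(x, temperature, step_size, rng):
    # One unadjusted Langevin update: gradient descent plus temperature-scaled noise.
    noise = rng.standard_normal()
    return x - step_size * grad_energy(x) + np.sqrt(2.0 * step_size * temperature) * noise


def swap_probability(x_low, x_high, temp_low, temp_high):
    # Metropolis-style swap probability for two chains at different temperatures,
    # computed from exact (noise-free) energies.
    log_ratio = (1.0 / temp_low - 1.0 / temp_high) * (energy(x_low) - energy(x_high))
    return float(np.exp(min(0.0, log_ratio)))


def replica_exchange_langevin(n_steps=5000, temp_low=0.1, temp_high=2.0,
                              step_size=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x_low, x_high = -1.0, 1.0  # low-temperature (exploitation) and high-temperature (exploration) chains
    samples = np.empty(n_steps)
    n_swaps = 0
    for t in range(n_steps):
        x_low = langevin_step(x_low, temp_low, step_size, rng)
        x_high = langevin_step(x_high, temp_high, step_size, rng)
        # Assumption: a swap is attempted at every iteration.
        if rng.random() < swap_probability(x_low, x_high, temp_low, temp_high):
            x_low, x_high = x_high, x_low
            n_swaps += 1
        samples[t] = x_low  # keep only the low-temperature chain as posterior samples
    return samples, n_swaps


if __name__ == "__main__":
    samples, n_swaps = replica_exchange_langevin()
    print("accepted swaps:", n_swaps)
    print("fraction of samples in the right-hand well:", np.mean(samples > 0))
```

The high-temperature chain crosses the energy barrier easily, and accepted swaps hand its position to the low-temperature chain, which is how the low-temperature samples can reach both modes. When the energies are replaced by noisy mini-batch estimates, this swap probability becomes biased, which is the issue the bias-corrected swaps and control variates in the abstract address.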

