Non-reversible Parallel Tempering for Deep Posterior Approximation

11/20/2022
by Wei Deng, et al.

Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulating multi-modal distributions. The key to the success of PT is an efficient swap scheme. The popular deterministic even-odd (DEO) scheme exploits non-reversibility and reduces the communication cost from O(P^2) to O(P), given a sufficiently large number of chains P. In big-data settings, however, this advantage largely disappears, because only a limited number of chains is affordable and few bias-corrected swaps occur. To address this issue, we generalize the DEO scheme to promote non-reversibility and propose several solutions to tackle the underlying bias caused by the geometric stopping time. Notably, in big-data scenarios we obtain an appealing O(P log P) communication cost based on the optimal window size. In addition, we adopt stochastic gradient descent (SGD) with large, constant learning rates as exploration kernels. This user-friendly design allows us to approximate complex posteriors without much tuning cost.
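To make the swap mechanism concrete, here is a minimal sketch of parallel tempering with a DEO-style schedule on a toy bimodal density. This is an illustration of the general DEO idea (even iterations attempt swaps between pairs (0,1), (2,3), ...; odd iterations attempt (1,2), (3,4), ...), not the paper's full algorithm: the temperature ladder, target density, and random-walk exploration kernel below are all simplifying assumptions (the paper uses SGD-based exploration and bias corrections not shown here).

```python
import math
import random

def log_target(x):
    # Toy bimodal target: equal mixture of two unit-variance Gaussians at -3 and +3.
    return math.log(0.5 * math.exp(-0.5 * (x + 3.0) ** 2)
                    + 0.5 * math.exp(-0.5 * (x - 3.0) ** 2))

def deo_parallel_tempering(n_iters=2000, betas=(1.0, 0.5, 0.25, 0.1), seed=0):
    """Random-walk Metropolis chains at inverse temperatures `betas`,
    coupled by deterministic even-odd (DEO) swap rounds."""
    rng = random.Random(seed)
    P = len(betas)
    xs = [0.0] * P            # current state of each tempered chain
    cold_samples = []         # samples from the beta = 1 (target) chain
    for t in range(n_iters):
        # Exploration step: one Metropolis move per chain.
        for p in range(P):
            prop = xs[p] + rng.gauss(0.0, 1.0)
            log_acc = betas[p] * (log_target(prop) - log_target(xs[p]))
            if math.log(rng.random()) < log_acc:
                xs[p] = prop
        # DEO round: alternate deterministically between even and odd pairs.
        start = t % 2
        for p in range(start, P - 1, 2):
            # Metropolis ratio for exchanging states between chains p and p+1.
            log_ratio = (betas[p] - betas[p + 1]) * (
                log_target(xs[p + 1]) - log_target(xs[p]))
            if math.log(rng.random()) < log_ratio:
                xs[p], xs[p + 1] = xs[p + 1], xs[p]
        cold_samples.append(xs[0])
    return cold_samples
```

Because each round attempts only one fixed parity of adjacent pairs, accepted swaps tend to carry a state monotonically up or down the ladder instead of diffusing back and forth, which is the non-reversibility that DEO exploits.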


Related research

05/30/2023  Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo
04/13/2017  Stochastic Gradient Descent as Approximate Bayesian Inference
05/09/2019  On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization
09/12/2018  On Markov Chain Gradient Descent
06/20/2023  Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent
04/28/2021  NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization
