Asynchronous and Distributed Data Augmentation for Massive Data Settings

09/18/2021
by   Jiayuan Zhou, et al.

Data augmentation (DA) algorithms are widely used for Bayesian inference due to their simplicity. In massive data settings, however, DA algorithms are prohibitively slow because they pass through the full data in every iteration, which severely restricts their use despite their advantages. To address this problem, we develop a framework that extends any DA algorithm by exploiting asynchronous and distributed computing. The extended DA algorithm is indexed by a parameter r ∈ (0, 1) and is called Asynchronous and Distributed (AD) DA, with the original DA as its parent. Any ADDA algorithm starts by dividing the full data into k smaller disjoint subsets and storing them on k processes, which could be machines or processors. Every iteration of ADDA augments only an r-fraction of the k data subsets with some positive probability and leaves the remaining (1-r)-fraction of the augmented data unchanged. The parameter draws are obtained using the r-fraction of new and the (1-r)-fraction of old augmented data. For many choices of k and r, the fractional updates of ADDA lead to a significant speed-up over the parent DA in massive data settings, and ADDA reduces to the distributed version of its parent DA when r=1. We show that the ADDA Markov chain is Harris ergodic with the desired stationary distribution under mild conditions on the parent DA algorithm. We demonstrate the numerical advantages of ADDA in three representative examples corresponding to different kinds of massive data settings encountered in applications. In all these examples, our DA generalization is significantly faster than its parent DA algorithm for all choices of k and r. We also establish geometric ergodicity of the ADDA Markov chain for all three examples, which in turn yields asymptotically valid standard errors for estimates of desired posterior quantities.
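The fractional update described in the abstract can be illustrated with a small, single-process simulation. The sketch below is not the authors' implementation: the model (a Student-t location model with known degrees of freedom `nu` and a flat prior on the location `mu`), the function names `adda_t_location` and `_augment`, and the rule of refreshing a uniformly random r-fraction of the subsets in each iteration are all assumptions made for illustration. In a real deployment the k subsets would live on separate processes and the refreshed subsets would be whichever workers respond first.

```python
import numpy as np


def _augment(y_sub, mu, nu, rng):
    """Draw latent precision weights for one subset; return its sufficient statistics.

    Assumed model: y_i | w_i ~ N(mu, 1 / w_i), w_i ~ Gamma(nu / 2, rate = nu / 2),
    so w_i | y_i, mu ~ Gamma((nu + 1) / 2, rate = (nu + (y_i - mu)^2) / 2).
    NumPy's gamma takes a scale, which is 1 / rate.
    """
    w = rng.gamma((nu + 1.0) / 2.0, 2.0 / (nu + (y_sub - mu) ** 2))
    return w.sum(), (w * y_sub).sum()


def adda_t_location(y, k=4, r=0.5, nu=4.0, n_iter=500, seed=0):
    """Toy ADDA-style sampler (hypothetical): refresh about r * k subsets per iteration."""
    rng = np.random.default_rng(seed)
    subsets = np.array_split(np.asarray(y, dtype=float), k)  # k disjoint subsets
    m = min(k, max(1, int(np.ceil(r * k))))  # number of subsets refreshed per iteration
    mu = float(np.median(y))

    # Initial augmentation on every subset; row j holds (sum w, sum w * y) for subset j.
    stats = np.array([_augment(s, mu, nu, rng) for s in subsets])

    draws = np.empty(n_iter)
    for t in range(n_iter):
        # Assumption: the refreshed subsets are chosen uniformly at random.
        for j in rng.choice(k, size=m, replace=False):
            stats[j] = _augment(subsets[j], mu, nu, rng)
        # Parameter draw combines new and old augmented statistics from all subsets.
        sw, swy = stats.sum(axis=0)
        mu = rng.normal(swy / sw, 1.0 / np.sqrt(sw))
        draws[t] = mu
    return draws


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = 3.0 + rng.standard_t(4.0, size=2000)
    draws = adda_t_location(y, k=4, r=0.5)
    print(draws[100:].mean())
```

With `r=1.0` every subset is refreshed in every iteration, which in this sketch corresponds to the distributed version of the parent DA. Smaller values of `r` reduce the per-iteration augmentation cost at the price of reusing older augmented data.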


