ReBoot: Distributed statistical learning via refitting Bootstrap samples

07/19/2022
by   Yumeng Wang, et al.

In this paper, we study a one-shot distributed learning algorithm via refitting bootstrap samples, which we refer to as ReBoot. Given the local models fit on multiple independent subsamples, ReBoot refits a new model on the union of bootstrap samples drawn from these local models. The whole procedure requires only one round of communication of model parameters. Theoretically, we analyze the statistical rate of ReBoot for generalized linear models (GLMs) and noisy phase retrieval, which represent convex and non-convex problems, respectively. In both cases, ReBoot provably achieves the full-sample statistical rate whenever the subsample size is not too small. In particular, we show that the systematic bias of ReBoot, i.e., the error that is independent of the number of subsamples, is O(n^{-2}) in GLMs, where n is the subsample size. This rate is sharper than that of model parameter averaging and its variants, implying that ReBoot tolerates more data splits while still maintaining the full-sample rate. Simulation studies demonstrate the statistical advantage of ReBoot over competing methods, including averaging and CSL (Communication-efficient Surrogate Likelihood) with up to two rounds of gradient communication. Finally, we propose FedReBoot, an iterative version of ReBoot, to aggregate convolutional neural networks for image classification; it exhibits substantial superiority over FedAve within the early rounds of communication.
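The one-shot procedure described above can be sketched in a few lines. The following is a minimal illustration for a Gaussian linear model (the simplest GLM), assuming the design distribution is known at the aggregation center so that bootstrap covariates can be regenerated; the function names (fit_ols, reboot) and all parameter choices are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ols(X, y):
    # Least-squares fit of a Gaussian linear model (stand-in for a GLM fit)
    return np.linalg.lstsq(X, y, rcond=None)[0]

def reboot(local_betas, n_boot, d, noise_sd=1.0, rng=rng):
    # ReBoot aggregation: draw a bootstrap sample from EACH local model,
    # then refit one global model on the union of those samples.
    Xs, ys = [], []
    for beta in local_betas:
        Xb = rng.standard_normal((n_boot, d))   # assumed-known design law
        yb = Xb @ beta + noise_sd * rng.standard_normal(n_boot)
        Xs.append(Xb)
        ys.append(yb)
    return fit_ols(np.vstack(Xs), np.concatenate(ys))

# Toy run: K machines, each holding n observations of a d-dimensional model
d, K, n = 5, 10, 200
beta_true = np.arange(1.0, d + 1)
local_betas = []
for _ in range(K):
    X = rng.standard_normal((n, d))
    y = X @ beta_true + rng.standard_normal(n)
    local_betas.append(fit_ols(X, y))   # only beta_k is communicated

beta_reboot = reboot(local_betas, n_boot=n, d=d)
print(np.round(beta_reboot, 2))
```

Note that each machine ships only its fitted parameter vector (one round of communication); the raw data never leave the machines, and the refit sees the local models only through the synthetic samples they generate.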


Related research

- 07/04/2016 · Bootstrap Model Aggregation for Distributed Statistical Learning — In distributed, or privacy-preserving learning, we are often given a set...
- 10/25/2018 · Quantum Advantage for the LOCAL Model in Distributed Computing — There are two central models considered in (fault-free synchronous) dist...
- 02/28/2021 · Communication-efficient Byzantine-robust distributed learning with statistical guarantee — Communication efficiency and robustness are two major issues in modern d...
- 01/17/2020 · Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates — Distributed statistical inference has recently attracted immense attenti...
- 03/05/2018 · Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates — In large-scale distributed learning, security issues have become increas...
- 01/23/2022 · Distributed Learning of Generalized Linear Causal Networks — We consider the task of learning causal structures from data stored on m...
- 02/19/2021 · Distributed Bootstrap for Simultaneous Inference Under High Dimensionality — We propose a distributed bootstrap method for simultaneous inference on ...
