The Big Data Bootstrap

06/27/2012
by Ariel Kleiner, et al.

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, computing bootstrap-based quantities can be prohibitively demanding. As an alternative, we present the Bag of Little Bootstraps (BLB), a new procedure that incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality. BLB is well suited to modern parallel and distributed computing architectures and retains the generic applicability, statistical efficiency, and favorable theoretical properties of the bootstrap. We present the results of an extensive empirical and theoretical investigation of BLB's behavior, including a study of its statistical correctness, its large-scale implementation and performance, the selection of its hyperparameters, and its performance on real data.
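
The combination of subsampling and bootstrapping described above can be illustrated with a minimal Python sketch. It assumes the estimator is the sample mean and the quality measure is its standard error. The function name `blb_stderr`, the parameters `s` (number of subsets), `r` (resamples per subset), and `gamma` (subset size b = n^gamma), and their default values are hypothetical choices made for this illustration, not settings taken from the abstract. Each subset holds only b distinct points, and every resample of nominal size n is stored as multinomial counts over those b points, so the inner loop never materializes n values.

```python
import numpy as np

def blb_stderr(data, s=10, r=50, gamma=0.7, seed=0):
    """Hypothetical sketch: Bag of Little Bootstraps estimate of the
    standard error of the sample mean (estimator and defaults are assumptions)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    b = int(n ** gamma)  # assumed subset size b = n^gamma, much smaller than n
    per_subset = []
    for _ in range(s):
        # Subsampling step: draw b distinct points without replacement.
        subset = rng.choice(data, size=b, replace=False)
        estimates = np.empty(r)
        for k in range(r):
            # Bootstrap step: a resample of size n from the subset, stored as
            # multinomial counts over the b points instead of n copied values.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates[k] = np.dot(counts, subset) / n  # weighted sample mean
        # Quality assessment on this subset: spread of the r estimates.
        per_subset.append(estimates.std(ddof=1))
    # Combine by averaging the per-subset quality assessments.
    return float(np.mean(per_subset))
```

For example, `blb_stderr(np.random.default_rng(1).normal(size=100_000))` should come out close to the textbook value 1/sqrt(100000), roughly 0.0032. Because the s outer iterations share no state, each subset could be sent to a separate worker, which is one way to read the abstract's remark about parallel and distributed architectures.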

research
12/21/2011

A Scalable Bootstrap for Massive Data

The bootstrap provides a simple and powerful means of assessing the qual...
research
02/14/2023

A Framework for Mediation Analysis with Massive Data

During the past few years, mediation analysis has gained increasing popu...
research
07/04/2016

Bootstrap Model Aggregation for Distributed Statistical Learning

In distributed, or privacy-preserving learning, we are often given a set...
research
04/09/2015

Robust, scalable and fast bootstrap method for analyzing large scale data

In this paper we address the problem of performing statistical inference...
research
02/15/2023

Optimal Subsampling Bootstrap for Massive Data

The bootstrap is a widely used procedure for statistical inference becau...
research
06/02/2020

Hyperparameter Selection for Subsampling Bootstraps

Massive data analysis becomes increasingly prevalent, subsampling method...
research
08/17/2022

Two-Stage Robust and Sparse Distributed Statistical Inference for Large-Scale Data

In this paper, we address the problem of conducting statistical inferenc...