
The Big Data Bootstrap

06/27/2012
by Ariel Kleiner, et al.

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. As an alternative, we present the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality. BLB is well suited to modern parallel and distributed computing architectures and retains the generic applicability, statistical efficiency, and favorable theoretical properties of the bootstrap. We provide the results of an extensive empirical and theoretical investigation of BLB's behavior, including a study of its statistical correctness, its large-scale implementation and performance, selection of hyperparameters, and performance on real data.
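The abstract does not spell out the procedure, so the following is only a minimal sketch of how BLB is commonly described: draw a few small subsamples without replacement, resample each one up to the full dataset size using multinomial counts, compute the quality measure within each subsample, and average the results. The function names (`blb_standard_error`, `weighted_mean`), the default values of `s`, `r`, and `gamma`, and the choice of the standard error as the quality measure are illustrative assumptions, not taken from the paper's text above.

```python
import numpy as np


def weighted_mean(values, counts):
    """Estimator that accepts resampling counts as weights (illustrative)."""
    return float(np.average(values, weights=counts))


def blb_standard_error(data, estimator, s=10, r=50, gamma=0.7, seed=0):
    """Sketch of a BLB-style standard-error estimate.

    s     : number of subsamples (assumed default)
    r     : resamples per subsample (assumed default)
    gamma : subsample size exponent, b = n ** gamma (assumed default)
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # each subsample is much smaller than n
    per_subsample = []
    for _ in range(s):
        # Subsampling step: b distinct points drawn without replacement.
        subsample = rng.choice(data, size=b, replace=False)
        estimates = []
        for _ in range(r):
            # Bootstrap step: a size-n resample of the subsample, stored
            # compactly as counts over its b distinct points.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates.append(estimator(subsample, counts))
        # Quality measure computed within this subsample.
        per_subsample.append(np.std(estimates, ddof=1))
    # Average the per-subsample quality measures.
    return float(np.mean(per_subsample))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.normal(0.0, 1.0, size=20000)
    print(blb_standard_error(sample, weighted_mean))
```

Because every resample is represented by counts over only `b` distinct points, each estimator call touches `b` values rather than `n`, and the outer loop over subsamples has no shared state, which is what makes this style of procedure a natural fit for parallel or distributed execution.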


12/21/2011

A Scalable Bootstrap for Massive Data

The bootstrap provides a simple and powerful means of assessing the qual...
02/14/2023

A Framework for Mediation Analysis with Massive Data

During the past few years, mediation analysis has gained increasing popu...
07/04/2016

Bootstrap Model Aggregation for Distributed Statistical Learning

In distributed, or privacy-preserving learning, we are often given a set...
04/09/2015

Robust, scalable and fast bootstrap method for analyzing large scale data

In this paper we address the problem of performing statistical inference...
02/15/2023

Optimal Subsampling Bootstrap for Massive Data

The bootstrap is a widely used procedure for statistical inference becau...
06/02/2020

Hyperparameter Selection for Subsampling Bootstraps

Massive data analysis becomes increasingly prevalent, subsampling method...
02/06/2023

A Fast Bootstrap Algorithm for Causal Inference with Large Data

Estimating causal effects from large experimental and observational data...