Scalable subsampling: computation, aggregation and inference

12/13/2021
by   Dimitris N. Politis, et al.

Subsampling is a general statistical method, developed in the 1990s, for estimating the sampling distribution of a statistic θ̂_n in order to conduct nonparametric inference such as the construction of confidence intervals and hypothesis tests. Subsampling has seen a resurgence in the Big Data era, where the standard bootstrap with full-size resamples can be infeasible to compute. Nevertheless, even choosing a single random subsample of size b can be computationally challenging when both b and the sample size n are very large. In this paper, we show how a set of appropriately chosen, non-random subsamples can be used to conduct effective, and computationally feasible, distribution estimation via subsampling. Further, we show how the same set of subsamples can be used to yield a procedure for subsampling aggregation, also known as subagging, that is scalable with big data. Interestingly, the scalable subagging estimator can be tuned to have the same (or better) rate of convergence as compared to θ̂_n. The paper concludes by showing how to conduct inference, e.g., construct confidence intervals, based on the scalable subagging estimator instead of the original θ̂_n.
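The abstract does not say how the non-random subsamples are chosen, so the following is only a minimal sketch under a stated assumption: the subsamples are consecutive blocks of length b whose starting points are h observations apart (h = b gives non-overlapping blocks). The function names `block_subsamples` and `subagging_estimate`, the spacing parameter `h`, and the choice of the sample mean as the statistic are all hypothetical and serve illustration only.

```python
import numpy as np


def block_subsamples(x, b, h):
    """Return deterministic blocks of length b whose starts are h apart.

    Assumption (not taken from the abstract): the "appropriately chosen,
    non-random subsamples" are modelled here as consecutive blocks.
    """
    n = len(x)
    if not (1 <= b <= n) or h < 1:
        raise ValueError("need 1 <= b <= n and h >= 1")
    q = (n - b) // h + 1  # number of blocks that fit inside the sample
    return [x[j * h : j * h + b] for j in range(q)]


def subagging_estimate(x, b, h, stat=np.mean):
    """Average a statistic over the blocks (subsampling aggregation).

    Returns the aggregated value and the per-block values; the latter
    could also feed a subsampling estimate of the sampling distribution.
    """
    values = np.array([stat(block) for block in block_subsamples(x, b, h)])
    return values.mean(), values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(loc=1.0, scale=2.0, size=100_000)
    theta_bar, per_block = subagging_estimate(data, b=1_000, h=1_000)
    print(len(per_block), theta_bar)
```

Because each block is a contiguous slice, no random index generation is required and the statistic is only ever evaluated on b observations at a time, which is the kind of cost saving the abstract alludes to. How b and h should be tuned to match the convergence rate of θ̂_n is not covered by this sketch.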


research
03/31/2023

Confidence intervals in monotone regression

We construct bootstrap confidence intervals for a monotone regression fu...
research
04/10/2019

On the construction of confidence intervals for ratios of expectations

In econometrics, many parameters of interest can be written as ratios of...
research
11/06/2017

An Iterative Scheme for Leverage-based Approximate Aggregation

Currently data explosion poses great challenges to approximate aggregati...
research
11/13/2018

Quantile regression approach to conditional mode estimation

In this paper, we consider estimation of the conditional mode of an outc...
research
08/03/2018

Monotone function estimator and its application

In this paper, the model Y_i=g(Z_i), i=1,2,...,n with Z_i being random v...
research
04/25/2014

Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests

This work develops formal statistical inference procedures for machine l...
research
11/06/2022

Confidence Intervals for Unobserved Events

Consider a finite sample from an unknown distribution over a countable a...
