Optimal Subsampling Approaches for Large Sample Linear Regression

09/17/2015
by Rong Zhu, et al.

A significant hurdle in analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging and powerful approach is subsampling, in which one takes a random subsample from the original full sample and uses it as a surrogate for subsequent computation and estimation. In this paper, we study subsampling methods under two scenarios: approximating the full sample ordinary least squares (OLS) estimator and estimating the coefficients in linear regression. We present two algorithms, a weighted estimation algorithm and an unweighted estimation algorithm, and analyze the asymptotic behavior of their resulting subsample estimators under general conditions. For the weighted estimation algorithm, we use the asymptotic results to propose a criterion for selecting the optimal sampling probability. On the basis of this criterion, we provide two novel subsampling methods: the optimal subsampling method and the predictor-length subsampling method. The predictor-length subsampling method is based on the L2 norm of the predictors rather than on leverage scores, and its computational cost is scalable. For the unweighted estimation algorithm, we show that its resulting subsample estimator is not consistent for the full sample OLS estimator. However, it performs better than the weighted estimation algorithm when the goal is estimating the coefficients. Simulation studies and a real data example demonstrate the effectiveness of the proposed subsampling methods.
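As a rough illustration of the weighted estimation idea with predictor-length sampling, the sketch below draws a subsample with probabilities proportional to the L2 norm of each predictor row and then solves a least squares problem weighted by the inverse sampling probabilities. The abstract does not give the exact algorithm, so the function name `predictor_length_subsample_ols`, the sampling-with-replacement scheme, the inverse-probability weights, and the simulated data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def predictor_length_subsample_ols(X, y, r, rng=None):
    """Hypothetical sketch: weighted least squares on a norm-based subsample.

    Assumes sampling probabilities p_i proportional to ||x_i||_2 and
    weights 1 / p_i; the paper's exact procedure may differ.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]

    # Sampling probabilities proportional to the L2 norm of each row.
    norms = np.linalg.norm(X, axis=1)
    probs = norms / norms.sum()

    # Draw r row indices with replacement according to probs.
    idx = rng.choice(n, size=r, replace=True, p=probs)

    # Inverse-probability weights; scaling rows by sqrt(w) turns the
    # weighted problem into an ordinary least squares problem.
    sqrt_w = np.sqrt(1.0 / probs[idx])
    beta, *_ = np.linalg.lstsq(
        X[idx] * sqrt_w[:, None], y[idx] * sqrt_w, rcond=None
    )
    return beta


# Simulated data (an assumption, not taken from the paper).
data_rng = np.random.default_rng(0)
n, d = 100_000, 5
X = data_rng.standard_normal((n, d))
beta_true = np.arange(1, d + 1, dtype=float)
y = X @ beta_true + data_rng.standard_normal(n)

# Full sample OLS versus the subsample estimate from 2,000 rows.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sub = predictor_length_subsample_ols(X, y, r=2_000, rng=1)
print("full sample OLS:", np.round(beta_full, 3))
print("subsample WLS:  ", np.round(beta_sub, 3))
```

Computing the row norms takes a single pass over the data, which is one plausible reading of why the abstract describes this method's cost as scalable compared with computing leverage scores.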

research
02/03/2017

Optimal Subsampling for Large Sample Logistic Regression

For massive data, the family of subsampling algorithms is popular to dow...
research
04/10/2022

Optimal Subsampling for Large Sample Ridge Regression

Subsampling is a popular approach to alleviating the computational burde...
research
09/07/2015

Poisson Subsampling Algorithms for Large Sample Linear Regression in Massive Data

Large sample size brings the computation bottleneck for modern data anal...
research
05/04/2021

Modern Subsampling Methods for Large-Scale Least Squares Regression

Subsampling methods aim to select a subsample as a surrogate for the obs...
research
08/14/2019

Least Squares Approximation for a Distributed System

In this work we develop a distributed least squares approximation (DLSA)...
research
10/20/2022

Iteratively Reweighted Least Squares Method for Estimating Polyserial and Polychoric Correlation Coefficients

An iteratively reweighted least squares (IRLS) method is proposed for es...
research
09/21/2021

A Model-free Variable Screening Method Based on Leverage Score

With rapid advances in information technology, massive datasets are coll...
