
Optimal Subsampling for Large Sample Logistic Regression
For massive data, the family of subsampling algorithms is popular to dow...

Modern Subsampling Methods for Large-Scale Least Squares Regression
Subsampling methods aim to select a subsample as a surrogate for the obs...

Poisson Subsampling Algorithms for Large Sample Linear Regression in Massive Data
Large sample size brings the computation bottleneck for modern data anal...

Scalar-on-function local linear regression and beyond
Regressing a scalar response on a random function is nowadays a common s...

Best Linear Predictor with Missing Response: Locally Robust Approach
This paper provides asymptotic theory for Inverse Probability Weighing (...

Least Squares Approximation for a Distributed System
In this work we develop a distributed least squares approximation (DLSA)...

Kernel Selection for Modal Linear Regression: Optimal Kernel and IRLS Algorithm
Modal linear regression (MLR) is a method for obtaining a conditional mo...

Optimal Subsampling Approaches for Large Sample Linear Regression
A significant hurdle in analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging and powerful approach is subsampling: one draws a random subsample from the original full sample and uses it as a surrogate for subsequent computation and estimation. In this paper, we study subsampling methods under two scenarios: approximating the full-sample ordinary least-squares (OLS) estimator and estimating the coefficients in linear regression. We present two algorithms, a weighted estimation algorithm and an unweighted estimation algorithm, and analyze the asymptotic behavior of the resulting subsample estimators under general conditions. For the weighted estimation algorithm, we use these asymptotic results to propose a criterion for selecting the optimal sampling probabilities. Based on this criterion, we provide two novel subsampling methods: optimal subsampling and predictor-length subsampling. The predictor-length subsampling method is based on the L2 norm of the predictors rather than on leverage scores; because these norms can be computed in a single pass over the data, whereas exact leverage scores require a matrix decomposition, its computational cost is scalable. For the unweighted estimation algorithm, we show that the resulting subsample estimator is not consistent for the full-sample OLS estimator; however, it performs better than the weighted estimation algorithm for estimating the regression coefficients. Simulation studies and a real data example demonstrate the effectiveness of the proposed subsampling methods.
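The two estimators contrasted in the abstract can be illustrated with a short Python/NumPy sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes predictor-length probabilities proportional to each row's Euclidean norm, sampling with replacement, and inverse-probability weights 1/(r·π_i) for the weighted algorithm. The function names `predictor_length_probs` and `subsample_ols` and the simulated data are hypothetical.

```python
import numpy as np


def predictor_length_probs(X):
    """Sampling probabilities proportional to the L2 norm of each row of X.

    Assumption: pi_i = ||x_i||_2 / sum_j ||x_j||_2. This needs only one pass
    over X (O(n p) work), unlike exact leverage scores, which require a
    QR or singular value decomposition of X.
    """
    norms = np.linalg.norm(X, axis=1)
    return norms / norms.sum()


def subsample_ols(X, y, probs, r, weighted=True, rng=None):
    """Least-squares fit on r rows drawn with replacement according to probs.

    weighted=True:  each drawn row i gets weight 1 / (r * pi_i), so the
                    weighted cross-products are unbiased for X^T X and X^T y.
    weighted=False: plain OLS on the drawn rows (no reweighting).
    """
    rng = np.random.default_rng(rng)
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)
    Xs, ys = X[idx], y[idx]
    if weighted:
        # Scaling each row by sqrt(weight) turns weighted least squares
        # into an ordinary least-squares problem.
        sw = np.sqrt(1.0 / (r * probs[idx]))
        Xs, ys = Xs * sw[:, None], ys * sw
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta


if __name__ == "__main__":
    # Hypothetical simulated data: n = 100,000 rows, p = 5 predictors.
    rng = np.random.default_rng(0)
    n, p = 100_000, 5
    X = rng.standard_normal((n, p))
    beta_true = np.arange(1.0, p + 1.0)
    y = X @ beta_true + rng.standard_normal(n)

    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)  # full-sample OLS
    probs = predictor_length_probs(X)
    beta_w = subsample_ols(X, y, probs, r=2_000, weighted=True, rng=1)
    beta_u = subsample_ols(X, y, probs, r=2_000, weighted=False, rng=1)
    print("weighted   - full OLS:", np.abs(beta_w - beta_full).max())
    print("unweighted - full OLS:", np.abs(beta_u - beta_full).max())
```

The weights undo the non-uniform sampling, which is why the weighted fit targets the full-sample OLS estimator; the unweighted fit drops that correction and is simply OLS on a design-dependent subsample.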