Poisson Subsampling Algorithms for Large Sample Linear Regression in Massive Data

09/07/2015
by Rong Zhu, et al.

Large sample sizes create a computational bottleneck in modern data analysis. Subsampling is one efficient strategy for handling this problem. Previous studies have focused more on subsampling with replacement (SSR) than on subsampling without replacement (SSWR). In this paper we investigate a kind of SSWR, Poisson subsampling (PSS), as a fast algorithm for the ordinary least-squares problem. We establish a non-asymptotic property, i.e., an error bound for the corresponding subsample estimator, which provides a tradeoff between computation cost and approximation efficiency. Besides the non-asymptotic result, we provide asymptotic consistency and normality of the subsample estimator. Methodologically, we propose a two-step subsampling algorithm, which is efficient with respect to a statistical objective and independent of the linear model assumption. Synthetic and real data are used to empirically study our proposed subsampling strategies. We argue from these empirical studies that (1) our proposed two-step algorithm has a clear advantage when the assumed linear model is not accurate, and (2) the PSS strategy performs clearly better than SSR as the subsampling ratio increases.
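To make the idea of Poisson subsampling for least squares concrete, here is a minimal sketch in Python with NumPy. It is not the two-step algorithm proposed in the paper, whose details are not given in this abstract. It only illustrates the generic scheme: each row is kept independently with its own inclusion probability, and the kept rows are reweighted by the inverse of that probability before solving the least-squares problem. The function name `poisson_subsample_ols`, the uniform probabilities `r / n`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def poisson_subsample_ols(X, y, probs, rng):
    """Hypothetical helper: Poisson subsampling followed by weighted OLS.

    Row i is kept independently with probability probs[i], so the
    subsample size is random and no row is drawn twice.
    """
    keep = rng.random(X.shape[0]) < probs      # independent Bernoulli draws
    sw = np.sqrt(1.0 / probs[keep])            # sqrt of inverse-probability weights
    Xs = X[keep] * sw[:, None]                 # scale rows so lstsq solves weighted LS
    ys = y[keep] * sw
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta, int(keep.sum())


# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
n, d, r = 20000, 5, 2000                       # r is the expected subsample size
X = rng.normal(size=(n, d))
beta_true = np.arange(1, d + 1, dtype=float)
y = X @ beta_true + rng.normal(size=n)

# Uniform inclusion probabilities; non-uniform choices would go here instead.
probs = np.full(n, r / n)

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sub, m = poisson_subsample_ols(X, y, probs, rng)

print("rows kept:", m)
print("distance to full-data OLS:", np.linalg.norm(beta_sub - beta_full))
```

Unlike subsampling with replacement, the number of rows kept here is random, with expected value equal to the sum of the inclusion probabilities. Any non-uniform choice of `probs` can be passed in as long as every entry lies in (0, 1].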

research
02/03/2017

Optimal Subsampling for Large Sample Logistic Regression

For massive data, the family of subsampling algorithms is popular to dow...
research
09/17/2015

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle for analyzing large sample data is the lack of effe...
research
10/05/2021

Robust censored regression with l1-norm regularization

This paper considers inference in a linear regression model with random ...
research
03/02/2018

Gradient-based Sampling: An Adaptive Importance Sampling for Least-squares

In modern data analysis, random sampling is an efficient and widely-used...
research
10/14/2019

All of Linear Regression

Least squares linear regression is one of the oldest and widely used dat...
research
05/17/2022

Sampling with replacement vs Poisson sampling: a comparative study in optimal subsampling

Faced with massive data, subsampling is a commonly used technique to imp...
research
05/24/2023

Optimal subsampling for large scale Elastic-net regression

Datasets with sheer volume have been generated from fields including com...
