DeepAI AI Chat
Log In Sign Up

Modern Subsampling Methods for Large-Scale Least Squares Regression

by   Tao Li, et al.

Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model prediction. This review presents some cutting-edge subsampling methods based on the large-scale least squares estimation. Two major families of subsampling methods are introduced, respectively, the randomized subsampling approach and the optimal subsampling approach. The former aims to develop a more effective data-dependent sampling probability, while the latter aims to select a deterministic subsample in accordance with certain optimality criteria. Real data examples are provided to compare these methods empirically, respecting both the estimation accuracy and the computing time.


page 1

page 2

page 3

page 4


Statistical learning methods for neuroimaging data analysis with applications

The aim of this paper is to provide a comprehensive review of statistica...

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle for analyzing large sample data is the lack of effe...

Distributed nonparametric regression imputation for missing response problems with large-scale data

Nonparametric regression imputation is commonly used in missing data ana...

Surrogate-based Autotuning for Randomized Sketching Algorithms in Regression Problems

Algorithms from Randomized Numerical Linear Algebra (RandNLA) are known ...

A nonparametric regression approach to asymptotically optimal estimation of normal means

Simultaneous estimation of multiple parameters has received a great deal...

Accelerating Certifiable Estimation with Preconditioned Eigensolvers

Convex (specifically semidefinite) relaxation provides a powerful approa...

Large-scale subspace clustering using sketching and validation

The nowadays massive amounts of generated and communicated data present ...