On Principal Components Regression, Random Projections, and Column Subsampling

09/23/2017
by Martin Slawski, et al.

Principal Components Regression (PCR) is a traditional tool for dimension reduction in linear regression that has been both criticized and defended. One concern about PCR is that obtaining the leading principal components tends to be computationally demanding for large data sets. While random projections do not possess the optimality properties of the leading principal subspace, they are computationally appealing and hence have become increasingly popular in recent years. In this paper, we present an analysis showing that for random projections satisfying a Johnson-Lindenstrauss embedding property, the prediction error in subsequent regression is close to that of PCR, at the expense of requiring a slightly larger number of random projections than principal components. Column sub-sampling constitutes an even cheaper form of randomized dimension reduction, one that lies outside the class of Johnson-Lindenstrauss transforms. We provide numerical results based on synthetic and real data, as well as basic theory, revealing differences and commonalities among these approaches in terms of statistical performance.
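To make the three reduction schemes concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a Gaussian matrix as the Johnson-Lindenstrauss-type projection and uniform sampling of columns. The helper names (`reduced_least_squares`, `pcr_map`, `gaussian_map`, `column_subsample_map`) and the synthetic data are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def reduced_least_squares(X, y, R):
    """Least squares of y on the reduced design X @ R.

    Returns the coefficient vector mapped back to the original d-dimensional space.
    """
    Z = X @ R
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return R @ gamma

def pcr_map(X, k):
    """d x k matrix whose columns are the top-k right singular vectors of X (PCR)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T

def gaussian_map(d, k, rng):
    """d x k Gaussian random projection (one common Johnson-Lindenstrauss-type choice)."""
    return rng.standard_normal((d, k)) / np.sqrt(k)

def column_subsample_map(d, k, rng):
    """d x k selection matrix that picks k distinct columns uniformly at random."""
    cols = rng.choice(d, size=k, replace=False)
    R = np.zeros((d, k))
    R[cols, np.arange(k)] = 1.0
    return R

# Synthetic data (an assumption for illustration): columns with decaying scale.
rng = np.random.default_rng(0)
n, d, k = 200, 50, 10
X = rng.standard_normal((n, d)) * (1.0 / np.arange(1, d + 1))
beta = rng.standard_normal(d)
y = X @ beta + 0.1 * rng.standard_normal(n)

maps = {
    "pcr": pcr_map(X, k),
    "random_projection": gaussian_map(d, k, rng),
    "column_subsampling": column_subsample_map(d, k, rng),
}

# In-sample prediction error relative to the noiseless signal X @ beta.
errors = {}
for name, R in maps.items():
    beta_hat = reduced_least_squares(X, y, R)
    errors[name] = float(np.mean((X @ beta_hat - X @ beta) ** 2))

for name, err in errors.items():
    print(f"{name}: {err:.4f}")
```

All three schemes share the same regression step and differ only in the d x k reduction matrix, which is why the sketch factors that matrix out. Which scheme yields the smaller error depends on the data and on k, so the sketch prints the errors rather than asserting any ordering.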


research
02/02/2016

On the Nyström and Column-Sampling Methods for the Approximate Principal Components Analysis of Large Data Sets

In this paper we analyze approximate methods for undertaking a principal...
research
10/14/2018

A New Theory for Sketching in Linear Regression

Large datasets create opportunities as well as analytic challenges. A re...
research
05/01/2020

How to reduce dimension with PCA and random projections?

In our "big data" age, the size and complexity of data is steadily incre...
research
03/11/2021

Overlap of OLS Regression and Principal Loading Analysis

Principal loading analysis is a dimension reduction method that discards...
research
02/12/2020

On sufficient dimension reduction via principal asymmetric least squares

In this paper, we introduce principal asymmetric least squares (PALS) as...
research
06/04/2018

MOSES: A Streaming Algorithm for Linear Dimensionality Reduction

This paper introduces Memory-limited Online Subspace Estimation Scheme (...
research
09/30/2014

Unsupervised Bump Hunting Using Principal Components

Principal Components Analysis is a widely used technique for dimension r...
