
How many variables should be entered in a principal component regression equation?

by   Ji Xu, et al.
Columbia University

We study least squares linear regression over N uncorrelated Gaussian features that are selected in order of decreasing variance. When the number of selected features p is at most the sample size n, the estimator under consideration coincides with the principal component regression estimator; when p > n, the estimator is the least ℓ_2 norm solution over the selected features. We give an average-case analysis of the out-of-sample prediction error as p, n, N → ∞ with p/N → α and n/N → β, for some constants α ∈ [0,1] and β ∈ (0,1). In this average-case setting, the prediction error exhibits a "double descent" shape as a function of p.
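The estimator described above can be sketched numerically. The following is a minimal illustration (not the authors' code): features are drawn as uncorrelated Gaussians with decreasing variances, the top-p variance columns are selected, and `numpy.linalg.pinv` is used because it returns the ordinary least squares fit when p ≤ n and the least-ℓ_2-norm solution when p > n. All dimensions, the noise level, and the coefficient model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 30                            # total features, sample size (illustrative)
variances = np.linspace(2.0, 0.1, N)     # feature variances, in decreasing order
beta = rng.normal(size=N) / np.sqrt(N)   # assumed true coefficients

def prediction_error(p, n_test=2000):
    """Out-of-sample MSE of least-norm least squares over the top-p features."""
    X = rng.normal(size=(n, N)) * np.sqrt(variances)
    y = X @ beta + 0.1 * rng.normal(size=n)
    # Features are uncorrelated, so selecting the p highest-variance columns
    # coincides with principal component regression for p <= n.
    Xp = X[:, :p]
    # pinv: OLS solution for p <= n, minimum-l2-norm solution for p > n.
    theta = np.linalg.pinv(Xp) @ y
    X_test = rng.normal(size=(n_test, N)) * np.sqrt(variances)
    y_test = X_test @ beta + 0.1 * rng.normal(size=n_test)
    return np.mean((X_test[:, :p] @ theta - y_test) ** 2)

# Sweep p across the under- and over-parameterized regimes (p < n and p > n).
errors = {p: prediction_error(p) for p in (5, 20, 30, 45)}
```

Plotting `errors` over a fine grid of p is the kind of experiment in which the double-descent shape of the prediction error appears.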



