On the Optimal Weighted ā„“_2 Regularization in Overparameterized Linear Regression

06/10/2020 āˆ™ by Denny Wu, et al.

We consider the linear model š² = š—Ī²_ā‹† + Ļµ with š— āˆˆ ā„^{nƗp} in the overparameterized regime p > n. We estimate Ī²_ā‹† via generalized (weighted) ridge regression: Ī²Ģ‚_Ī» = (š—^Tš— + Ī»Ī£_w)^ā€ š—^Tš², where Ī£_w is the weighting matrix. Assuming a random effects model with general data covariance Ī£_x and an anisotropic prior on the true coefficients Ī²_ā‹†, i.e., š”¼Ī²_ā‹†Ī²_ā‹†^T = Ī£_Ī², we provide an exact characterization of the prediction risk š”¼(y - š±^TĪ²Ģ‚_Ī»)^2 in the proportional asymptotic limit p/n ā†’ Ī³ āˆˆ (1,āˆž). Our general setup leads to a number of interesting findings. We outline precise conditions that determine the sign of the optimal setting Ī»_opt of the ridge parameter Ī», and confirm the implicit ā„“_2 regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that Ī»_opt can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when š— and Ī²_ā‹† are non-isotropic. Finally, we determine the optimal Ī£_w for both the ridgeless (Ī» ā†’ 0) and optimally regularized (Ī» = Ī»_opt) cases, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
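A minimal NumPy sketch of the weighted ridge estimator from the abstract, Ī²Ģ‚_Ī» = (š—^Tš— + Ī»Ī£_w)^ā€ š—^Tš², with a Monte Carlo estimate of the prediction risk š”¼(y - š±^TĪ²Ģ‚_Ī»)^2. The data-generating choices below (n, p, noise level, Gaussian design, isotropic Ī£_w) are illustrative assumptions, not the paper's experimental setup; the pseudo-inverse handles the ridgeless limit and negative Ī», where š—^Tš— + Ī»Ī£_w can be singular when p > n.

    import numpy as np

    def weighted_ridge(X, y, lam, Sigma_w):
        """Generalized (weighted) ridge estimator:
        beta_hat = (X^T X + lam * Sigma_w)^+ X^T y."""
        A = X.T @ X + lam * Sigma_w
        return np.linalg.pinv(A) @ (X.T @ y)

    # Toy overparameterized instance (p > n); parameters are illustrative.
    rng = np.random.default_rng(0)
    n, p, sigma = 50, 200, 0.1
    X = rng.standard_normal((n, p))
    beta_star = rng.standard_normal(p) / np.sqrt(p)
    y = X @ beta_star + sigma * rng.standard_normal(n)

    Sigma_w = np.eye(p)  # identity weighting recovers standard ridge
    beta_hat = weighted_ridge(X, y, lam=0.5, Sigma_w=Sigma_w)

    # Monte Carlo estimate of the out-of-sample risk E(y - x^T beta_hat)^2.
    X_test = rng.standard_normal((10_000, p))
    y_test = X_test @ beta_star + sigma * rng.standard_normal(10_000)
    risk = np.mean((y_test - X_test @ beta_hat) ** 2)
    print(f"estimated prediction risk: {risk:.4f}")

Choosing Ī£_w ≠ I (e.g., aligned with Ī£_x or Ī£_Ī²) is exactly the degree of freedom the paper optimizes over; sweeping lam across negative values in this sketch illustrates the regime where Ī»_opt < 0 can occur.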
