On the Optimal Weighted ℓ_2 Regularization in Overparameterized Linear Regression
We consider the linear model 𝐲 = 𝐗β_⋆ + ϵ with 𝐗∈ℝ^n×p in the overparameterized regime p > n. We estimate β_⋆ via generalized (weighted) ridge regression: β̂_λ = (𝐗^T𝐗 + λΣ_w)^†𝐗^T𝐲, where Σ_w is the weighting matrix. Assuming a random effects model with general data covariance Σ_x and an anisotropic prior on the true coefficients β_⋆, i.e., 𝔼β_⋆β_⋆^T = Σ_β, we provide an exact characterization of the prediction risk 𝔼(y − 𝐱^Tβ̂_λ)^2 in the proportional asymptotic limit p/n → γ ∈ (1,∞). Our general setup leads to a number of interesting findings. We outline precise conditions that determine the sign of the optimal setting λ_opt of the ridge parameter λ and confirm the implicit ℓ_2 regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that λ_opt can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when 𝐗 and β_⋆ are non-isotropic. Finally, we determine the optimal Σ_w for both the ridgeless (λ → 0) and optimally regularized (λ = λ_opt) cases, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.