Dimension free ridge regression

10/16/2022
by Chen Cheng, et al.

Random matrix theory has become a widely useful tool in high-dimensional statistics and theoretical machine learning. However, random matrix theory is largely focused on the proportional asymptotics, in which the number of columns grows proportionally to the number of rows of the data matrix. This is not always the most natural setting in statistics, where columns correspond to covariates and rows to samples. With the objective of moving beyond the proportional asymptotics, we revisit ridge regression (ℓ_2-penalized least squares) on i.i.d. data (x_i, y_i), i ≤ n, where x_i is a feature vector and y_i = β^⊤ x_i + ϵ_i ∈ ℝ is a response. We allow the feature vector to be high-dimensional, or even infinite-dimensional, in which case it belongs to a separable Hilbert space, and assume that z_i := Σ^{-1/2} x_i either has i.i.d. entries or satisfies a certain convex concentration property. Within this setting, we establish non-asymptotic bounds that approximate the bias and variance of ridge regression in terms of the bias and variance of an 'equivalent' sequence model (a regression model with diagonal design matrix). The approximation holds up to multiplicative factors bounded by (1 ± Δ) for some explicitly small Δ. Previously, such an approximation result was known only in the proportional regime and only up to additive errors: in particular, it did not allow one to characterize the behavior of the excess risk when this risk converges to 0. Our general theory recovers earlier results in the proportional regime (with better error rates). As a new application, we obtain a completely explicit and sharp characterization of ridge regression for Hilbert covariates with regularly varying spectrum. Finally, we analyze the overparametrized near-interpolation setting and obtain sharp 'benign overfitting' guarantees.
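For intuition, here is a minimal simulation sketch (not code from the paper) comparing the empirical excess risk of ridge regression with the bias-plus-variance prediction of an equivalent sequence model. It uses the standard fixed-point (deterministic-equivalent) formulas from the random-matrix literature; the helper name `effective_lambda`, the quantities `bias_eq` and `var_eq`, and the particular spectrum and signal are illustrative choices, not the paper's.

```python
import numpy as np

# Minimal simulation sketch (illustrative, not the paper's code): compare the
# empirical excess risk of ridge regression with the bias/variance of an
# "equivalent" sequence model, computed from the standard fixed-point
# (deterministic-equivalent) formulas of the random-matrix literature.

rng = np.random.default_rng(0)
n, p, lam, sigma2 = 400, 800, 1e-2, 1.0

# Diagonal covariance with a slowly decaying (regularly varying) spectrum.
eigs = np.arange(1, p + 1, dtype=float) ** (-1.0)
beta = rng.normal(size=p)

def effective_lambda(lam, eigs, n, iters=500):
    """Solve lam = lam_star * (1 - tr(Sigma (Sigma + lam_star I)^{-1}) / n)."""
    lam_star = lam + eigs.sum() / n               # start above the fixed point
    for _ in range(iters):
        lam_star = lam + lam_star * np.sum(eigs / (eigs + lam_star)) / n
    return lam_star

lam_star = effective_lambda(lam, eigs, n)
chi = np.sum(eigs**2 / (eigs + lam_star) ** 2) / n    # degrees-of-freedom-type ratio
bias_eq = lam_star**2 * np.sum(eigs * beta**2 / (eigs + lam_star) ** 2) / (1 - chi)
var_eq = sigma2 * chi / (1 - chi)

# Monte Carlo estimate of the excess risk E[(x^T (beta_hat - beta))^2].
risks = []
for _ in range(20):
    X = rng.normal(size=(n, p)) * np.sqrt(eigs)        # rows x_i = Sigma^{1/2} z_i
    y = X @ beta + np.sqrt(sigma2) * rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)
    risks.append(np.sum(eigs * (beta_hat - beta) ** 2))

print(f"equivalent sequence model (bias + variance): {bias_eq + var_eq:.4f}")
print(f"empirical excess risk of ridge regression:   {np.mean(risks):.4f}")
```

With this kind of decaying spectrum, the two printed quantities should agree up to a small multiplicative factor, in the spirit of the (1 ± Δ) approximation described in the abstract.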

