Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression

by William T. Stephenson et al.

Models like LASSO and ridge regression are extensively used in practice due to their interpretability, ease of use, and strong theoretical guarantees. Cross-validation (CV) is widely used for hyperparameter tuning in these models, but do practical optimization methods minimize the true out-of-sample loss? A recent line of research has shown that the optimum of the CV loss matches the optimum of the out-of-sample loss (possibly after simple corrections). It remains to show how tractable it is to minimize the CV loss. In the present paper, we show that, in the case of ridge regression, the CV loss may fail to be quasiconvex and thus may have multiple local optima. We can guarantee that the CV loss is quasiconvex in at least one case: when the spectrum of the covariate matrix is nearly flat and the noise in the observed responses is not too high. More generally, we show that quasiconvexity status is independent of many properties of the observed data (response norm, covariate-matrix right singular vectors, and singular-value scaling) and has a complex dependence on the few that remain. We empirically confirm our theory with simulated experiments.
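The CV loss the abstract refers to can be evaluated cheaply for ridge regression: leave-one-out CV has a closed form via the hat matrix H(λ) = X(XᵀX + λI)⁻¹Xᵀ, where the LOO residual for point i is (yᵢ − ŷᵢ)/(1 − Hᵢᵢ). The sketch below (with hypothetical synthetic data, not the paper's experimental setup) scans this LOO-CV loss over a λ grid; with such a scan one can visually check whether the curve is quasiconvex (unimodal) or has multiple local minima, as the paper shows it sometimes does.

```python
import numpy as np

def loo_cv_loss(X, y, lam):
    """Leave-one-out CV loss for ridge regression at penalty lam.

    Uses the closed-form shortcut: with hat matrix
    H = X (X^T X + lam I)^{-1} X^T, the LOO residual for point i is
    (y_i - yhat_i) / (1 - H_ii), so no model refitting is needed.
    """
    n, d = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    resid = (y - H @ y) / (1.0 - np.diag(H))
    return np.mean(resid ** 2)

# Hypothetical synthetic example: scan the CV loss over a lambda grid.
rng = np.random.default_rng(0)
n, d = 30, 10
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = X @ beta + 2.0 * rng.standard_normal(n)  # moderately noisy responses

lams = np.logspace(-3, 3, 50)
losses = np.array([loo_cv_loss(X, y, lam) for lam in lams])
best = lams[np.argmin(losses)]
print(f"CV-optimal lambda on this grid: {best:.3g}")
```

A grid scan like this finds the global grid minimum regardless of quasiconvexity; the paper's point is that gradient-based or local search on λ can get stuck when the curve is multimodal.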


Ridge Regression: Structure, Cross-Validation, and Sketching

We study the following three fundamental problems about ridge regression...

Provably tuning the ElasticNet across instances

An important unresolved challenge in the theory of regularization is to ...

Fast cross-validation for multi-penalty ridge regression

Prediction based on multiple high-dimensional data types needs to accoun...

A Cross Validation Framework for Signal Denoising with Applications to Trend Filtering, Dyadic CART and Beyond

This paper formulates a general cross validation framework for signal de...

Gain Confidence, Reduce Disappointment: A New Approach to Cross-Validation for Sparse Regression

Ridge regularized sparse regression involves selecting a subset of featu...

Aggregated hold out for sparse linear regression with a robust loss function

Sparse linear regression methods generally have a free hyperparameter wh...
