Rademacher upper bounds for cross-validation errors with an application to the lasso

07/30/2020
by Ning Xu, et al.

We establish a general upper bound for K-fold cross-validation (K-CV) errors that can be adapted to many K-CV-based estimators and learning algorithms. Based on the Rademacher complexity of the model and the Orlicz-Ψ_ν norm of the error process, the CV error upper bound applies to both light-tailed and heavy-tailed error distributions. We also extend the CV error upper bound to β-mixing data using the technique of independent blocking. We provide a Python package (CVbound, <https://github.com/isaac2math>) for computing the CV error upper bound in K-CV-based algorithms. Using the lasso as an example, we demonstrate in simulations that the upper bounds are tight and stable across different parameter settings and random seeds. Beyond accurately bounding the CV errors for the lasso, the minimizer of the new upper bound can also serve as a criterion for variable selection. Simulations show that, compared with the CV-error minimizer, tuning the lasso penalty parameter to the minimizer of the upper bound yields a sparser and more stable model that retains all of the relevant variables.
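
To make the idea concrete, below is a minimal, self-contained Python sketch of a bound-of-this-shape criterion for tuning the lasso. It is not the CVbound package's API: the helper names (empirical_rademacher_l1, kfold_cv_error, cv_bound_criterion), the simulated data, and all constants are illustrative assumptions, and the standard bounded-loss form "CV error + 2·Rademacher complexity + confidence term" stands in for the paper's Orlicz-Ψ_ν-based bound.

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)

    def empirical_rademacher_l1(X, B, n_draws=200):
        # Monte Carlo estimate of the empirical Rademacher complexity of the
        # L1-ball class {x -> w @ x : ||w||_1 <= B}; for each sign vector
        # sigma the supremum has the closed form (B/n) * ||sum_i sigma_i x_i||_inf.
        n = X.shape[0]
        sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))
        return B / n * np.abs(sigma @ X).max(axis=1).mean()

    def kfold_cv_error(X, y, lam, K=5):
        # Mean squared K-fold CV error of the lasso at penalty lam; also
        # returns the largest L1 norm of the fitted coefficients, used as
        # the radius B of the function class in the complexity term.
        errs, B = [], 0.0
        for tr, te in KFold(K, shuffle=True, random_state=0).split(X):
            model = Lasso(alpha=lam).fit(X[tr], y[tr])
            errs.append(np.mean((y[te] - model.predict(X[te])) ** 2))
            B = max(B, np.abs(model.coef_).sum())
        return np.mean(errs), B

    def cv_bound_criterion(X, y, lam, K=5, delta=0.05):
        # Stand-in upper bound: CV error + 2 * Rademacher complexity + a
        # confidence term at level delta (bounded-loss form, not the
        # paper's Orlicz-Psi_nu version).
        cv_err, B = kfold_cv_error(X, y, lam, K)
        n = X.shape[0]
        return (cv_err + 2.0 * empirical_rademacher_l1(X, B)
                + np.sqrt(np.log(1.0 / delta) / (2.0 * n)))

    # Illustrative data: 3 relevant variables out of 50.
    X = rng.standard_normal((200, 50))
    y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.5 * rng.standard_normal(200)

    # Tune the penalty two ways: raw CV-error minimizer vs. bound minimizer.
    lams = np.logspace(-3, 0, 20)
    lam_cv = min(lams, key=lambda lam: kfold_cv_error(X, y, lam)[0])
    lam_ub = min(lams, key=lambda lam: cv_bound_criterion(X, y, lam))
    for name, lam in [("CV minimizer", lam_cv), ("bound minimizer", lam_ub)]:
        k = np.count_nonzero(Lasso(alpha=lam).fit(X, y).coef_)
        print(f"{name}: lambda = {lam:.4f}, nonzero coefficients = {k}")

Consistent with the abstract's simulation findings, minimizing the bound tends to select a larger penalty, and hence a sparser model, than minimizing the raw CV error, since the complexity term penalizes less-regularized fits.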

Code Repositories

CV_bounds

An upper-bound computation algorithm for the CV error. Compared with CV-error minimization, the new upper bound is more accurate, more stable, and more robust for regularization and overfitting control.

