Generalization error minimization: a new approach to model evaluation and selection with an application to penalized regression

10/18/2016
by Ning Xu, et al.

We study model evaluation and model selection from the perspective of generalization ability (GA): the ability of a model to predict outcomes in new samples from the same population. We believe that GA is one way to formally address concerns about the external validity of a model. The GA of a model estimated on a sample can be measured by its empirical out-of-sample errors, called the generalization errors (GE). We derive upper bounds for the GE, which depend on the sample sizes, the model complexity and the distribution of the loss function. These upper bounds can be used to evaluate the GA of a model ex ante. We propose generalization error minimization (GEM) as a framework for model selection. Using GEM, we are able to unify a large class of penalized regression estimators, including the lasso, ridge and bridge, under the same set of assumptions. We establish finite-sample and asymptotic properties (including L_2-consistency) of the GEM estimator for both the n ≥ p and the n < p cases, and we derive the L_2-distance between the penalized and the corresponding unpenalized regression estimates. In practice, GEM can be implemented by validation or cross-validation. We show that the GE bounds can be used to select the optimal number of folds in K-fold cross-validation. Finally, we propose a variant of R^2, the GR^2, as a measure of GA that considers both in-sample and out-of-sample goodness of fit. Simulations are used to demonstrate our key results.
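As the abstract notes, GEM can be implemented in practice by validation or cross-validation. The following is a minimal Python sketch of that idea, not the paper's own procedure: the simulated data, the candidate lasso/ridge penalty grid, and the fixed fold count K = 5 are all illustrative assumptions (the paper's GE bounds, which could guide the choice of K, are not implemented here). Each candidate estimator is scored by its K-fold cross-validation error, the minimizer is selected, and in-sample and out-of-sample R^2 are then reported side by side; these are the two ingredients the paper's GR^2 combines, whose exact formula is not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, train_test_split

# Illustrative simulated data (n > p here; the paper also treats n < p).
X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=5.0, random_state=0)

def cv_generalization_error(model, X, y, n_folds=5):
    """Estimate a model's generalization error (mean squared
    out-of-sample loss) by K-fold cross-validation."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    fold_errors = []
    for train_idx, test_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        resid = y[test_idx] - model.predict(X[test_idx])
        fold_errors.append(np.mean(resid ** 2))
    return np.mean(fold_errors)

# GEM-style selection: choose the estimator and penalty level with the
# smallest estimated generalization error (grid values are arbitrary).
candidates = {f"lasso(alpha={a})": Lasso(alpha=a) for a in (0.01, 0.1, 1.0)}
candidates.update({f"ridge(alpha={a})": Ridge(alpha=a) for a in (0.01, 0.1, 1.0)})

ge = {name: cv_generalization_error(m, X, y) for name, m in candidates.items()}
best_name = min(ge, key=ge.get)
print("selected by GEM:", best_name, "with CV error", ge[best_name])

# In-sample vs. out-of-sample goodness of fit for the selected model.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
best = candidates[best_name].fit(X_tr, y_tr)
print("in-sample R^2:    ", r2_score(y_tr, best.predict(X_tr)))
print("out-of-sample R^2:", r2_score(y_te, best.predict(X_te)))
```

Under this sketch's assumptions, a well-chosen penalty shows a smaller gap between the two R^2 values, which is the intuition behind evaluating a model by its generalization ability rather than its in-sample fit alone.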


Related research

09/12/2016  Finite-sample and asymptotic analysis of generalization ability with an application to penalized regression
In this paper, we study the performance of extremum estimators from the ...

06/01/2016  Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso
Model selection is difficult to analyse yet theoretically and empiricall...

05/20/2017  (β, ϖ)-stability for cross-validation and the choice of the number of folds
In this paper, we introduce a new concept of stability for cross-validat...

07/30/2020  Rademacher upper bounds for cross-validation errors with an application to the lasso
We establish a general upper bound for K-fold cross-validation (K-CV) er...

03/03/2020  Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions
We study the problem of out-of-sample risk estimation in the high dimens...

03/28/2019  An analysis of the cost of hyper-parameter selection via split-sample validation, with applications to penalized regression
In the regression setting, given a set of hyper-parameters, a model-esti...

08/15/2019  A deep learning model for segmentation of geographic atrophy to study its long-term natural history
Purpose: To develop and validate a deep learning model for automatic seg...
