Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso

06/01/2016
by Ning Xu, et al.

Model selection is difficult to analyse yet theoretically and empirically important, especially for high-dimensional data analysis. Recently the least absolute shrinkage and selection operator (Lasso) has been applied in the statistical and econometric literature. Consistency of Lasso has been established under various conditions, some of which are difficult to verify in practice. In this paper, we study model selection from the perspective of generalization ability, under the framework of structural risk minimization (SRM) and Vapnik-Chervonenkis (VC) theory. The approach emphasizes the balance between the in-sample and out-of-sample fit, which can be achieved by using cross-validation to select a penalty on model complexity. We show that an exact relationship exists between the generalization ability of a model and model selection consistency. By implementing SRM and the VC inequality, we show that Lasso is L2-consistent for model selection under assumptions similar to those imposed on OLS. Furthermore, we derive a probabilistic bound for the distance between the penalized extremum estimator and the extremum estimator without penalty, which is dominated by overfitting. We also propose a new measurement of overfitting, GR2, based on generalization ability, that converges to zero if model selection is consistent. Using simulations, we demonstrate that the proposed CV-Lasso algorithm performs well in terms of model selection and overfitting control.
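The idea of using cross-validation to select the penalty can be illustrated with a minimal sketch. The abstract does not give the details of the authors' CV-Lasso algorithm, so everything below is an assumption made for illustration: the coordinate-descent solver, the objective scaling (1/(2n))·||y - Xb||² + λ·||b||₁, the K-fold split, the penalty grid, the simulated data, and all function names (`soft_threshold`, `lasso_cd`, `cv_lasso`) are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Minimize (1/(2n))*||y - Xb||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (b_j excluded).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
            max_delta = max(max_delta, abs(delta))
        if max_delta < tol:  # stop once no coefficient moves noticeably
            break
    return b

def mse(X, y, b):
    """Mean squared prediction error of coefficients b on (X, y)."""
    return sum((yi - sum(x * c for x, c in zip(xi, b))) ** 2
               for xi, yi in zip(X, y)) / len(y)

def cv_lasso(X, y, lambdas, k=5):
    """Pick the penalty with the lowest average out-of-sample MSE over k folds."""
    n = len(y)
    folds = [list(range(f, n, k)) for f in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for held_out in folds:
            held = set(held_out)
            X_tr = [X[i] for i in range(n) if i not in held]
            y_tr = [y[i] for i in range(n) if i not in held]
            b = lasso_cd(X_tr, y_tr, lam)
            err += mse([X[i] for i in held_out], [y[i] for i in held_out], b)
        err /= k
        if err < best_err:
            best_lam, best_err = lam, err
    # Refit on the full sample with the selected penalty.
    return best_lam, lasso_cd(X, y, best_lam)

# Simulated data (illustrative only): two relevant regressors out of eight.
random.seed(0)
n, p = 80, 8
true_beta = [2.0, -1.5] + [0.0] * (p - 2)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0, 0.5) for row in X]

lambdas = [0.01, 0.05, 0.1, 0.2, 0.5, 1.0]
best_lam, beta_hat = cv_lasso(X, y, lambdas)
print("selected penalty:", best_lam)
print("estimated coefficients:", [round(b, 3) for b in beta_hat])
```

In this sketch the in-sample fit comes from the training folds and the out-of-sample fit from the held-out fold, so the selected penalty is the one that balances the two on the given grid. It does not compute the GR2 measure or any of the paper's bounds.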


