An Empirical Comparison of V-fold Penalisation and Cross Validation for Model Selection in Distribution-Free Regression

12/08/2012
by Charanpal Dhanjal, et al.

Model selection is a crucial issue in machine learning, and a wide variety of penalisation methods (with possibly data-dependent complexity penalties) have recently been introduced for this purpose. However, their empirical performance is generally not well documented in the literature. The goal of this paper is to investigate to what extent such recent techniques can be successfully used to tune both the regularisation and kernel parameters in support vector regression (SVR) and the complexity measure in regression trees (CART). This task is traditionally solved via V-fold cross-validation (VFCV), which performs well at a reasonable computational cost. A disadvantage of VFCV, however, is that the procedure is known to provide an asymptotically suboptimal risk estimate as the number of examples tends to infinity. Recently, a penalisation procedure called V-fold penalisation has been proposed to improve on VFCV, supported by theoretical arguments. Here we report on an extensive set of experiments comparing V-fold penalisation and VFCV for SVR/CART calibration on several benchmark datasets. We highlight cases in which VFCV and V-fold penalisation each provide poor estimates of the risk, and introduce a modified penalisation technique to reduce the estimation error.
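To make the two criteria compared in the abstract concrete, the following is a minimal sketch and not the authors' code. It is dependency-free, so it uses piecewise-constant regressors (regressograms) on [0, 1) as a stand-in for SVR/CART, with the number of bins playing the role of the tuned parameter. The V-fold penalty is written in the form commonly given for V-fold penalisation, (C/V) times the sum over folds of the full-sample risk minus the training-fold risk of the fold-trained model, with C = V - 1 taken as an assumed default. All function names, the synthetic data, and the candidate grid are hypothetical.

```python
import math
import random

def fit_regressogram(xs, ys, n_bins):
    """Least-squares piecewise-constant fit on [0, 1) with n_bins equal bins."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Empty bins fall back to the overall mean (an illustrative choice).
    return [s / c if c else overall for s, c in zip(sums, counts)]

def mse(means, xs, ys):
    """Empirical squared-error risk of a fitted regressogram on (xs, ys)."""
    k = len(means)
    return sum((means[min(int(x * k), k - 1)] - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

def vfold_criteria(xs, ys, n_bins, V=5, C=None):
    """Return (VFCV risk estimate, V-fold penalised criterion) for one model."""
    n = len(xs)
    C = V - 1 if C is None else C  # assumed default constant
    cv_total, pen_total = 0.0, 0.0
    for j in range(V):
        held = set(range(j, n, V))  # interleaved folds; data assumed shuffled
        tr_x = [xs[i] for i in range(n) if i not in held]
        tr_y = [ys[i] for i in range(n) if i not in held]
        te_x = [xs[i] for i in sorted(held)]
        te_y = [ys[i] for i in sorted(held)]
        model = fit_regressogram(tr_x, tr_y, n_bins)
        cv_total += mse(model, te_x, te_y)
        # Full-sample risk minus training-fold risk of the fold-trained model.
        pen_total += mse(model, xs, ys) - mse(model, tr_x, tr_y)
    full_risk = mse(fit_regressogram(xs, ys, n_bins), xs, ys)
    return cv_total / V, full_risk + C * pen_total / V

def select(xs, ys, candidates, V=5):
    """Pick the candidate minimising each criterion."""
    scores = {m: vfold_criteria(xs, ys, m, V) for m in candidates}
    best_cv = min(candidates, key=lambda m: scores[m][0])
    best_pen = min(candidates, key=lambda m: scores[m][1])
    return best_cv, best_pen

# Synthetic demonstration data (hypothetical, not from the paper).
rng = random.Random(0)
demo_x = [rng.random() for _ in range(200)]
demo_y = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in demo_x]
candidates = [1, 2, 4, 8, 16, 32, 64]
chosen_cv, chosen_pen = select(demo_x, demo_y, candidates)
print("bins chosen by VFCV:", chosen_cv)
print("bins chosen by V-fold penalisation:", chosen_pen)
```

In this sketch the two criteria reuse the same V fold-trained models, so the penalised criterion costs only one extra fit on the full sample. Raising C above V - 1 would over-penalise and favour simpler models.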


