Theoretical Analyses of Cross-Validation Error and Voting in Instance-Based Learning

12/11/2002
by Peter D. Turney, et al.

This paper begins with a general theory of error in cross-validation testing of algorithms for supervised learning from examples. It is assumed that the examples are described by attribute-value pairs, where the values are symbolic. Cross-validation requires a set of training examples and a set of testing examples. The value of the attribute that is to be predicted is known to the learner in the training set, but unknown in the testing set. The theory demonstrates that cross-validation error has two components: error on the training set (inaccuracy) and sensitivity to noise (instability). This general theory is then applied to voting in instance-based learning. Given an example in the testing set, a typical instance-based learning algorithm predicts the designated attribute by voting among the k nearest neighbors (the k most similar examples) to the testing example in the training set. Voting is intended to increase the stability (resistance to noise) of instance-based learning, but a theoretical analysis shows that there are circumstances in which voting can be destabilizing. The theory suggests ways to minimize cross-validation error by ensuring that voting is stable and does not adversely affect accuracy.
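
The voting step described in the abstract can be illustrated with a minimal sketch. It is not the paper's own algorithm or notation: the function names `hamming` and `knn_vote`, the toy fruit data, and the choice of Hamming distance (the count of mismatched symbolic attribute values) as the similarity measure are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Count the attributes on which two symbolic examples disagree."""
    return sum(x != y for x, y in zip(a, b))


def knn_vote(train, query, k):
    """Predict the designated attribute of `query` by majority vote.

    `train` is a list of (attributes, label) pairs, where `attributes`
    is a tuple of symbolic values. Distance ties are broken by training
    order because `sorted` is stable. An odd k avoids vote ties when
    there are two classes.
    """
    neighbors = sorted(train, key=lambda ex: hamming(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (color, shape, size) -> fruit.
train = [
    (("red", "round", "small"), "apple"),
    (("red", "round", "large"), "apple"),
    (("yellow", "long", "small"), "banana"),
    (("yellow", "long", "large"), "banana"),
    (("red", "long", "small"), "banana"),
]

prediction_k1 = knn_vote(train, ("red", "round", "small"), k=1)
prediction_k3 = knn_vote(train, ("red", "round", "small"), k=3)
```

Rerunning `knn_vote` with different values of k, or with some training labels deliberately flipped, gives an informal feel for the stability question the abstract raises. Whether voting helps or hurts in a given setting is what the paper's analysis addresses.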

