Identifying Mislabeled Training Data

06/01/2011
by C. E. Brodley, et al.

This paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classifiers that serve as noise filters for the training data. We evaluate single algorithm, majority vote and consensus filters on five datasets that are prone to labeling errors. Our experiments illustrate that filtering significantly improves classification accuracy for noise levels up to 30 percent. An analytical and empirical evaluation of the precision of our approach shows that consensus filters are conservative at throwing away good data at the expense of retaining bad data and that majority filters are better at detecting bad data at the expense of throwing away good data. This suggests that for situations in which there is a paucity of data, consensus filters are preferable, whereas majority vote filters are preferable for situations with an abundance of data.
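
The filtering idea described in the abstract, training several classifiers on cross-validation folds and flagging each training instance whose given label the classifiers vote against, can be sketched compactly. The following Python example is a hedged illustration rather than the authors' implementation: the choice of base learners (decision tree, k-nearest neighbors, naive Bayes), the number of folds, and the function name filter_mislabeled are assumptions made for this sketch, and the inputs are assumed to be NumPy arrays.

```python
# Minimal sketch of cross-validation-based noise filtering in the spirit of the
# paper's majority-vote and consensus filters. Base learners, fold count, and
# naming are illustrative assumptions, not the authors' exact setup.
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB


def filter_mislabeled(X, y, filter_type="majority", n_splits=5, random_state=0):
    """Return a boolean mask marking instances suspected of being mislabeled.

    filter_type="majority": flag an instance when more than half of the
    classifiers disagree with its label.
    filter_type="consensus": flag it only when every classifier disagrees.
    """
    base_learners = [
        DecisionTreeClassifier(random_state=random_state),
        KNeighborsClassifier(n_neighbors=5),
        GaussianNB(),
    ]
    disagreements = np.zeros(len(y), dtype=int)

    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for train_idx, test_idx in skf.split(X, y):
        # Each classifier is trained on the other folds and judges the held-out fold.
        for learner in base_learners:
            clf = clone(learner).fit(X[train_idx], y[train_idx])
            preds = clf.predict(X[test_idx])
            disagreements[test_idx] += (preds != y[test_idx]).astype(int)

    if filter_type == "majority":
        flagged = disagreements > len(base_learners) / 2
    else:  # consensus filter
        flagged = disagreements == len(base_learners)
    return flagged


# Example usage (assumes X_train, y_train are NumPy arrays):
# flagged = filter_mislabeled(X_train, y_train, filter_type="consensus")
# X_clean, y_clean = X_train[~flagged], y_train[~flagged]
```

As the abstract notes, the consensus variant is the more conservative of the two: an instance is discarded only if no classifier endorses keeping it, which retains more good data at the cost of keeping some bad data.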



Related research

An Extensive Evaluation of Filtering Misclassified Instances in Supervised Classification Tasks (12/13/2013)
Removing or filtering outliers and mislabeled instances prior to trainin...

A Statistical Approach to Increase Classification Accuracy in Supervised Learning Algorithms (09/05/2017)
Probabilistic mixture models have been widely used for different machine...

Autoencoders as Pattern Filters (02/26/2023)
We discuss a simple approach to transform autoencoders into "pattern fil...

Break-It-Fix-It: Unsupervised Learning for Program Repair (06/11/2021)
We consider repair tasks: given a critic (e.g., compiler) that assesses ...

Iteratively Learning from the Best (10/28/2018)
We study a simple generic framework to address the issue of bad training...

Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem (12/13/2018)
Classifiers used in the wild, in particular for safety-critical systems,...

Simultaneously Learning Architectures and Features of Deep Neural Networks (06/11/2019)
This paper presents a novel method which simultaneously learns the numbe...
