Iteratively Learning from the Best

10/28/2018
by Yanyao Shen, et al.

We study a simple, generic framework for handling bad training data: both bad labels in supervised problems and bad samples in unsupervised ones. Our approach starts by fitting a model to the whole training dataset, then iteratively improves it by alternating between (a) revisiting the training data to select the samples with the lowest current loss, and (b) re-training the model on only these selected samples. It can be applied to any existing model-training setting that provides a per-sample loss and a way to refit on a subset of samples. We show the merit of this approach in both theory and practice. We first prove statistical consistency, and linear convergence to the ground truth and global optimum, for two simpler model settings: mixed linear regression and Gaussian mixture models. We then demonstrate its empirical success in (a) preserving the accuracy of existing deep image classifiers when some training labels are erroneous, and (b) improving the quality of samples generated by existing DC-GAN models when the training data contains a fraction of images from a different, unintended dataset. The experimental results show significant improvement over baseline methods that ignore the existence of bad labels/samples.
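The alternating scheme described in the abstract can be sketched concretely for an ordinary least-squares model. The function below is a hedged illustration, not the paper's reference implementation: the function name, `keep_frac` parameter, and fixed iteration count are assumptions for the sketch. It fits on all samples, then repeatedly keeps the fraction of samples with the lowest current squared loss and refits on only those.

```python
import numpy as np

def iterative_trimmed_fit(X, y, keep_frac=0.8, n_iters=10):
    """Sketch of the alternating select/refit scheme for least squares.

    Alternates between (a) revisiting all training data and keeping the
    keep_frac fraction with lowest current loss, and (b) refitting the
    model on only those selected samples. Names and defaults here are
    illustrative assumptions, not the paper's exact algorithm settings.
    """
    n = len(y)
    k = int(keep_frac * n)
    idx = np.arange(n)  # start from the whole training set
    w = None
    for _ in range(n_iters):
        # (b) refit on the currently selected samples (ordinary least squares)
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        # (a) revisit the full training data, keep the k lowest-loss samples
        losses = (X @ w - y) ** 2
        idx = np.argsort(losses)[:k]
    return w, idx
```

On synthetic mixed data where a minority of labels are corrupted by a large offset, the corrupted samples incur high residuals under the current fit, so successive selection steps concentrate on the clean samples and the refit converges to the clean-data solution.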


