Quantity vs Quality: Investigating the Trade-Off between Sample Size and Label Reliability

04/20/2022
by Timo Bertram, et al.

In this paper, we study learning in probabilistic domains where the learner may receive incorrect labels but can improve the reliability of labels by repeatedly sampling them. In such a setting, one faces the question of whether a fixed budget for obtaining training examples should be spent on as many distinct examples as possible or on improving the label quality of a smaller number of examples by resampling their labels. We motivate this problem with an application that compares the strength of poker hands, where the training signal depends on the hidden community cards, and then study it in depth in an artificial setting where we insert controlled levels of noise into the MNIST database. Our results show that as the noise level increases, resampling previous examples becomes more important than obtaining new examples, because classifier performance deteriorates when the number of incorrect labels is too high. In addition, we propose two validation strategies: switching from lower to higher validations over the course of training, and using chi-square statistics to approximate the confidence in obtained labels.
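The budget trade-off described above can be illustrated with a small sketch. It is not the authors' method. It assumes binary labels, independent label samples that are each correct with a fixed probability, and aggregation by majority vote with ties broken by a fair coin. The function names `majority_correct_prob` and `split_budget` and the example numbers are hypothetical.

```python
from math import comb


def majority_correct_prob(p_correct: float, k: int) -> float:
    """Probability that a majority vote over k independent binary label
    samples yields the correct label (assumption: ties are broken by a
    fair coin, so a tie counts as correct half of the time)."""
    total = 0.0
    for i in range(k + 1):
        # Probability that exactly i of the k samples are correct.
        prob = comb(k, i) * p_correct**i * (1 - p_correct) ** (k - i)
        if 2 * i > k:
            total += prob
        elif 2 * i == k:
            total += 0.5 * prob
    return total


def split_budget(budget: int, k: int) -> int:
    """Number of distinct examples affordable when each example's label
    is sampled k times out of a fixed budget of label queries."""
    return budget // k


# Hypothetical numbers: 9000 label queries, each correct with probability 0.7.
for k in (1, 3, 5, 9):
    n_examples = split_budget(9000, k)
    reliability = majority_correct_prob(0.7, k)
    print(f"k={k}: {n_examples} examples, label reliability {reliability:.3f}")
```

Under these assumptions, a larger k buys more reliable labels at the cost of fewer distinct examples. Which allocation gives the better classifier is the empirical question the paper investigates.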

