AQuA: A Benchmarking Tool for Label Quality Assessment

06/15/2023
by Mononito Goswami, et al.

Machine learning (ML) models are only as good as the data they are trained on, yet recent studies have found that datasets widely used to train and evaluate ML models, e.g., ImageNet, contain pervasive labeling errors. Erroneous labels in the training set hurt a model's ability to generalize, and erroneous labels in the test set distort evaluation and model selection. Consequently, learning in the presence of labeling errors is an active area of research, yet the field lacks a comprehensive benchmark for evaluating these methods: most are assessed on a handful of computer vision datasets, with significant variance in experimental protocols. Given such a large pool of methods and such inconsistent evaluation, it is also unclear how ML practitioners should choose the right tools to assess label quality in their data. To this end, we propose AQuA, a benchmarking environment for rigorously evaluating methods that enable machine learning in the presence of label noise. We also introduce a design space that delineates the concrete design choices of label error detection models. We hope that our design space and benchmark help practitioners choose the right tools to improve their label quality, and that they enable objective and rigorous evaluation of machine learning tools confronted with mislabeled data.
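As a concrete illustration of the kind of label error detection method such a benchmark would score (this is not AQuA's own API; the dataset and classifier below are placeholders), the following sketch uses the open-source cleanlab library to flag likely label errors from out-of-sample predicted probabilities:

# Illustrative sketch only: one label error detection method of the kind
# a benchmark like AQuA evaluates. Features, labels, and the classifier
# are placeholders, not part of AQuA.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from cleanlab.filter import find_label_issues

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))        # placeholder features
y = rng.integers(0, 3, size=500)      # placeholder (possibly noisy) labels

# cleanlab expects out-of-sample predicted probabilities, so we obtain
# them via cross-validation rather than fitting on the full dataset.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y,
    cv=5, method="predict_proba",
)

# Indices of examples whose given labels look erroneous, ranked so the
# most confidently mislabeled examples come first.
issue_idx = find_label_issues(
    labels=y,
    pred_probs=pred_probs,
    return_indices_ranked_by="self_confidence",
)
print(f"Flagged {len(issue_idx)} of {len(y)} labels as suspect.")

A benchmark then measures how well such flags recover known injected or human-verified label errors, and how much downstream model performance improves after the flagged labels are cleaned.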


