Eliciting and Learning with Soft Labels from Every Annotator

07/02/2022
by Katherine M. Collins et al.

The labels used to train machine learning (ML) models are of paramount importance. Typically for ML classification tasks, datasets contain hard labels, yet learning using soft labels has been shown to yield benefits for model generalization, robustness, and calibration. Earlier work found success in forming soft labels from multiple annotators' hard labels; however, this approach may not converge to the best labels and necessitates many annotators, which can be expensive and inefficient. We focus on efficiently eliciting soft labels from individual annotators. We collect and release a dataset of soft labels for CIFAR-10 via a crowdsourcing study (N=248). We demonstrate that learning with our labels achieves comparable model performance to prior approaches while requiring far fewer annotators. Our elicitation methodology therefore shows promise towards enabling practitioners to enjoy the benefits of improved model performance and reliability with fewer annotators, and serves as a guide for future dataset curators on the benefits of leveraging richer information, such as categorical uncertainty, from individual annotators.
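To make the abstract's central contrast concrete, the sketch below shows (1) the prior approach of averaging many annotators' hard (one-hot) labels into a soft label, and (2) training directly against per-example label distributions via a soft cross-entropy loss. This is a minimal illustration, not the authors' actual pipeline: it assumes PyTorch, and the model, data, and hyperparameters are placeholders.

```python
# Minimal sketch of soft-label training (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10  # e.g., CIFAR-10

def aggregate_hard_labels(votes: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Prior approach: average several annotators' one-hot votes for one
    example into a soft label. `votes` is a 1-D tensor of class indices."""
    return F.one_hot(votes, num_classes).float().mean(dim=0)

def soft_cross_entropy(logits: torch.Tensor, soft_targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against a full label distribution per example,
    rather than a single hard class index."""
    return -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Hypothetical batch: images plus soft labels elicited from individual
# annotators (each row is a probability distribution over the classes).
images = torch.randn(32, 3, 32, 32)
soft_labels = torch.softmax(torch.randn(32, NUM_CLASSES), dim=1)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, NUM_CLASSES))  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

optimizer.zero_grad()
loss = soft_cross_entropy(model(images), soft_labels)
loss.backward()
optimizer.step()

# Example of the prior aggregation approach: three annotators' hard votes
# for one image (classes 3, 3, 5) yield the soft label [..., 0.67, 0, 0.33, ...].
print(aggregate_hard_labels(torch.tensor([3, 3, 5]), NUM_CLASSES))
```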



Related research

- Training sound event detection with soft labels from crowdsourced annotations (02/28/2023)
- Soft Labels for Rapid Satellite Object Detection (12/01/2022)
- Beyond Hard Labels: Investigating data label distributions (07/13/2022)
- COLAM: Co-Learning of Deep Neural Networks and Soft Labels via Alternating Minimization (04/26/2020)
- Multi-View Knowledge Distillation from Crowd Annotations for Out-of-Domain Generalization (12/19/2022)
- Dice Semimetric Losses: Optimizing the Dice Score with Soft Labels (03/28/2023)
- Learning Robust Variational Information Bottleneck with Reference (04/29/2021)
