Improving Label Quality by Jointly Modeling Items and Annotators

We propose a fully Bayesian framework for learning ground truth labels from noisy annotators. Our framework ensures scalability by factoring a generative, Bayesian soft clustering model over label distributions into the classic Dawid and Skene joint annotator-data model. Earlier research along these lines has neither fully incorporated label distributions nor explored clustering by annotators only or by data only. Our framework incorporates all of these properties in two forms: (1) a graphical model designed to provide better ground truth estimates of annotator responses as input to any black box supervised learning algorithm, and (2) a standalone neural model whose internal structure captures many of the properties of the graphical model. We conduct supervised learning experiments using both models and compare their performance against one baseline and one state-of-the-art model.
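For readers unfamiliar with the classic Dawid and Skene model referenced above, the following is a minimal sketch of its usual EM estimation, not the framework proposed in this paper. It treats each item's true class as latent and gives each annotator a confusion matrix. The function name `dawid_skene`, its parameters, the additive `smoothing` constant, and the toy responses are all assumptions made for illustration; the soft clustering over label distributions described in the abstract is not included.

```python
def dawid_skene(responses, n_classes, n_iter=50, smoothing=0.01):
    """Classic Dawid-Skene EM (illustrative sketch, not the paper's model).

    responses: list of (item, annotator, label) triples with integer labels.
    Returns a dict mapping each item to a posterior over its true class.
    """
    items = sorted({i for i, _, _ in responses})
    annotators = sorted({a for _, a, _ in responses})

    # Initialise item posteriors from smoothed per-item vote shares.
    post = {i: [smoothing] * n_classes for i in items}
    for i, _, label in responses:
        post[i][label] += 1.0
    for i in items:
        total = sum(post[i])
        post[i] = [p / total for p in post[i]]

    for _ in range(n_iter):
        # M-step: class prior and one confusion matrix per annotator,
        # where conf[a][k][l] estimates P(annotator a says l | true class k).
        prior = [sum(post[i][k] for i in items) / len(items)
                 for k in range(n_classes)]
        conf = {a: [[smoothing] * n_classes for _ in range(n_classes)]
                for a in annotators}
        for i, a, label in responses:
            for k in range(n_classes):
                conf[a][k][label] += post[i][k]
        for a in annotators:
            for k in range(n_classes):
                row_sum = sum(conf[a][k])
                conf[a][k] = [c / row_sum for c in conf[a][k]]

        # E-step: posterior over each item's true class given all responses.
        new_post = {i: list(prior) for i in items}
        for i, a, label in responses:
            for k in range(n_classes):
                new_post[i][k] *= conf[a][k][label]
        for i in items:
            total = sum(new_post[i])
            new_post[i] = [p / total for p in new_post[i]]
        post = new_post
    return post


# Hypothetical toy data: annotators 0 and 1 always agree, annotator 2 is noisy.
toy_responses = [
    (0, 0, 0), (0, 1, 0), (0, 2, 0),
    (1, 0, 1), (1, 1, 1), (1, 2, 0),
    (2, 0, 0), (2, 1, 0), (2, 2, 1),
    (3, 0, 1), (3, 1, 1), (3, 2, 1),
]
posteriors = dawid_skene(toy_responses, n_classes=2)
hard_labels = [max(range(2), key=lambda k: posteriors[i][k]) for i in range(4)]
print(hard_labels)
```

The returned posteriors are soft labels, so they could be passed to a downstream learner either as distributions or, after taking the argmax, as hard labels. Multiplying raw probabilities as done here can underflow when an item has many responses; a log-space E-step would be the usual remedy.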


