Learning from Crowds with Sparse and Imbalanced Annotations

07/11/2021
by Ye Shi, et al.

Traditional supervised learning requires ground-truth labels for the training data, which can be difficult to collect in many cases. Recently, crowdsourcing has established itself as an efficient labeling solution by resorting to non-expert crowds. To reduce the effect of labeling errors, a common practice is to distribute each instance to multiple workers, while each worker annotates only a subset of the data, which results in sparse annotations. In this paper, we note that under class imbalance, i.e., when the ground-truth labels are class-imbalanced, the sparse annotations are prone to being skewed, which can severely bias the learning algorithm. To combat this issue, we propose a self-training based approach named Self-Crowd, which progressively adds confident pseudo-annotations and rebalances the annotation distribution. Specifically, we propose a distribution-aware confidence measure to select confident pseudo-annotations, which adopts a resampling strategy to oversample the minority annotations and undersample the majority annotations. On a real-world crowdsourcing image classification task, we show that the proposed method yields more balanced annotations throughout training than distribution-agnostic methods and substantially improves learning performance at different annotation sparsity levels.
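The abstract does not spell out the confidence measure, so the following is only a minimal sketch of the general idea of distribution-aware selection, not the Self-Crowd algorithm itself. It assumes a model has already produced class probabilities for the missing (instance, worker) pairs, and it uses inverse-frequency quotas as a stand-in for the resampling strategy. The function name `select_pseudo_annotations` and its parameters `probs`, `class_counts`, and `budget` are hypothetical.

```python
import numpy as np


def select_pseudo_annotations(probs, class_counts, budget):
    """Pick confident pseudo-annotations with class-rebalancing quotas.

    probs:        (n_candidates, n_classes) predicted probabilities for
                  candidate (instance, worker) pairs that lack an annotation.
    class_counts: (n_classes,) number of annotations observed so far per class.
    budget:       total number of pseudo-annotations to add in this round.

    Returns (selected_indices, pseudo_labels). Illustrative assumption only:
    quotas are proportional to inverse class frequency.
    """
    probs = np.asarray(probs, dtype=float)
    class_counts = np.asarray(class_counts, dtype=float)

    pred = probs.argmax(axis=1)   # pseudo-annotation for each candidate
    conf = probs.max(axis=1)      # plain confidence of that pseudo-annotation

    # Rare classes get a larger share of the budget (oversampling),
    # frequent classes a smaller one (undersampling).
    inv_freq = 1.0 / np.maximum(class_counts, 1.0)
    quota = np.floor(budget * inv_freq / inv_freq.sum()).astype(int)

    selected = []
    for c in range(probs.shape[1]):
        idx = np.flatnonzero(pred == c)
        # Keep the most confident candidates of class c, up to its quota.
        top = idx[np.argsort(-conf[idx])][: quota[c]]
        selected.extend(top.tolist())

    selected = np.array(sorted(selected), dtype=int)
    return selected, pred[selected]


if __name__ == "__main__":
    probs = [[0.99, 0.01], [0.95, 0.05], [0.2, 0.8],
             [0.4, 0.6], [0.1, 0.9], [0.3, 0.7]]
    # Class 0 already dominates the observed annotations (90 vs. 10).
    sel, labels = select_pseudo_annotations(probs, class_counts=[90, 10], budget=4)
    print(sel, labels)
```

In this toy run the two most confident candidates both belong to the majority class, yet the quota directs the budget toward minority-class candidates. A distribution-agnostic top-k rule would instead pick the majority-class candidates first and could amplify the skew.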

