Interface Design for Crowdsourcing Hierarchical Multi-Label Text Annotations

02/06/2023
by Rickard Stureborg, et al.

Human data labeling is an important and expensive task at the heart of supervised learning systems. Hierarchies help humans understand and organize concepts. We ask whether and how concept hierarchies can inform the design of annotation interfaces to improve labeling quality and efficiency. We study this question through annotation of vaccine misinformation, where the labeling task is difficult and highly subjective. We investigate six user interface designs for crowdsourcing hierarchical labels, collecting over 18,000 individual annotations. Under a fixed budget, integrating hierarchies into the design improves crowd workers' F1 scores. We attribute this to (1) grouping similar concepts, which improves F1 scores by +0.16 over random groupings; (2) strong relative performance on high-difficulty examples (relative F1 score difference of +0.40); and (3) filtering out obvious negatives, which increases precision by +0.07. Ultimately, labeling schemes that integrate the hierarchy outperform those that do not, achieving a mean F1 of 0.70.
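
The abstract reports precision and F1 for multi-label annotations but does not say how the scores are averaged. As a rough illustration only, the sketch below scores a single worker's label set against a reference set and then averages F1 over examples. The function names and the label strings are hypothetical, and per-example averaging is an assumption, not necessarily the authors' method.

```python
from typing import Iterable, Set, Tuple


def precision_recall_f1(gold: Set[str], predicted: Set[str]) -> Tuple[float, float, float]:
    """Score one multi-label annotation against a reference label set."""
    true_positives = len(gold & predicted)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_f1(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Average per-example F1 over (gold, predicted) pairs (assumed averaging scheme)."""
    scores = [precision_recall_f1(gold, predicted)[2] for gold, predicted in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels: one of the worker's two labels matches the reference.
gold_labels = {"safety", "side_effects"}
worker_labels = {"safety", "conspiracy"}
print(precision_recall_f1(gold_labels, worker_labels))
```

Under this scheme, selecting fewer incorrect labels raises precision, which is the quantity the abstract says improves when obvious negatives are filtered out.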

research
09/06/2021

Rethinking Crowdsourcing Annotation: Partial Annotation with Salient Labels for Multi-Label Image Classification

Annotated images are required for both supervised model training and eva...
research
07/11/2021

Learning from Crowds with Sparse and Imbalanced Annotations

Traditional supervised learning requires ground truth labels for the tra...
research
12/07/2021

The Origin and Value of Disagreement Among Data Labelers: A Case Study of the Individual Difference in Hate Speech Annotation

Human annotated data is the cornerstone of today's artificial intelligen...
research
12/15/2020

Enriched Annotations for Tumor Attribute Classification from Pathology Reports with Limited Labeled Data

Precision medicine has the potential to revolutionize healthcare, but mu...
research
07/25/2016

Much Ado About Time: Exhaustive Annotation of Temporal Data

Large-scale annotated datasets allow AI systems to learn from and build ...
research
10/15/2021

Learning with Noisy Labels by Targeted Relabeling

Crowdsourcing platforms are often used to collect datasets for training ...
research
07/09/2019

Let's Keep It Safe: Designing User Interfaces that Allow Everyone to Contribute to AI Safety

When AI systems are granted the agency to take impactful actions in the ...
