You Are What You Annotate: Towards Better Models through Annotator Representations

05/24/2023
by Naihao Deng, et al.
Annotator disagreement is ubiquitous in natural language processing (NLP) tasks. Such disagreements have multiple causes, including the subjectivity of the task, difficult cases, and unclear guidelines. Rather than simply aggregating labels to obtain data annotations, we propose to explicitly account for annotator idiosyncrasies and leverage them in the modeling process. We create representations for the annotators (annotator embeddings) and their annotations (annotation embeddings), each with an associated learnable matrix. Our approach significantly improves model performance on various NLP benchmarks while adding fewer than 1% model parameters. By capturing the unique tendencies and subjectivity of individual annotators, our embeddings help democratize AI and ensure that AI models are inclusive of diverse viewpoints.
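
The idea of attaching learnable matrices to annotators and to their annotations can be illustrated with a small sketch. This is not the authors' implementation: the additive combination with the text vector, the averaging of label rows to form an annotation embedding, the random initialization, and all names and sizes (`DIM`, `NUM_ANNOTATORS`, `NUM_LABELS`, `predict`, and so on) are assumptions chosen only to make the example self-contained.

```python
import random

random.seed(0)

DIM = 8             # hypothetical embedding width
NUM_ANNOTATORS = 5  # hypothetical annotator pool size
NUM_LABELS = 3      # hypothetical label set size


def make_matrix(rows, cols):
    """Small random matrix standing in for a learnable parameter table."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# One row per annotator: a stand-in for annotator embeddings.
annotator_table = make_matrix(NUM_ANNOTATORS, DIM)
# One row per label: used here to build annotation embeddings (assumption).
label_table = make_matrix(NUM_LABELS, DIM)
# Linear classification head over the combined representation.
classifier = make_matrix(NUM_LABELS, DIM)


def annotation_embedding(past_labels):
    """Average the label rows for labels this annotator assigned before."""
    if not past_labels:
        return [0.0] * DIM
    rows = [label_table[label] for label in past_labels]
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(text_vec, annotator_id, past_labels):
    """Score every label for one (text, annotator) pair."""
    a = annotator_table[annotator_id]
    n = annotation_embedding(past_labels)
    # Assumed combination rule: element-wise sum of the three vectors.
    combined = [t + ai + ni for t, ai, ni in zip(text_vec, a, n)]
    scores = [sum(w * x for w, x in zip(row, combined)) for row in classifier]
    return scores.index(max(scores)), scores


# The same text vector scored for two annotators with different histories.
text_vec = [random.uniform(-1, 1) for _ in range(DIM)]
label_a, scores_a = predict(text_vec, 0, [0, 0, 1])
label_b, scores_b = predict(text_vec, 1, [2, 2])

# Parameters added by the two lookup tables in this toy setup.
added_params = (NUM_ANNOTATORS + NUM_LABELS) * DIM
```

In this toy setup the same text receives different label scores depending on which annotator is being modeled, and the extra parameters are just the two lookup tables, which stay small next to a typical text encoder.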


research
09/01/2019

Incidental Supervision from Question-Answering Signals

Human annotations are costly for many natural language processing (NLP) ...
research
12/01/2020

Meta-Embeddings for Natural Language Inference and Semantic Similarity tasks

Word Representations form the core component for almost all advanced Nat...
research
12/19/2022

Multi-View Knowledge Distillation from Crowd Annotations for Out-of-Domain Generalization

Selecting an effective training signal for tasks in natural language pro...
research
04/25/2023

Lessons Learned from a Citizen Science Project for Natural Language Processing

Many Natural Language Processing (NLP) systems use annotated corpora for...
research
12/17/2020

BERT Goes Shopping: Comparing Distributional Models for Product Representations

Word embeddings (e.g., word2vec) have been applied successfully to eComm...
research
06/20/2023

The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics

Many NLP tasks exhibit human label variation, where different annotators...
research
12/14/2021

Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

Labelled data is the foundation of most natural language processing task...
