Multi-View Knowledge Distillation from Crowd Annotations for Out-of-Domain Generalization

12/19/2022
by   Dustin Wright, et al.

Selecting an effective training signal for tasks in natural language processing is difficult: collecting expert annotations is expensive, and crowd-sourced annotations may not be reliable. At the same time, recent work in machine learning has demonstrated that learning from soft labels acquired from crowd annotations can be effective, especially when there is distribution shift in the test set. However, the best method for acquiring these soft labels is inconsistent across tasks. This paper proposes new methods for acquiring soft labels from crowd annotations by aggregating the distributions produced by existing methods. In particular, we propose to find a distribution over classes by learning from multiple views of crowd annotations via temperature scaling and by finding the Jensen-Shannon centroid of their distributions. We demonstrate that these aggregation methods lead to the best or near-best performance across four NLP tasks on out-of-domain test sets, mitigating fluctuations in performance that arise when the constituent methods are used on their own. Additionally, these methods yield the best or near-best uncertainty estimation across tasks. We argue that aggregating different views of crowd annotations into soft labels is an effective way to ensure performance at least as good as the best individual view, which is useful given the inconsistent performance of the individual methods.
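To make the two aggregation ideas concrete, below is a minimal sketch, not drawn from the paper's code: the helper names (`js_centroid`, `temperature_aggregate`) and the toy `views` data are illustrative assumptions, and the centroid is found by generic numerical optimization rather than any method the paper specifies. Each "view" is the soft-label distribution that one existing crowd-aggregation method produces for the same item.

```python
# Sketch only: aggregates several soft-label "views" over C classes into one
# distribution. Names and the optimization approach are assumptions, not the
# paper's implementation.
import numpy as np
from scipy.optimize import minimize
from scipy.special import softmax, rel_entr

def js_divergence(p, q):
    """Jensen-Shannon divergence between two categorical distributions."""
    m = 0.5 * (p + q)
    return 0.5 * rel_entr(p, m).sum() + 0.5 * rel_entr(q, m).sum()

def js_centroid(views):
    """Distribution minimizing the average JS divergence to all views.
    No general closed form exists, so optimize over softmax logits."""
    views = np.asarray(views)
    def objective(logits):
        q = softmax(logits)
        return np.mean([js_divergence(p, q) for p in views])
    # Start from the (log of the) plain average of the views.
    result = minimize(objective, x0=np.log(views.mean(axis=0) + 1e-12))
    return softmax(result.x)

def temperature_aggregate(views, temperature=2.0):
    """Average the views after smoothing each with a temperature."""
    views = np.asarray(views)
    scaled = softmax(np.log(views + 1e-12) / temperature, axis=-1)
    return scaled.mean(axis=0)

# Example: three soft-label views over three classes for a single item.
views = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.6, 0.1, 0.3]]
print(js_centroid(views))            # Jensen-Shannon centroid aggregation
print(temperature_aggregate(views))  # temperature-scaled averaging
```

The JS centroid rewards agreement between views while staying well defined when a view assigns zero probability to a class, which is one reason the Jensen-Shannon divergence is a natural choice over KL here.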

Related research

12/15/2021
Expert and Crowd-Guided Affect Annotation and Prediction
We employ crowdsourcing to acquire time-continuous affective annotations...

02/28/2023
Training sound event detection with soft labels from crowdsourced annotations
In this paper, we study the use of soft labels to train a system for sou...

08/02/2019
Self-Knowledge Distillation in Natural Language Processing
Since deep learning became a key player in natural language processing (...

05/24/2023
You Are What You Annotate: Towards Better Models through Annotator Representations
Annotator disagreement is ubiquitous in natural language processing (NLP...

05/05/2020
CODA-19: Reliably Annotating Research Aspects on 10,000+ CORD-19 Abstracts Using Non-Expert Crowd
This paper introduces CODA-19, a human-annotated dataset that denotes th...

07/02/2022
Eliciting and Learning with Soft Labels from Every Annotator
The labels used to train machine learning (ML) models are of paramount i...

10/14/2021
Practical Benefits of Feature Feedback Under Distribution Shift
In attempts to develop sample-efficient algorithms, researchers have expl...
