Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction

05/19/2019
by Yinfei Yang, et al.

Modern NLP systems require high-quality annotated data. In specialized domains, expert annotations may be prohibitively expensive. An alternative is to rely on crowdsourcing to reduce costs at the risk of introducing noise. In this paper we demonstrate that directly modeling instance difficulty can be used to improve model performance, and to route instances to appropriate annotators. Our difficulty prediction model combines two learned representations: a "universal" encoder trained on out-of-domain data, and a task-specific encoder. Experiments on a complex biomedical information extraction task using expert and lay annotators show that: (i) simply excluding from the training data instances predicted to be difficult yields a small boost in performance; (ii) using difficulty scores to weight instances during training provides further, consistent gains; (iii) assigning instances predicted to be difficult to domain experts is an effective strategy for task routing. Our experiments confirm the expectation that for specialized tasks expert annotations are higher quality than crowd labels, and hence preferable to obtain if practical. Moreover, augmenting small amounts of expert data with a larger set of lay annotations leads to further improvements in model performance.
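The three strategies in the abstract — filtering out hard instances, down-weighting them during training, and routing them to experts — can be sketched as simple post-processing over predicted difficulty scores. This is a minimal illustration, not the paper's implementation: the scores here are stand-ins for the output of the learned difficulty model (the combined universal and task-specific encoders), and the threshold of 0.7 is an arbitrary assumption for demonstration.

```python
def route_instances(difficulties, threshold=0.7):
    """Strategy (iii): send instances predicted to be difficult to
    domain experts, the rest to crowd (lay) annotators.
    Returns (expert_indices, crowd_indices)."""
    expert, crowd = [], []
    for i, d in enumerate(difficulties):
        (expert if d >= threshold else crowd).append(i)
    return expert, crowd


def difficulty_weights(difficulties, mode="weight", threshold=0.7):
    """Strategies (i) and (ii): turn difficulty scores in [0, 1]
    into per-instance training weights.

    mode="filter": drop predicted-difficult instances (weight 0.0);
    mode="weight": soft down-weighting, weight = 1 - difficulty.
    """
    if mode == "filter":
        return [0.0 if d >= threshold else 1.0 for d in difficulties]
    return [1.0 - d for d in difficulties]


# Hypothetical difficulty scores for five instances.
scores = [0.1, 0.9, 0.4, 0.8, 0.2]
expert_batch, crowd_batch = route_instances(scores)
weights = difficulty_weights(scores, mode="filter")
```

In a real training loop, the weights would multiply each instance's loss term (or feed a weighted sampler), so easy instances dominate the gradient while predicted-difficult ones are suppressed or excluded.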


