Practical Annotation Strategies for Question Answering Datasets

03/06/2020
by Bernhard Kratzwald, et al.

Annotating datasets for question answering (QA) tasks is very costly, as it requires intensive manual labor and often domain-specific knowledge. Yet strategies for annotating QA datasets in a cost-effective manner are scarce. To provide a remedy for practitioners, our objective is to develop heuristic rules for annotating a subset of questions, so that the annotation cost is reduced while maintaining both in- and out-of-domain performance. For this, we conduct a large-scale analysis in order to derive practical recommendations. First, we demonstrate experimentally that additional training samples often contribute only to higher in-domain test-set performance, but do not help the model generalize to unseen datasets. Second, we develop a model-guided annotation strategy: it recommends which subset of samples should be annotated. Its effectiveness is demonstrated in a case study on customizing QA to a clinical domain. Here, remarkably, annotating a stratified subset with only 1.2% of the original samples achieves almost the same performance as if the complete dataset was annotated. Hence, the labeling effort can be reduced immensely. Altogether, our work meets a practical demand: when labeling budgets are limited, it offers recommendations for annotating QA datasets more cost-effectively.
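The abstract does not spell out how the stratified subset is formed, but the idea of model-guided, budget-constrained selection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name select_annotation_subset, the choice of model confidence as the stratification signal, and all parameters are assumptions made for this sketch.

```python
import numpy as np

def select_annotation_subset(questions, model_scores, budget, n_strata=10, seed=0):
    """Choose a stratified subset of unlabeled questions to annotate.

    Questions are binned into strata by a model-derived score (here: the QA
    model's answer confidence) and the annotation budget is spread evenly
    across strata, so the labeled subset covers easy and hard questions alike.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(model_scores)

    # Quantile edges split the pool into n_strata equally sized bins.
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_strata + 1))
    strata = np.clip(np.digitize(scores, edges[1:-1]), 0, n_strata - 1)

    per_stratum = budget // n_strata
    selected = []
    for s in range(n_strata):
        members = np.flatnonzero(strata == s)
        take = min(per_stratum, members.size)
        selected.extend(rng.choice(members, size=take, replace=False))
    return [questions[i] for i in selected]

# Example: label only a small fraction of a 10,000-question pool.
pool = [f"question {i}" for i in range(10_000)]
confidences = np.random.rand(10_000)   # stand-in for real model scores
to_annotate = select_annotation_subset(pool, confidences, budget=120)
print(len(to_annotate))                # 120 questions, ~12 per stratum
```

Spreading the budget evenly across score strata is one plausible reading of "stratified subset"; other signals (question type, document source) could serve as strata just as well.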


Related research

10/07/2020  Learning a Cost-Effective Annotation Policy for Question Answering
State-of-the-art question answering (QA) relies upon large amounts of tr...

07/21/2016  Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering
While question answering (QA) with neural network, i.e. neural QA, has a...

07/10/2020  Not Your Grandfather's Test Set: Reducing Labeling Effort for Testing
Building and maintaining high-quality test sets remains a laborious and ...

12/17/2022  Improving Question Answering Performance through Manual Annotation: Costs, Benefits and Strategies
Recently proposed systems for open-domain question answering (OpenQA) re...

04/02/2018  Simple and Effective Semi-Supervised Question Answering
Recent success of deep learning models for the task of extractive Questi...

08/30/2023  Knowing Your Annotator: Rapidly Testing the Reliability of Affect Annotation
The laborious and costly nature of affect annotation is a key detrimenta...

08/23/2021  Analyzing the Granularity and Cost of Annotation in Clinical Sequence Labeling
Well-annotated datasets, as shown in recent top studies, are becoming mo...
