Improving Low-Resource Question Answering using Active Learning in Multiple Stages

by Maximilian Schmidt, et al.

Neural approaches have become very popular in Question Answering; however, they require a large amount of annotated data. Furthermore, they often perform very well, but only in the domain they were trained on. In this work, we propose a novel approach that combines data augmentation via question-answer generation with Active Learning to improve performance in low-resource settings, where the target domains are diverse in terms of difficulty and similarity to the source domain. We also investigate Active Learning for question answering at different stages, reducing the overall human annotation effort. For this purpose, we consider target domains in realistic settings, with an extremely small number of annotated samples but many unlabeled documents, which we assume can be obtained with little effort. Additionally, we assume that a sufficient amount of labeled data from the source domain is available. We perform extensive experiments to find the best setup for incorporating domain experts. Our findings show that our novel approach, in which humans are incorporated as early as possible in the process, boosts performance in the low-resource, domain-specific setting, enabling question answering systems with low labeling effort in new, specialized domains. They further demonstrate how human annotation affects QA performance depending on the stage at which it is performed.




Cascading Adaptors to Leverage English Data to Improve Performance of Question Answering for Low-Resource Languages

Transformer based architectures have shown notable results on many down ...

Active Learning Over Multiple Domains in Natural Language Tasks

Studies of active learning traditionally assume the target and source da...

Improving Vietnamese Legal Question–Answering System based on Automatic Data Enrichment

Question answering (QA) in law is a challenging problem because legal do...

Can You Label Less by Using Out-of-Domain Data? Active Transfer Learning with Few-shot Instructions

Labeling social-media data for custom dimensions of toxicity and social ...

Few-shot Unified Question Answering: Tuning Models or Prompts?

Question-answering (QA) tasks often investigate specific question types,...

Low-Resource Dense Retrieval for Open-Domain Question Answering: A Comprehensive Survey

Dense retrieval (DR) approaches based on powerful pre-trained language m...

Simple and Effective Semi-Supervised Question Answering

Recent success of deep learning models for the task of extractive Questi...
