Data Augmentation for Biomedical Factoid Question Answering

04/10/2022
by Dimitris Pappas, et al.

We study the effect of seven data augmentation (DA) methods in factoid question answering, focusing on the biomedical domain, where obtaining training instances is particularly difficult. We experiment with data from the BioASQ challenge, which we augment with training instances obtained from an artificial biomedical machine reading comprehension dataset, or via back-translation, information retrieval, word substitution (based either on word2vec embeddings or on masked language modeling), question generation, or extending the given passage with additional context. We show that DA can lead to very significant performance gains, even when using large pre-trained Transformers, contributing to a broader discussion of whether and when DA benefits large pre-trained models. One of the simplest DA methods, word2vec-based word substitution, performed best and is recommended. We release our artificial training instances and code.
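To make the idea behind word2vec-based word substitution concrete, here is a minimal sketch, not the authors' released code. It assumes a toy 3-dimensional embedding table in place of real pre-trained word2vec vectors, and the substitution probability `p`, the top-`k` neighbour choice, and the rule of never touching tokens in the answer span are illustrative assumptions rather than details taken from the paper. The function names `nearest_neighbors` and `substitute_words` are hypothetical.

```python
import math
import random

# ASSUMPTION: toy vectors standing in for pre-trained (e.g. biomedical) word2vec
# embeddings; a real setup would load learned vectors instead.
EMBEDDINGS = {
    "disease": [0.9, 0.1, 0.0],
    "disorder": [0.85, 0.15, 0.05],
    "illness": [0.8, 0.2, 0.1],
    "gene": [0.0, 0.9, 0.1],
    "protein": [0.1, 0.85, 0.2],
    "enzyme": [0.05, 0.8, 0.15],
    "causes": [0.1, 0.1, 0.9],
    "induces": [0.15, 0.05, 0.85],
    "triggers": [0.05, 0.15, 0.88],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, k):
    """Return the k vocabulary words most similar to `word` (excluding itself)."""
    if word not in EMBEDDINGS:
        return []
    target = EMBEDDINGS[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


def substitute_words(tokens, protected, p=0.3, k=2, seed=0):
    """With probability p, replace each in-vocabulary token with one of its k
    nearest neighbours. Token indices in `protected` (ASSUMPTION: the answer
    span) are never changed, so the answer remains extractable."""
    rng = random.Random(seed)
    out = []
    for i, tok in enumerate(tokens):
        neighbors = nearest_neighbors(tok.lower(), k)
        if i not in protected and neighbors and rng.random() < p:
            out.append(rng.choice(neighbors))
        else:
            out.append(tok)  # out-of-vocabulary, protected, or not sampled
    return out


if __name__ == "__main__":
    passage = ["The", "CFTR", "gene", "causes", "the", "disease"]
    # Index 1 ("CFTR") plays the role of the gold answer span.
    print(" ".join(substitute_words(passage, protected={1}, p=1.0)))
```

Each call produces one new (passage, answer) training instance; drawing several seeds yields several variants of the same original instance. Protecting the answer tokens is a design choice made here so that the extractive answer span stays valid after augmentation.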

12/04/2019

An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering

To produce a domain-agnostic question answering model for the Machine Re...
10/04/2020

Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space

In this paper, we propose a novel data augmentation method, referred to ...
09/18/2019

Pre-trained Language Model for Biomedical Question Answering

The recent success of question answering systems is largely attributed t...
01/09/2018

Biomedical Question Answering via Weighted Neural Network Passage Retrieval

The amount of publicly available biomedical literature has been growing ...
10/15/2021

Tracing Origins: Coref-aware Machine Reading Comprehension

Machine reading comprehension is a heavily-studied research and test fie...
06/02/2021

Knowing More About Questions Can Help: Improving Calibration in Question Answering

We study calibration in question answering, estimating whether model cor...
11/27/2019

Evaluating Commonsense in Pre-trained Language Models

Contextualized representations trained over large raw text data have giv...
