Unsupervised Pre-training for Biomedical Question Answering

by Vaishnavi Kommaraju, et al.

We explore the suitability of unsupervised representation learning methods on biomedical text (BioBERT, SciBERT, and BioSentVec) for biomedical question answering. To further improve unsupervised representations for biomedical QA, we introduce a new pre-training task on unlabeled data, designed to make the model reason about biomedical entities in context. Our pre-training method corrupts a given context by replacing a randomly chosen mention of a biomedical entity with a random entity mention, then queries the model with the correct entity mention so that it must locate the corrupted part of the context. This de-noising task enables the model to learn good representations from abundant, unlabeled biomedical text that help QA tasks. Because it requires the model to predict spans, it also minimizes the train-test mismatch between the pre-training task and the downstream QA tasks. Our experiments show that pre-training BioBERT on the proposed task significantly boosts performance and outperforms the previous best model from the 7th BioASQ Task 7b-Phase B challenge.
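The corruption step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `CorruptionExample` and `make_corruption_example` are hypothetical, entity mention spans are assumed to come from some external entity tagger, and the replacement is drawn uniformly from a supplied list of entity mentions. The details of how entities are detected and sampled may differ from what is shown here.

```python
import random
from typing import List, NamedTuple, Tuple


class CorruptionExample(NamedTuple):
    """One span-prediction example (hypothetical structure)."""
    query: str         # the correct entity mention that was removed
    context: str       # the corrupted context
    answer_start: int  # character offset where the corrupted span starts
    answer_end: int    # character offset where the corrupted span ends (exclusive)


def make_corruption_example(
    context: str,
    mentions: List[Tuple[int, int]],
    entity_pool: List[str],
    rng: random.Random,
) -> CorruptionExample:
    """Replace one entity mention with a random one and record where it went.

    `mentions` holds (start, end) character spans of entity mentions in
    `context`; they are assumed to come from an external tagger.
    """
    start, end = rng.choice(mentions)
    original = context[start:end]

    # Assumption: the replacement must differ from the original mention.
    candidates = [e for e in entity_pool if e != original]
    if not candidates:
        raise ValueError("entity_pool has no mention different from the original")
    replacement = rng.choice(candidates)

    corrupted = context[:start] + replacement + context[end:]
    # The model is queried with the correct mention and must predict
    # the span that now holds the wrong one.
    return CorruptionExample(original, corrupted, start, start + len(replacement))


# Usage with toy data (spans located with str.find for illustration only).
text = "Aspirin inhibits cyclooxygenase and reduces fever."
spans = []
for mention in ("Aspirin", "cyclooxygenase"):
    offset = text.find(mention)
    spans.append((offset, offset + len(mention)))

pool = ["Aspirin", "cyclooxygenase", "insulin", "hemoglobin"]
example = make_corruption_example(text, spans, pool, random.Random(0))
print(example)
```

Casting the task as span prediction over the corrupted context gives it the same input and output shape as extractive QA (a query plus a context, with a start and end offset as the target), which is the alignment with downstream QA that the abstract points to.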



