Constructing Artificial Data for Fine-tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation

10/21/2019
by Gaurav Singh, et al.

Biomedical text tagging systems are plagued by the dearth of labeled training data. There have been recent attempts at using pre-trained encoders to deal with this issue. A pre-trained encoder provides a representation of the input text, which is then fed to task-specific layers for classification, and the entire network is fine-tuned on the labeled data from the target task. Unfortunately, a low-resource biomedical task often has too few labeled instances for satisfactory fine-tuning. Moreover, if the label space is large, it contains few or no labeled instances for the majority of labels. Most biomedical tagging systems treat labels as indexes, ignoring the fact that these labels are often concepts expressed in natural language, e.g. "Appearance of lesion on brain imaging". To address these issues, we propose constructing extra labeled instances using the label-text (i.e. the label's name) as input for the corresponding label-index (i.e. the label's index). In fact, we propose a number of strategies for manufacturing multiple artificial labeled instances from a single label. The network is then fine-tuned on a combination of real and newly constructed artificial labeled instances. We evaluate the proposed approach on an important low-resource biomedical task called PICO annotation, which requires tagging raw text describing clinical trials with labels corresponding to different aspects of the trial, i.e. its PICO (Population, Intervention/Control, Outcome) characteristics. Our empirical results show that the proposed method achieves a new state-of-the-art performance for PICO annotation, with significant improvements over competitive baselines.
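The core idea of the abstract can be sketched in a few lines: treat each label's name as an artificial input example for its own label index, and mix these artificial pairs with the real labeled data before fine-tuning. The sketch below is illustrative only; the label names, templates, and function names are hypothetical and not taken from the paper, which describes several such construction strategies.

```python
# Hypothetical label space: index -> natural-language label name.
labels = {
    0: "Appearance of lesion on brain imaging",
    1: "Population age over 65",
}

def label_text_instances(labels):
    """Simplest strategy: the label's own name becomes a training input
    for its label index."""
    return [(text, idx) for idx, text in labels.items()]

def templated_instances(labels, templates):
    """One way to manufacture multiple artificial instances per label:
    wrap the label name in simple templates (templates are invented
    here for illustration)."""
    return [(t.format(text), idx)
            for idx, text in labels.items()
            for t in templates]

real_data = [("Patients were screened with MRI scans.", 0)]
templates = ["{}", "This trial measures {}."]

# Fine-tuning would then run on the combined real + artificial set.
train_data = (real_data
              + label_text_instances(labels)
              + templated_instances(labels, templates))
```

Each artificial pair simply reuses the label's name as input text, so no extra annotation effort is required; richer strategies (e.g. paraphrasing the label name) would slot into the same pipeline.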

