BioAug: Conditional Generation based Data Augmentation for Low-Resource Biomedical NER

05/18/2023
by Sreyan Ghosh, et al.

Biomedical Named Entity Recognition (BioNER) is the fundamental task of identifying named entities in biomedical text. However, BioNER suffers from severe data scarcity and lacks high-quality labeled data because annotation requires highly specialized expert knowledge. Though data augmentation has been shown to be highly effective for low-resource NER in general, existing data augmentation techniques fail to produce factual and diverse augmentations for BioNER. In this paper, we present BioAug, a novel data augmentation framework for low-resource BioNER. BioAug, built on BART, is trained to solve a novel text reconstruction task based on selective masking and knowledge augmentation. After training, we perform conditional generation and produce diverse augmentations by conditioning BioAug on selectively corrupted text, as in the training stage. We demonstrate the effectiveness of BioAug on 5 benchmark BioNER datasets and show that BioAug outperforms all our baselines by a significant margin (1.5 ...) and generates augmentations that are both more factual and diverse. Code: https://github.com/Sreyan88/BioAug.
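
The selective-masking step can be illustrated with a minimal sketch. The abstract does not say which tokens are masked, so the following assumes that entity tokens (any BIO tag other than `O`) are always kept, that other tokens are kept with probability `keep_prob`, and that runs of dropped tokens collapse into a single `<mask>`, the mask token used by BART tokenizers. The function name, parameters, and example sentence are hypothetical, and the knowledge-augmentation step is omitted.

```python
import random

MASK = "<mask>"  # BART's mask token


def selectively_corrupt(tokens, tags, keep_prob=0.3, seed=0):
    """Corrupt a tagged sentence while preserving its entity tokens.

    Assumption (not from the abstract): tokens tagged anything other than
    "O" are always kept; every other token is kept with probability
    keep_prob, and consecutive dropped tokens become one mask token.
    """
    rng = random.Random(seed)  # seeded for reproducible corruption
    out = []
    for token, tag in zip(tokens, tags):
        if tag != "O" or rng.random() < keep_prob:
            out.append(token)
        elif not out or out[-1] != MASK:
            out.append(MASK)  # start a new masked span
        # otherwise the token joins the masked span already open
    return " ".join(out)


# Hypothetical example sentence with BIO tags.
tokens = ["Aspirin", "reduces", "the", "risk", "of", "stroke", "."]
tags = ["B-Chemical", "O", "O", "O", "O", "B-Disease", "O"]
corrupted = selectively_corrupt(tokens, tags)
```

In a setup like the one the abstract describes, a string such as `corrupted` would be the input a BART-style model learns to reconstruct during training and is conditioned on at generation time; varying `seed` or `keep_prob` yields different corruptions of the same sentence.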

research
06/01/2023

ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER

Complex Named Entity Recognition (NER) is the task of detecting linguist...
research
08/26/2021

Data Augmentation for Low-Resource Named Entity Recognition Using Backtranslation

The state of art natural language processing systems relies on sizable t...
research
11/18/2022

GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation

We introduce GENIUS: a conditional text generation model using sketches ...
research
05/19/2023

Enhancing Few-shot NER with Prompt Ordering based Data Augmentation

Recently, data augmentation (DA) methods have been proven to be effectiv...
research
10/04/2020

Local Additivity Based Data Augmentation for Semi-supervised NER

Named Entity Recognition (NER) is one of the first stages in deep langua...
research
10/14/2022

Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition

In this work, we take the named entity recognition task in the English l...
research
11/03/2020

DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks

Data augmentation techniques have been widely used to improve machine le...
