Recognising Biomedical Names: Challenges and Solutions

06/23/2021
by   Xiang Dai, et al.
0

The growth rate in the amount of biomedical documents is staggering. Unlocking information trapped in these documents can enable researchers and practitioners to operate confidently in the information world. Biomedical NER, the task of recognising biomedical names, is usually employed as the first step of the NLP pipeline. Standard NER models, based on sequence tagging technique, are good at recognising short entity mentions in the generic domain. However, there are several open challenges of applying these models to recognise biomedical names: 1) Biomedical names may contain complex inner structure (discontinuity and overlapping) which cannot be recognised using standard sequence tagging technique; 2) The training of NER models usually requires large amount of labelled data, which are difficult to obtain in the biomedical domain; and, 3) Commonly used language representation models are pre-trained on generic data; a domain shift therefore exists between these models and target biomedical data. To deal with these challenges, we explore several research directions and make the following contributions: 1) we propose a transition-based NER model which can recognise discontinuous mentions; 2) We develop a cost-effective approach that nominates the suitable pre-training data; and, 3) We design several data augmentation methods for NER. Our contributions have obvious practical implications, especially when new biomedical applications are needed. Our proposed data augmentation methods can help the NER model achieve decent performance, requiring only a small amount of labelled data. Our investigation regarding selecting pre-training data can improve the model by incorporating language representation models, which are pre-trained using in-domain data. Finally, our proposed transition-based NER model can further improve the performance by recognising discontinuous mentions.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/28/2020

An Effective Transition-based Model for Discontinuous NER

Unlike widely used Named Entity Recognition (NER) data sets in generic d...
research
08/16/2023

BIOptimus: Pre-training an Optimal Biomedical Language Model with Curriculum Learning for Named Entity Recognition

Using language models (LMs) pre-trained in a self-supervised setting on ...
research
06/30/2023

Biomedical Language Models are Robust to Sub-optimal Tokenization

As opposed to general English, many concepts in biomedical terminology h...
research
05/29/2023

Extrinsic Factors Affecting the Accuracy of Biomedical NER

Biomedical named entity recognition (NER) is a critial task that aims to...
research
04/11/2022

Generative Biomedical Entity Linking via Knowledge Base-Guided Pre-training and Synonyms-Aware Fine-tuning

Entities lie in the heart of biomedical natural language understanding, ...
research
04/21/2022

TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation and ensemble to recognize complex Named Entities in Bangla

Many areas, such as the biological and healthcare domain, artistic works...
research
07/16/2019

MedCATTrainer: A Biomedical Free Text Annotation Interface with Active Learning and Research Use Case Specific Customisation

We present MedCATTrainer an interface for building, improving and custom...

Please sign up or login with your details

Forgot password? Click here to reset