UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition

07/20/2023
by Aidan Mannion, et al.

Pre-trained transformer language models (LMs) have in recent years become the dominant paradigm in applied NLP, achieving state-of-the-art performance on tasks such as information extraction, question answering, sentiment analysis, and document classification. In the biomedical domain, significant progress has been made in adapting this paradigm to NLP tasks that require the integration of domain-specific knowledge as well as statistical modelling of language. In particular, research in this area has focused on how best to construct LMs that take into account not only the patterns of token distribution in medical text, but also the wealth of structured information contained in terminology resources such as the Unified Medical Language System (UMLS). This work contributes a data-centric paradigm for enriching the language representations of biomedical transformer-encoder LMs by extracting text sequences from the UMLS, allowing graph-based learning objectives to be combined with masked-language pre-training. Preliminary results from experiments, both extending pre-trained LMs and training from scratch, show that this framework improves downstream performance on multiple biomedical and clinical Named Entity Recognition (NER) tasks.
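The core idea of the data-centric approach described above is to flatten knowledge-graph structure into plain text sequences that a standard masked-language model can consume. The sketch below illustrates one plausible way to do this: verbalising (head, relation, tail) triples into token sequences, masking tokens for an MLM objective, and generating corrupted triples for a binary link-prediction objective. The triples, relation names, and batch format are illustrative assumptions, not the paper's actual data or API.

```python
import random

# Hypothetical UMLS-style triples (head concept, relation, tail concept).
# These names and relation labels are illustrative only.
TRIPLES = [
    ("aspirin", "may_treat", "headache"),
    ("insulin", "may_treat", "diabetes mellitus"),
    ("myocardial infarction", "has_finding_site", "heart"),
]

def verbalise(triple):
    """Flatten a knowledge-graph triple into a text sequence so graph
    structure can be fed to a standard transformer-encoder LM."""
    head, relation, tail = triple
    return f"{head} [SEP] {relation.replace('_', ' ')} [SEP] {tail}"

def mask_tokens(tokens, mask_prob=0.3, rng=None):
    """Randomly replace tokens with [MASK] for the MLM objective.
    Returns (masked tokens, labels); labels hold the original token
    at masked positions and None elsewhere."""
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if tok != "[SEP]" and rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

def build_batch(triples, rng=None):
    """Build a mixed pre-training batch: one MLM example per triple,
    plus a positive and a corrupted (negative) link-prediction example."""
    rng = rng or random.Random(0)
    batch = []
    for head, relation, tail in triples:
        tokens = verbalise((head, relation, tail)).split()
        masked, labels = mask_tokens(tokens, rng=rng)
        batch.append({"task": "mlm", "input": masked, "labels": labels})
        # Negative sample: swap in a tail entity from another triple.
        wrong_tail = rng.choice([t for _, _, t in triples if t != tail])
        batch.append({"task": "link_prediction",
                      "input": verbalise((head, relation, wrong_tail)).split(),
                      "labels": 0})
        batch.append({"task": "link_prediction",
                      "input": tokens, "labels": 1})
    return batch
```

Because both objectives operate on ordinary token sequences, a single encoder with two lightweight heads (token-level MLM head, sequence-level classification head) can be trained on the combined batch, which is what makes the integration "data-centric" rather than requiring architectural changes.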

