UBERT: A Novel Language Model for Synonymy Prediction at Scale in the UMLS Metathesaurus

04/27/2022
by Thilini Wijesiriwardene, et al.

The UMLS Metathesaurus integrates more than 200 biomedical source vocabularies. During the Metathesaurus construction process, synonymous terms are clustered into concepts by human editors, assisted by lexical similarity algorithms. This process is error-prone and time-consuming. Recently, a deep learning model (LexLM) has been developed for the UMLS Vocabulary Alignment (UVA) task. This work introduces UBERT, a BERT-based language model pretrained on UMLS terms via a supervised Synonymy Prediction (SP) task that replaces the original Next Sentence Prediction (NSP) task. The effectiveness of UBERT for the UMLS Metathesaurus construction process is evaluated using the UVA task. We show that UBERT outperforms LexLM, as well as biomedical BERT-based models. Key to the performance of UBERT are the synonymy prediction task specifically developed for UBERT, the tight alignment of the training data to the UVA task, and the similarity of the models used for pretraining UBERT.
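The abstract describes replacing BERT's Next Sentence Prediction objective with a supervised Synonymy Prediction task over UMLS term pairs. A minimal sketch of how such term pairs might be encoded as BERT-style sequence-pair inputs with a binary SP label is shown below; the function name, helper logic, and example terms are illustrative assumptions, not the authors' actual implementation.

```python
def encode_sp_pair(term_a, term_b, is_synonym):
    """Format a UMLS term pair as a BERT-style sequence pair with a
    binary Synonymy Prediction label (1 = synonymous, 0 = not).

    This mirrors BERT's NSP input format ([CLS] A [SEP] B [SEP]),
    but the label now encodes synonymy rather than sentence adjacency.
    """
    text = f"[CLS] {term_a} [SEP] {term_b} [SEP]"
    tokens = text.split()
    # Segment ids distinguish the two terms, as in BERT's sentence pairs:
    # everything up to and including the first [SEP] is segment 0.
    first_sep = tokens.index("[SEP]")
    segment_ids = [0] * (first_sep + 1) + [1] * (len(tokens) - first_sep - 1)
    return {"text": text, "segment_ids": segment_ids, "label": int(is_synonym)}


# Example: a synonymous pair drawn from everyday clinical terminology.
example = encode_sp_pair("myocardial infarction", "heart attack", True)
```

In an actual pretraining setup, such encoded pairs would be tokenized with a WordPiece vocabulary and fed to the model, with the [CLS] representation driving the binary SP classification head.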
