Hierarchical Pretraining for Biomedical Term Embeddings

07/01/2023
by Bryan Cai, et al.

Electronic health records (EHRs) contain narrative notes that provide extensive detail on the medical conditions and management of patients. Natural language processing (NLP) of clinical notes can use observed frequencies of clinical terms as predictive features for downstream applications such as clinical decision making and patient trajectory prediction. However, because of the vast number of highly similar and related clinical concepts, a more effective modeling strategy is to represent clinical terms as semantic embeddings via representation learning and use the low-dimensional embeddings as feature vectors for predictive modeling. To achieve efficient representations, fine-tuning pretrained language models with biomedical knowledge graphs may generate better embeddings for biomedical terms than standard language models alone. These embeddings can effectively discriminate synonymous pairs of terms from those that are unrelated. However, they often fail to capture different degrees of similarity or relatedness for concepts that are hierarchical in nature. To overcome this limitation, we propose HiPrBERT, a novel biomedical term representation model trained on additionally compiled data that contains hierarchical structures for various biomedical terms. We modify an existing contrastive loss function to extract information from these hierarchies. Our numerical experiments demonstrate that HiPrBERT effectively learns pairwise distances from hierarchical information, resulting in substantially more informative embeddings for further biomedical applications.
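The abstract does not spell out how the contrastive loss is modified. As a rough illustration only, the PyTorch sketch below shows one way a contrastive objective can be weighted by hierarchy distance, so that synonyms (distance 0) are pulled together most strongly and increasingly distant relatives less so. The function name, the linear weighting scheme, and the hyperparameters are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def hierarchical_contrastive_loss(embeddings, tree_dist,
                                  temperature=0.05, max_dist=4):
    """Hypothetical hierarchy-weighted contrastive loss (a sketch,
    not the HiPrBERT implementation).

    embeddings: (N, d) term embeddings from a BERT-style encoder.
    tree_dist:  (N, N) hop distances between terms in an ontology
                (0 = synonym; larger = more distantly related).
    """
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                       # scaled cosine similarity
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)

    # Pairs closer in the hierarchy get larger positive weight; pairs
    # at or beyond max_dist contribute only as negatives (weight 0).
    weights = (1.0 - tree_dist.float() / max_dist).clamp(min=0.0)
    weights = weights.masked_fill(self_mask, 0.0)     # ignore self-pairs

    # InfoNCE-style log-probability over all non-self candidates.
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    return -(weights * log_prob).sum() / weights.sum().clamp(min=1e-8)

# Toy usage: 8 terms with random embeddings and symmetrized hop distances.
emb = torch.randn(8, 768)
dist = torch.randint(0, 5, (8, 8))
dist = (dist + dist.T) // 2
loss = hierarchical_contrastive_loss(emb, dist)
```

If only distance-0 (synonym) pairs receive nonzero weight, this reduces to a standard InfoNCE objective over synonym pairs; the graded weights are what let the model encode different degrees of relatedness rather than a binary synonym/non-synonym distinction.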


research 12/15/2021
Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing
Motivation: A perennial challenge for biomedical researchers and clinica...

research 04/01/2022
Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations
Term clustering is important in biomedical knowledge graph construction....

research 12/06/2019
Med2Meta: Learning Representations of Medical Concepts with Meta-Embeddings
Distributed representations of medical concepts have been used to suppor...

research 08/26/2022
Extracting Biomedical Factual Knowledge Using Pretrained Language Model and Electronic Health Record Context
Language Models (LMs) have performed well on biomedical natural language...

research 09/06/2019
Improved Patient Classification with Language Model Pretraining Over Clinical Notes
Clinical notes in electronic health records contain highly heterogeneous...

research 03/11/2021
Does the Magic of BERT Apply to Medical Code Assignment? A Quantitative Study
Unsupervised pretraining is an integral part of many natural language pr...

research 08/11/2023
Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT
We hypothesize that large language models (LLMs) based on the transforme...
