Developing a general-purpose clinical language inference model from a large corpus of clinical notes

10/12/2022
by Madhumita Sushil, et al.

Several biomedical language models have already been developed for clinical language inference. However, these models typically use general vocabularies and are trained on relatively small clinical corpora. We sought to evaluate the impact of using a domain-specific vocabulary and a large clinical training corpus on the performance of these language models in clinical language inference. We trained a Bidirectional Encoder Representations from Transformers (BERT) model on a diverse corpus of 75 million deidentified clinical notes authored at the University of California, San Francisco (UCSF). We evaluated this model on several clinical language inference benchmark tasks: clinical and temporal concept recognition, relation extraction, and medical language inference. We also evaluated our model on two tasks using discharge summaries from UCSF: diagnostic code assignment and therapeutic class inference. Our model performs on par with the best publicly available biomedical language models of comparable size on the public benchmark tasks, and is significantly better than these models in a within-system evaluation on the two tasks using UCSF data. The use of in-domain vocabulary appears to improve the encoding of longer documents. The use of large clinical corpora appears to enhance document encoding and inferential accuracy. However, further research is needed to improve abbreviation resolution, and numerical, temporal, and implicitly causal inference.

research
09/01/2023

Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes

The development of large language models tailored for handling patients'...
research
08/04/2023

A Survey of Spanish Clinical Language Models

This survey focuses on encoder Language Models for solving tasks in the ...
research
01/29/2023

Large Language Models for Biomedical Causal Graph Construction

Automatic causal graph construction is of high importance in medical res...
research
08/11/2023

Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT

We hypothesize that large language models (LLMs) based on the transforme...
research
04/13/2020

Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction

Background: Identifying relationships between clinical events and tempor...
research
05/25/2021

Estimating Redundancy in Clinical Text

The current mode of use of Electronic Health Record (EHR) elicits text r...
research
08/02/2021

Self-supervised Answer Retrieval on Clinical Notes

Retrieving answer passages from long documents is a complex task requiri...
