Identifying and Extracting Rare Disease Phenotypes with Large Language Models

06/22/2023
by Cathy Shyr, et al.

Rare diseases (RDs) are collectively common and affect 300 million people worldwide. Accurate phenotyping is critical for informing diagnosis and treatment, but RD phenotypes are often embedded in unstructured text and are time-consuming to extract manually. While natural language processing (NLP) models can perform named entity recognition (NER) to automate extraction, a major bottleneck is the development of a large, annotated corpus for model training. Recently, prompt learning emerged as an NLP paradigm that can lead to more generalizable results with no labeled samples (zero-shot) or only a few (few-shot). Despite growing interest in ChatGPT, a revolutionary large language model capable of following complex human prompts and generating high-quality responses, no study has examined its NER performance for RDs in the zero- and few-shot settings. To this end, we engineered novel prompts aimed at extracting RD phenotypes and, to the best of our knowledge, are the first to establish a benchmark for evaluating ChatGPT's performance in these settings. We compared its performance to the traditional fine-tuning approach and conducted an in-depth error analysis. Overall, fine-tuning BioClinicalBERT resulted in higher performance (F1 of 0.689) than ChatGPT (F1 of 0.472 and 0.591 in the zero- and few-shot settings, respectively). Despite this, ChatGPT achieved similar or higher accuracy for certain entities (i.e., rare diseases and signs) in the one-shot setting (F1 of 0.776 and 0.725, respectively). This suggests that with appropriate prompt engineering, ChatGPT has the potential to match or outperform fine-tuned language models for certain entity types with just one labeled sample. While the proliferation of large language models may provide opportunities for supporting RD diagnosis and treatment, researchers and clinicians should critically evaluate model outputs and be well-informed of their limitations.
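
The F1 scores quoted above summarize how well extracted entities agree with gold annotations. As a rough illustration only, the sketch below computes entity-level precision, recall, and F1 under an exact-match criterion. The matching rule, the function name `entity_f1`, the entity labels, and the example data are all assumptions made for illustration; the abstract does not specify how the authors matched entities or computed their scores.

```python
from typing import Dict, Iterable, Set, Tuple

# An entity is represented as (entity_type, surface_text).
# This representation is an assumption made for the example.
Entity = Tuple[str, str]


def _normalize(entities: Iterable[Entity]) -> Set[Entity]:
    """Lowercase and strip surface text so trivial casing differences still match."""
    return {(etype, text.strip().lower()) for etype, text in entities}


def entity_f1(gold: Iterable[Entity], predicted: Iterable[Entity]) -> Dict[str, float]:
    """Exact-match entity-level precision, recall, and F1.

    A prediction counts as correct only if both its type and its
    normalized text equal those of a gold entity.
    """
    gold_set = _normalize(gold)
    pred_set = _normalize(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example data, not taken from the paper or any corpus.
gold = [
    ("rare_disease", "Marfan syndrome"),
    ("sign", "tall stature"),
    ("symptom", "chest pain"),
]
predicted = [
    ("rare_disease", "marfan syndrome"),  # matches after normalization
    ("sign", "tall stature"),             # exact match
    ("sign", "chest pain"),               # wrong type, counts as an error
]

scores = entity_f1(gold, predicted)
print(scores)
```

In this toy case two of the three predictions match, so precision, recall, and F1 all equal 2/3. A relaxed criterion (for example, counting overlapping spans as matches) would give different numbers, so scores are only comparable when the matching rule is the same.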
