Extracting detailed oncologic history and treatment plan from medical oncology notes with large language models

08/07/2023
by Madhumita Sushil, et al.

Both medical care and observational studies in oncology require a thorough understanding of a patient's disease progression and treatment history, which is often documented in elaborate detail in clinical notes. Despite the vital role of these notes, no current oncology information representation and annotation schema fully encapsulates the diversity of information recorded within them. Although large language models (LLMs) have recently exhibited impressive performance on various medical natural language processing tasks, the current lack of comprehensively annotated oncology datasets means that their ability to extract and reason with the complex rhetoric in oncology notes remains understudied. We developed a detailed schema for annotating textual oncology information, encompassing patient characteristics, tumor characteristics, tests, treatments, and temporality. Using a corpus of 10 de-identified breast cancer progress notes from the University of California, San Francisco, we applied this schema to assess the abilities of three recently released LLMs (GPT-4, GPT-3.5-turbo, and FLAN-UL2) to perform zero-shot extraction of detailed oncological history from two narrative sections of clinical progress notes. Our team annotated 2750 entities, 2874 modifiers, and 1623 relationships. The GPT-4 model exhibited the best overall performance, with an average BLEU score of 0.69, an average ROUGE score of 0.72, and an average accuracy of 67% (expert manual evaluation). Notably, it was proficient in tumor characteristic and medication extraction, and demonstrated superior performance in inferring symptoms due to cancer and considerations of future medications. The analysis demonstrates that GPT-4 is potentially already usable to extract important facts from cancer progress notes needed for clinical research, complex population management, and documenting quality patient care.
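To give a rough sense of what an overlap-based score such as ROUGE measures, the following is a minimal sketch of a unigram ROUGE-1 F1 computation between a reference annotation and a model output. It is not the authors' evaluation code: the abstract does not state which ROUGE variant or tokenization was used, so the whitespace tokenization, the lowercasing, the function name `rouge1_f1`, and the example strings are all assumptions made for illustration.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between two strings.

    Assumption: lowercased whitespace tokenization, no stemming.
    Published ROUGE implementations may tokenize differently.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0

    # Clipped overlap: each token counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example strings, not taken from the annotated corpus.
reference = "invasive ductal carcinoma of the left breast"
candidate = "ductal carcinoma left breast"
score = rouge1_f1(reference, candidate)
print(f"ROUGE-1 F1: {score:.3f}")
```

In this example every candidate token appears in the reference (precision 1.0), but the candidate covers only four of the seven reference tokens, so the F1 score falls below 1. Overlap scores of this kind reward lexical agreement and do not check clinical correctness, which is presumably why the abstract also reports accuracy from expert manual evaluation.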
