Beyond the Hype: Assessing the Performance, Trustworthiness, and Clinical Suitability of GPT3.5

06/28/2023
by Salmonn Talebi, et al.

The use of large language models (LLMs) in healthcare is gaining popularity, but their practicality and safety in clinical settings have not been thoroughly assessed. In high-stakes environments such as medicine, trust and safety are critical concerns for LLMs. To address these concerns, we present an approach to evaluate the performance and trustworthiness of a GPT3.5 model for medical image protocol assignment. We compare it with a fine-tuned BERT model and a radiologist. In addition, a radiologist reviews the GPT3.5 output to evaluate its decision-making process. Our evaluation dataset consists of 4,700 physician entries across 11 imaging protocol classes spanning the entire head. Our findings suggest that GPT3.5 performs worse than both BERT and a radiologist. However, GPT3.5 outperforms BERT in explaining its decisions, in detecting relevant word indicators, and in model calibration. Furthermore, by analyzing the explanations GPT3.5 gives for its misclassifications, we reveal systematic errors that need to be resolved to enhance its safety and suitability for clinical use.
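The abstract does not say how model calibration was measured. One widely used metric is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between confidence and accuracy in each bin. The sketch below is only an illustration under that assumption: the function name, the choice of 10 equal-width bins, and the toy inputs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Expected calibration error with equal-width confidence bins.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: whether each prediction matched the reference label.
    NOTE: this is an assumed metric; the paper's abstract does not
    specify how calibration was evaluated.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group predictions into equal-width bins by confidence.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    # Weighted average of |mean confidence - accuracy| over non-empty bins.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Toy data only: four protocol predictions with made-up confidences.
    toy_conf = [0.55, 0.55, 0.95, 0.95]
    toy_correct = [True, False, True, True]
    print(expected_calibration_error(toy_conf, toy_correct))
```

A lower value means the stated confidences track observed accuracy more closely. Comparing this number across two classifiers on the same evaluation set is one way to support a statement that one model is better calibrated than the other.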

