Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding

04/09/2023
by Yuqing Wang, et al.

Large language models (LLMs) have made significant progress in various domains, including healthcare. However, the specialized nature of clinical language understanding tasks presents unique challenges and limitations that warrant further investigation. In this study, we conduct a comprehensive evaluation of state-of-the-art LLMs, namely GPT-3.5, GPT-4, and Bard, within the realm of clinical language understanding tasks. These tasks span a diverse range, including named entity recognition, relation extraction, natural language inference, semantic textual similarity, document classification, and question-answering. We also introduce a novel prompting strategy, self-questioning prompting (SQP), tailored to enhance LLMs' performance by eliciting informative questions and answers pertinent to the clinical scenarios at hand. Our evaluation underscores the significance of task-specific learning strategies and prompting techniques for improving LLMs' effectiveness in healthcare-related tasks. Additionally, our in-depth error analysis on the challenging relation extraction task offers valuable insights into error distribution and potential avenues for improvement using SQP. Our study sheds light on the practical implications of employing LLMs in the specialized domain of healthcare, serving as a foundation for future research and the development of potential applications in healthcare settings.
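The self-questioning prompting (SQP) strategy asks the model to first generate and answer clarifying questions about the clinical input before producing its final answer. The paper's exact template is not reproduced here, so the following Python sketch is only an illustration of the general idea; the function name and prompt wording are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of self-questioning prompting (SQP).
# The template text below is illustrative, not the paper's actual prompt.

def build_sqp_prompt(task_description: str, clinical_input: str) -> str:
    """Compose a prompt that asks the model to pose and answer its own
    clarifying questions about the input before solving the task."""
    return (
        f"Task: {task_description}\n"
        f"Input: {clinical_input}\n\n"
        "Before answering, pose several questions about the key clinical "
        "concepts in the input, answer each of them, and then use those "
        "answers to produce the final response.\n"
        "Questions and answers:\n"
    )

# Example usage with a hypothetical relation extraction instance.
prompt = build_sqp_prompt(
    "Extract the relation between the drug and the adverse event.",
    "Patient developed a rash after starting amoxicillin.",
)
print(prompt)
```

The resulting string would then be sent to the LLM of choice (e.g. GPT-3.5, GPT-4, or Bard in the study's setting); the model's self-generated questions and answers serve as intermediate context for the final prediction.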


