Med-HALT: Medical Domain Hallucination Test for Large Language Models

07/28/2023
by Logesh Kumar Umapathi, et al.

This paper addresses the challenges posed by hallucinations in large language models (LLMs), particularly in the medical domain. Hallucination, in which a model generates plausible yet unverified or incorrect information, can have serious consequences in healthcare applications. We propose a new benchmark and dataset, Med-HALT (Medical Domain Hallucination Test), designed specifically to evaluate and reduce hallucinations. Med-HALT provides a diverse multinational dataset derived from medical examinations across various countries and includes multiple innovative testing modalities. It comprises two categories of tests: reasoning hallucination tests and memory-based hallucination tests, designed to assess LLMs' problem-solving and information retrieval abilities, respectively. Our study evaluated leading LLMs, including Text Davinci, GPT-3.5, Llama-2, MPT, and Falcon, revealing significant differences in their performance. The paper provides detailed insights into the dataset, promoting transparency and reproducibility. Through this work, we aim to contribute to the development of safer and more reliable language models in healthcare. Our benchmark can be found at medhalt.github.io.


