Med-MMHL: A Multi-Modal Dataset for Detecting Human- and LLM-Generated Misinformation in the Medical Domain

06/15/2023
by Yanshen Sun, et al.

The pervasive influence of misinformation has far-reaching and detrimental effects on both individuals and society. The COVID-19 pandemic has witnessed an alarming surge in the dissemination of medical misinformation. However, existing misinformation datasets predominantly focus on textual information, neglecting visual elements, and tend to center solely on COVID-19-related misinformation, overlooking misinformation about other diseases. Furthermore, previous work has overlooked the potential of Large Language Models (LLMs), such as ChatGPT, released in late 2022, to generate misinformation. To overcome these limitations, we present Med-MMHL, a novel multi-modal misinformation detection dataset for the general medical domain that covers multiple diseases. Med-MMHL incorporates not only human-generated misinformation but also misinformation generated by LLMs like ChatGPT. Our dataset aims to facilitate comprehensive research and development of methodologies for detecting misinformation across diverse diseases and various scenarios, including human- and LLM-generated misinformation detection at the sentence, document, and multi-modal levels. To access our dataset and code, visit our GitHub repository: <https://github.com/styxsys0927/Med-MMHL>.

research
04/20/2022

LingYi: Medical Conversational Question Answering System based on Multi-modal Knowledge Graphs

The medical conversational system can relieve the burden of doctors and ...
research
06/01/2021

Multi-modal Point-of-Care Diagnostics for COVID-19 Based On Acoustics and Symptoms

The research direction of identifying acoustic bio-markers of respirator...
research
03/17/2023

Hospital Length of Stay Prediction Based on Multi-modal Data towards Trustworthy Human-AI Collaboration in Radiomics

To what extent can the patient's length of stay in a hospital be predict...
research
08/23/2023

NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

Non-photorealistic videos are in demand with the wave of the metaverse, ...
research
07/05/2020

CORD19STS: COVID-19 Semantic Textual Similarity Dataset

In order to combat the COVID-19 pandemic, society can benefit from vario...
research
09/02/2022

Multi-Modal Experience Inspired AI Creation

AI creation, such as poem or lyrics generation, has attracted increasing...
research
01/18/2023

How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection

The introduction of ChatGPT has garnered widespread attention in both ac...
