The CALLA Dataset: Probing LLMs' Interactive Knowledge Acquisition from Chinese Medical Literature

09/08/2023
by   Yanrui Du, et al.
0

The application of Large Language Models (LLMs) to the medical domain has stimulated the interest of researchers. Recent studies have focused on constructing Instruction Fine-Tuning (IFT) data through medical knowledge graphs to enrich the interactive medical knowledge of LLMs. However, the medical literature serving as a rich source of medical knowledge remains unexplored. Our work introduces the CALLA dataset to probe LLMs' interactive knowledge acquisition from Chinese medical literature. It assesses the proficiency of LLMs in mastering medical knowledge through a free-dialogue fact-checking task. We identify a phenomenon called the “fact-following response“, where LLMs tend to affirm facts mentioned in questions and display a reluctance to challenge them. To eliminate the inaccurate evaluation caused by this phenomenon, for the golden fact, we artificially construct test data from two perspectives: one consistent with the fact and one inconsistent with the fact. Drawing from the probing experiment on the CALLA dataset, we conclude that IFT data highly correlated with the medical literature corpus serves as a potent catalyst for LLMs, enabling themselves to skillfully employ the medical knowledge acquired during the pre-training phase within interactive scenarios, enhancing accuracy. Furthermore, we design a framework for automatically constructing IFT data based on medical literature and discuss some real-world applications.

READ FULL TEXT

page 1

page 3

page 4

research
09/03/2023

MedChatZH: a Better Medical Adviser Learns from Better Instructions

Generative large language models (LLMs) have shown great success in vari...
research
09/08/2023

Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese

Large Language Models (LLMs) have demonstrated remarkable success in div...
research
08/07/2023

Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue

Recent advances in Large Language Models (LLMs) have achieved remarkable...
research
08/28/2023

DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation

We propose DISC-MedLLM, a comprehensive solution that leverages Large La...
research
08/24/2020

Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources

Machine Reading Comprehension (MRC) aims to extract answers to questions...
research
09/05/2023

An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models

Large language models (LLMs) have achieved significant success in intera...
research
05/11/2023

FactKG: Fact Verification via Reasoning on Knowledge Graphs

In real world applications, knowledge graphs (KG) are widely used in var...

Please sign up or login with your details

Forgot password? Click here to reset