Huatuo-26M, a Large-scale Chinese Medical QA Dataset

by Jianquan Li, et al.

In this paper, we release the largest medical Question Answering (QA) dataset to date, with 26 million QA pairs. We benchmark many existing approaches on our dataset in terms of both retrieval and generation. Experimental results show that existing models perform far below expectations and that the released dataset remains challenging in the era of pre-trained language models. Moreover, we experimentally show the benefit of the proposed dataset in three respects: (i) training models that transfer to other QA datasets in a zero-shot fashion; (ii) serving as external knowledge for retrieval-augmented generation (RAG); and (iii) improving existing pre-trained language models by using the QA pairs as a pre-training corpus for continued training. We believe that this dataset will not only contribute to medical research but also benefit both patients and clinical doctors. See <>.




CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training

With the rise of large-scale pre-trained language models, open-domain qu...

MedChatZH: a Better Medical Adviser Learns from Better Instructions

Generative large language models (LLMs) have shown great success in vari...

Controllable Generation from Pre-trained Language Models via Inverse Prompting

Large-scale pre-trained language models have demonstrated strong capabil...

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development

The field of Question Answering (QA) has made remarkable progress in rec...

Learning to Ask Like a Physician

Existing question answering (QA) datasets derived from electronic health...

MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts

Vision-language pre-training (VLP) models have been demonstrated to be e...

Long-Tailed Question Answering in an Open World

Real-world data often have an open long-tailed distribution, and buildin...
