CMB: A Comprehensive Medical Benchmark in Chinese

08/17/2023
by Xidong Wang, et al.
Large Language Models (LLMs) open the possibility of major breakthroughs in medicine. Establishing a standardized medical benchmark is a fundamental step toward measuring progress. However, medical environments in different regions have their own local characteristics, e.g., the ubiquity and significance of traditional Chinese medicine within China. Merely translating English-based medical evaluations may therefore produce contextual incongruities in a local region. To address this issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. It is worth noting that our benchmark is not devised as a leaderboard competition but as an instrument for self-assessment of model advancements. We hope this benchmark can facilitate the widespread adoption and enhancement of medical LLMs within China. Details are available at <https://cmedbenchmark.llmzoo.com/>.
