C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

05/15/2023
by Yuzhen Huang, et al.

New NLP benchmarks are urgently needed to align with the rapid development of large language models (LLMs). We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context. C-Eval comprises multiple-choice questions across four difficulty levels: middle school, high school, college, and professional. The questions span 52 diverse disciplines, ranging from humanities to science and engineering. C-Eval is accompanied by C-Eval Hard, a subset of very challenging subjects in C-Eval that requires advanced reasoning abilities to solve. We conduct a comprehensive evaluation of the most advanced LLMs on C-Eval, including both English- and Chinese-oriented models. Results indicate that only GPT-4 could achieve an average accuracy of over 60%, suggesting that there is still significant room for improvement for current LLMs. We anticipate C-Eval will help analyze important strengths and shortcomings of foundation models, and foster their development and growth for Chinese users.
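
Since C-Eval scoring reduces to exact-match accuracy over multiple-choice answer letters, a minimal sketch may help make the protocol concrete. The item fields ("question", "A".."D", "answer"), the prompt template, and the placeholder predict function below are illustrative assumptions, not the official C-Eval harness.

```python
# Minimal sketch of C-Eval-style multiple-choice scoring (exact-match accuracy).
# Field names and prompt format are assumptions for illustration; the official
# C-Eval release and evaluation code may differ.

from typing import Callable

# Toy items standing in for real C-Eval questions.
ITEMS = [
    {"question": "1 + 1 = ?", "A": "1", "B": "2", "C": "3", "D": "4", "answer": "B"},
    {"question": "2 * 3 = ?", "A": "5", "B": "4", "C": "6", "D": "9", "answer": "C"},
]

def format_prompt(item: dict) -> str:
    """Render one question in the usual 'choose A/B/C/D' style."""
    choices = "\n".join(f"{k}. {item[k]}" for k in "ABCD")
    return f"{item['question']}\n{choices}\nAnswer:"

def accuracy(items: list[dict], predict: Callable[[str], str]) -> float:
    """Fraction of items where the model's predicted letter matches the key."""
    correct = sum(
        predict(format_prompt(it)).strip().upper().startswith(it["answer"])
        for it in items
    )
    return correct / len(items)

if __name__ == "__main__":
    # Hypothetical model stub: always answers "B". Swap in a real LLM call here.
    print(f"accuracy = {accuracy(ITEMS, lambda prompt: 'B'):.2f}")
```

Per-subject accuracies are typically computed this way and then averaged; the paper's headline numbers are averages over all 52 disciplines.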

Related research

CMMLU: Measuring massive multitask language understanding in Chinese (06/15/2023)
As the capabilities of large language models (LLMs) continue to advance,...

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models (04/13/2023)
Evaluating the general abilities of foundation models to tackle human-le...

ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models (08/28/2023)
The unprecedented performance of large language models (LLMs) requires c...

SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions (09/13/2023)
With the rapid development of Large Language Models (LLMs), increasing a...

Are Emergent Abilities in Large Language Models just In-Context Learning? (09/04/2023)
Large language models have exhibited emergent abilities, demonstrating e...

CLEVA: Chinese Language Models EVAluation Platform (08/09/2023)
With the continuous emergence of Chinese Large Language Models (LLMs), h...
