CGCE: A Chinese Generative Chat Evaluation Benchmark for General and Financial Domains

05/23/2023
by   Xuanyu Zhang, et al.

Generative chat models, such as ChatGPT and GPT-4, have revolutionized natural language generation (NLG) by incorporating instructions and human feedback to achieve significant performance improvements. However, the lack of standardized evaluation benchmarks for chat models, particularly for Chinese and domain-specific models, hinders their assessment and progress. To address this gap, we introduce the Chinese Generative Chat Evaluation (CGCE) benchmark, focusing on general and financial domains. The CGCE benchmark encompasses diverse tasks, including 200 questions in the general domain and 150 specific professional questions in the financial domain. Manual scoring evaluates factors such as accuracy, coherence, expression clarity, and completeness. The CGCE benchmark provides researchers with a standardized framework to assess and compare Chinese generative chat models, fostering advancements in NLG research.

research
08/19/2023

FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models

Large language models (LLMs) have demonstrated exceptional performance i...
research
02/18/2023

BBT-Fin: Comprehensive Construction of Chinese Financial Domain Pre-trained Language Model, Corpus and Benchmark

To advance Chinese financial natural language processing (NLP), we intro...
research
07/04/2023

CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care

The recent advances in NLP have led to a new trend of applying LLMs to ...
research
05/19/2023

XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters

In recent years, pre-trained language models have undergone rapid develo...
research
08/30/2021

LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation

Standard multi-task benchmarks are essential for driving the progress of...
research
03/21/2022

General and Domain Adaptive Chinese Spelling Check with Error Consistent Pretraining

The lack of label data is one of the significant bottlenecks for Chinese...
research
05/10/2023

Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks

The most recent large language models such as ChatGPT and GPT-4 have gar...
