GameEval: Evaluating LLMs on Conversational Games

08/19/2023
by Dan Qiao, et al.

The rapid advancement of large language models (LLMs) has made evaluating these models increasingly challenging. Existing evaluation methods are either reference-based or preference-based; they inevitably require human intervention or introduce test bias caused by the evaluator models. In this paper, we propose GameEval, a novel approach to evaluating LLMs through goal-driven conversational games that overcomes the limitations of previous methods. GameEval treats LLMs as game players and assigns them distinct roles with specific goals, which they pursue by initiating conversations of various forms, including discussion, question answering, and voting. We design three unique games with cooperative or adversarial objectives, each with corresponding evaluation metrics, to show how this new paradigm comprehensively evaluates model performance. Through extensive experiments, we show that GameEval can effectively differentiate the capabilities of various LLMs, providing a comprehensive assessment of their integrated abilities to solve complex problems. Our anonymized code is publicly available at https://github.com/GameEval/GameEval.
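The abstract describes the paradigm only at a high level. As a rough illustration, here is one way a goal-driven conversational game with a discussion phase and a voting phase could be played and scored. This is not GameEval's implementation: the hidden-role "spy" game, the `Player`, `play_game`, and `win_rates` names, the stub policies standing in for LLM calls, and per-model win rate as the metric are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical policy signatures; a real harness would prompt an LLM inside these.
SpeakFn = Callable[[str, str, List[str]], str]             # (name, role, history) -> message
VoteFn = Callable[[str, str, List[str], List[str]], str]   # (name, role, history, candidates) -> suspect


@dataclass
class Player:
    name: str
    model: str      # identifier of the LLM controlling this player
    role: str       # "civilian" (cooperative) or "spy" (adversarial); illustrative roles
    speak: SpeakFn
    vote: VoteFn


def play_game(players: List[Player], rounds: int = 2) -> str:
    """Run discussion rounds, then a single vote; return the winning role."""
    history: List[str] = []
    for _ in range(rounds):
        for p in players:
            history.append(f"{p.name}: {p.speak(p.name, p.role, history)}")
    names = [p.name for p in players]
    ballots: Counter = Counter()
    for p in players:
        candidates = [n for n in names if n != p.name]
        choice = p.vote(p.name, p.role, history, candidates)
        if choice in candidates:  # invalid votes are discarded
            ballots[choice] += 1
    if not ballots:
        return "spy"  # no valid votes: nobody is eliminated
    top = max(ballots.values())
    leaders = [n for n, c in ballots.items() if c == top]
    if len(leaders) > 1:
        return "spy"  # tied vote: nobody is eliminated, so the spy survives
    eliminated = next(p for p in players if p.name == leaders[0])
    return "civilian" if eliminated.role == "spy" else "spy"


def win_rates(results: List[Tuple[List[Player], str]]) -> Dict[str, float]:
    """Per-model win rate over (players, winning_role) records."""
    wins: Counter = Counter()
    games: Counter = Counter()
    for players, winner in results:
        for p in players:
            games[p.model] += 1
            wins[p.model] += int(p.role == winner)
    return {m: wins[m] / games[m] for m in games}


# Stub policies standing in for LLM calls (hypothetical).
def stub_speak(name: str, role: str, history: List[str]) -> str:
    return "I agree with everyone." if role == "spy" else "I hold the common word."


def stub_vote(name: str, role: str, history: List[str], candidates: List[str]) -> str:
    if role == "civilian":
        for line in history:
            speaker, msg = line.split(": ", 1)
            if "agree" in msg and speaker in candidates:
                return speaker  # civilians target the evasive speaker
    return candidates[0]


players = [
    Player("A", "m1", "civilian", stub_speak, stub_vote),
    Player("B", "m2", "civilian", stub_speak, stub_vote),
    Player("C", "m2", "spy", stub_speak, stub_vote),
]
winner = play_game(players)
print(winner)                              # civilian
print(win_rates([(players, winner)]))      # {'m1': 1.0, 'm2': 0.5}
```

Swapping `stub_speak` and `stub_vote` for functions that prompt different LLMs, then aggregating `win_rates` over many games with roles rotated across models, would yield a simple comparative score. The paper's actual games and metrics may differ.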


research
08/31/2023

TouchStone: Evaluating Vision-Language Models by Language Models

Large vision-language models (LVLMs) have recently witnessed rapid advan...
research
10/07/2022

ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering

With the recent advance in large pre-trained language models, researcher...
research
06/07/2023

INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models

Instruction-tuned large language models have revolutionized natural lang...
research
05/22/2023

clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents

Recent work has proposed a methodology for the systematic evaluation of ...
research
10/08/2022

Generative Language Models for Paragraph-Level Question Generation

Powerful generative models have led to recent progress in question gener...
research
06/13/2023

WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences

We present WebGLM, a web-enhanced question-answering system based on the...
research
08/25/2023

Rethinking Language Models as Symbolic Knowledge Graphs

Symbolic knowledge graphs (KGs) play a pivotal role in knowledge-centric...
