Towards the Scalable Evaluation of Cooperativeness in Language Models

by Alan Chan et al.

It is likely that AI systems driven by pre-trained language models (PLMs) will increasingly be used to assist humans in high-stakes interactions with other agents, such as negotiation or conflict resolution. Consistent with the goals of Cooperative AI <cit.>, we wish to understand and shape the multi-agent behaviour of PLMs in a pro-social manner. An important first step is the evaluation of model behaviour across diverse cooperation problems. Since the desired behaviour in an interaction depends upon its precise game-theoretic structure, we focus on generating scenarios with particular structures, using both crowdworkers and a language model. Our work proceeds as follows. First, we discuss key methodological issues in the generation of scenarios corresponding to particular game-theoretic structures. Second, we employ both crowdworkers and a language model to generate such scenarios, finding that the quality of generations tends to be mediocre in both cases. We additionally have both crowdworkers and a language model judge whether given scenarios align with their intended game-theoretic structure, with mixed results depending on the game. Third, we provide a dataset of scenarios based on our generated data, along with quantitative and qualitative evaluations of UnifiedQA and GPT-3 on this dataset. We find that instruct-tuned models tend to act in ways that could be perceived as cooperative as they are scaled up, while other models exhibit flat scaling trends.
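To make concrete what "precise game-theoretic structure" means here, the sketch below (illustrative only, not from the paper) checks whether a symmetric 2x2 game's payoffs satisfy the standard Prisoner's Dilemma ordering T > R > P > S; all function and variable names are hypothetical.

```python
# Illustrative sketch: classify a symmetric 2x2 game by its payoff ordering.
# A scenario intended to instantiate the Prisoner's Dilemma should satisfy
# T (temptation) > R (reward) > P (punishment) > S (sucker's payoff).

def is_prisoners_dilemma(T: float, R: float, P: float, S: float) -> bool:
    """Return True if the payoffs follow the Prisoner's Dilemma ordering."""
    return T > R > P > S

# Classic textbook payoffs: defecting against a cooperator yields 5,
# mutual cooperation 3, mutual defection 1, being exploited 0.
print(is_prisoners_dilemma(T=5, R=3, P=1, S=0))  # True
# Swapping T and R makes mutual cooperation dominant, so it is no longer a PD.
print(is_prisoners_dilemma(T=3, R=5, P=1, S=0))  # False
```

A generated scenario can then be validated against the structure it was meant to have, which is the kind of alignment check the crowdworker and language-model judgments in the abstract perform in natural language.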



