Evaluating Human-Language Model Interaction

12/19/2022
by Mina Lee, et al.

Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction. However, the main LM benchmarks are non-interactive: a system produces output without human involvement. To evaluate human-LM interaction, we develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks, ranging from goal-oriented to open-ended, to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that better non-interactive performance does not always translate into better human-LM interaction, and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
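The three dimensions HALIE adds over non-interactive evaluation can be sketched as a simple record structure. This is an illustrative sketch only; all class and field names below are hypothetical and not taken from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class InteractionTrace:
    """One human-LM session: the full interactive process, not just the final output."""
    prompts: list[str] = field(default_factory=list)      # user inputs over time
    completions: list[str] = field(default_factory=list)  # LM outputs over time
    final_output: str = ""

@dataclass
class Evaluation:
    """Scores along the three dimensions (metric names are illustrative)."""
    process_metrics: dict[str, float]  # e.g., number of queries, edits per turn
    first_person: dict[str, float]     # the user's own subjective ratings
    third_party: dict[str, float]      # external annotators' quality judgments

# Example: one short writing-assistance session.
trace = InteractionTrace(
    prompts=["Draft an opening line for a story."],
    completions=["Once upon a time, the city forgot how to sleep."],
    final_output="Once upon a time, the city forgot how to sleep.",
)
ev = Evaluation(
    process_metrics={"queries": float(len(trace.prompts))},
    first_person={"helpfulness": 4.0, "enjoyment": 5.0},
    third_party={"quality": 3.5},
)
```

The point of the structure is that a single third-party quality score (the non-interactive default) occupies only one of the three slots; the paper's finding that first-person and third-party metrics can diverge is visible here as the two dictionaries being free to disagree.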
