Large language models are not zero-shot communicators

10/26/2022
by Laura Ruis, et al.

Despite the widespread use of LLMs as conversational agents, evaluations of their performance fail to capture a crucial aspect of communication: interpreting language in context. Humans interpret language using beliefs and prior knowledge about the world. For example, we intuitively understand the response "I wore gloves" to the question "Did you leave fingerprints?" as meaning "No". To investigate whether LLMs can make this type of inference, known as an implicature, we design a simple task and evaluate widely used state-of-the-art models. We find that, even though we only evaluate on utterances that require a binary inference (yes or no), most models perform close to random. Models adapted to be "aligned with human intent" perform much better, but still fall significantly short of human performance. We present our findings as a starting point for further research into how LLMs interpret language in context, and to drive the development of more pragmatic and useful models of human discourse.
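The binary implicature task can be illustrated with a tiny scoring harness. This is a minimal sketch under our own assumptions, not the authors' evaluation code: the prompt template, the names `ImplicatureExample`, `build_prompt`, `predict`, and `accuracy`, the second (party) example, and the `Scorer` interface (any function that returns a preference score, such as a log-probability, for a candidate answer) are all hypothetical. It shows what "close to random" means for a yes/no task: a model that ignores context and always answers "no" lands at 50% accuracy on a balanced set.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ImplicatureExample:
    question: str
    response: str
    implied: str  # gold binary label: "yes" or "no"


def build_prompt(ex: ImplicatureExample) -> str:
    """Hypothetical zero-shot template; not the paper's exact wording."""
    return (
        f'Question: "{ex.question}"\n'
        f'Response: "{ex.response}"\n'
        "Does the response mean yes or no? Answer:"
    )


# A scorer says how strongly a model prefers a completion for a prompt
# (e.g. a log-probability). It is an injected stand-in, not a real model API.
Scorer = Callable[[str, str], float]


def predict(ex: ImplicatureExample, score: Scorer) -> str:
    prompt = build_prompt(ex)
    # Choose whichever of the two candidate answers the scorer ranks higher.
    return max(("yes", "no"), key=lambda answer: score(prompt, " " + answer))


def accuracy(data: List[ImplicatureExample], score: Scorer) -> float:
    correct = sum(predict(ex, score) == ex.implied for ex in data)
    return correct / len(data)


RANDOM_BASELINE = 0.5  # chance accuracy on a balanced binary task

if __name__ == "__main__":
    data = [
        ImplicatureExample("Did you leave fingerprints?", "I wore gloves", "no"),
        # Hypothetical extra item so that both labels appear.
        ImplicatureExample("Are you coming to the party?", "I wouldn't miss it", "yes"),
    ]
    # Degenerate scorer that always prefers " no": it scores exactly at chance.
    always_no: Scorer = lambda prompt, answer: 1.0 if answer == " no" else 0.0
    print(accuracy(data, always_no), RANDOM_BASELINE)  # 0.5 0.5
```

Scoring a fixed pair of candidate answers, rather than parsing free-form generations, keeps every prediction inside the binary label space; with a real model the scorer would be replaced by that model's likelihood of each completion.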

research
04/26/2022

Testing the Ability of Language Models to Interpret Figurative Language

Figurative and metaphorical language are commonplace in discourse, and f...
research
05/24/2023

Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark

Large language models (LLMs) have been shown to perform well at a variet...
research
10/27/2022

Can language models handle recursively nested grammatical structures? A case study on comparing models and humans

How should we compare the capabilities of language models and humans? He...
research
08/28/2023

Spoken Language Intelligence of Large Language Models for Language Learning

People have long hoped for a conversational system that can assist in re...
research
07/26/2023

Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models

This study introduces and evaluates tiny, mini, small, and medium-sized ...
research
03/06/2023

Towards Zero-Shot Functional Compositionality of Language Models

Large Pre-trained Language Models (PLM) have become the most desirable s...
research
06/24/2022

A Test for Evaluating Performance in Human-Computer Systems

The Turing test for comparing computer performance to that of humans is ...