Can language models handle recursively nested grammatical structures? A case study on comparing models and humans

10/27/2022
by Andrew Kyle Lampinen, et al.

How should we compare the capabilities of language models and humans? Here, I consider a case study: processing of recursively nested grammatical structures. Prior work has suggested that language models cannot handle these structures as reliably as humans can. However, the humans were provided with instructions and training before being evaluated, while the language models were evaluated zero-shot. I therefore attempt to more closely match the evaluation paradigms by providing language models with few-shot prompts. A simple prompt, which contains substantially less content than the human training, allows large language models to consistently outperform the human results. The same prompt even allows extrapolation to more deeply nested conditions than have been tested in humans. Further, a reanalysis of the prior human experiments suggests that the humans may not initially perform above chance on the difficult structures. These results suggest that large language models can in fact process recursively nested grammatical structures comparably to humans. This case study highlights how discrepancies in the quantity of experiment-specific context can confound comparisons of language models and humans. I use this case study to reflect on the broader challenge of comparing human and model capabilities, and to suggest that there is an important difference between evaluating cognitive models of a specific phenomenon and evaluating broadly trained models.
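To make the evaluation setup concrete, the sketch below shows one plausible way to assemble a few-shot prompt for judging center-embedded (recursively nested) sentences. The exemplar sentences, labels, and formatting here are illustrative assumptions, not the paper's actual materials; the point is only the structure of a few-shot evaluation: labeled exemplars followed by an unlabeled query.

```python
# Hypothetical few-shot prompt for grammaticality judgments on
# center-embedded sentences. Exemplars and labels are illustrative,
# not taken from the paper's stimuli.

FEW_SHOT_EXAMPLES = [
    # (sentence, judgment) pairs; nesting depth increases down the list
    ("The dog chased the cat.", "grammatical"),
    ("The cat the dog chased ran away.", "grammatical"),   # one level of embedding
    ("The cat the dog chased.", "ungrammatical"),          # missing main-clause verb
]

def build_prompt(examples, query):
    """Format labeled exemplars followed by the unlabeled query sentence."""
    blocks = [f"Sentence: {s}\nJudgment: {label}" for s, label in examples]
    blocks.append(f"Sentence: {query}\nJudgment:")
    return "\n\n".join(blocks)

# A more deeply nested query (two levels of center embedding)
prompt = build_prompt(
    FEW_SHOT_EXAMPLES,
    "The mouse the cat the dog chased bit squeaked.",
)
print(prompt)
```

The resulting string would be passed to a language model as its context; the model's continuation after the final "Judgment:" serves as its answer. This mirrors how a small amount of in-context material can substitute for the instructions and training given to human participants.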


