Event knowledge in large language models: the gap between the impossible and the unlikely

12/02/2022
by Carina Kauf, et al.

People constantly use language to learn about the world. Computational linguists have capitalized on this fact to build large language models (LLMs) that acquire co-occurrence-based knowledge from language corpora. LLMs achieve impressive performance on many tasks, but the robustness of their world knowledge has been questioned. Here, we ask: do LLMs acquire generalized knowledge about real-world events? Using curated sets of minimal sentence pairs (n=1215), we tested whether LLMs are more likely to generate plausible event descriptions compared to their implausible counterparts. We found that LLMs systematically distinguish possible and impossible events (The teacher bought the laptop vs. The laptop bought the teacher) but fall short of human performance when distinguishing likely and unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLMs generalize well across syntactic sentence variants (active vs passive) but less well across semantic sentence variants (synonymous sentences), (iii) some, but not all LLM deviations from ground-truth labels align with crowdsourced human judgments, and (iv) explicit event plausibility information emerges in middle LLM layers and remains high thereafter. Overall, our analyses reveal a gap in LLMs' event knowledge, highlighting their limitations as generalized knowledge bases. We conclude by speculating that the differential performance on impossible vs. unlikely events is not a temporary setback but an inherent property of LLMs, reflecting a fundamental difference between linguistic knowledge and world knowledge in intelligent systems.


Related research

04/07/2022 · Testing the limits of natural language models for predicting human language judgments
04/19/2021 · Refining Targeted Syntactic Evaluation of Language Models
07/22/2021 · Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge
05/24/2023 · Drafting Event Schemas using Language Models
10/21/2022 · SLING: Sino Linguistic Evaluation of Large Language Models
06/09/2023 · Language Models Can Learn Exceptions to Syntactic Rules
01/20/2023 · Can Peanuts Fall in Love with Distributional Semantics?
