Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?

12/23/2022
by   Byung-Doh Oh, et al.
0

This work presents a detailed linguistic analysis into why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are less predictive of human reading times. First, regression analyses show a strictly monotonic, positive log-linear relationship between perplexity and fit to reading times for the more recently released five GPT-Neo variants and eight OPT variants on two separate datasets, replicating earlier results limited to just GPT-2 (Oh et al., 2022). Subsequently, analysis of residual errors reveals a systematic deviation of the larger variants, such as underpredicting reading times of named entities and making compensatory overpredictions for reading times of function words such as modals and conjunctions. These results suggest that the propensity of larger Transformer-based models to 'memorize' sequences during training makes their surprisal estimates diverge from humanlike expectations, which warrants caution in using pre-trained language models to study human language processing.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/22/2023

Transformer-Based LM Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens

Recent psycholinguistic studies have drawn conflicting conclusions about...
research
05/19/2020

Comparing Transformers and RNNs on predicting human sentence processing data

Recurrent neural networks (RNNs) have long been an architecture of inter...
research
07/07/2023

Testing the Predictions of Surprisal Theory in 11 Languages

A fundamental result in psycholinguistics is that less predictable words...
research
04/12/2021

Multilingual Language Models Predict Human Reading Behavior

We analyze if large language models are able to predict patterns of huma...
research
06/02/2020

On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior

Human reading behavior is tuned to the statistics of natural language: t...
research
12/21/2022

Entropy- and Distance-Based Predictors From GPT-2 Attention Patterns Predict Reading Times Over and Above GPT-2 Surprisal

Transformer-based large language models are trained to make predictions ...
research
09/08/2020

Probabilistic Predictions of People Perusing: Evaluating Metrics of Language Model Performance for Psycholinguistic Modeling

By positing a relationship between naturalistic reading times and inform...

Please sign up or login with your details

Forgot password? Click here to reset