Individual corpora predict fast memory retrieval during reading

10/20/2020
by Markus J. Hofmann, et al.

The corpus on which a predictive language model is trained can be considered the experience of a semantic system. We recorded the everyday reading of two participants on a tablet for two months, generating individual corpus samples of 300K and 500K tokens. We then trained word2vec models on the individual corpora and on a 70-million-sentence newspaper corpus to obtain individual and norm-based long-term memory structures. To test whether individual corpora make better predictions for a cognitive task of long-term memory retrieval, we generated stimulus materials consisting of 134 sentences with uncorrelated individual and norm-based word probabilities. In the subsequent eye-tracking study 1-2 months later, our regression analyses revealed that individual, but not norm-corpus-based, word probabilities can account for first-fixation duration and first-pass gaze duration. Word length additionally affected gaze duration and total viewing duration. The results suggest that corpora representative of an individual's long-term memory structure can explain reading performance better than a norm corpus, and that recently acquired information is lexically accessed rapidly.
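
The abstract does not spell out the regression model, so the following is only a minimal sketch of the general idea, assuming an ordinary least-squares fit of a fixation duration on two per-word predictors (individual and norm-based word probability). The function names, variable names, and all numbers are hypothetical and synthetic; they are not the authors' code or data. The toy durations are built to depend only on the individual predictor, so the fit should assign a near-zero weight to the norm-based one.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest absolute entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fit(predictors, response):
    """Return [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic toy values, NOT data from the study:
# one (individual, norm-based) word-probability pair per word.
toy_predictors = [(0.1, 0.5), (0.4, 0.2), (0.7, 0.9),
                  (0.2, 0.8), (0.9, 0.1), (0.5, 0.6)]
# Toy durations (ms) constructed to depend on the individual predictor only.
toy_durations = [250.0 - 40.0 * ind for ind, _norm in toy_predictors]

intercept, b_individual, b_norm = ols_fit(toy_predictors, toy_durations)
print(round(intercept, 3), round(b_individual, 3), round(b_norm, 3))
```

In a real analysis one would typically use a statistics package that also reports standard errors and significance tests, and would add further predictors such as word length, which the abstract reports as affecting gaze duration and total viewing duration.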
