Narrative XL: A Large-scale Dataset For Long-Term Memory Models

05/23/2023
by Arseny Moskvichev, et al.

Despite their tremendous successes, most large language models do not have any long-term memory mechanisms, which restricts their applications. Overcoming this limitation would require not only changes to typical transformer architectures or training procedures, but also a dataset on which these new models could be trained and evaluated. We argue that existing resources lack a few key properties and that, at present, there are no naturalistic datasets of sufficient scale to train (and not only evaluate) long-term memory language models. We then present our solution, which capitalizes on advances in short-term memory language models to create such a dataset. Using GPT-3.5, we summarized each scene in 1,500 hand-curated books from Project Gutenberg, resulting in approximately 150 scene-level summaries per book. We then created a number of reading comprehension questions based on these summaries, including three types of multiple-choice scene recognition questions as well as free-form narrative reconstruction questions. Each book is thus associated with more than 500 reading comprehension questions. Crucially, most questions have a known “retention demand” indicating how long-term a memory is needed to answer them, which should aid the evaluation of long-term memory performance. We validate our data in three small-scale experiments: one with human labelers and two with existing language models. We show that our questions 1) adequately represent the source material, 2) can be used to diagnose a model's memory capacity, and 3) are not trivial for modern language models even when the memory demand does not exceed those models' context lengths. Lastly, we provide our code, which can be used to further expand the dataset in an automated manner.
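The abstract says that each question carries a known retention demand and that this lets the questions diagnose a model's memory capacity. Below is a minimal Python sketch of how such an evaluation might be tallied. It assumes each question stores its retention demand as an integer (here, hypothetically, the number of scenes between the queried scene and the point where the question is asked) along with the index of its correct option. The `Question` record, the `accuracy_by_retention` helper, and the bucket edges are hypothetical illustrations, not the dataset's actual schema or the authors' released code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical record for one multiple-choice reading-comprehension question."""
    book_id: str
    retention_demand: int  # assumed unit: scenes between the queried scene and the question
    correct_option: int    # index of the correct answer among the options


def accuracy_by_retention(questions, predictions, bucket_edges):
    """Group questions into retention-demand buckets and return accuracy per bucket.

    bucket_edges is a sorted list of inclusive upper bounds; demands above the
    last edge fall into an overflow bucket keyed by None.
    """
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for q, pred in zip(questions, predictions):
        # First edge that the demand does not exceed, or None if it exceeds all of them.
        bucket = next((edge for edge in bucket_edges if q.retention_demand <= edge), None)
        totals[bucket] += 1
        hits[bucket] += int(pred == q.correct_option)
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Toy usage with made-up questions and model predictions (not real dataset values).
questions = [
    Question("book_a", 3, 0),
    Question("book_a", 40, 2),
    Question("book_b", 120, 1),
    Question("book_b", 5, 3),
]
predictions = [0, 1, 1, 3]
print(accuracy_by_retention(questions, predictions, bucket_edges=[10, 100]))
# prints {10: 1.0, 100: 0.0, None: 1.0}
```

Reporting accuracy per retention bucket, rather than as one overall number, is what can make such questions diagnostic. A model that does well on short-demand questions but degrades on long-demand ones would suggest a memory limit rather than a general comprehension failure.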
