Augmenting Language Models with Long-Term Memory

06/12/2023
by Weizhi Wang, et al.

Existing large language models (LLMs) can only accept fixed-size inputs because of the input length limit, which prevents them from using rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize a long history. We design a novel decoupled network architecture in which the original backbone LLM is frozen and serves as a memory encoder, while an adaptive residual side-network serves as a memory retriever and reader. This decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness. Enhanced with memory-augmented adaptation training, LongMem can thus memorize long past context and use long-term memory for language modeling. The proposed memory retrieval module can handle unlimited-length context in its memory bank to benefit various downstream tasks. For example, LongMem can enlarge the long-form memory to 65k tokens and thus cache many-shot extra demonstration examples as long-form memory for in-context learning. Experiments show that our method outperforms strong long-context models on ChapterBreak, a challenging long-context modeling benchmark, and achieves remarkable improvements on memory-augmented in-context learning over LLMs. The results demonstrate that the proposed method is effective in helping language models memorize and use long-form content. Our code is open-sourced at https://aka.ms/LongMem.
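
To make the cache-and-retrieve idea concrete, here is a minimal sketch of a fixed-capacity memory bank. It is not taken from the LongMem code base: the class name `MemoryBank`, the methods `cache` and `retrieve`, the key/value layout, the dot-product scoring, the top-k selection, and the evict-oldest policy are all assumptions chosen for illustration, and the tiny 2-dimensional keys stand in for whatever representations a frozen encoder would actually produce.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Vector = Sequence[float]


class MemoryBank:
    """Toy fixed-capacity cache of (key, value) pairs.

    Hypothetical illustration only; not the LongMem implementation.
    """

    def __init__(self, capacity: int = 65_536) -> None:
        # The abstract mentions a 65k-token memory; the exact default here
        # is an assumption. A deque with maxlen silently drops the oldest
        # entry once the capacity is reached.
        self.entries: Deque[Tuple[Vector, str]] = deque(maxlen=capacity)

    def cache(self, key: Vector, value: str) -> None:
        """Store one past-context entry under its key vector."""
        self.entries.append((key, value))

    def retrieve(self, query: Vector, k: int = 2) -> List[str]:
        """Return the values of the k entries whose keys score highest
        against the query under a plain dot product (assumed metric)."""

        def score(entry: Tuple[Vector, str]) -> float:
            key, _ = entry
            return sum(q * x for q, x in zip(query, key))

        ranked = sorted(self.entries, key=score, reverse=True)
        return [value for _, value in ranked[:k]]


# Usage: a bank of capacity 3 receives four entries, so the oldest is evicted.
bank = MemoryBank(capacity=3)
bank.cache([1.0, 0.0], "chunk A")
bank.cache([0.0, 1.0], "chunk B")
bank.cache([0.7, 0.7], "chunk C")
bank.cache([0.9, 0.1], "chunk D")  # "chunk A" is dropped here
top = bank.retrieve([1.0, 0.0], k=2)
print(top)
```

In this toy setup the stored entries never need to be recomputed when the reader changes, which is one way to picture why a frozen encoder avoids stale memory; how LongMem actually scores, chunks, and fuses retrieved memory is described in the full paper and may differ from this sketch.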

research
02/04/2021

Adaptive Semiparametric Language Models

We present a language model that combines a large parametric neural netw...
research
05/25/2022

Training Language Models with Memory Augmentation

Recent work has improved language models remarkably by equipping them wi...
research
04/26/2023

Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System

Large-scale Language Models (LLMs) are constrained by their inability to...
research
05/17/2023

MemoryBank: Enhancing Large Language Models with Long-Term Memory

Revolutionary advancements in Large Language Models have drastically res...
research
09/21/2023

Memory-Augmented LLM Personalization with Short- and Long-Term Memory Coordination

Large Language Models (LLMs), such as GPT3.5, have exhibited remarkable ...
research
04/15/2022

LaMemo: Language Modeling with Look-Ahead Memory

Although Transformers with fully connected self-attentions are powerful ...
research
03/30/2023

Recognition, recall, and retention of few-shot memories in large language models

The training of modern large language models (LLMs) takes place in a reg...
