Adaptive Semiparametric Language Models

02/04/2021
by Dani Yogatama et al.

We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component in an integrated architecture. Our model uses extended short-term context by caching local hidden states – similar to Transformer-XL – and global long-term memory by retrieving a set of nearest-neighbor tokens at each timestep. We design a gating function to adaptively combine multiple information sources to make a prediction. This mechanism allows the model to use local context, short-term memory, or long-term memory (or any combination of them) on an ad hoc basis depending on the context. Experiments on word-based and character-based language modeling datasets demonstrate the efficacy of our proposed method compared to strong baselines.
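
The gated combination described in the abstract can be illustrated with a small sketch. The snippet below is not the authors' implementation: the functions knn_distribution and gated_next_token_probs, the single sigmoid gate over the hidden state, and all shapes and parameters are assumptions made only to show how a parametric prediction and a nearest-neighbor memory prediction can be mixed by a context-dependent gate.

```python
# Illustrative sketch of a gated mixture between a parametric LM distribution
# and a distribution induced by k nearest (hidden-state key, next-token) pairs.
# Names, shapes, and the single-gate formulation are assumptions for this example.
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def knn_distribution(query, memory_keys, memory_tokens, vocab_size, k=4, temp=1.0):
    """Turn the k nearest long-term memory entries into a distribution over tokens."""
    dists = np.linalg.norm(memory_keys - query, axis=1)   # distance to every cached key
    nearest = np.argsort(dists)[:k]                       # indices of the k closest keys
    weights = softmax(-dists[nearest] / temp)             # closer entries get more weight
    p = np.zeros(vocab_size)
    for w, idx in zip(weights, nearest):
        p[memory_tokens[idx]] += w                        # scatter weight onto stored tokens
    return p

def gated_next_token_probs(hidden, W_out, gate_w, memory_keys, memory_tokens, k=4):
    """Combine the parametric prediction with the episodic-memory prediction."""
    p_lm = softmax(W_out @ hidden)                        # parametric (local-context) prediction
    p_mem = knn_distribution(hidden, memory_keys, memory_tokens, W_out.shape[0], k)
    g = 1.0 / (1.0 + np.exp(-gate_w @ hidden))            # context-dependent gate in (0, 1)
    return g * p_mem + (1.0 - g) * p_lm

# Toy usage: random projections stand in for a trained transformer.
rng = np.random.default_rng(0)
d, vocab, n_mem = 16, 100, 500
hidden = rng.normal(size=d)                               # current hidden state
W_out = rng.normal(size=(vocab, d))                       # output embedding matrix
gate_w = rng.normal(size=d)                               # gate parameters
memory_keys = rng.normal(size=(n_mem, d))                 # cached hidden states (keys)
memory_tokens = rng.integers(0, vocab, size=n_mem)        # token that followed each key
probs = gated_next_token_probs(hidden, W_out, gate_w, memory_keys, memory_tokens)
assert abs(probs.sum() - 1.0) < 1e-6
```

In the paper the gate is trained jointly with the rest of the network, so the model can fall back on the parametric prediction when retrieved neighbors are uninformative; the random parameters above exist only to keep the example self-contained, and the Transformer-XL-style short-term cache is omitted from this sketch.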

Related research

06/12/2023
Augmenting Language Models with Long-Term Memory
Existing large language models (LLMs) can only afford fix-sized inputs d...

09/01/2021
∞-former: Infinite Memory Transformer
Transformers struggle when attending to long contexts, since the amount ...

07/31/2023
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Recently, integrating video foundation models and large language models ...

11/07/2017
Unbounded cache model for online language modeling with open vocabulary
Recently, continuous cache models were proposed as extensions to recurre...

09/22/2021
Palimpsest Memories Stored in Memristive Synapses
Biological synapses store multiple memories on top of each other in a pa...

08/09/2016
A deep language model for software code
Existing language models such as n-grams for software code often fail to...

04/15/2022
LaMemo: Language Modeling with Look-Ahead Memory
Although Transformers with fully connected self-attentions are powerful ...
