Unbounded cache model for online language modeling with open vocabulary

11/07/2017
by Edouard Grave et al.

Recently, continuous cache models were proposed as extensions to recurrent neural network language models, to adapt their predictions to local changes in the data distribution. These models capture only the local context, of up to a few thousand tokens. In this paper, we propose an extension of continuous cache models that scales to larger contexts. In particular, we use a large-scale non-parametric memory component that stores all the hidden activations seen in the past. We leverage recent advances in approximate nearest neighbor search and quantization algorithms to store millions of representations while searching them efficiently. We conduct extensive experiments showing that our approach significantly improves the perplexity of pre-trained language models on new distributions, and can scale efficiently to much larger contexts than previously proposed local cache models.
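To make the mechanism concrete, here is a minimal sketch of such an unbounded cache in Python, using the faiss library for quantized approximate nearest neighbor search (an IVF index with product quantization, so millions of hidden states fit in memory). The sizes (d, nlist, k), the Gaussian-style kernel, the interpolation weight lam, and the helper names (cache_add, cache_probs, interpolate) are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
import faiss  # library for approximate nearest neighbor search with quantization

d = 512        # hidden-state dimensionality (assumption)
nlist = 4096   # number of coarse inverted lists (assumption)
k = 1024       # neighbors retrieved per query (assumption)

# IVF index with product quantization: compressed storage plus sublinear search.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, 64, 8)  # 64 sub-quantizers, 8 bits each

# The index must be trained on a sample of vectors before adding any;
# random data stands in for real hidden states in this sketch.
index.train(np.random.randn(200_000, d).astype("float32"))

stored_words = []  # word id that followed each stored hidden state

def cache_add(hidden: np.ndarray, next_word: int) -> None:
    """Store hidden state h_t together with the word w_{t+1} it preceded."""
    index.add(hidden.reshape(1, d).astype("float32"))
    stored_words.append(next_word)

def cache_probs(hidden: np.ndarray, vocab_size: int, theta: float = 0.1) -> np.ndarray:
    """Cache distribution built from the k approximate nearest neighbors of h_t.

    A Gaussian-style kernel over squared L2 distances is used here; the exact
    kernel is an assumption, not necessarily the paper's choice.
    """
    dists, ids = index.search(hidden.reshape(1, d).astype("float32"), k)
    probs = np.zeros(vocab_size)
    for i, dist in zip(ids[0], dists[0]):
        if i != -1:  # faiss pads missing neighbors with -1
            probs[stored_words[i]] += np.exp(-theta * dist)
    total = probs.sum()
    return probs / total if total > 0 else probs

def interpolate(p_model: np.ndarray, p_cache: np.ndarray, lam: float = 0.2) -> np.ndarray:
    """Linearly mix the parametric LM distribution with the cache distribution."""
    return (1 - lam) * p_model + lam * p_cache
```

At test time one would call cache_add after every observed token and mix cache_probs into the pre-trained model's next-word distribution via interpolate; because the stored representations are compressed with product quantization, the cache can keep growing without the fixed locality window of earlier cache models.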


Related research

12/13/2016 · Improving Neural Language Models with a Continuous Cache
We propose an extension to neural network language models to adapt their...

01/07/2023 · Why do Nearest Neighbor Language Models Work?
Language models (LMs) compute the probability of a text by sequentially...

02/04/2021 · Adaptive Semiparametric Language Models
We present a language model that combines a large parametric neural netw...

09/24/2018 · Information-Weighted Neural Cache Language Models for ASR
Neural cache language models (LMs) extend the idea of regular cache lang...

05/16/2020 · MicroNet for Efficient Language Modeling
It is important to design compact language models for efficient deployme...

05/08/2023 · HistAlign: Improving Context Dependency in Language Generation by Aligning with History
Language models (LMs) can generate hallucinations and incoherent outputs...

05/26/2023 · Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
Large language models (LLMs) have sparked a new wave of exciting AI appli...
