Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context

05/12/2018
by Urvashi Khandelwal et al.

We know very little about how neural language models (LMs) use prior linguistic context. In this paper, we investigate the role of context in an LSTM LM through ablation studies. Specifically, we analyze the increase in perplexity when prior context words are shuffled, replaced, or dropped. On two standard datasets, Penn Treebank and WikiText-2, we find that the model is capable of using about 200 tokens of context on average, but sharply distinguishes nearby context (the most recent 50 tokens) from the distant history. The model is highly sensitive to the order of words within the most recent sentence, but ignores word order in the long-range context (beyond 50 tokens), suggesting the distant past is modeled only as a rough semantic field or topic. We further find that the neural cache model (Grave et al., 2017b) especially helps the LSTM copy words from within this distant context. Overall, our analysis not only provides a better understanding of how neural LMs use their context, but also sheds light on the recent success of cache-based models.
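To make the ablation protocol concrete, here is a minimal sketch (not the authors' released code) of the kind of measurement the abstract describes: score each target token given its full prior context, then again with the distant portion of that context shuffled or dropped, and report the change in perplexity. The trained LSTM LM is stubbed out with a toy recency-weighted scoring function so the sketch runs on its own; names such as score_token, CONTEXT_LEN, and NEAR_WINDOW are illustrative assumptions, not the paper's API.

    import math
    import random

    CONTEXT_LEN = 200   # tokens of history fed to the model (roughly the paper's effective context)
    NEAR_WINDOW = 50    # "nearby" tokens whose order the model is found to be sensitive to

    def score_token(context, target):
        """Stand-in for log p(target | context) from a trained LSTM LM.
        Toy recency-weighted score: more recent matches of `target` count more,
        so shuffling or dropping the distant context actually changes the result."""
        weight = sum(0.99 ** (len(context) - 1 - i)
                     for i, w in enumerate(context) if w == target)
        return math.log((weight + 1) / (len(context) + 50))

    def perplexity(tokens, perturb=None):
        """Average perplexity over `tokens`, optionally perturbing the distant context."""
        nll, n = 0.0, 0
        for t in range(CONTEXT_LEN, len(tokens)):
            context = tokens[t - CONTEXT_LEN:t]
            if perturb is not None:
                distant, nearby = context[:-NEAR_WINDOW], context[-NEAR_WINDOW:]
                context = perturb(distant) + nearby
            nll -= score_token(context, tokens[t])
            n += 1
        return math.exp(nll / n)

    def shuffle_distant(distant):
        distant = list(distant)
        random.shuffle(distant)   # destroy word order beyond the nearby window
        return distant

    def drop_distant(distant):
        return []                 # remove the distant history entirely

    if __name__ == "__main__":
        corpus = ("the cat sat on the mat and the dog sat on the log " * 50).split()
        print("baseline ppl        :", round(perplexity(corpus), 3))
        print("shuffled distant ppl:", round(perplexity(corpus, shuffle_distant), 3))
        print("dropped distant ppl :", round(perplexity(corpus, drop_distant), 3))

In the paper's setting, the gap between the baseline and the perturbed perplexities is what separates "sharp" nearby context from the "fuzzy" distant history. The neural cache referenced above (Grave et al., 2017b) interpolates the LM's output distribution with a copy distribution over recently seen words, weighted by hidden-state similarity, which fits the finding that it chiefly helps the LSTM reuse words from that distant context.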


research · 09/19/2021
Do Long-Range Language Models Actually Use Long-Range Context?
Language models are generally trained on short, truncated input sequence...

research · 10/24/2022
Characterizing Verbatim Short-Term Memory in Neural Language Models
When a language model is trained to predict natural language sequences, ...

research · 11/24/2016
Learning Python Code Suggestion with a Sparse Pointer Network
To enhance developer productivity, all modern integrated development env...

research · 09/24/2018
Information-Weighted Neural Cache Language Models for ASR
Neural cache language models (LMs) extend the idea of regular cache lang...

research · 04/21/2021
Adapting Long Context NLM for ASR Rescoring in Conversational Agents
Neural Language Models (NLM), when trained and evaluated with context sp...

research · 04/02/2019
Data Augmentation for Context-Sensitive Neural Lemmatization Using Inflection Tables and Raw Text
Lemmatization aims to reduce the sparse data problem by relating the inf...

research · 12/08/2019
Cost-Sensitive Training for Autoregressive Models
Training autoregressive models to better predict under the test metric, ...
