Do Long-Range Language Models Actually Use Long-Range Context?

09/19/2021
by Simeng Sun et al.

Language models are generally trained on short, truncated input sequences, which limits their ability to use discourse-level information present in long-range context to improve their predictions. Recent efforts to improve the efficiency of self-attention have led to a proliferation of long-range Transformer language models, which can process much longer sequences than models of the past. However, the ways in which such models take advantage of the long-range context remain unclear. In this paper, we perform a fine-grained analysis of two long-range Transformer language models (including the Routing Transformer, which achieves state-of-the-art perplexity on the PG-19 long-sequence LM benchmark dataset) that accept input sequences of up to 8K tokens. Our results reveal that providing long-range context (i.e., beyond the previous 2K tokens) to these models only improves their predictions on a small set of tokens (e.g., those that can be copied from the distant context) and does not help at all for sentence-level prediction tasks. Finally, we discover that PG-19 contains a variety of different document types and domains, and that long-range context helps most for literary novels (as opposed to textbooks or magazines).
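The analysis sketched in the abstract can be approximated by scoring the same target tokens under a short recent context and under a much longer prefix, then checking which tokens actually become easier to predict. Below is a minimal sketch, not the paper's code: it uses "gpt2" as a stand-in model (the paper evaluates the Routing Transformer and a local-attention Transformer), scales the windows down to GPT-2's 1,024-token limit in place of the 2K/8K comparison, and loads a hypothetical PG-19 document from a placeholder path.

```python
# Minimal sketch (not the paper's code): compare per-token negative log-likelihood
# when a model sees only a short recent context vs. a longer prefix.
# "gpt2" is a stand-in model and the window sizes below are scaled down to its
# 1,024-token limit; the paper's setting compares the previous 2K tokens against
# prefixes of up to 8K tokens.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"                              # stand-in for the long-range LMs in the paper
SHORT_CTX, LONG_CTX, N_TARGETS = 256, 896, 128   # scaled-down analogues of the 2K / 8K windows

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def per_token_nll(context_ids: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """NLL of each target token given the preceding context (no gradients)."""
    input_ids = torch.cat([context_ids, target_ids]).unsqueeze(0)
    with torch.no_grad():
        logits = model(input_ids).logits[0]
    # The logit at position i predicts token i + 1, so predictions for the
    # targets start at the last context position.
    start = context_ids.size(0) - 1
    pred_logits = logits[start : start + target_ids.size(0)]
    return F.cross_entropy(pred_logits, target_ids, reduction="none")

# Hypothetical loading of one long document (e.g., a PG-19 book).
text = open("pg19_example.txt").read()
ids = tokenizer(text, return_tensors="pt").input_ids[0]

targets   = ids[LONG_CTX : LONG_CTX + N_TARGETS]
short_nll = per_token_nll(ids[LONG_CTX - SHORT_CTX : LONG_CTX], targets)
long_nll  = per_token_nll(ids[:LONG_CTX], targets)

# Tokens whose loss drops under the longer context are the ones that
# actually benefit from distant context.
delta = short_nll - long_nll
print(f"mean NLL improvement: {delta.mean().item():.4f}")
print(f"fraction of tokens helped (> 0.1 nats): {(delta > 0.1).float().mean().item():.2%}")
```

Aggregating this per-token delta over many documents, and grouping tokens by type (e.g., tokens copyable from the distant prefix) or by document domain, is the spirit of the paper's fine-grained analysis.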


Related research

06/15/2021 · What Context Features Can Transformer Language Models Use?
Transformer-based language models benefit from conditioning on contexts ...

04/22/2022 · ChapterBreak: A Challenge Dataset for Long-Range Language Models
While numerous architectures for long-range language models (LRLMs) have...

06/27/2023 · HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Genomic (DNA) sequences encode an enormous amount of information for gen...

05/12/2018 · Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
We know very little about how neural language models (LM) use prior ling...

02/16/2022 · The NLP Task Effectiveness of Long-Range Transformers
Transformer models cannot easily scale to long sequences due to their O(...

09/02/2022 · Extend and Explain: Interpreting Very Long Language Models
While Transformer language models (LMs) are state-of-the-art for informa...

06/26/2023 · LongCoder: A Long-Range Pre-trained Language Model for Code Completion
In this paper, we introduce a new task for code completion that focuses ...
