Exploring Length Generalization in Large Language Models

07/11/2022
by Cem Anil, et al.

The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These include theorem proving, solving quantitative mathematics problems, and reading/summarizing novels. In this paper, we run careful empirical studies exploring the length generalization capabilities of transformer-based language models. We first establish that naively finetuning transformers on length generalization tasks shows significant generalization deficiencies independent of model scale. We then show that combining pretrained large language models' in-context learning abilities with scratchpad prompting (asking the model to output solution steps before producing an answer) results in a dramatic improvement in length generalization. We run careful failure analyses on each of the learning modalities and identify common sources of mistakes that highlight opportunities in equipping language models with the ability to generalize to longer problems.
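
As an illustration of the scratchpad prompting the abstract refers to, here is a minimal sketch in Python: each in-context exemplar writes out its intermediate solution steps before the final answer, and the query instance is deliberately longer than any exemplar to probe length generalization. The parity task, the prompt template, and all function names are assumptions made for illustration, not the paper's exact experimental setup.

```python
# Minimal sketch of few-shot scratchpad prompting for length generalization.
# Assumptions: the task (boolean parity) and the prompt format are invented
# here for illustration; the paper's exact tasks and templates may differ.

def scratchpad_example(bits):
    """Render one worked exemplar: running parity after each bit, then the answer."""
    lines = [f"Q: What is the parity of {' '.join(map(str, bits))}?"]
    parity = 0
    steps = []
    for b in bits:
        parity ^= b
        steps.append(f"after {b}: parity = {parity}")
    lines.append("Scratchpad: " + "; ".join(steps))
    lines.append(f"A: {parity}")
    return "\n".join(lines)

def build_prompt(train_instances, query_bits):
    """Concatenate short worked exemplars, then pose a longer unsolved query."""
    shots = "\n\n".join(scratchpad_example(bits) for bits in train_instances)
    query = f"Q: What is the parity of {' '.join(map(str, query_bits))}?\nScratchpad:"
    return shots + "\n\n" + query

if __name__ == "__main__":
    short_instances = [[1, 0, 1], [0, 1, 1, 0], [1, 1, 0, 1]]   # short exemplars
    long_query = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]           # longer test instance
    prompt = build_prompt(short_instances, long_query)
    print(prompt)  # this string would be passed to a pretrained LM for completion
```

The key design point is that the model sees only short, fully worked-out instances in context and must emit its own scratchpad for a longer query, which is the in-context learning plus scratchpad setting the abstract reports as dramatically improving length generalization.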

Related research

05/26/2023: Randomized Positional Encodings Boost Length Generalization of Transformers
Transformers have impressive generalization capabilities on tasks with a...

09/30/2020: Measuring Systematic Generalization in Neural Proof Generation with Transformers
We are interested in understanding how well Transformer language models ...

08/30/2023: LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
In recent years, there have been remarkable advancements in the performa...

05/30/2023: What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization
In this paper, we conduct a comprehensive study of In-Context Learning (...

01/09/2019: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformer networks have a potential of learning longer-term dependency...

12/16/2021: Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability
Investigating the reasoning abilities of transformer models, and discove...

08/15/2021: Exploring Generalization Ability of Pretrained Language Models on Arithmetic and Logical Reasoning
To quantitatively and intuitively explore the generalization ability of ...
