Improving language models by retrieving from trillions of tokens

12/08/2021
by Sebastian Borgeaud, et al.

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
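
To make the mechanism concrete, below is a minimal PyTorch sketch of the two pieces the abstract names: nearest-neighbour retrieval over frozen chunk embeddings, and chunked cross-attention from each input chunk to its retrieved neighbours. All tensor shapes, function names, and the brute-force dot-product search here are illustrative assumptions, not the paper's implementation; the actual model retrieves over frozen BERT embeddings of 64-token chunks and applies a one-token offset in the attention pattern to preserve autoregressive causality, both of which this sketch omits.

```python
# Hedged sketch of RETRO-style retrieval + chunked cross-attention.
# Shapes, names, and exact (brute-force) nearest-neighbour search are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn


def retrieve_neighbours(chunk_emb: torch.Tensor,
                        db_keys: torch.Tensor,
                        db_values: torch.Tensor,
                        k: int = 2) -> torch.Tensor:
    """Return the k nearest database chunks for each input chunk.

    chunk_emb: (n_chunks, d_emb)   frozen embeddings of the input chunks
    db_keys:   (db_size, d_emb)    precomputed embeddings of database chunks
    db_values: (db_size, neighbour_len, d_model)  encoded neighbour chunks
    """
    sims = chunk_emb @ db_keys.T            # (n_chunks, db_size)
    idx = sims.topk(k, dim=-1).indices      # (n_chunks, k)
    return db_values[idx]                   # (n_chunks, k, neighbour_len, d_model)


class ChunkedCrossAttention(nn.Module):
    """Each chunk of the decoder sequence attends only to the neighbours
    retrieved for that chunk (the causal one-token offset is omitted)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, neighbours: torch.Tensor) -> torch.Tensor:
        # x:          (batch, n_chunks * chunk_len, d_model) decoder states
        # neighbours: (batch, n_chunks, k * neighbour_len, d_model)
        b, n_chunks, kn, d = neighbours.shape
        chunk_len = x.shape[1] // n_chunks

        # Fold chunks into the batch dimension so attention is blockwise:
        # chunk i sees only its own retrieved neighbours.
        q = x.reshape(b * n_chunks, chunk_len, d)
        kv = neighbours.reshape(b * n_chunks, kn, d)
        out, _ = self.attn(q, kv, kv)
        return x + out.reshape(b, n_chunks * chunk_len, d)
```

At a 2 trillion token database, the dense similarity matmul above would be infeasible; the paper instead relies on a precomputed approximate-nearest-neighbour index (SCaNN), so the sketch is only faithful at toy scale.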

Related research

10/01/2020 · ISAAQ – Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention
Textbook Question Answering is a complex task in the intersection of Mac...

10/24/2022 · Characterizing Verbatim Short-Term Memory in Neural Language Models
When a language model is trained to predict natural language sequences, ...

02/23/2023 · On the Generalization Ability of Retrieval-Enhanced Transformers
Recent work on the Retrieval-Enhanced Transformer (RETRO) model has show...

08/07/2023 · Trusting Language Models in Education
Language Models are being widely used in Education. Even though modern d...

03/16/2022 · Memorizing Transformers
Language models typically need to be trained or finetuned in order to ac...

11/16/2022 · RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models
To better support retrieval applications such as web search and question...

09/15/2023 · Cure the headache of Transformers via Collinear Constrained Attention
As the rapid progression of practical applications based on Large Langua...
