Long-range Language Modeling with Self-retrieval

06/23/2023
by Ohad Rubin et al.

Retrieval-augmented language models (LMs) have received much attention recently. However, the retriever is typically not trained jointly as a native component of the LM, but rather added to an already-pretrained LM, which limits the ability of the LM and the retriever to adapt to one another. In this work, we propose the Retrieval-Pretrained Transformer (RPT), an architecture and training procedure for jointly training a retrieval-augmented LM from scratch for the task of modeling long texts. Given a recently generated text chunk in a long document, the LM computes query representations, which are then used to retrieve earlier chunks in the document, potentially located tens of thousands of tokens before. Information from the retrieved chunks is fused into the LM representations to predict the next target chunk. We train the retriever component with a semantic objective: the goal is to retrieve chunks that increase the probability of the next chunk according to a reference LM. We evaluate RPT on four long-range language modeling tasks spanning books, code, and mathematical writing, and demonstrate that RPT improves retrieval quality and, in turn, perplexity across the board compared to strong baselines.
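
To make the two mechanisms in the abstract concrete, here is a minimal PyTorch sketch of (1) scoring earlier chunks against a query representation and (2) the semantic training signal derived from a reference LM. All names, shapes, and the cosine-similarity scorer are illustrative assumptions, not the paper's actual implementation; in particular, the fusion of retrieved chunks into the LM (done via attention in the architecture) is omitted here.

```python
import torch
import torch.nn.functional as F


def retrieve(query_repr: torch.Tensor, past_chunk_reprs: torch.Tensor, k: int = 2):
    """Score every earlier chunk in the document against the query
    representation of the most recent chunk, and keep the top-k.
    Assumed shapes: query_repr (d,); past_chunk_reprs (n_chunks, d)."""
    scores = F.cosine_similarity(query_repr.unsqueeze(0), past_chunk_reprs, dim=-1)
    k = min(k, past_chunk_reprs.size(0))
    top = torch.topk(scores, k=k)
    return top.indices, top.values


def semantic_retrieval_gain(logp_next_with_candidate: torch.Tensor,
                            logp_next_without: torch.Tensor) -> torch.Tensor:
    """Semantic signal for training the retriever: how much does adding a
    candidate chunk to the context raise a reference LM's log-probability
    of the next chunk? Candidates with the largest gain act as positive
    retrieval targets.
    logp_next_with_candidate[i] = log p_ref(next chunk | context, candidate i)
    logp_next_without           = log p_ref(next chunk | context)"""
    return logp_next_with_candidate - logp_next_without


# Toy usage: 10 earlier chunks with 64-dim representations (random stand-ins).
past = torch.randn(10, 64)
query = torch.randn(64)
indices, sims = retrieve(query, past, k=2)
```

Note that `semantic_retrieval_gain` only returns the per-candidate gains; how those gains are converted into a ranking loss for the retriever is left unspecified in this sketch.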

Related research

REPLUG: Retrieval-Augmented Black-Box Language Models (01/30/2023)
We introduce REPLUG, a retrieval-augmented language modeling framework t...

Cross-Document Language Modeling (01/02/2021)
We introduce a new pretraining approach for language models that are gea...

Active Retrieval Augmented Generation (05/11/2023)
Despite the remarkable ability of large language models (LMs) to compreh...

In-Context Retrieval-Augmented Language Models (01/31/2023)
Retrieval-Augmented Language Modeling (RALM) methods, that condition a l...

A neural document language modeling framework for spoken document retrieval (10/31/2019)
Recent developments in deep learning have led to a significant innovatio...

Adapting Language Models to Compress Contexts (05/24/2023)
Transformer-based language models (LMs) are powerful and widely-applicab...

Exposing Attention Glitches with Flip-Flop Language Modeling (06/01/2023)
Why do large language models sometimes output factual inaccuracies and e...
