Meet in the Middle: A New Pre-training Paradigm

03/13/2023
by   Anh Nguyen, et al.

Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion, assuming that the next token only depends on the preceding ones. However, this assumption ignores the potential benefits of using the full sequence information during training, and the possibility of having context from both sides during inference. In this paper, we propose a new pre-training paradigm with techniques that jointly improve the training data efficiency and the capabilities of the LMs in the infilling task. The first is a training objective that aligns the predictions of a left-to-right LM with those of a right-to-left LM, trained on the same data but in reverse order. The second is a bidirectional inference procedure that enables both LMs to meet in the middle. We show the effectiveness of our pre-training paradigm with extensive experiments on both programming and natural language models, outperforming strong baselines.
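The bidirectional inference procedure can be illustrated with a small sketch. The idea, stated in hedged form: a left-to-right LM extends the prefix while a right-to-left LM extends the suffix, and generation halts once the forward model's latest n-gram reappears in the backward model's infill, at which point the two streams are spliced together. The `fwd_next`/`bwd_next` callables and the toy deterministic "models" below are illustrative assumptions, not the paper's actual implementation, which interleaves real LM decoding and scoring.

```python
def meet_in_the_middle(prefix, suffix, fwd_next, bwd_next,
                       max_steps=20, match_len=2):
    """Sketch of bidirectional infilling: alternate one step of forward
    (left-to-right) and backward (right-to-left) generation, and stop
    when the forward model's latest `match_len`-gram reappears in the
    backward model's infill -- the point where the two streams meet."""
    fwd_ctx = list(prefix)   # forward LM context, grows rightward
    bwd_ctx = list(suffix)   # backward LM context, grows leftward
    mid = []                 # backward-generated infill, left-to-right order
    for _ in range(max_steps):
        fwd_ctx.append(fwd_next(fwd_ctx))
        mid.insert(0, bwd_next(bwd_ctx))
        bwd_ctx.insert(0, mid[0])
        gen = fwd_ctx[len(prefix):]          # forward-generated infill
        if len(gen) >= match_len:
            tail = gen[-match_len:]
            # look for the forward tail inside the backward infill
            for j in range(len(mid) - match_len + 1):
                if mid[j:j + match_len] == tail:
                    # splice: forward infill + rest of backward infill
                    return prefix + gen + mid[j + match_len:] + suffix
    return None  # the two models never met within max_steps


# Toy deterministic "LMs" that both memorize one sentence, so the two
# generation streams are guaranteed to meet (illustrative only).
target = ["the", "quick", "brown", "fox", "jumps"]
fwd = lambda ctx: target[len(ctx)]
bwd = lambda ctx: target[len(target) - len(ctx) - 1]

print(meet_in_the_middle(["the"], ["jumps"], fwd, bwd))
```

Because both streams converge on the same middle tokens, neither model has to generate the whole infill alone, which is where the claimed efficiency gain during inference comes from.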


research
07/01/2023

BatGPT: A Bidirectional Autoregressive Talker from Generative Pre-trained Transformer

BatGPT is a large-scale language model designed and trained jointly by W...
research
10/08/2020

Masked ELMo: An evolution of ELMo towards fully contextual RNN language models

This paper presents Masked ELMo, a new RNN-based model for language mode...
research
07/28/2022

Efficient Training of Language Models to Fill in the Middle

We show that autoregressive language models can learn to infill text aft...
research
09/11/2018

Limitations in learning an interpreted language with recurrent models

In this submission I report work in progress on learning simplified inte...
research
06/17/2022

Evolution through Large Models

This paper pursues the insight that large language models (LLMs) trained...
research
02/12/2021

On Efficient Training, Controllability and Compositional Generalization of Insertion-based Language Generators

Auto-regressive language models with the left-to-right generation order ...
research
10/07/2021

Beam Search with Bidirectional Strategies for Neural Response Generation

Sequence-to-sequence neural networks have been widely used in language-b...
