Efficient Training of Language Models to Fill in the Middle

07/28/2022
by   Mohammad Bavarian, et al.

We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this way does not harm the original left-to-right generative capability, as measured by perplexity and sampling evaluations across a wide range of scales. Given the usefulness, simplicity, and efficiency of training models to fill-in-the-middle (FIM), we suggest that future autoregressive language models be trained with FIM by default. To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the infill span. We use these ablations to prescribe strong default settings and best practices to train FIM models. We have released our best infilling model trained with best practices in our API, and have released our infilling benchmarks to aid future research.
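To make the transformation concrete, here is a minimal sketch of the move-the-middle-to-the-end idea described in the abstract. The sentinel strings (`<PRE>`, `<SUF>`, `<MID>`), the `fim_rate` parameter, and the uniform span-selection rule are illustrative assumptions for this sketch, not the paper's prescribed settings; the paper's ablations are precisely about choices like the transformation frequency and span selection.

```python
# Illustrative sketch of a FIM data transformation (not the paper's exact recipe).
# Sentinels <PRE>/<SUF>/<MID> and the fim_rate parameter are assumptions for
# demonstration purposes only.
import random

def fim_transform(document: str, fim_rate: float = 0.5) -> str:
    """Move a random middle span of `document` to the end, marked by sentinels.

    With probability 1 - fim_rate the document is returned unchanged, so the
    model still sees ordinary left-to-right training data.
    """
    if len(document) < 2 or random.random() > fim_rate:
        return document

    # Pick two cut points uniformly at random, splitting the text into
    # prefix / middle / suffix (one possible span-selection strategy).
    a, b = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:a], document[a:b], document[b:]

    # Prefix-suffix-middle ordering: the middle span is moved to the end,
    # so a left-to-right model learns to generate it conditioned on both sides.
    return f"<PRE>{prefix}<SUF>{suffix}<MID>{middle}"
```

At inference time, the same format lets an ordinary left-to-right model infill: the user supplies the prefix and suffix, and the model generates the middle after the `<MID>` sentinel.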
