The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute

09/20/2023
by Aleksandar Stanić, et al.

The Languini Kitchen serves as both a research collective and codebase designed to empower researchers with limited computational resources to contribute meaningfully to the field of language modelling. We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. The number of tokens on which a model is trained is determined by the model's throughput and the chosen compute class. Notably, this approach avoids constraints on critical hyperparameters which affect total parameters or floating-point operations. For evaluation, we pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. On it, we compare methods based on their empirical scaling trends, which are estimated through experiments at various levels of compute. This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput. While the GPT baseline achieves better perplexity throughout all our levels of compute, our LSTM baseline exhibits a predictable and more favourable scaling law. This is due to the improved throughput and the need for fewer training tokens to achieve the same decrease in test perplexity. Extrapolating the scaling laws of both models results in an intersection at roughly 50,000 accelerator hours. We hope this work can serve as the foundation for meaningful and reproducible language modelling research.
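The sketch below illustrates the two quantitative ideas in the abstract: deriving a token budget from a model's measured throughput and a compute class, and extrapolating fitted scaling curves to estimate where two models' test perplexities would cross. Function names, numbers, and the power-law form are illustrative assumptions for this example, not code or results from the Languini codebase.

```python
# Minimal sketch of the compute-class protocol described above.
# All names and numbers are illustrative; they are not taken from the paper.
import numpy as np


def token_budget(tokens_per_second: float, accelerator_hours: float) -> int:
    """Training tokens a model can consume within a given compute class."""
    return int(tokens_per_second * accelerator_hours * 3600)


def fit_power_law(hours: np.ndarray, perplexity: np.ndarray) -> tuple[float, float]:
    """Fit log(ppl) = a + b * log(hours); return (intercept a, slope b)."""
    b, a = np.polyfit(np.log(hours), np.log(perplexity), deg=1)
    return a, b


def intersection_hours(fit_1: tuple[float, float], fit_2: tuple[float, float]) -> float:
    """Accelerator hours at which two fitted scaling curves intersect."""
    a1, b1 = fit_1
    a2, b2 = fit_2
    return float(np.exp((a2 - a1) / (b1 - b2)))


# Made-up measurements at several compute classes (hours -> test perplexity).
gpt_hours = np.array([6, 12, 24, 48, 96])
gpt_ppl = np.array([22.0, 20.1, 18.6, 17.4, 16.4])
lstm_hours = np.array([6, 12, 24, 48, 96])
lstm_ppl = np.array([28.0, 24.6, 21.7, 19.2, 17.1])

gpt_fit = fit_power_law(gpt_hours, gpt_ppl)
lstm_fit = fit_power_law(lstm_hours, lstm_ppl)

# A ten-fold throughput advantage translates directly into a ten-fold token budget.
print(f"GPT budget at 24h:  {token_budget(25_000, 24):,} tokens")
print(f"LSTM budget at 24h: {token_budget(250_000, 24):,} tokens")
print(f"Fitted curves intersect at ~{intersection_hours(gpt_fit, lstm_fit):.0f} accelerator hours")
```

With this setup, the model that is worse at small compute but has a steeper (more favourable) fitted slope overtakes the other at the extrapolated intersection point; the paper's reported figure of roughly 50,000 accelerator hours is an extrapolation of this kind, not reproduced by the toy numbers above.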


