TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

05/12/2023
by Ronen Eldan, et al.

Language models (LMs) are powerful tools for natural language processing, but they often struggle to produce coherent and fluent text when they are small. Models with around 125M parameters such as GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and consistent English text beyond a few words, even after extensive training. This raises the question of whether the ability to produce coherent English text only emerges at larger scales (with hundreds of millions of parameters or more) and with complex architectures (with many layers of global attention). In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that typical 3 to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters), or have much simpler architectures (with only one transformer block), yet still produce fluent and consistent stories of several paragraphs that are diverse, have almost perfect grammar, and demonstrate reasoning capabilities. We also introduce a new paradigm for the evaluation of language models: we suggest a framework which uses GPT-4 to grade the content generated by these models as if it were stories written by students and graded by a (human) teacher. This new paradigm overcomes the flaws of standard benchmarks, which often require the model's output to be very structured, and moreover provides a multidimensional score for the model, with separate scores for capabilities such as grammar, creativity, and consistency. We hope that TinyStories can facilitate the development, analysis, and research of LMs, especially for low-resource or specialized domains, and shed light on the emergence of language capabilities in LMs.
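To make the evaluation paradigm concrete, here is a minimal sketch of how a GPT-4 "teacher" grader could be wired up, assuming the openai Python client (v1+). The prompt wording, the 1-to-10 rubric, and the grade_completion helper are illustrative assumptions, not the paper's exact prompts or scoring scale.

```python
# Minimal sketch of a GPT-4-as-teacher grader for model-generated stories.
# Assumes the openai Python client (>= 1.0) and an OPENAI_API_KEY in the
# environment; the rubric text and score format below are hypothetical.
import re
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "You are a teacher grading a short story written by a student. "
    "The story begins with a given opening; the student wrote the continuation. "
    "Give integer scores from 1 to 10 for grammar, creativity, and "
    "consistency with the opening, formatted exactly as "
    "'grammar: X, creativity: Y, consistency: Z'."
)

def grade_completion(story_opening: str, model_completion: str) -> dict:
    """Ask GPT-4 to grade a model-generated story continuation."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user",
             "content": f"Opening:\n{story_opening}\n\n"
                        f"Student's continuation:\n{model_completion}"},
        ],
    )
    text = response.choices[0].message.content.lower()
    # Parse the three requested scores out of the grader's reply.
    scores = dict(re.findall(r"(grammar|creativity|consistency):\s*(\d+)", text))
    return {name: int(value) for name, value in scores.items()}
```

Averaging such per-dimension scores over many held-out story openings would yield the kind of multidimensional model score described above, without requiring the model's output to fit a rigid benchmark format.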


