Emergent inabilities? Inverse scaling over the course of pretraining

05/24/2023
by James A. Michaelov, et al.

Does inverse scaling only occur as a function of model parameter count, or can it also occur over the course of training? We carry out an exploratory study investigating whether, over the course of training on the language modeling task, the performance of language models on specific tasks can decrease while their general performance remains high. We find that for two tasks from the Inverse Scaling Challenge, quote-repetition and redefine-math, this is indeed the case. Specifically, for the larger Pythia (Biderman et al., 2023) models, performance on these two tasks decreases over the course of training, even though these models show standard (positive) scaling overall. This highlights the importance of testing model performance on all relevant benchmarks whenever models are trained on additional data, even if their overall performance improves.
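As a rough sketch of this kind of checkpoint-level evaluation (not the authors' actual pipeline), one can load Pythia models at intermediate training steps via their Hugging Face checkpoint revisions and compare the log-probabilities assigned to two candidate answers, in the spirit of the redefine-math task. The model size, checkpoint steps, prompt, and answer options below are illustrative placeholders.

# Sketch: comparing answer likelihoods across Pythia training checkpoints.
# Not the paper's evaluation code; model size, steps, and prompt are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/pythia-410m"          # any Pythia size; revisions name the training step
STEPS = [1000, 36000, 72000, 143000]      # intermediate checkpoints (revision="stepN")

# Toy redefine-math-style item: prompt plus two candidate continuations.
prompt = "Redefine pi as 462. What is the first digit of pi? Answer:"
options = [" 4", " 3"]                    # instructed answer vs. prior-knowledge answer

def option_logprob(model, tokenizer, prompt, option):
    """Sum of log-probabilities the model assigns to the option's tokens."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    option_ids = tokenizer(option, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, option_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-probs of each option token, predicted from the preceding position.
    log_probs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1 : -1], dim=-1)
    return log_probs.gather(1, option_ids[0].unsqueeze(1)).sum().item()

for step in STEPS:
    tokenizer = AutoTokenizer.from_pretrained(MODEL, revision=f"step{step}")
    model = AutoModelForCausalLM.from_pretrained(MODEL, revision=f"step{step}")
    scores = [option_logprob(model, tokenizer, prompt, o) for o in options]
    chosen = options[scores.index(max(scores))]
    print(f"step {step:>6}: picked '{chosen.strip()}' (log-probs: {scores})")

Scoring each candidate answer by its summed token log-probability, rather than a single next-token probability, keeps the comparison well-defined even when the two answers tokenize to different lengths.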


Related research

Inverse scaling can become U-shaped (11/03/2022)
Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models (05/27/2023)
Inverse Scaling: When Bigger Isn't Better (06/15/2023)
'Rarely' a problem? Language models exhibit inverse scaling in their predictions following 'few'-type quantifiers (12/16/2022)
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling (04/03/2023)
Cramming: Training a Language Model on a Single GPU in One Day (12/28/2022)
Not all parameters are born equal: Attention is mostly what you need (10/22/2020)
